---
title: "Bayesian Transmission Modeling"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{bayestransmission}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(bayestransmission)
```

# Introduction

This package provides a Bayesian framework for transmission modeling on
an individual patient level. Modeling is conducted through Markov Chain
Monte Carlo (MCMC) methods.This document will explain the basic usage of
the package, specification of parameters, and the output of the model.

# Data Structure

The algorithms expect a longitudinal data set with the following
columns: \* `facility`: The facility where the event occurred. \*
`unit`: The unit within the facility where the event occurred. \*
`time`: The time at which the event occurred. \* `patient`: The patient
involved in the event. \* `type`: The type of event.

The package includes a simulated dataset, `simulated.data`.

```{r}
pillar::glimpse(simulated.data)
```

There are 12 different types of events that can be specified in the
`type` column. These are, expected numerical codes shown in parentheses:

-   Admission (0) and Discharge (3)
-   Surveillance Testing Results
    -   Negative Test (1)
    -   Positive Test (2)
-   Clinical Testing Results
    -   Negative Test (4)
    -   Positive Test (5)
-   Generic Testing
    -   Negative Test (7)
    -   Positive Test (8)
-   Antibiotic Use
    -   single dose (9)
    -   Start (10)
    -   Stop (11)
-   Isolation Procedures
    -   Start (6)
    -   Stop (7)

Not all events need to be used in every data set, but the model selected
should reflect the data that is available. Care should be taken to
correctly code the data. The `EventToCode` and `CodeToEvent` functions
can be used to convert between.

```{r}
table(CodeToEvent(simulated.data$type))
```

# Model Specifiction

## Model Choice

Since the model is implemented in C++ for speed and efficiency, only the
specified models can be used. The currently implemented models are:

-   `"LinearAbxModel"`, A Linear model with antibiotic use as a
    covariate.
-   `"MixedModel"`, and
-   `"LogNormalModel"`

Model specification and all parameters are controlled through
constructor functions of the same name, or generically through the
`LogNormalModelParams()` function.

For all models there is the choice of either a 2 state (susceptible and
colonized) or 3 state (susceptible, colonized, and recovered or latent)
model. Number of states is set through the `nstates` parameter, and the
number of states in the model overrides what may be specified in any
individual component.

## Parameters

The remainder of the parameters are grouped into the following
categories:

-   `Abx`, Antibiotic use,
-   `AbxRate`, Antibiotic rates,
-   `InUnit`, In unit infection rates,
-   `OutOfUnitInfection`, Out of unit infection rates,
-   `Insitu`, In situ parameters,
-   Testing:
    -   `SurveilenceTest`, Surveillance testing,
    -   `ClinicalTest`, Clinical testing.

Unless otherwise specified the parameters are all distributed gamma with
specified shape and rate parameters. Each parameter can also be left as
fixed or be sampled at each iteration of the MCMC.

### Specifying parameters

Parameters for the model may be specified by the `Param()` function.
This function takes up to four arguments:

```         
1. `init`, is the initial value of the parameter.
2. `weight`, is the weight of the prior distribution in updates.
3. `update`, a flag of if the parameter should be sampled in the MCMC algorithm. 
    `FALSE` indicates that the parameter should be fixed, and is by default `TRUE` when `weight` is greater than zero.
4. `prior`, the mean of the prior distribution.  Taken with the weight will fully parameterize the distribution.
```

**Important:** Always explicitly specify `init` values for parameters to ensure 
reproducibility and avoid potential numerical issues.

```{r paramexamples, results='hide'}
# Fully specified parameter.
Param(init = 0, weight = 1, update = TRUE, prior = 0.5)
# Fixed parameter with explicit init
# Weight = 0 implies update=FALSE and prior is ignored.
Param(init = 0, weight = 0)
# Updated parameter that starts at zero.
Param(init = 0, weight = 1, update = TRUE)
# Short form for fixed parameter
Param(init = 0, weight = 0)
```

## `Abx` Antibiotic use

Antibiotic use is specified by the `Abx` parameter. This parameter is a
list constructed with the `AbxParams()` function with the following
components:

-   `onoff`, If antibiotics are being used or not. The two following
    parameters are only used if `onoff` is `TRUE`.
-   `delay`, the delay for the antibiotic to take effect.
-   `life`, the duration where the antibiotic to be effective.

```{r}
abx <- AbxParams(onoff = 0, delay = 0.0, life = 2.0)
```

Currently, all antibiotics are assumed to be equally effective and have
the same duration of effectiveness. Note: `onoff = 0` means antibiotics are
turned off for this example (set to 1 to enable).

## `AbxRate` Antibiotic rates

The `AbxRate` parameter control the antibiotic administration rates.

```{r}
abxrate <- AbxRateParams(
  # Fixed parameters when antibiotics are off
  uncolonized = Param(init = 1.0, weight = 0),
  colonized = Param(init = 1.0, weight = 0)
)
```

Here since both parameters are non-zero both will be updated. A rate of
zero for either would indicate that group would never be on antibiotics.
When antibiotics are turned off (`Abx$onoff = 0`), these parameters are 
typically fixed (weight = 0).

## `InUnit` In unit infection rate

Transmission within unit is the main defining characteristic that
differentiates models. For example the linear antibiotic model,
`LinearAbxModel()`, is differentiated from the log normal model,
`LogNormalModelParams()` by the use of a `ABXInUnitParams()` for the
`InUnit` argument rather than the `LogNormalInUnitAcquisition()` which
does not take into account antibiotic use. All in unit transmission is
defined in terms of acquisition, progression, and clearance.



### Aqcuisition Model

In the base log normal antibiotic model,
`LogNormalABXInUnitParameters()` log acquisition probability is a linear
function.

$$
\log(P(\mathrm{Acq(t)})) =  % lognormal_logNormalAbxICP.cpp:68-74
    \beta_0 +               % logbeta_acq_constant
    \beta_t (t-t_0) +       % logbeta_acq_time
    \beta_c N_c(t)  +       % logbeta_acq_col() * ncol +
    \beta_{ca} N_{ca}(t) +  % logbeta_acq_abx_col() * ncolabx +
    \beta_A A_i(t) +        % logbeta_acq_onabx() * onabx +
    \beta_E E_i(t)          % logbeta_acq_everabx() * everabx;
$$ 

Where $\beta_\star$ represents the coefficient corresponding to the
amounts, $N_c(t)$ represent the total number of colonized patients at
time $t$, $N_{ca}(t)$ the number of colonized on antibiotics, and
$A_i(t)$ and $E_i(t)$ represents if patient $i$ is currently or ever on
antibiotics.

The linear antibiotic (`LinearAbxAcquisitionParams`) takes a more
complicated form for the acquisition model.

$$
P(\mathrm{Acq(t)}) =
  \left[e^{\beta_\mathrm{time}(t-t_0)}\right]
  \left\{e^{\beta_0}
         \left[
            \left(\frac{\beta_\mathrm{freq}}{P(t)}+(1 - e^{\beta_\mathrm{freq}})\right)
            e^{\beta_\mathrm{mass}}\left(
                (N_c(t) - N_{ca}(t)) + e^{\beta_\mathrm{col\_abx}}N_{ca}(t)
            \right)
          + 1 - e^{\beta_\mathrm{mass}}
        \right]
  \right\}
  \left[
    N_S(t) - N_E(t) + e^{\beta_\mathrm{suss\_ever}}\left(\left(E_i(t)-A_i(t)\right) +A_i(t)e^{\beta_\mathrm{suss\_abx}}\right)
  \right]
$$

```{r}
acquisition <- LinearAbxAcquisitionParams(
    base = Param(init = 0.001, weight = 1),     #< Base acquisition rate (Updated)
    time = Param(init = 1.0, weight = 0),       #< Time effect (Fixed)
    mass = Param(init = 1.0, weight = 1),       #< Mass Mixing (Updated)
    freq = Param(init = 1.0, weight = 1),       #< Frequency/Density effect (Updated)
    col_abx = Param(init = 1.0, weight = 0),    #< Colonized on antibiotics (Fixed)
    suss_abx = Param(init = 1.0, weight = 0),   #< Susceptible on antibiotics (Fixed)
    suss_ever = Param(init = 1.0, weight = 0)   #< Ever on antibiotics (Fixed)  
)
```

### Progression Model

In the 3 state model there is a latent state and the progression model
controls how patient transition out of the latent state. The base rate
can be affected by currently being on antiboitics or ever being on
antbiotcs.

$$
\log(P(\mathrm{progression})) = \delta_0+\delta_AA_i(t) + \delta_EE_i(t)
$$ 
the linear antibiotic model is: 

$$
P(\mathrm{progression}) = e^{\delta_0}\left[1-E_i(t)+e^{\delta_2}\left(E_i(t)-A_i(t)+e^{\delta_1}A_i(t)\right)\right]
$$

Where here we use $\delta$ for the coefficients, but the notation is the
same.

```{r}
progression <- ProgressionParams(
    rate = Param(init = 0.0, weight = 0),      #< Base progression rate (Fixed for 2-state)
    abx  = Param(init = 1.0, weight = 0),      #< Currently on antibiotics (Fixed)
    ever_abx = Param(init = 1.0, weight = 0)   #< Ever on antibiotics (Fixed)
)
```

### Clearance Model

The clearance model is the same as the progression model in both the log
normal and the linear cases, the coefficients however are independent.

```{r}
clearance <- ClearanceParams(
    rate = Param(init = 0.01, weight = 1),     #< Base clearance rate (Updated)
    abx  = Param(init = 1.0, weight = 0),      #< Currently on antibiotics (Fixed)
    ever_abx = Param(init = 1.0, weight = 0)   #< Ever on antibiotics (Fixed)
)
```

```{r}
inunit <- ABXInUnitParams(
  acquisition = acquisition,
  progression = progression,
  clearance   = clearance
)
```

## Out of Unit Importation

The out of unit parameters control the rate at which admissions come in,
and which state they enter in.

$$
\log(P(\mathrm{state}_i \rightarrow \mathrm{state}_j)|t) = P_j - Q_{i,j} e^{-t \sum_i r_i}
$$

```{r}
outcol <- OutOfUnitInfectionParams(
  acquisition = Param(init = 0.001, weight = 1),
  clearance = Param(init = 0.01, weight = 0),
  progression = Param(init = 0.0, weight = 0)
)
```

## In Situ

I'm not sure what these parameters do. It's a set of gamma distributed
parameters one for each state. The updates and probabilities are not
time dependent.

When updating the rates for each state are sampled from a
$gamma(N_i, 1)$ distribution. Then all three are normalized to sum to 1.

```{r}
insitu <- InsituParams(
  # Starting 90/10 split uncolonized to colonized
  # For 2-state model, latent probability is 0
  probs = c(uncolonized = 0.90,
            latent = 0.0,
            colonized = 0.10),
  # Prior values for Bayesian updating  
  priors = c(1, 1, 1),
  # Which states to update (latent is fixed at 0 for 2-state model)
  doit = c(TRUE, FALSE, TRUE)
)
```

## Testing

There are two types of testing, surveillance, which is conducted
routinely at regular intervals such as on admission then every 3 days
after, and clinical, where the testing is precipitated by staff, and
thus the timing is informative.

### Surveillance Testing

The timing of surveillance testing is assumed to not be informative.
Therefore, surveillance testing is only parameterized in terms of
probability of a positive test given the underlying status. Surveillance
test parameters are updated with a sample from a
$Beta(N_{s,1}, N_{s,0})$ distribution where $N_{s,1}$ and $N_{s,0}$ are
the number of positive and negative tests respectively for state $s$.

```{r}
surv <- SurveillanceTestParams(
    # Probability of a positive test when uncolonized (false positive rate)
    # IMPORTANT: Must be > 0 to avoid -Inf likelihood. Use small value like 1e-10.
    # Setting to exactly 0.0 causes log(0) = -Inf if any uncolonized patient tests positive.
    uncolonized = Param(init = 1e-10, weight = 0),
    # Probability of a positive test when colonized (true positive rate/sensitivity)
    # Starting at 0.8, will be updated during MCMC
    colonized = Param(init = 0.8, weight = 1),
    # Latent state (for 2-state model, this is not used but must be specified)
    latent = Param(init = 0.0, weight = 0)
)

```

### Clinical Testing

Since clinical testing time is informative, clinical testing is assumed
to be at random within infection stage. The rate of testing within each
stage is sampled from a gamma distribution. Sensitivity/Specificity are
handled the same as surveillance testing and the likelihood is
multiplicative between rate and effectiveness.

```{r}
clin <- RandomTestParams(
    # Rate of testing when uncolonized
    uncolonized = ParamWRate(
      param = Param(init = 0.5, weight = 0), 
      rate = Param(init = 1.0, weight = 0)
    ),
    # Rate of testing when colonized  
    colonized = ParamWRate(
      param = Param(init = 0.5, weight = 0), 
      rate = Param(init = 1.0, weight = 0)
    ),
    # Latent state (for 2-state model, not used but must be specified)
    latent = ParamWRate(
      param = Param(init = 0.5, weight = 0), 
      rate = Param(init = 1.0, weight = 0)
    )
)
```


### All Together

```{r}
params <- LinearAbxModel(
  nstates = 2,
  Insitu = insitu,
  SurveillanceTest = surv,
  ClinicalTest = clin,
  OutOfUnitInfection = outcol,
  InUnit = inunit,
  Abx = abx,
  AbxRate = abxrate
)
```



# Running the Model

The model is run through the `runMCMC()` function. The function takes the following arguments:

- `data`: The data frame with patient event data
- `modelParameters`: The model specification (created above)
- `nsims`: Number of MCMC samples to collect after burn-in
- `nburn`: Number of burn-in iterations (default: 100)
- `outputparam`: Whether to save parameter values at each iteration (default: TRUE)
- `outputfinal`: Whether to save the final model state (default: FALSE)
- `verbose`: Whether to print progress messages (default: FALSE)

```{r}
system.time(
  results <- runMCMC(
    data = simulated.data_sorted,
    modelParameters = params,
    nsims = 100,
    nburn = 10,
    outputparam = TRUE,
    outputfinal = TRUE,
    verbose = FALSE
  )
)
```

# Analyzing MCMC Results

## Converting Parameters to Data Frame

The `results$Parameters` object contains the MCMC chain of all model parameters. 
To create trace plots and posterior distributions, we use the `mcmc_to_dataframe()`
function to convert this nested list structure into a tidy data frame format.

```{r}
# Convert parameters to data frame using package function
param_df <- mcmc_to_dataframe(results)

# Display first few rows
head(param_df)
```

## Trace Plots

Trace plots show the evolution of parameters across MCMC iterations, helping
to assess convergence.

```{r, fig.width=10, fig.height=8}
library(ggplot2)
library(tidyr)

# Select key parameters for trace plots
trace_params <- param_df[, c("iteration", "insitu_colonized", "surv_test_col_pos", 
                              "outunit_acquisition", "inunit_base", 
                              "abxrate_colonized", "loglikelihood")]

# Convert to long format
trace_long <- pivot_longer(trace_params, 
                           cols = -iteration,
                           names_to = "parameter",
                           values_to = "value")

# Create trace plots
ggplot(trace_long, aes(x = iteration, y = value)) +
  geom_line() +
  facet_wrap(~parameter, scales = "free_y", ncol = 2) +
  theme_minimal() +
  labs(title = "MCMC Trace Plots",
       x = "Iteration",
       y = "Parameter Value")
```

## Posterior Distributions

Posterior distributions show the estimated distribution of each parameter
after the MCMC sampling.

```{r, fig.width=10, fig.height=8}
# Remove burn-in if needed (here we already set nburn in the MCMC call)
# For demonstration, let's use all samples since nburn=0 was specified

# Create density plots for posterior distributions
ggplot(trace_long, aes(x = value)) +
  geom_density(fill = "steelblue", alpha = 0.5) +
  geom_vline(aes(xintercept = mean(value, na.rm = TRUE)), 
             color = "red", linetype = "dashed") +
  facet_wrap(~parameter, scales = "free", ncol = 2) +
  theme_minimal() +
  labs(title = "Posterior Distributions",
       subtitle = "Red dashed line shows posterior mean",
       x = "Parameter Value",
       y = "Density")
```

## Summary Statistics

```{r}
# Calculate summary statistics for each parameter
library(dplyr)

summary_stats <- trace_long %>%
  group_by(parameter) %>%
  summarise(
    mean = mean(value, na.rm = TRUE),
    median = median(value, na.rm = TRUE),
    sd = sd(value, na.rm = TRUE),
    q025 = quantile(value, 0.025, na.rm = TRUE),
    q975 = quantile(value, 0.975, na.rm = TRUE),
    .groups = "drop"
  )

print(summary_stats)
```