--- title: "Causal Meta Analysis for Aggregated Data" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Causal Meta Analysis for Aggregated Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(CaMeA) ``` ## Description A package for causal meta-analysis estimation of a binary response and binary treatment using aggregated data, based on Berenfeld et al. (https://arxiv.org/abs/2505.20168). The output is the aggregated estimate and its standard deviation, the package allows for simple and fast computation of different measures such as Risk Difference (RD), Risk Ratio (RR), Odds Ratio (OR) and Survival Ratio (SR). The package consists of the function `camea`. This function can take the measures Risk Difference, Risk Ratio and log-Risk Ratio, Odds Ratio and log-Odds Ratio, Survival Ratio and log-Survival Ratio. The data can be input in the following formats: vectors, data.frame, matrix or sparse matrix. The output of the function is a list of two elements: the first element is a dataframe that has, for each study, its related estimate and standard error; the second element of the list is the aggregated causal-meta analysis estimate and its standard error. If `plot = TRUE` the function `camea` prints a forestplot that displays the estimate of interest for each study and the aggregated estimate(s). All estimates are shown with their associated 95%-level CI. If `random.effects = TRUE`, the output and plot will include both the causal meta-analysis estimate and the random-effects meta-analysis estimate. If `log.scale = TRUE`, the meta-analysis will be performed on the logarithmic transformation of the chosen measure. ## How to install The latest release of the package can be installed through CRAN (soon): ```{r, eval = FALSE} install.packages("CaMeA") ``` The development version can be installed from github ```{r, eval = FALSE} devtools::install_github("cberenfeld/camea", build_vignettes = TRUE) ``` Another installation possibility is to clone the repo, and then within the r-package folder run ```{r, eval = FALSE} Rscript build_package.R ``` ## Usage Example ```{r} require(CaMeA) # Example 1: With vectors of total numbers and events in treated and control # Generate data n <- 10 treated_total <- sample.int(1000, n) control_total <- sample.int(1000, n) treated_events <- rbinom(n,treated_total,0.5) control_events <- rbinom(n,control_total,0.5) # Choose measure: RD measure <- "RD" # Apply function result <- camea(measure = measure, ai = treated_events, n1i = treated_total, ci = control_events, n2i = control_total) # Example 2: With contingency tables entries in data.frame format # Generate data n <- 10 treated_total <- sample.int(1000, n) control_total <- sample.int(1000, n) treated_events <- rbinom(n,treated_total,0.5) control_events <- rbinom(n,control_total,0.5) treated_negatives <- treated_total - treated_events control_negatives <- control_total - control_events dat <- data.frame(treated_events,control_events,treated_negatives,control_negatives) # Choose measure: log-OR measure <- "OR" # Apply function result <- camea(measure = measure, ai = treated_events, bi = treated_negatives, ci = control_events, di = control_negatives, data = dat, log.scale = TRUE) ``` ## Plot If `plot = TRUE` it is possible to print the forestplot of the meta-analysis. ```{r my-forest-plot, fig.width=9, fig.height=6, out.width='100%'} # apply the same function but print the plot result <- camea(measure = measure, ai = treated_events, n1i = treated_total, ci = control_events, n2i = control_total, log.scale = TRUE, plot = TRUE) ``` ## Notation We consider a set of $K \in \mathbb{N}$ randomized controlled trials measuring the same binary outcome $Y \in \{0,1\}$ and the same binary treatment $A \in \{0,1\}$. We assume that each study provides a table of the positive and negative outcomes among both the treated and untreated groups, as follows: ```{r, echo=FALSE} if (requireNamespace("kableExtra", quietly = TRUE)) { library(kableExtra) } df <- data.frame(Cat = c("A=1","A=0"), "Y=1" = c("$n_{11}(k)$","$n_{10}(k)$"), "Y=0" = c("$n_{01}(k)$","$n_{00}(k)$")) kable(df, col.names = c("","Y=1","Y=0"), escape = F) %>% kable_styling(latex_options = "hold_position") ``` where $n_{ay}(k)$ denotes the number of individuals in the study $k \in [K]$ with treatment status $A = a$ and observed outcome $Y=y$. Let $Y(a)$ be the counterfactual outcome under $A = a$ and $H \in [K]$ be the study variable. ## Computation of confidence intervals Let $\hat{\psi}_a (k) = n_{a1}(k) / n_a(k)$ be the natural estimator of $\psi_a (k) = \mathbb{E}[Y(a)|H=k]$ with $a \in \{0,1\}$. Let $\hat{\psi}_a = \sum_{k=1}^K \hat{\pi}_k \hat{\psi}_a(k)$ the estimator of $\psi_a = \mathbb{E}[Y(a)]$ where $\pi_k = \mathbb{P}(H = k)$ and $\hat{\pi}_k = n_a(k)/n_k$. The estimator is asymptotically normal, converging to the following distribution: \begin{equation*} \sqrt{n} \: ((\hat{\psi}_1,\hat{\psi}_0) - (\psi_1,\psi_0)) \xrightarrow{L} N\biggr{(}0, \begin{bmatrix} 0 & \sigma^2_1 \\ \sigma^2_0 & 0 \end{bmatrix} \biggr{)} \end{equation*} where the asymptotic variances are given by: \begin{equation*} \sigma^2_a = \sum_{k=1}^K \pi_k \frac{\psi_a (k) (1 - \psi_a (k))}{e^a(k) (1 - e(k))^{1-a}} + \sum_{k=1}^K \pi_k \psi_a(k)^2 - \psi_a^2 \end{equation*} with $e(k) = \mathbb{P}(A = 1 | H = k)$ and $a \in \{ 0,1 \}$. Then, by applying the delta method, we find that for every first-order continuously differentiable contrast $\phi \: : \: (\psi_1,\psi_0) \mapsto \phi(\psi_1,\psi_0)$ we have: \begin{equation*} \sqrt{n} \: (\phi(\hat{\psi}_1,\hat{\psi}_0) - \phi(\psi_1,\psi_0)) \xrightarrow{L} N ( 0, \eta^2 ) \end{equation*} where \begin{equation*} \eta^2 = \sigma_1^2 \: \partial_1 \phi(\psi_1,\psi_0)^2 + \sigma_0^2 \: \partial_0 \phi(\psi_1,\psi_0)^2 \end{equation*} and $\partial_a \phi(\psi_1,\psi_0)$ denotes the partial derivative of $\phi$ with respect to $\psi_a$. We will use the above approximation to estimate the variance of the measures: - Risk Difference: $\phi(\psi_1,\psi_0) = \psi_1 - \psi_0$ - Risk Ratio: $\phi(\psi_1,\psi_0) = \frac{\psi_1}{\psi_0}$ - Odds Ratio: $\phi(\psi_1,\psi_0) = \frac{\psi_1}{1-\psi_1} \frac{1-\psi_0}{\psi_0}$ - Survival Ratio: $\phi(\psi_1,\psi_0) = \frac{1- \psi_1}{1 - \psi_0}$. We also provide estimates and confidence intervals for the Risk Ratio, Odds Ratio, and Survival Ratio on the logarithmic scale. In that case, the required partial derivatives for $a \in {0,1}$ are given by: \begin{equation*} \partial_a \log \phi(\psi_1,\psi_0) = \frac{1}{\phi(\psi_1,\psi_0)} \partial_a \phi(\psi_1,\psi_0) \end{equation*} ## Use of metafor package If the selected effect measure is Risk Difference (RD), Risk Ratio (RR) or Odds Ratio (OR) and the option `random.effects = TRUE` is set, the output and the forest plot will display two summary estimates: the primary causal meta-analysis estimate and the conventional random-effects estimate. The random-effects estimate is calculated via the package `metafor` by W. Viechtbauer (2010), which is a well known tool for conducting meta-analysis on R. It is important to notice that the package `metafor` only provides estimates of the Risk Ratio and Odds Ratio on the logarithmic scale. Our package automatically converts these log-scale estimates back to the natural scale by applying the exponential function to both the point estimate and the bounds of the confidence interval. The following excerpt from the `camea` function illustrates how we integrate `metafor` for our computation: ```{r metafor, eval = FALSE} # Create forest plot with single study results metafor::forest(x = results$thetai, ci.lb = results$ci.lb, ci.ub = results$ci.ub, header = c("Study",paste(measure_name,"[95% CI]")), top=2, ylim=c(-2, n_study+2), refline = refline, cex = 0.8) # Computation of random-effects analysis and display in forest plot, if # option "random.effects = TRUE" dat <- metafor::escalc(measure=measure, ai=ai_vals, n1i=n1i_vals, ci=ci_vals, n2i=n2i_vals, data=dat) metafor::addpoly(res_random_effects, row = -1, mlab="Random-effects model", cex = 0.8) # Display causal-meta analysis in forest plot res <- metafor::rma(yi = result$estimate, sei = result$se, method="FE") metafor::addpoly(res, row = 0, mlab="Causal meta-analysis", cex = 0.8) ``` ## References Berenfeld, C., Boughdiri, A., Colnet, B., van Amsterdam, W. A. C., Bellet, A., Khellaf, R., Scornet, E., & Josse, J. (2025). Causal Meta-Analysis: Rethinking the Foundations of Evidence-Based Medicine. arXiv:2505.20168. Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03