---
title: "Sample size re-estimation example"
output: rmarkdown::html_vignette
bibliography: gsDesignNB.bib
vignette: >
  %\VignetteIndexEntry{Sample size re-estimation example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, message=FALSE, warning=FALSE}
library(gsDesignNB)
library(gsDesign)
library(data.table)
library(MASS)
library(gt)
```

## Introduction

This vignette demonstrates a group sequential trial with a single interim analysis at which unblinded sample size re-estimation (SSR) is performed. We simulate a scenario where the **control event rate is lower than assumed** and the **dispersion is higher than assumed**. Both factors lead to slower information accumulation (equivalently, higher variance of the treatment effect estimate) than planned.

We will show how to:

1. Monitor information accumulation at an interim analysis.
2. Re-estimate the required sample size (or duration) to achieve the target power.
3. Adjust the final analysis bounds if the final information differs from the plan.

## Trial setup and initial design

**Planned parameters:**

* Control rate ($\lambda_1$): 0.1 events/month
* Experimental rate ($\lambda_2$): 0.075 events/month (rate ratio = 0.75)
* Dispersion ($k$): 0.5
* Power: 90%
* One-sided Type I error ($\alpha$): 0.025
* Enrollment: 20 patients/month for 12 months (total N = 240)
* Study duration: 24 months

**Actual parameters (simulation truth):**

* Control rate ($\lambda_1$): 0.08 events/month (lower than planned)
* Experimental rate ($\lambda_2$): 0.06 events/month (rate ratio = 0.75 maintained)
* Dispersion ($k$): 0.65 (higher than planned)

### Initial sample size calculation

```{r initial_design}
# Planned parameters
lambda1_plan <- 0.1
lambda2_plan <- 0.075
k_plan <- 0.5
power_plan <- 0.9
alpha_plan <- 0.025
accrual_rate_plan <- 20
accrual_dur_plan <- 12
trial_dur_plan <- 24

# Calculate sample size
design_plan <- sample_size_nbinom(
  lambda1 = lambda1_plan,
  lambda2 = lambda2_plan,
  dispersion = k_plan,
  power = power_plan,
  alpha = alpha_plan,
  accrual_rate = accrual_rate_plan,
  accrual_duration = accrual_dur_plan,
  trial_duration = trial_dur_plan,
  max_followup = 12
)

# Convert to group sequential design
gs_plan <- design_plan |>
  gsNBCalendar(
    k = 2,
    test.type = 4, # Non-binding futility
    alpha = alpha_plan,
    sfu = sfHSD, sfupar = -2, # Moderately aggressive alpha-spending
    sfl = sfHSD, sflpar = 1, # Pocock-like futility spending
    analysis_times = c(accrual_dur_plan - 2, trial_dur_plan)
  ) |>
  gsDesignNB::toInteger() # Round to integer sample size
```

```{r}
summary(gs_plan)
```

```{r}
gsBoundSummary(gs_plan,
  deltaname = "RR", logdelta = TRUE, Nname = "Information",
  timename = "Month", digits = 4, ddigits = 2
) |>
  gt() |>
  tab_header(
    title = "Group Sequential Design Bounds for Negative Binomial Outcome",
    subtitle = paste0(
      "N = ", ceiling(gs_plan$n_total[gs_plan$k]),
      ", Expected events = ", round(gs_plan$nb_design$total_events, 1)
    )
  )
```

## Simulation

We simulate the trial with the same rate ratio, but lower actual rates and a higher dispersion. We limit the maximum sample size to 40% more than the planned size (i.e., max N = 336; this cap is applied in the re-estimation step below) to avoid unbounded increases.
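Before running the trial-level simulation, a small standalone sketch illustrates how the dispersion parameter enters the data-generating model. This is illustration only, not the `nb_sim()` workflow, and it assumes the common NB2 parameterization in which a subject followed for $t$ months with rate $\lambda$ has mean $\mu = \lambda t$ and variance $\mu + k\mu^2$ (i.e., `size = 1/k` in `rnbinom()`); the helper names below are ours, not package objects.

```{r}
# Illustration only (assumes the NB2 parameterization described above):
# event counts for five control subjects under the planned assumptions.
set.seed(1)
t_followup <- 12                      # planned maximum follow-up (months)
mu_plan <- lambda1_plan * t_followup  # expected events per control subject
rnbinom(5, mu = mu_plan, size = 1 / k_plan)
```

The trial-level data used in the rest of this vignette come from `nb_sim()`: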
```{r simulation}
set.seed(1234)

# Actual parameters
lambda1_true <- 0.08 # Lower event rates, same rate ratio
lambda2_true <- 0.06
k_true <- 0.65 # Higher dispersion

# Enrollment and rates for simulation
# We simulate a larger pool to allow for a potential sample size increase
enroll_rate <- data.frame(rate = accrual_rate_plan, duration = accrual_dur_plan * 2)

fail_rate <- data.frame(
  treatment = c("Control", "Experimental"),
  rate = c(lambda1_true, lambda2_true),
  dispersion = k_true
)

dropout_rate <- data.frame(
  treatment = c("Control", "Experimental"),
  rate = c(0, 0),
  duration = c(100, 100)
)

sim_data <- nb_sim(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  dropout_rate = dropout_rate,
  max_followup = trial_dur_plan,
  n = 600
)

# Limit to planned enrollment for the initial cut
# We will "open" more enrollment if needed
sim_data_planned <- sim_data[1:ceiling(design_plan$n_total), ]
```

## Interim analysis

We perform an interim analysis at **Month 10**, which is 2 months prior to the end of planned enrollment (Month 12).

```{r interim_cut}
interim_time <- 10
interim_data <- cut_data_by_date(sim_data_planned, cut_date = interim_time)

# Summary
table(interim_data$treatment)
sum(interim_data$events)
mean(interim_data$tte)
```

### Information computation

We calculate the statistical information accumulated so far.

**Blinded information:** calculated assuming a common rate and dispersion across arms; it can be used for monitoring without unblinding treatment assignments.

**Unblinded information:** calculated from the observed rates and dispersion in each group. This estimates the Fisher information for the log rate ratio:

$$
\mathcal{I} = \frac{1}{\text{Var}(\hat{\beta}_{\text{trt}})}
$$

```{r info_calc}
# Blinded SSR
blinded_res <- blinded_ssr(
  data = interim_data,
  ratio = 1,
  lambda1_planning = lambda1_plan,
  lambda2_planning = lambda2_plan,
  power = power_plan,
  alpha = alpha_plan,
  accrual_rate = accrual_rate_plan,
  accrual_duration = accrual_dur_plan,
  trial_duration = trial_dur_plan
)

# Unblinded SSR
ssr_res <- unblinded_ssr(
  data = interim_data,
  ratio = 1,
  lambda1_planning = lambda1_plan,
  lambda2_planning = lambda2_plan,
  power = power_plan,
  alpha = alpha_plan,
  accrual_rate = accrual_rate_plan,
  accrual_duration = accrual_dur_plan,
  trial_duration = trial_dur_plan
)

cat("Blinded SSR N:", ceiling(blinded_res$n_total_blinded), "\n")
cat("Unblinded SSR N:", ceiling(ssr_res$n_total_unblinded), "\n")

print(ssr_res)
```

The estimated control rate is `r round(ssr_res$lambda1_unblinded, 3)`, which is lower than the planned `r lambda1_plan`. The information fraction is `r round(ssr_res$info_fraction, 3)`.

### Sample size re-estimation

Because the control rate is lower and the dispersion is higher, information accumulates more slowly than planned. To maintain power, we need to increase the sample size or the follow-up. The `unblinded_ssr` function estimates the required total sample size to be `r ceiling(ssr_res$n_total_unblinded)`. We ensure the re-estimated sample size is at least as large as the planned sample size and no larger than the pre-specified cap.
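As a rough check on the re-estimated sample size, the fixed-design information needed to detect the planned log rate ratio can be computed directly from quantities already defined above. This back-of-the-envelope sketch ignores the group sequential inflation factor, so it will not match `ssr_res$target_info` exactly; it only indicates the direction and rough magnitude of the requirement.

```{r}
# Fixed-design information needed for the planned effect size
# (one-sided alpha = 0.025, power = 90%); ignores group sequential inflation.
z_alpha <- qnorm(1 - alpha_plan)
z_beta <- qnorm(power_plan)
info_fixed <- ((z_alpha + z_beta) / log(lambda2_plan / lambda1_plan))^2
info_fixed
```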
```{r update_design}
n_planned <- ceiling(design_plan$n_total)
n_estimated <- ceiling(ssr_res$n_total_unblinded)
n_max <- ceiling(1.4 * n_planned) # Pre-specified cap: 40% above the planned size

# Ensure we don't decrease the sample size, and don't exceed the cap
n_final <- min(max(n_planned, n_estimated), n_max)

if (n_final > n_planned) {
  cat("Increasing sample size from", n_planned, "to", n_final, "\n")

  # Calculate new durations based on constant accrual rate
  # We extend enrollment to reach the new target N
  new_accrual_dur <- n_final / accrual_rate_plan

  # We maintain the planned max follow-up of 12 months,
  # so the trial duration extends by the same amount as the accrual duration
  new_trial_dur <- new_accrual_dur + 12

  cat("New Accrual Duration:", round(new_accrual_dur, 2), "months\n")
  cat("New Trial Duration:", round(new_trial_dur, 2), "months\n")
} else {
  cat("No sample size increase needed.\n")
  new_accrual_dur <- accrual_dur_plan
  new_trial_dur <- trial_dur_plan
  n_final <- n_planned
}
```

## Final analysis

We proceed to the final analysis at Month `r round(new_trial_dur, 1)`, including the additional patients if the sample size was increased.

```{r final_analysis}
# Update the dataset to include the additional patients
sim_data_final <- sim_data[1:n_final, ]

# Cut at the final analysis time
final_data <- cut_data_by_date(sim_data_final, cut_date = new_trial_dur)

# Average exposure
mean(final_data$tte)

# Fit the final model
fit_final <- glm.nb(events ~ treatment + offset(log(tte)), data = final_data)
summary(fit_final)

# Calculate the final information
var_beta <- vcov(fit_final)[2, 2]
final_info <- 1 / var_beta

cat("Final Information:", final_info, "\n")
cat("Target Information:", ssr_res$target_info, "\n")
```

### Updating bounds

If the final information differs from the target (e.g., it is lower), we adjust the critical values so that exactly $\alpha$ is spent. We use the `gsDesign` package with `usTime` (user-specified spending time) to update the bounds.
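To make the bound update explicit (this is standard two-analysis error-spending arithmetic, not anything specific to `gsDesignNB`): under $H_0$ the interim and final statistics $(Z_1, Z_2)$ are approximately bivariate normal with correlation $\sqrt{\mathcal{I}_1 / \mathcal{I}_2}$, and the updated final bound $c_2$ is chosen so that the total rejection probability equals $\alpha$:

$$
\Pr(Z_1 \ge c_1 \mid H_0) + \Pr(Z_1 < c_1,\; Z_2 \ge c_2 \mid H_0) = \alpha,
$$

where $c_1$ is the interim efficacy bound already applied, and $\mathcal{I}_1$ and $\mathcal{I}_2$ are the observed information levels supplied via `n.I` below.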
```{r update_bounds}
# Information fraction at the final analysis
# If final_info < target_info, the fraction is < 1.
# However, for the final analysis we typically want to spend all remaining alpha.
# We can treat the current info as the "maximum" info for the spending function.

# Example group sequential update (O'Brien-Fleming-like spending)
# We assume the interim was at the observed information fraction
info_fractions <- c(ssr_res$info_fraction, final_info / ssr_res$target_info)

# If the final fraction is < 1, we can force it to 1 to spend all alpha,
# effectively accepting the lower power.
# Alternatively, we could use the actual information fractions in a spending function approach.

# Here we spend all alpha at the final analysis regardless of the information,
# so we set the final spending time to 1.
info_fractions_spending <- c(ssr_res$info_fraction, 1)

# Update bounds
gs_update <- gsDesign(
  k = 2,
  test.type = 1,
  alpha = alpha_plan,
  sfu = sfLDOF,
  usTime = info_fractions_spending,
  n.I = c(ssr_res$unblinded_info, final_info)
)

gsBoundSummary(gs_update,
  deltaname = "RR", logdelta = TRUE, Nname = "Information",
  timename = "Month", digits = 4, ddigits = 2
) |>
  gt() |>
  tab_header(
    title = "Updated Group Sequential Design Bounds",
    subtitle = paste0("Final Information = ", round(final_info, 2))
  )

# Test statistic from the model (negative when treatment is beneficial)
z_score <- coef(fit_final)["treatmentExperimental"] / sqrt(var_beta)

# Note: gsDesign assumes Z > 0 for efficacy,
# while our model gives log(RR) < 0 for efficacy.
# So Z_stat = -log(RR) / SE
z_stat <- -coef(fit_final)["treatmentExperimental"] / sqrt(var_beta)

cat("Z-statistic:", z_stat, "\n")
cat("Final Bound:", gs_update$upper$bound[2], "\n")

if (z_stat > gs_update$upper$bound[2]) {
  cat("Reject Null Hypothesis\n")
} else {
  cat("Fail to Reject Null Hypothesis\n")
}
```

## References