---
title: "Basic Usage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic Usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(poissonsuperlearner)
library(riskRegression)
```

# Introduction

This vignette gives quick examples of the Poisson Super Learner workflow after the refactoring. It focuses on:

* fitting `Superlearner()` with one learner library shared by all causes;
* fitting `Superlearner()` with one learner library per cause;
* using `summary()` for the fitted super learner;
* using `predictRisk()` with the available model selectors.

The examples use small simulated data, two folds, and simple `glmnet` learners with fixed `lambda` values so that the vignette runs quickly during package checks.

# Data

We simulate a small competing-risks data set. The observed follow-up time is stored in `time` and the event indicator in `event`, where `0` denotes censoring, `1` denotes cardiovascular disease, and `2` denotes death without prior cardiovascular disease.

```{r}
d <- simulateStenoT1(
  n = 45,
  scenario = "alpha",
  competing_risks = TRUE,
  seed = 1
)
d <- d[, .(
  id, time, event, sex, age,
  diabetes_duration, value_LDL, value_Smoking
)]
head(d)
```

# One Learner Library For All Causes

A learner library is a list of initialized learner objects. If a single library is supplied in a competing-risks analysis, the same library is used for all causes.

```{r}
shared_library <- list(
  simple = Learner_glmnet(
    covariates = c("sex", "diabetes_duration"),
    cross_validation = FALSE,
    lambda = 0
  ),
  shrink = Learner_glmnet(
    covariates = c("sex", "age", "value_LDL"),
    cross_validation = FALSE,
    lambda = 0.05,
    alpha = 1
  )
)

fit_shared <- Superlearner(
  data = d,
  id = "id",
  status = "event",
  event_time = "time",
  learners = shared_library,
  number_of_nodes = 3,
  nfold = 2
)
```

`summary()` gives a compact overview of the fitted super learner, including the number of causes, the retained learner labels, the cross-validated deviances, and the meta-learner coefficients when a meta-learner was fitted.

```{r}
summary(fit_shared)
```

`predictRisk()` returns one row per subject and one column per requested prediction time. The `model` argument accepts the following selectors: `"sl"` for the stacked ensemble, `"discrete_sl"` for the best cross-validated base learner per cause, or a learner label such as `"simple"` or `"shrink"` for a model stored in the ensemble.

```{r}
newdata <- d[1:2]
times <- c(1, 2)

risk_shared_sl <- predictRisk(
  fit_shared,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "sl"
)

risk_shared_discrete <- predictRisk(
  fit_shared,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "discrete_sl"
)

risk_shared_simple <- predictRisk(
  fit_shared,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "simple"
)

risk_shared_shrink <- predictRisk(
  fit_shared,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "shrink"
)

list(
  sl = risk_shared_sl,
  discrete_sl = risk_shared_discrete,
  simple = risk_shared_simple,
  shrink = risk_shared_shrink
)
```

# One Learner Library Per Cause

For competing risks, `learners` can also be a list with one learner library per cause. This allows different covariates, tuning parameters, or labels for each cause.

```{r}
libraries_by_cause <- list(
  cvd = list(
    cvd_simple = Learner_glmnet(
      covariates = c("sex", "diabetes_duration"),
      cross_validation = FALSE,
      lambda = 0
    ),
    cvd_shrink = Learner_glmnet(
      covariates = c("age", "value_LDL"),
      cross_validation = FALSE,
      lambda = 0.05,
      alpha = 1
    )
  ),
  death = list(
    death_simple = Learner_glmnet(
      covariates = c("sex", "age"),
      cross_validation = FALSE,
      lambda = 0
    ),
    death_shrink = Learner_glmnet(
      covariates = c("diabetes_duration", "value_Smoking"),
      cross_validation = FALSE,
      lambda = 0.05,
      alpha = 1
    )
  )
)

fit_by_cause <- Superlearner(
  data = d,
  id = "id",
  status = "event",
  event_time = "time",
  learners = libraries_by_cause,
  number_of_nodes = 3,
  nfold = 2
)
```

The fitted object can be summarized in the same way.

```{r}
summary(fit_by_cause)
```

The stacked and discrete super learner selectors still work as scalar model selectors for prediction.

```{r}
risk_by_cause_sl <- predictRisk(
  fit_by_cause,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "sl"
)

risk_by_cause_discrete <- predictRisk(
  fit_by_cause,
  newdata = newdata,
  times = times,
  cause = 1,
  model = "discrete_sl"
)

list(
  sl = risk_by_cause_sl,
  discrete_sl = risk_by_cause_discrete
)
```

When selecting learners from cause-specific libraries, provide one selector per cause. The first entry selects the learner for cause 1 and the second entry selects the learner for cause 2.

```{r}
cause_specific_model <- c("cvd_simple", "death_shrink")
cause_specific_model_alt <- c("cvd_shrink", "death_simple")

risk_by_cause_selected <- predictRisk(
  fit_by_cause,
  newdata = newdata,
  times = times,
  cause = 1,
  model = cause_specific_model
)

risk_by_cause_selected_alt <- predictRisk(
  fit_by_cause,
  newdata = newdata,
  times = times,
  cause = 1,
  model = cause_specific_model_alt
)

list(
  selected_learners = risk_by_cause_selected,
  selected_learners_alt = risk_by_cause_selected_alt
)
```

Integer selectors are also supported.
For cause-specific libraries, an integer vector can choose different retained learner positions for different causes.

```{r}
risk_by_cause_indexed <- predictRisk(
  fit_by_cause,
  newdata = newdata,
  times = times,
  cause = 1,
  model = c(1, 2)
)

risk_by_cause_indexed
```
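
To make the positional rule concrete, the following base R sketch maps an integer selector to learner labels. It is an illustration only: `resolve_selector()` and `labels_by_cause` are hypothetical names that are not part of `poissonsuperlearner`, and the sketch assumes that every learner was retained, so that positions follow the order in which the learners appear in each cause-specific library.

```{r}
# Illustration only: `resolve_selector()` is a hypothetical helper, not a
# function exported by poissonsuperlearner. It assumes that all learners
# were retained, so positions follow the order of each library.
labels_by_cause <- list(
  cvd = c("cvd_simple", "cvd_shrink"),
  death = c("death_simple", "death_shrink")
)

resolve_selector <- function(selector, labels_by_cause) {
  # One selector entry is required per cause.
  stopifnot(length(selector) == length(labels_by_cause))
  # Entry k of the selector picks a position in the library of cause k.
  resolved <- vapply(
    seq_along(labels_by_cause),
    function(k) labels_by_cause[[k]][[selector[[k]]]],
    character(1)
  )
  names(resolved) <- names(labels_by_cause)
  resolved
}

resolve_selector(c(1, 2), labels_by_cause)
```

Under that assumption, `c(1, 2)` would pick the first learner for cause 1 and the second learner for cause 2, which matches the labels in `cause_specific_model`. If learners were dropped during fitting, the positions may differ, so it is safer to check the retained labels reported by `summary(fit_by_cause)`.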