---
title: "Basic Usage"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Basic Usage}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(poissonsuperlearner)
library(riskRegression)
```
# Introduction
This vignette gives quick examples of the Poisson Super Learner workflow. It
focuses on:
* fitting `Superlearner()` with one learner library shared by all causes;
* fitting `Superlearner()` with one learner library per cause;
* summarizing the fitted super learner with `summary()`;
* predicting risks with `predictRisk()` using the available model selectors.
The examples use a small simulated data set, two cross-validation folds, and
simple `glmnet` learners with fixed `lambda` values, so the vignette stays quick
to run during package checks.
# Data
We simulate a small competing-risks data set. The observed follow-up time is
stored in `time` and the event indicator in `event`, where `0` denotes
censoring, `1` denotes cardiovascular disease, and `2` denotes death without
prior cardiovascular disease.
```{r}
d <- simulateStenoT1(
n = 45,
scenario = "alpha",
competing_risks = TRUE,
seed = 1
)
d <- d[, .(
id,
time,
event,
sex,
age,
diabetes_duration,
value_LDL,
value_Smoking
)]
head(d)
```
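The event coding described above can be made readable with a labelled factor. This minimal base R sketch uses a made-up vector `toy_event` rather than the simulated data; applying `table()` to `d$event` gives the corresponding counts for `d`.

```{r}
# Made-up event indicator using the same coding as `event` in `d`:
# 0 = censored, 1 = cardiovascular disease, 2 = death without prior CVD
toy_event <- c(0, 1, 2, 1, 0, 2, 2)
toy_event_label <- factor(
  toy_event,
  levels = c(0, 1, 2),
  labels = c("censored", "cvd", "death")
)
# Number of observations per outcome type
table(toy_event_label)
```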
# One Learner Library For All Causes
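A learner library is a named list of initialized learner objects. The names
(here `simple` and `shrink`) become the learner labels used later by `summary()`
and `predictRisk()`. If a single library is supplied in a competing-risks
analysis, the same library is used for all causes.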
```{r}
shared_library <- list(
  # unpenalized glmnet fit (lambda = 0) on two covariates
  simple = Learner_glmnet(
    covariates = c("sex", "diabetes_duration"),
    cross_validation = FALSE,
    lambda = 0
  ),
  # lasso penalty (alpha = 1) with a fixed lambda instead of a tuned one
  shrink = Learner_glmnet(
    covariates = c("sex", "age", "value_LDL"),
    cross_validation = FALSE,
    lambda = 0.05,
    alpha = 1
  )
)
fit_shared <- Superlearner(
data = d,
id = "id",
status = "event",
event_time = "time",
learners = shared_library,
number_of_nodes = 3,
nfold = 2
)
```
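Conceptually, supplying one library in a competing-risks analysis amounts to repeating that library once per cause. The helper `expand_library()` below is a hypothetical base R sketch of that rule, not the package's internal code; it uses placeholder strings instead of learner objects so that it runs on its own.

```{r}
# Hypothetical helper: return one learner library per cause.
# `learners` is either a single library (a named list of learners) or a
# list of libraries, one per cause; `per_cause` states which case applies.
expand_library <- function(learners, n_causes, per_cause = FALSE) {
  if (!per_cause) {
    # A single library is reused unchanged for every cause
    return(rep(list(learners), n_causes))
  }
  if (length(learners) != n_causes) {
    stop("Expected one learner library per cause.")
  }
  learners
}

# Placeholder strings stand in for initialized learner objects
toy_library <- list(simple = "learner A", shrink = "learner B")
toy_expanded <- expand_library(toy_library, n_causes = 2)
str(toy_expanded)
```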
`summary()` gives a compact overview of the fitted super learner, including the
number of causes, retained learner labels, cross-validated deviances, and
meta-learner coefficients when a meta-learner was fitted.
```{r}
summary(fit_shared)
```
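The cross-validated deviances in the summary compare learners on held-out data, with smaller values indicating a better fit. As a hedged illustration, the sketch below computes the standard Poisson deviance, assuming held-out event counts `y` and predicted expected counts `mu` (for example, hazard times time at risk). The package's exact definition and scaling may differ, and all numbers are made up.

```{r}
# Standard Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)),
# using the convention y * log(y / mu) = 0 when y = 0
poisson_deviance <- function(y, mu) {
  ylogy <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(ylogy - (y - mu))
}

# Made-up held-out event counts and expected counts from two learners
toy_y <- c(0, 1, 0, 1, 0)
toy_mu_a <- c(0.2, 0.7, 0.1, 0.6, 0.3)
toy_mu_b <- c(0.5, 0.5, 0.5, 0.5, 0.5)
c(
  a = poisson_deviance(toy_y, toy_mu_a),
  b = poisson_deviance(toy_y, toy_mu_b)
)
```

Learner `a` tracks the observed events more closely and therefore gets the lower deviance.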
`predictRisk()` returns one row per subject in `newdata` and one column per
requested prediction time. With `cause = 1`, it predicts the risk of
cardiovascular disease. The `model` argument selects which fitted model produces
the predictions: `"sl"` for the stacked ensemble, `"discrete_sl"` for the base
learner with the best cross-validated performance for each cause, or a learner
label such as `"simple"` or `"shrink"` for an individual learner stored in the
fitted object.
```{r}
newdata <- d[1:2]
times <- c(1, 2)
risk_shared_sl <- predictRisk(
fit_shared, newdata = newdata, times = times, cause = 1, model = "sl"
)
risk_shared_discrete <- predictRisk(
fit_shared, newdata = newdata, times = times, cause = 1, model = "discrete_sl"
)
risk_shared_simple <- predictRisk(
fit_shared, newdata = newdata, times = times, cause = 1, model = "simple"
)
risk_shared_shrink <- predictRisk(
fit_shared, newdata = newdata, times = times, cause = 1, model = "shrink"
)
list(
sl = risk_shared_sl,
discrete_sl = risk_shared_discrete,
simple = risk_shared_simple,
shrink = risk_shared_shrink
)
```
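To see how a risk at the requested times can be assembled from cause-specific hazards, the sketch below assumes piecewise-constant hazards on a fixed time grid, the kind of hazard model a Poisson regression on split follow-up time fits. Whether and how the package uses such a grid (for example through `number_of_nodes`) is an assumption here; the helper `absolute_risk_pwc()`, the grid, and the hazard values are hypothetical. Like `predictRisk()`, it returns one row per subject and one column per time.

```{r}
# Hypothetical sketch: absolute risk of cause 1 from piecewise-constant
# cause-specific hazards. `breaks` are interval boundaries starting at 0;
# `h1` and `h2` are (subjects x intervals) hazard matrices for causes 1 and 2.
# Times beyond the last break are treated as having zero hazard.
absolute_risk_pwc <- function(h1, h2, breaks, times) {
  widths <- diff(breaks)
  starts <- breaks[-length(breaks)]
  risk <- matrix(0, nrow = nrow(h1), ncol = length(times))
  for (i in seq_len(nrow(h1))) {
    h <- h1[i, ] + h2[i, ]  # all-cause hazard in each interval
    # overall event-free survival at the start of each interval
    surv_start <- exp(-cumsum(c(0, h[-length(h)] * widths[-length(widths)])))
    # share of events in each interval that are of cause 1
    share <- ifelse(h > 0, h1[i, ] / h, 0)
    for (k in seq_along(times)) {
      # time spent in each interval before times[k]
      exposure <- pmin(pmax(times[k] - starts, 0), widths)
      risk[i, k] <- sum(surv_start * share * (1 - exp(-h * exposure)))
    }
  }
  colnames(risk) <- paste0("t=", times)
  risk
}

toy_breaks <- c(0, 1, 2, 3)
toy_h1 <- rbind(c(0.10, 0.10, 0.10), c(0.20, 0.25, 0.30))
toy_h2 <- rbind(c(0.05, 0.05, 0.05), c(0.10, 0.10, 0.10))
toy_risk <- absolute_risk_pwc(toy_h1, toy_h2, toy_breaks, times = c(1, 2))
toy_risk
```

Because the all-cause hazard enters through the survival term, changing `toy_h2` alone changes the risk of cause 1.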
# One Learner Library Per Cause
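For competing risks, `learners` can also be a list with one learner library per
cause, which allows different covariates, tuning parameters, and labels for each
cause. Below, the first library (`cvd`) is used for cause 1 (cardiovascular
disease) and the second (`death`) for cause 2 (death without prior
cardiovascular disease).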
```{r}
libraries_by_cause <- list(
cvd = list(
cvd_simple = Learner_glmnet(
covariates = c("sex", "diabetes_duration"),
cross_validation = FALSE,
lambda = 0
),
cvd_shrink = Learner_glmnet(
covariates = c("age", "value_LDL"),
cross_validation = FALSE,
lambda = 0.05,
alpha = 1
)
),
death = list(
death_simple = Learner_glmnet(
covariates = c("sex", "age"),
cross_validation = FALSE,
lambda = 0
),
death_shrink = Learner_glmnet(
covariates = c("diabetes_duration", "value_Smoking"),
cross_validation = FALSE,
lambda = 0.05,
alpha = 1
)
)
)
fit_by_cause <- Superlearner(
data = d,
id = "id",
status = "event",
event_time = "time",
learners = libraries_by_cause,
number_of_nodes = 3,
nfold = 2
)
```
The fitted object can be summarized in the same way.
```{r}
summary(fit_by_cause)
```
The scalar selectors `"sl"` (stacked ensemble) and `"discrete_sl"` (best
cross-validated learner per cause) also work with cause-specific libraries; a
single selector applies to every cause.
```{r}
risk_by_cause_sl <- predictRisk(
fit_by_cause, newdata = newdata, times = times, cause = 1, model = "sl"
)
risk_by_cause_discrete <- predictRisk(
fit_by_cause, newdata = newdata, times = times, cause = 1, model = "discrete_sl"
)
list(
sl = risk_by_cause_sl,
discrete_sl = risk_by_cause_discrete
)
```
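A scalar `"discrete_sl"` selector can still pick a different learner for each cause, because the choice is made within each cause. A minimal sketch of that rule, assuming one named vector of cross-validated deviances per cause; the numbers are made up and are not output from `fit_by_cause`.

```{r}
# Made-up cross-validated deviances, one named vector per cause
toy_cv_deviance <- list(
  cvd = c(cvd_simple = 41.2, cvd_shrink = 39.8),
  death = c(death_simple = 28.5, death_shrink = 30.1)
)
# Discrete super learner: lowest cross-validated deviance within each cause
toy_discrete <- vapply(
  toy_cv_deviance,
  function(dev) names(dev)[which.min(dev)],
  character(1)
)
toy_discrete
```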
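To select individual learners from cause-specific libraries, provide one
selector per cause: the first entry selects the learner for cause 1 and the
second entry the learner for cause 2. Both are needed even though only
`cause = 1` is predicted, because the risk of cardiovascular disease also
depends on the hazard of the competing event, death.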
```{r}
cause_specific_model <- c("cvd_simple", "death_shrink")
cause_specific_model_alt <- c("cvd_shrink", "death_simple")
risk_by_cause_selected <- predictRisk(
fit_by_cause, newdata = newdata, times = times, cause = 1,
model = cause_specific_model
)
risk_by_cause_selected_alt <- predictRisk(
fit_by_cause, newdata = newdata, times = times, cause = 1,
model = cause_specific_model_alt
)
list(
selected_learners = risk_by_cause_selected,
selected_learners_alt = risk_by_cause_selected_alt
)
```
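The positional rule (first selector for cause 1, second for cause 2) can be sketched as a lookup that also checks each label against its own cause's library. The helper `check_selectors()` and the label vectors are hypothetical; they only mirror the labels defined in `libraries_by_cause`.

```{r}
# Hypothetical helper: pair selector k with cause k and check that the
# label exists among that cause's learner labels
check_selectors <- function(selectors, labels_by_cause) {
  if (length(selectors) != length(labels_by_cause)) {
    stop("Provide exactly one selector per cause.")
  }
  # match(selectors[k], labels_by_cause[[k]]) for each cause k
  positions <- mapply(match, selectors, labels_by_cause)
  if (anyNA(positions)) {
    stop("Unknown learner label for cause(s): ",
         paste(which(is.na(positions)), collapse = ", "))
  }
  setNames(as.integer(positions), paste0("cause ", seq_along(selectors)))
}

toy_labels <- list(
  c("cvd_simple", "cvd_shrink"),
  c("death_simple", "death_shrink")
)
check_selectors(c("cvd_simple", "death_shrink"), toy_labels)
```

Passing the labels in the wrong order, for example `c("death_shrink", "cvd_simple")`, fails the check because `"death_shrink"` is not a label of cause 1.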
Integer selectors are also supported. With cause-specific libraries, an integer
vector selects a retained learner by position for each cause; here
`model = c(1, 2)` uses the first retained learner for cause 1 and the second
retained learner for cause 2.
```{r}
risk_by_cause_indexed <- predictRisk(
fit_by_cause, newdata = newdata, times = times, cause = 1, model = c(1, 2)
)
risk_by_cause_indexed
```
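Integer positions can be translated into learner labels in the same per-cause way. This hypothetical sketch, `labels_from_positions()`, adds bounds checking and uses made-up label vectors rather than reading the retained learners from `fit_by_cause`.

```{r}
# Hypothetical helper: translate integer selectors (one per cause) into
# learner labels, checking each position against that cause's learners
labels_from_positions <- function(positions, labels_by_cause) {
  stopifnot(length(positions) == length(labels_by_cause))
  mapply(function(pos, labels) {
    if (pos < 1 || pos > length(labels)) {
      stop("Position ", pos, " is out of range for this cause.")
    }
    labels[[pos]]
  }, positions, labels_by_cause)
}

toy_retained <- list(
  c("cvd_simple", "cvd_shrink"),
  c("death_simple", "death_shrink")
)
labels_from_positions(c(1, 2), toy_retained)
```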