SVEMnet version 2.2.4

SVEMnet implements Self-Validated Ensemble Models (SVEM; Lemkus et al., 2021) and the SVEM whole model test (Karl, 2024) using elastic net regression via the glmnet package (Friedman et al., 2010). This vignette provides an overview of the package's functionality and usage.
library(SVEMnet)
# Example data
data <- iris
svem_model <- SVEMnet(Sepal.Length ~ ., data = data, relaxed = FALSE, glmnet_alpha = c(1), nBoot = 50)
coef(svem_model)
## Percent of Bootstraps Nonzero
## Sepal.Width 100
## Petal.Length 100
## Petal.Width 96
## Speciesversicolor 92
## Speciesvirginica 90
Generate a plot of actual versus predicted values:
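A minimal call, assuming the default plot method for fitted svem_model objects:

plot(svem_model)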
Predict outcomes for new data using the predict() function:
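For example, predicting back onto the training data:

predictions <- predict(svem_model, newdata = data)
predictions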
## [1] 5.000157 4.738925 4.767222 4.867366 5.052403 5.379482 4.918787 5.024105
## [9] 4.686678 4.896489 5.180845 5.100301 4.768047 4.539462 5.108998 5.488324
## ... (output truncated; 150 fitted values in total)
This is the serial version of the significance test. It is slower than the parallel version, but its code is easier to read.
test_result <- svem_significance_test(Sepal.Length ~ ., data = data)
print(test_result)
plot(test_result)
SVEM Significance Test p-value:
[1] 0
Whole model test result
Note that there is a parallelized version, svem_significance_test_parallel(), that runs much faster.
# Simulate data
set.seed(1)
n <- 25
X1 <- runif(n)
X2 <- runif(n)
X3 <- runif(n)
X4 <- runif(n)
X5 <- runif(n)
# y depends only on X1 and X2
y <- 1 + X1 + X2 + X1 * X2 + X1^2 + rnorm(n)
data <- data.frame(y, X1, X2, X3, X4, X5)
# Perform the SVEM significance test
test_result <- svem_significance_test_parallel(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  data = data
)
# View the p-value
print(test_result)
SVEM Significance Test p-value:
[1] 0.009399093
test_result2 <- svem_significance_test_parallel(
  y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2),
  data = data
)
# View the p-value
print(test_result2)
SVEM Significance Test p-value:
[1] 0.006475736
# Note that the response does not depend on X4 or X5
test_result3 <- svem_significance_test_parallel(
  y ~ (X4 + X5)^2 + I(X4^2) + I(X5^2),
  data = data
)
# View the p-value
print(test_result3)
SVEM Significance Test p-value:
[1] 0.8968502
# Plot the Mahalanobis distances
plot(test_result, test_result2, test_result3)
Whole Model Test Results for Example 2
SVEMnet also includes a newly added wrapper for cv.glmnet(), allowing a comparison of SVEM's performance with glmnet's native cross-validation. Simulations show improved behavior from a relaxed grid search that allows the model to apply a lighter penalty to parameters retained from the initial elastic net fit. This option tends to hurt holdout RMSE for cross-validated glmnet, but the SVEM bootstraps average over the additional variability it introduces and produce smaller holdout RMSE.
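As a sketch of how such a comparison might look on the simulated data above (the wrapper is assumed here to be glmnet_with_cv() with the same formula interface as SVEMnet(); consult the package manual for the exact name and signature):

# Assumed interface: glmnet_with_cv() mirrors SVEMnet(formula, data, ...)
cv_fit   <- glmnet_with_cv(y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2), data = data, glmnet_alpha = 1)
svem_fit <- SVEMnet(y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2), data = data, glmnet_alpha = 1, nBoot = 200)
# A real comparison would use holdout data; training rows are reused here for brevity
sqrt(mean((data$y - predict(cv_fit,   newdata = data))^2))
sqrt(mean((data$y - predict(svem_fit, newdata = data))^2))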
SVEMnet implements Self-Validated Ensemble Models (SVEM) using elastic-net (lasso/ridge) base learners via glmnet. SVEM averages predictions from bootstrap-resampled fits chosen by an internal validation scheme, and exposes helpers for (i) deterministic factor-space expansion, (ii) whole-model significance testing, and (iii) multi-response random-search optimization with optional mixture constraints.
This vignette walks through a minimal workflow and ends with an end-to-end lipid formulation example across three responses with a mixture constraint and an optimization step.
# Simulate simple mixed-type data
set.seed(1)
n <- 120
X1 <- runif(n)
X2 <- runif(n)
F <- factor(sample(c("lo","hi"), n, replace = TRUE))
y1 <- 1 + 1.5*X1 - 0.8*X2 + 0.4*(F=="hi") + rnorm(n, 0, 0.2)
y2 <- 0.7 + 0.4*X1 + 0.4*X2 - 0.2*(F=="hi") + rnorm(n, 0, 0.2)
dat <- data.frame(y1, y2, X1, X2, F)
# Fit two SVEM models (keep defaults modest in vignettes)
m1 <- SVEMnet(y1 ~ X1 + X2 + F, dat, nBoot = 30)
m2 <- SVEMnet(y2 ~ X1 + X2 + F, dat, nBoot = 30)
# Predict
head(predict(m1, newdata = dat))
head(predict(m2, newdata = dat))
The serial significance test draws many evaluation points in the factor space, refits SVEM repeatedly on original and permuted responses, and compares distance distributions via a parametric reference fit.
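The permutation idea can be illustrated with a toy version that uses lm() and R-squared in place of SVEM refits and Mahalanobis distances (illustration only, not the package's implementation):

# Toy permutation test: refit on shuffled responses and compare fit statistics
set.seed(2)
orig_stat <- summary(lm(y1 ~ X1 + X2 + F, data = dat))$r.squared
perm_stats <- replicate(200, {
  perm <- dat
  perm$y1 <- sample(perm$y1)  # break the response-predictor relationship
  summary(lm(y1 ~ X1 + X2 + F, data = perm))$r.squared
})
mean(perm_stats >= orig_stat)  # permutation p-value for the toy statistic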
Use svem_optimize_random() to define goals per response and pick a best recipe plus k_candidates diverse high scorers (PAM medoids; real sampled rows).
objs <- list(y1 = m1, y2 = m2)
goals <- list(
  y1 = list(goal = "max", weight = 0.6),
  y2 = list(goal = "target", weight = 0.4, target = 0.9)
)
opt_out <- svem_optimize_random(
  objects = objs,
  goals = goals,
  n = 3000,
  agg = "mean",
  debias = FALSE,
  ci = TRUE,
  level = 0.95,
  k_candidates = 5,
  top_frac = 0.02,
  verbose = TRUE
)
opt_out$best_x
opt_out$best_pred
head(opt_out$candidates)
In this section we use the bundled lipid_screen dataset and demonstrate the complete flow over three responses (Potency, Size, PDI) with a mixture constraint on composition.
Key columns:
- Mixture components: PEG, Helper, Ionizable, Cholesterol
- Categorical factor: Ionizable_Lipid_Type
- Process factors: N_P_ratio, flow_rate
- Responses: Potency, Size, PDI
Mixture constraints: PEG ∈ [0.01, 0.05] and Helper, Ionizable, Cholesterol ∈ [0.10, 0.60], with the four components summing to 1.
We keep debias = FALSE in the examples, per package convention.
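The tests and optimizer below refer to form_pot, form_siz, form_pdi, and (later) objs, which are defined outside this excerpt. A minimal sketch of assumed definitions (the model terms are hypothetical; adjust them to the design):

form_pot <- Potency ~ (PEG + Helper + Ionizable + Cholesterol + N_P_ratio + flow_rate)^2 +
  Ionizable_Lipid_Type  # hypothetical terms; substitute the intended model
form_siz <- update(form_pot, Size ~ .)
form_pdi <- update(form_pot, PDI ~ .)

m_pot <- SVEMnet(form_pot, lipid_screen, nBoot = 80)
m_siz <- SVEMnet(form_siz, lipid_screen, nBoot = 80)
m_pdi <- SVEMnet(form_pdi, lipid_screen, nBoot = 80)
objs  <- list(Potency = m_pot, Size = m_siz, PDI = m_pdi)  # used by svem_optimize_random() below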
test_pot <- svem_significance_test(form_pot, lipid_screen, nPoint = 2000, nSVEM = 10, nPerm = 150,
                                   nBoot = 80, glmnet_alpha = 1, relaxed = FALSE, verbose = TRUE)
test_siz <- svem_significance_test(form_siz, lipid_screen, nPoint = 2000, nSVEM = 10, nPerm = 150,
                                   nBoot = 80, glmnet_alpha = 1, relaxed = FALSE, verbose = TRUE)
test_pdi <- svem_significance_test(form_pdi, lipid_screen, nPoint = 2000, nSVEM = 10, nPerm = 150,
                                   nBoot = 80, glmnet_alpha = 1, relaxed = FALSE, verbose = TRUE)
c(Potency = test_pot$p_value, Size = test_siz$p_value, PDI = test_pdi$p_value)
svem_significance_test_parallel() operates on one response at a time. Use it separately for each response and then combine the visuals with plot.svem_significance_test() by passing multiple results.
# Parallel runs (example; adjust nCore to your machine)
par_pot <- svem_significance_test_parallel(form_pot, lipid_screen,
                                           nPoint = 3000, nSVEM = 10, nPerm = 150,
                                           nCore = max(1L, parallel::detectCores() - 1L),
                                           seed = 123, verbose = TRUE)
par_siz <- svem_significance_test_parallel(form_siz, lipid_screen,
                                           nPoint = 3000, nSVEM = 10, nPerm = 150,
                                           nCore = max(1L, parallel::detectCores() - 1L),
                                           seed = 123, verbose = TRUE)
par_pdi <- svem_significance_test_parallel(form_pdi, lipid_screen,
                                           nPoint = 3000, nSVEM = 10, nPerm = 150,
                                           nCore = max(1L, parallel::detectCores() - 1L),
                                           seed = 123, verbose = TRUE)
# Plot all three together
plot(par_pot, par_siz, par_pdi, labels = c("Potency","Size","PDI"))
We construct the mixture constraint and optimize a weighted multi-response goal, returning the best recipe and 5 diverse high-scoring existing rows (PAM medoids).
goals <- list(
  Potency = list(goal = "max", weight = 0.7),
  Size    = list(goal = "min", weight = 0.2),
  PDI     = list(goal = "min", weight = 0.1)
)
mixL <- list(list(
  vars  = c("Cholesterol", "PEG", "Ionizable", "Helper"),
  lower = c(0.10, 0.01, 0.10, 0.10),
  upper = c(0.60, 0.05, 0.60, 0.60),
  total = 1
))
opt_out <- svem_optimize_random(
  objects = objs,
  goals = goals,
  n = 10000,
  mixture_groups = mixL,
  agg = "mean",
  debias = FALSE,
  ci = TRUE,
  level = 0.95,
  k_candidates = 5,
  top_frac = 0.01,
  verbose = TRUE
)
opt_out$best_x
opt_out$best_pred
opt_out$best_ci
opt_out$candidates
Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Elastic Net Regression. Chemometrics and Intelligent Laboratory Systems, 219, 104439. DOI: 10.1016/j.chemolab.2021.104439

Karl, A. T. (2024). A Randomized Permutation Whole-Model Test for SVEM. Chemometrics and Intelligent Laboratory Systems, 249, 105122. DOI: 10.1016/j.chemolab.2024.105122

Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI: 10.18637/jss.v033.i01

Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference.

Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space-Filling Mixture Designs, Neural Networks, and SVEM. JMP Discovery Conference.

Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Summit Europe.

Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Summit Europe.

Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It's All About Prediction. JMP Discovery Conference.

Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference.

Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358.

Karl, A. T. (2024). SVEMnet: Self-Validated Ensemble Models with Elastic Net Regression. R package.

JMP Help Documentation: Overview of Self-Validated Ensemble Models.