Simulate virtual pediatric subjects using anthropometric growth charts
You can install the stable version of SimKid from CRAN with:
install.packages("SimKid")
You can install the development version of SimKid from GitHub with:
# install.packages("devtools")
devtools::install_github("Andy00000000000/SimKid")
There are many input options not used in this example. Please refer
to the documentation (?SimKid::sim_kid()) for
descriptions of all available options.
Simulate 100 virtual subjects per age bin and per sex using CDC growth charts.
library(SimKid)
demo0 <- sim_kid(num = 100, agedistr = "nperage", masterseed = 123)
Next, allow those virtual subjects to grow for 3 months, recording height and weight monthly. Note that this assumes each subject remains at the same height-for-age-and-sex and weight-for-age-and-sex percentiles as at baseline.
demo <- grow_kid(data = demo0, grow_time = 3, tstep = 1)
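The constant-percentile assumption can be sketched with the LMS transformation used by growth charts, in which a measurement is recovered from a z-score as M(1 + L*S*z)^(1/L), or M*exp(S*z) when L is 0. The snippet below is a minimal illustration and not SimKid's internal code: lms_to_value() is a hypothetical helper, and the LMS values are made-up placeholders rather than CDC parameters. The subject's z-score is fixed at baseline while the age-specific L, M, and S values change at each monthly step.

```r
# Hypothetical helper (not part of SimKid): convert a z-score to a
# measurement using the LMS method.
lms_to_value <- function(z, L, M, S) {
  ifelse(abs(L) < 1e-8, M * exp(S * z), M * (1 + L * S * z)^(1 / L))
}

# Placeholder LMS values for weight (kg) at four consecutive monthly ages.
# These numbers are illustrative only and are not taken from any growth chart.
lms <- data.frame(
  age_months = 60:63,
  L = c(-1.00, -1.01, -1.02, -1.03),
  M = c(18.40, 18.60, 18.80, 19.00),
  S = c(0.130, 0.131, 0.132, 0.133)
)

# The subject sits at the 75th percentile at baseline.
z_baseline <- qnorm(0.75)

# Hold the z-score constant while the age-specific LMS values change.
lms$weight_kg <- lms_to_value(z_baseline, lms$L, lms$M, lms$S)
print(lms)
```

Because the z-score never changes, the simulated subject tracks a single percentile curve over time; any crossing of percentiles during real growth is not represented under this assumption.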
Finally, visually validate that the virtual population is reflective of the CDC growth chart data.
library(ggplot2)
plots <- validate_kid(data = demo, overlay_percentile = 0.50, alpha = 0.2)

theme_readme <- theme(
  axis.text.x = element_text(size = 6, angle = -60, hjust = -0.1),
  plot.caption = element_text(size = 6)
)
print(plots[[1]] + theme_readme) # height for age by sex
print(plots[[2]] + theme_readme + coord_cartesian(ylim = c(NA, 125))) # weight for age by sex
print(plots[[3]] + theme_readme + coord_cartesian(xlim = c(NA, 125), ylim = c(NA, 45))) # weight for height by sex
print(plots[[4]] + theme_readme + coord_cartesian(ylim = c(NA, 40))) # BMI for age by sex
Please feel free to report problems using the Issue Tracker or to reach out for help on the Discussion Board.
citation("SimKid")
#> To cite package 'SimKid' in publications use:
#>
#> Santulli AR (2025). _SimKid: Simulate Virtual Pediatrics using
#> Anthropometric Growth Charts_.
#> <https://github.com/Andy00000000000/SimKid>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {SimKid: Simulate Virtual Pediatrics using Anthropometric Growth Charts},
#> author = {Andrew R Santulli},
#> year = {2025},
#> url = {https://github.com/Andy00000000000/SimKid},
#> }
Enhanced Pharmacodynamics, LLC (ePD) is a contract research organization that assists clients with the design and implementation of model-informed drug development strategies in a broad range of therapeutic areas. The executive management team is led by Dr. Donald E. Mager and Dr. Scott A. Van Wart.
SimKid was developed at ePD by Andrew Santulli.
Title: SimKid: An R package for Simulation of Virtual Pediatric Subjects
Authors: Andrew R Santulli¹, Sarah F. Cook¹, Donald E. Mager¹,², Scott Van Wart¹
Institutions: ¹Enhanced Pharmacodynamics, LLC, Buffalo, NY, USA; ²Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, The State University of New York, Buffalo, NY, USA
Objective: Modeling and simulation for pharmacometrics often requires creation of virtual subjects, with distributions of model covariates and correlations between them that are reflective of the clinical trial or overall population characteristics. Body weight and body surface area (BSA) are commonly statistically significant and clinically relevant pharmacokinetic model covariates that are used for dosage calculations, especially during scale-down into pediatric populations. Creation of representative virtual subjects is paramount to accurate model-informed drug development when simulating models in pediatric populations. In this work, anthropometric growth chart data [1-16] were incorporated into a pharmacometrics-oriented R package with the goal of facilitating the simulation of virtual pediatric populations.
Methods: Publicly available CDC, WHO, and Fenton growth chart data [1-16] were collated into a standard format and saved as R data objects (.rda) for easy use and distribution. The LMS parameters are the median (M), the generalized coefficient of variation (S), and the power in the Box-Cox transformation (L) [17]. For CDC data for ages 2 to 20 years, weight-for-length LMS parameters were unavailable. Therefore, to ensure realistic correlations between weight and height, the correlation between the z-score of height and the z-score of weight was optimized by sex and 1-year age bin using a simulation approach with a sum-of-squares criterion of fit between simulated and reported percentiles of BMI-for-age [6]. The LMS parameters for Fenton preterm growth of weight-for-gestational-age [13,14] were also unavailable and were fit using a similar approach. The LMS parameters, the respective equations used to obtain height and weight, the optimized correlations between height and weight, and the variability in z-scores were incorporated into an R package that allows for easy and rapid creation of virtual pediatric subjects. For each simulation, the user can specify the independent variables, such as the proportion of female subjects, the assumed distribution of age (uniform or truncated normal), and relevant age statistics (mean, standard deviation, and range).
Results: The SimKid R package can create virtual pediatric subjects with ages ranging from birth to 20 years. To validate that the virtual populations were reflective of anthropometric growth chart distributions and correlations, simulated percentiles of weight-for-age, length-for-age, and weight-for-length (for ages up to 2 years) or BMI-for-age (for ages greater than 2 years) were overlaid upon the respective observed percentiles obtained from anthropometric growth charts [1-16]. Validation figures confirmed that the SimKid package performed as expected, and a validation module was built into the package for user convenience and confidence. The package generates a data frame of virtual subject characteristics that includes age, sex, weight, height, BMI, and various calculations of BSA.
Conclusion: The SimKid R package simulates virtual pediatric subject demographics that are representative of real-world data based upon published growth chart data. Use of this R package can help simplify and potentially standardize the process of simulating virtual pediatric populations.
References:
For young ages (birth to 36 months), CDC weight-for-stature growth charts can be used during simulations to enforce realistic correlations between height and weight. For later ages, such growth charts are unavailable. Therefore, to ensure realistic correlations between weight and height, the correlation between z-score of height and z-score of weight was optimized by sex and 1-year age bin.
A simulation approach was used in which 1000 males and 1000 females per CDC
growth chart age bin (monthly for ages 25 to 239 months) were created
using the respective CDC LMS parameters. The z-scores for height and
weight were sampled for each virtual subject from a truncated
multivariate normal distribution using tmvtnorm::rtmvnorm()
(truncated to the range of the 0.1 to 99.9 percentiles). The
correlation used by the truncated multivariate normal
distribution was optimized separately by sex and by 1-year age bin using
a sum-of-squares criterion of fit between simulated and reported
percentiles of BMI-for-age (3rd, 10th, 25th, 50th, 75th, 90th, and 97th
percentiles). The R function stats::optimize() was used
for the optimization. To reduce the impact of stochastic sampling from
the multivariate normal distributions, the simulation-optimization
procedure was repeated 10 times and the average correlations were
calculated. The optimization scripts
(cdc_ages2to20yr_correlations_by_sex_htcm_wtkg_allreplicates.R and
cdc_ages2to20yr_correlations_by_sex_htcm_wtkg_summarized.R) are provided
in the data-raw folder.
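The simulation-optimization idea described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code in the data-raw folder: the LMS values are made-up placeholders, the "reported" BMI percentiles are a stand-in generated from an assumed correlation of 0.6 rather than CDC data, and correlated z-scores are drawn without truncation using rnorm() instead of tmvtnorm::rtmvnorm(). The helper names (lms_to_value, sim_bmi_percentiles, fit_rho) are hypothetical.

```r
# Placeholder LMS values for one sex and one age bin (not CDC data).
ht <- list(L = 1.0, M = 110, S = 0.045)   # height, cm
wt <- list(L = -1.0, M = 18.5, S = 0.13)  # weight, kg

# Hypothetical helper: z-score to measurement via the LMS method.
lms_to_value <- function(z, p) {
  if (abs(p$L) < 1e-8) p$M * exp(p$S * z) else p$M * (1 + p$L * p$S * z)^(1 / p$L)
}

probs <- c(0.03, 0.10, 0.25, 0.50, 0.75, 0.90, 0.97)

# Simulate BMI percentiles for a given height-weight z-score correlation.
# The seed is reset on every call so that, within one replicate, the
# objective varies smoothly with rho (common random numbers).
sim_bmi_percentiles <- function(rho, n = 1000, seed = 1) {
  set.seed(seed)
  z_ht <- rnorm(n)
  z_wt <- rho * z_ht + sqrt(1 - rho^2) * rnorm(n)  # correlated with z_ht
  height_m <- lms_to_value(z_ht, ht) / 100
  weight_kg <- lms_to_value(z_wt, wt)
  quantile(weight_kg / height_m^2, probs = probs, names = FALSE)
}

# Stand-in for reported BMI-for-age percentiles (assumed true rho = 0.6).
reported <- sim_bmi_percentiles(rho = 0.6, n = 1e5, seed = 999)

# One replicate: minimize the sum of squared percentile differences.
fit_rho <- function(seed) {
  sse <- function(rho) sum((sim_bmi_percentiles(rho, seed = seed) - reported)^2)
  optimize(sse, interval = c(0, 0.99))$minimum
}

# Repeat 10 times with different seeds and average the estimates.
rho_reps <- vapply(1:10, fit_rho, numeric(1))
rho_avg <- mean(rho_reps)
print(round(rho_reps, 3))
print(round(rho_avg, 3))
```

Individual replicates scatter around the assumed correlation because each uses only 1000 simulated subjects; averaging across replicates reduces that sampling noise, which is the motivation for repeating the procedure.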
The 10x replicate correlations (black) and averaged correlations (red) are shown below:
By default, the average correlations are used when simulating
virtual subjects for ages 25 to 239 months. The user can specify
age2to20yr_correlate_htwt = FALSE when calling SimKid::sim_kid()
to avoid using the optimized correlations. This assumes no
correlation between height and weight (ages 25 to 239 months),
which is unrealistic but does not affect subsequent use cases
in which only weight is used.
The simulations below demonstrate the impact of including versus excluding the height and weight correlations. Note the discrepancy between simulated and reported BMI percentiles for ages above 2 years when simulating without height and weight correlations. Without the correlations, there is a discontinuity at 24 months, where the simulation methodology changes away from using CDC weight-for-stature charts, and the lower ribbon of the simulated data extends to the 10th reported percentile rather than the 25th.
demo_cor <- sim_kid(num = 100, agedistr = "nperage", masterseed = 123, age2to20yr_correlate_htwt = TRUE)
demo_nocor <- sim_kid(num = 100, agedistr = "nperage", masterseed = 123, age2to20yr_correlate_htwt = FALSE)

val_cor <- validate_kid(demo_cor, overlay_percentile = 0.50)
val_nocor <- validate_kid(demo_nocor, overlay_percentile = 0.50)
print(val_cor[[4]] + labs(title = "Correlated Height and Weight") + theme_readme + coord_cartesian(ylim = c(NA, 45)))
print(val_nocor[[4]] + labs(title = "Uncorrelated Height and Weight") + theme_readme + coord_cartesian(ylim = c(NA, 45)))