| Type: | Package |
| Title: | Publication-Ready Summary Tables and Forest Plots |
| Version: | 0.11.3 |
| Description: | A comprehensive framework for descriptive statistics and regression analysis that produces publication-ready tables and forest plots. Provides a unified interface from descriptive statistics through multivariable modeling, with support for linear models, generalized linear models, Cox proportional hazards, and mixed-effects models. Also includes univariable screening, multivariate regression, model comparison, and export to multiple formats including PDF, DOCX, PPTX, 'LaTeX', HTML, and RTF. Built on 'data.table' for computational efficiency. |
| License: | GPL (≥ 3) |
| Encoding: | UTF-8 |
| LazyData: | true |
| URL: | https://phmcc.github.io/summata/, https://github.com/phmcc/summata |
| BugReports: | https://github.com/phmcc/summata/issues |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.2.0) |
| Imports: | data.table, survival, ggplot2, stats, grDevices |
| Suggests: | MASS, coxme, flextable, knitr, lme4, MuMIn, officer, pROC, ragg, ResourceSelection, rmarkdown, stringr, systemfonts, tinytex, withr, xtable |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-03-04 01:18:49 UTC; paul |
| Author: | Paul Hsin-ti McClelland |
| Maintainer: | Paul Hsin-ti McClelland <PaulHMcClelland@protonmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-08 10:20:02 UTC |
Add padding to exported table headers
Description
Adds LaTeX vertical spacing rules to column headers for proper vertical alignment in PDF/LaTeX exports.
Usage
add_header_padding(col_names)
Arguments
col_names |
Character vector of column names. |
Value
Character vector with LaTeX padding rules added.
Add p-value column to result table
Description
Adds formatted p-value column to the survtable result.
Usage
add_pvalue_column(result, p_value, p_digits, marks = NULL)
Arguments
result |
Data.table result. |
p_value |
Numeric p-value. |
p_digits |
Integer decimal places for p-value. |
marks |
List with |
Value
Data.table with p-value column added.
Add raw statistics to row
Description
Appends raw numeric statistics to a data.table row for downstream processing. Used to preserve underlying values alongside formatted display strings.
Usage
add_raw_stats(row, col, stats, stat_type)
Arguments
row |
Data.table row to modify. |
col |
Character string column name for the statistics. |
stats |
Named list of numeric statistics (mean, sd, median, etc.). |
stat_type |
Character string indicating which statistic is displayed. |
Value
Modified row (by reference).
Add padding to exported table variables
Description
Inserts blank padding rows between variable groups in exported tables for improved visual separation.
Usage
add_variable_padding(df)
Arguments
df |
Data.table with Variable column. |
Value
Data.table with padding rows inserted between variable groups.
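The padding step can be illustrated with a short base R sketch. It assumes that a non-empty Variable cell marks the start of a new variable group and uses a plain data frame; the helper name pad_between_variables is hypothetical and this is not the package's implementation.

```r
# Hypothetical sketch: insert a blank row between variable groups.
pad_between_variables <- function(df) {
  # Assumption: a non-empty Variable cell starts a new variable group
  starts <- which(nzchar(df$Variable))
  rows <- seq_len(nrow(df))
  groups <- split(rows, findInterval(rows, starts))
  # Build a one-row blank spacer with the same columns as df
  blank <- df[1, , drop = FALSE]
  blank[1, ] <- ""
  pieces <- lapply(groups, function(idx) rbind(df[idx, , drop = FALSE], blank))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  # Drop the trailing spacer so padding appears only between groups
  out[-nrow(out), , drop = FALSE]
}

tbl <- data.frame(Variable = c("Age", "Sex", "", ""),
                  Group = c("Mean (SD)", "", "Female", "Male"),
                  stringsAsFactors = FALSE)
padded <- pad_between_variables(tbl)
padded
```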
Apply locale decimal mark to a sprintf-formatted string
Description
Replaces the period decimal mark in a sprintf-formatted string with
the locale-appropriate decimal mark, and fixes negative-zero artefacts.
Usage
apply_decimal_mark(x, marks)
Arguments
x |
Character string (already formatted with sprintf). |
marks |
List with |
Value
Character string with corrected decimal marks and no negative zeros.
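A minimal sketch of the two steps in base R. The structure of marks is not shown in this manual, so the element name decimal and the helper name swap_decimal_mark are assumptions made for illustration only.

```r
# Hypothetical helper, not the package's implementation.
swap_decimal_mark <- function(x, marks = list(decimal = ",")) {
  # Drop the sign from negative zeros such as "-0.00" or "-0"
  x <- sub("^-(0(\\.0+)?)$", "\\1", x)
  # Replace the period produced by sprintf() with the chosen decimal mark
  sub(".", marks$decimal, x, fixed = TRUE)
}

swap_decimal_mark(sprintf("%.2f", 3.14159))  # "3,14"
swap_decimal_mark(sprintf("%.2f", -0.001))   # "0,00"
```

Genuinely negative values such as "-0.50" keep their sign because the pattern only matches strings made entirely of zeros.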
Apply zebra stripes with proper variable group detection for indented tables
Description
Applies alternating background colors to variable groups in flextable objects. Handles both indented tables (detects groups by leading whitespace) and non-indented tables (uses pre-identified groups).
Usage
apply_zebra_stripes_ft(ft, df, var_groups)
Arguments
ft |
flextable object. |
df |
The source data.table used to create the flextable. |
var_groups |
List of row index vectors for variable groups. |
Value
Flextable object with zebra stripe formatting applied.
Create Forest Plot with Automatic Model Detection
Description
A convenience wrapper function that automatically detects the input type and routes to the appropriate specialized forest plot function. This eliminates the need to remember which forest function to call for different model types or analysis objects, making it ideal for exploratory analysis and rapid prototyping.
Usage
autoforest(x, data = NULL, title = NULL, ...)
Arguments
x |
One of the following:
|
data |
Data frame or data.table containing the original data. Required
when |
title |
Character string for plot title. If
|
... |
Additional arguments passed to the specific forest plot function. Common arguments include:
See the documentation for the specific forest function for all available options. |
Details
This function provides a convenient wrapper around the specialized forest plot functions, automatically routing to the appropriate function based on the model class or result type. All parameters are passed through to the underlying function, so the full range of options remains available.
For model-specific advanced features, individual forest functions may be called directly.
Automatic Detection Logic:
The function uses the following priority order for detection:
- uniscreen results: Detected by class "uniscreen_result" or presence of attributes outcome, predictors, model_type, and model_scope = "Univariable". Routes to uniforest().
- multifit results: Detected by presence of attributes predictor, outcomes, model_type, and raw_data. Routes to multiforest().
- Cox models: Classes coxph or clogit. Routes to coxforest().
- GLM models: Class glm. Routes to glmforest().
- Linear models: Class lm (but not glm). Routes to lmforest().
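The order of the class checks matters because a glm object also inherits from lm. The sketch below illustrates this with a hypothetical router, route_forest, that returns the name of the target function; it omits the attribute-based multifit check and is not the package's actual dispatch code.

```r
# Hypothetical router illustrating why the order of checks matters.
route_forest <- function(x) {
  if (inherits(x, "uniscreen_result")) return("uniforest")
  if (inherits(x, c("coxph", "clogit"))) return("coxforest")
  # glm objects also inherit from "lm", so test for "glm" first
  if (inherits(x, "glm")) return("glmforest")
  if (inherits(x, "lm")) return("lmforest")
  stop("Unsupported input type")
}

# Built-in mtcars data used as a stand-in for real analysis data
glm_model <- glm(am ~ wt, family = binomial, data = mtcars)
lm_model  <- lm(mpg ~ wt, data = mtcars)

route_forest(glm_model)  # "glmforest"
route_forest(lm_model)   # "lmforest"
```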
Value
A ggplot object containing the complete forest plot. The plot can be:
- Displayed directly: print(plot)
- Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
- Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
See Also
glmforest for GLM forest plots,
coxforest for Cox model forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
fit for single-model regression,
fullfit for combined univariable/multivariable regression,
uniscreen for univariable screening,
multifit for multi-outcome analysis
Other visualization functions:
coxforest(),
glmforest(),
lmforest(),
multiforest(),
uniforest()
Examples
data(clintrial)
data(clintrial_labels)
library(survival)
# Create example model
glm_model <- glm(surgery ~ age + sex + bmi + smoking,
family = binomial, data = clintrial)
# Example 1: Logistic regression model
p <- autoforest(glm_model, data = clintrial)
# Automatically detects GLM and routes to glmforest()
# Example 2: Cox proportional hazards model
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + treatment + stage,
data = clintrial)
plot2 <- autoforest(cox_model, data = clintrial)
# Automatically detects coxph and routes to coxforest()
# Example 3: Linear regression model
lm_model <- lm(biomarker_x ~ age + sex + bmi + treatment, data = clintrial)
plot3 <- autoforest(lm_model, data = clintrial)
# Automatically detects lm and routes to lmforest()
# Example 4: With custom labels and formatting options
plot4 <- autoforest(
cox_model,
data = clintrial,
labels = clintrial_labels,
title = "Prognostic Factors for Overall Survival",
zebra_stripes = TRUE,
indent_groups = TRUE
)
# Example 5: From fit() result - data and labels extracted automatically
fit_result <- fit(
data = clintrial,
outcome = "surgery",
predictors = c("age", "sex", "bmi", "treatment"),
labels = clintrial_labels
)
plot5 <- autoforest(fit_result)
# No need to pass data or labels - extracted from fit_result
# Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "forest.pdf"),
plot5, width = dims$width, height = dims$height)
Export Table with Automatic Format Detection
Description
Automatically detects the output format based on file extension and exports the table using the appropriate specialized function. Provides a unified interface for table export across all supported formats.
Usage
autotable(table, file, ...)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output filename. The file extension determines the export format:
|
... |
Additional arguments passed to the format-specific function. See the documentation for individual functions for available parameters:
Common parameters across formats include:
|
Details
This function provides a convenient wrapper around format-specific export functions, automatically routing to the appropriate function based on the file extension. All parameters are passed through to the underlying function, so the full range of format-specific options remains available.
For format-specific advanced features, you may prefer to use the individual export functions directly:
PDF exports support orientation, paper size, margins, and auto-sizing
DOCX/PPTX/RTF support font customization and flextable formatting
HTML supports CSS styling, responsive design, and custom themes
TeX generates standalone LaTeX source with booktabs styling
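Extension-based routing of this kind can be sketched in a few lines of base R. The lookup below maps extensions to the export functions listed under See Also; the mapping and the helper name export_function_for are assumptions for illustration, and the real function may detect formats differently.

```r
# Hypothetical lookup; returns the name of the export function for a file.
export_function_for <- function(file) {
  # Compare extensions case-insensitively
  ext <- tolower(tools::file_ext(file))
  routes <- c(pdf = "table2pdf", docx = "table2docx", pptx = "table2pptx",
              html = "table2html", rtf = "table2rtf", tex = "table2tex")
  if (!ext %in% names(routes)) {
    stop("Unsupported file extension: ", ext)
  }
  unname(routes[ext])
}

export_function_for("results.DOCX")  # "table2docx"
```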
Value
Invisibly returns the file path. Called primarily for its side effect of creating the output file.
See Also
table2pdf, table2docx, table2pptx,
table2html, table2rtf, table2tex
Other export functions:
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
Examples
# Create example data
data(clintrial)
data(clintrial_labels)
tbl <- desctable(clintrial, by = "treatment",
variables = c("age", "sex"), labels = clintrial_labels)
# Auto-detect format from extension
if (requireNamespace("xtable", quietly = TRUE)) {
autotable(tbl, file.path(tempdir(), "example.html"))
}
# Load example data
data(clintrial)
data(clintrial_labels)
# Create a regression table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
labels = clintrial_labels
)
# Test that LaTeX can actually compile (needed for PDF export)
has_latex <- local({
if (!nzchar(Sys.which("pdflatex"))) return(FALSE)
test_tex <- file.path(tempdir(), "summata_latex_test.tex")
writeLines(c("\\documentclass{article}",
"\\usepackage{booktabs}",
"\\begin{document}", "test",
"\\end{document}"), test_tex)
result <- tryCatch(
system2("pdflatex", c("-interaction=nonstopmode",
paste0("-output-directory=", tempdir()), test_tex),
stdout = FALSE, stderr = FALSE),
error = function(e) 1L)
result == 0L
})
# Export automatically detects format from extension
autotable(results, file.path(tempdir(), "results.html")) # Creates HTML file
autotable(results, file.path(tempdir(), "results.docx")) # Creates Word document
autotable(results, file.path(tempdir(), "results.pptx")) # Creates PowerPoint slide
autotable(results, file.path(tempdir(), "results.tex")) # Creates LaTeX source
autotable(results, file.path(tempdir(), "results.rtf")) # Creates RTF document
if (has_latex) {
autotable(results, file.path(tempdir(), "results.pdf")) # Creates PDF
}
# Pass format-specific parameters
if (has_latex) {
autotable(results, file.path(tempdir(), "results.pdf"),
orientation = "landscape",
paper = "a4",
font_size = 10)
}
autotable(results, file.path(tempdir(), "results.docx"),
caption = "Table 1: Logistic Regression Results",
font_family = "Times New Roman",
condense_table = TRUE)
autotable(results, file.path(tempdir(), "results.html"),
zebra_stripes = TRUE,
dark_header = TRUE,
bold_significant = TRUE)
# Works with any summata table output
desc <- desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"))
if (has_latex) {
autotable(desc, file.path(tempdir(), "demographics.pdf"))
}
comparison <- compfit(
data = clintrial,
outcome = "os_status",
model_list = list(
base = c("age", "sex"),
full = c("age", "sex", "treatment", "stage")
)
)
autotable(comparison, file.path(tempdir(), "model_comparison.docx"))
Bold significant p-values in DOCX
Description
Applies bold formatting to significant p-values in flextable objects by detecting values below threshold or "< 0.001" patterns.
Usage
bold_pvalues_ft(ft, df, p_threshold = 0.05)
Arguments
ft |
flextable object. |
df |
The source data.table. |
p_threshold |
Numeric p-value threshold for significance. |
Value
Flextable object with significant p-values bolded.
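Detecting significance from formatted strings requires handling both plain values and bounded values such as "< 0.001". A minimal sketch, assuming a period decimal mark and using the hypothetical helper name is_significant_p (the actual detection rules are not documented here):

```r
# Hypothetical helper: flag formatted p-values that fall below a threshold.
is_significant_p <- function(p_text, p_threshold = 0.05) {
  p_text <- trimws(p_text)
  # Strings such as "< 0.001" carry no exact value; compare the stated bound
  bounded <- grepl("^<", p_text)
  p_num <- suppressWarnings(as.numeric(sub("^<\\s*", "", p_text)))
  # Non-numeric cells (for example "-") become NA and are never flagged
  !is.na(p_num) & (p_num < p_threshold | (bounded & p_num <= p_threshold))
}

is_significant_p(c("0.032", "< 0.001", "0.210", "-"))
# TRUE TRUE FALSE FALSE
```

The resulting logical vector could then serve as a row selector when applying bold formatting to the p-value column.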
Build row for failed model
Description
Creates a comparison table row with appropriate NA values for a model that failed to fit.
Usage
build_failed_model_row(model_name, n, n_predictors, model_type)
Arguments
model_name |
Character string name of the model. |
n |
Integer sample size. |
n_predictors |
Integer number of predictors attempted. |
model_type |
Character string indicating model type. |
Value
Data.table with single row of NA metrics and "Failed" convergence.
Build comparison row for successfully fitted model
Description
Creates a comparison table row with extracted metrics for a successfully fitted model.
Usage
build_model_row(
model_name,
n_predictors,
converged,
metrics,
model_type,
marks = NULL
)
Arguments
model_name |
Character string name of the model. |
n_predictors |
Integer number of predictors in the model. |
converged |
Character string convergence status. |
metrics |
Named list of extracted model metrics. |
model_type |
Character string indicating model type. |
Value
Data.table with single row of formatted metrics.
Calculate scores for mixed-effects Cox models
Description
Computes component scores for coxme models based on concordance, pseudo-R-squared, and ICC metrics.
Usage
calculate_coxme_scores(comparison, weights, scores, n_models)
Arguments
comparison |
Data.table with model comparison metrics. |
weights |
Named list of scoring weights. |
scores |
List of initialized score vectors. |
n_models |
Integer number of models being compared. |
Value
Updated scores list with calculated values and total.
Calculate table layout for forest plots
Description
Computes column widths and positions for the table portion of a forest plot. Determines spacing based on content width, font size, and desired table/forest proportion. Returns positions in log-scale units for plot coordinate system.
Usage
calculate_forest_layout(
to_show_exp_clean,
show_n,
show_events,
indent_groups,
condense_table,
effect_label,
ref_label,
font_size,
table_width = 0.6,
rangeb,
center_padding
)
Arguments
to_show_exp_clean |
Data.table with formatted display data for the plot. |
show_n |
Logical whether to include sample size column. |
show_events |
Logical whether to include events column. |
indent_groups |
Logical whether groups are indented (affects level column). |
condense_table |
Logical whether table is condensed (affects level column). |
effect_label |
Character string describing effect measure type. |
ref_label |
Character string label for reference categories. |
font_size |
Numeric font size for width calculations. |
table_width |
Numeric proportion of total width for table (0-1). |
rangeb |
Numeric vector of length 2 with plot x-axis range. |
center_padding |
Numeric additional padding for effect column. |
Value
List with table_width, forest_width, positions, rangeplot_start, total_width, and effect_abbrev components.
Calculate scores for generalized linear mixed-effects models
Description
Computes component scores for glmer models based on concordance, marginal R-squared, and ICC metrics.
Usage
calculate_glmer_scores(comparison, weights, scores, n_models)
Arguments
comparison |
Data.table with model comparison metrics. |
weights |
Named list of scoring weights. |
scores |
List of initialized score vectors. |
n_models |
Integer number of models being compared. |
Value
Updated scores list with calculated values and total.
Calculate scores for linear mixed-effects models
Description
Computes component scores for lmer models based on marginal R-squared, conditional R-squared, and ICC metrics.
Usage
calculate_lmer_scores(comparison, weights, scores, n_models)
Arguments
comparison |
Data.table with model comparison metrics. |
weights |
Named list of scoring weights. |
scores |
List of initialized score vectors. |
n_models |
Integer number of models being compared. |
Value
Updated scores list with calculated values and total.
Calculate Composite Model Scores (CMS) for model comparison
Description
Computes a composite score based on a weighted combination of model quality metrics. Weights vary by model type to reflect academic consensus on important metrics for each model class.
Usage
calculate_model_scores(comparison, model_type, scoring_weights = NULL)
Arguments
comparison |
Data.table with model comparison metrics. |
model_type |
Character string indicating model type. |
scoring_weights |
Optional named list of custom weights. If |
Value
Data.table with CMS column added, sorted by score.
Calculate table layout for multiforest plots
Description
Computes column widths and positions for the table portion of a multiforest plot.
Usage
calculate_multiforest_layout(
to_show_exp_clean,
show_n,
show_events,
indent_predictor,
show_predictor = TRUE,
effect_abbrev,
font_size,
table_width = 0.6,
rangeb,
center_padding,
ci_pct = 95
)
Calculate table width based on paper size and orientation
Description
Computes usable table width in inches based on paper dimensions and orientation, accounting for standard 1-inch margins.
Usage
calculate_table_width(paper, orientation)
Arguments
paper |
Character string paper size ("letter", "a4", "legal"). |
orientation |
Character string page orientation ("portrait", "landscape"). |
Value
Numeric usable width in inches.
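The calculation amounts to page width minus both side margins. The sketch below assumes rounded page sizes in inches (A4 taken as 8.27 x 11.69) and uses a hypothetical helper name, usable_table_width; it is an illustration, not the package's code.

```r
# Hypothetical re-implementation for illustration only.
usable_table_width <- function(paper = "letter", orientation = "portrait",
                               margin = 1) {
  # Page sizes in inches as c(width, height) in portrait orientation
  sizes <- list(letter = c(8.5, 11), a4 = c(8.27, 11.69), legal = c(8.5, 14))
  dims <- sizes[[match.arg(paper, names(sizes))]]
  orientation <- match.arg(orientation, c("portrait", "landscape"))
  # Landscape swaps the roles of width and height
  page_width <- if (orientation == "landscape") dims[2] else dims[1]
  page_width - 2 * margin
}

usable_table_width("letter", "portrait")  # 6.5
usable_table_width("a4", "landscape")     # 9.69
```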
Calculate table layout for uniforest plots
Description
Internal function to determine column positions and widths for forest plot table section. Positions are calculated in the same units as the data (log scale for OR/HR/RR, linear for coefficients).
Usage
calculate_uniforest_layout(
to_show_exp_clean,
show_n,
show_events,
indent_groups,
table_width,
center_padding,
effect_abbrev,
font_size,
log_scale,
rangeb,
ci_pct = 95
)
Arguments
to_show_exp_clean |
Data.table with formatted data for plotting. |
show_n |
Logical whether to include n column. |
show_events |
Logical whether to include events column. |
indent_groups |
Logical whether levels are indented. |
table_width |
Proportion of width for table. |
center_padding |
Padding between table and forest. |
effect_abbrev |
Effect type abbreviation. |
font_size |
Font size multiplier. |
log_scale |
Logical whether using log scale. |
rangeb |
Numeric vector with plot range bounds (in data units). |
Value
List with column positions, widths, and layout parameters.
Check model convergence
Description
Checks convergence status for various model types including standard regression models and mixed-effects models.
Usage
check_convergence(model)
Arguments
model |
Fitted model object. |
Value
Character string: "Yes", "No", "Suspect", or "Failed"
Check LaTeX installation
Description
Verifies that a LaTeX distribution (pdflatex or xelatex) is available on the system for PDF compilation.
Usage
check_latex()
Value
Logical TRUE if LaTeX is available, FALSE otherwise.
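A check of this kind can be approximated with Sys.which(), which returns an empty string for executables not found on the PATH. The helper name latex_available is hypothetical, and the package's own check may do more (for example, test compilation).

```r
# Hypothetical check: TRUE if either engine is found on the PATH
latex_available <- function() {
  any(nzchar(Sys.which(c("pdflatex", "xelatex"))))
}

latex_available()
```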
Check required packages for model type
Description
Verifies that necessary packages are installed for the specified model type. Stops with informative error if required packages are missing.
Usage
check_required_packages(model_type)
Arguments
model_type |
Character string indicating model type. |
Value
NULL (invisibly). Stops execution if packages missing.
Simulated Clinical Trial Dataset
Description
A simulated dataset from a hypothetical multi-center oncology clinical trial comparing two experimental drugs against control. Designed to demonstrate the full capabilities of descriptive and regression analysis functions.
Usage
clintrial
Format
A data frame with 850 observations and 32 variables:
- patient_id
Unique patient identifier (character)
- age
Age at enrollment in years (numeric: 18-90)
- sex
Biological sex (factor: Female, Male)
- race
Self-reported race (factor: White, Black, Asian, Other)
- ethnicity
Hispanic ethnicity (factor: Non-Hispanic, Hispanic)
- bmi
Body mass index in kg/m^2 (numeric)
- smoking
Smoking history (factor: Never, Former, Current)
- hypertension
Hypertension diagnosis (factor: No, Yes)
- diabetes
Diabetes diagnosis (factor: No, Yes)
- ecog
ECOG performance status (factor: 0, 1, 2, 3)
- creatinine
Baseline creatinine in mg/dL (numeric)
- hemoglobin
Baseline hemoglobin in g/dL (numeric)
- biomarker_x
Serum biomarker A in ng/mL (numeric)
- biomarker_y
Serum biomarker B in U/L (numeric)
- site
Enrolling site (factor: Site Alpha through Site Kappa)
- grade
Tumor grade (factor: Well/Moderately/Poorly differentiated)
- stage
Disease stage at diagnosis (factor: I, II, III, IV)
- treatment
Randomized treatment (factor: Control, Drug A, Drug B)
- surgery
Surgical resection (factor: No, Yes)
- any_complication
Any post-operative complication (factor: No, Yes)
- wound_infection
Post-operative wound infection (factor: No, Yes)
- icu_admission
ICU admission required (factor: No, Yes)
- readmission_30d
Hospital readmission within 30 days (factor: No, Yes)
- pain_score
Pain score at discharge (numeric: 0-10)
- recovery_days
Days to functional recovery (numeric)
- los_days
Hospital length of stay in days (numeric)
- ae_count
Adverse event count (integer). Overdispersed count suitable for negative binomial or quasipoisson regression.
- fu_count
Follow-up visit count (integer). Equidispersed count suitable for standard Poisson regression.
- pfs_months
Progression-free survival time in months
- pfs_status
Progression or death event
- os_months
Overall survival time in months (numeric)
- os_status
Death indicator (numeric: 0=censored, 1=death)
Details
This dataset includes realistic correlations between variables:
- Survival is worse with higher stage, ECOG, age, and biomarker_x
- Treatment effects show Drug B > Drug A > Control
- ae_count is overdispersed (variance > mean) for negative binomial demos
- fu_count is equidispersed (variance ≈ mean) for Poisson demos
- Approximately 2% of values are missing at random
- Median follow-up is approximately 30 months
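The distinction between equidispersed and overdispersed counts can be checked with the variance-to-mean ratio. The sketch below uses freshly simulated counts rather than the clintrial data, with arbitrary parameters chosen for illustration.

```r
set.seed(42)
# Simulated counts, not the clintrial data
pois_counts   <- rpois(5000, lambda = 5)
negbin_counts <- rnbinom(5000, size = 1, mu = 5)

# Ratio near 1 suggests Poisson; ratio well above 1 suggests overdispersion
dispersion_ratio <- function(x) var(x) / mean(x)

dispersion_ratio(pois_counts)    # close to 1: equidispersed
dispersion_ratio(negbin_counts)  # well above 1: overdispersed
```

The same ratio applied to a real count column gives a quick indication of whether a Poisson model is plausible before fitting.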
Source
Simulated data for demonstration purposes
See Also
Other sample data:
clintrial_labels
Examples
data(clintrial)
data(clintrial_labels)
# Descriptive statistics by treatment arm
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "stage", "ecog",
"biomarker_x", "Surv(os_months, os_status)"),
labels = clintrial_labels)
# Poisson regression for equidispersed counts
fit(clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels)
# Negative binomial for overdispersed counts
fit(clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes"),
model_type = "negbin",
labels = clintrial_labels)
# Complete analysis pipeline
fullfit(clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "stage", "grade", "ecog",
"smoking", "biomarker_x", "biomarker_y", "treatment"),
method = "screen",
p_threshold = 0.20,
model_type = "coxph",
labels = clintrial_labels)
Variable Labels for Clinical Trial Dataset
Description
A named character vector providing descriptive labels for all variables in the clintrial dataset. Use with the labels parameter of summata functions.
Usage
clintrial_labels
Format
Named character vector with 24 elements
See Also
Other sample data:
clintrial
Combine coefficient tables from multiple models
Description
Merges coefficient tables from multiple fitted models into a single data.table with a Model identifier column.
Usage
combine_coefficient_tables(coef_list, model_names)
Arguments
coef_list |
List of data.tables containing coefficient information. |
model_names |
Character vector of model names corresponding to coef_list. |
Value
Combined data.table with Model column, or NULL if empty.
Combine multivariate results
Description
Internal helper to combine unadjusted and/or adjusted results from multiple outcomes into a single data.table.
Usage
combine_multifit_results(all_results, columns)
Arguments
all_results |
List of results from fit_one_outcome calls. |
columns |
Character specifying which columns to include. |
Value
Combined data.table.
Compare Multiple Regression Models
Description
Fits multiple regression models and provides a comprehensive comparison table with model quality metrics, convergence diagnostics, and selection guidance. Computes a composite score combining multiple quality metrics to facilitate rapid model comparison and selection.
Usage
compfit(
data,
outcome,
model_list,
model_names = NULL,
interactions_list = NULL,
random = NULL,
model_type = "auto",
family = "binomial",
conf_level = 0.95,
p_digits = 3,
include_coefficients = FALSE,
scoring_weights = NULL,
labels = NULL,
number_format = NULL,
verbose = NULL,
...
)
Arguments
data |
Data frame or data.table containing the dataset. |
outcome |
Character string specifying the outcome variable. For survival
analysis, use |
model_list |
List of character vectors, each containing predictor names for one model. Can also be a single character vector to auto-generate nested models. |
model_names |
Character vector of names for each model. If |
interactions_list |
List of character vectors specifying interaction
terms for each model. Each element corresponds to one model in model_list.
Use |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
model_type |
Character string specifying model type. If
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Common options include:
For negative binomial, use |
conf_level |
Numeric confidence level for intervals. Default is 0.95. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
include_coefficients |
Logical. If TRUE, includes a second table with coefficient estimates. Default is FALSE. |
scoring_weights |
Named list of scoring weights. Each weight should be
between 0 and 1, and they should sum to 1. Available metrics depend on model
type. If |
labels |
Named character vector providing custom display labels for
variables. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to model fitting functions. |
Details
This function fits all specified models and computes comprehensive quality metrics for comparison. It generates a Composite Model Score (CMS) that combines multiple metrics: lower AIC/BIC (information criteria), higher concordance (discrimination), and model convergence status.
For GLMs, McFadden's pseudo-R-squared is calculated as 1 - (logLik/logLik_null). For survival models, the global p-value comes from the log-rank test.
Models that fail to converge are flagged and penalized in the composite score.
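The McFadden formula above can be reproduced by hand in base R. The sketch uses the built-in mtcars data as a stand-in and shows only the formula, not how compfit() computes it internally.

```r
# Logistic model and its intercept-only counterpart
full_model <- glm(am ~ wt, family = binomial, data = mtcars)
null_model <- update(full_model, . ~ 1)

# McFadden's pseudo-R-squared: 1 - logLik(model) / logLik(null model)
mcfadden_r2 <- 1 - as.numeric(logLik(full_model)) / as.numeric(logLik(null_model))
mcfadden_r2
```

For a binary outcome the log-likelihood equals minus half the deviance, so the same value is obtained from 1 - deviance / null.deviance.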
Interaction Terms:
When interactions_list is provided, each element specifies the
interaction terms for the corresponding model in model_list. This is
particularly useful for testing whether adding interactions improves model fit:
- Use NULL for models without interactions
- Specify interactions using colon notation: c("age:treatment", "sex:stage")
- Main effects for all variables in interactions must be in the predictor list
- Common pattern: Compare main effects model vs model with interactions
Scoring weights can be customized based on model type:
- GLM: "convergence", "aic", "concordance", "pseudo_r2", "brier"
- Cox: "convergence", "aic", "concordance", "global_p"
- Linear: "convergence", "aic", "pseudo_r2", "rmse"
Default weights emphasize discrimination (concordance) and model fit (AIC).
The composite score is designed as a tool to quickly rank models by their quality metrics. It should be used alongside traditional model selection criteria rather than as a definitive model selection method.
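The general idea of a weighted composite score can be sketched as follows. This is an illustration under stated assumptions, not the package's scoring algorithm: the metric values are made-up numbers, the min-max rescaling and the 0-100 scale are assumptions, and only two of the available metrics are used.

```r
# Made-up metrics for three models, for illustration only
metrics <- data.frame(
  model = c("Base", "Clinical", "Full"),
  aic = c(1120, 1095, 1080),
  concordance = c(0.62, 0.68, 0.71)
)

# Assumed weights; they should sum to 1
weights <- c(aic = 0.5, concordance = 0.5)
stopifnot(isTRUE(all.equal(sum(weights), 1)))

# Rescale to 0-1 so that higher is always better
rescale <- function(x, higher_is_better = TRUE) {
  s <- (x - min(x)) / (max(x) - min(x))
  if (higher_is_better) s else 1 - s
}

# Weighted sum of rescaled metrics; lower AIC is better, so it is reversed
metrics$score <- 100 * (
  weights[["aic"]] * rescale(metrics$aic, higher_is_better = FALSE) +
  weights[["concordance"]] * rescale(metrics$concordance)
)

metrics[order(-metrics$score), c("model", "score")]
```

Because each metric is rescaled relative to the models being compared, a score of this kind ranks models within one comparison and is not meaningful across different comparisons.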
Value
A data.table with class "compfit_result" containing:
- Model
Model name/identifier
- CMS
Composite Model Score for model selection (higher is better)
- N
Sample size
- Events
Number of events (for survival/logistic)
- Predictors
Number of predictors
- Converged
Whether model converged properly
- AIC
Akaike Information Criterion
- BIC
Bayesian Information Criterion
- R^2 / Pseudo-R^2
McFadden pseudo-R-squared (GLM)
- Concordance
C-statistic (logistic/survival)
- Brier Score
Brier accuracy score (logistic)
- Global p
Overall model p-value
Attributes include:
- models
List of fitted model objects
- coefficients
Coefficient comparison table (if requested)
- best_model
Name of recommended model
See Also
fit for individual model fitting,
fullfit for automated variable selection,
table2pdf for exporting results
Other regression functions:
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Examples
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Compare nested logistic regression models
models <- list(
base = c("age", "sex"),
clinical = c("age", "sex", "smoking", "diabetes"),
full = c("age", "sex", "smoking", "diabetes", "stage", "ecog")
)
comparison <- compfit(
data = clintrial,
outcome = "os_status",
model_list = models,
model_names = c("Base", "Clinical", "Full")
)
comparison
# Example 2: Compare Cox survival models
library(survival)
surv_models <- list(
simple = c("age", "sex"),
clinical = c("age", "sex", "stage", "grade")
)
surv_comparison <- compfit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
model_list = surv_models,
model_type = "coxph"
)
surv_comparison
# Example 3: Test effect of adding interaction terms
interaction_models <- list(
main = c("age", "treatment", "sex"),
interact = c("age", "treatment", "sex")
)
interaction_comp <- compfit(
data = clintrial,
outcome = "os_status",
model_list = interaction_models,
model_names = c("Main Effects", "With Interaction"),
interactions_list = list(
NULL,
c("treatment:sex")
)
)
interaction_comp
# Example 4: Include coefficient comparison table
detailed <- compfit(
data = clintrial,
outcome = "os_status",
model_list = models,
include_coefficients = TRUE,
labels = clintrial_labels
)
# Access coefficient table
coef_table <- attr(detailed, "coefficients")
coef_table
# Example 5: Access fitted model objects
fitted_models <- attr(comparison, "models")
names(fitted_models)
# Example 6: Get best model recommendation
best <- attr(comparison, "best_model")
cat("Recommended model:", best, "\n")
Condense quantitative variable rows only
Description
Collapses multi-row continuous and survival variables into single rows while preserving all categorical variable rows (including binary). Only applies to descriptive tables from desctable().
Usage
condense_quantitative_rows(df, indent_groups = TRUE)
Arguments
df |
Data.table or data frame |
indent_groups |
Logical. Whether to apply indentation formatting. |
Value
A data.table with condensed continuous/survival rows
Condense table rows for more compact display
Description
Collapses multi-row variables into single rows for compact tables. Continuous variables show only the first statistic row, binary categorical variables show only the non-reference category, and survival variables show only the median row.
Usage
condense_table_rows(df, indent_groups = TRUE)
Arguments
df |
Data.table with Variable and Group columns. |
indent_groups |
Logical whether indentation will be applied (affects processing). |
Value
Data.table with condensed rows.
Convert between units
Description
Converts measurements between different unit systems commonly used in graphics (inches, centimeters, millimeters, pixels, points).
Usage
convert_units(value, from = "in", to = "in", dpi = 96)
Arguments
value |
Numeric value to convert. |
from |
Character string specifying source unit ("in", "cm", "mm", "px", "pt"). |
to |
Character string specifying target unit ("in", "cm", "mm", "px", "pt"). |
dpi |
Integer dots per inch for pixel conversions (default 96). |
Value
Numeric value in target units.
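The conversion logic can be sketched with inches as a pivot unit. This is a minimal illustration, not the package's implementation: the helper name convert_units_sketch is hypothetical, and it assumes 72 points per inch and that pixel conversions depend only on dpi.

```r
# Hypothetical helper (not the package's code): convert via inches as a pivot
convert_units_sketch <- function(value, from = "in", to = "in", dpi = 96) {
  # Number of each unit contained in one inch; pixels depend on dpi
  # Assumption: 72 points per inch
  per_inch <- c("in" = 1, "cm" = 2.54, "mm" = 25.4, "pt" = 72, "px" = dpi)
  stopifnot(from %in% names(per_inch), to %in% names(per_inch))
  value / per_inch[[from]] * per_inch[[to]]
}

inch_to_cm <- convert_units_sketch(1, from = "in", to = "cm")
px_to_inch <- convert_units_sketch(192, from = "px", to = "in", dpi = 96)
print(c(inch_to_cm = inch_to_cm, px_to_inch = px_to_inch))
```

Going through a single pivot unit keeps the lookup table small: adding a unit needs one new entry rather than one entry per pair of units.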
Create Forest Plot for Cox Proportional Hazards Models
Description
Generates a publication-ready forest plot that combines a formatted data table with a graphical representation of hazard ratios from a Cox proportional hazards survival model. The plot integrates variable names, group levels, sample sizes, event counts, hazard ratios with confidence intervals, p-values, and model diagnostics in a single comprehensive visualization designed for manuscripts and presentations.
Usage
coxforest(
x,
data = NULL,
title = "Cox Proportional Hazards Model",
effect_label = "Hazard Ratio",
digits = 2,
p_digits = 3,
conf_level = 0.95,
font_size = 1,
annot_size = 3.88,
header_size = 5.82,
title_size = 23.28,
plot_width = NULL,
plot_height = NULL,
table_width = 0.6,
show_n = TRUE,
show_events = TRUE,
indent_groups = FALSE,
condense_table = FALSE,
bold_variables = FALSE,
center_padding = 4,
zebra_stripes = TRUE,
ref_label = "reference",
labels = NULL,
color = "#8A61D8",
qc_footer = TRUE,
units = "in",
number_format = NULL
)
Arguments
x |
Either a fitted Cox model object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for hazard ratios and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories.
Default is |
labels |
Named character vector providing custom display labels for
variables. Example: |
color |
Character string specifying the color for hazard ratio point
estimates in the forest plot. Default is |
qc_footer |
Logical. If |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Details
Survival-Specific Features:
The Cox forest plot includes several survival analysis-specific components:
-
Event counts: Number of events (deaths, failures) shown for each predictor category, critical for assessing statistical power
-
Hazard ratios: Always exponentiated coefficients (never raw), interpreted as the multiplicative change in hazard
-
Log scale: Forest plot uses log scale for HR (reference line at 1)
-
Model diagnostics: Includes concordance (C-index), global log-rank test p-value, and AIC
Plot Components:
-
Title: Centered at top
-
Data Table (left): Contains:
Variable and Group columns
n: Sample sizes by group
Events: Event counts by group (critical for survival)
aHR (95% CI); p-value: Adjusted hazard ratios with CIs and p-values
-
Forest Plot (right):
Point estimates (squares sized by sample size)
95% confidence intervals
Reference line at HR = 1
Log scale for hazard ratios
-
Model Statistics (footer):
Events analyzed (with percentage of total)
Global log-rank test p-value
Concordance (C-index) with standard error
AIC
Interpreting Hazard Ratios:
-
HR = 1: No effect on hazard (reference)
-
HR > 1: Increased hazard (worse survival)
-
HR < 1: Decreased hazard (better survival)
Example: HR = 2.0 means twice the hazard of the event at any time
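As a small worked illustration of the interpretation above, a hazard ratio is the exponentiated log-hazard coefficient, and its confidence limits are the exponentiated limits on the log scale. The numbers below are invented for illustration and do not come from any fitted model.

```r
# Illustrative values only (assumed, not taken from a real model)
beta <- log(2)        # log-hazard coefficient
se   <- 0.25          # standard error of the coefficient
z    <- qnorm(0.975)  # critical value for a 95% interval

hr <- exp(beta)                      # hazard ratio
ci <- exp(beta + c(-1, 1) * z * se)  # interval back-transformed to the HR scale

print(round(c(HR = hr, lower = ci[1], upper = ci[2]), 2))
```

Because the interval is built on the log scale, it is asymmetric around the HR, which is one reason the forest plot uses a log axis.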
Event Counts:
The "Events" column is particularly important in survival analysis:
Indicates the number of actual events (not censored observations) in each group
Essential for assessing statistical power
Categories with very few events may have unreliable HR estimates
The footer shows total events analyzed and percentage of all events in the original data
Concordance (C-index):
The concordance statistic displayed in the footer indicates discrimination:
Range: 0.5 to 1.0
0.5 = random prediction (coin flip)
0.7-0.8 = acceptable discrimination
> 0.8 = excellent discrimination
Standard error provided for confidence interval calculation
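A rough sketch of how the concordance and its standard error can be turned into an approximate confidence interval. It uses the lung data bundled with the survival package as a stand-in, and a normal approximation that is an assumption of this example rather than something the footer reports.

```r
library(survival)

# survival::lung is used here only as a convenient stand-in dataset
m <- coxph(Surv(time, status) ~ age + sex, data = lung)

conc    <- summary(m)$concordance  # two values: C-index, then its standard error
c_index <- unname(conc[1])
c_se    <- unname(conc[2])

# Approximate 95% interval (normal approximation, assumed for illustration)
c_ci <- c_index + c(-1, 1) * qnorm(0.975) * c_se
print(round(c(C = c_index, lower = c_ci[1], upper = c_ci[2]), 3))
```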
Global Log-Rank Test:
The global p-value tests the null hypothesis that all coefficients are zero:
Significant p-value (< 0.05) indicates the model as a whole predicts survival
Non-significant global test doesn't preclude significant individual predictors
Based on the score (log-rank) test
Stratification and Clustering:
If the model includes stratification (strata()) or clustering
(cluster()):
Stratified variables are not shown in the forest plot (they don't have HRs)
Clustering affects standard errors but not point estimates
Both are handled automatically by the function
Proportional Hazards Assumption:
The forest plot assumes proportional hazards (constant HR over time). Users should verify this assumption using:
-
cox.zph(model) for testing
Stratification for variables violating the assumption
Time-dependent coefficients if needed
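One way these checks might look in practice, sketched with the survival package and its bundled lung data (a stand-in, not the clintrial data). Stratifying on sex is only an example of the remedy, not a recommendation for this dataset.

```r
library(survival)

m <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test of proportional hazards: one row per term plus a GLOBAL row
zph <- cox.zph(m)
print(zph$table)

# If a variable appears to violate the assumption, one option is to
# stratify on it; it then gets its own baseline hazard and no HR
m_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
print(names(coef(m_strat)))
```

After stratification only age keeps a coefficient, which matches the note above that stratified variables do not appear in the forest plot.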
Value
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly:
print(plot)
Saved to file:
ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
See Also
autoforest for automatic model detection,
glmforest for logistic/GLM forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
coxph for fitting Cox models,
fit for regression modeling
Other visualization functions:
autoforest(),
glmforest(),
lmforest(),
multiforest(),
uniforest()
Examples
data(clintrial)
data(clintrial_labels)
library(survival)
# Create example model
model1 <- coxph(
survival::Surv(os_months, os_status) ~ age + sex + treatment,
data = clintrial)
# Example 1: Basic Cox model forest plot
p <- coxforest(model1, data = clintrial)
old_width <- options(width = 180)
# Example 2: With custom labels and title
plot2 <- coxforest(
x = model1,
data = clintrial,
title = "Prognostic Factors for Overall Survival",
labels = clintrial_labels
)
# Example 3: Comprehensive model with indented layout
model3 <- coxph(
Surv(os_months, os_status) ~ age + sex + bmi + smoking +
treatment + stage + grade,
data = clintrial
)
plot3 <- coxforest(
x = model3,
data = clintrial,
labels = clintrial_labels,
indent_groups = TRUE,
zebra_stripes = TRUE
)
# Example 4: Condensed layout for many binary predictors
model4 <- coxph(
Surv(os_months, os_status) ~ age + sex + smoking +
hypertension + diabetes + surgery,
data = clintrial
)
plot4 <- coxforest(
x = model4,
data = clintrial,
condense_table = TRUE,
labels = clintrial_labels
)
# Example 5: Stratified Cox model
model5 <- coxph(
Surv(os_months, os_status) ~ age + sex + treatment + strata(site),
data = clintrial
)
plot5 <- coxforest(
x = model5,
data = clintrial,
title = "Stratified by Study Site",
labels = clintrial_labels
)
# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "survival_forest.pdf"),
plot5, width = dims$width, height = dims$height)
options(old_width)
Create Publication-Ready Descriptive Statistics Tables
Description
Generates comprehensive descriptive statistics tables with automatic variable type detection, group comparisons, and appropriate statistical testing. This function is designed to create "Table 1"-style summaries commonly used in clinical and epidemiological research, with full support for continuous, categorical, and time-to-event variables.
Usage
desctable(
data,
by = NULL,
variables,
stats_continuous = c("median_iqr"),
stats_categorical = "n_percent",
digits = 1,
p_digits = 3,
conf_level = 0.95,
p_per_stat = FALSE,
na_include = FALSE,
na_label = "Unknown",
na_percent = FALSE,
test = TRUE,
test_continuous = "auto",
test_categorical = "auto",
total = TRUE,
total_label = "Total",
labels = NULL,
number_format = NULL,
...
)
Arguments
data |
Data frame or data.table containing the dataset to summarize. Automatically converted to a data.table for efficient processing. |
by |
Character string specifying the column name of the grouping
variable for stratified analysis (e.g., treatment arm, exposure
status). When |
variables |
Character vector of variable names to summarize. Can
include standard column names for continuous or categorical variables,
and survival expressions using |
stats_continuous |
Character vector specifying which statistics to compute for continuous variables. Multiple values create separate rows for each variable. Options:
Default is |
stats_categorical |
Character string specifying the format for categorical variable summaries:
|
digits |
Integer specifying the number of decimal places for continuous statistics. Default is 1. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals in survival variable summaries (median survival time with CI). Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
p_per_stat |
Logical. If |
na_include |
Logical. If |
na_label |
Character string used to label the missing values row when
|
na_percent |
Logical. Controls how percentages are calculated for
categorical variables when
Only affects categorical variables. Default is |
test |
Logical. If |
test_continuous |
Character string specifying the statistical test for continuous variables:
|
test_categorical |
Character string specifying the statistical test for categorical variables:
|
total |
Logical or character string controlling the total column:
|
total_label |
Character string for the total column header.
Default is |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names (or |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
... |
Additional arguments passed to the underlying statistical test
functions (e.g., |
Details
Variable Type Detection:
The function automatically detects variable types and applies appropriate summaries:
-
Continuous: Numeric variables (integer or double) receive statistics specified in stats_continuous
-
Categorical: Character, factor, or logical variables receive frequency counts and percentages
-
Time-to-Event: Variables specified as Surv(time, event) display median survival with confidence intervals (level controlled by conf_level)
Statistical Testing:
When test = TRUE and by is specified:
-
Continuous with "auto": Parametric tests (t-test, ANOVA) for mean-based statistics; non-parametric tests (Wilcoxon, Kruskal-Wallis) for median-based statistics
-
Categorical with "auto": Fisher exact test when any expected cell frequency < 5; χ² test otherwise
-
Survival: Log-rank test for comparing survival curves
-
Range statistics: No p-value computed (ranges are descriptive)
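The "auto" rule for categorical variables can be sketched in base R. The helper name choose_categorical_test is hypothetical and the code is an illustration of the stated rule, not the package's internals.

```r
# Hypothetical helper: pick Fisher or chi-squared from expected cell counts
choose_categorical_test <- function(x, g) {
  tab <- table(x, g)
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5)) {
    list(test = "fisher", p = fisher.test(tab)$p.value)
  } else {
    list(test = "chisq", p = chisq.test(tab)$p.value)
  }
}

# Six observations: every expected count is 1.5, so Fisher is chosen
small <- choose_categorical_test(
  c("a", "a", "b", "b", "a", "b"),
  c("x", "y", "x", "y", "x", "y")
)

# mtcars (built-in): all expected counts exceed 5, so chi-squared is chosen
large <- choose_categorical_test(mtcars$am, mtcars$vs)

print(c(small = small$test, large = large$test))
```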
Missing Data Handling:
Missing values are handled differently by variable type:
-
Continuous: NAs excluded from calculations; optionally shown as count when na_include = TRUE
-
Categorical: NAs can be included as a category when na_include = TRUE. The na_percent parameter controls whether percentages are calculated with or without NAs in the denominator
-
Survival: NAs in time or event excluded from analysis
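The two denominator choices for categorical percentages can be illustrated with a toy vector. The data below are invented, and the code only mirrors the arithmetic; it is not how desctable() computes its output.

```r
# Invented example data: 8 observations, 2 of them missing
smoking <- c("Never", "Never", "Current", "Former", NA, NA, "Never", "Current")

counts <- table(smoking, useNA = "ifany")

# NAs kept in the denominator: all rows, including the missing row, sum to 100%
pct_with_na <- 100 * counts / length(smoking)

# NAs dropped from the denominator: the non-missing categories sum to 100%
n_observed     <- sum(!is.na(smoking))
pct_without_na <- 100 * counts[!is.na(names(counts))] / n_observed

print(round(pct_with_na, 1))
print(round(pct_without_na, 1))
```

With three "Never" values, the share is 37.5% of all 8 rows but 50% of the 6 observed rows, which is the difference the na_percent setting controls.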
Formatting Conventions:
All numeric output respects the number_format parameter. Separators
within ranges and confidence intervals adapt automatically to avoid
ambiguity:
Mean ± SD: "45.2 ± 12.3" (US) or "45,2 ± 12,3" (EU)
Median [IQR]: "38.0 [28.0-52.0]" (US) or "38,0 [28,0-52,0]" (EU, en-dash separator)
Range: "18.0-75.0" (positive, US), "-5.0 to 10.0" (when bounds are negative)
Survival: "24.5 (21.2-28.9)" (US) or "24,5 (21,2-28,9)" (EU)
Counts ≥ 1000: "1,234" (US) or "1.234" (EU)
p-values: "< 0.001" (US) or "< 0,001" (EU)
Value
A data.table with S3 class "desctable" containing formatted
descriptive statistics. The table structure includes:
- Variable
Variable name or label (from labels)
- Group
For continuous variables: statistic type (e.g., "Mean ± SD", "Median [IQR]"). For categorical variables: category level. Empty for variable name rows.
- Total
Statistics for the total sample (if total = TRUE)
- Group columns
Statistics for each group level (when by is specified). Column names match group levels.
- p-value
Formatted p-values from statistical tests (when test = TRUE and by is specified)
The first row always shows sample sizes for each column. All numeric
output (counts, statistics, p-values) respects the
number_format setting for locale-appropriate formatting.
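The locale-aware formatting described under Formatting Conventions can be approximated in base R with formatC(). The helper fmt_num and its argument names are hypothetical; this sketch only shows how thousand and decimal separators might be swapped, not how the package formats its output.

```r
# Hypothetical helper: format a number with chosen separators
fmt_num <- function(x, digits = 1, big_mark = ",", decimal_mark = ".") {
  formatC(x, format = "f", digits = digits,
          big.mark = big_mark, decimal.mark = decimal_mark)
}

# US-style: comma for thousands, period for decimals
us_count <- fmt_num(1234, digits = 0)
us_mean  <- paste(fmt_num(45.2), "\u00b1", fmt_num(12.3))

# EU-style: period for thousands, comma for decimals
eu_count <- fmt_num(1234, digits = 0, big_mark = ".", decimal_mark = ",")
eu_mean  <- paste(fmt_num(45.2, big_mark = ".", decimal_mark = ","),
                  "\u00b1",
                  fmt_num(12.3, big_mark = ".", decimal_mark = ","))

print(c(us_count, eu_count, us_mean, eu_mean))
```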
The returned object includes the following attributes accessible via
attr():
- raw_data
A data.table containing unformatted numeric values suitable for further statistical analysis or custom formatting. Includes additional columns for standard deviations, quartiles, etc.
- by_variable
The grouping variable name used (value of by)
- variables
The variables analyzed (value of variables)
See Also
survtable for detailed survival summary tables,
fit for regression modeling,
table2pdf for PDF export,
table2docx for Word export,
table2html for HTML export
Other descriptive functions:
print.survtable(),
survtable()
Examples
# Load example clinical trial data
data(clintrial)
# Example 1: Basic descriptive table without grouping
desctable(clintrial,
variables = c("age", "sex", "bmi"))
# Example 2: Grouped comparison with default tests
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "race", "bmi"))
# Example 3: Customize continuous statistics
desctable(clintrial,
by = "treatment",
variables = c("age", "bmi", "creatinine"),
stats_continuous = c("median_iqr", "range"))
# Example 4: Change categorical display format
desctable(clintrial,
by = "treatment",
variables = c("sex", "race", "smoking"),
stats_categorical = "n") # Show counts only
# Example 5: Include missing values
desctable(clintrial,
by = "treatment",
variables = c("age", "smoking", "hypertension"),
na_include = TRUE,
na_label = "Missing")
# Example 6: Disable statistical testing
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
test = FALSE)
# Example 7: Force specific tests
desctable(clintrial,
by = "surgery",
variables = c("age", "sex"),
test_continuous = "t", # t-test instead of auto
test_categorical = "fisher") # Fisher test instead of auto
# Example 8: Adjust decimal places
desctable(clintrial,
by = "treatment",
variables = c("age", "bmi"),
digits = 2, # 2 decimals for continuous
p_digits = 4) # 4 decimals for p-values
# Example 9: Custom variable labels
labels <- c(
age = "Age (years)",
sex = "Sex",
bmi = "Body Mass Index (kg/m\u00b2)",
treatment = "Treatment Arm"
)
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
labels = labels)
# Example 10: Position total column last
desctable(clintrial,
by = "treatment",
variables = c("age", "sex"),
total = "last")
# Example 11: Exclude total column
desctable(clintrial,
by = "treatment",
variables = c("age", "sex"),
total = FALSE)
# Example 12: Survival analysis
desctable(clintrial,
by = "treatment",
variables = "Surv(os_months, os_status)")
# Example 13: Multiple survival endpoints
desctable(clintrial,
by = "treatment",
variables = c(
"Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"
),
labels = c(
"Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
"Surv(os_months, os_status)" = "Overall Survival"
))
# Example 14: Mixed variable types
desctable(clintrial,
by = "treatment",
variables = c(
"age", "sex", "race", # Demographics
"bmi", "creatinine", # Labs
"smoking", "hypertension", # Risk factors
"Surv(os_months, os_status)" # Survival
))
# Example 15: Three or more groups
desctable(clintrial,
by = "stage", # Assuming stage has 3+ levels
variables = c("age", "sex", "bmi"))
# Automatically uses ANOVA/Kruskal-Wallis and chi-squared
# Example 16: Access raw unformatted data
result <- desctable(clintrial,
by = "treatment",
variables = c("age", "bmi"))
raw_data <- attr(result, "raw_data")
print(raw_data)
# Raw data includes unformatted numbers, SDs, quartiles, etc.
# Example 17: Check which grouping variable was used
result <- desctable(clintrial,
by = "treatment",
variables = c("age", "sex"))
attr(result, "by_variable") # "treatment"
# Example 18: NA percentage calculation options
# Include NAs in percentage denominator (all sum to 100%)
desctable(clintrial,
by = "treatment",
variables = "smoking",
na_include = TRUE,
na_percent = TRUE)
# Exclude NAs from denominator (non-missing sum to 100%)
desctable(clintrial,
by = "treatment",
variables = "smoking",
na_include = TRUE,
na_percent = FALSE)
# Example 19: Passing additional test arguments
# Equal variance t-test
desctable(clintrial,
by = "sex",
variables = "age",
test_continuous = "t",
var.equal = TRUE)
# Example 20: European number formatting
desctable(clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
number_format = "eu")
# Example 21: Complete Table 1 for publication
table1 <- desctable(
data = clintrial,
by = "treatment",
variables = c(
"age", "sex", "race", "ethnicity", "bmi",
"smoking", "hypertension", "diabetes",
"ecog", "creatinine", "hemoglobin",
"site", "stage", "grade",
"Surv(os_months, os_status)"
),
labels = clintrial_labels,
stats_continuous = c("median_iqr", "range"),
total = TRUE,
na_include = FALSE
)
print(table1)
Detect if model is univariable or multivariable
Description
Determines whether a model contains one predictor (univariable) or multiple predictors (multivariable) by analyzing coefficient names and factor structure. Handles interactions and random effects appropriately.
Usage
detect_model_type(model)
Arguments
model |
Fitted model object. |
Value
Character string: "Univariable" or "Multivariable".
Auto-detect model type based on outcome and random effects
Description
Determines the appropriate model type based on outcome variable characteristics and presence of random effects.
Usage
detect_model_type_auto(data, outcome, has_random_effects, family = "binomial")
Arguments
data |
Data.frame or data.table containing the outcome variable. |
outcome |
Character string specifying the outcome variable or Surv() expression. |
has_random_effects |
Logical indicating if random effects are specified. |
family |
Character string for GLM family (default "binomial"). |
Value
Character string indicating detected model type.
Detect outcome type from data
Description
Automatically determines whether an outcome variable is binary, continuous, or count-based by examining the data values. Used for automatic model type selection and validation. Binary outcomes have 2 unique values, continuous have many values or non-integers, counts have integers >= 0.
Usage
detect_outcome_type(data, outcome)
Arguments
data |
Data frame or data.table containing the outcome variable. |
outcome |
Character string naming the outcome variable. |
Value
Character string: "binary", "continuous", "count", or "unknown".
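A rough sketch of the heuristic described above, written against a plain vector rather than a data set and column name. The function name guess_outcome_type and the exact order of the checks are assumptions; the real function may apply different or additional rules.

```r
# Hypothetical re-implementation of the described heuristic
guess_outcome_type <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return("unknown")
  # Exactly two distinct values: treat as binary
  if (length(unique(x)) == 2) return("binary")
  if (!is.numeric(x)) return("unknown")
  # Non-negative whole numbers: treat as counts
  if (all(x >= 0) && all(x == floor(x))) return("count")
  "continuous"
}

types <- c(
  binary     = guess_outcome_type(c(0, 1, 1, 0, NA)),
  count      = guess_outcome_type(c(0, 3, 7, 2, 5)),
  continuous = guess_outcome_type(c(1.5, 2.25, -0.3)),
  unknown    = guess_outcome_type(c("a", "b", "c"))
)
print(types)
```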
Detect available sans-serif font for plots
Description
Checks for commonly available sans-serif fonts in order of preference (Helvetica, Arial, Helvetica Neue) and returns the first available one. Falls back to "sans" if none are found or if systemfonts is unavailable.
Usage
detect_plot_font()
Details
When ragg is being used as the graphics device (detected via options or knitr settings), font detection works in non-interactive sessions since ragg handles font rendering independently of the R graphics system.
Value
Character string with the font family name to use.
Determine alignment for exported tables
Description
Creates column alignment string for LaTeX tables. Variable and Group columns are left-aligned; all others are centered.
Usage
determine_alignment(df)
Arguments
df |
Data.frame or data.table to determine alignment for. |
Value
Character string with alignment codes (e.g., "llcc").
Determine Effect Type Label
Description
Identifies the appropriate effect measure label (OR, HR, RR, Coefficient, aOR, aHR, aRR, Adj. Coefficient) based on model type, exponentiation setting, and whether the estimate is adjusted (multivariable) or unadjusted (univariable).
Usage
determine_effect_type(uni_raw, multi_raw, exponentiate, adjusted = FALSE)
Arguments
uni_raw |
Raw univariable data.table containing coefficient columns.
Used to detect effect type when |
multi_raw |
Raw multivariable data.table containing coefficient columns.
Used to detect effect type when |
exponentiate |
Logical or
|
adjusted |
Logical. If |
Value
Character string with the effect measure label:
Extract metrics for mixed-effects Cox models (coxme)
Description
Extracts quality metrics specific to mixed-effects Cox proportional hazards models including concordance, pseudo-R-squared, and ICC.
Usage
extract_coxme_metrics(model, raw_data, metrics)
Arguments
model |
Fitted coxme object from coxme package. |
raw_data |
Data.table with raw model information. |
metrics |
Named list of initialized metrics to populate. |
Value
Updated metrics list with coxme-specific values.
Extract metrics for generalized linear mixed-effects models (glmer)
Description
Extracts quality metrics specific to generalized linear mixed-effects models including concordance, R-squared measures, ICC, and Brier score for binomial.
Usage
extract_glmer_metrics(model, raw_data, metrics)
Arguments
model |
Fitted glmerMod object from lme4. |
raw_data |
Data.table with raw model information. |
metrics |
Named list of initialized metrics to populate. |
Value
Updated metrics list with glmer-specific values.
Extract metrics for linear mixed-effects models (lmer)
Description
Extracts quality metrics specific to linear mixed-effects models including R-squared measures, ICC, and global significance tests.
Usage
extract_lmer_metrics(model, raw_data, metrics)
Arguments
model |
Fitted lmerMod object from lme4. |
raw_data |
Data.table with raw model information. |
metrics |
Named list of initialized metrics to populate. |
Value
Updated metrics list with lmer-specific values.
Extract comprehensive model metrics based on academic consensus
Description
Extracts quality control metrics from fitted models for comparison. Supports GLM, Cox, linear, and mixed-effects models.
Usage
extract_model_metrics(model, raw_data, model_type)
Arguments
model |
Fitted model object. |
raw_data |
Data.table with raw model information. |
model_type |
Character string indicating model type. |
Value
Named list of metrics.
Extract predictor effects from a fitted model
Description
Internal helper function that extracts only the predictor variable's coefficient(s) from a fitted model, ignoring intercept and covariates. Supports standard models (glm, lm, coxph) and mixed effects models (glmer, lmer, coxme).
Usage
extract_predictor_effects(
model,
predictor,
outcome,
conf_level = 0.95,
adjusted = FALSE,
terms_to_extract = NULL
)
Arguments
model |
Fitted model object. |
predictor |
Character string of the predictor variable name. |
outcome |
Character string of the outcome variable name. |
conf_level |
Numeric confidence level. |
adjusted |
Logical indicating if this is an adjusted model. |
terms_to_extract |
Character vector of terms to extract (predictor and optionally interaction terms involving the predictor). |
Value
data.table with predictor effect information.
Finalize Column Names for Display
Description
Renames internal column names (uni_effect, uni_p, multi_effect, multi_p) to publication-ready display names with appropriate effect measure labels (OR, HR, RR, aOR, aHR, aRR, Coefficient, etc.).
Usage
finalize_column_names(
result,
uni_raw,
multi_raw,
exponentiate,
columns,
metrics,
conf_level = 0.95
)
Arguments
result |
Data.table with columns to rename. Expected to contain some combination of: uni_effect, uni_p, multi_effect, multi_p. |
uni_raw |
Raw univariable data.table used to determine effect type by checking for presence of OR, HR, RR, or Coefficient columns. |
multi_raw |
Raw multivariable data.table used to determine adjusted effect type. |
exponentiate |
Logical or |
columns |
Character string indicating which columns are present
( |
metrics |
Character vector of metrics being displayed ( |
Value
The input data.table with columns renamed
Find non-reference row for binary variable condensing
Description
Identifies the non-reference row in a binary categorical variable by checking for NA estimates (reference rows have NA). This is more robust than assuming row position or matching specific strings like "Yes" or "Positive".
Usage
find_non_reference_row(var_rows, estimate_col = "estimate")
Arguments
var_rows |
Data.table containing rows for a single variable. |
estimate_col |
Character string naming the estimate column (e.g., "estimate", "coef"). Default is "estimate". |
Value
Integer index of the non-reference row within var_rows, or NULL if it
cannot be determined (e.g., no NA estimates found, or multiple non-NA rows).
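The lookup can be sketched as follows. The helper name find_non_reference_sketch is hypothetical and a plain data frame stands in for the data.table input; the logic follows the description above (reference rows carry NA estimates) but is not the package's code.

```r
# Hypothetical sketch: locate the single estimated (non-reference) row
find_non_reference_sketch <- function(var_rows, estimate_col = "estimate") {
  est     <- var_rows[[estimate_col]]
  non_ref <- which(!is.na(est))
  # Need at least one NA (the reference row) and exactly one estimated row
  if (!anyNA(est) || length(non_ref) != 1) return(NULL)
  non_ref
}

binary_rows <- data.frame(Group = c("No", "Yes"), estimate = c(NA, 0.42))
three_level <- data.frame(Group = c("I", "II", "III"), estimate = c(NA, 0.1, 0.3))

idx_binary <- find_non_reference_sketch(binary_rows)
idx_three  <- find_non_reference_sketch(three_level)
print(idx_binary)
print(is.null(idx_three))
```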
Fit Regression Model with Publication-Ready Output
Description
Provides a unified interface for fitting various types of regression models with automatic formatting of results for publication. Supports generalized linear models, linear models, survival models, and mixed-effects models with consistent syntax and output formatting. Handles both univariable and multivariable models automatically.
Usage
fit(
data = NULL,
outcome = NULL,
predictors = NULL,
model = NULL,
model_type = "glm",
family = "binomial",
random = NULL,
interactions = NULL,
strata = NULL,
cluster = NULL,
weights = NULL,
conf_level = 0.95,
reference_rows = TRUE,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
keep_qc_stats = TRUE,
exponentiate = NULL,
number_format = NULL,
verbose = NULL,
...
)
Arguments
data |
Data frame or data.table containing the analysis dataset. Required for formula-based workflow; optional for model-based workflow (extracted from model if not provided). |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to include in
the model. All predictors are included simultaneously (multivariable model).
For univariable models, provide a single predictor. Can include continuous,
categorical (factor), or binary variables. Required for formula-based
workflow; ignored if |
model |
Optional pre-fitted model object to format. When provided,
|
model_type |
Character string specifying the type of regression model.
Ignored if
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
interactions |
Character vector of interaction terms using colon
notation (e.g., |
strata |
For Cox or conditional logistic models, character string naming
the stratification variable. Creates separate baseline hazards for each
stratum level without estimating stratum effects. Default is |
cluster |
For Cox models, character string naming the variable for
robust clustered standard errors. Accounts for within-cluster correlation
(e.g., patients within hospitals). Default is |
weights |
Character string naming the weights variable in |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names, values are display
labels. Default is |
keep_qc_stats |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients. Default
is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting
function ( |
Details
Model Scope Detection:
The function automatically detects whether the model is:
-
Univariable: Single predictor (e.g., predictors = "age"). Effect estimates are labeled as unadjusted ("OR", "HR", etc.), representing crude associations
-
Multivariable: Multiple predictors (e.g., predictors = c("age", "sex", "treatment")). Effect estimates are labeled as adjusted ("aOR", "aHR", etc.), representing associations adjusted for the other covariates in the model
Interaction Terms:
Interactions are specified using colon notation and added to the model:
-
interactions = c("age:treatment") creates an interaction between age and treatment
Main effects for both variables are automatically included
Multiple interactions can be specified: c("age:sex", "treatment:stage")
For interactions between categorical variables, separate terms are created for each combination of levels
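To see what colon notation produces for two categorical variables, the base R sketch below builds a design matrix directly. The toy factors are invented; under R's default treatment coding, one column appears for each combination of non-reference levels.

```r
# Invented toy data: a 3-level and a 2-level factor
d <- expand.grid(
  treatment = factor(c("A", "B", "C")),
  stage     = factor(c("I", "II"))
)

# Main effects plus the interaction in colon notation
f <- ~ treatment + stage + treatment:stage

term_labels <- attr(terms(f), "term.labels")
design_cols <- colnames(model.matrix(f, data = d))

print(term_labels)
print(design_cols)
```

The single term treatment:stage expands to two columns here (B with II, C with II), which is why a categorical interaction occupies several rows in a results table.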
Stratification (Cox/Conditional Logistic):
The strata parameter creates separate baseline hazards:
Allows baseline hazard to vary across strata without estimating stratum effects
Useful when proportional hazards assumption violated across strata
Example: strata = "center" for multicenter studies
Stratification variable is not included as a predictor
Clustering (Cox Models):
The cluster parameter computes robust standard errors:
Accounts for within-cluster correlation (e.g., multiple observations per patient)
Uses sandwich variance estimator
Does not change point estimates, only standard errors and p-values
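The claim that clustering changes standard errors but not point estimates can be checked directly with the survival package. The sketch uses the bundled lung data with inst (enrolling institution) as the cluster; rows with a missing cluster are dropped first so both models use the same observations.

```r
library(survival)

# Keep only rows with a known cluster so both fits use identical data
d <- lung[!is.na(lung$inst), ]

m_naive  <- coxph(Surv(time, status) ~ age + sex, data = d)
m_robust <- coxph(Surv(time, status) ~ age + sex + cluster(inst), data = d)

# Point estimates agree; the clustered fit adds a robust standard error
same_coefs  <- isTRUE(all.equal(coef(m_naive), coef(m_robust)))
robust_cols <- colnames(summary(m_robust)$coefficients)

print(same_coefs)
print(robust_cols)
```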
Weighting:
The weights parameter enables weighted regression:
For survey data with sampling weights
Inverse probability weighting for causal inference
Frequency weights for aggregated data
Weights should be in a column of data
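To make the frequency-weight case concrete, the base R sketch below fits the same logistic model twice: once on aggregated rows with a weights column, and once on the expanded one-row-per-observation data. The data frame is made up for illustration.

```r
# Aggregated data: each row stands for `n` identical observations
agg <- data.frame(
  x = c(0, 0, 1, 1),
  y = c(0, 1, 0, 1),
  n = c(30, 10, 15, 25)
)

# Weighted fit on the aggregated rows
fit_w <- glm(y ~ x, family = binomial, data = agg, weights = n)

# Unweighted fit on the expanded data
expanded <- agg[rep(seq_len(nrow(agg)), agg$n), c("x", "y")]
fit_e <- glm(y ~ x, family = binomial, data = expanded)

# The coefficients of the two fits should agree up to numerical tolerance
print(all.equal(coef(fit_w), coef(fit_e), tolerance = 1e-6))
```

Sampling weights and inverse probability weights are passed the same way, but their standard errors generally need separate treatment (for example, robust variance), which this sketch does not cover.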
Mixed-Effects Models (lmer/glmer/coxme):
Mixed effects models handle hierarchical or clustered data:
Use model_type = "lmer" for continuous/normal outcomes
Use model_type = "glmer" with appropriate family for GLM outcomes
Use model_type = "coxme" for survival outcomes with clustering
Random effects are specified in predictors using lme4 syntax:
-
"(1|site)" - Random intercepts by site
-
"(treatment|site)" - Random slopes for treatment by site (in lme4 syntax this form also includes a correlated random intercept, equivalent to the form below)
-
"(1 + treatment|site)" - Both random intercepts and slopes
Include random effects as part of the predictors vector
Example: predictors = c("age", "treatment", "(1|site)")
Effect Measures by Model Type:
-
Logistic (family = "binomial"/"quasibinomial"): Odds ratios (OR/aOR)
-
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
-
Poisson/Count (family = "poisson"/"quasipoisson"): Rate ratios (RR/aRR)
-
Negative binomial (model_type = "negbin"): Rate ratios (RR/aRR)
-
Gamma/Log-link: Ratios (multiplicative effects)
-
Linear/Gaussian: Raw coefficient estimates (additive effects)
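The ratio measures listed above are exponentiated model coefficients. A minimal base R sketch for the logistic case, using the built-in mtcars data purely for illustration:

```r
# Logistic regression on a built-in dataset
m <- glm(am ~ wt, family = binomial, data = mtcars)

log_odds   <- coef(m)["wt"]   # additive effect on the log-odds scale
odds_ratio <- exp(log_odds)   # multiplicative effect (odds ratio)

print(c(log_odds = unname(log_odds), odds_ratio = unname(odds_ratio)))
```

The same exp() step turns Cox coefficients into hazard ratios and log-link count model coefficients into rate ratios.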
Confidence Intervals:
Confidence intervals are computed using the Wald method. This is the standard approach for GLM, Cox, and mixed-effects regression and is appropriate for standard sample sizes. The Wald interval is computed directly from the coefficient and standard error, avoiding redundant matrix operations.
For small samples, sparse data, or parameters near boundary values, profile likelihood confidence intervals may be preferred. These can be obtained from the underlying model object:
result <- fit(data, outcome, predictors)
confint(attr(result, "model"))
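For reference, the Wald interval is the estimate plus or minus a normal quantile times the standard error. The sketch below computes it by hand for a base R glm (mtcars is used only for illustration) and compares it with confint.default(), which uses the same formula.

```r
m <- glm(am ~ wt, family = binomial, data = mtcars)

est <- coef(m)
se  <- sqrt(diag(vcov(m)))
z   <- qnorm(1 - (1 - 0.95) / 2)   # about 1.96 for a 95% interval

wald <- cbind(lower = est - z * se, upper = est + z * se)

# Agreement with the built-in Wald interval
print(all.equal(unname(wald), unname(confint.default(m))))

# Interval on the odds-ratio scale
print(exp(wald))
```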
Value
A data.table with S3 class "fit_result" containing formatted
regression results. The table structure includes:
- Variable
Character. Predictor name or custom label
- Group
Character. For factor variables: category level. For interactions: interaction term. For continuous: typically empty
- n
Integer. Total sample size (if show_n = TRUE)
- n_group
Integer. Sample size for this factor level
- events
Integer. Total number of events (if show_events = TRUE)
- events_group
Integer. Events for this factor level
- OR/HR/RR/Coefficient or aOR/aHR/aRR/Adj. Coefficient (95% CI)
Character. Formatted effect estimate with confidence interval. Column name depends on model type and scope. Univariable models use: OR, HR, RR, Coefficient. Multivariable models use adjusted notation: aOR, aHR, aRR, Adj. Coefficient
- p-value
Character. Formatted p-value from Wald test
The returned object includes the following attributes accessible via attr():
- model
The fitted model object (glm, lm, coxph, etc.). Access for diagnostics, predictions, or further analysis
- raw_data
data.table. Unformatted numeric results with columns for coefficients, standard errors, confidence bounds, quality statistics, etc.
- outcome
Character. The outcome variable name
- predictors
Character vector. The predictor variable names
- formula_str
Character. The complete model formula as a string
- model_scope
Character. "Univariable" (one predictor) or "Multivariable" (multiple predictors)
- model_type
Character. The regression model type used
- interactions
Character vector (if interactions specified). The interaction terms included
- strata
Character (if stratification used). The stratification variable
- cluster
Character (if clustering used). The cluster variable
- weights
Character (if weighting used). The weights variable
- significant
Character vector. Names of predictors with p-value below 0.05, suitable for downstream variable selection workflows
See Also
uniscreen for univariable screening of multiple predictors,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for model-to-table conversion
Other regression functions:
compfit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Examples
# Load example data
data(clintrial)
data(clintrial_labels)
library(survival)
# Example 1: Univariable logistic regression
uni_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = "age"
)
print(uni_model)
# Labeled as "Univariable OR"
# Example 2: Multivariable logistic regression
multi_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "treatment"),
labels = clintrial_labels
)
print(multi_model)
# Example 3: Cox proportional hazards model
cox_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxph",
labels = clintrial_labels
)
print(cox_model)
# Example 4: Model with interaction terms
interact_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment", "sex"),
interactions = c("age:treatment"),
labels = clintrial_labels
)
print(interact_model)
# Example 5: Cox model with stratification
strat_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment"),
model_type = "coxph",
strata = "site", # Separate baseline hazards by site
labels = clintrial_labels
)
print(strat_model)
# Example 6: Cox model with clustering
cluster_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment"),
model_type = "coxph",
cluster = "site", # Robust SEs accounting for site clustering
labels = clintrial_labels
)
print(cluster_model)
# Example 7: Linear regression
linear_model <- fit(
data = clintrial,
outcome = "bmi",
predictors = c("age", "sex", "smoking"),
model_type = "lm",
labels = clintrial_labels
)
print(linear_model)
# Example 8: Poisson regression for equidispersed count data
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_model <- fit(
data = clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment", "surgery"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels
)
print(poisson_model)
# Returns rate ratios (RR/aRR)
# Example 9: Negative binomial regression for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin or quasipoisson
if (requireNamespace("MASS", quietly = TRUE)) {
nb_result <- fit(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "negbin",
labels = clintrial_labels
)
print(nb_result)
}
# Example 10: Gamma regression for positive continuous outcomes
gamma_model <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels
)
print(gamma_model)
# Example 11: Access the underlying fitted model
result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi")
)
# Get the model object
model_obj <- attr(result, "model")
summary(model_obj)
# Model diagnostics
plot(model_obj)
# Predictions
preds <- predict(model_obj, type = "response")
# Example 12: Access raw numeric data
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains unformatted coefficients, SEs, CIs, AIC, BIC, etc.
# Example 13: Multiple interactions
complex_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "bmi"),
interactions = c("age:treatment", "sex:bmi"),
labels = clintrial_labels
)
print(complex_model)
# Example 14: Customize output columns
minimal <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
show_n = FALSE,
show_events = FALSE,
reference_rows = FALSE
)
print(minimal)
# Example 15: Different confidence levels
ci90 <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
conf_level = 0.90 # 90% confidence intervals
)
print(ci90)
# Example 16: Force coefficient display instead of OR
coef_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi"),
exponentiate = FALSE # Show log odds instead of OR
)
print(coef_model)
# Example 17: Check model quality statistics
result <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
keep_qc_stats = TRUE
)
raw <- attr(result, "raw_data")
cat("AIC:", raw$AIC[1], "\n")
cat("BIC:", raw$BIC[1], "\n")
cat("C-statistic:", raw$c_statistic[1], "\n")
# Example 18: Interaction effects - treatment effect modified by stage
interaction_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment", "stage"),
interactions = c("treatment:stage"),
model_type = "coxph",
labels = clintrial_labels
)
print(interaction_model)
# Shows main effects plus all treatment×stage interaction terms
# Example 19: Multiple interactions in logistic regression
multi_interaction <- fit(
data = clintrial,
outcome = "readmission_30d",
predictors = c("age", "sex", "surgery", "diabetes"),
interactions = c("surgery:diabetes", "age:sex"),
labels = clintrial_labels
)
print(multi_interaction)
# Example 20: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS not available
quasi_model <- fit(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels
)
print(quasi_model)
# Adjusts standard errors for overdispersion
# Example 21: Quasibinomial for overdispersed binary data
quasi_logistic <- fit(
data = clintrial,
outcome = "any_complication",
predictors = c("age", "bmi", "diabetes", "surgery"),
model_type = "glm",
family = "quasibinomial",
labels = clintrial_labels
)
print(quasi_logistic)
# Example 22: Gamma regression with identity link for additive effects
gamma_identity <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "surgery", "any_complication"),
model_type = "glm",
family = Gamma(link = "identity"),
labels = clintrial_labels
)
print(gamma_identity)
# Shows additive effects (coefficients) instead of multiplicative (ratios)
# Example 23: Inverse Gaussian regression for highly skewed data
inverse_gaussian <- fit(
data = clintrial,
outcome = "recovery_days",
predictors = c("age", "surgery", "pain_score"),
model_type = "glm",
family = inverse.gaussian(link = "log"),
labels = clintrial_labels
)
print(inverse_gaussian)
# Example 24: Linear mixed effects with random intercepts
# Accounts for clustering of patients within sites
if (requireNamespace("lme4", quietly = TRUE)) {
lmer_model <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "stage", "(1|site)"),
model_type = "lmer",
labels = clintrial_labels
)
print(lmer_model)
}
# Example 25: Generalized linear mixed effects (logistic with random effects)
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_model <- fit(
data = clintrial,
outcome = "readmission_30d",
predictors = c("age", "surgery", "los_days", "(1|site)"),
model_type = "glmer",
family = "binomial",
labels = clintrial_labels
)
print(glmer_model)
}
# Example 26: Cox mixed effects for clustered survival data
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_model <- fit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "treatment", "stage", "(1|site)"),
model_type = "coxme",
labels = clintrial_labels
)
print(coxme_model)
}
# Example 27: Random slopes - treatment effect varies by site
if (requireNamespace("lme4", quietly = TRUE)) {
random_slopes <- fit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "stage", "(treatment|site)"),
model_type = "lmer",
labels = clintrial_labels
)
print(random_slopes)
}
# Example 28: Format a pre-fitted model (model-based workflow)
# Useful for models fitted outside of fit()
pre_fitted <- glm(os_status ~ age + sex + treatment,
family = binomial, data = clintrial)
result <- fit(model = pre_fitted,
data = clintrial,
labels = clintrial_labels)
print(result)
Fix negative zero in formatted strings
Description
Corrects floating-point rounding artifacts that produce "-0.00" or similar negative zero strings. Works on character vectors, replacing patterns like "-0.00", "-0.000", etc. with their positive equivalents, even when embedded within larger strings (e.g., "(-0.00, 1.23)" becomes "(0.00, 1.23)").
Usage
fix_negative_zero(x, marks = NULL)
Arguments
x |
Character vector of formatted numbers. |
marks |
Optional list with |
Details
When marks is supplied, also replaces the period decimal mark with
the locale-appropriate decimal mark.
Value
Character vector with negative zeros corrected.
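One way such a correction could be written is a single regular-expression substitution. The function below is a hypothetical re-implementation for illustration, not the package's code, and it ignores the marks argument.

```r
# Hypothetical sketch; not the package's implementation.
strip_negative_zero <- function(x) {
  # Drop the minus sign from "-0.0", "-0.00", ... when no further digit
  # follows, so "-0.05" is left alone. The lookbehind skips minus signs
  # that directly follow a digit.
  gsub("(?<![0-9])-(0\\.0+)(?![0-9])", "\\1", x, perl = TRUE)
}

cleaned <- strip_negative_zero(c("-0.00", "(-0.00, 1.23)", "-0.05", "-0.001"))
print(cleaned)
```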
Determine CI separator for forest plot text annotations
Description
Returns the appropriate separator string between CI lower and upper bounds in forest plot annotations. Considers whether values may be negative and the current locale's decimal mark.
Usage
forest_ci_separator(has_negatives, marks = NULL)
Arguments
has_negatives |
Logical indicating whether any CI bounds are negative. |
marks |
Optional list with |
Value
Character string separator.
Format a categorical statistic for display
Description
Converts frequency counts into formatted display strings following standard conventions (n, n (%), % only) with locale-aware decimal marks.
Usage
format_categorical_stat(n, total, stat_type, marks)
Arguments
n |
Integer count for the category. |
total |
Integer total count for percentage calculation. |
stat_type |
Character string: |
marks |
List with |
Value
Character string with the formatted statistic.
Apply formatting to column headers in exported tables (PDF/LaTeX)
Description
Formats column headers for LaTeX output by escaping special characters, italicizing 'n' and 'p', and optionally adding vertical spacing.
Usage
format_column_headers(col_names, add_header_space = TRUE)
Arguments
col_names |
Character vector of column names. |
add_header_space |
Logical whether to add vertical padding. |
Value
Character vector with LaTeX-formatted column names.
Apply formatting to column headers in exported tables (HTML)
Description
Formats column headers for HTML output by italicizing 'n' and 'p' using HTML tags.
Usage
format_column_headers_html(col_names)
Arguments
col_names |
Character vector of column names. |
Value
Character vector with HTML-formatted column names.
Format column headers with n counts (HTML)
Description
Creates HTML-formatted column headers with sample size counts displayed below the column name using line breaks.
Usage
format_column_headers_with_n_html(col_names, n_row_data)
Arguments
col_names |
Character vector of column names. |
n_row_data |
Named list or data.table row with n values for each column. |
Value
Character vector with HTML-formatted headers including n counts.
Format column headers with n counts (TeX)
Description
Creates LaTeX-formatted column headers with sample size counts displayed below the column name in a stacked format.
Usage
format_column_headers_with_n_tex(col_names, n_row_data)
Arguments
col_names |
Character vector of column names. |
n_row_data |
Named list or data.table row with n values for each column. |
Value
Character vector with LaTeX-formatted headers including n counts.
Format continuous statistic for display
Description
Converts numeric summary statistics into formatted display strings following
standard conventions (mean ± SD, median [IQR], range, etc.).
Usage
format_continuous_stat(stats, stat_type, fmt_str, marks)
Arguments
stats |
Named list of numeric statistics (mean, sd, median, q1, q3, min, max). |
stat_type |
Character string: "mean_sd", "median_iqr", "median_range", or "range". |
fmt_str |
Character string format specification for sprintf. |
marks |
List with |
Value
Character string with formatted statistic.
Format an integer count with locale-aware thousands separator
Description
Formats integer values with a thousands separator for display in tables. Values below 1000 are returned as plain character strings.
Usage
format_count(n, marks)
Arguments
n |
Integer count value. |
marks |
List with |
Value
Character string with the formatted count.
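The behavior described here can be approximated with base R's formatC(). In this sketch the function name and the big_mark argument are hypothetical; big_mark stands in for the separator that the marks list would carry.

```r
# Hypothetical sketch; not the package's implementation.
format_count_sketch <- function(n, big_mark = ",") {
  # Small values are returned as plain strings
  if (abs(n) < 1000) return(as.character(n))
  formatC(n, format = "d", big.mark = big_mark)
}

format_count_sketch(950L)
format_count_sketch(1234567L)
format_count_sketch(1234567L, big_mark = " ")
```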
Format an integer for forest plot annotations
Description
Applies thousands separator to an integer value. Respects locale marks when provided.
Usage
format_count_forest(x, marks = NULL)
Arguments
x |
Integer value. |
marks |
Optional list with |
Value
Character string.
Format combined fullfit output from formatted tables
Description
Merges univariable and multivariable results into a single publication-ready table with side-by-side display. Uses vectorized merge instead of per-variable loops.
Usage
format_fullfit_combined(
uni_formatted,
multi_formatted,
uni_raw,
multi_raw,
predictors,
columns,
metrics,
show_n,
show_events,
labels,
exponentiate = NULL,
conf_level = 0.95
)
Arguments
uni_formatted |
Formatted data.table from univariable screening. |
multi_formatted |
Formatted data.table from multivariable model. |
uni_raw |
Raw data.table with univariable coefficients. |
multi_raw |
Raw data.table with multivariable coefficients. |
predictors |
Character vector of all predictor names. |
columns |
Character string specifying columns to show ("both", "uni", "multi"). |
metrics |
Character vector specifying metrics to show ("effect", "p"). |
show_n |
Logical whether to include sample size column. |
show_events |
Logical whether to include events column. |
labels |
Optional named character vector of variable labels. |
exponentiate |
Optional logical for coefficient exponentiation. |
conf_level |
Numeric confidence level for CI label. |
Value
Combined data.table with aligned univariable and multivariable results.
Format headers for flextable
Description
Applies formatting to flextable headers including italicizing 'n', adding sample size counts from N row data, and bolding all headers.
Usage
format_headers_ft(ft, has_n_row, n_row_data)
Arguments
ft |
flextable object. |
has_n_row |
Logical whether source data had an N row. |
n_row_data |
Data from the N row for adding counts to headers. |
Value
Formatted flextable object.
Apply formatting to indented groups
Description
Transforms tables with Variable/Group columns into indented format where group levels appear as indented rows under variable names. Handles both regression and descriptive tables with appropriate p-value placement.
Usage
format_indented_groups(df, indent_string = " ")
Arguments
df |
Data.table with Variable and Group columns. |
indent_string |
Character string to use for indentation. |
Value
Data.table with Group column removed and levels indented under Variables.
Format interaction term for display
Description
Converts R's internal interaction term format (e.g., "treatmentDrug A:stageII") to a more readable format (e.g., "Treatment (Drug A) × Stage (II)").
Usage
format_interaction_term(term, labels = NULL)
Arguments
term |
Character string of the interaction term from model coefficients. |
labels |
Optional named vector of labels for variable names. |
Value
Formatted interaction term string.
Format model comparison table
Description
Rounds numeric columns to appropriate precision based on metric type.
Usage
format_model_comparison(comparison)
Arguments
comparison |
Data.table with comparison metrics. |
Value
Data.table with properly rounded numeric columns.
Format model results for publication-ready display
Description
Transforms raw model coefficient data into a formatted table suitable for publication. Handles effect measure formatting (OR, HR, RR, Estimate), confidence intervals, p-values, sample sizes, and variable labels. Supports interaction terms and mixed-effects models.
Usage
format_model_table(
data,
effect_col = NULL,
digits = 2,
p_digits = 3,
labels = NULL,
show_n = TRUE,
show_events = TRUE,
reference_label = "reference",
exponentiate = NULL,
conf_level = 0.95,
marks = NULL
)
Arguments
data |
Data.table containing raw model results with coefficient columns. |
effect_col |
Optional character string specifying the effect column name.
If |
digits |
Integer number of decimal places for effect estimates. |
p_digits |
Integer number of decimal places for p-values. |
labels |
Optional named character vector mapping variable names to display labels. Supports automatic labeling of interaction terms. |
show_n |
Logical whether to include sample size column. |
show_events |
Logical whether to include events column (ignored for linear models). |
reference_label |
Character string to display for reference categories. |
exponentiate |
Optional logical to force exponentiated (TRUE) or raw (FALSE)
coefficient display. If |
Value
Formatted data.table with publication-ready columns.
Format multifit results for publication
Description
Internal helper that formats raw multivariate results into publication-ready table format.
Usage
format_multifit_table(
data,
columns,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
predictor_label = NULL,
include_predictor = TRUE,
exponentiate = NULL,
conf_level = 0.95,
marks = NULL
)
Arguments
data |
Raw combined data.table from combine_multifit_results. |
columns |
Character specifying column layout. |
show_n |
Logical for sample size column. |
show_events |
Logical for events column. |
digits |
Integer decimal places for effects. |
p_digits |
Integer decimal places for p-values. |
labels |
Named vector of labels for outcomes and predictors. |
predictor_label |
Label for predictor variable. |
include_predictor |
Logical for including predictor column. |
exponentiate |
Logical or |
Value
Formatted data.table.
Format a numeric value with locale-aware separators
Description
General-purpose number formatter used by all display functions. For values
≥ 1000 (in absolute value), inserts the appropriate
thousands separator. Fixes negative-zero display artefacts.
Usage
format_num(x, fmt_str, marks)
Arguments
x |
Numeric value to format. |
fmt_str |
Character string |
marks |
List with |
Value
Character string with the formatted number.
Format numeric value with fixed decimal places
Description
Formats a numeric value to a specified number of decimal places, removing
leading/trailing whitespace and fixing negative zero display (e.g., "-0.00"
becomes "0.00"). When marks is supplied, applies locale-appropriate
decimal mark substitution.
Usage
format_number(x, digits, marks = NULL)
Arguments
x |
Numeric value to format. |
digits |
Integer number of decimal places. |
marks |
Optional list with |
Value
Character string with formatted value.
Format a p-value for forest plot annotations
Description
Returns a formatted p-value string suitable for forest plot display,
using locale-aware decimal marks when marks is provided.
Usage
format_p_forest(p, p_digits, marks = NULL)
Arguments
p |
Numeric p-value. |
p_digits |
Integer decimal places. |
marks |
Optional list with |
Value
Character string.
Format a p-value with locale-aware decimal mark
Description
Converts a numeric p-value to a display string with the correct decimal separator and threshold notation (e.g., "< 0.001" or "< 0,001").
Usage
format_pvalue(p, digits, marks)
Arguments
p |
Numeric p-value. |
digits |
Integer number of decimal places. |
marks |
List with |
Value
Character string with the formatted p-value.
Format p-value for survtable
Description
Provides p-value formatting to the survtable result.
Usage
format_pvalue_survtable(p, digits, marks = NULL)
Arguments
p |
Numeric p-value. |
digits |
Integer decimal places. |
marks |
List with |
Value
Character-formatted p-value.
Format p-values for descriptive tables
Description
Converts numeric p-values to formatted strings with appropriate precision. Handles very small p-values with threshold notation (e.g., "< 0.001").
Usage
format_pvalues_desctable(result, p_digits, marks)
Arguments
result |
Data.table with 'p_value' column to format. |
p_digits |
Integer number of decimal places for p-values. |
marks |
List with |
Value
Modified data.table with 'p_value' column (formatted strings).
Format p-values for exported tables (HTML)
Description
Applies bold formatting to significant p-values in HTML tables using the b tag.
Usage
format_pvalues_export_html(df, p_threshold = 0.05)
Arguments
df |
Data.table containing p-value columns. |
p_threshold |
Numeric threshold for significance (default 0.05). |
Value
Data.table with significant p-values wrapped in HTML bold tags.
Format p-values for exported tables
Description
Applies bold formatting to significant p-values in LaTeX tables using the textbf command.
Usage
format_pvalues_export_tex(df, p_threshold = 0.05)
Arguments
df |
Data.table containing p-value columns. |
p_threshold |
Numeric threshold for significance (default 0.05). |
Value
Data.table with significant p-values wrapped in LaTeX bold commands.
Format p-values for display
Description
Converts numeric p-values to formatted character strings using vectorized operations. Values below the threshold (determined by digits parameter) display as "< 0.001" (for digits=3), "< 0.0001" (for digits=4), etc. NA values display as "-".
Usage
format_pvalues_fit(p, digits = 3, marks = NULL)
Arguments
p |
Numeric vector of p-values. |
digits |
Integer number of decimal places. Also determines the threshold for "less than" display: threshold = 10^(-digits). Default is 3. |
marks |
Optional list with |
Value
Character vector of formatted p-values.
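The thresholding rule (threshold = 10^(-digits)) can be sketched in base R as follows. The function name is hypothetical, the code is not the package's implementation, and locale marks are not handled.

```r
# Hypothetical sketch of the thresholding rule.
format_p_sketch <- function(p, digits = 3) {
  fmt       <- paste0("%.", digits, "f")
  threshold <- 10^(-digits)

  out   <- sprintf(fmt, p)
  below <- !is.na(p) & p < threshold

  out[below]    <- paste0("< ", sprintf(fmt, threshold))
  out[is.na(p)] <- "-"
  out
}

p3 <- format_p_sketch(c(0.0004, 0.0234, 0.5, NA))
p4 <- format_p_sketch(0.00004, digits = 4)
print(p3)
print(p4)
```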
Format p-values for multifit display
Description
Converts numeric p-values to formatted character strings. Values below the threshold (determined by digits parameter) display as "< 0.001" (for digits=3), "< 0.0001" (for digits=4), etc. NA values display as "-".
Usage
format_pvalues_multifit(p, digits = 3, marks = NULL)
Arguments
p |
Numeric vector of p-values. |
digits |
Integer number of decimal places. Also determines the threshold for "less than" display: threshold = 10^(-digits). Default is 3. |
marks |
Optional list with |
Value
Character vector of formatted p-values.
Vectorized quantile cell formatting
Description
Formats survival quantile cells for multiple rows at once.
Usage
format_quantile_cells(est, lower, upper, fmt_str, marks = NULL)
Arguments
est |
Numeric vector of estimates. |
lower |
Numeric vector of lower CI bounds. |
upper |
Numeric vector of upper CI bounds. |
fmt_str |
Format string for numeric values. |
marks |
List with |
Value
Character vector of formatted cells.
Vectorized survival cell formatting
Description
Formats survival probability cells for multiple rows at once. Uses locale-aware decimal marks and safe CI separators that avoid ambiguity with negative values or decimal commas.
Usage
format_survival_cells(
est,
lower,
upper,
n_risk,
n_event,
stats,
fmt_est,
fmt_ci_lower,
fmt_ci_upper,
percent,
marks = NULL
)
Arguments
est |
Numeric vector of estimates. |
lower |
Numeric vector of lower CI bounds. |
upper |
Numeric vector of upper CI bounds. |
n_risk |
Integer vector of numbers at risk. |
n_event |
Integer vector of event counts. |
stats |
Character vector of statistics to include. |
fmt_est |
Format string for estimate. |
fmt_ci_lower |
Format string for lower CI bound. |
fmt_ci_upper |
Format string for upper CI bound. |
percent |
Logical whether to display values as percentages. |
marks |
List with |
Value
Character vector of formatted cells.
Format survival median with CI for display
Description
Formats a survival median estimate with confidence interval using
locale-aware decimal marks and safe CI separators. Used by
process_survival in descriptive tables.
Usage
format_survival_ci(median, lower, upper, fmt_str, marks)
Arguments
median |
Numeric median survival time. |
lower |
Numeric lower CI bound. |
upper |
Numeric upper CI bound. |
fmt_str |
Character string |
marks |
List with |
Value
Character string with formatted "median (lower-upper)".
Complete Regression Analysis Workflow
Description
Executes a comprehensive regression analysis pipeline that combines univariable screening, automatic/manual variable selection, and multivariable modeling in a single function call. This function is designed to streamline the complete analytical workflow from initial exploration to final adjusted models, with publication-ready formatted output showing both univariable and multivariable results side-by-side if desired.
Usage
fullfit(
data,
outcome,
predictors,
method = "screen",
multi_predictors = NULL,
p_threshold = 0.05,
columns = "both",
model_type = "glm",
family = "binomial",
random = NULL,
conf_level = 0.95,
reference_rows = TRUE,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
metrics = "both",
return_type = "table",
keep_models = FALSE,
exponentiate = NULL,
parallel = TRUE,
n_cores = NULL,
number_format = NULL,
verbose = NULL,
...
)
Arguments
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcome |
Character string specifying the outcome variable name. For
time-to-event analysis, use |
predictors |
Character vector of predictor variable names to analyze.
All predictors are tested in univariable models. The subset included in
the multivariable model depends on the |
method |
Character string specifying the variable selection strategy:
|
multi_predictors |
Character vector of predictors to include in the
multivariable model when |
p_threshold |
Numeric p-value threshold for automatic variable
selection when |
columns |
Character string specifying which result columns to display:
|
model_type |
Character string specifying the regression model type:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying decimal places for effect estimates. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names, values are
display labels. Default is |
metrics |
Character specification for which statistics to display:
Can also be a character vector: |
return_type |
Character string specifying what to return:
|
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients. Default
is |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for
parallel processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to model fitting functions (e.g.,
|
Details
Analysis Workflow:
The function implements a complete regression analysis pipeline:
-
Univariable screening: Fits separate models for each predictor (outcome ~ predictor). Each predictor is tested independently to understand crude associations.
-
Variable selection: Based on the method parameter:
-
"screen": Automatically selects predictors with univariable p ≤ p_threshold
-
"all": Includes all predictors (no selection)
-
"custom": Uses predictors specified in multi_predictors
-
Multivariable modeling: Fits a single model with selected predictors (outcome ~ predictor1 + predictor2 + ...). Estimates are adjusted for all other variables in the model.
-
Output formatting: Combines results into publication-ready table with appropriate effect measures and formatting.
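The first three steps of this pipeline can be sketched in base R. This is not the package's implementation; the mtcars data, the choice of variables, and the restriction to numeric predictors are illustrative assumptions.

```r
outcome     <- "am"
predictors  <- c("wt", "hp", "qsec")   # numeric predictors only, for simplicity
p_threshold <- 0.05

# Step 1: one univariable logistic model per predictor
uni_p <- vapply(predictors, function(v) {
  m <- glm(reformulate(v, response = outcome), family = binomial, data = mtcars)
  coef(summary(m))[v, "Pr(>|z|)"]
}, numeric(1))

# Step 2: keep predictors with univariable p <= p_threshold
selected <- names(uni_p)[uni_p <= p_threshold]

# Step 3: a single multivariable model with the selected predictors
if (length(selected) > 0) {
  multi <- glm(reformulate(selected, response = outcome),
               family = binomial, data = mtcars)
  print(exp(coef(multi)))   # adjusted odds ratios
}
```

Factor predictors contribute several coefficient rows each, so a full implementation would need a rule for combining their p-values; that is left out here.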
Variable Selection Strategies:
"Screen" Method (method = "screen"):
Uses p-value threshold for automatic selection
Liberal thresholds (e.g., 0.20) cast a wide net to avoid missing important predictors
Stricter thresholds (e.g., 0.05) focus on strongly associated predictors
Helps reduce overfitting and multicollinearity
Common in exploratory analyses and when sample size is limited
"All" Method (method = "all"):
No variable selection - includes all predictors
Appropriate when all variables are theoretically important
Risk of overfitting with many predictors relative to sample size
Useful for confirmatory analyses with pre-specified models
"Custom" Method (method = "custom"):
Manual selection based on subject matter knowledge
Runs univariable analysis for all predictors (for comparison)
Includes only specified predictors in multivariable model
Ideal for theory-driven model building
Allows comparison of unadjusted vs adjusted effects for all variables
Interpreting Results:
When columns = "both" (default), tables show:
-
Univariable columns: Crude associations, unadjusted for other variables. Labeled as "OR/HR/RR/Coefficient (95% CI)" and "Uni p"
-
Multivariable columns: Adjusted associations, accounting for all other predictors in the model. Labeled as "aOR/aHR/aRR/Adj. Coefficient (95% CI)" and "Multi p" ("a" = adjusted)
Variables not meeting selection criteria show "-" in multivariable columns
Comparing univariable and multivariable results helps identify:
-
Confounding: Large changes in effect estimates
-
Independent effects: Similar univariable and multivariable estimates
-
Mediation: Attenuated effects in multivariable model
-
Suppression: Effects that emerge only after adjustment
Sample Size Considerations:
Rule of thumb for multivariable models:
-
Logistic regression: ≥ 10 events per predictor variable
-
Cox regression: ≥ 10 events per predictor variable
-
Linear regression: ≥ 10-20 observations per predictor
Use screening methods to reduce predictor count when these ratios are not met.
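As a quick check of the events-per-variable rule of thumb, the ratio can be computed before fitting. The helper name events_per_variable and the outcome vector below are hypothetical and exist only for this sketch.

```r
# Hypothetical helper: events per candidate predictor.
events_per_variable <- function(event, n_predictors) {
  sum(event == 1, na.rm = TRUE) / n_predictors
}

# Made-up outcome vector: 45 events among 200 observations
status <- rep(c(1, 0), times = c(45, 155))

epv <- events_per_variable(status, n_predictors = 6)
epv          # 7.5, below the rule-of-thumb minimum of 10
epv >= 10    # FALSE: consider screening to reduce the predictor count
```

Factor predictors with several levels add more than one parameter to the model, so counting parameters rather than variables gives a more conservative ratio.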
Value
Depends on return_type parameter:
When return_type = "table" (default): A data.table with S3 class
"fullfit_result" containing:
- Variable
Character. Predictor name or custom label
- Group
Character. Category level for factors, empty for continuous
- n/n_group
Integer. Sample sizes (if show_n = TRUE)
- events/events_group
Integer. Event counts (if show_events = TRUE)
- OR/HR/RR/Coefficient (95% CI)
Character. Unadjusted effect (if columns includes "uni" and metrics includes "effect")
- Uni p
Character. Univariable p-value (if columns includes "uni" and metrics includes "p")
- aOR/aHR/aRR/Adj. Coefficient (95% CI)
Character. Adjusted effect (if columns includes "multi" and metrics includes "effect")
- Multi p
Character. Multivariable p-value (if columns includes "multi" and metrics includes "p")
When return_type = "model": The fitted multivariable model object
(glm, lm, coxph, etc.).
When return_type = "both": A list with two elements:
- table
The formatted results data.table
- model
The fitted multivariable model object
The table includes the following attributes:
- outcome
Character. The outcome variable name
- model_type
Character. The regression model type
- method
Character. The variable selection method used
- columns
Character. Which columns were displayed
- model
The multivariable model object (if fitted)
- uni_results
The complete univariable screening results
- n_multi
Integer. Number of predictors in multivariable model
- screened
Character vector. Names of predictors that passed univariable screening at the specified p-value threshold
- significant
Character vector. Names of variables with p < 0.05 in the multivariable model (or univariable if multivariable was not fitted)
See Also
uniscreen for univariable screening only,
fit for fitting a single multivariable model,
compfit for comparing multiple models,
desctable for descriptive statistics
Other regression functions:
compfit(),
fit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Examples
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Basic screening with p < 0.05 threshold
result1 <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking",
"hypertension", "diabetes",
"treatment", "stage"),
method = "screen",
p_threshold = 0.05,
labels = clintrial_labels
)
print(result1)
# Shows both univariable and multivariable results
# Only significant univariable predictors in multivariable model
# Example 2: Include all predictors (no selection)
result2 <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
method = "all",
labels = clintrial_labels
)
print(result2)
# Example 3: Custom variable selection
result3 <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "treatment", "stage"),
method = "custom",
multi_predictors = c("age", "treatment", "stage"),
labels = clintrial_labels
)
print(result3)
# Univariable for all, multivariable for selected only
# Example 4: Cox regression with screening
library(survival)
cox_result <- fullfit(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxph",
method = "screen",
p_threshold = 0.10,
labels = clintrial_labels
)
print(cox_result)
# Example 5: Linear regression without screening
linear_result <- fullfit(
data = clintrial,
outcome = "bmi",
predictors = c("age", "sex", "smoking", "creatinine"),
model_type = "lm",
method = "all",
labels = clintrial_labels
)
print(linear_result)
# Example 6: Poisson regression for count outcomes
poisson_result <- fullfit(
data = clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment", "surgery"),
model_type = "glm",
family = "poisson",
method = "all",
labels = clintrial_labels
)
print(poisson_result)
# Example 7: Show only multivariable results
multi_only <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
method = "all",
columns = "multi",
labels = clintrial_labels
)
print(multi_only)
# Example 8: Return both table and model object
both <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
method = "all",
return_type = "both"
)
print(both$table)
summary(both$model)
# Example 9: Keep univariable models for diagnostics
with_models <- fullfit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi", "creatinine"),
keep_models = TRUE
)
uni_results <- attr(with_models, "uni_results")
uni_models <- attr(uni_results, "models")
summary(uni_models[["age"]])
# Example 10: Linear mixed effects with site clustering
if (requireNamespace("lme4", quietly = TRUE)) {
lmer_result <- fullfit(
data = clintrial,
outcome = "los_days",
predictors = c("age", "treatment", "surgery", "stage"),
random = "(1|site)",
model_type = "lmer",
method = "all",
labels = clintrial_labels
)
print(lmer_result)
}
Extract event variable from survival model
Description
Parses the Surv() expression in survival model formulas to extract the event/status variable name. Works with coxph, clogit, and coxme models.
Usage
get_event_variable(model, model_class)
Arguments
model |
Fitted survival model object. |
model_class |
Character string of the model class. |
Value
Character string naming the event variable, or NULL if not found.
Get data from model object (works with S3 and S4)
Description
Retrieves the original data used to fit a model. Checks multiple locations including model attributes, $data slot, $model slot, and @frame for S4.
Usage
get_model_data(model)
Arguments
model |
Fitted model object (S3 or S4). |
Value
Data frame or data.table used to fit the model, or NULL if unavailable.
Get readable model type name
Description
Converts model class names to human-readable descriptions. For GLMs, uses the family to provide specific names (e.g., "Logistic", "Poisson").
Usage
get_model_type_name(model)
Arguments
model |
Fitted model object. |
Value
Character string with readable model type name.
Get factor levels from model (works with S3 and S4)
Description
Extracts factor level information from fitted model objects. Handles both S3 models (glm, lm, coxph) via xlevels slot and S4 models (lme4) via the model frame.
Usage
get_model_xlevels(model)
Arguments
model |
Fitted model object (S3 or S4). |
Value
Named list of factor levels, or NULL if no factors present.
Get paper size for PDF/LaTeX export
Description
Returns paper dimensions and margin settings for the specified paper size.
Usage
get_paper_settings(paper, margins = NULL)
Arguments
paper |
Character string: "letter", "a4", or "auto". |
margins |
Optional numeric vector of margins (length 1 or 4). |
Value
List with latex_paper, width, height, and margins components.
Get display label for statistic type
Description
Converts internal statistic type codes to formatted display labels for table column headers.
Usage
get_stat_label(stat_type)
Arguments
stat_type |
Character string: "mean_sd", "median_iqr", "median_range", "range", "n_miss", or custom type. |
Value
Character string with formatted label (e.g., "Mean ± SD").
Create Forest Plot for Generalized Linear Models
Description
Generates a publication-ready forest plot that combines a formatted data table with a graphical representation of effect estimates (odds ratios, risk ratios, or coefficients) from a generalized linear model. The plot integrates variable names, group levels, sample sizes, effect estimates with confidence intervals, p-values, and model diagnostics in a single comprehensive visualization designed for manuscripts and presentations.
Usage
glmforest(
x,
data = NULL,
title = "Generalized Linear Model",
effect_label = NULL,
digits = 2,
p_digits = 3,
conf_level = 0.95,
font_size = 1,
annot_size = 3.88,
header_size = 5.82,
title_size = 23.28,
plot_width = NULL,
plot_height = NULL,
table_width = 0.6,
show_n = TRUE,
show_events = TRUE,
indent_groups = FALSE,
condense_table = FALSE,
bold_variables = FALSE,
center_padding = 4,
zebra_stripes = TRUE,
ref_label = "reference",
labels = NULL,
color = NULL,
exponentiate = NULL,
qc_footer = TRUE,
units = "in",
number_format = NULL
)
Arguments
x |
Either a fitted GLM object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. If |
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals in the data table. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Values > 1 increase all fonts proportionally, values < 1 decrease them. Default is 1.0. Useful for adjusting readability across different output sizes. |
annot_size |
Numeric value controlling the relative font size for
data annotations (variable names, values in table cells). Default is 3.88.
Adjust relative to |
header_size |
Numeric value controlling the relative font size for column headers ("Variable", "Group", "n", etc.). Default is 5.82. Headers are typically larger than annotations for hierarchy. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. The title is typically the largest text element. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of
total plot width allocated to the data table (left side). The forest plot
occupies |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying the horizontal spacing (in character units) between the data table and forest plot. Increase for more separation, decrease to fit more content. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories of
factor variables. Typically shown in place of effect estimates.
Default is |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names in the model,
values are the labels to display. Example:
|
color |
Character string specifying the color for effect estimate point
markers in the forest plot. Use hex codes or R color names. Default is
Gaussian with log link), and |
exponentiate |
Logical. If |
qc_footer |
Logical. If |
units |
Character string specifying the units for plot dimensions.
Options: |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Details
Plot Components:
The forest plot consists of several integrated components:
-
Title: Centered at top, describes the analysis
-
Data Table (left side): Contains columns for:
Variable: Predictor names (or custom labels)
Group: Factor levels (optional, hidden when indenting)
n: Sample sizes by group (optional)
Events: Event counts by group (optional)
Effect (95% CI); p-value: Formatted estimates with p-values
-
Forest Plot (right side): Graphical display with:
Point estimates (squares sized by sample size)
95% confidence intervals (error bars)
Reference line (at OR/RR = 1 or coefficient = 0)
Log scale for odds/risk ratios
Labeled axis
-
Model Statistics (footer): Summary of:
Observations analyzed (with percentage of total data)
Model family (Binomial, Poisson, etc.)
Deviance statistics
Pseudo-R^2 (McFadden)
AIC
Automatic Effect Measure Selection:
When effect_label = NULL and exponentiate = NULL, the function
selects the effect measure from the model family and link:
-
Logistic regression (family = binomial(link = "logit")): Odds Ratios (OR)
-
Log-link models (link = "log"): Risk Ratios (RR) or Rate Ratios
-
Other exponential families: exp(coefficient)
-
Identity link: Raw coefficients
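The selection rules above can be expressed as a small lookup on a family object. The function effect_measure_label is a hypothetical helper written for this sketch, not part of summata. Treating every remaining link as "exp(coefficient)" is an assumption about how the "other" case is resolved.

```r
# Hypothetical helper mirroring the rules listed above; not the package's code.
effect_measure_label <- function(fam) {
  stopifnot(inherits(fam, "family"))
  if (fam$family == "binomial" && fam$link == "logit") {
    "Odds Ratio"
  } else if (fam$link == "log") {
    "Risk/Rate Ratio"
  } else if (fam$link == "identity") {
    "Coefficient"
  } else {
    "exp(coefficient)"   # assumed fallback for all other links
  }
}

effect_measure_label(binomial())                  # "Odds Ratio"
effect_measure_label(poisson())                   # "Risk/Rate Ratio"
effect_measure_label(gaussian())                  # "Coefficient"
effect_measure_label(binomial(link = "cloglog"))  # "exp(coefficient)"
```

For a fitted GLM, the family object can be obtained with family(model) and passed to the helper.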
Reference Categories:
For factor variables, the first level (determined by factor ordering or alphabetically for character variables) serves as the reference category:
Displayed with the ref_label instead of an estimate
No confidence interval or p-value shown
Visually aligned with other categories
When condense_table = TRUE, reference-only variables may be omitted entirely
Layout Optimization:
The function automatically optimizes layout based on content:
Calculates appropriate axis ranges to accommodate all confidence intervals
Selects meaningful tick marks on log or linear scales
Sizes point markers proportional to sample size (larger = more data)
Adjusts table width based on variable name lengths when table_width = NULL
Recommends overall dimensions based on number of rows
Visual Grouping Options:
Three display modes are available:
-
Standard (indent_groups = FALSE, condense_table = FALSE): Separate "Variable" and "Group" columns, all categories shown
-
Indented (indent_groups = TRUE, condense_table = FALSE): Hierarchical display with groups indented under variables
-
Condensed (condense_table = TRUE): Binary variables shown in single rows, automatically indented
Zebra Striping:
When zebra_stripes = TRUE, alternating variables (not individual rows)
receive light gray backgrounds. This helps visually group all levels of a
factor variable together, making the plot easier to read especially with
many multi-level factors.
Model Statistics Display:
The footer shows key diagnostic information:
-
Observations analyzed: Total N and percentage of original data (accounting for missing values)
-
Null/Residual Deviance: Model fit improvement
-
Pseudo-R^2: McFadden R^2 = 1 - (log L_1 / log L_0), where L_1 is the likelihood of the fitted model and L_0 that of the intercept-only (null) model
-
AIC: For model comparison (lower is better)
For logistic regression, concordance (C-statistic/AUC) may also be displayed if available.
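McFadden's pseudo-R^2 compares the log-likelihood of the fitted model with that of an intercept-only model. The following base R sketch computes it by hand on the built-in mtcars data (an assumption chosen for illustration). It is not the code summata uses for the footer.

```r
# Base R sketch on the built-in mtcars data (an assumption for illustration).
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,  data = mtcars, family = binomial)

# McFadden pseudo-R^2 from the two log-likelihoods
mcfadden_r2 <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))

# For ungrouped 0/1 outcomes the same value follows from the deviances,
# because the deviance is then -2 times the log-likelihood
r2_from_deviance <- 1 - fit$deviance / fit$null.deviance

mcfadden_r2
all.equal(mcfadden_r2, r2_from_deviance)  # TRUE
```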
Saving Plots:
Use ggplot2::ggsave() with recommended dimensions:
p <- glmforest(model, data)
dims <- attr(p, "rec_dims")
ggplot2::ggsave("forest.pdf", p, width = dims$width, height = dims$height)
Or specify custom dimensions:
ggplot2::ggsave("forest.png", p, width = 12, height = 8, dpi = 300)
Value
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
See Also
autoforest for automatic model detection,
coxforest for Cox proportional hazards forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
glm for fitting GLMs,
fit for regression modeling
Other visualization functions:
autoforest(),
coxforest(),
lmforest(),
multiforest(),
uniforest()
Examples
data(clintrial)
data(clintrial_labels)
# Create example model
model1 <- glm(os_status ~ age + sex + bmi + treatment,
data = clintrial, family = binomial)
# Example 1: Basic logistic regression forest plot
p <- glmforest(model1, data = clintrial)
old_width <- options(width = 180)
# Example 2: With custom variable labels
plot2 <- glmforest(
x = model1,
data = clintrial,
title = "Risk Factors for Mortality",
labels = clintrial_labels
)
# Example 3: Indented layout with formatting options
plot3 <- glmforest(
x = model1,
data = clintrial,
indent_groups = TRUE,
zebra_stripes = TRUE,
color = "#D62728",
labels = clintrial_labels
)
# Example 4: Condensed layout for many binary variables
model4 <- glm(os_status ~ age + sex + smoking + hypertension +
diabetes + surgery,
data = clintrial,
family = binomial)
plot4 <- glmforest(
x = model4,
data = clintrial,
condense_table = TRUE,
labels = clintrial_labels
)
# Binary variables shown in single rows
# Example 5: Poisson regression for count data
model5 <- glm(ae_count ~ age + treatment + diabetes + surgery,
data = clintrial,
family = poisson)
plot5 <- glmforest(
x = model5,
data = clintrial,
title = "Rate Ratios for Adverse Events",
labels = clintrial_labels
)
# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "forest.pdf"),
plot5, width = dims$width, height = dims$height)
options(old_width)
Identify variable groups before indentation
Description
Detects variable group boundaries by finding rows where Variable column is non-empty. Returns row indices for each group for zebra stripe application.
Usage
identify_variable_groups(df)
Arguments
df |
Data.table with Variable column. |
Value
List of integer vectors, each containing row indices for one variable group.
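The grouping logic described above can be sketched in a few lines of base R. This is an illustration only, with a made-up Variable column. It assumes the first row is non-empty and that there are no missing values, and it is not the package's internal code.

```r
# Made-up Variable column: blanks continue the preceding variable
variable <- c("Age", "Sex", "", "Stage", "", "")

# Each non-empty entry starts a new group; cumsum() turns starts into group ids
group_id <- cumsum(variable != "")

# Row indices per group, as a plain list of integer vectors
groups <- unname(split(seq_along(variable), group_id))
groups
# [[1]] 1   [[2]] 2 3   [[3]] 4 5 6
```

Alternating list elements can then receive the shaded background, so that all levels of one factor share a stripe.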
Check if category name should be suppressed in condensed label
Description
Determines whether a category name should be suppressed when condensing
binary variables. Returns TRUE for standard affirmative values (e.g., "Yes",
"1", "Positive"), standard reference values (e.g., "No", "Absent", "None"),
or when the category name essentially matches the variable label
(case-insensitive comparison).
Usage
is_affirmative_category(
category,
label = NULL,
norm_category = NULL,
norm_label = NULL
)
Arguments
category |
Character string with the category name. |
label |
Optional character string with the variable label. If provided,
returns |
norm_category |
Optional pre-normalized category (lowercase, trimmed). If provided, skips normalization for performance. |
norm_label |
Optional pre-normalized label (lowercase, trimmed). If provided, skips normalization for performance. |
Value
Logical indicating whether category should be suppressed.
Check if object is a multifit result
Description
Internal helper to detect multifit output objects.
Usage
is_multifit_result(x)
Arguments
x |
Object to check. |
Value
Logical indicating if x is a multifit result.
Check if category name is a standard reference/negative value
Description
Determines whether a category name represents a standard reference or negative value that indicates absence. Used to suppress redundant category names when condensing binary variables.
Usage
is_reference_category(
category,
label = NULL,
norm_category = NULL,
norm_label = NULL
)
Arguments
category |
Character string with the category name. |
label |
Optional character string with the variable label. If provided, checks if category is "No [label]" or similar patterns. |
norm_category |
Optional pre-normalized category (lowercase, trimmed). If provided, skips normalization for performance. |
norm_label |
Optional pre-normalized label (lowercase, trimmed). If provided, skips normalization for performance. |
Value
Logical indicating whether category is a reference/negative value.
Check if outcome is a Surv() expression
Description
Tests whether an outcome specification string represents a survival outcome by checking for the Surv() function pattern. Used to route model fitting to Cox proportional hazards methods.
Usage
is_surv_outcome(outcome)
Arguments
outcome |
Character string of the outcome specification. |
Value
Logical TRUE if outcome starts with "Surv(", FALSE otherwise.
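A prefix test of this kind can be written with base R's startsWith(). The function name looks_like_surv is a hypothetical stand-in used only for this sketch.

```r
# Hypothetical stand-in for the check described above
looks_like_surv <- function(outcome) {
  startsWith(outcome, "Surv(")
}

looks_like_surv("Surv(os_months, os_status)")  # TRUE
looks_like_surv("os_status")                   # FALSE
```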
Check if object is a uniscreen result
Description
Internal helper to detect uniscreen output objects.
Usage
is_uniscreen_result(x)
Arguments
x |
Object to check. |
Value
Logical indicating if x is a uniscreen result.
Create Forest Plot for Linear Models
Description
Generates a publication-ready forest plot that combines a formatted data table
with a graphical representation of regression coefficients from a linear model.
The plot integrates variable names, group levels, sample sizes, coefficients
with confidence intervals, p-values, and model diagnostics (R^2,
F-statistic, AIC) in a single comprehensive visualization designed for
manuscripts and presentations.
Usage
lmforest(
x,
data = NULL,
title = "Linear Model",
effect_label = "Coefficient",
digits = 2,
p_digits = 3,
conf_level = 0.95,
font_size = 1,
annot_size = 3.88,
header_size = 5.82,
title_size = 23.28,
plot_width = NULL,
plot_height = NULL,
table_width = 0.6,
show_n = TRUE,
indent_groups = FALSE,
condense_table = FALSE,
bold_variables = FALSE,
center_padding = 4,
zebra_stripes = TRUE,
ref_label = "reference",
labels = NULL,
units = "in",
color = "#5A8F5A",
qc_footer = TRUE,
number_format = NULL
)
Arguments
x |
Either a fitted linear model object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for coefficients and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6. |
show_n |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories of
factor variables. Default is |
labels |
Named character vector providing custom display labels for
variables. Example: |
units |
Character string specifying units for plot dimensions:
|
color |
Character string specifying the color for coefficient point
estimates in the forest plot. Default is |
qc_footer |
Logical. If |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Details
Linear Model-Specific Features:
The linear model forest plot differs from logistic and Cox plots in several ways:
-
Coefficients: Raw regression coefficients shown (not exponentiated)
-
Reference line: At coefficient = 0 (not at 1)
-
Linear scale: Forest plot uses linear scale (not log scale)
-
No events column: Only sample sizes shown (no event counts)
-
R^2 statistics: Model fit assessed by R^2 and adjusted R^2
-
F-test: Overall model significance from F-statistic
Plot Components:
-
Title: Centered at top
-
Data Table (left): Contains:
Variable: Predictor names
Group: Factor levels (if applicable)
n: Sample sizes by group
Coefficient (95% CI); p-value: Raw coefficients with CIs and p-values
-
Forest Plot (right):
Point estimates (squares sized by sample size)
95% confidence intervals (error bars)
Reference line at coefficient = 0
Linear scale
-
Model Statistics (footer):
Observations analyzed (with percentage of total data)
R^2 and adjusted R^2
F-statistic with degrees of freedom and p-value
AIC
Interpreting Coefficients:
Linear regression coefficients represent the change in the outcome variable for a one-unit change in the predictor:
-
Continuous predictors: Coefficient = change in Y per unit of X
-
Binary predictors: Coefficient = difference in Y between groups
-
Factor predictors: Coefficients = differences from reference category
-
Sign matters: Positive = increase in Y, Negative = decrease in Y
-
Zero crossing: CI crossing zero suggests no significant effect
Example: If the coefficient for "age" is 0.50 when predicting BMI,
the expected BMI is 0.50 kg/m^2 higher for each additional year of age,
holding the other predictors in the model constant.
Model Fit Statistics:
The footer displays key diagnostics:
-
R^2: Proportion of variance explained (0 to 1)
0.0-0.3: Weak explanatory power
0.3-0.5: Moderate
0.5-0.7: Good
> 0.7: Strong (rare in social/biological sciences)
-
Adjusted R^2: R^2 penalized for number of predictors
Always ≤ R^2
Preferred for model comparison
Accounts for model complexity
-
F-statistic: Tests null hypothesis that all coefficients = 0
Degrees of freedom: df1 = # predictors, df2 = # observations - # predictors - 1
Significant p-value indicates model explains variance better than intercept-only
-
AIC: For model comparison (lower is better)
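The footer statistics can be reproduced from any lm fit with base R. The sketch below uses the built-in mtcars data (an assumption for illustration) and recomputes adjusted R^2 and the F-test degrees of freedom from the formulas given above.

```r
# Base R sketch; model and data are assumptions chosen for illustration.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nobs(fit)                  # observations used in the fit
p <- length(coef(fit)) - 1      # predictors, excluding the intercept

r2 <- s$r.squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalized for p predictors

fstat <- s$fstatistic           # named vector: value, numdf, dendf
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)

c(r2 = r2, adj_r2 = adj_r2, df1 = fstat[["numdf"]], df2 = fstat[["dendf"]])
AIC(fit)
```

Here df1 equals the number of predictors (2) and df2 equals n - p - 1 (29), matching the definitions above.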
Assumptions:
Linear regression assumes:
Linearity of relationships
Independence of observations
Homoscedasticity (constant variance)
Normality of residuals
No multicollinearity
Check assumptions using:
-
plot(model) for diagnostic plots
-
car::vif(model) for multicollinearity
-
lmtest::bptest(model) for heteroscedasticity
-
shapiro.test(residuals(model)) for normality
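A minimal sketch of these checks on a model fitted to the built-in mtcars data (an assumption for illustration). The car and lmtest calls run only if those packages are installed, since neither is a base R package.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Normality of residuals (base R)
sw <- shapiro.test(residuals(fit))
sw$p.value

# Optional checks, run only if the packages are installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(fit))          # variance inflation factors
}
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(fit))    # Breusch-Pagan test for heteroscedasticity
}

# Residual diagnostic plots, written to a temporary file instead of the screen
diag_file <- file.path(tempdir(), "lm_diagnostics.pdf")
pdf(diag_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```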
Reference Categories:
For factor variables:
First level is the reference (coefficient = 0)
Other levels show difference from reference
Reference displayed with ref_label
Relevel factors before modeling if needed: factor(x, levels = c("desired_ref", ...))
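As an alternative to rebuilding the factor, base R's relevel() moves one level to the front and leaves the others in their existing order. The sketch below uses the built-in iris data as an assumed example.

```r
# iris is a built-in data set; Species has levels setosa, versicolor, virginica
dat <- iris
levels(dat$Species)[1]                 # "setosa" (default reference)

dat$Species <- relevel(dat$Species, ref = "virginica")
levels(dat$Species)[1]                 # "virginica"

# Coefficients are now differences from the new reference level
fit <- lm(Sepal.Length ~ Species, data = dat)
names(coef(fit))
# "(Intercept)" "Speciessetosa" "Speciesversicolor"
```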
Sample Size Reporting:
The "n" column shows:
For continuous variables: Total observations with non-missing data
For factor variables: Number of observations in each category
Footer shows total observations analyzed and percentage of original data (accounting for missing values)
Value
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
See Also
autoforest for automatic model detection,
glmforest for logistic/GLM forest plots,
coxforest for Cox model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
lm for fitting linear models,
fit for regression modeling
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
multiforest(),
uniforest()
Examples
data(clintrial)
data(clintrial_labels)
# Create example model
model1 <- lm(bmi ~ age + sex + smoking, data = clintrial)
# Example 1: Basic linear model forest plot
p <- lmforest(model1, data = clintrial)
old_width <- options(width = 180)
# Example 2: With custom labels and title
plot2 <- lmforest(
x = model1,
data = clintrial,
title = "Predictors of Body Mass Index",
effect_label = "Change in BMI (kg/m^2)",
labels = clintrial_labels
)
# Example 3: Comprehensive model with indented layout
model3 <- lm(
bmi ~ age + sex + smoking + hypertension + diabetes + creatinine,
data = clintrial
)
plot3 <- lmforest(
x = model3,
data = clintrial,
labels = clintrial_labels,
indent_groups = TRUE,
zebra_stripes = TRUE
)
# Example 4: Condensed layout
plot4 <- lmforest(
x = model3,
data = clintrial,
condense_table = TRUE,
labels = clintrial_labels
)
# Example 5: Different outcome (hemoglobin)
model5 <- lm(
hemoglobin ~ age + sex + bmi + smoking + creatinine,
data = clintrial
)
plot5 <- lmforest(
x = model5,
data = clintrial,
title = "Predictors of Baseline Hemoglobin",
effect_label = "Change in Hemoglobin (g/dL)",
labels = clintrial_labels
)
# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "linear_forest.pdf"),
plot5, width = dims$width, height = dims$height)
options(old_width)
Convert Model to Data Table
Description
Extracts coefficients, confidence intervals, and comprehensive model statistics from fitted regression models and converts them to a standardized data.table format suitable for further analysis or publication. This is a core utility function frequently used internally by other summata regression functions, although it can be used as a standalone function as well.
Usage
m2dt(
data,
model,
conf_level = 0.95,
keep_qc_stats = TRUE,
include_intercept = TRUE,
terms_to_exclude = NULL,
reference_rows = TRUE,
reference_label = "reference",
skip_counts = FALSE
)
Arguments
data |
Data frame or data.table containing the dataset used to fit the model. Required for computing group-level sample sizes and event counts. |
model |
Fitted model object. Supported classes include:
|
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI). |
keep_qc_stats |
Logical. If |
include_intercept |
Logical. If |
terms_to_exclude |
Character vector of term names to exclude from output.
Useful for removing specific unwanted parameters (e.g., nuisance variables,
spline terms). Default is |
reference_rows |
Logical. If |
reference_label |
Character string used to label reference category rows
in the output. Appears in the |
skip_counts |
Logical. If |
Details
This function is the core extraction utility used by fit() and other
regression functions. It handles the complexities of different model classes
and provides a consistent output format suitable for tables and forest plots.
Model Type Detection: The function automatically detects model type and applies appropriate:
Effect measure naming (OR, HR, RR, Coefficient)
Confidence interval calculation method
Event counting for binary/survival outcomes
Mixed Effects Models: For lme4 models (glmer, lmer), the function extracts fixed effects only. Random effects variance components are not included in the output table, as they represent clustering structure rather than predictor effects.
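The detection step can be pictured as a lookup from model class and family to an effect label. The sketch below is an illustration in base R, not summata's internal code: effect_label() is a hypothetical helper, and the only rules it applies are the ones listed in the Value section (OR for logistic, HR for Cox, RR for Poisson, Coefficient otherwise).

```r
# Hypothetical helper (not part of summata): map a fitted model to the
# effect-measure label described in the Value section.
effect_label <- function(model) {
  if (inherits(model, "coxph")) return("HR")
  if (inherits(model, "glm")) {
    fam <- stats::family(model)$family
    if (fam == "binomial") return("OR")
    if (fam == "poisson") return("RR")
  }
  "Coefficient"
}

# Built-in mtcars data, used purely for illustration
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
pois_fit <- glm(carb ~ wt, data = mtcars, family = poisson)
lin_fit <- lm(mpg ~ wt, data = mtcars)

labels_found <- c(
  logistic = effect_label(logit_fit),
  poisson = effect_label(pois_fit),
  linear = effect_label(lin_fit)
)
print(labels_found)
```

The glm check comes before the fallback because glm objects also inherit from lm; a Gaussian GLM therefore ends up labeled Coefficient, as in the list above.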
Value
A data.table containing extracted model information with the
following standard columns:
- model_scope
Character. Either "Univariable" (unadjusted model with single predictor) or "Multivariable" (adjusted model with multiple predictors)
- model_type
Character. Type of regression (e.g., "Logistic", "Linear", "Cox PH", "Poisson", etc.)
- variable
Character. Variable name (for factor variables, the base variable name without the level)
- group
Character. Group/level name for factor variables; empty string for continuous variables
- n
Integer. Total sample size used in the model
- n_group
Integer. Sample size for this specific variable level (factor variables only)
- events
Integer. Total number of events in the model (for survival and logistic models)
- events_group
Integer. Number of events for this specific variable level (for survival and logistic models with factor variables)
- coefficient
Numeric. Raw regression coefficient (log odds, log hazard, etc.)
- se
Numeric. Standard error of the coefficient
- OR/HR/RR/Coefficient
Numeric. Effect estimate - column name depends on model type:
OR for logistic regression (odds ratio)
HR for Cox models (hazard ratio)
RR for Poisson regression (rate/risk ratio)
Coefficient for linear models or other GLMs
- ci_lower
Numeric. Lower bound of confidence interval for effect estimate
- ci_upper
Numeric. Upper bound of confidence interval for effect estimate
- statistic
Numeric. Test statistic (z-value for GLM/Cox, t-value for LM)
- p_value
Numeric. p-value for coefficient test
- sig
Character. Significance markers: *** (p < 0.001), ** (p < 0.01), * (p < 0.05), . (p < 0.10)
- sig_binary
Logical. Binary indicator: TRUE if p < 0.05, FALSE otherwise
- reference
Character. Contains reference_label for reference category rows when reference_rows = TRUE, empty string otherwise
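To show how the numeric columns relate to one another, the following sketch rebuilds a few of them by hand for a logistic model in base R. It assumes Wald intervals on the log-odds scale; the interval method m2dt() actually uses is not stated here and may differ. The data frame layout is illustrative and is not the function's real output.

```r
# Logistic model on built-in data (illustrative only)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
conf_level <- 0.95

est <- coef(summary(fit))   # Estimate, Std. Error, z value, Pr(>|z|)
z_crit <- qnorm(1 - (1 - conf_level) / 2)

out <- data.frame(
  term = rownames(est),
  coefficient = est[, "Estimate"],                      # log odds
  se = est[, "Std. Error"],
  OR = exp(est[, "Estimate"]),                          # exponentiated effect
  ci_lower = exp(est[, "Estimate"] - z_crit * est[, "Std. Error"]),
  ci_upper = exp(est[, "Estimate"] + z_crit * est[, "Std. Error"]),
  statistic = est[, "z value"],
  p_value = est[, "Pr(>|z|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Significance markers using the thresholds listed above
out$sig <- as.character(cut(
  out$p_value,
  breaks = c(-Inf, 0.001, 0.01, 0.05, 0.10, Inf),
  labels = c("***", "**", "*", ".", ""),
  right = FALSE
))
out$sig_binary <- out$p_value < 0.05
print(out)
```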
See Also
fit for the main regression interface,
glmforest, coxforest, lmforest for
forest plot visualization
Examples
# Load example data
data(clintrial)
# Example 1: Extract from logistic regression
glm_model <- glm(os_status ~ age + sex + treatment,
data = clintrial, family = binomial)
glm_result <- m2dt(clintrial, glm_model)
glm_result
# Example 2: Extract from linear model
lm_model <- lm(los_days ~ age + sex + surgery, data = clintrial)
lm_result <- m2dt(clintrial, lm_model)
lm_result
# Example 3: Cox proportional hazards model
library(survival)
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + stage,
data = clintrial)
cox_result <- m2dt(clintrial, cox_model)
cox_result
# Example 4: Exclude intercept for cleaner tables
clean_result <- m2dt(clintrial, glm_model, include_intercept = FALSE)
clean_result
# Example 5: Change confidence level
result_90ci <- m2dt(clintrial, glm_model, conf_level = 0.90)
result_90ci
Multivariate Regression Analysis
Description
Performs regression analyses of a single predictor (exposure) across multiple outcomes. This function is designed for studies where a single exposure variable is tested against multiple endpoints, such as complication screening, biomarker associations, or phenome-wide association studies. Returns publication-ready formatted results with optional covariate adjustment. Supports interactions, mixed-effects models, stratification, and clustered standard errors.
Usage
multifit(
data,
outcomes,
predictor,
covariates = NULL,
interactions = NULL,
random = NULL,
strata = NULL,
cluster = NULL,
model_type = "glm",
family = "binomial",
columns = "adjusted",
p_threshold = 1,
conf_level = 0.95,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
predictor_label = NULL,
include_predictor = TRUE,
keep_models = FALSE,
exponentiate = NULL,
parallel = TRUE,
n_cores = NULL,
number_format = NULL,
verbose = NULL,
...
)
Arguments
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcomes |
Character vector of outcome variable names to analyze. Each
outcome is tested in its own model with the predictor. For time-to-event
analysis, use |
predictor |
Character string specifying the predictor (exposure) variable name. This variable is tested against each outcome. Can be continuous or categorical (factor). |
covariates |
Optional character vector of covariate variable names to
include in adjusted models. When specified, models are fit as
|
interactions |
Optional character vector of interaction terms to include
in adjusted models, using colon notation (e.g., |
random |
Optional character string specifying random effects formula for
mixed effects models (e.g., |
strata |
Optional character string naming the stratification variable for
Cox or conditional logistic models. Creates separate baseline hazards for
each stratum. Default is |
cluster |
Optional character string naming the clustering variable for
Cox models. Computes robust clustered standard errors. Default is |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
columns |
Character string specifying which result columns to display when
both unadjusted and adjusted models are fit (i.e., when
Ignored when |
p_threshold |
Numeric value between 0 and 1 specifying a p-value threshold for filtering results. Only outcomes with p-value less than or equal to the threshold are included in the output. Default is 1 (no filtering; all outcomes returned). |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Can include labels for outcomes, predictors, and
covariates. Names should match variable names, values are the display labels.
Labels are applied to: (1) outcome names in the Outcome column, (2) predictor
variable name when displayed, and (3) variable names in formatted interaction
terms. Variables not in |
predictor_label |
Optional character string providing a custom display
label for the predictor variable. Takes precedence over |
include_predictor |
Logical. If |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for
parallel processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions. |
Details
Analysis Approach:
The function implements a multivariate (multi-outcome) screening workflow that inverts the typical regression paradigm:
For each outcome in outcomes, fits a separate model with the predictor as the main exposure
If covariates specified, fits adjusted model: outcome ~ predictor + covariates + interactions
Extracts only the predictor effect(s) from each model, ignoring covariate coefficients
Combines results into a single table for comparison across outcomes
Optionally filters by p-value threshold
This is conceptually opposite to uniscreen(), which tests multiple
predictors against a single outcome. Use multifit() when you have one
exposure of interest and want to screen across multiple endpoints.
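The workflow above can be sketched in base R as a loop over outcomes that keeps only the predictor row from each model. This is a simplified illustration on the built-in mtcars data, not the package's implementation; the column names in the result are chosen for the example.

```r
outcomes <- c("am", "vs")   # two binary outcomes in mtcars
predictor <- "wt"

rows <- lapply(outcomes, function(y) {
  f <- reformulate(predictor, response = y)        # e.g. am ~ wt
  fit <- glm(f, data = mtcars, family = binomial)
  est <- coef(summary(fit))[predictor, ]           # predictor row only
  data.frame(
    outcome = y,
    predictor = predictor,
    OR = exp(est[["Estimate"]]),
    p_value = est[["Pr(>|z|)"]],
    stringsAsFactors = FALSE
  )
})

# One row per outcome, combined into a single table
screen <- do.call(rbind, rows)
print(screen)
```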
When to Use Multivariate Regression Analysis:
-
Complication screening: Test one exposure (e.g., operative time, BMI, biomarker level) against multiple postoperative complications
-
Treatment effects: Test one treatment against multiple efficacy and safety endpoints simultaneously
-
Biomarker studies: Test one biomarker against multiple clinical outcomes to understand its prognostic value
-
Phenome-wide association studies (PheWAS): Test genetic variants or exposures against many phenotypes
-
Risk factor profiling: Understand how one risk factor relates to a spectrum of outcomes
Handling Categorical Predictors:
When the predictor is a factor variable with multiple levels:
Each non-reference level gets its own row for each outcome
Reference category is determined by factor level ordering
The Predictor column shows "Variable (Level)" format (e.g., "Treatment (Drug A)", "Treatment (Drug B)")
For binary variables with affirmative non-reference levels (Yes, 1, True, Present, Positive, +), shows just "Variable" (e.g., "Diabetes" instead of "Diabetes (Yes)")
Effect estimates compare each level to the reference
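Because the reference category follows factor level ordering, changing the comparison group is done on the data before fitting. A minimal sketch with base R's relevel() and the built-in iris data (unrelated to clintrial) shows how the coefficient names change:

```r
# Species levels are setosa, versicolor, virginica; setosa is the reference
fit_default <- lm(Sepal.Length ~ Species, data = iris)
print(names(coef(fit_default)))

# Make virginica the reference instead
iris2 <- iris
iris2$Species <- relevel(iris2$Species, ref = "virginica")
fit_releveled <- lm(Sepal.Length ~ Species, data = iris2)
print(names(coef(fit_releveled)))
```

The same approach should apply to a predictor passed to multifit(), since the function is documented as taking the reference from the factor's level order.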
Adjusted vs. Unadjusted Results:
When covariates is specified, the function fits both models but only
extracts predictor effects:
-
columns = "adjusted": Reports only covariate-adjusted effects. Column labeled "aOR/aHR", etc.
-
columns = "unadjusted": Reports only crude effects. Column labeled "OR/HR", etc.
-
columns = "both": Reports both side-by-side. Useful for identifying confounding (large change in effect) or independent effects (similar estimates)
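The crude-versus-adjusted comparison can be reproduced by hand to see what a "large change in effect" looks like. The sketch below uses base R and mtcars; the percent-change column is an illustrative diagnostic, not something multifit() is documented to return.

```r
# Crude and covariate-adjusted logistic models for the same predictor (wt)
crude <- glm(am ~ wt, data = mtcars, family = binomial)
adjusted <- glm(am ~ wt + hp, data = mtcars, family = binomial)

comparison <- data.frame(
  model = c("unadjusted", "adjusted"),
  OR = c(exp(coef(crude)[["wt"]]), exp(coef(adjusted)[["wt"]])),
  stringsAsFactors = FALSE
)

# Relative change of each estimate against the crude one
comparison$pct_change <- 100 * (comparison$OR / comparison$OR[1] - 1)
print(comparison)
```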
Interaction Terms:
When interactions includes terms involving the predictor:
Main effect of predictor is always reported
Interaction effects are extracted and displayed with formatted names
Format: Variable (Level) × Variable (Level) using multiplication sign notation
Useful for testing effect modification (e.g., does treatment effect differ by sex?)
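A rough picture of how colon-notation terms enter a model formula and come back out as coefficient names is given below. It uses base R's reformulate() on mtcars; the relabeling with a multiplication sign is a simplified stand-in for the package's formatting, which also adds variable labels and factor levels.

```r
predictor <- "am"
covariates <- c("wt", "hp")
interactions <- c("am:wt")

# Builds mpg ~ am + wt + hp + am:wt
f <- reformulate(c(predictor, covariates, interactions), response = "mpg")
fit <- lm(f, data = mtcars)

# Interaction coefficients are the ones whose names contain a colon
int_terms <- grep(":", names(coef(fit)), value = TRUE)
pretty_terms <- gsub(":", " \u00d7 ", int_terms, fixed = TRUE)
print(pretty_terms)
```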
Mixed-Effects Models:
For clustered or hierarchical data (e.g., patients within hospitals):
Use model_type = "glmer" with random = "(1|cluster)" for random intercept models
Nested random effects: random = "(1|site/patient)"
Crossed random effects: random = "(1|site) + (1|doctor)"
For survival outcomes, use model_type = "coxme"
Stratification and Clustering (Cox models):
For Cox proportional hazards models:
-
strata: Creates separate baseline hazards for each stratum level. Use when hazards are non-proportional across strata but stratum effects do not need to be estimated
-
cluster: Computes robust (sandwich) standard errors accounting for within-cluster correlation. Alternative to mixed effects when only robust SEs are needed
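The difference between the two options is easiest to see in the survival package itself. The sketch below uses survival's strata() and cluster() formula terms with its bundled lung data; how multifit() passes strata and cluster to coxph() internally is not shown in this manual, so treat this as an analogy rather than the package's code.

```r
library(survival)

# Stratified: separate baseline hazard per sex, no coefficient for sex
fit_strata <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# Clustered: sex is estimated; robust (sandwich) SEs grouped by institution
fit_cluster <- coxph(Surv(time, status) ~ age + sex + cluster(inst),
                     data = lung)

print(names(coef(fit_strata)))
print(colnames(summary(fit_cluster)$coefficients))
```

In the stratified fit only age has a coefficient, while the clustered fit reports an additional robust standard error column alongside the model-based one.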
Filtering based on p-value:
The p_threshold parameter filters results after fitting all models:
Only outcomes with p less than or equal to the threshold are retained in output
For factor predictors, outcome is kept if any level is significant
Useful for focusing on significant associations in exploratory analyses
Default is 1 (no filtering) - recommended for confirmatory analyses
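The "kept if any level is significant" rule for factor predictors can be sketched as a filter on the smallest p-value per outcome. The table below contains made-up p-values for illustration only, and the column names are assumptions rather than the layout of multifit()'s raw output.

```r
# Toy results: outcomes A and B have two predictor levels, C has one
results <- data.frame(
  outcome = c("A", "A", "B", "B", "C"),
  level = c("Drug A", "Drug B", "Drug A", "Drug B", "Drug A"),
  p_value = c(0.30, 0.02, 0.40, 0.60, 0.05),
  stringsAsFactors = FALSE
)
p_threshold <- 0.05

# An outcome is retained when its smallest p-value is <= the threshold
min_p <- sapply(split(results$p_value, results$outcome), min)
keep <- names(min_p)[min_p <= p_threshold]
filtered <- results[results$outcome %in% keep, ]
print(filtered)
```

Outcome A keeps both of its rows even though only one level passes, and outcome C is retained because the comparison is "less than or equal to".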
Outcome Homogeneity:
All outcomes in a single multifit() call should be of the same type
(all binary, all continuous, or all survival). Mixing outcome types produces
tables with incompatible effect measures (e.g., odds ratios alongside regression
coefficients), which can mislead readers. The function validates outcome
compatibility and issues a warning when mixed types are detected.
For analyses involving multiple outcome types, run separate multifit()
calls for each type:
# Binary outcomes
binary_results <- multifit(data, outcomes = c("death", "readmission"),
predictor = "treatment", model_type = "glm")
# Continuous outcomes
continuous_results <- multifit(data, outcomes = c("los_days", "cost"),
predictor = "treatment", model_type = "lm")
Effect Measures by Model Type:
-
Logistic (model_type = "glm", family = "binomial"): Odds ratios (OR/aOR)
-
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
-
Poisson (model_type = "glm", family = "poisson"): Rate ratios (RR/aRR)
-
Linear (model_type = "lm"): Coefficient estimates
-
Mixed effects: Same as fixed-effects counterparts
Memory and Performance:
-
parallel = TRUE (default) uses multiple cores for faster fitting
-
keep_models = FALSE (default) discards model objects to save memory
-
For many outcomes, parallel processing provides substantial speedup
-
Set keep_models = TRUE only when you need model diagnostics
Value
A data.table with S3 class "multifit_result" containing formatted
multivariate regression results. The table structure includes:
- Outcome
Character. Outcome variable name or custom label
- Predictor
Character. For factor predictors: formatted as "Variable (Level)" showing the level being compared to reference. For binary variables where the non-reference level is an affirmative value (Yes, 1, True, Present, Positive, +), shows just "Variable". For continuous predictors: the variable name. For interactions: the formatted interaction term (e.g., "Treatment (Drug A) × Sex (Male)")
- n
Integer. Sample size used in the model (if show_n = TRUE)
- Events
Integer. Number of events (if show_events = TRUE)
- OR/HR/RR/Coefficient (95% CI)
Character. Unadjusted effect estimate with CI (if columns = "unadjusted" or "both")
- aOR/aHR/aRR/Adj. Coefficient (95% CI)
Character. Adjusted effect estimate with CI (if columns = "adjusted" or "both")
- Uni p / Multi p / p-value
Character. Formatted p-value(s). Column names depend on columns setting
The returned object includes the following attributes accessible via attr():
- raw_data
data.table. Unformatted numeric results with separate columns for effect estimates, standard errors, confidence intervals, and p-values. Suitable for custom analysis or visualization
- models
list (if keep_models = TRUE). Named list of fitted model objects, with outcome names as list names. Each element contains $unadjusted and/or $adjusted models depending on settings
- predictor
Character. The predictor variable name
- outcomes
Character vector. The outcome variable names
- covariates
Character vector or NULL. The covariate variable names
- interactions
Character vector or NULL. The interaction terms
- random
Character or NULL. The random effects formula
- strata
Character or NULL. The stratification variable
- cluster
Character or NULL. The clustering variable
- model_type
Character. The regression model type used
- columns
Character. Which columns were displayed
- analysis_type
Character. "multi_outcome" to identify analysis type
- significant
Character vector. Names of outcomes with p < 0.05 for the predictor (uses adjusted p-values when available)
See Also
uniscreen for screening multiple predictors against one outcome,
multiforest for creating forest plots from multifit results,
fit for single-outcome regression with full coefficient output,
fullfit for complete univariable-to-multivariable workflow
Other regression functions:
compfit(),
fit(),
fullfit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Examples
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Basic multivariate analysis (unadjusted)
# Test treatment effect on multiple binary outcomes
result1 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
labels = clintrial_labels,
parallel = FALSE
)
print(result1)
# Shows odds ratios comparing Drug A and Drug B to Control
# Example 2: Adjusted analysis with covariates
# Adjust for age, sex, and disease stage
result2 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
labels = clintrial_labels,
parallel = FALSE
)
print(result2)
# Shows adjusted odds ratios (aOR)
# Example 3: Compare unadjusted and adjusted results
result3 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
columns = "both",
labels = clintrial_labels,
parallel = FALSE
)
print(result3)
# Useful for identifying confounding effects
# Example 4: Continuous predictor across outcomes
# Test age effect on multiple outcomes
result4 <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "age",
covariates = c("sex", "treatment", "stage"),
labels = clintrial_labels,
parallel = FALSE
)
print(result4)
# One row per outcome for continuous predictor
# Example 5: Cox regression for survival outcomes
library(survival)
cox_result <- multifit(
data = clintrial,
outcomes = c("Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_result)
# Returns hazard ratios (HR/aHR)
# Example 6: Cox with stratification by site
cox_strat <- multifit(
data = clintrial,
outcomes = c("Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex"),
strata = "site",
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_strat)
# Example 7: Cox with clustered standard errors
cox_cluster <- multifit(
data = clintrial,
outcomes = c("Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
cluster = "site",
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_cluster)
# Example 8: Interaction between predictor and covariate
# Test if treatment effect differs by sex
result_int <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
interactions = c("treatment:sex"),
labels = clintrial_labels,
parallel = FALSE
)
print(result_int)
# Shows main effects and interaction terms with × notation
# Example 9: Linear model for continuous outcomes
linear_result <- multifit(
data = clintrial,
outcomes = c("los_days", "biomarker_x"),
predictor = "treatment",
covariates = c("age", "sex"),
model_type = "lm",
labels = clintrial_labels,
parallel = FALSE
)
print(linear_result)
# Returns coefficient estimates, not ratios
# Example 10: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_result <- multifit(
data = clintrial,
outcomes = c("fu_count"),
predictor = "treatment",
covariates = c("age", "stage"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels,
parallel = FALSE
)
print(poisson_result)
# Returns rate ratios (RR)
# For overdispersed counts (ae_count), use model_type = "negbin" instead
# Example 11: Filter to significant results only
sig_results <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "stage",
p_threshold = 0.05,
labels = clintrial_labels,
parallel = FALSE
)
print(sig_results)
# Only outcomes with significant associations shown
# Example 12: Custom outcome labels
result_labeled <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
labels = c(
surgery = "Surgical Resection",
pfs_status = "Disease Progression",
os_status = "Death",
treatment = "Treatment Group"
),
parallel = FALSE
)
print(result_labeled)
# Example 13: Keep models for diagnostics
result_models <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
covariates = c("age", "sex"),
keep_models = TRUE,
parallel = FALSE
)
# Access stored models
models <- attr(result_models, "models")
names(models)
# Get adjusted model for surgery outcome
surgery_model <- models$surgery$adjusted
summary(surgery_model)
# Example 14: Access raw numeric data
result <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "age",
parallel = FALSE
)
# Get unformatted results for custom analysis
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains exp_coef, ci_lower, ci_upper, p_value, etc.
# Example 15: Hide sample size and event columns
result_minimal <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "treatment",
show_n = FALSE,
show_events = FALSE,
parallel = FALSE
)
print(result_minimal)
# Example 16: Customize decimal places
result_digits <- multifit(
data = clintrial,
outcomes = c("surgery", "os_status"),
predictor = "age",
digits = 3,
p_digits = 4,
parallel = FALSE
)
print(result_digits)
# Example 17: Force coefficient display (no exponentiation)
result_coef <- multifit(
data = clintrial,
outcomes = c("surgery"),
predictor = "age",
exponentiate = FALSE,
parallel = FALSE
)
print(result_coef)
# Example 18: Complete publication workflow
final_table <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage", "grade"),
columns = "both",
labels = clintrial_labels,
digits = 2,
p_digits = 3,
parallel = FALSE
)
print(final_table)
# Example 19: Gamma regression for positive continuous outcomes
gamma_result <- multifit(
data = clintrial,
outcomes = c("los_days", "recovery_days"),
predictor = "treatment",
covariates = c("age", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(gamma_result)
# Returns multiplicative effects on positive continuous data
# Example 20: Quasipoisson for overdispersed counts
quasi_result <- multifit(
data = clintrial,
outcomes = c("ae_count"),
predictor = "treatment",
covariates = c("age", "diabetes"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels,
parallel = FALSE
)
print(quasi_result)
# Adjusts standard errors for overdispersion
# Example 21: Generalized linear mixed effects (GLMER)
# Test treatment across outcomes with site clustering
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_result <- suppressWarnings(multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex"),
random = "(1|site)",
model_type = "glmer",
family = "binomial",
labels = clintrial_labels,
parallel = FALSE
))
print(glmer_result)
}
# Example 22: Cox mixed effects with random site effects
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_result <- multifit(
data = clintrial,
outcomes = c("Surv(pfs_months, pfs_status)",
"Surv(os_months, os_status)"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
random = "(1|site)",
model_type = "coxme",
labels = clintrial_labels,
parallel = FALSE
)
print(coxme_result)
}
# Example 23: Multiple interactions across outcomes
multi_int <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
interactions = c("treatment:stage", "treatment:sex"),
labels = clintrial_labels,
parallel = FALSE
)
print(multi_int)
# Shows how treatment effects vary by stage and sex across outcomes
Create Forest Plot for Multivariate Regression
Description
Generates a publication-ready forest plot from a multifit() output
object. The plot displays effect estimates (OR, HR, RR, or coefficients) with
confidence intervals across multiple outcomes, organized by outcome with the
predictor levels shown for each.
Usage
multiforest(
x,
title = "Multivariate Analysis",
effect_label = NULL,
column = "adjusted",
digits = 2,
p_digits = 3,
conf_level = 0.95,
font_size = 1,
annot_size = 3.88,
header_size = 5.82,
title_size = 23.28,
plot_width = NULL,
plot_height = NULL,
table_width = 0.6,
show_n = TRUE,
show_events = NULL,
show_predictor = NULL,
covariates_footer = TRUE,
indent_predictor = FALSE,
bold_variables = TRUE,
center_padding = 4,
zebra_stripes = TRUE,
color = NULL,
null_line = NULL,
log_scale = NULL,
labels = NULL,
units = "in",
number_format = NULL
)
Arguments
x |
Multifit result object (data.table with class attributes from
|
title |
Character string specifying the plot title. Default is
|
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
column |
Character string specifying which results to plot when
|
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
show_predictor |
Logical. If |
covariates_footer |
Logical. If |
indent_predictor |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
color |
Character string specifying the color for point estimates in
the forest plot. Default is |
null_line |
Numeric value for the reference line position. Default is
|
log_scale |
Logical. If |
labels |
Named character vector providing custom display labels for
outcomes and variables. Applied to outcome names in the plot.
Default is |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Details
Plot Layout:
The forest plot is organized with outcomes as grouping headers and predictor levels (or interaction terms) as rows within each outcome. This provides a clear visual comparison of how a single predictor affects multiple outcomes.
-
Title: Centered at top
-
Data Table (left): Contains:
Outcome column (or grouped headers)
Predictor/Group column
n: Sample sizes (optional)
Events: Event counts (optional, for applicable models)
Effect (95% CI); p-value
-
Forest Plot (right):
Point estimates (squares)
95% confidence intervals
Reference line at null value (1 or 0)
Log scale for ratio measures
Data Source:
The function extracts effect estimates directly from the multifit output
object's raw_data attribute, which contains the numeric values
needed for plotting. This approach is efficient and ensures consistency
with the formatted table output.
Value
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
See Also
autoforest for automatic model detection,
multifit for multi-outcome regression analysis,
glmforest for single GLM forest plots,
coxforest for single Cox model forest plots,
lmforest for single linear model forest plots,
uniforest for univariable screening forest plots
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
lmforest(),
uniforest()
Examples
data(clintrial)
data(clintrial_labels)
library(survival)
# Create example multifit result
result <- multifit(
data = clintrial,
outcomes = c("surgery", "pfs_status", "os_status"),
predictor = "treatment",
covariates = c("age", "sex", "stage"),
parallel = FALSE
)
# Example 1: Basic multivariate forest plot
p <- multiforest(result)
old_width <- options(width = 180)
# Example 2: With custom title and labels
plot2 <- multiforest(
result,
title = "Treatment Effects Across Clinical Outcomes",
labels = clintrial_labels
)
# Example 3: Customize appearance
plot3 <- multiforest(
result,
color = "#E74C3C",
zebra_stripes = TRUE,
labels = clintrial_labels
)
# Example 4: Save with recommended dimensions
dims <- attr(plot3, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "multioutcome_forest.pdf"),
plot3, width = dims$width, height = dims$height)
options(old_width)
Normalize model type names
Description
Converts model class names to standardized model type strings.
Usage
normalize_model_type(model_type)
Arguments
model_type |
Character string of model type or class name. |
Value
Normalized character string (e.g., "lmerMod" becomes "lmer").
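As a rough illustration of the kind of mapping involved, the sketch below implements a small lookup. Only the "lmerMod" to "lmer" pair comes from this page; the "glmerMod" entry is an assumption based on lme4's class naming, and normalize_sketch() is a hypothetical stand-in, not the internal function.

```r
# Hypothetical stand-in for the internal helper
normalize_sketch <- function(model_type) {
  lookup <- c(
    lmerMod = "lmer",     # documented example
    glmerMod = "glmer"    # assumed by analogy with lme4 class names
  )
  if (model_type %in% names(lookup)) lookup[[model_type]] else model_type
}

print(normalize_sketch("lmerMod"))
print(normalize_sketch("coxph"))   # unknown names pass through unchanged
```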
Number Formatting Utilities
Description
Internal utilities for locale-aware number formatting across all summata output functions. Supports preset locales (US, European, SI/ISO, plain) and fully custom separator definitions.
Global Option
The default number format can be set once per session:
options(summata.number_format = "eu")
This avoids passing number_format to every function call.
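One way such a lookup could work is sketched below with base R's formatC(). The helper format_number() is hypothetical; only the option name summata.number_format and the "eu" preset appear on this page, while the "us" preset name and the separator pairs are assumptions made for the example.

```r
# Hypothetical helper: resolve a preset name or a custom two-element vector
# (thousands separator, decimal separator) and format a number accordingly.
format_number <- function(x, digits = 2,
                          number_format = getOption("summata.number_format", "us")) {
  presets <- list(
    us = c(",", "."),   # assumed preset name
    eu = c(".", ",")    # preset named in the text above
  )
  marks <- if (length(number_format) == 2) {
    number_format
  } else {
    presets[[number_format]]
  }
  formatC(x, format = "f", digits = digits,
          big.mark = marks[1], decimal.mark = marks[2])
}

print(format_number(1234.5, number_format = "us"))
print(format_number(1234.5, number_format = "eu"))
print(format_number(1234.5, number_format = c(" ", ",")))
```

Reading the default from getOption() is what lets a single options() call change every later call that does not pass number_format explicitly.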
Order comparison columns based on model type
Description
Reorders columns in the comparison table to follow a logical sequence appropriate for the model type.
Usage
order_comparison_columns(comparison, model_type)
Arguments
comparison |
Data.table with comparison metrics. |
model_type |
Character string indicating model type. |
Value
Data.table with reordered columns.
Parse term into variable and group
Description
Splits coefficient term names into base variable names and factor levels. For example, "sexMale" becomes variable="sex" and group="Male". Handles interaction terms and continuous variables appropriately.
Usage
parse_term(terms, xlevels = NULL, model = NULL)
Arguments
terms |
Character vector of coefficient term names. |
xlevels |
Named list of factor levels from the model. |
model |
Optional model object for extracting factor info from coxme models. |
Value
Data.table with 'variable' and 'group' columns.
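The splitting rule can be sketched with the xlevels component that lm and glm objects carry. The helper below is a simplified, hypothetical version: it matches each term against variable name plus level, returns a plain data frame rather than a data.table, and leaves interaction terms and coxme models aside.

```r
# Hypothetical simplified version of the term-splitting logic
parse_term_sketch <- function(terms, xlevels = NULL) {
  variable <- terms
  group <- rep("", length(terms))
  for (v in names(xlevels)) {
    for (lev in xlevels[[v]]) {
      hit <- terms == paste0(v, lev)   # e.g. "Species" + "virginica"
      variable[hit] <- v
      group[hit] <- lev
    }
  }
  data.frame(term = terms, variable = variable, group = group,
             stringsAsFactors = FALSE)
}

fit <- lm(Sepal.Length ~ Species + Sepal.Width, data = iris)
parsed <- parse_term_sketch(names(coef(fit)), fit$xlevels)
print(parsed)
```

Matching against the known levels, instead of guessing where the variable name ends, avoids mis-splitting variables whose names are prefixes of other names.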
Perform statistical tests for categorical variables
Description
Conducts chi-square or Fisher's exact tests for categorical variables across groups. Automatically selects Fisher's exact test for small expected frequencies.
Usage
perform_categorical_test(tab, test_type)
Arguments
tab |
Contingency table (matrix or table object). |
test_type |
Character string: "chisq" for chi-square, "fisher" for Fisher's exact, or "auto" for automatic selection. |
Value
Numeric p-value from the hypothesis test.
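A sketch of how automatic selection might work, using only base R's stats functions. The threshold (any expected cell count below 5) is a common rule of thumb assumed here; the package's actual criterion is not stated in this page, and the name categorical_p_sketch is hypothetical.

```r
# Sketch of "auto" selection: fall back to Fisher's exact test when any
# expected cell count is below 5 (a rule of thumb, assumed here).
categorical_p_sketch <- function(tab, test_type = "auto") {
  if (test_type == "auto") {
    expected <- suppressWarnings(stats::chisq.test(tab)$expected)
    test_type <- if (any(expected < 5)) "fisher" else "chisq"
  }
  if (test_type == "fisher") {
    stats::fisher.test(tab)$p.value
  } else {
    stats::chisq.test(tab)$p.value
  }
}

# All expected counts are 2 here, so Fisher's exact test is chosen
small <- matrix(c(3, 1, 1, 3), nrow = 2)
categorical_p_sketch(small)
```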
Perform statistical tests for continuous variables
Description
Conducts hypothesis tests comparing continuous variables across groups. Supports t-tests, Wilcoxon tests, ANOVA, and Kruskal-Wallis tests with automatic selection based on number of groups.
Usage
perform_continuous_test(var_vec, grp_vec, test_type, stat_type)
Arguments
var_vec |
Numeric vector of the continuous variable. |
grp_vec |
Factor or character vector defining groups. |
test_type |
Character string: "parametric", "nonparametric", or "auto". |
stat_type |
Character string indicating primary statistic being tested. |
Value
Numeric p-value from the hypothesis test.
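The selection by number of groups can be sketched with base R tests. This is an assumption-laden illustration: the name continuous_p_sketch is hypothetical, the stat_type argument is omitted, and "auto" is not handled because this page does not say how the package resolves it.

```r
# Sketch: choose the test from the number of groups and the requested family.
continuous_p_sketch <- function(var_vec, grp_vec, test_type = "parametric") {
  grp <- droplevels(factor(grp_vec))
  two_groups <- nlevels(grp) == 2
  if (test_type == "parametric") {
    if (two_groups) {
      stats::t.test(var_vec ~ grp)$p.value          # Welch t-test
    } else {
      summary(stats::aov(var_vec ~ grp))[[1]][["Pr(>F)"]][1]  # one-way ANOVA
    }
  } else {
    if (two_groups) {
      stats::wilcox.test(var_vec ~ grp)$p.value     # Wilcoxon rank-sum
    } else {
      stats::kruskal.test(var_vec ~ grp)$p.value    # Kruskal-Wallis
    }
  }
}

# Two groups: t-test; three groups: Kruskal-Wallis
continuous_p_sketch(mtcars$mpg, mtcars$am)
continuous_p_sketch(mtcars$mpg, mtcars$cyl, "nonparametric")
```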
Perform survival comparison test
Description
Performs statistical test comparing survival curves across groups.
Usage
perform_survival_test(surv_obj, group_var, test_type)
Arguments
surv_obj |
Survival object created by Surv(). |
group_var |
Vector of group assignments. |
test_type |
Character string specifying test type. |
Value
List with test statistic, p-value, and test type.
Print method showing scoring methodology
Description
Print method showing scoring methodology
Usage
## S3 method for class 'compfit_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Print method for fit results
Description
Displays a summary header with model scope (Univariable/Multivariable), model type, formula, sample size, and event count before printing the formatted results table.
Usage
## S3 method for class 'fit_result'
print(x, ...)
Arguments
x |
fit_result object. |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Print method for fullfit results
Description
Displays a summary header with outcome, model type, method, and number of multivariable predictors before printing the results table.
Usage
## S3 method for class 'fullfit_result'
print(x, ...)
Arguments
x |
fullfit_result object. |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
Print method for multifit results
Description
Print method for multifit results
Usage
## S3 method for class 'multifit_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.uniscreen_result(),
uniscreen()
Print method for survtable
Description
Print method for survtable
Usage
## S3 method for class 'survtable'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other descriptive functions:
desctable(),
survtable()
Print method for table2docx results
Description
Print method for table2docx results
Usage
## S3 method for class 'table2docx_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
Print method for table2pptx results
Description
Print method for table2pptx results
Usage
## S3 method for class 'table2pptx_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
Print method for table2rtf results
Description
Print method for table2rtf results
Usage
## S3 method for class 'table2rtf_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
Print method for uniscreen results
Description
Print method for uniscreen results
Usage
## S3 method for class 'uniscreen_result'
print(x, ...)
Arguments
x |
Object of class |
... |
Additional arguments passed to print methods. |
Value
Invisibly returns the input object x. Called for its
side effect of printing a formatted summary to the console.
See Also
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
uniscreen()
Process categorical variable
Description
Calculates frequency and percentage statistics for categorical variables, with optional grouping and chi-square/Fisher's exact testing. Handles factor levels, missing values, and custom labeling.
Usage
process_categorical(
data,
var,
var_label,
group_var,
stats,
na_include,
na_label,
test,
test_type,
total,
total_label,
na_percent,
marks = NULL,
...
)
Arguments
data |
Data.table containing the variable. |
var |
Character string naming the variable to process. |
var_label |
Character string label for display. |
group_var |
Optional character string naming the grouping variable. |
stats |
Character vector of statistics to calculate. |
na_include |
Logical whether to include missing values as a category. |
na_label |
Character string label for missing values. |
test |
Logical whether to perform statistical tests. |
test_type |
Character string specifying test type. |
total |
Logical or character controlling total column display. |
total_label |
Character string label for total column. |
na_percent |
Logical whether to include NA in percentage denominators. |
... |
Additional arguments passed to test functions. |
Value
List with 'formatted' and 'raw' data.table components.
Process continuous variable
Description
Calculates descriptive statistics for continuous numeric variables, with
optional grouping and statistical testing. Supports multiple summary
statistics (mean ± SD, median [IQR], range) and various hypothesis tests.
Usage
process_continuous(
data,
var,
var_label,
group_var,
stats,
digits,
na_include,
na_label,
test,
test_type,
total,
total_label,
p_per_stat = FALSE,
marks = NULL,
...
)
Arguments
data |
Data.table containing the variable. |
var |
Character string naming the variable to process. |
var_label |
Character string label for display. |
group_var |
Optional character string naming the grouping variable. |
stats |
Character vector of statistics to calculate. |
digits |
Integer number of decimal places. |
na_include |
Logical whether to include missing values. |
na_label |
Character string label for missing values. |
test |
Logical whether to perform statistical tests. |
test_type |
Character string specifying test type. |
total |
Logical or character controlling total column display. |
total_label |
Character string label for total column. |
p_per_stat |
Logical. If TRUE, calculate separate p-values for each
statistic type (e.g., t-test for means, Wilcoxon for medians). If |
... |
Additional arguments passed to test functions. |
Value
List with 'formatted' and 'raw' data.table components.
Process a single survival outcome
Description
Process a single survival outcome
Usage
process_single_outcome(
data,
outcome,
outcome_label,
by,
times,
probs,
stats,
type,
conf_level,
conf_type,
digits,
time_digits,
percent,
test,
test_type,
total,
total_label,
time_unit,
time_label,
median_label,
labels,
na_rm,
marks = NULL,
...
)
Process survival variable
Description
Calculates survival statistics including median survival times with confidence intervals, with optional grouping and log-rank testing. Parses Surv() expressions and uses survival package functions.
Usage
process_survival(
data,
var,
var_label,
group_var,
digits,
conf_level = 0.95,
na_include,
na_label,
test,
total,
total_label,
marks = NULL,
...
)
Arguments
data |
Data.table containing the survival variables. |
var |
Character string with Surv() expression (e.g., "Surv(time, status)"). |
var_label |
Character string label for display. |
group_var |
Optional character string naming the grouping variable. |
digits |
Integer number of decimal places. |
conf_level |
Numeric confidence level for confidence intervals. |
na_include |
Logical whether to include missing values. |
na_label |
Character string label for missing values. |
test |
Logical whether to perform log-rank test. |
total |
Logical or character controlling total column display. |
total_label |
Character string label for total column. |
... |
Additional arguments (currently unused). |
Value
List with 'formatted' and 'raw' data.table components.
Process survival probability quantiles (optimized)
Description
Extracts survival time quantiles from survfit objects. Uses vectorized operations for efficiency.
Usage
process_survival_probs(
survfit_objects,
probs,
groups,
group_labels,
time_digits,
total,
total_label,
median_label,
by,
data,
conf_level = 0.95,
marks = NULL
)
Arguments
survfit_objects |
List of survfit objects. |
probs |
Numeric vector of probabilities. |
groups |
Character vector of group names. |
group_labels |
Character vector of group display labels. |
time_digits |
Integer decimal places for time values. |
total |
Logical or character controlling total column. |
total_label |
Character label for total column. |
median_label |
Character label for median row. |
by |
Character name of stratifying variable. |
data |
Data.table with the source data. |
Value
List with formatted and raw data.tables.
Process survival at specified time points (optimized)
Description
Extracts survival probabilities at specified time points from survfit objects. Uses vectorized operations for efficiency.
Usage
process_survival_times(
survfit_objects,
times,
groups,
group_labels,
stats,
type,
digits,
percent,
total,
total_label,
time_label,
time_unit,
by,
data,
marks = NULL
)
Arguments
survfit_objects |
List of survfit objects. |
times |
Numeric vector of time points. |
groups |
Character vector of group names. |
group_labels |
Character vector of group display labels. |
stats |
Character vector of statistics to include. |
type |
Character string specifying probability type. |
digits |
Integer decimal places for percentages. |
percent |
Logical whether to display as percentages. |
total |
Logical or character controlling total column. |
total_label |
Character label for total column. |
time_label |
Character template for time column headers. |
time_unit |
Character time unit for column headers. |
by |
Character name of stratifying variable. |
data |
Data.table with the source data. |
Value
List with formatted and raw data.tables.
Core flextable processing function
Description
Central processing function for creating flextable objects from data tables. Handles N row extraction, condensing, indentation, zebra stripes, formatting, and styling. Used by table2docx, table2pptx, and table2rtf.
Usage
process_table_for_flextable(
table,
caption = NULL,
font_size = 10,
font_family = "Arial",
format_headers = TRUE,
bold_significant = TRUE,
p_threshold = 0.05,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
dark_header = FALSE,
bold_variables = TRUE,
paper = "letter",
orientation = "portrait",
width = NULL,
align = NULL
)
Arguments
table |
Data.frame or data.table to process. |
caption |
Optional character string for table caption. |
font_size |
Numeric font size in points. |
font_family |
Character string font family name. |
format_headers |
Logical whether to format headers. |
bold_significant |
Logical whether to bold significant p-values. |
p_threshold |
Numeric p-value threshold for significance. |
indent_groups |
Logical whether to indent group levels. |
condense_table |
Logical whether to condense all variable types. |
condense_quantitative |
Logical whether to condense only continuous/survival. |
zebra_stripes |
Logical whether to apply alternating row shading. |
dark_header |
Logical whether to use dark header style. |
bold_variables |
Logical whether to bold variable names (non-indented rows). |
paper |
Character string paper size. |
orientation |
Character string page orientation. |
width |
Optional numeric table width in inches. |
align |
Optional alignment specification. |
Value
List with ft (flextable object) and caption components.
Process variable wrapper
Description
Routes variable processing to appropriate handler based on variable type (continuous, categorical, or survival). Returns both formatted display strings and raw numeric values.
Usage
process_variable(
data,
var,
group_var = NULL,
stats_continuous,
stats_categorical,
digits,
conf_level = 0.95,
na_include,
na_label,
test,
test_continuous,
test_categorical,
total,
total_label,
labels,
na_percent,
p_per_stat = FALSE,
marks = NULL,
...
)
Arguments
data |
Data.table containing the variable. |
var |
Character string naming the variable to process. |
group_var |
Optional character string naming the grouping variable. |
stats_continuous |
Character vector of statistics for continuous variables. |
stats_categorical |
Character vector of statistics for categorical variables. |
digits |
Integer number of decimal places for continuous statistics. |
conf_level |
Numeric confidence level for survival confidence intervals. |
na_include |
Logical whether to include missing values as a category. |
na_label |
Character string label for missing values. |
test |
Logical whether to perform statistical tests. |
test_continuous |
Character string specifying test type for continuous variables. |
test_categorical |
Character string specifying test type for categorical variables. |
total |
Logical or character controlling total column display. |
total_label |
Character string label for total column. |
labels |
Named character vector of variable labels. |
na_percent |
Logical whether to include NA in percentage denominators. |
p_per_stat |
Logical whether to show separate p-values per statistic for
continuous variables. Default |
marks |
List with |
... |
Additional arguments passed to test functions. |
Value
List with 'formatted' and 'raw' data.table components.
Fit a model with selective warning suppression
Description
Wraps model fitting expressions to suppress routine warnings from mixed-effects
and GLM fitting (e.g., singular fits, convergence messages, separation
warnings) while allowing unexpected warnings through. When verbose = TRUE,
all warnings are displayed.
Usage
quiet_fit(expr, verbose = FALSE)
Arguments
expr |
An unevaluated expression (model fitting call) to execute. |
verbose |
Logical. If |
Value
The result of evaluating expr.
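Selective suppression of this kind is usually built on withCallingHandlers() and the "muffleWarning" restart. The sketch below shows that general technique; the message patterns and the name quiet_fit_sketch are assumptions for illustration and may differ from what the package matches.

```r
# Sketch: muffle warnings whose message matches a list of "routine" patterns
# (illustrative assumptions) and let all other warnings propagate.
quiet_fit_sketch <- function(expr, verbose = FALSE) {
  routine <- c("singular", "failed to converge", "fitted probabilities")
  withCallingHandlers(
    expr,  # lazily evaluated here, so the handler below is active
    warning = function(w) {
      is_routine <- any(vapply(routine, grepl, logical(1),
                               x = conditionMessage(w), fixed = TRUE))
      if (!verbose && is_routine) invokeRestart("muffleWarning")
    }
  )
}

# The routine warning is suppressed and the value is still returned
quiet_fit_sketch({ warning("boundary (singular) fit"); 42 })
```

Because the handler muffles rather than aborts, evaluation of the fitting call continues after a suppressed warning and the fitted object is returned as usual.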
Reorder columns to position total column
Description
Rearranges data.table columns to place the total column in the specified position (first, last, or default). Ensures proper ordering of Variable, Group, total, group columns, and p-value.
Usage
reorder_total_column(result, total, total_label)
Arguments
result |
Data.table with columns to reorder. |
total |
Logical or character: |
total_label |
Character string name of the total column. |
Value
Modified data.table with reordered columns.
Replace empty cells with "-"
Description
Converts empty strings and NA values to "-" for consistent display in exported tables. Preserves Variable column values.
Usage
replace_empty_cells(df)
Arguments
df |
Data.frame or data.table to process. |
Value
Data.table with empty cells replaced by "-".
Resolve number format marks
Description
Converts a number_format specification into a list of big.mark
and decimal.mark values used by all downstream formatting functions.
Supports named presets, custom two-element vectors, and the global
summata.number_format option.
Usage
resolve_number_marks(number_format = NULL)
Arguments
number_format |
Character string specifying a named preset, a
two-element character vector Named presets:
Custom vector: |
Value
A list with components:
- big.mark
Character string for thousands separator.
- decimal.mark
Character string for decimal separator.
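A sketch of how such a resolver could be written. The separator values for the "us" and "eu" presets, the order of the custom vector (thousands separator first, decimal separator second), and the fallback to "us" when nothing is set are all assumptions for illustration; resolve_marks_sketch is a hypothetical name.

```r
# Sketch: resolve a preset name or a custom two-element vector into marks.
# Preset values and the custom-vector order are assumptions, not package facts.
resolve_marks_sketch <- function(number_format = NULL) {
  if (is.null(number_format)) {
    number_format <- getOption("summata.number_format", default = "us")
  }
  if (length(number_format) == 2) {
    return(list(big.mark = number_format[1], decimal.mark = number_format[2]))
  }
  switch(number_format,
    us = list(big.mark = ",", decimal.mark = "."),
    eu = list(big.mark = ".", decimal.mark = ","),
    stop("Unknown number_format preset: ", number_format)
  )
}

# The resolved marks plug directly into base R's format()
marks <- resolve_marks_sketch("eu")
format(1234.5, big.mark = marks$big.mark, decimal.mark = marks$decimal.mark)
```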
Resolve the CI or range separator
Description
Determines the appropriate separator character between two numeric bounds
(e.g., CI lower-upper, range min-max) based on whether either bound is
negative and the current locale's decimal mark. This avoids ambiguous
output like "(-5--3)" or "(1,2-3,4)" with European commas.
Usage
resolve_separator(lower, upper, marks)
Arguments
lower |
Numeric value of the lower bound (or minimum). |
upper |
Numeric value of the upper bound (or maximum). |
marks |
List with |
Details
Rules:
If either bound is negative, use " to " to avoid double-hyphen ambiguity (e.g., "-5 to -3" not "-5--3").
If the decimal mark is a comma (EU locale), use "\u2013" (en-dash) to avoid confusion between decimal commas and separating commas (e.g., "1,2\u20133,4" not "1,2-3,4").
Otherwise, use a plain hyphen "-".
Value
Character string separator.
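The three rules listed in Details translate almost directly into code. The following is a sketch written from those rules, not the package source; resolve_separator_sketch is a hypothetical name, and marks is assumed to be a list with a decimal.mark element.

```r
# Sketch of the documented rules, checked in the order given in Details
resolve_separator_sketch <- function(lower, upper, marks) {
  if (isTRUE(lower < 0) || isTRUE(upper < 0)) {
    " to "       # negative bound: avoid "-5--3"
  } else if (identical(marks$decimal.mark, ",")) {
    "\u2013"     # comma decimals: en-dash
  } else {
    "-"          # default: plain hyphen
  }
}

us <- list(big.mark = ",", decimal.mark = ".")
eu <- list(big.mark = ".", decimal.mark = ",")
paste0("-5", resolve_separator_sketch(-5, -3, us), "-3")
paste0("1,2", resolve_separator_sketch(1.2, 3.4, eu), "3,4")
```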
Safe rounding that handles NULL and NA
Description
Rounds numeric values while gracefully handling NULL, empty, and NA inputs.
Usage
safe_round(x, digits)
Arguments
x |
Numeric value to round. |
digits |
Integer number of decimal places. |
Value
Rounded numeric value, or NA_real_ if input is NULL/NA/empty.
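A minimal sketch of the guard described above, assuming a single value (or a vector that is entirely NA) as input; safe_round_sketch is a hypothetical name and the package's handling of mixed vectors may differ.

```r
# Sketch: return NA_real_ for NULL, empty, or all-NA input; otherwise round
safe_round_sketch <- function(x, digits) {
  if (is.null(x) || length(x) == 0 || all(is.na(x))) {
    return(NA_real_)
  }
  round(x, digits)
}

safe_round_sketch(3.14159, 2)
safe_round_sketch(NULL, 2)
```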
Sanitize certain symbols for LaTeX
Description
Escapes special LaTeX characters while preserving existing LaTeX commands. Uses negative lookbehind to avoid double-escaping already escaped characters.
Usage
sanitize_for_latex(x)
Arguments
x |
Character vector to sanitize. |
Value
Character vector with special characters escaped for LaTeX.
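The negative-lookbehind approach can be sketched with a Perl-compatible regular expression in base R. The set of characters escaped here (%, &, #, _, $) is an assumption for illustration, since the exact set handled by the package is not listed on this page; sanitize_sketch is a hypothetical name.

```r
# Sketch: escape a character only when it is not already preceded by a backslash.
# The character set is an assumption; the package may escape a different set.
sanitize_sketch <- function(x) {
  gsub("(?<!\\\\)([%&#_$])", "\\\\\\1", x, perl = TRUE)
}

# Unescaped symbols gain a backslash; the already-escaped "\%" is left alone
sanitize_sketch(c("50% of A & B", "already 50\\% escaped"))
```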
Check if a binary variable should be condensed without category suffix
Description
Uses a greedy/liberal approach to determine if a binary variable's condensed
display should omit the category name. Returns TRUE if EITHER level of the
binary variable is a standard reference/affirmative value, OR if either level
matches/contains the variable label.
Usage
should_condense_binary(ref_category, non_ref_category, label = NULL)
Arguments
ref_category |
Character string with the reference category name (the level with NA estimate). |
non_ref_category |
Character string with the non-reference category name (the level with the actual estimate). |
label |
Optional character string with the variable label. Used for intelligent matching (e.g., "30-Day Readmission" label with "30-day readmission" / "No 30-day readmission" levels). |
Details
This function is designed for binary (2-level) categorical variables where one level is a reference and one is the "event" or "condition" level.
Value
Logical indicating whether the binary variable should be condensed without appending the category name.
Package Imports
Description
This file declares imports from base R and stats packages to avoid "no visible global function definition" warnings in R CMD check.
Create Publication-Ready Survival Summary Tables
Description
Generates comprehensive survival summary tables with survival probabilities at specified time points, median survival times, and optional group comparisons with statistical testing. Designed for creating survival summaries commonly used in clinical and epidemiological research publications.
Usage
survtable(
data,
outcome,
by = NULL,
times = NULL,
probs = 0.5,
stats = c("survival", "ci"),
type = "survival",
conf_level = 0.95,
conf_type = "log",
digits = 0,
time_digits = 1,
p_digits = 3,
percent = TRUE,
test = TRUE,
test_type = "logrank",
total = TRUE,
total_label = "Total",
time_unit = NULL,
time_label = NULL,
median_label = NULL,
labels = NULL,
by_label = NULL,
na_rm = TRUE,
number_format = NULL,
...
)
Arguments
data |
Data frame or data.table containing the survival dataset. Automatically converted to a data.table for efficient processing. |
outcome |
Character string or character vector specifying one or more
survival outcomes using |
by |
Character string specifying the column name of the stratifying
variable for group comparisons (e.g., treatment arm, risk group). When
|
times |
Numeric vector of time points at which to estimate survival
probabilities. For example, |
probs |
Numeric vector of survival probabilities for which to estimate
corresponding survival times (quantiles). Values must be between 0 and 1.
For example, |
stats |
Character vector specifying which statistics to display:
Default is |
type |
Character string specifying the type of probability to report:
|
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
conf_type |
Character string specifying the confidence interval type for survival estimates:
|
digits |
Integer specifying the number of decimal places for survival probabilities (as percentages). Default is 0 (whole percentages). |
time_digits |
Integer specifying the number of decimal places for survival time estimates (median, quantiles). Default is 1. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
percent |
Logical. If |
test |
Logical. If |
test_type |
Character string specifying the statistical test for comparing survival curves:
|
total |
Logical or character string controlling the total/overall column:
|
total_label |
Character string for the total/overall row label.
Default is |
time_unit |
Character string specifying the time unit for display
in column headers and labels (e.g., |
time_label |
Character string template for time column headers when
|
median_label |
Character string for the median survival row label.
Default is |
labels |
Named character vector or list providing custom display
labels. For stratified analyses, names should match levels of the
|
by_label |
Character string providing a custom label for the
stratifying variable (used in output attributes and headers).
Default is |
na_rm |
Logical. If |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
... |
Additional arguments passed to
|
Details
Survival Probability Estimation:
Survival probabilities are estimated using the Kaplan-Meier method via
survfit. At each specified time point, the function
reports the estimated probability of surviving beyond that time.
Confidence Intervals:
The default "log" transformation for confidence intervals matches the
default in survfit. It keeps the lower limit above 0, and an upper limit
that would exceed 1 is truncated at 1. The "log-log" transformation keeps
both limits within [0, 1] by construction, is also commonly used, and may
perform better in the tails.
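For orientation, the underlying estimates can be reproduced directly with the survival package. This sketch uses the lung dataset shipped with survival rather than the clintrial data from the Examples, and shows only the raw Kaplan-Meier output, not the formatting applied by survtable.

```r
library(survival)

# Kaplan-Meier fit by sex on the 'lung' dataset from the survival package
km <- survfit(Surv(time, status) ~ sex, data = lung, conf.type = "log")

# Survival probability and confidence limits at 180 and 365 days
s <- summary(km, times = c(180, 365))
km_table <- data.frame(
  stratum  = s$strata,
  time     = s$time,
  survival = round(s$surv, 2),
  lower    = round(s$lower, 2),
  upper    = round(s$upper, 2)
)
km_table
```

Changing conf.type to "log-log" in the survfit() call gives the alternative interval discussed above.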
Statistical Testing:
The log-rank test (default) tests the null hypothesis that survival curves are identical across groups. Alternative tests weight different parts of the survival curve:
Log-rank: Equal weights (best for proportional hazards)
Wilcoxon: Weights by number at risk (sensitive to early differences)
Tarone-Ware: Weights by square root of number at risk
Peto-Peto: Modified Wilcoxon weights
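Two of these weightings are available directly through the rho argument of survival::survdiff. The sketch below compares them on the lung dataset from the survival package; how survtable maps its test_type values onto the underlying test functions is not shown here, and p_from_survdiff is a hypothetical helper.

```r
library(survival)

# rho = 0 gives the log-rank test; rho = 1 gives the Peto-Peto
# modification of the Gehan-Wilcoxon test (see ?survdiff)
logrank <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)
peto    <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)

# p-value from the chi-square statistic with (number of groups - 1) df
p_from_survdiff <- function(fit) {
  stats::pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

pvals <- c(logrank = p_from_survdiff(logrank), peto = p_from_survdiff(peto))
pvals
```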
Formatting:
All numeric output respects the number_format parameter.
Separators within confidence intervals adapt automatically to avoid
ambiguity:
Survival probabilities: "85% (80%-89%)" (US) or "85% (80%–89%)" (EU, en-dash separator)
Median survival: "24.5 (21.2-28.9)" (US) or "24,5 (21,2–28,9)" (EU)
Counts ≥ 1000: "1,234" (US) or "1.234" (EU)
p-values: "< 0.001" (US) or "< 0,001" (EU)
Value
A data.table with S3 class "survtable" containing formatted
survival statistics. The table structure depends on parameters:
When times is specified (survival at time points):
- Variable/Group
Row identifier – stratifying variable levels
- Time columns
Survival statistics at each requested time point
- p-value
Test p-value (if test = TRUE and by specified)
When only probs is specified (survival quantiles):
- Variable/Group
Row identifier – stratifying variable levels
- Quantile columns
Time to reach each survival probability
- p-value
Test p-value (if test = TRUE and by specified)
All numeric output (probabilities, times, counts, p-values)
respects the number_format setting for locale-appropriate
formatting.
The returned object includes the following attributes:
- raw_data
Data.table with unformatted numeric values
- survfit_objects
List of survfit objects for each stratum
- by_variable
The stratifying variable name
- times
The time points requested
- probs
The probability quantiles requested
- test_result
Full test result object (if test performed)
See Also
desctable for baseline characteristics tables,
fit for regression analysis,
table2pdf for PDF export,
table2docx for Word export,
survfit for underlying survival estimation,
survdiff for survival curve comparison tests
Other descriptive functions:
desctable(),
print.survtable()
Examples
# Load example data
data(clintrial)
# Example 1: Survival at specific time points by treatment
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24, 36),
time_unit = "months"
)
# Example 2: Median survival only
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = NULL,
probs = 0.5
)
# Example 3: Multiple quantiles (quartiles)
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "stage",
times = NULL,
probs = c(0.25, 0.5, 0.75)
)
# Example 4: Both time points and median
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
probs = 0.5,
time_unit = "months"
)
# Example 5: Cumulative incidence (1 - survival)
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
type = "risk"
)
# Example 6: Include number at risk
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
stats = c("survival", "ci", "n_risk")
)
# Example 7: Overall survival without stratification
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
times = c(12, 24, 36, 48)
)
# Example 8: Without total row
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
total = FALSE
)
# Example 9: Custom labels
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
labels = c("Drug A" = "Treatment A", "Drug B" = "Treatment B"),
time_unit = "months"
)
# Example 10: Different confidence interval type
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
conf_type = "log-log"
)
# Example 11: Wilcoxon test instead of log-rank
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
test_type = "wilcoxon"
)
# Example 12: Access raw data for custom analysis
result <- survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24)
)
raw <- attr(result, "raw_data")
print(raw)
# Example 13: Access survfit objects for plotting
fits <- attr(result, "survfit_objects")
plot(fits$overall) # Plot overall survival curve
# Example 14: Multiple survival outcomes stacked
survtable(
data = clintrial,
outcome = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"),
by = "treatment",
times = c(12, 24),
probs = 0.5,
time_unit = "months",
total = FALSE,
labels = c(
"Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
"Surv(os_months, os_status)" = "Overall Survival"
)
)
# Example 15: European number formatting
survtable(
data = clintrial,
outcome = "Surv(os_months, os_status)",
by = "treatment",
times = c(12, 24),
number_format = "eu"
)
Export Table to Microsoft Word Format (DOCX)
Description
Converts a data frame, data.table, or matrix to a fully editable Microsoft Word
document (.docx) using the flextable and officer packages.
Creates publication-ready tables with extensive formatting options including
typography, alignment, colors, and page layout. Tables can be further edited in
Microsoft Word after creation.
Usage
table2docx(
table,
file,
caption = NULL,
font_size = 8,
font_family = "Arial",
format_headers = TRUE,
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
dark_header = FALSE,
paper = "letter",
orientation = "portrait",
width = NULL,
align = NULL,
return_ft = FALSE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output DOCX filename. Must have
|
caption |
Character string. Optional caption displayed above the table
in the Word document. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 8. Typical range: 8-12 points. Headers use slightly larger size. |
font_family |
Character string. Font family name for the table. Must be
a font installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
paper |
Character string specifying paper size:
|
orientation |
Character string specifying page orientation:
|
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment for each column.
Options: |
return_ft |
Logical. If |
... |
Additional arguments passed to |
Details
Package Requirements:
This function requires:
flextable - For creating formatted tables
officer - For Word document manipulation
Install if needed:
install.packages(c("flextable", "officer"))
Output Features:
The generated Word document contains:
Fully editable table (native Word table, not image)
Professional typography and spacing
Proper page setup (size, orientation, margins)
Caption (if provided) as separate paragraph above table
All formatting preserved but editable
Compatible with Word 2007 and later
Further Customization:
For programmatic customization beyond the built-in options, access the
flextable object:
Method 1: Via attribute (default)
result <- table2docx(table, "output.docx")
ft <- attr(result, "flextable")
# Customize flextable
ft <- flextable::bold(ft, i = 1, j = 1, part = "body")
ft <- flextable::color(ft, i = 2, j = 3, color = "red")
# Re-save if needed
doc <- officer::read_docx()
doc <- flextable::body_add_flextable(doc, ft)
print(doc, target = "customized.docx")
Method 2: Direct return
ft <- table2docx(table, "output.docx", return_ft = TRUE)
# Customize immediately
ft <- flextable::bg(ft, bg = "yellow", part = "header")
ft <- flextable::autofit(ft)
# Save to new document
doc <- officer::read_docx()
doc <- flextable::body_add_flextable(doc, ft)
print(doc, target = "custom.docx")
Page Layout:
The function automatically sets up the Word document with:
Specified paper size and orientation
Standard margins (1 inch by default)
Continuous section (no page breaks before table)
Left-aligned table placement
For landscape orientation:
Automatically swaps page width and height
Applies landscape property to section
Useful for wide tables with many columns
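The width/height swap described above can be sketched in base R. This is only an illustration under stated assumptions: `page_dims()` is a hypothetical helper, not a summata function, and the 1 inch margin simply mirrors the default mentioned in this section.

```r
# Hypothetical helper (not part of summata): page dimensions in inches
# for a given paper size and orientation, plus the width left for a table.
page_dims <- function(paper = c("letter", "a4", "legal"),
                      orientation = c("portrait", "landscape"),
                      margin = 1) {
  paper <- match.arg(paper)
  orientation <- match.arg(orientation)
  sizes <- list(
    letter = c(width = 8.5,  height = 11),
    a4     = c(width = 8.27, height = 11.69),
    legal  = c(width = 8.5,  height = 14)
  )
  dims <- sizes[[paper]]
  if (orientation == "landscape") {
    # Landscape swaps page width and height
    dims <- c(width = dims[["height"]], height = dims[["width"]])
  }
  c(dims, usable_width = dims[["width"]] - 2 * margin)
}

portrait  <- page_dims("letter", "portrait")
landscape <- page_dims("letter", "landscape")
```

Comparing `portrait[["usable_width"]]` with `landscape[["usable_width"]]` gives a quick estimate of how much extra room a wide table gains from `orientation = "landscape"`.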
Table Width Management:
Width behavior:
width = NULL - Auto-fits to content and page width
width = 6 - Exactly 6 inches wide
Width distributed evenly across columns by default
Can adjust individual column widths in Word after creation
For very wide tables:
Use orientation = "landscape"
Use paper = "legal" for extra width
Reduce font_size
Use condense_table = TRUE
Consider breaking across multiple tables
Typography:
The function applies professional typography:
Column headers: Bold, slightly larger font
Body text: Regular weight, specified font size
Numbers: Right-aligned for easy comparison
Text: Left-aligned for readability
Consistent spacing: Adequate padding in cells
Font family must be installed on the system where Word opens the document. Common cross-platform choices:
Arial - Sans-serif, highly readable
Times New Roman - Serif, traditional
Calibri - Microsoft default, modern
Helvetica - Sans-serif, professional
Zebra Striping:
When zebra_stripes = TRUE:
Alternating variables receive light gray background
All rows of same variable share same shading
Improves visual grouping
Particularly useful for tables with many factor variables
Color can be changed in Word after creation
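The grouping rule above (all rows of one variable share a shade, and shading alternates between variables) can be sketched in base R. The toy table and the convention that continuation rows carry an empty Variable cell are assumptions made for illustration; the package's internal logic may differ.

```r
# Toy table: an empty Variable cell means "same variable as the row above"
# (an assumed convention for this sketch)
tbl <- data.frame(
  Variable = c("Age", "Sex", "", "Stage", "", ""),
  Group    = c("-", "Female", "Male", "I", "II", "III"),
  stringsAsFactors = FALSE
)

# Each non-empty Variable cell starts a new group
group_id <- cumsum(nzchar(tbl$Variable))

# Shade every second group, so all rows of a variable share one background
striped <- group_id %% 2 == 0
```

Here `striped` marks both Sex rows for shading and leaves the Age and Stage rows unshaded.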
Dark Header:
When dark_header = TRUE:
Header row: Dark gray/black background
Header text: White for high contrast
Modern, professional appearance
Draws attention to column names
Integration with R Markdown/Quarto:
For R Markdown/Quarto Word output:
# Create flextable for inline display
ft <- table2docx(results, "temp.docx", return_ft = TRUE)
# Display in R Markdown chunk
ft  # Renders in Word output
Or use flextable directly in chunks:
flextable::flextable(results)
Value
Behavior depends on return_ft:
return_ft = FALSE: Invisibly returns a list with components:
file - Path to created file
caption - Caption text (if provided)
The flextable object is accessible via attr(result, "flextable")
return_ft = TRUE: Directly returns the flextable object for immediate further customization
In both cases, creates a .docx file at the specified location.
See Also
autotable for automatic format detection,
table2pptx for PowerPoint slides,
table2pdf for PDF output,
table2html for HTML tables,
table2rtf for Rich Text Format,
table2tex for LaTeX output,
flextable for the underlying table object,
read_docx for Word document manipulation
Other export functions:
autotable(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
Examples
data(clintrial)
data(clintrial_labels)
# Create example table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
labels = clintrial_labels
)
# Example 1: Basic Word export
if (requireNamespace("flextable", quietly = TRUE) &&
requireNamespace("officer", quietly = TRUE)) {
table2docx(results, file.path(tempdir(), "results.docx"))
}
old_width <- options(width = 180)
# Example 2: With caption
table2docx(results, file.path(tempdir(), "captioned.docx"),
caption = "Table 1: Multivariable Logistic Regression Results")
# Example 3: Landscape orientation for wide tables
table2docx(results, file.path(tempdir(), "wide.docx"),
orientation = "landscape")
# Example 4: Custom font and size
table2docx(results, file.path(tempdir(), "custom_font.docx"),
font_family = "Times New Roman",
font_size = 11)
# Example 5: Hierarchical display
table2docx(results, file.path(tempdir(), "indented.docx"),
indent_groups = TRUE)
# Example 6: Condensed table
table2docx(results, file.path(tempdir(), "condensed.docx"),
condense_table = TRUE)
# Example 7: With zebra stripes
table2docx(results, file.path(tempdir(), "striped.docx"),
zebra_stripes = TRUE)
# Example 8: Dark header style
table2docx(results, file.path(tempdir(), "dark.docx"),
dark_header = TRUE)
# Example 9: A4 paper for international journals
table2docx(results, file.path(tempdir(), "a4.docx"),
paper = "a4")
# Example 10: Get flextable for customization
result <- table2docx(results, file.path(tempdir(), "base.docx"))
ft <- attr(result, "flextable")
# Customize the flextable
ft <- flextable::bold(ft, i = 1, part = "body")
ft <- flextable::color(ft, j = "p-value", color = "blue")
# Example 11: Direct flextable return
ft <- table2docx(results, file.path(tempdir(), "direct.docx"), return_ft = TRUE)
ft <- flextable::bg(ft, bg = "yellow", part = "header")
# Example 12: Publication-ready table
table2docx(results, file.path(tempdir(), "publication.docx"),
caption = "Table 2: Adjusted Odds Ratios for Mortality",
font_family = "Times New Roman",
font_size = 10,
indent_groups = TRUE,
zebra_stripes = FALSE,
bold_significant = TRUE)
# Example 13: Custom column alignment
table2docx(results, file.path(tempdir(), "aligned.docx"),
align = c("left", "left", "center", "right", "right"))
# Example 14: Disable significance bolding
table2docx(results, file.path(tempdir(), "no_bold.docx"),
bold_significant = FALSE)
# Example 15: Stricter significance threshold
table2docx(results, file.path(tempdir(), "strict.docx"),
bold_significant = TRUE,
p_threshold = 0.01)
options(old_width)
Export Table to HTML Format
Description
Converts a data frame, data.table, or matrix to HTML format with optional CSS styling for web display, HTML documents, or embedding in web applications. Generates clean, standards-compliant HTML with professional styling options including responsive design support, color schemes, and interactive features. Requires xtable for export.
Usage
table2html(
table,
file,
caption = NULL,
format_headers = TRUE,
variable_padding = FALSE,
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
stripe_color = "#EEEEEE",
dark_header = FALSE,
include_css = TRUE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output HTML filename. Must have
|
caption |
Character string. Optional caption displayed below the table.
Supports basic HTML formatting. Default is |
format_headers |
Logical. If |
variable_padding |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. HTML color specification for zebra
stripes. Can use hex codes ( |
dark_header |
Logical. If |
include_css |
Logical. If |
... |
Additional arguments passed to |
Details
Output Format:
The function generates standards-compliant HTML5 markup with:
Semantic <table> structure
Proper <thead> and <tbody> sections
Accessible header cells (<th>)
Clean, readable markup
Optional embedded CSS styling
Standalone vs. Embedded:
Standalone HTML (include_css = TRUE):
Can be opened directly in web browsers
Includes all necessary styling
Self-contained, portable
Suitable for sharing via email or web hosting
Embedded HTML (include_css = FALSE):
For inclusion in existing HTML documents
No CSS included (use parent document's styles)
Smaller file size
Integrates with web frameworks (Shiny, R Markdown, Quarto)
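To make the distinction concrete, the following base R sketch wraps a bare table fragment in a minimal standalone page. The fragment is hand-written for illustration, and the style rules only echo the defaults listed under CSS Styling; neither is the exact markup the function emits.

```r
# Hand-written fragment standing in for include_css = FALSE output
fragment <- c(
  "<table>",
  "  <tr><th>Variable</th><th>OR</th></tr>",
  "  <tr><td>Age</td><td>1.02</td></tr>",
  "</table>"
)

# Wrap the fragment so it can be opened directly in a browser
page <- c(
  "<!DOCTYPE html>",
  "<html>",
  "<head>",
  "<meta charset=\"utf-8\">",
  "<style>",
  "table { border-collapse: collapse; font-family: Arial, sans-serif; }",
  "th, td { padding: 8px 12px; border: 1px solid #DDD; }",
  "</style>",
  "</head>",
  "<body>",
  fragment,
  "</body>",
  "</html>"
)

out <- file.path(tempdir(), "wrapped.html")
writeLines(page, out)
```

A fragment without the wrapper inherits whatever styles the host document defines, which is usually what is wanted inside Shiny, R Markdown, or Quarto.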
CSS Styling:
When include_css = TRUE, the function applies professional styling:
Table: Border-collapse, sans-serif font (Arial), 20px margin
Cells: 8px vertical × 12px horizontal padding, left-aligned text
Borders: 1px solid #DDD (light gray)
Headers: Bold text, light gray background (#F2F2F2)
Numeric columns: Center-aligned (auto-detected)
Caption: Bold, 1.1em font, positioned below table
With dark_header = TRUE:
Header background: Black (#000000)
Header text: White (#FFFFFF)
Creates high contrast, modern appearance
With zebra_stripes = TRUE:
Alternating variable groups receive background color
Default color: #EEEEEE (light gray)
Applied via CSS class .zebra-stripe
Groups entire variable (all factor levels together)
Hierarchical Display:
The indent_groups option creates visual hierarchy using HTML
non-breaking spaces:
<td><b>Treatment</b></td>  <!-- Variable name -->
<td> Control</td>          <!-- Indented level -->
<td> Active</td>           <!-- Indented level -->
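A rough base R sketch of how such cells could be built is shown below. `indent_cells()` is a hypothetical helper, and the choice of two `&nbsp;` entities per indent is an assumption; the number used by the package is not stated here.

```r
# Hypothetical helper: bold the variable name, indent its levels with
# non-breaking spaces (two per level is an assumed default)
indent_cells <- function(variable, level, n_spaces = 2) {
  indent <- strrep("&nbsp;", n_spaces)
  ifelse(
    nzchar(variable),
    paste0("<td><b>", variable, "</b></td>"),
    paste0("<td>", indent, level, "</td>")
  )
}

cells <- indent_cells(
  variable = c("Treatment", "", ""),
  level    = c("", "Control", "Active")
)
```

Non-breaking spaces are used rather than ordinary spaces because browsers collapse leading whitespace inside table cells.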
Integration with R Markdown/Quarto:
For R Markdown or Quarto documents:
# Generate HTML fragment (no CSS)
table2html(results, "table.html", include_css = FALSE)
Then include in your document chunk with results='asis':
cat(readLines("table.html"), sep = "\n")
Or directly render without file:
# For inline display
htmltools::HTML(
capture.output(
print(xtable::xtable(results), type = "html")
)
)
Integration with Shiny:
For Shiny applications:
# In server function
output$results_table <- renderUI({
table2html(results_data(), "temp.html", include_css = FALSE)
HTML(readLines("temp.html"))
})
# Or use directly with DT package for interactive tables
output$interactive_table <- DT::renderDT({
results_data()
})
Accessibility:
The generated HTML follows accessibility best practices:
Semantic table structure
Proper header cells (<th>) with scope attributes
Clear visual hierarchy
Adequate color contrast (when using default styles)
Screen reader friendly markup
Value
Invisibly returns NULL. Creates an HTML file at the specified
location that can be opened in web browsers or embedded in HTML documents.
See Also
autotable for automatic format detection,
table2pdf for PDF output,
table2tex for LaTeX output,
table2docx for Word documents,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
fit for regression tables,
desctable for descriptive tables
Other export functions:
autotable(),
table2docx(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
Examples
data(clintrial)
data(clintrial_labels)
# Create example table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
labels = clintrial_labels
)
# Example 1: Basic HTML export (standalone)
if (requireNamespace("xtable", quietly = TRUE)) {
table2html(results, file.path(tempdir(), "results.html"))
}
# Example 2: With caption
table2html(results, file.path(tempdir(), "captioned.html"),
caption = "Table 1: Multivariable Logistic Regression Results")
# Example 3: For embedding (no CSS)
table2html(results, file.path(tempdir(), "embed.html"),
include_css = FALSE)
# Include in your HTML document
# Example 4: Hierarchical display
table2html(results, file.path(tempdir(), "indented.html"),
indent_groups = TRUE)
# Example 5: Condensed table
table2html(results, file.path(tempdir(), "condensed.html"),
condense_table = TRUE)
# Example 6: With zebra stripes
table2html(results, file.path(tempdir(), "striped.html"),
zebra_stripes = TRUE,
stripe_color = "#F0F0F0")
# Example 7: Dark header style
table2html(results, file.path(tempdir(), "dark.html"),
dark_header = TRUE)
# Example 8: Combination styling
table2html(results, file.path(tempdir(), "styled.html"),
zebra_stripes = TRUE,
dark_header = TRUE,
bold_significant = TRUE)
# Example 9: Custom stripe color
table2html(results, file.path(tempdir(), "blue_stripes.html"),
zebra_stripes = TRUE,
stripe_color = "#E3F2FD") # Light blue
# Example 10: Disable significance bolding
table2html(results, file.path(tempdir(), "no_bold.html"),
bold_significant = FALSE)
# Example 11: Stricter significance threshold
table2html(results, file.path(tempdir(), "strict.html"),
bold_significant = TRUE,
p_threshold = 0.01)
# Example 12: No header formatting
table2html(results, file.path(tempdir(), "raw_headers.html"),
format_headers = FALSE)
# Example 13: Descriptive statistics table
desc_table <- desctable(clintrial, by = "treatment",
variables = c("age", "sex", "bmi"), labels = clintrial_labels)
table2html(desc_table, file.path(tempdir(), "baseline.html"),
caption = "Table 1: Baseline Characteristics by Treatment Group")
# Example 14: For R Markdown (no CSS, for inline display)
table2html(results, file.path(tempdir(), "rmd_table.html"),
include_css = FALSE,
indent_groups = TRUE)
# Then in R Markdown, use a chunk with results='asis' to display inline:
cat(readLines(file.path(tempdir(), "rmd_table.html")), sep = "\n")
# Example 15: Email-friendly version
table2html(results, file.path(tempdir(), "email.html"),
include_css = TRUE, # Self-contained
zebra_stripes = TRUE,
caption = "Regression Results - See Attached")
# Can be directly included in HTML emails
# Example 16: Publication-ready web version
table2html(results, file.path(tempdir(), "publication.html"),
caption = "Table 2: Multivariable Analysis of Risk Factors",
indent_groups = TRUE,
zebra_stripes = FALSE, # Clean look
bold_significant = TRUE,
dark_header = FALSE)
# Example 17: Modern dark theme
table2html(results, file.path(tempdir(), "dark_theme.html"),
dark_header = TRUE,
stripe_color = "#2A2A2A", # Dark gray stripes
zebra_stripes = TRUE)
# Example 18: Minimal styling for custom CSS
table2html(results, file.path(tempdir(), "minimal.html"),
include_css = FALSE,
format_headers = FALSE,
bold_significant = FALSE)
# Apply your own CSS classes and styling
# Example 19: Model comparison table
models <- list(
base = c("age", "sex"),
full = c("age", "sex", "treatment", "stage")
)
comparison <- compfit(
data = clintrial,
outcome = "os_status",
model_list = models
)
table2html(comparison, file.path(tempdir(), "comparison.html"),
caption = "Model Comparison Statistics")
Export Table to PDF Format
Description
Converts a data frame, data.table, or matrix to a professionally formatted PDF document using LaTeX as an intermediate format. Provides extensive control over page layout, typography, and formatting for publication-ready output. Particularly well-suited for tables from regression analyses, descriptive statistics, and model comparisons. Requires xtable for export.
Usage
table2pdf(
table,
file,
orientation = "portrait",
paper = "letter",
margins = NULL,
fit_to_page = TRUE,
font_size = 8,
caption = NULL,
caption_size = NULL,
format_headers = TRUE,
variable_padding = FALSE,
cell_padding = "normal",
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
align = NULL,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
stripe_color = "gray!20",
dark_header = FALSE,
show_logs = FALSE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output PDF filename. Must have
|
orientation |
Character string specifying page orientation:
|
paper |
Character string specifying paper size:
|
margins |
Numeric vector of length 4 specifying margins in inches as
|
fit_to_page |
Logical. If |
font_size |
Numeric. Base font size in points. Default is 8. Smaller values accommodate more content; larger values improve readability. Typical range: 6-12 points. |
caption |
Character string. Optional caption displayed below the table.
Supports LaTeX formatting for multi-line captions, superscripts, italics, etc.
See Details for formatting guidance. Default is |
caption_size |
Numeric. Caption font size in points. If |
format_headers |
Logical. If |
variable_padding |
Logical. If |
cell_padding |
Character string or numeric specifying vertical padding within table cells:
Adjusts |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
align |
Character string or vector specifying column alignment. Options:
If |
indent_groups |
Logical. If |
condense_table |
Logical. If
Significantly reduces table height. Default is |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. LaTeX color specification for zebra
stripes. Default is |
dark_header |
Logical. If |
show_logs |
Logical. If |
... |
Additional arguments passed to |
Details
LaTeX Requirements:
This function requires a working LaTeX installation. The function checks for LaTeX availability and provides installation guidance if missing.
Recommended LaTeX distributions:
TinyTeX (lightweight, R-integrated): Install via tinytex::install_tinytex()
TeX Live (comprehensive, cross-platform)
MiKTeX (Windows)
MacTeX (macOS)
Required LaTeX packages (auto-installed with most distributions):
fontenc, inputenc - Character encoding
array, booktabs, longtable - Table formatting
graphicx - Scaling tables
geometry - Page layout
pdflscape, lscape - Landscape orientation
helvet - Sans-serif fonts
standalone, varwidth - Auto-sizing (for paper = "auto")
float, caption - Floats and captions
xcolor, colortable - Colors (for zebra_stripes or dark_header)
Caption Formatting:
Captions support LaTeX commands for rich formatting:
# Multi-line caption with line breaks
# (a LaTeX line break is \\, written as "\\\\" inside an R string)
caption = "Table 1: Multivariable Analysis\\\\
OR = odds ratio; CI = confidence interval"
# With superscripts (using LaTeX syntax)
caption = "Table 1: Results\\\\
Adjusted for age and sex\\\\
p-values from Wald tests"
# With special characters (must escape percent signs)
caption = "Results for income (in thousands)"
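Because captions are passed through as LaTeX, literal characters such as percent signs need a backslash in front of them. The helper below is a minimal sketch for plain-text captions; `escape_latex()` is a hypothetical name, not a summata function, and it should not be applied to captions that contain intentional LaTeX commands.

```r
# Hypothetical helper: backslash-escape %, &, # and _ in plain caption text
escape_latex <- function(x) {
  # In the replacement, "\\\\" yields one literal backslash and "\\1"
  # re-inserts the matched character
  gsub("([%&#_])", "\\\\\\1", x)
}

raw_caption  <- "Response rate 45% (arm A & B)"
safe_caption <- escape_latex(raw_caption)
```

The escaped string can then be supplied as the `caption` argument.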
Auto-Sizing (paper = "auto"):
When paper = "auto", the function attempts to create a minimal PDF
sized exactly to the table content:
Using the standalone LaTeX class (cleanest output)
Fallback to pdfcrop utility if standalone unavailable
Fallback to minimal margins if neither available
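The fallback order above can be sketched as a small decision function. Both the function name and the detection approach (looking up `standalone.cls` with `kpsewhich` and `pdfcrop` with `Sys.which()`) are assumptions for illustration; the package may detect these tools differently.

```r
# Hypothetical sketch of the fallback order for paper = "auto"
pick_autosize_strategy <- function(has_standalone, has_pdfcrop) {
  if (has_standalone) {
    "standalone"
  } else if (has_pdfcrop) {
    "pdfcrop"
  } else {
    "minimal_margins"
  }
}

# One possible way to detect the tools on the current system
has_pdfcrop <- nzchar(Sys.which("pdfcrop"))
has_standalone <- nzchar(Sys.which("kpsewhich")) &&
  length(suppressWarnings(
    system2("kpsewhich", "standalone.cls", stdout = TRUE, stderr = FALSE)
  )) > 0

strategy <- pick_autosize_strategy(has_standalone, has_pdfcrop)
```

Running this before calling `table2pdf()` with `paper = "auto"` indicates which of the three paths is likely to be available.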
Table Width Management:
For wide tables that don't fit on the page:
Use orientation = "landscape"
Use fit_to_page = TRUE (default) to auto-scale
Reduce font_size (e.g., 7 or 6)
Consider paper = "auto" for maximum flexibility
Troubleshooting:
If PDF compilation fails:
Check that LaTeX is installed: Run Sys.which("pdflatex")
Set show_logs = TRUE and examine the .log file
Common issues:
Missing LaTeX packages: Install via package manager
Special characters in text: Escape properly
Very wide tables: Use landscape or reduce font size
Caption formatting: Check LaTeX syntax
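When a log file is available, LaTeX error messages are the lines that begin with an exclamation mark. The sketch below pulls them out with base R; `latex_errors()` is a hypothetical helper and the sample log is fabricated purely so the example runs on its own.

```r
# Hypothetical helper: return the error lines from a LaTeX .log file
latex_errors <- function(log_file) {
  lines <- readLines(log_file, warn = FALSE)
  lines[grepl("^!", lines)]
}

# Self-contained demonstration with a made-up log
demo_log <- file.path(tempdir(), "demo.log")
writeLines(c(
  "This is pdfTeX",
  "! Undefined control sequence.",
  "l.12 \\foo"
), demo_log)

errors <- latex_errors(demo_log)
```

In practice the same function could be pointed at the .log file kept when `show_logs = TRUE`.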
Value
Invisibly returns NULL. Creates a PDF file at the specified
location. If compilation fails, check the .log file (if
show_logs = TRUE) for error details.
See Also
autotable for automatic format detection,
table2tex for LaTeX source files,
table2html for HTML output,
table2docx for Microsoft Word,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
desctable for descriptive tables,
fit for regression tables
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pptx(),
table2rtf(),
table2tex()
Examples
data(clintrial)
data(clintrial_labels)
# Create example table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
labels = clintrial_labels
)
# Test that LaTeX can compile (needed for all PDF examples)
has_latex <- local({
if (!nzchar(Sys.which("pdflatex"))) return(FALSE)
test_tex <- file.path(tempdir(), "summata_latex_test.tex")
writeLines(c("\\documentclass{article}", "\\usepackage{booktabs}",
"\\begin{document}", "test", "\\end{document}"), test_tex)
tryCatch(
system2("pdflatex", c("-interaction=nonstopmode",
paste0("-output-directory=", tempdir()), test_tex),
stdout = FALSE, stderr = FALSE),
error = function(e) 1L) == 0L
})
# Example 1: Basic PDF export
if(has_latex){
table2pdf(results, file.path(tempdir(), "basic_results.pdf"))
}
if(has_latex){
# Example 2: Landscape orientation for wide tables
table2pdf(results, file.path(tempdir(), "wide_results.pdf"),
orientation = "landscape")
# Example 3: With caption
table2pdf(results, file.path(tempdir(), "captioned.pdf"),
caption = "Table 1: Multivariable logistic regression results")
# Example 4: Multi-line caption with formatting
table2pdf(results, file.path(tempdir(), "formatted_caption.pdf"),
caption = "Table 1: Risk Factors for Mortality\\\\
aOR = adjusted odds ratio; CI = confidence interval")
# Example 5: Auto-sized PDF (no fixed page dimensions)
table2pdf(results, file.path(tempdir(), "autosize.pdf"),
paper = "auto")
# Example 6: A4 paper with custom margins
table2pdf(results, file.path(tempdir(), "a4_custom.pdf"),
paper = "a4",
margins = c(0.75, 0.75, 0.75, 0.75))
# Example 7: Larger font for readability
table2pdf(results, file.path(tempdir(), "large_font.pdf"),
font_size = 11)
# Example 8: Indented hierarchical display
table2pdf(results, file.path(tempdir(), "indented.pdf"),
indent_groups = TRUE)
# Example 9: Condensed table (reduced height)
table2pdf(results, file.path(tempdir(), "condensed.pdf"),
condense_table = TRUE)
# Example 10: With zebra stripes
table2pdf(results, file.path(tempdir(), "striped.pdf"),
zebra_stripes = TRUE,
stripe_color = "gray!15")
# Example 11: Dark header style
table2pdf(results, file.path(tempdir(), "dark_header.pdf"),
dark_header = TRUE)
# Example 12: Combination of formatting options
table2pdf(results, file.path(tempdir(), "publication_ready.pdf"),
orientation = "portrait",
paper = "letter",
font_size = 9,
caption = "Table 2: Multivariable Analysis\\\\
Model adjusted for age, sex, and clinical factors",
indent_groups = TRUE,
zebra_stripes = TRUE,
bold_significant = TRUE,
p_threshold = 0.05)
# Example 13: Adjust cell padding
table2pdf(results, file.path(tempdir(), "relaxed_padding.pdf"),
cell_padding = "relaxed") # More spacious
# Example 14: No scaling (natural table width)
table2pdf(results, file.path(tempdir(), "no_scale.pdf"),
fit_to_page = FALSE,
font_size = 10)
# Example 15: Hide significance bolding
table2pdf(results, file.path(tempdir(), "no_bold.pdf"),
bold_significant = FALSE)
# Example 16: Custom column alignment
table2pdf(results, file.path(tempdir(), "custom_align.pdf"),
align = c("c", "c", "c", "c", "c", "c", "c"))
# Example 17: Descriptive statistics table
desc_table <- desctable(clintrial, by = "treatment",
variables = c("age", "sex", "bmi", "stage"), labels = clintrial_labels)
table2pdf(desc_table, file.path(tempdir(), "descriptive.pdf"),
caption = "Table 1: Baseline Characteristics by Treatment Group",
orientation = "landscape")
# Example 18: Model comparison table
models <- list(
base = c("age", "sex"),
full = c("age", "sex", "bmi", "treatment")
)
comparison <- compfit(
data = clintrial,
outcome = "os_status",
model_list = models
)
table2pdf(comparison, file.path(tempdir(), "model_comparison.pdf"),
caption = "Table 3: Model Comparison Statistics")
# Example 19: Very wide table with aggressive fitting
wide_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "race", "bmi", "smoking",
"hypertension", "diabetes", "treatment", "stage")
)
table2pdf(wide_model, file.path(tempdir(), "very_wide.pdf"),
orientation = "landscape",
font_size = 7,
fit_to_page = TRUE,
condense_table = TRUE)
# Example 20: With caption size control
table2pdf(results, file.path(tempdir(), "caption_size.pdf"),
font_size = 8,
caption_size = 6,
caption = "Table 4: Results with Compact Caption\\\\
Smaller caption fits better on constrained pages")
# Example 21: Troubleshooting - keep logs
table2pdf(results, file.path(tempdir(), "debug.pdf"),
show_logs = TRUE)
# If it fails, check debug.log for error messages
}
Export Table to Microsoft PowerPoint Format (PPTX)
Description
Converts a data frame, data.table, or matrix to a Microsoft PowerPoint slide
(.pptx) with a formatted table using the flextable and officer
packages. Creates presentation-ready slides with extensive control over table
formatting, positioning, and layout. Tables can be further edited in PowerPoint
after creation. Ideal for creating data-driven presentations and conference talks.
Usage
table2pptx(
table,
file,
caption = NULL,
font_size = 10,
font_family = "Arial",
format_headers = TRUE,
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
dark_header = FALSE,
width = NULL,
align = NULL,
template = NULL,
layout = "Title and Content",
master = "Office Theme",
left = 0.5,
top = 1.5,
return_ft = FALSE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output PPTX filename. Must have
|
caption |
Character string. Optional title displayed in the slide's title
placeholder or as text box above the table. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 10. Typical range for presentations: 10-14 points. Larger than print documents for visibility at distance. |
font_family |
Character string. Font family name for the table. Must be
installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment. Options:
|
template |
Character string. Path to custom PPTX template file. If
|
layout |
Character string. Name of slide layout to use from template.
Default is
|
master |
Character string. Name of slide master to use. Default is
|
left |
Numeric. Horizontal position from left edge of slide in inches. Default is 0.5. Standard slide is 10 inches wide. |
top |
Numeric. Vertical position from top edge of slide in inches. Default is 1.5 (leaves room for title). Standard slide is 7.5 inches tall. Adjust based on table size and layout. |
return_ft |
Logical. If |
... |
Additional arguments passed to |
Details
Package Requirements:
Requires:
flextable - Table creation and formatting
officer - PowerPoint manipulation
Install: install.packages(c("flextable", "officer"))
Slide Dimensions:
Standard PowerPoint slide:
Width: 10 inches (25.4 cm)
Height: 7.5 inches (19.05 cm)
Aspect ratio: 4:3 (a 16:9 widescreen slide is 13.33 inches wide at the same height)
Safe content area (with margins):
Width: ~9 inches
Height: ~6 inches (accounting for title)
Positioning:
The left and top parameters control table placement:
(0, 0) = Top-left corner of slide
Default (0.5, 1.5) = Standard position with title room
Center:
left = (10 - table_width) / 2
When caption is provided:
Attempts to use title placeholder (if layout supports)
Falls back to text box above table
Automatically adjusts table position downward
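The centering formula above is simple enough to wrap in a helper. This is a sketch: `center_left()` is a hypothetical name, and the 10 inch default assumes the standard 4:3 slide described in this section.

```r
# Hypothetical helper: left offset (in inches) that centers a table
center_left <- function(table_width, slide_width = 10) {
  stopifnot(table_width > 0, table_width <= slide_width)
  (slide_width - table_width) / 2
}

left_for_8in <- center_left(8)
```

The result can be passed as `left` together with a matching `width`, for example `left = center_left(8), width = 8`.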
Slide Layouts:
Different layouts serve different purposes:
Title and Content (default):
Has title and content placeholders
Caption goes in title area
Table in content area
Most common for data slides
Blank:
No predefined areas
Maximum flexibility
Use absolute positioning (left, top)
Good for custom layouts
Title-Only:
Title area only
Large space for table
Good for data-heavy slides
Custom Templates:
Use organizational or conference templates:
table2pptx(table, "branded.pptx",
template = "company_template.pptx",
layout = "Content Layout", # Name from template
master = "Company Theme") # Name from template
To find layout and master names in template:
pres <- officer::read_pptx("template.pptx")
officer::layout_summary(pres)
Multiple Slides:
Creating presentations with multiple tables:
# Each call creates new presentation - combine after
table2pptx(table1, "slide1.pptx", caption = "Results Part 1")
table2pptx(table2, "slide2.pptx", caption = "Results Part 2")
# Then manually combine in PowerPoint, or:
# Use officer to create multi-slide presentation
pres <- officer::read_pptx()
# Add first table
ft1 <- table2pptx(table1, "temp1.pptx", return_ft = TRUE)
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft1,
location = officer::ph_location(left = 0.5, top = 1.5))
# Add second table
ft2 <- table2pptx(table2, "temp2.pptx", return_ft = TRUE)
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft2,
location = officer::ph_location(left = 0.5, top = 1.5))
print(pres, target = "combined.pptx")
Further Customization:
Access the flextable object for advanced formatting:
ft <- table2pptx(table, "base.pptx", return_ft = TRUE)
# Customize
ft <- flextable::color(ft, j = "p-value", color = "red")
ft <- flextable::bg(ft, i = 1, bg = "yellow")
ft <- flextable::bold(ft, i = ~ estimate > 0, j = "estimate")
# Save to new slide
pres <- officer::read_pptx()
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft,
location = officer::ph_location(left = 0.5, top = 1.5))
print(pres, target = "custom.pptx")
Value
Behavior depends on return_ft:
return_ft = FALSE: Invisibly returns a list with:
file - Path to created file
caption - Caption/title text
layout - Layout name used
master - Master name used
template - Template path (if provided)
position - List with left and top coordinates
Flextable accessible via attr(result, "flextable")
return_ft = TRUE: Directly returns the flextable object
Always creates a .pptx file at the specified location.
See Also
autotable for automatic format detection,
table2docx for Word documents,
table2pdf for PDF output,
table2html for HTML tables,
table2rtf for Rich Text Format,
table2tex for LaTeX output,
flextable for table customization,
read_pptx for PowerPoint manipulation
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2rtf(),
table2tex()
Examples
# Create example data
data(clintrial)
data(clintrial_labels)
tbl <- desctable(clintrial, by = "treatment",
variables = c("age", "sex"), labels = clintrial_labels)
# Basic PowerPoint export
if (requireNamespace("flextable", quietly = TRUE) &&
requireNamespace("officer", quietly = TRUE)) {
table2pptx(tbl, file.path(tempdir(), "example.pptx"))
}
old_width <- options(width = 180)
# Load data
data(clintrial)
data(clintrial_labels)
# Create regression table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
labels = clintrial_labels
)
# Example 1: Basic PowerPoint slide
table2pptx(results, file.path(tempdir(), "results.pptx"))
# Example 2: With title
table2pptx(results, file.path(tempdir(), "titled.pptx"),
caption = "Multivariable Regression Results")
# Example 3: Larger font for visibility
table2pptx(results, file.path(tempdir(), "large_font.pptx"),
font_size = 12,
caption = "Main Findings")
# Example 4: Condensed for slide space
table2pptx(results, file.path(tempdir(), "condensed.pptx"),
condense_table = TRUE,
caption = "Key Results")
# Example 5: Dark header for emphasis
table2pptx(results, file.path(tempdir(), "dark.pptx"),
dark_header = TRUE,
caption = "Risk Factors")
# Example 6: With zebra stripes
table2pptx(results, file.path(tempdir(), "striped.pptx"),
zebra_stripes = TRUE)
# Example 7: Blank layout with custom positioning
table2pptx(results, file.path(tempdir(), "blank.pptx"),
layout = "Blank",
left = 1,
top = 1.5,
width = 8)
# Example 8: Get flextable for customization
ft <- table2pptx(results, file.path(tempdir(), "base.pptx"), return_ft = TRUE)
# Customize the returned flextable object
ft <- flextable::color(ft, j = "p-value", color = "darkred")
# Example 9: Presentation-optimized table
table2pptx(results, file.path(tempdir(), "presentation.pptx"),
caption = "Main Analysis Results",
font_size = 11,
condense_table = TRUE,
zebra_stripes = TRUE,
dark_header = TRUE,
bold_significant = TRUE)
# Example 10: Descriptive statistics slide
desc <- desctable(
data = clintrial,
by = "treatment",
variables = c("age", "sex", "bmi"),
labels = clintrial_labels
)
table2pptx(desc, file.path(tempdir(), "baseline.pptx"),
caption = "Baseline Characteristics",
font_size = 10)
# Example 11: Conference presentation style
table2pptx(results, file.path(tempdir(), "conference.pptx"),
caption = "Study Outcomes",
font_family = "Calibri",
font_size = 14, # Large for big rooms
dark_header = TRUE,
condense_table = TRUE)
options(old_width)
Export Table to Rich Text Format (RTF)
Description
Converts a data frame, data.table, or matrix to a Rich Text Format (.rtf)
document using the flextable and officer packages. Creates
widely compatible tables with extensive formatting options. RTF files can be
opened and edited in Microsoft Word, LibreOffice, WordPad, and many other word
processors. Particularly useful for regulatory submissions, cross-platform
compatibility, and when maximum editability is required.
Usage
table2rtf(
table,
file,
caption = NULL,
font_size = 8,
font_family = "Arial",
format_headers = TRUE,
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
zebra_stripes = FALSE,
dark_header = FALSE,
paper = "letter",
orientation = "portrait",
width = NULL,
align = NULL,
return_ft = FALSE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output RTF filename. Must have
|
caption |
Character string. Optional caption displayed above the table
in the RTF document. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 8. Typical range: 8-12 points. Headers use slightly larger size. |
font_family |
Character string. Font family name for the table. Must be
a font installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
paper |
Character string specifying paper size:
|
orientation |
Character string specifying page orientation:
|
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment for each column.
Options: |
return_ft |
Logical. If |
... |
Additional arguments (currently unused, reserved for future extensions). |
Details
Package Requirements:
This function requires:
- flextable - For creating formatted tables
- officer - For RTF document generation
Install if needed:
install.packages(c("flextable", "officer"))
RTF Format Advantages:
RTF (Rich Text Format) is a universal document format with several advantages:
- Maximum compatibility - Opens in virtually all word processors
- Cross-platform - Works on Windows, Mac, Linux without conversion
- Fully editable - Native text format, not embedded objects
- Lightweight - Smaller file sizes than DOCX
- Regulatory compliance - Widely accepted for submissions (FDA, EMA)
- Long-term accessibility - Simple text-based format
- Version control friendly - Text-based, works with diff tools
Applications that can open RTF files:
Microsoft Word (Windows, Mac)
LibreOffice Writer
Apache OpenOffice Writer
WordPad (Windows built-in)
TextEdit (Mac built-in)
Google Docs (with import)
Pages (Mac)
Many other word processors
Output Features:
The generated RTF document contains:
Fully editable table (native RTF table, not image)
Professional typography and spacing
Proper page setup (size, orientation, margins)
Caption (if provided) as separate paragraph above table
All formatting preserved but editable
Compatible with RTF 1.5 specification
Further Customization:
For programmatic customization beyond the built-in options, access the
flextable object:
Method 1: Via attribute (default)
result <- table2rtf(table, "output.rtf")
ft <- attr(result, "flextable")
# Customize flextable
ft <- flextable::bold(ft, i = 1, j = 1, part = "body")
ft <- flextable::color(ft, i = 2, j = 3, color = "red")
# Re-save if needed
flextable::save_as_rtf(ft, path = "customized.rtf")
Method 2: Direct return
ft <- table2rtf(table, "output.rtf", return_ft = TRUE)
# Customize immediately
ft <- flextable::bg(ft, bg = "yellow", part = "header")
ft <- flextable::autofit(ft)
# Save to new file
flextable::save_as_rtf(ft, path = "custom.rtf")
Page Layout:
The function automatically sets up the RTF document with:
Specified paper size and orientation
Standard margins (1 inch by default)
Table positioned at document start
Left-aligned table placement
For landscape orientation:
Automatically swaps page dimensions
Applies landscape property
Useful for wide tables with many columns
Table Width Management:
Width behavior:
- width = NULL - Auto-fits to content and page width
- width = 6 - Exactly 6 inches wide
Width distributed evenly across columns by default
Can adjust individual column widths in word processor after creation
For very wide tables:
Use orientation = "landscape"
Use paper = "legal" for extra width
Reduce font_size
Use condense_table = TRUE
Consider breaking across multiple tables
Typography:
The function applies professional typography:
Column headers: Bold, slightly larger font
Body text: Regular weight, specified font size
Numbers: Right-aligned for easy comparison
Text: Left-aligned for readability
Consistent spacing: Adequate padding in cells
Statistical notation: Italicized appropriately
Value
Behavior depends on return_ft:
- return_ft = FALSE
Invisibly returns a list with components:
  - file - Path to created file
  - caption - Caption text (if provided)
The flextable object is accessible via attr(result, "flextable")
- return_ft = TRUE
Directly returns the flextable object for immediate further customization
In both cases, creates a .rtf file at the specified location.
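Because the return shape depends on return_ft, code that needs the flextable has to handle both cases. The following is a minimal sketch that assumes only the behavior documented above. get_flextable() is a hypothetical helper, not part of the package, and the objects below are stand-ins rather than real table2rtf() output.

```r
# Hypothetical helper: recover the flextable from either return shape.
get_flextable <- function(x) {
  if (inherits(x, "flextable")) {
    return(x)                      # return_ft = TRUE: object returned directly
  }
  ft <- attr(x, "flextable")       # return_ft = FALSE: stored as an attribute
  if (is.null(ft)) stop("no flextable found on this object")
  ft
}

# Stand-in objects that mimic the two documented return shapes
mock_ft <- structure(list(), class = "flextable")
mock_result <- structure(list(file = "output.rtf", caption = NULL),
                         flextable = mock_ft)

identical(get_flextable(mock_ft), get_flextable(mock_result))
```

A wrapper like this keeps customization code the same regardless of how the export call was written.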
See Also
autotable for automatic format detection,
table2docx for Word documents,
table2pptx for PowerPoint slides,
table2pdf for PDF output,
table2html for HTML tables,
table2tex for LaTeX output,
flextable for the underlying table object,
save_as_rtf for direct RTF export
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2tex()
Examples
data(clintrial)
data(clintrial_labels)
# Create example table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
labels = clintrial_labels
)
# Example 1: Basic RTF export
if (requireNamespace("flextable", quietly = TRUE)) {
table2rtf(results, file.path(tempdir(), "results.rtf"))
}
old_width <- options(width = 180)
# Example 2: With caption
table2rtf(results, file.path(tempdir(), "captioned.rtf"),
caption = "Table 1: Multivariable Logistic Regression Results")
# Example 3: Landscape orientation for wide tables
table2rtf(results, file.path(tempdir(), "wide.rtf"),
orientation = "landscape")
# Example 4: Custom font and size
table2rtf(results, file.path(tempdir(), "custom_font.rtf"),
font_family = "Times New Roman",
font_size = 11)
# Example 5: Hierarchical display
table2rtf(results, file.path(tempdir(), "indented.rtf"),
indent_groups = TRUE)
# Example 6: Condensed table
table2rtf(results, file.path(tempdir(), "condensed.rtf"),
condense_table = TRUE)
# Example 7: With zebra stripes
table2rtf(results, file.path(tempdir(), "striped.rtf"),
zebra_stripes = TRUE)
# Example 8: Dark header style
table2rtf(results, file.path(tempdir(), "dark.rtf"),
dark_header = TRUE)
# Example 9: A4 paper for international submissions
table2rtf(results, file.path(tempdir(), "a4.rtf"),
paper = "a4")
# Example 10: Get flextable for customization
result <- table2rtf(results, file.path(tempdir(), "base.rtf"))
ft <- attr(result, "flextable")
# Customize the flextable
ft <- flextable::bold(ft, i = 1, part = "body")
ft <- flextable::color(ft, j = "p-value", color = "blue")
# Re-save
flextable::save_as_rtf(ft, path = file.path(tempdir(), "customized.rtf"))
# Example 11: Direct flextable return
ft <- table2rtf(results, file.path(tempdir(), "direct.rtf"), return_ft = TRUE)
ft <- flextable::bg(ft, bg = "yellow", part = "header")
# Example 12: Regulatory submission table
table2rtf(results, file.path(tempdir(), "submission.rtf"),
caption = "Table 2: Adjusted Odds Ratios for Mortality",
font_family = "Times New Roman",
font_size = 10,
indent_groups = TRUE,
zebra_stripes = FALSE,
bold_significant = TRUE)
# Example 13: Custom column alignment
table2rtf(results, file.path(tempdir(), "aligned.rtf"),
align = c("left", "left", "center", "right", "right"))
# Example 14: Disable significance bolding
table2rtf(results, file.path(tempdir(), "no_bold.rtf"),
bold_significant = FALSE)
# Example 15: Stricter significance threshold
table2rtf(results, file.path(tempdir(), "strict.rtf"),
bold_significant = TRUE,
p_threshold = 0.01)
# Example 16: Descriptive statistics for baseline characteristics
desc <- desctable(clintrial, by = "treatment",
variables = c("age", "sex", "bmi", "stage"), labels = clintrial_labels)
table2rtf(desc, file.path(tempdir(), "baseline.rtf"),
caption = "Table 1: Baseline Patient Characteristics",
zebra_stripes = TRUE)
# Example 17: Clinical trial efficacy table
table2rtf(results, file.path(tempdir(), "efficacy.rtf"),
caption = "Table 3: Primary Efficacy Analysis - Intent to Treat Population",
font_family = "Courier New", # Monospace for alignment
paper = "letter",
orientation = "landscape",
condense_table = TRUE)
options(old_width)
Export Table to LaTeX Format
Description
Converts a data frame, data.table, or matrix to LaTeX source code suitable for
inclusion in LaTeX documents. Generates publication-quality table markup with
extensive formatting options including booktabs styling, color schemes, and
hierarchical displays. Output can be directly \input{} or \include{}
into LaTeX manuscripts. Requires xtable for export.
Usage
table2tex(
table,
file,
format_headers = TRUE,
variable_padding = FALSE,
cell_padding = "normal",
bold_significant = TRUE,
bold_variables = FALSE,
p_threshold = 0.05,
align = NULL,
indent_groups = FALSE,
condense_table = FALSE,
condense_quantitative = FALSE,
booktabs = FALSE,
zebra_stripes = FALSE,
stripe_color = "gray!20",
dark_header = FALSE,
caption = NULL,
caption_size = NULL,
label = NULL,
show_logs = FALSE,
...
)
Arguments
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output |
format_headers |
Logical. If |
variable_padding |
Logical. If |
cell_padding |
Character string or numeric. Vertical padding within cells:
|
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
align |
Character string or vector specifying column alignment:
If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
booktabs |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. LaTeX color specification for zebra
stripes (e.g., |
dark_header |
Logical. If |
caption |
Character string. Table caption for LaTeX caption command.
Supports multi-line captions using double backslash. Default is |
caption_size |
Numeric. Caption font size in points. If |
label |
Character string. LaTeX label for cross-references.
Example: |
show_logs |
Logical. If |
... |
Additional arguments passed to |
Details
Output Format:
The function generates a standalone LaTeX tabular environment that can be:
Included in documents with \input command
Embedded in table/figure environments
Used in manuscript classes (article, report, etc.)
The output includes:
Complete tabular environment with proper alignment
Horizontal rules (\hline or booktabs rules)
Column headers with optional formatting
Data rows with automatic escaping of special characters
Optional caption and label commands
Required LaTeX Packages:
Add these to your LaTeX document preamble:
Always required:
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{array}
\usepackage{graphicx}
Optional (based on parameters):
\usepackage{booktabs}
\usepackage[table]{xcolor}
Booktabs Style:
When booktabs = TRUE, the table uses publication-quality rules:
- \toprule - Heavy rule at top
- \midrule - Medium rule below headers
- \bottomrule - Heavy rule at bottom
No vertical rules (booktabs style)
Better spacing around rules
This is the preferred style for most academic journals.
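To make the rule placement concrete, the sketch below builds a booktabs-style tabular from a small data frame using base R only. It is an illustration of the layout described above, not the package's implementation. to_booktabs() is a hypothetical helper, the values in demo are made up, and no escaping or alignment detection is attempted.

```r
# Illustrative only: build a booktabs-style tabular from a data frame.
# This is not the package's implementation and does no escaping.
to_booktabs <- function(df) {
  align <- paste(rep("l", ncol(df)), collapse = "")
  # One "cell & cell \\" line per data row
  rows <- apply(df, 1, function(r) paste0(paste(r, collapse = " & "), " \\\\"))
  c(
    sprintf("\\begin{tabular}{%s}", align),
    "\\toprule",                                         # heavy rule at top
    paste0(paste(names(df), collapse = " & "), " \\\\"), # header row
    "\\midrule",                                         # rule below headers
    rows,
    "\\bottomrule",                                      # heavy rule at bottom
    "\\end{tabular}"
  )
}

# Made-up values for illustration
demo <- data.frame(Variable = c("Age", "Sex"),
                   OR = c("1.02", "0.85"))
tex_lines <- to_booktabs(demo)
writeLines(tex_lines)
```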
Color Features:
Zebra Stripes: Creates alternating background colors for visual grouping:
zebra_stripes = TRUE
stripe_color = "gray!20" # 20% gray
stripe_color = "blue!10" # 10% blue
Dark Header: Creates high-contrast header row:
dark_header = TRUE # Black background, white text
Both require the xcolor package with table option in your document.
Integration with LaTeX Documents:
Basic inclusion:
\begin{table}[htbp]
\centering
\caption{Regression Results}
\label{tab:regression}
\input{results.tex}
\end{table}
With resizing:
\begin{table}[htbp]
\centering
\caption{Results}
\resizebox{\textwidth}{!}{\input{results.tex}}
\end{table}
Landscape orientation:
\usepackage{pdflscape} % goes in the document preamble
\begin{landscape}
\begin{table}[htbp]
\centering
\input{wide_results.tex}
\end{table}
\end{landscape}
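The wrappers above can also be generated from R when many tables share the same layout. This is a sketch under stated assumptions: write_table_wrapper() is a hypothetical helper, "results.tex" is a placeholder for a file written by table2tex(), and the caption and label are passed through without escaping.

```r
# Hypothetical helper: write a table environment that \input{}s an existing
# tabular file. "results.tex" below is a placeholder file name.
write_table_wrapper <- function(tex_file, out_file, caption, label) {
  lines <- c(
    "\\begin{table}[htbp]",
    "\\centering",
    sprintf("\\caption{%s}", caption),
    sprintf("\\label{%s}", label),
    sprintf("\\input{%s}", tex_file),
    "\\end{table}"
  )
  writeLines(lines, out_file)
  invisible(out_file)
}

wrapper <- file.path(tempdir(), "table_wrapper.tex")
write_table_wrapper("results.tex", wrapper,
                    caption = "Regression Results",
                    label = "tab:regression")
```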
Caption Formatting:
Captions in the caption parameter are written as LaTeX comments in
the output file for reference. For actual LaTeX captions, wrap the table
in a table environment (see examples above).
Special Characters:
The function automatically escapes LaTeX special characters in your data:
Ampersand, percent, dollar sign, hash, underscore
Left and right braces
Tilde and caret (using \textasciitilde and \textasciicircum)
Variable names and labels should not include these characters unless intentionally using LaTeX commands.
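The escaping rules listed above can be sketched in a few lines of base R. This is an illustration of the idea only, not the function the package uses. escape_latex() is a hypothetical helper, and literal backslashes in the input are deliberately left untouched.

```r
# Illustrative escaping of LaTeX special characters (not the package's code).
# Literal backslashes in the input are not handled here.
escape_latex <- function(x) {
  # Prefix & % $ # _ { } with a backslash
  x <- gsub("([&%$#_{}])", "\\\\\\1", x, perl = TRUE)
  # Tilde and caret have no simple backslash form, so use text commands
  x <- gsub("~", "\\textasciitilde{}", x, fixed = TRUE)
  x <- gsub("^", "\\textasciicircum{}", x, fixed = TRUE)
  x
}

# cat() shows the strings as LaTeX would receive them
cat(escape_latex(c("BMI_kg/m^2", "Cost ($) & 50%")), sep = "\n")
```

Braces are escaped before the text commands are inserted, so the `{}` that terminates each command is not escaped a second time.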
Value
Invisibly returns NULL. Creates a .tex file at the specified
location containing a LaTeX tabular environment.
See Also
autotable for automatic format detection,
table2pdf for direct PDF output,
table2html for HTML tables,
table2docx for Word documents,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
fit for regression tables,
desctable for descriptive tables
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf()
Examples
data(clintrial)
data(clintrial_labels)
# Create example table
results <- fit(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
labels = clintrial_labels
)
# Example 1: Basic LaTeX export
if (requireNamespace("xtable", quietly = TRUE)) {
table2tex(results, file.path(tempdir(), "basic.tex"))
}
# Example 2: With booktabs for publication
table2tex(results, file.path(tempdir(), "publication.tex"),
booktabs = TRUE,
caption = "Multivariable logistic regression results",
label = "tab:regression")
# Example 3: Multi-line caption with abbreviations
table2tex(results, file.path(tempdir(), "detailed.tex"),
booktabs = TRUE,
caption = "Table 1: Risk Factors for Mortality\\\\
aOR = adjusted odds ratio; CI = confidence interval\\\\
Model adjusted for age, sex, treatment, and disease stage",
label = "tab:mortality")
# Example 4: Hierarchical display with indentation
table2tex(results, file.path(tempdir(), "indented.tex"),
indent_groups = TRUE,
booktabs = TRUE)
# Example 5: Condensed table (reduced height)
table2tex(results, file.path(tempdir(), "condensed.tex"),
condense_table = TRUE,
booktabs = TRUE)
# Example 6: With zebra stripes
table2tex(results, file.path(tempdir(), "striped.tex"),
zebra_stripes = TRUE,
stripe_color = "gray!15",
booktabs = TRUE)
# Remember to add \usepackage[table]{xcolor} to the LaTeX document
# Example 7: Dark header style
table2tex(results, file.path(tempdir(), "dark_header.tex"),
dark_header = TRUE,
booktabs = TRUE)
# Requires \usepackage[table]{xcolor}
# Example 8: Custom cell padding
table2tex(results, file.path(tempdir(), "relaxed.tex"),
cell_padding = "relaxed",
booktabs = TRUE)
# Example 9: Custom column alignment (auto-detected by default)
table2tex(results, file.path(tempdir(), "custom_align.tex"),
align = c("c", "c", "c", "c", "c", "c", "c"))
# Example 10: No header formatting (keep original names)
table2tex(results, file.path(tempdir(), "raw_headers.tex"),
format_headers = FALSE)
# Example 11: Disable significance bolding
table2tex(results, file.path(tempdir(), "no_bold.tex"),
bold_significant = FALSE,
booktabs = TRUE)
# Example 12: Stricter significance threshold
table2tex(results, file.path(tempdir(), "strict_sig.tex"),
bold_significant = TRUE,
p_threshold = 0.01, # Bold only if p < 0.01
booktabs = TRUE)
# Example 13: With caption size control
table2tex(results, file.path(tempdir(), "caption_size.tex"),
caption_size = 6,
caption = "Table 1 - Results with Compact Caption\\\\
Smaller caption fits better on constrained pages")
# Example 14: Complete publication-ready table
table2tex(results, file.path(tempdir(), "final_table1.tex"),
booktabs = TRUE,
caption = "Table 1: Multivariable Analysis of Mortality Risk Factors",
label = "tab:main_results",
indent_groups = TRUE,
zebra_stripes = FALSE, # Many journals prefer no stripes
bold_significant = TRUE,
cell_padding = "normal")
# Example 15: Descriptive statistics table
desc_table <- desctable(clintrial, by = "treatment",
variables = c("age", "sex", "bmi"), labels = clintrial_labels)
table2tex(desc_table, file.path(tempdir(), "table1_descriptive.tex"),
booktabs = TRUE,
caption = "Table 1: Baseline Characteristics",
label = "tab:baseline")
# Example 16: Model comparison table
models <- list(
base = c("age", "sex"),
full = c("age", "sex", "treatment", "stage")
)
comparison <- compfit(
data = clintrial,
outcome = "os_status",
model_list = models
)
table2tex(comparison, file.path(tempdir(), "model_comparison.tex"),
booktabs = TRUE,
caption = "Model Comparison Statistics",
label = "tab:models")
Create Forest Plot for Univariable Screening
Description
Generates a publication-ready forest plot from a uniscreen() output
object. The plot displays effect estimates (OR, HR, RR, or coefficients) with
confidence intervals for each predictor tested in univariable analysis against
a single outcome.
Usage
uniforest(
x,
title = "Univariable Screening",
effect_label = NULL,
digits = 2,
p_digits = 3,
conf_level = 0.95,
font_size = 1,
annot_size = 3.88,
header_size = 5.82,
title_size = 23.28,
plot_width = NULL,
plot_height = NULL,
table_width = 0.6,
show_n = TRUE,
show_events = NULL,
indent_groups = FALSE,
condense_table = FALSE,
bold_variables = FALSE,
center_padding = 4,
zebra_stripes = TRUE,
color = NULL,
null_line = NULL,
log_scale = NULL,
labels = NULL,
show_footer = TRUE,
units = "in",
number_format = NULL
)
Arguments
x |
Univariable screen result object (data.table with class attributes
from |
title |
Character string specifying the plot title. Default is
|
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
color |
Character string specifying the color for point estimates in
the forest plot. Default is |
null_line |
Numeric value for the reference line position. Default is
|
log_scale |
Logical. If |
labels |
Named character vector providing custom display labels for
variables. Applied to predictor names in the plot.
Default is |
show_footer |
Logical. If |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Details
The forest plot displays univariable (unadjusted) associations between each predictor and the outcome. This is useful for:
Visualizing results of initial variable screening
Identifying potential predictors for multivariable modeling
Presenting crude associations alongside adjusted results
Quick visual assessment of effect sizes and significance
The plot automatically handles:
Different effect types (OR, HR, RR, coefficients) with appropriate axis scaling (log vs linear)
Factor variables with multiple levels (grouped under variable name)
Continuous variables (single row per predictor)
Reference categories for categorical variables
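The choice between log and linear axes follows from how ratio measures behave. As a general illustration (not taken from the package source), a halving and a doubling are equally strong effects, and they only appear symmetric around the null value once ratios are plotted on a log scale:

```r
# On the ratio scale, a halving (0.5) and a doubling (2) look unequal,
# but on the log scale they sit the same distance from the null value.
ratios <- c(halved = 0.5, null = 1, doubled = 2)
log_ratios <- log(ratios)

# Distance from the null on each scale
abs(ratios - 1)      # unequal on the linear scale
abs(log_ratios - 0)  # equal for 0.5 and 2 on the log scale
```

Coefficients from linear models are additive, so their natural reference value is 0 and a linear axis is appropriate.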
Value
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
- width
Numeric. Recommended plot width in specified units
- height
Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
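When saving plots in a script, it can help to read the recommended dimensions defensively. The sketch below assumes only the documented "rec_dims" attribute. rec_dims_or() is a hypothetical helper, the fallback size is an arbitrary choice, and mock_plot is a stand-in rather than real uniforest() output.

```r
# Hypothetical helper: read recommended dimensions, falling back to defaults
# when the attribute is missing (for example, on a plain ggplot object).
rec_dims_or <- function(plot, default = list(width = 12, height = 8)) {
  dims <- attr(plot, "rec_dims")
  if (is.null(dims)) default else dims
}

# Stand-in object mimicking the documented attribute; not real uniforest() output
mock_plot <- structure(list(), rec_dims = list(width = 10.5, height = 6))
dims <- rec_dims_or(mock_plot)
dims$width
dims$height
```

The resulting width and height can then be passed to ggplot2::ggsave() as in the examples below.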
See Also
autoforest for automatic model detection,
uniscreen for generating univariable screening results,
multiforest for multi-outcome forest plots,
coxforest, glmforest, lmforest for
single-model forest plots
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
lmforest(),
multiforest()
Examples
data(clintrial)
data(clintrial_labels)
# Create example uniscreen result
uni_results <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "smoking", "treatment", "stage"),
labels = clintrial_labels,
parallel = FALSE
)
# Example 1: Basic univariable forest plot
p <- uniforest(uni_results, title = "Univariable Associations with Mortality")
old_width <- options(width = 180)
# Example 2: Survival analysis
library(survival)
surv_results <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
p2 <- uniforest(surv_results, title = "Univariable Survival Analysis")
# Example 3: Linear regression
lm_results <- uniscreen(
data = clintrial,
outcome = "los_days",
predictors = c("age", "sex", "surgery", "diabetes"),
model_type = "lm",
labels = clintrial_labels,
parallel = FALSE
)
p3 <- uniforest(lm_results, title = "Predictors of Length of Stay")
# Example 4: Customize appearance
p4 <- uniforest(
uni_results,
title = "Crude Associations with Mortality",
color = "#E74C3C",
indent_groups = TRUE,
zebra_stripes = TRUE,
bold_variables = TRUE
)
# Example 5: Save with recommended dimensions
dims <- attr(p4, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "univariable_forest.pdf"),
p4, width = dims$width, height = dims$height)
options(old_width)
Univariable Screening for Multiple Predictors
Description
Performs comprehensive univariable (unadjusted) regression analyses by fitting separate models for each predictor against a single outcome. This function is designed for initial variable screening, hypothesis generation, and understanding crude associations before multivariable modeling. Returns publication-ready formatted results with optional p-value filtering.
Usage
uniscreen(
data,
outcome,
predictors,
model_type = "glm",
family = "binomial",
random = NULL,
p_threshold = 0.05,
conf_level = 0.95,
reference_rows = TRUE,
show_n = TRUE,
show_events = TRUE,
digits = 2,
p_digits = 3,
labels = NULL,
keep_models = FALSE,
exponentiate = NULL,
parallel = TRUE,
n_cores = NULL,
number_format = NULL,
verbose = NULL,
...
)
Arguments
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to screen. Each predictor is tested independently in its own univariable model. Can include continuous, categorical (factor), or binary variables. |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
p_threshold |
Numeric value between 0 and 1 specifying the p-value threshold used to count significant predictors in the printed summary. All predictors are always included in the output table. Default is 0.05. |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match predictor names, values are the
display labels. Predictors not in |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for parallel
processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions
( |
Details
Analysis Approach:
The function implements a comprehensive univariable screening workflow:
For each predictor in predictors, fits a separate model: outcome ~ predictor
Extracts coefficients, confidence intervals, and p-values from each model
Combines results into a single table for easy comparison
Formats output for publication with appropriate effect measures
Each predictor is tested independently - these are crude (unadjusted) associations that do not account for confounding or interaction effects.
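The workflow above can be sketched with base R alone. The following is an illustration on simulated data, not the package's implementation. The toy data frame, its column names, and the use of Wald intervals from confint.default() are all assumptions made for the example.

```r
# Illustrative univariable screening with base R (not the package's code).
set.seed(1)
n <- 200
toy <- data.frame(
  age = rnorm(n, 60, 10),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  bmi = rnorm(n, 27, 4)
)
toy$event <- rbinom(n, 1, plogis(-3 + 0.05 * toy$age))

predictors <- c("age", "sex", "bmi")

screen <- do.call(rbind, lapply(predictors, function(p) {
  # One model per predictor: event ~ predictor
  m   <- glm(reformulate(p, response = "event"), data = toy, family = binomial)
  est <- coef(summary(m))[-1, , drop = FALSE]    # drop the intercept row
  ci  <- confint.default(m)[-1, , drop = FALSE]  # Wald intervals
  data.frame(
    predictor = p,
    term      = rownames(est),
    OR        = exp(est[, "Estimate"]),
    lower     = exp(ci[, 1]),
    upper     = exp(ci[, 2]),
    p_value   = est[, "Pr(>|z|)"],
    row.names = NULL
  )
}))
screen
```

Each row comes from a different model, so the estimates are not adjusted for one another.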
When to Use Univariable Screening:
- Initial variable selection: Identify predictors associated with the outcome before building multivariable models
- Hypothesis generation: Explore potential associations in exploratory analyses
- Understanding crude associations: Report unadjusted effects alongside adjusted estimates
- Variable reduction: Use p-value thresholds (e.g., p < 0.20) to identify candidates for multivariable modeling
- Checking multicollinearity: Compare univariable and multivariable effects to identify potential collinearity
Threshold for p-values:
The p_threshold parameter controls the significance threshold used
in the printed summary to count how many predictors are significant. All
predictors are always included in the output table regardless of this setting.
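The distinction between flagging and filtering can be shown on a toy table. This is a sketch of the idea only. The predictor names and p-values are made up for illustration, and the column names are not those of the package's output.

```r
# Toy results table; the p-values are made up for illustration.
toy_screen <- data.frame(
  predictor = c("age", "sex", "bmi", "stage"),
  p_value   = c(0.003, 0.410, 0.150, 0.020)
)
p_threshold <- 0.20

# Flag rather than filter: every predictor stays in the table
toy_screen$significant <- toy_screen$p_value < p_threshold

# Names below the threshold, usable as candidates for a multivariable model
candidates <- toy_screen$predictor[toy_screen$significant]
candidates
```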
Effect Measures by Model Type:
- Logistic regression (model_type = "glm", family = "binomial"): Odds ratios (OR)
- Cox regression (model_type = "coxph"): Hazard ratios (HR)
- Poisson regression (model_type = "glm", family = "poisson"): Rate/risk ratios (RR)
- Negative binomial (model_type = "negbin"): Rate ratios (RR)
- Linear regression (model_type = "lm" or GLM with identity link): Raw coefficient estimates
- Gamma regression (model_type = "glm", family = "Gamma"): Multiplicative effects (with default log link)
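Ratio measures arise because log-link models estimate effects on the log scale, and exponentiating a coefficient converts it to a ratio. The sketch below shows this for a Poisson model on simulated data using base R. It is an illustration, not the package's code, and the variable names and true effect size are assumptions.

```r
# Illustrative only: a log-link model reports log rate ratios;
# exponentiating moves them to the ratio scale.
set.seed(42)
n <- 2000
treated <- rbinom(n, 1, 0.5)
counts  <- rpois(n, lambda = exp(0.2 + 0.5 * treated))  # true RR = exp(0.5)

m <- glm(counts ~ treated, family = poisson)
log_rr <- coef(m)[["treated"]]                   # effect on the log scale
rr     <- exp(log_rr)                            # rate ratio
rr_ci  <- exp(confint.default(m)["treated", ])   # Wald interval, ratio scale
```

With an identity link there is nothing to exponentiate, which is why linear models report raw coefficients.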
Memory Considerations:
When keep_models = FALSE (default), fitted models are discarded after
extracting results to conserve memory. Set keep_models = TRUE only when
the following are needed:
Model diagnostic plots
Predictions from individual models
Additional model statistics not extracted by default
Further analysis of specific models
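The memory trade-off is easy to see by comparing fitted model objects with the handful of numbers extracted from them. This is a base R sketch on simulated data, not a measurement of the package itself. Actual sizes depend on the data and model type.

```r
# Illustrative only: compare the footprint of stored models with a summary table.
set.seed(7)
n <- 5000
toy <- data.frame(y = rbinom(n, 1, 0.4),
                  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# One fitted model per predictor; each keeps its own copy of the model data
models <- lapply(c("x1", "x2", "x3"), function(p) {
  glm(reformulate(p, response = "y"), data = toy, family = binomial)
})

# The extracted estimates: one row of four numbers per model
estimates <- t(sapply(models, function(m) coef(summary(m))[2, ]))

size_models    <- as.numeric(utils::object.size(models))
size_estimates <- as.numeric(utils::object.size(estimates))
size_models / size_estimates
```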
Value
A data.table with S3 class "uniscreen_result" containing formatted
univariable screening results. The table structure includes:
- Variable
Character. Predictor name or custom label (from labels)
- Group
Character. For factor variables: category level. For continuous variables: typically empty or descriptive statistic label
- n
Integer. Sample size used in the model (if show_n = TRUE)
- n_group
Integer. Sample size for this specific factor level (factor variables only)
- events
Integer. Total number of events in the model for survival or logistic regression (if show_events = TRUE)
- events_group
Integer. Number of events for this specific factor level (factor variables only)
- OR/HR/RR/Coefficient (95% CI)
Character. Formatted effect estimate with confidence interval. Column name depends on model type: "OR (95% CI)" for logistic, "HR (95% CI)" for survival, "RR (95% CI)" for counts, "Coefficient (95% CI)" for linear models
- p-value
Character. Formatted p-value from the Wald test
The returned object includes the following attributes accessible via attr():
- raw_data
data.table. Unformatted numeric results with separate columns for coefficients, standard errors, confidence interval bounds, etc. Suitable for further statistical analysis or custom formatting
- models
List (if keep_models = TRUE). Named list of fitted model objects, with predictor names as list names. Access specific models via attr(result, "models")[["predictor_name"]]
- outcome
Character. The outcome variable name used
- model_type
Character. The regression model type used
- model_scope
Character. Always "Univariable" for screening results
- screening_type
Character. Always "univariable" to identify the analysis type
- p_threshold
Numeric. The p-value threshold used for significance
- significant
Character vector. Names of predictors with p-value below the screening threshold, suitable for passing directly to downstream modeling functions
See Also
fit for fitting a single multivariable model,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for converting individual models to tables
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result()
Examples
# Load example data
data(clintrial)
data(clintrial_labels)
# Example 1: Basic logistic regression screening
screen1 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension"),
model_type = "glm",
family = "binomial",
parallel = FALSE
)
print(screen1)
# Example 2: With custom variable labels
screen2 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "treatment"),
labels = clintrial_labels,
parallel = FALSE
)
print(screen2)
# Example 3: Screening threshold of p < 0.20
# Predictors with p < 0.20 are counted as significant (common for screening)
screen3 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "stage"),
p_threshold = 0.20,
labels = clintrial_labels,
parallel = FALSE
)
print(screen3)
# All predictors are shown; p_threshold only affects the significance count
# Example 4: Cox proportional hazards screening
library(survival)
cox_screen <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage", "grade"),
model_type = "coxph",
labels = clintrial_labels,
parallel = FALSE
)
print(cox_screen)
# Returns hazard ratios (HR) instead of odds ratios
# Example 5: Keep models for diagnostics
screen5 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi", "creatinine"),
keep_models = TRUE,
parallel = FALSE
)
# Access stored models
models <- attr(screen5, "models")
summary(models[["age"]])
plot(models[["age"]]) # Diagnostic plots
# Example 6: Linear regression screening
linear_screen <- uniscreen(
data = clintrial,
outcome = "bmi",
predictors = c("age", "sex", "smoking", "creatinine", "hemoglobin"),
model_type = "lm",
labels = clintrial_labels,
parallel = FALSE
)
print(linear_screen)
# Example 7: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_screen <- uniscreen(
data = clintrial,
outcome = "fu_count",
predictors = c("age", "stage", "treatment", "surgery"),
model_type = "glm",
family = "poisson",
labels = clintrial_labels,
parallel = FALSE
)
print(poisson_screen)
# Returns rate ratios (RR)
# Example 8: Negative binomial for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin
if (requireNamespace("MASS", quietly = TRUE)) {
nb_screen <- uniscreen(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery"),
model_type = "negbin",
labels = clintrial_labels,
parallel = FALSE
)
print(nb_screen)
}
# Example 9: Gamma regression for positive continuous outcomes (e.g., costs)
gamma_screen <- uniscreen(
data = clintrial,
outcome = "los_days",
predictors = c("age", "sex", "treatment", "surgery"),
model_type = "glm",
family = Gamma(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(gamma_screen)
# Example 10: Hide reference rows for factor variables
screen10 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("treatment", "stage", "grade"),
reference_rows = FALSE,
parallel = FALSE
)
print(screen10)
# Reference categories not shown
# Example 11: Customize decimal places
screen11 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi", "creatinine"),
digits = 3, # 3 decimal places for OR
p_digits = 4 # 4 decimal places for p-values
)
print(screen11)
# Example 12: Hide sample size and event columns
screen12 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi"),
show_n = FALSE,
show_events = FALSE,
parallel = FALSE
)
print(screen12)
# Example 13: Access raw numeric data
screen13 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment"),
parallel = FALSE
)
raw_data <- attr(screen13, "raw_data")
print(raw_data)
# Contains unformatted coefficients, SEs, CIs, etc.
# Example 14: Force coefficient display instead of OR
screen14 <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "bmi"),
model_type = "glm",
family = "binomial",
parallel = FALSE,
exponentiate = FALSE # Show log odds instead of OR
)
print(screen14)
# Example 15: Screening with weights
screen15 <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "bmi"),
model_type = "coxph",
weights = runif(nrow(clintrial), min = 0.5, max = 2), # Random numbers for example
parallel = FALSE
)
# Example 16: Strict significance filter (p < 0.05)
sig_only <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "ecog", "treatment", "stage", "grade"),
p_threshold = 0.05,
labels = clintrial_labels,
parallel = FALSE
)
# Check how many predictors passed the filter
n_significant <- length(unique(sig_only$Variable[sig_only$Variable != ""]))
cat("Significant predictors:", n_significant, "\n")
# Example 17: Complete workflow - screen then use in multivariable
# Step 1: Screen with liberal threshold
candidates <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "bmi", "smoking", "hypertension",
"diabetes", "treatment", "stage", "grade"),
p_threshold = 0.20,
parallel = FALSE
)
# Step 2: Extract significant predictor names
sig_predictors <- attr(candidates, "significant")
# Step 3: Fit multivariable model with selected predictors
multi_model <- fit(
data = clintrial,
outcome = "os_status",
predictors = sig_predictors,
labels = clintrial_labels
)
print(multi_model)
# Example 18: Mixed-effects logistic regression (glmer)
# Accounts for clustering by site
if (requireNamespace("lme4", quietly = TRUE)) {
glmer_screen <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "glmer",
random = "(1|site)",
family = "binomial",
labels = clintrial_labels,
parallel = FALSE
)
print(glmer_screen)
}
# Example 19: Mixed-effects linear regression (lmer)
if (requireNamespace("lme4", quietly = TRUE)) {
lmer_screen <- uniscreen(
data = clintrial,
outcome = "biomarker_x",
predictors = c("age", "sex", "treatment", "smoking"),
model_type = "lmer",
random = "(1|site)",
labels = clintrial_labels,
parallel = FALSE
)
print(lmer_screen)
}
# Example 20: Mixed-effects Cox model (coxme)
if (requireNamespace("coxme", quietly = TRUE)) {
coxme_screen <- uniscreen(
data = clintrial,
outcome = "Surv(os_months, os_status)",
predictors = c("age", "sex", "treatment", "stage"),
model_type = "coxme",
random = "(1|site)",
labels = clintrial_labels,
parallel = FALSE
)
print(coxme_screen)
}
# Example 21: Mixed-effects with nested random effects
# Patients nested within sites
if (requireNamespace("lme4", quietly = TRUE)) {
nested_screen <- uniscreen(
data = clintrial,
outcome = "os_status",
predictors = c("age", "treatment"),
model_type = "glmer",
random = "(1|site/patient_id)",
family = "binomial",
parallel = FALSE
)
}
# Example 22: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS not available
quasi_screen <- uniscreen(
data = clintrial,
outcome = "ae_count",
predictors = c("age", "treatment", "diabetes", "surgery", "stage"),
model_type = "glm",
family = "quasipoisson",
labels = clintrial_labels,
parallel = FALSE
)
print(quasi_screen)
# Adjusts standard errors for overdispersion
# Example 23: Quasibinomial for overdispersed binary data
quasibin_screen <- uniscreen(
data = clintrial,
outcome = "any_complication",
predictors = c("age", "bmi", "diabetes", "surgery", "ecog"),
model_type = "glm",
family = "quasibinomial",
labels = clintrial_labels,
parallel = FALSE
)
print(quasibin_screen)
# Example 24: Inverse Gaussian for highly skewed positive data
invgauss_screen <- uniscreen(
data = clintrial,
outcome = "recovery_days",
predictors = c("age", "surgery", "pain_score", "los_days"),
model_type = "glm",
family = inverse.gaussian(link = "log"),
labels = clintrial_labels,
parallel = FALSE
)
print(invgauss_screen)
Complete input validation for fit functions
Description
Master validation function called by fit(), uniscreen(), and fullfit(). Performs comprehensive checks on data structure, variable existence, numeric parameter ranges, and model-outcome consistency. Returns the validated parameters, with auto-corrections applied when appropriate.
Usage
validate_fit_inputs(
data,
outcome,
predictors,
model_type,
family = NULL,
conf_level = 0.95,
digits = 2,
p_digits = 3,
p_threshold = NULL,
auto_correct_model = TRUE
)
Arguments
data |
Data frame or data.table containing all variables. |
outcome |
Character string outcome specification (may include Surv()). |
predictors |
Character vector of predictor variable names. |
model_type |
Character string model type to validate. |
family |
GLM family object, function, or string if applicable. |
conf_level |
Numeric confidence level (must be between 0 and 1). |
digits |
Integer number of decimal places for effect estimates. |
p_digits |
Integer number of decimal places for p-values. |
p_threshold |
Numeric p-value threshold for significance highlighting. |
auto_correct_model |
Logical whether to auto-correct model type mismatches. |
Value
List containing the validated model_type and family, plus an auto_corrected flag.
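The numeric range checks described above could look roughly like the following base R sketch. The helper name check_numeric_params() is hypothetical, and the exact bounds (for instance, treating conf_level as strictly between 0 and 1 and allowing p_threshold up to 1) are assumptions rather than the package's documented behavior.

```r
# Hypothetical helper illustrating numeric parameter range checks;
# not the package's actual implementation.
check_numeric_params <- function(conf_level = 0.95, digits = 2, p_digits = 3,
                                 p_threshold = NULL) {
  is_scalar_number <- function(x) {
    is.numeric(x) && length(x) == 1L && !is.na(x)
  }

  # Assumed rule: confidence level strictly inside (0, 1)
  if (!is_scalar_number(conf_level) || conf_level <= 0 || conf_level >= 1) {
    stop("conf_level must be a single number strictly between 0 and 1")
  }

  # Assumed rule: decimal places are non-negative whole numbers
  int_params <- list(digits = digits, p_digits = p_digits)
  for (nm in names(int_params)) {
    val <- int_params[[nm]]
    if (!is_scalar_number(val) || val < 0 || val != round(val)) {
      stop(nm, " must be a single non-negative whole number")
    }
  }

  # p_threshold is optional; when supplied, assume it must lie in (0, 1]
  if (!is.null(p_threshold) &&
      (!is_scalar_number(p_threshold) || p_threshold <= 0 || p_threshold > 1)) {
    stop("p_threshold must be NULL or a single number in (0, 1]")
  }

  invisible(TRUE)
}

check_numeric_params(conf_level = 0.90, digits = 3)
```

Failing early on out-of-range values keeps the error message close to the argument that caused it, instead of surfacing later inside model fitting.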
Validate model type matches outcome specification
Description
Ensures consistency between the specified model type, outcome variable type, and GLM family (if applicable). Detects common mismatches like using survival outcomes with non-survival models or binary outcomes with linear models. Can auto-correct fixable issues or raise informative errors.
Usage
validate_model_outcome(
outcome,
model_type,
family = NULL,
data = NULL,
auto_correct = TRUE
)
Arguments
outcome |
Character string outcome specification (may include Surv()). |
model_type |
Character string specified model type. |
family |
GLM family object, function, or string if applicable. |
data |
Data frame or data.table for outcome type detection. |
auto_correct |
Logical whether to auto-correct fixable mismatches. |
Details
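Checks for mismatches between outcome, model type, and family, then either auto-corrects them (when auto_correct = TRUE and the issue is fixable) or stops with an informative error.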
Value
List with model_type, family, messages, and an auto_corrected flag.
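One of the mismatches described above, a Surv() outcome paired with a non-survival model, can be detected with a simple pattern match. The sketch below is a minimal illustration under stated assumptions: check_model_outcome() is a hypothetical name, the set of survival model types is taken from the examples in this manual ("coxph", "coxme"), and switching to "coxph" as the correction is an assumption, not confirmed package behavior.

```r
# Hypothetical sketch of a model/outcome consistency check;
# the correction rule (switch to "coxph") is an assumption.
check_model_outcome <- function(outcome, model_type, auto_correct = TRUE) {
  is_surv     <- grepl("^\\s*Surv\\s*\\(", outcome)
  surv_models <- c("coxph", "coxme")
  messages       <- character(0)
  auto_corrected <- FALSE

  if (is_surv && !(model_type %in% surv_models)) {
    # Survival outcome with a non-survival model: fixable
    if (!auto_correct) {
      stop("Surv() outcome requires a survival model, not '", model_type, "'")
    }
    messages <- c(messages, sprintf(
      "Survival outcome detected; model_type changed from '%s' to 'coxph'",
      model_type
    ))
    model_type     <- "coxph"
    auto_corrected <- TRUE
  } else if (!is_surv && model_type %in% surv_models) {
    # No time-to-event information available: not fixable
    stop("model_type '", model_type, "' requires a Surv() outcome")
  }

  list(model_type = model_type, messages = messages,
       auto_corrected = auto_corrected)
}

res <- check_model_outcome("Surv(os_months, os_status)", "glm")
res$model_type
```

Returning the messages alongside the corrected value lets the caller decide whether to print them, which is one plausible reason for a messages element in the return list.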
Validate number_format parameter
Description
Checks that a number_format value is valid before use. Called early in top-level functions to fail fast with a clear error message.
Usage
validate_number_format(number_format)
Arguments
number_format |
Value to validate. |
Value
Invisibly returns TRUE if valid.
Validate outcome exists in data
Description
Checks that the specified outcome variable (or survival variables within Surv() expression) exists in the dataset. Raises informative error if variables are missing. Handles both simple outcomes and Surv() expressions.
Usage
validate_outcome_exists(data, outcome)
Arguments
data |
Data frame or data.table to check. |
outcome |
Character string outcome specification (may include Surv()). |
Value
Invisible TRUE if validation passes, otherwise stops with error.
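A minimal sketch of how both plain outcomes and Surv() expressions can be checked with one code path: parse the outcome string and let all.vars() extract the variable names (function names such as Surv are excluded by default). The helper name check_outcome_exists() and the error wording are hypothetical; the package's own implementation may differ.

```r
# Hypothetical sketch; relies only on base R parsing utilities.
check_outcome_exists <- function(data, outcome) {
  # all.vars() returns variable names only, so "Surv" itself is dropped
  vars <- all.vars(parse(text = outcome)[[1]])
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Outcome variable(s) not found in data: ",
         paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

toy <- data.frame(os_months = c(12, 30), os_status = c(0, 1))
check_outcome_exists(toy, "os_status")
check_outcome_exists(toy, "Surv(os_months, os_status)")
```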
Validate outcome homogeneity for multifit
Description
Checks whether all outcomes in a multifit analysis are compatible with the specified model type. Issues a warning when outcomes appear to be of mixed types (e.g., binary and continuous outcomes in the same analysis), which would produce tables with incompatible effect measures.
Usage
validate_outcome_homogeneity(data, outcomes, model_type, family = "binomial")
Arguments
data |
Data.table containing the analysis data. |
outcomes |
Character vector of outcome variable names. |
model_type |
Character string specifying the model type. |
family |
Character string specifying the GLM family (for glm/glmer). |
Value
Invisible NULL. Issues warnings if problems are detected.
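The idea of flagging mixed outcome types can be sketched as follows. The classification rule used here (two or fewer distinct non-missing values counts as binary, other numeric columns as continuous) is an assumed heuristic, and classify_outcome() and check_outcome_types() are hypothetical names, not the package's functions.

```r
# Hypothetical heuristic: classify each outcome column by its values
classify_outcome <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) <= 2L) {
    "binary"
  } else if (is.numeric(x)) {
    "continuous"
  } else {
    "categorical"
  }
}

# Warn when the outcomes do not all share one type
check_outcome_types <- function(data, outcomes) {
  types <- vapply(outcomes, function(v) classify_outcome(data[[v]]),
                  character(1))
  if (length(unique(types)) > 1L) {
    warning("Outcomes appear to be of mixed types: ",
            paste(sprintf("%s (%s)", outcomes, types), collapse = ", "),
            call. = FALSE)
  }
  invisible(NULL)
}

toy <- data.frame(
  died     = c(0, 1, 1, 0),
  relapsed = c(1, 0, 0, 1),
  bmi      = c(21.5, 30.2, 27.8, 24.1)
)
check_outcome_types(toy, c("died", "relapsed"))  # same type, no warning
```

A warning rather than an error fits the documented return value: the analysis can still run, but the reader is told that the resulting table may mix effect measures.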
Validate predictors exist in data
Description
Checks that all specified predictor variables exist in the dataset. Handles interaction terms (splits on ":"), mixed-effects random effects (ignores "|" syntax), and raises informative errors for missing variables.
Usage
validate_predictors_exist(data, predictors)
Arguments
data |
Data frame or data.table to check. |
predictors |
Character vector of predictor variable names. |
Value
Invisible TRUE if validation passes, otherwise stops with error.
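The handling of interaction terms and random-effects syntax described above can be sketched in a few lines of base R: drop any term containing "|", split the rest on ":", and compare the resulting names with the columns of the data. The helper name check_predictors_exist() and the error wording are hypothetical, and this is only one plausible way to implement the described behavior.

```r
# Hypothetical sketch; not the package's actual implementation.
check_predictors_exist <- function(data, predictors) {
  # Ignore random-effects terms such as "(1|site)"
  fixed <- predictors[!grepl("|", predictors, fixed = TRUE)]

  # Split interaction terms such as "age:sex" into their components
  vars <- unique(trimws(unlist(strsplit(fixed, ":", fixed = TRUE))))

  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Predictor(s) not found in data: ",
         paste(missing_vars, collapse = ", "))
  }
  invisible(TRUE)
}

toy <- data.frame(age = c(50, 61, 47), sex = c("F", "M", "F"),
                  site = c("a", "b", "a"))
check_predictors_exist(toy, c("age", "age:sex", "(1|site)"))
```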