---
title: "Model-based continuous summary tables in R"
description: >
  Build model-based summary tables for continuous outcomes in R with
  table_continuous_lm(): estimated marginal means, robust /
  cluster-robust / bootstrap / jackknife standard errors, case
  weights, additive covariate adjustment (G-computation or
  equal-weight), four effect-size families (f2, Cohen's d, Hedges'
  g, Hays' omega2) with noncentral CIs, and APA-style output to
  console, gt, tinytable, flextable, Excel, Word, or clipboard.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model-based continuous summary tables in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")

source("_pkgdown-helpers.R")
```

```{r setup}
library(spicy)
```

`table_continuous_lm()` is the model-based companion to
`table_continuous()`. It fits one linear model per selected
continuous outcome --
`lm(outcome ~ by + covariate1 + covariate2 + ...)` if covariates are
supplied, otherwise `lm(outcome ~ by)` -- and returns a compact
reporting table. This is the right choice when you want to stay in
a linear-model workflow with heteroskedasticity-consistent or
cluster-robust standard errors, bootstrap / jackknife variance,
case weights, additive covariate adjustment, or one of four
effect-size families (`f2`, Cohen's *d*, Hedges' *g*, Hays'
\(\omega^2\)) with noncentral CIs.

## Basic usage

Use `select` for one or more continuous outcomes and `by` for the
single predictor:

```{r basic}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi, life_sat_health),
  by = sex
)
```

For categorical predictors, the table reports estimated means by level.
When the predictor is dichotomous, it can also show a single mean
difference and confidence interval.

## Robust standard errors

Use `vcov = "HC*"` when you want heteroskedasticity-consistent standard
errors and tests:

```{r robust}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  vcov = "HC3",
  statistic = TRUE
)
```

The `HC*` family is computed via [sandwich::vcovHC()] and includes
`"HC0"`, `"HC1"`, `"HC2"`, `"HC3"` (default for small / moderate
samples), `"HC4"`, `"HC4m"`, and `"HC5"`.

## Cluster-robust standard errors

When observations are not independent (repeated measurements per
individual, students nested in classes, panel data), classical and
`HC*` standard errors are biased downward. Use the `CR*` family
together with `cluster = id_var` to get cluster-robust inference,
dispatched to `clubSandwich`:

```{r cluster, eval = requireNamespace("clubSandwich", quietly = TRUE)}
table_continuous_lm(
  sleep,
  select = extra,
  by = group,
  vcov = "CR2",
  cluster = ID,
  statistic = TRUE
)
```

`"CR2"` is the recommended default (Bell & McCaffrey 2002;
Pustejovsky & Tipton 2018). It produces fractional Satterthwaite
degrees of freedom, rendered in the displayed test header as e.g.
`t(8.7)` or `F(2, 12.4)`. `"CR1"` matches Stata's
`vce(cluster id)` default.

## Bootstrap and jackknife

For situations where the residual distribution is non-standard or
the sample is small, `vcov = "bootstrap"` and `vcov = "jackknife"`
provide resampling-based variance estimators in pure base R (no
dependency added):

```{r bootstrap, eval = FALSE}
table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = sex,
  vcov = "bootstrap",
  boot_n = 1000  # default
)
```

When `cluster` is supplied, bootstrap switches to a cluster
bootstrap (Cameron, Gelbach & Miller 2008) and jackknife to
leave-one-cluster-out (Quenouille 1956). Both estimators use
asymptotic inference: `z` for single-coefficient contrasts and
`chi^2(q)` for the global Wald test on `k > 2` categorical
predictors, rendered in the displayed test header.

## Case weights

Use `weights` when you want weighted estimated means or slopes in the
same model-based table:

```{r weights}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = education,
  weights = weight,
  show_weighted_n = TRUE
)
```

This is often the most natural summary-table function when your
reporting workflow already relies on weighted linear models.

## Numeric predictors

If `by` is numeric, `table_continuous_lm()` reports the slope rather
than group means:

```{r numeric-by}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = age,
  vcov = "HC3"
)
```

When you need the underlying returned data for further processing, use
`output = "data.frame"` for the wide raw table or `output = "long"` for
the analytic long table.

## Covariate adjustment

When the comparison of `by`-group means is potentially confounded by
other variables, supply them through `covariates`. Each per-outcome
linear model becomes
`lm(outcome ~ by + covariate1 + covariate2 + ...)`, and the reported
`Δ` (categorical `by`) or slope (numeric `by`), its CI, *p*, R² and
effect size are all **adjusted for** the covariates. The default
ASCII print emits an APA-style footer naming the adjustment so the
reader knows which estimand they are reading.

```{r adjustment-basic}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  covariates = c(age, education),
  vcov = "HC3"
)
```

`covariates` accepts the full tidyselect vocabulary
(`c(age, sex)`, `tidyselect::starts_with("life_sat_")`,
`tidyselect::all_of(cov_names)`) or a literal character vector. A
covariate that also appears in `select` is silently auto-excluded
from the outcome list (mirroring how `by` / `weights` are excluded
from `select`); a covariate equal to `by` is rejected up front.

### Two estimands for the adjusted marginal means

Two methodologically distinct conventions exist for computing the
covariate-adjusted per-group means displayed in the table (the
`M (level)` columns of the printed output, and the `emmean` /
`emmean_se` / `emmean_ci_lower` / `emmean_ci_upper` columns of the
`output = "long"` data frame) when `by` is categorical.
`table_continuous_lm()` exposes the choice via `adjustment`; both
are valid but produce different numerical answers when at least one
factor covariate has non-uniform observed proportions.

* `adjustment = "proportional"` (the **default**) — G-computation /
  population-weighted standardisation. For each focal level of `by`,
  predictions are evaluated at every observation in the data with
  `by` replaced by that level (covariates kept at their observed
  values), then averaged. This is the convention of Stata's
  `margins` and `marginaleffects::avg_predictions(variables)`.
  Interpretation: *"What is the predicted mean in this population
  if everyone had `by = lvl`, holding their actual covariate
  profile?"*. Best for causal / counterfactual reporting.

* `adjustment = "balanced"` — equal-weight averaging on a synthetic
  grid. Factor-covariate level combinations are crossed and weighted
  uniformly (`1 / k`); numeric covariates are fixed at their sample
  mean. This is the convention of `emmeans::emmeans()` (default),
  SPSS UNIANOVA EMMEANS, and SAS LSMEANS. Interpretation:
  *"What is the predicted mean assuming a balanced covariate
  design?"*. Best for descriptive / ANOVA-style APA reporting.

The two methods coincide trivially when `covariates` is `NULL`, and
also when all covariates are numeric or logical (no factor levels
to expand over). They diverge only when categorical covariate
proportions are skewed in the observed data.

```{r adjustment-balanced}
# Same model, balanced (emmeans-style) marginal means:
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  covariates = c(age, education),
  adjustment = "balanced",
  vcov = "HC3"
)
```

### Effect sizes under adjustment

Adding covariates changes which effect sizes remain meaningful:

* `effect_size = "f2"` and `"omega2"` are reported as **partial**
  *f²* / partial *ω²* via `stats::drop1()` restricted to the focal
  term — the correct generalisation of the bivariate effect size to
  a covariate-adjusted model.
* `effect_size = "d"` and `"g"` raise `spicy_unsupported`: Cohen's
  *d* and Hedges' *g* have no defined extension under adjustment
  (the pooled SD has no canonical analogue). The error message
  points to `f2` / `omega2` as the partial-F generalisations.

Both effect-size variants and their noncentral CIs (when
`effect_size_ci = TRUE`) are computed from the same fitted model as
the displayed coefficients and remain invariant to `vcov` (HC* /
CR* / bootstrap / jackknife only affect the contrast SE).

### Scope: additive only in v1

`covariates` accepts additive terms only. Formula syntax with
interactions or transforms (`covariates = ~ age * sex`,
`covariates = ~ I(age^2)`) raises `spicy_unsupported` with a
migration hint. Interactions and polynomials are reserved for a
future spicy release. To fit a fully-specified model right now,
do so with `lm()` directly and inspect via `summary()` or
`table_regression()` (which accepts interaction / polynomial / `I()`
terms natively).

## Effect sizes

`table_continuous_lm()` supports four effect sizes via the
`effect_size` argument. All are derived from the **same fitted model**
as the displayed coefficients and `R²`, so the table stays internally
consistent — and all are **invariant to `vcov`** (the choice of
classical or `HC*` standard errors changes the contrast SE, CI, and
test statistic but not the standardized effect size itself).

Cohen's *d* and Hedges' *g* are the conventions for two-group
comparisons (Cohen 1988; Hedges and Olkin 1985; APA 2020) and require
`by` to have exactly two non-empty levels:

```{r es-d}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "d"
)
```

Hedges' *g* applies the small-sample correction
`J = 1 - 3/(4·df_resid - 1)`. It is generally preferred over raw *d*
in published reports (Goulet-Pelletier and Cousineau 2018):

```{r es-g}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "g"
)
```

For categorical predictors with three or more levels (or numeric
predictors), use Hays' `omega²` for a less biased estimate of the
population variance explained (Hays 1963; Olejnik and Algina 2003;
Lakens 2013):

```{r es-omega2}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = education,
  effect_size = "omega2"
)
```

Cohen's `f²` (= `R² / (1 - R²)`) is the effect size familiar from
power-analysis frameworks (e.g. G*Power) and is defined for any
predictor type:

```{r es-f2}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = age,
  effect_size = "f2"
)
```

## Confidence intervals for effect sizes

Setting `effect_size_ci = TRUE` adds a confidence interval for the
effect size. The implementation uses the modern noncentral-distribution
inversion approach — the consensus standard in commercial statistical
software (Stata `esize` / `estat esize`, SAS `PROC TTEST` and
`PROC GLM EFFECTSIZE`) and in mainstream R packages (`effectsize`,
`MOTE`, `TOSTER`, `effsize`):

- Noncentral *t* inversion for `"d"` and `"g"` (Steiger and Fouladi
  1997; Goulet-Pelletier and Cousineau 2018; Cousineau and
  Goulet-Pelletier 2021), which has nominal coverage across sample
  sizes — unlike the older Hedges-Olkin normal approximation that is
  biased for small samples.
- Noncentral *F* inversion for `"omega2"` and `"f2"`
  (Steiger 2004; Smithson 2003).

In the printed and rendered outputs, the effect size is displayed with
the bracketed CI (article-ready):

```{r es-ci}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE
)
```

The CI level follows `ci_level` (default `0.95`). For programmatic
access, the wide raw output exposes separate numeric columns, and the
long output always includes them:

```{r es-ci-raw}
table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE,
  output = "data.frame"
)
```

```{r es-ci-long}
out <- table_continuous_lm(
  sochealth,
  select = wellbeing_score,
  by = smoking,
  effect_size = "g",
  effect_size_ci = TRUE,
  output = "long"
)
out[, c("variable", "es_type", "es_value", "es_ci_lower", "es_ci_upper")]
```

When `weights` is supplied, all four effect sizes (and their CIs) are
computed from the weighted least-squares fit, keeping them consistent
with the weighted contrast and its CI (DuMouchel and Duncan 1983).
This is the right convention for case-weighted reporting; for
propensity-score balance assessment (Austin and Stuart 2015) or
complex-survey designs, dedicated packages (`cobalt::bal.tab()` and
`survey`) are more appropriate.

## Article-style polish

Pretty outcome labels and a comma decimal separator (useful for
European reporting):

```{r polish}
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  labels = c(
    wellbeing_score = "WHO-5 wellbeing (0-100)",
    bmi = "Body-mass index (kg/m²)"
  ),
  effect_size = "g",
  effect_size_ci = TRUE,
  decimal_mark = ","
)
```

## Publication-ready output

The function supports the same output formats as the other summary-table
helpers, including `tinytable`, `gt`, `flextable`, `excel`, `word`, and
`clipboard`.

```{r gt-output, eval = build_rich_tables}
pkgdown_dark_gt(
  table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi, life_sat_health),
    by = sex,
    vcov = "HC3",
    statistic = TRUE,
    output = "gt"
  )
)
```

## Tidying for downstream pipelines

`table_continuous_lm()` returns an object that can be coerced to a
plain `data.frame` / `tbl_df` (stripping the spicy formatting
attributes) or piped into `broom::tidy()` / `broom::glance()` for
any downstream tidyverse-stats workflow:

```{r tidy-glance}
out <- table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  effect_size = "g",
  effect_size_ci = TRUE
)

# One row per estimated parameter: emmean per level, contrast for
# binary predictors, slope for numeric predictors.
broom::tidy(out)

# One row per outcome with model-level statistics: r.squared,
# adj.r.squared, F / t, df, p.value, nobs, weighted_n, plus the
# effect-size summary.
broom::glance(out)
```

## See also

- See `vignette("table-continuous", package = "spicy")` for descriptive
  continuous summary tables with classical group-comparison tests.
- See `vignette("summary-tables-reporting", package = "spicy")` for a
  cross-function reporting workflow using the summary-table helpers.