---
title: "How to use normref"
author: "Marieke E. Timmerman, Klazien de Vries, and Hannah M. Heister"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib  
vignette: >
  %\VignetteIndexEntry{How to use normref}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```



# Introduction

The `normref` package provides tools to calculate continuous norms for psychological test scores, where the norms depend on a single continuous variable, denoted as the norm predictor. Typically, the norm predictor is age. The package allows for a single norm predictor. Applying the norms results in standardized scores such as percentiles, T-scores, or IQ scores. Such standardized scores allow for norm-referenced interpretation, that is, comparing an individual’s performance to that of a relevant reference population.

For example, an IQ score expresses an individual’s performance relative to the general population of the same age. To estimate such norms, `normref` requires raw scores from a representative normative sample.  
This vignette illustrates how to derive norms from such a sample, following the approach described in a tutorial [@timmerman2021tutorial].

---
  
# Workflow overview
  
The main workflow in `normref` consists of:
  
1. Shaping the data (``shape_data``)  
2. Fitting candidate models (`fb_select`)  
3. Inspecting model fit (BIC, worm plots, centile plots)  
4. Creating norm tables (`normtable_create`, `plot_normtable`)  
5. *(Optional)* Adding confidence intervals (CIs) for normed scores based on age-dependent reliability estimates 

Advanced functionality includes norming of composite scores, norms for new data, user-defined distributions, and adding CIs based on age-dependent reliability estimates.

---
  
# Workflow
  
## Example: norming a single test
  
  We illustrate the process using IDS data [@grob2018ids; @grob2018idsNL].
An anonymized and perturbed version of this data is included in `normref`.  
We focus on Test 14, with $n=1660$ children and raw scores ranging from 2–34.

Three candidate distributions are considered:  

- Beta-Binomial (BB): suitable for bounded integer scores  
- Box-Cox Power Exponential (BCPE): flexible for continuous scores  
- Normal (NO): a simpler benchmark  

To compare candidate models, we use the free order model selection procedure with the Bayesian Information Criterion (BIC) as selection criterion, which has been shown to be effective in model selection [@voncken2019model].

```{r loadlibrary, echo=TRUE, include=TRUE}
library(normref)
library(gamlss.dist) # also needed for vignette
```
First, we load the data.

```{r getdata, echo=TRUE, results='hide'}
get(data("ids_data"))
```
## Shape the data
 
Before fitting any distributions to model the raw scores of Test 14 as a function of age, we must ensure that the data are in the correct format for gamlss implementation. This is handled by the function `shape_data`.

The `shape_data` function verifies whether the response variable is properly formatted and, if necessary, reshapes it. For binomial-type distributions like the BB, gamlss requires the response to be a two-column matrix: one column for the number of correct items and one for the number of incorrect items. For each individual, these columns sum to the maximum possible raw score on the test.

By default, `shape_data` uses the highest observed raw score in the data as the maximum. This can be overridden with the argument max_score. The function also removes any cases with missing values for the raw scores or age.

Applying `shape_data` to our data produces a new dataframe, `mydata_BB`, which includes the additional column `"shaped_score"` containing the scores in the correct format for BB models:
```{r shapedata, echo=TRUE, include=TRUE}
mydata_BB <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y14",
                     family = "BB",
                     max_score = 34)
```

We can also apply shape data to obtain scores in the correct format for the BCPE model, which requires nonzero, positive scores.

```{r selectmodel_BCPE_y14, echo=TRUE, include=TRUE}
mydata_BCPE <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y14",
                     family = "BCPE")

```
Because Test 14 has no results with zero scores, the scores were not transformed. A new column `"shaped_score"` was created, but it is identical to `"y14"`.

## Fit candidate models

To fit the candidate models, we use the model selection function `fb_select()`, which implements the free order model selection procedure [@voncken2019model].
We first apply this procedure to the BB distribution. Note that we pass `"shaped_score"` as the `score_name`, because this column contains the reshaped response variable created by `shape_data()`.

```{r selectmodel_BB_y14, echo=TRUE, include=TRUE}
mod_BB_y14 <- fb_select(data = mydata_BB,
                       age_name = "age",
                       score_name = "shaped_score",
                       family = "BB",
                       selcrit = "BIC")
```
The free order model selection procedure selects a second degree polynomial of age for both the $\mu$ and $\sigma$ parameters of the BB model.

As alternative models, we also fit the BCPE and NO models.
```{r selectmodel_BCPENO_y14, echo=TRUE, include=TRUE}
mod_BCPE_y14 <- fb_select(data = mydata_BCPE,
                         age_name = "age",
                         score_name = "shaped_score",
                         family = "BCPE",
                         selcrit = "BIC") 

mod_NO_y14 <- fb_select(data = ids_data,
                      age_name = "age",
                      score_name = "y14",
                      family = "NO",
                      selcrit = "BIC")
``` 

## Inspect model fit
  

After fitting the BB, BCPE, and NO models, we first compare their BIC values. Lower BIC values indicate better model fit.
  
```{r showfit, echo=TRUE, include=TRUE}
mod_BB_y14$selcrit
mod_BCPE_y14$selcrit
mod_NO_y14$selcrit
``` 
The results show that the BCPE model provides the best fit, as it has the lowest BIC. The NO model follows at some distance, while the BB model performs worst according to BIC.

To further evaluate model fit, we inspect worm plots [@buuren2001worm]. Worm plots display the relationship between empirical quantiles and model-implied quantiles.

* Ideally, the points (the “worm”) are close to the red horizontal zero line,

* and most values fall inside the funnel-shaped confidence bands (dashed lines).

For Test 14, the worm plot confirms that the BCPE model has the best fit: the worm is relatively flat, and nearly all values lie within the confidence bands. By contrast, the BB and NO models show some deviations from the expected pattern in the intervals [-4, -2] and [-4, 0] respectively.
```{r show_wormplot, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
gamlss::wp(mod_BCPE_y14, ylim.all = 2)
gamlss::wp(mod_BB_y14, ylim.all = 2) 
gamlss::wp(mod_NO_y14, ylim.all = 2) 
``` 

Because both BIC values and worm plots point to the BCPE as the superior model, we select this model for further inspection and prediction.

To fully evaluate the fit, worm plots should also be created per age interval, since overall diagnostics can hide local misfit across the age range.  
This can be done with the arguments `xvar` (the age variable) and `n.inter` (the number of age intervals). For example, for the BCPE model:  

```{r show_wormplot_age, echo=TRUE, include=TRUE, fig.width=6, fig.height=4, warning=FALSE, message = FALSE}
gamlss::wp(mod_BCPE_y14, xvar = age, ylim.worm = 2, n.inter = 4)
```

Note that a sample size of at least 200 per worm plot is recommended for stable results [@buuren2001worm].

Once the best-fitting model has been selected (BCPE in this case), we can visually inspect its fit using centile curves. These curves show the predicted centiles of test scores as a function of age, allowing us to see how scores are distributed across the age range.

```{r plot_model_BCPE, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
gamlss::centiles(mod_BCPE_y14, cent = c(5,25,50,75,95))
``` 
These centile plots complement worm plots and BIC comparisons by providing a direct, visual summary of the distribution of scores across ages, making it easier to assess model fit and to interpret age-dependent norms.

For most distributions, the centiles function in GAMLSS works well. However, for binomial-type distributions such as the BB distribution, the standard centiles function does not always render correctly. For these cases, `centiles_bin()` provides an alternative (e.g., `centiles_bin(mod_BB_y14, xvar = age, cent = c(5,25,50,75,95))`).



## Create norm tables

After selecting the best-fitting model, we can generate norm tables using the function `normtable_create()`. By default, the function produces z-scores (i.e., $\mu_{age} = 0, \sigma_{age} = 1$), but other norm types can also be requested:

- T-scores: ($\mu_{age} = 50, \sigma_{age} = 10$)
- IQ-scores: ($\mu_{age} = 100, \sigma_{age} = 15$)

The function returns a list containing:

- `cdf_matrix`: the percentiles for each raw score and age  
- `norm_matrix`: the corresponding normed scores in the requested norm type  
- `norm_sample` and `cdf_sample`: the percentile and normed scores for the observed sample  

Optionally, an Excel file can be generated by specifying `excel_name`. The Excel file contains:

1. Norm matrix tab: first column shows ages, first row shows raw scores, with the normed score at each age/score combination.  
2. Norm sample tab: the ages and associated normed scores of the observed sample.  

For example, in the norm matrix:

- At age 4.0, a raw score of 10 corresponds to an IQ of 129.1 (almost 2 SD above the mean).  
- At age 6.2, the same raw score of 10 corresponds to an IQ of 97.0.  

In this vignette, we save results to a temporary Excel file (`tempfile()`) to comply with CRAN policies.  
In real applications, users can specify a permanent path, e.g.  
`excel_name = "norms_y14_1.xlsx"`.

```{r, normtable_BCPE, echo=TRUE, include=TRUE}
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14  <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score", 
                                      normtype = "IQ", 
                                      excel_name = temp_file,
                                      excel = TRUE)
head(norm_mod_BCPE_y14[["norm_matrix"]][, c(1, 11)],
     n = 15)
``` 

We can visualize the generated norm tables using the function `plot_normtable()`. This function plots the percentiles as a function of age for each raw score. Each line represents a specific raw score:

- Lines are ordered from highest score (most difficult) at the top to lowest score (easiest) at the bottom.  
- Green circles indicate the combinations of raw scores and ages observed in the sample.  

This visualization allows us to inspect how raw scores translate to normed scores across ages and to see whether the norms behave as expected.

```{r plot_normtable_y14, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
plot_normtable(normtable = norm_mod_BCPE_y14)
``` 

## Confidence intervals for normed scores

By default, `normtable_create()` returns point estimates of the norms. Optionally, you can also calculate confidence intervals (CIs) for the normed scores. The default confidence level is 0.95 (95% interval).  

The CIs are estimated using an extension of the group model from classical test theory, adapted for age-dependent scores [@heister2024item]. To compute the upper and lower boundaries, you need a dataframe containing two vectors:

- `age`: the ages at which the norms are evaluated  
- `rel`: the estimated test reliability at each age  

The resulting CIs are stored in the output tables with the suffix `_lower` and `_upper` (e.g., `norm_sample_lower` and `norm_sample_upper`).  

For distributions without a closed-form mean and standard deviation, such as the BCPE distribution, computing the CIs can be slower because these quantities need to be estimated via resampling.

In the example below, we use the fictitious age-dependent reliabilities provided in `ids_rel_data`. Later, in the section *Estimating test reliability dependent on age*, we show how such reliabilities can be derived from individual item scores using a sliding window approach. Note that for Test 14, we cannot estimate these reliabilities from raw scores, as the individual items are not available.

```{r create_normtable_y14, echo=TRUE, results='hide'}
get(data("ids_rel_data"))
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14  <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score",
                                      datarel = ids_rel_data,
                                      normtype = "IQ",
                                      excel_name = temp_file,
                                      excel = TRUE)
``` 

# Advanced features

## Norming of composite scores

Composite scores are calculated by combining multiple subtest scores, typically as the sum of their standardized scores (z-scores). To illustrate this process, we first fit a BCPE model to Test 7. Based on the wormplot inspection (not shown), the model fits the data well. Note that fitting BCPE models can be computationally intensive.

```{r modelnorms_y7, echo=TRUE, include=TRUE}
mydata_BCPE_y7 <- shape_data(data = ids_data,
                     age_name = "age",
                     score_name = "y7",
                     family = "BCPE")
mod_BCPE_y7 <- fb_select(data = mydata_BCPE_y7,
                        age_name = "age",
                        score_name = "shaped_score",
                        family = "BCPE",
                        selcrit = "BIC")
norm_mod_BCPE_y7 <- normtable_create(model = mod_BCPE_y7,
                                    data = mydata_BCPE_y7,
                                    age_name = "age",
                                    score_name = "shaped_score")
``` 
Next, we calculate composite scores based on the norm tables of Test 7 and Test 14. The workflow is as follows:

1. Create a list of the relevant subtest norm tables.
2. Use the function `composite_shape()` to compute the composite scores as the sum of z-scores across the tests.
3. Fit a normal distribution to the composite scores, since sums of normally distributed variables are  normally distributed.
4. Generate a norm table for the composite scores using `normtable_create()`.


```{r modelnormcompositescore, echo=TRUE, include=TRUE}
composite <- list(norm_mod_BCPE_y7,
                  norm_mod_BCPE_y14)

composite_data <- composite_shape(normtables = composite)

modNO_comp <- fb_select(
  data = composite_data,
  age_name = "age",
  score_name = "z_sum",
  family = "NO",
  selcrit = "BIC"
)

norm_modNO_comp <- normtable_create(modNO_comp,
                                   composite_data,
                                   age_name = "age",
                                   score_name = "z_sum",
                                   cont_cor = TRUE)
``` 
## Norms for a new sample of individuals

It is possible to calculate normed scores directly for a new sample, using their raw scores and ages. 

In this example, we select the raw scores and ages of the first five individuals in our dataset. We then use the previously estimated model to obtain their normed scores and display the results.

```{r newsamplenorm, echo=TRUE, include=TRUE}
newdata <- ids_data[1:5,c("age","y14")]
norm_mod_BCPE_newdata <- normtable_create(model = mod_BCPE_y14,
                                         data = newdata,
                                         age_name = "age",
                                         score_name = "y14", 
                                         new_data = TRUE,
                                         datarel = ids_rel_data)
norm_mod_BCPE_newdata[["norm_sample"]]
``` 

## Norming with a truncated model

Free order model selection with the `fb_select` is also available for distributions that are defined by the user, for example truncated distributions generated with the `gamlss.tr` package. Using truncated distributions can improve model fit when continuous distributions are applied to scores that have natural bounds. By explicitly accounting for minimum and maximum possible values, truncation prevents unrealistic predictions outside the valid score range and often leads to more accurate norm estimates. We illustrate the use of a truncated NO distribution for the `ids_kn_data` data, which consists of 34 binary item scores. This example uses a NO distribution for demonstration purposes only. In practice, users should explore several candidate distributions to select the one that best fits their data.

```{r getdata_ids_kn_data, echo=TRUE, results='hide'}
get(data("ids_kn_data"))
```
To fit a truncated NO model, we first create a new distribution using the `gen.trun()` function. We specify truncation at both ends of the distribution (`type="both"`) with a minimum score of 0 and a maximum score of 34 (`par=c(0,34)`). We apply this to the NO family (`family="NO"`) and name the new distribution `tr34`, resulting in a distribution called `NOtr34`.

Next, we shape the data appropriately for the truncated distribution and fit the model.

```{r fit truncated model, echo=TRUE, include=TRUE}

gamlss.tr::gen.trun(par=c(0,34), family="NO", name="tr34", type="both")

mydata_NOtr34 <- shape_data(data = ids_kn_data,
                     age_name = "age_years",
                     score_name = "rawscore",
                     family = "NOtr34")
mod_NOtr34_KN <- fb_select(data = mydata_NOtr34,
                        age_name = "age_years",
                        score_name = "shaped_score",
                        family = "NOtr34")
```


## Estimating test reliability dependent on age

To calculate confidence intervals for the normed scores, we first estimate the reliability of the test as a function of age using a sliding window approach. This approach is implemented in the function `reliability_window()`, which computes Cronbach's alpha within age-specific groups [@heister2024item]. We again use the `ids_kn_data` dataset.

To estimate the reliability of the test, information at the item-level is needed.
First, we identify the columns in the dataset corresponding to the items.

```{r item_variables_ids_kn_data, echo=TRUE, include=TRUE}
item_variables_kn <- which(substr(colnames(ids_kn_data),1,2) == "KN")
```

To choose appropriate parameters for estimating age-dependent reliabilities, we first explore a range of window widths and step sizes. This helps ensure that the estimated reliabilities are stable and vary smoothly with age. The function `different_rel()` allows us to compare the reliability estimates across different settings:

```{r selectwindowstep_ids_kn_data, echo=TRUE, include=TRUE}
rel_int <- different_rel(data = ids_kn_data, 
                         item_variables = item_variables_kn, 
                         step_window = c(0.5, 1 , 2, 5, 10, 20), 
                         age_name = "age_years",
                         min_agegroup = 5,
                         max_agegroup = 20,
                         step_agegroup = c(0.5,1,1.5,2))
```

We then visualize the reliability estimates across age for the different window widths and step sizes to guide our selection. Ideally, the step size should be small enough to capture smooth changes in reliability, and the window width should be wide enough to assume approximately equal true scores within the window. 

```{r plot_selectwindow_ids_kn_data, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
plot_drel(rel_int, ncol = 2)
```

For this example, we select a step size of 2 and a window width of 2.

```{r estrel_ids_kn_data, echo=TRUE, include=TRUE}
rel_est <- reliability_window(data = ids_kn_data,
                   age_name = "age_years",
                   item_variables = item_variables_kn,
                   window_width = 2)
```

When we estimate the normed scores using the truncated BCPE model, we can now construct reliability based CIs for the normed scores.

```{r calculate confidence intervals, echo=TRUE, include=TRUE}
norm_mod_BCPEtr34_KN  <- normtable_create(model = mod_NOtr34_KN,
                                     data = mydata_NOtr34,
                                     age_name = "age_years",
                                     score_name = "shaped_score",
                                     datarel = rel_est,
                                     normtype = "IQ")
```

# References