trafficCAR Model Diagnostics and Checking

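This vignette explains how to interpret the diagnostic tools provided by trafficCAR. These diagnostics are organized around three checks, each covered in a section below: residual summaries, Moran’s I on residuals, and posterior predictive checks.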

The diagnostics are intentionally simple and global. They are meant to flag problems early, not to replace detailed model criticism.

Residual diagnostics

The residuals() method for a traffic_fit object provides three types of residuals:

Raw residuals: \[ r_i = y_i - \hat{\mu}_i \]
Structured residuals (spatial effect): \[ r_i^{(s)} = \hat{x}_i \]
Unstructured residuals: \[ r_i^{(u)} = y_i - (\hat{\mu}_i - \hat{x}_i) \]

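Here \(y_i\) is the observed value, \(\hat{\mu}_i\) the fitted mean, and \(\hat{x}_i\) the estimated spatial effect for unit \(i\). Raw residuals reflect overall lack of fit. Unstructured residuals are particularly important: they represent the portion of the data that should be approximately independent if the spatial model is adequate.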
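
To make the three definitions concrete, the following base R sketch applies them to toy vectors. The values of y, mu_hat, and x_hat are hypothetical and do not come from a fitted model. The code only reproduces the formulas above; it is not the package's internal implementation.

```r
# Toy values (hypothetical, not from a fitted model).
y      <- c(12.0, 15.5, 9.8, 20.1, 14.2)   # observations y_i
mu_hat <- c(11.5, 16.0, 10.4, 18.9, 14.0)  # fitted means
x_hat  <- c(0.8, -0.3, -0.6, 1.1, 0.2)     # estimated spatial effects

r_raw <- y - mu_hat              # raw residuals
r_str <- x_hat                   # structured residuals
r_un  <- y - (mu_hat - x_hat)    # unstructured residuals, as defined above

# Algebraically, the unstructured residual equals raw plus structured.
stopifnot(isTRUE(all.equal(r_un, r_raw + r_str)))

data.frame(y, mu_hat, x_hat, r_raw, r_str, r_un)
```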

Typical usage:

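# `fit` is a previously fitted traffic_fit object
r_raw <- residuals(fit, type = "raw")           # overall lack of fit
r_un  <- residuals(fit, type = "unstructured")  # should be approximately independent
summary(r_raw)
summary(r_un)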

Interpretation guidelines:

A mean far from zero or marked skewness in the raw residuals suggests systematic bias.
Heavy tails suggest that the model underestimates variability.
Unstructured residuals should have smaller variance than raw residuals if the spatial component is contributing meaningfully.
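
The guidelines above can be checked numerically with a few moment-based summaries. The sketch below is one possible way to do this in base R. The helper resid_summary() is hypothetical and not part of trafficCAR, and the residuals are simulated as a stand-in for the output of residuals().

```r
# Hypothetical helper (not part of trafficCAR): moment-based summaries
# for a numeric residual vector.
resid_summary <- function(r) {
  r <- r[is.finite(r)]
  m <- mean(r)
  s <- sqrt(mean((r - m)^2))               # population SD, used in the moment ratios
  c(
    mean     = m,                          # far from 0 suggests systematic bias
    var      = var(r),                     # sample variance
    skewness = mean((r - m)^3) / s^3,      # 0 for a symmetric distribution
    ex_kurt  = mean((r - m)^4) / s^4 - 3   # > 0 suggests heavier tails than Gaussian
  )
}

set.seed(1)
r_sim <- rnorm(500)   # simulated stand-in for residuals(fit, type = "raw")
round(resid_summary(r_sim), 3)
```

With a fitted model, calling the helper on both the raw and the unstructured residuals gives the variance comparison described in the last guideline.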

Moran’s I on residuals

Spatial autocorrelation in residuals is assessed using Moran’s I via moran_residuals().

moran_residuals(fit, type = "unstructured", method = "permutation")

Interpretation depends on the residual type:

Raw residuals: Significant Moran’s I indicates spatial structure not captured by the mean model.
Unstructured residuals: Significant Moran’s I indicates spatial dependence that remains after accounting for the CAR component.
Structured residuals: Positive Moran’s I is expected and reflects the imposed spatial smoothing.

Permutation-based p-values should be interpreted as global diagnostics: they indicate whether autocorrelation is present somewhere in the network, not where it occurs. A small p-value for unstructured residuals is a strong indication of model misspecification (e.g., missing covariates or inappropriate neighborhood structure).

If residual variance is zero, Moran’s I is undefined and returned as NA. This typically occurs in saturated or near-saturated models.
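
For readers who want to see what a permutation test for Moran’s I involves, here is a generic base R sketch. It is not the implementation behind moran_residuals(). The helpers moran_i() and moran_perm(), the binary weight matrix W, and the one-sided p-value are all assumptions made for illustration.

```r
# Hypothetical helper: global Moran's I for values z and spatial weights W.
moran_i <- function(z, W) {
  zc <- z - mean(z)
  ss <- sum(zc^2)
  if (ss == 0) return(NA_real_)   # zero variance: statistic is undefined
  n <- length(z)
  (n / sum(W)) * as.numeric(t(zc) %*% W %*% zc) / ss
}

# Permutation p-value (one-sided, testing for positive autocorrelation).
moran_perm <- function(z, W, nsim = 999) {
  obs <- moran_i(z, W)
  if (is.na(obs)) return(list(I = NA_real_, p_value = NA_real_))
  sims <- replicate(nsim, moran_i(sample(z), W))   # shuffle values over locations
  list(I = obs, p_value = (1 + sum(sims >= obs)) / (nsim + 1))
}

# Toy example: 6 segments on a line; neighbours are adjacent segments.
n <- 6
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)                      # symmetric binary adjacency

set.seed(42)
moran_perm(c(1, 2, 3, 4, 5, 6), W) # smooth trend along the line: positive I
moran_perm(rep(3, n), W)           # constant values: I and p-value are NA
```

The second call mirrors the zero-variance case described above, where the statistic cannot be computed.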

Posterior predictive checks

Posterior predictive checks (PPCs) compare observed summary statistics to their distribution under replicated data generated from the fitted model.

ppc <- ppc_summary(fit, stats = c("mean", "var", "tail"))
print(ppc)

The following statistics are reported:

Mean: checks overall location
Variance: checks dispersion
Tail probabilities: check distributional shape

Each statistic is accompanied by a posterior predictive p-value:

\[ \text{p-value} = P(T(y^{rep}) \ge T(y) \mid y) \]

Interpretation guidelines:

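Values near 0 or 1 mean the observed statistic falls in a tail of its predictive distribution, which indicates lack of fit.
Values near 0.5 mean the observed statistic is typical of the replicated data, which indicates good agreement.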
Systematic failures across multiple statistics suggest model inadequacy.

PPCs are not formal hypothesis tests. They are descriptive tools intended to highlight discrepancies between the model and the data.
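
The posterior predictive p-value defined above can be estimated as the fraction of replicated data sets whose statistic is at least as large as the observed one. The sketch below shows this in base R under stated assumptions: ppc_pvalue() and tail_prob() are hypothetical helpers, the replicates come from a simple Gaussian stand-in rather than a real posterior predictive distribution, and the tail statistic is one possible choice, not necessarily the one used by ppc_summary().

```r
# Hypothetical helper: estimate P(T(y_rep) >= T(y) | y) from a matrix of
# replicated data sets (one row per draw).
ppc_pvalue <- function(y, y_rep, stat) {
  t_obs <- stat(y)
  t_rep <- apply(y_rep, 1, stat)
  mean(t_rep >= t_obs)
}

set.seed(123)
n <- 200
y <- rt(n, df = 3)   # toy "observed" data with heavy tails

# Toy replicates from a Gaussian matched to the sample mean and SD.
# A real check would draw from the fitted model's posterior predictive.
n_draws <- 1000
y_rep <- matrix(rnorm(n_draws * n, mean(y), sd(y)), nrow = n_draws)

# One possible tail statistic: share of points more than 3 SDs from the mean.
tail_prob <- function(v) mean(abs(v - mean(v)) > 3 * sd(v))

c(
  mean = ppc_pvalue(y, y_rep, mean),
  var  = ppc_pvalue(y, y_rep, var),
  tail = ppc_pvalue(y, y_rep, tail_prob)
)
```

Because the replicates here are matched to the sample mean and variance, those two statistics are not expected to flag a problem. A shape-sensitive statistic such as the tail share is more likely to reveal the mismatch between heavy-tailed data and a Gaussian model.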

Practical workflow

A recommended diagnostic workflow is:

Inspect raw and unstructured residual summaries.
Compute Moran’s I on unstructured residuals.
Run posterior predictive checks on means and variances.

Consistent signals across these diagnostics provide strong evidence for or against model adequacy.
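
The three steps can be collected in a small wrapper. This is a sketch only: run_diagnostics() is a hypothetical function that is not part of trafficCAR, it reuses the calls shown earlier in this vignette unchanged, and it assumes that trafficCAR is attached and that fit is a previously fitted traffic_fit object.

```r
# Hypothetical wrapper (not part of trafficCAR) that runs the workflow
# steps on a fitted traffic_fit object. Assumes trafficCAR is attached.
run_diagnostics <- function(fit) {
  list(
    # Step 1: residual summaries
    raw_summary          = summary(residuals(fit, type = "raw")),
    unstructured_summary = summary(residuals(fit, type = "unstructured")),
    # Step 2: Moran's I on unstructured residuals
    moran_unstructured   = moran_residuals(fit, type = "unstructured",
                                           method = "permutation"),
    # Step 3: posterior predictive checks (same call as shown earlier)
    ppc                  = ppc_summary(fit, stats = c("mean", "var", "tail"))
  )
}

# Usage, given a fitted model:
# diag <- run_diagnostics(fit)
# diag$moran_unstructured
```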

Limitations

The diagnostics provided here are intentionally limited in scope:

They are global rather than local.
They do not identify specific problematic road segments.
They assume correctly specified adjacency structures.

These tools are best viewed as a first line of model checking rather than a complete diagnostic framework.