---
title: "Goodman's (1979) Analysis of Association"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Goodman's (1979) Analysis of Association}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(ordinalTables)
```

Goodman (1979) pointed out that the association in a contingency table can be partitioned in a manner similar to an analysis of variance.  In that paper he gives three examples of the process.  This vignette examines the first of those examples, the mental_health data set, which relates children's mental health status (rows) to the socioeconomic status of their parents (columns).

``` {r mental_health}
mental_health
```

## Null Model

The first, null model is the independence model.

```{r null_model}
null_model <- Goodman_null_association(mental_health)
```

This model does not fit, with a G^2 of `r null_model$g_squared` on `r null_model$df` degrees of freedom.
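
The G^2 statistic for the independence model can also be computed by hand, which may help in reading the fit statistics in this vignette. The following is a minimal base R sketch, not the package's implementation. The table `toy_counts` and the helper `independence_g2()` are hypothetical names, and the counts are invented purely for illustration.

```{r sketch_independence_g2}
# Hypothetical 3 x 4 table of counts, invented purely for illustration.
toy_counts <- matrix(c(20, 15, 10,  5,
                       12, 18, 14,  9,
                        6, 11, 16, 22),
                     nrow = 3, byrow = TRUE)

# Likelihood-ratio statistic G^2 and degrees of freedom for independence.
independence_g2 <- function(observed) {
  n <- sum(observed)
  # Expected counts under independence: (row total * column total) / n.
  expected <- outer(rowSums(observed), colSums(observed)) / n
  # Cells with a zero count contribute nothing to the sum.
  positive <- observed > 0
  g_squared <- 2 * sum(observed[positive] *
                         log(observed[positive] / expected[positive]))
  df <- (nrow(observed) - 1) * (ncol(observed) - 1)
  list(g_squared = g_squared, df = df,
       p_value = pchisq(g_squared, df, lower.tail = FALSE))
}

toy_independence <- independence_g2(toy_counts)
toy_independence
```

Assuming `mental_health` is stored as a numeric matrix or table of counts, the same helper could be applied to it as a cross-check on the values reported above.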

## Uniform Association Model

The next model is the uniform association model. It states that the association can be modeled by a single parameter, theta. The model implies that the adjacent-category odds ratios for the table are constant.

``` {r uniform_association}
uniform_model <- Goodman_uniform_association(mental_health)
```

This model fits relatively well, with G^2 of `r uniform_model$g_squared` on `r uniform_model$df` degrees of freedom.
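
To make the constant odds ratio property concrete, the uniform association model can be written as a Poisson log-linear model with a single linear-by-linear term on integer scores. The sketch below uses base R's `glm()` on a hypothetical table (`toy_counts`, invented for illustration). It is an assumed equivalent formulation, not the code used by `Goodman_uniform_association()`.

```{r sketch_uniform_glm}
# Hypothetical 3 x 4 table of counts, invented purely for illustration.
toy_counts <- matrix(c(20, 15, 10,  5,
                       12, 18, 14,  9,
                        6, 11, 16, 22),
                     nrow = 3, byrow = TRUE)

# One record per cell, with integer row and column scores i and j.
toy_cells <- data.frame(count = as.vector(toy_counts),
                        i = as.vector(row(toy_counts)),
                        j = as.vector(col(toy_counts)))
toy_cells$row_f <- factor(toy_cells$i)
toy_cells$col_f <- factor(toy_cells$j)

# Main effects plus a single association parameter on the product i * j.
toy_uniform <- glm(count ~ row_f + col_f + I(i * j),
                   family = poisson, data = toy_cells)
c(g_squared = deviance(toy_uniform), df = df.residual(toy_uniform))

# Adjacent-category (local) odds ratios of a two-way table.
local_odds_ratios <- function(tab) {
  r <- nrow(tab)
  k <- ncol(tab)
  (tab[-r, -k, drop = FALSE] * tab[-1, -1, drop = FALSE]) /
    (tab[-r, -1, drop = FALSE] * tab[-1, -k, drop = FALSE])
}

# Under the fitted model every local odds ratio equals exp(coefficient).
fitted_table <- matrix(fitted(toy_uniform), nrow = nrow(toy_counts))
local_odds_ratios(fitted_table)
exp(coef(toy_uniform)[["I(i * j)"]])
```

Applying `local_odds_ratios()` to the observed counts rather than the fitted ones gives a quick visual check of how far the data depart from a constant value.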

However, to illustrate the process, we will continue with more models.

## Rows and Columns as Special Cases of Model I

The effect of rows and the effect of columns can be fit as special cases of Goodman's Model I.  Model I specifies a linear-by-linear log-linear model (see the second half of the vignette "Models for Rater Agreement and Reliability" for more information about log-linear models and linear and linear-by-linear models).  The locations of the linear coefficients are fixed at the integers 1 .. r.

``` {r r_c_association}
rows <- Goodman_model_i(mental_health, row_effects = TRUE, column_effects = FALSE)
columns <- Goodman_model_i(mental_health, row_effects=FALSE, column_effects=TRUE)
```
These models improve the fit: for rows, G^2 = `r rows$g_squared` on `r rows$df` degrees of freedom, and for columns, G^2 = `r columns$g_squared` on `r columns$df` degrees of freedom.
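
The row-effects and column-effects models have a standard log-linear form in which a factor for one variable interacts with the integer scores of the other. The following base R sketch fits that form with `glm()` on a hypothetical table (`toy_counts`, invented for illustration). It is assumed, not confirmed, that this matches what `Goodman_model_i()` estimates.

```{r sketch_row_column_glm}
# Hypothetical 3 x 4 table of counts, invented purely for illustration.
toy_counts <- matrix(c(20, 15, 10,  5,
                       12, 18, 14,  9,
                        6, 11, 16, 22),
                     nrow = 3, byrow = TRUE)

toy_cells <- data.frame(count = as.vector(toy_counts),
                        i = as.vector(row(toy_counts)),
                        j = as.vector(col(toy_counts)))
toy_cells$row_f <- factor(toy_cells$i)
toy_cells$col_f <- factor(toy_cells$j)

# Row effects: each row has its own slope on the integer column score j.
toy_row_effects <- glm(count ~ row_f + col_f + row_f:j,
                       family = poisson, data = toy_cells)
# Column effects: each column has its own slope on the integer row score i.
toy_col_effects <- glm(count ~ row_f + col_f + col_f:i,
                       family = poisson, data = toy_cells)
# Both sets of effects together.
toy_both_effects <- glm(count ~ row_f + col_f + row_f:j + col_f:i,
                        family = poisson, data = toy_cells)

# Some slopes are aliased and reported as NA; glm() accounts for this
# when it computes the residual degrees of freedom.
sapply(list(rows = toy_row_effects,
            columns = toy_col_effects,
            both = toy_both_effects),
       function(fit) c(g_squared = deviance(fit), df = df.residual(fit)))
```

For an r x c table the residual degrees of freedom work out to (r - 1)(c - 2), (r - 2)(c - 1), and (r - 2)(c - 2), respectively.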

Next, the effect of allowing both row and column effects can be estimated by allowing both row_effects and column_effects to take their default value of TRUE.

``` {r rc_association}
rc_association <- Goodman_model_i(mental_health)
```
There is an improvement of fit, with G^2 = `r rc_association$g_squared` on `r rc_association$df` degrees of freedom.

## Decomposing the Association 

The models just reviewed are all hierarchically related, which means that the effect of a component can be computed as the difference in G^2 between a more restricted model and a less restricted one.  For example, the effect of rows can be computed as `r null_model$g_squared` - `r rows$g_squared` = `r null_model$g_squared - rows$g_squared` on `r null_model$df` - `r rows$df` = `r null_model$df - rows$df` degrees of freedom.  Another estimate of the effect of rows is the difference between the columns model and the rows & columns model, `r columns$g_squared` - `r rc_association$g_squared` = `r columns$g_squared - rc_association$g_squared` on `r columns$df - rc_association$df` degrees of freedom.  Similar subtractions allow the association to be decomposed into a table of effects.
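
The subtraction for a pair of nested models can be wrapped in a small helper that also reports a chi-squared p-value. This is a sketch only: `g2_difference()` is a hypothetical helper, not part of the package, and the numbers passed to it are invented for illustration.

```{r sketch_g2_difference}
# Difference in G^2 between a restricted model and a more general model
# that contains it, referred to a chi-squared distribution.
g2_difference <- function(g2_restricted, df_restricted,
                          g2_general, df_general) {
  delta_g2 <- g2_restricted - g2_general
  delta_df <- df_restricted - df_general
  # The restricted model must have more residual degrees of freedom.
  stopifnot(delta_df > 0)
  data.frame(delta_g2 = delta_g2,
             delta_df = delta_df,
             p_value = pchisq(delta_g2, delta_df, lower.tail = FALSE))
}

# Hypothetical fit statistics, invented purely for illustration.
toy_difference <- g2_difference(g2_restricted = 30.5, df_restricted = 15,
                                g2_general = 12.1, df_general = 12)
toy_difference
```

With the fitted objects from this vignette, a call might look like `g2_difference(null_model$g_squared, null_model$df, rows$g_squared, rows$df)`.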

# Model II

As was noted above, the general version of Model I is a linear-by-linear log-linear model.  The locations are fixed at the integers 1 .. r.  Model II frees this constraint and estimates the locations of the linear component.  In general, the Model II version (say, of a row-effects model) has the same degrees of freedom as the parallel Model I model. Although Model I seems to be a constrained version of Model II, the two models are not related by hierarchical constraints, and their fit cannot be compared using a G^2 difference test.  Nonetheless, the relative fit can be interesting, and the estimated locations of the category scores can be examined to see how uniform or not they are.  If the separation appears relatively uniform, Model I may fit about as well and has a simpler interpretation.

Fitting the full rows & columns version of Model II is essentially the same as fitting Model I.

```{r model_ii_full}
model_ii_result <- Goodman_model_ii(mental_health)
```
The object returned contains information about the fit (G^2 = `r model_ii_result$g_squared` and X^2 = `r model_ii_result$chisq`, df = `r model_ii_result$df`).  Alpha and beta are the row and column log-linear effects, respectively.  The row locations are in rho (rho = `r model_ii_result$rho`) and the column locations are in sigma (sigma = `r model_ii_result$sigma`).  For the rows, one interesting aspect is that the middle two locations are quite similar and could likely be constrained to be equal.  For the remaining categories the spacing is fairly uniform, indicating that the fit would be similar if the values were constrained to be equally spaced.  This can be specified by passing in the values of rho and specifying "update_rows=FALSE".  Setting update_rows=FALSE or update_columns=FALSE yields the column-effects and row-effects models, respectively, for Model II.

``` {r model_ii_fixed_rows}
# Fixed row locations: the middle two categories share a location.
rho <- c(0.2, 0.0, 0.0, -0.2)
result_rows <- Goodman_model_ii(mental_health, rho = rho, update_rows = FALSE)
```
This yields a small change in the fit (`r result_rows$g_squared` - `r model_ii_result$g_squared`).  The difference is not distributed as chi-squared, because it is based on looking at the data.  Nonetheless, in an exploratory analysis, the similarity of locations for these two levels of mental_health ("mild symptom formation" and "moderate symptom formation") could be of substantive interest.   Note that a similar point can be made about the locations of the two lowest sigma parameters, for categories A and B of socioeconomic status.
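
One simple way to judge how evenly a set of locations is spaced is to look at the gaps between adjacent categories. The sketch below does this in base R. The vector `toy_locations` is hypothetical and invented for illustration; it is not output from the package.

```{r sketch_location_spacing}
# Hypothetical estimated locations for four ordered categories.
toy_locations <- c(0.62, 0.05, -0.02, -0.65)

# Gaps between adjacent categories; equal gaps mean equally spaced scores.
location_gaps <- diff(toy_locations)
location_gaps

# Gaps relative to the average gap. Values near 1 suggest roughly uniform
# spacing, and values near 0 suggest two categories that could be merged.
location_gaps / mean(location_gaps)
```

The relative gaps do not change if the locations are shifted or rescaled, so they can be compared across fits that identify the scores differently.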

It would be possible to support arbitrary equality constraints for rho and/or sigma, but this does not seem worth the effort currently.  Such support may appear in a future release of the package if there is call for it.

A further facility is the ability to remove the diagonal cells from the model fitting for square tables where the rows and columns have similar meaning, such as a table comparing sons' occupational status to that of their fathers (occupational_status), or one where rows and columns represent ratings of a set of objects by two independent raters (see the vignette "Models for Rater Agreement and Reliability" for more on this topic). To support this, all of the models take an optional parameter exclude_diagonal.  It defaults to FALSE, which includes all cells in the analysis.  If it is set to TRUE, the cells of the main diagonal are ignored in the analysis.

To illustrate, consider fitting the social status data, ignoring the main diagonal as Goodman does in the 1979 paper.

```{r ignore_diagonal}
null_result <- Goodman_null_association(social_status, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
uniform_result <- Goodman_uniform_association(social_status, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
row_result <- Goodman_model_i(social_status, row_effects = TRUE, column_effects = FALSE, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
column_result <- Goodman_model_i(social_status, row_effects = FALSE, column_effects = TRUE, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
rc_result <- Goodman_model_i(social_status, row_effects = TRUE, column_effects = TRUE, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
model_ii_result <- Goodman_model_ii(social_status, update_rows = TRUE, update_columns = FALSE, exclude_diagonal = TRUE, verbose = FALSE, max_iter = 15)
```
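
For intuition about what ignoring the diagonal means, the null association model without the diagonal cells corresponds to the familiar quasi-independence model. It can be fit in base R by dropping those cells before calling `glm()`. The sketch below uses a hypothetical 4 x 4 table (`toy_square`, invented for illustration) and is an assumed analogue of `exclude_diagonal = TRUE`, not the package's own algorithm.

```{r sketch_exclude_diagonal_glm}
# Hypothetical 4 x 4 square table, invented purely for illustration.
toy_square <- matrix(c(50, 10,  6,  2,
                       12, 40, 11,  5,
                        7, 13, 35,  9,
                        3,  6, 10, 30),
                     nrow = 4, byrow = TRUE)

toy_square_cells <- data.frame(count = as.vector(toy_square),
                               i = as.vector(row(toy_square)),
                               j = as.vector(col(toy_square)))
toy_square_cells$row_f <- factor(toy_square_cells$i)
toy_square_cells$col_f <- factor(toy_square_cells$j)

# Keep only the off-diagonal cells, then fit row and column main effects.
off_diagonal <- toy_square_cells$i != toy_square_cells$j
toy_quasi_independence <- glm(count ~ row_f + col_f, family = poisson,
                              data = toy_square_cells[off_diagonal, ])

c(g_squared = deviance(toy_quasi_independence),
  df = df.residual(toy_quasi_independence))
```

For an r x r table this leaves (r - 1)^2 - r residual degrees of freedom, one fewer per excluded diagonal cell than the ordinary independence model.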

In my reading, Model II is not widely used.  The linear-by-linear specification with fixed locations (of Model I) fits a wide variety of data sets fairly well (while possibly making allowance for the agreement on the main diagonal), and the subtlety in interpreting Model II generally tips the balance in favor of Model I.  Still, it is an interesting model to consider, and it may be exactly what is required in some circumstances.


# Reference
Goodman, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. *Journal of the American Statistical Association*, 74(367), 537-552.