What’s Unique about Rater Agreement?
From one point of view, nothing. Rater agreement studies tend to
produce r X r tables, where r is the number of rating categories. As
such, the models that test marginal homogeneity and symmetry are quite
relevant, asking does one rater score similarly (or higher) than
another. However, rater agreement data are different in that there are
usually a surplus of responses on the diagonal, indicating that the
raters agree. Often it makes sense to explicitly model this excess, and
the models in this vignette do that.
Another feature of these models is that they are nearly all regular
log-linear models. As such they are less interesting, in that they can
be fit with standard log-linear estimation proceudres, including glm()
with Poisson family and “log” as the link function. Where things can be
a bit challenging is that one has to explicitly form the design matrix.
For some designs this is a bit of a puzzle, and there are some utility
routines to help in the specification.
Much of the modeling in this section comes from chapter 2 of von Eye,
A. & Mun, E. Y. (2005), Analyzing rater agreement: Manifest variable
methods. Mahwah, NJ: Lawrence Erlbaum.
The basic main effects model
The baseline model for rater modeling is the main effect model, which
specifies an intercept and a row (rater 1) effect and a column (rater 2
effect), but no interaction terms. This is the independence model that
underlies Cohen’s kappa for nominal items: p_a - p_c kappa = ———– 1 -
p_c
dogs_kappa <- kappa(dogs)
print(dogs_kappa)
#> $kappa
#> [1] 0.6351767
#>
#> $se
#> [1] 0.05126776
p_c is taken as p_i+ times p_+j, which is the model prediction for
all cells in the main effects model. There is also a weighted version of
kappa available, weighted_kappa(n, w) where w is the matrix of weights.
The unweighted kappa for the dogs data is 0.6351767, 0.0512678,
indicating good agreement betweeen the two clinicians.
There are two models that specifically include kappa as part of the
model. The first is by Agresti, 1989, and the second is by Schuster,
2001. The Agresti model is described as “symmetry plus
quasi-independence.” It assumes marginal homogeneity, as well as
assuming that quasi-symmetry holds.
The Agresti kappa model is available as:
result <- Agresti_kappa_agreement(dogs)
The result is a list with several elements. The first is kappa,
0.6235352, which is quite similar to the observed value. Estimates of
the marginal parameters (the moodel assumes that these fit to both
raters) are also returned, 0.6316093, 0.2278695, 0.1316074, 0.0146445.
The Pearson chi-squared statistic of 92.3636704 on
result$df
degrees of freedom indicates that the model does
not fit well to this data.
The Schuster model also assumes marginal homogeneity. It incorporates
an underlying symmetry matrix as well. The model is much harder to fit
acceptably, as reflected in the amount of time and number of iterations
that the model runs. It does not yield a plausible result for the dogs
data (not shown: the log(likelihood) decreased, the estimated kappa was
negative and the residuals were remarkably large for the highest
category), so we apply it to the vision_data.
result <- Schuster_symmetric_rater_agreement_model(vision_data)
#> 100 -10365.131345958 5669.64113345708 114810.134161738 0.000219059130695239
#> 200 -10365.0027942057 5669.38402995252 114808.82366675 0.000231464328770828
#> 300 -10364.8743346158 5669.12711077268 114807.513696302 0.000243860940594987
#> 400 -10364.7459671253 5668.87037579168 114806.204250146 0.000256248971591903
#> 500 -10364.6176916713 5668.61382488376 114804.895328033 0.000268628427181174
#> 600 -10364.4895081911 5668.35745792331 114803.586929717 0.000280999312775702
#> 700 -10364.3614166219 5668.10127478488 114802.27905495 0.000293361633782037
#> 800 -10364.233416901 5667.84527534316 114800.971703486 0.00030571539560073
#> 900 -10364.1055089659 5667.58945947302 114799.664875078 0.000318060603625277
#> 1000 -10363.9776927541 5667.33382704943 114798.35856948 0.000330397263244574
#> 1100 -10363.8499682032 5667.07837794755 114797.052786446 1.23196687967666e-07
#> 1200 -10363.7223352507 5666.82311204265 114795.747525731 1.23109851597136e-07
#> 1300 -10363.5947938345 5666.56802921021 114794.442787091 1.23023071325962e-07
#> 1400 -10363.4673438923 5666.31312932578 114793.138570279 1.22936348212369e-07
#> 1500 -10363.339985362 5666.05841226513 114791.834875051 1.22849681910455e-07
#> 1600 -10363.2127181815 5665.80387790414 114790.531701165 1.22763072074297e-07
#> 1700 -10363.0855422888 5665.54952611882 114789.229048374 1.22676518357946e-07
#> 1800 -10362.9584576221 5665.29535678538 114787.926916437 1.22590021468601e-07
#> 1900 -10362.8314641195 5665.04136978014 114786.625305109 1.22503581411367e-07
#> 2000 -10362.7045617192 5664.78756497956 114785.324214149 1.22417197489217e-07
#> 2100 -10362.5777503596 5664.53394226028 114784.023643312 1.22330869356149e-07
#> 2200 -10362.451029979 5664.28050149906 114782.723592358 1.22244598070435e-07
#> 2300 -10362.3244005158 5664.02724257282 114781.424061045 1.22158383286079e-07
#> 2400 -10362.1978619087 5663.7741653586 114780.125049129 1.22072223954901e-07
#> 2500 -10362.0714140962 5663.52126973364 114778.826556372 1.21986120433009e-07
#> 2600 -10361.945057017 5663.26855557526 114777.52858253 1.2190007342761e-07
#> 2700 -10361.8187906099 5663.01602276097 114776.231127364 1.21814082241564e-07
#> 2800 -10361.6926148136 5662.7636711684 114774.934190634 1.21728146879887e-07
#> 2900 -10361.5665295671 5662.51150067536 114773.637772098 1.21642267347591e-07
#> 3000 -10361.4405348093 5662.25951115977 114772.341871519 1.21556443298574e-07
#> 3100 -10361.3146304793 5662.00770249969 114771.046488655 1.21470674737823e-07
#> 3200 -10361.1888165161 5661.75607457337 114769.751623269 1.21384962372551e-07
#> 3300 -10361.063092859 5661.50462725915 114768.45727512 1.21299305505521e-07
#> 3400 -10360.9374594472 5661.25336043552 114767.163443972 1.21213703790579e-07
#> 3500 -10360.81191622 5661.00227398116 114765.870129585 1.21128157583805e-07
#> 3600 -10360.6864631168 5660.75136777484 114764.577331722 1.21042666539027e-07
#> 3700 -10360.5611000772 5660.50064169551 114763.285050146 1.20957231012324e-07
#> 3800 -10360.4358270405 5660.25009562225 114761.993284618 1.2087185030636e-07
#> 3900 -10360.3106439466 5659.99972943429 114760.702034902 1.20786525128346e-07
#> 4000 -10360.1855507349 5659.74954301096 114759.411300762 1.2070125513207e-07
#> 4100 -10360.0605473453 5659.49953623179 114758.121081961 1.20616039620142e-07
#> 4200 -10359.9356337176 5659.24970897641 114756.831378264 1.20530879299768e-07
#> 4300 -10359.8108097917 5659.00006112462 114755.542189433 1.20445774175857e-07
#> 4400 -10359.6860755076 5658.75059255634 114754.253515235 1.2036072355098e-07
#> 4500 -10359.5614308052 5658.50130315166 114752.965355434 1.20275727781178e-07
#> 4600 -10359.4368756248 5658.25219279077 114751.677709794 1.20190786520158e-07
#> 4700 -10359.3124099064 5658.00326135403 114750.390578083 1.20105900123961e-07
#> 4800 -10359.1880335904 5657.75450872194 114749.103960065 1.20021068597455e-07
#> 4900 -10359.063746617 5657.50593477511 114747.817855507 1.1993629089194e-07
#> 5000 -10358.9395489266 5657.25753939434 114746.532264175 1.19851568065815e-07
#> 5100 -10358.8154404597 5657.00932246053 114745.247185836 1.19766899421538e-07
#> 5200 -10358.6914211568 5656.76128385471 114743.962620257 1.19682285315132e-07
#> 5300 -10358.5674909585 5656.5134234581 114742.678567205 1.19597725751429e-07
#> 5400 -10358.4436498054 5656.26574115201 114741.395026448 1.19513219681625e-07
#> 5500 -10358.3198976384 5656.01823681793 114740.111997754 1.19428767812928e-07
#> 5600 -10358.1962343981 5655.77091033746 114738.829480891 1.19344370501356e-07
#> 5700 -10358.0726600256 5655.52376159232 114737.547475628 1.19260026698053e-07
#> 5800 -10357.9491744616 5655.27679046442 114736.265981733 1.19175737461463e-07
#> 5900 -10357.8257776473 5655.02999683577 114734.984998975 1.19091501742693e-07
#> 6000 -10357.7024695237 5654.78338058853 114733.704527125 1.19007319546498e-07
#> 6100 -10357.5792500319 5654.53694160501 114732.424565952 1.18923191931344e-07
#> 6200 -10357.4561191132 5654.29067976762 114731.145115225 1.18839117497042e-07
#> 6300 -10357.3330767089 5654.04459495893 114729.866174716 1.18755096599565e-07
#> 6400 -10357.2101227603 5653.79868706168 114728.587744195 1.18671129243649e-07
#> 6500 -10357.0872572088 5653.5529559587 114727.309823432 1.18587215785277e-07
#> 6600 -10356.9644799959 5653.30740153294 114726.032412199 1.18503355526666e-07
#> 6700 -10356.8417910632 5653.06202366755 114724.755510268 1.18419548823787e-07
#> 6800 -10356.7191903523 5652.81682224577 114723.47911741 1.18335795681353e-07
#> 6900 -10356.5966778049 5652.57179715098 114722.203233397 1.18252095401528e-07
#> 7000 -10356.4742533628 5652.32694826673 114720.927858003 1.18168448340266e-07
#> 7100 -10356.3519169677 5652.08227547665 114719.652990999 1.18084854150974e-07
#> 7200 -10356.2296685617 5651.83777866455 114718.378632158 1.18001313892171e-07
#> 7300 -10356.1075080866 5651.59345771435 114717.104781254 1.17917825812115e-07
#> 7400 -10355.9854354845 5651.34931251012 114715.831438061 1.1783439132061e-07
#> 7500 -10355.8634506974 5651.10534293605 114714.558602351 1.1775100971974e-07
#> 7600 -10355.7415536677 5650.86154887648 114713.2862739 1.17667680662851e-07
#> 7700 -10355.6197443373 5650.61793021587 114712.014452482 1.17584404505878e-07
#> 7800 -10355.4980226488 5650.37448683883 114710.743137871 1.17501180902149e-07
#> 7900 -10355.3763885445 5650.13121863007 114709.472329843 1.17418010207597e-07
#> 8000 -10355.2548419667 5649.88812547448 114708.202028173 1.17334892075532e-07
#> 8100 -10355.1333828579 5649.64520725705 114706.932232636 1.17251826510565e-07
#> 8200 -10355.0120111609 5649.40246386291 114705.662943009 1.17168813517298e-07
#> 8300 -10354.8907268181 5649.15989517733 114704.394159067 1.17085852749002e-07
#> 8400 -10354.7695297723 5648.91750108571 114703.125880587 1.17002944912928e-07
#> 8500 -10354.6484199662 5648.67528147357 114701.858107346 1.16920088608325e-07
#> 8600 -10354.5273973427 5648.43323622659 114700.590839121 1.16837285245109e-07
#> 8700 -10354.4064618447 5648.19136523054 114699.324075689 1.16754533422487e-07
#> 8800 -10354.2856134151 5647.94966837136 114698.057816827 1.16671834550397e-07
#> 8900 -10354.164851997 5647.70814553511 114696.792062314 1.16589186876651e-07
#> 9000 -10354.0441775334 5647.46679660797 114695.526811928 1.16506591811197e-07
#> 9100 -10353.9235899675 5647.22562147626 114694.262065448 1.16424048304507e-07
#> 9200 -10353.8030892426 5646.98462002644 114692.997822651 1.16341557063831e-07
#> 9300 -10353.682675302 5646.74379214508 114691.734083317 1.16259117390967e-07
#> 9400 -10353.5623480889 5646.50313771889 114690.470847225 1.16176729993171e-07
#> 9500 -10353.4421075468 5646.26265663472 114689.208114156 1.16094393820835e-07
#> 9600 -10353.3219536192 5646.02234877954 114687.945883888 1.16012109581209e-07
#> 9700 -10353.2018862496 5645.78221404045 114686.684156202 1.15929876576027e-07
#> 9800 -10353.0819053818 5645.54225230466 114685.422930878 1.15847695512544e-07
#> 9900 -10352.9620109592 5645.30246345957 114684.162207698 1.15765566043863e-07
#> 10000 -10352.8422029257 5645.06284739264 114682.901986442 1.15683487471663e-07
The iterations are slow (about 1 sec per 100 iterations). It runs for
the full 10,000 iterations specified by default, in spite of using the
Newton-Raphson method (it is likely that the performance of this
function will be improved in future versions of the package). At every
100 cycles the function outputs the iteration number, log(likelihood),
G^2 and X^2 and the convergence value, whihc is the relative improvement
in log(likelihood) value. For the vision data the observed kappa is
0.5953888, 0.0070393. The model’s final estimate is close to this,
0.5953852. It is a bit hard to evaluate the model fit, given that the
iterations did not converge, but the G^2 values that were printed out
indicate very poor fit to the data (the X^2 values were even more
extreme). Running to convergence of 1.0e-9 requires 244,503 iterations.
The final kappa estimate of 0.5953344 is very close to the observed
kappa of 0.5953888, 0.0070393. The final model yields a G^2 of 5409.781
on 9 degrees of freedom.
We note that the Agresti model:
result <- Agresti_kappa_agreement(vision_data)
runs substantially more quickly and fits this data set substantially
better, with G^2 of 409.7778011 on 11 degrees of freedom, although it
does not fit in an absolute sense.
To show that the kappa-based models can fit, we look at the
budget_actual data. Fitting this model:
s_result <- Schuster_symmetric_rater_agreement_model(budget_actual)
#> 100 -149.012309151138 2.96905386545657 3.07236917550835 0.00790303078865798
#> 200 -149.010817704918 2.96607097301722 3.06975109518432 0.00791311886943114
#> 300 -149.009343620158 2.96312280349772 3.06716522785273 0.00792308971643707
#> 400 -149.007886692519 2.96020894821833 3.06461118553905 0.00793294470492557
#> 500 -149.00644672009 2.95732900335991 3.0620885849234 0.00794268519389665
#> 600 -149.005023503362 2.95448256990533 3.05959704728408 0.00795231252629077
#> 700 -149.0036168452 2.95166925358144 3.05713619844183 0.00796182802918456
#> 800 -149.002226550811 2.94888866480201 3.05470566870467 0.00797123301397594
#> 900 -149.000852427715 2.94614041861118 3.0523050928136 0.00798052877657195
#> 1000 -148.999494285723 2.94342413462768 3.04993410988885 0.00798971659757486
yields acceptable fit (G^2 = 2.9433971 on 5 degrees of freedom), as
does fitting the Agresti kappa model:
a_result <- Agresti_kappa_agreement(budget_actual)
which gives a G^2 of 6.6831842 on 5 degrees of freedom. Again, the
two kappa estimates, 0.2851063 and 0.2788086 are not that different than
the observed kappa of 0.2845944, 0.0619068.
It is worth noting that symmetry (which is assumed by both kappa
models) fits the budget_actual data:
stuart_result <- Stuart_marginal_homogeneity(budget_actual)
yields a X^2 of 1.0889064 on 2 degrees of freedom, indicating that
symmetry is plausible for this data.
In the final analysis, neither of these specialized kappa models does
a good job of describing the vision data. Both models assume marginal
homogeneity, and we know from the vignette “Checking Whether Margins are
(Stochastically) Ordered” that marginal homogeneity can be rejected for
the vision data. When marginal homogeneity was satisfied in the
budget_actual data, both models yielded acceptable fit. It appears that
for these models to fit, marginal homogeneity must be satisfied.
Conversely, to fit a wide variety of rater data sets acceptably, it
appears that a (kappa-based) model must allow for heterogeneous
margins.