---
title: "Common errors & warnings encountered in DImulti()"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Common errors & warnings encountered in DImulti()}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{css styling, echo=FALSE}
span.R {
font-family: Courier New;
}
span.bad {
color: red;
}
```
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(crayon.enabled = TRUE)
ansi_aware_handler <- function(x, options)
{
paste0(
"
",
fansi::sgr_to_html(x = x, warn = FALSE, term.cap = "256"),
"
"
)
}
old_hooks <- fansi::set_knit_hooks(knitr::knit_hooks,
which = c("output", "message", "error", "warning"))
knitr::knit_hooks$set(
output = ansi_aware_handler,
message = ansi_aware_handler,
warning = ansi_aware_handler,
error = ansi_aware_handler
)
```
```{r setup}
library(DImodelsMulti)
data("dataBEL"); data("dataSWE"); data("simMV"); data("simRM"); data("simMVRM")
```
Incorrect argument specifications
The DImulti() function has a long list of possible parameters, which can make
it difficult to ensure that all arguments supplied are in the requested format. See below for an
example of each parameter and the included datasets,
dataBEL,
dataSWE,
simMV,
simRM and
simMVRM, for examples of dataset formatting.
y
This is a required argument. It can be passed in three forms, using either column names or indices:
* Univariate
In this case, we pass a single name or index of a numerical column containing the ecosystem function
response value. See ?dataSWE for an example of a univariate dataset suitable
for use with DImulti().
```{r y_univar, eval = FALSE}
DImulti(y = "YIELD", ..., data = dataSWE)
DImulti(y = 9, ..., data = dataSWE)
```
```{r y_dataSWE, eval = TRUE, echo = FALSE}
dataSWE[1:6, "YIELD", drop = FALSE]
```
* Multivariate stacked
In this case, we pass a single name or index of a numerical column containing the ecosystem function
response value. See ?dataBEL for an example of a multivariate stacked dataset
suitable for use with DImulti().
```{r y_MVstack, eval = FALSE}
DImulti(y = "Y", ..., data = dataBEL)
DImulti(y = 9, ..., data = dataBEL)
```
```{r y_dataBEL, eval = TRUE, echo = FALSE}
dataBEL[1:6, "Y", drop = FALSE]
```
* Multivariate wide
In this case, we pass multiple names or indices of numerical columns containing the
ecosystem function response values. See ?simMV for an example of a
multivariate wide dataset suitable for use with DImulti().
```{r y_MVwide, eval = FALSE}
DImulti(y = c("Y1", "Y2", "Y3", "Y4"), ..., data = simMV)
DImulti(y = 9:12, ..., data = simMV)
```
```{r y_simMV, eval = TRUE, echo = FALSE}
simMV[1:6, c("Y1", "Y2", "Y3", "Y4"), drop = FALSE]
```
eco_func
This is a required argument for multivariate data and excluded for univariate data. It can be passed
in two forms:
* Multivariate stacked
In this case, we pass a vector of size two. The first index contains a single column name for the
ecosystem function response type, a factor, the second contains the autocorrelation structure
desired, either "UN" or "CS". See
?dataBEL for an example of a multivariate stacked dataset suitable for use
with DImulti().
```{r ecoFunc_MVstack, eval = FALSE}
DImulti(eco_func = c("Var", "UN"), ..., data = dataBEL)
```
```{r ecoFunc_dataBEL, eval = TRUE, echo = FALSE}
dataBEL[1:6, "Var", drop = FALSE]
```
* Multivariate wide
In this case, we pass a vector of size two. The first index contains the string "NA", the second
contains the autocorrelation structure desired, either "UN" or "CS". See ?simMV for an example of a multivariate stacked
dataset suitable for use with DImulti().
```{r ecoFunc_MVwide, eval = FALSE}
DImulti(eco_func = c("NA", "UN"), ..., data = simMV)
```
time
This is a required argument for repeated measures data and excluded for data taken at a single time
point. It can be passed in the following form:
* Repeated Measures
In this case, we pass a vector of size two. The first index contains a single column name for the
time point reference, a factor, the second contains the autocorrelation structure
desired, either "UN", "CS", or
"AR1". See ?dataSWE for an example of a repeated
measures dataset suitable for use with DImulti().
```{r time, eval = FALSE}
DImulti(time = c("YEARN", "AR1"), ..., data = dataSWE)
```
```{r time_dataSWE, eval = TRUE, echo = FALSE}
dataSWE[c(1, 49, 97, 2, 50, 98), "YEARN", drop = FALSE]
```
unit_IDs
This is a required argument for any data.
It can be passed a column name or index, referencing a unique reference for each experimental unit,
used for grouping responses from different ecosystem functions or time points.
```{r unit_IDs, eval = FALSE}
DImulti(unit_IDs = "plot", ..., data = simMVRM)
DImulti(unit_IDs = 1, ..., data = simMVRM)
```
```{r unit_IDs_simVMRM, eval = TRUE, echo = FALSE}
simMVRM[1:6, "plot", drop = FALSE]
```
prop
This is a required argument for any data.
It can be passed as a vector of column names or indices, referencing the (numerical) initial
proportions of any species included in the experimental design.
In the event that a species is included as a factor as opposed to a numerical value, e.g. if it only
appears at values 0 or 1, only the numerical species should be included here, while the factor
species is passed through the extra_fixed parameter. A warning will be
printed to the console warning the user of not meeting the simplex requirement.
```{r prop, eval = FALSE}
DImulti(unit_IDs = paste0("p", 1:4), ..., data = simMVRM)
DImulti(unit_IDs = 2:5, ..., data = simMVRM)
```
```{r prop_simVMRM, eval = TRUE, echo = FALSE}
simMVRM[1:6, 2:5, drop = FALSE]
```
data
This is a required argument for any data.
This argument should be a dataframe or tibble which contains all columns referenced in the
DImulti() call. There are some restrictions on columns names allowed in this
dataset. An error will be printed to the console to inform a user of name changes required.
```{r data, eval = FALSE}
DImulti(..., data = simMVRM)
DImulti(..., data = simMVRM)
```
```{r data_simVMRM, eval = TRUE, echo = FALSE}
simMVRM[1:6, , drop = FALSE]
```
DImodel
This is a required argument for any data.
This argument should be a single string selected from the list of included interaction structures:
"STR",
"ID",
"FULL",
"E",
"AV",
"ADD",
"FG".
```{r DImodel, eval = FALSE}
DImulti(DImodel = "FULL", ...)
DImulti(DImodel = "AV", ...)
```
FG
This is a required argument if the argument "FG" is passed through
DImodel and can be excluded otherwise.
This argument should be a vector of strings, of the same length as prop,
which maps each species in the experiment to a functional grouping.
```{r FG, eval = FALSE}
DImulti(prop = c("G1", "G2", "L1", "L2"), DImodel = "FG",
FG = c("Grass", "Grass", "Legume", "Legume"), ..., data = dataBEL)
```
ID
This is an optional argument.
This argument should be a vector of strings, of the same length as prop,
which maps each species in the experiment to a identity grouping, assuming functional redundancy
between within-group species for their ID effects, this grouping does not affect interactions.
```{r ID, eval = FALSE}
DImulti(prop = c("G1", "G2", "L1", "L2"), ID = c("Grass", "Grass", "Legume", "Legume"),
..., data = dataBEL)
```
extra_fixed
This is an optional argument.
This parameter can be passed a formula of any additional fixed effects to be included, e.g.
treatments. These effects can be easily crossed with the automatic DI model formula by prefacing
the formula passed with 1: or 1*. See
?dataSWE for a dataset containing treatments suitable for use with
DImulti().
```{r extra_fixed, eval = FALSE}
DImulti(extra_fixed = ~DENS, ..., data = dataSWE)
DImulti(extra_fixed = ~DENS + TREAT, ..., data = dataSWE)
DImulti(extra_fixed = ~1*DENS, ..., data = dataSWE)
DImulti(extra_fixed = ~1:(DENS+TREAT), ..., data = dataSWE)
DImulti(extra_fixed = ~1:DENS:TREAT, ..., data = dataSWE)
```
```{r extra_fixed_dataSWE, eval = TRUE, echo = FALSE}
dataSWE[1:6, c("DENS", "TREAT"), drop = FALSE]
```
estimate_theta
This is an optional argument.
This parameter can be passed a boolean value (FALSE), indicating whether the user wants the function to use profile
likelihood to estimate values for the non-linear parameter $\theta$. Defaults to
"STR" or "ID" was passed
through the parameter DImodel, then setting
estimate_theta = TRUE will cause a warning to be printed to the console, as
$\theta$ only applies to interaction terms, which do not exist for those options.
```{r estimate_theta, eval = FALSE}
DImulti(estimate_theta = TRUE, ...)
DImulti(estimate_theta = FALSE, ...)
```
theta
This is an optional argument.
This argument should contain numerical values representing the non-linear parameter $\theta$, which
will be applied as a power to the products of pairs of species in the interaction terms of the
model. A single value may be passed, which will be applied to each ecosystem function, or a
different value may be passed for each ecosystem function.
If "STR" or "ID" was passed
through the parameter DImodel, then setting a non-1 value for
theta will cause a warning to be printed to the console, as
$\theta$ only applies to interaction terms, which do not exist for those options.
```{r theta, eval = FALSE}
DImulti(theta = 1, ...)
DImulti(theta = c(0.5, 1, 1.2), ...)
```
method
This is an optional argument.
This parameter is used to change the estimation method used by DImulti(). The
options are "REML", referring to restricted maximum likelihood, and
"ML", referring to maximum likelihood.
Defaults to "REML".
```{r method, eval = FALSE}
DImulti(method = "REML", ...)
DImulti(method = "ML", ...)
```
Singular fit
A singular fit error, thrown by nlme::gls() when fitting the model, occurs
when an element with value of exactly zero exists in the fixed effect variance covariance matrix of
the model. This usually caused by rank deficiency in the model, i.e. two terms explaining the same
variance. The fix is usually to simplify/change the fixed effect terms, be it the interactions
structure, treatment terms, or theta values. To aid in this change, the fixed effect formula and any
estimated theta values are printed to the console with the error.
As an example, we use the dataset dataSWEXXDO NOT RUNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```{r singular_dataSWE, eval = TRUE, error = TRUE}
DImulti(prop = 5:8, y = "YIELD", time = c("YEARN", "CS"), unit_IDs = "PLOT", data = dataSWE,
DImodel = "ID", extra_fixed = ~DENS+TREAT)
```
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
As we include no theta values or interaction terms, and use the simple structure
"CS", compound symmetry, for our repeated measure, the only complication in
this model is the inclusion of the two treatment terms, DENS and
TREAT. When multiple treatments are included, it is usually good to check
that one cannot be inferred from the other, e.g. if DENS = High then we know
that TREAT = 1. Luckily this is not the case here.
```{r singular_dataSWE_DENSTREAT, eval = TRUE, echo = FALSE}
dataSWE[c(1, 16, 31, 40), c("DENS", "TREAT"), drop = FALSE]
```
Instead, we change the formatting of our formula, from creating intercepts to crossing the treatment
with our ID effects, but not each other.
```{r singular_dataSWE_fix, eval = TRUE}
DImulti(prop = 5:8, y = "YIELD", time = c("YEARN", "CS"), unit_IDs = "PLOT", data = dataSWE,
DImodel = "ID", extra_fixed = ~1:(DENS+TREAT))
```
Which has resolved our issue.
We show another example using the multivariate dataset dataBEL and estimating
$\theta$. We would first set theta = 1 and use model selection to choose an
appropriate interaction structure, which gives us the following model, with a functional group
structure:
```{r singular_dataBEL_Works, eval = TRUE}
model1 <- DImulti(prop = 2:5, y = "Y", eco_func = c("Var", "UN"), unit_IDs = 1, data = dataBEL,
FG = c("Grass", "Grass", "Leg", "Leg"), DImodel = "FG", extra_fixed = ~Density,
method = "ML", theta = 1)
```
Comparing models: ML vs REML
The optional parameter method in the function
DImulti() allows a user to change the estimation method used, between the
default "REML", restricted maximum likelihood, and
"ML", maximum likelihood.
"REML" was chosen as the default as it provides unbiased estimates, as
opposed to "ML", however, "REML" is not appropriate
for use with model comparison where fixed effects vary between models, as we can see from the
following warning message:
XXDO NOT RUNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```{r comparison_dataBEL, eval = TRUE}
model1 <- DImulti(prop = 2:5, y = "Y", eco_func = c("Var", "UN"), unit_IDs = 1, data = dataBEL,
FG = c("Grass", "Grass", "Leg", "Leg"), DImodel = "FG",
method = "REML")
model2 <- DImulti(prop = 2:5, y = "Y", eco_func = c("Var", "UN"), unit_IDs = 1, data = dataBEL,
FG = c("Grass", "Grass", "Leg", "Leg"), DImodel = "FG", extra_fixed = ~Density,
method = "REML")
anova(model1, model2)
```
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
In this case, we should use "ML" to fit the models for comparison, then refit
the chosen model using "REML", for the unbiased estimates.