unmarked
aims to be a complete environment for the
statistical analysis of data from surveys of unmarked animals.
Currently, the focus is on hierarchical models that separately model a
latent state (or states) and an observation process. This vignette
provides a brief overview of the package - for a more thorough treatment
see Fiske and Chandler (2011) and Kellner et al. (2023).
Unmarked provides methods to estimate site occupancy, abundance, and density of animals (or possibly other organisms/objects) that cannot be detected with certainty. Numerous models are available that correspond to specialized survey methods such as temporally replicated surveys, distance sampling, removal sampling, and double observer sampling. These data are often associated with metadata related to the design of the study. For example, in distance sampling, the study design (line- or point-transect), distance class break points, transect lengths, and units of measurement need to be accounted for in the analysis. Unmarked uses S4 classes to store data and metadata in a way that allows for easy data manipulation, summarization, and model specification. Table 1 lists the currently implemented models and their associated fitting functions and data classes.
Model | Fitting Function | Data | Citation |
---|---|---|---|
Occupancy | occu | unmarkedFrameOccu | MacKenzie et al. (2002) |
Royle-Nichols | occuRN | unmarkedFrameOccu | Royle and Nichols (2003) |
Point Count | pcount | unmarkedFramePCount | Royle (2004a) |
Distance-sampling | distsamp | unmarkedFrameDS | Royle et al. (2004) |
Generalized distance-sampling | gdistsamp | unmarkedFrameGDS | Chandler et al. (2011) |
Arbitrary multinomial-Poisson | multinomPois | unmarkedFrameMPois | Royle (2004b) |
Colonization-extinction | colext | unmarkedMultFrame | MacKenzie et al. (2003) |
Generalized multinomial-mixture | gmultmix | unmarkedFrameGMM | Royle (2004b) |
Each data class can be created with a call to the constructor function of the same name as described in the examples below.
The first step is to import the data into R, which we do below using
the read.csv
function. Next, the data need to be formatted
for use with a specific model fitting function. This can be accomplished
with a call to the appropriate type of unmarkedFrame
. For
example, to prepare the data for a single-season site-occupancy
analysis, the function unmarkedFrameOccu
is used.
Two examples of reading in data from a CSV and formatting for unmarked. First, from a CSV in “wide” format:
library(unmarked)
wt <- read.csv(system.file("csv","widewt.csv", package="unmarked"))
y <- wt[,2:4]
siteCovs <- wt[,c("elev", "forest", "length")]
obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")],
ivel=wt[,c("ivel.1", "ivel.2", "ivel.3")])
wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs)
summary(wt)
## unmarkedFrame Object
##
## 237 sites
## Maximum number of observations per site: 3
## Mean number of observations per site: 2.81
## Sites with at least one detection: 79
##
## Tabulation of y observations:
## 0 1 <NA>
## 483 182 46
##
## Site-level covariates:
## elev forest length
## Min. :-1.436125 Min. :-1.265e+00 Min. :0.1823
## 1st Qu.:-0.940726 1st Qu.:-9.744e-01 1st Qu.:1.4351
## Median :-0.166666 Median :-6.499e-02 Median :1.6094
## Mean : 0.007612 Mean : 8.798e-05 Mean :1.5924
## 3rd Qu.: 0.994425 3rd Qu.: 8.080e-01 3rd Qu.:1.7750
## Max. : 2.434177 Max. : 2.299e+00 Max. :2.2407
##
## Observation-level covariates:
## date ivel
## Min. :-2.904339 Min. :-1.7533
## 1st Qu.:-1.118624 1st Qu.:-0.6660
## Median :-0.118624 Median :-0.1395
## Mean :-0.000217 Mean : 0.0000
## 3rd Qu.: 1.309947 3rd Qu.: 0.5493
## Max. : 3.809947 Max. : 5.9795
## NA's :42 NA's :46
Next, a CSV in “long” format:
pcru_raw <- read.csv(system.file("csv","frog2001pcru.csv", package="unmarked"))
sites <- unique(pcru_raw$RouteNumStopNum)
M <- length(sites)
J <- max(table(pcru_raw$RouteNumStopNum))
y <- JulianDate <- MinAfterSunset <- Wind <- Sky <- Temperature <- matrix(NA, M, J)
for (i in 1:length(sites)){
datsub <- pcru_raw[pcru_raw$RouteNumStopNum == sites[i],]
n <- nrow(datsub)
y[i,1:n] <- datsub$Pcru
JulianDate[i,1:n] <- datsub$JulianDate
MinAfterSunset[i,1:n] <- datsub$MinAfterSunset
Wind[i,1:n] <- datsub$Wind
Sky[i,1:n] <- datsub$Sky
Temperature[i,1:n] <- datsub$Temperature
}
obscovs <- list(JulianDate = JulianDate, MinAfterSunset = MinAfterSunset,
Wind = Wind, Sky = Sky, Temperature = Temperature)
pcru <- unmarkedFrameOccu(y = y, obsCovs = obscovs)
Occupancy models can then be fit with the occu() function. To help
stabilize the numerical optimization algorithm, we recommend
standardizing covariates. The best way to do this is to use the
scale()
function on covariates inside your formulas.
## Warning in truncateToBinary(dm$y): Some observations were > 1. These were
## truncated to 1.
## Warning in truncateToBinary(dm$y): Some observations were > 1. These were
## truncated to 1.
##
## Call:
## occu(formula = ~scale(MinAfterSunset) + scale(Temperature) ~
## 1, data = pcru)
##
## Occupancy (logit-scale):
## Estimate SE z P(>|z|)
## 1.54 0.292 5.26 1.42e-07
##
## Detection (logit-scale):
## Estimate SE z P(>|z|)
## (Intercept) 0.2098 0.206 1.017 3.09e-01
## scale(MinAfterSunset) -0.0855 0.160 -0.536 5.92e-01
## scale(Temperature) -1.8936 0.291 -6.508 7.60e-11
##
## AIC: 356.7591
## Number of sites: 130
Here, we have specified that the detection process is modeled with
the MinAfterSunset
and Temperature
covariates.
No covariates are specified for occupancy here. See ?occu
for more details.
unmarked
fitting functions return
unmarkedFit
objects which can be queried to investigate the
model fit. Variables can be back-transformed to the unconstrained scale
using predict
, which also returns a 95% confidence
interval. Since there are no occupancy covariates, all sites have the
same occupancy estimate, and we can look at just the first row of the
output.
## Predicted SE lower upper
## 1 0.8230724 0.04254069 0.7240711 0.8918579
The expected probability that a site was occupied is 0.823. This
estimate applies to the hypothetical population of all possible sites,
not the sites found in our sample. For a good discussion of
population-level vs finite-sample inference, see Royle and Dorazio (2008) page 117. Note also
that finite-sample quantities can be computed in unmarked
using empirical Bayes methods as demonstrated at the end of this
document.
Back-transforming the estimate of \(\psi\) was easy because there were no covariates. Because the detection component was modeled with covariates, \(p\) is a function, not just a scalar quantity, and so we need to be provide values of our covariates to obtain an estimate of \(p\). Here, we request the probability of detection given a site is occupied and all covariates are set to 0.
nd <- data.frame(MinAfterSunset = 0, Temperature = 0)
round(predict(fm2, type = 'det', newdata = nd, appendData = TRUE), 2)
## Predicted SE lower upper MinAfterSunset Temperature
## 1 1 0 0.98 1 0 0
Thus, we can say that the expected probability of detection was 0.552
when time of day and temperature are fixed at their mean value. The
predict
metho can also be used to obtain estimates of
parameters at a range of specific covariate values.
nd <- data.frame(MinAfterSunset = 0, Temperature = -2:2)
round(predict(fm2, type = 'det', newdata = nd, appendData=TRUE), 2)
## Predicted SE lower upper MinAfterSunset Temperature
## 1 1.00 0 0.99 1 0 -2
## 2 1.00 0 0.99 1 0 -1
## 3 1.00 0 0.98 1 0 0
## 4 1.00 0 0.98 1 0 1
## 5 0.99 0 0.97 1 0 2
Confidence intervals are requested with confint
, using
either the asymptotic normal approximation or profiling.
## 0.025 0.975
## p(Int) -0.1946872 0.6142292
## p(scale(MinAfterSunset)) -0.3985642 0.2274722
## p(scale(Temperature)) -2.4638797 -1.3233511
## 0.025 0.975
## p(Int) -0.1929210 0.6208837
## p(scale(MinAfterSunset)) -0.4044794 0.2244221
## p(scale(Temperature)) -2.5189984 -1.3789261
Model selection and multi-model inference can be implemented after
organizing models using the fitList
function.
## nPars AIC delta AICwt cumltvWt
## psi(.)p(Time+Temp) 4 356.76 0.00 1.0e+00 1.00
## psi(.)p(.) 2 461.00 104.25 2.3e-23 1.00
## Predicted SE lower upper
## 1 0.9986698 0.001488063 0.9881751 0.9998518
## 2 0.9981420 0.001987729 0.9850156 0.9997723
## 3 0.9974055 0.002649463 0.9810172 0.9996504
## 4 0.9963781 0.003522940 0.9759617 0.9994638
## 5 0.9949460 0.004671453 0.9695782 0.9991783
The parametric bootstrap can be used to check the adequacy of model fit. Here we use a \(\chi^2\) statistic appropriate for binary data.
chisq <- function(fm) {
umf <- fm@data
y <- umf@y
y[y>1] <- 1
fv <- fitted(fm)
sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE)
}
set.seed(1)
(pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE))
##
## Call: parboot(object = fm2, statistic = chisq, nsim = 100, parallel = FALSE)
##
## Parametric Bootstrap Statistics:
## t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
## 1 356 22 17.4 0.0693
##
## t_B quantiles:
## 0% 2.5% 25% 50% 75% 97.5% 100%
## [1,] 303 308 322 331 345 375 403
##
## t0 = Original statistic computed from data
## t_B = Vector of bootstrap samples
We fail to reject the null hypothesis, and conclude that the model fit is adequate.
The parboot
function can be also be used to compute
confidence intervals for estimates of derived parameters, such as the
proportion of \(N\) sites occupied
\(\mbox{PAO} = \frac{\sum_i z_i}{N}\)
where \(z_i\) is the true occurrence
state at site \(i\), which is unknown
at sites where no individuals were detected. The colext
vignette shows examples of using parboot
to obtain
confidence intervals for such derived quantities. An alternative way
achieving this goal is to use empirical Bayes methods, which were
introduced in unmarked
version 0.9-5. These methods
estimate the posterior distribution of the latent variable given the
data and the estimates of the fixed effects (the MLEs). The mean or the
mode of the estimated posterior distibution is referred to as the
empirical best unbiased predictor (EBUP), which in unmarked
can be obtained by applying the bup
function to the
estimates of the posterior distributions returned by the
ranef
function. The following code returns an estimate of
PAO using EBUP.
## [1] 0.8076923
Note that this is similar, but slightly lower than the population-level estimate of \(\psi\) obtained above.
A plot method also exists for objects returned by ranef
,
but distributions of binary variables are not so pretty. Try it out on a
fitted abundance model instead.