1 Abstract

unmarked aims to be a complete environment for the statistical analysis of data from surveys of unmarked animals. Currently, the focus is on hierarchical models that separately model a latent state (or states) and an observation process. This vignette provides a brief overview of the package - for a more thorough treatment see Fiske and Chandler (2011) and Kellner et al. (2023).

Table 1. Models handled by unmarked.
Model	Fitting Function	Data	Citation
Occupancy	occu	unmarkedFrameOccu	MacKenzie et al. (2002)
Royle-Nichols	occuRN	unmarkedFrameOccu	Royle and Nichols (2003)
Point Count	pcount	unmarkedFramePCount	Royle (2004a)
Distance-sampling	distsamp	unmarkedFrameDS	Royle et al. (2004)
Generalized distance-sampling	gdistsamp	unmarkedFrameGDS	Chandler et al. (2011)
Arbitrary multinomial-Poisson	multinomPois	unmarkedFrameMPois	Royle (2004b)
Colonization-extinction	colext	unmarkedMultFrame	MacKenzie et al. (2003)
Generalized multinomial-mixture	gmultmix	unmarkedFrameGMM	Royle (2004b)

3 Typical unmarked session

The first step is to import the data into R, which we do below using the read.csv function. Next, the data need to be formatted for use with a specific model fitting function. This can be accomplished with a call to the appropriate type of unmarkedFrame. For example, to prepare the data for a single-season site-occupancy analysis, the function unmarkedFrameOccu is used.

3.1 Importing and formatting data

Two examples of reading in data from a CSV and formatting for unmarked. First, from a CSV in “wide” format:

library(unmarked)
wt <- read.csv(system.file("csv","widewt.csv", package="unmarked"))
y <- wt[,2:4]
siteCovs <-  wt[,c("elev", "forest", "length")]
obsCovs <- list(date=wt[,c("date.1", "date.2", "date.3")],
    ivel=wt[,c("ivel.1",  "ivel.2", "ivel.3")])
wt <- unmarkedFrameOccu(y = y, siteCovs = siteCovs, obsCovs = obsCovs)
summary(wt)

## unmarkedFrame Object
## 
## 237 sites
## Maximum number of observations per site: 3 
## Mean number of observations per site: 2.81 
## Sites with at least one detection: 79 
## 
## Tabulation of y observations:
##    0    1 <NA> 
##  483  182   46 
## 
## Site-level covariates:
##       elev               forest               length      
##  Min.   :-1.436125   Min.   :-1.265e+00   Min.   :0.1823  
##  1st Qu.:-0.940726   1st Qu.:-9.744e-01   1st Qu.:1.4351  
##  Median :-0.166666   Median :-6.499e-02   Median :1.6094  
##  Mean   : 0.007612   Mean   : 8.798e-05   Mean   :1.5924  
##  3rd Qu.: 0.994425   3rd Qu.: 8.080e-01   3rd Qu.:1.7750  
##  Max.   : 2.434177   Max.   : 2.299e+00   Max.   :2.2407  
## 
## Observation-level covariates:
##       date                ivel        
##  Min.   :-2.904339   Min.   :-1.7533  
##  1st Qu.:-1.118624   1st Qu.:-0.6660  
##  Median :-0.118624   Median :-0.1395  
##  Mean   :-0.000217   Mean   : 0.0000  
##  3rd Qu.: 1.309947   3rd Qu.: 0.5493  
##  Max.   : 3.809947   Max.   : 5.9795  
##  NA's   :42          NA's   :46

Next, a CSV in “long” format:

pcru_raw <- read.csv(system.file("csv","frog2001pcru.csv", package="unmarked"))

sites <- unique(pcru_raw$RouteNumStopNum)
M <- length(sites)
J <- max(table(pcru_raw$RouteNumStopNum))
y <- JulianDate <- MinAfterSunset <- Wind <- Sky <- Temperature <- matrix(NA, M, J)
for (i in 1:length(sites)){
  datsub <- pcru_raw[pcru_raw$RouteNumStopNum == sites[i],]
  n <- nrow(datsub)
  y[i,1:n] <- datsub$Pcru
  JulianDate[i,1:n] <- datsub$JulianDate
  MinAfterSunset[i,1:n] <- datsub$MinAfterSunset
  Wind[i,1:n] <- datsub$Wind
  Sky[i,1:n] <- datsub$Sky
  Temperature[i,1:n] <- datsub$Temperature
}
obscovs <- list(JulianDate = JulianDate, MinAfterSunset = MinAfterSunset,
                Wind = Wind, Sky = Sky, Temperature = Temperature)
pcru <- unmarkedFrameOccu(y = y, obsCovs = obscovs)

3.2 Fitting models

Occupancy models can then be fit with the occu() function. To help stabilize the numerical optimization algorithm, we recommend standardizing covariates. The best way to do this is to use the scale() function on covariates inside your formulas.

fm1 <- occu(~1 ~1, pcru)

## Warning in truncateToBinary(dm$y): Some observations were > 1.  These were
## truncated to 1.

fm2 <- occu(~ scale(MinAfterSunset) + scale(Temperature) ~ 1, pcru)

## Warning in truncateToBinary(dm$y): Some observations were > 1.  These were
## truncated to 1.

fm2

## 
## Call:
## occu(formula = ~scale(MinAfterSunset) + scale(Temperature) ~ 
##     1, data = pcru)
## 
## Occupancy (logit-scale):
##  Estimate    SE    z  P(>|z|)
##      1.54 0.292 5.26 1.42e-07
## 
## Detection (logit-scale):
##                       Estimate    SE      z  P(>|z|)
## (Intercept)             0.2098 0.206  1.017 3.09e-01
## scale(MinAfterSunset)  -0.0855 0.160 -0.536 5.92e-01
## scale(Temperature)     -1.8936 0.291 -6.508 7.60e-11
## 
## AIC: 356.7591 
## Number of sites: 130

Here, we have specified that the detection process is modeled with the MinAfterSunset and Temperature covariates. No covariates are specified for occupancy here. See ?occu for more details.

3.3 Back-transforming parameter estimates

unmarked fitting functions return unmarkedFit objects which can be queried to investigate the model fit. Variables can be back-transformed to the unconstrained scale using predict, which also returns a 95% confidence interval. Since there are no occupancy covariates, all sites have the same occupancy estimate, and we can look at just the first row of the output.

predict(fm2, 'state')[1,]

##   Predicted         SE     lower     upper
## 1 0.8230724 0.04254069 0.7240711 0.8918579

The expected probability that a site was occupied is 0.823. This estimate applies to the hypothetical population of all possible sites, not the sites found in our sample. For a good discussion of population-level vs finite-sample inference, see Royle and Dorazio (2008) page 117. Note also that finite-sample quantities can be computed in unmarked using empirical Bayes methods as demonstrated at the end of this document.

Back-transforming the estimate of \(\psi\) was easy because there were no covariates. Because the detection component was modeled with covariates, \(p\) is a function, not just a scalar quantity, and so we need to be provide values of our covariates to obtain an estimate of \(p\). Here, we request the probability of detection given a site is occupied and all covariates are set to 0.

nd <- data.frame(MinAfterSunset = 0, Temperature = 0)
round(predict(fm2, type = 'det', newdata = nd, appendData = TRUE), 2)

##   Predicted SE lower upper MinAfterSunset Temperature
## 1         1  0  0.98     1              0           0

Thus, we can say that the expected probability of detection was 0.552 when time of day and temperature are fixed at their mean value. The predict metho can also be used to obtain estimates of parameters at a range of specific covariate values.

nd <- data.frame(MinAfterSunset = 0, Temperature = -2:2)
round(predict(fm2, type = 'det', newdata = nd, appendData=TRUE), 2)

##   Predicted SE lower upper MinAfterSunset Temperature
## 1      1.00  0  0.99     1              0          -2
## 2      1.00  0  0.99     1              0          -1
## 3      1.00  0  0.98     1              0           0
## 4      1.00  0  0.98     1              0           1
## 5      0.99  0  0.97     1              0           2

Confidence intervals are requested with confint, using either the asymptotic normal approximation or profiling.

confint(fm2, type='det')
confint(fm2, type='det', method = "profile")

##                               0.025      0.975
## p(Int)                   -0.1946872  0.6142292
## p(scale(MinAfterSunset)) -0.3985642  0.2274722
## p(scale(Temperature))    -2.4638797 -1.3233511

##                               0.025      0.975
## p(Int)                   -0.1929210  0.6208837
## p(scale(MinAfterSunset)) -0.4044794  0.2244221
## p(scale(Temperature))    -2.5189984 -1.3789261

3.4 Model selection and model fit

Model selection and multi-model inference can be implemented after organizing models using the fitList function.

fms <- fitList('psi(.)p(.)' = fm1, 'psi(.)p(Time+Temp)' = fm2)
modSel(fms)

##                    nPars    AIC  delta   AICwt cumltvWt
## psi(.)p(Time+Temp)     4 356.76   0.00 1.0e+00     1.00
## psi(.)p(.)             2 461.00 104.25 2.3e-23     1.00

predict(fms, type='det', newdata = nd)

##   Predicted          SE     lower     upper
## 1 0.9986698 0.001488063 0.9881751 0.9998518
## 2 0.9981420 0.001987729 0.9850156 0.9997723
## 3 0.9974055 0.002649463 0.9810172 0.9996504
## 4 0.9963781 0.003522940 0.9759617 0.9994638
## 5 0.9949460 0.004671453 0.9695782 0.9991783

The parametric bootstrap can be used to check the adequacy of model fit. Here we use a \(\chi^2\) statistic appropriate for binary data.

chisq <- function(fm) {
    umf <- fm@data
    y <- umf@y
    y[y>1] <- 1
    fv <- fitted(fm)
    sum((y-fv)^2/(fv*(1-fv)), na.rm=TRUE)
    }

set.seed(1)
(pb <- parboot(fm2, statistic=chisq, nsim=100, parallel=FALSE))

## 
## Call: parboot(object = fm2, statistic = chisq, nsim = 100, parallel = FALSE)
## 
## Parametric Bootstrap Statistics:
##    t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
## 1 356             22             17.4       0.0693
## 
## t_B quantiles:
##       0% 2.5% 25% 50% 75% 97.5% 100%
## [1,] 303  308 322 331 345   375  403
## 
## t0 = Original statistic computed from data
## t_B = Vector of bootstrap samples

We fail to reject the null hypothesis, and conclude that the model fit is adequate.

3.5 Derived parameters and empirical Bayes methods

The parboot function can be also be used to compute confidence intervals for estimates of derived parameters, such as the proportion of \(N\) sites occupied \(\mbox{PAO} = \frac{\sum_i z_i}{N}\) where \(z_i\) is the true occurrence state at site \(i\), which is unknown at sites where no individuals were detected. The colext vignette shows examples of using parboot to obtain confidence intervals for such derived quantities. An alternative way achieving this goal is to use empirical Bayes methods, which were introduced in unmarked version 0.9-5. These methods estimate the posterior distribution of the latent variable given the data and the estimates of the fixed effects (the MLEs). The mean or the mode of the estimated posterior distibution is referred to as the empirical best unbiased predictor (EBUP), which in unmarked can be obtained by applying the bup function to the estimates of the posterior distributions returned by the ranef function. The following code returns an estimate of PAO using EBUP.

re <- ranef(fm2)
EBUP <- bup(re, stat="mode")
sum(EBUP) / numSites(pcru)

## [1] 0.8076923

Note that this is similar, but slightly lower than the population-level estimate of \(\psi\) obtained above.

A plot method also exists for objects returned by ranef, but distributions of binary variables are not so pretty. Try it out on a fitted abundance model instead.

Overview of unmarked: an R Package for the Analysis of Data from Unmarked Animals

Ian Fiske

Richard Chandler

February 5, 2019