ubms
The ubms
package fits models of wildlife occurrence and abundance in Stan (Carpenter et al. 2017), in a similar fashion to the unmarked
package (Fiske, Chandler, and others 2011). One of the advantages of ubms
is that it is possible to include random effects in your models, using the same syntax as lme4 (Bates et al. 2015). For example, if you have a group
site covariate, you can fit a model with random intercepts by group
by including + (1|group)
in your parameter formula. Random slopes, or a combination of random slopes and intercepts, are also possible. To illustrate the use of random effects of ubms
, this vignette fits a model to multi-season occupancy data using a “stacked” approach.
Suppose you have a dataset of repeated detections/non-detections or counts that are collected over several primary periods (i.e., seasons). The logical model choice for such data is a multi-season model, such as the dynamic occupancy model (MacKenzie et al. 2003) or some form of Dail-Madsen model for count data (Dail and Madsen 2011). These models estimate transition probabilities such as colonization and extinction rates between seasons.
However, in some cases you might not want to fit a dynamic model. There are several potential reasons for this: (1) You don’t have enough data (Dail-Madsen type models are particularly data hungry); (2) You aren’t interested in the transition probabilities; or (3) The dynamic model type you need isn’t available in theory or in your software package of choice.
An alternative approach is to fit multiple years of data into a single-season model using the “stacked” approach. Essentially, you treat unique site-year combinations as sites. For a helpful discussion on the topic, see this thread on the unmarked
forums.
Ideally you want to control for the pseudoreplication this creates in some form. In unmarked
you are limited to approaches such as including a dummy variable for site and/or year. In ubms
you can instead include, for example, random site intercepts to account for this pseudoreplication.
ubms
We will use the crossbill
dataset to illustrate a stacked occupancy model with a site-level random effect. The crossbill
dataset comes packaged with ubms
via unmarked
:
library(ubms)
data(crossbill)
The crossbill
dataset is a data.frame
with many columns. It contains 9 years of detection/non-detection data for the European crossbill (Loxia curvirostra) in Switzerland (Schmid, Zbinden, and Keller 2004).
dim(crossbill)
## [1] 267 58
names(crossbill)
## [1] "id" "ele" "forest" "surveys" "det991" "det992" "det993"
## [8] "det001" "det002" "det003" "det011" "det012" "det013" "det021"
## [15] "det022" "det023" "det031" "det032" "det033" "det041" "det042"
## [22] "det043" "det051" "det052" "det053" "det061" "det062" "det063"
## [29] "det071" "det072" "det073" "date991" "date992" "date993" "date001"
## [36] "date002" "date003" "date011" "date012" "date013" "date021" "date022"
## [43] "date023" "date031" "date032" "date033" "date041" "date042" "date043"
## [50] "date051" "date052" "date053" "date061" "date062" "date063" "date071"
## [57] "date072" "date073"
Check ?crossbill
for details about each column. The first three columns id
, ele
, and forest
are site covariates.
The following 27 columns beginning with det
are the binary detection/non-detection data; 9 years with 3 observations per year. The final 27 columns beginning with date
are the Julian dates for each observation.
We will use the first 3 years of crossbill
data (instead of all 9), simply to keep the analysis run time down. Converting the crossbill
data to stacked format is a bit complex. The dataset contains 267 unique sites; thus after stacking we should end up with a response variable and covariates that contain 267 * 3 = 801
“sites” (actually site-years). We will order this new dataset so that the first 267 rows are the sites in year 1, the 2nd 267 rows are the sites in year 2, and so on.
Handling the site-level covariates (which do not change between years) is the easiest task. We simply replicate the set of site covariates (which contains one row for each of the original 267 sites) one time per season, and stack each replicate on top of each other vertically with rbind
.
crossbill[,c("id", "ele", "forest")]
site_covs <- rbind(site_covs, site_covs, site_covs) sc_stack <-
We also want to add a factor column called site
to the stacked site covariates that identifies the original site number of each row. We will use this later as our grouping factor for the random effect
$site <- factor(rep(1:nrow(site_covs), 3))
sc_stackhead(sc_stack)
## id ele forest site
## 1 1 450 3 1
## 2 2 450 21 2
## 3 3 1050 32 3
## 4 4 950 9 4
## 5 5 1150 35 5
## 6 6 550 2 6
Stacking the response variable and the observation covariates is harder. Our dataset is in a “wide” format where each row is a site and each observation is a column, with columns 1-3 corresponding to year 1, 4-6 to year 2, and so on. Here is a function that splits a “wide” dataset like this into pieces and stacks them on top of each other.
function(input_df, nyears, surveys_per_year){
wide_to_stacked <- split(1:(nyears*surveys_per_year), rep(1:nyears, each=surveys_per_year))
inds <- lapply(1:nyears, function(i){
split_df <- input_df[,inds[[i]]]
out <-$site <- 1:nrow(input_df)
out$year <- i
outnames(out)[1:3] <- paste0("obs",1:3)
out
}) do.call("rbind", split_df)
stack_df <-$site <- as.factor(stack_df$site)
stack_df$year <- as.factor(stack_df$year)
stack_df
stack_df }
This function can be used to convert both the detection/non-detection data and observation covariates to the stacked format. First, we isolate the detection/non-detection data in crossbill
:
crossbill[, grep("det", names(crossbill), value=TRUE)] y_wide <-
Next we convert it to stacked format, specifying that we want only the first 3 years, and that each year has 3 observations/surveys:
wide_to_stacked(y_wide, nyears=3, surveys_per_year=3)
y_stack <-dim(y_stack)
## [1] 801 5
head(y_stack)
## obs1 obs2 obs3 site year
## 1 0 0 0 1 1
## 2 0 0 0 2 1
## 3 NA NA NA 3 1
## 4 0 0 0 4 1
## 5 0 0 0 5 1
## 6 NA NA NA 6 1
Finally, we do the same with the date
observation covariate.
crossbill[,grep("date", names(crossbill), value=TRUE)]
date_wide <- wide_to_stacked(date_wide, 3, 3)
date_stack <-dim(date_stack)
## [1] 801 5
With our stacked datasets constructed, we build our unmarkedFrame
:
unmarkedFrameOccu(y=y_stack[,1:3], siteCovs=sc_stack,
umf_stack <-obsCovs=list(date=date_stack[,1:3]))
head(umf_stack)
## Data frame representation of unmarkedFrame object.
## y.1 y.2 y.3 id ele forest site date.1 date.2 date.3
## 1 0 0 0 1 450 3 1 34 59 65
## 2 0 0 0 2 450 21 2 17 33 65
## 3 NA NA NA 3 1050 32 3 NA NA NA
## 4 0 0 0 4 950 9 4 29 59 65
## 5 0 0 0 5 1150 35 5 24 45 65
## 6 NA NA NA 6 550 2 6 NA NA NA
## 7 0 0 0 7 750 6 7 26 54 74
## 8 0 0 0 8 650 60 8 23 43 71
## 9 0 0 0 9 550 5 9 21 36 56
## 10 0 0 0 10 550 13 10 37 62 75
We’ll now fit a model with fixed effects of elevation and forest cover (ele
and forest
) on occupancy and a date
effect on detection. In addition, we will include random intercepts by site
, since in stacking the data we have pseudoreplication by site. To review, random effects are specified using the approach used in with the lme4
package. For example, a random intercept for each level of the covariate site
is specified with the formula component (1|site)
. Including random effects in a model in ubms
usually significantly increases the run time.
stan_occu(~scale(date) ~scale(ele) + scale(forest) + (1|site),
fit_stack <-data=umf_stack, chains=3, iter=500)
fit_stack
##
## Call:
## stan_occu(formula = ~scale(date) ~ scale(ele) + scale(forest) +
## (1 | site), data = umf_stack, chains = 3, iter = 500, refresh = 0)
##
## Occupancy (logit-scale):
## Estimate SD 2.5% 97.5% n_eff Rhat
## (Intercept) -1.60 0.218 -2.061 -1.21 146 0.998
## scale(ele) 1.13 0.217 0.729 1.55 312 1.004
## scale(forest) 1.49 0.223 1.066 1.94 109 1.002
## sigma [1|site] 1.95 0.319 1.387 2.66 44 1.011
##
## Detection (logit-scale):
## Estimate SD 2.5% 97.5% n_eff Rhat
## (Intercept) 0.182 0.0980 -0.00664 0.372 1082 0.998
## scale(date) 0.337 0.0886 0.15452 0.500 1767 0.997
##
## LOOIC: 1473.761
We get warnings; these should be fixed by increasing the iterations. In addition to fixed effect estimates, we now have an estimate for the site-level variance (sigma [1|site]
) in our summary table.
In order to get the actual random intercept values, we use the ranef
function. Note that this function behaves like the lme4
version, not like the unmarked
version. A further caution is that when using an effects parameterization, ranef
always returns the complete random intercept/slope term for a group (i.e., the mean + random effect, not just the random part).
ranef(fit_stack, submodel="state")
ran <-head(ran$site[[1]])
## 1 2 3 4 5 6
## -1.89508002 -1.99838584 -0.30322435 0.06205105 0.39474369 -1.72122316
You can also generate summary statistics for each random intercept:
ranef(fit_stack, submodel="state", summary=TRUE)
ran <-head(ran$site[[1]])
## Estimate SD 2.5% 97.5%
## 1 -1.89508002 1.785040 -5.680523 1.561208
## 2 -1.99838584 1.753822 -5.530354 1.201788
## 3 -0.30322435 1.373765 -3.039763 2.464200
## 4 0.06205105 1.336437 -2.665990 2.442716
## 5 0.39474369 1.214734 -1.977876 3.013954
## 6 -1.72122316 1.753082 -5.287677 1.521753
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67 (1). https://doi.org/10.18637/jss.v067.i01.
Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.
Dail, D., and L. Madsen. 2011. “Models for Estimating Abundance from Repeated Counts of an Open Metapopulation.” Biometrics 67: 577–87. https://doi.org/10.1111/j.1541-0420.2010.01465.x.
Fiske, Ian, Richard Chandler, and others. 2011. “Unmarked: An R Package for Fitting Hierarchical Models of Wildlife Occurrence and Abundance.” Journal of Statistical Software 43 (10): 1–23. https://doi.org/10.18637/jss.v043.i10.
MacKenzie, Darryl I., James D. Nichols, James E. Hines, Melinda G. Knutson, and Alan B. Franklin. 2003. “Estimating Site Occupancy, Colonization, and Local Extinction When a Species Is Detected Imperfectly.” Ecology 84: 2200–2207. https://doi.org/10.1890/02-3090.
Schmid, Hans, Niklaus Zbinden, and Verena Keller. 2004. “Überwachung Der Bestandsentwicklung Häufiger Brutvögel in Der Schweiz.” Swiss Ornithological Institute Sempach Switzerland.