flocker
is an R package for fitting occupancy models. To date, software for occupancy modeling has required users either to work directly with probabilistic programming languages like Stan or JAGS, or to restrict themselves to simple effects structures in packages like unmarked
. flocker
changes that, providing occupancy modelers with a simple formula-based syntax for sophisticated model structures. Based on highly optimized Stan code, flocker
is also fast, especially for large models.
flocker
is built on R package brms
, which in turn is a front-end for Stan
. Thus, mastering flocker
is mostly a matter of mastering the formula syntax available in brms
.
In the remainder of this vignette, we
flocker
brms
formula syntax, with links to additional documentation for advanced topicsflocker
’s functionality for posterior prediction and model comparison.Installation instructions are available here. To request features or report bugs (much appreciated!), please open an issue on GitHub.
The following terms feature importantly in this vignette. Some are not standard in the literature (but we think maybe they should be):
closure-unit: The groupings of observations over which closure is assumed. In single-species models, a closure-unit corresponds to a “site” or “point”. In multi-species models, a closure-unit is a species-site combination. In single-species dynamic models (not yet implemented in flocker
), a closure-unit is a site-season or site-year combination.
Z: The (unobserved) true occupancy state of each closure-unit. We can represent Z as a vector of ones and zeros with one element for each closure-unit: a one if occupied; a zero if unoccupied.
\(\boldsymbol{\psi}\), \(\boldsymbol{\theta}\): The occupancy (\(\psi\)) and detection (\(\theta\)) probabilities. In many models, both \(\psi\) and \(\theta\) will vary across closure-units. In some models \(\theta\) will additionally vary across repeated sampling events within a closure-unit.
Q The (observed) detection/nondetection state of each closure-unit (i.e. does the unit have at least one detection in the data or not). As for Z, we represent Q as a vector of ones and zeros.
rep-constant, rep-varying: We refer to models where \(\theta\) is constant across repeated sampling events within closure-units as rep-constant models, as contrasted with rep-varying models that incorporate event-specific detection covariates. It turns out that rep-constant models enable a more efficient parametrization of the likelihood than rep-varying models.
unit covariates, event covariates: We refer to any covariate that does not vary across sampling events within closure-units as a “unit covariate”. This includes covariates that are intrinsically properties of single closure-units (e.g. the elevations of sites in a single-species model), covariates that are intrinsically properties of groups of closure units (e.g. elevations of sites in a multispecies model), and covariates that are intrinsically properties of sampling events but happen to be constant within all closure-units (e.g. observer in a sampling design where every site is visited by exactly one observer). We refer to any covariate that varies across sampling events within covariates as an “event covariate”. Note that while unit covariates may appear in either the occupancy or the detection formula, event covariates are restricted to the detection formula. Models that incorporate event covariates are rep-varying (see above); those that do not are rep-constant.
The main function in flocker
for fitting occupancy models, called flock()
, expects a highly specific and somewhat peculiar data format. The function make_flocker_data()
formats data for use with flock()
automatically. At a minimum, make_flocker_data()
expects a matrix or dataframe of detection/non-detection data. Rows represent closure-units, columns represent repeated sampling events within closure-units, and entries must be 0
(nondetection), 1
(detection), or NA
(no corresponding sampling event). The data must be formatted so that all NA
s are trailing within their rows. For example, if some units were sampled four times and other three times, the three sampling events must be treated as events 1, 2, and 3 (with the fourth event NA
) rather than as events 1, 3, and 4 (with the second event NA
) or any other combination.
Many occupancy models also include covariates that influence occupancy or detection probabilities. Unit covariates (see Terms and definitions above) can be passed to make_flocker_data()
as a dataframe with the same number of rows as the observation matrix and data in the same order as the rows of the observation matrix. Columns are covariates, and we recommend using informative column names. Event covariates (see Terms and definitions above) can be passed as a named list of matrices whose elements [i, j]
are the covariate values for the sampling event represented by the corresponding position of the observation matrix. Again, we recommend using informative names for the list elements. If the corresponding observation is NA
, then the value of the event covariate does not matter.
Here’s an example of how we format data, using example data provided via the example_flocker_data()
function:
library(flocker)
ex_data <- example_flocker_data()
names(ex_data)
#> [1] "obs" "unit_covs" "event_covs"
names(ex_data$event_covs)
#> [1] "ec1" "ec2"
head(ex_data$obs) # observation matrix
#> [,1] [,2] [,3] [,4]
#> [1,] 0 0 0 0
#> [2,] 0 0 0 0
#> [3,] 0 0 0 0
#> [4,] 0 0 0 0
#> [5,] 0 0 0 1
#> [6,] 0 0 0 0
head(ex_data$unit_covs) # observation matrix
#> uc1 uc2 grp species
#> 1 -0.5604756 0.4264642 7 sp_1
#> 2 -0.5604756 0.4264642 3 sp_2
#> 3 -0.5604756 0.4264642 15 sp_3
#> 4 -0.5604756 0.4264642 5 sp_4
#> 5 -0.5604756 0.4264642 8 sp_5
#> 6 -0.5604756 0.4264642 19 sp_6
head(ex_data$event_covs$ec1)
#> [,1] [,2] [,3] [,4]
#> [1,] 0.3215200 0.1058816 -0.8922459 1.7166491
#> [2,] 0.3215200 1.7166491 0.1058816 -0.8922459
#> [3,] 0.3215200 -0.8922459 1.7166491 0.1058816
#> [4,] 0.3215200 0.1058816 -0.8922459 1.7166491
#> [5,] 0.1058816 0.3215200 -0.8922459 1.7166491
#> [6,] 0.3215200 0.1058816 1.7166491 -0.8922459
flocker_data <- make_flocker_data(obs = ex_data$obs,
unit_covs = ex_data$unit_covs,
event_covs = ex_data$event_covs)
Once we’ve formatted data with make_flocker_data()
, we are ready to fit an occupancy model using the flock()
function. Internally, flock
calls brms::brm()
, and the key to mastering flock()
is to master the formula synax from brms
. We supply formulas for both occupancy and detection. Simple formulas follow the same syntax as R’s lm()
function. For example:
flock(f_occ = ~ uc1,
f_det = ~ 1,
flocker_data = flocker_data)
Simple random effects follow lme4
syntax, including advanced lme4
syntax is supported, including ||
for uncorrelated effects and /
and :
for expansion of multiple grouping terms. Here’s a simple example:
flock(f_occ = ~ uc1 + (1 | species),
f_det = ~ 1,
flocker_data = flocker_data)
When a model includes multiple random effects with the same grouping term, by default they are modeled as correlated within the occupancy or detection formulas, but as uncorrelated between formulas. For example, the code below estimates a single correlation for the intercept and slope in the occupancy sub-model.
flock(f_occ = ~ uc1 + (1 + uc1 | species),
f_det = ~ ec1 + (1 | species),
flocker_data = flocker_data)
However, this assumption can easily be relaxed using the |<ID>|
syntax from brms
. The ID
is an arbitrary character string representing a group of terms to model as correlated. The below code, for example, models correlated intercepts in the occupancy and detection sub-models, and correlated effects of sc1
on occupancy and vc1
on detection, but no correlations between the intercepts and the slopes in either sub-model:
flock(f_occ = ~ uc1 + (1 |g1| species) + (0 + uc1 |g2| species),
f_det = ~ ec1 + (1 |g1| species) + (0 + ec1 |g2| species),
flocker_data = flocker_data)
For more on brms
syntax for random effects syntax, see the documentation here.
Via brms
, flocker
supports mgcv
syntax for thin-plate regression splines (brms::s()
) and tensor product smooths (brms::t2()
). For example:
flock(f_occ = ~ s(uc1),
f_det = ~ 1,
flocker_data = flocker_data)
brms
is capable of fitting a variety of additional effects structures. We believe that the following structures should translate directly to flocker
, but these remain untested. As we test them and verify adequate performance, we will update this vignette with examples.
Phylogenetic effects can be included by providing a covariance matrix as a data2
argument and using the brms::gr()
function to link species identities in flocker_data
with the supplied covariance matrix. Note that phylogenetic effects can be included in either the occupancy component, the detection component, or both!
# simulate an example phylogeny
phylogeny <- ape::rtree(30, tip.label = paste0("sp_", 1:30))
# calculate covariance matrix
A <- ape::vcv.phylo(phylogeny)
ff1 <- flock(f_occ = ~ 1 + (1|gr(species, cov = A)),
f_det = ~ 1 + ec1 + (1|species),
flocker_data = flocker_data,
data2 = list(A = A))
ff2 <- flock(f_occ = ~ 1 + (1|gr(species, cov = A)),
f_det = ~ 1 + ec1 + (1|gr(species, cov = A)),
flocker_data = flocker_data,
data2 = list(A = A))
See here for further details about specifying phylogenetic effects in brms
.
See here for details about conditional autoregressive (CAR) models in brms
. Note that if the spatial effect is applied to occupancy, it is essential closure-units be grouped such that many groups contain more than one unit. With just one unit per group (the brms
default if no grouping is supplied), the logit-scale residual is not identified. Note that flock()
directly accepts a data2
argument that it can pass to brms
as necessary.
See here for relevant brms
documentation.
See here for relevant brms
documentation.
flock
will pass any relevant parameters forward to brms::brm()
, giving the user important control over the algorithmic details of how the model is fit. See ?brms::brm
for details. To speed up the execution, we recommend supplying the argument backend = "cmdstanr"
. This requires the cmdstanr package and a working installation of cmdstan; see here for instructions to get started and further details.
Priors can be implemented as they would with any brms
model. Priors can be specified using set_prior()
, with priors specified for groups of parameters (via class
) or individual parameters (via coef
). The priors used for a particular model can be retrieved using prior_summary()
user_prior <- c(brms::set_prior("normal(0, 3)", class="b"),
brms::set_prior("normal(0, 2)", class="Intercept"),
brms::set_prior("normal(0, 1)", coef="ec1"))
ff <- flock(f_occ = ~ uc1 + uc2 + (1|species),
f_det = ~ ec1 + ec2 + (1|species),
flocker_data = flocker_data,
prior = user_prior)
brms::prior_summary(ff)
Note that if there are parameters shared between both the occupancy and detection model formulas, e.g.
ff <- flock(f_occ = ~ uc1 + (1|species),
f_det = ~ uc1 + ec2 + (1|species),
flocker_data = flocker_data, prior=user_prior)
then there will be two entries for each of the shared parameters in the prior table (uc1
in this example). Specifying a prior for each parameter individually can be done with reference to the dpar
column, e.g.:
user_prior <- c(brms::set_prior("normal(0, 1)", coef = "uc1"),
brms::set_prior("normal(0, 3)", coef = "uc1", dpar = "occ"))
where the uc1
parameter in the occupancy component is specified by the addition of the dpar
argument, and the uc1
parameter in the detection component is specified without reference to dpar
.
For more on priors in brms
, see ?brms::set_prior
.
flocker
provides functions for three main types of post-processing. get_Z()
provides the posterior distribution for the latent occupancy state. predict_flocker()
provides posterior predictions at the observed points (e.g. for use in posterior predictive checking) or for new data. loo_flocker()
and loo_compare_flocker()
both provide functionality for model comparison. See below for details on all three types of post-processing. Both posterior predictions and model comparison rely on subtle aspects of the occupancy model likelihood that we explain in more detail here.
The function get_Z()
returns the posterior distribution of occupancy probabilities across the closure-units. The output is a matrix where rows are posterior iterations, columns are closure-units, and values are draws from the posterior distribution of occupancy probabilities. For example:
get_Z()
accepts several additional arguments that control the way that posterior is obtained and the values of returned. See ?get_Z
for details.
The funtion predict_flocker()
provides posterior predictions. By default, predictions are provided for the covariate data to which the model were fit, but predictions to new data are also possible via the new_data
argument. The output differs for rep-constant and rep-varying models. For rep-constant models, a matrix where rows are iterations, columns are units, and values are the number of detections. For rep-varying models, an array whose first dimension is units, second dimension is sampling events, third dimension is iterations, and values are 1, 0, or NA, representing detection, nondetection, and no corresponding sampling event. For example:
ff <- flock(f_occ = ~ uc1,
f_det = ~ 1,
flocker_data = flocker_data)
predict_flocker(ff)
predict_flocker()
accepts several additional arguments that control the way that posterior is obtained and the values of returned. See ?predict_flocker
for details.
flocker
supports computationally efficient approximate leave-one-out cross-validation via R package loo
using a method commonly known as PSIS-LOO. This method generally provides superior performance to other computationally efficient model performance criteria (e.g. WAIC), and it comes with diagnostics that signal when the approximation is unreliable. Importantly, when diagnostics indicate that PSIS-LOO is unreliable, other model-comparison metrics such as WAIC are even less reliable. For details about PSIS-LOO, see ?loo::loo
package documentation and:
The most straightforward way to compare models fit with flocker
is the function loo_compare_flocker()
. This function takes a list of flocker_fit objects as its argument and returns a model comparison table based on the difference in the expected log predictive density (elpd) between models. This table is a compare.loo
object from loo::loo_compare()
. The “leave-one-out” holdouts consist of entire closure-units, not single sampling events (see here for details of why).
ff1 <- flock(f_occ = ~ uc1,
f_det = ~ 1,
flocker_data = flocker_data)
ff2 <- flock(f_occ = ~ uc1,
f_det = ~ ec1 + ec2,
flocker_data = flocker_data)
loo_compare_flock(list(ff1 = ff1, ff2 = ff2))
Flocker also provides the function loo_flocker()
to return a table of elpd_loo
, p_loo
, and looic
estimates from loo::loo()
or brms::loo()
(the former for rep-varying models, the latter for rep-constant models).
For more about PSIS-LOO with flocker_fit
objects, see flocker’s LOO vignette.