`vignettes/multiseason_models.Rmd`

`multiseason_models.Rmd`

Multiseason occupancy models, also called dynamic occupancy models,
are models that assume closure over
units that are linked by explicit colonization and extinction
processes.^{1} That is, sites are surveyed over a series
of more than one timestep,^{2} closure is assumed to hold within (but not
across) timesteps, and the latent occupancy state is linked across
timesteps within a site via models of local colonization and extinction
processes. Following the terminology introduced here,
we will refer to the units over which closure is assumed as
*closure-units* or just plain *units*. Groups of units
that are linked by colonization-extinction dynamics (i.e. units
representing different timesteps within a site) are *series*.

In `flocker`

, we implement these models using a hidden
markov model (HMM) approach to dealing with the unobserved occupancy
state in each separate series. Calling our approach a HMM doesn’t
fundamentally change the model; it serves mainly to highlight the
connection between multiseason occupancy models and well-developed
techniques for efficiently computing the likelihood for a series.^{3} The HMM
conceptualizes the latent occupancy state as a Markov-like process (the
process need not be strictly Markovian, because the transition
probabilities can vary by timestep), where the transition probabilities
between the two possible states (occupied or unoccupied) are a
colonization probability \(\mathcal{C}\), an extinction probability
\(\mathcal{E}\), and their complements
\(1 - \mathcal{C}\) and \(1 - \mathcal{E}\). The complement of the
extinction probability turns out to be particularly important; we call
it the *persistence* probability \(\mathcal{P} = 1 - \mathcal{E}\).

To specify the HMM, we need a model for the \(\mathcal{C}\), a model for \(\mathcal{E}\), and a model for the initial
occupancy probability \(\mathcal{O}\)
in the first timestep. We also need a model for the probability of
observing the data in any particular timestep conditional on the true
occupancy state in that timestep (in the HMM literature, this
probability is referred to as an *emission probability*).

*Emission probabilities* are just fancy HMM terminology for
the most familiar part of multiseason occupancy model. They are the
probability of observing the data within a unit given the latent
occupancy state, and they depend only on the detection sub-model, which
is no different from a single-season detection sub-model. In
`flocker`

, we assume that the logit-scale detection
probabilities vary according to a suite of covariates that can vary
across all levels of organization, including down to the visit level.
This is exactly the same way that `flocker`

treats the
detection sub-model in a single-season model.

In “colonization-extinction” models, logit colonization probabilities \(\mathcal{C}\) and logit extinction probabilities \(\mathcal{E}\) are both assumed to vary according to separate predictors based on covariates that may vary across timesteps within series but never across visits within closure-units.

In “autologistic” models, the occupancy probability in timestep \(t + 1\) is assumed to depend on a suite of site characteristics plus a term that depends on the latent true occupancy state at time \(t\). Some treatments write the autologistic model in terms of an “occupancy probability” that depends on covariates plus the “autologistic term” that gets added on the logit scale when the site was previously occupied. However, it is easier to understand this model in terms of its colonization and extinction probabilities. In fact, the so-called “occupancy probability” mentioned above is actually a colonization probability, because it reflects the probability of occupancy conditional on non-occupancy during the previous timestep. The logit-scale sum of the colonization probability and the autologistic term gives the probability of persistence \(\mathcal{P}\).

Thus, the autologistic model differs from the colonization-extinction model in that it assumes that colonization and extinction probabilities are related to one another via an offset that relates colonization to persistence. If the offset is modeled as constant, then we can think of this model as ascribing a “suitability” to each site that defines the position of a pair of probabilities \(\mathcal{C}\) and \(\mathcal{P}\) that differ by a constant on the logit scale. In practice, \(\mathcal{P}\) is almost always larger than \(\mathcal{C}\), and so the offset is positive.

In addition to the standard autologistic model that treats the offset
as a constant, `flocker`

also affords the flexibility to
model the offset via its own covariate-based predictor, which
reintroduces flexibility in the relationship between \(\mathcal{P}\) and \(\mathcal{C}\), but tends to retain the idea
that these probabilities ought to be related to one another and ought to
tend to move in tandem.

Most examples of multiseason occupancy models in the literature
include an explicit model for the initial occupancy probability \(\mathcal{O}\) via its own covariate-based
predictor, which is analogous to the occupancy predictor in a
single-season model. Thus, a colonization-extinction model would contain
predictors for detection, colonization, extinction, and initial
occupancy; an autologistic model would contain linear predictors for
detection, colonization, the colonization-persistence offset (often
modeled as a constant), and initial occupancy. `flocker`

can
fit either model.

Borrowing from the HMM literature, `flocker`

also comes
with optional functionality to handle \(\mathcal{O}\) differently. Any pair of
time-invariant colonization and extinction probabilities defines an
equilibrium proportion of sites that are occupied. If colonization and
extinction remain constant for a long time prior to the first timestep,
the initial occupancy probability should simply be this equilibrium
frequency, eliminating the need to parameterize and fit initial
occupancy separately from colonization and extinction.
`flocker`

is equipped to fit models without any separate
predictor for initial occupancy, and instead to begin with the
equilibrium frequencies implied by the colonization and extinction
probabilities modeled for the first timestep. Note that this approach is
sometimes valid even if the colonization and extinction probabilities
begin to vary during the timeseries being modeled, as long as they are
invariant for a substantial period up to and including the first year
modeled. As a concrete example, consider a study system that begins in
an equilibrium condition, but in which some sites burn in a forest fire
partway through the study. If the impact of the burn on colonization and
extinction probabilities is modeled adequately, then the modeled
probabilities prior to the burn will reflect the equilibrium
colonization and extinction probabilities in the pre-burn state. It is
for this reason that, when using equilibrium frequencies as initial
occupancy probabilities, `flocker`

always computes the
equilibrium based on the probabilities modeled for the first
timestep.

As an aside, we somewhat regularly encounter confusion that conflates initial occupancy probabilities \(\mathcal{O}\) with colonization probabilities \(\mathcal{C}\). This confusion apparently arises from the notion that the autologistic model consists of a single-season occupancy model plus an autologistic term, which leads people to incorrectly surmise that the linear predictor without the autologistic term gives an occupancy probability. It does not; it gives a colonization probability, which is typically much lower. Any modeling exercise that enforces equality between initial occupancy probabilities and subsequent colonization probabilities should be viewed skeptically.

Let’s simulate some data that is valid for use with multiseason
models. Here, we will simulate data for three seasons with one unit
covariate and one event covariate. The data will be simulated under a
colonization-extinction model with explicit inits, but we will be able
to fit other models (autologistic, equilibrium inits) to the same
data.^{4}

```
library(flocker)
multi_data <- simulate_flocker_data(
n_season = 3,
n_pt = 300,
n_sp = 1,
multiseason = "colex",
multi_init = "explicit",
seed = 1
)
fd <- make_flocker_data(
multi_data$obs,
multi_data$unit_covs,
multi_data$event_covs,
type = "multi",
quiet = TRUE
)
```

Here’s the colonization-extinction model with an explicit model for occupancy in the first timestep. Depending on hardware, fitting this model might take 1-5 minutes.

```
multi_colex <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd,
multiseason = "colex",
multi_init = "explicit",
backend = "cmdstanr",
cores = 4
)
```

Here’s the colonization-extinction model using equilibrium occupancy probabilities in the first timestep:

```
multi_colex_eq <- flock(
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd,
multiseason = "colex",
multi_init = "equilibrium",
backend = "cmdstanr",
cores = 4
)
```

Here’s the autologistic model with explicit occupancy probabilities
in the first timestep. To reflect the stereotypical autologistic model
with a constant logit-scale offset separating colonization and
persistence probabilities, we use the formula `f_auto = ~ 1`

,
but it is fine to relax this constraint and use,
e.g. `f_auto = ~ uc1`

.

```
multi_auto <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_auto = ~ 1,
flocker_data = fd,
multiseason = "autologistic",
multi_init = "explicit",
backend = "cmdstanr",
cores = 4
)
```

And finally the autologistic model with equilibrium occupancy probabilities in the first timestep:

```
multi_auto_eq <- flock(
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_auto = ~ 1,
flocker_data = fd,
multiseason = "autologistic",
multi_init = "equilibrium",
backend = "cmdstanr",
cores = 4
)
```

We can reconstruct the occupancy probabilities at every unit using
the `get_Z`

function:

`Z <- get_Z(multi_colex_eq)`

We can generate posterior predictions:

`pp <- predict_flocker(multi_colex_eq)`

We can compare the fit of these various models using approximate leave-one-out cross-validation, where the holdouts consist of entire series (leave-one-series-out cross-validation).

```
loo_out <- loo_compare_flocker(
list(multi_colex, multi_colex_eq, multi_auto, multi_auto_eq)
)
loo_out
```

See the main tutorial vignette for an example of the resulting output (this vignette does not actual run the models to save on computational resources).

As we will see, this is true even of multiseason models with autologistic occupancy dynamics↩︎

It’s fine if some sites are surveyed for just one timestep, but at least some sites are surveyed for multiple timesteps.↩︎

For the curious, we use the forward algorithm to compute the likelihood, and the forward-backward algorithm to compute unit-wise posterior distributions for the latent occupancy state, and the forward-filtering-backward-sampling algorithm to draw valid posterior samples for the sequence of latent states↩︎

note that

`simulate_flocker_data()`

can also simulate directly from these other model types.↩︎