Introduction

Stan doesn’t allow discrete unknown parameters in its models1. This limitation2 prevents us from fitting occupancy models under the parameterization commonly used by modelers in JAGS. The issue is that the unknown occupancy states (the Z-matrix in community models) are discrete, and this is not allowed in Stan.

The solution is to marginalize out the discrete occupancy states. There are a few pages floating around the web that show us how to do this. I initially found those pages frustrating to understand, which is my motivation for writing this up. The Fukami Lab page doesn’t really explain where the marginalization happens or how it works. Bob Carpenter’s translation of the Dorazio data-augmented multi-species model does the marginalization very quickly in heavy math symbology, and moreover relies on a lack of sampling-event-specific detectability covariates to do so. To be fair, neither of these pages’ main purpose is to explain marginalization for beginners, so no complaint!

What I really wanted was a satisfying explanation of the marginalization written in JAGS/BUGS language first, and translated to Stan second. That’s what I’ve tried to produce here. We’ll start with the likelihood for a single-species, single-season model in JAGS. Then we’ll marginalize, still in JAGS. Then we’ll translate to Stan3. We’ll write the Stan implementation in such a way that it’s pretty clear how to generalize to a multi-species model. Once the reader is comfortable with the multi-species model in Stan, she should be able to follow Bob Carpenter’s data-augmented version, even if she requires event-specific detection covariates.

The familiar JAGS/BUGS model

The familiar parameterization of the single-species, single-season occupancy model has a likelihood of the following form:

for(i in 1:n_unit){
  Z[i] ~ dbern(psi[i])                                       # Line A
  logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
  for(j in 1:n_rep){
    det_data[i,j] ~ dbern(Z[i]*theta[i,j])                   # Line B
    logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
  }
}

This particular implementation contains two occupancy covariates and two detection covariates, one of which is unit-specific and one of which is event-specific. Here unit refers to a closure-unit, the sampling unit over which closure is assumed. This corresponds to a site in a single-species model or a species-site in a multispecies model.

The marginalized JAGS/BUGS model

Our goal is to reparameterize the model to remove the latent occupancy state \(Z\). To do so, we need to keep track of units with at least one detection versus units with none. For now, we can simply sort the units in our data such that all \(N\) units with at least one detection come first, followed by the \(n\_unit - N\) units with no detections. We then split the likelihood into two parts: one for units with at least one detection, and another for units with no detections.

Units with a detection

We know that \(Z == 1\). Thus, we can write lines A and B of the likelihood above as:

1 ~ dbern(psi[i])             # Line A
det_data[i,j] ~ dbern(theta[i,j])   # Line B

This formulation has no \(Z\) in it and thus no discrete unknowns!

Units with no detection

Things are a bit trickier, because we don’t know whether \(Z\) is one or zero. We do know, however, that \(Z\) is either one or zero, but not both. Thus, the general strategy will be to write down the likelihood when \(Z == 0\), then to write down the likelihood when \(Z == 1\), and then to add those likelihoods together4.

One way to observe the data is via a pathway where \(Z == 0\). When \(Z == 0\), line B does not matter, and only line A contributes to the likelihood. So the likelihood associated with this pathway corresponds to the statement:

0 ~ dbern(psi[i])            # Line A

or equivalently

1 ~ dbern(1 - psi[i])         # Line A

The other way to observe the data is via a pathway where \(Z == 1\). In that case, both lines matter. Note that because the unit has no detections, \(det\_data[i,j]\) is always zero, so we can write:

ones[i] ~ dbern(psi[i])     # Because Z=1;            line A
zeros[i] ~ dbern(theta[i,j])    # Because x[i,j] is zero    line B

We rewrite the second line as:

ones[i] ~ dbern(1 – theta[i,j])         # line B

Since detection (conditioned on covariates) on repeat sampling events is independent5, we can multiply the probabilities6. Note that we eliminate the loop over j and replace it with an explicit term for each sampling event.

ones[i] ~ dbern(psi[i] * (1 – theta[i,1]) * (1 - theta[i,2]) * ... * (1 - theta[i,n_rep]))          # line B

Finally, we can combine the two cases (\(Z == 1\) and \(Z == 0\)) in a single sampling statement by adding the probabilities7.

ones[i] ~ dbern(1-psi[i] + psi[i] * (1-theta[i,1]) * (1-theta[i,2]) * ... * (1-theta[i,n_rep]))  # lines A & B

Putting it all together yields a full marginalized JAGS likelihood, shown here for a model with two occupancy covariates, two detection covariates, and four repeat sampling events at each unit.

for(i in 1:N){
  ones[i] ~ dbern(psi[i])
  logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
  for(j in 1:n_rep){
    det_data[i,j] ~ dbern(theta[i,j])
    logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
  }
}
for(i in (N+1):n_unit){
  ones[i] ~ dbern(1-psi[i] + psi[i] * (1-theta[i,1]) * (1-theta[i,2]) * (1-theta[i,3]) * (1-theta[i,4]))
  logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
  for(j in 1:n_rep){
    logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
  }
}

Note that there are no \(Z\)’s in this likelihood!

In my view, the key trick in the marginalization is splitting the likelihood at units without detections into separate cases for \(Z == 1\) and \(Z == 0\), and then writing down a single statement that subsumes both paths towards observing a zero. Syntactically, this shows up as the elimination of sampling statements from the for-loop over the repeat sampling events (the j-indexed for-loop over events exists only to compute theta)8. The crux of the marginalization is to remove \(Z\) while nevertheless ensuring that the likelihood of the data for a single unit cannot receive simultaneous contributions from \(Z == 0\) on one sampling event and \(Z == 1\) on another event.

Once we fit the model, we can still recover our posterior Z-matrix. Posterior \(Z\) is one at every unit with at least one detection. At every other unit, the likelihood for \(Z == 1\) is \(\psi_i * \prod_{j = 1}^{n\_rep} (1 - \theta_{ij})\), whereas the likelihood for \(Z == 0\) is \(1 - \psi_i\). Thus, posterior Z is one with probability \(\frac{\psi_i * \prod_{j = 1}^{n\_rep} (1 - \theta_{ij})}{(1 - \psi_i) + \psi_i * \prod_{j = 1}^{n\_rep} (1 - \theta_{ij})}\) and zero otherwise. So we just monitor psi and theta and compute the Z-matrix post-hoc.

The Stan model

A very quick Stan primer

This section assumes that the reader knows how to run a Stan program on their computer (e.g. via rstan), a bit of Stan syntax, the general structure of a Stan program, and how to search the Stan manual if you encounter an unfamiliar function like log1m(x). However, because the code below uses a couple of fundamental features of Stan that might be unfamiliar to a novice user who has simply been porting models over from JAGS, I’ll quickly review some key points.

The crux of Stan’s functionality is to compute the log-likelihood function and its derivatives. The crux of a Stan program is to tell Stan what the log-likelihood function is. Each line of the ‘model block’ in a Stan program is interpreted as a way to increment the log-likelihood. So for example, if we write

1 ~ bernoulli(p1);
0 ~ bernoulli(p2);

Stan interprets that to mean that it should increment the log-likelihood by \(log(p1) + log(1-p2)\).

Stan also contains functionality to directly increment the log-likelihood using the nifty function target +=. So we could rewrite the lines above as

target += log(p1);
target += log(1-p2);

Or equivalently

target += log(p1) + log(1-p2);

Or we can mix and match, for example

target += log(p1);
0 ~ bernoulli(p2);

All of these alternatives tell Stan exactly the same thing9.

Another thing to know about Stan is that it contains an ‘if’ statement. The ‘if’ statement isn’t necessary to translate the marginalized single-species JAGS model to Stan, but it will be quite useful for a multi-species model10. Therefore, we’ll use the ‘if’ statement from the very beginning.

The Stan likelihood

To take advantage of the ‘if’ statement, we generate a new data variable Q such that \(Q[i] = 1\) if there is at least one detection at unit i, and \(Q[i] = 0\) otherwise. (We generate Q external to Stan, in our R session or whatever we’re using). Then the likelihood looks like this:

for(i in 1:n_unit){
  if(Q[i] == 1) {
    target += log(psi[i]);
    target += bernoulli_lpmf(det_data[i] | theta[i]); //vectorized over repeat sampling events j
  }
  if(Q[i] == 0) {
    target += log_sum_exp(log(psi[i]) + log1m(theta[i,1]) + log1m(theta[i,2]) + log1m(theta[i,3]) + log1m(theta[i,4]), 
                  log1m(psi[i]));
  }
}

Above, the first target += statement is the 1 ~ dbern(psi) statement; the second target += is the det_data[i,j] ~ dbern(theta[i,j]) statement; and the third target += is the full contribution to the likelihood for a unit where \(Q[i] == 0\). The function log_sum_exp() takes two arguments. Here, the first argument captures the contribution of the \(Z == 1\) possibility, and the second argument captures the contribution of the \(Z == 0\) possibility.

Testing and benchmarking the parameterizations

It doesn’t really make sense to compare the Stan runtime against JAGS, because most of the Stan runtime is the one-time model compilation. However, it is quite interesting to compare the marginalized and non-marginalized JAGS models (and we show the full Stan model too, and make sure it gives the same parameter estimates!).

Simulating covariates and data

set.seed(3)
n_unit <- 200
n_rep <- 4

# generate covariate values
unit_covars <- data.frame(cov1 = rnorm(n_unit), cov2 = rnorm(n_unit))
event_covars <- as.data.frame(matrix(rnorm(n_unit * n_rep), nrow = n_unit))

# specify coefficient values
occ_int <- 0
occ_beta1 <- .5     # to multiply by unit_covars$cov1
occ_beta2 <- -.5    # to multiply by unit_covars$cov2
det_int <- -.7
det_beta1 <- .3     # to multiply by unit_covars$cov1
det_beta2 <- .3     # to multiply by event_covars

# get linear predictors
psi <- boot::inv.logit(occ_int + occ_beta1*unit_covars$cov1 + occ_beta2*unit_covars$cov2)
theta <- matrix(NA, nrow = n_unit, ncol = n_rep)
for(j in 1:4){
  theta[, j] <- boot::inv.logit(det_int + det_beta1*unit_covars$cov1 + det_beta2*event_covars[, j])
}

# simulate detection histories
Z <- vector()
for(i in 1:n_unit){
  Z[i] <- rbinom(1, 1, psi[i])
}
det_data <- matrix(NA, nrow = n_unit, ncol = n_rep)
for(i in 1:n_unit){
  for(j in 1:n_rep){
    det_data[i, j] <- Z[i]*rbinom(1,1,theta[i, j])
  }
}

JAGS analysis: not marginalized

library(rjags)
library(dclone)
set.seed(3)
occu_model <- function() {
  # Priors
  a0 ~ dnorm(0,0.001)
  a1 ~ dnorm(0,0.001)
  a2 ~ dnorm(0,0.001)
  b0 ~ dnorm(0,0.001)
  b1 ~ dnorm(0,0.001)
  b2 ~ dnorm(0,0.001)
  
  # Likelihood
  for(i in 1:n_unit){
    Z[i] ~ dbern(psi[i])
    logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
    for(j in 1:n_rep){
      det_data[i,j] ~ dbern(Z[i]*theta[i,j])
      logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
    }
  }
}

jags_data <- list(n_unit = n_unit, n_rep = n_rep, det_data = det_data, 
                 cov1 = unit_covars$cov1, cov2 = unit_covars$cov2,
                 cov3 = event_covars)

params <- c('a0', 'a1', 'a2', 'b0', 'b1', 'b2')

inits <- function() {
  a0 <- rnorm(1, 0, .5)
  a1 <- rnorm(1, 0, .5)
  a2 <- rnorm(1, 0, .5)
  b0 <- rnorm(1, 0, .5)
  b1 <- rnorm(1, 0, .5)
  b2 <- rnorm(1, 0, .5)
  Z <- as.numeric(rowSums(det_data) > 0)
  return(list(a0 = a0, a1 = a1, a2 = a2, b0 = b0, b1 = b1, b2 = b2, Z = Z))
}

# run model
nc <- 4
n_adapt <- 1000
n_burn <- 1000
n_iter <- 3000
thin <- 1

occu_start_time = proc.time()
cl <- makePSOCKcluster(nc)
tmp <- clusterEvalQ(cl, library(dclone))
parLoadModule(cl, "glm")
parListModules(cl)
occu_samples <- jags.parfit(cl, jags_data, params, occu_model, inits=inits, n.chains=nc, 
                                     n.adapt=n_adapt, n.update = n_burn, thin = thin, n.iter = n_iter)
stopCluster(cl)
occu_end_time = proc.time()
occu_dtime = occu_end_time - occu_start_time

JAGS analysis: marginalized

# First we sort the data so that units with a detection come first
N <- sum(rowSums(det_data) > 0)
neworder <- rev(order(rowSums(det_data)))
det_data2 <- det_data[neworder, ]
unit_covars2 <- unit_covars[neworder, ]
event_covars2 <- event_covars[neworder, ]

# Now we fit the model
margin_model <- function() {
  # Priors
  a0 ~ dnorm(0,0.001)
  a1 ~ dnorm(0,0.001)
  a2 ~ dnorm(0,0.001)
  b0 ~ dnorm(0,0.001)
  b1 ~ dnorm(0,0.001)
  b2 ~ dnorm(0,0.001)
  
  # Likelihood
  for(i in 1:N){
    ones[i] ~ dbern(psi[i])
    logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
    for(j in 1:n_rep){
      det_data[i,j] ~ dbern(theta[i,j])
      logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
    }
  }
  for(i in (N+1):n_unit){
    ones[i] ~ dbern(1-psi[i] + psi[i] * (1-theta[i,1]) * (1-theta[i,2]) * (1-theta[i,3]) * (1-theta[i,4]))
    logit(psi[i]) <- a0 + a1*cov1[i] + a2*cov2[i]
    for(j in 1:n_rep){
      logit(theta[i,j]) <- b0 + b1*cov1[i] + b2*cov3[i,j]
    }
  }
}

jags_data = list(n_unit = n_unit, n_rep = n_rep, det_data = det_data2, 
                 cov1 = unit_covars2$cov1, cov2 = unit_covars2$cov2,
                 cov3 = event_covars2, N = N, ones = rep(1, n_unit))

params = c('a0', 'a1', 'a2', 'b0', 'b1', 'b2')

inits <- function() {
  a0 <- rnorm(1, 0, .5)
  a1 <- rnorm(1, 0, .5)
  a2 <- rnorm(1, 0, .5)
  b0 <- rnorm(1, 0, .5)
  b1 <- rnorm(1, 0, .5)
  b2 <- rnorm(1, 0, .5)
  return(list(a0 = a0, a1 = a1, a2 = a2, b0 = b0, b1 = b1, b2 = b2))
}

nc <- 4
n_adapt <- 1000
n_burn <- 1000
n_iter <- 3000
thin <- 1

margin_start_time = proc.time()
cl <- makePSOCKcluster(nc)
tmp <- clusterEvalQ(cl, library(dclone))
parLoadModule(cl, "glm")
parListModules(cl)
margin_samples <- jags.parfit(cl, jags_data, params, margin_model, inits=inits, n.chains=nc, 
                            n.adapt=n_adapt, n.update = n_burn, thin = thin, n.iter = n_iter)
stopCluster(cl)
margin_end_time = proc.time()
margin_dtime = margin_end_time - margin_start_time

Stan analysis

library(rstan)
stan_model <- 'data {
  int<lower=0> n_unit; //number of units
  int<lower=0> n_rep; //number of repeat sampling events
  int<lower=0, upper=1> det_data[n_unit, n_rep]; //detection history
  real<lower=0, upper=1> Q[n_unit]; //at least one detection
  real cov1[n_unit]; //unit covariate 1
  real cov2[n_unit]; //unit covariate 2
  real cov3[n_unit, n_rep]; //event covariate
}
parameters {
  real a0; 
  real a1; 
  real a2;
  real b0;
  real b1;
  real b2;
}
transformed parameters{
  real psi[n_unit];
  real theta[n_unit, n_rep];
  for(i in 1:n_unit){
    psi[i] = inv_logit(a0 + a1 * cov1[i] + a2 * cov2[i]);
    for(j in 1:n_rep){
        theta[i,j] = inv_logit(b0 + b1 * cov1[i] + b2 * cov3[i,j]);
    }
  }
}
model {
  for(i in 1:n_unit){
    if(Q[i] == 1) {
      target += log(psi[i]);
      target += bernoulli_lpmf(det_data[i] | theta[i]); //vectorized over repeat sampling events j
    }
    if(Q[i] == 0) {
      target += log_sum_exp(log(psi[i]) + log1m(theta[i,1]) + log1m(theta[i,2]) + log1m(theta[i,3]) + log1m(theta[i,4]), 
                    log1m(psi[i]));
    }
  }
}'

stan_data <- list(n_unit = n_unit, n_rep = n_rep, det_data = det_data,
                  cov1 = unit_covars$cov1, cov2 = unit_covars$cov2,
                  cov3 = event_covars, Q = as.numeric(rowSums(det_data)>0))

nc <- 4

stan_samples <- stan(model_code = stan_model, data = stan_data, iter = 4000, chains = nc, cores = nc)

Comparison and benchmarks

The fitted posterior distributions are essentially identical:

summary(occu_samples)
## 
## Iterations = 2001:5000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 3000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##       Mean     SD Naive SE Time-series SE
## a0 -0.1630 0.2037 0.001860       0.003795
## a1  0.6367 0.2167 0.001979       0.003862
## a2 -0.5951 0.1984 0.001811       0.003170
## b0 -0.4816 0.1539 0.001405       0.003167
## b1  0.1540 0.1394 0.001273       0.002554
## b2  0.2743 0.1116 0.001018       0.001392
## 
## 2. Quantiles for each variable:
## 
##        2.5%      25%     50%     75%   97.5%
## a0 -0.54981 -0.30122 -0.1693 -0.0282  0.2498
## a1  0.23363  0.48934  0.6280  0.7751  1.0900
## a2 -1.00428 -0.72438 -0.5891 -0.4569 -0.2236
## b0 -0.78448 -0.58580 -0.4790 -0.3759 -0.1865
## b1 -0.11213  0.05561  0.1536  0.2494  0.4250
## b2  0.05637  0.20004  0.2742  0.3485  0.4906
summary(margin_samples)
## 
## Iterations = 2001:5000
## Thinning interval = 1 
## Number of chains = 4 
## Sample size per chain = 3000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##       Mean     SD Naive SE Time-series SE
## a0 -0.1662 0.2052 0.001873       0.002985
## a1  0.6366 0.2154 0.001967       0.002975
## a2 -0.5861 0.1928 0.001760       0.002493
## b0 -0.4793 0.1524 0.001391       0.002335
## b1  0.1542 0.1373 0.001253       0.001973
## b2  0.2777 0.1115 0.001018       0.001280
## 
## 2. Quantiles for each variable:
## 
##        2.5%      25%     50%      75%   97.5%
## a0 -0.55770 -0.30606 -0.1698 -0.03459  0.2511
## a1  0.23906  0.49313  0.6267  0.77080  1.0857
## a2 -0.97824 -0.71119 -0.5796 -0.45473 -0.2235
## b0 -0.78503 -0.58030 -0.4762 -0.37694 -0.1871
## b1 -0.10955  0.06105  0.1524  0.24491  0.4284
## b2  0.05786  0.20282  0.2772  0.35268  0.5016
summary(stan_samples, pars = c('a0', 'a1', 'a2', 'b0', 'b1', 'b2'))$summary
##          mean     se_mean        sd        2.5%         25%        50%
## a0 -0.1669881 0.002108233 0.1998277 -0.54017836 -0.30359205 -0.1738705
## a1  0.6356526 0.002285360 0.2123485  0.24389616  0.49154067  0.6266325
## a2 -0.5903818 0.002073971 0.1943577 -0.99584976 -0.71227563 -0.5853868
## b0 -0.4790219 0.001603104 0.1533161 -0.78540260 -0.58177591 -0.4791134
## b1  0.1557667 0.001460908 0.1391910 -0.12107813  0.06184113  0.1571500
## b2  0.2747547 0.001080308 0.1113677  0.05995826  0.19873355  0.2739931
##            75%      97.5%     n_eff      Rhat
## a0 -0.03871645  0.2516561  8984.096 0.9998687
## a1  0.76964316  1.0746625  8633.546 1.0002357
## a2 -0.45768251 -0.2285127  8782.094 1.0000281
## b0 -0.37383632 -0.1819839  9146.432 0.9998012
## b1  0.24864622  0.4250929  9077.716 0.9999789
## b2  0.34884664  0.4986296 10627.295 1.0003264

The effective sample sizes and r-hat (Gelman-Rubin) diagnostics are generally better in the marginalized JAGS model verus the “familiar” parameterization:

effectiveSize(occu_samples)
##       a0       a1       a2       b0       b1       b2 
## 2902.924 3148.100 3955.972 2371.779 2999.430 6454.731
effectiveSize(margin_samples)
##       a0       a1       a2       b0       b1       b2 
## 4737.079 5260.867 6021.521 4267.613 4859.055 7744.127
gelman.diag(occu_samples)
## Potential scale reduction factors:
## 
##    Point est. Upper C.I.
## a0          1          1
## a1          1          1
## a2          1          1
## b0          1          1
## b1          1          1
## b2          1          1
## 
## Multivariate psrf
## 
## 1
gelman.diag(margin_samples)
## Potential scale reduction factors:
## 
##    Point est. Upper C.I.
## a0          1          1
## a1          1          1
## a2          1          1
## b0          1          1
## b1          1          1
## b2          1          1
## 
## Multivariate psrf
## 
## 1

The execution time is faster in the marginalized JAGS model versus the “familiar” parameterization. On different runs and different machines, I’ve seen speedup between 5 and 30 percent, which means that for some parameters the “familiar” JAGS model takes twice as long per effective sample as the marginalized model.

occu_dtime
##    user  system elapsed 
##   0.017   0.017   7.376
margin_dtime
##    user  system elapsed 
##   0.014   0.025   5.846
(margin_dtime - occu_dtime)/occu_dtime
##       user     system    elapsed 
## -0.1875000  0.3095238 -0.2074295

  1. To be clear, data may be discrete, but unknowns (including missing data) must not be.↩︎

  2. The requirement to avoid discrete unknowns turns out not to be a limitation in most cases. As we’ll see below, we actually get better performance out of JAGS/BUGS by avoiding discrete unknowns (i.e. marginalizing), which according the Stan development team allows for better exploration of the tails of the posterior. Thus, forcing us to marginalize might be more of a feature than a limitation.↩︎

  3. This part will assume rudimenary familiarity with the structure of a Stan program.↩︎

  4. We can add the probabilities of mutually exclusive events to find the probability that exactly one of the events occurs. In this case, two mutually exclusive events are 1) [observed data AND \(Z == 0\)] and 2) [observed data AND \(Z == 1\)]. Note that in addition to being mutually exclusive, these events are the only two ways that our observed data could have arisen, so the sum of these probabilities equals the likelihood of the observed data.↩︎

  5. This is true by assumption in all “garden-variety” occupancy models.↩︎

  6. It’s weird to talk about occupancy being independent of detection, but remember that in these models the detection probabilities are conditional on occupancy. And indeed \(p(det) = p(det|occ)*p(occ)\), so we see we’re on safe footing to multiply the occupancy and conditional detection probabilities.↩︎

  7. The probabilities of mutually exclusive events add. In this case, \(Z == 1\) and \(Z == 0\) are indeed mutually exclusive.↩︎

  8. In Bob Carpenter’s translation of the Dorazio model, this is achieved by modeling the sum of detections over all repeat sampling events as a sample from a binomial distribution–a nifty trick if there are no event-specific covariates to worry about!↩︎

  9. Or close enough. For all I know there might be efficiency gains to be had by doing everything on one line (incrementing the log-likelihood once instead of multiple times), but if these exist they will be inconsequential for present purposes.↩︎

  10. It should be obvious why we don’t want to rely on sorting the units for the multi-species model, where different species will be detected at different units. Yes, we could pass a complex data structure where every species has its own covariate tables that are properly sorted for that species’ detection history, but the ‘if’ statement makes things so much easier.↩︎