Here we will go through a simple hierarchical model of species occupancy that accounts for imperfect detectability of a species. The main aims are (1) to show a variant of binomial-family regression, (2) to demonstrate hierarchical models with observation and process components, and (3) to introduce the concept of latent variables.
We will work with Marc Kery’s data from chapter 20 of his Introduction to WinBUGS for ecologists. The data describe detections of Gentianella germanica at 150 sites, each visited 3 times. We wish to estimate the relationship between \(P\) of occurence, and a humidity index. However, we also know that the detectability of the species depends on humidity itself.
Loading the data from the web:
gentiana <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/gentiana.csv")
gentiana <- gentiana[,-1]
Explore the data a little bit:
head(gentiana)
## humidity visit1 visit2 visit3 pres.abs
## 1 -0.99 0 0 0 0
## 2 -0.98 0 0 0 0
## 3 -0.96 0 0 0 0
## 4 -0.93 0 0 0 0
## 5 -0.89 0 0 0 0
## 6 -0.88 0 0 0 0
nrow(gentiana)
## [1] 150
This is the classical LOGISTIC REGRESSION for presence-absence data:
naive.m1 <- glm(pres.abs~humidity, data=gentiana, family="binomial")
Or the response can be number of successes out of 3 visits:
succ.fail <- cbind(rowSums(gentiana[,2:4]), rowSums(gentiana[,2:4]))
naive.m2 <- glm(succ.fail~humidity, data=gentiana, family="binomial")
Plotting the data and the predictions of the two naive models:
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Gentiana detections")
lines(gentiana$humidity, predict(naive.m1, type="response"), col="blue")
lines(gentiana$humidity, predict(naive.m2, type="response"), col="red")
legend("right", lwd=c(2,2), col=c("blue","red"),
legend=c("model1","model2"))
\(z_i \sim Bernoulli(\psi_i)\)
\(y_{ij} \sim Bernoulli(z_i \times p_{i})\)
Where \(z_i\) is the unobserved but true occurrence, \(\psi_i\) is probability of occurrence, \(y_{ij}\) is the detection or non-detection (the actual data) at site \(i\) during visit \(j\), and \(p_{ij}\) is the detection probability of the species.
We also know that
\(p_{ij}=f(environment_{ij})\)
and that
\(\psi_i=f(environment_i)\)
Let’s prepare the data for JAGS:
# just make the 2D array as simple as possible
y <- as.matrix(gentiana[,2:4])
colnames(y) <- NULL
gentiana.data <- list(N.sites = nrow(y),
N.visit = ncol(y),
humidity = gentiana$humidity,
y = y)
Our favourite library:
library(R2jags)
The model:
cat("
model
{
# priors
alpha.occ ~ dnorm(0, 0.01)
beta.occ ~ dnorm(0, 0.01)
alpha.det ~ dnorm(0, 0.01)
beta.det ~ dnorm(0, 0.01)
# likelihood
for(i in 1:N.sites)
{
z[i] ~ dbern(psi[i]) # TRUE OCCUPANCY at site i
logit(psi[i]) <- alpha.occ + beta.occ*humidity[i]
logit(p[i]) <- alpha.det + beta.det*humidity[i] # DETECTION PROBABILITY
eff.p[i] <- z[i] * p[i] # p of observing the present individual
for(j in 1:N.visit)
{
# the ACTUAL DATA -- detection/non-detection at [i,j]
y[i,j] ~ dbern(eff.p[i])
}
}
}
", file="gentiana.txt")
Initial values are really important for models with latent variables! You have to specify them, otherwise it won’t work.
zst <- apply(X=y, MARGIN=1, FUN=max)
inits <- function(){list(z=zst, alpha.occ=rnorm(1),
beta.occ=rnorm(1),
alpha.det=rnorm(1),
beta.det=rnorm(1))}
Now we can run the model and monitor the parameter \(\psi\) (the actual probability of presence):
model.fit <- jags(data=gentiana.data,
model.file="gentiana.txt",
parameters.to.save=c("psi"),
n.chains=1,
n.iter=2000,
n.burnin=1000,
inits=inits,
DIC=FALSE)
## module glm loaded
## module dic loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 450
## Unobserved stochastic nodes: 154
## Total graph size: 1616
##
## Initializing model
Let’s plot data together with the expected value and its credible intervals:
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Probability of presence")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'97.5%'], lty=2)
And let’s pull out the detectability:
Now we can run the model:
model.fit <- jags(data=gentiana.data,
model.file="gentiana.txt",
parameters.to.save=c("p"),
n.chains=1,
n.iter=2000,
n.burnin=1000,
inits=inits,
DIC=FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 450
## Unobserved stochastic nodes: 154
## Total graph size: 1616
##
## Initializing model
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Detectability", type="n")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'97.5%'], lty=2)