Here we will go through a simple hierarchical model of species occupancy that accounts for imperfect detectability of a species.
We will work with Marc Kery’s data from chapter 20 of his Introduction to WinBUGS for ecologists. The data describe detections of Gentianella germanica at 150 sites, each visited 3 times. We wish to estimate the relationship between \(P\) of occurence, and a humidity index. However, we also know that the detectability of the species depends on humidity itself.
Loading the data from the web:
gentiana <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/gentiana.csv")
gentiana <- gentiana[,-1]
Explore the data a little bit:
head(gentiana)
## humidity visit1 visit2 visit3 pres.abs
## 1 -0.99 0 0 0 0
## 2 -0.98 0 0 0 0
## 3 -0.96 0 0 0 0
## 4 -0.93 0 0 0 0
## 5 -0.89 0 0 0 0
## 6 -0.88 0 0 0 0
nrow(gentiana)
## [1] 150
This is the classical LOGISTIC REGRESSION for presence-absence data:
naive.m1 <- glm(pres.abs~humidity, data=gentiana, family="binomial")
Or the response can be number of successes out of 3 visits:
succ.fail <- cbind(rowSums(gentiana[,2:4]), rowSums(gentiana[,2:4]))
naive.m2 <- glm(succ.fail~humidity, data=gentiana, family="binomial")
Plotting the data and the predictions of the two naive models:
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Gentiana detections")
lines(gentiana$humidity, predict(naive.m1, type="response"), col="blue")
lines(gentiana$humidity, predict(naive.m2, type="response"), col="red")
legend("right", lwd=c(2,2), col=c("blue","red"),
legend=c("model1","model2"))
Model definition
\(z_i \sim Bernoulli(\psi_i)\)
\(y_{ij} \sim Bernoulli(z_i \times p_{i})\)
Where \(z_i\) is the unobserved but true occurrence, \(\psi_i\) is probability of occurrence, \(y_{ij}\) is the detection or non-detection (the actual data) at site \(i\) during visit \(j\), and \(p_{ij}\) is the detection probability of the species.
We also know that
\(p_{ij}=f(environment_{ij})\)
and that
\(\psi_i=f(environment_i)\)
Let’s prepare the data for JAGS:
# just make the 2D array as simple as possible
y <- as.matrix(gentiana[,2:4])
colnames(y) <- NULL
gentiana.data <- list(N.sites = nrow(y),
N.visit = ncol(y),
humidity = gentiana$humidity,
y = y)
Our favourite library:
library(R2jags)
Here is a little guide for specifying priors for logistic regression:
The model:
cat("
model
{
# priors
alpha.occ ~ dnorm(0, 0.01)
beta.occ ~ dnorm(0, 0.01)
alpha.det ~ dnorm(0, 0.01)
beta.det ~ dnorm(0, 0.01)
# likelihood
for(i in 1:N.sites)
{
z[i] ~ dbern(psi[i]) # true occupancy at site i
logit(psi[i]) <- alpha.occ + beta.occ*humidity[i]
logit(p[i]) <- alpha.det + beta.det*humidity[i] # detection probability
eff.p[i] <- z[i] * p[i] # p of observing the present individual
for(j in 1:N.visit)
{
y[i,j] ~ dbern(eff.p[i]) # detection/non-detection at [i,j]
}
}
}
", file="gentiana.txt")
Initial values are really important for models with latent variables! You have to specify them, otherwise it won’t work.
zst <- apply(X=y, MARGIN=1, FUN=max)
inits <- function(){list(z=zst, alpha.occ=rnorm(1),
beta.occ=rnorm(1),
alpha.det=rnorm(1),
beta.det=rnorm(1))}
Now we can run the model and monitor the parameter \(\psi\) (the actual probability of presence):
model.fit <- jags(data=gentiana.data,
model.file="gentiana.txt",
parameters.to.save=c("psi"),
n.chains=1,
n.iter=2000,
n.burnin=1000,
inits=inits,
DIC=FALSE)
## module glm loaded
## module dic loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 1610
##
## Initializing model
Let’s plot data together with the expected value and its credible intervals:
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Probability of presence")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'97.5%'], lty=2)
And let’s pull out the detectability:
Now we can run the model:
model.fit <- jags(data=gentiana.data,
model.file="gentiana.txt",
parameters.to.save=c("p"),
n.chains=1,
n.iter=2000,
n.burnin=1000,
inits=inits,
DIC=FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 1610
##
## Initializing model
plot(gentiana$humidity, gentiana$pres.abs,
xlab="Humidity index", ylab="Detectability", type="n")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'97.5%'], lty=2)