Here we will go through a simple hierarchical model of species occupancy that accounts for imperfect detectability of a species.

1 The Data

We will work with Marc Kery’s data from chapter 20 of his Introduction to WinBUGS for ecologists. The data describe detections of Gentianella germanica at 150 sites, each visited 3 times. We wish to estimate the relationship between \(P\) of occurence, and a humidity index. However, we also know that the detectability of the species depends on humidity itself.

Loading the data from the web:

  gentiana <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/gentiana.csv")
  gentiana <- gentiana[,-1]

Explore the data a little bit:

head(gentiana)
##   humidity visit1 visit2 visit3 pres.abs
## 1    -0.99      0      0      0        0
## 2    -0.98      0      0      0        0
## 3    -0.96      0      0      0        0
## 4    -0.93      0      0      0        0
## 5    -0.89      0      0      0        0
## 6    -0.88      0      0      0        0
nrow(gentiana)
## [1] 150

1.1 The “naive” analysis by GLM:

This is the classical LOGISTIC REGRESSION for presence-absence data:

  naive.m1 <- glm(pres.abs~humidity, data=gentiana, family="binomial")

Or the response can be number of successes out of 3 visits:

  succ.fail <- cbind(rowSums(gentiana[,2:4]), rowSums(gentiana[,2:4]))
  naive.m2 <- glm(succ.fail~humidity, data=gentiana, family="binomial")

Plotting the data and the predictions of the two naive models:

  plot(gentiana$humidity, gentiana$pres.abs,
       xlab="Humidity index", ylab="Gentiana detections")
  lines(gentiana$humidity, predict(naive.m1, type="response"), col="blue")
  lines(gentiana$humidity, predict(naive.m2, type="response"), col="red")
  legend("right", lwd=c(2,2), col=c("blue","red"),
         legend=c("model1","model2"))

plot of chunk unnamed-chunk-6

2 Bayesian model with imperfect detection

Model definition

\(z_i \sim Bernoulli(\psi_i)\)

\(y_{ij} \sim Bernoulli(z_i \times p_{i})\)

Where \(z_i\) is the unobserved but true occurrence, \(\psi_i\) is probability of occurrence, \(y_{ij}\) is the detection or non-detection (the actual data) at site \(i\) during visit \(j\), and \(p_{ij}\) is the detection probability of the species.

We also know that

\(p_{ij}=f(environment_{ij})\)

and that

\(\psi_i=f(environment_i)\)

Let’s prepare the data for JAGS:

  # just make the 2D array as simple as possible
  y <- as.matrix(gentiana[,2:4])
  colnames(y) <- NULL

  gentiana.data <- list(N.sites = nrow(y),
                        N.visit = ncol(y),
                        humidity = gentiana$humidity,
                        y = y)

Our favourite library:

  library(R2jags)

Here is a little guide for specifying priors for logistic regression:

The model:

  cat("
    model
    {
    # priors
      alpha.occ ~ dnorm(0, 0.01)
      beta.occ ~ dnorm(0, 0.01)
      alpha.det ~ dnorm(0, 0.01)
      beta.det ~ dnorm(0, 0.01)
      
    # likelihood 
      for(i in 1:N.sites)
      {
        z[i] ~ dbern(psi[i]) # true occupancy at site i
        logit(psi[i]) <- alpha.occ + beta.occ*humidity[i] 
        logit(p[i]) <- alpha.det + beta.det*humidity[i] # detection probability
        eff.p[i] <- z[i] * p[i] # p of observing the present individual
        
        for(j in 1:N.visit)
        {           
           y[i,j] ~ dbern(eff.p[i]) # detection/non-detection at [i,j]
        }  
      }
    }
  ", file="gentiana.txt")

Initial values are really important for models with latent variables! You have to specify them, otherwise it won’t work.

  zst <- apply(X=y, MARGIN=1, FUN=max)
  
  inits <- function(){list(z=zst, alpha.occ=rnorm(1),
                      beta.occ=rnorm(1),
                      alpha.det=rnorm(1),
                      beta.det=rnorm(1))}

Now we can run the model and monitor the parameter \(\psi\) (the actual probability of presence):

  model.fit <- jags(data=gentiana.data, 
                    model.file="gentiana.txt",
                    parameters.to.save=c("psi"),
                    n.chains=1,
                    n.iter=2000,
                    n.burnin=1000,
                    inits=inits, 
                    DIC=FALSE)
## module glm loaded
## module dic loaded
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 1610
## 
## Initializing model

Let’s plot data together with the expected value and its credible intervals:

plot(gentiana$humidity, gentiana$pres.abs,
       xlab="Humidity index", ylab="Probability of presence")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[,'97.5%'], lty=2)

plot of chunk unnamed-chunk-12

And let’s pull out the detectability:

Now we can run the model:

  model.fit <- jags(data=gentiana.data, 
                 model.file="gentiana.txt",
                 parameters.to.save=c("p"),
                 n.chains=1,
                 n.iter=2000,
                 n.burnin=1000,
                 inits=inits, 
                 DIC=FALSE)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 1610
## 
## Initializing model
plot(gentiana$humidity, gentiana$pres.abs,
       xlab="Humidity index", ylab="Detectability", type="n")
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'50%'])
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'2.5%'], lty=2)
lines(gentiana$humidity, model.fit$BUGSoutput$summary[1:150,'97.5%'], lty=2)

plot of chunk unnamed-chunk-13