Here we will do classical linear regression in a Bayesian setting. We will also show the difference between credible intervals and prediction intervals.

In Part 1 the data and the model are introduced, and participants then try to implement the model in the BUGS language. In Part 2 the solution is presented, and the difference between credible and prediction intervals is explained.

# 1 Fitting the model by MCMC in JAGS

First, we will put all of our data into a list object:

```r
linreg.data <- list(N=9,
                    tannin=catepil$tannin,
                    growth=catepil$growth)
linreg.data
```

```
## $N
## [1] 9
##
## $tannin
## [1] 0 1 2 3 4 5 6 7 8
##
## $growth
## [1] 12 10  8 11  6  7  2  3  3
```

Load the R library that connects R with JAGS:

```r
library(R2jags)
```

This is the JAGS definition of the model:

```r
cat("
  model
  {
    # priors
    a ~ dnorm(0, 0.001)     # intercept
    b ~ dnorm(0, 0.001)     # slope
    sigma ~ dunif(0, 100)   # standard deviation
    tau <- 1/(sigma*sigma)  # precision

    # likelihood
    for(i in 1:N)
    {
      growth[i] ~ dnorm(mu[i], tau)
      mu[i] <- a + b*tannin[i]
    }
  }
", file="linreg_model.bug")
```

The MCMC sampling is done by the jags() function:

```r
model.fit <- jags(data=linreg.data,
                  model.file="linreg_model.bug",
                  parameters.to.save=c("a", "b", "sigma"),
                  n.chains=3,
                  n.iter=2000,
                  n.burnin=1000)
```

```
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 9
##    Unobserved stochastic nodes: 3
##    Total graph size: 50
##
## Initializing model
```

And we can examine the results:

```r
plot(as.mcmc(model.fit))
model.fit
plot(model.fit)
```

# 2 The full Bayesian output, predictions and all that

Beware: here we will use the library rjags, which gives more control over the MCMC than R2jags does:

```r
library(rjags)
```

The model definition now also contains a block that generates predictions:

```r
cat("
  model
  {
    # priors
    a ~ dnorm(0, 0.001)
    b ~ dnorm(0, 0.001)
    sigma ~ dunif(0, 100)
    tau <- 1/(sigma*sigma)

    # likelihood
    for(i in 1:N)
    {
      growth[i] ~ dnorm(mu[i], tau)
      mu[i] <- a + b*tannin[i]
    }

    # predictions
    for(i in 1:N)
    {
      prediction[i] ~ dnorm(mu[i], tau)
    }
  }
", file="linreg_model.bug")
```

Initializing the model:

```r
jm <- jags.model(data=linreg.data,
                 file="linreg_model.bug",
                 n.chains=3,
                 n.adapt=1000)
```

```
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 9
##    Unobserved stochastic nodes: 12
##    Total graph size: 59
##
## Initializing model
```

The burn-in phase:

```r
update(jm, n.iter=1000)
```

We will monitor the predicted values:

```r
params <- c("prediction")
samps <- coda.samples(jm, params, n.iter=1000)
predictions <- summary(samps)$quantiles
```
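As a quick sanity check on the Bayesian fit: with the vague priors used above, the posterior means of a and b should land close to the ordinary least-squares estimates. A minimal sketch of that check (in Python rather than R, with the data values copied from the printed dataset):

```python
import numpy as np

# The caterpillar data used above: tannin content and caterpillar growth,
# values copied from the printed linreg.data object.
tannin = np.arange(9)  # 0, 1, ..., 8
growth = np.array([12, 10, 8, 11, 6, 7, 2, 3, 3])

# Ordinary least-squares fit; np.polyfit returns the highest-degree
# coefficient first, i.e. (slope, intercept) for deg=1.
b, a = np.polyfit(tannin, growth, deg=1)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

The posterior summaries of a and b from the JAGS run should be close to these values; large disagreement would suggest a coding error in the model file or insufficient mixing.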

And here we will monitor the expected values:

```r
params <- c("mu")
samps <- coda.samples(jm, params, n.iter=1000)
mu <- summary(samps)$quantiles
```

Plotting the predictions:

```r
plot(c(0,8), c(0,18), type="n", xlab="tannin", ylab="growth")
points(catepil$tannin, catepil$growth, pch=19)
lines(catepil$tannin, mu[,"50%"], col="red")
lines(catepil$tannin, mu[,"2.5%"], col="red", lty=2)
lines(catepil$tannin, mu[,"97.5%"], col="red", lty=2)
lines(catepil$tannin, predictions[,"2.5%"], lty=3)
lines(catepil$tannin, predictions[,"97.5%"], lty=3)
```
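The distinction between the two monitored nodes can be illustrated outside JAGS. In this Python sketch, simulated draws stand in for the real MCMC samples (the distributions are made up to resemble the fit above, so the numbers are only illustrative): for a fixed tannin value, the credible interval is computed from draws of mu = a + b*tannin, while the prediction interval additionally draws a new observation around each mu.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in posterior draws for a, b and sigma; in the real analysis
# these would be the MCMC samples from JAGS.
n = 10_000
a = rng.normal(11.8, 0.9, n)
b = rng.normal(-1.2, 0.2, n)
sigma = np.abs(rng.normal(1.7, 0.4, n))

x = 4  # a fixed tannin value

mu = a + b * x                # expected value, one per posterior draw
pred = rng.normal(mu, sigma)  # a new observation for each draw

cred = np.percentile(mu, [2.5, 97.5])   # 95% credible interval for mu
pi = np.percentile(pred, [2.5, 97.5])   # 95% prediction interval

print("credible interval: ", cred)
print("prediction interval:", pi)
```

The prediction interval comes out wider than the credible interval, because each draw of `pred` carries the residual noise sigma on top of the uncertainty in mu.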

The figure shows the median expected value (solid red line), the 95% credible interval of the expected value (dashed red lines) and the 95% prediction interval (dotted lines). Note that the prediction interval is wider: the credible interval only captures the uncertainty about the expected value mu[i] = a + b*tannin[i], while the prediction interval also includes the residual standard deviation sigma, i.e. the scatter of individual observations around the regression line.