Here we will do classical linear regression in a Bayesian setting. We will also show the difference between credible intervals and prediction intervals.
In Part 1 the data and the model are introduced. Participants then try to implement the model in BUGS. In Part 2 the solution is exposed, and the difference between credible and prediction intervals is explained.
We will use data from Michael Crawley’s R Book, Chapter 10 (Linear Regression). The data show the growth of catepillars fed on experimental diets differing in their tannin contnent.
To load the data to R directly from the web:
catepil <- read.table("http://www.petrkeil.com/wp-content/uploads/2016/01/regression.txt", sep="\t", header=TRUE)
catepil
## growth tannin
## 1 12 0
## 2 10 1
## 3 8 2
## 4 11 3
## 5 6 4
## 6 7 5
## 7 2 6
## 8 3 7
## 9 3 8
The data look like this:
plot(growth~tannin, data=catepil)
The classical notation: \[ growth_i = a + b \times tannin_i + \epsilon_i \] \[ \epsilon_i \sim Normal(0, \sigma)\]
An alternative (“Bayesian”“) version: \[ \mu_i = a + b \times tannin_i \] \[ growth_i \sim Normal(\mu_i, \sigma) \]
Note: The notations are mathematically equivalent, but the Bayesian notation shows, in my opinion, more directly how we think about the stochastic part of the model.
model.lm <- lm(growth~tannin, data=catepil)
plot(growth~tannin, data=catepil)
abline(model.lm)
summary(model.lm)
##
## Call:
## lm(formula = growth ~ tannin, data = catepil)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4556 -0.8889 -0.2389 0.9778 2.8944
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.7556 1.0408 11.295 9.54e-06 ***
## tannin -1.2167 0.2186 -5.565 0.000846 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared: 0.8157, Adjusted R-squared: 0.7893
## F-statistic: 30.97 on 1 and 7 DF, p-value: 0.0008461
Using the ideas from previous lessons:
Try to write the model in the BUGS language and dump it into a file using cat
.
Try to prepare the data for this model in the list
format.
Try to fit the model and estimate posterior distributions of \(a\) and \(b\).