Here we will do classical linear regression in a Bayesian setting. We will also show the difference between credible intervals and prediction intervals.

In Part 1 the data and the model are introduced. Participants then try to implement the model in BUGS. In Part 2 the solution is exposed, and the difference between credible and prediction intervals is explained.

# 1 The data

We will use data from Michael Crawley’s R Book, Chapter 10 (Linear Regression). The data show the growth of catepillars fed on experimental diets differing in their tannin contnent.

To load the data to R directly from the web:

  catepil <- read.table("http://www.petrkeil.com/wp-content/uploads/2016/01/regression.txt", sep="\t", header=TRUE)
catepil
##   growth tannin
## 1     12      0
## 2     10      1
## 3      8      2
## 4     11      3
## 5      6      4
## 6      7      5
## 7      2      6
## 8      3      7
## 9      3      8

The data look like this:

  plot(growth~tannin, data=catepil)

# 2 The model

The classical notation: $growth_i = a + b \times tannin_i + \epsilon_i$ $\epsilon_i \sim Normal(0, \sigma)$

An alternative (“Bayesian”“) version: $\mu_i = a + b \times tannin_i$ $growth_i \sim Normal(\mu_i, \sigma)$

Note: The notations are mathematically equivalent, but the Bayesian notation shows, in my opinion, more directly how we think about the stochastic part of the model.

# 3 Fitting ordinary least squares (OLS) regression

  model.lm <- lm(growth~tannin, data=catepil)
plot(growth~tannin, data=catepil)
abline(model.lm)

  summary(model.lm)
##
## Call:
## lm(formula = growth ~ tannin, data = catepil)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4556 -0.8889 -0.2389  0.9778  2.8944
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  11.7556     1.0408  11.295 9.54e-06 ***
## tannin       -1.2167     0.2186  -5.565 0.000846 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared:  0.8157, Adjusted R-squared:  0.7893
## F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461

# 4 Tasks for you

Using the ideas from previous lessons:

• Try to write the model in the BUGS language and dump it into a file using cat.

• Try to prepare the data for this model in the list format.

• Try to fit the model and estimate posterior distributions of $$a$$ and $$b$$.