Here we will do classical linear regression in a Bayesian setting. We will also show the difference between credible intervals and prediction intervals.

In Part 1 the data and the model are introduced. Participants then try to implement the model in BUGS. In Part 2 the solution is exposed, and the difference between credible and prediction intervals is explained.

1 The data

We will use data from Michael Crawley’s R Book, Chapter 10 (Linear Regression). The data show the growth of catepillars fed on experimental diets differing in their tannin contnent.

danaus caterpillar figure danaus caterpillar figure

To load the data to R directly from the web:

  catepil <- read.table("http://www.petrkeil.com/wp-content/uploads/2016/01/regression.txt", sep="\t", header=TRUE)
  catepil
##   growth tannin
## 1     12      0
## 2     10      1
## 3      8      2
## 4     11      3
## 5      6      4
## 6      7      5
## 7      2      6
## 8      3      7
## 9      3      8

The data look like this:

  plot(growth~tannin, data=catepil)

2 The model

The classical notation: \[ growth_i = a + b \times tannin_i + \epsilon_i \] \[ \epsilon_i \sim Normal(0, \sigma)\]

An alternative (“Bayesian”“) version: \[ \mu_i = a + b \times tannin_i \] \[ growth_i \sim Normal(\mu_i, \sigma) \]

Note: The notations are mathematically equivalent, but the Bayesian notation shows, in my opinion, more directly how we think about the stochastic part of the model.

3 Fitting ordinary least squares (OLS) regression

  model.lm <- lm(growth~tannin, data=catepil)
  plot(growth~tannin, data=catepil)
  abline(model.lm)

  summary(model.lm)
## 
## Call:
## lm(formula = growth ~ tannin, data = catepil)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4556 -0.8889 -0.2389  0.9778  2.8944 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7556     1.0408  11.295 9.54e-06 ***
## tannin       -1.2167     0.2186  -5.565 0.000846 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared:  0.8157, Adjusted R-squared:  0.7893 
## F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461

4 Tasks for you

Using the ideas from previous lessons: