Here we will do classical linear regression in a Bayesian setting. We will also show the difference between credible intervals and prediction intervals.

In Part 1 the data and the model are introduced. Participants then try to implement the model in BUGS. In Part 2 the solution is exposed, and the difference between credible and prediction intervals is explained.

1 The data

We will use data from Michael Crawley’s R Book, Chapter 10 (Linear Regression). The data show the growth of catepillars fed on experimental diets differing in their tannin contnent.

danaus caterpillar figure

To load the data to R directly from the web:

  catepil <- read.table("http://www.petrkeil.com/wp-content/uploads/2016/01/regression.txt", sep="\t", header=TRUE)
  catepil

##   growth tannin
## 1     12      0
## 2     10      1
## 3      8      2
## 4     11      3
## 5      6      4
## 6      7      5
## 7      2      6
## 8      3      7
## 9      3      8

The data look like this:

  plot(growth~tannin, data=catepil)

2 The model

The classical notation: \[ growth_i = a + b \times tannin_i + \epsilon_i \] \[ \epsilon_i \sim Normal(0, \sigma)\]

An alternative (“Bayesian”“) version: \[ \mu_i = a + b \times tannin_i \] \[ growth_i \sim Normal(\mu_i, \sigma) \]

Note: The notations are mathematically equivalent, but the Bayesian notation shows, in my opinion, more directly how we think about the stochastic part of the model.

3 Fitting ordinary least squares (OLS) regression

  model.lm <- lm(growth~tannin, data=catepil)
  plot(growth~tannin, data=catepil)
  abline(model.lm)

  summary(model.lm)

## 
## Call:
## lm(formula = growth ~ tannin, data = catepil)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4556 -0.8889 -0.2389  0.9778  2.8944 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7556     1.0408  11.295 9.54e-06 ***
## tannin       -1.2167     0.2186  -5.565 0.000846 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.693 on 7 degrees of freedom
## Multiple R-squared:  0.8157, Adjusted R-squared:  0.7893 
## F-statistic: 30.97 on 1 and 7 DF,  p-value: 0.0008461

4 Tasks for you

Using the ideas from previous lessons:

Try to write the model in the BUGS language and dump it into a file using cat.
Try to prepare the data for this model in the list format.
Try to fit the model and estimate posterior distributions of \(a\) and \(b\).

Linear regression - Part 1

Petr Keil

January 2016

1 The data

2 The model

3 Fitting ordinary least squares (OLS) regression

4 Tasks for you