The aim of this lesson is to (1) leave the participants to come up with their code for simple one-way ANOVA, and (2) to experiment with random effects ANOVA.

1 The Data

We will use modified data from the example from Marc Kery’s Introduction to WinBUGS for Ecologists, page 119 (Chapter 9 - ANOVA). The data describe snout-vent lengths in 5 populations of Smooth snake (Coronella austriaca) (Uzovka hladka in CZ).

Loading the data from the web:

  snakes <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/snakes.csv")

# we will artificially delete 9 data points in the first population
  snakes <- snakes[-(1:9),]
  
  summary(snakes)

##    population      snout.vent   
##  Min.   :1.000   Min.   :36.56  
##  1st Qu.:2.000   1st Qu.:43.02  
##  Median :3.000   Median :49.24  
##  Mean   :3.439   Mean   :50.07  
##  3rd Qu.:4.000   3rd Qu.:57.60  
##  Max.   :5.000   Max.   :61.37

Plotting the data:

  par(mfrow=c(1,2))
  plot(snout.vent ~ population, data=snakes,
       ylab="Snout-vent length [cm]")
  boxplot(snout.vent ~ population, data=snakes,
          ylab="Snout-vent length [cm]",
          xlab="population",
          col="grey")

2 Fixed-effects ANOVA

For a given snake \(i\) in population \(j\) the model can be written as:

\(y_{ij} \sim Normal(\alpha_j, \sigma)\)

Here is how we prepare the data:

  snake.data <- list(y=snakes$snout.vent,
                     x=snakes$population,
                     N=nrow(snakes), 
                     N.pop=5)

Loading the library that communicates with JAGS

  library(R2jags)

JAGS Model definition:

cat("
  model
  {
    # priors
    sigma ~ dunif(0,100)
    tau <- 1/(sigma*sigma)
    for(j in 1:N.pop)
    {
      alpha[j] ~ dnorm(0, 0.001)
    }
  
    # likelihood
    for(i in 1:N)
    {
      y[i] ~ dnorm(alpha[x[i]], tau)
    }
  }
", file="fixed_anova.txt")

And we will fit the model:

model.fit.fix <- jags(data=snake.data, 
                        model.file="fixed_anova.txt",
                        parameters.to.save=c("alpha"),
                        n.chains=3,
                        n.iter=2000,
                        n.burnin=1000,
                        DIC=FALSE)

## module glm loaded
## module dic loaded

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 41
##    Unobserved stochastic nodes: 6
##    Total graph size: 106
## 
## Initializing model

plot(as.mcmc(model.fit.fix))

model.fit.fix

## Inference for Bugs model at "fixed_anova.txt", fit using jags,
##  3 chains, each with 2000 iterations (first 1000 discarded)
##  n.sims = 3000 iterations saved
##          mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
## alpha[1]  45.244   3.235 38.930 43.124 45.288 47.376 51.577 1.001  2000
## alpha[2]  41.238   1.017 39.257 40.587 41.241 41.912 43.236 1.001  3000
## alpha[3]  45.868   1.026 43.784 45.209 45.884 46.537 47.878 1.002  1700
## alpha[4]  54.409   1.057 52.311 53.693 54.424 55.137 56.476 1.001  3000
## alpha[5]  59.019   1.031 57.018 58.357 59.026 59.695 61.065 1.001  3000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

3 Random-effects ANOVA

For a given snake \(i\) in population \(j\) the model can be written in a similar way as for the fixed-effects ANOVA above:

\(y_{ij} \sim Normal(\alpha_j, \sigma)\)

But now we will also add a random effect:

\(\alpha_j \sim Normal(\mu_{grand}, \sigma_{grand})\)

In short, a random effect means that the parameters itself come from (are outcomes of) a given distribution, here it is the Normal.

The data stay the same as in the fixed-effect example above.

Loading the library that communicates with JAGS

  library(R2jags)

JAGS Model definition:

cat("
  model
  {
    # priors
    grand.mean ~ dnorm(0, 0.001)
    grand.sigma ~ dunif(0,100)
    grand.tau <- 1/(grand.sigma*grand.sigma)
    group.sigma ~ dunif(0, 100)
    group.tau <- 1/(group.sigma*group.sigma)
  
    for(j in 1:N.pop)
    {
      alpha[j] ~ dnorm(grand.mean, grand.tau)
    }
  
    # likelihood
    for(i in 1:N)
    {
      y[i] ~ dnorm(alpha[x[i]], group.tau)
    }
  }
", file="random_anova.txt")

And we will fit the model:

model.fit.rnd <- jags(data=snake.data, 
               model.file="random_anova.txt",
               parameters.to.save=c("alpha"),
               n.chains=3,
               n.iter=2000,
               n.burnin=1000,
               DIC=FALSE)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 41
##    Unobserved stochastic nodes: 8
##    Total graph size: 106
## 
## Initializing model

plot(as.mcmc(model.fit.rnd))

model.fit.rnd

## Inference for Bugs model at "random_anova.txt", fit using jags,
##  3 chains, each with 2000 iterations (first 1000 discarded)
##  n.sims = 3000 iterations saved
##          mu.vect sd.vect   2.5%    25%    50%    75%  97.5%  Rhat n.eff
## alpha[1]  45.968   3.126 39.757 43.801 45.957 48.033 51.975 1.003   950
## alpha[2]  41.408   1.037 39.370 40.719 41.412 42.092 43.411 1.003   970
## alpha[3]  45.977   1.027 43.954 45.312 45.957 46.657 48.018 1.001  3000
## alpha[4]  54.436   1.022 52.459 53.746 54.454 55.098 56.449 1.001  3000
## alpha[5]  58.930   1.028 56.892 58.245 58.931 59.618 60.902 1.001  3000
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

4 Plotting the posteriors from both models

Let’s extract the medians posterior distributions of the expected values of \(\alpha_j\) and their 95% credible intervals:

  rnd.alphas <- model.fit.rnd$BUGSoutput$summary
  fix.alphas <- model.fit.fix$BUGSoutput$summary
  
  plot(snout.vent ~ population, data=snakes,
       ylab="Snout-vent length [cm]", col="grey", pch=19)
  points(rnd.alphas[,'2.5%'], col="red", pch="-", cex=1.5)
  points(fix.alphas[,'2.5%'], col="blue", pch="-", cex=1.5) 
  points(rnd.alphas[,'97.5%'], col="red", pch="-", cex=1.5)
  points(fix.alphas[,'97.5%'], col="blue", pch="-", cex=1.5) 
  points(rnd.alphas[,'50%'], col="red", pch="+", cex=1.5)
  points(fix.alphas[,'50%'], col="blue", pch="+", cex=1.5) 

  abline(h=mean(snakes$snout.vent), col="grey")
  legend("bottomright", pch=c(19,19), col=c("blue","red"),
         legend=c("classical","random effects"))

Note the shrinkage effect!

Also, how would you plot the grand.mean estimated in the random effects model? How would you extract the between- and within- group variances?

ANOVA – part 2

Petr Keil

January 2016

1 The Data

2 Fixed-effects ANOVA

3 Random-effects ANOVA

4 Plotting the posteriors from both models