In this lesson we will implement a Bayesian version of the classical T-test. We will also explore some ways to summarize the JAGS output, and we will introduce the concept of derived quantity.

1 The data

We will use the example from Marc Kery’s Introduction to WinBUGS for Ecologists, page 92 (Section 7.1 - t-test). The data describe wingspan of male and female Peregrine falcon (Falco peregrinus).

Let’s load the data and have a look at the data:

  falcon <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/falcon.csv")
  summary(falcon)
  boxplot(wingspan ~ male, data=falcon, 
          names=c("Female", "Male"),
          ylab="Wingspan [cm]",
          col="grey")

2 The classical frequentist solution in R

We can use the classical two-sample t.test():

x <- falcon$wingspan[falcon$male==1]
y <- falcon$wingspan[falcon$male==0]
t.test(x, y)
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -15.605, df = 82.954, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29.14053 -22.55197
## sample estimates:
## mean of x mean of y 
##  79.85925 105.70550

Note: this can also be done by lm():

lm1 <- lm(wingspan ~ male, data=falcon)
summary(lm1)

… or by glm():

glm1 <- glm(wingspan ~ male, data=falcon)
summary(glm1)

3 The didactic Bayesian solution

We assume the each males and females each have their own Normal distribution, from which the wingspans are drawn:

\(y_m \sim Normal(\mu_m, \sigma)\)

\(y_f \sim Normal(\mu_f, \sigma)\)

Note that the variance (\(\sigma\)) is the same in both groups.

This is the hypothesis that we usually test in the frequentist setting: \(\delta = \mu_f - \mu_m \neq 0\)

But we can actually ask even more directly: What is the mean difference (\(\delta\)) between female and male wingspan?

Here is how we prepare the data for JAGS:

  y.male <- falcon$wingspan[falcon$male==1]
  y.female <- falcon$wingspan[falcon$male==0]
  falcon.data.1 <- list(y.f=y.female,
                        N.f=60,
                        y.m=y.male,
                        N.m=40)

Loading the necessary library:

  library(R2jags)

Definition of the model:

cat("
model
{
  # priors
    mu.f ~ dnorm(0, 0.001) # Note: tau = 1/variance
    mu.m ~ dnorm(0, 0.001)
    tau <- 1/(sigma*sigma)
    sigma ~ dunif(0,100)
  
  # likelihood - Females
    for(i in 1:N.f)
    {
      y.f[i] ~ dnorm(mu.f, tau)
    }
  
  # likelihood - Males
    for(j in 1:N.m)
    {
      y.m[j] ~ dnorm(mu.m, tau)
    }

  # derived quantity:
    delta <- mu.f - mu.m

}    
", file="t-test.bug")

The MCMC sampling done by jags() function:

model.fit <- jags(data=falcon.data.1, 
               model.file="t-test.bug",
               parameters.to.save=c("mu.f", "mu.m", "sigma", "delta"),
               n.chains=3,
               n.iter=2000,
               n.burnin=1000,
               DIC=FALSE)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 100
##    Unobserved stochastic nodes: 3
##    Total graph size: 115
## 
## Initializing model

And we can explore the posterior distributions:

plot(as.mcmc(model.fit))
model.fit

4 The conventional Bayesian solution

Alternativelly, you can also specify the model in a more conventional way:

\(\mu_i = \mu_f + \delta \times male_i\)

\(y_i \sim Normal(\mu_i, \sigma)\)

Preparing the data for JAGS is somewhat different than above:

falcon.data.2 <- list(y=falcon$wingspan,
                      male=falcon$male,
                      N=100)

Definition of the model:

cat("
model
{
  # priors
    mu.f ~ dnorm(0, 0.001)
    delta ~ dnorm(0, 0.001)
    tau <- 1/(sigma*sigma)
    sigma ~ dunif(0,100)
    
  # likelihood
    for(i in 1:N)
    {
      y[i] ~ dnorm(mu[i], tau)
      mu[i] <- mu.f + delta*male[i]
    }

  # derived quantity
    mu.m <- mu.f + delta
}    
", file="t-test2.bug")

The MCMC sampling done by jags() function:

model.fit <- jags(data=falcon.data.2, 
               model.file="t-test2.bug",
               parameters.to.save=c("mu.f", "mu.m", "sigma", "delta"),
               n.chains=3,
               n.iter=2000,
               n.burnin=1000,
               DIC=FALSE)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 100
##    Unobserved stochastic nodes: 3
##    Total graph size: 218
## 
## Initializing model

And we can explore the posterior distributions:

plot(as.mcmc(model.fit))
model.fit