In this lesson we will implement a Bayesian version of the classical T-test. We will also explore some ways to summarize the JAGS output, and we will introduce the concept of derived quantity.
We will use the example from Marc Kery’s Introduction to WinBUGS for Ecologists, page 92 (Section 7.1 - t-test). The data describe wingspan of male and female Peregrine falcon (Falco peregrinus).
Let’s load the data and have a look at the data:
falcon <- read.csv("http://www.petrkeil.com/wp-content/uploads/2014/02/falcon.csv")
summary(falcon)
boxplot(wingspan ~ male, data=falcon,
names=c("Female", "Male"),
ylab="Wingspan [cm]",
col="grey")
We can use the classical two-sample t.test()
:
x <- falcon$wingspan[falcon$male==1]
y <- falcon$wingspan[falcon$male==0]
t.test(x, y)
##
## Welch Two Sample t-test
##
## data: x and y
## t = -15.605, df = 82.954, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.14053 -22.55197
## sample estimates:
## mean of x mean of y
## 79.85925 105.70550
Note: this can also be done by lm()
:
lm1 <- lm(wingspan ~ male, data=falcon)
summary(lm1)
… or by glm()
:
glm1 <- glm(wingspan ~ male, data=falcon)
summary(glm1)
We assume the each males and females each have their own Normal distribution, from which the wingspans are drawn:
\(y_m \sim Normal(\mu_m, \sigma)\)
\(y_f \sim Normal(\mu_f, \sigma)\)
Note that the variance (\(\sigma\)) is the same in both groups.
This is the hypothesis that we usually test in the frequentist setting: \(\delta = \mu_f - \mu_m \neq 0\)
But we can actually ask even more directly: What is the mean difference (\(\delta\)) between female and male wingspan?
Here is how we prepare the data for JAGS:
y.male <- falcon$wingspan[falcon$male==1]
y.female <- falcon$wingspan[falcon$male==0]
falcon.data.1 <- list(y.f=y.female,
N.f=60,
y.m=y.male,
N.m=40)
Loading the necessary library:
library(R2jags)
Definition of the model:
cat("
model
{
# priors
mu.f ~ dnorm(0, 0.001) # Note: tau = 1/variance
mu.m ~ dnorm(0, 0.001)
tau <- 1/(sigma*sigma)
sigma ~ dunif(0,100)
# likelihood - Females
for(i in 1:N.f)
{
y.f[i] ~ dnorm(mu.f, tau)
}
# likelihood - Males
for(j in 1:N.m)
{
y.m[j] ~ dnorm(mu.m, tau)
}
# derived quantity:
delta <- mu.f - mu.m
}
", file="t-test.bug")
The MCMC sampling done by jags()
function:
model.fit <- jags(data=falcon.data.1,
model.file="t-test.bug",
parameters.to.save=c("mu.f", "mu.m", "sigma", "delta"),
n.chains=3,
n.iter=2000,
n.burnin=1000,
DIC=FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 100
## Unobserved stochastic nodes: 3
## Total graph size: 115
##
## Initializing model
And we can explore the posterior distributions:
plot(as.mcmc(model.fit))
model.fit
Alternativelly, you can also specify the model in a more conventional way:
\(\mu_i = \mu_f + \delta \times male_i\)
\(y_i \sim Normal(\mu_i, \sigma)\)
Preparing the data for JAGS is somewhat different than above:
falcon.data.2 <- list(y=falcon$wingspan,
male=falcon$male,
N=100)
Definition of the model:
cat("
model
{
# priors
mu.f ~ dnorm(0, 0.001)
delta ~ dnorm(0, 0.001)
tau <- 1/(sigma*sigma)
sigma ~ dunif(0,100)
# likelihood
for(i in 1:N)
{
y[i] ~ dnorm(mu[i], tau)
mu[i] <- mu.f + delta*male[i]
}
# derived quantity
mu.m <- mu.f + delta
}
", file="t-test2.bug")
The MCMC sampling done by jags()
function:
model.fit <- jags(data=falcon.data.2,
model.file="t-test2.bug",
parameters.to.save=c("mu.f", "mu.m", "sigma", "delta"),
n.chains=3,
n.iter=2000,
n.burnin=1000,
DIC=FALSE)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 100
## Unobserved stochastic nodes: 3
## Total graph size: 218
##
## Initializing model
And we can explore the posterior distributions:
plot(as.mcmc(model.fit))
model.fit