- Understand the code and the output from a JAGS model
- Be able to write and run JAGS models for linear and logistic regression
- Be able to change the prior distributions in a JAGS model
- Be comfortable with plotting and manipulating the output from JAGS
R
package R2jags
There are several alternative ways of computing posterior distributions in R
other than JAGS:
jags
function to run the model and store the results in an object#
for comment, <-
for assignment, and ~
for distributionsExample:
model_code = ' model { # Likelihood y ~ dnorm(theta, 1) # Prior theta ~ dnorm(0, 1) } '
Example:
model_data = list(y = 1)
Example:
model_parameters = c('theta')
library(R2jags) model_run = jags(data = model_data, parameters.to.save = model_parameters, model.file = textConnection(model_code))
model_run
now stores the posterior distribution (and a load of other things)hist(model_run$BUGSoutput$sims.list$theta, breaks = 30)
The usual way to run JAGS is to set some small additional options:
If the algorithm has worked properly, we should see something like this for each parameter:
traceplot(model_run, varname = 'theta')
Aside from the trace plot, the R2jags
package also creates something called the Brooks-Gelman-Rubin (BGR) diagnostic which measures whether the mean/variance of each chain matches. It produces a value known as the R-hat value which should be close to 1 if we are sampling from the posterior distribution
print(model_run)
## Inference for Bugs model at "5", fit using jags, ## 3 chains, each with 2000 iterations (first 1000 discarded) ## n.sims = 3000 iterations saved ## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff ## theta 0.500 0.701 -0.891 0.037 0.511 0.958 1.907 1.001 2400 ## deviance 2.579 1.014 1.839 1.919 2.176 2.842 5.412 1.004 830 ## ## For each parameter, n.eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor (at convergence, Rhat=1). ## ## DIC info (using the rule, pD = var(deviance)/2) ## pD = 0.5 and DIC = 3.1 ## DIC is an estimate of expected predictive error (lower deviance is better).
There are other useful things in this output, some of which we will cover later
Consider a simple linear regression model with one response variable \(y_t\), \(t=1,\ldots,T\) and one explanatory variable \(x_t\). The usual linear regression model is written as:\[y_t = \alpha + \beta x_t + \epsilon_t;\; \epsilon_t \sim N(0,\sigma^2)\]This can be written in likelihood format as:\[y_t \sim N( \alpha + \beta x_t, \sigma^2)\]so we have three parameters, the intercept \(\alpha\), the slope \(\beta\), and the residual standard deviation \(\sigma\). This latter parameter must be positive
In the absence of further information, we might use the priors \(\alpha \sim N(0,10^2)\), \(\beta \sim N(0, 10^2)\), and \(\sigma \sim U(0, 10)\)
model_code = ' model { # Likelihood for (t in 1:T) { y[t] ~ dnorm(alpha + beta * x[t], tau) } # Priors alpha ~ dnorm(0, 0.01) # = N(0,10^2) beta ~ dnorm(0, 0.01) tau <- 1/pow(sigma, 2) # Turn precision into standard deviation sigma ~ dunif(0, 10) } '
See the file jags_linear_regression.R
for more on this example
sea_level = read.csv('https://raw.githubusercontent.com/andrewcparnell/tsme_course/master/data/church_and_white_global_tide_gauge.csv') with(sea_level,plot(year_AD,sea_level_m))
real_data = with(sea_level, list(T = nrow(sea_level), y = sea_level_m, x = year_AD)) model_parameters = c("alpha", "beta", "sigma") real_data_run = jags(data = real_data, parameters.to.save = model_parameters, model.file=textConnection(model_code), n.chains=4, # 4 chains n.iter=1000, # 1000 iterations n.burnin=200, # Discard first 200 n.thin=2) # Take every other iteration
print(real_data_run)
## Inference for Bugs model at "6", fit using jags, ## 4 chains, each with 1000 iterations (first 200 discarded), n.thin = 2 ## n.sims = 1600 iterations saved ## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% ## alpha -3.062 0.040 -3.141 -3.089 -3.062 -3.034 -2.981 ## beta 0.002 0.000 0.001 0.002 0.002 0.002 0.002 ## sigma 0.009 0.001 0.008 0.008 0.009 0.009 0.010 ## deviance -864.880 2.346 -867.680 -866.622 -865.507 -863.761 -858.956 ## Rhat n.eff ## alpha 1.003 1000 ## beta 1.003 1000 ## sigma 1.001 1600 ## deviance 1.001 1600 ## ## For each parameter, n.eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor (at convergence, Rhat=1). ## ## DIC info (using the rule, pD = var(deviance)/2) ## pD = 2.8 and DIC = -862.1 ## DIC is an estimate of expected predictive error (lower deviance is better).
alpha_mean = mean(real_data_run$BUGSoutput$sims.list$alpha) beta_mean = mean(real_data_run$BUGSoutput$sims.list$beta) with(sea_level,plot(year_AD,sea_level_m)) with(sea_level, lines(year_AD, alpha_mean + beta_mean * year_AD, col = 'red')) legend('topleft', legend = c('Data', 'Posterior mean'), lty=c(-1,1), pch=c(1,-1), col=c('black','red'))
Calculate residuals and create a QQ-plot
res = with(sea_level, sea_level_m - alpha_mean - beta_mean * year_AD) qqnorm(res) qqline(res)
jags_logistic_regression.R
has two explanatory variables, so:\[\mbox{logit}(p_t) = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t}\]model_code = ' model { # Likelihood for (t in 1:T) { y[t] ~ dbin(p[t], K) logit(p[t]) <- alpha + beta_1 * x_1[t] + beta_2 * x_2[t] } # Priors alpha ~ dnorm(0.0,0.01) beta_1 ~ dnorm(0.0,0.01) beta_2 ~ dnorm(0.0,0.01) } '
T = 12 K = 20 y = c(1,4,9,13,18,20, 0,2,6,10,12,16) sex = c(rep('male',6), rep('female',6)) dose = rep(0:5, 2) sexcode = as.integer(sex == 'male') real_data = list(T = T, K = K, y = y, x_1 = sexcode, x_2 = dose) model_parameters = c("alpha", "beta_1", "beta_2") real_data_run = jags(data = real_data, parameters.to.save = model_parameters, model.file = textConnection(model_code), n.chains = 4, n.iter = 1000, n.burnin = 200, n.thin = 2)
print(real_data_run)
## Inference for Bugs model at "7", fit using jags, ## 4 chains, each with 1000 iterations (first 200 discarded), n.thin = 2 ## n.sims = 1600 iterations saved ## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff ## alpha -3.421 0.539 -4.372 -3.769 -3.434 -3.149 -2.325 1.009 330 ## beta_1 1.065 0.366 0.332 0.834 1.071 1.308 1.786 1.010 250 ## beta_2 1.053 0.151 0.760 0.973 1.060 1.150 1.327 1.008 390 ## deviance 40.524 6.000 37.071 38.112 39.336 41.216 49.209 1.013 230 ## ## For each parameter, n.eff is a crude measure of effective sample size, ## and Rhat is the potential scale reduction factor (at convergence, Rhat=1). ## ## DIC info (using the rule, pD = var(deviance)/2) ## pD = 17.9 and DIC = 58.4 ## DIC is an estimate of expected predictive error (lower deviance is better).
par(mfrow=c(1,3)) hist(real_data_run$BUGSoutput$sims.list$alpha, breaks = 30) hist(real_data_run$BUGSoutput$sims.list$beta_1, breaks = 30) hist(real_data_run$BUGSoutput$sims.list$beta_2, breaks = 30)
par(mfrow=c(1,1))
jags_linear_regression.R
and jags_logistic_regression.R
files