Doing it Bayesian

Petr Keil, Jan Smycka
January 2016

Conditional probability

The rule for joint (AND) probability is

\[ P(A \cap B) = P(A) \times P(B|A) \]

\( A \) and \( B \) can be swapped arbitrarily

\[ P(A \cap B) = P(B) \times P(A|B) \]

and so

\[ P(B) \times P(A|B) = P(A) \times P(B|A) \]

which we can rearrange to get

\[ P(A|B) = \frac{P(A) \times P(B|A)}{P(B)} \]

which is the Bayes rule.
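As a quick sanity check of the algebra, here is a sketch in R using a made-up disease-testing scenario (all numbers are hypothetical):

```r
# A = "patient has the disease", B = "test is positive"
p_A      <- 0.01   # prior P(A)
p_B_A    <- 0.95   # P(B|A), test sensitivity
p_B_notA <- 0.05   # P(B|not A), false-positive rate

# P(B) by the law of total probability
p_B <- p_A * p_B_A + (1 - p_A) * p_B_notA

# Bayes rule: P(A|B) = P(A) * P(B|A) / P(B)
p_A_B <- p_A * p_B_A / p_B
p_A_B  # ~0.16: even after a positive test, the disease stays fairly unlikely
```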

Bayes rule in statistics

We can replace \( A \) and \( B \) by model parameters \( \theta \) and the data \( y \) to get

\( p(\theta|y) = \frac {p(\theta) \times p(y|\theta)}{p(y)} \)

where

\( p(y|\theta) \) … likelihood

\( p(\theta) \) … prior

\( p(\theta|y) \) … posterior

\( p(y) \) … the horrible thing

Why is p(y) horrible?

For discrete \( \theta \):

\[ p(y)=\sum_\theta p(\theta) \times p(y|\theta) \]

For continuous \( \theta \):

\[ p(y)=\int_\theta p(\theta) \times p(y|\theta) \, d\theta \]

Either way, the sum or integral runs over all possible values of \( \theta \), which for models with many parameters is rarely tractable.
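With a single parameter the integral can still be done numerically. A minimal sketch in R, assuming a made-up N(0, 10) prior on a mean \( \theta \) and one observation \( y = 1.5 \) from N(\( \theta \), 1):

```r
y <- 1.5  # a single made-up observation

# prior(theta) * likelihood(theta)
unnorm <- function(theta) dnorm(theta, 0, 10) * dnorm(y, theta, 1)

# p(y) = integral of prior * likelihood over the whole parameter space
p_y <- integrate(unnorm, lower = -Inf, upper = Inf)$value
p_y
```

With \( d \) parameters, however, \( p(y) \) becomes a \( d \)-dimensional integral, and that is where the horrible thing earns its name.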

Avoiding the horrible thing

In most cases we can't calculate \( p(y) \). But because \( p(y) \) is the same for every value of \( \theta \), it cancels when we take the ratio of \( p(\theta_1|y) \) and \( p(\theta_2|y) \):

\[ \frac{p(\theta_1|y) }{ p(\theta_2|y)}=\frac{p(\theta_1) \times p(y|\theta_1)}{p(\theta_2) \times p(y|\theta_2)} = \alpha \]

We can also say that

\[ p(\theta|y) \propto p(\theta) \times p(y|\theta) \]
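In practice this proportionality is enough to map out the shape of the posterior. A hypothetical sketch in R, evaluating prior × likelihood on a grid for the mean of a Normal with known \( \sigma = 1 \) (data and prior are made up):

```r
y <- c(4.2, 5.1, 3.8, 4.9)              # made-up data
theta <- seq(0, 10, length.out = 1000)  # grid of candidate means

prior <- dnorm(theta, 5, 5)             # assumed N(5, 5) prior on the mean
lik   <- sapply(theta, function(m) prod(dnorm(y, m, 1)))

post <- prior * lik                     # proportional to the posterior
post <- post / sum(post)                # normalized by the grid sum, never by p(y)
plot(theta, post, type = "l")
```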

Sampling from the posterior

We can use the ratio \( \frac{p(\theta_1|y) }{ p(\theta_2|y)} \) to sample from the posterior distribution by a numerical sampling algorithm called Markov Chain Monte Carlo (MCMC).

  • Metropolis-Hastings algorithm – see YouTube video, time 1:00 onwards; a minimal R sketch follows below.
  • Gibbs algorithm
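A minimal Metropolis-Hastings sketch in R, estimating the mean \( \mu \) of a Normal distribution with known \( \sigma = 1 \) (the data, prior, and tuning values are made up). Note that the accept/reject step only ever uses the ratio of unnormalized posteriors:

```r
set.seed(1)
y <- rnorm(50, mean = 3, sd = 1)    # simulated data with true mu = 3

# log of prior(mu) * likelihood(mu); sigma fixed at 1 for simplicity
log_unnorm_post <- function(mu) {
  dnorm(mu, 0, 10, log = TRUE) + sum(dnorm(y, mu, 1, log = TRUE))
}

n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0                       # arbitrary starting value

for (i in 2:n_iter) {
  proposal <- rnorm(1, chain[i - 1], 0.5)  # symmetric random-walk proposal
  # log of the ratio p(proposal|y) / p(current|y); p(y) has cancelled out
  log_alpha <- log_unnorm_post(proposal) - log_unnorm_post(chain[i - 1])
  if (log(runif(1)) < log_alpha) {
    chain[i] <- proposal            # accept the move
  } else {
    chain[i] <- chain[i - 1]        # reject, stay put
  }
}

hist(chain[-(1:1000)])              # posterior sample of mu, burn-in dropped
```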

Sampling from the posterior

An example of Metropolis-Hastings MCMC estimation of the mean (\( \mu \)) and standard deviation (\( \sigma \)) of a Normal distribution (by M. Joseph; R code here). [figure: MCMC illustration]

Sampling from the posterior

The result of three Markov chains running on the 3D Rosenbrock function using the Metropolis-Hastings algorithm (source: Wikipedia article). [figure]

To summarize, we only need

\[ p(\theta|y) \propto p(\theta) \times p(y|\theta) \]

and not

\[ p(\theta|y) = \frac {p(\theta) \times p(y|\theta)}{p(y)} \]

Common MCMC samplers

  • WinBUGS, OpenBUGS – www.openbugs.net – uses the BUGS language
  • JAGS – mcmc-jags.sourceforge.net/ – uses the BUGS language
  • STAN – mc-stan.org
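To give a flavour of driving one of these samplers, here is a hypothetical sketch using the rjags package from R (assuming JAGS and rjags are installed); the model block itself is written in the BUGS language:

```r
library(rjags)

# BUGS-language model: Normal data, vague priors on mu and the precision tau
model_string <- "
model {
  for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
  mu ~ dnorm(0, 0.001)        # vague prior on the mean
  tau ~ dgamma(0.001, 0.001)  # vague prior on the precision
  sigma <- 1 / sqrt(tau)      # derived standard deviation
}"

y <- rnorm(50, mean = 3, sd = 1)    # made-up data
jm <- jags.model(textConnection(model_string),
                 data = list(y = y, N = length(y)),
                 n.chains = 3)
out <- coda.samples(jm, variable.names = c("mu", "sigma"), n.iter = 5000)
summary(out)
```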

STAN - properties

mc-stan.org

  • uses more sophisticated MCMC sampling (Hamiltonian Monte Carlo / NUTS) than Gibbs or M-H

  • suffers less from autocorrelation between successive steps

  • may be more efficient for complicated posteriors (cigars, camels)

  • its language is more difficult to command than BUGS

INLA

(Integrated Nested Laplace Approximation)

www.r-inla.org

Uses the idea that in certain types of models \( p(\theta_a|\theta_b, y) \) can be approximated by a normal distribution (note: \( \theta_a \) and \( \theta_b \) are two different parameters of the same model).

This makes the expression \( p(y|\theta) \times p(\theta) \) integrable.
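The core trick is the Laplace approximation: match a normal distribution to the posterior at its mode. A hypothetical one-parameter sketch in R (this illustrates the idea only, not INLA itself, which nests such approximations over many parameters; data and prior are made up):

```r
set.seed(1)
y <- rnorm(30, mean = 2, sd = 1)    # made-up data

# log of prior(mu) * likelihood(mu); sigma fixed at 1
log_unnorm_post <- function(mu) {
  dnorm(mu, 0, 10, log = TRUE) + sum(dnorm(y, mu, 1, log = TRUE))
}

# find the posterior mode and the curvature (Hessian) at the mode
fit <- optim(0, function(mu) -log_unnorm_post(mu),
             method = "BFGS", hessian = TRUE)
mode_mu   <- fit$par
sd_approx <- sqrt(1 / fit$hessian[1, 1])

# the posterior is approximated by N(mode_mu, sd_approx); this normal has
# a known normalizing constant, so the integral over mu becomes tractable
curve(dnorm(x, mode_mu, sd_approx),
      from = mode_mu - 3 * sd_approx, to = mode_mu + 3 * sd_approx)
```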


INLA - properties