Intro to ML and Bayesian statistics for ecologists

Petr Keil

March 2017, iDiv

Preface

  • I am not a statistician.
  • I will show the basics, you figure out the rest.
  • Do ask questions and interrupt!

Preface

It would be wonderful if, after the course, you would:

  • Not be intimidated by Bayesian and ML papers.
  • Get the foundations and some useful connections between concepts to build on.
  • See statistics as a simple construction set (e.g. Lego), rather than as a series of recipes.
  • Have a statistical satori.

Contents

DAY 1

  • Likelihood, probability distributions
  • First Bayesian steps

DAY 2

  • First Bayesian steps
  • Classical models (regression, ANOVA)

DAY 3

  • Advanced models (mixed, latent variables)
  • Inference, uncertainty, model selection


Statistical models are stories about how the data came to be.


Parametric statistical modeling means describing a caricature of the “machine” that plausibly could have produced the numbers we observe.

Kéry 2010

Data

            x          y
1  -1.6902124 -2.8312840
2  -1.5927444 -2.1346018
3  -1.3144798 -3.5481984
4  -1.2741388 -0.6909243
5  -1.1868903 -3.0635968
6  -0.8540381 -1.5809843
7  -0.7117748 -0.5379842
8  -0.6501826  2.0892109
9  -0.3334035  2.9319640
10 -0.2988843  0.6457664
11  0.1374639  2.8685802
12  0.3842709  3.7274582
13  0.5925691  3.1164421
14  0.6984226  6.9814234
15  0.9002922  6.6296795
16  1.0339445  3.8036975
17  1.0944699  5.4047010
18  1.4270767  6.1245379
19  1.9464882  8.0623618
20  2.2952422  8.1494960

Data

[figure]

Data, model, parameters

[figure]

\( y_i \sim Normal(\mu_i, \sigma) \)

\( \mu_i = a + b \times x_i \)

Can you separate the deterministic and the stochastic part?
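As an illustrative sketch (not part of the original slides), the two-line model above can be simulated in R, which makes the deterministic and the stochastic parts explicit. The parameter values a, b, and sigma below are arbitrary assumptions, not estimates from the data shown earlier.

```r
# Simulate from:  y_i ~ Normal(mu_i, sigma),  mu_i = a + b * x_i
set.seed(1)
a <- 2; b <- 3; sigma <- 1.5           # hypothetical parameter values
x <- runif(20, min = -2, max = 2)      # predictor
mu <- a + b * x                        # deterministic part
y <- rnorm(20, mean = mu, sd = sigma)  # stochastic part
plot(x, y); abline(a, b)               # data with the deterministic line
```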

Data

[figure]

Data, model, parameters

[figure]

Can you separate the deterministic and the stochastic part?

\( x_i \sim Normal(\mu, \sigma) \)
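Again as an illustrative sketch (the values of \( \mu \) and \( \sigma \) are assumed): here the whole “machine” is a single normal distribution, with no deterministic structure beyond its two parameters.

```r
# Simulate from:  x_i ~ Normal(mu, sigma), with assumed mu = 0, sigma = 1
x <- rnorm(20, mean = 0, sd = 1)
hist(x)  # the model is purely stochastic; mu and sigma are its parameters
```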

Can you tell which of these are based on a parametric model?

  • Permutation tests
  • Normal distribution
  • Kruskal-Wallis test
  • Histogram
  • t-test
  • Neural networks, random forests
  • ANOVA
  • Survival analysis
  • Pearson correlation
  • PCA (principal components analysis)

Elementary notation

  • \( P(A) \) vs \( p(A) \) … Probability vs probability density
  • \( P(A \cap B) \) … Joint (intersection) probability (AND)
  • \( P(A \cup B) \) … Union probability (OR)
  • \( P(A|B) \) … Conditional probability (GIVEN THAT)
  • \( \sim \) … is distributed as
  • \( x \sim N(\mu, \sigma) \) … x is a normally distributed random variable
  • \( \propto \) … is proportional to (related by constant multiplication)
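The distinction between \( P(A) \) and \( p(A) \) can be seen directly in R's `p`/`d` function pairs (a small illustration, not part of the original slides):

```r
# P: cumulative probability, always in [0, 1]
pnorm(1, mean = 0, sd = 1)    # P(X <= 1) for X ~ N(0, 1); about 0.841
# p: probability density, which can exceed 1
dnorm(1, mean = 0, sd = 1)    # density of N(0, 1) at x = 1; about 0.242
dnorm(0, mean = 0, sd = 0.1)  # about 3.99 - densities are not probabilities
```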

Elementary notation

  • \( P(A) \) vs \( p(A) \)
  • \( P(A \cap B) \)
  • \( P(A \cup B) \)
  • \( P(A|B) \)
  • \( \sim \)
  • \( \propto \)

Data, model, parameters

Let's use \( y \) for data, and \( \theta \) for parameters.

\( p(\theta | y, model) \) or \( p(y | \theta, model) \)

The model is always given (assumed), and usually omitted:

\( p(y|\theta) \) … “likelihood-based” or “frequentist” statistics

\( p(\theta|y) \) … Bayesian statistics
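To make \( p(y|\theta) \) concrete, here is a small sketch (the data and the value of \( \theta \) are made up): for independent observations, the likelihood is the product of their densities under the assumed model.

```r
# Likelihood p(y | theta) of three hypothetical observations, where theta
# is the mean of a normal distribution with known sd = 1:
y <- c(1.2, 0.8, 1.5)
theta <- 1
prod(dnorm(y, mean = theta, sd = 1))             # the likelihood
sum(dnorm(y, mean = theta, sd = 1, log = TRUE))  # log-likelihood (more stable)
```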

Maximum Likelihood Estimation (MLE)

  • Used for most pre-packaged models (GLM, GLMM, GAM, …)
  • Great for complex models
  • Relies on optimization (relatively fast)
  • Can have problems with local optima
  • Not great with uncertainty
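A minimal MLE sketch, using simulated data (the “true” values a = 2, b = 3, sigma = 1.5 are invented for the example): `optim()` searches for the parameters that maximize the log-likelihood of the regression model shown earlier, which illustrates both the reliance on optimization and the risk of local optima.

```r
set.seed(42)
x <- runif(50, -2, 2)
y <- rnorm(50, mean = 2 + 3 * x, sd = 1.5)  # data from a known "truth"

# Negative log-likelihood of  y_i ~ Normal(a + b*x_i, sigma);
# sigma is optimized on the log scale so it stays positive.
negLL <- function(par) {
  mu <- par["a"] + par["b"] * x
  -sum(dnorm(y, mean = mu, sd = exp(par["logsigma"]), log = TRUE))
}

fit <- optim(c(a = 0, b = 0, logsigma = 0), negLL)
fit$par  # estimates; should land near a = 2, b = 3, logsigma = log(1.5)
```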

Why go Bayesian?

  • Numerically tractable for models of any complexity.
  • Unbiased for small sample sizes.
  • It works with uncertainty.
  • Extremely simple inference.
  • The option of using prior information.
  • It gives perspective.

The pitfalls

  • Steep learning curve.
  • Tedious at many levels.
  • You will have to learn some programming.
  • It can be computationally intensive, slow.
  • Problematic model selection.
  • Not an exploratory analysis or data mining tool.

To be thrown away

  • Null hypotheses formulation and testing
  • P-values, significance at \( \alpha=0.05 \), …
  • Degrees of freedom, test statistics
  • Post-hoc comparisons
  • Sample size corrections

Remains

  • Regression, t-test, ANOVA, ANCOVA, MANOVA
  • Generalized Linear Models (GLM)
  • GAM, GLS, autoregressive models
  • Mixed-effects (multilevel, hierarchical) models

Are hierarchical models always Bayesian?

  • No

Myths about Bayes

  • It is a 'subjective' statistics.
  • The main reason to go Bayesian is to use the Priors.
  • Bayesian statistics is heavy on equations.

Elementary notation

  • \( P(A) \) vs \( p(A) \)
  • \( P(A \cap B) \)
  • \( P(A \cup B) \)
  • \( P(A|B) \)
  • \( \sim \)
  • \( \propto \)

Indexing in R and BUGS: 1 dimension

  x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
  x
[1] 2.3 4.7 2.1 1.8 0.2
  x[3] 
[1] 2.1

Indexing in R and BUGS: 2 dimensions

  X <- matrix(c(2.3, 4.7, 2.1, 1.8), 
              nrow=2, ncol=2)
  X
     [,1] [,2]
[1,]  2.3  2.1
[2,]  4.7  1.8
  X[2,1] 
[1] 4.7

Lists in R

  x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
  N <- 5
  data <- list(x=x, N=N)
  data
$x
[1] 2.3 4.7 2.1 1.8 0.2

$N
[1] 5
  data$x # indexing by name
[1] 2.3 4.7 2.1 1.8 0.2

For loops in R (and BUGS)

for (i in 1:5)
{
  statement <- paste("Iteration", i)
  print(statement)
}
[1] "Iteration 1"
[1] "Iteration 2"
[1] "Iteration 3"
[1] "Iteration 4"
[1] "Iteration 5"