Intro to ML and Bayesian statistics for ecologists

Petr Keil

March 2017, iDiv

Preface

  • I am not a statistician.
  • I will show the basics, you figure out the rest.
  • Do ask questions and interrupt!

Preface

It would be wonderful if, after the course, you would:

  • Not be intimidated by Bayesian and ML papers.
  • Get the foundations and some useful connections between concepts to build on.
  • See statistics as a simple construction set (e.g. Lego), rather than as a series of recipes.
  • Have a statistical satori.

Contents

DAY 1

  • Likelihood, probability distributions
  • First Bayesian steps

DAY 2

  • First Bayesian steps
  • Classical models (regression, ANOVA)

DAY 3

  • Advanced models (mixed, latent variables)
  • Inference, uncertainty, model selection


Statistical models are stories about how the data came to be.


Parametric statistical modeling means describing a caricature of the “machine” that plausibly could have produced the numbers we observe.

Kéry 2010

Data

            x          y
1  -1.6902124 -2.8312840
2  -1.5927444 -2.1346018
3  -1.3144798 -3.5481984
4  -1.2741388 -0.6909243
5  -1.1868903 -3.0635968
6  -0.8540381 -1.5809843
7  -0.7117748 -0.5379842
8  -0.6501826  2.0892109
9  -0.3334035  2.9319640
10 -0.2988843  0.6457664
11  0.1374639  2.8685802
12  0.3842709  3.7274582
13  0.5925691  3.1164421
14  0.6984226  6.9814234
15  0.9002922  6.6296795
16  1.0339445  3.8036975
17  1.0944699  5.4047010
18  1.4270767  6.1245379
19  1.9464882  8.0623618
20  2.2952422  8.1494960

Data

[figure]

Data, model, parameters

[figure]

\( y_i \sim Normal(\mu_i, \sigma) \)

\( \mu_i = a + b \times x_i \)

Can you separate the deterministic and the stochastic part?
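As an illustrative sketch (not part of the original slides), the two-line model above can be simulated in R, which makes the deterministic and the stochastic parts explicit. The parameter values a, b, and sigma below are arbitrary assumptions, not estimates from the data shown earlier.

```r
# Simulate from:  y_i ~ Normal(mu_i, sigma),  mu_i = a + b * x_i
set.seed(1)
a <- 2; b <- 3; sigma <- 1.5           # hypothetical parameter values
x <- runif(20, min = -2, max = 2)      # predictor
mu <- a + b * x                        # deterministic part
y <- rnorm(20, mean = mu, sd = sigma)  # stochastic part
plot(x, y); abline(a, b)               # data with the deterministic line
```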

Data

[figure]

Data, model, parameters

[figure]

Can you separate the deterministic and the stochastic part?

\( x_i \sim Normal(\mu, \sigma) \)
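Again as an illustrative sketch (the values of \( \mu \) and \( \sigma \) are assumed): here the whole “machine” is a single normal distribution, with no deterministic structure beyond its two parameters.

```r
# Simulate from:  x_i ~ Normal(mu, sigma), with assumed mu = 0, sigma = 1
x <- rnorm(20, mean = 0, sd = 1)
hist(x)  # the model is purely stochastic; mu and sigma are its parameters
```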

Can you tell which of these are based on a parametric model?

  • Permutation tests
  • Normal distribution
  • Kruskal-Wallis test
  • Histogram
  • t-test
  • Neural networks, random forests
  • ANOVA
  • Survival analysis
  • Pearson correlation
  • PCA (principal components analysis)

Elementary notation

  • \( P(A) \) vs \( p(A) \) … Probability vs probability density
  • \( P(A \cap B) \) … Joint (intersection) probability (AND)
  • \( P(A \cup B) \) … Union probability (OR)
  • \( P(A|B) \) … Conditional probability (GIVEN THAT)
  • \( \sim \) … is distributed as
  • \( x \sim N(\mu, \sigma) \) … x is a normally distributed random variable
  • \( \propto \) … is proportional to (related by constant multiplication)
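The distinction between \( P(A) \) and \( p(A) \) can be seen directly in R's `p`/`d` function pairs (a small illustration, not part of the original slides):

```r
# P: cumulative probability, always in [0, 1]
pnorm(1, mean = 0, sd = 1)    # P(X <= 1) for X ~ N(0, 1); about 0.841
# p: probability density, which can exceed 1
dnorm(1, mean = 0, sd = 1)    # density of N(0, 1) at x = 1; about 0.242
dnorm(0, mean = 0, sd = 0.1)  # about 3.99 - densities are not probabilities
```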

Elementary notation

  • \( P(A) \) vs \( p(A) \)
  • \( P(A \cap B) \)
  • \( P(A \cup B) \)
  • \( P(A|B) \)
  • \( \sim \)
  • \( \propto \)

Data, model, parameters

Let's use \( y \) for data, and \( \theta \) for parameters.

\( p(\theta | y, model) \) or \( p(y | \theta, model) \)

The model is always given (assumed), and usually omitted:

\( p(y|\theta) \) … “likelihood-based” or “frequentist” statistics

\( p(\theta|y) \) … Bayesian statistics
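To make \( p(y|\theta) \) concrete, here is a small sketch (the data and the value of \( \theta \) are made up): for independent observations, the likelihood is the product of their densities under the assumed model.

```r
# Likelihood p(y | theta) of three hypothetical observations, where theta
# is the mean of a normal distribution with known sd = 1:
y <- c(1.2, 0.8, 1.5)
theta <- 1
prod(dnorm(y, mean = theta, sd = 1))             # the likelihood
sum(dnorm(y, mean = theta, sd = 1, log = TRUE))  # log-likelihood (more stable)
```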

Maximum Likelihood Estimation (MLE)

  • Used for most pre-packaged models (GLM, GLMM, GAM, …)
  • Great for complex models
  • Relies on optimization (relatively fast)
  • Can have problems with local optima
  • Not great with uncertainty
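A minimal MLE sketch, using simulated data (the “true” values a = 2, b = 3, sigma = 1.5 are invented for the example): `optim()` searches for the parameters that maximize the log-likelihood of the regression model shown earlier, which illustrates both the reliance on optimization and the risk of local optima.

```r
set.seed(42)
x <- runif(50, -2, 2)
y <- rnorm(50, mean = 2 + 3 * x, sd = 1.5)  # data from a known "truth"

# Negative log-likelihood of  y_i ~ Normal(a + b*x_i, sigma);
# sigma is optimized on the log scale so it stays positive.
negLL <- function(par) {
  mu <- par["a"] + par["b"] * x
  -sum(dnorm(y, mean = mu, sd = exp(par["logsigma"]), log = TRUE))
}

fit <- optim(c(a = 0, b = 0, logsigma = 0), negLL)
fit$par  # estimates; should land near a = 2, b = 3, logsigma = log(1.5)
```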

Why go Bayesian?

  • Numerically tractable for models of any complexity.
  • Unbiased for small sample sizes.
  • It works with uncertainty.
  • Extremely simple inference.
  • The option of using prior information.
  • It gives perspective.

The pitfalls

  • Steep learning curve.
  • Tedious at many levels.
  • You will have to learn some programming.
  • It can be computationally intensive, slow.
  • Problematic model selection.
  • Not an exploratory analysis or data mining tool.

To be thrown away

  • Null hypotheses formulation and testing
  • P-values, significance at \( \alpha=0.05 \), …
  • Degrees of freedom, test statistics
  • Post-hoc comparisons
  • Sample size corrections

Remains

  • Regression, t-test, ANOVA, ANCOVA, MANOVA
  • Generalized Linear Models (GLM)
  • GAM, GLS, autoregressive models
  • Mixed-effects (multilevel, hierarchical) models

Are hierarchical models always Bayesian?

  • No

Myths about Bayes

  • It is a 'subjective' statistics.
  • The main reason to go Bayesian is to use the Priors.
  • Bayesian statistics is heavy on equations.

Elementary notation

  • \( P(A) \) vs \( p(A) \)
  • \( P(A \cap B) \)
  • \( P(A \cup B) \)
  • \( P(A|B) \)
  • \( \sim \)
  • \( \propto \)

Indexing in R and BUGS: 1 dimension

  x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
  x
[1] 2.3 4.7 2.1 1.8 0.2
  x[3] 
[1] 2.1

Indexing in R and BUGS: 2 dimensions

  X <- matrix(c(2.3, 4.7, 2.1, 1.8), 
              nrow=2, ncol=2)
  X
     [,1] [,2]
[1,]  2.3  2.1
[2,]  4.7  1.8
  X[2,1] 
[1] 4.7

Lists in R

  x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
  N <- 5
  data <- list(x=x, N=N)
  data
$x
[1] 2.3 4.7 2.1 1.8 0.2

$N
[1] 5
  data$x # indexing by name
[1] 2.3 4.7 2.1 1.8 0.2

For loops in R (and BUGS)

for (i in 1:5)
{
  statement <- paste("Iteration", i)
  print(statement)
}
[1] "Iteration 1"
[1] "Iteration 2"
[1] "Iteration 3"
[1] "Iteration 4"
[1] "Iteration 5"