Petr Keil
January 2016
DAY 1
DAY 2
DAY 3
Statistical models are stories about how the data came to be.
Parametric statistical modeling means describing a caricature of the “machine” that plausibly could have produced the numbers we observe.
Kéry 2010
x y
1 -3.36036438 -8.2255986
2 -1.67415781 -3.4254527
3 -1.28843583 -1.6274842
4 -1.12398265 -1.6607849
5 -0.79909384 -0.3644841
6 -0.66600733 0.3086723
7 -0.59372833 0.1873957
8 -0.59034664 0.8194618
9 -0.58946921 0.8746883
10 -0.11633386 0.9574701
11 -0.05839936 1.5326324
12 -0.04541250 1.4010884
13 0.01851941 1.7617596
14 0.17032248 2.7476527
15 0.47701840 3.4969438
16 0.91837569 3.3880732
17 1.12012158 5.7335285
18 1.15159940 5.5316939
19 1.29460706 6.3070198
20 1.42823681 5.5571250
\( y_i \sim Normal(\mu_i, \sigma) \)
\( \mu_i = a + b \times x_i \)
Can you separate the deterministic and the stochastic part?
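One way to see the two parts is to simulate data from the model above. This is a minimal sketch: the parameter values (a, b, sigma) are illustrative, not the ones behind the data shown earlier.

```r
set.seed(1)
a <- 1; b <- 4; sigma <- 1        # illustrative parameter values
x <- sort(rnorm(20))              # predictor

mu <- a + b * x                   # deterministic part: mu_i = a + b * x_i
y <- rnorm(20, mean = mu, sd = sigma)  # stochastic part: y_i ~ Normal(mu_i, sigma)

plot(x, y)
abline(a, b)                      # the deterministic line through the noise
```

Everything up to `mu` is deterministic; the single call to `rnorm()` is where the randomness enters.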
\( x_i \sim Normal(\mu, \sigma) \)
Let's use \( y \) for data, and \( \theta \) for parameters.
\( p(\theta | y, model) \) or \( p(y | \theta, model) \)
The model is always given (assumed), and usually omitted:
\( p(y|\theta) \) … “likelihood-based” or “frequentist” statistics
\( p(\theta|y) \) … Bayesian statistics
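The likelihood \( p(y|\theta) \) can be evaluated directly in R. A minimal sketch for Normal data, with illustrative data and candidate parameter values (for independent observations the likelihood is a product of densities, so we sum the log densities):

```r
y <- c(2.3, 4.7, 2.1, 1.8, 0.2)   # illustrative data

# log of p(y | theta) for theta = (mu, sigma), assuming y_i ~ Normal(mu, sigma)
log.lik <- function(mu, sigma) sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))

log.lik(mu = 2, sigma = 1.5)      # log-likelihood at one candidate theta
```

Likelihood-based statistics maximizes this quantity over \( \theta \); Bayesian statistics combines it with a prior to get \( p(\theta|y) \).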
x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
x
[1] 2.3 4.7 2.1 1.8 0.2
x[3]
[1] 2.1
X <- matrix(c(2.3, 4.7, 2.1, 1.8),
nrow=2, ncol=2)
X
[,1] [,2]
[1,] 2.3 2.1
[2,] 4.7 1.8
X[2,1]
[1] 4.7
x <- c(2.3, 4.7, 2.1, 1.8, 0.2)
N <- 5
data <- list(x=x, N=N)
data
$x
[1] 2.3 4.7 2.1 1.8 0.2
$N
[1] 5
data$x # indexing by name
[1] 2.3 4.7 2.1 1.8 0.2
for (i in 1:5)
{
statement <- paste("Iteration", i)
print(statement)
}
[1] "Iteration 1"
[1] "Iteration 2"
[1] "Iteration 3"
[1] "Iteration 4"
[1] "Iteration 5"