This notebook demonstrates a simple model we trust. In this case, cross-validation is not needed, and we can get better accuracy using the explicit model.

Load libraries

```
library(tidyr)
library(rstanarm)
library(rstan)
options(mc.cores = parallel::detectCores())
library(loo)
library(shinystan)
library(ggplot2)
library(ggridges)
```

An experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients (this example is used also in BDA3 chapter 3). A group of patients were randomly assigned to treatment and control groups:

- out of 674 patients receiving the control, 39 died
- out of 680 receiving the treatment, 22 died

The data, where `grp2` is a dummy variable whose coefficient captures the difference between the intercepts of the first and the second group:

`d_bin2 <- data.frame(N = c(674, 680), y = c(39,22), grp2 = c(0,1))`

To analyse whether the treatment is useful, we can use a binomial model for both groups and compute the odds ratio.

```
fit_bin2 <- stan_glm(y/N ~ grp2, family = binomial(), data = d_bin2,
                     weights = N, refresh = 0)
```

Plot odds ratio

```
samples_bin2 <- rstan::extract(fit_bin2$stanfit)
theta1 <- plogis(samples_bin2$alpha)
theta2 <- plogis(samples_bin2$alpha + samples_bin2$beta)
oddsratio <- (theta2/(1-theta2))/(theta1/(1-theta1))
ggplot() + geom_histogram(aes(oddsratio), bins = 50, fill = 'darkblue', color = 'black') +
  labs(y = '') + scale_y_continuous(breaks = NULL)
```

The probability that the odds ratio is less than 1:

`print(mean(oddsratio<1),2)`

`[1] 0.99`

The posterior distribution of the odds ratio (or some transformation of it) is the simplest and most accurate way to analyse the effectiveness of the treatment. In this case, there is a high probability that the treatment is effective.

Sometimes it is better to focus on the observable space (we can’t observe \(\theta\) or the odds ratio directly, but we can observe \(y\)). In leave-one-out cross-validation, the model is fitted \(n\) times, each time leaving one observation out of the fitting and using it to evaluate the predictive performance. This corresponds to using the already seen observations as pseudo Monte Carlo samples from the future data distribution, with the leave-one-out trick used to avoid double use of the data. With the often-used log score we get \[\mathrm{LOO} = \frac{1}{n} \sum_{i=1}^n \log p(y_i|x_i,D_{-i},M_k).\]

Next we demonstrate one of the weaknesses of cross-validation (the same holds for WAIC etc.).

To use leave-one-out where “one” refers to an individual patient, we need to change the model formulation a bit. In the above model formulation, the individual observations have been aggregated to group observations, and running `loo(fit_bin2)` would try to leave out one whole group. If there were more groups, this could be what we want, but with just two groups it is unlikely. Thus, in the following we switch to a Bernoulli model with each individual as its own observation.

```
d_bin2 <- data.frame(N = 674 + 680,
                     y = c(rep(1, 39), rep(0, 674 - 39), rep(1, 22), rep(0, 680 - 22)),
                     grp2 = c(rep(0, 674), rep(1, 680)))
fit_bin2 <- stan_glm(y ~ grp2, family = binomial(), data = d_bin2,
                     seed = 180202538, refresh = 0)
```
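
To connect the `loo()` calls below to the LOO formula above, here is a brute-force sketch of the same computation (an illustration only: `loo_brute_force` is a hypothetical helper, refitting the model \(n\) times is far too slow for 1354 observations, and `loo()` instead approximates the same quantity with PSIS without any refitting).

```
# Brute-force LOO sketch (illustration only; do not run on all 1354 observations).
# loo() computes the same quantity with PSIS importance sampling, without refits.
loo_brute_force <- function(data) {
  n <- nrow(data)
  lpd <- numeric(n)
  for (i in seq_len(n)) {
    # refit the model without observation i
    fit_i <- stan_glm(y ~ grp2, family = binomial(), data = data[-i, ], refresh = 0)
    # posterior draws of theta_i for the held-out observation
    theta_i <- posterior_linpred(fit_i, newdata = data[i, , drop = FALSE],
                                 transform = TRUE)
    # log predictive density log p(y_i | x_i, D_{-i}, M_k)
    lpd[i] <- log(mean(dbinom(data$y[i], size = 1, prob = theta_i)))
  }
  # elpd_loo reported by loo() is this sum; the LOO formula above divides it by n
  sum(lpd)
}
```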

We also fit a “null” model, which doesn’t use the group variable and thus has a common parameter for both groups.

`fit_bin2null <- stan_glm(y ~ 1, family = binomial(), data = d_bin2, seed=180202538, refresh=0)`

We can then use cross-validation to compare whether adding the treatment variable improves predictive performance.

`(loo_bin2 <- loo(fit_bin2))`

```
Computed from 4000 by 1354 log-likelihood matrix

         Estimate   SE
elpd_loo   -248.1 23.4
p_loo         2.0  0.3
looic      496.3 46.7

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
```

`(loo_bin2null <- loo(fit_bin2null))`

```
Computed from 4000 by 1354 log-likelihood matrix

         Estimate   SE
elpd_loo   -249.7 23.4
p_loo         1.0  0.1
looic      499.4 46.7

All Pareto k estimates are good (k < 0.5)
See help('pareto-k-diagnostic') for details.
```

All k<0.5, so we can trust the PSIS-LOO result.

Let’s make a pairwise comparison.

`compare(loo_bin2null,loo_bin2)`

```
elpd_diff        se
      1.5       2.3
```

elpd_diff is small compared to its se, so cross-validation is uncertain about whether there would be a significant improvement in predictive performance! To put this in perspective, we have N1=674 and N2=680 patients, with 5.8% and 3.2% deaths respectively, and this information is now too weak for cross-validation to detect the difference reliably.
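
One rough way to put elpd_diff and its se on a probability scale is a normal approximation for the difference (an assumption here; this approximation can be optimistic when the difference between models is small).

```
# Rough probability that the treatment model has better predictive performance,
# assuming elpd_diff is approximately normal with the reported se (this normal
# approximation can be optimistic when the difference is small).
pnorm(1.5 / 2.3)  # roughly 0.74, i.e. far from certain
```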

The simulation experiment `binom_odds_comparison.R` runs 100 simulations with simulated data with varying odds ratios (0.1, …, 1.0) and computes, for each run, the probability that the odds ratio is less than 1 and the LOO comparison. The following figures show the variation in the results.
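
For concreteness, one simulation replication might look roughly like the sketch below (`one_replication` is a hypothetical helper for illustration; the actual `binom_odds_comparison.R` script may differ in its details).

```
# Sketch of one simulation replication: simulate data with a given true odds
# ratio, fit the treatment and null models, and record the posterior probability
# that the odds ratio is below 1 together with the LOO comparison.
one_replication <- function(true_oddsratio, theta1 = 39/674, N1 = 674, N2 = 680) {
  # convert the control-group probability and the odds ratio into theta2
  odds2 <- true_oddsratio * theta1 / (1 - theta1)
  theta2 <- odds2 / (1 + odds2)
  # simulate individual-level (Bernoulli) data for both groups
  d <- data.frame(y = c(rbinom(N1, 1, theta1), rbinom(N2, 1, theta2)),
                  grp2 = c(rep(0, N1), rep(1, N2)))
  fit <- stan_glm(y ~ grp2, family = binomial(), data = d, refresh = 0)
  fit0 <- stan_glm(y ~ 1, family = binomial(), data = d, refresh = 0)
  # posterior probability that the odds ratio is below 1
  draws <- as.matrix(fit)
  t1 <- plogis(draws[, "(Intercept)"])
  t2 <- plogis(draws[, "(Intercept)"] + draws[, "grp2"])
  or <- (t2 / (1 - t2)) / (t1 / (1 - t1))
  list(p_oddsratio_lt1 = mean(or < 1),
       loo_comparison = compare(loo(fit0), loo(fit)))
}
```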

Variation in the probability that the odds ratio is less than 1, as the true odds ratio is varied.

```
load(file="binom_test_densities.RData")
ggplot(betaprobs_densities, aes(x = values, y = ind, height = scaled)) +
  geom_density_ridges(stat = "identity", scale = 0.6)
```

Variation in the LOO comparison as the true odds ratio is varied.

```
ggplot(looprobs_densities, aes(x = values, y = ind, height = scaled)) +
  geom_density_ridges(stat = "identity", scale = 0.6)
```

We see that using the posterior distribution from the model is more sensitive in detecting the effect, but cross-validation will eventually detect it too. The difference comes from the fact that cross-validation doesn’t trust the model: it compares the model predictions to the “future data” using only a very weak assumption about the future. That weak assumption about the future is also the strength of cross-validation, as we’ll see in another notebook.

We can also compute predictive performance estimates using a stronger assumption about the future. A reference predictive estimate with the log score can be computed as \[\mathrm{elpd}_{\mathrm{ref}} = \int p(\tilde{y}|D,M_*) \log p(\tilde{y}|D,M_k) \, d\tilde{y},\] where \(M_*\) is a reference model we trust.
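
For the binary outcome here, the integral over \(\tilde{y}\) reduces to a sum over \(\tilde{y} \in \{0,1\}\). Below is a minimal sketch of this computation, averaged over the observed covariate values (`ref_predictive_elpd` is a hypothetical helper, not the exact code behind the simulation results).

```
# Reference predictive estimate for binary data: the integral over ytilde
# becomes a sum over ytilde in {0, 1}, averaged over observed covariates x_i.
ref_predictive_elpd <- function(fit_ref, fit_k, data) {
  # posterior predictive probabilities P(ytilde = 1 | x_i, D, M)
  p_ref <- colMeans(posterior_linpred(fit_ref, transform = TRUE, newdata = data))
  p_k   <- colMeans(posterior_linpred(fit_k,   transform = TRUE, newdata = data))
  mean(p_ref * log(p_k) + (1 - p_ref) * log(1 - p_k))
}
# e.g. with fit_bin2 as the reference and fit_bin2null as the candidate model:
# ref_predictive_elpd(fit_bin2, fit_bin2null, d_bin2)
```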

The next figure shows the results from the same simulation study, using the reference predictive approach with the `fit_bin2` model as the reference.

```
ggplot(refprobs_densities, aes(x = values, y = ind, height = scaled)) +
  geom_density_ridges(stat = "identity", scale = 0.6)
```

We can see better accuracy than with cross-validation. We come back to this later in the tutorial.

`sessionInfo()`

```
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=fi_FI.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggridges_0.4.1 shinystan_2.4.0 shiny_1.0.5
[4] loo_1.1.0 rstan_2.17.3 StanHeaders_2.17.2
[7] ggplot2_2.2.1 rstanarm_2.17.3 Rcpp_0.12.15
[10] tidyr_0.8.0
loaded via a namespace (and not attached):
[1] lattice_0.20-35 zoo_1.8-1 gtools_3.5.0
[4] assertthat_0.2.0 rprojroot_1.3-2 digest_0.6.15
[7] mime_0.5 R6_2.2.2 plyr_1.8.4
[10] backports_1.1.2 stats4_3.4.3 evaluate_0.10.1
[13] colourpicker_1.0 pillar_1.1.0 rlang_0.1.6
[16] lazyeval_0.2.1 minqa_1.2.4 miniUI_0.1.1
[19] nloptr_1.0.4 Matrix_1.2-12 DT_0.4
[22] rmarkdown_1.8 labeling_0.3 shinythemes_1.1.1
[25] splines_3.4.3 shinyjs_1.0 lme4_1.1-15
[28] stringr_1.2.0 htmlwidgets_1.0 igraph_1.1.2
[31] munsell_0.4.3 compiler_3.4.3 httpuv_1.3.5
[34] pkgconfig_2.0.1 base64enc_0.1-3 rstantools_1.4.0
[37] htmltools_0.3.6 tibble_1.4.2 gridExtra_2.3
[40] codetools_0.2-15 matrixStats_0.53.0 threejs_0.3.1
[43] dplyr_0.7.4 MASS_7.3-49 grid_3.4.3
[46] nlme_3.1-131 xtable_1.8-2 gtable_0.2.0
[49] magrittr_1.5 scales_0.5.0 stringi_1.1.6
[52] reshape2_1.4.3 bindrcpp_0.2 dygraphs_1.1.1.4
[55] xts_0.10-1 tools_3.4.3 glue_1.2.0
[58] markdown_0.8 purrr_0.2.4 crosstalk_1.0.0
[61] survival_2.41-3 parallel_3.4.3 rsconnect_0.8.5
[64] yaml_2.1.16 inline_0.3.14 colorspace_1.3-2
[67] bayesplot_1.4.0 knitr_1.19 bindr_0.1
```