This brief lesson introduces the concept of a discrete distribution (probability mass function), using the Poisson distribution as an example. We will also do some more work with likelihood.
Optional functions for interactive plotting, for those using RStudio:
library(manipulate)
source("https://raw.githubusercontent.com/petrkeil/ML_and_Bayes_2017_iDiv/master/Univariate%20Poisson%20Model/plotting_poisson.r")
The Poisson distribution is a simple model for discrete, non-negative count data.
\[p(x) = \frac{e^{-\lambda} \lambda^x}{x!}\]
where \(\lambda\) is the only parameter, which is also the mean of the distribution.
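As a quick sanity check (a minimal sketch, not part of the original script), we can verify that the formula above matches R's built-in dpois():
k <- 0:10
lam <- 3
manual <- exp(-lam) * lam^k / factorial(k)  # e^(-lambda) * lambda^x / x!
all.equal(manual, dpois(k, lambda=lam))     # should return TRUE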
Let’s explore the Poisson model a bit and simulate some data using rpois():
rpois(n=100, lambda=1)
## [1] 0 2 1 4 0 2 0 1 0 0 2 0 0 0 0 2 2 0 0 4 0 2 2 1 0 1 0 0 0 1 2 1 2 1 1
## [36] 0 1 0 1 0 1 1 1 0 1 2 0 1 2 0 1 0 1 3 1 1 1 1 1 0 1 1 1 2 0 0 1 0 0 2
## [71] 1 3 0 0 1 0 0 1 1 2 2 0 2 4 1 1 2 2 3 0 3 0 3 0 0 1 0 5 2 0
Now try different values of \(\lambda\).
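Because \(\lambda\) is the mean of the distribution, the sample mean of a large simulated dataset should sit close to the \(\lambda\) we used (a quick check, with an arbitrarily chosen n and \(\lambda\)):
mean(rpois(n=100000, lambda=5))  # should be close to 5
var(rpois(n=100000, lambda=5))   # for a Poisson, the variance also equals lambda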
We can plot the “empirical frequency histogram” for simulated data with 3 different values of parameter \(\lambda\):
plot(c(0,35), c(0,1), type="n",
xlab="x", ylab="Frequency")
hist(rpois(n=100, lambda=1), add=T, freq=FALSE, col="blue")
hist(rpois(n=100, lambda=5), add=T, freq=FALSE, col="red")
hist(rpois(n=100, lambda=20), add=T, freq=FALSE, col="green")
Let’s plot the respective probability mass functions:
plot(0:35, dpois(0:35, lambda=1),
col="blue", pch=19,
ylab="Probability mass",
xlab="x")
points(0:35, dpois(0:35, lambda=5), col="red", pch=19)
points(0:35, dpois(0:35, lambda=20), col="green", pch=19)
Interactive version
manipulate(move.poisson(lambda),
lambda=slider(1,100, step=0.01))
Now let’s generate some artificial data again, this time with \(\lambda = 8\):
x <- rpois(n=100, lambda=8)
x
## [1] 6 3 10 4 12 8 6 10 10 8 8 11 14 10 5 6 11 10 9 7 9 4 6
## [24] 9 8 8 9 5 9 5 6 10 7 4 8 9 7 7 4 12 11 10 9 7 6 6
## [47] 5 9 10 7 8 10 13 8 8 2 6 10 4 8 7 7 8 7 7 9 8 4 10
## [70] 8 5 6 12 9 12 6 4 7 10 3 5 9 12 9 12 12 8 8 8 4 8 11
## [93] 5 8 7 7 3 7 8 8
The Poisson distribution is didactically convenient, as it has only one parameter, \(\lambda\). We can write the negative log-likelihood function for a given dataset \(x\):
negLL.function <- function(x, lambda)
{
LL <- dpois(x, lambda, log=TRUE) # the log likelihood
negLL <- -sum(LL)
return(negLL)
}
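Why work with the sum of log-probabilities rather than the product of the raw probabilities? The product of many small probabilities is a vanishingly small number and, for larger datasets, would underflow to zero; the log scale avoids this (a quick illustration, assuming the data x generated above):
prod(dpois(x, lambda=8))            # product of 100 small probabilities: vanishingly small
-sum(dpois(x, lambda=8, log=TRUE))  # minus the log of that product: numerically well-behaved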
We can try different values of \(\lambda\):
negLL.function(x, lambda=10)
## [1] 261.0663
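Lower values indicate a better fit. For example, a \(\lambda\) that sits farther from the bulk of the data gives a larger negative log-likelihood (a quick illustration):
negLL.function(x, lambda=5)  # farther from the data, so larger (worse) than the value for lambda=10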
And we can evaluate negLL.function for values of \(\lambda\) between 0 and 30:
lambdas <- seq(0, 30, by=0.1)
negLL.vector <- numeric(0) # empty numeric vector
for(i in 1:length(lambdas))
{
negLL.vector[i] <- negLL.function(x, lambdas[i])
}
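The explicit for loop is easy to read, but the same vector can also be built in one line with sapply() (an equivalent sketch, not part of the original script):
negLL.vector <- sapply(lambdas, function(l) negLL.function(x, lambda=l))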
Let’s check what we have:
negLL.vector
## [1] Inf 2858.4939 2328.5323 2022.6749 1808.5706 1644.7418 1512.7133
## [8] 1402.6299 1308.6090 1226.8560 1154.7801 1090.5335 1032.7516 980.3984
## [15] 932.6683 888.9228 848.6473 811.4207 776.8943 744.7759 714.8185
## [22] 686.8109 660.5718 635.9439 612.7900 590.9896 570.4367 551.0370
## [29] 532.7066 515.3705 498.9612 483.4179 468.6856 454.7145 441.4591
## [36] 428.8778 416.9327 405.5889 394.8143 384.5794 374.8568 365.6213
## [43] 356.8493 348.5190 340.6102 333.1038 325.9823 319.2289 312.8283
## [50] 306.7659 301.0280 295.6017 290.4751 285.6365 281.0753 276.7814
## [57] 272.7449 268.9570 265.4088 262.0922 258.9995 256.1232 253.4562
## [64] 250.9920 248.7240 246.6462 244.7529 243.0384 241.4974 240.1249
## [71] 238.9161 237.8663 236.9710 236.2260 235.6272 235.1707 234.8526
## [78] 234.6695 234.6177 234.6940 234.8952 235.2180 235.6596 236.2171
## [85] 236.8876 237.6686 238.5574 239.5515 240.6485 241.8462 243.1422
## [92] 244.5344 246.0206 247.5989 249.2673 251.0238 252.8667 254.7941
## [99] 256.8042 258.8956 261.0663 263.3150 265.6401 268.0400 270.5134
## [106] 273.0588 275.6749 278.3603 281.1137 283.9339 286.8197 289.7699
## [113] 292.7833 295.8588 298.9953 302.1918 305.4472 308.7604 312.1306
## [120] 315.5567 319.0378 322.5731 326.1615 329.8023 333.4946 337.2375
## [127] 341.0303 344.8722 348.7623 352.7001 356.6846 360.7152 364.7912
## [134] 368.9119 373.0767 377.2849 381.5358 385.8288 390.1633 394.5387
## [141] 398.9545 403.4100 407.9046 412.4379 417.0094 421.6183 426.2644
## [148] 430.9469 435.6656 440.4197 445.2090 450.0329 454.8910 459.7828
## [155] 464.7078 469.6657 474.6561 479.6784 484.7324 489.8175 494.9335
## [162] 500.0799 505.2564 510.4625 515.6980 520.9624 526.2554 531.5767
## [169] 536.9260 542.3028 547.7069 553.1380 558.5957 564.0798 569.5898
## [176] 575.1256 580.6869 586.2733 591.8845 597.5204 603.1805 608.8647
## [183] 614.5727 620.3042 626.0590 631.8367 637.6373 643.4603 649.3056
## [190] 655.1730 661.0622 666.9729 672.9050 678.8583 684.8324 690.8273
## [197] 696.8426 702.8782 708.9339 715.0095 721.1047 727.2194 733.3534
## [204] 739.5065 745.6784 751.8691 758.0784 764.3060 770.5517 776.8155
## [211] 783.0971 789.3964 795.7132 802.0473 808.3986 814.7669 821.1520
## [218] 827.5539 833.9723 840.4070 846.8581 853.3252 859.8082 866.3071
## [225] 872.8216 879.3517 885.8971 892.4579 899.0337 905.6245 912.2301
## [232] 918.8505 925.4855 932.1350 938.7988 945.4768 952.1689 958.8751
## [239] 965.5951 972.3288 979.0762 985.8371 992.6114 999.3990 1006.1999
## [246] 1013.0138 1019.8406 1026.6804 1033.5329 1040.3981 1047.2759 1054.1661
## [253] 1061.0687 1067.9835 1074.9105 1081.8496 1088.8007 1095.7636 1102.7384
## [260] 1109.7249 1116.7229 1123.7325 1130.7535 1137.7859 1144.8296 1151.8844
## [267] 1158.9503 1166.0272 1173.1151 1180.2137 1187.3232 1194.4434 1201.5741
## [274] 1208.7154 1215.8671 1223.0292 1230.2016 1237.3843 1244.5771 1251.7799
## [281] 1258.9928 1266.2156 1273.4483 1280.6908 1287.9430 1295.2048 1302.4763
## [288] 1309.7573 1317.0477 1324.3475 1331.6567 1338.9751 1346.3027 1353.6395
## [295] 1360.9853 1368.3401 1375.7039 1383.0766 1390.4581 1397.8484 1405.2474
Our maximum likelihood estimate (MLE) of \(\lambda\) is:
MLE <- lambdas[which.min(negLL.vector)]
MLE
## [1] 7.8
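For the Poisson distribution the MLE of \(\lambda\) actually has a closed form: it is simply the sample mean. We can also let optimize() minimize negLL.function numerically, rather than scanning a grid (a sketch using the function defined above):
mean(x)                                                   # analytical MLE of lambda
optimize(negLL.function, interval=c(0, 30), x=x)$minimum  # numerical MLE, should agree closely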
And this is the plot of the negative log-likelihood function, together with the MLE (vertical line):
plot(lambdas, negLL.vector, ylab="Negative Log Likelihood")
abline(v=MLE)
Interactive visualization:
manipulate(plot.poisson.ll(x, lambda),
lambda = slider(1, 20, step = 0.01))
Lunn et al. (2013, p. 138) define deviance as:
\[D = -2 \log p(y|\theta)\] which is twice the negative log-likelihood!
Simply:
\[AIC = D + 2k\]
where \(k\) is the number of parameters. So for our best estimate (\(\lambda = 7.8\)), the AIC will be:
AIC <- 2*negLL.function(x, lambda=MLE) + 2*1
AIC
## [1] 471.2355
We can compare this with the output of the glm function:
glm(x~1, family="poisson")
##
## Call: glm(formula = x ~ 1, family = "poisson")
##
## Coefficients:
## (Intercept)
## 2.053
##
## Degrees of Freedom: 99 Total (i.e. Null); 99 Residual
## Null Deviance: 83.72
## Residual Deviance: 83.72 AIC: 471.2
Note that the deviance reported here differs from our definition above, since glm reports deviance relative to the saturated model (Lunn et al. 2013, p. 144).
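To reproduce glm's number ourselves, we can compute the deviance relative to the saturated model, i.e. the model with one fitted value per observation (\(\hat{\mu}_i = x_i\)). A quick sketch, using the fact that the fitted mean of the intercept-only Poisson model is mean(x):
LL.fitted    <- sum(dpois(x, lambda=mean(x), log=TRUE))  # log-likelihood of the intercept-only model
LL.saturated <- sum(dpois(x, lambda=x, log=TRUE))        # log-likelihood of the saturated model
2 * (LL.saturated - LL.fitted)                           # should match the residual deviance (~83.72) above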