Background

The probability model

Ott (1994)

A stochastic process is a process that includes random components. We describe the outcomes of such processes with random variables, which can take on a range of values. A probability model is a set of rules describing the probabilities of all possible outcomes in the sample space. The assignment of probabilities to outcomes by such a model is called a probability distribution. We will discuss probability distributions for continuous random variables (environmental concentrations).

For a random variable \(X\), we can describe probability ranges/distributions by the cumulative distribution function (CDF) \(F_X\) and probability density function (PDF) \(f_X\): \[ \begin{aligned} F_X(x) &= P(X \leq x) = \int_{-\infty}^x f_X(u)du\\ f_X(x) &= \frac{dP}{dx} = \lim_{\Delta x \to 0} \frac{P(x < X \leq x + \Delta x)}{\Delta x} = \frac{d}{dx} F_X(x) \end{aligned} \] Any physical observation we make can be considered as “sampling” from this distribution. The actual value that we observe will depend on a large number of stochastic processes, but the likelihood of drawing a particular value will follow \(f_X(x)\).
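The CDF–PDF relationship above can be checked numerically. The sketch below uses an exponential distribution with rate `lam` purely as an illustrative example (the choice of distribution and the evaluation point `x` are assumptions, not from the text), and confirms by finite differences that \(f_X(x) = \frac{d}{dx}F_X(x)\):

```python
import math

# Illustrative example: exponential distribution with rate lam.
lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)   # f_X(x)
cdf = lambda x: 1.0 - math.exp(-lam * x)   # F_X(x)

# f_X(x) should equal the derivative of F_X(x); central-difference check:
x, dx = 0.7, 1e-6
deriv = (cdf(x + dx) - cdf(x - dx)) / (2 * dx)
print(abs(deriv - pdf(x)) < 1e-6)  # True
```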

The appearance of randomness can arise from

  • variability: natural variations
  • uncertainty: “incomplete scientific or technical knowledge” (Morgan, Henrion, and Small 1992), or our limited capability for accurate/precise observation (our ignorance of functional dependences among variables may also make deterministic behavior appear random)

Characterizing random variables with probability distributions, expected value, and variance

In this lesson, we will introduce nonparametric distributions and three parametric distributions: the uniform, normal, and lognormal distributions.

In addition, we will discuss two main properties of random variables.

  • The expected value (also: average or arithmetic mean) of a random variable \({\operatorname{E}}(X)\) is a measure of the central tendency.
  • The variance \({\operatorname{Var}}(X)\) is the second moment about the mean, \({\operatorname{E}}\{[X-{\operatorname{E}}(X)]^2\} = {\operatorname{E}}(X^2)-{\operatorname{E}}(X)^2\).

When we describe the probability model of a random variable \(X\) with a parametric distribution, we can express \({\operatorname{E}}(X)\) and \({\operatorname{Var}}(X)\) as a function of distribution parameters. The sample mean, \(\bar{X}\), is also a random variable.
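The point that \(\bar{X}\) is itself a random variable can be illustrated by simulation. The sketch below (sample sizes and the uniform \([0,1]\) example distribution are assumptions for illustration) draws many small samples from the same distribution and shows that their means scatter around \({\operatorname{E}}(X) = 0.5\):

```python
import random
import statistics

random.seed(0)

# Each call draws a fresh sample of size n and returns its sample mean.
def sample_mean(n=50):
    return statistics.mean(random.uniform(0, 1) for _ in range(n))

# 1000 repeated samples give 1000 different realizations of X-bar.
means = [sample_mean() for _ in range(1000)]

# The means vary from sample to sample but cluster around E(X) = 0.5:
print(min(means) < 0.5 < max(means))             # True
print(abs(statistics.mean(means) - 0.5) < 0.01)  # True
```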

Parametric distributions

Uniform distribution

Random variable \(X\) can take on any value between \(a\) and \(b\) with equal probability.

For \(x \in [a,b]\), the PDF and CDF are \[ \begin{aligned} f_X(x) &= \frac{1}{b-a}\\ F_X(x) &= \frac{x-a}{b-a} \end{aligned} \]

Mean and variance: \[ \begin{aligned} {\operatorname{E}}(X) &= \frac{a+b}{2}\\ {\operatorname{Var}}(X) &= \frac{(b-a)^2}{12} \end{aligned} \]
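A quick Monte Carlo check of these closed-form results (the bounds \(a = 2\), \(b = 8\) and the sample size are arbitrary choices for illustration):

```python
import random

random.seed(42)

# Uniform distribution on [a, b]; closed forms give
# E(X) = (a + b) / 2 = 5.0 and Var(X) = (b - a)^2 / 12 = 3.0.
a, b = 2.0, 8.0
xs = [random.uniform(a, b) for _ in range(200_000)]
n = len(xs)

mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(abs(mean - (a + b) / 2) < 0.05)       # True: estimate near 5.0
print(abs(var - (b - a) ** 2 / 12) < 0.05)  # True: estimate near 3.0
```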

Figure 3.4 from Ott (1994)


Normal distribution

PDF and CDF: \[ \begin{aligned} f_X(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\\ F_X(x) &= \frac{1}{2} \left[ {\operatorname{erf}}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)+1\right] \end{aligned} \] The following ratio is also called the standard normal variable, often designated \(z\): \[ z = \frac{x-\mu}{\sigma} \] The value of \(z\) is also called the \(z\)-score.
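The erf-based CDF and the \(z\)-score can be sketched directly (the values \(\mu = 100\), \(\sigma = 15\), and \(x = 130\) are arbitrary example inputs):

```python
import math

# Normal CDF written with the error function, matching
# F_X(x) = (1/2) * [erf((x - mu) / (sigma * sqrt(2))) + 1].
def normal_cdf(x, mu, sigma):
    return 0.5 * (math.erf((x - mu) / (sigma * math.sqrt(2))) + 1)

mu, sigma = 100.0, 15.0
x = 130.0

z = (x - mu) / sigma  # z-score
print(z)              # 2.0

# P(X <= 130) for this mu, sigma: the familiar ~0.977 at z = 2.
print(round(normal_cdf(x, mu, sigma), 4))  # 0.9772
```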

Mean and variance: \[ \begin{aligned} {\operatorname{E}}(X) &= \mu \\ {\operatorname{Var}}(X) &= \sigma^2 \end{aligned} \]