Ott (1994)

A *stochastic process* is a process that includes random components. We describe the outcomes of such processes with *random variables*, which can take on a range of values. A *probability model* is a set of rules describing the probabilities of all possible outcomes in the sample space. The set of probabilities assigned by such a model is called a *probability distribution*. We will discuss probability distributions for continuous random variables (environmental concentrations).

For a random variable \(X\), we can describe the distribution of probabilities by the cumulative distribution function (CDF) \(F_X\) and the probability density function (PDF) \(f_X\): \[ \begin{aligned} F_X(x) &= P(X \leq x) = \int_{-\infty}^x f_X(u)du\\ f_X(x) &= \frac{dP}{dx} = \lim_{\Delta x \to 0} \frac{P(x < X \leq x + \Delta x)}{\Delta x} = \frac{d}{dx} F_X(x) \end{aligned} \] Any physical observation we make can be considered as “sampling” from this distribution. The actual value that we observe will depend on a large number of stochastic processes, but the likelihood of drawing a particular value will follow \(f_X(x)\).
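The idea that observations are “samples” from an underlying distribution can be sketched with an empirical CDF, which estimates \(F_X(x)\) as the fraction of observed values at or below \(x\). A minimal sketch, using simulated uniform draws as a stand-in for measured data:

```python
import random

# Simulate repeated "sampling" from a stochastic process. Here we draw
# from a uniform(0, 1) generator; any observed data set (e.g. measured
# concentrations) could be substituted.
random.seed(42)
samples = [random.random() for _ in range(1000)]

def empirical_cdf(samples, x):
    """Fraction of observations <= x: an estimate of F_X(x) = P(X <= x)."""
    return sum(1 for s in samples if s <= x) / len(samples)

# For uniform(0, 1) the true CDF is F_X(x) = x, so this estimate
# should be close to 0.5.
print(empirical_cdf(samples, 0.5))
```

With more samples, the empirical CDF converges to the true \(F_X\).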

The appearance of randomness can arise from:

- *variability*: natural variations
- *uncertainty*: “incomplete scientific or technical knowledge” (Morgan, Henrion, and Small 1992), or our lack of capability for accurate/precise observation (our ignorance regarding functional dependences among variables may lead to the appearance of randomness)

In this lesson, we will introduce nonparametric distributions and three parametric distributions: the uniform, normal, and lognormal distributions.

In addition, we will discuss two main properties of random variables.

- The expected value (also: average or arithmetic mean) of a random variable \({\operatorname{E}}(X)\) is a measure of the central tendency.
- The variance \({\operatorname{Var}}(X)\) is the second moment about the mean, \({\operatorname{E}}\{[X-{\operatorname{E}}(X)]^2\} = {\operatorname{E}}(X^2)-{\operatorname{E}}(X)^2\).

When we describe the probability model of a random variable \(X\) with a parametric distribution, we can express \({\operatorname{E}}(X)\) and \({\operatorname{Var}}(X)\) as a function of distribution parameters. The sample mean, \(\bar{X}\), is also a random variable.
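The claim that the sample mean is itself a random variable can be illustrated by simulation: repeated samples of the same size from the same distribution yield different values of \(\bar{X}\), and the spread of those values shrinks as the sample size grows. A minimal sketch, using uniform(0, 1) draws as an assumed example distribution:

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Draw n values from uniform(0, 1) and return their sample mean."""
    return statistics.mean(random.uniform(0, 1) for _ in range(n))

# Each call gives a different X-bar: the sample mean is a random variable.
means_small = [sample_mean(10) for _ in range(500)]
means_large = [sample_mean(1000) for _ in range(500)]

# Both sets of means center on E(X) = 0.5, but Var(X-bar) = Var(X)/n,
# so the means of the larger samples vary less.
print(statistics.stdev(means_small))
print(statistics.stdev(means_large))
```

The second printed standard deviation is much smaller than the first, reflecting \({\operatorname{Var}}(\bar{X}) = {\operatorname{Var}}(X)/n\).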

A *uniform* random variable \(X\) can take on any value between \(a\) and \(b\) with equal probability.

For \(x \in [a,b]\), the PDF and CDF are \[ \begin{aligned} f_X(x) &= \frac{1}{b-a}\\ F_X(x) &= \frac{x-a}{b-a} \end{aligned} \]

Mean and variance: \[ \begin{aligned} {\operatorname{E}}(X) &= \frac{a+b}{2}\\ {\operatorname{Var}}(X) &= \frac{(b-a)^2}{12} \end{aligned} \]
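These formulas can be checked by simulation. A minimal sketch, with \([a, b] = [2, 10]\) chosen purely for illustration:

```python
import random
import statistics

# Draw many values from uniform(a, b) and compare the sample mean and
# variance against the theoretical E(X) = (a + b)/2 and
# Var(X) = (b - a)^2 / 12. The interval [2, 10] is an arbitrary example.
random.seed(0)
a, b = 2.0, 10.0
xs = [random.uniform(a, b) for _ in range(100_000)]

print(statistics.mean(xs))      # theory: (a + b) / 2 = 6.0
print(statistics.variance(xs))  # theory: (b - a)^2 / 12 ~ 5.33
```

Both sample statistics land close to the theoretical values, with the gap shrinking as the number of draws grows.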

For a *normal* random variable \(X\) with parameters \(\mu\) and \(\sigma\), the PDF and CDF are \[ \begin{aligned} f_X(x) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\\ F_X(x) &= \frac{1}{2} \left[ {\operatorname{erf}}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)+1\right] \end{aligned} \] The following ratio is also called the standard normal variable, often designated \(z\): \[ z = \frac{x-\mu}{\sigma} \] The value of \(z\) is also called the \(z\)-score.

Mean and variance: \[ \begin{aligned} {\operatorname{E}}(X) &= \mu \\ {\operatorname{Var}}(X) &= \sigma^2 \end{aligned} \]
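The erf form of the normal CDF given above translates directly into code. A minimal sketch using Python's standard-library `math.erf`, with the standard normal (\(\mu = 0\), \(\sigma = 1\)) as the assumed default:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) = 0.5 * (erf((x - mu) / (sigma * sqrt(2))) + 1)."""
    return 0.5 * (math.erf((x - mu) / (sigma * math.sqrt(2))) + 1.0)

def z_score(x, mu, sigma):
    """Standard normal variable z = (x - mu) / sigma."""
    return (x - mu) / sigma

# At the mean, half the probability mass lies below: F_X(mu) = 0.5.
print(normal_cdf(0.0))
# About 97.7% of values fall at or below mu + 2*sigma.
print(normal_cdf(2.0))
# An observation of 10 from a normal(mu=5, sigma=2) has a z-score of 2.5.
print(z_score(10.0, 5.0, 2.0))
```

Working with \(z\)-scores lets a single standard normal table (or `normal_cdf` with the defaults) serve for any \(\mu\) and \(\sigma\), since \(F_X(x) = F_Z\big((x-\mu)/\sigma\big)\).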