In this lesson:

- How do we compare two sets of concentrations? Are weekend concentrations higher than weekday? Are concentrations in Zurich higher than Lausanne? There are many ways to answer these questions. We will discuss how to compare average values and declare whether they might be different, given the distribution of concentrations in the measurements.
- Can our data tell us something about the dominant atmospheric process responsible for the observed concentrations of a pollutant? What might that say about the nature of the source of the pollutant? We will evaluate whether dispersion might be a dominant process.

To answer these questions, we turn to inferential statistics. We will pose a hypothesis, and let the data tell us whether it may be true or not.

The objective of inferential statistics is to identify the underlying mechanisms that generated the data, or to draw conclusions about the value of an observed metric or estimate given the variability in the data.

Topics:

- Normal processes and the central limit theorem
- Lognormal processes and environmental dispersion
- Hypothesis testing framework
- Sampling distribution of the mean
- Testing differences in means
- Fitting probability distributions
- Considerations for extension to other tests

Two special processes lead to statistical distributions we presented in an earlier lesson.

*A normal [or Gaussian] process results when a number of unrelated, continuous random variables are added together*. (Ott 1994)
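The additive mechanism can be illustrated numerically. The following is a minimal sketch, not taken from Ott (1994): it adds independent Uniform(0, 1) draws and compares the result with what a normal distribution would give. The number of terms, the sample size, the seed, and the helper names (`sum_of_uniforms`, `skewness`) are all arbitrary choices made for illustration.

```python
import random

def sum_of_uniforms(n_terms, rng):
    """Return the sum of n_terms independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(n_terms))

def skewness(values):
    """Sample skewness (third standardized moment); near 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

rng = random.Random(42)   # fixed seed, arbitrary, for reproducibility
n_terms = 30              # assumed number of variables added together
sums = [sum_of_uniforms(n_terms, rng) for _ in range(20000)]

n = len(sums)
sample_mean = sum(sums) / n
sample_var = sum((s - sample_mean) ** 2 for s in sums) / n
sample_sd = sample_var ** 0.5

# Fraction of sums within one standard deviation of the mean
# (about 0.683 for a normal distribution).
frac_within_1sd = sum(abs(s - sample_mean) <= sample_sd for s in sums) / n

# Theory for a sum of Uniform(0, 1) variables: mean n/2, variance n/12.
print("mean:", sample_mean, "expected:", n_terms / 2)
print("variance:", sample_var, "expected:", n_terms / 12)
print("skewness:", skewness(sums))
print("fraction within 1 sd:", frac_within_1sd)
```

Although each individual term is uniformly distributed, the sums should show near-zero skewness and roughly 68% of values within one standard deviation, as expected for a normal distribution.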

Particles suspended in a fluid are continuously bombarded by the surrounding fluid molecules \(\Rightarrow\) results in a random, irregular motion ("random walk") of the particles known as Brownian motion. (Hinds 1999)

The spread of particles with time can be determined by solving the one-dimensional equation of diffusion (Fick's second law) for \(n(x=0,t=0) = n_0\), \[ \frac{\partial n}{\partial t} = D \frac{\partial^2n}{\partial x^2} \] The solution for the concentration distribution is given by the Gaussian form, \[ n(x,t) = \frac{n_0}{2\sqrt{\pi Dt}}\exp\left(\frac{-x^2}{4Dt}\right) \]

The mean square displacement of the particles from \(x=0\) at time \(t\) is given by \[ \langle{x^2}\rangle = \frac{1}{n_0} \int_{-\infty}^\infty x^2 n(x,t) dx = 2 Dt \]
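
As a worked step, the integral can be evaluated with the standard Gaussian moment \(\int_{-\infty}^\infty x^2 e^{-a x^2} dx = \frac{\sqrt{\pi}}{2 a^{3/2}}\), taking \(a = 1/(4Dt)\): \[ \langle{x^2}\rangle = \frac{1}{2\sqrt{\pi Dt}} \int_{-\infty}^\infty x^2 \exp\left(\frac{-x^2}{4Dt}\right) dx = \frac{1}{2\sqrt{\pi Dt}} \cdot \frac{\sqrt{\pi}}{2} (4Dt)^{3/2} = 2Dt \]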

The fractional distribution of particles can be expressed as a function of position \(x\) at time \(t\), \[ f(x,t) = \frac{n(x,t)}{n_0} = \frac{1}{\sqrt{2\pi\langle{x^2}\rangle}} \exp\left(-\frac{x^2}{2\langle{x^2}\rangle}\right) \]
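
The relation \(\langle{x^2}\rangle = 2Dt\) can be checked with a simple simulation. The sketch below is an illustration under stated assumptions, not the derivation in Hinds (1999): each particle takes steps of fixed length \(\pm\delta\) with equal probability at intervals \(\Delta t\), with \(\delta = \sqrt{2D\Delta t}\) so that the walk corresponds to diffusion coefficient \(D\). The values of `D` and `dt` are arbitrary and dimensionless, and the function name `random_walk_positions` is hypothetical.

```python
import math
import random

def random_walk_positions(n_particles, n_steps, step_length, rng):
    """Final positions of particles that each take n_steps steps of
    +step_length or -step_length with equal probability, starting at x = 0."""
    positions = []
    for _ in range(n_particles):
        x = 0.0
        for _ in range(n_steps):
            x += step_length if rng.random() < 0.5 else -step_length
        positions.append(x)
    return positions

D = 0.5          # diffusion coefficient (arbitrary illustrative value)
dt = 1.0         # time per step (arbitrary illustrative value)
n_steps = 200
step_length = math.sqrt(2.0 * D * dt)  # assumed link between step size and D

rng = random.Random(1)  # fixed seed, arbitrary, for reproducibility
positions = random_walk_positions(5000, n_steps, step_length, rng)

t = n_steps * dt
msd_simulated = sum(x * x for x in positions) / len(positions)
msd_theory = 2.0 * D * t

print("simulated <x^2>:", msd_simulated)
print("theoretical 2Dt:", msd_theory)
```

The simulated mean square displacement should agree with \(2Dt\) to within sampling error, and a histogram of `positions` would approximate the Gaussian form of \(f(x,t)\).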

Other normal processes

- Analytical measurement errors
- Mean estimation

This leads to the normal distribution that we discussed in our previous lesson.

*A lognormal process is one in which the random variable of interest results from the product of many independent random variables multiplied together* (Ott 1994)

- The variable of interest can be expressed as a linear proportion of the value it attains in each previous state. The values of the linear proportions are bounded; if any proportion is zero, the product generated by this process will be zero regardless of the values of subsequent proportions.
- Each linear proportion is assumed to be independent of all successive linear proportions.
- Many successive states have occurred between the initial state and the point in time at which the variable is observed.

**Theory of successive dilutions** (Ott 1990):

\(c_0\) is the initial concentration, \(D_m\) is the \(m\)th dilution factor, and \(C_m\) is **the concentration in the \(m\)th cell** (increasing dilution with increasing \(m\)). \[
\begin{aligned}
C_1 &= c_0 D_1 \\
C_2 &= C_1 D_2 = c_0 D_1 D_2 \\
C_m &= c_0 D_1 D_2 \ldots D_m = c_0 \prod_{i=1}^m D_i \\
\log C_m &= \log c_0 + \log D_1 + \log D_2 + \ldots + \log D_m = \log c_0 + \sum_{i=1}^m \log D_i
\end{aligned}
\] If each \(D_i\) is a random variable (e.g., uniformly distributed), the right-hand side of the last equation is a sum of \(m\) independent random variables \(\Rightarrow\) \(\log C_m\) is approximately normally distributed for large \(m\) (central limit theorem) \(\Rightarrow\) \(C_m\), the concentration at the point of observation, is approximately lognormally distributed.
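
The successive-dilution argument can be explored with a short simulation. This is a minimal sketch under assumptions that are not from Ott (1990): the dilution factors are drawn independently from Uniform(0.1, 1.0) (the lower bound keeps every factor away from zero), and the initial concentration, the number of dilutions, and the helper names are arbitrary illustrative choices.

```python
import math
import random

def diluted_concentration(c0, n_dilutions, rng):
    """Concentration after n_dilutions successive random dilutions of c0."""
    c = c0
    for _ in range(n_dilutions):
        c *= rng.uniform(0.1, 1.0)  # assumed range for each dilution factor D_i
    return c

def skewness(values):
    """Sample skewness (third standardized moment); near 0 for a symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

rng = random.Random(7)   # fixed seed, arbitrary, for reproducibility
c0 = 100.0               # initial concentration (arbitrary units)
m = 30                   # number of successive dilutions (assumed)

concentrations = [diluted_concentration(c0, m, rng) for _ in range(10000)]
log_concentrations = [math.log(c) for c in concentrations]

skew_raw = skewness(concentrations)
skew_log = skewness(log_concentrations)

print("skewness of C_m:", skew_raw)
print("skewness of log C_m:", skew_log)
```

The concentrations themselves should be strongly right-skewed, while their logarithms should be close to symmetric, which is the signature of an approximately lognormal variable. A small residual skewness in \(\log C_m\) is expected for finite \(m\) because \(\log D_i\) is itself skewed.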