Image Analysis for Food

Kevin Mader
3 September 2015, VT Scientific Retreat

Image Processing of Various Food Images

Overview

  • Who am I?
  • Who are you?
  • Why does this workshop exist?
  • General Philosophy
  • Why is quantitative imaging important?
  • Introduction
  • Images
  • Filtering
  • Segmenting
  • Statistics

Who am I?

  • Kevin Mader (mader@biomed.ee.ethz.ch)
    • Lecturer at ETH Zurich
    • Postdoc in the X-Ray Microscopy Group at ETH Zurich and Swiss Light Source at Paul Scherrer Institute
    • Spin-off 4Quant for Big Data with Images

Kevin Mader

What have I done with images?

  • Quantitative Big Imaging Course
    • kmader.github.io/Quantitative-Big-Imaging-2015/

Projects

  • Relationship between structure and genetic background for 1300 bones
  • Tracking bubbles in a liquid foam
  • Finding throat and neck cancer in 100s of patients

Who are you?

A wide spectrum of backgrounds

  • Biologists, Biomedical Engineers, Physicists, Chemists, Mechanical Engineers, …

A wide range of skills

  • From “I think I've heard of Matlab before” \( \rightarrow \) “I write templated C++ code and hand-optimize it afterwards”

So how will this ever work?

Adaptive assignments

  1. Conceptual, graphical assignments with practical examples
    • Emphasis on choosing the correct steps and understanding the workflow
  2. Opportunities to create custom implementations, plugins, and perform more complicated analysis on larger datasets if interested
    • Emphasis on performance, customizing analysis, and scalability

Pre-requisites

Install this software

  • KNIME Setup
  • Install the latest Image Processing tools

Software Know-how

  • Comfortable with the first few exercises (2,3)
  • Able to install KNIME

Literature / Useful References

General Material

  • Julien Claude, “Morphometrics with R”, (New York, Springer)
  • John C. Russ, “The Image Processing Handbook”,(Boca Raton, CRC Press)
    • Available online within domain ethz.ch (or proxy.ethz.ch / public VPN)
  • J. Weickert, Visualization and Processing of Tensor Fields

Today

Motivation

Crazy Workflow

  • To understand what, why and how from the moment an image is produced until it is finished (published, used in a report, …)
  • To learn how to go from one analysis on one image to 10, 100, or 1000 images (without working 10, 100, or 1000X harder)

On Science

What is the purpose?

  • Discover and validate new knowledge

How?

  • Use the scientific method as an approach to convince other people
  • Build on the results of others so we don't start from the beginning

Important Points

  • While qualitative assessment is important, it is difficult to reliably produce and scale
    • Quantitative analysis is far from perfect, but provides metrics which can be compared and regenerated by anyone

Inspired by: imagej-pres

Science and Imaging

  • Images are great for qualitative analyses since our brains can quickly interpret them without large programming investments.
  • Proper processing and quantitative analysis are, however, much more difficult with images.

    • If you measure a temperature, quantitative analysis is easy: \( 50\,K \) is a number with a clear meaning, and so is the average of several temperatures
    • If you measure an image it is much more difficult and much more prone to mistakes, subtle setup variations, and confusing analyses
  • What does the average of an image even mean?

  • AvgBrain

    Overload

    Furthermore, in image processing there is a plethora of tools available:

    AvgBrain

    • Thousands of algorithms available
    • Thousands of tools
    • Many images require multi-step processing
    • Experimenting is time-consuming

    Lung Imaging

    Look for potentially cancerous nodules in the following lung image, taken from NPR

    • Lung Scan

    Lung Imaging

    Lung Scan

    Why quantitative?

    Human vision system is imperfect

    Which center square seems brighter?

    Are the intensities constant in the image?


    Overwhelmed

    • Count how many cells are in the bone slice
    • Ignore the ones that are ‘too big’ or shaped ‘strangely’
    • Are there more on the right side or left side?
    • Are the ones on the right or left bigger, top or bottom?

    cells in bone tissue

    More overwhelmed

    • Do it all over again for 96 more samples, this time with 2000 slices instead of just one!

    more samples

    Bring on the pain

    • Now again with 1090 samples!

    even more samples

    It gets better

    • Those metrics were quantitative and could be easily visually extracted from the images
    • What happens if you have softer metrics?

    alignment

    • How aligned are these cells?
    • Is the group on the left more or less aligned than the right?
    • errr?

    Reproducibility

    Science demands repeatability! And it really wants reproducibility.

    • Experimental conditions can change rapidly and are difficult to make consistent
    • Animal and human studies are prohibitively time consuming and expensive to reproduce
    • Terabyte datasets cannot be easily passed around many different groups
    • Privacy concerns can also limit sharing and access to data
    • Science is already difficult enough
    • Image processing makes it even more complicated
    • Many image processing tasks are multistep, have many parameters, use a variety of tools, and consume a very long time

    How can we keep track of everything for ourselves and others?

    • We can make the data analysis easy to repeat by an independent 3rd party

    Soup Example

    The list is easy to follow; anyone with the right ingredients can execute the steps and repeat (if not reproduce) the soup

    Simple Soup

    1. Buy {carrots, peas, tomatoes} at market
    2. then Buy meat at butcher
    3. then Chop carrots into pieces
    4. then Chop potatoes into pieces
    5. then Heat water
    6. then Wait until boiling then add chopped vegetables
    7. then Wait 5 minutes and add meat

    More complicated soup

    Here it is harder to follow and you need to carefully keep track of what is being performed

    Steps 1-4

    1. then Mix carrots with potatoes \( \rightarrow mix_1 \)
    2. then add egg to \( mix_1 \) and fry for 20 minutes
    3. then Tenderize meat for 20 minutes
    4. then add tomatoes to meat and cook for 10 minutes \( \rightarrow mix_2 \)
    5. then Wait until boiling then add \( mix_1 \)
    6. then Wait 5 minutes and add \( mix_2 \)

    Using flow charts / workflows

    Simple Soup


    Complicated Soup


    Workflows

    Clearly a linear set of instructions is ill-suited for even a fairly easy soup; it becomes even more difficult when there are dozens of steps and different pathways


    Furthermore, a clean workflow allows you to better parallelize the task since it is clear which tasks can be performed independently

    The standard workflow

    Images

    \[ \Downarrow \textrm{Represented in KNIME} \]

    Images

    What is an image?

    A very abstract definition: A pairing between spatial information (position) and some other kind of information (value).

    In most cases this is a 2 dimensional position (x,y coordinates) and a numeric value (intensity)

    x y Intensity
    1 1 44
    2 1 12
    3 1 13
    4 1 48
    5 1 97
    1 2 1

    This can then be rearranged from a table form into an array form and displayed as we are used to seeing images
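
    A minimal sketch of this rearrangement in Python/NumPy, using the handful of rows from the table above (pixels not listed in the table simply stay zero here):

        import numpy as np

        # (x, y, intensity) triples as in the table above
        table = np.array([
            [1, 1, 44], [2, 1, 12], [3, 1, 13], [4, 1, 48], [5, 1, 97],
            [1, 2, 1],
        ])

        # rearrange the table into a 2-D array indexed by (y, x)
        width, height = int(table[:, 0].max()), int(table[:, 1].max())
        img = np.zeros((height, width))
        for x, y, val in table:
            img[int(y) - 1, int(x) - 1] = val

        print(img)  # each row is one line of the image, ready for display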


    2D Intensity Images

    The next step is to apply a color map (also called lookup table, LUT) to the image so it is a bit more exciting


    Which can be arbitrarily defined based on how we would like to visualize the information in the image


    Lookup Tables

    Formally, a lookup table is a function which maps intensity to color: \[ f(\textrm{Intensity}) \rightarrow \textrm{Color} \]


    These transformations can also be non-linear, as in the graph below where the mapping between intensity and color is a \( \log \) relationship, meaning that differences between the lower values are much clearer than those between the higher ones
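
    A hedged sketch of applying a linear and a logarithmic lookup table in Python with matplotlib (the colormap name and the random test image are illustrative assumptions, not part of the slides):

        import numpy as np
        import matplotlib.pyplot as plt
        from matplotlib.colors import LogNorm

        img = np.random.uniform(0.01, 1.0, (64, 64))     # stand-in intensity image

        fig, (ax1, ax2) = plt.subplots(1, 2)
        ax1.imshow(img, cmap='viridis')                   # linear lookup table
        ax2.imshow(img, cmap='viridis', norm=LogNorm())   # logarithmic f(Intensity) -> Color
        plt.show()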


    On a real image the difference is even clearer


    Lookup Table / Publication

    Changing a 'lookup table' can also be called “Normalization”, “Equalization”, “Auto-Enhance” and many other names. It must be used very carefully when displaying scientific results.

    • Standard Image
    • \[ \Downarrow \textrm{After} \]

    • Standard Image

    3D Images

    For a 3D image, the position or spatial component has a 3rd dimension (z if it is a spatial, or t if it is a movie)

    x y z Intensity
    1 1 1 31
    2 1 1 93
    3 1 1 26
    1 2 1 98
    2 2 1 99
    3 2 1 28

    This can then be rearranged from a table form into an array form and displayed as a series of slices


    Our Case Study

    Control of in vitro tissue-engineered bone-like structures using human mesenchymal stem cells and porous silk scaffolds, in Biomaterials (2007) by Sandra Hofmann et al.

    Hypothesis

    The tissue-engineered bone-like structure resulting from silk fibroin (SF) implants is pre-determined by the scaffolds’ geometry

    Experiment

    SF scaffolds with different pore diameters were prepared and seeded with human mesenchymal stem cells (hMSC). As compared to static seeding, dynamic cell seeding in spinner flasks resulted in equal cell viability and proliferation, and better cell distribution throughout the scaffold as visualized by histology and confocal microscopy

    Natural bone consists of cortical and trabecular morphologies, the latter having variable pore sizes. This study aims at engineering different bone-like structures using scaffolds with small pores (112–224 μm) in diameter on one side and large pores (400–500 μm) on the other, while keeping scaffold porosities constant among groups. We hypothesized that tissue engineered bone-like structure resulting from silk fibroin (SF) implants is pre-determined by the scaffolds’ geometry. To test this hypothesis, SF scaffolds with different pore diameters were prepared and seeded with human mesenchymal stem cells (hMSC). As compared to static seeding, dynamic cell seeding in spinner flasks resulted in equal cell viability and proliferation, and better cell distribution throughout the scaffold as visualized by histology and confocal microscopy, and was, therefore, selected for subsequent differentiation studies. Differentiation of hMSC in osteogenic cell culture medium in spinner flasks for 3 and 5 weeks resulted in increased alkaline phosphatase activity and calcium deposition when compared to control medium. Micro-computed tomography (μCT) detailed the pore structures of the newly formed tissue and suggested that the structure of tissue-engineered bone was controlled by the underlying scaffold geometry.
    

    Our Case Study

    Qualitative

    Qualitative

    Quantitative

    Quantitative

    Traditional Imaging

    Traditional Imaging

    Copyright 2003-2013 J. Konrad in EC520 lecture, reused with permission

    Traditional Imaging: Model

    Traditional Imaging Model

    \[ \left[\left([b(x,y)*s_{ab}(x,y)]\otimes h_{fs}(x,y)\right)*h_{op}(x,y)\right]*h_{det}(x,y)+d_{dark}(x,y) \]

    \( s_{ab} \) is the only information you are really interested in, so it is important to remove or correct for the other components

    For color (non-monochromatic) images the problem becomes even more complicated \[ \int_{0}^{\infty} {\left[\left([b(x,y,\lambda)*s_{ab}(x,y,\lambda)]\otimes h_{fs}(x,y,\lambda)\right)*h_{op}(x,y,\lambda)\right]*h_{det}(x,y,\lambda)}\mathrm{d}\lambda+d_{dark}(x,y) \]

    Imaging Modality: Microscopy

    Since we know the modality, standard microscopy, we can simplify these equations a bit and basically say we have 4 primary sources of problems (warning this is a very strong oversimplification).

    • Fundamentally our output image is the result of this calculation

    \[ \textrm{Output}_{image}= \left(\textbf{Illumination} * \textrm{Object}+\textbf{Dirt}\right) \otimes \textbf{PointSpreadFunction} + \textbf{CameraNoise} \]

    • We will need to try to improve it based on that information.
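
    As a rough illustration (a sketch, not the course's code), this equation can be simulated directly; scipy.ndimage.gaussian_filter stands in for the point spread function and all the numbers are made up:

        import numpy as np
        from scipy.ndimage import gaussian_filter

        rng = np.random.default_rng(0)
        obj = np.zeros((128, 128)); obj[40:90, 40:90] = 1.0             # the object we care about
        illumination = np.linspace(0.5, 1.5, 128)[None, :]              # uneven illumination across x
        dirt = (rng.random((128, 128)) > 0.999) * 2.0                   # a few specks of dirt
        blurred = gaussian_filter(illumination * obj + dirt, sigma=2)   # optics blur (PSF)
        output = blurred + rng.normal(0, 0.05, (128, 128))              # camera noise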

    Imaging Problem Sources

    • uneven illumination

    Uneven illumination

    • blur from the optical system

     Here the measurement is meant to come from a typical microscope, which blurs, flips, and otherwise distorts the image, but the original representation is still visible

    • dirt on the lens

    Dirt

    • noise from the camera

    Color Noise

    Your Contrasts

    We have referred to an object up until now, but what exactly does that mean? It depends heavily on what is being measured and how; the term we use for this is contrast.

    • Reflectivity

    • Absorption

    Reflectivity

    The light which is reflected by the object is measured by the camera.

    • Bright \( \rightarrow \) very reflective
    • Dark \( \rightarrow \) not reflective

    • Can be quantitative for flat objects

    • Not quantitative for uneven surfaces (shadows)

    Bread

    Absorption

    The light which passes through is measured

    Foam

    • Quantitative for single planes
    • More difficult with multiple planes

    Foam

    Other "Contrasts"

    Modality | Impulse | Characteristic | Response | Detection
    Light Microscopy | White Light | Electronic interactions | Absorption | Film, Camera
    Phase Contrast | Coherent light | Electron Density (Index of Refraction) | Phase Shift | Phase stepping, holography, Zernike
    Confocal Microscopy | Laser Light | Electronic Transition in Fluorescence Molecule | Absorption and reemission | Pinhole in focal plane, scanning detection
    X-Ray Radiography | X-Ray light | Photo effect and Compton scattering | Absorption and scattering | Scintillator, microscope, camera
    Ultrasound | High frequency sound waves | Molecular mobility | Reflection and Scattering | Transducer
    MRI | Radio-frequency EM | Unmatched Hydrogen spins | Absorption and reemission | RF coils to detect
    Atomic Force Microscopy | Sharp Point | Surface Contact | Contact, Repulsion | Deflection of a tiny mirror

    Understanding an Image

    It is important to understand the contrasts well since that determines everything else for our further processing and interpretation of the images. Specifically we can focus on quantitative and qualitative contrasts.

    Qualitative (H&E)

    • Shows anatomy / structures
    • but more intense/darker structures are not more significant
    • Automatic thresholding is practical

    Quantitative (von Kossa)

    • Shows functionality
    • Intensity is correlated to a metric
    • For X-ray tomography the intensity is proportional to the absorption coefficient which is related to the calcification density
    • Rescaling / shifting is not allowed
    • Automatic thresholding is not practical

    Image Enhancement & Beyond

    20 minute break

    Unbuilding the equation

    \[ \textrm{Output}_{image}= \left(\textbf{Illumination} * \textrm{Contrast}_{Object}+\textbf{Dirt}\right) \otimes \textbf{PointSpreadFunction} + \textbf{CameraNoise} \]

    How can we go from \( \textrm{Output}_{image} \) to just \( \textrm{Contrast}_{Object} \)?

    Particularly when we have NO idea what

    • \( \textbf{CameraNoise} \)

    And very little idea about

    • \( \textbf{Dirt} \)
    • \( \textbf{Illumination} \) or
    • \( \textbf{PointSpreadFunction} \)

    Image Enhancement

    All about recovering the object or contrast from the image \[ \textrm{Output}_{image}= \left(\textbf{Illumination} * \textrm{Contrast}_{Object}+\textbf{Dirt}\right) \otimes \textbf{PointSpreadFunction} +\\ \textbf{CameraNoise} \]

    • What would the perfect filter be

      • \[ \textit{Filter} \ast \textrm{Output}_{image} \Rightarrow \textrm{Contrast}_{Object} \]
    • What most filters end up doing

      • \[ \textit{Filter} \ast \textrm{Output}_{image} \Rightarrow 90\% \textrm{Contrast}_{Object} +\\ 10\% \left(\textbf{Dirt}+\textbf{CameraNoise}\right) \]
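
    As an illustration of what a real filter does (a sketch; the Gaussian and median filters here are generic choices, not the course's prescription), smoothing recovers most of the contrast while leaving some residual noise and blurring the edges:

        import numpy as np
        from scipy.ndimage import gaussian_filter, median_filter

        rng = np.random.default_rng(0)
        obj = np.zeros((128, 128)); obj[40:90, 40:90] = 1.0   # the contrast we want back
        noisy = obj + rng.normal(0, 0.3, obj.shape)           # object + camera noise

        smoothed = gaussian_filter(noisy, sigma=2)   # suppresses noise, but also blurs edges
        cleaned = median_filter(noisy, size=3)       # a non-linear alternative, better at outliers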

    A Machine Learning Approach to Image Processing

    Segmentation and all the steps leading up to it are really a specialized type of learning problem.

    Returning to the ring image we had before, we start with our knowledge or ground-truth of the ring


    We then want to identify it in the following image using image processing tools


    What does identify mean?

    • Classify the pixels in the ring as Foreground
    • Classify the pixels outside of the ring as Background

    How do we quantify this?

    • True Positive values in the ring that are classified as Foreground
    • True Negative values outside the ring that are classified as Background
    • False Positive values outside the ring that are classified as Foreground
    • False Negative values in the ring that are classified as Background


    We can then apply a threshold to the image to determine the number of points in each category
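
    A minimal sketch of this counting step for two boolean masks (the ring here is synthetic, so the counts will not match the course's figures):

        import numpy as np

        rng = np.random.default_rng(0)
        yy, xx = np.mgrid[-10:11, -10:11]
        ring_mask = (xx**2 + yy**2 >= 25) & (xx**2 + yy**2 <= 81)  # ground-truth ring
        noisy = ring_mask + rng.normal(0, 0.3, ring_mask.shape)    # the "measured" image

        pred = noisy >= 0.5                # apply a threshold
        tp = np.sum( ring_mask &  pred)    # in the ring, classified as Foreground
        tn = np.sum(~ring_mask & ~pred)    # outside the ring, classified as Background
        fp = np.sum(~ring_mask &  pred)    # outside the ring, classified as Foreground
        fn = np.sum( ring_mask & ~pred)    # in the ring, classified as Background
        print(tp, tn, fp, fn)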


    Ring Threshold Example

    Try a number of different threshold values on the image and compare them to the original classification


    Thresh TP TN FP FN
    0.0 224 0 217 0
    0.2 224 26 191 0
    0.4 214 88 129 10
    0.6 148 174 43 76
    0.8 57 215 2 167
    1.0 0 217 0 224


    Apply Precision and Recall

    • Recall (sensitivity) = \( TP/(TP+FN) \)
    • Precision = \( TP/(TP+FP) \)

    Thresh TP TN FP FN Recall (%) Precision (%)
    0.30 222 54 163 2 99 58
    0.38 217 82 135 7 97 62
    0.46 204 115 102 20 91 67
    0.54 174 151 66 50 78 72
    0.62 137 182 35 87 61 80
    0.70 105 205 12 119 47 90
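
    A sketch of the same sweep in Python on a synthetic ring, using the definitions of recall and precision above (the counts will therefore not match the table, which comes from the course's own ring image):

        import numpy as np

        rng = np.random.default_rng(0)
        yy, xx = np.mgrid[-10:11, -10:11]
        ring_mask = (xx**2 + yy**2 >= 25) & (xx**2 + yy**2 <= 81)  # ground-truth ring
        noisy = ring_mask + rng.normal(0, 0.3, ring_mask.shape)

        for thresh in np.linspace(0.3, 0.7, 6):
            pred = noisy >= thresh
            tp = np.sum(ring_mask & pred); fp = np.sum(~ring_mask & pred)
            fn = np.sum(ring_mask & ~pred)
            recall = 100 * tp / (tp + fn)      # TP / (TP + FN)
            precision = 100 * tp / (tp + fp)   # TP / (TP + FP)
            print(f"thresh={thresh:.2f}  recall={recall:.0f}%  precision={precision:.0f}%")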

    ROC Curve

    Receiver Operating Characteristic (first developed in World War II for radar operators detecting objects on battlefields). The ideal is the top right (identify everything and miss nothing).

    Comparing Different Filters

    We can then use this ROC curve to compare different filters (or even entire workflows); if the area under the curve is higher, the approach is better.


    Different approaches can be compared by area under the curve


    True Positive Rate and False Positive Rate

    Another way of showing the ROC curve (more common in machine learning than in medical diagnosis) is to use the true positive rate and false positive rate

    • True Positive Rate (recall)= \( TP/(TP+FN) \)
    • False Positive Rate = \( FP/(FP+TN) \)

    These show very similar information, with the major difference being that the goal is to be in the upper left-hand corner. Additionally, random guesses appear as the slope-1 line, so for a system to be useful it must lie above that random line.

    Practical Example: Calcifications in Breast Tissue

    While finding a ring might be didactic, it is not really a relevant problem; these terms are much more meaningful when applied to medical images, where every false positive and false negative can mean life-threatening surgery or the lack thereof. (Data courtesy of Zhentian Wang)


    From these images, an expert labeled the calcifications by hand, so we have ground truth data on where they are:


    Applying a threshold

    We can perform the same analysis on an image like this one, again using a simple threshold to evaluate how accurately we identify the calcifications


    Examining the ROC Curve

    Thresh TP TN FP FN Recall (%) Precision (%)
    7 2056 13461 74483 0 100 3
    23 2030 25806 62138 26 99 3
    34 1950 38744 49200 106 95 4
    42 1726 51676 36268 330 84 5
    48 1435 64161 23783 621 70 6
    54 1043 76363 11581 1013 51 8


    Why do we perform segmentation?

    Cell image

    • In model-based analysis every step we perform, simple or complicated, is related to an underlying model of the system we are dealing with
    • Occam's Razor is very important here: the simplest solution is usually the right one

    Qualitative Metrics: What did people use to do?

    • What comes out of our detector / enhancement process (Single Cell)
    • Identify objects by eye
      • Count, describe qualitatively: “many little cilia on surface”, “long curly flagellum”, “elongated nuclear structure”
    • Morphometrics
      • Trace the outline of the object (or sub-structures)
      • Can calculate the area by using equal-weight-paper
      • Employing the “cut-and-weigh” method

    Model-based Analysis

    Traditional Imaging

    • Many different imaging modalities ( \( \mu \textrm{CT} \) to MRI to Confocal to Light-field to AFM).
    • Similarities in underlying equations
      • different coefficients, units, and mechanism

    \[ I_{measured}(\vec{x}) = F_{system}(I_{stimulus}(\vec{x}),S_{sample}(\vec{x})) \]

    Absorption Imaging

    • \( F_{system}(a,b) = a*b \)
    • \( I_{stimulus} = \textrm{Beam}_{profile} \)
    • \( S_{sample} = \alpha(\vec{x}) \)

    \( \longrightarrow \alpha(\vec{x})=\frac{I_{measured}(\vec{x})}{\textrm{Beam}_{profile}(\vec{x})} \)
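
    A minimal sketch of this correction, assuming the beam profile has been measured separately (for example, a flat-field image taken without the sample); all numbers are synthetic:

        import numpy as np

        yy, xx = np.mgrid[-64:64, -64:64]
        beam_profile = np.exp(-(xx**2 + yy**2) / (2 * 40.0**2))   # measured without the sample
        sample = np.where(xx**2 + yy**2 < 30**2, 0.7, 1.0)        # transmission of the sample
        measured = beam_profile * sample                          # I_measured = Beam * S_sample

        # divide out the beam profile to recover the sample term alone
        corrected = measured / np.clip(beam_profile, 1e-6, None)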

    Single Cell

    Nonuniform Beam-Profiles

    In many setups there is uneven illumination caused by incorrectly adjusted equipment and fluctuations in power

    Gradient Profile

    Frequently there is a fall-off of the beam away from the center (as is the case for a Gaussian beam, which frequently shows up in laser systems). This can make extracting detail away from the center challenging.

    Gaussian Beam

    Absorption Imaging (Optical, X-ray, Ultrasound)

    • For absorption/attenuation imaging \( \rightarrow \) Beer-Lambert Law \[ I_{detector} = \underbrace{I_{source}}_{I_{stimulus}}\underbrace{\exp(-\alpha d)}_{S_{sample}} \]

      • Different components have a different \( \alpha \) based on the strength of the interaction between the light and the chemical / nuclear structure of the material \[ I_{sample}(x,y) = I_{source}\exp(-\alpha(x,y) d) \] \[ \alpha = f(N,Z,\sigma,\cdots) \]
    • For segmentation this model is:

      • there are 2 (or more) distinct components that make up the image
      • these components are distinguishable by their values (or vectors, colors, tensors, …)
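
    Returning to the Beer-Lambert expression above, the attenuation coefficient can be recovered by inverting it; a sketch with made-up intensities and an assumed path length \( d \):

        import numpy as np

        I_source = 1000.0                              # incident intensity (arbitrary units)
        d = 0.1                                        # assumed path length through the sample
        I_detector = np.array([900.0, 500.0, 50.0])    # three example detector readings

        # invert I = I_source * exp(-alpha * d)  ->  alpha = -ln(I_detector / I_source) / d
        alpha = -np.log(I_detector / I_source) / d
        print(alpha)   # larger alpha means stronger absorption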

    Attenuation to Intensity

    Image Histogram

    Where does segmentation get us?

    • We convert a decimal value (or something even more complicated like 3 values for RGB images, a spectrum for hyperspectral imaging, or a vector / tensor in a mechanical stress field)
    • to a single, discrete value (usually true or false, but for images with phases it would be each phase, e.g. bone, air, cellular tissue)

    • 2560 x 2560 x 2160 x 32 bit = 56GB / sample \[ \downarrow \]

    • 2560 x 2560 x 2160 x 1 bit = 1.75GB / sample
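
    For reference, the arithmetic is just voxels times bits per voxel: \( 2560 \times 2560 \times 2160 \approx 1.4\times10^{10} \) voxels, so at 32 bits (4 bytes) per voxel this is roughly 56 GB, and at 1 bit per voxel it is 32 times smaller, roughly 1.75 GB.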

    Applying a threshold to an image

    Start out with a simple image of a cross with added noise \[ I(x,y) = f(x,y) \]

    The intensity can be described with a probability density function \[ P_f(x,y) \]

    Probability density function

    Applying a threshold to an image

    By examining the image and probability distribution function, we can deduce that the underlying model is a whitish phase that makes up the cross on a darkish background

    With Threshold Overlay

    Applying the threshold is a deceptively simple operation

    \[ I(x,y) = \begin{cases} 1, & f(x,y)\geq0.5 \\ 0, & f(x,y)<0.5 \end{cases} \]
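
    A minimal sketch of that operation, with a synthetic noisy cross standing in for the course's image:

        import numpy as np

        rng = np.random.default_rng(0)
        cross = np.zeros((64, 64))
        cross[28:36, :] = 1.0; cross[:, 28:36] = 1.0      # the cross f(x, y)
        noisy = cross + rng.normal(0, 0.2, cross.shape)   # add noise

        segmented = (noisy >= 0.5).astype(np.uint8)       # 1 where f(x, y) >= 0.5, else 0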

    With Threshold Overlay

    Various Thresholds

    Threshold Histograms

    Threshold Images

    Segmenting Cells

    Cell Colony

    • We can perform the same sort of analysis with this image of cells
    • This time we can derive the model from the basic physics of the system
      • The field is illuminated by white light of nearly uniform brightness
      • Cells absorb light causing darker regions to appear in the image
      • Lighter regions have no cells
      • Darker regions have cells

    Different Threshold Values

    Cell Colony


    Statistics

    Correlation and Causation

    One of the most repeated criticisms of scientific work is that correlation and causation are confused.

    1. Correlation
      • means a statistical relationship
      • very easy to show (single calculation)
    2. Causation
      • implies there is a mechanism between A and B
      • very difficult to show (impossible to prove)

    Controlled and Observational

    There are two broad classes of data and scientific studies.

    Observational

    • Exploring large datasets looking for trends
    • Population is random
    • Not always hypothesis driven
    • Rarely leads to causation

    We examined 100 people and the ones with blue eyes were on average 10cm taller

    In 100 cake samples, we found a 0.9 correlation between cooking time and bubble size

    Controlled

    • Most scientific studies fall into this category
    • Specifics of the groups are controlled
    • Can lead to causation

    We examined 50 mice with gene XYZ off and 50 with gene XYZ on, and saw that foot size increased by 10%

    We increased the temperature and the number of pores in the metal increased by 10%

    Simple Model: Magic / Weighted Coin

    Most experiments in science are specific, noisy, and often very complicated, so they are not usually good teaching examples; a simple model is easier to reason about:

    • Magic / Biased Coin
      • You buy a magic coin at a shop
      • How many times do you need to flip it to prove it is not fair?
      • If I flip it 10 times and another person flips it 10 times, is that the same as 20 flips?
      • If I flip it 10 times and then multiply the results by 10, is that the same as 100 flips?
      • If I buy 10 coins and want to know which ones are fair what do I do?

    Simple Model: Magic / Weighted Coin

    1. Each coin represents a stochastic variable \( \mathcal{X} \) and each flip represents an observation \( \mathcal{X}_i \).
    2. The act of performing a coin flip \( \mathcal{F} \) is an observation \( \mathcal{X}_i = \mathcal{F}(\mathcal{X}) \)

    We normally assume

    1. A fair coin has an expected value of \( E(\mathcal{X})=0.5 \rightarrow \) 50% Heads, 50% Tails
    2. An unbiased flip (and flipper) means
      • each flip is independent of the others \[ P(\mathcal{F}_1(\mathcal{X})*\mathcal{F}_2(\mathcal{X}))= P(\mathcal{F}_1(\mathcal{X}))*P(\mathcal{F}_2(\mathcal{X})) \]
      • the expected value of the flip is the same as that of the coin \[ E(\prod_{i=0}^\infty \mathcal{F}_i(\mathcal{X})) = E(\mathcal{X}) \]

    Simple Model to Reality

    Coin Flip

    1. Each flip gives us a small piece of information about the flipper and the coin
    2. More flips provides more information
      • Random / Stochastic variations in coin and flipper cancel out
      • Systematic variations accumulate

    Real Experiment

    1. Each measurement tells us about our sample, our instrument, and our analysis
    2. More measurements provide more information
      • Random / Stochastic variations in sample, instrument, and analysis cancel out
      • Normally the analysis has very little to no stochastic variation
      • Systematic variations accumulate

    Comparing Groups: Tests

    Once the reproducibility has been measured, it is possible to compare groups. The idea is to make a test to assess the likelihood that two groups are the same given the data

    1. List assumptions
    2. Establish a null hypothesis
      • Usually both groups are the same
    3. Calculate the probability of the observations given the truth of the null hypothesis
      • Requires knowledge of probability distribution of the data
      • Modeling can be exceptionally complicated

    Loaded Coin

    We have 1 coin from a magic shop

    • our assumptions are
      • we flip and observe flips of coins accurately and independently
      • the coin is invariant and always has the same expected value
    • our null hypothesis is the coin is unbiased \( E(\mathcal{X})=0.5 \)
    • we can calculate the likelihood of a given observation given the number of flips (p-value)
    Number of Flips Probability of All Heads Given Null Hypothesis (p-value)
    1 50 %
    5 3.1 %
    10 0.1 %
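
    These p-values are simply \( 0.5^n \): with a fair coin, each additional head halves the probability of the observed sequence. A quick check in Python:

        for n in (1, 5, 10):
            print(n, "flips ->", f"{100 * 0.5**n:.1f} %")   # 50.0 %, 3.1 %, 0.1 %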

    How good is good enough?

    Comparing Groups: Student's T Distribution

    Since we do not usually know our distribution very well, or have enough samples to create a sufficient probability model, we turn to the Student's t-distribution.

    Student T Distribution

    We assume the distribution of our stochastic variable is normal (Gaussian) and the t-distribution provides an estimate for the mean of the underlying distribution based on few observations.

    • We estimate the likelihood of our observed values assuming they are coming from random observations of a normal process

    Student T-Test

    The t-test incorporates this distribution and provides an easy method for assessing the likelihood that two given sets of observations come from the same underlying process (the null hypothesis)

    • Assume unbiased observations
    • Assume normal distribution
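
    A minimal sketch using scipy.stats.ttest_ind, which implements exactly this kind of two-sample test under the assumptions above (the group sizes and values here are made up):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        group_a = rng.normal(loc=10.0, scale=2.0, size=10)   # e.g. a metric from 10 samples
        group_b = rng.normal(loc=11.5, scale=2.0, size=10)   # the same metric for a second group

        t_stat, p_value = stats.ttest_ind(group_a, group_b)  # null hypothesis: same mean
        print(f"t = {t_stat:.2f}, p = {p_value:.3f}")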

    Multiple Testing Bias

    Back to the magic coin: let's assume we are trying to publish a paper, and we heard that a p-value of < 0.05 (5%) was good enough. That means if we get 5 heads we are good!

    Number of Flips Probability of All Heads Given Null Hypothesis (p-value)
    1 50 %
    4 6.2 %
    5 3.1 %

    Number of Friends Flipping Probability Someone Flips 5 heads
    1 3.1 %
    10 27.2 %
    20 47 %
    40 71.9 %
    80 92.1 %

    Clearly this is not the case, otherwise we could keep flipping coins or ask all of our friends to flip until we got 5 heads and publish

    The p-value is only meaningful when the experiment matches what we did.

    • We didn't say the chance of getting 5 heads ever was < 5%
    • We said if we have exactly 5 observations and all of them are heads the likelihood that a fair coin produced that result is <5%

    There are many methods to correct for this; most just involve scaling \( p \). Roughly, the likelihood of seeing 5 heads in a row somewhere within 10 flips is about 5x higher than within exactly 5 flips.
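
    The "friends flipping" numbers above follow from \( 1 - (1 - 0.5^5)^{N} \), and the simple correction is just to scale the p-value by the number of tests; a quick check:

        p_single = 0.5**5                     # 3.1 % chance one person flips 5 heads
        for n in (1, 10, 20, 40, 80):
            p_any = 1 - (1 - p_single)**n     # chance that at least one of n friends succeeds
            print(n, "friends ->", f"{100 * p_any:.1f} %")

        # simple correction: demand p * (number of tests) < 0.05 instead of p < 0.05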

    Multiple Testing Bias: Experiments

    This is very bad news for us. We have the ability to quantify all sorts of interesting metrics

    • cell distance to other cells
    • cell oblateness
    • cell distribution oblateness

    So let's throw them all into a magical statistics algorithm and push the publish button

    With our p-value threshold of 0.05 and a study with 10 samples in each group, how does increasing the number of variables affect our results?
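
    A hedged sketch of such a simulation (using scipy's two-sample t-test; both groups are drawn from the same distribution, so every "significant" variable is a false positive):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        for n_vars in (1, 10, 100, 1000):
            p_values = [stats.ttest_ind(rng.normal(size=10), rng.normal(size=10)).pvalue
                        for _ in range(n_vars)]
            n_sig = sum(p < 0.05 for p in p_values)
            print(n_vars, "variables ->", n_sig, "spurious 'findings'")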


    Multiple Testing Bias: Correction

    Using the simple correction factor (the number of tests performed), we can make the number of significant findings constant again.

    So no harm done; we just add this correction factor, right? Well, what if we have exactly one variable with a shift of 1.0 standard deviations from the other?


    Multiple Testing Bias: Sample Size


    Advanced Topics in Image Processing

    • Video Analysis
    • Background Subtraction

    Video Analysis

    Video analysis is slightly different from standard image processing because it includes an additional dimension that needs to be taken care of. Additionally, there are issues with file sizes and with processing all of the data.

    VLC - VideoLAN Client

    This is one of the most flexible tools for playing and transforming a large variety of video content.

    Example

    /Applications/VLC.app/Contents/MacOS/VLC ../DroneVideo.mov --video-filter=scene --vout=dummy --no-repeat --scene-ratio=6 --rate=0.01 --scene-prefix="drone"  --scene-path=./ vlc://quit
    

    The general pattern is: PathToVLC PathToVideo [options]

    • --video-filter=scene --vout=dummy --no-repeat are fixed standard settings (--no-repeat prevents looping)
    • --scene-ratio is the number of frames it skips (30 means every 30th frame \( \rightarrow \) once every second)
    • --rate is the playback rate (1 is 100%, 0.01 is 1%; the more frames you capture, the slower this needs to be to make sure you keep up)

    A list of images or a movie

    There are two representations for videos

    • a list of images
    • a single 3- or 4-D image \( (X,Y,Z,t) \)

    List of Images

    • More processing and memory efficient
    • More options for looking through

    3-/4-D Image

    • Large sizes in memory
    • Possible to combine neighboring images

    In-between

    • A number of smaller 3-/4D Images
    • Chunked together (e.g., 10 frames each)
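
    A rough sketch of this in-between representation using OpenCV (cv2.VideoCapture is assumed to be available; the file name is a placeholder and the chunk size of 10 frames follows the example above):

        import cv2
        import numpy as np

        cap = cv2.VideoCapture("DroneVideo.mov")   # placeholder path
        chunks, frames = [], []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
            if len(frames) == 10:                  # close a chunk every 10 frames
                chunks.append(np.stack(frames))    # one small (t, y, x, channel) block
                frames = []
        cap.release()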

    Merge

    The Merge node can be used to combine individual images into a single time sequence.