[Figures: a 2D vector and a 3D vector]
When working with large datasets, you need to reduce dimensionality to be able to analyse and interpret the data.
You “project” the data onto a lower-dimensional “sub-space”.
Linear methods assume the new (independent) dimensions are linear combinations of the original dimensions:
- PCA
Non-linear methods:
- tSNE (more resource-hungry)
Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to transform a high-dimensional dataset into a lower-dimensional subspace prior to running a machine learning algorithm on the data.
Wikipedia:
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
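A minimal sketch of running PCA, assuming scikit-learn and a made-up toy dataset (the array `X` and its shape are purely illustrative, not data from these notes):

```python
# Minimal PCA sketch (assumes scikit-learn; X is a made-up toy dataset).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 features
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]       # make two features strongly correlated

pca = PCA(n_components=2)               # keep only the first two principal components
X_reduced = pca.fit_transform(X)        # project the data onto the 2D subspace

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # fraction of total variance captured by each PC
```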
Reducing the dimensions is useful:
Let’s say your original dataset has two variables, x1 and x2:
… by rotating the axes.
For our two-dimensional dataset, there can be only two principal components, so once the first is found the second is given:
The second principal component must be orthogonal to the first principal component. In other words, it does its best to capture the variance in the data that is not captured by the first principal component.
… is just a linear transformation (rotation) of the original dimensions.
We can calculate the variance in the PC1 direction if we collapse the data points onto that axis.
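As a sketch, writing w for a unit vector along PC1 and S for the sample covariance matrix of the original data, the variance of the collapsed (projected) points is:

```latex
% Variance of the data collapsed onto a unit direction w (e.g. PC1),
% with S the sample covariance matrix of the original data:
\operatorname{var}_{\mathbf{w}} \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\mathbf{w}^{\top}(\mathbf{x}_i-\bar{\mathbf{x}})\bigr)^{2} \;=\; \mathbf{w}^{\top} S\,\mathbf{w}
```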
Mathematically, the principal components are the eigenvectors of the covariance matrix of the original dataset.
The principal components (eigenvectors) correspond to the directions (in the original n-dimensional space) with the greatest variance in the data.
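A hedged sketch of this correspondence, using numpy's eigendecomposition and scikit-learn's PCA on a toy dataset (the data and variable names are made up for illustration):

```python
# Sketch: the principal components are the eigenvectors of the covariance matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2]    # introduce correlation

# Eigendecomposition of the covariance matrix of the data
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]                     # re-sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# scikit-learn's PCA recovers the same directions (possibly with flipped signs)
pca = PCA().fit(X)
print(eigvecs.T)                   # eigenvectors as rows, like pca.components_
print(pca.components_)
print(eigvals)                     # matches pca.explained_variance_
print(pca.explained_variance_)
```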
The variance formula, with the square expanded into a product of two identical factors, …
… is very similar to the covariance formula:
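Written out side by side (as a sketch, assuming the usual 1/(n−1) normalisation), the two formulas differ only in whether the second factor comes from the same variable or from a second one:

```latex
% Sample variance: the square written as a product of two identical factors ...
\operatorname{var}(x) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})

% ... and the sample covariance: one factor now comes from a second variable y
\operatorname{cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
```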
Linear Discriminant Analysis (LDA) aims to find the directions that maximize the separation (or discrimination) between different classes, which can be useful in pattern classification problems (PCA “ignores” class labels).
In other words,
PCA projects the entire dataset onto a different feature (sub)space that explains the most variance, while
LDA tries to determine a suitable feature (sub)space in order to distinguish between patterns that belong to different classes.
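A minimal sketch of the difference, assuming scikit-learn and its bundled Iris dataset purely for illustration:

```python
# Sketch: PCA (unsupervised, maximizes variance) vs LDA (supervised, maximizes class separation).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                             # ignores the labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # uses the labels y

print(X_pca.shape, X_lda.shape)   # both (150, 2), but the axes mean different things
```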
You can define the distance metric however you want, and Multidimensional Scaling (MDS) with Euclidean distance is equivalent to extracting two principal components from a PCA analysis. In general, other distance metrics can be used.
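A hedged sketch of that equivalence on a toy dataset, assuming numpy, scipy and scikit-learn; classical (metric) MDS is implemented directly here via the double-centred squared-distance matrix, rather than via scikit-learn's SMACOF-based MDS class:

```python
# Sketch: classical MDS with Euclidean distances recovers the same embedding
# as the first two principal components, up to sign flips per axis.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                       # made-up toy dataset

# Classical MDS: double-centre the squared Euclidean distance matrix ...
D2 = squareform(pdist(X, metric="euclidean")) ** 2
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                # centring matrix
B = -0.5 * J @ D2 @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]              # two largest eigenvalues
mds_coords = eigvecs[:, order] * np.sqrt(eigvals[order])

# ... and compare with the first two principal component scores.
pca_coords = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(mds_coords), np.abs(pca_coords)))   # True, up to sign flips
```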