[Figures: a 2D vector and a 3D vector]
When working with large datasets, you need to reduce dimensionality to be able to analyse and interpret the data.
You “project” the data onto a lower-dimensional “sub-space”.
Linear methods assume the new (independent) dimensions are linear combinations of the original dimensions:
- PCA
Non-linear methods:
- tSNE (more resource-hungry)
Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to transform a high-dimensional dataset into a lower-dimensional subspace prior to running a machine learning algorithm on the data.
Wikipedia:
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
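A minimal sketch of running PCA, assuming scikit-learn and a made-up toy dataset (the array `X` and its shape are purely illustrative, not data from these notes):

```python
# Minimal PCA sketch (assumes scikit-learn; X is a made-up toy dataset).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 features
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]       # make two features strongly correlated

pca = PCA(n_components=2)               # keep only the first two principal components
X_reduced = pca.fit_transform(X)        # project the data onto the 2D subspace

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # fraction of total variance captured by each PC
```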
Reducing the dimensions is useful:
Let’s say your original dataset has two variables, x1 and x2:
… by rotating the axes.
For our two-dimensional dataset, there can be only two principal components, so once the first is found the second is given:
The second principal component must be orthogonal to the first principal component. In other words, it does its best to capture the variance in the data that is not captured by the first principal component.
… is just a linear transformation (rotation) of the original dimensions.
We can calculate the variance in the PC1 direction if we collapse the data points onto that axis.
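As a sketch, writing w for a unit vector along PC1 and S for the sample covariance matrix of the original data, the variance of the collapsed (projected) points is:

```latex
% Variance of the data collapsed onto a unit direction w (e.g. PC1),
% with S the sample covariance matrix of the original data:
\operatorname{var}_{\mathbf{w}} \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\mathbf{w}^{\top}(\mathbf{x}_i-\bar{\mathbf{x}})\bigr)^{2} \;=\; \mathbf{w}^{\top} S\,\mathbf{w}
```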
Mathematically, the principal components are the eigenvectors of the covariance matrix of the original dataset.
The principal components (eigenvectors) correspond to the directions (in the original n-dimensional space) with the greatest variance in the data.
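A hedged sketch of this correspondence, using numpy's eigendecomposition and scikit-learn's PCA on a toy dataset (the data and variable names are made up for illustration):

```python
# Sketch: the principal components are the eigenvectors of the covariance matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2]    # introduce correlation

# Eigendecomposition of the covariance matrix of the data
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]                     # re-sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# scikit-learn's PCA recovers the same directions (possibly with flipped signs)
pca = PCA().fit(X)
print(eigvecs.T)                   # eigenvectors as rows, like pca.components_
print(pca.components_)
print(eigvals)                     # matches pca.explained_variance_
print(pca.explained_variance_)
```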
The variance formula, with the square expanded into a product of two identical factors, …
… is very similar to the covariance formula:
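Written out side by side (as a sketch, assuming the usual 1/(n−1) normalisation), the two formulas differ only in whether the second factor comes from the same variable or from a second one:

```latex
% Sample variance: the square written as a product of two identical factors ...
\operatorname{var}(x) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})

% ... and the sample covariance: one factor now comes from a second variable y
\operatorname{cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
```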
Linear Discriminant Analysis (LDA) aims to find the directions that maximize the separation (or discrimination) between different classes, which can be useful in pattern classification problems (PCA “ignores” class labels).
In other words,
PCA projects the entire dataset onto a different feature (sub)space that explains the most variance, while
LDA tries to determine a suitable feature (sub)space in order to distinguish between patterns that belong to different classes.
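A minimal sketch of the difference, assuming scikit-learn and its bundled Iris dataset purely for illustration:

```python
# Sketch: PCA (unsupervised, maximizes variance) vs LDA (supervised, maximizes class separation).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                             # ignores the labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # uses the labels y

print(X_pca.shape, X_lda.shape)   # both (150, 2), but the axes mean different things
```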
You can define the distance metric however you want, and Multidimensional Scaling (MDS) with Euclidean distance is equivalent to extracting two principal components from a PCA analysis. In general, other distance metrics can be used.
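A hedged sketch of that equivalence on a toy dataset, assuming numpy, scipy and scikit-learn; classical (metric) MDS is implemented directly here via the double-centred squared-distance matrix, rather than via scikit-learn's SMACOF-based MDS class:

```python
# Sketch: classical MDS with Euclidean distances recovers the same embedding
# as the first two principal components, up to sign flips per axis.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                       # made-up toy dataset

# Classical MDS: double-centre the squared Euclidean distance matrix ...
D2 = squareform(pdist(X, metric="euclidean")) ** 2
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                # centring matrix
B = -0.5 * J @ D2 @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]              # two largest eigenvalues
mds_coords = eigvecs[:, order] * np.sqrt(eigvals[order])

# ... and compare with the first two principal component scores.
pca_coords = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(mds_coords), np.abs(pca_coords)))   # True, up to sign flips
```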