principal component analysis

Backlinks
- statistics

PCA is a data analysis technique that:

input

a set of data points with arbitrarily many features

output

a series of principal components (unit vectors)

meaning

the first few principal components typically account for most of the variation in the data¹

how

center the data points around the origin by subtracting the mean of each feature ([1 50] [2 10] [3 0] has mean [2 20], so add [-2 -20] to every point)
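find a line through the origin that maximizes the sum of squared distances from the origin to the points' projections onto that line, \(\sum_{p \in P}{d_p^2}\), where \(d_p\) is that distance for point \(p\)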

a unit vector along it is the first principal component
find another line, perpendicular to the previous one, that has the maximum sum of squared projected distances; the unit vector along it is the second p.c.
find a third line, perpendicular to the previous two, that has the maximum sum of squared projected distances; the unit vector along it is the third p.c.
and so on, until either every feature or every data point has its own dimension, whichever runs out first (\(\#\text{p.c.} \le \min(\#\text{features}, \#\text{datapoints})\))


Footnotes:

¹ if the data really is distributed unevenly across dimensions, e.g. there are 4 features but the points all lie on a 2-d plane, then the 3rd and 4th principal components account for no variation at all, while the first 2 account for 100%.
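
A quick numerical check of this footnote, as a sketch under assumptions: the points are generated at random on a 2-d plane inside a 4-feature space (the seed, the number of points, and all variable names are arbitrary choices for illustration), and the share of variation per component is taken from the squared singular values of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, arbitrary choice
plane_basis = rng.normal(size=(2, 4))     # two directions spanning the plane
coords = rng.normal(size=(20, 2))         # 20 points described within the plane
points = coords @ plane_basis             # 20 points with 4 features each

centered = points - points.mean(axis=0)
# Only the singular values are needed to measure variation per component.
singular_values = np.linalg.svd(centered, compute_uv=False)
variation = singular_values ** 2
share = variation / variation.sum()       # fraction of total variation

print(np.round(share, 6))
```

The first two entries of `share` should sum to 1 and the last two should be 0 up to floating-point rounding.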
