Principal Component Analysis (PCA) is used to explain the variance-covariance structure of a set of variables through linear combinations. It is often used as a dimensionality-reduction technique.
There are two primary reasons for using PCA:
PCA is typically used as an intermediate step in data analysis when the number of input variables is otherwise too large for useful analysis.
PCA should be used mainly for variables that are strongly correlated. If the relationships between variables are weak, PCA does not reduce the data effectively. Inspect the correlation matrix to judge whether the variables are correlated strongly enough. In general, if most of the correlation coefficients are smaller than 0.3 in absolute value, PCA will not help.
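The 0.3 rule of thumb above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original procedure: the function name weak_correlation_share and the sample data are hypothetical, and observations are assumed to be in rows with variables in columns.

```python
import numpy as np

def weak_correlation_share(data, threshold=0.3):
    """Return the share of distinct variable pairs with |r| below threshold.

    data: 2-D array, rows are observations, columns are variables.
    """
    corr = np.corrcoef(data, rowvar=False)
    # Use the strict upper triangle so each pair is counted once
    # and the diagonal (always 1) is ignored.
    rows, cols = np.triu_indices_from(corr, k=1)
    off_diagonal = np.abs(corr[rows, cols])
    return float(np.mean(off_diagonal < threshold))

# Hypothetical data: x2 closely tracks x1, x3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
data = np.column_stack([x1, x2, x3])

share = weak_correlation_share(data)
print(f"share of pairs with |r| < 0.3: {share:.2f}")
```

A share close to 1 would suggest that most variables are only weakly related, which is the situation in which PCA is unlikely to reduce the data much.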
There is always the question of how many components to retain. Please refer to the scree plot and the Eigenvalues of the Correlation Matrix for more information.
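As an illustration of what the scree plot and the eigenvalue table summarize, the sketch below computes the eigenvalues of a correlation matrix and the cumulative proportion of variance they explain. It is a sketch under stated assumptions: the function names, the sample data, and the 80% cumulative-variance target are all hypothetical choices, not values prescribed by this document.

```python
import numpy as np

def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh is meant for symmetric matrices and returns ascending order.
    return np.linalg.eigvalsh(corr)[::-1]

def components_for_variance(eigenvalues, target=0.8):
    """Smallest number of leading components whose cumulative share of
    variance reaches target (the target is an assumed cut-off)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    count = int(np.searchsorted(cumulative, target)) + 1
    # Guard against rounding pushing the count past the number of components.
    return min(count, len(eigenvalues))

# Hypothetical data: four variables driven by one shared factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(300, 1))
data = factor + 0.5 * rng.normal(size=(300, 4))

eigenvalues = correlation_eigenvalues(data)
proportions = eigenvalues / eigenvalues.sum()
for k, (value, share) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"component {k}: eigenvalue {value:.3f}, proportion {share:.3f}")
print("components retained:", components_for_variance(eigenvalues))
```

Plotting the eigenvalues against the component number gives the scree plot; a sharp drop followed by a flat tail is the usual visual cue for where to stop.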
The correlation matrix is simply the covariance matrix of the standardized variables, that is, with every variance set to one. When the variables are measured on similar scales, the covariance matrix is generally preferred, because standardizing the variances discards the information they carry. The correlation matrix is recommended when variables are measured on different scales.
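The relationship between the two matrices can be verified directly. The sketch below uses hypothetical variables on very different scales (the names length_mm and mass_kg and all numbers are made up for illustration) and shows that the covariance matrix of the standardized data matches the correlation matrix, and that the large-scale variable dominates the first component when the covariance matrix is used.

```python
import numpy as np

# Hypothetical variables measured on very different scales.
rng = np.random.default_rng(1)
length_mm = rng.normal(1700.0, 100.0, size=100)
mass_kg = 0.04 * length_mm + rng.normal(0.0, 5.0, size=100)
data = np.column_stack([length_mm, mass_kg])

cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

# Standardize each column to zero mean and unit sample variance
# (ddof=1 matches the default normalization used by np.cov).
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
cov_of_z = np.cov(z, rowvar=False)
matches = bool(np.allclose(cov_of_z, corr))

# Share of total variance carried by the first component under each matrix.
cov_share = np.linalg.eigvalsh(cov)[-1] / np.trace(cov)
corr_share = np.linalg.eigvalsh(corr)[-1] / np.trace(corr)

print("covariance of standardized data equals correlation matrix:", matches)
print(f"first-component share, covariance matrix:  {cov_share:.3f}")
print(f"first-component share, correlation matrix: {corr_share:.3f}")
```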
The choice between pairwise and listwise exclusion of missing data depends on the nature of the missing values. If only a few values are missing, and only for a single variable, it often makes sense to delete each affected row entirely. This is listwise exclusion. If values are missing for two or more variables, it is typically best to employ pairwise exclusion, which computes each correlation from all rows in which both variables of that pair are present.
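The difference between the two exclusion methods can be sketched as follows. This is an illustrative NumPy version, assuming missing values are coded as NaN; the function names and the small data set are hypothetical and do not describe how any particular software implements these options.

```python
import numpy as np

def listwise_corr(data):
    """Correlation matrix from rows with no missing values at all.

    Returns the matrix and the number of rows that were kept.
    """
    complete = data[~np.isnan(data).any(axis=1)]
    return np.corrcoef(complete, rowvar=False), len(complete)

def pairwise_corr(data):
    """Correlation matrix in which each entry uses every row where
    both variables of that pair are present."""
    n_vars = data.shape[1]
    corr = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            r = np.corrcoef(data[both, i], data[both, j])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr

# Hypothetical data with missing values (NaN) in two different variables.
data = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 6.1, np.nan],
    [4.0, 8.2, 2.0],
    [5.0, 9.9, 5.0],
    [6.0, 12.1, 4.0],
])

corr_listwise, n_complete = listwise_corr(data)
corr_pairwise = pairwise_corr(data)
print("rows kept by listwise exclusion:", n_complete)
print("listwise:\n", np.round(corr_listwise, 3))
print("pairwise:\n", np.round(corr_pairwise, 3))
```

Pairwise exclusion keeps more of the data, but because each entry may be based on a different subset of rows, the resulting matrix is not guaranteed to be positive semidefinite, so a few small negative eigenvalues can appear.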
Topics covered in this section: