18.104.22.168 Algorithms (Discriminant Analysis)
Discriminant Analysis is used to allocate observations to groups using information from observations whose group memberships are known (i.e., training data).
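Let $X_t$ be the training data with $n$ observations and $p$ variables on $n_g$ groups. $\bar{x}_j$ is a row vector of the sample mean for the jth group, and $n_j$ is the number of observations in the jth group. The within-group covariance matrix for group j can be expressed as:
$$S_j = \frac{1}{n_j - 1} \sum_{i \in \text{group } j} (x_i - \bar{x}_j)^T (x_i - \bar{x}_j)$$
where $x_i$ is the row vector of the ith observation.
The pooled within-group covariance matrix is:
$$S = \frac{1}{n - n_g} \sum_{j=1}^{n_g} (n_j - 1) S_j$$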
Note that missing values are excluded listwise (i.e., an observation containing one or more missing values is excluded from the analysis).
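The covariance definitions above can be sketched in a few lines of NumPy. This is a minimal illustration rather than Origin's implementation; the function name `within_group_covariances` and its return layout are assumptions made for this example.

```python
import numpy as np

def within_group_covariances(X, groups):
    """Return (labels, per-group covariance matrices, pooled covariance)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)

    # Listwise deletion: drop every row that contains a missing value.
    keep = ~np.isnan(X).any(axis=1)
    X, groups = X[keep], groups[keep]

    labels = np.unique(groups)
    n, p = X.shape
    covs = []
    pooled = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        scatter = centered.T @ centered        # sums of squares and cross-products
        covs.append(scatter / (len(Xg) - 1))   # within-group covariance for group g
        pooled += scatter
    pooled /= n - len(labels)                  # pooled within-group covariance
    return labels, covs, pooled
```

Each group needs at least two complete observations, and the pooled matrix needs more complete observations than groups, otherwise the divisors are zero.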
Test for Equality of Within-group Covariance Matrices
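If training data are assumed to follow a multivariate normal distribution, the following likelihood-ratio test statistic G can be used to test for equality of within-group covariance matrices.
$$G = C\left\{ (n - n_g)\log|S| - \sum_{j=1}^{n_g} (n_j - 1)\log|S_j| \right\}$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(n_g - 1)} \left( \sum_{j=1}^{n_g} \frac{1}{n_j - 1} - \frac{1}{n - n_g} \right)$$
Here $S_j$ and $S$ are the within-group and pooled within-group covariance matrices, $n_j$ is the number of observations in the jth group, and $n_g$ is the number of groups.
For large n, G is approximately distributed as a $\chi^2$ variable with $\frac{1}{2}p(p+1)(n_g - 1)$ degrees of freedom.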
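As a rough illustration, the statistic can be computed from the group covariance matrices and group sizes. The sketch below assumes the usual Box-corrected form of the likelihood-ratio statistic (the correction factor `C` is an assumption about the exact form used); the function name `box_g_statistic` is hypothetical.

```python
import numpy as np

def box_g_statistic(covs, sizes):
    """Return (G, degrees of freedom) for the equality-of-covariance test."""
    covs = [np.asarray(S, dtype=float) for S in covs]
    sizes = np.asarray(sizes, dtype=float)
    n, ng, p = sizes.sum(), len(sizes), covs[0].shape[0]

    # Pooled within-group covariance matrix.
    pooled = sum((nj - 1) * S for nj, S in zip(sizes, covs)) / (n - ng)

    def logdet(M):
        # slogdet returns (sign, log|det|); covariance matrices have sign +1.
        return np.linalg.slogdet(M)[1]

    # Assumed small-sample correction factor.
    C = 1 - (2 * p * p + 3 * p - 1) / (6 * (p + 1) * (ng - 1)) * (
        np.sum(1 / (sizes - 1)) - 1 / (n - ng)
    )
    G = C * ((n - ng) * logdet(pooled)
             - sum((nj - 1) * logdet(S) for nj, S in zip(sizes, covs)))
    df = p * (p + 1) * (ng - 1) / 2
    return G, df
```

A p-value would then come from the upper tail of a chi-square distribution with `df` degrees of freedom, which is left out here to keep the sketch dependent on NumPy only.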
Canonical Discriminant Analysis
Canonical discriminant analysis is used to find the linear combination of the p variables that maximizes the ratio of between-group to within-group variation. The formed canonical variates can then be used to discriminate between groups.
Let the training data with total means subtracted be X, and let its rank be k. The orthogonal matrix Q can then be calculated from the QR decomposition of X (if X has full column rank) or from the SVD of X, and $Q_x$ is the first k columns of Q. Let $Q_g$ be an n by $(n_g - 1)$ orthogonal matrix that defines the groups, where $n_g$ is the number of groups. Then let the k by $(n_g - 1)$ matrix V be
$$V = Q_x^T Q_g$$
The SVD of V is:
$$V = U_x \Delta U_g^T$$
Non-zero diagonal elements of the matrix $\Delta$ are the l canonical correlations $\delta_i$ associated with the l canonical variates, i=1,2,...,l and $l = \min(k, n_g - 1)$.
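The construction above can be sketched with NumPy. This illustration uses the SVD for both orthogonal bases (so it also covers rank-deficient data) and builds the group basis from centered indicator columns, which is one possible choice and an assumption of this example; the function name `canonical_correlations` is hypothetical.

```python
import numpy as np

def canonical_correlations(X, groups):
    """Canonical correlations between the variables and the group structure."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)

    # Orthonormal basis for the column space of the mean-centered data.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.sum(s > s.max() * max(Xc.shape) * np.finfo(float).eps))
    Qx = U[:, :k]

    # Centered group indicators span a space of dimension (number of groups - 1).
    labels = np.unique(groups)
    Z = (groups[:, None] == labels[None, :]).astype(float)
    Zc = Z - Z.mean(axis=0)
    Ug, _, _ = np.linalg.svd(Zc, full_matrices=False)
    Qg = Ug[:, : len(labels) - 1]

    # Singular values of V are the canonical correlations.
    V = Qx.T @ Qg
    return np.linalg.svd(V, compute_uv=False)
```

For a single variable, the squared canonical correlation equals the ratio of the between-group to the total sum of squares, which gives a quick sanity check.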
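Eigenvalues of the within-group sums of squares matrix are:
$$\lambda_i = \frac{\delta_i^2}{1 - \delta_i^2}$$
where $\delta_i$ is the ith canonical correlation.
- Testing for a significant dimensionality greater than i,
- A $\chi^2$ statistic with $(k - i)(n_g - 1 - i)$ degrees of freedom is used:
- $\left( n - 1 - n_g - \frac{1}{2}(k - n_g) \right) \sum_{j=i+1}^{l} \log(1 + \lambda_j)$
- where k is the rank of the mean-centered training data, $n_g$ is the number of groups, and l is the number of canonical variates.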
- Unstandardized Canonical Coefficients
- The loading matrix B for the canonical variates can be calculated from $U_x$. It is scaled so that the canonical variates have unit pooled within-group variance, i.e. $B^T S B = I$
- Note that the sign of each eigenvector in the SVD result is not unique, which means each column in B can be multiplied by -1. Origin normalizes the sign by forcing the sum of each column in $RB$ to be positive, where R is the Cholesky factor of S.
- Constant items can be calculated as follows.
- $\beta_0 = -\bar{x} B$
- where $\bar{x}$ is a row vector of means for variables.
- Standardized Canonical Coefficients
- $B_s = D B$
- where $D$ is a diagonal matrix whose diagonal elements are the square roots of the diagonal elements of the pooled within-group covariance matrix S, and B is the matrix of unstandardized canonical coefficients.
- Canonical Structure Matrix
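- The canonical group mean for the jth group is $\bar{y}_j = \beta_0 + \bar{x}_j B$
- where $\bar{y}_j$ and $\bar{x}_j$ are row vectors of the canonical group mean and group mean for the jth group, respectively, and $\beta_0$ is the row vector of constant items.
- The canonical score is $y_i = \beta_0 + x_i B$
- where $y_i$ is the canonical score for the ith observation $x_i$.
- Note that the ith observation here can come from either the training data or the test data.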
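The scaling and scoring steps in the list above can be illustrated numerically. The sketch below deliberately swaps in a different but equivalent route: instead of the QR/SVD construction, it whitens the between-group matrix with the Cholesky factor of the pooled covariance and takes eigenvectors. The function name, the symbol `beta0` for the constant items, and the return layout are assumptions of this example, and no sign normalization is applied.

```python
import numpy as np

def canonical_coefficients(X, groups):
    """Return (B, beta0, B_std, scores, S) for canonical discriminant analysis."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    total_mean = X.mean(axis=0)

    within = np.zeros((p, p))
    between = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        m = Xg.mean(axis=0)
        within += (Xg - m).T @ (Xg - m)
        d = (m - total_mean)[:, None]
        between += len(Xg) * (d @ d.T)
    S = within / (n - len(labels))            # pooled within-group covariance

    L = np.linalg.cholesky(S)                 # S = L L^T
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ between @ Linv.T)
    n_variates = min(p, len(labels) - 1)
    order = np.argsort(evals)[::-1][:n_variates]

    B = Linv.T @ evecs[:, order]              # scaled so that B^T S B = I
    beta0 = -total_mean @ B                   # constant items
    B_std = np.diag(np.sqrt(np.diag(S))) @ B  # standardized coefficients
    scores = beta0 + X @ B                    # canonical scores
    return B, beta0, B_std, scores, S
```

Because the signs of eigenvectors are arbitrary, columns of `B` from this sketch may differ from reported coefficients by a factor of -1.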
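Mahalanobis distance is a measure of the distance of an observation from a group. It has two forms. For an observation $x_i$ and the jth group with mean $\bar{x}_j$, the squared distance is:
- Using within-group covariance matrix $S_j$: $D_{ij}^2 = (x_i - \bar{x}_j) S_j^{-1} (x_i - \bar{x}_j)^T$
- Using pooled within-group covariance matrix $S$: $D_{ij}^2 = (x_i - \bar{x}_j) S^{-1} (x_i - \bar{x}_j)^T$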
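Both forms share the same computation and differ only in which covariance matrix is passed in. A minimal NumPy sketch (the function name `squared_mahalanobis` is an assumption of this example):

```python
import numpy as np

def squared_mahalanobis(x, group_mean, cov):
    """Squared Mahalanobis distance of observation x from a group."""
    d = np.asarray(x, dtype=float) - np.asarray(group_mean, dtype=float)
    # Solve cov @ z = d rather than forming the explicit inverse.
    return float(d @ np.linalg.solve(cov, d))
```

Passing a group's own covariance matrix gives the first form, and passing the pooled matrix gives the second.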
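The prior probabilities $\pi_j$ reflect the user's view as to the likelihood of the observations coming from the different groups. Origin supports two kinds of prior probabilities:
- Equal
- $\pi_j = \frac{1}{n_g}$, where $n_g$ is the number of groups.
- Proportional to Group Size
- $\pi_j = \frac{n_j}{n}$, where $n_j$ is the number of observations in the jth group of the training data and n is the total number of observations.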
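A small sketch of the prior calculation, assuming the two kinds are equal priors and priors proportional to group size. The function name and the `kind` argument values are hypothetical.

```python
def prior_probabilities(group_sizes, kind="equal"):
    """Return one prior probability per group."""
    ng = len(group_sizes)
    if kind == "equal":
        return [1.0 / ng] * ng
    if kind == "proportional":
        n = sum(group_sizes)
        return [nj / n for nj in group_sizes]
    raise ValueError("kind must be 'equal' or 'proportional'")
```

For example, group sizes of 10 and 30 give proportional priors of 0.25 and 0.75.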
The p variables of observations are assumed to follow a multivariate Normal distribution with mean $\mu_j$ and covariance matrix $\Sigma_j$ if the observation comes from the jth group.
If $p(x_i \mid \mu_j, \Sigma_j)$ is the probability of observing the observation $x_i$ from group j, then the posterior probability of belonging to group j is:
$$q_{ij} = p(j \mid x_i, \mu_j, \Sigma_j) \propto p(x_i \mid \mu_j, \Sigma_j)\, \pi_j$$
where $\pi_j$ is the prior probability of group j.
The parameters $\mu_j$ and $\Sigma_j$ are estimated from the training data, and the observation is allocated to the group with the highest posterior probability. Origin provides two methods to calculate the posterior probability.
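- Linear Discriminant Function
- Within-group covariance matrices are assumed equal.
- $\log q_{ij} = -\frac{1}{2} D_{ij}^2 + \log \pi_j + c_0$
- where $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the pooled within-group covariance matrix, $\pi_j$ is the prior probability of group j, and $c_0$ is a constant.
- Quadratic Discriminant Function
- Within-group covariance matrices are not assumed equal.
- $\log q_{ij} = -\frac{1}{2} D_{ij}^2 + \log \pi_j - \frac{1}{2} \log|S_j| + c_0$
- where $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the within-group covariance matrix $S_j$, and $c_0$ is a constant.
The posterior probabilities $q_{ij}$ are standardized so that $\sum_{j=1}^{n_g} q_{ij} = 1$, and $c_0$ is determined from this standardization.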
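The two rules can be sketched together, since they differ only in the covariance matrix used and in the extra log-determinant term of the quadratic rule. This is an illustrative sketch: the function name is hypothetical, and passing a single matrix for the linear rule versus one matrix per group for the quadratic rule is a convention invented for this example.

```python
import numpy as np

def posterior_probabilities(x, means, priors, covs):
    """Posterior probabilities of x for each group.

    covs is either one (p, p) pooled matrix (linear rule) or a sequence of
    per-group (p, p) matrices (quadratic rule).
    """
    x = np.asarray(x, dtype=float)
    covs = np.asarray(covs, dtype=float)
    linear = covs.ndim == 2

    log_q = np.empty(len(means))
    for j, (mean, prior) in enumerate(zip(means, priors)):
        S = covs if linear else covs[j]
        d = x - np.asarray(mean, dtype=float)
        log_q[j] = -0.5 * d @ np.linalg.solve(S, d) + np.log(prior)
        if not linear:
            log_q[j] -= 0.5 * np.linalg.slogdet(S)[1]

    # Subtracting the maximum is a numerically stable way of fixing the
    # additive constant; the division then makes the probabilities sum to 1.
    q = np.exp(log_q - log_q.max())
    return q / q.sum()
```

The observation is then allocated to the group at `np.argmax` of the returned vector.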
The Atypicality Index indicates the probability of obtaining an observation more typical of group j than the ith observation. If it is close to 1 for all groups, the observation may come from a grouping not represented in the training data. The Atypicality Index is calculated as:
where is the lower tail probability from a beta distribution, for equal within-group covariance matrices,
for unequal within-group covariance matrices,
Linear Discriminant Function Coefficients
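Linear discriminant functions (also known as Fisher's linear discriminant functions) can be calculated as:
- Linear Coefficient for the jth Group.
- $b_j = S^{-1} \bar{x}_j^T$
- where $b_j$ is a column vector with size of p, S is the pooled within-group covariance matrix, and $\bar{x}_j$ is the row vector of means for the jth group.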
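- Constant Coefficient for the jth Group.
- $a_j = \log \pi_j - \frac{1}{2} \bar{x}_j S^{-1} \bar{x}_j^T$
- where $\pi_j$ is the prior probability of the jth group.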
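A short NumPy sketch of these coefficients follows. It assumes the constant term includes the log prior, which is the form consistent with the linear posterior rule; the function names `linear_discriminant_functions` and `classify` are hypothetical.

```python
import numpy as np

def linear_discriminant_functions(means, pooled, priors):
    """Return (coef, const): column j of coef and const[j] belong to group j."""
    means = np.asarray(means, dtype=float)        # shape (groups, p)
    coef = np.linalg.solve(pooled, means.T)       # shape (p, groups)
    # const[j] = log(prior_j) - 0.5 * mean_j @ pooled^-1 @ mean_j
    const = np.log(priors) - 0.5 * np.einsum("jp,pj->j", means, coef)
    return coef, const

def classify(x, coef, const):
    """Index of the group with the largest discriminant function value."""
    return int(np.argmax(const + np.asarray(x, dtype=float) @ coef))
```

Under the stated assumption, picking the largest function value gives the same allocation as picking the highest linear-rule posterior probability, because the two differ only by terms that do not depend on the group.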
Classify Training Data
Each observation in training data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability). Squared Mahalanobis distance from each group and Atypicality Index of each group can also be calculated.
The classification result for training data is summarized by comparing given group membership and predicted group membership. The misclassification error rate is the percentage of misclassified observations weighted by the prior probabilities of groups, i.e.
$$E = \sum_{j=1}^{n_g} \pi_j e_j$$
where $\pi_j$ is the prior probability of the jth group and $e_j$ is the percentage of misclassified observations for the jth group.
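The weighted error rate is straightforward to compute from the given and predicted memberships. In this sketch the function name is hypothetical and `priors` is assumed to be a dictionary from group label to prior probability; the result is a fraction, so multiply by 100 for a percentage.

```python
def misclassification_error_rate(true_groups, predicted_groups, priors):
    """Prior-weighted misclassification rate, as a fraction between 0 and 1."""
    rate = 0.0
    for g, prior in priors.items():
        # Predictions for the observations whose given group is g.
        members = [p for t, p in zip(true_groups, predicted_groups) if t == g]
        wrong = sum(1 for p in members if p != g)
        rate += prior * wrong / len(members)
    return rate
```

With four observations in group "a" (one misclassified), two in group "b" (one misclassified), and equal priors, the rate is 0.5 × 0.25 + 0.5 × 0.5 = 0.375.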
Cross Validation for Training Data
Cross validation follows the same procedure as Classify Training Data, except that when predicting the membership of an observation in the training data, that observation is excluded from the calculation of the within-group covariance matrices or the pooled within-group covariance matrix.
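A leave-one-out loop makes the procedure concrete. The sketch below is restricted to the linear rule with equal priors, in which case the highest posterior probability corresponds to the smallest pooled Mahalanobis distance. It also recomputes the group means without the held-out observation, which is an assumption of this example; the function name is hypothetical.

```python
import numpy as np

def leave_one_out_predictions(X, groups):
    """Predict each observation's group with that observation held out."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n = len(X)
    preds = []
    for i in range(n):
        keep = np.arange(n) != i
        Xt, gt = X[keep], groups[keep]

        # Group means and pooled covariance from the remaining observations.
        means = np.array([Xt[gt == g].mean(axis=0) for g in labels])
        scatter = sum((Xt[gt == g] - m).T @ (Xt[gt == g] - m)
                      for g, m in zip(labels, means))
        pooled = scatter / (len(Xt) - len(labels))

        # Squared Mahalanobis distance of the held-out row from every group.
        d = X[i] - means
        d2 = np.einsum("jp,jp->j", d, np.linalg.solve(pooled, d.T).T)
        preds.append(labels[np.argmin(d2)])
    return np.array(preds)
```

Comparing these predictions with the given memberships yields the cross-validated classification summary and error rate.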
Classify Test Data
Within-group covariance matrices and pooled within-group covariance matrix are calculated from training data. Each observation in test data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability).