17.7.4.3 Algorithms (Discriminant Analysis)
Discriminant Analysis is used to allocate observations to groups using information from observations whose group memberships are known (i.e., training data).
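Let $X_t$ be the training data with $n$ observations and $p$ variables on $n_g$ groups. $\bar{x}_j$ is a row vector of the sample mean for the jth group, and $n_j$ is the number of observations for the jth group. The within-group covariance matrix for group j can be expressed as:
$$S_j = \frac{1}{n_j - 1} \sum_{i \in \text{group } j} (x_i - \bar{x}_j)^T (x_i - \bar{x}_j)$$
where $x_i$ is the row vector of the ith observation.
The pooled within-group covariance matrix is:
$$S = \frac{1}{n - n_g} \sum_{j=1}^{n_g} (n_j - 1) S_j$$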
Note that missing values are excluded in a listwise way in the analysis (i.e., an observation containing one or more missing values will be excluded in the analysis).
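The covariance calculations and the listwise exclusion of missing values can be sketched in plain Python. This is only an illustration, not Origin's implementation: the function names `drop_missing` and `group_covariances` are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math

def drop_missing(rows, labels):
    """Listwise deletion: drop any observation with a None or NaN value."""
    kept = [(r, g) for r, g in zip(rows, labels)
            if all(v is not None and not math.isnan(v) for v in r)]
    return [r for r, _ in kept], [g for _, g in kept]

def group_covariances(rows, labels):
    """Return ({group: mean}, {group: S_j}, pooled S) for complete observations."""
    p = len(rows[0])
    groups = sorted(set(labels))
    means, covs = {}, {}
    pooled = [[0.0] * p for _ in range(p)]
    for g in groups:
        xs = [r for r, lab in zip(rows, labels) if lab == g]
        n_j = len(xs)
        mean = [sum(x[a] for x in xs) / n_j for a in range(p)]
        # Sums of squares and cross-products about the group mean.
        sscp = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in xs)
                 for b in range(p)] for a in range(p)]
        means[g] = mean
        covs[g] = [[v / (n_j - 1) for v in row] for row in sscp]
        for a in range(p):
            for b in range(p):
                pooled[a][b] += sscp[a][b]
    dof = len(rows) - len(groups)  # n - n_g
    pooled = [[v / dof for v in row] for row in pooled]
    return means, covs, pooled

# Made-up data: two groups, two variables, one incomplete observation.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0],
        [6.0, 7.0], [7.0, 9.0], [8.0, 8.0], [5.0, None]]
labels = ["a", "a", "a", "b", "b", "b", "b"]
rows, labels = drop_missing(rows, labels)
means, covs, pooled = group_covariances(rows, labels)
```

Because the pooled matrix is built from the summed cross-products divided by n - n_g, it equals the (n_j - 1)-weighted average of the group matrices.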
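Test for Equality of Within-group Covariance Matrices
If training data are assumed to follow a multivariate normal distribution, the following likelihood-ratio test statistic G can be used to test for equality of within-group covariance matrices.
$$G = C \left\{ (n - n_g) \log|S| - \sum_{j=1}^{n_g} (n_j - 1) \log|S_j| \right\}$$
where
$$C = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(n_g - 1)} \left( \sum_{j=1}^{n_g} \frac{1}{n_j - 1} - \frac{1}{n - n_g} \right)$$
Here $S_j$ is the within-group covariance matrix for group j, $S$ is the pooled within-group covariance matrix, $n_j$ is the number of observations in group j, and $n_g$ is the number of groups.
For large n, G is approximately distributed as a $\chi^2$ variable with $\frac{1}{2} p (p + 1)(n_g - 1)$ degrees of freedom.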
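As a rough sketch of the test statistic, the following plain-Python code evaluates G and its degrees of freedom from a list of group covariance matrices and group sizes. It assumes the usual form of the likelihood-ratio statistic with a scaling constant C; the names `det` and `equality_test` are hypothetical, and the chi-square p-value is left out because the standard library has no chi-square distribution function.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def equality_test(covs, sizes):
    """Return (G, degrees of freedom) for group matrices S_j and sizes n_j."""
    p, n_g, n = len(covs[0]), len(covs), sum(sizes)
    dof = n - n_g
    # Pooled matrix as the (n_j - 1)-weighted average of the S_j.
    pooled = [[sum((nj - 1) * S[a][b] for S, nj in zip(covs, sizes)) / dof
               for b in range(p)] for a in range(p)]
    c = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (n_g - 1)) * (
        sum(1.0 / (nj - 1) for nj in sizes) - 1.0 / dof)
    g = c * (dof * math.log(det(pooled))
             - sum((nj - 1) * math.log(det(S)) for S, nj in zip(covs, sizes)))
    return g, p * (p + 1) * (n_g - 1) // 2

same = [[1.0, 0.5], [0.5, 1.0]]
g_equal, df = equality_test([same, same], [10, 12])
g_diff, _ = equality_test([[[1.0, 0.0], [0.0, 1.0]],
                           [[4.0, 0.0], [0.0, 4.0]]], [10, 10])
```

When every group has the same covariance matrix the two log-determinant terms cancel and G is zero; the more the matrices differ, the larger G becomes.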
Canonical Discriminant Analysis
Canonical discriminant analysis is used to find the linear combination of the p variables that maximizes the ratio of between-group to within-group variation. The formed canonical variates can then be used to discriminate between groups.
Let the training data with total means subtracted be X, and let its rank be k. The orthogonal matrix Q can then be calculated from the QR decomposition of X (for full column rank) or from its SVD, and $Q_x$ is the first k columns of Q. Let $Q_g$ be an n by $n_g - 1$ orthogonal matrix that defines the groups, where $n_g$ is the number of groups. Then let the k by $n_g - 1$ matrix V be
$$V = Q_x^T Q_g$$
The SVD of V is:
$$V = U_x \Delta U_g^T$$
Non-zero diagonal elements of the matrix $\Delta$ are the l canonical correlations associated with the l canonical variates, $\delta_i$, i=1,2,...,l and $l = \min(k, n_g - 1)$.
Eigenvalues of the within-group sums of squares matrix are:
$$\lambda_i = \frac{\delta_i^2}{1 - \delta_i^2}$$
 Testing for a significant dimensionality greater than i,
 A $\chi^2$ statistic with $(k - i)(n_g - 1 - i)$ degrees of freedom is used:
$$\left( n - 1 - n_g - \frac{1}{2}(k - n_g) \right) \sum_{j=i+1}^{l} \log(1 + \lambda_j)$$
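The dimensionality test can be sketched from a list of canonical correlations. This is a minimal illustration that assumes the eigenvalues are obtained as delta^2 / (1 - delta^2) and that the statistic uses the multiplier n - 1 - n_g - (k - n_g)/2; the function name `dimensionality_tests` and the sample correlations are made up.

```python
import math

def dimensionality_tests(deltas, n, n_g, k):
    """For i = 0..l-1, return (i, chi-square statistic, degrees of freedom)."""
    lambdas = [d * d / (1.0 - d * d) for d in deltas]
    factor = n - 1 - n_g - 0.5 * (k - n_g)
    out = []
    for i in range(len(deltas)):
        # lambdas[i:] holds lambda_{i+1}..lambda_l in 1-based notation.
        stat = factor * sum(math.log(1.0 + lam) for lam in lambdas[i:])
        out.append((i, stat, (k - i) * (n_g - 1 - i)))
    return out

# Hypothetical input: 150 observations, 3 groups, rank 4, two correlations.
tests = dimensionality_tests([0.9, 0.5], n=150, n_g=3, k=4)
```

Each tuple would then be compared against a chi-square distribution with the listed degrees of freedom to decide whether further canonical variates are needed.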
 Unstandardized Canonical Coefficients
 Loading matrix B for canonical variates can be calculated from $U_x$, the matrix of left singular vectors in the SVD of V. It is scaled so that the canonical variates have unit pooled within-group variance, i.e.
$$B^T S B = I$$
 where S is the pooled within-group covariance matrix.
 Note that the sign of an eigenvector in the SVD result is not unique, which means each column in B can be multiplied by -1. Origin normalizes the sign by forcing the sum of each column in $RB$ to be positive, where R is the Cholesky factorization of S.
 Constant items can be calculated as follows.
$$C_0 = -\bar{x} B$$
 where $\bar{x}$ is a row vector of means for variables.
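 Standardized Canonical Coefficients
$$D = S_1 B$$
 where $S_1$ is a diagonal matrix whose diagonal elements are the square roots of the diagonal elements of the pooled within-group covariance matrix S, and B is the matrix of unstandardized canonical coefficients.
 Canonical Structure Matrix
$$C = S_1^{-1} S B$$
 The canonical group mean for the jth group is
$$M_j = C_0 + \bar{x}_j B$$
 where $M_j$ and $\bar{x}_j$ are row vectors of the canonical group mean and group mean for the jth group, respectively, and $C_0$ is the row vector of constant items.
 The canonical score is
$$A_i = C_0 + x_i B$$
 where $A_i$ is the canonical score for the ith observation $x_i$.
 Note that here the ith observation can come from either the training data or the test data.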
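To make the scoring step concrete, here is a small sketch that applies a coefficient matrix to observations. It assumes the score is the constant item plus the observation times the coefficient matrix, with the constant item equal to minus the variable means times that matrix. The function name `canonical_scores` and the matrix `B` below are hypothetical values chosen for illustration, not coefficients from a fitted model.

```python
def canonical_scores(rows, B, grand_mean):
    """Scores A_i = C_0 + x_i B, with C_0 = -grand_mean B (row-vector convention)."""
    p, l = len(B), len(B[0])
    # Constant item for each canonical variate.
    c0 = [-sum(grand_mean[a] * B[a][c] for a in range(p)) for c in range(l)]
    return [[c0[c] + sum(x[a] * B[a][c] for a in range(p)) for c in range(l)]
            for x in rows]

B = [[1.0, 0.0], [-1.0, 2.0]]   # hypothetical p-by-l coefficient matrix
grand_mean = [2.0, 3.0]         # means of the variables
scores = canonical_scores([[2.0, 3.0], [3.0, 3.0]], B, grand_mean)
```

With this choice of constant, an observation equal to the variable means scores zero on every canonical variate, and the same function applies unchanged to training or test observations.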
Mahalanobis Distance
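Mahalanobis distance is a measure of the distance of an observation from a group. It has two forms. For an observation $x_i$, the squared distance from the jth group is:
 Using within-group covariance matrix
$$D_{ij}^2 = (x_i - \bar{x}_j) S_j^{-1} (x_i - \bar{x}_j)^T$$
 Using pooled within-group covariance matrix
$$D_{ij}^2 = (x_i - \bar{x}_j) S^{-1} (x_i - \bar{x}_j)^T$$
 where $\bar{x}_j$ is the row vector of the sample mean for the jth group, $S_j$ is the within-group covariance matrix for the jth group, and $S$ is the pooled within-group covariance matrix.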
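Both forms of the distance differ only in which covariance matrix is passed in, as the following sketch shows. It solves a linear system rather than forming the inverse explicitly; `solve` and `mahalanobis_sq` are hypothetical helper names, and the covariance matrix is assumed to be non-singular.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def mahalanobis_sq(x, mean, cov):
    """Squared distance (x - mean) cov^-1 (x - mean)^T."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * zi for di, zi in zip(d, solve(cov, d)))

d2_diag = mahalanobis_sq([3.0, 1.0], [1.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
d2_corr = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
```

Passing a group's own covariance matrix gives the first form, and passing the pooled matrix gives the second.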
Classify
Prior Probabilities
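The prior probabilities reflect the user's view as to the likelihood of the observations coming from the different groups. Origin supports two kinds of prior probabilities:
 Equal
$$\pi_j = \frac{1}{n_g}$$
 where $n_g$ is the number of groups.
 Proportional to Group Size
$$\pi_j = \frac{n_j}{n}$$
 where $n_j$ is the number of observations in the jth group of the training data and $n$ is the total number of training observations.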
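A short sketch of the two kinds of prior probabilities follows, assuming they are equal priors and priors proportional to group size. The function name `prior_probabilities` and the `kind` argument are hypothetical.

```python
from collections import Counter

def prior_probabilities(labels, kind="equal"):
    """Return {group: prior} from the training labels."""
    counts = Counter(labels)
    if kind == "equal":
        return {g: 1.0 / len(counts) for g in counts}
    if kind == "proportional":
        return {g: c / len(labels) for g, c in counts.items()}
    raise ValueError("kind must be 'equal' or 'proportional'")

train_labels = ["a", "a", "a", "b"]
equal = prior_probabilities(train_labels, "equal")
proportional = prior_probabilities(train_labels, "proportional")
```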
Posterior Probability
The p variables of observations are assumed to follow a multivariate normal distribution with mean $\mu_j$ and covariance matrix $\Sigma_j$ if the observation comes from the jth group.
If $p(x_i \mid \mu_j, \Sigma_j)$ is the probability of observing the observation $x_i$ from group j, then the posterior probability of belonging to group j is:
$$q_j = p(j \mid x_i, \mu_j, \Sigma_j) \propto p(x_i \mid \mu_j, \Sigma_j) \, \pi_j$$
where $\pi_j$ is the prior probability of group j.
The parameters $\mu_j$ and $\Sigma_j$ are estimated from the training data $X_t$. The observation is allocated to the group with the highest posterior probability. Origin provides two methods to calculate posterior probability.
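 Linear Discriminant Function
 Within-group covariance matrices are assumed equal.
$$\log q_j = -\frac{1}{2} D_{ij}^2 + \log \pi_j + c_0$$
 where $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the pooled within-group covariance matrix, $\pi_j$ is the prior probability of group j, and $c_0$ is a constant.
 Quadratic Discriminant Function
 Within-group covariance matrices are not assumed equal.
$$\log q_j = -\frac{1}{2} D_{ij}^2 + \log \pi_j - \frac{1}{2} \log|S_j| + c_0$$
 where $D_{ij}^2$ is the squared Mahalanobis distance of the ith observation from the jth group using the within-group covariance matrix $S_j$ of that group, $\pi_j$ is the prior probability of group j, and $c_0$ is a constant.
The posterior probabilities $q_j$ are standardized as follows, and $c_0$ is determined from the standardization.
$$\sum_{j=1}^{n_g} q_j = 1$$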
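The standardization can be sketched as follows. The code assumes the log posterior is -D^2/2 plus the log prior, with an extra -log|S_j|/2 term for the quadratic rule, and it fixes the constant by making the posteriors sum to 1. The function name `posteriors` and its argument names are hypothetical.

```python
import math

def posteriors(dist_sq, priors, log_dets=None):
    """Posterior q_j from squared distances D_ij^2 and priors pi_j.

    Pass log|S_j| per group for the quadratic rule; omit it for the linear rule.
    """
    if log_dets is None:
        log_dets = [0.0] * len(dist_sq)
    scores = [-0.5 * d + math.log(pi) - 0.5 * ld
              for d, pi, ld in zip(dist_sq, priors, log_dets)]
    top = max(scores)  # subtracting the maximum avoids underflow in exp
    w = [math.exp(s - top) for s in scores]
    total = sum(w)
    return [v / total for v in w]

q_lin = posteriors([0.0, 2.0 * math.log(3.0)], [0.5, 0.5])
q_quad = posteriors([1.0, 1.0], [0.5, 0.5], log_dets=[0.0, math.log(4.0)])
```

The observation is then allocated to the group whose entry in the returned list is largest.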
Atypicality Index
Atypicality Index indicates the probability of obtaining an observation more typical of group j than the ith observation. If it is close to 1 for all groups, it implies that the observation may come from a grouping not represented in the training data. Atypicality Index is calculated as:
$$I_j(x_i) = P\left( B \le z : \tfrac{1}{2} p, \tfrac{1}{2}(n_j - d) \right)$$
where $P(B \le \beta : a, b)$ is the lower tail probability from a beta distribution, for equal within-group covariance matrices,
$$z = \frac{D_{ij}^2}{D_{ij}^2 + (n - n_g)(n_j - 1)/n_j}$$
for unequal within-group covariance matrices,
$$z = \frac{D_{ij}^2}{D_{ij}^2 + (n_j^2 - 1)/n_j}$$
Linear Discriminant Function Coefficients
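The linear discriminant function (also known as Fisher's linear discriminant function) can be calculated as:
 Linear Coefficient for the jth Group.
$$b_j = S^{-1} \bar{x}_j^T$$
 where $b_j$ is a column vector with size of p, $S$ is the pooled within-group covariance matrix, and $\bar{x}_j$ is the row vector of the sample mean for the jth group.
 Constant Coefficient for the jth Group.
$$a_j = \log \pi_j - \frac{1}{2} \bar{x}_j S^{-1} \bar{x}_j^T$$
 where $\pi_j$ is the prior probability of the jth group.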
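The coefficients can be sketched as below, assuming the common form in which the linear coefficient is the inverse pooled covariance matrix times the group mean and the constant is the log prior minus half the mean times that coefficient vector. The names `solve`, `ldf_coefficients` and `classify` are hypothetical, and the means and pooled matrix are made-up inputs.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def ldf_coefficients(means, pooled, priors):
    """Return [(b_j, a_j)]: b_j = S^-1 mean_j^T, a_j = log(pi_j) - 0.5 * mean_j b_j."""
    out = []
    for mean, pi in zip(means, priors):
        b = solve(pooled, mean)
        a = math.log(pi) - 0.5 * sum(mi * bi for mi, bi in zip(mean, b))
        out.append((b, a))
    return out

def classify(x, coeffs):
    """Index of the group with the largest score a_j + x b_j."""
    scores = [a + sum(xi * bi for xi, bi in zip(x, b)) for b, a in coeffs]
    return scores.index(max(scores))

coeffs = ldf_coefficients([[0.0, 0.0], [4.0, 0.0]],
                          [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
```

Under these assumptions the score differs from the log posterior of the linear rule only by a term that is the same for every group, so picking the largest score gives the same allocation as picking the highest posterior probability.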
Classify Training Data
Each observation in training data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability). Squared Mahalanobis distance from each group and Atypicality Index of each group can also be calculated.
The classification result for training data is summarized by comparing the given group membership with the predicted group membership. The misclassification error rate is calculated as the percentage of misclassified observations weighted by the prior probabilities of groups, i.e.
$$E = \sum_{j=1}^{n_g} e_j \pi_j$$
where $e_j$ is the percentage of misclassified observations for the jth group, and $\pi_j$ is the prior probability of the jth group.
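The weighted error rate can be illustrated with a few lines of Python. The function name `misclassification_rate` is hypothetical, the labels are made up, and the rate is returned as a fraction rather than a percentage.

```python
def misclassification_rate(actual, predicted, priors):
    """Sum over groups of prior * fraction misclassified within that group."""
    rate = 0.0
    for g, pi in priors.items():
        idx = [i for i, a in enumerate(actual) if a == g]
        wrong = sum(1 for i in idx if predicted[i] != g)
        rate += pi * wrong / len(idx)
    return rate

actual = ["a", "a", "a", "a", "b", "b"]
predicted = ["a", "a", "a", "b", "b", "a"]
rate_equal = misclassification_rate(actual, predicted, {"a": 0.5, "b": 0.5})
rate_prop = misclassification_rate(actual, predicted, {"a": 4 / 6, "b": 2 / 6})
```

With priors proportional to group size the weighted rate reduces to the overall fraction of misclassified observations (2 of 6 here), while equal priors give each group the same weight regardless of its size.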
Cross Validation for Training Data
Cross validation follows the same procedure as Classify Training Data, except that when predicting the membership of a training observation, that observation is excluded from the calculation of the within-group covariance matrices or the pooled within-group covariance matrix.
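The leave-one-out idea can be sketched for the simplest possible case. The code below is a deliberately reduced illustration, not Origin's procedure: it assumes a single variable (p = 1), equal priors, a pooled variance, and that the held-out observation is also left out of its group mean. The function name `loo_predictions` and the data are made up.

```python
def loo_predictions(values, labels):
    """Leave-one-out predictions for one variable with equal priors."""
    groups = sorted(set(labels))
    preds = []
    for i in range(len(values)):
        # Training set without observation i.
        train = [(v, g) for k, (v, g) in enumerate(zip(values, labels)) if k != i]
        means, ss = {}, 0.0
        for g in groups:
            xs = [v for v, lab in train if lab == g]
            means[g] = sum(xs) / len(xs)
            ss += sum((v - means[g]) ** 2 for v in xs)
        pooled_var = ss / (len(train) - len(groups))
        # Squared Mahalanobis distance with p = 1 is (x - mean)^2 / pooled variance.
        d2 = {g: (values[i] - means[g]) ** 2 / pooled_var for g in groups}
        preds.append(min(groups, key=lambda g: d2[g]))
    return preds

values = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9, 1.9]
labels = ["a", "a", "a", "b", "b", "b", "b"]
preds = loo_predictions(values, labels)
```

In this made-up data the last observation is labelled "b" but lies closer to the mean of group "a" once it is held out, so it is the only one counted as misclassified.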
Classify Test Data
Within-group covariance matrices and the pooled within-group covariance matrix are calculated from training data. Each observation in test data can be classified by posterior probabilities (i.e., it is allocated to the group with the highest posterior probability).
