X.2 Linear Discriminant Analysis: 2-Class
Supervised versus unsupervised approaches to dimension reduction. Initial attempt based on the axis between the means of a two-class data set. Adaptation to account for in-class variance: the Fisher discriminant. Algorithm for identifying the optimal axis corresponding to this discriminant.
Limitations of PCA
Often, we wish to discriminate between classes (e.g., healthy vs. diseased, molecular identification based on compiled Raman or MS data, determination of the origin of complex samples, chemical fingerprinting, etc.). The outcome of the measurement is therefore: i) assignment of an unknown to a class, and ii) assessment of the confidence of that assignment. PCA is "unsupervised", in that the algorithm does not utilize class information in dimension reduction. The entire data set, including samples of both known and unknown origin, is treated equally in the analysis. The directions of maximum variance in the data do not necessarily correspond to the directions of maximum class discrimination/resolution.
LDA: 2-Class
Let's revisit the earlier example, in which just two discrete wavelengths out of UV-Vis spectra are plotted to allow visualization in 2D. Dimension reduction by PCA does not provide clear discrimination in this case.
[Figure: Abs(1) vs. Abs(2) scatter plot of classes a and b, with the PC1 direction overlaid.]
LDA: 2-Class (2)
Using class information, the simplest first choice for a new coordinate to discriminate between a and b is the axis connecting the means of each class.
[Figure: Abs(1) vs. Abs(2) scatter plot of classes a and b, with the µa − µb axis overlaid.]
LDA: 2-Class (3) The mean vector (spectrum) of each class is calculated independently, in which na and nb correspond to the number of measured spectra obtained for classes a and b, respectively. For later stages, it is useful to define a single scalar value that can be used for optimization. If we let “w” be the new test axis we are considering for discrimination, the scalar property J(w) can be calculated by the projections of each mean vector onto w. µa and µb are scalars. 6.7 : 2/9
LDA: 2-Class (4)
However, a selection based on the separation of the means neglects the influence of the variance about each mean. In the example below, the two classes are not resolved using the difference in means alone.
[Figure: Abs(1) vs. Abs(2) scatter plot of classes a and b that overlap when projected onto the µa − µb axis.]
LDA: 2-Class (5)
Weighting by the in-class variance along particular choices of w shifts the optimal resolution/discrimination away from that expected from the difference in means alone.
[Figure: classes a and b in the Abs(1)–Abs(2) plane, comparing the µa − µb axis with an alternative axis w and the associated J.]
LDA: 2-Class (6)
How about a separation direction based on the definition of RESOLUTION?
LDA: 2-Class (7)
The Fisher linear discriminant includes a weighting by an equivalent of the within-class variance along the test axis w. Improved discrimination corresponds to minimizing the in-class variance along w, so the within-class variance term appears in the denominator. Here yi is the scalar projection of the i-th data vector xi onto the w axis, and sa² is the variance about the projected mean of class a along the w axis. sa² can be rewritten in matrix notation to isolate the parts that are dependent on and independent of w.
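A standard reconstruction of the Fisher criterion described here (the slide's equations were lost in extraction):
\[
J(\mathbf{w}) = \frac{(\tilde{\mu}_a - \tilde{\mu}_b)^2}{s_a^2 + s_b^2},
\qquad
y_i = \mathbf{w}^{\mathsf T}\mathbf{x}_i,
\qquad
s_a^2 = \sum_{y_i \in a}(y_i - \tilde{\mu}_a)^2 .
\]
Expanding the projection isolates the w-dependent and w-independent parts of the denominator,
\[
s_a^2 = \mathbf{w}^{\mathsf T} S_a \mathbf{w},
\qquad
S_a = \sum_{\mathbf{x}_i \in a}(\mathbf{x}_i - \mu_a)(\mathbf{x}_i - \mu_a)^{\mathsf T},
\]
and likewise for class b.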
LDA: 2-Class (8)
We can perform analogous manipulations to express the numerator in terms of w-dependent and w-independent contributions, such that the full criterion becomes a ratio of two quadratic forms in w.
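Carrying out that manipulation (again a reconstruction of the missing equations):
\[
(\tilde{\mu}_a - \tilde{\mu}_b)^2
  = \mathbf{w}^{\mathsf T}(\mu_a - \mu_b)(\mu_a - \mu_b)^{\mathsf T}\mathbf{w}
  = \mathbf{w}^{\mathsf T} S_B \mathbf{w},
\]
so that, with \(S_W = S_a + S_b\),
\[
J(\mathbf{w}) = \frac{\mathbf{w}^{\mathsf T} S_B \mathbf{w}}{\mathbf{w}^{\mathsf T} S_W \mathbf{w}} .
\]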
LDA: 2-Class (9)
In summary:
- SB is the between-class scatter; maximizing J corresponds to maximizing the separation between the class means.
- SW is the within-class scatter; minimizing it improves the separation between the classes.
LDA: 2-Class (10)
There are two equivalent ways to optimize J: set its derivative with respect to w equal to zero, or select the eigenvector with the maximum eigenvalue. Because J is simply a scalar, it can be moved freely within the equation.
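The rearrangement alluded to here, assuming the ratio form of J derived above: writing \(J\,\mathbf{w}^{\mathsf T} S_W \mathbf{w} = \mathbf{w}^{\mathsf T} S_B \mathbf{w}\) and setting the derivative with respect to w to zero yields the generalized eigenvalue problem
\[
S_B \mathbf{w} = J\, S_W \mathbf{w}
\quad\Longleftrightarrow\quad
S_W^{-1} S_B\, \mathbf{w} = J\, \mathbf{w}.
\]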
LDA: 2-Class (11)
The optimal axis w* is obtained by setting the derivative of J equal to zero. Equivalently, this result can be cast as identifying the eigenvector that maximizes the corresponding eigenvalue J. The eigenvectors of the matrix SW⁻¹SB correspond to the optimal directions of w, with those corresponding to larger values of J providing greater discrimination/resolution. For the two-class system, solving the eigenvalue problem leads to a concise analytical expression for the optimal direction based on the Fisher discriminant.
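The concise expression referred to here, in its standard form: because \(S_B \mathbf{w} = (\mu_a - \mu_b)\,[(\mu_a - \mu_b)^{\mathsf T}\mathbf{w}]\) always points along \(\mu_a - \mu_b\), the eigenvector with the nonzero eigenvalue satisfies
\[
\mathbf{w}^{*} \propto S_W^{-1}(\mu_a - \mu_b),
\]
where the overall scale of w* is irrelevant, since J is unchanged when w is multiplied by a constant.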
LDA: 2-Class Example
[Figure: the two-class example data with the optimal Fisher axis w overlaid.]
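A minimal numerical sketch of this kind of example (the data below are synthetic and the variable names are hypothetical, not taken from the slides), computing w* and J with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: absorbances at two wavelengths (hypothetical example)
class_a = rng.multivariate_normal([1.0, 2.0], [[0.30, 0.25], [0.25, 0.30]], size=50)
class_b = rng.multivariate_normal([2.0, 2.6], [[0.30, 0.25], [0.25, 0.30]], size=50)

# Class mean vectors
mu_a = class_a.mean(axis=0)
mu_b = class_b.mean(axis=0)

# Within-class scatter S_W = S_a + S_b (sums of outer products of deviations)
dev_a = class_a - mu_a
dev_b = class_b - mu_b
S_W = dev_a.T @ dev_a + dev_b.T @ dev_b

# Optimal Fisher axis: w* proportional to S_W^{-1} (mu_a - mu_b)
w = np.linalg.solve(S_W, mu_a - mu_b)
w /= np.linalg.norm(w)          # normalization is optional; scale does not change J

# Project both classes onto w and evaluate the Fisher criterion J
y_a = class_a @ w
y_b = class_b @ w
s2_a = np.sum((y_a - y_a.mean()) ** 2)   # within-class scatter of class a along w
s2_b = np.sum((y_b - y_b.mean()) ** 2)
J = (y_a.mean() - y_b.mean()) ** 2 / (s2_a + s2_b)

print("w* =", w)
print("J  =", J)
```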
LDA: 2-Class Example (2)
Now we can quantify our ability to resolve the two classes and assess the reliability of assignment using scalar values: the means and variances projected along the new axis. Note that maximizing J is almost mathematically equivalent to maximizing the resolution R; for the two-class problem, the maximized J is the single nonzero eigenvalue of SW⁻¹SB and serves as the resolution metric.
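To make the J–R connection concrete (the slide does not state which definition of resolution is meant; the separations-style definition is assumed here):
\[
R = \frac{\lvert \tilde{\mu}_a - \tilde{\mu}_b \rvert}{2(\sigma_a + \sigma_b)},
\qquad
J(\mathbf{w}) = \frac{(\tilde{\mu}_a - \tilde{\mu}_b)^2}{s_a^2 + s_b^2},
\]
where \(\sigma_a\) and \(\sigma_b\) are the standard deviations of the projected classes. Both quantities increase as the projected means separate and the projected spreads shrink; they differ only in how the spreads are combined, which is why the two maximizations are nearly, but not exactly, equivalent.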