Generalizing Linear Discriminant Analysis
Linear Discriminant Analysis
Objective:
-Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
-Maintain the class separation
Reasons:
-Reduce computational costs
-Minimize overfitting
Linear Discriminant Analysis
-Want to reduce dimensionality while preserving the ability to discriminate between classes.
Figures from [1]
Linear Discriminant Analysis
Could just look at the means and find the dimension that separates the means most:
$\mu_i = \frac{1}{N_i}\sum_{x \in \omega_i} x$, and after projection $\tilde{\mu}_i = w^T \mu_i$
$J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^T(\mu_1 - \mu_2)|$
Equations from [1] (a minimal sketch of this criterion follows the figure below)
Linear Discriminant Analysis
-Separating the projected means alone is not enough: two classes can have distant means yet overlap heavily after projection if their within-class scatter is large.
Figure from [1]
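Below is a minimal NumPy sketch of the means-only criterion; the toy data and numbers are made up purely for illustration.

```python
import numpy as np

# Toy 2-D data for two classes (made-up numbers, illustration only).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(50, 2))  # class 1
X2 = rng.normal(loc=[4.0, 0.0], scale=[1.0, 3.0], size=(50, 2))  # class 2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# With ||w|| = 1, J(w) = |w^T (mu1 - mu2)| is maximized by the unit vector
# pointing along (mu1 - mu2): separating the means ignores the scatter.
w = (mu1 - mu2) / np.linalg.norm(mu1 - mu2)

y1, y2 = X1 @ w, X2 @ w  # projected (scalar) samples
print("separation of projected means:", abs(y1.mean() - y2.mean()))
```

The large vertical scale in this toy data means the classes can still overlap along the chosen direction, which is exactly the weakness Fisher's criterion addresses next.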
Linear Discriminant Analysis
Fisher's solution: maximize the separation between the projected means, normalized by the scatter within each projected class.
Scatter of projected class i: $\tilde{s}_i^2 = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^2$
Maximize: $J(w) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$
Equations and figure from [1]
Linear Discriminant Analysis
How to get the optimum w*?
◦Must express J(w) as a function of w.
Within-class scatter: $S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$ and $S_W = S_1 + S_2$, so that $\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_W w$
Between-class scatter: $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$, so that $(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = w^T S_B w$
Therefore: $J(w) = \frac{w^T S_B w}{w^T S_W w}$
Setting $\frac{dJ}{dw} = 0$ gives the generalized eigenvalue problem $S_B w = \lambda S_W w$; for two classes the solution is $w^* = S_W^{-1}(\mu_1 - \mu_2)$ (a sketch follows below)
Equations from and modified from [1]
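A short NumPy sketch of the closed-form two-class optimum derived above; the function names are ours, the math is the standard Fisher solution.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher optimum: w* = S_W^{-1} (mu_1 - mu_2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2.
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # Solve S_W w = (mu_1 - mu_2) instead of forming the inverse explicitly.
    w = np.linalg.solve(S_W, mu1 - mu2)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (w^T S_B w) / (w^T S_W w)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S_B = np.outer(mu1 - mu2, mu1 - mu2)
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    return (w @ S_B @ w) / (w @ S_W @ w)
```

On the toy data from the earlier sketch, fisher_criterion evaluated at fisher_direction(X1, X2) is at least as large as at the means-only direction, since w* maximizes J.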
Linear Discriminant Analysis
How to generalize for >2 classes:
-Instead of a single projection vector, we calculate a matrix of projections W.
-Within-class scatter becomes: $S_W = \sum_{i=1}^{C} S_i$
-Between-class scatter becomes: $S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu$ is the overall mean
-The criterion becomes $J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$, maximized by the top eigenvectors of $S_W^{-1} S_B$ (a sketch follows below)
Equations from [1]
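The multiclass solution can be sketched the same way: accumulate S_W and S_B and solve the generalized eigenproblem (the function name and argument conventions below are ours).

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection_matrix(X, y, n_components):
    """Columns of W are the top eigenvectors of S_B w = lambda S_W w."""
    mu = X.mean(axis=0)                                  # overall mean
    d = X.shape[1]
    S_W, S_B = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
    # Generalized symmetric eigenproblem; eigh returns eigenvalues ascending,
    # so reverse to put the most discriminative directions first.
    _, V = eigh(S_B, S_W)
    return V[:, ::-1][:, :n_components]
```

S_B has rank at most C−1, so only the first C−1 eigenvalues can be nonzero, matching the limitation noted on the next slide.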
Linear Discriminant Analysis
Limitations of LDA:
-Parametric method (assumes Gaussian class densities)
-Produces at most (C−1) projections
Benefits of LDA:
-Linear decision boundaries
◦Easy for humans to interpret
◦Simple to implement
-Good classification results
Flexible Discriminant Analysis
Flexible Discriminant Analysis
-Turns the LDA problem into a linear regression problem (via optimal scoring).
-"Differences between LDA and FDA and what criteria can be used to pick one for a given task?" (Tavish)
◦Linear regression can be generalized into more flexible, nonparametric forms of regression, whereas LDA stays parametric (mean, variance, ...).
◦Expands the set of predictors via basis expansions, yielding boundaries that are nonlinear in the original features (a sketch follows the figure below).
Flexible Discriminant Analysis Figure from [2]
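FDA's full optimal-scoring machinery is more than a slide can hold, but its basis-expansion idea can be sketched by expanding the predictors and running plain LDA on the expanded features. The scikit-learn names below are real; treating this pipeline as "FDA" is our illustrative stand-in, not the reference implementation.

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two interleaved half-moons: no single linear boundary separates them.
X, y = make_moons(noise=0.2, random_state=0)

# Degree-3 polynomial basis expansion followed by ordinary LDA: the boundary
# is linear in the expanded space but nonlinear in the original inputs.
fda_like = make_pipeline(PolynomialFeatures(degree=3),
                         LinearDiscriminantAnalysis())
print("expanded-basis LDA:", fda_like.fit(X, y).score(X, y))
print("plain LDA:", LinearDiscriminantAnalysis().fit(X, y).score(X, y))
```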
Penalized Discriminant Analysis
Penalized Discriminant Analysis
-Fit an LDA model, but 'penalize' the coefficients to be smoother.
◦Directly curbs the overfitting problem.
-Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance.
◦Example: neighboring pixels in images (a sketch follows the images below).
Penalized Discriminant Analysis Images from [2]
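ESL's PDA adds an explicit smoothness penalty on the coefficients; as a rough stand-in, the sketch below uses scikit-learn's shrinkage option, which regularizes the within-class covariance and likewise tames correlated predictors. The shrinkage parameter is real; calling it "PDA" is our simplification.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Made-up data: 30 nearly identical predictors, like neighboring pixels.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 1))
X = signal + 0.05 * rng.normal(size=(200, 30))
y = (signal[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

plain = LinearDiscriminantAnalysis(solver="lsqr").fit(X, y)
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)

# The unpenalized coefficients are large and noisy because the correlated
# columns fight each other; shrinkage pulls them toward a stable solution.
print("plain coefficient spread:   ", np.std(plain.coef_))
print("shrunken coefficient spread:", np.std(shrunk.coef_))
```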
Mixture Discriminant Analysis
Mixture Discriminant Analysis
-Instead of enlarging the set of predictors (FDA), or smoothing the coefficients for the predictors (PDA), and rather than using a single Gaussian per class:
-Model each class as a mixture of two or more Gaussian components,
-with all components sharing the same covariance matrix (a sketch follows the image below).
Mixture Discriminant Analysis Image from [2]
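scikit-learn ships no MDA class, so the sketch below approximates it: one Gaussian mixture per class with a covariance tied across that class's components (true MDA ties the covariance across all classes as well), classifying by the largest prior-weighted class-conditional likelihood.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture

X, y = make_moons(noise=0.15, random_state=0)

# One mixture per class; covariance_type="tied" shares one covariance among
# the components *within* each class (MDA proper ties it across classes too).
mixtures, priors = [], []
for c in np.unique(y):
    gm = GaussianMixture(n_components=3, covariance_type="tied",
                         random_state=0).fit(X[y == c])
    mixtures.append(gm)
    priors.append(np.mean(y == c))

# Classify by the largest log prior + class-conditional log-likelihood.
scores = np.column_stack([np.log(p) + gm.score_samples(X)
                          for gm, p in zip(mixtures, priors)])
print("MDA-style training accuracy:", (scores.argmax(axis=1) == y).mean())
```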
Sources
1. Gutierrez-Osuna, Ricardo. "CSCE 666 Pattern Analysis – Lecture 10."
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. "Linear Discriminant Analysis bit by bit."
END.