Generalizing Linear Discriminant Analysis

Linear Discriminant Analysis
Objective
-Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
-Maintain the class separation
Reason
-Reduce computational costs
-Minimize overfitting

Linear Discriminant Analysis Want to reduce dimensionality while preserving the ability to discriminate between classes. Figures from [1]

Linear Discriminant Analysis Could just look at the class means and find the direction that separates the projected means the most: Equations from [1] (reconstructed below)
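The equations cited from [1] do not survive in this transcript; below is a sketch of the standard two-class setup they correspond to (notation assumed: each sample x is projected to a scalar y = wᵀx, classes ω₁ and ω₂ have Nᵢ samples and means μᵢ):

```latex
% Projection of each sample and of the class means (standard Fisher/LDA notation, assumed):
\[
  y = \mathbf{w}^{\top}\mathbf{x}, \qquad
  \tilde{\mu}_i = \frac{1}{N_i}\sum_{\mathbf{x}\in\omega_i}\mathbf{w}^{\top}\mathbf{x}
               = \mathbf{w}^{\top}\boldsymbol{\mu}_i .
\]
% A first (naive) objective: separate the projected means as much as possible.
\[
  J(\mathbf{w}) = \lvert \tilde{\mu}_1 - \tilde{\mu}_2 \rvert
                = \lvert \mathbf{w}^{\top}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \rvert .
\]
```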

Linear Discriminant Analysis Figure from [1]

Linear Discriminant Analysis Fisher’s solution… Measure the scatter of each projected class, then maximize the distance between the projected means relative to the total within-class scatter. Equations from [1] (reconstructed below)
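A sketch of the scatter and of Fisher's criterion referenced from [1], in the same assumed notation (s̃ᵢ² is the scatter of class i after projection):

```latex
% Scatter of each projected class, and Fisher's criterion: separate the means
% relative to the within-class scatter, not in absolute terms.
\[
  \tilde{s}_i^{2} = \sum_{y\in\omega_i}\bigl(y - \tilde{\mu}_i\bigr)^{2},
  \qquad
  J(\mathbf{w}) = \frac{\lvert \tilde{\mu}_1 - \tilde{\mu}_2 \rvert^{2}}
                       {\tilde{s}_1^{2} + \tilde{s}_2^{2}} .
\]
```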

Linear Discriminant Analysis Fisher’s solution… Figure from [1]

Linear Discriminant Analysis How to get the optimum w*? ◦Must express J(w) as a function of w. Equation from [1] (first step reconstructed below)
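The first step, sketched below under the usual definitions, rewrites the projected scatter in terms of the original features via the within-class scatter matrices:

```latex
% Within-class scatter matrices in the original feature space,
% and the projected scatter expressed through them.
\[
  \mathbf{S}_i = \sum_{\mathbf{x}\in\omega_i}
                 (\mathbf{x}-\boldsymbol{\mu}_i)(\mathbf{x}-\boldsymbol{\mu}_i)^{\top},
  \qquad
  \mathbf{S}_W = \mathbf{S}_1 + \mathbf{S}_2,
  \qquad
  \tilde{s}_1^{2} + \tilde{s}_2^{2} = \mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w} .
\]
```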

Linear Discriminant Analysis How to get the optimum w*… (derivation steps; equations from [1], reconstructed below)
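Analogously, the separation of the projected means defines a between-class scatter matrix, J(w) becomes a generalized Rayleigh quotient, and its maximizer has the familiar closed form; this follows the standard derivation (as in [1]):

```latex
% Between-class scatter, the criterion as a Rayleigh quotient, and the optimum direction.
\[
  \mathbf{S}_B = (\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^{\top},
  \qquad
  J(\mathbf{w}) = \frac{\mathbf{w}^{\top}\mathbf{S}_B\,\mathbf{w}}
                       {\mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w}},
  \qquad
  \mathbf{w}^{*} = \arg\max_{\mathbf{w}} J(\mathbf{w})
                 = \mathbf{S}_W^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2) .
\]
```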

Linear Discriminant Analysis
How to generalize for >2 classes:
-Instead of a single projection, we calculate a matrix of projections.
-Within-class scatter becomes a sum of the per-class scatter matrices.
-Between-class scatter becomes the scatter of the class means about the overall mean.
Equations from [1]

Linear Discriminant Analysis How to generalize for >2 classes… Here, W is a projection matrix. Equation from [1]
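For C classes, the scatter matrices and the criterion generalize as sketched below (Nᵢ samples in class i, μ the overall mean; W stacks the projection directions as columns). At most C−1 useful projections exist because rank(S_B) ≤ C−1:

```latex
% Multiclass within- and between-class scatter, and the determinant-ratio criterion;
% the columns of the optimal W are the leading eigenvectors of S_W^{-1} S_B.
\[
  \mathbf{S}_W = \sum_{i=1}^{C}\mathbf{S}_i,
  \qquad
  \mathbf{S}_B = \sum_{i=1}^{C} N_i\,
                 (\boldsymbol{\mu}_i-\boldsymbol{\mu})(\boldsymbol{\mu}_i-\boldsymbol{\mu})^{\top},
  \qquad
  J(\mathbf{W}) = \frac{\bigl|\mathbf{W}^{\top}\mathbf{S}_B\,\mathbf{W}\bigr|}
                       {\bigl|\mathbf{W}^{\top}\mathbf{S}_W\,\mathbf{W}\bigr|} .
\]
```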

Linear Discriminant Analysis
Limitations of LDA:
-Parametric method
-Produces at most (C-1) projections
Benefits of LDA:
-Linear decision boundaries
◦Human interpretation
◦Implementation
-Good classification results (a quick usage sketch follows below)
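To make the "implementation" and "(C-1) projections" points concrete, here is a minimal sketch using scikit-learn; the iris dataset and the split are arbitrary illustrative choices, not from the slides:

```python
# Minimal sketch: LDA as a (C-1)-dimensional projection and as a linear classifier.
# Assumes scikit-learn is installed; the iris dataset is just an example choice.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                       # 3 classes -> at most 2 discriminants
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=2)         # C - 1 = 2 projections
Z_train = lda.fit_transform(X_train, y_train)            # projected (reduced) features
print("reduced shape:", Z_train.shape)                   # (n_samples, 2)
print("test accuracy:", lda.score(X_test, y_test))       # linear decision boundaries
```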

Flexible Discriminant Analysis

Flexible Discriminant Analysis
-Turns the LDA problem into a linear regression problem.
-“Differences between LDA and FDA and what criteria can be used to pick one for a given task?” (Tavish)
◦Linear regression can be generalized into more flexible, nonparametric forms of regression. (Parametric: mean, variance, …)
◦Expands the set of predictors via basis expansions, as sketched below.
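The basis-expansion sketch referenced above is a hedged illustration only: expanding the predictors with polynomial basis functions and running ordinary LDA in the expanded space yields nonlinear boundaries in the original space. This mimics the idea behind FDA but is not the optimal-scoring algorithm of [2]:

```python
# Sketch of the basis-expansion idea behind FDA: expand the predictors, then apply LDA.
# This is an illustration, not the optimal-scoring FDA algorithm of Hastie et al. [2].
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

linear_lda = LinearDiscriminantAnalysis().fit(X, y)       # linear boundary only
flexible = make_pipeline(PolynomialFeatures(degree=3),    # basis expansion h(x)
                         LinearDiscriminantAnalysis()).fit(X, y)

print("plain LDA accuracy:    ", linear_lda.score(X, y))
print("expanded-basis LDA:    ", flexible.score(X, y))    # nonlinear boundary in x-space
```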

Flexible Discriminant Analysis Figure from [2]

Penalized Discriminant Analysis

Penalized Discriminant Analysis
-Fit an LDA model, but ‘penalize’ the coefficients to be smoother.
◦Directly curbs the ‘overfitting’ problem.
-Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance.
◦Example: images, where neighboring pixels are highly correlated. (A related regularization is sketched below.)
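PDA in [2] adds a smoothness penalty on the discriminant coefficients. As a loosely related, readily available stand-in, scikit-learn's shrinkage LDA regularizes the covariance estimate, which serves the same purpose of taming noisy coefficients when predictors are many and correlated; the synthetic data below is an arbitrary illustration, not from the slides:

```python
# Sketch: regularizing LDA when there are many noisy, correlated predictors.
# scikit-learn's shrinkage LDA regularizes the covariance estimate; PDA proper in [2]
# instead penalizes the coefficients toward smoothness, but the motivation is similar.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def make_data(n, p=300, shift=0.25):
    """Two weakly separated classes; only the first 10 of p features carry signal."""
    X = rng.normal(size=(n, p))
    y = rng.integers(0, 2, size=n)
    X[y == 1, :10] += shift
    return X, y

X_train, y_train = make_data(100)      # fewer samples than features
X_test, y_test = make_data(2000)

plain = LinearDiscriminantAnalysis(solver="lsqr").fit(X_train, y_train)
smooth = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X_train, y_train)
print("no regularization:", plain.score(X_test, y_test))
print("with shrinkage:   ", smooth.score(X_test, y_test))
```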

Penalized Discriminant Analysis Images from [2]

Mixture Discriminant Analysis

Mixture Discriminant Analysis
-Instead of enlarging the set of predictors (FDA), or smoothing their coefficients (PDA), and instead of modeling each class with a single Gaussian:
-Model each class as a mixture of two or more Gaussian components,
-with all components sharing the same covariance matrix. (A rough sketch follows below.)
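A rough sketch of the mixture idea: fit a small Gaussian mixture per class and classify by the larger prior-weighted class likelihood. Note that scikit-learn's GaussianMixture with a "tied" covariance shares a covariance only within each class's mixture, whereas MDA in [2] shares one covariance across all components of all classes and fits it with a dedicated EM, so this is an approximation of the idea, not the MDA algorithm:

```python
# Sketch of the mixture-discriminant idea: one small Gaussian mixture per class,
# classify by the largest (prior-weighted) class log-likelihood.
# Caveat: covariance_type="tied" ties covariances only within each class's mixture;
# MDA in [2] shares a single covariance across all classes, so this is an approximation.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture

X, y = make_moons(n_samples=600, noise=0.2, random_state=0)
classes = np.unique(y)

mixtures = {c: GaussianMixture(n_components=3, covariance_type="tied",
                               random_state=0).fit(X[y == c]) for c in classes}
priors = {c: np.mean(y == c) for c in classes}

# Per-class log p(x | class) + log prior, then pick the argmax.
scores = np.column_stack([mixtures[c].score_samples(X) + np.log(priors[c]) for c in classes])
y_pred = classes[np.argmax(scores, axis=1)]
print("training accuracy:", np.mean(y_pred == y))
```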

Mixture Discriminant Analysis Image from [2]

Sources
1. Gutierrez-Osuna, Ricardo. “CSCE 666 Pattern Analysis – Lecture 10”
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. “Linear Discriminant Analysis bit by bit”

END.