Linear Classifiers Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Outline
- Linear Regression
- Linear and Quadratic Discriminant Functions
- Reduced Rank Linear Discriminant Analysis
- Logistic Regression
- Separating Hyperplanes

Linear Regression
The response class is coded by indicator variables: with K classes we use K indicators $Y_k$, $k = 1, 2, \ldots, K$, where $Y_k = 1$ if $G = k$ and 0 otherwise. The N training instances of this indicator vector form an $N \times K$ indicator response matrix $Y$. Fitting a linear regression model to all K columns of $Y$ at once gives the fitted matrix
$\hat{Y} = X(X^T X)^{-1} X^T Y$,
and for a new input $x$ the vector of fitted values is $\hat{f}(x) = [(1, x^T)\hat{B}]^T$, where $\hat{B} = (X^T X)^{-1} X^T Y$.

Linear Regression (continued)
Given an input $x$, the classification rule is
$\hat{G}(x) = \arg\max_{k} \hat{f}_k(x)$.
In another form, a target $t_k$ is constructed for each class, where $t_k$ is the k-th column of the $K \times K$ identity matrix. The model is fit by the sum-of-squared-norm criterion
$\min_{B} \sum_{i=1}^{N} \| y_i - [(1, x_i^T) B]^T \|^2$,
and the classification assigns $x$ to the closest target: $\hat{G}(x) = \arg\min_{k} \| \hat{f}(x) - t_k \|^2$.
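To make the indicator-matrix approach concrete, here is a minimal NumPy sketch (the function names and the synthetic three-class data are my own, not from the slides): it builds the indicator matrix Y, solves the least squares problem for B-hat, and classifies to the largest fitted value.

```python
# A minimal sketch of classification by linear regression on an indicator
# response matrix; the toy data below is made up for illustration.
import numpy as np

def fit_indicator_regression(X, g, K):
    """X: (N, p) inputs; g: (N,) class labels in {0, ..., K-1}."""
    N = X.shape[0]
    Y = np.zeros((N, K))
    Y[np.arange(N), g] = 1.0                   # indicator response matrix Y
    X1 = np.hstack([np.ones((N, 1)), X])       # add intercept column
    B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # B_hat = (X^T X)^{-1} X^T Y
    return B_hat

def predict(B_hat, X):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    F = X1 @ B_hat                             # fitted values f_k(x) for each class
    return np.argmax(F, axis=1)                # classify to the largest fitted value

# tiny usage example with synthetic data
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(30, 2)) for m in (-2, 0, 2)])
g = np.repeat([0, 1, 2], 30)
B = fit_indicator_regression(X, g, K=3)
print((predict(B, X) == g).mean())             # training accuracy
```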

Problems of Linear Regression
The data come from three classes in $\mathbb{R}^2$ and are easily separated by linear decision boundaries. The left plot shows the boundaries found by linear regression of the indicator response variables: the middle class is completely masked. The rug plot at the bottom indicates the positions and class memberships of the observations; the three curves are the fitted regressions for the three class indicator variables.

Problems of Linear Regression (continued)
The left plot shows the boundaries found by linear discriminant analysis; the right shows the fitted regressions for the three class indicator variables.

Linear Discriminant Analysis
Bayes-optimal classification (Chapter 2) requires the class posteriors $\Pr(G = k \mid X = x)$. Let $f_k(x)$ be the class-conditional density of $X$ in class $G = k$, and let $\pi_k$ be the prior probability of class $k$, with $\sum_{k=1}^{K}\pi_k = 1$. Bayes' theorem then gives the discriminant quantity
$\Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$.

Linear Discriminant Analysis (continued)
Model each class density as a multivariate Gaussian,
$f_k(x) = \frac{1}{(2\pi)^{p/2}|\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k)\right)$,
and assume a common covariance matrix $\Sigma_k = \Sigma$ for all $k$. Comparing two classes $k$ and $l$ through the log posterior odds,
$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \log\frac{\pi_k}{\pi_l} - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1}(\mu_k - \mu_l) + x^T \Sigma^{-1}(\mu_k - \mu_l)$.

Linear Discriminant Analysis (continued)
The linear log-odds function above implies that the decision boundary between classes $k$ and $l$ is linear in $x$; in $p$ dimensions it is a hyperplane. The equivalent linear discriminant function is
$\delta_k(x) = x^T \Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^T \Sigma^{-1}\mu_k + \log\pi_k$,
and we classify to the class with the largest $\delta_k(x)$. In practice the Gaussian parameters are unknown, so we estimate them from the training data.

Parameter Estimation
$\hat{\pi}_k = N_k / N$, where $N_k$ is the number of class-$k$ observations;
$\hat{\mu}_k = \sum_{g_i = k} x_i / N_k$;
$\hat{\Sigma} = \sum_{k=1}^{K}\sum_{g_i = k}(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T / (N - K)$ (pooled within-class covariance).

LDA Rule
Plug the estimates above into the discriminant functions and classify to the class with the largest value: $\hat{G}(x) = \arg\max_k \hat{\delta}_k(x)$.
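A compact NumPy sketch of the LDA rule above, under the assumptions of the slides (Gaussian classes, shared covariance); the function names are illustrative, not from the lecture.

```python
# Minimal LDA: pooled covariance, class centroids, priors, and the
# linear discriminant delta_k(x) = x^T S^{-1} mu_k - 0.5 mu_k^T S^{-1} mu_k + log pi_k.
import numpy as np

def lda_fit(X, g, K):
    N, p = X.shape
    pis = np.array([(g == k).mean() for k in range(K)])          # priors N_k / N
    mus = np.array([X[g == k].mean(axis=0) for k in range(K)])   # class centroids
    Sigma = np.zeros((p, p))                                     # pooled within-class covariance
    for k in range(K):
        Xk = X[g == k] - mus[k]
        Sigma += Xk.T @ Xk
    Sigma /= (N - K)
    return pis, mus, Sigma

def lda_predict(X, pis, mus, Sigma):
    Sinv = np.linalg.inv(Sigma)
    deltas = X @ Sinv @ mus.T - 0.5 * np.sum(mus @ Sinv * mus, axis=1) + np.log(pis)
    return np.argmax(deltas, axis=1)             # LDA rule: largest discriminant wins
```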

Linear Discriminant Analysis (continued)
Three Gaussian distributions with the same covariance and different means. The Bayes decision boundaries are shown on the left (solid lines); on the right are the fitted LDA boundaries computed from a sample of 30 points drawn from each Gaussian.

Quadratic Discriminant Analysis
When the class covariance matrices $\Sigma_k$ are not assumed equal, the convenient cancellations no longer occur and we obtain the quadratic discriminant function
$\delta_k(x) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k) + \log\pi_k$.
The decision boundary between each pair of classes is then described by a quadratic equation $\{x : \delta_k(x) = \delta_l(x)\}$.
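For comparison with the LDA sketch, here is a hedged QDA sketch with one covariance per class; the helper names are again my own.

```python
# QDA: delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x-mu_k)^T Sigma_k^{-1} (x-mu_k) + log pi_k
import numpy as np

def qda_fit(X, g, K):
    pis = np.array([(g == k).mean() for k in range(K)])
    mus = [X[g == k].mean(axis=0) for k in range(K)]
    Sigmas = [np.cov(X[g == k], rowvar=False) for k in range(K)]   # per-class covariances
    return pis, mus, Sigmas

def qda_predict(X, pis, mus, Sigmas):
    K = len(pis)
    deltas = np.empty((X.shape[0], K))
    for k in range(K):
        Xc = X - mus[k]
        Sinv = np.linalg.inv(Sigmas[k])
        _, logdet = np.linalg.slogdet(Sigmas[k])         # log |Sigma_k|
        deltas[:, k] = (-0.5 * logdet
                        - 0.5 * np.sum(Xc @ Sinv * Xc, axis=1)
                        + np.log(pis[k]))
    return np.argmax(deltas, axis=1)
```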

LDA & QDA
Boundaries on a three-class problem found by linear discriminant analysis in the original two-dimensional space $X_1, X_2$ (left) and in the five-dimensional space $X_1, X_2, X_1 X_2, X_1^2, X_2^2$ (right).

LDA & QDA (continued)
Boundaries on the same three-class problem found by LDA in the five-dimensional quadratic space above (left) and by quadratic discriminant analysis (right).

Regularized Discriminant Analysis
Shrink the separate covariances of QDA toward a common covariance, as in LDA:
$\hat{\Sigma}_k(\alpha) = \alpha\hat{\Sigma}_k + (1 - \alpha)\hat{\Sigma}, \quad \alpha \in [0, 1]$.
Similarly, the common covariance itself can be shrunk toward a scalar covariance (regularized LDA):
$\hat{\Sigma}(\gamma) = \gamma\hat{\Sigma} + (1 - \gamma)\hat{\sigma}^2 I, \quad \gamma \in [0, 1]$.
Together, substituting $\hat{\Sigma}(\gamma)$ for $\hat{\Sigma}$ in the first expression gives a family $\hat{\Sigma}_k(\alpha, \gamma)$ indexed by the pair of tuning parameters.
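A small sketch of the two shrinkage steps, assuming the per-class covariances, the pooled covariance, and the tuning parameters alpha and gamma are already available; taking the scalar variance as the average diagonal variance is my own (common) choice, not specified on the slide.

```python
# Regularized discriminant analysis covariances: shrink each Sigma_k toward
# the pooled Sigma, and the pooled Sigma toward sigma^2 * I.
import numpy as np

def regularized_covariances(Sigmas, Sigma_pooled, alpha, gamma):
    p = Sigma_pooled.shape[0]
    sigma2 = np.trace(Sigma_pooled) / p                          # scalar variance estimate
    Sigma_gamma = gamma * Sigma_pooled + (1 - gamma) * sigma2 * np.eye(p)
    return [alpha * S + (1 - alpha) * Sigma_gamma for S in Sigmas]
```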

Regularized Discriminant Analysis (continued)
The regularized covariance $\hat{\Sigma}(\gamma)$ can also be used on its own in LDA. In recent microarray (gene-expression) work an even simpler regularization is common: use a diagonal covariance and shrink the class centroids toward the overall centroid, as in the "shrunken centroid" method.

Test and training errors for the vowel data, using regularized discriminant analysis with a series of values of $\alpha \in [0, 1]$. The optimum for the test data occurs for $\alpha$ close to 1, i.e. close to quadratic discriminant analysis.

Computations for LDA
Compute the eigendecomposition $\hat{\Sigma}_k = U_k D_k U_k^T$ for each class, where $U_k$ is orthonormal and $D_k$ is a diagonal matrix of positive eigenvalues $d_{kl}$. The ingredients for $\delta_k(x)$ are then
$(x - \hat{\mu}_k)^T \hat{\Sigma}_k^{-1}(x - \hat{\mu}_k) = [U_k^T(x - \hat{\mu}_k)]^T D_k^{-1}[U_k^T(x - \hat{\mu}_k)]$
$\log|\hat{\Sigma}_k| = \sum_l \log d_{kl}$.
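The same computation in NumPy: diagonalize the covariance estimate once, then both the Mahalanobis term and the log-determinant come cheaply from the eigenvalues (a sketch; the function name is illustrative).

```python
# Evaluate the two ingredients of delta_k(x) via the eigendecomposition
# Sigma_k = U diag(d) U^T, as on the slide.
import numpy as np

def discriminant_ingredients(x, mu_k, Sigma_k):
    d, U = np.linalg.eigh(Sigma_k)           # eigenvalues d, orthonormal U
    z = U.T @ (x - mu_k)                     # rotate into the eigenbasis
    mahal = np.sum(z**2 / d)                 # (x - mu_k)^T Sigma_k^{-1} (x - mu_k)
    logdet = np.sum(np.log(d))               # log |Sigma_k| = sum_l log d_l
    return mahal, logdet
```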

Reduced Rank LDA
Let $W = \hat{\Sigma}$ be the within-class covariance, with eigendecomposition $W = U D U^T$, and sphere the data: $X^* = D^{-1/2} U^T X$. LDA then classifies to the closest class centroid in the sphered space (apart from the prior terms $\log\pi_k$). The K centroids span at most a $(K-1)$-dimensional subspace, so we can project the data onto this subspace and lose nothing. We can project to an even lower dimension by using the principal components of the sphered centroids $\mu_k^*$, $k = 1, \ldots, K$.

Reduced Rank LDA (continued)
- Compute the $K \times p$ matrix $M$ of class centroids and the within-class covariance $W$.
- Compute $M^* = M W^{-1/2}$, the centroids in the sphered space.
- Compute $B^*$, the covariance matrix of $M^*$ (the between-class covariance in the sphered space), and its eigendecomposition $B^* = V^* D_B V^{*T}$.
- Then $Z_l = v_l^T X$ with $v_l = W^{-1/2} v_l^*$ is the $l$-th discriminant variable (canonical variable); see the sketch below.
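A sketch of this recipe in NumPy, assuming the centroid matrix M, within-class covariance W, and class priors are already estimated; it returns the discriminant directions v_l mapped back to the original coordinates. Names are my own.

```python
# Reduced-rank LDA: sphere the centroids with W, take principal components
# of the sphered centroid matrix, and map the directions back.
import numpy as np

def canonical_directions(M, W, priors):
    """M: (K, p) class centroids; W: (p, p) within-class covariance; priors: (K,)."""
    d, U = np.linalg.eigh(W)
    W_inv_sqrt = U @ np.diag(d ** -0.5) @ U.T       # W^{-1/2}
    M_star = M @ W_inv_sqrt                         # sphered centroids M*
    mbar = priors @ M_star                          # overall sphered mean
    B_star = (M_star - mbar).T @ np.diag(priors) @ (M_star - mbar)  # between-class cov
    vals, V_star = np.linalg.eigh(B_star)
    order = np.argsort(vals)[::-1]                  # leading eigenvectors first
    V = W_inv_sqrt @ V_star[:, order]               # v_l in original coordinates
    return V                                        # column l gives Z_l = v_l^T x
```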

A two-dimensional plot of the vowel training data. There are eleven classes with inputs $X \in \mathbb{R}^{10}$, and this is the best view in terms of an LDA model. The heavy circles are the projected mean vectors for each class.

Projections onto different pairs of canonical variates.

Reduced Rank LDA (continued)
Although the line joining the centroids defines the direction of greatest centroid spread, the projected data overlap because of the covariance (left panel). The discriminant direction minimizes this overlap for Gaussian data (right panel).

Fisher's Problem
Find the linear combination $Z = a^T X$ such that the between-class variance is maximized relative to the within-class variance, i.e. maximize the Rayleigh quotient
$\max_a \frac{a^T B a}{a^T W a}$,
where $B$ is the between-class and $W$ the within-class covariance matrix.
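One standard way to maximize the Rayleigh quotient is to reduce it to a symmetric eigenproblem; a minimal sketch, assuming B and W have already been estimated and W is positive definite.

```python
# Fisher's direction: the maximizer of a^T B a / a^T W a is the leading
# generalized eigenvector of (B, W), computed here via a symmetric reduction.
import numpy as np

def fisher_direction(B, W):
    d, U = np.linalg.eigh(W)
    W_inv_sqrt = U @ np.diag(d ** -0.5) @ U.T
    C = W_inv_sqrt @ B @ W_inv_sqrt            # symmetric; same eigenvalues as W^{-1} B
    _, vecs = np.linalg.eigh(C)
    a = W_inv_sqrt @ vecs[:, -1]               # leading generalized eigenvector
    return a / np.linalg.norm(a)
```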

Training and test error rates for the vowel data, as a function of the dimension of the discriminant subspace. In this case the best test error rate is achieved at dimension 2.


South African Heart Disease Data

Logistic Regression
Model the log posterior odds of each class against a reference class K as linear functions of $x$:
$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x, \quad k = 1, \ldots, K-1,$
equivalently
$\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}$.

Logistic Regression vs. LDA
The LDA log posterior odds have exactly the same linear form as the logistic regression model. The difference is in how the coefficients are fit: logistic regression maximizes the conditional likelihood $\prod_i \Pr(G = g_i \mid X = x_i)$, whereas LDA maximizes the full (joint) likelihood based on the Gaussian class-conditional densities, and therefore also uses the marginal density of $X$.
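A minimal two-class illustration of fitting by conditional likelihood, using Newton-Raphson (IRLS) updates; this is a sketch with made-up function names, not the lecture's code.

```python
# Binary logistic regression by Newton-Raphson: beta <- beta + (X^T W X)^{-1} X^T (y - mu).
import numpy as np

def logistic_fit(X, y, n_iter=25):
    """X: (N, p); y: (N,) labels in {0, 1}. Returns (beta0, beta)."""
    N, p = X.shape
    X1 = np.hstack([np.ones((N, 1)), X])
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = X1 @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        Wd = mu * (1 - mu)                     # IRLS weights
        grad = X1.T @ (y - mu)                 # gradient of the log-likelihood
        H = X1.T @ (X1 * Wd[:, None])          # negative Hessian, X^T W X
        beta += np.linalg.solve(H, grad)       # Newton step
    return beta[0], beta[1:]
```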

Rosenblatt's Perceptron Learning Algorithm
Minimize $D(\beta, \beta_0) = -\sum_{i \in \mathcal{M}} y_i (x_i^T\beta + \beta_0)$, which is proportional to the total distance of the misclassified points $\mathcal{M}$ to the decision boundary. Stochastic gradient descent visits the misclassified points one at a time and updates
$\beta \leftarrow \beta + \rho\, y_i x_i, \qquad \beta_0 \leftarrow \beta_0 + \rho\, y_i,$
with learning rate $\rho$.
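A short sketch of the perceptron updates above; labels are assumed to be coded as plus or minus one, and the epoch loop with random ordering is an implementation choice of mine, not part of the slide.

```python
# Rosenblatt's perceptron via stochastic gradient descent on D(beta, beta0).
import numpy as np

def perceptron(X, y, rho=1.0, n_epochs=100, seed=0):
    """X: (N, p); y: (N,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    beta0 = 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):
            if y[i] * (X[i] @ beta + beta0) <= 0:   # misclassified point
                beta += rho * y[i] * X[i]           # beta  <- beta  + rho * y_i * x_i
                beta0 += rho * y[i]                 # beta0 <- beta0 + rho * y_i
    return beta0, beta
```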

Separating Hyperplanes
A toy example with two classes separable by a hyperplane. The orange line is the least squares solution, which misclassifies one of the training points. Also shown are two blue separating hyperplanes found by the perceptron learning algorithm with different random starts.

Optimal Separating Hyperplanes
Among all separating hyperplanes, choose the one that maximizes the margin $M$ between the two classes:
$\max_{\beta, \beta_0, \|\beta\| = 1} M \quad \text{subject to} \quad y_i(x_i^T\beta + \beta_0) \ge M, \; i = 1, \ldots, N.$
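The maximum-margin hyperplane is usually computed with a quadratic program; as a quick illustration only, a linear support vector machine with a very large cost C approximates the hard-margin (separable) solution. This sketch assumes scikit-learn is available and uses synthetic separable data.

```python
# Approximate the optimal separating hyperplane with a linear SVM (large C).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, size=(20, 2)),
               rng.normal( 2, 0.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1e6).fit(X, y)      # very large C ~ hard margin
beta, beta0 = svm.coef_.ravel(), svm.intercept_[0]
margin = 1.0 / np.linalg.norm(beta)              # geometric half-width of the margin
print(beta, beta0, margin)
```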