Linear Discriminant Functions


Linear Discriminant Functions • Discriminant Functions • Least Squares Method • Fisher’s Linear Discriminant • Probabilistic Generative Models

Linear Discriminant Functions A linear discriminant function is a linear combination of the components of x: g(x) = w^T x + w_0, where w is the weight vector and w_0 is the bias or threshold weight. For the two-class problem we can use the following decision rule: decide c1 if g(x) > 0 and c2 if g(x) < 0. In the general case we will have one discriminant function per class.
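A minimal NumPy sketch of this two-class rule (not part of the original slides); the weight vector w, bias w0, and input x below are made-up values for illustration.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return np.dot(w, x) + w0

def decide(x, w, w0):
    """Two-class rule: c1 if g(x) > 0, c2 if g(x) < 0."""
    return "c1" if g(x, w, w0) > 0 else "c2"

w = np.array([1.0, -2.0])   # illustrative weights
w0 = 0.5                    # illustrative bias
x = np.array([3.0, 1.0])
print(g(x, w, w0), decide(x, w, w0))   # 1.5 c1
```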

Figure 5.1

The Normal Vector w The hyperplane H divides the feature space into two regions: region R1 for class c1 and region R2 for class c2. For any two points x1 and x2 on the decision boundary: w^T x1 + w_0 = w^T x2 + w_0, which means w^T (x1 - x2) = 0. Thus w is normal to any vector lying in the hyperplane.
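As a small check of this geometry (a sketch with made-up numbers, not from the slides), the code below verifies that w is orthogonal to the difference of two boundary points, and also computes the standard signed distance from a point to the hyperplane, g(x)/||w||.

```python
import numpy as np

w = np.array([2.0, 1.0])    # illustrative weight vector
w0 = -4.0                   # illustrative bias; boundary is 2*x1 + x2 = 4

# Two points on the decision boundary w^T x + w0 = 0.
x1 = np.array([2.0, 0.0])
x2 = np.array([0.0, 4.0])
print(np.dot(w, x1 - x2))   # 0.0 -> w is normal to the boundary

# Signed distance from an arbitrary point to the hyperplane.
x = np.array([3.0, 3.0])
gx = np.dot(w, x) + w0      # 5.0
print(gx / np.linalg.norm(w))   # 5 / sqrt(5) ≈ 2.236
```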

Geometry for Linear Models

The Problem with Multiple Classes How do we use a linear discriminant when we have more than two classes? There are two approaches: 1. Learn one discriminant function for each class (one versus the rest). 2. Learn a discriminant function for each pair of classes (one versus one). If c is the number of classes, the first case gives c functions and the second gives c(c-1)/2 functions; for example, c = 4 classes requires 4 functions versus 6. In both cases we are left with ambiguous regions.

Figure 5.3

Linear Machines To avoid the problem of ambiguous regions we can use a linear machine: we define c linear discriminant functions and, for a given x, choose the class whose function has the highest value. g_k(x) = w_k^T x + w_k0, k = 1, ..., c. In this case the decision regions are convex and therefore limited in flexibility and accuracy.
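A linear machine is just an argmax over the c discriminants. Below is a small sketch (the weights are made up for illustration); W stacks the vectors w_k as rows and w0 holds the biases.

```python
import numpy as np

def linear_machine(x, W, w0):
    """Evaluate g_k(x) = w_k^T x + w_k0 for every class and pick the largest."""
    scores = W @ x + w0
    return int(np.argmax(scores)), scores

# Illustrative three-class machine in two dimensions.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])
label, scores = linear_machine(np.array([0.2, 0.9]), W, w0)
print(label, scores)   # 1 [ 0.2  0.9 -0.6]
```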

Figure 5.4

Generalized Linear Discriminant Functions A linear discriminant function g(x) can be written as: g(x) = w_0 + Σ_i w_i x_i, i = 1, ..., d (d is the number of features). We can add further terms to obtain a quadratic discriminant function: g(x) = w_0 + Σ_i w_i x_i + Σ_i Σ_j w_ij x_i x_j. The quadratic terms introduce d(d+1)/2 additional coefficients, one for each symmetric product of attributes (including the squared terms). The decision surfaces are correspondingly more complicated (hyperquadric surfaces).

Generalized Linear Discriminant Functions We can keep adding terms, such as w_ijk x_i x_j x_k, and obtain the class of polynomial discriminant functions. The generalized form is g(x) = Σ_i w_i y_i(x) = w^T y, where the summation goes over all functions y_i(x). The y_i(x) are called the phi (φ) functions. The discriminant is now linear in the y_i(x). These functions map a d-dimensional x-space into a d'-dimensional y-space. Example: g(x) = w_1 + w_2 x + w_3 x^2, with y = (1, x, x^2)^T.
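The sketch below implements the slide's one-dimensional example, y = (1, x, x^2)^T, with made-up weights chosen so that the discriminant is linear in y but quadratic in x (its sign changes at |x| = 1).

```python
import numpy as np

def phi(x):
    """Map a scalar x to y = (1, x, x^2)^T."""
    return np.array([1.0, x, x * x])

def g(x, w):
    """Generalized linear discriminant g(x) = w^T y(x)."""
    return np.dot(w, phi(x))

w = np.array([-1.0, 0.0, 1.0])      # illustrative: g(x) = x^2 - 1
for x in (-2.0, 0.0, 2.0):
    print(x, g(x, w))               # 3.0, -1.0, 3.0 -> decision changes at x = ±1
```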

Figure 5.5

Mapping to Another Space Remarks on the mapping from x to y: • If x follows some probability distribution, the induced distribution in y-space is degenerate: it is confined to a d-dimensional surface inside the d'-dimensional space. • Even with simple choices for the y_i, the decision surfaces induced in x-space can be quite complicated. • A larger space means more degrees of freedom (more parameters to estimate), so we need larger training samples.

Figure 5.6

Linear Discriminant Functions • Discriminant Functions • Least Squares Method • Fisher’s Linear Discriminant • Probabilistic Generative Models

Least Squares How do we compute g(x)? That is, how do we find the values of w_0, w_1, w_2, ..., w_d? We can simply find the w that minimizes a sum-of-squares error function E(w): E(w) = ½ Σ_n (g(x_n, w) - t_n)^2, where t_n is the target value for example x_n. Problems: the solution lacks robustness (it is sensitive to outliers), and it implicitly assumes the targets follow a Gaussian conditional distribution, which is inappropriate for binary class labels.
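A sketch of the least-squares fit on made-up two-class data (targets coded +1/-1, bias absorbed by a column of ones); minimizing E(w) is then an ordinary linear least-squares problem, solved here with np.linalg.lstsq.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: class +1 around (2, 2), class -1 around (-2, -2).
X = np.vstack([rng.normal([2, 2], 1.0, (50, 2)),
               rng.normal([-2, -2], 1.0, (50, 2))])
t = np.hstack([np.ones(50), -np.ones(50)])

# Augment with a ones column so w0 becomes part of w.
Xa = np.hstack([np.ones((X.shape[0], 1)), X])

# Minimize E(w) = 1/2 * sum_n (w^T x_n - t_n)^2.
w, *_ = np.linalg.lstsq(Xa, t, rcond=None)

pred = np.sign(Xa @ w)
print("w =", w, "training accuracy =", np.mean(pred == t))
```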

Least Squares Figure: comparison of the decision boundaries obtained by least squares vs. logistic regression.

Least Squares Figure: second comparison, panels labeled "Least squares" and "Logistic regression".

Carl Friedrich Gauss (German, 1777 – 1855) Carl F. Gauss is known as the scientist who developed the method of least squares; he came up with the idea at the early age of eighteen. He is considered one of the greatest mathematicians of all time, with major discoveries in geometry, number theory, magnetism, and astronomy, among other fields. Anecdote: as a schoolboy he quickly solved a problem posed by his teacher (summing all the integers from 1 to 100).

Linear Discriminant Functions • Discriminant Functions • Least Squares Method • Fisher’s Linear Discriminant • Probabilistic Generative Models

Fisher’s Linear Discriminant The idea is to project the data onto a single dimension. We choose the projection that maximizes the separation between the class means while minimizing the variance within each class. Find the w that maximizes the criterion J(w) = (m_2 - m_1)^2 / (s_1^2 + s_2^2), which can be written as J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class covariance (scatter) matrix and S_W is the within-class covariance (scatter) matrix.

Fisher’s Linear Discriminant S_B = (m_2 - m_1)(m_2 - m_1)^T and S_W = Σ_{x in class 1} (x - m_1)(x - m_1)^T + Σ_{x in class 2} (x - m_2)(x - m_2)^T. Maximizing J(w) gives a projection direction proportional to S_W^{-1} (m_2 - m_1).
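A sketch of the Fisher direction on made-up data, using the closed-form result w ∝ S_W^{-1}(m_2 - m_1) mentioned above; the class means and spreads below are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data with large spread along the second axis.
X1 = rng.normal([0.0, 0.0], [1.0, 3.0], (100, 2))
X2 = rng.normal([4.0, 1.0], [1.0, 3.0], (100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of outer products of centered samples, per class.
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction (up to scale): w = S_W^{-1} (m2 - m1).
w = np.linalg.solve(SW, m2 - m1)
w /= np.linalg.norm(w)

print("Fisher direction:", w)
print("projected class means:", (X1 @ w).mean(), (X2 @ w).mean())
```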

Fisher’s Linear Discriminant Figure: two candidate projection directions, labeled "Wrong" and "Right".

Linear Discriminant Functions • Discriminant Functions • Least Squares Method • Fisher’s Linear Discriminant • Probabilistic Generative Models

Probabilistic Generative Models We first compute g(x) = w_1 x_1 + w_2 x_2 + ... + w_d x_d + w_0, but what we actually want is the posterior probability P(C_k | x). To obtain conditional probabilities we pass g(x) through a logistic sigmoid function: σ(g(x)) = 1 / (1 + exp(-g(x))). It holds that σ(g(x)) = P(C_1 | x) when the two class-conditional densities are Gaussian with equal covariance matrices; the weights are then determined by the class means, the shared covariance, and the class priors.
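The code below sketches this result under the stated assumption of two Gaussian classes sharing a covariance matrix: the weights are derived from the means, the shared covariance, and the class priors, and the posterior is a logistic sigmoid of the linear score. All parameter values are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def posterior_c1(x, mu1, mu2, Sigma, prior1=0.5):
    """P(C1 | x) for two Gaussian classes with a shared covariance Sigma."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)                       # weight vector
    w0 = (-0.5 * mu1 @ Sinv @ mu1
          + 0.5 * mu2 @ Sinv @ mu2
          + np.log(prior1 / (1.0 - prior1)))     # bias from means and priors
    return sigmoid(w @ x + w0)

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
Sigma = np.eye(2)
print(posterior_c1(np.array([0.5, 0.5]), mu1, mu2, Sigma))   # ≈ 0.88, closer to class 1
```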