
1 Elements of Pattern Recognition CNS/EE-148 -- Lecture 5 M. Weber P. Perona

2 What is Classification? We want to assign objects to classes based on a selection of attributes (features). Examples: –(age, income) → {credit worthy, not credit worthy} –(blood cell count, body temp) → {flu, hepatitis B, hepatitis C} –(pixel vector) → {Bill Clinton, coffee cup} Feature vectors can be continuous, discrete or mixed.

3 What is Classification? We want to find a function from measurements to class labels, i.e. a decision boundary. Statistical methods use the joint pdf p(C,x). Assume p(C,x) is known for now. [Figure: space of feature vectors (x_1, x_2) with regions for Signal 1, Signal 2 and Noise.]

4 Some Terminology p(C) is called a prior or a priori probability p(x|C) is called a class-conditional density or likelihood of C with respect to x p(C|x) is called a posterior or a posteriori probability

5 Examples One measurement, symmetric cost, equal priors: a bad choice of decision boundary. [Figure: class-conditional densities p(x|C_1) and p(x|C_2) over x with a poorly placed threshold.]

6 Examples One measurement, symmetric cost, equal priors: a good choice of decision boundary. [Figure: the same densities p(x|C_1) and p(x|C_2) with the threshold at their crossing point.]

7 How to Make the Best Decision? (Bayes Decision Theory) Define a cost function for mistakes, i.e. a loss for each type of misclassification. Minimize the expected loss (risk) over the entire joint density p(C,x). It suffices to make the optimal decision for each individual x. Result: decide according to the maximum posterior probability.
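The slide's equation is not reproduced in the transcript; in its standard form the rule is presumably the MAP decision rule, sketched in LaTeX:
\[
\hat{C}(x) = \arg\max_{k}\; p(C_k \mid x) = \arg\max_{k}\; p(x \mid C_k)\, p(C_k),
\]
which, for the symmetric (zero-one) loss, minimizes the expected risk \( R = \sum_k \int L(\hat{C}(x), C_k)\, p(C_k, x)\, dx \).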

8 Two Classes, C_1 and C_2 It is helpful to consider the likelihood ratio. Use the known priors p(C_i) or ignore them. The rule generalizes to more elaborate loss functions (the proof is easy). g(x) is called a discriminant function.
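The likelihood-ratio form of the two-class rule, reconstructing the slide's missing equation in its standard form (L_ij denotes the loss of deciding C_i when C_j is true):
\[
g(x) = \frac{p(x \mid C_1)}{p(x \mid C_2)} \;>\; \frac{p(C_2)}{p(C_1)} \cdot \frac{L_{12} - L_{22}}{L_{21} - L_{11}}
\quad\Longrightarrow\quad \text{decide } C_1 .
\]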

9 Discriminant Functions for Multivariate Gaussian Class-Conditional Densities Consider two multivariate Gaussians in d dimensions. Since log is monotonic, we can look at log g(x) instead. The quadratic form that appears is the squared Mahalanobis distance.
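A sketch of the algebra behind this slide (standard for Gaussian class-conditional densities \(p(x\mid C_i) = \mathcal{N}(\mu_i, \Sigma_i)\); the constant \(2\pi\) terms cancel):
\[
\log g(x) = \log p(x \mid C_1) - \log p(x \mid C_2)
= -\tfrac{1}{2}(x-\mu_1)^{\top}\Sigma_1^{-1}(x-\mu_1)
  +\tfrac{1}{2}(x-\mu_2)^{\top}\Sigma_2^{-1}(x-\mu_2)
  -\tfrac{1}{2}\log\frac{|\Sigma_1|}{|\Sigma_2|},
\]
where each quadratic form \((x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\) is the squared Mahalanobis distance of x from class i.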

10 Mahalanobis Distance Iso-distance lines are iso-probability lines. [Figure: two classes with means mu_1 and mu_2 in the (x_1, x_2) plane, elliptical iso-distance contours, and the resulting decision surface.]

11

12 Case 1: Σ_i = σ²I The discriminant functions simplify considerably.
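A sketch of the simplification (assuming equal, spherical covariances Σ_i = σ²I and dropping the class-independent term in ‖x‖²):
\[
g_i(x) = \frac{1}{\sigma^2}\,\mu_i^{\top}x \;-\; \frac{1}{2\sigma^2}\,\|\mu_i\|^2 \;+\; \log p(C_i),
\]
a linear function of x, so the decision boundary g_1(x) = g_2(x) is a hyperplane perpendicular to \(\mu_1 - \mu_2\).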

13 Decision Boundary If μ_2 = 0, we obtain... the matched filter! Together with an expression for the threshold.

14 Two Signals and Additive White Gaussian Noise [Figure: Signal 1 and Signal 2 with means mu_1 and mu_2 in the (x_1, x_2) plane; the observation x is classified by projecting x − mu_2 onto the difference vector mu_1 − mu_2.]

15 Case 2: Σ_i = Σ Two classes, 2D measurements, p(x|C) are multivariate Gaussians with equal covariance matrices. The derivation is similar: –the quadratic term vanishes since it is independent of the class –we obtain a linear decision surface Matlab demo
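The Matlab demo itself is not part of the transcript; below is a minimal Python/NumPy sketch of the same idea. The class means, covariance, and priors are invented for illustration.
```python
import numpy as np

# Hypothetical class parameters (equal covariance => linear decision boundary)
mu1, mu2 = np.array([2.0, 1.0]), np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
prior1, prior2 = 0.5, 0.5

# Bayes rule for equal-covariance Gaussians: decide C1 if w.x + b > 0
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
b = -0.5 * (mu1 + mu2) @ w + np.log(prior1 / prior2)

def classify(x):
    """Return 1 for class C1, 2 for class C2."""
    return 1 if x @ w + b > 0 else 2

# Sample from each class and estimate the empirical error rate
rng = np.random.default_rng(0)
X1 = rng.multivariate_normal(mu1, Sigma, 500)
X2 = rng.multivariate_normal(mu2, Sigma, 500)
err = np.mean([classify(x) != 1 for x in X1] + [classify(x) != 2 for x in X2])
print("empirical error rate:", err)
```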

16 Case 3: General Covariance Matrix See transparency

17 Isn’t this too simple? Not at all… It is true that images form complicated manifolds (from a pixel point of view, translation, rotation and scaling are all highly non-linear operations), but the high dimensionality helps.

18 Assume Unknown Class Densities In real life, we do not know the class-conditional densities. But we do have example data. This puts us in the typical machine learning scenario: we want to learn a function, c(x), from examples. Why not just estimate the class densities from examples and apply the previous ideas? –Learning a Gaussian (a simple density) in N dimensions needs at least N² samples: 10x10 pixels → 10,000 examples! –Avoid estimating densities whenever you can! (too general) –The posterior is generally simpler than the class-conditional density (see transparency)

19 Remember PCA? Principal components are the eigenvectors of the covariance matrix. Use the reconstruction error for recognition (e.g. Eigenfaces). –good: reduces dimensionality –bad: no model within the subspace; linearity may be inappropriate; the covariance is not the appropriate quantity to optimize for discrimination [Figure: data in the (x_1, x_2) plane with the first principal direction u_1 and the reconstruction of a point x.]
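A minimal NumPy sketch of the PCA-for-recognition idea mentioned above (Eigenfaces-style): project onto the top-k eigenvectors of the covariance matrix and use the reconstruction error as a recognition score. The data here is synthetic and purely illustrative.
```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k eigenvectors of the covariance of X (rows = samples)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :k]                 # top-k principal directions
    return mean, U

def reconstruction_error(x, mean, U):
    """Distance between x and its projection onto the PCA subspace."""
    centered = x - mean
    recon = U @ (U.T @ centered)
    return np.linalg.norm(centered - recon)

# Toy usage: fit on samples of one "class", score a new sample
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
mean, U = fit_pca(X_train, k=3)
print(reconstruction_error(rng.normal(size=10), mean, U))
```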

20 Fisher’s Linear Discriminant Goal: reduce dimensionality before training classifiers etc. (feature selection). Similar goal to PCA, but Fisher has classification in mind: find projection directions such that separation is easiest. Eigenfaces vs. Fisherfaces. [Figure: two classes in the (x_1, x_2) plane and a projection direction that separates them.]

21 Fisher’s Linear Discriminant Assume we have n d-dimensional samples x_1,…,x_n, with n_1 from set (class) X_1 and n_2 from set X_2. We form the linear combinations y_i = w^T x_i and obtain y_1,…,y_n. Only the direction of w is important.

22 Objective for Fisher Measure the separation as the distance between the means after projecting (k = 1,2). Measure the scatter after projecting. The objective becomes to maximize the ratio of the two (sketched below).
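The slide's equations are missing from the transcript; in their standard form the projected means, projected scatters, and objective are presumably
\[
\tilde{m}_k = \frac{1}{n_k}\sum_{y \in Y_k} y = w^{\top} m_k, \qquad
\tilde{s}_k^2 = \sum_{y \in Y_k} (y - \tilde{m}_k)^2, \qquad
J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}.
\]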

23 We need to make the dependence on w explicit. Defining the within-class scatter matrix S_W = S_1 + S_2, we obtain the denominator as a quadratic form in w. Similarly for the separation (between-class scatter matrix S_B). Finally we can write J(w) as a ratio of quadratic forms (see below).
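Again reconstructing the missing equations in their standard form:
\[
S_k = \sum_{x \in X_k} (x - m_k)(x - m_k)^{\top}, \qquad
\tilde{s}_1^2 + \tilde{s}_2^2 = w^{\top} S_W\, w, \qquad
S_B = (m_1 - m_2)(m_1 - m_2)^{\top},
\]
\[
|\tilde{m}_1 - \tilde{m}_2|^2 = w^{\top} S_B\, w, \qquad
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}.
\]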

24 Fisher’s Solution J(w) is called a generalized Rayleigh quotient. Any w that maximizes J must satisfy the generalized eigenvalue problem below. Since S_B is singular (rank 1) and S_B w is always in the direction of (m_1 − m_2), we are done.
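The two equations referred to on the slide, in their standard form:
\[
S_B\, w = \lambda\, S_W\, w
\qquad\Longrightarrow\qquad
w \;\propto\; S_W^{-1}(m_1 - m_2).
\]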

25 Comments on FLD We did not follow Bayes Decision Theory FLD is useful for many types of densities Fisher can be extended (see demo): –more than one projection direction –more than two clusters Let’s try it out: Matlab Demo
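The Matlab demo is not included in the transcript; here is a minimal Python/NumPy sketch of a two-class Fisher demo on synthetic data (all parameters invented for illustration):
```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic classes sharing an elongated covariance
Sigma = np.array([[3.0, 1.0], [1.0, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], Sigma, 300)
X2 = rng.multivariate_normal([2.0, 1.0], Sigma, 300)

# Fisher's linear discriminant: w proportional to S_W^{-1} (m1 - m2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)
S2 = (X2 - m2).T @ (X2 - m2)
S_W = S1 + S2
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)          # only the direction matters

# Project both classes onto w and evaluate the Fisher criterion J(w)
y1, y2 = X1 @ w, X2 @ w
J = (y1.mean() - y2.mean())**2 / (y1.var() * len(y1) + y2.var() * len(y2))
print("Fisher direction:", w, "  J(w):", J)
```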

26 Fisher vs. Bayes Assume the class densities really are Gaussian with identical covariance; then Bayes and Fisher prescribe the projection directions sketched below. Since S_W is proportional to the covariance matrix, w points in the same direction in both cases. Comforting...
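The two missing expressions, in the form they usually take:
\[
\text{Bayes:}\quad w = \Sigma^{-1}(\mu_1 - \mu_2),
\qquad
\text{Fisher:}\quad w \;\propto\; S_W^{-1}(m_1 - m_2).
\]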

27 What have we achieved? Found out that the maximum-posterior strategy is optimal. Always. Looked at different cases of Gaussian class densities, where we could derive simple decision rules. Gaussian classifiers do a reasonable job! Learned about FLD, which is useful and often preferable to PCA.

28 Just for Fun: Support Vector Machine Very fashionable… state of the art? Does not model densities. Fits the decision surface directly. Maximizes the margin, which reduces “complexity”. The decision surface only depends on nearby samples (the support vectors). Matlab Demo [Figure: two classes in the (x_1, x_2) plane separated by a maximum-margin boundary.]
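A minimal Python sketch in place of the Matlab SVM demo; it uses scikit-learn's SVC on synthetic data, which is an assumption on tooling rather than part of the original lecture.
```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Two synthetic 2-D classes
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 3], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Linear SVM: maximizes the margin; the boundary depends only on the support vectors
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors per class:", clf.n_support_)
print("prediction for a new point:", clf.predict([[1.5, 1.5]]))
```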

29 Learning Algorithms [Diagram: examples (x_i, y_i) drawn from p(x,y), together with a set of candidate functions, are fed to a learning algorithm, which outputs a learned function y = f(x); the question is f = ?]

30 Assume Unknown Class Densities SVM Examples Densities are hard to estimate → avoid it –example from Ripley Give intuitions on overfitting Need to learn –Standard machine learning problem –Training/Test sets

