Slide 1
Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Slide 2
Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3–2.5)
- Minimum-Error-Rate Classification
- Classifiers, Discriminant Functions and Decision Surfaces
- The Normal Density
Slide 3
2.3 Minimum-Error-Rate Classification
- Actions are decisions on classes.
- If action α_i is taken and the true state of nature is ω_j, then the decision is correct if i = j and in error if i ≠ j.
- Seek a decision rule that minimizes the probability of error, which is the error rate.
Slide 4
Introduction of the zero-one loss function:
λ(α_i, ω_j) = 0 if i = j, and 1 if i ≠ j, for i, j = 1, …, c
Therefore, the conditional risk is:
R(α_i | x) = Σ_j λ(α_i, ω_j) P(ω_j | x) = Σ_{j≠i} P(ω_j | x) = 1 − P(ω_i | x)
"The risk corresponding to this loss function is the average probability of error."
Slide 5
Minimizing the risk requires maximizing P(ω_i | x), since R(α_i | x) = 1 − P(ω_i | x).
For minimum error rate: decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i.
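A minimal NumPy sketch of this rule, not from the slides: posteriors via Bayes' formula, then an argmax. The two Gaussian class-conditionals and the priors are assumed example values.

```python
import numpy as np
from scipy.stats import norm

# Minimum-error-rate rule: decide the class with the largest posterior.
# The Gaussian class-conditionals and priors below are assumed examples.

def posteriors(x, likelihoods, priors):
    """P(w_i | x) = p(x | w_i) P(w_i) / p(x), for each class i."""
    joint = np.array([p(x) for p in likelihoods]) * priors
    return joint / joint.sum()  # divide by the evidence p(x)

def decide(x, likelihoods, priors):
    """Decide w_i if P(w_i | x) > P(w_j | x) for all j != i."""
    return int(np.argmax(posteriors(x, likelihoods, priors)))

likelihoods = [norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf]  # p(x | w_1), p(x | w_2)
priors = np.array([0.6, 0.4])                           # P(w_1), P(w_2)
print(decide(0.8, likelihoods, priors))                 # -> 0 (decide w_1)
```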
Slide 6
Regions of decision and the zero-one loss function. With the likelihood-ratio rule, decide ω₁ if:
p(x | ω₁) / p(x | ω₂) > θ_λ, where θ_λ = [(λ₁₂ − λ₂₂)/(λ₂₁ − λ₁₁)] · P(ω₂)/P(ω₁)
If λ is the zero-one loss function, which means λ = [0 1; 1 0], then θ_λ = P(ω₂)/P(ω₁) = θ_a.
If instead λ = [0 2; 1 0], then θ_λ = 2·P(ω₂)/P(ω₁) = θ_b.
Slide 7
[Figure: the likelihood ratio p(x | ω₁)/p(x | ω₂) for the two-category case, with the decision thresholds θ_a and θ_b]
Slide 8
2.4 Classifiers, Discriminant Functions and Decision Surfaces
The multi-category case:
- A set of discriminant functions g_i(x), i = 1, …, c.
- The classifier assigns a feature vector x to class ω_i if g_i(x) > g_j(x) for all j ≠ i.
Slide 9
[Figure: the functional structure of a general statistical pattern classifier — d inputs, c discriminant functions g_i(x), and a max selector that picks the action]
Slide 10
For the minimum-risk case, let g_i(x) = −R(α_i | x) (maximum discriminant corresponds to minimum risk!).
For the minimum-error-rate case, we take g_i(x) = P(ω_i | x) (maximum discriminant corresponds to maximum posterior!).
Equivalent choices: g_i(x) = p(x | ω_i) P(ω_i), or g_i(x) = ln p(x | ω_i) + ln P(ω_i) (ln: natural logarithm!).
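A short sketch of the last point, with assumed means, variances, and priors: any monotone transform of the posterior yields the same decision, so the log form is a convenient discriminant.

```python
import numpy as np
from scipy.stats import norm

# Log discriminants g_i(x) = ln p(x | w_i) + ln P(w_i).
# Means, sigmas, and priors are assumed example values.

means, sigmas = [0.0, 2.0], [1.0, 1.0]
priors = np.array([0.6, 0.4])

def g(x):
    """g_i(x) = ln p(x | w_i) + ln P(w_i) for each class i."""
    log_likes = np.array([norm(m, s).logpdf(x) for m, s in zip(means, sigmas)])
    return log_likes + np.log(priors)

print(np.argmax(g(0.8)))  # same class as comparing the posteriors directly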
Slide 11
The feature space is divided into c decision regions: if g_i(x) > g_j(x) for all j ≠ i, then x is in R_i (R_i means: assign x to ω_i).
The two-category case:
- A classifier is a "dichotomizer" that has two discriminant functions g₁ and g₂.
- Let g(x) ≡ g₁(x) − g₂(x).
- Decide ω₁ if g(x) > 0; otherwise decide ω₂.
Slide 12
The computation of g(x):
g(x) = P(ω₁ | x) − P(ω₂ | x)
or, equivalently for the decision,
g(x) = ln [p(x | ω₁)/p(x | ω₂)] + ln [P(ω₁)/P(ω₂)]
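A minimal sketch of the log form of g(x); the densities and priors are assumed example values, not from the slides.

```python
import numpy as np
from scipy.stats import norm

# Two-category discriminant in log form:
# g(x) = ln[p(x | w_1)/p(x | w_2)] + ln[P(w_1)/P(w_2)].
# Densities and priors are assumed example values.

p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)
P1, P2 = 0.6, 0.4

def g(x):
    return (p1.logpdf(x) - p2.logpdf(x)) + np.log(P1 / P2)

x = 0.8
print("decide w_1" if g(x) > 0 else "decide w_2")
```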
Slide 13
[Figure-only slide: graphic not preserved in this transcript]
Slide 14
2.5 The Normal Density
Univariate density:
- A continuous density.
- Many processes are asymptotically Gaussian.
- Handwritten characters and speech sounds can be modeled as an ideal prototype corrupted by a random process (central limit theorem).
p(x) = (1 / (√(2π) σ)) exp[ −½ ((x − μ)/σ)² ]
where:
- μ = mean (or expected value) of x
- σ² = expected squared deviation, or variance
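A direct evaluation of this density as a sanity check; the test values μ = 0, σ = 1, x = 0 are assumed for illustration.

```python
import numpy as np

# Univariate normal density p(x) = exp(-((x-mu)/sigma)^2 / 2) / (sqrt(2*pi)*sigma).
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(normal_pdf(0.0, 0.0, 1.0))  # 1/sqrt(2*pi) ~ 0.3989
```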
Slide 15
[Figure: the univariate normal density; roughly 95% of the probability mass lies in the interval |x − μ| ≤ 2σ]
Slide 16
Multivariate density
The multivariate normal density in d dimensions is:
p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[ −½ (x − μ)ᵗ Σ⁻¹ (x − μ) ]
where:
- x = (x₁, x₂, …, x_d)ᵗ (t stands for the transpose vector form)
- μ = (μ₁, μ₂, …, μ_d)ᵗ is the mean vector
- Σ is the d×d covariance matrix
- |Σ| and Σ⁻¹ are its determinant and inverse, respectively
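A direct NumPy transcription of this formula; μ, Σ, and the test point are assumed example values.

```python
import numpy as np

# d-dimensional normal density, evaluated term by term from the formula above.
def mvn_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])  # assumed example covariance
print(mvn_pdf(np.array([1.0, 1.0]), mu, Sigma))
```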
Slide 17
[Figure-only slide: graphic not preserved in this transcript]
Slide 18
Linear combinations of normally distributed random variables are normally distributed:
if p(x) ~ N(μ, Σ) and y = Aᵗx, then p(y) ~ N(Aᵗμ, AᵗΣA)
Whitening transform: A_w = Φ Λ^(−1/2), which maps Σ to the identity (A_wᵗ Σ A_w = I), where
- Φ: the matrix whose columns are the orthonormal eigenvectors of Σ
- Λ: the diagonal matrix of the corresponding eigenvalues
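A minimal sketch of the whitening transform built from the eigendecomposition of Σ; the covariance below is an assumed example.

```python
import numpy as np

# Whitening transform A_w = Phi Lambda^(-1/2) from the eigendecomposition of Sigma.
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])  # assumed example covariance

eigvals, Phi = np.linalg.eigh(Sigma)   # eigenvalues and orthonormal eigenvectors
A_w = Phi @ np.diag(eigvals ** -0.5)   # A_w = Phi Lambda^(-1/2)

# After the transform, the covariance becomes the identity:
print(np.round(A_w.T @ Sigma @ A_w, 6))  # -> [[1, 0], [0, 1]]
```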
Slide 19
The Mahalanobis distance from x to μ:
r² = (x − μ)ᵗ Σ⁻¹ (x − μ)
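A short sketch of this computation; μ, Σ, and the query point are assumed example values.

```python
import numpy as np

# Squared Mahalanobis distance r^2 = (x - mu)^t Sigma^{-1} (x - mu).
def mahalanobis_sq(x, mu, Sigma):
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)  # solve avoids forming Sigma^{-1}

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])  # assumed example covariance
print(mahalanobis_sq(np.array([1.0, 1.0]), mu, Sigma))
```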
Slide 20
Entropy:
H(p(x)) = −∫ p(x) ln p(x) dx
- Measures the fundamental uncertainty in the values of points selected randomly from a distribution.
- Measured in nats (natural log) or bits (log base 2).
- The normal distribution has the maximum entropy of all distributions having a given mean and variance.
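A numerical check of the entropy integral for a Gaussian (σ is an assumed example value); for a univariate normal, the integral has the closed form H = ½ ln(2πeσ²) nats.

```python
import numpy as np

# Compare the closed-form Gaussian entropy with a grid approximation of
# -integral p(x) ln p(x) dx. sigma is an assumed example value.
sigma = 1.5
closed_form = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)

x = np.linspace(-10 * sigma, 10 * sigma, 200_001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
numeric = -np.sum(p * np.log(p)) * dx  # Riemann-sum approximation

print(closed_form, numeric)  # the two values should agree closely
```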
Slide 21
Assignment: Problems 2.3.12, 2.4.14, 2.5.22, 2.5.23