Published by Eleanore Henry. Modified over 9 years ago.
1
Pattern Classification, Chapter 2 (Part 2)
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
2
Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5)
Minimum-Error-Rate Classification
Classifiers, Discriminant Functions and Decision Surfaces
The Normal Density
3
Minimum-Error-Rate Classification
Actions are decisions on classes.
If action $\alpha_i$ is taken and the true state of nature is $\omega_j$, then the decision is correct if $i = j$ and in error if $i \neq j$.
Seek a decision rule that minimizes the probability of error, which is the error rate.
4
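Introduction of the zero-one loss function:
$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$
Therefore, the conditional risk is:
$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$
"The risk corresponding to this loss function is the average probability of error."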
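The conditional risk under the zero-one loss can be checked numerically. The sketch below is only an illustration: the function names and the posterior values are hypothetical, not taken from the slides. It evaluates $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j) P(\omega_j \mid x)$ for every action and shows that it matches $1 - P(\omega_i \mid x)$.

```python
def zero_one_loss(c):
    """Build the c x c zero-one loss matrix: 0 on the diagonal, 1 elsewhere."""
    return [[0 if i == j else 1 for j in range(c)] for i in range(c)]


def conditional_risk(loss, posteriors, action):
    """R(alpha_i | x) = sum over j of loss[i][j] * P(omega_j | x)."""
    return sum(l * p for l, p in zip(loss[action], posteriors))


# Hypothetical posteriors P(omega_j | x) for a three-class problem.
posteriors = [0.2, 0.5, 0.3]
loss = zero_one_loss(3)
risks = [conditional_risk(loss, posteriors, i) for i in range(3)]

for i, (r, p) in enumerate(zip(risks, posteriors)):
    print(f"action {i}: risk = {r:.2f}, 1 - posterior = {1 - p:.2f}")
```

The action with the smallest risk is the one with the largest posterior, which is the minimum-error-rate rule.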
5
Minimizing the risk requires maximizing $P(\omega_i \mid x)$ (since $R(\alpha_i \mid x) = 1 - P(\omega_i \mid x)$).
For minimum error rate:
Decide $\omega_i$ if $P(\omega_i \mid x) > P(\omega_j \mid x) \;\; \forall j \neq i$
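A minimal sketch of the minimum-error-rate rule, assuming the class-conditional likelihoods $P(x \mid \omega_i)$ and priors $P(\omega_i)$ are already known for the observed $x$. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def posteriors(likelihoods, priors):
    """Bayes rule: P(omega_i | x) = P(x | omega_i) P(omega_i) / P(x)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x), the normalizing constant
    return [j / evidence for j in joint]


def decide(post):
    """Return the index i of the class with the largest posterior."""
    return max(range(len(post)), key=lambda i: post[i])


# Hypothetical values for a two-class problem at one observation x.
likelihoods = [0.6, 0.2]   # P(x | omega_1), P(x | omega_2)
priors = [0.3, 0.7]        # P(omega_1), P(omega_2)

post = posteriors(likelihoods, priors)
choice = decide(post)
print(post, "-> decide class index", choice)
```

Here the larger likelihood of the first class outweighs its smaller prior, so the first class (index 0) is chosen.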
6
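Regions of decision and zero-one loss function, therefore:
Let $\dfrac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \dfrac{P(\omega_2)}{P(\omega_1)} = \theta_\lambda$; then decide $\omega_1$ if $\dfrac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$
If $\lambda$ is the zero-one loss function, which means:
$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
then $\theta_\lambda = \dfrac{P(\omega_2)}{P(\omega_1)}$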
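The effect of the loss matrix on the decision regions can be sketched with the standard two-category likelihood-ratio test: decide $\omega_1$ when $P(x \mid \omega_1) / P(x \mid \omega_2)$ exceeds $\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$. The function names and numeric values below are assumptions made for illustration; the code also assumes $\lambda_{21} > \lambda_{11}$ so the inequality keeps its direction.

```python
def threshold(loss, prior1, prior2):
    """Likelihood-ratio threshold for a 2x2 loss matrix loss[i][j] = lambda_ij."""
    (l11, l12), (l21, l22) = loss
    return (l12 - l22) / (l21 - l11) * prior2 / prior1


def decide_by_ratio(px_w1, px_w2, theta):
    """Return 1 if the likelihood ratio exceeds theta, otherwise 2."""
    return 1 if px_w1 / px_w2 > theta else 2


zero_one = [[0, 1], [1, 0]]
# Hypothetical priors: P(omega_1) = 2/3, P(omega_2) = 1/3.
theta = threshold(zero_one, 2 / 3, 1 / 3)
print("theta =", theta)
print(decide_by_ratio(0.6, 0.2, theta))  # ratio 3.0 is above theta
print(decide_by_ratio(0.1, 0.4, theta))  # ratio 0.25 is below theta
```

With the zero-one loss the threshold reduces to the ratio of priors, so the rule coincides with choosing the larger posterior.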
8
Classifiers, Discriminant Functions and Decision Surfaces
The multi-category case
Set of discriminant functions $g_i(x)$, $i = 1, \dots, c$
The classifier assigns a feature vector $x$ to class $\omega_i$ if:
$g_i(x) > g_j(x) \;\; \forall j \neq i$
10
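Let $g_i(x) = -R(\alpha_i \mid x)$ (max. discriminant corresponds to min. risk!)
For the minimum error rate, we take $g_i(x) = P(\omega_i \mid x)$ (max. discriminant corresponds to max. posterior!)
Equivalent choices, which give the same decisions because they are monotonically increasing functions of the posterior:
$g_i(x) = P(x \mid \omega_i) P(\omega_i)$
$g_i(x) = \ln P(x \mid \omega_i) + \ln P(\omega_i)$ (ln: natural logarithm!)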
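The following sketch illustrates that the product form and the logarithmic form of the discriminant select the same class. The helper names and the numbers are hypothetical; the log form assumes strictly positive likelihoods and priors.

```python
import math


def g_product(likelihood, prior):
    """g_i(x) = P(x | omega_i) * P(omega_i)."""
    return likelihood * prior


def g_log(likelihood, prior):
    """g_i(x) = ln P(x | omega_i) + ln P(omega_i); inputs must be > 0."""
    return math.log(likelihood) + math.log(prior)


def classify(g, likelihoods, priors):
    """Return the index of the class whose discriminant g is largest."""
    scores = [g(l, p) for l, p in zip(likelihoods, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical three-class values at a single observation x.
likelihoods = [0.05, 0.20, 0.10]
priors = [0.5, 0.2, 0.3]

by_product = classify(g_product, likelihoods, priors)
by_log = classify(g_log, likelihoods, priors)
print(by_product, by_log)
```

The logarithmic form is usually preferred in practice because sums of logs are numerically better behaved than products of small probabilities.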
11
Feature space divided into $c$ decision regions:
if $g_i(x) > g_j(x) \;\; \forall j \neq i$ then $x$ is in $\mathcal{R}_i$ ($\mathcal{R}_i$ means assign $x$ to $\omega_i$)
The two-category case
A classifier is a "dichotomizer" that has two discriminant functions $g_1$ and $g_2$
Let $g(x) \equiv g_1(x) - g_2(x)$
Decide $\omega_1$ if $g(x) > 0$; otherwise decide $\omega_2$
12
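The computation of $g(x)$
Two convenient forms, both of which give the same decisions:
$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$
$$g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$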
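A dichotomizer can be sketched with a single discriminant in log-ratio form, $g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$, deciding $\omega_1$ when $g(x) > 0$. The function names and inputs are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math


def g(px_w1, px_w2, prior1, prior2):
    """Log-likelihood ratio plus log-prior ratio for two classes."""
    return math.log(px_w1 / px_w2) + math.log(prior1 / prior2)


def dichotomize(px_w1, px_w2, prior1, prior2):
    """Return 1 if g(x) > 0, otherwise 2."""
    return 1 if g(px_w1, px_w2, prior1, prior2) > 0 else 2


# Hypothetical values: the likelihood favors class 1, the prior favors class 2.
value = g(0.6, 0.2, 0.3, 0.7)
print(f"g(x) = {value:.4f} -> decide class", dichotomize(0.6, 0.2, 0.3, 0.7))
```

The sign of $g(x)$ is all that matters: it is positive exactly when $P(x \mid \omega_1) P(\omega_1) > P(x \mid \omega_2) P(\omega_2)$.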
14
The Normal Density
Univariate density
Density which is analytically tractable
Continuous density
A lot of processes are asymptotically Gaussian
Handwritten characters and speech sounds can be viewed as an ideal or prototype corrupted by a random process (central limit theorem)
$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$
Where:
$\mu$ = mean (or expected value) of $x$
$\sigma^2$ = expected squared deviation or variance
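The univariate normal density, $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$, is straightforward to evaluate directly. The function name below is hypothetical, and the sketch assumes $\sigma > 0$.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized deviation from the mean
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


peak = normal_pdf(0.0, 0.0, 1.0)     # height of the standard normal at its mean
left = normal_pdf(-1.5, 0.0, 1.0)
right = normal_pdf(1.5, 0.0, 1.0)    # equal to left by symmetry about the mean
print(peak, left, right)
```

Doubling $\sigma$ halves the peak height, since the density must still integrate to one over a wider spread.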
16
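Multivariate density
Multivariate normal density in $d$ dimensions is:
$$P(x) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^t \, \Sigma^{-1} (x - \mu) \right]$$
where:
$x = (x_1, x_2, \dots, x_d)^t$ ($t$ stands for the transpose vector form)
$\mu = (\mu_1, \mu_2, \dots, \mu_d)^t$ is the mean vector
$\Sigma$ is the $d \times d$ covariance matrix
$|\Sigma|$ and $\Sigma^{-1}$ are its determinant and inverse, respectively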
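To keep the example free of external libraries, the sketch below evaluates the multivariate normal density only for the special case $d = 2$, where the determinant and inverse of the covariance matrix have closed forms. The function name and inputs are hypothetical; a general-$d$ version would normally rely on a linear algebra library instead.

```python
import math


def mvn_pdf_2d(x, mu, cov):
    """Bivariate normal density; cov is a 2x2 symmetric positive-definite matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c                      # |Sigma|
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = [[d / det, -b / det],              # closed-form inverse of a 2x2 matrix
           [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu).
    m2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    norm = (2 * math.pi) * math.sqrt(det)    # (2*pi)^(d/2) * |Sigma|^(1/2) with d = 2
    return math.exp(-0.5 * m2) / norm


at_mean = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
diagonal = mvn_pdf_2d([1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]])
print(at_mean, diagonal)
```

With a diagonal covariance matrix the density factors into a product of two univariate normal densities, which gives a convenient sanity check.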
17
Appendix
Variance = $S^2$
Standard Deviation = $S$
18
Bayes theorem

        A            ¬A
B       A and B      ¬A and B
¬B      A and ¬B     ¬A and ¬B
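The four joint events (A and B, ¬A and B, A and ¬B, ¬A and ¬B) are enough to illustrate Bayes theorem, $P(A \mid B) = P(B \mid A) P(A) / P(B)$. The counts below are invented purely for illustration and do not come from the slides.

```python
# Hypothetical counts for the four joint events of A / not-A and B / not-B.
counts = {
    ("A", "B"): 30,
    ("notA", "B"): 10,
    ("A", "notB"): 20,
    ("notA", "notB"): 40,
}
total = sum(counts.values())

p_a = (counts[("A", "B")] + counts[("A", "notB")]) / total   # P(A)
p_b = (counts[("A", "B")] + counts[("notA", "B")]) / total   # P(B)
p_b_given_a = counts[("A", "B")] / (counts[("A", "B")] + counts[("A", "notB")])

# Direct definition: P(A | B) = P(A and B) / P(B).
direct = (counts[("A", "B")] / total) / p_b
# Bayes theorem: P(A | B) = P(B | A) P(A) / P(B).
via_bayes = p_b_given_a * p_a / p_b
print(direct, via_bayes)
```

Both routes give the same conditional probability, which is the content of the theorem.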