1
Chapter 2: Bayesian Decision Theory (Part 2)
Minimum-Error-Rate Classification
Classifiers, Discriminant Functions and Decision Surfaces
The Normal Density
Dr. Djamel Bouchaffra, CSE 616 Applied Pattern Recognition
All materials used in this course were taken from the textbook "Pattern Classification" by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher.
2
Minimum-Error-Rate Classification
Actions are decisions on classes. If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j.
Seek a decision rule that minimizes the probability of error, i.e., the error rate.
3
Introduce the zero-one loss function:
$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$
Therefore, the conditional risk is:
$$R(\alpha_i \mid \mathbf{x}) = \sum_{j \neq i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x})$$
"The risk corresponding to this loss function is the average probability of error."
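As a quick illustration (not from the slides), the following Python sketch computes the conditional risk from a loss matrix and made-up posterior values, and confirms that under the zero-one loss the risk is one minus the posterior:

```python
import numpy as np

# Posteriors P(w_j | x) for c = 3 classes (made-up example values).
posteriors = np.array([0.2, 0.5, 0.3])

# Zero-one loss matrix: lambda(alpha_i | w_j) = 0 if i == j, else 1.
c = len(posteriors)
loss = 1.0 - np.eye(c)

# Conditional risk R(alpha_i | x) = sum_j lambda(alpha_i | w_j) * P(w_j | x).
risks = loss @ posteriors

print(risks)                                      # [0.8 0.5 0.7]
assert np.allclose(risks, 1.0 - posteriors)       # zero-one loss: R = 1 - P(w_i | x)
print(np.argmin(risks) == np.argmax(posteriors))  # min risk picks the max posterior
```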
4
Minimizing the risk requires maximizing P(ωi | x), since R(αi | x) = 1 − P(ωi | x).
For minimum error rate: decide ωi if P(ωi | x) > P(ωj | x) for all j ≠ i.
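A minimal sketch of this rule; the posterior values are assumed for illustration:

```python
import numpy as np

def decide_min_error(posteriors: np.ndarray) -> int:
    """Minimum-error-rate rule: pick the class with the largest posterior."""
    return int(np.argmax(posteriors))

print(decide_min_error(np.array([0.2, 0.5, 0.3])))  # -> 1, i.e. class w_2
```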
5
Decision regions and the zero-one loss function. Let
$$\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$
and decide ω1 if the likelihood ratio exceeds this threshold: p(x | ω1)/p(x | ω2) > θλ.
If λ is the zero-one loss function, which means
$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
then θλ = P(ω2)/P(ω1) = θa; with λ12 = 2 and λ21 = 1 (zero diagonal), θλ = 2P(ω2)/P(ω1) = θb.
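A sketch of the likelihood-ratio test; the Gaussian class-conditional densities and the priors P1, P2 below are assumed example values, not something given on the slide:

```python
import numpy as np
from scipy.stats import norm  # stand-in class-conditional densities

# Assumed two-category model: Gaussian class conditionals and priors.
p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)
P1, P2 = 0.6, 0.4

def decide(x: float, lam=((0, 1), (1, 0))) -> int:
    """Decide w1 (return 1) if p(x|w1)/p(x|w2) > theta_lambda, else w2 (return 2).

    lam[i][j] holds lambda(alpha_{i+1} | w_{j+1}); default is zero-one loss.
    """
    theta = (lam[0][1] - lam[1][1]) / (lam[1][0] - lam[0][0]) * (P2 / P1)
    return 1 if p1.pdf(x) / p2.pdf(x) > theta else 2

print(decide(0.5))                        # zero-one loss -> threshold theta_a = P2/P1
print(decide(0.5, lam=((0, 2), (1, 0))))  # -> threshold theta_b = 2 * P2/P1
```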
6
[Figure: plot of the likelihood ratio p(x | ω1)/p(x | ω2) with the thresholds θa and θb marking the decision regions; image not captured in this transcript.]
7
Classifiers, Discriminant Functions and Decision Surfaces
The multi-category case: a set of discriminant functions gi(x), i = 1, …, c.
The classifier assigns a feature vector x to class ωi if gi(x) > gj(x) for all j ≠ i.
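A minimal sketch of such a classifier; the discriminant functions in `gs` are made-up linear functions, purely for illustration:

```python
import numpy as np

def classify(x: float, discriminants) -> int:
    """Assign x to the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Made-up linear discriminants, for illustration only.
gs = [lambda x: -x, lambda x: x - 1.0, lambda x: 0.4 * x]
print(classify(2.0, gs))  # scores are [-2.0, 1.0, 0.8] -> class index 1
```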
8
[Figure: network structure of a statistical pattern classifier, with d inputs feeding c discriminant functions gi(x) and a maximum selector; image not captured in this transcript.]
9
Let gi(x) = −R(αi | x) (the maximum discriminant then corresponds to the minimum risk!)
For the minimum error rate, we take gi(x) = P(ωi | x) (the maximum discriminant corresponds to the maximum posterior!)
gi(x) ∝ p(x | ωi) P(ωi)
gi(x) = ln p(x | ωi) + ln P(ωi) (ln: natural logarithm!)
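A small sketch using the log form gi(x) = ln p(x | ωi) + ln P(ωi); the Gaussian class-conditional densities and the priors are assumed example values:

```python
import numpy as np
from scipy.stats import norm  # stand-in class-conditional densities

# Hypothetical model: Gaussian p(x | w_i) and priors P(w_i) (example values).
likelihoods = [norm(0.0, 1.0), norm(3.0, 1.5)]
priors = [0.7, 0.3]

def g(x: float, i: int) -> float:
    """Log discriminant g_i(x) = ln p(x | w_i) + ln P(w_i)."""
    return likelihoods[i].logpdf(x) + np.log(priors[i])

x = 1.2
print(max(range(2), key=lambda i: g(x, i)))  # index of the largest g_i(x)
```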
10
The feature space is divided into c decision regions: if gi(x) > gj(x) for all j ≠ i, then x is in Ri (Ri means: assign x to ωi).
The two-category case
A classifier is a "dichotomizer" that has two discriminant functions g1 and g2.
Let g(x) ≡ g1(x) − g2(x).
Decide ω1 if g(x) > 0; otherwise decide ω2.
11
The computation of g(x):
$$g(x) = P(\omega_1 \mid \mathbf{x}) - P(\omega_2 \mid \mathbf{x})$$
$$g(x) = \ln \frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
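A minimal dichotomizer sketch based on the second form of g(x); the densities and priors are again assumed for illustration:

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-category model (example values, not from the slides).
p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)
P1, P2 = 0.5, 0.5

def g(x: float) -> float:
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]; positive favors w1."""
    return (p1.logpdf(x) - p2.logpdf(x)) + np.log(P1 / P2)

x = 0.8
print("decide w1" if g(x) > 0 else "decide w2")
```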
12
[Figure: decision regions of a two-category classifier; image not captured in this transcript.]
13
The Normal Density
Univariate density
- An analytically tractable, continuous density
- A lot of processes are asymptotically Gaussian
- Handwritten characters and speech sounds are ideal or prototype patterns corrupted by a random process (central limit theorem)
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}\right]$$
where: μ = mean (or expected value) of x, and σ² = expected squared deviation, or variance.
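The density is straightforward to code; a minimal sketch:

```python
import numpy as np

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (np.sqrt(2.0 * np.pi) * sigma)

print(normal_pdf(0.0))  # ~0.3989, the peak of the standard normal
```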
14
[Figure: the univariate normal density; image not captured in this transcript.]
15
Multivariate density
The multivariate normal density in d dimensions is:
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{t} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$
where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ is the d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively.
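A direct transcription of the formula into Python; the mean vector and covariance matrix below are made-up example values:

```python
import numpy as np

def multivariate_normal_pdf(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """Multivariate normal density in d dimensions, straight from the formula."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return float(norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff))

# Example values (assumed, for illustration only).
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(multivariate_normal_pdf(np.array([0.5, -0.2]), mu, sigma))
```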