240-650 Principles of Pattern Recognition
Chapter 2: Bayesian Decision Theory
Montri Karnjanadecha
Bayesian Decision Theory
Statistical Approach to Pattern Recognition
A Simple Example
Suppose that we are given two classes w1 and w2, with P(w1) = 0.7 and P(w2) = 0.3.
No measurement is given; we can only guess.
What shall we do to recognize a given input? What is the best we can do statistically? Why?
With no measurement, the best we can do is always decide w1, the class with the larger prior; the probability of error is then P(w2) = 0.3.
A More Complicated Example
Suppose that we are given two classes and a single measurement x.
The posteriors P(w1|x) and P(w2|x) are given graphically.
A Bayesian Example
Suppose that we are given two classes and a single measurement x.
This time we are given the class-conditional densities p(x|w1) and p(x|w2).
A Bayesian Example – cont.
Bayesian Decision Theory
Bayes formula, in the case of two categories:
P(wj|x) = p(x|wj) P(wj) / p(x), where p(x) = p(x|w1) P(w1) + p(x|w2) P(w2).
In English, it can be expressed as: posterior = (likelihood × prior) / evidence.
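As a quick illustration (not from the original slides), here is a minimal Python/NumPy sketch of the two-category Bayes formula; the Gaussian class-conditional densities are made up, and the priors 0.7 and 0.3 are reused from the simple example above.

```python
import numpy as np
from scipy.stats import norm

# Priors for the two classes (from the simple example: P(w1) = 0.7, P(w2) = 0.3)
priors = np.array([0.7, 0.3])

# Hypothetical class-conditional densities p(x|w1) and p(x|w2)
likelihood_fns = [norm(loc=0.0, scale=1.0).pdf, norm(loc=2.0, scale=1.0).pdf]

def posteriors(x):
    """Bayes formula: P(wj|x) = p(x|wj) P(wj) / p(x)."""
    joint = np.array([f(x) for f in likelihood_fns]) * priors  # p(x|wj) P(wj)
    evidence = joint.sum()                                     # p(x), the scale factor
    return joint / evidence

x = 1.2
post = posteriors(x)
print(post, post.sum())                  # posteriors sum to one
print("decide w%d" % (post.argmax() + 1))
```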
Bayesian Decision Theory – cont.
Posterior probability: P(wj|x) is the probability of the state of nature being wj given that feature value x has been measured.
Likelihood: p(x|wj) is the likelihood of wj with respect to x.
Evidence: the evidence factor p(x) can be viewed as a scaling factor that guarantees that the posterior probabilities sum to one.
Bayesian Decision Theory – cont.
Whenever we observe a particular x, the probability of error is
P(error|x) = P(w1|x) if we decide w2, and P(w2|x) if we decide w1.
The average probability of error is given by
P(error) = ∫ P(error|x) p(x) dx.
Bayesian Decision Theory – cont.
Bayes decision rule: decide w1 if P(w1|x) > P(w2|x); otherwise decide w2.
Probability of error: P(error|x) = min[P(w1|x), P(w2|x)].
If we ignore the "evidence", the decision rule becomes: decide w1 if p(x|w1) P(w1) > p(x|w2) P(w2); otherwise decide w2.
Bayesian Decision Theory – Continuous Features
Feature space: in general, an input can be represented by a vector x, a point in a d-dimensional Euclidean space R^d.
Loss function: the loss function states exactly how costly each action is and is used to convert a probability determination into a decision. It is written as λ(ai|wj).
Loss Function
λ(ai|wj) describes the loss incurred for taking action ai when the state of nature is wj.
Conditional Risk
Suppose we observe a particular x and take action ai. If the true state of nature is wj, then by definition we incur the loss λ(ai|wj).
The expected loss of taking action ai is the conditional risk
R(ai|x) = Σ_j λ(ai|wj) P(wj|x).
We can minimize our expected loss by selecting the action that minimizes the conditional risk R(ai|x).
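A minimal sketch (mine, not from the slides) of picking the minimum-conditional-risk action; the 2x2 loss matrix lam[i, j] = λ(ai|wj) and the posteriors are hypothetical values.

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] = loss for taking action a_i when the true class is w_j
lam = np.array([[0.0, 2.0],
                [1.0, 0.0]])

def min_risk_action(posteriors):
    """Conditional risk R(a_i|x) = sum_j lam[i, j] * P(w_j|x); pick the action minimizing it."""
    risks = lam @ posteriors
    return risks.argmin(), risks

post = np.array([0.6, 0.4])          # example posteriors P(w1|x), P(w2|x)
action, risks = min_risk_action(post)
print(risks)                          # R(a1|x), R(a2|x)
print("take action a%d" % (action + 1))
```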
Bayesian Decision Theory
Suppose that there are c categories {w1, w2, ..., wc}.
Conditional risk: R(ai|x) = Σ_j λ(ai|wj) P(wj|x).
Overall risk: the risk is the average expected loss, R = ∫ R(a(x)|x) p(x) dx, where a(x) denotes the decision rule.
Bayesian Decision Theory
Bayes decision rule: for a given x, select the action ai for which the conditional risk R(ai|x) is minimum.
The resulting minimum overall risk is called the Bayes risk, denoted R*; it is the best performance that can be achieved.
Two-Category Classification
Let λij = λ(ai|wj).
Conditional risk:
R(a1|x) = λ11 P(w1|x) + λ12 P(w2|x)
R(a2|x) = λ21 P(w1|x) + λ22 P(w2|x)
Fundamental decision rule: decide w1 if R(a1|x) < R(a2|x).
Two-Category Classification – cont.
The decision rule can be written in several equivalent ways. Decide w1 if one of the following holds:
(λ21 − λ11) P(w1|x) > (λ12 − λ22) P(w2|x)
(λ21 − λ11) p(x|w1) P(w1) > (λ12 − λ22) p(x|w2) P(w2)
Likelihood ratio: p(x|w1) / p(x|w2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(w2) / P(w1)]
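A small sketch (not in the original deck) of the likelihood-ratio form of the rule; the loss matrix, priors, and Gaussian class-conditional densities are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

lam = np.array([[0.0, 2.0],      # hypothetical losses: lam[i, j] = loss(a_i | w_j)
                [1.0, 0.0]])
priors = np.array([0.7, 0.3])
p1, p2 = norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf   # assumed class-conditional densities

# Decision threshold on the likelihood ratio p(x|w1)/p(x|w2)
theta = (lam[0, 1] - lam[1, 1]) / (lam[1, 0] - lam[0, 0]) * priors[1] / priors[0]

def decide(x):
    """Decide w1 if the likelihood ratio exceeds the threshold theta."""
    return "w1" if p1(x) / p2(x) > theta else "w2"

print(theta, decide(0.5), decide(1.8))
```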
Minimum-Error-Rate Classification
A special case of the Bayes decision rule with the following zero-one loss function:
λ(ai|wj) = 0 if i = j, and 1 if i ≠ j.
It assigns no loss to a correct decision and unit loss to any error; all errors are equally costly.
Minimum-Error-Rate Classification
Conditional risk under the zero-one loss:
R(ai|x) = Σ_{j≠i} P(wj|x) = 1 − P(wi|x).
Minimum-Error-Rate Classification
To minimize R(ai|x) we should select the i that maximizes the posterior probability P(wi|x).
For minimum error rate: decide wi if P(wi|x) > P(wj|x) for all j ≠ i.
Classifiers, Discriminant Functions, and Decision Surfaces
There are many ways to represent pattern classifiers. One of the most useful is in terms of a set of discriminant functions gi(x), i = 1, ..., c.
The classifier assigns a feature vector x to class wi if gi(x) > gj(x) for all j ≠ i.
The Multicategory Classifier
Classifiers, Discriminant Functions, and Decision Surfaces
There are many equivalent discriminant functions, i.e., the classification results will be the same even though the functions differ.
For example, if f is a monotonically increasing function, then f(gi(x)) gives the same classification as gi(x).
Classifiers, Discriminant Functions, and Decision Surfaces
Some discriminant functions are easier to understand or to compute than others, e.g.
gi(x) = P(wi|x)
gi(x) = p(x|wi) P(wi)
gi(x) = ln p(x|wi) + ln P(wi)
Decision Regions
The effect of any decision rule is to divide the feature space into c decision regions R1, ..., Rc.
The regions are separated by decision boundaries, surfaces where ties occur among the largest discriminant functions.
Decision Regions – cont.
Two-Category Case (Dichotomizer)
The two-category case is special: instead of two discriminant functions, a single one can be used,
g(x) = g1(x) − g2(x); decide w1 if g(x) > 0, otherwise decide w2.
The Normal Density
Univariate Gaussian density:
p(x) = (1 / (sqrt(2π) σ)) exp(−(x − μ)² / (2σ²))
Mean: μ = E[x]
Variance: σ² = E[(x − μ)²]
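A quick sketch (mine) that evaluates the univariate Gaussian density directly from the formula above and checks it against SciPy; the values of x, μ, and σ are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density: p(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

x, mu, sigma = 1.0, 0.5, 2.0
print(gaussian_pdf(x, mu, sigma), norm(mu, sigma).pdf(x))   # the two values should match
```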
The Normal Density
Central Limit Theorem: the aggregate effect of the sum of a large number of small, independent random disturbances will lead to a Gaussian distribution.
The Gaussian is therefore often a good model for the actual probability distribution.
The Multivariate Normal Density
Multivariate density (in d dimensions):
p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2) (x − μ)^T Σ^(−1) (x − μ))
Abbreviation: p(x) ~ N(μ, Σ)
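A minimal NumPy sketch (not from the slides) that evaluates this density directly and checks it against scipy.stats.multivariate_normal; the mean, covariance, and test point are made-up values.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """p(x) = exp(-0.5*(x-mu)^T Sigma^-1 (x-mu)) / ((2*pi)^(d/2) * |Sigma|^(1/2))."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff               # squared Mahalanobis distance
    norm_const = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 0.0])
print(mvn_pdf(x, mu, Sigma), multivariate_normal(mu, Sigma).pdf(x))   # should match
```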
The Multivariate Normal Density
Mean: μ = E[x]
Covariance matrix: Σ = E[(x − μ)(x − μ)^T]
The ijth component of Σ is σij = E[(xi − μi)(xj − μj)].
Statistical Independence
If xi and xj are statistically independent, then σij = 0.
The covariance matrix then becomes a diagonal matrix in which all off-diagonal elements are zero.
Whitening Transform
Aw = Φ Λ^(−1/2), where Φ is the matrix whose columns are the orthonormal eigenvectors of Σ and Λ is the diagonal matrix of the corresponding eigenvalues.
Applying Aw transforms the distribution so that its covariance matrix becomes the identity.
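A short sketch (mine, under the conventions above) of building the whitening transform from the eigendecomposition of a sample covariance matrix; the matrix Sigma is a hypothetical example.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Eigendecomposition of the symmetric covariance: columns of Phi are orthonormal eigenvectors
eigvals, Phi = np.linalg.eigh(Sigma)
Lam_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals))

A_w = Phi @ Lam_inv_sqrt              # whitening transform A_w = Phi * Lambda^(-1/2)

# Check: the whitened covariance A_w^T Sigma A_w should be the identity matrix
print(np.round(A_w.T @ Sigma @ A_w, 6))
```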
Squared Mahalanobis Distance from x to μ
r² = (x − μ)^T Σ^(−1) (x − μ)
Contours of constant density are hyperellipsoids of constant Mahalanobis distance.
The principal axes of the hyperellipsoids are given by the eigenvectors of Σ; the lengths of the axes are determined by the eigenvalues of Σ.
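A tiny NumPy sketch (my illustration) of the squared Mahalanobis distance, using the same hypothetical μ and Σ as in the earlier density example.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^-1 (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mahalanobis_sq(np.array([1.0, 0.0]), mu, Sigma))
```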
Discriminant Functions for the Normal Density
Minimum-error-rate classification can be achieved with the discriminant functions gi(x) = ln p(x|wi) + ln P(wi).
If the densities are multivariate normal, i.e., if p(x|wi) ~ N(μi, Σi), then we have:
gi(x) = −(1/2)(x − μi)^T Σi^(−1)(x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(wi)
Discriminant Functions for the Normal Density
Case 1: Σi = σ²I. The features are statistically independent and each feature has the same variance σ².
gi(x) = −||x − μi||² / (2σ²) + ln P(wi),
where || . || denotes the Euclidean norm.
Linear Discriminant Function
It is not necessary to compute the distances explicitly. Expanding the quadratic form yields
gi(x) = −(1/(2σ²)) (x^T x − 2 μi^T x + μi^T μi) + ln P(wi).
The term x^T x is the same for all i and can be dropped, so we have the following linear discriminant function:
gi(x) = wi^T x + wi0
Linear Discriminant Function
where wi = μi / σ² and wi0 = −μi^T μi / (2σ²) + ln P(wi).
wi0 is called the threshold or bias for the ith category.
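A compact sketch (my illustration, not the course's code) of the Case 1 linear discriminant gi(x) = wi^T x + wi0 with wi = μi/σ² and wi0 = −μi^T μi/(2σ²) + ln P(wi); the class means, variance, and priors are made up.

```python
import numpy as np

def case1_linear_classifier(means, sigma2, priors):
    """Build w_i and w_i0 for Sigma_i = sigma^2 * I; classify with g_i(x) = w_i^T x + w_i0."""
    means = np.asarray(means, dtype=float)
    W = means / sigma2                                            # w_i = mu_i / sigma^2
    w0 = -np.sum(means ** 2, axis=1) / (2.0 * sigma2) + np.log(priors)
    return lambda x: int(np.argmax(W @ x + w0))                   # index of the winning class

means = [[0.0, 0.0], [3.0, 3.0]]        # hypothetical class means
clf = case1_linear_classifier(means, sigma2=1.0, priors=np.array([0.7, 0.3]))
print(clf(np.array([0.5, 0.2])), clf(np.array([2.5, 2.8])))   # expect class 0, then class 1
```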
Linear Machine
A classifier that uses linear discriminant functions is called a linear machine.
Its decision surfaces are pieces of hyperplanes defined by the linear equations gi(x) = gj(x) for the two categories with the highest posterior probabilities.
For our case this equation can be written as w^T (x − x0) = 0.
Linear Machine
where w = μi − μj and
x0 = (1/2)(μi + μj) − (σ² / ||μi − μj||²) ln[P(wi)/P(wj)] (μi − μj).
If P(wi) = P(wj), then the second term vanishes and x0 lies halfway between the means; the classifier is then called a minimum-distance classifier.
Priors change → decision boundaries shift
Case 2: Σi = Σ
The covariance matrices for all of the classes are identical but otherwise arbitrary.
The cluster for the ith class is centered about μi.
Discriminant function: gi(x) = −(1/2)(x − μi)^T Σ^(−1)(x − μi) + ln P(wi).
The ln P(wi) term can be ignored if the prior probabilities are the same for all classes.
Case 2: Discriminant Function
Expanding the quadratic form and dropping the terms that do not depend on i gives the linear form
gi(x) = wi^T x + wi0, where wi = Σ^(−1) μi and wi0 = −(1/2) μi^T Σ^(−1) μi + ln P(wi).
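A minimal sketch (mine) of the Case 2 linear discriminant with a shared covariance matrix, wi = Σ^(−1) μi and wi0 = −(1/2) μi^T Σ^(−1) μi + ln P(wi); the means, covariance, and priors are hypothetical.

```python
import numpy as np

def case2_linear_classifier(means, Sigma, priors):
    """Linear discriminants for identical covariance matrices: g_i(x) = w_i^T x + w_i0."""
    means = np.asarray(means, dtype=float)
    Sigma_inv = np.linalg.inv(Sigma)
    W = means @ Sigma_inv                                          # row i is w_i^T = mu_i^T Sigma^-1
    w0 = -0.5 * np.einsum('ij,ij->i', W, means) + np.log(priors)   # -1/2 mu_i^T Sigma^-1 mu_i + ln P(w_i)
    return lambda x: int(np.argmax(W @ x + w0))

means = [[0.0, 0.0], [2.0, 2.0]]
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
clf = case2_linear_classifier(means, Sigma, priors=np.array([0.5, 0.5]))
print(clf(np.array([0.2, 0.1])), clf(np.array([1.9, 2.2])))   # expect class 0, then class 1
```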
For the 2-category case
If Ri and Rj are contiguous, the boundary between them has the equation w^T (x − x0) = 0, where
w = Σ^(−1)(μi − μj) and
x0 = (1/2)(μi + μj) − [ln(P(wi)/P(wj)) / ((μi − μj)^T Σ^(−1)(μi − μj))] (μi − μj).
Case 3: Σi = arbitrary
In general, the covariance matrices are different for each category.
The only term that can be dropped from gi(x) is the (d/2) ln 2π term.
Case 3: Σi = arbitrary
The discriminant functions are quadratic:
gi(x) = x^T Wi x + wi^T x + wi0,
where Wi = −(1/2) Σi^(−1), wi = Σi^(−1) μi, and
wi0 = −(1/2) μi^T Σi^(−1) μi − (1/2) ln|Σi| + ln P(wi).
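A short sketch (my illustration) of the Case 3 quadratic discriminant gi(x) = x^T Wi x + wi^T x + wi0 for arbitrary per-class covariances; the class means, covariances, and priors are made-up values.

```python
import numpy as np

def case3_quadratic_classifier(means, covs, priors):
    """Quadratic discriminants for arbitrary covariance matrices (one Sigma_i per class)."""
    params = []
    for mu, Sigma, prior in zip(means, covs, priors):
        mu = np.asarray(mu, dtype=float)
        Sigma_inv = np.linalg.inv(Sigma)
        Wi = -0.5 * Sigma_inv                                  # W_i = -1/2 Sigma_i^-1
        wi = Sigma_inv @ mu                                    # w_i = Sigma_i^-1 mu_i
        wi0 = (-0.5 * mu @ Sigma_inv @ mu
               - 0.5 * np.log(np.linalg.det(Sigma))
               + np.log(prior))                                # w_i0
        params.append((Wi, wi, wi0))

    def classify(x):
        scores = [x @ Wi @ x + wi @ x + wi0 for Wi, wi, wi0 in params]
        return int(np.argmax(scores))
    return classify

means = [[0.0, 0.0], [2.0, 2.0]]
covs = [np.eye(2), np.array([[2.0, 0.4], [0.4, 0.5]])]
clf = case3_quadratic_classifier(means, covs, priors=[0.5, 0.5])
print(clf(np.array([0.1, -0.2])), clf(np.array([2.2, 1.9])))   # expect class 0, then class 1
```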
Two-category case
The decision surfaces are hyperquadrics (hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, ...).
Example