Principles of Pattern Recognition

Presentation on theme: "Principles of Pattern Recognition"— Presentation transcript:

1 240-650 Principles of Pattern Recognition
Chapter 2: Bayesian Decision Theory
Montri Karnjanadecha

2 Bayesian Decision Theory
Chapter 2: Bayesian Decision Theory

3 Statistical Approach to Pattern Recognition

4 A Simple Example
Suppose that we are given two classes, w1 and w2, with P(w1) = 0.7 and P(w2) = 0.3, and no measurement is given: we can only guess. What should we do to recognize a given input, and what is the best we can do statistically? With only the priors available, the best rule is to always decide w1, and the resulting probability of error is P(w2) = 0.3.

5 A More Complicated Example
Suppose that we are given two classes and a single measurement x. The posteriors P(w1|x) and P(w2|x) are given graphically.

6 A Bayesian Example
Suppose that we are given two classes and a single measurement x. This time we are given the class-conditional densities p(x|w1) and p(x|w2).

7 A Bayesian Example – cont.

8 Bayesian Decision Theory
Bayes formula relates the posterior to the likelihood, the prior, and the evidence. In the case of two categories, it can be expressed in English as: posterior = (likelihood x prior) / evidence.
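
The equation itself did not survive the transcript; in standard notation (omega for w) the form the text describes is:

P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)},
\qquad p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\, P(\omega_j)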

9 Bayesian Decision Theory – cont.
Posterior probability P(wj|x): the probability of the state of nature being wj given that feature value x has been measured. Likelihood p(x|wj): the likelihood of wj with respect to x. Evidence p(x): a scaling factor that guarantees that the posterior probabilities sum to one.

10 Bayesian Decision Theory – cont.
Whenever we observe a particular x, the probability of error is P(error|x). The average probability of error is obtained by averaging P(error|x) over all possible observations x.
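
Written out, the standard two-category forms of these expressions are:

P(\mathrm{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1
\end{cases}
\qquad
P(\mathrm{error}) = \int_{-\infty}^{\infty} P(\mathrm{error} \mid x)\, p(x)\, dx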

11 Bayesian Decision Theory – cont.
Bayes decision rule: decide w1 if P(w1|x) > P(w2|x); otherwise decide w2. The probability of error is then P(error|x) = min[P(w1|x), P(w2|x)]. Since the evidence p(x) is the same for both classes, it can be dropped from the comparison, and the rule becomes: decide w1 if p(x|w1) P(w1) > p(x|w2) P(w2); otherwise decide w2.
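
A minimal Python sketch of this decision rule; the Gaussian class-conditional densities and the priors below are illustrative assumptions, not values taken from the slides.

import math

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density p(x | w)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (math.sqrt(2 * math.pi) * std)

# Illustrative priors and class-conditional densities (assumed for the example)
prior = {"w1": 0.7, "w2": 0.3}
likelihood = {"w1": lambda x: gaussian_pdf(x, mean=0.0, std=1.0),
              "w2": lambda x: gaussian_pdf(x, mean=2.0, std=1.0)}

def decide(x):
    """Decide w1 if p(x|w1)P(w1) > p(x|w2)P(w2); otherwise decide w2."""
    if likelihood["w1"](x) * prior["w1"] > likelihood["w2"](x) * prior["w2"]:
        return "w1"
    return "w2"

print(decide(0.5))   # -> w1
print(decide(3.0))   # -> w2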

12 Bayesian Decision Theory – Continuous Features
Feature space: in general, an input can be represented by a feature vector x, a point in a d-dimensional Euclidean space Rd. Loss function: the loss function states exactly how costly each action is, and it is used to convert a probability determination into a decision. It is written as l(ai|wj).

13 Loss Function
The loss function l(ai|wj) describes the loss incurred for taking action ai when the state of nature is wj.

14 Conditional Risk
Suppose we observe a particular x and take action ai. If the true state of nature is wj, then by definition we incur the loss l(ai|wj). We can minimize our expected loss by selecting the action that minimizes the conditional risk R(ai|x).
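
The defining sum, in its standard form:

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)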

15 Bayesian Decision Theory
Suppose that there are c categories {w1, w2, ..., wc}. The conditional risk of action ai is R(ai|x), and the overall risk R is the expected loss associated with the decision rule, i.e., the conditional risk averaged over all x.
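
In symbols, the overall risk averages the conditional risk over the feature space:

R = \int R\bigl(\alpha(\mathbf{x}) \mid \mathbf{x}\bigr)\, p(\mathbf{x})\, d\mathbf{x}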

16 Bayesian Decision Theory
Bayes decision rule: for a given x, select the action ai for which the conditional risk R(ai|x) is minimum. The resulting minimum overall risk is called the Bayes risk, denoted R*; it is the best performance that can be achieved.

17 Two-Category Classification
Let lij = l(ai|wj). The conditional risks R(a1|x) and R(a2|x) are given below. Fundamental decision rule: decide w1 if R(a1|x) < R(a2|x).
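
The two conditional risks, written out in this notation:

R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)
R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)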

18 Two-Category Classification – cont.
The decision rule can be written in several equivalent ways: decide w1 if any one of the following conditions holds. The last form is the likelihood ratio test.
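
The conditions themselves are not in the transcript; the standard equivalent forms (assuming the loss for an error exceeds the loss for a correct decision, so that lambda_21 > lambda_11) are: decide omega_1 if

(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x)
(\lambda_{21} - \lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)
\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}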

19 Minimum-Error-Rate Classification
This is a special case of the Bayes decision rule with the zero-one loss function: it assigns no loss to a correct decision and unit loss to any error, so all errors are equally costly.
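
The zero-one loss function referred to:

\lambda(\alpha_i \mid \omega_j) =
\begin{cases}
0 & i = j \\
1 & i \neq j
\end{cases}
\qquad i, j = 1, \ldots, c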

20 Minimum-Error-Rate Classification
With the zero-one loss, the conditional risk of deciding wi reduces to one minus the posterior probability of wi.
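
Under this loss the conditional risk simplifies as follows:

R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)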

21 Minimum-Error-Rate Classification
We should therefore select the i that maximizes the posterior probability. For minimum error rate: decide wi if P(wi|x) > P(wj|x) for all j ≠ i.

22 Minimum-Error-Rate Classification

23 Classifiers, Discriminant Functions, and Decision Surfaces
There are many ways to represent pattern classifiers. One of the most useful is in terms of a set of discriminant functions gi(x), i = 1, ..., c. The classifier assigns a feature vector x to class wi if gi(x) > gj(x) for all j ≠ i.
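
A minimal Python sketch of a discriminant-function classifier: it simply takes the argmax over the gi. The particular discriminant functions used here (log-likelihood plus log-prior for assumed univariate Gaussians) are illustrative, not taken from the slides.

import math

def log_gaussian(x, mean, std):
    """ln p(x | wi) for a univariate Gaussian class-conditional density."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(math.sqrt(2 * math.pi) * std)

# One discriminant function gi(x) = ln p(x|wi) + ln P(wi) per class (assumed parameters)
g = [
    lambda x: log_gaussian(x, 0.0, 1.0) + math.log(0.5),   # class w1
    lambda x: log_gaussian(x, 2.0, 1.0) + math.log(0.3),   # class w2
    lambda x: log_gaussian(x, 4.0, 2.0) + math.log(0.2),   # class w3
]

def classify(x):
    """Assign x to class wi if gi(x) > gj(x) for all j != i (argmax)."""
    scores = [gi(x) for gi in g]
    return scores.index(max(scores)) + 1   # 1-based class index

print(classify(0.2))   # -> 1
print(classify(5.0))   # -> 3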

24 The Multicategory Classifier

25 Classifiers, Discriminant Functions, and Decision Surfaces
There are many equivalent sets of discriminant functions, i.e., the classification results will be the same even though the functions themselves differ. For example, if f is a monotonically increasing function, then f(gi(x)) can be used in place of gi(x) without changing the decision.

26 Classifiers, Discriminant Functions, and Decision Surfaces
Some choices of discriminant functions are easier to understand or to compute than others.
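
For minimum-error-rate classification the usual equivalent choices are:

g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{\sum_{j} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)}
g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i)
g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)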

27 Decision Regions
The effect of any decision rule is to divide the feature space into c decision regions R1, ..., Rc. The regions are separated by decision boundaries, surfaces where ties occur among the largest discriminant functions.

28 Decision Regions – cont.

29 Two-Category Case (Dichotomizer)
The two-category case is a special case: instead of two discriminant functions, a single one can be used, g(x) = g1(x) - g2(x), with the rule: decide w1 if g(x) > 0, otherwise decide w2.

30 The Normal Density
The univariate Gaussian density is completely specified by two parameters: its mean and its variance.
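
The density and the moment definitions that the slide displayed, in standard form:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}\right]
\mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx,
\qquad \sigma^{2} = E\bigl[(x - \mu)^{2}\bigr] = \int_{-\infty}^{\infty} (x - \mu)^{2}\, p(x)\, dx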

31 The Normal Density

32 The Normal Density
Central limit theorem: the aggregate effect of the sum of a large number of small, independent random disturbances leads to a Gaussian distribution. For this reason the Gaussian is often a good model for the actual probability distribution.

33 The Multivariate Normal Density
The multivariate normal density in d dimensions is commonly abbreviated p(x) ~ N(m, S), where m is the mean vector and S the covariance matrix.
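
Written out in standard notation (mu for m, Sigma for S):

p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]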

34 The Multivariate Normal Density
The mean vector is m = E[x] and the covariance matrix is S = E[(x - m)(x - m)^T]. The ij-th component of S is sij = E[(xi - mi)(xj - mj)].

35 Statistical Independence
If xi and xj are statistically independent, then sij = 0. When this holds for all pairs of components, the covariance matrix becomes a diagonal matrix whose off-diagonal elements are all zero.

36 Whitening Transform
Let Phi be the matrix whose columns are the orthonormal eigenvectors of S, and let Lambda be the diagonal matrix of the corresponding eigenvalues. The whitening transform Aw = Phi Lambda^(-1/2) converts the distribution into one whose covariance matrix is the identity.
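
A minimal numpy sketch of the transform; the data and the variable names are illustrative assumptions.

import numpy as np

# Illustrative data: samples from an assumed 2-D Gaussian (not from the slides)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[4.0, 1.5], [1.5, 1.0]], size=1000)

S = np.cov(X, rowvar=False)              # sample covariance matrix S
lam, phi = np.linalg.eigh(S)             # eigenvalues (Lambda) and eigenvectors (Phi)
A_w = phi @ np.diag(lam ** -0.5)         # whitening matrix Aw = Phi Lambda^(-1/2)

X_white = X @ A_w                        # y = Aw^T x applied to each sample (row)
print(np.cov(X_white, rowvar=False))     # approximately the identity matrix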

37 Whitening Transform

38 Squared Mahalanobis Distance from x to m
Contours of constant density are hyperellipsoids of constant squared Mahalanobis distance from x to m. The principal axes of these hyperellipsoids are given by the eigenvectors of S, and the lengths of the axes are determined by its eigenvalues.
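
The quantity named in the title, written out:

r^{2} = (\mathbf{x} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})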

39 Discriminant Functions for the Normal Density
If the densities are multivariate normal, i.e., if p(x|wi) ~ N(mi, Si), then the discriminant functions gi(x) = ln p(x|wi) + ln P(wi) take the form below; in the simplest case they reduce to a minimum-distance classifier.
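
The resulting discriminant function, in standard form:

g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^{T} \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \tfrac{d}{2} \ln 2\pi - \tfrac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)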

40 Discriminant Functions for the Normal Density
Case 1: the features are statistically independent and each feature has the same variance s^2. The discriminant function then depends on x only through the squared Euclidean distance ||x - mi||^2, where ||.|| denotes the Euclidean norm.
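
In this case the general expression reduces to:

g_i(\mathbf{x}) = -\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^{2}}{2\sigma^{2}} + \ln P(\omega_i)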

41 Case 1: Si = s^2 I

42 Linear Discriminant Function
It is not actually necessary to compute the distances. Expanding the quadratic form ||x - mi||^2 = x^T x - 2 mi^T x + mi^T mi shows that the term x^T x is the same for all i and can be dropped, which leaves the following linear discriminant function.

43 Linear Discriminant Function
The weight vector wi and the term wi0, called the threshold or bias for the ith category, are given below.
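
Written out, the standard result for Case 1 is:

g_i(\mathbf{x}) = \mathbf{w}_i^{T}\mathbf{x} + w_{i0},
\qquad \mathbf{w}_i = \frac{\boldsymbol{\mu}_i}{\sigma^{2}},
\qquad w_{i0} = -\frac{\boldsymbol{\mu}_i^{T}\boldsymbol{\mu}_i}{2\sigma^{2}} + \ln P(\omega_i)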

44 Linear Machine
A classifier that uses linear discriminant functions is called a linear machine. Its decision surfaces are pieces of hyperplanes defined by the linear equations gi(x) = gj(x) for the two categories with the highest posterior probabilities. For our case this equation can be written as shown below.

45 Linear Machine
The hyperplane is orthogonal to the vector w = mi - mj and passes through the point x0 given below. If P(wi) = P(wj), the second term of x0 vanishes, x0 lies halfway between the means, and the classifier is called a minimum-distance classifier.
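
The boundary equation referred to above, in standard form for this case:

\mathbf{w}^{T}(\mathbf{x} - \mathbf{x}_0) = 0,
\qquad \mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j,
\qquad \mathbf{x}_0 = \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^{2}}{\lVert \boldsymbol{\mu}_i - \boldsymbol{\mu}_j \rVert^{2}} \ln\frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)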

46–48 Priors change -> decision boundaries shift

49 Case 2: Si = S
The covariance matrices for all of the classes are identical but otherwise arbitrary. The samples for the ith class fall in a cluster centered about mi. The discriminant function is given below; the ln P(wi) term can be ignored if the prior probabilities are the same for all classes.
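
The discriminant function for this case, in standard form:

g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i)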

50 Case 2: Discriminant function
Expanding the quadratic form and dropping the terms that are the same for all classes again yields a linear discriminant function, with wi and wi0 as given below.
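
The linear form and its coefficients (standard result for Case 2):

g_i(\mathbf{x}) = \mathbf{w}_i^{T}\mathbf{x} + w_{i0},
\qquad \mathbf{w}_i = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i,
\qquad w_{i0} = -\tfrac{1}{2}\boldsymbol{\mu}_i^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i + \ln P(\omega_i)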

51 For the 2-category case
If Ri and Rj are contiguous, the boundary between them is a hyperplane with the equation given below.
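
The boundary equation, in standard form for Case 2:

\mathbf{w}^{T}(\mathbf{x} - \mathbf{x}_0) = 0,
\qquad \mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j),
\qquad \mathbf{x}_0 = \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln\bigl[P(\omega_i)/P(\omega_j)\bigr]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)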

54 Case 3: Si = arbitrary
In the general case the covariance matrices are different for each category. The only term that can be dropped from gi(x) is the (d/2) ln 2π term.

55 Case 3: Si = arbitrary
The discriminant functions are now quadratic in x, with Wi, wi, and wi0 as given below.
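
The quadratic discriminant, in standard form:

g_i(\mathbf{x}) = \mathbf{x}^{T}\mathbf{W}_i\,\mathbf{x} + \mathbf{w}_i^{T}\mathbf{x} + w_{i0}
\mathbf{W}_i = -\tfrac{1}{2}\boldsymbol{\Sigma}_i^{-1},
\qquad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i,
\qquad w_{i0} = -\tfrac{1}{2}\boldsymbol{\mu}_i^{T}\boldsymbol{\Sigma}_i^{-1}\boldsymbol{\mu}_i - \tfrac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)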

56 Two-Category Case
The decision surfaces are hyperquadrics (hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, ...).

61 Example

