Principles of Pattern Recognition

240-650 Principles of Pattern Recognition
Montri Karnjanadecha
montri@coe.psu.ac.th
http://fivedots.coe.psu.ac.th/~montri

Chapter 2: Bayesian Decision Theory

Statistical Approach to Pattern Recognition

A Simple Example
- Suppose that we are given two classes w1 and w2 with P(w1) = 0.7 and P(w2) = 0.3.
- No measurement is given; all we can do is guess.
- What should we do to recognize a given input? What is the best we can do statistically, and why?
- With no measurement, the best rule is to always decide w1; the probability of error is then P(w2) = 0.3.

A More Complicated Example
- Suppose that we are given two classes and a single measurement x.
- P(w1|x) and P(w2|x) are given graphically.

A Bayesian Example
- Suppose that we are given two classes and a single measurement x.
- This time we are given p(x|w1) and p(x|w2).

A Bayesian Example – cont. (figure)

Bayesian Decision Theory
- Bayes formula: P(wj|x) = p(x|wj) P(wj) / p(x)
- In the case of two categories: p(x) = p(x|w1) P(w1) + p(x|w2) P(w2)
- In English, it can be expressed as: posterior = (likelihood x prior) / evidence

Bayesian Decision Theory – cont.
- Posterior P(wj|x): the probability of the state of nature being wj given that feature value x has been measured.
- Likelihood p(x|wj): the likelihood of wj with respect to x.
- Evidence p(x): a scaling factor that guarantees that the posterior probabilities sum to one.
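A minimal numerical sketch of Bayes formula. The Gaussian class-conditional densities and their parameters below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Univariate normal density, used here as an assumed class-conditional density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

priors = np.array([0.7, 0.3])                      # P(w1), P(w2) from the earlier example
x = 1.2                                            # a measured feature value
likelihoods = np.array([gauss_pdf(x, 0.0, 1.0),    # p(x|w1), assumed N(0, 1)
                        gauss_pdf(x, 2.0, 1.0)])   # p(x|w2), assumed N(2, 1)

joint = likelihoods * priors                       # p(x|wj) * P(wj)
evidence = joint.sum()                             # p(x), the scaling factor
posteriors = joint / evidence                      # P(wj|x), guaranteed to sum to one
print(posteriors, posteriors.sum())
```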

Bayesian Decision Theory – cont.
- Whenever we observe a particular x, the probability of error is P(error|x) = P(w1|x) if we decide w2, and P(w2|x) if we decide w1.
- The average probability of error is given by P(error) = integral of P(error|x) p(x) dx.

Bayesian Decision Theory – cont.
- Bayes decision rule: decide w1 if P(w1|x) > P(w2|x); otherwise decide w2.
- Probability of error: P(error|x) = min[P(w1|x), P(w2|x)]
- If we ignore the "evidence", the decision rule becomes: decide w1 if p(x|w1) P(w1) > p(x|w2) P(w2); otherwise decide w2.

Bayesian Decision Theory – Continuous Features
- Feature space: in general, an input can be represented by a vector x, a point in a d-dimensional Euclidean space R^d.
- Loss function: states exactly how costly each action is, and is used to convert a probability determination into a decision. Written as l(ai|wj).

Loss Function
- l(ai|wj) describes the loss incurred for taking action ai when the state of nature is wj.

Conditional Risk
- Suppose we observe a particular x and take action ai.
- If the true state of nature is wj, by definition we incur the loss l(ai|wj).
- We can minimize our expected loss by selecting the action that minimizes the conditional risk, R(ai|x).

Bayesian Decision Theory
- Suppose that there are c categories {w1, w2, ..., wc}.
- Conditional risk: R(ai|x) = sum over j = 1, ..., c of l(ai|wj) P(wj|x)
- The overall risk R = integral of R(a(x)|x) p(x) dx is the average expected loss.

Bayesian Decision Theory
- Bayes decision rule: for a given x, select the action ai for which the conditional risk R(ai|x) is minimum.
- The resulting minimum overall risk is called the Bayes risk, denoted R*, which is the best performance that can be achieved.
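To make the rule concrete, here is a small sketch of minimum-risk classification. The loss matrix and posterior values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical loss matrix: loss[i, j] = l(ai|wj); rows are actions, columns are states
loss = np.array([[0.0, 2.0],
                 [1.0, 0.0]])

def bayes_action(posteriors, loss):
    """Select the action minimizing R(ai|x) = sum_j l(ai|wj) P(wj|x)."""
    risks = loss @ posteriors          # conditional risk of each action
    return int(np.argmin(risks)), risks

action, risks = bayes_action(np.array([0.6, 0.4]), loss)
print(action, risks)                   # risks are [0.8, 0.6] -> action 1 is chosen
```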

Two-Category Classification
- Let lij = l(ai|wj).
- Conditional risk:
  R(a1|x) = l11 P(w1|x) + l12 P(w2|x)
  R(a2|x) = l21 P(w1|x) + l22 P(w2|x)
- Fundamental decision rule: decide w1 if R(a1|x) < R(a2|x).

Two-Category Classification – cont.
- The decision rule can be written in several equivalent ways. Decide w1 if any one of the following is true:
  (l21 - l11) P(w1|x) > (l12 - l22) P(w2|x)
  (l21 - l11) p(x|w1) P(w1) > (l12 - l22) p(x|w2) P(w2)
  Likelihood ratio: p(x|w1) / p(x|w2) > [(l12 - l22) P(w2)] / [(l21 - l11) P(w1)]
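A sketch of the likelihood-ratio form of the rule. The densities, priors, and losses are the same kind of assumed illustrative values as before:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Assumed class-conditional density for this sketch
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

P1, P2 = 0.7, 0.3                                 # assumed priors
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0           # hypothetical losses

x = 1.0
ratio = gauss_pdf(x, 0.0, 1.0) / gauss_pdf(x, 2.0, 1.0)   # p(x|w1) / p(x|w2)
threshold = ((l12 - l22) * P2) / ((l21 - l11) * P1)       # constant, independent of x
print("decide w1" if ratio > threshold else "decide w2")
```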

Minimum-Error-Rate Classification
- A special case of the Bayes decision rule with the zero-one loss function: l(ai|wj) = 0 if i = j, and 1 if i != j.
- Assigns no loss to a correct decision and unit loss to any error.
- All errors are equally costly.

Minimum-Error-Rate Classification
- Conditional risk: R(ai|x) = sum over j != i of P(wj|x) = 1 - P(wi|x)

Minimum-Error-Rate Classification
- We should select the i that maximizes the posterior probability P(wi|x).
- For minimum error rate: decide wi if P(wi|x) > P(wj|x) for all j != i.

Minimum-Error-Rate Classification (figure)

Classifiers, Discriminant Functions, and Decision Surfaces
- There are many ways to represent pattern classifiers. One of the most useful is in terms of a set of discriminant functions gi(x), i = 1, ..., c.
- The classifier assigns a feature vector x to class wi if gi(x) > gj(x) for all j != i.
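A minimal sketch of a discriminant-based classifier. The three discriminant functions are hypothetical (negative squared distances to assumed class centers):

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class whose discriminant gi(x) is largest."""
    scores = np.array([g(x) for g in discriminants])
    return int(np.argmax(scores))

# Hypothetical discriminants: negative squared distance to each assumed class center
centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
gs = [lambda x, c=c: -np.sum((x - c) ** 2) for c in centers]   # c=c freezes each center
print(classify(np.array([0.9, 1.1]), gs))                      # -> 1, the nearest center
```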

The Multicategory Classifier (figure: a network whose inputs x1, ..., xd feed the c discriminant functions, with the maximum selecting the decision)

Classifiers, Discriminant Functions, and Decision Surfaces
- Many discriminant functions are equivalent, i.e., the classification results will be the same even though they are different functions.
- For example, if f is a monotonically increasing function, then f(gi(x)) and gi(x) yield exactly the same classification.

Classifiers, Discriminant Functions, and Decision Surfaces
- Some discriminant functions are easier to understand or to compute than others, e.g.:
  gi(x) = P(wi|x)
  gi(x) = p(x|wi) P(wi)
  gi(x) = ln p(x|wi) + ln P(wi)

Decision Regions
- The effect of any decision rule is to divide the feature space into c decision regions R1, ..., Rc.
- The regions are separated by decision boundaries, surfaces where ties occur among the largest discriminant functions.

Decision Regions – cont. (figure)

Two-Category Case (Dichotomizer)
- The two-category case is a special case: instead of two discriminant functions, a single one can be used:
  g(x) = g1(x) - g2(x); decide w1 if g(x) > 0, otherwise decide w2.

The Normal Density
- Univariate Gaussian density: p(x) = (1 / (sqrt(2π) σ)) exp[-(1/2)((x - μ)/σ)²]
- Mean: μ = E[x]
- Variance: σ² = E[(x - μ)²]
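A quick numerical check of these definitions; the grid integration below is a sketch (a simple Riemann sum over a wide interval), with μ and σ chosen arbitrarily:

```python
import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

print(np.sum(p) * dx)                     # ~1.0: the density integrates to one
print(np.sum(x * p) * dx)                 # ~mu: the mean E[x]
print(np.sum((x - mu) ** 2 * p) * dx)     # ~sigma^2: the variance E[(x - mu)^2]
```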

The Normal Density (figure: the univariate Gaussian; roughly 95% of the probability mass lies within |x - μ| <= 2σ)

The Normal Density
- Central Limit Theorem: the aggregate effect of the sum of a large number of small, independent random disturbances will lead to a Gaussian distribution.
- The Gaussian is therefore often a good model for the actual probability distribution.

The Multivariate Normal Density
- Multivariate density (in d dimensions): p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[-(1/2)(x - μ)^T Σ^(-1) (x - μ)]
- Abbreviation: p(x) ~ N(μ, Σ)

The Multivariate Normal Density
- Mean: μ = E[x]
- Covariance matrix: Σ = E[(x - μ)(x - μ)^T]
- The ijth component of Σ: σij = E[(xi - μi)(xj - μj)]
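A minimal implementation of the multivariate density; the test point, mean, and covariance are arbitrary values for illustration:

```python
import numpy as np

def multivariate_normal_pdf(x, mu, Sigma):
    """p(x) = exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})"""
    d = mu.size
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)       # squared Mahalanobis distance
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(multivariate_normal_pdf(np.array([0.5, -0.5]), mu, Sigma))
```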

Statistical Independence
- If xi and xj are statistically independent, then σij = 0.
- The covariance matrix then becomes a diagonal matrix in which all off-diagonal elements are zero.

Whitening Transform
- Aw = Φ Λ^(-1/2), where Λ is the diagonal matrix of the corresponding eigenvalues of Σ, and Φ is the matrix whose columns are the orthonormal eigenvectors of Σ.
- Applying Aw transforms the data so that the covariance becomes the identity matrix.

Whitening Transform (figure)
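A sketch of the whitening transform via eigendecomposition; numpy's eigh returns orthonormal eigenvectors for a symmetric matrix, and the covariance below is an assumed example:

```python
import numpy as np

def whitening_transform(Sigma):
    """Aw = Phi Lambda^{-1/2}, so that Aw^T Sigma Aw = I."""
    eigvals, Phi = np.linalg.eigh(Sigma)     # Lambda diagonal entries; Phi columns = eigenvectors
    return Phi @ np.diag(eigvals ** -0.5)

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Aw = whitening_transform(Sigma)
print(np.round(Aw.T @ Sigma @ Aw, 6))        # ~identity: the transformed covariance is whitened
```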

Squared Mahalanobis Distance
- The squared Mahalanobis distance from x to μ: r² = (x - μ)^T Σ^(-1) (x - μ)
- Contours of constant density are hyperellipsoids of constant Mahalanobis distance.
- The principal axes of the hyperellipsoids are given by the eigenvectors of Σ; the lengths of the axes are determined by the eigenvalues of Σ.
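The distance itself in code, reusing the same assumed covariance as in the sketches above:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """r^2 = (x - mu)^T Sigma^{-1} (x - mu)"""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(mahalanobis_sq(np.array([1.0, 1.0]), np.zeros(2), Sigma))
```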

Discriminant Functions for the Normal Density
- Recall the minimum-error-rate discriminant gi(x) = ln p(x|wi) + ln P(wi).
- If the densities are multivariate normal, i.e., p(x|wi) ~ N(μi, Σi), then we have:
  gi(x) = -(1/2)(x - μi)^T Σi^(-1) (x - μi) - (d/2) ln 2π - (1/2) ln|Σi| + ln P(wi)
- In important special cases this reduces to a minimum-distance classifier (see Case 1 below).

Discriminant Functions for the Normal Density
- Case 1: Σi = σ²I — the features are statistically independent and each feature has the same variance σ².
- gi(x) = -||x - μi||² / (2σ²) + ln P(wi), where ||.|| denotes the Euclidean norm.

Case 1: Σi = σ²I (figure)

Linear Discriminant Function
- It is not necessary to compute the distances themselves. Expanding the quadratic form yields
  gi(x) = -(1/(2σ²))(x^T x - 2 μi^T x + μi^T μi) + ln P(wi)
- The term x^T x is the same for all i and can be dropped.
- We are left with the following linear discriminant function: gi(x) = wi^T x + wi0

Linear Discriminant Function
- where wi = μi / σ²
- and wi0 = -μi^T μi / (2σ²) + ln P(wi)
- wi0 is called the threshold or bias for the ith category.
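A sketch of the Case 1 linear machine; the means, variance, and priors below are assumed for illustration:

```python
import numpy as np

def case1_weights(mus, sigma2, priors):
    """w_i = mu_i / sigma^2,  w_i0 = -mu_i^T mu_i / (2 sigma^2) + ln P(wi)"""
    ws = [mu / sigma2 for mu in mus]
    w0s = [-(mu @ mu) / (2 * sigma2) + np.log(p) for mu, p in zip(mus, priors)]
    return ws, w0s

mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ws, w0s = case1_weights(mus, sigma2=1.0, priors=[0.7, 0.3])

x = np.array([1.0, 1.0])
g = [w @ x + w0 for w, w0 in zip(ws, w0s)]   # the linear discriminants g_i(x)
print(int(np.argmax(g)))                     # index of the winning class
```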

Linear Machine
- A classifier that uses linear discriminant functions is called a linear machine.
- Its decision surfaces are pieces of hyperplanes defined by the linear equations gi(x) = gj(x) for the two categories with the highest posterior probabilities.
- For our case this equation can be written as w^T (x - x0) = 0

Linear Machine
- where w = μi - μj
- and x0 = (1/2)(μi + μj) - [σ² / ||μi - μj||²] ln[P(wi)/P(wj)] (μi - μj)
- If P(wi) = P(wj), the second term vanishes and the rule assigns x to the class of the nearest mean: it is called a minimum-distance classifier.

Priors change -> decision boundaries shift (a sequence of three figures)

Case 2: Σi = Σ
- The covariance matrices for all of the classes are identical but otherwise arbitrary.
- The cluster for the ith class is centered about μi.
- Discriminant function: gi(x) = -(1/2)(x - μi)^T Σ^(-1) (x - μi) + ln P(wi)
- The ln P(wi) term can be ignored if the prior probabilities are the same for all classes, leaving a minimum (squared) Mahalanobis distance rule.

Case 2: Discriminant function
- Expanding the quadratic form again gives a linear discriminant: gi(x) = wi^T x + wi0
- where wi = Σ^(-1) μi
- and wi0 = -(1/2) μi^T Σ^(-1) μi + ln P(wi)
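The same expansion in code for Case 2; the shared covariance, means, and priors are illustrative assumptions:

```python
import numpy as np

def case2_weights(mus, Sigma, priors):
    """w_i = Sigma^{-1} mu_i,  w_i0 = -0.5 mu_i^T Sigma^{-1} mu_i + ln P(wi)"""
    Sigma_inv = np.linalg.inv(Sigma)
    ws = [Sigma_inv @ mu for mu in mus]
    w0s = [-0.5 * (mu @ Sigma_inv @ mu) + np.log(p) for mu, p in zip(mus, priors)]
    return ws, w0s

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ws, w0s = case2_weights(mus, Sigma, priors=[0.5, 0.5])

x = np.array([1.2, 0.8])
print(int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)])))
```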

For the 2-category case
- If Ri and Rj are contiguous, the boundary between them has the equation w^T (x - x0) = 0
- where w = Σ^(-1)(μi - μj)
- and x0 = (1/2)(μi + μj) - [ln(P(wi)/P(wj)) / ((μi - μj)^T Σ^(-1) (μi - μj))] (μi - μj)

Case 3: Σi = arbitrary
- In general, the covariance matrices are different for each category.
- The only term that can be dropped from gi(x) is the (d/2) ln 2π term.

Case 3: Σi = arbitrary
- The discriminant functions are inherently quadratic: gi(x) = x^T Wi x + wi^T x + wi0
- where Wi = -(1/2) Σi^(-1)
- and wi = Σi^(-1) μi
- and wi0 = -(1/2) μi^T Σi^(-1) μi - (1/2) ln|Σi| + ln P(wi)
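A sketch of the general quadratic discriminant; the means, covariances, and priors are again assumed values:

```python
import numpy as np

def case3_discriminant(x, mu, Sigma, prior):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 for an arbitrary Sigma_i."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv
    w = Sigma_inv @ mu
    w0 = (-0.5 * (mu @ Sigma_inv @ mu)
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return float(x @ W @ x + w @ x + w0)

x = np.array([1.0, 0.5])
g1 = case3_discriminant(x, np.zeros(2), np.eye(2), 0.6)
g2 = case3_discriminant(x, np.ones(2), np.array([[2.0, 0.0], [0.0, 0.5]]), 0.4)
print("decide w1" if g1 > g2 else "decide w2")
```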

Two-category case
- The decision surfaces are hyperquadrics (hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, ...).

Example (figure)