
ECE 8443 – Pattern Recognition
LECTURE 05: CONTINUOUS FEATURES
Objectives: Loss Functions, Risk, Minimum Error Rate Classification
Resources: DHS – Chap. 2 (Part 1), DHS – Chap. 2 (Part 2), RGO – Intro to PR, MCE for Speech, MCE for Images
URL: .../publications/courses/ece_8443/lectures/current/lecture_05.ppt

05: CONTINUOUS FEATURES – GENERALIZATION OF THE TWO-CLASS PROBLEM
Generalization of the preceding ideas:
- Use more than one feature (e.g., length and lightness).
- Use more than two states of nature (e.g., N-way classification).
- Allow actions other than merely deciding the state of nature (e.g., rejection: refusing to take an action when the alternatives are close or confidence is low).
- Introduce a loss function that is more general than the probability of error (e.g., errors are not equally costly).
- Replace the scalar x by the vector x in a d-dimensional Euclidean space, Rᵈ, called the feature space.

05: CONTINUOUS FEATURES – LOSS FUNCTION
- Let {ω₁, ω₂, …, ω_c} be the set of c categories (states of nature).
- Let {α₁, α₂, …, α_a} be the set of a possible actions.
- Let λ(αᵢ|ωⱼ) be the loss incurred for taking action αᵢ when the state of nature is ωⱼ.
- The posterior, P(ωⱼ|x), can be computed from Bayes formula:
    P(ωⱼ|x) = p(x|ωⱼ) P(ωⱼ) / p(x)
  where the evidence is:
    p(x) = Σⱼ p(x|ωⱼ) P(ωⱼ)
- The expected loss from taking action αᵢ is:
    R(αᵢ|x) = Σⱼ λ(αᵢ|ωⱼ) P(ωⱼ|x)
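The quantities on this slide can be made concrete with a small sketch. The Python snippet below is illustrative only and is not part of the original lecture; the two Gaussian class-conditional densities, the priors, and the loss matrix are assumed values chosen for the example.

```python
import numpy as np
from scipy.stats import norm

# Assumed toy problem (not from the lecture): two states of nature, two actions.
priors = np.array([0.6, 0.4])                   # P(w_1), P(w_2)
class_cond = [norm(0.0, 1.0), norm(2.0, 1.0)]   # p(x|w_1), p(x|w_2)
loss = np.array([[0.0, 2.0],                    # loss[i, j] = lambda(alpha_i | w_j)
                 [1.0, 0.0]])

def posteriors(x):
    """Bayes formula: P(w_j|x) = p(x|w_j) P(w_j) / p(x), where p(x) is the evidence."""
    joint = np.array([c.pdf(x) for c in class_cond]) * priors
    return joint / joint.sum()

def conditional_risk(x):
    """Expected loss of each action: R(alpha_i|x) = sum_j lambda(alpha_i|w_j) P(w_j|x)."""
    return loss @ posteriors(x)

print(conditional_risk(1.0))    # risk of taking each action at x = 1.0
```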

05: CONTINUOUS FEATURES – BAYES RISK
- An expected loss is called a risk; R(αᵢ|x) is called the conditional risk.
- A general decision rule is a function α(x) that tells us which action to take for every possible observation.
- The overall risk is given by:
    R = ∫ R(α(x)|x) p(x) dx
- If we choose α(x) so that R(α(x)|x) is as small as possible for every x, the overall risk will be minimized.
- Compute the conditional risk for every action and select the action that minimizes R(αᵢ|x). The resulting minimum overall risk is denoted R* and is referred to as the Bayes risk; it is the best performance that can be achieved.
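Continuing the hypothetical sketch above: the Bayes decision rule takes, at each x, the action with the smallest conditional risk, and integrating that minimal risk against the evidence approximates the Bayes risk R*. This is only an illustration under the assumed densities.

```python
def bayes_rule(x):
    """alpha(x): take the action that minimizes the conditional risk R(alpha_i|x)."""
    return int(np.argmin(conditional_risk(x)))

# Numerically approximate the overall risk R = integral of R(alpha(x)|x) p(x) dx.
xs = np.linspace(-6.0, 8.0, 4001)
evidence = sum(p * c.pdf(xs) for p, c in zip(priors, class_cond))    # p(x) on the grid
min_risk = np.array([conditional_risk(x)[bayes_rule(x)] for x in xs])
bayes_risk = np.trapz(min_risk * evidence, xs)                       # estimate of R*
print(bayes_risk)
```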

05: CONTINUOUS FEATURES – TWO-CATEGORY CLASSIFICATION
- Let α₁ correspond to ω₁, α₂ to ω₂, and λᵢⱼ = λ(αᵢ|ωⱼ).
- The conditional risk is given by:
    R(α₁|x) = λ₁₁ P(ω₁|x) + λ₁₂ P(ω₂|x)
    R(α₂|x) = λ₂₁ P(ω₁|x) + λ₂₂ P(ω₂|x)
- Our decision rule is: choose ω₁ if R(α₁|x) < R(α₂|x); otherwise decide ω₂.
- This results in the equivalent rule: choose ω₁ if (λ₂₁ - λ₁₁) P(ω₁|x) > (λ₁₂ - λ₂₂) P(ω₂|x); otherwise decide ω₂.
- If the loss incurred for making an error is greater than that incurred for being correct, the factors (λ₂₁ - λ₁₁) and (λ₁₂ - λ₂₂) are positive, and the ratio of these factors simply scales the posteriors.

05: CONTINUOUS FEATURES – LIKELIHOOD
- By employing Bayes formula, we can replace the posteriors by the prior probabilities and conditional densities: choose ω₁ if (λ₂₁ - λ₁₁) p(x|ω₁) P(ω₁) > (λ₁₂ - λ₂₂) p(x|ω₂) P(ω₂); otherwise decide ω₂.
- If (λ₂₁ - λ₁₁) is positive, our rule becomes a likelihood ratio test: choose ω₁ if
    p(x|ω₁) / p(x|ω₂) > [(λ₁₂ - λ₂₂) / (λ₂₁ - λ₁₁)] [P(ω₂) / P(ω₁)]
- If the loss factors are identical and the prior probabilities are equal, this reduces to a standard likelihood ratio: choose ω₁ if
    p(x|ω₁) / p(x|ω₂) > 1
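The two forms of the decision rule on the last two slides are equivalent, which can be checked numerically. The sketch below reuses the assumed priors, densities, and loss matrix defined earlier (loss[i, j] = λ(αᵢ|ωⱼ)); it is an illustration, not the lecture's code.

```python
def decide_via_risk(x):
    """Choose w1 iff R(alpha_1|x) < R(alpha_2|x)."""
    r = conditional_risk(x)
    return 1 if r[0] < r[1] else 2

def decide_via_likelihood_ratio(x):
    """Choose w1 iff p(x|w1)/p(x|w2) > [(l12 - l22)/(l21 - l11)] * P(w2)/P(w1)."""
    ratio = class_cond[0].pdf(x) / class_cond[1].pdf(x)
    threshold = (loss[0, 1] - loss[1, 1]) / (loss[1, 0] - loss[0, 0]) * priors[1] / priors[0]
    return 1 if ratio > threshold else 2

assert all(decide_via_risk(x) == decide_via_likelihood_ratio(x)
           for x in np.linspace(-4.0, 6.0, 101))
```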

05: CONTINUOUS FEATURES – MINIMUM ERROR RATE
- Consider a symmetrical or zero-one loss function:
    λ(αᵢ|ωⱼ) = 0 if i = j, and 1 if i ≠ j (i, j = 1, …, c)
- The conditional risk is then:
    R(αᵢ|x) = Σ_{j≠i} P(ωⱼ|x) = 1 - P(ωᵢ|x)
- The conditional risk is the average probability of error.
- To minimize error, maximize P(ωᵢ|x); this is also known as maximum a posteriori (MAP) decoding.
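Under the zero-one loss the minimum-risk decision coincides with picking the class of maximum posterior probability, since R(αᵢ|x) = 1 - P(ωᵢ|x). A quick check, reusing the assumed densities and posteriors() from the sketch above:

```python
zero_one = np.ones((2, 2)) - np.eye(2)               # lambda(alpha_i|w_j) = 0 if i == j, else 1

def map_decision(x):
    return int(np.argmax(posteriors(x)))             # maximize P(w_i|x)

def min_risk_zero_one(x):
    return int(np.argmin(zero_one @ posteriors(x)))  # minimize 1 - P(w_i|x)

assert all(map_decision(x) == min_risk_zero_one(x) for x in np.linspace(-4.0, 6.0, 101))
```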

05: CONTINUOUS FEATURES – LIKELIHOOD RATIO
- Minimum error rate classification: choose ωᵢ if P(ωᵢ|x) > P(ωⱼ|x) for all j ≠ i.

05: CONTINUOUS FEATURES – MINIMAX CRITERION
- Design our classifier to minimize the worst overall risk (avoid catastrophic failures).
- Factor the overall risk into contributions for each decision region:
    R = ∫_{R₁} [λ₁₁ P₁ p(x|ω₁) + λ₁₂ P₂ p(x|ω₂)] dx + ∫_{R₂} [λ₂₁ P₁ p(x|ω₁) + λ₂₂ P₂ p(x|ω₂)] dx
- Using a simplified notation (Van Trees, 1968), with Pᵢ = P(ωᵢ) and Iᵢⱼ = ∫_{Rᵢ} p(x|ωⱼ) dx:
    R = λ₁₁ P₁ I₁₁ + λ₁₂ P₂ I₁₂ + λ₂₁ P₁ I₂₁ + λ₂₂ P₂ I₂₂

05: CONTINUOUS FEATURES – MINIMAX CRITERION (CONT.)
- We make a substitution because we want the risk in terms of error probabilities and priors. Note that I₁₁ = 1 - I₂₁ and I₂₂ = 1 - I₁₂, so we can rewrite the risk:
    R = λ₁₁ P₁ (1 - I₂₁) + λ₁₂ P₂ I₁₂ + λ₂₁ P₁ I₂₁ + λ₂₂ P₂ (1 - I₁₂)
- Multiply out, add and subtract like terms, and rearrange.

05: CONTINUOUS FEATURES – EXPANSION OF THE RISK FUNCTION
- Note P₁ = 1 - P₂. Substituting and collecting the terms that multiply P₂:
    R = λ₁₁ + (λ₂₁ - λ₁₁) I₂₁ + P₂ [(λ₂₂ - λ₁₁) + (λ₁₂ - λ₂₂) I₁₂ - (λ₂₁ - λ₁₁) I₂₁]

05: CONTINUOUS FEATURES – EXPLANATION OF THE RISK FUNCTION
- Note that the risk is linear in P₂.
- If we can find a decision boundary such that the second (bracketed) term is zero, then the minimax risk becomes:
    R_mm = λ₁₁ + (λ₂₁ - λ₁₁) I₂₁
  which is independent of the prior.
- For each value of the prior, there is an associated Bayes error rate.
- Minimax: find the prior P₁ for which the Bayes error is maximum, and then use the corresponding decision regions.
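A numerical illustration of the minimax idea, under assumptions not stated in the lecture: a zero-one loss, two hypothetical Gaussian class-conditional densities, and a single decision threshold t (decide ω₁ when x < t). The minimax boundary is then the t at which the two conditional error probabilities I₂₁ and I₁₂ are equal, so the risk no longer depends on the prior.

```python
from scipy.stats import norm
from scipy.optimize import brentq

p1 = norm(0.0, 1.0)     # assumed p(x|w_1)
p2 = norm(2.0, 1.0)     # assumed p(x|w_2)

def error_gap(t):
    i21 = 1.0 - p1.cdf(t)   # I_21: mass of w_1 falling in region R_2 (x >= t)
    i12 = p2.cdf(t)         # I_12: mass of w_2 falling in region R_1 (x < t)
    return i21 - i12

t_minimax = brentq(error_gap, -5.0, 7.0)   # root of the gap within a bracketing interval
print(t_minimax)                           # about 1.0 for these symmetric Gaussians
```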

05: CONTINUOUS FEATURES – NEYMAN-PEARSON CRITERION
- Guarantee that the risk associated with one type of error stays below some fixed constant (or cost), and minimize the remaining risk subject to that constraint (e.g., we must not misclassify more than 1% of salmon as sea bass).
- Typically the decision boundaries must be adjusted numerically; for some distributions (e.g., Gaussian), analytical solutions do exist.
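A small Neyman-Pearson-style sketch with the same assumed Gaussian densities: fix the rate of one error type (say, at most 1% of class ω₁ samples misclassified) by placing the threshold accordingly, and accept whatever error rate results for the other class. This is illustrative only, not the lecture's example.

```python
from scipy.stats import norm

p1 = norm(0.0, 1.0)          # assumed p(x|w_1), e.g., salmon
p2 = norm(2.0, 1.0)          # assumed p(x|w_2), e.g., sea bass

alpha = 0.01                 # constraint: misclassify at most 1% of w_1
t = p1.ppf(1.0 - alpha)      # decide w_1 for x < t, so P(x >= t | w_1) = alpha
other_error = p2.cdf(t)      # resulting P(x < t | w_2), which is not directly controlled
print(t, other_error)
```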