LECTURE 05: THRESHOLD DECODING


LECTURE 05: THRESHOLD DECODING
• Objectives: Gaussian Random Variables, Maximum Likelihood Decoding, Threshold Decoding, Receiver Operating Curves
• Resources: D.H.S.: Chapter 2 (Part 3), K.F.: Intro to PR, X.Z.: PR Course, A.N.: Gaussian Discriminants, T.T.: ROC Curves, NIST: DET Curves

Multicategory Decision Surfaces
• Define a set of discriminant functions: gi(x), i = 1, …, c.
• Define a decision rule: choose ωi if gi(x) > gj(x) ∀ j ≠ i.
• For a Bayes classifier, let gi(x) = −R(ωi|x), because the maximum discriminant function then corresponds to the minimum conditional risk.
• For the minimum error rate case, let gi(x) = P(ωi|x), so that the maximum discriminant function corresponds to the maximum posterior probability.
• The choice of discriminant functions is not unique: we can multiply them by a positive constant, add a constant, or replace gi(x) with a monotonically increasing function f(gi(x)), all without changing the decision (see the sketch below).
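
A minimal Python sketch of this decision rule; the example discriminants are made-up illustrations, not taken from the lecture:

```python
import numpy as np

def classify(x, discriminants):
    """Choose omega_i with the largest discriminant value g_i(x)."""
    scores = [g(x) for g in discriminants]
    return int(np.argmax(scores))

# Minimum-error-rate example: two unnormalized log-posteriors in 1-D.
g = [lambda x: -0.5 * (x - 1.0) ** 2,   # ~ log p(x|w1) + log P(w1), up to constants
     lambda x: -0.5 * (x + 1.0) ** 2]   # ~ log p(x|w2) + log P(w2), up to constants
print(classify(0.3, g))  # prints 0: x = 0.3 is closer to the class-1 mean at +1
```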

Gaussian Classifiers
• Starting from gi(x) = ln p(x|ωi) + ln P(ωi) with Σi = σ²I, and dropping the terms that are constant w.r.t. the maximization, the discriminant function can be reduced to: gi(x) = −‖x − μi‖²/(2σ²) + ln P(ωi).
• We can expand this: ‖x − μi‖² = xᵗx − 2μiᵗx + μiᵗμi.
• The term xᵗx is a constant w.r.t. i and can be ignored, and μiᵗμi is a constant that can be precomputed, leaving a discriminant that is linear in x.

Threshold Decoding
• This has a simple geometric interpretation: when the priors are equal and the support regions are spherical, the decision boundary lies halfway between the means, and classification reduces to choosing the nearest mean (minimum Euclidean distance), as sketched below.
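
A minimal sketch of threshold (nearest-mean) decoding under these assumptions, with illustrative numbers:

```python
import numpy as np

def nearest_mean(x, means):
    """Equal priors, spherical covariances: choose the closest class mean."""
    dists = [np.linalg.norm(x - mu) for mu in means]
    return int(np.argmin(dists))

# In 1-D with means 0 and 4, the decision threshold sits halfway, at 2.
means = [np.array([0.0]), np.array([4.0])]
print(nearest_mean(np.array([1.5]), means))  # 0: left of the threshold
print(nearest_mean(np.array([2.5]), means))  # 1: right of the threshold
```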

General Case for Gaussian Classifiers
• For p(x|ωi) ~ N(μi, Σi), the discriminant function is: gi(x) = −½(x − μi)ᵗ Σi⁻¹ (x − μi) − (d/2) ln 2π − ½ ln|Σi| + ln P(ωi).
• The next three slides specialize this expression to Σi = σ²I, Σi = Σ, and arbitrary Σi.

Identity Covariance Case: Σi = σ²I
• The discriminant reduces to: gi(x) = −‖x − μi‖²/(2σ²) + ln P(ωi).
• This can be rewritten as a linear machine: gi(x) = wiᵗx + wi0, where wi = μi/σ² and wi0 = −μiᵗμi/(2σ²) + ln P(ωi).
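
A short sketch of these weights in Python (function name and numbers are illustrative); for equal priors, the boundary g1(x) = g2(x) bisects the segment between the means:

```python
import numpy as np

def linear_weights(mu, sigma2, prior):
    """w_i and w_i0 for g_i(x) = w_i^t x + w_i0 (identity-covariance case)."""
    w = mu / sigma2
    w0 = -(mu @ mu) / (2.0 * sigma2) + np.log(prior)
    return w, w0

w1, w10 = linear_weights(np.array([0.0, 0.0]), 1.0, 0.5)
w2, w20 = linear_weights(np.array([2.0, 0.0]), 1.0, 0.5)
x = np.array([1.0, 0.0])                       # point halfway between the means
print(np.isclose(w1 @ x + w10, w2 @ x + w20))  # True: x lies on the boundary
```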

Equal Covariances Case: Σi = Σ
• The discriminant is again linear: gi(x) = wiᵗx + wi0, where wi = Σ⁻¹μi and wi0 = −½ μiᵗΣ⁻¹μi + ln P(ωi).
• The decision surfaces are hyperplanes, but they are generally not orthogonal to the line joining the means (the relevant distance is now the Mahalanobis distance).

Arbitrary Covariances
• In the general case the discriminant is quadratic: gi(x) = xᵗWix + wiᵗx + wi0, where Wi = −½ Σi⁻¹, wi = Σi⁻¹μi, and wi0 = −½ μiᵗΣi⁻¹μi − ½ ln|Σi| + ln P(ωi).
• The decision surfaces are hyperquadrics: hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids. A sketch of the full discriminant follows.
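
A minimal sketch of the full Gaussian discriminant (illustrative names and numbers):

```python
import numpy as np

def gaussian_discriminant(x, mu, cov, prior):
    """Full Gaussian discriminant g_i(x) with arbitrary covariance."""
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(cov, diff)
            - 0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

# Assign x to the class with the largest discriminant value.
x = np.array([0.5, 0.5])
g1 = gaussian_discriminant(x, np.zeros(2), np.eye(2), 0.5)
g2 = gaussian_discriminant(x, np.ones(2), 2.0 * np.eye(2), 0.5)
print(0 if g1 > g2 else 1)
```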

Receiver Operating Characteristic (ROC)
• How do we compare two decision rules if they require different thresholds for optimum performance?
• Consider four probabilities: a hit (true positive), a false alarm (false positive), a miss (false negative), and a correct rejection (true negative).
• Sweeping the decision threshold and plotting the hit rate against the false alarm rate traces out the ROC curve (see the sketch below).
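
A minimal sketch of how sweeping the threshold generates the (false alarm, hit) pairs that form an ROC curve; the scores below are made-up detection scores:

```python
import numpy as np

def roc_points(pos_scores, neg_scores, thresholds):
    """(false-alarm rate, hit rate) for each decision threshold."""
    pos, neg = np.asarray(pos_scores), np.asarray(neg_scores)
    return [(float(np.mean(neg >= t)),   # false alarms among true negatives
             float(np.mean(pos >= t)))   # hits among true positives
            for t in thresholds]

pos = [2.1, 1.4, 3.0, 0.9]    # scores for positive (target) trials
neg = [0.2, 1.1, -0.5, 0.8]   # scores for negative (non-target) trials
print(roc_points(pos, neg, thresholds=[0.0, 1.0, 2.0]))
```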

General ROC Curves
• An ROC curve is typically monotonic but not symmetric.
• One system can be considered superior to another only if its ROC curve lies above the competing system's curve throughout the operating region of interest.

Sensitivity and Specificity
• Sensitivity (true positive rate): TP / (TP + FN), the fraction of positive cases correctly detected.
• Specificity (true negative rate): TN / (TN + FP), the fraction of negative cases correctly rejected.
• Since the false positive rate equals 1 − specificity, an ROC curve can be read as sensitivity versus (1 − specificity).
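
A tiny worked example with hypothetical confusion counts:

```python
# Hypothetical confusion counts, for illustration only.
tp, fn, tn, fp = 80, 20, 90, 10
sensitivity = tp / (tp + fn)   # 0.8: fraction of positives detected
specificity = tn / (tn + fp)   # 0.9: fraction of negatives correctly rejected
print(sensitivity, specificity, 1.0 - specificity)  # FPR = 1 - specificity = 0.1
```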

ROC Curves vs. Detection Error Tradeoff Curves
• Receiver Operating Characteristic (ROC) curve: plots the true positive rate (TPR) against the false positive rate (FPR).
• Detection Error Tradeoff (DET) curve: plots the false rejection (miss) rate against the false acceptance (false alarm) rate, conventionally on normal-deviate axes, which renders curves for Gaussian-like score distributions as nearly straight lines.
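
A short sketch of the DET axis convention, assuming SciPy is available: error rates are mapped through the inverse of the standard normal CDF (the "normal deviate" scale) before plotting:

```python
from scipy.stats import norm

def det_axes(miss_rates, fa_rates):
    """Map miss/false-alarm rates onto the normal-deviate scale used in DET plots."""
    return norm.ppf(miss_rates), norm.ppf(fa_rates)

print(det_axes([0.10, 0.05], [0.02, 0.08]))
```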

Detection Error Tradeoff Curves
• Real applications can be messy: in practice, competing systems' curves often cross, so no single system dominates at every operating point.

Typical Examples of 2D Classifiers

Summary
• Threshold Decoding: data modeled as a multivariate Gaussian distribution can be classified using a simple threshold decoding scheme. The threshold is a function of the priors, means, and covariances.
• Receiver Operating Curve: a visualization that evaluates performance over a range of operating conditions. Ultimately, only the "customer" can decide what the correct operating point is.
• Performance Evaluation: one algorithm can only be considered superior to another if its ROC curve lies entirely above the other's (the curves do not cross). In practice this is rarely the case, so specific operating points (e.g., the equal error rate) are compared instead.
• Gaussian data can be classified using decision surfaces based on smooth polynomials (e.g., lines and parabolas). However, if the data is not Gaussian, more complicated decision surfaces are required. In this class we will explore many algorithms that can achieve arbitrarily complex decision surfaces. The price paid for this is a computationally intensive training process and an optimization process that is very sensitive to initial conditions.

Discrete Features
• For problems where the features are discrete, Bayes formula involves probabilities rather than densities: P(ωj|x) = P(x|ωj)P(ωj) / P(x), where P(x) = Σj P(x|ωj)P(ωj).
• The Bayes decision rule remains the same: choose ωi if P(ωi|x) > P(ωj|x) ∀ j ≠ i.
• The maximum entropy distribution is a uniform distribution: when nothing is known about N discrete outcomes, P(x = xk) = 1/N.

Discriminant Functions For Discrete Features
• Consider independent binary features: x = (x1, …, xd)ᵗ with xk ∈ {0, 1}, and let pk = P(xk = 1|ω1) and qk = P(xk = 1|ω2).
• Assuming conditional independence: P(x|ω1) = Πk pk^xk (1 − pk)^(1−xk), and similarly for P(x|ω2) with qk.
• The likelihood ratio is: P(x|ω1)/P(x|ω2) = Πk (pk/qk)^xk [(1 − pk)/(1 − qk)]^(1−xk).
• The discriminant function is linear in the features: g(x) = Σk wk xk + w0, where wk = ln[pk(1 − qk)/(qk(1 − pk))] and w0 = Σk ln[(1 − pk)/(1 − qk)] + ln[P(ω1)/P(ω2)]; decide ω1 if g(x) > 0 (see the sketch below).
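
A short numerical sketch of this discriminant; the feature probabilities below are made up for illustration:

```python
import numpy as np

def binary_feature_discriminant(p, q, prior1, prior2):
    """Weights for g(x) = w^t x + w0 with independent binary features."""
    p, q = np.asarray(p), np.asarray(q)
    w = np.log(p * (1.0 - q) / (q * (1.0 - p)))
    w0 = np.sum(np.log((1.0 - p) / (1.0 - q))) + np.log(prior1 / prior2)
    return w, w0

w, w0 = binary_feature_discriminant([0.8, 0.6], [0.3, 0.4], 0.5, 0.5)
x = np.array([1, 0])
print("decide w1" if w @ x + w0 > 0 else "decide w2")  # decide w1
```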

Error Bounds
• The Bayes decision rule guarantees the lowest average error rate.
• A closed-form solution exists for two-class Gaussian distributions, but the full calculation is difficult in high-dimensional spaces.
• Bounds provide a way to get insight into a problem and engineer better solutions.
• We need the following inequality: min[a, b] ≤ a^β b^(1−β) for a, b ≥ 0 and 0 ≤ β ≤ 1.
• Assume a > b without loss of generality, so min[a, b] = b. Also, a^β b^(1−β) = (a/b)^β b and (a/b)^β > 1. Therefore, b < (a/b)^β b, which implies min[a, b] < a^β b^(1−β).
• Apply this to our standard expression for P(error).
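
A quick numerical spot-check of the inequality (the values of a and b are arbitrary):

```python
import numpy as np

a, b = 0.7, 0.2
for beta in np.linspace(0.0, 1.0, 5):
    bound = a**beta * b**(1.0 - beta)
    assert min(a, b) <= bound + 1e-12   # min[a,b] <= a^beta * b^(1-beta)
    print(f"beta={beta:.2f}  a^beta * b^(1-beta) = {bound:.4f}")
```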

Chernoff Bound
• Recall: applying min[a, b] ≤ a^β b^(1−β) to P(error) = ∫ min[P(ω1)p(x|ω1), P(ω2)p(x|ω2)] dx gives: P(error) ≤ P(ω1)^β P(ω2)^(1−β) ∫ p(x|ω1)^β p(x|ω2)^(1−β) dx for 0 ≤ β ≤ 1.
• Note that this integral is over the entire feature space, not the decision regions, which makes it simpler to evaluate.
• If the conditional probabilities are normal, this expression can be simplified.

Chernoff Bound for Normal Densities
• If the conditional probabilities are normal, our bound can be evaluated analytically: ∫ p(x|ω1)^β p(x|ω2)^(1−β) dx = e^(−k(β)), where: k(β) = [β(1 − β)/2] (μ2 − μ1)ᵗ [(1 − β)Σ1 + βΣ2]⁻¹ (μ2 − μ1) + ½ ln( |(1 − β)Σ1 + βΣ2| / (|Σ1|^(1−β) |Σ2|^β) ).
• Procedure: find the value of β that minimizes exp(−k(β)), and then compute the resulting bound on P(error).
• Benefit: this is a one-dimensional optimization over β, regardless of the dimensionality of the feature space. A sketch follows.
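
A sketch of the procedure using a simple grid search over β (a numerical optimizer would also work); the form of k(β) matches the expression above:

```python
import numpy as np

def chernoff_k(beta, mu1, mu2, cov1, cov2):
    """k(beta) for two Gaussian class-conditional densities."""
    diff = mu2 - mu1
    mix = (1.0 - beta) * cov1 + beta * cov2
    term1 = 0.5 * beta * (1.0 - beta) * diff @ np.linalg.solve(mix, diff)
    term2 = 0.5 * np.log(np.linalg.det(mix)
                         / (np.linalg.det(cov1) ** (1.0 - beta)
                            * np.linalg.det(cov2) ** beta))
    return term1 + term2

def chernoff_bound(mu1, mu2, cov1, cov2, p1, p2):
    """Minimize p1^beta * p2^(1-beta) * exp(-k(beta)) over beta in (0, 1)."""
    betas = np.linspace(0.001, 0.999, 999)
    vals = [p1**b * p2**(1.0 - b) * np.exp(-chernoff_k(b, mu1, mu2, cov1, cov2))
            for b in betas]
    i = int(np.argmin(vals))
    return betas[i], vals[i]  # optimal beta and the bound on P(error)
```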

Bhattacharyya Bound
• The Chernoff bound is loose for extreme values of β.
• The Bhattacharyya bound is obtained by setting β = 0.5: P(error) ≤ √(P(ω1)P(ω2)) e^(−k(1/2)), where: k(1/2) = (1/8)(μ2 − μ1)ᵗ [(Σ1 + Σ2)/2]⁻¹ (μ2 − μ1) + ½ ln( |(Σ1 + Σ2)/2| / √(|Σ1||Σ2|) ).
• These bounds can still be used if the distributions are not Gaussian (why? hint: Occam's Razor). However, they might not be adequately tight.
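
Continuing the sketch above, the Bhattacharyya bound is the same expression evaluated at β = 0.5; the means, covariances, and priors below are made-up numbers for illustration:

```python
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cov1, cov2 = np.eye(2), 2.0 * np.eye(2)

beta_star, chernoff = chernoff_bound(mu1, mu2, cov1, cov2, 0.5, 0.5)
bhatta = np.sqrt(0.5 * 0.5) * np.exp(-chernoff_k(0.5, mu1, mu2, cov1, cov2))
print(beta_star, chernoff, bhatta)  # the Chernoff bound is at least as tight
```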