ECE 8443 – Pattern Recognition
LECTURE 09: ERROR BOUNDS / DISCRETE FEATURES

Objectives: Chernoff Bound, Bhattacharyya Bound, ROC Curves, Discrete Features
Resources: V.V. – Chernoff Bound; J.G. – Bhattacharyya; T.T. – ROC Curves; NIST – DET Curves; AAAS – Verification
URL: .../publications/courses/ece_8443/lectures/current/lecture_09.ppt

09: ERROR BOUNDS MOTIVATION

The Bayes decision rule guarantees the lowest average error rate, and a closed-form solution exists for two-class Gaussian distributions, but the full calculation is difficult in high-dimensional spaces. Bounds provide a way to gain insight into a problem and engineer better solutions. We need the following inequality:

$\min[a, b] \le a^{\beta} b^{1-\beta}$ for $a, b \ge 0$ and $0 \le \beta \le 1$.

Proof: Assume $a \ge b$ without loss of generality, so $\min[a, b] = b$. Also, $a^{\beta} b^{1-\beta} = (a/b)^{\beta} b$ and $(a/b)^{\beta} \ge 1$. Therefore $a^{\beta} b^{1-\beta} \ge b$, which implies $\min[a, b] \le a^{\beta} b^{1-\beta}$. We apply this to our standard expression for P(error).
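The following is a minimal numerical sanity check of this inequality (not part of the original slides); the sampling ranges are arbitrary illustrative choices.

```python
import numpy as np

# Spot-check the inequality min[a,b] <= a^beta * b^(1-beta)
# for a, b >= 0 and 0 <= beta <= 1 on randomly sampled triples.
rng = np.random.default_rng(0)
for _ in range(10000):
    a, b = rng.uniform(0.0, 10.0, size=2)
    beta = rng.uniform(0.0, 1.0)
    assert min(a, b) <= a**beta * b**(1.0 - beta) + 1e-12
print("min[a,b] <= a^beta * b^(1-beta) held on all sampled triples.")
```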

09: ERROR BOUNDS CHERNOFF BOUND

Recall our expression for the probability of error:

$P(\text{error}) = \int \min[P(\omega_1)\, p(x|\omega_1),\ P(\omega_2)\, p(x|\omega_2)]\, dx$

Applying the inequality above yields the Chernoff bound:

$P(\text{error}) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2) \int p^{\beta}(x|\omega_1)\, p^{1-\beta}(x|\omega_2)\, dx$, for $0 \le \beta \le 1$.

Note that this integral is over the entire feature space, not the decision regions, which makes it simpler to evaluate. If the conditional probabilities are normal, this expression can be simplified further.
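As a rough illustration (not from the original slides), the bound can be evaluated by numerical quadrature for a one-dimensional problem; the two class-conditional normals and the priors below are assumptions chosen for demonstration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Sketch: evaluate P^beta(w1) P^(1-beta)(w2) * Int p^beta(x|w1) p^(1-beta)(x|w2) dx
# by quadrature for two assumed 1-D class-conditional normals.
p1 = norm(loc=0.0, scale=1.0)   # assumed p(x|omega_1)
p2 = norm(loc=2.0, scale=1.5)   # assumed p(x|omega_2)
P1, P2 = 0.5, 0.5               # assumed priors

def chernoff_bound(beta):
    integrand = lambda x: p1.pdf(x)**beta * p2.pdf(x)**(1.0 - beta)
    integral, _ = quad(integrand, -np.inf, np.inf)
    return P1**beta * P2**(1.0 - beta) * integral

betas = np.linspace(0.05, 0.95, 19)
bounds = [chernoff_bound(b) for b in betas]
i = int(np.argmin(bounds))
print(f"tightest bound over the grid: P(error) <= {bounds[i]:.4f} at beta = {betas[i]:.2f}")
```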

09: ERROR BOUNDS CHERNOFF BOUND FOR NORMAL DENSITIES

If the conditional probabilities are normal, our bound can be evaluated analytically:

$P(\text{error}) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2)\, e^{-k(\beta)}$

where:

$k(\beta) = \frac{\beta(1-\beta)}{2} (\mu_2 - \mu_1)^T \left[\beta \Sigma_1 + (1-\beta) \Sigma_2\right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{|\beta \Sigma_1 + (1-\beta) \Sigma_2|}{|\Sigma_1|^{\beta}\, |\Sigma_2|^{1-\beta}}$

Procedure: find the value of $\beta$ that minimizes $\exp(-k(\beta))$, then compute P(error) using the bound. Benefit: a one-dimensional optimization over $\beta$.
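A short sketch of this procedure follows: compute $k(\beta)$ from the closed form above and minimize the bound over $\beta$ with a one-dimensional search. The means, covariances, and priors are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: closed-form Chernoff exponent k(beta) for normal class conditionals,
# with the bound minimized over beta. Parameter values are illustrative assumptions.
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[1.5, 0.0], [0.0, 0.5]])
P1, P2 = 0.5, 0.5  # assumed priors

def k(beta):
    S = beta * S1 + (1.0 - beta) * S2
    d = mu2 - mu1
    quad_term = 0.5 * beta * (1.0 - beta) * (d @ np.linalg.solve(S, d))
    log_term = 0.5 * np.log(np.linalg.det(S)
                            / (np.linalg.det(S1)**beta * np.linalg.det(S2)**(1.0 - beta)))
    return quad_term + log_term

bound = lambda beta: P1**beta * P2**(1.0 - beta) * np.exp(-k(beta))
res = minimize_scalar(bound, bounds=(1e-3, 1.0 - 1e-3), method="bounded")
print(f"beta* = {res.x:.3f}, Chernoff bound: P(error) <= {res.fun:.4f}")
```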

09: ERROR BOUNDS BHATTACHARYYA BOUND

The Chernoff bound is loose for extreme values of $\beta$. The Bhattacharyya bound is obtained by setting $\beta = 0.5$:

$P(\text{error}) \le \sqrt{P(\omega_1) P(\omega_2)} \int \sqrt{p(x|\omega_1)\, p(x|\omega_2)}\, dx = \sqrt{P(\omega_1) P(\omega_2)}\, e^{-k(1/2)}$

where:

$k(1/2) = \frac{1}{8} (\mu_2 - \mu_1)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \ln \frac{\left|\frac{\Sigma_1 + \Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}$

These bounds can still be used if the distributions are not Gaussian (why? hint: maximum entropy). However, they might not be adequately tight.
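A corresponding sketch for the Bhattacharyya bound, reusing the same illustrative Gaussian parameters as in the Chernoff example above:

```python
import numpy as np

# Sketch: Bhattacharyya bound (beta = 1/2) for the same assumed Gaussian pair.
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[1.5, 0.0], [0.0, 0.5]])
P1, P2 = 0.5, 0.5  # assumed priors

S = 0.5 * (S1 + S2)
d = mu2 - mu1
k_half = 0.125 * (d @ np.linalg.solve(S, d)) \
         + 0.5 * np.log(np.linalg.det(S)
                        / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
print(f"Bhattacharyya bound: P(error) <= {np.sqrt(P1 * P2) * np.exp(-k_half):.4f}")
```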

09: ERROR BOUNDS RECEIVER OPERATING CHARACTERISTIC

How do we compare two decision rules if they require different thresholds for optimum performance? For a threshold $x^*$ on a scalar feature, consider four probabilities:

Hit: $P(x > x^* \mid \omega_2)$; False alarm: $P(x > x^* \mid \omega_1)$; Miss: $P(x < x^* \mid \omega_2)$; Correct rejection: $P(x < x^* \mid \omega_1)$.

Plotting the hit rate against the false-alarm rate as $x^*$ varies traces out the receiver operating characteristic (ROC) curve.

09: ERROR BOUNDS GENERAL ROC CURVES

An ROC curve is typically monotonic but need not be symmetric. One system can be considered superior to another only if its ROC curve lies above the competing system's curve over the operating region of interest.
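The sketch below traces points on an ROC curve for a hypothetical one-dimensional two-class Gaussian problem by sweeping the decision threshold $x^*$; the distribution parameters are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

# Sketch: sweep a threshold x* and record (false alarm, hit) pairs, i.e. points
# on the ROC curve. Class-conditional densities are assumed for illustration.
noise = norm(loc=0.0, scale=1.0)    # assumed p(x|omega_1)
signal = norm(loc=1.5, scale=1.0)   # assumed p(x|omega_2)

for x_star in np.linspace(-3.0, 5.0, 9):
    p_fa = noise.sf(x_star)    # false alarm: P(x > x* | omega_1)
    p_hit = signal.sf(x_star)  # hit: P(x > x* | omega_2)
    print(f"x* = {x_star:5.1f}   P(false alarm) = {p_fa:.3f}   P(hit) = {p_hit:.3f}")
```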

09: DISCRETE FEATURES INTEGRALS BECOME SUMS

For problems where the features are discrete, Bayes formula involves probabilities rather than densities:

$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\, P(\omega_j)}{P(x)}$, where $P(x) = \sum_{j} P(x \mid \omega_j)\, P(\omega_j)$.

The Bayes decision rule remains the same: decide $\omega_1$ if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, and $\omega_2$ otherwise. For a feature taking $N$ discrete values, the maximum-entropy distribution is the uniform distribution: $P(x = x_i) = 1/N$.
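A minimal sketch of the discrete-feature Bayes rule; the class-conditional tables and priors below are made-up values for illustration.

```python
# Sketch: Bayes rule for a discrete feature taking N = 3 values.
# The class-conditional tables and priors are made-up illustrative values.
P_x_given_w1 = [0.60, 0.30, 0.10]   # P(x = x_i | omega_1), sums to 1
P_x_given_w2 = [0.10, 0.30, 0.60]   # P(x = x_i | omega_2), sums to 1
P_w1, P_w2 = 0.7, 0.3               # assumed priors

for i in range(3):
    evidence = P_x_given_w1[i] * P_w1 + P_x_given_w2[i] * P_w2   # P(x = x_i)
    post_w1 = P_x_given_w1[i] * P_w1 / evidence
    decision = "omega_1" if post_w1 > 0.5 else "omega_2"
    print(f"x_{i}: P(omega_1 | x) = {post_w1:.3f} -> decide {decision}")
```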

09: DISCRETE FEATURES INDEPENDENT BINARY FEATURES

Consider independent binary features $x = (x_1, \ldots, x_d)^T$ with $x_i \in \{0, 1\}$, and let $p_i = P(x_i = 1 \mid \omega_1)$ and $q_i = P(x_i = 1 \mid \omega_2)$. Assuming conditional independence:

$P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}$ and $P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}$.

The likelihood ratio is:

$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}$

Taking logarithms gives a linear discriminant function:

$g(x) = \sum_{i=1}^{d} w_i x_i + w_0$, where $w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$ and $w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)}$; decide $\omega_1$ if $g(x) > 0$.
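A short sketch of this discriminant; the $p_i$, $q_i$, priors, and test vector are illustrative assumptions.

```python
import numpy as np

# Sketch: linear discriminant for conditionally independent binary features.
# g(x) = sum_i w_i x_i + w_0; decide omega_1 if g(x) > 0.
p = np.array([0.8, 0.7, 0.6])   # assumed p_i = P(x_i = 1 | omega_1)
q = np.array([0.3, 0.4, 0.5])   # assumed q_i = P(x_i = 1 | omega_2)
P1, P2 = 0.5, 0.5               # assumed priors

w = np.log(p * (1 - q) / (q * (1 - p)))                     # feature weights w_i
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P1 / P2)    # bias w_0

x = np.array([1, 0, 1])  # an example binary feature vector
g = w @ x + w0
print(f"g(x) = {g:.3f} -> decide {'omega_1' if g > 0 else 'omega_2'}")
```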