Lecture notes for Stat 231: Pattern Recognition and Machine Learning 3. Bayes Decision Theory: Part II. Prof. A.L. Yuille Stat 231. Fall 2004.

Bayes Decision Theory: Part II
1. Two-state case: bounds for the risk.
2. Multiple samples.
3. ROC curves and signal detection theory.

Two-State Case
Detect the state $y \in \{A, B\}$ from an observation $x$. Let the loss function pay a penalty of 1 for misclassification and 0 otherwise (0-1 loss). The risk then becomes the error rate, and the Bayes risk becomes the Bayes error. We want to put bounds on this error.
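The following is a minimal numerical sketch, not from the lecture notes: it checks that, under 0-1 loss, the average loss of the Bayes rule is just its misclassification rate. All parameters (priors, means, variance) are made-up illustrative values.

```python
# Minimal sketch (illustrative parameters): under 0-1 loss, the risk of a
# decision rule equals its misclassification probability. Here we estimate
# the Bayes error of the rule "decide the class with the larger posterior"
# by Monte Carlo, for two 1-D Gaussian class-conditional densities.
import numpy as np

rng = np.random.default_rng(0)
prior_a, mu_a, mu_b, sigma, n = 0.6, 0.0, 2.0, 1.0, 500_000

# Sample (y, x) pairs from the joint distribution.
y_is_a = rng.random(n) < prior_a
x = rng.normal(np.where(y_is_a, mu_a, mu_b), sigma)

# Bayes rule: decide A if P(A) p(x|A) > P(B) p(x|B). With equal variances
# the normalizing constants cancel, so unnormalized densities suffice.
def gauss(x, mu):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

decide_a = prior_a * gauss(x, mu_a) > (1 - prior_a) * gauss(x, mu_b)

# Average 0-1 loss = empirical error rate = Monte Carlo estimate of Bayes error.
print(f"Monte Carlo Bayes error: {np.mean(decide_a != y_is_a):.4f}")
```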

Error Bounds
Use bounds to estimate errors. Bayes error:
$$E = \int \min\{P(A)\,p(x \mid A),\; P(B)\,p(x \mid B)\}\, dx.$$
By the inequality $\min\{a, b\} \le a^{\lambda} b^{1-\lambda}$ for $a, b \ge 0$, we have
$$E \le P(A)^{\lambda} P(B)^{1-\lambda} \int p(x \mid A)^{\lambda}\, p(x \mid B)^{1-\lambda}\, dx, \quad \text{with } 0 \le \lambda \le 1.$$

Chernoff and Bhattacharyya Bounds
(I) The Bhattacharyya bound ($\lambda = 1/2$):
$$E \le \sqrt{P(A) P(B)}\;\rho, \quad \text{with Bhattacharyya coefficient } \rho = \int \sqrt{p(x \mid A)\, p(x \mid B)}\, dx.$$
(II) The Chernoff bound:
$$E \le P(A)^{\lambda^*} P(B)^{1-\lambda^*}\, e^{-C(p_A, p_B)}, \quad \text{with Chernoff information } C(p_A, p_B) = -\min_{0 \le \lambda \le 1} \log \int p(x \mid A)^{\lambda}\, p(x \mid B)^{1-\lambda}\, dx,$$
where $\lambda^*$ is the minimizing $\lambda$.

Chernoff and Bhattacharyya Bounds (continued)
The Chernoff bound is tighter than the Bhattacharyya bound. Both bounds are often good approximations; see Duda, Hart and Stork (pp. 44 and 48, Example 1). There is also a standard lower bound in terms of the Bhattacharyya coefficient:
$$E \ge \frac{1}{2}\left[1 - \sqrt{1 - 4 P(A) P(B)\, \rho^2}\right].$$
The Bhattacharyya and Chernoff quantities will reappear as exact (asymptotic) error rates when we consider many samples.
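To make the bounds concrete, here is a small numerical sketch, not from the notes; the two Gaussian class-conditional densities and the unequal priors are assumed for illustration. It computes the exact Bayes error by quadrature and compares it with the Bhattacharyya and Chernoff upper bounds.

```python
# Sketch (illustrative parameters): exact Bayes error vs. the Bhattacharyya
# and Chernoff bounds for two 1-D Gaussian class-conditional densities.
import numpy as np

mu_a, mu_b, sigma = 0.0, 2.0, 1.0
prior_a, prior_b = 0.7, 0.3                   # unequal priors (assumed)

def gauss(x, mu):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-10.0, 12.0, 200_001)         # dense grid for quadrature
pa, pb = gauss(x, mu_a), gauss(x, mu_b)

# Exact Bayes error: E = integral of min{P(A) p(x|A), P(B) p(x|B)}.
bayes_error = np.trapz(np.minimum(prior_a * pa, prior_b * pb), x)

# Bhattacharyya: E <= sqrt(P(A)P(B)) * rho, rho = integral sqrt(p(x|A) p(x|B)).
rho = np.trapz(np.sqrt(pa * pb), x)
bhatta = np.sqrt(prior_a * prior_b) * rho

# Chernoff: minimize P(A)^l P(B)^(1-l) * integral p(x|A)^l p(x|B)^(1-l) over l.
chernoff = min(
    prior_a**l * prior_b**(1 - l) * np.trapz(pa**l * pb**(1 - l), x)
    for l in np.linspace(0.01, 0.99, 99)
)

print(f"Bayes error {bayes_error:.5f} <= Chernoff {chernoff:.5f} "
      f"<= Bhattacharyya {bhatta:.5f}")
```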

Multiple Samples
We observe $N$ samples $x_1, \dots, x_N$, all from state $y = A$ or all from state $y = B$ (bombers or birds). Independence assumption:
$$p(x_1, \dots, x_N \mid y) = \prod_{i=1}^{N} p(x_i \mid y).$$

Multiple Samples (continued)
The prior becomes unimportant for large $N$, and the task becomes easier: the log-likelihood ratio test becomes
$$\sum_{i=1}^{N} \log \frac{p(x_i \mid A)}{p(x_i \mid B)} > T.$$
Gaussian example: $p(x \mid A) = \mathcal{N}(\mu_A, \sigma^2)$ and $p(x \mid B) = \mathcal{N}(\mu_B, \sigma^2)$. Then the test reduces to
$$\bar{x} > \frac{\mu_A + \mu_B}{2} + \frac{\sigma^2 T}{N(\mu_A - \mu_B)}, \quad \text{where } \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
(taking $\mu_A > \mu_B$), so the prior-dependent correction to the threshold shrinks like $1/N$.
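A short sketch of this point, with assumed numbers: the prior/loss threshold $T$ shifts the sample-mean test by $\sigma^2 T / (N(\mu_A - \mu_B))$, which vanishes as $N$ grows.

```python
# Sketch (illustrative parameters): how the prior-dependent shift of the
# sample-mean threshold vanishes as the number of samples N grows.
import numpy as np

mu_a, mu_b, sigma = 1.0, 0.0, 1.0
prior_a = 0.1                                  # assumed strongly unequal priors
T = np.log((1 - prior_a) / prior_a)            # threshold from priors, 0-1 loss

midpoint = (mu_a + mu_b) / 2
for n in [1, 10, 100, 1000]:
    shift = sigma**2 * T / (n * (mu_a - mu_b))
    print(f"N={n:4d}: decide A if sample mean > {midpoint + shift:.4f}")
```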

Probabilities of N Samples
Posterior distributions tend to Gaussians as $N$ grows (Central Limit Theorem; this assumes independence or semi-independence of the samples). [Figure: posterior distributions for N = 0, 1, 2, 3, 50, 200, shown left to right, top to bottom.]

Error Rates for Large N
The error rate $E(N)$ decreases exponentially with the number $N$ of samples:
$$E(N) \sim e^{-N\, C(p_A, p_B)},$$
where $C(p_A, p_B)$ is the Chernoff information defined above. Recall that for a single sample we have the bound $E(1) \le P(A)^{\lambda^*} P(B)^{1-\lambda^*}\, e^{-C(p_A, p_B)}$.
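A Monte Carlo sketch under assumed equal-variance Gaussian likelihoods, for which the Chernoff information has the closed form $C = (\mu_A - \mu_B)^2 / (8\sigma^2)$; the simulated error rate and $e^{-NC}$ should decay at the same exponential rate (they differ by a slowly varying prefactor).

```python
# Sketch (illustrative parameters): N-sample error rate vs. exp(-N * C),
# where C is the Chernoff information. For equal-variance Gaussians,
# C = (mu_a - mu_b)^2 / (8 sigma^2), attained at lambda = 1/2.
import numpy as np

rng = np.random.default_rng(1)
mu_a, mu_b, sigma, trials = 0.0, 1.0, 1.0, 500_000
C = (mu_a - mu_b) ** 2 / (8 * sigma**2)

for n in [1, 5, 10, 20, 40]:
    # Equal priors (T = 0): decide B if the mean of the N samples exceeds the
    # midpoint. Draw batches from class A; by symmetry this is the error rate.
    xbar = rng.normal(mu_a, sigma / np.sqrt(n), size=trials)
    err = np.mean(xbar > (mu_a + mu_b) / 2)
    print(f"N={n:2d}: error {err:.5f}   exp(-N*C) {np.exp(-n * C):.5f}")
```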

ROC Curves
Receiver Operating Characteristic (ROC) curves are more general than the Bayes risk. They allow us to compare the performance of a human observer to the Bayesian ideal on, e.g., a bright/dim light detection task. If the human does worse than the Bayes risk, this may reflect only a decision bias rather than poorer discrimination.

ROC Curves (continued)
For two-state problems, the Bayes decision rule is a log-likelihood ratio test:
$$\text{decide } A \text{ if } \log \frac{p(x \mid A)}{p(x \mid B)} > T,$$
where the threshold $T$ depends on the priors and the loss function. The observer may use the correct log-likelihood ratio but the wrong threshold: the observer's loss function may over-penalize false positives (trigger-shy) or false negatives (trigger-happy).

ROC Curves (continued)
The ROC curve plots the proportion of correct detections (hits) against the proportion of false positives as the threshold $T$ varies. Tracing it out experimentally requires altering the observer's loss function, e.g. with rewards (chocolate) and penalties (electric shocks). The ROC curve therefore gives information about the observer that is independent of his or her loss function.

ROC Curves (continued)
Plot hits against false positives. For $T$ large and positive we are at the bottom left of the curve; for $T$ large and negative, at the top right. The tangent to the curve is at 45 degrees at $T = 0$, since the slope of the ROC at the operating point with threshold $T$ is $e^{T}$.
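A sketch of the curve's geometry under the equal-variance Gaussian model discussed below (all parameters assumed): sweeping $T$ traces out (false positive rate, hit rate) pairs, with the endpoints matching the slide.

```python
# Sketch (illustrative parameters): trace ROC points by sweeping the
# threshold T of the log-likelihood ratio test for two equal-variance
# Gaussian likelihoods ("signal" A vs. "noise" B).
import numpy as np
from scipy.stats import norm

mu_a, mu_b, sigma = 1.0, 0.0, 1.0

for T in [4.0, 1.0, 0.0, -1.0, -4.0]:
    # With equal variances, LLR > T  <=>  x > x0(T).
    x0 = (mu_a + mu_b) / 2 + sigma**2 * T / (mu_a - mu_b)
    hit = norm.sf(x0, loc=mu_a, scale=sigma)   # P(decide A | A)
    fa = norm.sf(x0, loc=mu_b, scale=sigma)    # P(decide A | B)
    print(f"T={T:+.1f}: false positives {fa:.3f}, hits {hit:.3f}")
# T large & positive -> (0, 0) bottom left; T large & negative -> (1, 1).
```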

Example: Boundary Detection 1
The boundaries of objects usually occur where the image intensity gradient is large. [Figure: gradient magnitude (left) and object boundaries (right).]

Example: Boundary Detection 2
Learn the probability distributions of the intensity gradient on and off labeled edges: $p(|\nabla I| \mid \text{edge})$ and $p(|\nabla I| \mid \text{non-edge})$.

Example: Boundary Detection 3
Perform edge detection by a log-likelihood ratio test: label a pixel as an edge if
$$\log \frac{p(|\nabla I| \mid \text{edge})}{p(|\nabla I| \mid \text{non-edge})} > T.$$
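A sketch of this test on hypothetical data, not the lecture's implementation: the "edge" and "non-edge" gradient-magnitude histograms below are stand-ins, whereas in practice they would be learned from images with labeled boundaries.

```python
# Sketch (hypothetical data): edge detection by thresholding the
# log-likelihood ratio of gradient magnitude under learned "edge" and
# "non-edge" histograms.
import numpy as np

bins = np.linspace(0.0, 1.0, 33)              # gradient-magnitude bins
rng = np.random.default_rng(2)

# Stand-in learned densities: edges favor large gradients, non-edges small.
p_on = np.histogram(rng.beta(5, 2, 50_000), bins=bins, density=True)[0] + 1e-6
p_off = np.histogram(rng.beta(2, 5, 50_000), bins=bins, density=True)[0] + 1e-6
log_lr = np.log(p_on / p_off)                 # per-bin log-likelihood ratio

def detect_edges(grad_mag, T=0.0):
    """Label pixels whose gradient-magnitude LLR exceeds threshold T."""
    idx = np.clip(np.digitize(grad_mag, bins) - 1, 0, len(log_lr) - 1)
    return log_lr[idx] > T

grad_mag = rng.random((4, 4))                 # fake gradient-magnitude image
print(detect_edges(grad_mag, T=0.0))
```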

ROC Curves: Special Case
Special case: the likelihood functions are Gaussians with different means but the same variance, with discriminability $d' = |\mu_A - \mu_B| / \sigma$. This case is important in psychology; see Duda, Hart and Stork. The Bayes error can be computed from the ROC curve, and ROC curves distinguish between discriminability and decision bias.
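A small sketch of the standard signal-detection-theory computation (the operating point is made up): in the equal-variance Gaussian case, $d'$ can be read off a single (false alarm, hit) pair via inverse normal CDFs.

```python
# Sketch: recover discriminability d' from one (false-alarm, hit) pair,
# assuming equal-variance Gaussian likelihoods: d' = z(hit) - z(false alarm).
from scipy.stats import norm

hit_rate, fa_rate = 0.8, 0.2                  # made-up operating point
d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
print(f"d' = {d_prime:.3f}")                  # ~1.68 for this point
```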

Summary
Bounds on error rates for a single observation: the Bhattacharyya and Chernoff bounds. Multiple samples: error rates fall off exponentially with the number of samples, at a rate set by the Chernoff information. ROC curves and signal detection theory.