A Bootstrap Interval Estimator for Bayes' Classification Error
Chad M. Hawes (a,b), Carey E. Priebe (a)
(a) The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
(b) The Johns Hopkins University Applied Physics Laboratory


Abstract

Given a finite-length training set for the classifier, we propose a new estimation approach that provides an interval estimate of the Bayes-optimal classification error L*, by:
- assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier;
- constructing bootstrap-sampled training sets of varying size;
- evaluating the kNN classifier on these bootstrap training sets to estimate its unconditional error rate;
- fitting the resulting kNN error-rate decay, as a function of training-set size, to the assumed power-law form.
The standard kNN rule provides an upper bound on L*, and Hellman's (k,k') nearest neighbor rule with a reject option provides a lower bound on L*. The result is an asymptotic interval estimate of L* obtained from a finite sample. We apply this L* interval estimator to two classification datasets.

Motivation

Knowledge of the Bayes-optimal classification error L* tells us the best that any classification rule could do on a given classification problem:
- The difference between your classifier's error rate L_n and L* indicates how much improvement is possible by changing the classifier, for a fixed feature set.
- If L* is small and |L_n - L*| is large, it is worth spending time and money improving your classifier.
Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
- If L* is large and |L_n - L*| is small, it is better to spend time and money finding better features (changing F_XY) than improving your classifier.
An estimate of the Bayes error L* is therefore useful for guiding where to invest time and money: classifier improvement or feature development.

Model & Notation

- Training data: D_n = {(X_1, Y_1), ..., (X_n, Y_n)}, drawn i.i.d. from the joint distribution F_XY, with feature vector X_i in R^d and class label Y_i in {0, 1}.
- Testing data: T_m = {(X'_1, Y'_1), ..., (X'_m, Y'_m)}.
- From D_n we build the k-nearest neighbor (kNN) classification rule, denoted g_n^(k).
- Conditional probability of error for the kNN rule: finite-sample L_n^(k) = P(g_n^(k)(X) != Y | D_n); asymptotic L_inf^(k) = lim_{n -> inf} L_n^(k).
- Unconditional probability of error for the kNN rule: finite-sample E[L_n^(k)]; asymptotically this also converges to L_inf^(k).
- The empirical distribution of D_n puts mass 1/n on each of the n training samples.

Theory

- No approach to estimating the Bayes error can work for all joint distributions F_XY. Devroye 1982 [2]: for any fixed integer n, any eps > 0, and any classification rule g_n, there exists a distribution F_XY with Bayes error L* = 0 for which the expected error rate of g_n exceeds 1/2 - eps. Hence there must exist conditions on F_XY under which our technique applies.
- Asymptotic kNN-rule error rates form an interval bound on L*. Devijver 1979 [1]: for fixed k, L_inf^(k,k') <= L* <= L_inf^(k), where the lower bound is the asymptotic error rate of the (k,k') nearest neighbor rule with a reject option (Hellman 1970 [3]). If we can estimate these asymptotic rates from a finite sample, we have an interval estimate of L*.
- The kNN rule's unconditional error follows a known form for a class of distributions F_XY. Snapp & Venkatesh 1998 [5]: under regularity conditions on F_XY, the finite-sample unconditional error rate of the kNN rule, for fixed k, admits the asymptotic expansion E[L_n^(k)] = L_inf^(k) + sum_{j=2}^{N} c_j n^(-j/d) + O(n^(-(N+1)/d)). Hence a known parametric form exists for the kNN rule's error-rate decay.

Approach: Part 1

1. Construct B bootstrap-sampled training datasets of size n_j from D_n by sampling from the empirical distribution. For each bootstrap-constructed training dataset, estimate the kNN rule's conditional error rate on the test set T_m, yielding B error-rate estimates at size n_j.
2. Estimate the mean and variance of these B estimates for training sample size n_j: the mean provides an estimate of the unconditional error rate, and the variance is used for weighted fitting of the error-rate decay curve.
3. Repeat steps 1 and 2 for the desired training sample sizes n_1 < n_2 < ... < n_J, yielding J mean and variance estimates.
4. Construct the estimated unconditional error-rate decay curve versus training sample size n.
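To make Part 1 concrete, here is a minimal Python sketch, assuming NumPy arrays for the data and scikit-learn's KNeighborsClassifier as the kNN rule; the function name error_decay_curve and the defaults B = 50 and k = 5 are illustrative choices, not from the poster.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def error_decay_curve(X_train, y_train, X_test, y_test, sizes, B=50, k=5, seed=0):
    """Bootstrap estimates of the kNN error rate as a function of training-set size."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    means, variances = [], []
    for n_j in sizes:                      # each n_j should satisfy n_j >= k
        errs = np.empty(B)
        for b in range(B):
            # Draw n_j samples with replacement, i.e. from the empirical
            # distribution that puts mass 1/n on each training pair.
            idx = rng.integers(0, n, size=n_j)
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[idx], y_train[idx])
            # Conditional error rate of this bootstrap classifier on the test set T_m.
            errs[b] = np.mean(clf.predict(X_test) != y_test)
        means.append(errs.mean())          # estimate of the unconditional error rate at n_j
        variances.append(errs.var(ddof=1)) # used later as fitting weights
    return np.asarray(sizes, dtype=float), np.asarray(means), np.asarray(variances)

The per-size means trace out the error-rate decay curve of step 4, and the per-size variances supply the weights for the fit in Part 2.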
Approach: Part 2

1. Assume the kNN-rule error rates decay according to a simple power-law form (of the type a + b n^(-c), decaying to an asymptote a that corresponds to L_inf^(k)).
2. Perform a weighted nonlinear least-squares fit of this form to the constructed error-rate curve, using the variances of the bootstrapped conditional error-rate estimates as weights (a sketch of this fit appears after the references below).
3. The resulting fitted asymptote forms the upper-bound estimate for L*. The strong assumption on the form of the error-rate decay is what enables estimation of an asymptotic error rate from only a finite sample.
4. Repeat the entire procedure using Hellman's (k,k') nearest neighbor rule with a reject option to form the lower-bound estimate for L*. Together these yield the interval estimate for the Bayes classification error.

Results: PMH Distribution

The Priebe, Marchette & Healy (PMH) distribution [4] has known Bayes error L* (d = 6). Training-set size n = 2000; test-set size m = 2000. Poster figure: bootstrap estimates of the unconditional error rate (symbols) with the fitted decay curves and the resulting interval estimate for L*.

Results: Pima Indians

The UCI Pima Indian Diabetes dataset has unknown L* (d = 8). Training-set size n = 500; test-set size m = 268. Poster figure: bootstrap estimates of the unconditional error rate (symbols) with the fitted decay curves and the resulting interval estimate for L*.

References

[1] Devijver, P. "New error bounds with the nearest neighbor rule," IEEE Transactions on Information Theory, 25, 1979.
[2] Devroye, L. "Any discrimination rule can have an arbitrarily bad probability of error for finite sample size," IEEE Transactions on Pattern Analysis and Machine Intelligence, 4, 1982.
[3] Hellman, M. "The nearest neighbor classification rule with a reject option," IEEE Transactions on Systems Science and Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, and D. Healy. "Integrated sensing and processing decision trees," IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 2004.
[5] Snapp, R. and S. Venkatesh. "Asymptotic expansions of the k nearest neighbor risk," Annals of Statistics, 26, 1998.
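The following Python sketch (not part of the original poster) illustrates the weighted fit of Part 2 under the assumed decay form a + b * n**(-c); scipy.optimize.curve_fit and the names power_law and asymptotic_error are our own illustrative choices.

import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Assumed decay: error(n) = a + b * n**(-c); the asymptote a plays
    # the role of the infinite-sample error rate.
    return a + b * n ** (-c)

def asymptotic_error(sizes, means, variances):
    """Weighted nonlinear least-squares fit of the bootstrap error-rate curve;
    returns the fitted asymptote, an estimate of the asymptotic error rate."""
    sigma = np.sqrt(np.maximum(variances, 1e-8))   # bootstrap std. devs. as weights
    p0 = [max(means[-1], 1e-3), 1.0, 0.5]          # rough starting values
    (a_hat, _, _), _ = curve_fit(power_law, sizes, means, p0=p0,
                                 sigma=sigma, absolute_sigma=True,
                                 bounds=([0.0, 0.0, 0.0], [1.0, np.inf, np.inf]))
    return a_hat

Applying asymptotic_error to the decay curve of the standard kNN rule gives the upper-bound estimate of L*; repeating the whole procedure with error rates from Hellman's (k,k') rule with a reject option gives the lower bound, and the pair forms the interval estimate.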