Computational Statistics with Application to Bioinformatics (CS 395T)
Prof. William H. Press
Spring Term, 2008
The University of Texas at Austin

Unit 17: Classifier Performance: ROC, Precision-Recall, and All That

Unit 17: Classifier Performance: ROC, Precision-Recall, and All That (Summary)

The performance of a classifier is a 2x2 contingency table
– the "confusion matrix" of TP, FP, FN, TN
Most classifiers can be varied from "conservative" to "liberal"
– call a larger number of TPs, at the expense of also a larger number of FPs
– it's a one-parameter curve, so one classifier might dominate another, or there might be no clear ordering between them
There is a thicket of terminology
– TPR, FPR, PPV, NPV, FDR, accuracy
– sensitivity, specificity
– precision, recall
ROC plots TPR (y-axis) as a function of FPR (x-axis)
– runs monotonically from 0,0 to 1,1
– in practice convex, because any ROC curve can be trivially upgraded to its convex hull
– but can be misleading when the numbers of actual P's and N's are very different
Precision-Recall plots are designed to be useful in just that case
– you can go back and forth between Precision-Recall and ROC curves
– if a classifier dominates in one, it dominates in the other
PPV vs. NPV is another equivalent performance measure

A (binary) classifier classifies data points as + or −. If we also know the true classification, the performance of the classifier is a 2x2 contingency table, in this application usually called a confusion matrix:

                   actually +                actually −
classified +   TP (good!)                FP (bad! Type I error)
classified −   FN (bad! Type II error)   TN (good!)

As we saw, this kind of table has many other uses: treatment vs. outcome, clinical test vs. diagnosis, etc.
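As a concrete illustration of tallying such a table, here is a minimal MATLAB/Octave sketch; the label and prediction vectors are invented for the example, not taken from the slides.

% Minimal sketch (made-up data): tally the 2x2 confusion matrix from
% true labels (+1/-1) and a classifier's predicted labels (+1/-1).
truth = [ 1  1  1 -1 -1 -1 -1 -1];   % hypothetical ground truth
pred  = [ 1  1 -1  1 -1 -1 -1 -1];   % hypothetical classifier output

TP = sum(pred ==  1 & truth ==  1);  % true positives
FP = sum(pred ==  1 & truth == -1);  % false positives (Type I errors)
FN = sum(pred == -1 & truth ==  1);  % false negatives (Type II errors)
TN = sum(pred == -1 & truth == -1);  % true negatives

confusion = [TP FP; FN TN]           % rows: classified +/-; columns: actually +/-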

Most classifiers have a "knob" or threshold that you can adjust: how certain do they have to be before they classify a "+"? To get more TP's, you have to let in some FP's!

[Cartoon, not literal: as the knob moves from "more conservative" to "more liberal", TP and FP grow at the expense of FN and TN.]

Notice there is just one free parameter; think of it as TP, since
FP(TP) = [given by algorithm]
TP + FN = P (fixed number of actual positives, column marginal)
FP + TN = N (fixed number of actual negatives, column marginal)
So all scalar measures of performance are functions of one free parameter (i.e., curves). And the points on any such curve are in 1-to-1 correspondence with those on any other such curve.

If you ranked some classifiers by how good they are, you might get different rankings at different points on the scale. On the other hand, one classifier might dominate another at all points on the scale.
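To make the one-free-parameter idea concrete, here is a hypothetical sketch (scores and labels are invented) that sweeps the decision threshold and records one (FPR, TPR) operating point per setting.

% Hypothetical sketch: sweep the decision threshold of a scoring classifier
% from conservative (high threshold) to liberal (low threshold).
scores = [0.95 0.85 0.70 0.60 0.55 0.40 0.30 0.10];   % made-up classifier scores
truth  = [  1    1   -1    1   -1   -1    1   -1  ];   % made-up ground truth
P = sum(truth == 1);  N = sum(truth == -1);

thresholds = sort(unique(scores), 'descend');
tpr = zeros(size(thresholds));  fpr = zeros(size(thresholds));
for k = 1:numel(thresholds)
    pred   = 2*(scores >= thresholds(k)) - 1;          % +1 if score clears the threshold
    tpr(k) = sum(pred == 1 & truth ==  1) / P;         % TP / P
    fpr(k) = sum(pred == 1 & truth == -1) / N;         % FP / N
end
[fpr(:) tpr(:)]    % one row per knob setting, conservative (top) to liberal (bottom)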

Terminology used to measure the performance of classifiers

Different combinations of ratios have been given various names. All vary between 0 and 1. A performance curve picks one as the independent variable and looks at another as the dependent variable.

[Figure: each named quantity drawn on the color-coded confusion matrix. The dark color is the numerator; the dark plus light color together are the denominator. Blue parameters: 1 is good. Red parameters: 0 is good; each is "one minus" a corresponding blue one.]
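Since the color-coded figure is not reproduced here, a small sketch of the standard definitions may help; the counts are those of the toy Alzheimer's example used later in this unit (FPR = 0.15, TPR = 0.85, P = 100, N = 9900), but any counts would do.

% Standard definitions of the named ratios, computed from confusion-matrix counts.
TP = 85; FP = 1485; FN = 15; TN = 8415;
P = TP + FN;  N = FP + TN;

TPR         = TP/P              % true positive rate = sensitivity = recall
FPR         = FP/N              % false positive rate = 1 - specificity
specificity = TN/N              % true negative rate
PPV         = TP/(TP+FP)        % positive predictive value = precision
NPV         = TN/(TN+FN)        % negative predictive value
FDR         = FP/(TP+FP)        % false discovery rate = 1 - PPV
accuracy    = (TP+TN)/(P+N)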

ROC ("Receiver Operating Characteristic") curves plot TPR vs. FPR as the classifier goes from "conservative" to "liberal".

[Figure: three ROC curves. Blue dominates red and green; neither red nor green dominates the other.]

You could get the best of the red and green curves by making a hybrid or "Frankenstein" classifier that switches between strategies at the cross-over points.
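A hypothetical sketch of the dominance check (the two curves are invented, not the ones in the figure): one classifier dominates another if its TPR is at least as large at every FPR.

% Hypothetical sketch: two made-up ROC curves on a common FPR grid, and a
% check for dominance (TPR of one >= TPR of the other everywhere).
fprGrid  = linspace(0, 1, 101);
tprRed   = min(1, 2*fprGrid);      % invented "red" classifier
tprGreen = sqrt(fprGrid);          % invented "green" classifier

redDominatesGreen = all(tprRed   >= tprGreen)   % 0: red does not dominate green
greenDominatesRed = all(tprGreen >= tprRed)     % 0: green does not dominate red
plot(fprGrid, tprRed, 'r', fprGrid, tprGreen, 'g'); xlabel('FPR'); ylabel('TPR');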

ROC curves can always be "upgraded" to their convex hull by replacing any concave portions by a random sample:

List the points classified as + by B but not by A. Start up the curve to A. When you reach A, start adding a fraction of them (increasing from 0 to 1) at random, until you reach B. Continue on the curve from B.

Using data with known "ground truth" answers, you can find what "knob" settings correspond to A and B. Then you can apply the convex classifier to cases where you don't know the answers.
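A minimal sketch of the randomization idea, with hypothetical operating points A and B: deciding a random fraction q of cases with setting B (and the rest with setting A) realizes any point on the straight chord between them, which is what traces out the convex hull.

% Minimal sketch (hypothetical operating points): random switching between
% knob settings A and B realizes any point on the line segment joining them.
A = [0.10 0.60];          % [FPR TPR] at setting A (invented)
B = [0.40 0.90];          % [FPR TPR] at setting B (invented)
for q = 0:0.25:1          % fraction of cases decided with setting B
    opPoint = (1-q)*A + q*B;      % expected [FPR TPR] of the blended classifier
    fprintf('q = %.2f  ->  FPR = %.3f, TPR = %.3f\n', q, opPoint(1), opPoint(2));
end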

Since ROC curves don't explicitly show any dependence on the constant P/N (the ratio of actual + to − in the sample), they can be misleading if you care about FP versus TP.

Suppose you have a test for Alzheimer's whose false positive rate can be varied from 5% to 25% as the false negative rate varies from 25% to 5% (suppose linear dependences on both):

lam = (0:0.01:1);
fpr = 0.05 + 0.20*lam;        % FPR runs linearly from 5% to 25%
tpr = 1 - (0.25 - 0.20*lam);  % FNR runs linearly from 25% to 5%
fpr(1) = 0; fpr(end) = 1;     % pin the endpoints of the ROC curve at (0,0) and (1,1)
tpr(1) = 0; tpr(end) = 1;
plot(fpr,tpr)

At the middle of the range, FPR = 0.15 and TPR = 0.85.

Now suppose you try the test on a population of 10,000 people, 1% of whom actually are Alzheimer's positive: FP swamps TP by ~17:1. You'll be telling 17 people that they might have Alzheimer's for every one who actually does. It is unlikely that your test will be used. In a case like this, ROC, while correct, somewhat misses the point.
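A quick check of the ~17:1 figure at the FPR = 0.15, TPR = 0.85 operating point, using the numbers stated above:

% Check the ~17:1 claim with 1% prevalence in a population of 10,000.
P = 100; N = 9900;
FP = 0.15*N          % 1485 people incorrectly told they may have Alzheimer's
TP = 0.85*P          % 85 people correctly flagged
FP/TP                % about 17.5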

Precision-Recall curves overcome this issue by comparing TP with FN and FP.

Continuing our toy example (note that P = 100 and N = 9900 now enter explicitly):

prec = tpr*100./(tpr*100+fpr*9900);   % precision = TP/(TP+FP), with P = 100, N = 9900
prec(1) = prec(2);                    % fix up 0/0 at the pinned (0,0) endpoint
reca = tpr;                           % recall is just TPR
plot(reca,prec)

The precision is never better than about 0.13. By the way, this shape of "cliff" is what the ROC convexity constraint looks like in a Precision-Recall plot. It's not very intuitive.
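As a check of that ceiling (computed here, not quoted from the slide), and of the precision at the FPR = 0.15, TPR = 0.85 operating point, which reappears later as PPV:

% Check: best precision anywhere on the toy curve, and precision at the
% marked operating point FPR = 0.15, TPR = 0.85.
max(prec)                          % about 0.13
0.85*100/(0.85*100 + 0.15*9900)    % about 0.054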

For fixed marginals P, N the points on the ROC curve are in 1-to-1 correspondence with the points on the Precision-Recall curve. That is, both display the same information. You can go back and forth.

Precision and recall from TPR and FPR:
    pre = TPR·P / (TPR·P + FPR·N),    rec = TPR

TPR and FPR from precision and recall:
    TPR = rec,    FPR = (rec·(1 − pre)/pre)·(P/N)

It immediately follows that if one curve dominates another in ROC space, it also dominates in Precision-Recall space. (Because a crossing in one implies a crossing in the other, by the above equations.) But for curves that cross, the metrics in one space don't easily map to the other. For example, people sometimes use "area under the ROC curve". This doesn't correspond to "area under the Precision-Recall curve", or to anything simple.
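A minimal sketch of the two conversions as MATLAB function handles (the handle names are mine, not from the slide), checked at the toy operating point:

% Sketch of the back-and-forth conversion for fixed marginals P and N.
P = 100; N = 9900;

pre_from_roc = @(tpr, fpr) (tpr*P) ./ (tpr*P + fpr*N);   % precision from (TPR, FPR)
rec_from_roc = @(tpr, fpr) tpr;                          % recall is just TPR
fpr_from_pr  = @(pre, rec) rec .* (1 - pre) ./ pre * (P/N);
tpr_from_pr  = @(pre, rec) rec;

pre = pre_from_roc(0.85, 0.15)                    % ~0.054
rec = rec_from_roc(0.85, 0.15)                    % 0.85
[tpr_from_pr(pre, rec), fpr_from_pr(pre, rec)]    % recovers [0.85, 0.15]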

One also sees PPV and NPV used (more often as a pair of numbers than as a curve).

PPV: given a positive test, how often does the patient have the disease?
NPV: given a negative test, how often is the patient disease-free?

For the toy example (FPR = 0.15, TPR = 0.85, P = 100, N = 9900): PPV ≈ 0.054, NPV ≈ 0.998.

It's easy to get from PPV, NPV to ROC, or vice versa. Or, for that matter, to any other of the parameterizations. In Mathematica, for example:

[Mathematica worksheet not reproduced in this transcript.]
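As a stand-in (in MATLAB rather than Mathematica, and assuming the marginals P and N are known, as on the previous slides), here is one way to recover TPR and FPR from a (PPV, NPV) pair by solving a 2x2 linear system for TP and TN:

% Sketch: recover TPR and FPR from (PPV, NPV) when P and N are known.
% From  PPV = TP/(TP+FP) with FP = N-TN:   (1-PPV)*TP + PPV*TN     = PPV*N
% From  NPV = TN/(TN+FN) with FN = P-TP:   NPV*TP     + (1-NPV)*TN = NPV*P
P = 100;  N = 9900;                 % toy-example marginals
ppv = 85/1570;  npv = 8415/8430;    % toy-example PPV and NPV, as exact fractions

A = [1-ppv,  ppv  ;
     npv,    1-npv];
b = [ppv*N ; npv*P];
x = A\b;                            % x = [TP; TN]
TP = x(1);  TN = x(2);
TPR = TP/P                          % recovers 0.85
FPR = (N-TN)/N                      % recovers 0.15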