
ROC Statistics for the Lazy Machine Learner in All of Us Bradley Malin Lecture for COS Lab School of Computer Science Carnegie Mellon University 9/22/2005

Why Should I Care? Imagine you have two different probabilistic classification models (e.g., logistic regression vs. a neural network). How do you know which one is better? How do you communicate your belief? Can you provide quantitative evidence beyond a gut feeling and subjective interpretation?

Recall Basics: Contingencies (2 x 2 contingency table)

GOLD STANDARD TRUTH \ MODEL PREDICTED    It's NOT a Heart Attack    Heart Attack!!!
Was NOT a Heart Attack                              A                      B
Was a Heart Attack                                  C                      D

Some Terms

GOLD STANDARD TRUTH \ MODEL PREDICTED    NO EVENT              EVENT
NO EVENT                                 TRUE NEGATIVE (A)     B
EVENT                                    C                     TRUE POSITIVE (D)

Some More Terms

GOLD STANDARD TRUTH \ MODEL PREDICTED    NO EVENT                               EVENT
NO EVENT                                 A                                      FALSE POSITIVE (Type 1 Error) (B)
EVENT                                    FALSE NEGATIVE (Type 2 Error) (C)      D

Accuracy What does this mean? What is the difference between "accuracy" and an "accurate prediction"? Contingency table interpretation:

Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)

Is this a good measure? (Why or why not?)
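To make the contingency-table bookkeeping concrete, here is a minimal sketch in Python/NumPy with made-up labels and predictions (not the lecture's data): it counts the four cells and computes accuracy.

```python
import numpy as np

# Hypothetical gold-standard truth and hard model predictions (1 = event, 0 = no event)
truth = np.array([1, 0, 1, 1, 0, 0, 1, 0])
pred  = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((pred == 1) & (truth == 1))   # true positives  (cell D)
tn = np.sum((pred == 0) & (truth == 0))   # true negatives  (cell A)
fp = np.sum((pred == 1) & (truth == 0))   # false positives (cell B, Type 1 error)
fn = np.sum((pred == 0) & (truth == 1))   # false negatives (cell C, Type 2 error)

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}  accuracy={accuracy:.2f}")
```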

Note on Discrete Classes TRADITION … Show a contingency table when reporting the predictions of a model. BUT … probabilistic models do not provide discrete counts for the matrix cells!!! IN OTHER WORDS … Regression does not report the number of individuals predicted positive (e.g., has a heart attack) … well, not really. INSTEAD … it reports the probability that the output takes a certain value (e.g., 1 or 0).
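As a small illustration of that point (a sketch with made-up probabilities, not the lecture's data): the contingency table only appears once you commit to a threshold and turn probabilities into discrete predictions.

```python
import numpy as np

# Hypothetical predicted probabilities from a probabilistic model (e.g., logistic regression)
probs = np.array([0.92, 0.15, 0.48, 0.73, 0.05, 0.61, 0.33, 0.88])

threshold = 0.5                                  # an arbitrary choice; the table depends on it
hard_pred = (probs >= threshold).astype(int)     # 1 = event predicted, 0 = no event
print(hard_pred)                                 # [1 0 0 1 0 1 0 1]
```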

Visual Perspective (figures omitted from transcript)

ROC Curves Originated from signal detection theory – a binary signal corrupted by Gaussian noise – what is the optimal threshold (i.e., operating point)? Dependence on 3 factors – signal strength – noise variance – personal tolerance for the hit / false alarm rate.
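To connect the signal-detection story to the curve, here is a hedged sketch using assumed Gaussian parameters and SciPy's normal distribution (none of these numbers come from the lecture): each threshold on the score axis gives one hit rate / false-alarm rate pair, and sweeping the threshold traces the ROC curve.

```python
import numpy as np
from scipy.stats import norm

# Assumed signal-detection setup: scores are Gaussian under each hypothesis
noise_mean, signal_mean, sigma = 0.0, 1.5, 1.0    # signal strength and noise spread are made up

thresholds = np.linspace(-3, 5, 9)
hit_rate = norm.sf(thresholds, loc=signal_mean, scale=sigma)          # P(score > t | signal) = TPR
false_alarm_rate = norm.sf(thresholds, loc=noise_mean, scale=sigma)   # P(score > t | noise)  = FPR

# Each threshold is one operating point; plotting hit rate vs. false-alarm rate gives the ROC curve
for t, tpr, fpr in zip(thresholds, hit_rate, false_alarm_rate):
    print(f"t={t:+.1f}  hit rate={tpr:.3f}  false alarm rate={fpr:.3f}")
```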

ROC Curves Receiver operating characteristic. Summarizes & presents the performance of any binary classification model. Measures the model's ability to distinguish between false & true positives.

Use Multiple Contingency Tables Sample contingency tables over a range of thresholds/probabilities.

TRUE POSITIVE RATE (also called SENSITIVITY) = True Positives / (True Positives + False Negatives)

FALSE POSITIVE RATE (also called 1 - SPECIFICITY) = False Positives / (False Positives + True Negatives)

Plot Sensitivity vs. (1 – Specificity) for each sampled threshold and you are done.
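A minimal sketch of that recipe, with hypothetical truth labels and model probabilities (not the lecture's dataset): build one contingency table per threshold and collect the (FPR, TPR) pairs.

```python
import numpy as np

# Hypothetical gold-standard labels and predicted probabilities for one model
truth = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
probs = np.array([0.95, 0.80, 0.70, 0.65, 0.55, 0.45, 0.40, 0.30, 0.25, 0.10])

def roc_points(truth, probs, thresholds):
    """One contingency table per threshold -> one (FPR, TPR) operating point per threshold."""
    points = []
    for t in thresholds:
        pred = (probs >= t).astype(int)
        tp = np.sum((pred == 1) & (truth == 1))
        fn = np.sum((pred == 0) & (truth == 1))
        fp = np.sum((pred == 1) & (truth == 0))
        tn = np.sum((pred == 0) & (truth == 0))
        points.append((fp / (fp + tn),    # false positive rate = 1 - specificity
                       tp / (tp + fn)))   # true positive rate  = sensitivity
    return points

for fpr, tpr in roc_points(truth, probs, np.arange(0.0, 1.01, 0.1)):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```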

Data-Centric Example (table of per-instance TRUTH labels with LOGISTIC and NEURAL scores omitted from transcript)

ROC Rates (table omitted from transcript: TP-Rate and FP-Rate at each THRESHOLD for the LOGISTIC REGRESSION and NEURAL NETWORK models)

ROC Plot (ROC curves for the LOGISTIC and NEURAL models omitted from transcript)

Sidebar: Use More Samples (These are plots from a much larger dataset – See Malin 2005)

ROC Quantification Area Under the ROC Curve – use quadrature to calculate the area – e.g., the trapz (trapezoidal rule) function in Matlab will work. Example – the "Neural Network" model appears to be better.

AREA UNDER ROC CURVE
LOGISTIC    (value missing from transcript)
NEURAL      0.7679
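Here is a minimal sketch of that area calculation over hypothetical operating points, in plain NumPy; the trapezoidal rule used below is the same one Matlab's trapz implements.

```python
import numpy as np

# Hypothetical ROC operating points, sorted by increasing false positive rate
fpr = np.array([0.0, 0.0, 0.2, 0.2, 0.4, 0.6, 0.8, 1.0])
tpr = np.array([0.0, 0.4, 0.4, 0.8, 0.8, 1.0, 1.0, 1.0])

# Trapezoidal rule: sum of trapezoid areas between consecutive operating points
auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0)
print(f"Area under ROC curve = {auc:.4f}")   # 0.82 for these made-up points
```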

Theory: Model Optimality Classifiers on the ROC convex hull are always "optimal" – e.g., the Neural Net & Decision Tree. Classifiers below the convex hull are always "suboptimal" – e.g., Naïve Bayes. (Figure showing the Naïve Bayes, Decision Tree, and Neural Net operating points omitted from transcript.)

Building Better Classifiers Classifiers on the convex hull can be combined to form a strictly dominant hybrid classifier – an ordered sequence of classifiers – can be converted into a "ranker". (Figure with the Decision Tree and Neural Net curves omitted from transcript.)
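As a rough illustration of the convex-hull idea, here is a sketch over made-up operating points (not the lecture's classifiers): the upper convex hull of the (FPR, TPR) points picks out the classifiers that can be optimal, and everything strictly below it is dominated.

```python
def roc_convex_hull(points):
    """Upper convex hull of ROC operating points (FPR, TPR) - a minimal sketch."""
    # Include the trivial classifiers: predict-nothing (0, 0) and predict-everything (1, 1)
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment from hull[-2] to p
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Hypothetical operating points from several classifiers; (0.5, 0.7) is dominated and drops out
print(roc_convex_hull([(0.1, 0.5), (0.3, 0.85), (0.5, 0.7), (0.6, 0.95)]))
```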

Some Statistical Insight Curve Area: – take a random healthy patient → score X – take a random heart attack patient → score Y – the area is an estimate of P[Y > X]. The slope of the curve is equal to the likelihood ratio: P(score | Signal) / P(score | Noise). The ROC graph captures all information in the contingency table – the false negative & true negative rates are the complements of the true positive & false positive rates, respectively.
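The P[Y > X] interpretation can be checked directly by comparing scores across all (heart attack, healthy) pairs, which is the Mann-Whitney view mentioned later in the deck. A minimal sketch with simulated scores (everything here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model scores: X for healthy patients, Y for heart-attack patients
healthy_scores = rng.normal(loc=0.3, scale=0.15, size=200)   # X
event_scores   = rng.normal(loc=0.6, scale=0.15, size=200)   # Y

# Fraction of (event, healthy) pairs with Y > X (ties, if any, count one half)
diffs = event_scores[:, None] - healthy_scores[None, :]
p_y_gt_x = (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size
print(f"Estimated P[Y > X] = {p_y_gt_x:.4f}")   # equals the area under this model's ROC curve
```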

Can Always Quantify the Best Operating Point When misclassification costs are equal, the best operating point is … where the 45° tangent to the curve lies closest to the (0, 1) coordinate. Verify this mathematically (economic interpretation). Why?
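One simple way to operationalize "closest to (0, 1)" is to measure each operating point's distance to that corner and take the minimum. This is a common heuristic rather than the exact tangent construction on the slide, and the numbers below are hypothetical.

```python
import numpy as np

# Hypothetical operating points and the thresholds that produced them
thresholds = np.array([0.9, 0.7, 0.5, 0.3, 0.1])
fpr        = np.array([0.05, 0.15, 0.30, 0.55, 0.80])
tpr        = np.array([0.40, 0.70, 0.88, 0.92, 0.98])

# With equal misclassification costs, pick the point nearest the perfect corner (0, 1)
dist_to_corner = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)
best = np.argmin(dist_to_corner)
print(f"Best threshold = {thresholds[best]}  (FPR={fpr[best]}, TPR={tpr[best]})")
```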

Quick Question Are ROC curves always appropriate? Subjective operating points? You must weigh the tradeoffs between false positives and false negatives – the ROC curve plot is independent of the class distribution and error costs. This leads into utility theory (not touching this today).

Much Much More on ROC Oh, if only I had more time. You should also look up and learn about: – Iso-accuracy lines – Skew distributions and why the 45° line isn't always "best" – Convexity vs. non-convexity vs. concavity – Mann-Whitney-Wilcoxon sum of ranks – Gini coefficient – Calibrated thresholds – Averaging ROC curves – Precision-Recall (THIS IS VERY IMPORTANT) – Cost curves

Some References Good Bibliography:

Drummond C and Holte R. What ROC curves can and can't do (and cost curves can). In Proceedings of the Workshop on ROC Analysis in AI, in conjunction with the European Conference on AI. Valencia, Spain.

Malin B. Probabilistic prediction of myocardial infarction: logistic regression versus simple neural networks. Data Privacy Lab Working Paper WP-25, School of Computer Science, Carnegie Mellon University. Sept 2005.

McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making. 1984; 4.

Provost F and Fawcett T. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. Madison, Wisconsin. 1998.

Swets J. Measuring the accuracy of diagnostic systems. Science. 1988; 240(4857). (Based on his 1967 book Information Retrieval Systems.)