Notes on Measuring the Performance of a Binary Classifier David Madigan.

Slides:



Advertisements
Similar presentations
Classification.. continued. Prediction and Classification Last week we discussed the classification problem.. – Used the Naïve Bayes Method Today..we.
Advertisements

...visualizing classifier performance in R Tobias Sing, Ph.D. (joint work with Oliver Sander) Modeling & Simulation Novartis Pharma AG 3 rd BaselR meeting.
...visualizing classifier performance Tobias Sing Dept. of Modeling & Simulation Novartis Pharma AG Joint work with Oliver Sander (MPI for Informatics,
Brief introduction on Logistic Regression
Discrete Choice Modeling William Greene Stern School of Business IFS at UCL February 11-13, 2004
Chapter 5 – Evaluating Classification & Predictive Performance
1 BUS 297D: Data Mining Professor David Mease Lecture 8 Agenda: 1) Reminder about HW #4 (due Thursday, 10/15) 2) Lecture over Chapter 10 3) Discuss final.
Logistic Regression.
Lecture 16: Logistic Regression: Goodness of Fit Information Criteria ROC analysis BMTRY 701 Biostatistical Methods II.
ROC Statistics for the Lazy Machine Learner in All of Us Bradley Malin Lecture for COS Lab School of Computer Science Carnegie Mellon University 9/22/2005.
Model Evaluation Metrics for Performance Evaluation
Cost-Sensitive Classifier Evaluation Robert Holte Computing Science Dept. University of Alberta Co-author Chris Drummond IIT, National Research Council,
Flexible Metric NN Classification based on Friedman (1995) David Madigan.
ROC Curve and Classification Matrix for Binary Choice Professor Thomas B. Fomby Department of Economics SMU Dallas, TX February, 2015.
Chapter 9.2 ROC Curves How does this relate to logistic regression?
Lucila Ohno-Machado An introduction to calibration and discrimination methods HST951 Medical Decision Support Harvard Medical School Massachusetts Institute.
1 Relationships We have examined how to measure relationships between two categorical variables (chi-square) one categorical variable and one measurement.
Decision Tree Models in Data Mining
CSCI 347 / CS 4206: Data Mining Module 04: Algorithms Topic 06: Regression.
Face Detection CSE 576. Face detection State-of-the-art face detection demo (Courtesy Boris Babenko)Boris Babenko.
Computer Vision Lecture 8 Performance Evaluation.
Evaluating Classifiers
SPH 247 Statistical Analysis of Laboratory Data May 19, 2015SPH 247 Statistical Analysis of Laboratory Data1.
Performance measurement. Must be careful what performance metric we use For example, say we have a NN classifier with 1 output unit, and we code ‘1 =
Copyright © 2003, SAS Institute Inc. All rights reserved. Cost-Sensitive Classifier Selection Ross Bettinger Analytical Consultant SAS Services.
Linear Regression James H. Steiger. Regression – The General Setup You have a set of data on two variables, X and Y, represented in a scatter plot. You.
Logistic Regression Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata September 1, 2014.
Classification Performance Evaluation. How do you know that you have a good classifier? Is a feature contributing to overall performance? Is classifier.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
Machine learning system design Prioritizing what to work on
Model Evaluation. CRISP-DM CRISP-DM Phases Business Understanding – Initial phase – Focuses on: Understanding the project objectives and requirements.
Linear Discriminant Analysis and Its Variations Abu Minhajuddin CSE 8331 Department of Statistical Science Southern Methodist University April 27, 2002.
Model Evaluation l Metrics for Performance Evaluation –How to evaluate the performance of a model? l Methods for Performance Evaluation –How to obtain.
Linear Discriminant Analysis (LDA). Goal To classify observations into 2 or more groups based on k discriminant functions (Dependent variable Y is categorical.
Linear Methods for Classification Based on Chapter 4 of Hastie, Tibshirani, and Friedman David Madigan.
Linear Prediction Correlation can be used to make predictions – Values on X can be used to predict values on Y – Stronger relationships between X and Y.
1 Performance Measures for Machine Learning. 2 Performance Measures Accuracy Weighted (Cost-Sensitive) Accuracy Lift Precision/Recall –F –Break Even Point.
Evaluating Classification Performance
Quiz 1 review. Evaluating Classifiers Reading: T. Fawcett paper, link on class website, Sections 1-4 Optional reading: Davis and Goadrich paper, link.
Logistic Regression Saed Sayad 1www.ismartsoft.com.
CSE 5331/7331 F'07© Prentice Hall1 CSE 5331/7331 Fall 2007 Regression Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.
Evaluating Classifiers Reading: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)An introduction to ROC analysis.
SPH 247 Statistical Analysis of Laboratory Data. Binary Classification Suppose we have two groups for which each case is a member of one or the other,
Blackbox classifiers for preoperative discrimination between malignant and benign ovarian tumors C. Lu 1, T. Van Gestel 1, J. A. K. Suykens 1, S. Van Huffel.
Chapter 5 – Evaluating Predictive Performance Data Mining for Business Analytics Shmueli, Patel & Bruce.
Data Analytics CMIS Short Course part II Day 1 Part 4: ROC Curves Sam Buttrey December 2015.
Evaluating Classifiers. Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website)
2011 Data Mining Industrial & Information Systems Engineering Pilsung Kang Industrial & Information Systems Engineering Seoul National University of Science.
Model Evaluation Saed Sayad
Information Organization: Evaluation of Classification Performance.
David Housman for Math 323 Probability and Statistics Class 05 Ion Sensitive Electrodes.
Machine Learning – Classification David Fenyő
Evaluating Classifiers
Measuring Success in Prediction
(classification & regression trees)
LESSON 21: REGRESSION ANALYSIS
Probabilistic forecasts
Evaluating Classifiers (& other algorithms)
Logistic Regression.
For the data set classify_regress.dat, find w.
ROC Curves and Operating Points
Model Evaluation and Selection
Mathematical Foundations of BME
Prediction of in-hospital mortality after ruptured abdominal aortic aneurysm repair using an artificial neural network  Eric S. Wise, MD, Kyle M. Hocking,
Roc curves By Vittoria Cozza, matr
Evaluating Classifiers
—ROC curves for each simple test compared with NCS (gold standard) plotting the sensitivity versus 1-specificity (the false-positive rate) for different.
Machine Learning in Business John C. Hull
ROC Curves and Operating Points
Information Organization: Evaluation of Classification Performance
Presentation transcript:

Notes on Measuring the Performance of a Binary Classifier David Madigan

Training Data for a Logistic Regression Model

Test Data predicted probabilities actual outcome predicted outcome Suppose we use a cutoff of 0.5…

a b c d actual outcome predicted outcome More generally… misclassification rate: b + c a+b+c+d sensitivity: a a+ca+c (aka recall) specificity: d b+db+d predicitive value positive: a a+ba+b (aka precision)

actual outcome predicted outcome Suppose we use a cutoff of 0.5… sensitivity: = 100% specificity: = 75% 9 9+3

actual outcome predicted outcome Suppose we use a cutoff of 0.8… sensitivity: = 75% specificity: = 83%

Note there are 20 possible thresholds ROC computes sensitivity and specificity for all possible thresholds and plots them Note if threshold = minimum c=d=0 so sens=1; spec=0 If threshold = maximum a=b=0 so sens=0; spec=1 a b c d actual outcome

“Area under the curve” is a common measure of predictive performance So is squared error:  (y i -yhat) 2 also known as the “Brier Score”