
1 Performance Measures for Machine Learning

2 Performance Measures
- Accuracy
- Weighted (Cost-Sensitive) Accuracy
- Lift
- Precision/Recall
  – F
  – Break Even Point
- ROC
  – ROC Area

3 Accuracy
- Target: 0/1, -1/+1, True/False, …
- Prediction = f(inputs) = f(x): 0/1 or Real
- Threshold: f(x) > thresh => 1, else => 0
- threshold(f(x)): 0/1
- #right / #total
- p(“correct”): p(threshold(f(x)) = target)
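Below is a minimal sketch, in plain Python, of accuracy as defined on this slide; the function name and the scores/labels are made up for illustration.

```python
def accuracy(scores, targets, thresh=0.5):
    """p('correct'): fraction of cases where threshold(f(x)) equals the 0/1 target."""
    preds = [1 if s > thresh else 0 for s in scores]                     # threshold(f(x))
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)    # #right / #total

scores  = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]   # f(x), real-valued
targets = [1,   1,   0,   0,   1,   0]     # true 0/1 labels
print(accuracy(scores, targets, thresh=0.5))   # 4 right out of 6, ≈ 0.67
```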

4 Confusion Matrix

              Predicted 1   Predicted 0
  True 1           a             b
  True 0           c             d

a and d are correct; b and c are incorrect. The threshold determines which column a case falls in.
accuracy = (a+d) / (a+b+c+d)
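The a/b/c/d cells can be tallied directly from scores and targets. The sketch below (assumed helper name and made-up data, not from the slides) mirrors the matrix above; later examples reuse these four counts.

```python
def confusion_counts(scores, targets, thresh=0.5):
    """Return (a, b, c, d) for a given threshold."""
    a = b = c = d = 0
    for s, t in zip(scores, targets):
        pred = 1 if s > thresh else 0
        if   t == 1 and pred == 1: a += 1   # true 1 predicted 1
        elif t == 1 and pred == 0: b += 1   # true 1 predicted 0
        elif t == 0 and pred == 1: c += 1   # true 0 predicted 1
        else:                      d += 1   # true 0 predicted 0
    return a, b, c, d

a, b, c, d = confusion_counts([0.9, 0.8, 0.6, 0.4, 0.3, 0.1],
                              [1,   1,   0,   0,   1,   0])
print((a, b, c, d), "accuracy =", (a + d) / (a + b + c + d))   # (2, 1, 1, 2) accuracy ≈ 0.67
```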

5 Prediction Threshold

threshold > MAX(f(x)): all cases predicted 0

              Predicted 1   Predicted 0
  True 1           0             b
  True 0           0             d

(b+d) = total; accuracy = %False = %0’s

threshold < MIN(f(x)): all cases predicted 1

              Predicted 1   Predicted 0
  True 1           a             0
  True 0           c             0

(a+c) = total; accuracy = %True = %1’s
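A quick self-contained check of the two degenerate thresholds described above (made-up scores and labels):

```python
scores  = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
targets = [1,   1,   0,   0,   1,   0]     # three 1's, three 0's

for thresh in (1.1, -0.1):                 # above MAX(f(x)), below MIN(f(x))
    preds = [1 if s > thresh else 0 for s in scores]
    acc = sum(p == t for p, t in zip(preds, targets)) / len(targets)
    print(thresh, preds, acc)
# 1.1  -> all predicted 0, accuracy = %0's = 0.5
# -0.1 -> all predicted 1, accuracy = %1's = 0.5
```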

6 (figure: distribution of predictions; 18% 1’s in the data, 82% 0’s in the data, optimal threshold marked)

7 threshold demo

8 Problems with Accuracy
- Assumes equal cost for both kinds of errors
  – cost(b-type error) = cost(c-type error)
- Is 99% accuracy good?
  – can be excellent, good, mediocre, poor, terrible
  – depends on the problem
- Is 10% accuracy bad?
  – information retrieval
- BaseRate = accuracy of predicting the predominant class
  – (on most problems, obtaining BaseRate accuracy is easy)
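A small sketch of the BaseRate mentioned above, i.e. the accuracy obtained by always predicting the predominant class (the function name and data are illustrative assumptions):

```python
def base_rate(targets):
    """Accuracy of always predicting the predominant (majority) class."""
    p1 = sum(targets) / len(targets)   # fraction of 1's
    return max(p1, 1.0 - p1)

print(base_rate([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))   # 0.9: a classifier must beat this to be useful
```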

9 Percent Reduction in Error
- 80% accuracy = 20% error
- suppose learning increases accuracy from 80% to 90%: error is reduced from 20% to 10%, a 50% reduction in error
- 99.90% to 99.99% = 90% reduction in error
- 50% to 75% = 50% reduction in error
- can be applied to many other measures
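The percent-reduction-in-error arithmetic as a short sketch (the function name is an assumption; the printed values reproduce the examples above up to floating-point rounding):

```python
def pct_reduction_in_error(acc_before, acc_after):
    """Percent reduction in error when accuracy improves from acc_before to acc_after."""
    err_before, err_after = 1.0 - acc_before, 1.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before

print(pct_reduction_in_error(0.80, 0.90))      # ≈ 50.0
print(pct_reduction_in_error(0.9990, 0.9999))  # ≈ 90.0
print(pct_reduction_in_error(0.50, 0.75))      # ≈ 50.0
```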

10 Costs (Error Weights)

              Predicted 1   Predicted 0
  True 1          w_a           w_b
  True 0          w_c           w_d

Often w_a = w_d = zero and w_b ≠ w_c ≠ zero.
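A sketch of applying these error weights to the a/b/c/d counts to get a total cost; the particular weight values below are illustrative assumptions, not values from the slides:

```python
def total_cost(a, b, c, d, w_a=0.0, w_b=1.0, w_c=5.0, w_d=0.0):
    """Weighted error cost over the confusion-matrix cells."""
    # often w_a = w_d = 0 (correct predictions cost nothing) and w_b != w_c
    return w_a * a + w_b * b + w_c * c + w_d * d

# false positives (c) assumed five times as costly as false negatives (b)
print(total_cost(a=40, b=10, c=5, d=45))   # 10*1.0 + 5*5.0 = 35.0
```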


13 Lift
- not interested in accuracy on the entire dataset
- want accurate predictions for 5%, 10%, or 20% of the dataset
- don’t care about the remaining 95%, 90%, 80%, respectively
- typical application: marketing
- how much better than random prediction on the fraction of the dataset predicted true (f(x) > threshold)

14 Lift

              Predicted 1   Predicted 0
  True 1           a             b
  True 0           c             d

The threshold sets how many cases fall in the Predicted 1 column.
lift = [a / (a+b)] / [(a+c) / (a+b+c+d)]
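A sketch of lift computed from the counts above (assumed function name): the recall achieved on the predicted-true fraction, divided by the size of that fraction, i.e. how much better than random selection the model does on the cases it predicts 1.

```python
def lift(a, b, c, d):
    recall        = a / (a + b)                 # fraction of all true 1's that were predicted 1
    frac_selected = (a + c) / (a + b + c + d)   # fraction of the dataset predicted 1
    return recall / frac_selected

# mail 20% of 100 customers (a+c = 20) and capture 14 of the 20 true responders (a+b = 20)
print(lift(a=14, b=6, c=6, d=74))   # 0.7 / 0.2 = 3.5
```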

15 (lift chart) lift = 3.5 if mailings are sent to 20% of the customers

16 Lift and Accuracy do not always correlate well (plots for Problem 1 and Problem 2; thresholds arbitrarily set at 0.5 for both lift and accuracy)

17 Precision and Recall
- typically used in document retrieval
- Precision:
  – how many of the returned documents are correct
  – precision(threshold)
- Recall:
  – how many of the positives does the model return
  – recall(threshold)
- Precision/Recall Curve: sweep thresholds

18 Precision/Recall

              Predicted 1   Predicted 0
  True 1           a             b
  True 0           c             d

precision = a / (a+c)    recall = a / (a+b)
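Precision and recall at a fixed threshold, as a sketch over the a/b/c/d counts (assumed function name and example counts):

```python
def precision_recall(a, b, c, d):
    precision = a / (a + c) if (a + c) else 0.0   # how many of the returned are correct
    recall    = a / (a + b) if (a + b) else 0.0   # how many of the positives are returned
    return precision, recall

print(precision_recall(a=14, b=6, c=6, d=74))   # (0.7, 0.7)
```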


20 Summary Stats: F & BreakEvenPt
- F: harmonic average of precision and recall
  – F = 2 · precision · recall / (precision + recall)
- BreakEvenPt: the point on the precision/recall curve where precision = recall
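A sketch of both summary statistics (assumed names; the break-even point is found here by a simple sweep over the observed scores, which is one reasonable way to do it, not necessarily the slides' method):

```python
def f_measure(precision, recall):
    """Harmonic average of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def break_even_point(scores, targets):
    """Precision (= recall) at the swept threshold where the two are closest."""
    best_gap, best_value = None, None
    for thresh in sorted(set(scores)):
        a = sum(1 for s, t in zip(scores, targets) if s >= thresh and t == 1)
        c = sum(1 for s, t in zip(scores, targets) if s >= thresh and t == 0)
        b = sum(targets) - a
        p = a / (a + c) if (a + c) else 0.0
        r = a / (a + b) if (a + b) else 0.0
        if best_gap is None or abs(p - r) < best_gap:
            best_gap, best_value = abs(p - r), (p + r) / 2
    return best_value

print(f_measure(0.7, 0.7))                                  # ≈ 0.7
print(break_even_point([0.9, 0.8, 0.6, 0.4, 0.3, 0.1],
                       [1,   1,   0,   1,   0,   0]))       # ≈ 0.67 (precision = recall = 2/3)
```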

21 (curves labeled “better performance” and “worse performance”)

22 F and BreakEvenPoint do not always correlate well (plots for Problem 1 and Problem 2)

23 Equivalent labelings of the confusion-matrix cells:

              Predicted 1              Predicted 0
  True 1      true positive (TP)       false negative (FN)
              hit                      miss
              P(pr1|tr1)               P(pr0|tr1)
  True 0      false positive (FP)      true negative (TN)
              false alarm              correct rejection
              P(pr1|tr0)               P(pr0|tr0)

24 ROC Plot and ROC Area
- Receiver Operating Characteristic
- Developed in WWII to statistically model false positive and false negative detections of radar operators
- Better statistical foundations than most other measures
- Standard measure in medicine and biology
- Becoming more popular in ML

25 ROC Plot
- Sweep threshold and plot
  – TPR vs. FPR
  – Sensitivity vs. 1-Specificity
  – P(true|true) vs. P(true|false)
- Sensitivity = a/(a+b) = Recall = LIFT numerator
- 1 - Specificity = 1 - d/(c+d)
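A sketch of the threshold sweep (assumed names, made-up data): lowering the threshold one case at a time yields the (FPR, TPR) points of the ROC plot, and the trapezoid rule over those points gives the ROC area discussed on the following slides.

```python
def roc_points(scores, targets):
    """(FPR, TPR) pairs from sweeping the threshold over all scores."""
    pos = sum(targets)
    neg = len(targets) - pos
    pts = [(0.0, 0.0)]
    # lowering the threshold admits one case at a time (score ties ignored for simplicity)
    for s, t in sorted(zip(scores, targets), reverse=True):
        fpr, tpr = pts[-1]
        if t == 1:
            pts.append((fpr, tpr + 1.0 / pos))
        else:
            pts.append((fpr + 1.0 / neg, tpr))
    return pts

def roc_area(pts):
    """Trapezoidal ROC area from the swept points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

pts = roc_points([0.9, 0.8, 0.6, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0])
print(roc_area(pts))   # 8/9 ≈ 0.89 for this made-up example
```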

26 (ROC plot: the diagonal line is random prediction)

27 Properties of ROC
- ROC Area:
  – 1.0: perfect prediction
  – 0.9: excellent prediction
  – 0.8: good prediction
  – 0.7: mediocre prediction
  – 0.6: poor prediction
  – 0.5: random prediction
  – <0.5: something wrong!
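The verbal scale above as a trivial lookup (a sketch; the cut-offs are exactly those listed on the slide):

```python
def describe_roc_area(auc):
    """Map an ROC area to the rough verbal scale on this slide."""
    if auc < 0.5:   return "something wrong!"
    if auc == 1.0:  return "perfect prediction"
    if auc >= 0.9:  return "excellent prediction"
    if auc >= 0.8:  return "good prediction"
    if auc >= 0.7:  return "mediocre prediction"
    if auc >= 0.6:  return "poor prediction"
    return "random prediction"

print(describe_roc_area(8/9))   # good prediction (8/9 ≈ 0.89, the area from the sweep sketch above)
```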

28 Properties of ROC
- Slope is non-increasing
- Each point on the ROC represents a different tradeoff (cost ratio) between false positives and false negatives
- Slope of the line tangent to the curve defines the cost ratio
- ROC Area represents performance averaged over all possible cost ratios
- If two ROC curves do not intersect, one method dominates the other
- If two ROC curves intersect, one method is better for some cost ratios, and the other method is better for other cost ratios

29 (plots for Problem 1 and Problem 2)

30 (plots for Problem 1 and Problem 2)

31 (plots for Problem 1 and Problem 2)

32 Summary
- the measure you optimize to makes a difference
- the measure you report makes a difference
- use the measure appropriate for the problem/community
- accuracy often is not sufficient/appropriate
- ROC is gaining popularity in the ML community
- only accuracy generalizes to >2 classes!