Data Mining: Practical Machine Learning Tools and Techniques, by I. H. Witten, E. Frank and M. A. Hall
Chapter 5: Credibility: Evaluating What’s Been Learned
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall
Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle
Predicting Probabilities
● Performance measure so far: success/fail rate
● Also called the 0-1 loss function: the loss is 0 if the prediction is correct and 1 if it is not
● Many classifiers estimate class probabilities
● Depending on the application, we might want to check the accuracy of the probability estimates
● 0-1 loss is not the right thing to use in those cases
Quadratic Loss Function
● $\hat{p}_1, \dots, \hat{p}_K$: the estimated class probability distribution for an instance ($K$ = number of classes)
● $c$ is the index of the instance’s actual class
● $p_1, \dots, p_K$ are all 0, except for $p_c$, which is 1
● Quadratic loss is: $\sum_j (\hat{p}_j - p_j)^2 = 1 - 2\hat{p}_c + \sum_j \hat{p}_j^2$
● Want to minimize the expected loss $E\big[\sum_j (\hat{p}_j - p_j)^2\big]$
● Can show that this is minimized when $\hat{p}_j = p_j$, the true probabilities
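A minimal sketch of the quadratic loss computation, assuming a one-hot encoding of the actual class (the function name and example values are illustrative, not from the slides):

```python
def quadratic_loss(p_hat, c):
    """Quadratic loss for one instance: sum_j (p_hat_j - p_j)^2,
    where p is the one-hot vector for the actual class index c."""
    return sum((p_hat[j] - (1.0 if j == c else 0.0)) ** 2
               for j in range(len(p_hat)))

# Example: three classes, actual class is index 0
print(quadratic_loss([0.7, 0.2, 0.1], 0))  # (0.3)^2 + (0.2)^2 + (0.1)^2 = 0.14
print(quadratic_loss([1.0, 0.0, 0.0], 0))  # perfect estimate: 0.0
```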
Informational Loss Function
● The informational loss function is $-\log_2 \hat{p}_c$, where $c$ is the index of the instance’s actual class
● Number of bits required to communicate the actual class
● Let $p_1, \dots, p_K$ be the true class probabilities
● Then the expected value of the loss function is: $-p_1 \log_2 \hat{p}_1 - \dots - p_K \log_2 \hat{p}_K$
● Justification: minimized when $\hat{p}_j = p_j$
● Difficulty: the zero-frequency problem (the loss is infinite if $\hat{p}_c = 0$)
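A minimal sketch of the informational loss, with a Laplace-style correction as one way around the zero-frequency problem (the function names and example counts are assumptions for illustration):

```python
import math

def informational_loss(p_hat, c):
    """Informational loss for one instance: -log2 of the probability
    assigned to the actual class c. Infinite when p_hat[c] == 0."""
    return -math.log2(p_hat[c])

def laplace_smooth(counts):
    """Avoid zero estimates: add 1 to every class count before
    normalizing (the classic Laplace correction)."""
    total = sum(counts) + len(counts)
    return [(n + 1) / total for n in counts]

print(informational_loss([0.5, 0.25, 0.25], 0))  # 1.0 bit
print(laplace_smooth([0, 3, 1]))                 # no class gets probability 0
```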
Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle
Counting the Cost
● In practice, different types of classification errors often incur different costs
● Examples:
  ● Terrorist profiling: “Not a terrorist” is correct ~99.99% of the time
  ● Loan decisions
  ● Oil-slick detection
  ● Fault diagnosis
  ● Promotional mailing
Counting the Cost
● The confusion matrix:

                       Predicted class
                       Yes              No
  Actual class  Yes    True positive    False negative
                No     False positive   True negative

● Each cell has its own cost or reward
● There are many other types of cost!
  ● E.g.: cost of collecting training data
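A minimal sketch of building a two-class confusion matrix from predictions (the labels and counting scheme are illustrative, not tied to any particular library):

```python
def confusion_matrix(actual, predicted):
    """Count TP, FN, FP, TN for a two-class ('yes'/'no') problem."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == "yes":
            counts["TP" if p == "yes" else "FN"] += 1
        else:
            counts["FP" if p == "yes" else "TN"] += 1
    return counts

actual    = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes"]
print(confusion_matrix(actual, predicted))  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 1}
```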
Aside: the Kappa Statistic
● Two confusion matrices for a 3-class problem: actual predictor (left) vs. random predictor (right) [matrices shown as figures on the original slide]
● Number of successes: sum of entries on the diagonal ($D$)
● Cohen’s Kappa statistic measures the relative improvement over a random predictor: $\kappa = (D_{observed} - D_{random}) / (D_{perfect} - D_{random})$
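A minimal sketch of Cohen's kappa from a confusion matrix, where the chance-expected successes come from the row and column marginals (the matrix values below are example numbers, not necessarily those on the slide):

```python
def kappa(matrix):
    """Cohen's kappa: (observed successes - expected successes)
    / (total - expected successes), from a square confusion
    matrix with rows = actual class, columns = predicted class."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix)))
    # Expected diagonal counts for a random predictor with the
    # same row (actual) and column (predicted) totals
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total
    return (observed - expected) / (total - expected)

# Example 3-class confusion matrix: 140 of 200 on the diagonal
m = [[88, 10, 2],
     [14, 40, 6],
     [18, 10, 12]]
print(round(kappa(m), 3))  # (140 - 82) / (200 - 82) ~= 0.492
```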
Classification with Costs
● Two cost matrices [shown as figures on the original slide]
● Failure rate is replaced by average cost per prediction
  ● Cost is given by the appropriate entry in the cost matrix
● In the above matrices, all errors have the same cost = 1
Cost-Sensitive Classification
● Can take costs into account when making predictions
  ● Basic idea: only predict the high-cost class when very confident about the prediction
● Given: predicted class probabilities
  ● Normally we just predict the most likely class
  ● Here, we should make the prediction that minimizes the expected cost
  ● Expected cost: dot product of the vector of class probabilities and the appropriate column in the cost matrix
  ● Choose the column (class) that minimizes expected cost
Cost-Sensitive Classification
● Assume a predicted class distribution $\hat{p}$ and a 3×3 cost matrix (rows = actual class a, b, c; columns = predicted class a, b, c); the values appear on the original slide
● What would be the expected cost of classifying according to the most likely class?
● What classification would result in the lowest expected cost? (See the sketch below.)
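A minimal sketch of choosing the class with the lowest expected cost; the probability vector and cost matrix below are hypothetical stand-ins for the values on the slide:

```python
def min_expected_cost_class(p_hat, cost):
    """Expected cost of predicting class j is the dot product of the
    class-probability vector with column j of the cost matrix
    (rows = actual class, columns = predicted class)."""
    k = len(p_hat)
    expected = [sum(p_hat[i] * cost[i][j] for i in range(k))
                for j in range(k)]
    return min(range(k), key=lambda j: expected[j]), expected

# Hypothetical values: class 'a' is most likely, but predicting it is risky
p_hat = [0.6, 0.3, 0.1]          # P(a), P(b), P(c)
cost = [[0, 1, 1],               # actual a
        [10, 0, 1],              # actual b: costly to mispredict as a
        [10, 1, 0]]              # actual c: costly to mispredict as a
best, expected = min_expected_cost_class(p_hat, cost)
print(expected)       # [4.0, 0.7, 0.9]
print("abc"[best])    # 'b' minimizes expected cost despite 'a' being most likely
```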
Cost-Sensitive Learning
● So far we haven't taken costs into account at training time
● Most learning schemes do not perform cost-sensitive learning
  ● They generate the same classifier no matter what costs are assigned to the different classes
  ● Example: standard decision tree learner
● Simple methods for cost-sensitive learning (sketched below):
  ● Resampling of instances according to costs
  ● Weighting of instances according to costs
● Some schemes can take costs into account by varying a parameter
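A minimal sketch of cost-based resampling, one of the simple methods above: instances of an expensive-to-misclassify class are replicated in proportion to their misclassification cost, so a cost-blind learner sees them more often (the cost ratios and data are illustrative assumptions):

```python
import random

def resample_by_cost(instances, labels, class_cost):
    """Replicate each instance in proportion to the cost of
    misclassifying its class, then shuffle; a cost-insensitive
    learner trained on the result is biased toward low cost."""
    resampled = []
    for x, y in zip(instances, labels):
        resampled.extend([(x, y)] * int(class_cost[y]))
    random.shuffle(resampled)
    return resampled

X = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]]
y = ["no", "yes", "no"]
# Misclassifying a "yes" is assumed to be 5x as costly as a "no"
train = resample_by_cost(X, y, {"yes": 5, "no": 1})
print(len(train))  # 1 + 5 + 1 = 7 instances
```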
Lift Charts
● In practice, costs are rarely known
● Decisions are usually made by comparing possible scenarios
● Example: promotional mail-out to 1,000,000 households
  ● Mail to all; 0.1% respond (1000)
  ● Data mining tool identifies subset of 100,000 most promising; 0.4% of these respond (400)
    ● 40% of responses for 10% of cost may pay off
  ● Identify subset of 400,000 most promising; 0.2% respond (800)
● A lift chart allows a visual comparison
Generating a Lift Chart
● Sort instances according to predicted probability of being positive:

  Predicted probability   Actual class
  …                       Yes
  …                       Yes
  …                       No
  …                       Yes
  …                       …

● Create chart with:
  ● x axis = sample size
  ● y axis = number of true positives
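A minimal sketch of computing lift-chart points: sort by predicted probability, then accumulate true positives as the sample grows (the probability values are illustrative):

```python
def lift_chart_points(predictions):
    """Given (predicted_probability, actual_class) pairs, return
    (sample_size, cumulative_true_positives) points for a lift chart."""
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    points, true_positives = [], 0
    for size, (_, actual) in enumerate(ranked, start=1):
        true_positives += (actual == "yes")
        points.append((size, true_positives))
    return points

preds = [(0.95, "yes"), (0.93, "yes"), (0.93, "no"), (0.88, "yes")]
print(lift_chart_points(preds))  # [(1, 1), (2, 2), (3, 2), (4, 3)]
```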
A Hypothetical Lift Chart
● [Figure: lift chart with annotations] 40% of responses for 10% of cost; 80% of responses for 40% of cost
ROC Curves
● ROC curves are similar to lift charts
  ● Stands for “receiver operating characteristic”
  ● Used in signal detection to show the trade-off between hit rate and false alarm rate over a noisy channel
● Differences to lift chart:
  ● y axis shows percentage of true positives in sample rather than absolute number
  ● x axis shows percentage of false positives in sample rather than sample size
A Sample ROC Curve
● Jagged curve: one set of test data
● Smoother curve: use cross-validation
Cross-Validation and ROC Curves
● Simple method of getting an ROC curve using cross-validation (sketched below):
  ● Collect probabilities for instances in test folds
  ● Sort instances according to probabilities
● This method is implemented in WEKA
● However, this is just one possibility
  ● Another possibility is to generate an ROC curve for each fold and average them
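A minimal sketch of the pooled approach described above: gather (probability, actual class) pairs from all test folds, sort once, and trace the ROC points (the fold mechanics are omitted and the data is illustrative):

```python
def roc_points(scored):
    """Given (predicted_probability, actual_class) pairs pooled from
    all test folds, return (FP rate, TP rate) points in rank order."""
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    positives = sum(1 for _, a in ranked if a == "yes")
    negatives = len(ranked) - positives
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, actual in ranked:
        if actual == "yes":
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

pooled = [(0.9, "yes"), (0.8, "no"), (0.7, "yes"), (0.4, "no")]
print(roc_points(pooled))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```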
ROC Curves for Two Schemes
● For a small, focused sample, use method A
● For a larger one, use method B
● In between, choose between A and B with appropriate probabilities
  ● Randomly choosing A with probability $t$ and B with probability $1-t$ yields any point on the straight line between their ROC points
More Measures
● Percentage of retrieved documents that are relevant: precision = TP/(TP+FP)
● Percentage of relevant documents that are returned: recall = TP/(TP+FN)
More Measures
● Summary measures: average precision at 20%, 50% and 80% recall (three-point average recall)
● Precision at N: precision measured over the N examples with the highest probability estimates
● Balanced F-measure = (2 × recall × precision)/(recall + precision)
  ● Also called the F1-measure
● sensitivity × specificity = (TP/(TP+FN)) × (TN/(TN+FP))
● Area under the ROC curve (AUC): probability that a randomly chosen positive instance is ranked above a randomly chosen negative one
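A minimal sketch of the measures above, computed straight from confusion-matrix counts and ranked scores (the counts and scores are illustrative):

```python
def summary_measures(tp, fp, fn, tn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

def auc(scored):
    """AUC as the probability that a random positive is ranked above
    a random negative (ties count half), matching the definition above."""
    pos = [s for s, a in scored if a == "yes"]
    neg = [s for s, a in scored if a == "no"]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(summary_measures(tp=40, fp=10, fn=20, tn=30))
# precision = 0.8, recall ~= 0.667, F1 ~= 0.727
print(auc([(0.9, "yes"), (0.8, "no"), (0.7, "yes"), (0.4, "no")]))  # 0.75
```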
Summary of Some Measures

  Plot                     Domain                  Axes                   Explanation
  Lift chart               Marketing               TP vs. subset size     TP; (TP+FP)/(TP+FP+TN+FN)
  ROC curve                Communications          TP rate vs. FP rate    TP/(TP+FN); FP/(FP+TN)
  Recall-precision curve   Information retrieval   Recall vs. precision   TP/(TP+FN); TP/(TP+FP)
Credibility: Evaluating What’s Been Learned
● Issues: training, testing, tuning
● Predicting performance
● Holdout, cross-validation, bootstrap
● Comparing schemes: the t-test
● Predicting probabilities: loss functions
● Cost-sensitive measures
● Evaluating numeric prediction
● The Minimum Description Length principle