Evaluation and Its Methods

Evaluation and Its Methods
How do we know a method is good, or better than another? Measuring data mining algorithms. What are your intuitive ideas?
CSE 572: Data Mining by H. Liu

Why should we evaluate?
- Comparison: goodness of the mining schemes; the difference between f (the true function) and f' (the learned one)
- Avoiding over-fitting the data: what is over-fitting?
- Challenge: how do we measure something we don't really know?
- Criteria for comparison: objective, repeatable, fair

What to compare?
Measures vary.
- For classification, we may compare along: accuracy, compactness, comprehensibility, time, ...
- For clustering, ...
- For association rules, ...

How to obtain evaluation results
- "Just trust me" - does that work?
- Training data only (resubstitution)
- Training-Testing split (e.g., 2/3 for training, 1/3 for testing)
- Training-Validation-Testing
- Cross-validation
- Leave-one-out
- Bootstrap: random sampling with replacement; on average, 63.2% of the data is used each time
- One important step: shuffling the data
For each instance in a dataset of n instances, its probability of being picked in one draw is 1/n, and of not being picked is 1 - 1/n. If we draw n times, the probability that an instance is never picked is (1 - 1/n)^n ~= e^(-1) ~= 0.368 (see the sketch below).
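
A minimal sketch (not from the slides, written in Python) that checks this 36.8% figure empirically by simulating bootstrap sampling:

```python
import random

def bootstrap_unique_fraction(n, trials=200):
    """Estimate the average fraction of distinct instances that appear
    in a bootstrap sample of size n drawn with replacement."""
    total = 0.0
    for _ in range(trials):
        sample = {random.randrange(n) for _ in range(n)}  # indices actually drawn
        total += len(sample) / n
    return total / trials

# With n = 1000 this comes out close to 1 - e^(-1) = 0.632, i.e. about
# 36.8% of the instances are left out of each bootstrap sample on average.
print(bootstrap_unique_fraction(1000))
```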

Basic concepts
- True positive (TP), false positive (FP)
- True negative (TN), false negative (FN)
- One definition of accuracy: (TP + TN) / Sum, where Sum = TP + FP + TN + FN
- Error rate is (1 - accuracy)
- Various other definitions:
  - Precision P = TP / (TP + FP)
  - Recall R = TP / (TP + FN)
  - F-measure = 2P*R / (P + R)
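
A minimal sketch that implements these definitions directly; the function name and the example counts are illustrative, not from the slides.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the measures defined above from raw confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "error rate": 1 - accuracy,
            "precision": precision, "recall": recall, "F-measure": f_measure}

# Illustrative counts only.
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```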

Examples of curves
- If a prediction can be associated with a probability, then we can do the following.
- Organizing predicted data: rank instances by predicted probability, in descending order.
- Lift charts: number of respondents vs. sample size (%).
- ROC (Receiver Operating Characteristic) curves: use the ranked data; plot true positive % (Y) vs. false positive % (X), as in the sketch below.
- Figure source: http://gim.unmc.edu/dxtests/roc2.htm
- Additional examples can be found in the Witten & Frank book.
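
A minimal sketch of how the ranked predictions yield ROC points; the toy scores and labels are illustrative.

```python
def roc_points(scores, labels):
    """Sweep a threshold down the predictions ranked by score (descending)
    and record (false positive rate, true positive rate) at each step."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Toy probabilities and true labels (1 = positive class).
print(roc_points([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))
```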

Paper presentation work
- When is a good due date?
- You should submit a hard copy of the required materials (please refer to the instructor's course website).
- After the submission, you can still modify your slides, so you need to provide a URL in the hard copy where we can get the latest slides.
- We will arrange the presentation order accordingly after selection. Depending on the time available, we may have to select some papers to present based on the quality of the preparation.

Some issues
- Sizes of data: large/small; how much data is needed (roughly vs. theoretically); "happy curves" for viewing the effect of increasing data (see the sketch below)
- Subjectivity: why is it needed? There can be many legitimate solutions (the 8 gold coins problem)
- Applications: categorization/prediction, microarrays
- Adding the cost: different errors can incur very different costs
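
A rough sketch of tracing a "happy curve"; `train_fn` and `accuracy_fn` are assumed user-supplied callables, not anything defined in the slides.

```python
import random

def happy_curve(data, train_fn, accuracy_fn, fractions=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Train on increasing fractions of the data and record test accuracy,
    to see whether more data keeps helping."""
    shuffled = data[:]
    random.shuffle(shuffled)
    split = int(0.7 * len(shuffled))
    train_pool, test_set = shuffled[:split], shuffled[split:]
    curve = []
    for frac in fractions:
        subset = train_pool[: int(frac * len(train_pool))]
        model = train_fn(subset)
        curve.append((len(subset), accuracy_fn(model, test_set)))
    return curve  # (training size, accuracy) points to plot
```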

Evaluating numeric prediction
- Mean-squared error
- Root mean-squared error
- Mean absolute error
- Correlation coefficient
- ... and many others (please refer to your favorite textbook)
Source: Witten and Frank's book
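
A minimal sketch computing these four measures from paired lists of predictions and actual values (illustrative helper, not from the slides):

```python
import math

def numeric_prediction_errors(predicted, actual):
    """Mean-squared error, root mean-squared error, mean absolute error,
    and the correlation coefficient between predictions and actual values."""
    n = len(predicted)
    diffs = [p - a for p, a in zip(predicted, actual)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae,
            "correlation": cov / math.sqrt(var_p * var_a)}

# Toy predicted and actual values.
print(numeric_prediction_errors([2.5, 0.0, 2.1, 7.8], [3.0, -0.5, 2.0, 7.0]))
```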

Cross Validation (CV, revisited)
- n-fold CV (a common n is 10)
- Mean: mu = (sum_i A_i) / n, where A_i is the accuracy of the i-th run in a total of n runs
- Variance: sigma^2 = sum_i (A_i - mu)^2 / (n - 1)
- K x n-fold CV and its procedure (sketched in code below):
  - Loop K times: shuffle the data and divide it into n folds
    - Loop n times: (n - 1) folds are used for training, 1 fold for testing
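
A minimal sketch of the K x n-fold procedure; `train_fn` and `accuracy_fn` are assumed user-supplied callables.

```python
import random
from statistics import mean, variance

def k_times_n_fold_cv(data, train_fn, accuracy_fn, K=10, n=10):
    """Repeat n-fold cross-validation K times, reshuffling before each repetition,
    and return the mean and sample variance of the accuracy estimates."""
    accuracies = []
    for _ in range(K):
        shuffled = data[:]
        random.shuffle(shuffled)
        folds = [shuffled[i::n] for i in range(n)]   # n roughly equal folds
        for i in range(n):
            test_fold = folds[i]
            train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
            model = train_fn(train_set)
            accuracies.append(accuracy_fn(model, test_fold))
    return mean(accuracies), variance(accuracies)
```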

Comparing different DM algorithms
- Which algorithm is better between any two? (An example from Assignment 3.)
- Use different data sets; we need to consider both mean and variance differences.
- The intuitive ideas are (1) to see if one is consistently better than the other, or (2) to see whether they are not significantly different (in a statistical sense) from each other.
- Hypothesis tests and Type I (alpha) and Type II (beta) errors: http://davidmlane.com/hyperstat/A18652.html
  - Type I error: rejecting a true null hypothesis
  - Type II error: failing to reject a false null hypothesis
- Test statistics (t, F, and chi-square); one-tailed or two-tailed test, depending on what your null hypothesis is (e.g., a difference in means)
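
A minimal sketch of a paired t-test over per-fold accuracies of two algorithms, using SciPy; the numbers are illustrative, not from Assignment 3.

```python
from scipy import stats

# Illustrative per-fold accuracies of two algorithms on the same 10 folds.
acc_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.81, 0.76, 0.79, 0.82, 0.78, 0.77]

# Paired, two-tailed t-test on the fold-wise differences in means.
t_stat, p_value = stats.ttest_rel(acc_a, acc_b)
print(t_stat, p_value)  # reject H0 (equal means) if p_value < alpha, e.g. 0.05
```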

Comparing two sets of results
- Null hypothesis H0: the two means are equal; the alternative to H0 is that they are not equal
- Paired or unpaired tests
- Fixed-level testing:
  - The significance level alpha (e.g., 5% or 10%) is preset
  - alpha is also the Type I error rate (rejecting a true H0)
  - Confidence level: 1 - alpha
  - Critical region (CR): if the observed statistic falls in it, something extreme has occurred; in a two-tailed test each tail holds alpha/2
  - One- or two-tailed test
  - Problem: when the statistic is not in the CR, no difference is discernible
- http://www.sportsci.org/resource/stats/pvalues.html
- http://www.graphpad.com/www/book/Interpret.htm
- http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/pvalues.htm
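
A small sketch of fixed-level testing with SciPy: compute the two-tailed critical value so that each tail of the t distribution holds alpha/2 (the degrees of freedom are illustrative).

```python
from scipy import stats

alpha = 0.05
df = 9                                     # e.g. n - 1 paired runs; illustrative
t_crit = stats.t.ppf(1 - alpha / 2, df)    # upper critical value; lower is -t_crit
print(f"Reject H0 if |t| > {t_crit:.3f}")  # about 2.262 for df = 9
```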

P-value
- The observed significance value; a statistics program can calculate the p-value.
- It is the smallest fixed level at which the null hypothesis can be rejected.
- The result is statistically significant if the p-value is less than the preset value of alpha; the results would be surprising if H0 were true, hence reject H0.
- Using p-values, with alpha = 0.05:
  - p-value = 1 > alpha: accept (do not reject) H0
  - p-value = 0.02 < alpha: reject H0
- More details can be found at http://www.tufts.edu/~gdallal/pval.htm
- An example of calculating a p-value: http://en.wikipedia.org/wiki/P-value
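
A small sketch of turning an observed t statistic into a two-tailed p-value with SciPy (the values are illustrative):

```python
from scipy import stats

t_obs, df, alpha = 2.5, 9, 0.05
p_value = 2 * stats.t.sf(abs(t_obs), df)   # two-tailed: both extremes count
print(p_value, "reject H0" if p_value < alpha else "do not reject H0")
```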

Costs of Predictions
- Different errors may incur different costs; positive and negative predictions can have dramatically different costs.
- Including the cost consideration in learning can significantly influence the outcome.
- Cost-sensitive learning examples:
  - Disease diagnosis: which type of error (false positive or false negative) is more important?
  - Intrusion detection
  - Car alarm, house alarm
- How to introduce costs in measurement: a 2-class, 3-class, or k-class problem
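
A minimal sketch of introducing costs into the measurement for a 2-class problem: weight the confusion-matrix counts by a cost matrix (all numbers are illustrative).

```python
def total_cost(confusion, cost):
    """Total misclassification cost: sum of count * cost over all cells of a
    k-by-k confusion matrix (rows = actual class, columns = predicted class)."""
    k = len(confusion)
    return sum(confusion[i][j] * cost[i][j] for i in range(k) for j in range(k))

# 2-class illustration.
confusion = [[40, 5],    # actual positive: 40 true positives, 5 false negatives
             [10, 45]]   # actual negative: 10 false positives, 45 true negatives
cost = [[0, 10],         # a false negative (missed disease/intrusion) costs 10
        [1, 0]]          # a false positive (false alarm) costs 1
print(total_cost(confusion, cost))  # 5*10 + 10*1 = 60
```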

Project proposal and onward
- Let's look at the course website.
- What do you want to achieve? What ideas do you have? Among them, which are feasible?
- What if there is no idea? Where can you find project ideas?
- What are the difficulties? Interesting or big problems? Is the data available now, or later?

Now what about the project?
- Act on it NOW.
- What are the common challenges in the group projects, so that they can be attacked separately? Let's list some of them here ...
- What if I fail? We reward failures too, if they are dissected carefully and if the experience can benefit others.

Summary
- There are many ways to measure; which one should you use?
- The key is to follow the accepted standards of the venue where you want your results published/accepted.
- Reproducibility of the empirical results is utterly important; use benchmark datasets if possible.
- Subjectivity in evaluation: what is fair? Don't forget the '8 gold coins' problem, and try to explain your results objectively.