Evaluation and Its Methods


1 Evaluation and Its Methods
How to know a method is good: measuring data mining algorithms

2 Why should we evaluate
- Comparison
- Goodness of the mining schemes
- Avoiding over-fitting the data
- Criteria for comparison: objective, repeatable, fair

3 What to compare
- For classification, we may compare along accuracy, compactness, comprehensibility, and time
- For clustering, ...
- For association rules, ...

4 Basic concepts
- True positive (TP), false positive (FP); true negative (TN), false negative (FN)
- One definition of accuracy: (TP + TN) / Sum, where Sum = TP + FP + TN + FN
- Error rate = 1 - accuracy
- Various other definitions:
  - Precision P = TP / (TP + FP)
  - Recall R = TP / (TP + FN)
  - F measure = 2PR / (P + R)
- ROC (Receiver Operating Characteristic) curve: true positive rate (Y-axis) vs. false positive rate (X-axis)
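The measures above can be computed directly from the four confusion-matrix counts. Below is a minimal sketch in Python; the counts are hypothetical, chosen only for illustration.

```python
# A minimal sketch: accuracy, precision, recall, and F measure from
# hypothetical confusion-matrix counts.
tp, fp, tn, fn = 50, 8, 930, 12          # hypothetical counts

total = tp + fp + tn + fn
accuracy   = (tp + tn) / total           # (TP + TN) / Sum
error_rate = 1 - accuracy
precision  = tp / (tp + fp)              # P = TP / (TP + FP)
recall     = tp / (tp + fn)              # R = TP / (TP + FN)
f_measure  = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} error={error_rate:.3f} "
      f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```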

5 How to obtain evaluation results
- "Just trust me" - does that work?
- Training data only (resubstitution)
- Training-testing split (e.g., 2/3 and 1/3)
- Training-validation-testing
- Cross-validation
- Leave-one-out
- Bootstrap: random sampling with replacement; on average 63.2% of the data is used each time
- One important step: shuffling the data
- For each instance in a dataset of n instances, the probability of being picked in one draw is 1/n, and of not being picked is 1 - 1/n. The probability that an instance is never picked in n draws is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368 (see the sketch below).
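The 63.2% figure can also be checked empirically. Below is a minimal sketch of one bootstrap draw; the dataset size is hypothetical.

```python
# A minimal sketch of one bootstrap sample: draw n instances with replacement
# and measure the fraction of distinct instances picked (about 63.2% on average,
# since the chance of never being picked is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368).
import random

n = 10_000                                          # hypothetical dataset size
sample = [random.randrange(n) for _ in range(n)]    # sampling with replacement
distinct_fraction = len(set(sample)) / n
print(f"distinct instances in the bootstrap sample: {distinct_fraction:.3f}")
```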

6 Paper presentation work Due on 3/6/06 Monday
You should submit a hard copy of the required materials (please refer to the instructor's course website). After submission, you can still modify your slides, so you need to provide a URL in the hard copy where we can get the latest version. We will arrange the presentation order accordingly. Depending on the time available, we may have to select some papers for presentation based on the quality of the preparation.

7 Some issues
- Sizes of data: large vs. small
- How much data is needed: roughly vs. theoretically
- Happy curves (viewing the effect of increasing data); see the sketch below
- Applications: categorization/prediction, microarrays
- Adding the cost
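One way to draw a happy curve is to train on increasing fractions of the data and score on a fixed test set. The sketch below is a minimal illustration; the dataset (scikit-learn's iris) and the classifier (a decision tree) are placeholder choices, not ones named on the slide.

```python
# A minimal sketch of a "happy curve": accuracy as a function of training-set size.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a fixed test set; shuffle so class order does not bias the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, shuffle=True)

for frac in (0.1, 0.25, 0.5, 0.75, 1.0):
    n = max(5, int(frac * len(X_train)))            # use a growing prefix of the training data
    clf = DecisionTreeClassifier(random_state=0).fit(X_train[:n], y_train[:n])
    print(f"{n:3d} training instances -> accuracy {clf.score(X_test, y_test):.3f}")
```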

8 Evaluating numeric prediction
- Mean-squared error
- Root mean-squared error
- Mean absolute error
- Correlation coefficient
Source: Witten and Frank's book
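A minimal sketch of the four measures, assuming NumPy, with hypothetical true values and predictions:

```python
# Numeric-prediction measures for a vector of true values y and predictions p.
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical true values
p = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

mse  = np.mean((y - p) ** 2)          # mean-squared error
rmse = np.sqrt(mse)                   # root mean-squared error
mae  = np.mean(np.abs(y - p))         # mean absolute error
corr = np.corrcoef(y, p)[0, 1]        # (Pearson) correlation coefficient

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} r={corr:.3f}")
```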

9 Cross Validation (CV)
- n-fold CV (a common n is 10)
  - Mean: μ = (Σ xᵢ) / n
  - Variance: σ² = Σ(xᵢ - μ)² / (n - 1)
- K× n-fold CV and its procedure (see the sketch below):
  - Loop K times: shuffle the data and divide it into n folds
  - Loop n times: (n - 1) folds are used for training, 1 fold for testing
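The procedure above amounts to two nested loops. Below is a minimal sketch in which `train_and_score(train_idx, test_idx)` is a hypothetical callback standing in for training a model on the (n - 1) training folds and returning its accuracy on the held-out fold.

```python
# A minimal sketch of K runs of n-fold cross-validation.
import random

def k_runs_nfold_cv(num_instances, train_and_score, K=3, n=10, seed=0):
    rng = random.Random(seed)
    scores = []
    for _ in range(K):                                # loop K times
        idx = list(range(num_instances))
        rng.shuffle(idx)                              # shuffle the data
        folds = [idx[f::n] for f in range(n)]         # divide it into n folds
        for f in range(n):                            # loop n times
            test_idx = folds[f]                       # 1 fold for testing
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(train_and_score(train_idx, test_idx))
    return scores                                     # K*n accuracies: report their mean and variance

# Example with a dummy scorer (replace with real training and testing).
scores = k_runs_nfold_cv(100, lambda train_idx, test_idx: len(test_idx) / 100, K=2, n=10)
print(len(scores), "scores collected")
```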

10 Comparing different DM algorithms
- Which algorithm is better between any two?
- An example from Assignment 3: discussion on the Mushroom and Splice datasets
- Need to consider both mean and variance differences
- The intuitive ideas are (1) to see whether one is consistently better than the other, or (2) whether the two are not significantly (in a statistical sense) different from each other
- Test statistics (t, F, and chi-square); see the sketch below
  - One-tailed or two-tailed test
  - Difference in means
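For the difference-in-means idea, the paired t statistic can be computed directly from the per-fold differences. The sketch below uses hypothetical accuracies for two algorithms over the same 10 folds.

```python
# A minimal sketch of the paired t statistic on per-fold accuracy differences.
import math

a = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89]  # hypothetical accuracies, algorithm A
b = [0.88, 0.87, 0.91, 0.89, 0.90, 0.86, 0.90, 0.88, 0.91, 0.87]  # hypothetical accuracies, algorithm B

d = [x - y for x, y in zip(a, b)]                      # paired differences
n = len(d)
mean_d = sum(d) / n
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)    # sample variance of the differences
t = mean_d / math.sqrt(var_d / n)                      # t statistic with n - 1 degrees of freedom
print(f"mean difference={mean_d:.4f}, t={t:.2f} (df={n - 1})")
```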

11 Comparing two sets of results
- Null hypothesis H0: the two means are equal
- Paired or unpaired tests
- Fixed-level testing
  - Significance level α (5%, 10%) is preset
  - Confidence level: 1 - α
- A statistics program can calculate the p-value
- Statistically significant if the p-value is less than the preset α; the results would be surprising if H0 were true, hence reject H0
- Using p-values (let α = 0.05; see the sketch below):
  - p-value = 1 > α: accept H0
  - p-value = 0.02 < α: reject H0
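As a usage example of the fixed-level decision, the sketch below runs a paired t test (assuming SciPy is available) on hypothetical per-fold accuracies and compares the p-value with α = 0.05.

```python
# A minimal sketch of a paired t test with the fixed-level decision at alpha = 0.05.
from scipy import stats

a = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.92, 0.89]  # hypothetical fold accuracies
b = [0.88, 0.87, 0.91, 0.89, 0.90, 0.86, 0.90, 0.88, 0.91, 0.87]

alpha = 0.05
t_stat, p_value = stats.ttest_rel(a, b)      # paired (related-samples) t test, two-tailed
if p_value < alpha:
    print(f"p={p_value:.4f} < {alpha}: reject H0 (the means differ)")
else:
    print(f"p={p_value:.4f} >= {alpha}: do not reject H0")
```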

12 Costs of Predictions
- Different errors may incur different costs
- Positive and negative predictions can have dramatically different costs
- Including the cost consideration in learning can significantly influence the outcome
- Cost-sensitive learning examples:
  - Disease diagnosis: which type of error (false positive or false negative) is more important?
  - Intrusion detection
  - Car alarm, house alarm
- How to introduce costs into measurement (see the sketch below)
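One simple way to introduce costs into measurement is to weight each cell of the confusion matrix by a cost and report the average cost per instance instead of the plain error rate. The sketch below uses hypothetical counts and a hypothetical cost matrix in which a false negative costs ten times a false positive.

```python
# A minimal sketch of cost-weighted measurement from confusion-matrix counts.
tp, fp, tn, fn = 50, 8, 930, 12              # hypothetical confusion-matrix counts

# Hypothetical cost matrix: correct predictions cost nothing, a false negative
# (e.g., a missed diagnosis) costs ten times a false positive.
cost = {"TP": 0.0, "TN": 0.0, "FP": 1.0, "FN": 10.0}

total_cost = (tp * cost["TP"] + tn * cost["TN"]
              + fp * cost["FP"] + fn * cost["FN"])
avg_cost = total_cost / (tp + fp + tn + fn)  # cost per instance, instead of plain error rate
print(f"total cost={total_cost:.1f}, average cost per instance={avg_cost:.4f}")
```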

13 Project proposal
- Let's look at the website
- What do you want to achieve?
- What ideas do you have? Among them, which are feasible?
- What if there is no idea? Where to find project ideas
- What are the difficulties?
  - Interesting or big problems?
  - Data available now or later?

14 Summary
- There are many ways to measure; which one should you use?
- The key is to follow the standards accepted wherever you want your results to be accepted
- Reproducibility of the empirical results is critically important
  - Use benchmark datasets
- Subjectivity in evaluation: what is fair? The '8 gold coins' problem

