Evaluation and Its Methods


1 Evaluation and Its Methods
How do we know that a method is good, or better than another? Measuring data mining algorithms. What are your intuitive ideas?

2 Why should we evaluate?
- Comparison: the goodness of the mining schemes; the difference between f (the true function) and f' (the learned one)
- Avoiding over-fitting the data: what is over-fitting?
- The challenge: how do we measure something we don't really know?
- Criteria for comparison: objective, repeatable, fair

3 What to compare?
Measures vary.
- For classification, we may compare along: accuracy, compactness, comprehensibility, time
- For clustering, ...
- For association rules, ...

4 How to obtain evaluation results
- "Just trust me" – does that work?
- Training data only (resubstitution)
- Training-testing split (2/3 and 1/3)
- Training-validation-testing
- Cross-validation
- Leave-one-out
- Bootstrap: random sampling with replacement; on average, 63.2% of the data is used each time
- One important step: shuffling the data
- For each instance in a dataset of n instances, its probability of being picked in one draw is 1/n, and of not being picked is 1 - 1/n. Over n draws, the probability that an instance is never picked is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368; the sketch below checks this empirically.
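As a small illustration (not from the slides), the 0.368/0.632 figure can be checked empirically; the sketch below assumes plain Python and its standard random module:

```python
import random

def bootstrap_sample(data):
    """Draw a bootstrap sample: n picks with replacement from n instances."""
    n = len(data)
    return [random.choice(data) for _ in range(n)]

# P(an instance is never picked in n draws) = (1 - 1/n)^n -> e^(-1) ~ 0.368,
# so roughly 63.2% of the distinct instances should appear in the sample.
data = list(range(1000))
sample = bootstrap_sample(data)
coverage = len(set(sample)) / len(data)
print(f"fraction of instances in the sample: {coverage:.3f}")  # ~0.632
```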

5 Basic concepts
- True positive (TP), false positive (FP); true negative (TN), false negative (FN)
- One definition of accuracy: (TP + TN)/Sum, where Sum = TP + FP + TN + FN
- Error rate is (1 - accuracy)
- Various other definitions:
  - Precision P = TP/(TP + FP)
  - Recall R = TP/(TP + FN)
  - F measure = 2P*R/(P + R)
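A minimal Python sketch of these definitions (the counts in the example are made up for illustration):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute the slide's measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, 1 - accuracy, precision, recall, f_measure

# Hypothetical counts, chosen only to exercise the formulas
acc, err, p, r, f = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.2f} error={err:.2f} P={p:.2f} R={r:.2f} F={f:.2f}")
```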

6 Examples of curves
- If a prediction can be associated with a probability, then we can do the following
- Organizing predicted data: rank the predictions by predicted probability, in descending order
- Lift charts: number of respondents (Y) vs. sample size % (X)
- ROC (Receiver Operating Characteristic) curves: use the ranked data; plot true positive % (Y) vs. false positive % (X)
- (Figure omitted; additional examples can be found in the Witten & Frank book.)
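To make the ROC construction concrete, here is a small Python sketch, not from the slides; roc_points and the example labels/scores are hypothetical:

```python
def roc_points(labels, scores):
    """Build ROC points from predictions ranked by descending probability.

    labels: 1 for positive, 0 for negative; scores: predicted P(positive).
    Each rank position contributes one (FP%, TP%) point.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Made-up scores for 4 positives and 4 negatives
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
for x, y in roc_points(labels, scores):
    print(f"FP%={x:.2f}  TP%={y:.2f}")
```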

7 Paper presentation work
- When is a good due date?
- You should submit a hard copy of the required materials (please refer to the instructor's course website)
- After the submission, you can still modify your slides, so you need to provide a URL in the hard copy where we can get the latest slides
- We will arrange the presentation order accordingly after selection; depending on the time available, we may have to select some papers to present based on the quality of the preparation

8 Some issues
- Sizes of data: large/small; how much data is needed, roughly vs. theoretically; happy curves (viewing the effect of increasing data)
- Subjectivity: why is it needed? There can be many legitimate solutions (the 8 gold coins problem)
- Applications: categorization/prediction, microarrays
- Adding the cost: different errors can incur very different costs

9 Evaluating numeric prediction
- Mean squared error
- Root mean squared error
- Mean absolute error
- Correlation coefficient
- And many others (please refer to your favorite textbook)
Source: Witten and Frank's book
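A sketch of these four measures in Python (regression_errors and the example values are illustrative, not from the slides):

```python
import math

def regression_errors(actual, predicted):
    """Compute the slide's numeric-prediction measures."""
    n = len(actual)
    diffs = [a - p for a, p in zip(actual, predicted)]
    mse = sum(d * d for d in diffs) / n          # mean squared error
    rmse = math.sqrt(mse)                        # root mean squared error
    mae = sum(abs(d) for d in diffs) / n         # mean absolute error
    # Pearson correlation coefficient between actual and predicted
    ma = sum(actual) / n
    mp = sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    corr = cov / math.sqrt(sum((a - ma) ** 2 for a in actual)
                           * sum((p - mp) ** 2 for p in predicted))
    return mse, rmse, mae, corr

mse, rmse, mae, corr = regression_errors([3.0, 5.0, 2.5], [2.5, 5.0, 3.0])
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} r={corr:.3f}")
```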

10 Cross Validation (CV, revisited)
- n-fold CV (a common n is 10)
- Mean: μ = (Σ A_i)/n, where A_i is the accuracy for the ith run in a total of n runs
- Variance: σ² = (Σ (A_i - μ)²)/(n - 1)
- K times n-fold CV and its procedure (sketched below):
  Loop K times: shuffle the data and divide it into n folds
  Loop n times: (n - 1) folds are used for training, 1 fold for testing
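A sketch of the K times n-fold procedure in Python; train_and_test is a hypothetical callable standing in for whatever learner is being evaluated:

```python
import random

def k_times_n_fold_cv(data, train_and_test, n=10, k=10):
    """K repetitions of n-fold CV, shuffling before each repetition.

    train_and_test(train, test) is assumed to return an accuracy.
    Returns the mean and (sample) variance over all k*n runs.
    """
    accuracies = []
    for _ in range(k):                      # loop K times
        random.shuffle(data)                # shuffle, then split into n folds
        folds = [data[i::n] for i in range(n)]
        for i in range(n):                  # loop n times
            test = folds[i]
            train = [x for j, f in enumerate(folds) if j != i for x in f]
            accuracies.append(train_and_test(train, test))
    mean = sum(accuracies) / len(accuracies)
    var = sum((a - mean) ** 2 for a in accuracies) / (len(accuracies) - 1)
    return mean, var
```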

11 Comparing different DM algorithms
- Which algorithm is better between any two? An example from Assignment 3
- Using different data sets
- Need to consider both mean and variance differences
- The intuitive ideas are (1) to see if one is consistently better than the other, or (2) whether they are not significantly (in a statistical sense) different from each other
- Hypothesis testing, with Type I (α) and Type II (β) errors
- Test statistics (t, F, and chi-square)
- One-tailed or two-tailed test, depending on what your null hypothesis is (e.g., a difference in means)
- Type I error: rejecting a true null hypothesis; Type II error: not rejecting a false null hypothesis

12 Comparing two sets of results
- Null hypothesis H0: the two means are equal; the alternative to H0 is that they are not equal
- Paired or unpaired tests
- Fixed-level testing: the significance level α (5%, 10%) is preset; α is also the Type I error rate (rejecting a true H0); the confidence level is 1 - α
- Critical region (CR): if the test statistic falls in the CR, something extreme has occurred
- One- or two-tailed: in a two-tailed test, each tail gets α/2
- Problem: when the statistic is not in the CR, no difference is discernible (failing to reject H0 does not prove the means are equal)
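For illustration, a paired two-tailed t-test on hypothetical per-dataset accuracies of two algorithms, assuming SciPy is available:

```python
from scipy import stats  # assumes SciPy is installed

# Made-up accuracies of two algorithms on the same 10 datasets (paired)
acc_a = [0.81, 0.79, 0.84, 0.76, 0.88, 0.80, 0.77, 0.85, 0.82, 0.78]
acc_b = [0.78, 0.80, 0.81, 0.74, 0.85, 0.79, 0.75, 0.83, 0.80, 0.77]

# Paired two-tailed t-test of H0: the two means are equal
t_stat, p_value = stats.ttest_rel(acc_a, acc_b)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# Reject H0 at level alpha = 0.05 if p_value < 0.05
```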

13 P value
- The observed significance value; a statistics program can calculate the p-value
- It is the smallest fixed level at which the null hypothesis can be rejected
- The result is statistically significant if the p-value is less than the preset value of α: the results would be surprising if H0 were true, hence reject H0
- Using p-values, let α = 0.05: a p-value of 1 > α means accept H0; a p-value of 0.02 < α means reject H0
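As a sketch of how a p-value is obtained from an observed test statistic (here an assumed t statistic with 9 degrees of freedom), again assuming SciPy:

```python
from scipy import stats  # assumes SciPy is installed

# Two-tailed p-value: the tail area beyond |t| on both sides of the t distribution
t_obs, df = 2.5, 9       # hypothetical observed statistic and degrees of freedom
p_value = 2 * stats.t.sf(abs(t_obs), df)
print(f"p-value = {p_value:.4f}")

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```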

14 Costs of Predictions
- Different errors may incur different costs; positive and negative predictions can have dramatically different costs
- Including the cost consideration in learning can significantly influence the outcome
- Cost-sensitive learning examples: disease diagnosis (which type of error, false positive or false negative, is more important?); intrusion detection; car alarms, house alarms
- How to introduce costs in measurement: a 2-class, 3-class, or k-class problem
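One common way to introduce costs in measurement is a cost matrix paired with the confusion matrix; the sketch below is illustrative, with made-up counts and costs:

```python
def expected_cost(confusion, cost):
    """Total cost of predictions for a k-class problem.

    confusion[i][j]: count of instances of true class i predicted as class j.
    cost[i][j]: cost of predicting class j when the truth is class i
                (diagonal entries are usually 0).
    """
    return sum(confusion[i][j] * cost[i][j]
               for i in range(len(confusion))
               for j in range(len(confusion[i])))

# Hypothetical 2-class disease diagnosis: a false negative (missing the
# disease) is 10x as costly as a false positive.
confusion = [[90, 10],   # row 0: true negatives, false positives
             [5, 95]]    # row 1: false negatives, true positives
cost = [[0, 1],
        [10, 0]]
print(expected_cost(confusion, cost))  # 10*1 + 5*10 = 60
```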

15 Project proposal and onward
- Let's look at the course website
- What do you want to achieve? What ideas do you have? Among them, which are feasible?
- What if there is no idea? Where to find project ideas
- What are the difficulties? Interesting or big problems? Data available now or later?

16 Now what about the project?
- Act on it NOW
- What are the common challenges in the group projects, so that they can be attacked separately? Let's list some of them here...
- What if I fail? We give credit for failures too, if they are dissected carefully and the experience can benefit others

17 Summary
- There are many ways to measure; which one should you use?
- The key is to follow the accepted standards of the venue where you want your results published/accepted
- Reproducibility of the empirical results is critically important; use benchmark datasets if possible
- Subjectivity in evaluation: what is fair? Don't forget the '8 gold coins' problem, so try to explain your results objectively

