Published by Charlotte Cook. Modified over 9 years ago.
1
7/03 Data Mining – Evaluation H. Liu (ASU) & G Dong (WSU)
8. Evaluation Methods
Errors and Error Rates
Precision and Recall
Similarity
Cross Validation
Various Presentations of Evaluation Results
Statistical Tests
2
How to evaluate/estimate error
Resubstitution
  – one data set used for both training and testing
Holdout (training and testing)
  – 2/3 for training, 1/3 for testing
Leave-one-out
  – used when the data set is small
Cross validation
  – 10-fold; why 10?
  – m × 10-fold CV (10-fold CV repeated m times)
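The holdout and leave-one-out schemes listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `holdout_split` and `leave_one_out`, the fixed `seed`, and the use of rounding to place the 2/3 cut are all assumptions made for the example.

```python
import random

def holdout_split(data, train_fraction=2/3, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def leave_one_out(data):
    """Yield (train, test) pairs in which test holds exactly one instance."""
    items = list(data)
    for i in range(len(items)):
        yield items[:i] + items[i + 1:], [items[i]]

train, test = holdout_split(range(9))
loo_pairs = list(leave_one_out(["a", "b", "c"]))
```

Resubstitution needs no split at all: the same list is passed as both training and test data, which is why it tends to give an optimistic error estimate.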
3
Error and Error Rate
Mean and Median
  – mean = $\frac{1}{n}\sum_i x_i$
  – weighted mean = $\left(\sum_i w_i x_i\right) / \sum_i w_i$
  – median = $x_{(n+1)/2}$ if $n$ is odd, else $\left(x_{n/2} + x_{(n/2)+1}\right)/2$ (values sorted)
Error: disagreement between $y$ and $y'$ (predicted)
  – 1 if they disagree, 0 otherwise (0-1 loss $l_{01}$)
  – Other definitions depending on the output of a predictor, such as quadratic loss $l_2$ and absolute loss $l_{\|\cdot\|}$
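The averages and loss functions on this slide translate directly into code. The sketch below is illustrative only; the function names are chosen for this example, and the median assumes the slide's 1-based indices refer to the values in sorted order.

```python
def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def median(xs):
    """Middle value for odd n, average of the two middle values for even n."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2                       # 0-based index of x_{(n+1)/2} when n is odd
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def zero_one_loss(y, y_pred):
    """1 if the prediction disagrees with the true value, 0 otherwise."""
    return 0 if y == y_pred else 1

def quadratic_loss(y, y_pred):
    return (y - y_pred) ** 2

def absolute_loss(y, y_pred):
    return abs(y - y_pred)
```

The 0-1 loss suits categorical outputs, while the quadratic and absolute losses apply when the predictor returns a number.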
4
Error estimation
  – Error rate $e = \#\text{Errors}/N$, where $N$ is the total number of instances
  – Accuracy $A = 1 - e$
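As a small worked example of the two definitions above, the following sketch counts 0-1 errors over paired label lists. The function name `error_rate` and the sample labels are made up for illustration.

```python
def error_rate(y_true, y_pred):
    """Fraction of instances whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    errors = sum(1 for y, p in zip(y_true, y_pred) if y != p)
    return errors / len(y_true)

y_true = ["a", "b", "a", "b"]
y_pred = ["a", "a", "a", "b"]          # one disagreement out of four
e = error_rate(y_true, y_pred)
accuracy = 1 - e
```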
5
Precision and Recall
False negative and false positive
Types of errors for $k$ classes = $k^2 - k$
  – $k = 3$: $3 \times 3 - 3 = 6$; $k = 2$: $2 \times 2 - 2 = 2$
Precision (with respect to the retrieved)
  – $P = TP/(TP+FP)$
Recall (with respect to the total relevant)
  – $R = TP/(TP+FN)$
Precision×Recall (PR) and PR gain
  – PR gain = $(PR' - PR_0)/PR_0$
Accuracy
  – $A = (TP+TN)/(TP+TN+FP+FN)$

Observed \ Predicted   Positive   Negative
Positive               TP         FN
Negative               FP         TN
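The formulas on this slide can be checked on a toy example. The sketch below tallies the four confusion-matrix cells from binary labels and applies the slide's definitions; the helper names and the sample labels are assumptions for illustration, and the code assumes the denominators are nonzero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive:
            if y == positive:
                tp += 1
            else:
                fp += 1
        else:
            if y == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp)              # with respect to the retrieved

def recall(tp, fn):
    return tp / (tp + fn)              # with respect to the total relevant

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + tn + fp + fn)

def pr_gain(pr_new, pr_base):
    """Relative improvement of a precision-recall product over a baseline."""
    return (pr_new - pr_base) / pr_base

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
```

With two classes there are exactly two off-diagonal cells (FP and FN), which matches the $k^2 - k$ count of error types for $k = 2$.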
6
Similarity or Dissimilarity Measures
Distance (dissimilarity) measures (triangle inequality)
  – Euclidean
  – City-block, or Manhattan
  – Cosine: $\cos(p_i, p_j) = \sum_k p_{ik} p_{jk} \Big/ \sqrt{\sum_k p_{ik}^2 \sum_k p_{jk}^2}$
Inter-cluster and intra-cluster distances
  – Single linkage vs. complete linkage
      $D_{\min} = \min |p_i - p_j|$, over two data points, one from each cluster
      $D_{\max} = \max |p_i - p_j|$
  – Centroid methods
      $D_{\text{avg}} = \frac{1}{n_i n_j} \sum \sum |p_i - p_j|$
      $D_{\text{mean}} = |m_i - m_j|$, the distance between the two cluster means
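To make the measures on this slide concrete, here is a minimal Python sketch of the point-to-point measures and the four inter-cluster distances. It assumes points are equal-length numeric tuples and that $|p_i - p_j|$ denotes Euclidean distance; all function names are illustrative.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine(p, q):
    """Cosine of the angle between p and q (a similarity, not a distance)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))

def d_min(A, B):
    """Single linkage: closest pair across the two clusters."""
    return min(euclidean(p, q) for p in A for q in B)

def d_max(A, B):
    """Complete linkage: farthest pair across the two clusters."""
    return max(euclidean(p, q) for p in A for q in B)

def d_avg(A, B):
    """Average of all pairwise cross-cluster distances."""
    return sum(euclidean(p, q) for p in A for q in B) / (len(A) * len(B))

def d_mean(A, B):
    """Distance between the two cluster means."""
    centroid = lambda C: tuple(sum(col) / len(C) for col in zip(*C))
    return euclidean(centroid(A), centroid(B))

A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
```

For these two clusters the four inter-cluster distances differ, which is why the choice of linkage can change which clusters are merged first in hierarchical clustering.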
7
k-Fold Cross Validation
Cross validation
  – 1 fold for testing, the rest for training
  – rotate until every fold has been used for testing
  – calculate the average
m × k-fold cross validation
  – reshuffle data, repeat XV m times
  – what is a suitable k?
Model complexity
  – use of XV: tree complexity vs. training/testing error rates
[Figure: data partitioned into Fold 1, Fold 2, Fold 3]
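The rotation and averaging steps can be sketched as follows. This example uses the common convention in which each fold serves once as the test set while the remaining folds form the training set. The function names, the `evaluate(train, test)` callback (assumed to train a model and return its test error rate), and the use of the repetition index as the shuffle seed are all assumptions made for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) for each of k folds over n instances."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validate(n, k, evaluate, seed=0):
    """Average the error returned by evaluate(train, test) over k folds."""
    errors = [evaluate(tr, te) for tr, te in k_fold_indices(n, k, seed)]
    return sum(errors) / len(errors)

def repeated_cross_validate(n, k, m, evaluate):
    """m x k-fold CV: reshuffle with a new seed and repeat m times."""
    return sum(cross_validate(n, k, evaluate, seed=r) for r in range(m)) / m

splits = list(k_fold_indices(10, 5))
```

Repeating the procedure with different shuffles reduces the dependence of the estimate on one particular partition of the data.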
8
Presentations of Evaluation Results
Learning (happy) curves
  – Accuracy increases over X
  – Its opposite (error) decreases over X
Box-plot
  – Whiskers (min, max)
  – Box: confidence interval
  – Graphical equivalent of the t-test
Results are usually about time, space, trend, average case
[Figure: box plot labeled with min, max, and mean]
9
Statistical Tests
Null hypothesis and alternative hypothesis
Type I and Type II errors
Student's t test comparing two means
Paired t test comparing two means
Chi-Square test
  – Contingency table
10
Null Hypothesis
Null hypothesis ($H_0$)
  – No difference between the test statistic and the actual value of the population parameter
  – E.g., $H_0: \mu = \mu_0$
Alternative hypothesis ($H_1$)
  – It specifies the parameter value(s) to be accepted if $H_0$ is rejected.
  – E.g., $H_1: \mu \neq \mu_0$ (two-tailed test)
  – Or $H_1: \mu > \mu_0$ (one-tailed test)
11
Type I, II errors
Type I errors ($\alpha$)
  – Rejecting a null hypothesis when it is true (FP)
Type II errors ($\beta$)
  – Accepting a null hypothesis when it is false (FN)
  – Power = $1 - \beta$
Costs of different errors
  – A life-saving medicine appears to be effective, which is cheap and has no side effect ($H_0$: non-effective)
      Type I error: it is judged effective when it is not; not costly
      Type II error: it is judged non-effective when it is effective; very costly
12
Test using Student's t Distribution
Using the t distribution to test the difference between two population means is appropriate if
  – The population standard deviations are not known
  – The samples are small ($n < 30$)
  – The populations are assumed to be approximately normal
  – The two unknown standard deviations are assumed equal: $\sigma_1 = \sigma_2$
$H_0: (\mu_1 - \mu_2) = 0$, $H_1: (\mu_1 - \mu_2) \neq 0$
  – Check the difference of the estimated means, normalized by the standard error from the common (pooled) variance estimate
Degrees of freedom and p, the level of significance
  – $df = n_1 + n_2 - 2$
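The slide does not spell out the test statistic. Under the equal-variance assumption, the standard pooled form is $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_p^2 (1/n_1 + 1/n_2)}$ with $s_p^2 = \left((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right)/(n_1 + n_2 - 2)$. The sketch below computes it with the Python standard library; the function name and sample data are illustrative, and looking up the critical value for the chosen significance level is left to a t table or a statistics package.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common variance (variance() is the sample variance).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2

t, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

In an evaluation setting, `a` and `b` might hold the error rates of two classifiers measured on independent samples.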
13
Paired t test
With paired observations, use the paired t test
Now $H_0: \mu_d = 0$ and $H_1: \mu_d \neq 0$
  – Check the estimated mean of the differences
The t values in the previous and current cases are calculated differently.
  – Both are two-tailed tests; p = 1% means 0.5% on each side
  – Excel can do that for you!
[Figure: t distribution centered at 0 with rejection regions in both tails, marked $-\alpha/2$ and $+\alpha/2$]
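For paired observations the statistic is computed from the per-pair differences: $t = \bar{d} / (s_d / \sqrt{n})$ with $n - 1$ degrees of freedom. This is the standard textbook form rather than something stated on the slide, and the function name and sample values below are made up for illustration.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Paired t statistic on the differences second[i] - first[i].

    Returns (t, df) with df = n - 1, where n is the number of pairs.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [y - x for x, y in zip(first, second)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))    # stdev() is the sample std. dev.
    return t, n - 1

# Hypothetical paired measurements, e.g. two methods scored on the same folds.
t, df = paired_t_statistic([10, 12, 9, 11], [12, 13, 11, 14])
```

Pairing removes the variation between test conditions from the comparison, which is why this statistic differs from the pooled two-sample one even on the same numbers.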
14
Chi-Square Test (goodness of fit)
Testing a null hypothesis that the population distribution for a random variable follows a specified form.
The chi-square statistic is calculated:
  $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$
Degrees of freedom $df = k - m - 1$
  – $k$ = number of data categories
  – $m$ = number of parameters estimated
      0 for uniform, 1 for Poisson, 2 for normal
  – Each cell's expected count should be at least 5
One-tail test

        C1       C2       Total
I-1     A_11     A_12     R_1
I-2     A_21     A_22     R_2
Total   C_1      C_2      N

[Figure: chi-square distribution with the rejection region marked]
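The statistic can be computed directly from a table of observed counts. The slide does not say how the expected counts $E_{ij}$ are obtained; the sketch below assumes the usual contingency-table choice $E_{ij} = R_i C_j / N$ (row total times column total over the grand total), and the function name and sample tables are illustrative.

```python
def chi_square_statistic(table):
    """Sum of (A_ij - E_ij)^2 / E_ij over all cells of an observed-count table.

    Assumes E_ij = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

chi2_dependent = chi_square_statistic([[10, 20], [20, 10]])
chi2_independent = chi_square_statistic([[10, 20], [20, 40]])
```

The second table has proportional rows, so observed and expected counts coincide and the statistic is zero; larger values fall toward the rejection region in the upper tail.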