Data Mining – Evaluation, H. Liu (ASU) & G. Dong (WSU), 7/03

8. Evaluation Methods
Errors and Error Rates
Precision and Recall
Similarity
Cross Validation
Various Presentations of Evaluation Results
Statistical Tests

How to evaluate/estimate error
Resubstitution
  – one data set used for both training and testing
Holdout (training and testing)
  – 2/3 for training, 1/3 for testing
Leave-one-out
  – used if a data set is small
Cross validation
  – 10-fold, why 10?
  – m × 10-fold CV (10-fold CV repeated m times)
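
The holdout and leave-one-out schemes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names `holdout_split` and `leave_one_out`, the fixed random seed, and the rounding of the cut point are all assumptions made for the example.

```python
import random

def holdout_split(instances, train_fraction=2/3, seed=0):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)   # seed is an assumption, for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def leave_one_out(instances):
    """Yield (train, test) pairs in which test holds exactly one instance."""
    items = list(instances)
    for i in range(len(items)):
        yield items[:i] + items[i + 1:], [items[i]]

train, test = holdout_split(range(9))
print(len(train), len(test))            # 2/3 of 9 for training, 1/3 for testing
print(len(list(leave_one_out("abcd")))) # one train/test pair per instance
```

Resubstitution needs no split at all: the same list is passed as both training and test data, which is why it tends to give an optimistic error estimate.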

Error and Error Rate
Mean and Median
  – mean = $\frac{1}{n}\sum_i x_i$
  – weighted mean = $(\sum_i w_i x_i) / \sum_i w_i$
  – median = $x_{(n+1)/2}$ if n is odd, else $(x_{n/2} + x_{(n/2)+1})/2$ (values in sorted order)
Error: disagreement between y and y' (predicted)
  – 1 if they disagree, 0 otherwise (0-1 loss $l_{01}$)
  – Other definitions depending on the output of a predictor, such as quadratic loss $l_2$ and absolute loss $l_{|\cdot|}$
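
As a small illustration of the definitions on this slide, the sketch below writes out the mean, weighted mean, median, and the three loss functions directly. The function names are chosen for the example and are not from the slides; the median uses 0-based indexing, so the 1-based $x_{(n+1)/2}$ becomes `s[n // 2]`.

```python
def mean(xs):
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # Odd n: the middle value. Even n: average of the two middle values.
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def zero_one_loss(y, y_pred):
    return 0 if y == y_pred else 1

def quadratic_loss(y, y_pred):
    return (y - y_pred) ** 2

def absolute_loss(y, y_pred):
    return abs(y - y_pred)

print(mean([1, 2, 3, 4]), median([3, 1, 2]), median([4, 1, 3, 2]))
print(weighted_mean([1, 3], [1, 3]))
```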

Error estimation
  – Error rate e = #Errors/N, where N is the total number of instances
  – Accuracy A = 1 - e
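
A minimal sketch of the error rate under 0-1 loss, assuming the true and predicted labels are given as two equal-length sequences (the names `error_rate` and `accuracy` are illustrative):

```python
def error_rate(y_true, y_pred):
    """e = #Errors / N, counting one error per disagreeing pair."""
    errors = sum(1 for y, p in zip(y_true, y_pred) if y != p)
    return errors / len(y_true)

def accuracy(y_true, y_pred):
    """A = 1 - e."""
    return 1 - error_rate(y_true, y_pred)

y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(error_rate(y_true, y_pred), accuracy(y_true, y_pred))
```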

Precision and Recall
False negative and false positive
Types of errors for k classes = $k^2 - k$
  – k = 3: 3*3 - 3 = 6; k = 2: 2*2 - 2 = 2
Precision (with respect to the retrieved)
  – P = TP/(TP+FP)
Recall (with respect to the total relevant)
  – R = TP/(TP+FN)
Precision×Recall (PR) and PR gain
  – PR gain = $(PR' - PR_0)/PR_0$
Accuracy
  – A = (TP+TN)/(TP+TN+FP+FN)

  Observed \ Predicted   Positive   Negative
  Positive               TP         FN
  Negative               FP         TN

  (P is computed over the Predicted-Positive column, R over the Observed-Positive row.)
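
The formulas on this slide translate directly into code. The sketch below assumes the four confusion-table counts are already available; the counts in the usage lines are made-up numbers for illustration only, and the function names are not from the slides.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def pr_gain(pr_new, pr_base):
    """Relative improvement of precision*recall over a baseline PR_0."""
    return (pr_new - pr_base) / pr_base

def error_types(k):
    """Number of off-diagonal cells in a k-by-k confusion table."""
    return k * k - k

# Hypothetical counts, chosen only to exercise the formulas.
tp, fp, fn, tn = 40, 10, 20, 30
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, accuracy(tp, tn, fp, fn), p * r)
print(error_types(3), error_types(2))
```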

Similarity or Dissimilarity Measures
Distance (dissimilarity) measures (Triangle Inequality)
  – Euclidean
  – City-block, or Manhattan
  – Cosine: $\cos(p_i, p_j) = \sum_k p_{ik} p_{jk} / \sqrt{\sum_k p_{ik}^2 \sum_k p_{jk}^2}$
Inter-cluster and intra-cluster distances
  – Single linkage vs. complete linkage
      $D_{min} = \min |p_i - p_j|$, two data points
      $D_{max} = \max |p_i - p_j|$
  – Centroid methods
      $D_{avg} = \frac{1}{n_i n_j} \sum \sum |p_i - p_j|$
      $D_{mean} = |m_i - m_j|$, two means
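
A minimal sketch of these measures, assuming points are equal-length tuples of numbers and clusters are lists of such points. The function names and the choice of Euclidean distance as the default inside the cluster measures are assumptions for the example.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))

def d_min(ci, cj, dist=euclidean):      # single linkage
    return min(dist(p, q) for p in ci for q in cj)

def d_max(ci, cj, dist=euclidean):      # complete linkage
    return max(dist(p, q) for p in ci for q in cj)

def d_avg(ci, cj, dist=euclidean):      # average over all cross-cluster pairs
    return sum(dist(p, q) for p in ci for q in cj) / (len(ci) * len(cj))

def centroid(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def d_mean(ci, cj, dist=euclidean):     # distance between the two means
    return dist(centroid(ci), centroid(cj))

ci = [(0, 0), (1, 0)]
cj = [(3, 0), (5, 0)]
print(d_min(ci, cj), d_max(ci, cj), d_avg(ci, cj), d_mean(ci, cj))
```

Note that cosine as written is a similarity (larger means closer), so it would need to be converted before being used where a distance is expected.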

k-Fold Cross Validation
Cross validation
  – 1 fold for testing, the rest for training
  – rotate until every fold has been used for testing
  – calculate the average
m k-fold cross validation
  – reshuffle data, repeat XV m times
  – what is a suitable k?
Model complexity
  – use of XV: tree complexity, training/testing error rates

[Figure: data partitioned into Fold 1, Fold 2, Fold 3]
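
The rotation can be sketched as follows, assuming the common convention in which each fold is held out once for testing while the remaining folds are used for training. The helper names, the seeding scheme, and the toy `majority_error` learner are all hypothetical and serve only to make the example runnable.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_and_test, seed=0):
    """Average the error returned by train_and_test over k rotations."""
    rates = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        rates.append(train_and_test(train, test))
    return sum(rates) / k

def repeated_cv(data, k, train_and_test, m):
    """m k-fold CV: reshuffle (new seed) and repeat m times, then average."""
    return sum(cross_validate(data, k, train_and_test, seed=s)
               for s in range(m)) / m

def majority_error(train, test):
    """Toy learner: predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(1 for _, label in test if label != majority) / len(test)

data = [(i, "a" if i % 4 else "b") for i in range(20)]
print(repeated_cv(data, k=10, train_and_test=majority_error, m=3))
```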

Presentations of Evaluation Results
Learning (happy) curves
  – Accuracy increases over X
  – Its opposite (or error) decreases over X
Box-plot
  – Whiskers (min, max)
  – Box: confidence interval
  – Graphical equivalent of the t-test
Results are usually about time, space, trend, average case

[Figure: box plot with min, max, and mean marked]

Statistical Tests
Null hypothesis and alternative hypothesis
Type I and Type II errors
Student's t test comparing two means
Paired t test comparing two means
Chi-Square test
  – Contingency table

Null Hypothesis
Null hypothesis ($H_0$)
  – No difference between the test statistic and the actual value of the population parameter
  – E.g., $H_0: \mu = \mu_0$
Alternative hypothesis ($H_1$)
  – It specifies the parameter value(s) to be accepted if $H_0$ is rejected.
  – E.g., $H_1: \mu \neq \mu_0$ (two-tailed test)
  – Or $H_1: \mu > \mu_0$ (one-tailed test)

Type I, II errors
Type I errors ($\alpha$)
  – Rejecting a null hypothesis when it is true (false positive, FP)
Type II errors ($\beta$)
  – Accepting a null hypothesis when it is false (false negative, FN)
  – Power = $1 - \beta$
Costs of different errors
  – A life-saving medicine appears to be effective; it is cheap and has no side effects ($H_0$: non-effective)
      Type I error: it is judged effective when it is not; not costly
      Type II error: it is judged non-effective when it is effective; very costly

Test using Student's t Distribution
Using the t distribution to test the difference between two population means is appropriate if
  – The population standard deviations are not known
  – The samples are small (n < 30)
  – The populations are assumed to be approximately normal
  – The two unknown standard deviations are assumed equal, $\sigma_1 = \sigma_2$
$H_0: (\mu_1 - \mu_2) = 0$, $H_1: (\mu_1 - \mu_2) \neq 0$
  – Check the difference of the estimated means, normalized by the standard error computed from the common (pooled) standard deviation
Degrees of freedom and p level of significance
  – df = $n_1 + n_2 - 2$
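
A minimal sketch of the two-sample statistic, assuming the standard pooled-variance form of Student's t test (the slide does not spell out the formula, so the pooling step here is the textbook one, not necessarily the authors' exact presentation). The function name `pooled_t` and the sample values are illustrative.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, df) for H0: mu1 - mu2 = 0 under equal unknown variances."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / standard_error, df

t, df = pooled_t([1, 2, 3], [2, 3, 4])
print(t, df)
```

The resulting |t| is then compared with the critical value for the chosen significance level and `df` degrees of freedom, taken from a t table or a statistics package.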

Paired t test
With paired observations, use the paired t test
Now $H_0: \mu_d = 0$ and $H_1: \mu_d \neq 0$
  – Check the estimated mean of the differences
The t values in the previous and current cases are calculated differently.
  – Both are 2-tailed tests; p = 1% means 0.5% on each side
  – Excel can do that for you!

[Figure: distribution centered at 0 with a rejection region in each tail]
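
For comparison with the two-sample case, here is a minimal sketch of the paired statistic, assuming the textbook form: the mean of the pairwise differences divided by its standard error, with n - 1 degrees of freedom. The name `paired_t` and the sample values are made up for the example.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for H0: mu_d = 0, where d = second - first per pair."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1

t, df = paired_t([10, 10, 10], [11, 12, 13])
print(t, df)
```

Unlike the pooled test, only the differences enter the calculation, so variation shared by both members of a pair cancels out.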

Chi-Square Test (goodness of fit)
Tests a null hypothesis that the population distribution for a random variable follows a specified form.
The chi-square statistic is calculated as
  $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$
  where $A_{ij}$ are the observed counts and $E_{ij}$ the expected counts
Degrees of freedom: df = k - m - 1
  – k = number of data categories
  – m = number of parameters estimated (0 for uniform, 1 for Poisson, 2 for normal)
  – Each cell should be at least 5
One-tailed test

          C1     C2     Σ
  I-1     A11    A12    R1
  I-2     A21    A22    R2
  Σ       C1     C2     N

[Figure: chi-square distribution with the rejection region marked]
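
A minimal sketch of the statistic and of the goodness-of-fit degrees of freedom. The helper `expected_counts`, which fills in $E_{ij}$ from the row and column totals of a contingency table as (row total × column total) / N, is the usual independence assumption and is not stated on the slide; all function names and counts are illustrative.

```python
def chi_square(observed, expected):
    """Sum of (A - E)^2 / E over matching cells of two flat sequences."""
    return sum((a - e) ** 2 / e for a, e in zip(observed, expected))

def goodness_of_fit_df(k, m):
    """df = k - m - 1 for k categories and m estimated parameters."""
    return k - m - 1

def expected_counts(table):
    """Expected cell counts from row and column totals (independence)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Goodness of fit against a uniform distribution (m = 0), made-up counts.
observed = [18, 22, 20, 20]
print(chi_square(observed, [20] * 4), goodness_of_fit_df(k=4, m=0))

# 2x2 contingency table, made-up counts.
table = [[10, 20], [30, 40]]
expected = expected_counts(table)
flat = lambda rows: [cell for row in rows for cell in row]
print(expected, chi_square(flat(table), flat(expected)))
```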

