
Slide 1: 2011 Data Mining
Pilsung Kang, Industrial & Information Systems Engineering, Seoul National University of Science & Technology
Chapter 4: Evaluating Classification & Predictive Performance

Slide 2: Steps in Data Mining revisited
1. Define and understand the purpose of the data mining project
2. Formulate the data mining problem
3. Obtain/verify/modify the data
4. Explore and customize the data
5. Build data mining models
6. Evaluate and interpret the results
7. Deploy and monitor the model

Slide 3: Why Evaluate?
- Over-fitting to the training data. [Figure: decision boundaries shown on training, validation, and test data]
- Is the red boundary better than the blue one?

Slide 4: Why Evaluate?
- Over-fitting to the training data. [Figure: the same boundaries on training, validation, and test data]
- Do not memorize them all!

Slide 5: Why Evaluate?
- Multiple methods are available to classify or predict.
  - Classification: naïve Bayes, linear discriminant analysis, k-nearest neighbors, classification trees, etc.
  - Prediction: multiple linear regression, neural networks, regression trees, etc.
- For each method, multiple choices are available for its settings.
  - Neural networks: number of hidden nodes, activation functions, etc.
- To choose the best model, we need to assess each model's performance.
  - Best settings (parameters) among various candidates for one algorithm (validation).
  - Best model among various data mining algorithms for the task (test).

Slide 6: Classification Performance
Example: gender classification
- Classify a person based on his/her body fat percentage (BFP).
- Simple classifier: if BFP > 20 then female (F) else male (M).
- How do you evaluate the performance of this classifier? A sketch follows below.

  BFP:    10.0  22.7  8.9  23.6  28.6  15.7  24.2  21.5  25.4  19.9
  Gender:  M     F    M     F     F     M     F     M     M     F
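As a minimal sketch (Python is assumed here; the slides contain no code), the BFP > 20 rule can be applied to the ten records above:

```python
# Minimal sketch of the slide's threshold rule on the ten records above.
bfp    = [10.0, 22.7, 8.9, 23.6, 28.6, 15.7, 24.2, 21.5, 25.4, 19.9]
gender = ["M", "F", "M", "F", "F", "M", "F", "M", "M", "F"]

predicted = ["F" if x > 20 else "M" for x in bfp]
correct = sum(p == a for p, a in zip(predicted, gender))
print(f"accuracy = {correct / len(gender):.1f}")  # 7 of 10 correct -> 0.7
```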

Slide 7: Classification Performance
Confusion matrix
- Summarizes the correct and incorrect classifications that a classifier produced for a given data set.
- For the data above, the confusion matrix of the BFP > 20 classifier is:

                 Predicted F   Predicted M
  Actual F            4             1
  Actual M            2             3

Slide 8: Classification Performance
Confusion matrix
- With 1 (+) the positive class and 0 (-) the negative class:

                   Predicted 1 (+)   Predicted 0 (-)
  Actual 1 (+)          n11               n10
  Actual 0 (-)          n01               n00

- Sensitivity (true positive rate, recall) = n11/(n11+n10)
- Specificity (true negative rate) = n00/(n01+n00)
- Precision = n11/(n11+n01)
- Type I error rate (false positive) = n01/(n01+n00)
- Type II error rate (false negative) = n10/(n11+n10)

Slide 9: Classification Performance
Confusion matrix, continued (all measures are computed in the sketch below)
- Misclassification error = (n10+n01)/(n11+n10+n01+n00)
- Accuracy (1 - misclassification error) = (n11+n00)/(n11+n10+n01+n00)
- Balanced correction rate (BCR) = sqrt(sensitivity * specificity)
- F1 measure (harmonic mean of recall and precision) = 2*(precision*recall)/(precision+recall)
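A minimal Python sketch (not part of the original slides) computing these measures from the four counts; the gender example on the next slide serves as the check:

```python
import math

def classification_metrics(n11, n10, n01, n00):
    sensitivity = n11 / (n11 + n10)                      # recall / true positive rate
    specificity = n00 / (n01 + n00)                      # true negative rate
    precision   = n11 / (n11 + n01)
    accuracy    = (n11 + n00) / (n11 + n10 + n01 + n00)
    bcr         = math.sqrt(sensitivity * specificity)   # balanced correction rate
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, accuracy, bcr, f1

# Gender example (positive class F): n11=4, n10=1, n01=2, n00=3
print(classification_metrics(4, 1, 2, 3))
# -> (0.8, 0.6, 0.667, 0.7, 0.693, 0.727)
```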

Slide 10: Classification Performance
Confusion matrix
- For the previous example (positive class F; Actual F: 4 predicted F, 1 predicted M; Actual M: 2 predicted F, 3 predicted M):
  - Sensitivity: 4/5 = 0.8, Specificity: 3/5 = 0.6
  - Recall: 4/5 = 0.8, Precision: 4/6 = 0.67
  - Type I error: 2/5 = 0.4, Type II error: 1/5 = 0.2
  - Misclassification error: (1+2)/(4+1+2+3) = 0.3, Accuracy: 0.7
  - Balanced correction rate: sqrt(0.8*0.6) = 0.69
  - F1 measure: 2*(0.67*0.8)/(0.67+0.8) = 0.73

Slide 11: Classification Performance
Cut-off for classification
- A new classifier: if BFP > θ then female else male.
- Sort the data in descending order of BFP:
  28.6  25.4  24.2  23.6  22.7  21.5  19.9  15.7  10.0  8.9
- How do you decide the cut-off θ for classification?

Slide 12: Classification Performance
Cut-off for classification
- Performance measures for different cut-offs (a sweep over θ is sketched in the code below):

  No.   BFP   Gender
   1    28.6    F
   2    25.4    M
   3    24.2    F
   4    23.6    F
   5    22.7    F
   6    21.5    M
   7    19.9    F
   8    15.7    M
   9    10.0    M
  10     8.9    M

- If θ = 24 (Actual F: 2 F, 3 M; Actual M: 1 F, 4 M):
  Misclassification error: 0.4, Accuracy: 0.6, Balanced correction rate: 0.57, F1 measure: 0.5
- If θ = 22 (Actual F: 4 F, 1 M; Actual M: 1 F, 4 M):
  Misclassification error: 0.2, Accuracy: 0.8, Balanced correction rate: 0.8, F1 measure: 0.8
- If θ = 18 (Actual F: 5 F, 0 M; Actual M: 2 F, 3 M):
  Misclassification error: 0.2, Accuracy: 0.8, Balanced correction rate: 0.77, F1 measure: 0.83
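A short sketch of the cut-off sweep on the table above:

```python
# Sweep the cut-off theta over the slide's ten records (F is the positive class).
bfp    = [28.6, 25.4, 24.2, 23.6, 22.7, 21.5, 19.9, 15.7, 10.0, 8.9]
gender = ["F", "M", "F", "F", "F", "M", "F", "M", "M", "M"]

for theta in (24, 22, 18):
    pred = ["F" if x > theta else "M" for x in bfp]
    err = sum(p != a for p, a in zip(pred, gender)) / len(gender)
    print(f"theta={theta}: misclassification error = {err:.1f}")
# theta=24: 0.4, theta=22: 0.2, theta=18: 0.2
```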

Slide 13: Classification Performance
Cut-off for classification
- In general, classification algorithms produce a likelihood for each class, e.g. a probability or a degree of evidence.
- Classification performance depends heavily on the chosen cut-off.
- For model selection and model comparison, cut-off-independent performance measures are recommended: lift charts, the receiver operating characteristic (ROC) curve, etc.

Slide 14: Classification Performance
Lift charts: an example
- Cancer diagnosis: predict each patient's probability of malignancy.
- A total of 100 patients; 20 are malignant, so the malignant ratio is 0.2.

Slide 15: Classification Performance
Confusion matrix
- Set the cut-off to 0.9: malignant if P(malignant) > 0.9, else benign.

                 Predicted M   Predicted B
  Actual M            6            14
  Actual B            3            77

- Misclassification error = 0.17, Accuracy = 0.83.
- Is it a good classification model?

Slide 16: Classification Performance
Confusion matrix
- Set the cut-off to 0.8: malignant if P(malignant) > 0.8, else benign.

                 Predicted M   Predicted B
  Actual M           10            10
  Actual B           10            70

- Misclassification error = 0.2, Accuracy = 0.8.
- Is it worse than the previous model? Its overall error is higher, but it identifies 10 of the 20 malignant patients instead of 6.

Slide 17: Classification Performance
Lift charts
- Useful for assessing performance in terms of identifying the most important class.
- Compare the performance of a DM model to the benchmark "no model, pick randomly."
- Measure the model's ability to identify the important class relative to its average prevalence.
- The charts give an explicit assessment of results over a large number of cut-offs.

Slide 18: Classification Performance
Lift charts: preparation
- Benchmark model (B): randomly assign "malignant" with probability 0.2.
- Sort patients by predicted probability of malignancy and compute the number of malignant patients in each decile.

Slide 19: Classification Performance
Lift charts
- Plot the case count / relative ratio / proportion for each decile. [Figure: per-decile case counts and relative ratios]

Slide 20: Classification Performance
Lift charts
- Non-cumulative proportion per decile. [Figure: non-cumulative proportion chart]
- Example: among the patients in the top 20-30% of predicted probability, 30% are malignant, so the lift is 0.3/0.2 = 1.5.

Slide 21: Classification Performance
Lift charts
- Cumulative proportion per decile. [Figure: cumulative proportion chart]
- Example: among the top 0-30% of predicted probability, 43.33% are malignant, so the cumulative lift is 0.4333/0.2 = 2.17. A sketch of both lift computations follows below.
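A sketch of the per-decile and cumulative lift. The per-decile malignant counts below are hypothetical, chosen only to be consistent with the slide's figures (decile 3 holds 3 malignant patients, and deciles 1-3 hold 13 in total out of 20):

```python
# Hypothetical malignant counts per decile of 10 patients (sum = 20).
malignant_per_decile = [6, 4, 3, 2, 2, 1, 1, 1, 0, 0]
base_rate = 0.2                                  # 20 malignant / 100 patients

cum = 0
for d, m in enumerate(malignant_per_decile, start=1):
    cum += m
    lift     = (m / 10) / base_rate              # non-cumulative lift for decile d
    cum_lift = (cum / (10 * d)) / base_rate      # cumulative lift for deciles 1..d
    print(f"decile {d}: lift={lift:.2f}, cumulative lift={cum_lift:.2f}")
# decile 3: lift = 0.3/0.2 = 1.50, cumulative lift = (13/30)/0.2 = 2.17
```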

Slide 22: Classification Performance
Gain chart
- Compare the two models for each cumulative decile (sketched below).
- Cumulative lift chart (y-axis): malignant patients / total patients within the top group.
- Gain chart (y-axis): malignant patients in the top group / total malignant patients.
- Example: the top 0-30% of predicted probability contains 65% of all malignant patients, a factor of 0.65/0.3 = 2.17 over random selection.
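The gain-chart counterpart, using the same hypothetical per-decile counts as the lift sketch above:

```python
# Share of all malignant patients captured in the top d deciles (gain chart y-axis).
malignant_per_decile = [6, 4, 3, 2, 2, 1, 1, 1, 0, 0]   # hypothetical counts
total_malignant = sum(malignant_per_decile)              # 20

cum = 0
for d, m in enumerate(malignant_per_decile, start=1):
    cum += m
    print(f"top {10 * d}%: gain = {cum / total_malignant:.2f}")
# top 30%: gain = 13/20 = 0.65, i.e. 0.65/0.3 = 2.17 relative to random
```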

Slide 23: Classification Performance
Receiver operating characteristic (ROC) curve
- Sort the records in descending order of P(interesting class).
- Compute the true positive rate and false positive rate while varying the cut-off.
- Draw a chart with the false positive rate on the x-axis and the true positive rate on the y-axis, as sketched below.
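A minimal sketch of this construction, assuming a list of scores and binary labels (the example data are made up):

```python
# Build ROC points by lowering the cut-off one record at a time.
def roc_points(scores, labels):
    pairs = sorted(zip(scores, labels), reverse=True)   # descending P(interesting class)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))             # (false positive rate, true positive rate)
    return points

print(roc_points([0.9, 0.8, 0.7, 0.6, 0.4], [1, 1, 0, 1, 0]))
```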

Slide 24: Classification Performance
Receiver operating characteristic (ROC) curve
[Figure: ROC space with false positive rate (1 - specificity) on the x-axis and true positive rate (sensitivity) on the y-axis; the diagonal is the random classifier and the top-left corner the ideal classifier]

Slide 25: Classification Performance
Receiver operating characteristic (ROC) curve
[Figure: example ROC curves labeled good, so-so, and bad; the cut-off decreases from high to low along each curve]

Slide 26: Classification Performance
ROC curve and confusion matrix
[Figure: confusion matrices at high and low cut-offs mapped onto points of the ROC curve]

Slide 27: Classification Performance
ROC curve, lift chart, and gain chart
[Figure: side-by-side comparison of the three charts]

Slide 28: Classification Performance
Area under the ROC curve (AUROC)
- The area under the ROC curve; a useful single-number metric for parameter/model selection (sketched below).
- 1 for the ideal classifier, 0.5 for the random classifier.
[Figure: shaded AUROC region under a curve of true positive rate vs. false positive rate]
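A trapezoidal-rule sketch of AUROC over (false positive rate, true positive rate) points such as those produced by the ROC sketch above:

```python
# Trapezoidal-rule area under a list of (FPR, TPR) points sorted by FPR.
def auroc(points):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

print(auroc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))  # 1.0, ideal classifier
print(auroc([(0.0, 0.0), (1.0, 1.0)]))              # 0.5, random classifier
```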

Slide 29: Classification Performance
Asymmetric misclassification costs
- In many cases it is more important to identify members of one class: cancer diagnosis, tax fraud, credit default, response to a promotional offer, etc.
- In such cases we are willing to tolerate greater overall error in return for better identifying the important class for further attention.
- The cost of a misclassification error may be higher for one class than for the other(s).
- The benefit of a correct classification may be higher for one class than for the other(s).

Slide 30: Classification Performance
Example: response to a promotional offer
- Suppose we send an offer to 1,000 people, with a 1% average response rate ("1" = response, "0" = non-response).
- "Naïve rule": classify everyone as "0".

                 Predicted 1   Predicted 0
  Actual 1            0            10
  Actual 0            0           990

- Misclassification error = 1%, Accuracy = 99%.

Slide 31: Classification Performance
Example: response to a promotional offer
- DM model: correctly classifies eight 1's as 1's, at the cost of misclassifying twenty 0's as 1's and two 1's as 0's.

                 Predicted 1   Predicted 0
  Actual 1            8             2
  Actual 0           20           970

- Misclassification error = 2.2%, Accuracy = 97.8%.
- Is it worse than the previous model?

Slide 32: Classification Performance
Profit/cost matrix
- Assign a profit or cost to each cell of the confusion matrix. Example:
  - $10: net profit per responder if the offer is sent (so $9 after the $1 mailing cost).
  - $10: net cost of not sending the offer to a responder.
  - $1: net cost of sending an offer.

                 Predicted 1   Predicted 0
  Actual 1          $9            -$10
  Actual 0         -$1             $0

- Total profit for the naïve rule: 10*(-$10) = -$100.
- Total profit for the DM model: 8*($9) + 2*(-$10) + 20*(-$1) = $32. A sketch of this computation follows below.
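A sketch pairing each confusion-matrix cell count with its profit/cost entry:

```python
# Profit/cost per (actual, predicted) cell, from the slide's matrix.
profit = {("1", "1"):  9, ("1", "0"): -10,
          ("0", "1"): -1, ("0", "0"):   0}

naive = {("1", "0"): 10, ("0", "0"): 990}    # naive rule: everyone predicted "0"
dm    = {("1", "1"): 8, ("1", "0"): 2, ("0", "1"): 20, ("0", "0"): 970}

for name, counts in (("naive rule", naive), ("DM model", dm)):
    total = sum(n * profit[cell] for cell, n in counts.items())
    print(f"{name}: total profit = ${total}")
# naive rule: total profit = $-100; DM model: total profit = $32
```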

Slide 33: Classification Performance
Profit/cost matrix for cancer diagnosis
- Can we assign a net cost to classifying a malignant patient as benign? Can it even be measured?

                 Predicted 1          Predicted 0
  Actual 1    Saves one's life    Can we measure it?
  Actual 0    Misdiagnosis cost          0

- This is why doctors' diagnoses are usually very conservative.

Slide 34: Classification Performance
Cost ratio
- In general, actual costs and benefits are hard to estimate.
- We need to express everything in terms of costs (i.e., the cost of misclassification per record).
- The goal is to minimize the average cost per record.
- A good practical substitute for individual costs is the ratio of misclassification costs, e.g.:
  - Misclassifying a responder costs 10 times more than misclassifying a non-responder.
  - Misclassifying a fraudulent firm is 5 times worse than misclassifying a solvent firm.

Slide 35: Classification Performance
Cost ratio
- Evaluation using the cost ratio: let q0/q1 be the misclassification cost for the negative (0)/positive (1) class.

                   Predicted 1 (+)   Predicted 0 (-)
  Actual 1 (+)          n11               n10
  Actual 0 (-)          n01               n00

- Expected misclassification cost per record = (q1*n10 + q0*n01)/(n11+n10+n01+n00), as sketched below.
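A sketch of this formula; the q values in the example are hypothetical, chosen to illustrate the 10:1 cost ratio from the previous slide:

```python
# q1 = cost of a false negative (missed positive), q0 = cost of a false positive.
def expected_cost(n11, n10, n01, n00, q1, q0):
    n = n11 + n10 + n01 + n00
    return (q1 * n10 + q0 * n01) / n

# Promotional-offer confusion matrix with a hypothetical 10:1 cost ratio.
print(expected_cost(8, 2, 20, 970, q1=10, q0=1))   # (10*2 + 1*20)/1000 = 0.04
```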

Slide 36: Classification Performance
Oversampling for asymmetric costs
- When misclassification costs are equal. [Figure: scatter plot of o and x with a boundary treating both classes equally]

Slide 37: Classification Performance
Oversampling for asymmetric costs
- When misclassification costs are unequal: the misclassification cost for o is 5 times higher than that of x. [Figure: the boundary shifts to protect the o class]

Slide 38: Classification Performance
Oversampling for asymmetric costs
- Oversampling: generate four synthetic o instances around each o. [Figure: oversampled scatter plot]

Slide 39: Classification Performance
Confusion matrix for over-sampled data
- Assume the original data contain 2% class 1 and 98% class 0.
- Oversample so that the training set holds equal numbers of class 1 and class 0 records.
- After oversampling:

                 Predicted 1   Predicted 0
  Actual 1          420            80
  Actual 0          110           390

- Misclassification rate = (80+110)/1,000 = 19%.

Slide 40: Classification Performance
Confusion matrix for over-sampled data
- Number of records in the original data: 0.02*X = 500, so X = 25,000.
- Number of class-0 records: 25,000*0.98 = 24,500, so the class-0 row is scaled up by 24,500/500 = 49.
- For the original data:

                 Predicted 1   Predicted 0
  Actual 1          420             80
  Actual 0        5,390         19,110

- Misclassification rate = (80+5,390)/25,000 = 21.9%. The rescaling is sketched below.
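A sketch of the rescaling arithmetic on this slide:

```python
# Class-1 row is kept; class-0 row is scaled back up to the original 2%/98% ratio.
n11, n10 = 420, 80           # class-1 row of the oversampled matrix (500 records)
n01, n00 = 110, 390          # class-0 row of the oversampled matrix (500 records)

total = (n11 + n10) / 0.02                 # 500 class-1 records are 2% -> 25,000
scale = (total * 0.98) / (n01 + n00)       # 24,500 / 500 = 49
n01, n00 = n01 * scale, n00 * scale        # 5,390 and 19,110

error = (n10 + n01) / total
print(f"misclassification rate on the original data = {error:.3f}")   # 0.219
```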

Slide 41: Prediction Performance
Example
- Predict a baby's weight (kg) based on age.

  Age   Actual weight (y)   Predicted weight (y')
   1         5.6                  6.0
   2         6.9                  6.4
   3        10.4                 10.9
   4        13.7                 12.4
   5        17.4                 15.6
   6        20.7                 21.5
   7        23.5                 23.0

Slide 42: Prediction Performance
Average error
- Indicates whether the predictions are, on average, over- or under-predictions.
- Average error = (1/n) * sum of (y - y').
(Same age/weight table as slide 41.)

Slide 43: Prediction Performance
Mean absolute error (MAE)
- Gives the magnitude of the average error.
- MAE = (1/n) * sum of |y - y'|.
(Same table as slide 41.)

Slide 44: Prediction Performance
Mean absolute percentage error (MAPE)
- Gives a percentage score of how much the predictions deviate, on average, from the actual values.
- MAPE = (100%/n) * sum of |y - y'| / y.
(Same table as slide 41.)

Slide 45: Prediction Performance
(Root) mean squared error ((R)MSE)
- The standard error of estimate; RMSE has the same units as the predicted variable.
- MSE = (1/n) * sum of (y - y')^2; RMSE = sqrt(MSE). All four measures are sketched below.
(Same table as slide 41.)
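A sketch computing all four prediction measures on the baby-weight table above:

```python
import math

y  = [5.6, 6.9, 10.4, 13.7, 17.4, 20.7, 23.5]    # actual weights
yp = [6.0, 6.4, 10.9, 12.4, 15.6, 21.5, 23.0]    # predicted weights
n  = len(y)

avg_err = sum(a - p for a, p in zip(y, yp)) / n              # signed average error
mae     = sum(abs(a - p) for a, p in zip(y, yp)) / n         # mean absolute error
mape    = 100 * sum(abs(a - p) / a for a, p in zip(y, yp)) / n
rmse    = math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yp)) / n)
print(f"AE={avg_err:.2f}, MAE={mae:.2f}, MAPE={mape:.1f}%, RMSE={rmse:.2f}")
# -> AE=0.34, MAE=0.83, MAPE=6.4%, RMSE=0.96
```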

