Model Evaluation Saed Sayad www.ismartsoft.com
Data Mining Steps
  1 Problem Definition
  2 Data Preparation
  3 Data Exploration
  4 Modeling
  5 Evaluation
  6 Deployment
Model Evaluation
  Classification: Confusion Matrix; Gain, Lift, ... Charts
  Regression: Mean Squared Error; Residuals Chart
Classification - Confusion Matrix (CM)
  Columns: Positive Cases, Negative Cases
  Rows: Predicted Positive, Predicted Negative
  Cell labels shown: True Positive, False Negative
Confusion Matrix - Evaluation Measurements
                 Actual +   Actual -   Total
  Predicted +    TP         FP         TP+FP
  Predicted -    FN         TN         FN+TN
  Total          TP+FN      FP+TN      TP+FP+FN+TN
Sensitivity and Specificity
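The formulas on this slide did not survive extraction. A minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (FP + TN)) and using hypothetical counts that are not from the deck:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that the model predicts as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that the model predicts as negative."""
    return tn / (fp + tn)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fp, fn, tn = 70, 20, 30, 80

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Both measures read down one column of the confusion matrix, so neither depends on how many positives and negatives the data set contains.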
Classification – Gain Chart
  [Figure: gain chart. x-axis: Population% (0% to 100%); y-axis: Target% (up to 100%); curves: Wizard, Model, Random]
Gain Chart
  [Figure: gain chart of Target% against Population% with Wizard and Random curves and point A; marked values: 100%, 50%, 10%, 10%, 18%]
Gain Chart
  Score Table (columns: Target, Score), scores as listed: 235, 724, 556, 345, 480, 676, 195, 880, 368, ...
  Sorted by Score (columns: Target, Score), descending: 880, 724, 676, 556, 480, 368, 345, 235, 195, ...
  Gain Table (Count%, Target%): (10, 36), (20, 54), (30, 66), (40, 76), (50, 85), (60, 90), (70, 94), (80, 98), 100
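The three tables on this slide describe a procedure: score the records, sort them by score, then accumulate the share of targets captured. A minimal sketch of that procedure, assuming equal-sized bins; the records and the `gain_table` helper are hypothetical, since the Target column of the slide's score table is not recoverable:

```python
def gain_table(targets, scores, bins=10):
    """Return (count%, target%) rows after sorting records by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_targets = sum(targets)
    rows = []
    for b in range(1, bins + 1):
        cutoff = round(len(order) * b / bins)  # records in the top b bins
        captured = sum(targets[i] for i in order[:cutoff])
        rows.append((100 * b // bins, 100.0 * captured / total_targets))
    return rows


# Hypothetical records (1 = target, 0 = non-target); not the slide's data.
targets = [0, 1, 0, 0, 0, 1, 0, 1, 1, 0]
scores = [235, 724, 556, 345, 480, 676, 195, 880, 368, 410]

table = gain_table(targets, scores, bins=5)
for count_pct, target_pct in table:
    print(count_pct, target_pct)
```

With these made-up records the top 20% of the population already captures half of the targets, which is the kind of concentration a gain chart is meant to show.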
Classification – Gain Chart
  [Figure: gain chart of Target% against Population% with points A and B]
  Population%: 10%, 20%, 30%, 40%, 50%, 100%
  Target%: 36%, 54%, 66%, 76%, 85%, 100%
  Copyright iSmartsoft Inc. 2008
Lift Chart
  Gain Table (Count%, Target%): (10, 36), (20, 54), (30, 66), (40, 76), (50, 85), (60, 90), (70, 94), (80, 98), 100
  Lift Table (Count%, Lift): (10, 3.6), (20, 2.7), (30, 2.2), (40, 1.9), (50, 1.7), (60, 1.5), (70, 1.3), (80, 1.2), (90, 1.1), (100, 1)
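Each lift value in the slide equals Target% divided by Count% for the same row of the gain table. A short sketch of that arithmetic, using only the gain rows that are fully legible (10% through 80%):

```python
# (Count%, Target%) rows from the gain table; the rows after 80% are incomplete
# in the extracted slide text and are left out here.
gain = [(10, 36), (20, 54), (30, 66), (40, 76), (50, 85), (60, 90), (70, 94), (80, 98)]

# Lift = share of targets captured / share of population contacted.
lift = [(count_pct, round(target_pct / count_pct, 1)) for count_pct, target_pct in gain]

for count_pct, value in lift:
    print(count_pct, value)
```

A lift of 3.6 at 10% means the top-scored tenth of the population holds 3.6 times as many targets as a random tenth would.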
Lift Chart
  [Figure: lift chart. x-axis: Population%; y-axis: Lift]
K-S Chart (Kolmogorov-Smirnov)
  Score Range   Count                 Cumulative Count
  (upper)       Target   Non-Target   Target   Non-Target   K-S
  100           3        62           0.5%     0.8%         0.3%
  200           0        23           0.5%     1.1%         0.6%
  300           1        66           0.7%     2.0%         1.3%
  400           7        434          2.0%     7.7%         5.7%
  500           181      5627         34.3%    81.7%        47.4%
  600           112      886          54.3%    93.3%        39.0%
  700           83       332          69.1%    97.7%        28.6%
  800           45       63           77.1%    98.5%        21.4%
  900           29       37           82.3%    99.0%        16.7%
  1000          99       77           100.0%   100.0%       0.0%
  K-S: K(0.95) = 6.0%, K(0.99) = 7.1%
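The K-S column is the gap between the cumulative Target and Non-Target percentages in each score band, and the K-S statistic is the largest gap. A sketch that recomputes it from the band counts; the two critical values use the usual large-sample two-sample coefficients (1.36 and 1.63), which is an assumption about how the slide's K(0.95) and K(0.99) were obtained:

```python
from itertools import accumulate
from math import sqrt

# Per-band counts from the K-S table (score bands ending at 100 ... 1000).
# The target count of 0 in the second band is an assumption inferred from the
# unchanged cumulative percentage; that cell is missing in the slide text.
target = [3, 0, 1, 7, 181, 112, 83, 45, 29, 99]
non_target = [62, 23, 66, 434, 5627, 886, 332, 63, 37, 77]

n_t, n_n = sum(target), sum(non_target)
cum_t = [c / n_t for c in accumulate(target)]      # cumulative target share
cum_n = [c / n_n for c in accumulate(non_target)]  # cumulative non-target share

ks_by_band = [abs(t - n) for t, n in zip(cum_t, cum_n)]
ks_stat = max(ks_by_band)

# Large-sample critical values for the two-sample K-S test.
scale = sqrt((n_t + n_n) / (n_t * n_n))
k95 = 1.36 * scale
k99 = 1.63 * scale

print(f"K-S = {ks_stat:.1%}, K(0.95) = {k95:.1%}, K(0.99) = {k99:.1%}")
```

The maximum gap falls in the band ending at 500 and is far above both critical values, so the score distributions of targets and non-targets differ significantly.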
K-S Chart
  [Figure: K-S chart. x-axis: Score; y-axis: Count%]
ROC Chart (Receiver Operating Characteristic)
  Count%   False Positive Rate   True Positive Rate
           (1-Specificity)       (Sensitivity)
  10       0.1                   0.66
  20       0.2                   0.79
  30       0.3                   0.86
  40       0.4                   0.91
  50       0.5                   0.94
  60       0.6                   0.95
  70       0.7                   0.98
  80       0.8
  90       0.9                   0.99
  100      1.0                   1.00
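Each row of an ROC table pairs a false positive rate with a true positive rate at one score cutoff. A minimal sketch of how such points can be derived from labels and scores; the data, the thresholds, and the `roc_points` helper are hypothetical and not taken from the deck:

```python
def roc_points(labels, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.

    A record is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = roc_points(labels, scores, thresholds=[0.75, 0.45, 0.0])
for fpr, tpr in points:
    print(fpr, tpr)
```

Lowering the threshold moves along the curve from the lower left toward (1.0, 1.0), trading more false positives for more true positives.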
ROC Chart
  [Figure: ROC chart. x-axis: 1-Specificity; y-axis: Sensitivity]
Regression – Mean Squared Error
Regression – Relative Squared Error
Regression – Mean Absolute Error
Regression – Relative Absolute Error
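The formulas for these four measures did not survive extraction. A sketch assuming the common textbook definitions, where the relative measures divide the model's total error by the error of always predicting the mean of the actual values; the function names and sample values are hypothetical:

```python
def mean_squared_error(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_squared_error(actual, predicted):
    mean_a = sum(actual) / len(actual)
    model = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline = sum((a - mean_a) ** 2 for a in actual)  # error of predicting the mean
    return model / baseline


def mean_absolute_error(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    mean_a = sum(actual) / len(actual)
    model = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline = sum(abs(a - mean_a) for a in actual)  # error of predicting the mean
    return model / baseline


# Hypothetical values, chosen only for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(actual, predicted)
rse = relative_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
print(mse, rse, mae, rae)
```

Under these definitions a relative error below 1 means the model beats the mean-only baseline, which makes the relative measures comparable across targets with different units.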
Regression – Standardized Residuals Plot
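The plot itself is not recoverable from the slide text. As a rough sketch, the values on its vertical axis can be obtained by dividing each residual by the sample standard deviation of all residuals. This is a simplified convention and an assumption here; stricter definitions also adjust for each point's leverage. The data and the helper name are hypothetical:

```python
from statistics import stdev


def standardized_residuals(actual, predicted):
    """Residuals (actual - predicted) scaled by their sample standard deviation."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    scale = stdev(residuals)
    return [r / scale for r in residuals]


# Hypothetical values, chosen only for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

z = standardized_residuals(actual, predicted)
print([round(v, 3) for v in z])
```

Plotting these values against the predicted values gives the kind of chart the slide title names; points far from zero are candidates for closer inspection.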
Questions?