Copyright © 2003, SAS Institute Inc. All rights reserved.
Cost-Sensitive Classifier Selection
Ross Bettinger, Analytical Consultant, SAS Services
Rule-Based Knowledge Extraction
A typical goal in extracting knowledge from data is to produce a classification rule that assigns a future event to a class with a specified probability.
A binary classifier assigns an object to one of two classes.
Each class assignment is either correct or incorrect, so there are four possible outcomes:
{Predicted Event, Actual Event} (True Positive)
{Predicted Event, Actual Nonevent} (False Positive)
{Predicted Nonevent, Actual Event} (False Negative)
{Predicted Nonevent, Actual Nonevent} (True Negative)
Evaluating Classifier Performance
Use a 2x2 classification table of predicted versus actual class membership.
A critical concept in the discussion of decisions is the definition of an event.
An observation or instance, I, has been classified into class e at probability threshold $\theta$ if the classifier assigns it a probability $p(e \mid I) \ge \theta$.
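A minimal Python sketch of filling the 2x2 table at a probability threshold θ. It assumes the scores are the classifier's estimates of the event probability, that actual labels are coded 1 (event) and 0 (nonevent), and that a score exactly equal to θ counts as a predicted event; the function name `confusion_table` is hypothetical.

```python
from typing import Dict, Sequence


def confusion_table(scores: Sequence[float], actual: Sequence[int], theta: float) -> Dict[str, int]:
    """Build the 2x2 classification table at probability threshold theta.

    scores: estimated probability that each instance is an event
    actual: 1 for an actual event, 0 for an actual nonevent
    An instance is a predicted event when its score is >= theta (assumed tie rule).
    """
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, label in zip(scores, actual):
        predicted_event = score >= theta
        if predicted_event:
            table["TP" if label == 1 else "FP"] += 1
        else:
            table["FN" if label == 1 else "TN"] += 1
    return table


print(confusion_table([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0], theta=0.5))
# {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```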
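The Cost of a Decision
Correct decision (true positive rate): $TP = p(E \mid p)$ = (number of true positives) / (number of actual events)
False positive rate: $FP = p(E \mid n)$ = (number of false positives) / (number of actual nonevents)
…
Assume that correct decisions incur no cost.
The theoretical expected cost of classifying an instance I into class $i$ is
$\text{Cost}(i \mid I) = \sum_{j} p(j \mid I)\, c(i, j)$,
where $c(i, j)$ is the cost of predicting class $i$ when the actual class is $j$ and the cost of a correct decision is zero, so only misclassifications contribute.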
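A minimal Python sketch of these quantities for two classes, assuming the 2x2 counts are available and using hypothetical parameter names `c_fp` (cost of a false positive) and `c_fn` (cost of a false negative). With zero cost for correct decisions, the expected cost of each decision for one instance is simply the probability of being wrong times the cost of that error.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Return (TP, FP) rates: TP = p(E|p), FP = p(E|n), estimated from table counts."""
    return tp / (tp + fn), fp / (fp + tn)


def decision_costs(p_event: float, c_fp: float, c_fn: float) -> tuple:
    """Expected cost of each decision for one instance, with zero cost for correct decisions.

    p_event: estimated probability that the instance is an actual event
    c_fp: cost of predicting an event for an actual nonevent
    c_fn: cost of predicting a nonevent for an actual event
    """
    cost_if_predict_event = (1.0 - p_event) * c_fp   # wrong only when the instance is a nonevent
    cost_if_predict_nonevent = p_event * c_fn        # wrong only when the instance is an event
    return cost_if_predict_event, cost_if_predict_nonevent


print(rates(30, 10, 20, 40))            # (0.6, 0.2)
print(decision_costs(0.25, 1.0, 5.0))   # (0.75, 1.25)
```

Comparing the two costs shows that, if the probability estimates are calibrated, predicting an event is cheaper whenever p(p|I) > c_fp / (c_fp + c_fn); with costs 1 and 5 that cutoff is 1/6, so the instance above (probability 0.25) is cheaper to label an event.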
Receiver Operating Characteristic
Compute a 2x2 classification table for each value of the threshold $\theta$ and plot the curve traced by (FP, TP) as $\theta$ ranges from 0 to 1.
This curve is called the receiver operating characteristic (ROC); it was developed during World War II to assess how accurately radar receivers detected targets.
The area under the ROC curve (AUC) is defined to be the performance index of interest.
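The threshold sweep can be sketched in Python by sorting instances by score and lowering the threshold past one distinct score at a time, which gives the same (FP, TP) points as recomputing the table at every threshold. The slides do not say how AUC was computed; the trapezoidal rule used here is one common choice. Function names are hypothetical, and the sketch assumes both classes are present.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], actual: Sequence[int]) -> List[Tuple[float, float]]:
    """(FP, TP) points obtained by lowering the threshold through every distinct score."""
    n_event = sum(1 for y in actual if y == 1)
    n_nonevent = len(actual) - n_event
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted as an event
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # instances with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_nonevent, tp / n_event))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```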
ROC Plot
[Figure: ROC curve in (FP, TP) space; labeled points (0.29, 0.70) and (0.29, 0.67)]
ROC Curve and Decision Costs
The ROC curve does not include any class distribution or misclassification cost information in its construction.
It therefore gives little guidance in choosing among competing classifiers unless one of them clearly dominates all of the others over all values of FP.
Overlay the class distribution and misclassification costs on the ROC curve by using the average cost of a decision.
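ROC Curve and Decision Costs (cont’d)
For the 2x2 classification table, with zero cost for correct decisions, the average cost becomes
$\text{Cost} = p(p)\,(1 - TP)\,c(\bar{E}, p) + p(n)\,FP\,c(E, n)$,
where $p(p)$ and $p(n)$ are the prior probabilities of an actual event and an actual nonevent, $\bar{E}$ denotes a predicted nonevent, $c(E, n)$ is the cost of a false positive, and $c(\bar{E}, p)$ is the cost of a false negative.
At the minimum average cost point, the slope of the ROC curve is
$\frac{\Delta TP}{\Delta FP} = \frac{p(n)\,c(E, n)}{p(p)\,c(\bar{E}, p)}$
The ROC operating point is therefore sensitive to both the class distribution and the misclassification costs.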
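A small Python sketch of the 2x2 average cost, p(p)(1 − TP)·c_fn + p(n)·FP·c_fp, and of the slope m = p(n)·c_fp / (p(p)·c_fn) along which that cost stays constant. The names `p_event`, `c_fp`, and `c_fn` are hypothetical stand-ins for p(p), the false positive cost, and the false negative cost. The example checks numerically that with p(p) = 0.2 and a false negative costing twice a false positive, m = 2, so the ROC points (0.1, 0.5) and (0.2, 0.7), which lie on one line of slope 2, have the same average cost.

```python
def average_cost(tp_rate: float, fp_rate: float, p_event: float, c_fp: float, c_fn: float) -> float:
    """Average cost per decision for a 2x2 table; correct decisions cost nothing."""
    p_nonevent = 1.0 - p_event
    return p_event * (1.0 - tp_rate) * c_fn + p_nonevent * fp_rate * c_fp


def iso_slope(p_event: float, c_fp: float, c_fn: float) -> float:
    """Slope dTP/dFP of the isoperformance (equal-cost) lines in ROC space."""
    return ((1.0 - p_event) * c_fp) / (p_event * c_fn)


m = iso_slope(p_event=0.2, c_fp=1.0, c_fn=2.0)
print(m)  # 2.0
# Two ROC points on the same line of slope m have the same average cost (about 0.28).
print(average_cost(0.5, 0.1, 0.2, 1.0, 2.0), average_cost(0.7, 0.2, 0.2, 1.0, 2.0))
```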
Determine ROC Operating Point
Approximate the slope of the ROC curve from adjacent points and use it to form the isoperformance line.
Compute the slopes between adjacent points, determine the interval that contains the isoperformance slope, match that interval to a classifier point, and find the corresponding threshold $\theta$.
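A Python sketch of the adjacent-slope procedure, assuming the ROC points are already sorted by increasing FP and form a convex curve (for instance, convex hull vertices), that each point carries the threshold that produced it, and that `m` is the target isoperformance slope. The function names, sample curve, and thresholds are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def segment_slope(a: Point, b: Point) -> float:
    """Slope between adjacent ROC points; a vertical segment counts as infinitely steep."""
    dx = b[0] - a[0]
    return float("inf") if dx == 0 else (b[1] - a[1]) / dx


def operating_point(points: Sequence[Point], thresholds: Sequence[float], m: float) -> Tuple[Point, float]:
    """Return the ROC point (and its threshold) whose adjacent slopes bracket m.

    points must be sorted by increasing FP and be convex; thresholds[i] is the
    threshold that produced points[i].
    """
    for i, point in enumerate(points):
        slope_in = segment_slope(points[i - 1], point) if i > 0 else float("inf")
        slope_out = segment_slope(point, points[i + 1]) if i + 1 < len(points) else 0.0
        if slope_in >= m >= slope_out:
            return point, thresholds[i]
    raise ValueError("no interval contains m; check that the points are sorted and convex")


curve = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]
cutoffs = [1.0, 0.7, 0.4, 0.0]  # hypothetical thresholds for each point
print(operating_point(curve, cutoffs, m=1.0))  # ((0.4, 0.9), 0.4)
print(operating_point(curve, cutoffs, m=2.0))  # ((0.1, 0.5), 0.7)
```

On a convex curve this bracketing rule selects the same point as maximizing TP − m·FP, which is the minimum-cost point for slope m.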
ROC Convex Hull (Provost and Fawcett, 1997)
Overlay multiple ROC curves on the same (FP, TP) axes
ROC Convex Hull (cont’d)
Add the ROC convex hull (ROCCH) to the ROC curves
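A Python sketch of building the ROCCH from the points of several ROC curves, using Andrew's monotone chain algorithm for the upper hull (a standard convex hull method; the slides do not say which algorithm was used). Each point is assumed to be labeled with the classifier that produced it, and the trivial classifiers at (0, 0) and (1, 1) are added; the labels and data are hypothetical.

```python
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[float, float, str]  # (FP, TP, classifier label)


def cross(o: LabeledPoint, a: LabeledPoint, b: LabeledPoint) -> float:
    """z-component of (a - o) x (b - o); negative means a clockwise turn o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: Sequence[LabeledPoint]) -> List[LabeledPoint]:
    """Upper convex hull of labeled ROC points, running from (0, 0) to (1, 1)."""
    candidates = list(points) + [(0.0, 0.0, "always-nonevent"), (1.0, 1.0, "always-event")]
    candidates.sort(key=lambda p: (p[0], p[1]))
    hull: List[LabeledPoint] = []
    for p in candidates:
        # pop vertices that lie on or below the segment from hull[-2] to p
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


curves = [(0.1, 0.5, "A"), (0.3, 0.6, "B"), (0.4, 0.9, "C"), (0.6, 0.7, "D")]
print([label for _, _, label in roc_convex_hull(curves)])
# ['always-nonevent', 'A', 'C', 'always-event']
```

Classifiers B and D never reach the hull, so under this sketch they are not optimal for any combination of class distribution and costs.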
ROC Convex Hull (cont’d)
Add isoperformance line
Selecting Classifiers Using ROC Method
The isoperformance line that is tangent to the ROCCH at the point of minimum expected cost indicates which classifier to use for a specified combination of class distribution and misclassification costs.
Furthermore, the ROCCH method indicates the range of slopes over which a particular classifier is optimal with respect to class distribution and costs.
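Finding the point of tangency can be sketched in Python as a single maximization. Writing the 2x2 average cost as p(p)(1 − TP)·c_fn + p(n)·FP·c_fp gives a constant minus p(p)·c_fn·(TP − m·FP), where m = p(n)·c_fp / (p(p)·c_fn), so the minimum-cost hull vertex is the one that maximizes TP − m·FP. The hull vertices, labels, and the name `best_on_hull` are hypothetical.

```python
from typing import Sequence, Tuple

LabeledPoint = Tuple[float, float, str]  # (FP, TP, classifier label) on the ROCCH


def best_on_hull(hull: Sequence[LabeledPoint], m: float) -> LabeledPoint:
    """Hull vertex where an isoperformance line of slope m touches the ROCCH.

    Maximizing TP - m*FP minimizes average cost for slope m (ties go to the first vertex).
    """
    return max(hull, key=lambda v: v[1] - m * v[0])


hull = [(0.0, 0.0, "always-nonevent"), (0.1, 0.5, "A"), (0.4, 0.9, "C"), (1.0, 1.0, "always-event")]
for m in (10.0, 3.0, 1.0, 0.1):
    print(m, best_on_hull(hull, m)[2])
# 10.0 always-nonevent
# 3.0 A
# 1.0 C
# 0.1 always-event
```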
Selecting Classifiers Using ROCCH (cont’d)
Convex hull points + associated classifier
Selecting Classifiers Using ROCCH (cont’d)
Range of slopes, points of tangency, classifier
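The range of slopes for each hull point can be sketched in Python: a vertex is optimal for every isoperformance slope m between the slope of the hull edge leaving it and the slope of the edge entering it. The sketch assumes the hull vertices are listed in order from (0, 0) to (1, 1) and that costs are nonnegative, so m ≥ 0; the vertices, labels, and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[float, float, str]  # (FP, TP, classifier label), in hull order


def segment_slope(a: LabeledPoint, b: LabeledPoint) -> float:
    """Slope of the hull edge from a to b; a vertical edge counts as infinitely steep."""
    dx = b[0] - a[0]
    return float("inf") if dx == 0 else (b[1] - a[1]) / dx


def slope_ranges(hull: Sequence[LabeledPoint]) -> List[Tuple[str, float, float]]:
    """(label, m_low, m_high): the isoperformance slopes for which each hull vertex is optimal."""
    ranges = []
    for i, vertex in enumerate(hull):
        m_high = segment_slope(hull[i - 1], vertex) if i > 0 else float("inf")
        m_low = segment_slope(vertex, hull[i + 1]) if i + 1 < len(hull) else 0.0
        ranges.append((vertex[2], m_low, m_high))
    return ranges


hull = [(0.0, 0.0, "always-nonevent"), (0.1, 0.5, "A"), (0.4, 0.9, "C"), (1.0, 1.0, "always-event")]
for label, low, high in slope_ranges(hull):
    print(f"{label}: optimal for {low:.3f} <= m <= {high:.3f}")
# always-nonevent: optimal for 5.000 <= m <= inf
# A: optimal for 1.333 <= m <= 5.000
# C: optimal for 0.167 <= m <= 1.333
# always-event: optimal for 0.000 <= m <= 0.167
```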
Selecting Classifiers Using ROCCH (cont’d)
Classifier and AUC for German credit ensemble classifiers
Selecting Classifiers Using ROCCH (cont’d)
Ensemble classifiers for Catalog Direct Mail
Selecting Classifiers Using ROCCH (cont’d)
Classifier and AUC for Catalog Direct Mail
Selecting Classifiers Using ROCCH (cont’d)
Ensemble classifiers for KDD-98 Cup
Selecting Classifiers Using ROCCH (cont’d)
Classifier and AUC for KDD-98 Cup
Summary
The ROCCH methodology for selecting binary classifiers explicitly includes class distribution and misclassification costs in its formulation. It is a robust alternative to whole-curve metrics such as AUC, which report global classifier performance but may not identify the best classifier (in the least-cost sense) for the range of operating conditions under which the classifier will assign class memberships.