Cost-Sensitive Learning Prepared with Lei Tang 11/22/2018 CSE 572: Data Mining by H. Liu
Cost-Sensitive Learning
  Motivation: data with different misclassification costs.
  Objective: minimize the total misclassification cost.
  Applications: medical diagnosis, fraud detection, spam filtering, intrusion detection, …
Toy Example
  A decision tree (T1) based on accuracy.
  [Figure: tree T1 with tests on Body Heat (abnormal / normal) and Tumor (yes / no), and leaves Ill / Not Ill.]
  T1 makes two prediction errors.
Toy Example (cont'd)
  The misclassification costs for a false positive and a false negative are 1 and 100, respectively.
  Misclassification cost of T1 (one false positive and one false negative): 1*1 + 100*1 = 101.
  T2 is another tree with a higher error rate but a lower misclassification cost.
    Errors: 3 (all false positives)
    Cost: 1*3 = 3
  [Figure: tree T2, based on cost, with tests on Tumor (yes / no) and Heat (abnormal / normal), and leaves ill / Not ill.]
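The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `total_cost` is made up for illustration, and the split of errors (one false positive plus one false negative for T1, three false positives for T2) is read off the slide's own sums.

```python
def total_cost(false_positives, false_negatives, cost_fp=1, cost_fn=100):
    """Total misclassification cost for the given error counts."""
    return cost_fp * false_positives + cost_fn * false_negatives


# Error counts inferred from the slide's arithmetic (1*1 + 100*1 and 1*3).
t1_cost = total_cost(false_positives=1, false_negatives=1)
t2_cost = total_cost(false_positives=3, false_negatives=0)

print("T1:", t1_cost)  # 101
print("T2:", t2_cost)  # 3
```

T2 makes more errors than T1 but is far cheaper, because none of its errors is the expensive false negative.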
Cost matrix
  Cost matrix of the toy example (similar to a confusion matrix; correct predictions cost 0, as in the toy calculation):
                      Predicted ill   Predicted not ill
    Actual ill              0               100
    Actual not ill          1                 0
  Quiz: Does the absolute value matter?
  How to get the cost matrix?
    --User defined, or
    --based on the class distribution
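One way to see how a cost matrix is used, and to explore the quiz, is to pick the prediction with the lowest expected cost. The sketch below is an illustration only: the names `COST`, `expected_cost` and `min_cost_class` are hypothetical, the probabilities are invented, and the zero cost for correct predictions is an assumption consistent with the toy example.

```python
CLASSES = ("ill", "not ill")

# COST[(actual, predicted)]; the zero diagonal is an assumption.
COST = {
    ("ill", "ill"): 0, ("ill", "not ill"): 100,
    ("not ill", "ill"): 1, ("not ill", "not ill"): 0,
}


def expected_cost(probs, predicted, cost=COST):
    """Expected cost of predicting `predicted` given class probabilities."""
    return sum(probs[actual] * cost[(actual, predicted)] for actual in CLASSES)


def min_cost_class(probs, cost=COST):
    """Class whose prediction has the lowest expected cost."""
    return min(CLASSES, key=lambda c: expected_cost(probs, c, cost))


# Invented example: the patient is probably not ill.
probs = {"ill": 0.05, "not ill": 0.95}
decision = min_cost_class(probs)

# Multiply every cost by 10 and decide again.
scaled = {pair: 10 * value for pair, value in COST.items()}
decision_scaled = min_cost_class(probs, scaled)

print(decision, decision_scaled)
```

Even at a 5% chance of illness the cheapest prediction is "ill", and multiplying every entry by the same positive constant leaves the decision unchanged.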
Different approaches
  There exist many techniques:
    Stratification (sampling based on cost)
    Algorithm-specific methods
      Build or prune a decision tree based on cost
      Cost-sensitive boosting, AdaCost, …
    Meta-cost
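As a rough illustration of stratification, the sketch below resamples a training set so that each example is drawn with probability proportional to the cost of misclassifying its class. The function name `cost_proportional_resample`, the patient identifiers and the weighting scheme are assumptions made for this example; the slides do not prescribe a specific sampling rule.

```python
import random


def cost_proportional_resample(examples, class_cost, seed=0):
    """Draw len(examples) examples with replacement, weighted by class cost."""
    rng = random.Random(seed)
    weights = [class_cost[label] for _, label in examples]
    return rng.choices(examples, weights=weights, k=len(examples))


# Invented data: 2 ill patients and 8 healthy ones.
data = [("p%d" % i, "ill") for i in range(2)]
data += [("p%d" % i, "not ill") for i in range(2, 10)]

# Cost of misclassifying an example of each class (from the toy example).
resampled = cost_proportional_resample(data, {"ill": 100, "not ill": 1})

ill_before = sum(1 for _, label in data if label == "ill")
ill_after = sum(1 for _, label in resampled if label == "ill")
print(ill_before, ill_after)
```

An ordinary error-based learner trained on the resampled data sees the expensive class far more often, which pushes it away from costly mistakes.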
Meta-cost
  1. Sample multiple times to build different models.
  2. For each example, calculate the predicted probability of each class.
  3. Re-label the training data based on these probabilities and the cost matrix.
  4. Build a normal error-based classifier on the re-labeled data.
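The Meta-cost steps above can be sketched in plain Python. Everything here is illustrative: the base learner is a toy frequency table over a single categorical feature, the data are invented, the helper names (`train_table`, `metacost_relabel`, `train_majority`) are hypothetical, and the zero cost for correct predictions is an assumption.

```python
import random
from collections import Counter, defaultdict

CLASSES = ("ill", "not ill")
# COST[(actual, predicted)]; the zero diagonal is an assumption.
COST = {("ill", "ill"): 0, ("ill", "not ill"): 100,
        ("not ill", "ill"): 1, ("not ill", "not ill"): 0}


def train_table(examples):
    """Toy base learner: class frequencies per feature value."""
    counts = defaultdict(Counter)
    for x, y in examples:
        counts[x][y] += 1

    def predict_proba(x):
        seen = counts[x]
        total = sum(seen.values())
        if total == 0:  # value absent from this sample: fall back to uniform
            return {c: 1.0 / len(CLASSES) for c in CLASSES}
        return {c: seen[c] / total for c in CLASSES}

    return predict_proba


def min_cost_label(probs):
    """Label with the lowest expected cost under COST."""
    return min(CLASSES,
               key=lambda pred: sum(probs[a] * COST[(a, pred)] for a in CLASSES))


def metacost_relabel(examples, n_models=10, seed=0):
    rng = random.Random(seed)
    # Step 1: one model per bootstrap sample.
    models = [train_table(rng.choices(examples, k=len(examples)))
              for _ in range(n_models)]
    relabeled = []
    for x, _ in examples:
        # Step 2: average the class probabilities over the models.
        probs = {c: sum(m(x)[c] for m in models) / n_models for c in CLASSES}
        # Step 3: re-label with the minimum expected cost class.
        relabeled.append((x, min_cost_label(probs)))
    return relabeled


def train_majority(examples):
    """Step 4: ordinary error-based learner (majority label per seen value)."""
    counts = defaultdict(Counter)
    for x, y in examples:
        counts[x][y] += 1
    return lambda x: counts[x].most_common(1)[0][0]


# Invented data: body heat "abnormal" is ill 30% of the time, "normal" never.
data = ([("abnormal", "ill")] * 9 + [("abnormal", "not ill")] * 21
        + [("normal", "not ill")] * 30)
final_model = train_majority(metacost_relabel(data))
print(final_model("abnormal"), final_model("normal"))
```

A plain majority learner on the original labels would call "abnormal" not ill (21 against 9). After re-labeling, the same learner should predict ill there, because a missed illness costs 100 times more than a false alarm.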
Further issues
  Multiple classes?
  Individual example cost?
  Different types of cost?
  Test cost?
  (Tasks for the group on cost-sensitive learning!)
Ensemble Learning Prepared with Surendra
Types of Ensemble
  Homogeneous ensembles
    Use the same learning algorithm, e.g., Bagging, Boosting
  Heterogeneous ensembles
    Use different learning algorithms, e.g., a combination of Decision Tree, Nearest Neighbour, K-Star, etc.
Phases of Building an Ensemble
  Model Generation
    Generate a diverse set of classifiers
    Resampling, using different learning algorithms, or various other strategies
  Model Combination
    Decide on a strategy for combining the predictions of the classifiers that make up the ensemble
Meta-Classification Framework
  Classifiers at two levels:
    Base-level (low-level) classifiers, generated during the model generation phase.
    Meta-level classifier, created during the model combination phase.
Categorization of Model Combination Strategies
  Voting
    As the name implies, take some form of vote among the base classifiers
  Stacking
    Learn the relationship between the predictions of the base classifiers and the actual class label
  Grading
    Grade the base classifiers and decide which subset of them should be used
Voting Techniques
  Majority Voting
    Sum the class probabilities predicted by the base classifiers and predict the class with the largest total
  Weighted Voting
    Assign weights to the classifiers and take a weighted sum of the prediction probabilities
    Weights are calculated from the error rate
  Threshold Voting
    Use majority voting or weighted voting only when the error rate is above a certain threshold
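Majority and weighted voting can be sketched with one small function. The slides only say that weights come from the error rate, so the rule used here (weight = 1 - error rate) is an assumption, as are the function name `vote` and the example numbers.

```python
def vote(distributions, weights=None):
    """Sum (optionally weighted) class probabilities and return the top class."""
    if weights is None:
        weights = [1.0] * len(distributions)  # plain majority voting
    totals = {}
    for dist, weight in zip(distributions, weights):
        for label, prob in dist.items():
            totals[label] = totals.get(label, 0.0) + weight * prob
    return max(totals, key=totals.get)


# Invented class distributions from three base classifiers for one instance.
predictions = [
    {"ill": 0.70, "not ill": 0.30},
    {"ill": 0.60, "not ill": 0.40},
    {"ill": 0.05, "not ill": 0.95},
]

unweighted = vote(predictions)

# Assumed weighting rule: weight = 1 - error rate.
error_rates = [0.1, 0.2, 0.6]
weighted = vote(predictions, [1 - e for e in error_rates])

print(unweighted, weighted)
```

Unweighted, the totals are 1.35 for "ill" and 1.65 for "not ill". Once the unreliable third classifier is down-weighted, they become 1.13 and 0.97, so the decision flips to "ill".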
Stacking Techniques
  Stacking
    Use the complete class distribution from each classifier
    Build a stacking classifier for each class
  StackingC
    Use the class distribution only for the concerned class
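The difference between Stacking and StackingC lies in the meta-level features handed to each per-class classifier. The sketch below only builds those feature vectors and does not train a meta-level learner. The class names, the numbers and the function names (`stacking_features`, `stackingc_features`, `meta_row`) are invented for illustration.

```python
CLASSES = ("a", "b", "c")


def stacking_features(distributions):
    """Stacking: every class probability from every base classifier."""
    return [dist[c] for dist in distributions for c in CLASSES]


def stackingc_features(distributions, target_class):
    """StackingC: only the probabilities given to the concerned class."""
    return [dist[target_class] for dist in distributions]


def meta_row(distributions, actual, target_class, use_stackingc=False):
    """One meta-level training row for the classifier of `target_class`."""
    if use_stackingc:
        features = stackingc_features(distributions, target_class)
    else:
        features = stacking_features(distributions)
    label = 1 if actual == target_class else 0  # binary target per class
    return features, label


# Invented outputs of two base classifiers for one training instance.
dists = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.5, "b": 0.4, "c": 0.1},
]

full_row = meta_row(dists, actual="a", target_class="a")
reduced_row = meta_row(dists, actual="a", target_class="a", use_stackingc=True)
print(full_row)
print(reduced_row)
```

With two base classifiers and three classes, Stacking gives each per-class meta classifier six features, while StackingC gives it two.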
Grading Techniques
  Grading/Referee Method
    For each base classifier there is a grader classifier, which determines whether the base classifier will be correct for the given test instance.
  [Figure: grader classifiers G1, G2, G3 paired with base classifiers C1, C2, C3.]
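A minimal sketch of the grading idea follows, under several assumptions: each grader is a toy lookup table over a single categorical feature, the final prediction is a vote among the base classifiers whose graders expect them to be correct, and all base classifiers are used when none is trusted. The slides do not specify these details, and the names `train_grader` and `graded_predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train_grader(base_predict, examples):
    """Toy grader: per feature value, was the base classifier usually right?"""
    tally = defaultdict(Counter)
    for x, y in examples:
        tally[x][base_predict(x) == y] += 1
    # Unseen feature values default to trusting the classifier (an assumption).
    return lambda x: tally[x][True] >= tally[x][False]


def graded_predict(x, bases, graders):
    """Vote among base classifiers that their graders expect to be correct."""
    trusted = [base(x) for base, grader in zip(bases, graders) if grader(x)]
    votes = trusted or [base(x) for base in bases]  # fallback: use everyone
    return Counter(votes).most_common(1)[0][0]


# Invented training data and three deliberately simple base classifiers.
train = [("abnormal", "ill")] * 3 + [("normal", "not ill")] * 3
c1 = lambda x: "ill"
c2 = lambda x: "not ill"
c3 = lambda x: "not ill"
bases = [c1, c2, c3]
graders = [train_grader(base, train) for base in bases]  # G1, G2, G3

pred_abnormal = graded_predict("abnormal", bases, graders)
pred_normal = graded_predict("normal", bases, graders)
print(pred_abnormal, pred_normal)
```

For "abnormal", a plain majority vote would say "not ill" (two votes to one). The graders trust only C1 on that value, so the graded prediction is "ill".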