Class Imbalance Classification Implementation 2015-11-24 Group 4 WEI Lili, 20297324 ZENG Gaoxiong, 20279994.


1 Class Imbalance Classification Implementation 2015-11-24 Group 4 WEI Lili, 20297324 ZENG Gaoxiong, 20279994

2 Outline ➤ Introduction ➤ Algorithm and Implementation ➤ Experiment ➤ Discussion & Analysis

3 Introduction ➤ Class imbalance: ➤ The number of instances in each class is unequal ➤ e.g. medical diagnosis: tell whether a patient has cancer given the health examination results ➤ Problem: ➤ An imbalanced training set leads to a model that is biased toward the major class and has poor accuracy on the minor class ➤ However, in many situations the minor class is the one of more interest (e.g. cancer detection)

4 Introduction ➤ Existing solutions: ➤ Cost-sensitive ➤ Different kinds of misclassification have different costs; classify each instance to the class that minimizes the expected cost ➤ Address the imbalance by increasing the cost of misclassifying minor-class instances (false negatives, when the minor class is the positive class) ➤ Oversampling ➤ Add instances to the minor class to get a more balanced data set ➤ Undersampling ➤ Reduce the number of instances in the major class to get a more balanced data set
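The cost-sensitive rule above can be illustrated with a minimal sketch. The function name, the class labels, and the cost values are assumptions chosen for illustration; they do not come from the slides.

```python
def min_cost_class(probs, cost):
    """Return the predicted class with the lowest expected cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    expected = {
        i: sum(cost[i][j] * probs[j] for j in probs)
        for i in cost
    }
    return min(expected, key=expected.get)


# Hypothetical posterior for one instance: the model leans toward "major".
probs = {"major": 0.8, "minor": 0.2}

# Equal costs: every error costs 1.
equal_cost = {
    "major": {"major": 0, "minor": 1},
    "minor": {"major": 1, "minor": 0},
}

# Skewed costs: missing a minor-class instance costs 10 (illustrative value).
skewed_cost = {
    "major": {"major": 0, "minor": 10},
    "minor": {"major": 1, "minor": 0},
}

equal_choice = min_cost_class(probs, equal_cost)
skewed_choice = min_cost_class(probs, skewed_cost)
```

With equal costs the instance goes to the major class (expected cost 0.2 versus 0.8); with the skewed costs it goes to the minor class (expected cost 0.8 versus 2.0).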

5 Implementation ➤ Cost-sensitive ➤ Weighted SVM (implemented in libsvm) ➤ Oversampling ➤ SMOTE (Synthetic Minority Over-sampling Technique) ➤ Undersampling ➤ Basic ➤ Bagged
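The slides do not say how the class weights for the weighted SVM were chosen. One common heuristic, shown here only as an assumption, is to weight each class inversely to its frequency and pass the result to libsvm's `svm-train` through its `-wi` option, which scales the penalty parameter C for class i. The helper names and the 90/10 label split are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by (largest class size) / (its own size)."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {c: largest / n for c, n in counts.items()}


def libsvm_weight_flags(weights):
    """Format weights as svm-train options, e.g. '-w1 1 -w2 9'."""
    return " ".join(f"-w{c} {w:g}" for c, w in sorted(weights.items()))


# Hypothetical labels: 90 instances of class 1, 10 of class 2.
labels = [1] * 90 + [2] * 10
weights = inverse_frequency_weights(labels)
flags = libsvm_weight_flags(weights)
```

Here the minor class (2) gets weight 9 and the major class weight 1, so errors on the minor class are penalized nine times as heavily during training.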

6 SMOTE ➤ Basic idea: ➤ Generate synthetic minor-class points on the line segments between each minor-class point and its k nearest neighbors ➤ Assumption: ➤ Points lying between a minor-class point and its k nearest neighbors still belong to the minor class

7 Algorithm ➤ Parameters: ➤ N%: amount of SMOTE, i.e. number of synthetic instances as a percentage of the minor class size ➤ k: number of nearest neighbors ➤ Algorithm: ➤ Identify the minor classes ➤ Compute the k nearest neighbors of each point in the minor class ➤ Populate the minor class by creating synthetic minor-class examples
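The steps above can be sketched in plain Python. This is a simplified illustration, not the group's implementation: it assumes N is a multiple of 100, uses Euclidean distance, searches for neighbors inside the minor class only, and takes a hypothetical five-point minor class as input.

```python
import math
import random


def smote(minority, n_percent, k, rng=None):
    """Create synthetic minor-class points by interpolating toward neighbors.

    minority  : list of minor-class points (tuples of floats)
    n_percent : amount of SMOTE; assumed here to be a multiple of 100
    k         : number of nearest neighbors considered for each point
    """
    rng = rng or random.Random(0)
    per_point = n_percent // 100  # synthetic points generated per original point
    synthetic = []
    for i, p in enumerate(minority):
        # k nearest neighbors of p among the other minor-class points
        others = [q for j, q in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda q: math.dist(p, q))[:k]
        for _ in range(per_point):
            q = rng.choice(neighbors)
            gap = rng.random()  # position along the segment from p to q
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Hypothetical minor class with five 2-D points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote(minority, n_percent=500, k=4)
```

With N = 500 each original point yields five synthetic points, so the five-point minor class grows by 25 points, all lying on segments between existing minor-class points.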

8 SMOTE ➤ N: 500 ➤ k: 4

9 Undersampling
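The slide body for undersampling did not survive in the transcript, so the following is only a sketch of one common reading of "basic" and "bagged" undersampling: basic undersampling randomly reduces every class to the size of the smallest class, and bagged undersampling trains one model per random undersample and combines them by majority vote. All names, the toy 1-D data, and the nearest-mean base learner are assumptions.

```python
import random
from collections import Counter


def undersample(X, y, rng):
    """Basic undersampling: randomly keep min-class-size instances per class."""
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for c in counts:
        idx = [i for i, label in enumerate(y) if label == c]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def bagged_undersampling(X, y, train, n_bags=5, seed=0):
    """Train one model per undersampled bag; predict by majority vote.

    train(X, y) is a hypothetical callable returning a predict function.
    """
    rng = random.Random(seed)
    models = [train(*undersample(X, y, rng)) for _ in range(n_bags)]

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


def train_centroid(X, y):
    """Toy base learner for 1-D data: predict the class with the nearest mean."""
    means = {
        c: sum(x for x, label in zip(X, y) if label == c) / y.count(c)
        for c in set(y)
    }
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


# Hypothetical 1-D data: 20 major-class points, 3 minor-class points.
X = [float(i) for i in range(20)] + [100.0, 101.0, 102.0]
y = ["major"] * 20 + ["minor"] * 3

Xb, yb = undersample(X, y, random.Random(0))
predict = bagged_undersampling(X, y, train_centroid)
```

Each bag discards a different random part of the major class, so the ensemble sees more of the major-class data than a single undersample does.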

10

11 Experiment
Name      Classes  Size  Dimension  Prevalence  Source
haberman  2        306   3          0.26        UCI
cmc       3        1473  24         0.23        UCI
satimage  6        4435  36         0.09        Statlog
car       4        1728  21         0.04        UCI
dna       3        2000  180        0.23        Statlog

12 Experiment

13 Result ➤ Micro-averaged F2 Measure:

14 Result ➤ Macro-averaged F2 Measure:
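The result charts are not part of the transcript. For reference, the standard definition is F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP), with beta = 2 weighting recall more heavily than precision. The sketch below computes the micro and macro averages under the usual conventions (micro pools the counts over classes, macro averages the per-class scores); whether the group computed them exactly this way is an assumption, and the labels are hypothetical.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from true positive, false positive, and false negative counts."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def micro_macro_f(y_true, y_pred, beta=2.0):
    """Return (micro-averaged, macro-averaged) F-beta over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    stats = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        stats.append((tp, fp, fn))
    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(f_beta(*s, beta=beta) for s in stats) / len(stats)
    # Micro: pool the counts first, so every instance counts equally.
    micro = f_beta(*(sum(col) for col in zip(*stats)), beta=beta)
    return micro, macro


# Hypothetical case: a classifier that always predicts the major class "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
micro, macro = micro_macro_f(y_true, y_pred)
```

In this example the micro average is 0.8 while the macro average is about 0.48, which shows why the macro average is the more sensitive of the two to poor minor-class performance.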

15 Discussion & Analysis ➤ Trade-off between performance on the minority class and on the majority class. ➤ Bagged undersampling performs better than basic undersampling. ➤ Trend: ➤ Smaller data set size, smaller dimension, and smaller minority prevalence lead to a larger performance improvement.

16 Thank you Q&A

