Class Imbalance Classification Implementation 2015-11-24 Group 4 WEI Lili, 20297324 ZENG Gaoxiong, 20279994.

Outline ➤ Introduction ➤ Algorithm and Implementation ➤ Experiment ➤ Discussion & Analysis

Introduction ➤ Class imbalance: ➤ The number of instances in each class is unequal ➤ e.g. Medical diagnosis: decide whether a patient has cancer given their health examination results. ➤ Problem ➤ An imbalanced training set leads to a model with poor accuracy on the minor class ➤ However, in many situations the minor class is the one of more interest (e.g. cancer detection)
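To make the problem concrete, here is a minimal sketch of why overall accuracy hides poor minor-class performance. The data (95 healthy, 5 cancer) and the function name `majority_baseline_report` are made up for illustration and are not from the experiments in this presentation.

```python
from collections import Counter

def majority_baseline_report(labels, minor_label):
    """Score a classifier that always predicts the most frequent label."""
    majority_label = Counter(labels).most_common(1)[0][0]
    predictions = [majority_label] * len(labels)
    # Overall accuracy: fraction of instances predicted correctly.
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    # Minor-class recall: fraction of minor-class instances that were found.
    minor_total = sum(y == minor_label for y in labels)
    minor_hits = sum(p == y == minor_label for p, y in zip(predictions, labels))
    minor_recall = minor_hits / minor_total if minor_total else 0.0
    return accuracy, minor_recall

# Hypothetical screening data: 95 healthy patients, 5 with cancer.
labels = ["healthy"] * 95 + ["cancer"] * 5
accuracy, minor_recall = majority_baseline_report(labels, "cancer")
print(accuracy, minor_recall)
```

The baseline scores high on accuracy while detecting none of the cancer cases, which is the failure mode the rest of the slides try to address.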

Introduction ➤ Existing solutions: ➤ Cost-sensitive learning ➤ Different misclassifications have different costs. Assign each instance to the class that minimizes the expected cost. ➤ Address the imbalance problem by increasing the cost of misclassifying minor-class instances ➤ Oversampling ➤ Add instances to the minor class to get a more balanced data set ➤ Undersampling ➤ Reduce the number of instances in the major class to get a more balanced data set
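The cost-sensitive idea can be sketched as a minimum-expected-cost decision rule. This is only an illustration: the probabilities, the cost values, and the name `min_cost_class` are assumptions, and the weighted SVM used in the implementation applies class costs during training rather than at prediction time.

```python
def min_cost_class(probabilities, cost):
    """Return the class with the lowest expected misclassification cost.

    probabilities: dict mapping class -> estimated P(class | x)
    cost: dict mapping (true_class, predicted_class) -> cost;
          pairs that are missing (e.g. correct predictions) cost 0
    """
    def expected_cost(predicted):
        return sum(p * cost.get((true, predicted), 0.0)
                   for true, p in probabilities.items())
    return min(probabilities, key=expected_cost)

# Hypothetical posterior estimate for one patient.
probs = {"healthy": 0.8, "cancer": 0.2}

# Equal costs: the more probable class wins.
equal_cost = {("healthy", "cancer"): 1.0, ("cancer", "healthy"): 1.0}
# Missing a cancer case is assumed to be 10 times as costly as a false alarm.
skewed_cost = {("healthy", "cancer"): 1.0, ("cancer", "healthy"): 10.0}

print(min_cost_class(probs, equal_cost))
print(min_cost_class(probs, skewed_cost))
```

With equal costs the instance is assigned to the major class; raising the cost of missing the minor class flips the decision even though the estimated probabilities are unchanged.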

Implementation ➤ Cost-sensitive ➤ Weighted SVM (implemented in libsvm) ➤ Oversampling ➤ SMOTE (Synthetic Minority Over-sampling Technique) ➤ Undersampling ➤ Basic ➤ Bagged

SMOTE ➤ Basic idea: ➤ Generate synthetic points for the minor class on the line segments between each minor-class point and its k nearest minor-class neighbors ➤ Assumption: ➤ The points between a minor-class point and its k nearest neighbors still belong to the minor class

Algorithm ➤ Parameters: ➤ N%: amount of oversampling, as a percentage of the minor class size ➤ k: number of nearest neighbors ➤ Algorithm: ➤ Identify the minor class ➤ Calculate the k nearest neighbors of each point in the minor class ➤ Populate the minor class by creating synthetic minor-class examples
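The steps above can be sketched in plain Python as follows. This is not the group's implementation: it assumes N is a multiple of 100, uses Euclidean distance, searches for neighbors only within the minor class, and the function name `smote` and the sample points are made up for illustration.

```python
import math
import random

def smote(minor_points, n_percent, k, seed=0):
    """Return synthetic minor-class points (SMOTE-style sketch).

    minor_points: list of equal-length numeric tuples from the minor class
    n_percent: amount of oversampling in percent (assumed multiple of 100)
    k: number of nearest neighbors considered for each point
    """
    if n_percent <= 0 or n_percent % 100 != 0:
        raise ValueError("this sketch assumes N is a positive multiple of 100")
    if not 1 <= k < len(minor_points):
        raise ValueError("k must be between 1 and len(minor_points) - 1")
    rng = random.Random(seed)
    per_point = n_percent // 100  # synthetic points generated per original point
    synthetic = []
    for i, point in enumerate(minor_points):
        # k nearest neighbors of this point among the other minor-class points.
        others = [q for j, q in enumerate(minor_points) if j != i]
        neighbors = sorted(others, key=lambda q: math.dist(point, q))[:k]
        for _ in range(per_point):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position along the segment point -> neighbor
            synthetic.append(tuple(a + gap * (b - a)
                                   for a, b in zip(point, neighbor)))
    return synthetic

# Hypothetical 2-D minor class with five points.
minor = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5), (3.0, 1.5)]
new_points = smote(minor, n_percent=500, k=4)
print(len(new_points))
```

Because every synthetic point is an interpolation between two existing minor-class points, it always falls inside the region already covered by the minor class, which is exactly the assumption stated for SMOTE.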

SMOTE ➤ Example: ➤ N: 500 ➤ k: 4

Undersampling
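A rough sketch of the two undersampling variants listed under Implementation. The slides do not say how the bagged variant combines its models, so majority voting is an assumption here, as are the function names and the toy nearest-mean learner, which only stands in for a real classifier.

```python
import random
from collections import Counter

def undersample(samples, labels, seed=0):
    """Basic undersampling: randomly keep as many instances of every class
    as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]

def train_nearest_mean(samples, labels):
    """Stand-in base learner for 1-D data: predict the class whose mean is closest."""
    sums = {}
    for x, y in zip(samples, labels):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    means = {y: total / count for y, (total, count) in sums.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def bagged_undersample_predict(samples, labels, train, query, n_bags=5, seed=0):
    """Bagged undersampling: train one model per balanced subsample and
    return the majority vote for the query (voting is an assumption)."""
    votes = []
    for b in range(n_bags):
        xs, ys = undersample(samples, labels, seed=seed + b)
        predict = train(xs, ys)
        votes.append(predict(query))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 1-D data: 20 major-class points and 3 minor-class points.
samples = list(range(20)) + [30, 31, 32]
labels = ["major"] * 20 + ["minor"] * 3
xs, ys = undersample(samples, labels)
print(Counter(ys))
print(bagged_undersample_predict(samples, labels, train_nearest_mean, query=31))
```

Basic undersampling discards most of the major class once; the bagged variant draws several different balanced subsamples, so less of the major-class information is lost overall.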

Experiment

Name       Class   Size   Dimension   Prevalence   Source
haberman   2       306    3           0.26         UCI
cmc        3       1473   24          0.23         UCI
satimage   6       4435   36          0.09         Statlog
car        4       1728   21          0.04         UCI
dna        3       2000   180         0.23         Statlog

Experiment

Result ➤ Micro-averaged F2 Measure:

Result ➤ Macro-averaged F2 Measure:
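For reference, the two averaging schemes can be sketched as below. The F-beta formula with beta = 2 is standard; the function names and the toy labels are illustrative assumptions and not the evaluation code or data behind the reported results.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from raw counts; beta = 2 weights recall higher than precision.
    Returns 0.0 when the score is undefined (no positives at all)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def averaged_f_beta(y_true, y_pred, beta=2.0):
    """Return (micro-averaged, macro-averaged) F-beta over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    counts = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts.append((tp, fp, fn))
    # Micro: pool the counts of all classes, then compute one score.
    micro = f_beta(sum(c[0] for c in counts),
                   sum(c[1] for c in counts),
                   sum(c[2] for c in counts), beta)
    # Macro: compute one score per class, then take the unweighted mean.
    macro = sum(f_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)
    return micro, macro

# Hypothetical predictions that ignore the minor class "b" entirely.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
micro, macro = averaged_f_beta(y_true, y_pred)
print(micro, macro)
```

The micro average is dominated by the major class, while the macro average gives every class equal weight, so a classifier that ignores the minor class is penalized much more heavily under the macro measure.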

Discussion & Analysis ➤ There is a tradeoff between performance on the minority class and on the majority class. ➤ Bagged undersampling performs better than basic undersampling. ➤ Trend: ➤ Smaller set size, smaller dimension, and smaller minority prevalence lead to a larger performance improvement.

Thank you Q&A