Published by Junior Dylan O’Brien. Modified over 9 years ago.
1
Class Imbalance Classification Implementation
2015-11-24
Group 4: WEI Lili (20297324), ZENG Gaoxiong (20279994)
2
Outline
➤ Introduction
➤ Algorithm and Implementation
➤ Experiment
➤ Discussion & Analysis
3
Introduction
➤ Class imbalance:
  ➤ The number of instances in each class is unequal.
  ➤ e.g., medical diagnosis: determine whether a patient has cancer given their health examination results.
➤ Problem:
  ➤ An imbalanced training set leads to a discriminative model with poor accuracy on the minority class.
  ➤ However, in many situations the minority class is the one of greater interest (e.g., cancer detection).
4
Introduction
➤ Existing solutions:
  ➤ Cost-sensitive learning
    ➤ Different kinds of errors have different costs; each instance is assigned to the class that minimizes the expected cost.
    ➤ The imbalance is addressed by raising the cost of misclassifying minority-class instances (false negatives, when the minority class is treated as positive).
  ➤ Oversampling
    ➤ Add instances to the minority class to obtain a more balanced data set.
  ➤ Undersampling
    ➤ Remove instances from the majority class to obtain a more balanced data set.
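The cost-sensitive rule ("assign each instance to the class that minimizes the cost") can be sketched as choosing the class with the lowest expected cost. This is a minimal illustration that assumes the classifier outputs class-probability estimates and that a cost matrix is given; the function name `min_cost_class` and the example costs are hypothetical, not taken from the slides.

```python
def min_cost_class(probs, cost):
    """Return the index of the class with the lowest expected cost.

    probs[j]   : estimated probability that the instance belongs to class j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    expected = [
        sum(cost[i][j] * probs[j] for j in range(len(probs)))
        for i in range(len(cost))
    ]
    return min(range(len(expected)), key=expected.__getitem__)


# Hypothetical example: class 0 = majority (healthy), class 1 = minority (cancer).
# Missing a cancer case (predict 0, truth 1) costs 10x as much as a false alarm.
cost = [[0, 10],
        [1, 0]]
probs = [0.8, 0.2]  # the model itself considers "healthy" more likely
print(min_cost_class(probs, cost))  # 1
```

With equal costs (`[[0, 1], [1, 0]]`) the same probabilities would give class 0; raising the cost of minority-class errors shifts the decision toward the minority class.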
5
Implementation
➤ Cost-sensitive
  ➤ Weighted SVM (as implemented in libsvm)
➤ Oversampling
  ➤ SMOTE (Synthetic Minority Over-sampling Technique)
➤ Undersampling
  ➤ Basic
  ➤ Bagged
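In libsvm, the weighted SVM is selected with the `-wi weight` option of `svm-train`, which sets the penalty parameter C of class i to weight × C (C-SVC only). The slides do not state which weights were used; this sketch assumes inverse class-frequency weights, a common choice, and the helper names are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by (size of largest class) / (size of that class),
    so the largest class gets weight 1 and rarer classes get more."""
    counts = Counter(labels)
    n_max = max(counts.values())
    return {label: n_max / n for label, n in counts.items()}


def libsvm_weight_flags(weights):
    """Format weights as libsvm `-wi weight` options (class i gets C * weight)."""
    flags = []
    for label in sorted(weights):
        flags += [f"-w{label}", f"{weights[label]:g}"]
    return flags


labels = [1] * 80 + [2] * 20  # hypothetical 80/20 training labels
w = inverse_frequency_weights(labels)
print(w)                       # {1: 1.0, 2: 4.0}
print(libsvm_weight_flags(w))  # ['-w1', '1', '-w2', '4']
```

The resulting flags would be passed to a call such as `svm-train -c 1 -w1 1 -w2 4 train_file`, so errors on the rarer class 2 are penalized four times as heavily.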
6
SMOTE
➤ Basic idea:
  ➤ Generate synthetic minority-class points between each minority point and its k nearest minority-class neighbors.
➤ Assumption:
  ➤ Points lying between a minority point and its k nearest neighbors also belong to the minority class.
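The interpolation step can be written as follows (the notation is ours, not from the slides): for a minority point x_i and a neighbor \hat{x}_i chosen at random from its k nearest minority-class neighbors,

```latex
x_{\text{new}} = x_i + \lambda\,(\hat{x}_i - x_i), \qquad \lambda \sim \mathcal{U}(0, 1)
```

so every synthetic point lies on the line segment joining x_i and \hat{x}_i, which is the region the assumption above treats as minority territory.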
7
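Algorithm
➤ Parameters:
  ➤ N%: amount of SMOTE (N/100 synthetic instances per minority instance; e.g., N = 500 gives five)
  ➤ k: number of nearest neighbors
➤ Algorithm:
  ➤ Identify the minority class(es).
  ➤ Compute the k nearest neighbors of each point in the minority class.
  ➤ Populate the minority class by creating N/100 synthetic examples per minority point, each interpolated toward a randomly chosen one of its k neighbors.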
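A minimal pure-Python sketch of these steps, following the original SMOTE formulation: brute-force Euclidean k-NN within the minority class, and N assumed to be a multiple of 100 once it reaches 100. The function name and signature are illustrative; the slides do not show the group's own implementation. The usage lines reuse the N = 500, k = 4 setting from the example slide.

```python
import math
import random


def smote(minority, n_percent, k, seed=0):
    """Return synthetic minority-class samples generated with SMOTE.

    minority  : list of feature vectors (lists of floats) from the minority class
    n_percent : amount of SMOTE, N%; values >= 100 are assumed to be multiples
                of 100 (N = 500 -> 5 synthetic points per minority point)
    k         : number of nearest minority-class neighbors to interpolate toward
    """
    rng = random.Random(seed)
    indices = list(range(len(minority)))
    if n_percent < 100:
        # Less than one synthetic point per sample: SMOTE a random subset once each.
        indices = rng.sample(indices, int(len(minority) * n_percent / 100))
        n_percent = 100
    per_point = int(n_percent) // 100

    synthetic = []
    for i in indices:
        x = minority[i]
        # Brute-force k nearest neighbors of x among the other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_point):
            nn = rng.choice(neighbors)
            gap = rng.random()  # uniform in [0, 1)
            # New point on the segment from x toward the chosen neighbor.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]]
new_points = smote(minority, n_percent=500, k=4)
print(len(new_points))  # 25
```

The returned points are then added to the training set with the minority label.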
8
SMOTE (example with N = 500, k = 4)
9
Undersampling
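Neither undersampling variant is spelled out in the slides. The sketch below assumes that basic undersampling means randomly discarding instances of the larger classes until every class matches the smallest one, and that bagged undersampling means training one classifier per independently undersampled subset and combining them by majority vote. `fit_nearest_centroid` is a hypothetical stand-in for the real base classifier (e.g., an SVM).

```python
import random
from collections import Counter


def random_undersample(X, y, rng):
    """Basic undersampling: keep a random subset of every class so that all
    classes end up as large as the smallest one."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(pts) for pts in by_class.values())
    X_out, y_out = [], []
    for label, pts in by_class.items():
        for xi in rng.sample(pts, n_min):  # sampling without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def bagged_undersample_predict(X, y, X_test, fit, n_bags=10, seed=0):
    """Bagged undersampling: train one model per independently undersampled
    subset, then predict each test point by majority vote."""
    rng = random.Random(seed)
    models = [fit(*random_undersample(X, y, rng)) for _ in range(n_bags)]
    predictions = []
    for x in X_test:
        votes = Counter(model(x) for model in models)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in base learner: predicts the class whose mean
    vector is closest (squared Euclidean distance)."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [5.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
print(bagged_undersample_predict(X, y, [[0.05], [5.1]], fit_nearest_centroid, n_bags=5))
# [0, 1]
```

Each bag sees a different random slice of the majority class, so the ensemble as a whole uses more of the majority data than a single undersampled set does; this is one plausible reason why bagged undersampling could outperform basic undersampling, as reported in the discussion slide.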
11
Experiment
Name      Classes  Size  Dimension  Prevalence  Source
haberman  2        306   3          0.26        UCI
cmc       3        1473  24         0.23        UCI
satimage  6        4435  36         0.09        Statlog
car       4        1728  21         0.04        UCI
dna       3        2000  180        0.23        Statlog
12
Experiment
13
Result ➤ Micro-averaged F2 Measure:
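For reference, the standard definitions behind this measure (the slides give only its name): F_β weights recall more heavily than precision when β > 1, and micro-averaging pools the per-class counts over the set of averaged classes C before computing precision and recall. Which classes C contains is not stated in the slides.

```latex
\begin{aligned}
F_\beta &= \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}, \qquad F_2 = \frac{5\,P\,R}{4P + R},\\
P_{\text{micro}} &= \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} (TP_c + FP_c)}, \qquad
R_{\text{micro}} = \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} (TP_c + FN_c)},\\
F_2^{\text{micro}} &= \frac{5\,P_{\text{micro}}\,R_{\text{micro}}}{4P_{\text{micro}} + R_{\text{micro}}}.
\end{aligned}
```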
14
Result ➤ Macro-averaged F2 Measure:
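Macro-averaging instead computes F2 separately for each class and takes the unweighted mean. The sketch below computes both averages from label lists to make the difference concrete; the helper names and toy labels are hypothetical, and the set of classes to average over (`labels`) is left to the caller because the slides do not specify it.

```python
def f_beta(p, r, beta=2.0):
    """F-beta from precision p and recall r; beta = 2 favors recall."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) if p + r > 0 else 0.0


def safe_div(a, b):
    return a / b if b else 0.0


def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn


def micro_f2(y_true, y_pred, labels):
    """Pool TP/FP/FN over `labels`, then compute a single F2."""
    tp = fp = fn = 0
    for c in labels:
        a, b, d = per_class_counts(y_true, y_pred, c)
        tp, fp, fn = tp + a, fp + b, fn + d
    return f_beta(safe_div(tp, tp + fp), safe_div(tp, tp + fn))


def macro_f2(y_true, y_pred, labels):
    """Compute F2 per class in `labels`, then take the unweighted mean."""
    scores = []
    for c in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, c)
        scores.append(f_beta(safe_div(tp, tp + fp), safe_div(tp, tp + fn)))
    return sum(scores) / len(scores)


y_true = [0, 0, 0, 0, 0, 0, 1, 1]  # class 1 is the minority
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
print(round(micro_f2(y_true, y_pred, [0, 1]), 4))  # 0.75
print(round(macro_f2(y_true, y_pred, [0, 1]), 4))  # 0.6667
```

In single-label classification, every false positive for one class is a false negative for another, so micro-averaging over all classes makes precision and recall both equal to overall accuracy (0.75 here). Restricting `labels` to the minority class, e.g. `micro_f2(y_true, y_pred, [1])`, gives 0.5 for the same predictions.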
15
Discussion & Analysis
➤ There is a trade-off between minority-class and majority-class performance.
➤ Bagged undersampling performs better than basic undersampling.
➤ Trend:
  ➤ Smaller data set size, lower dimension, and lower minority prevalence lead to larger performance improvements.
16
Thank you Q&A