

1 Intelligent Database Systems Lab, National Yunlin University of Science and Technology (國立雲林科技大學)
Investigating the Effect of Sampling Methods for Imbalanced Data Distributions
Advisor: Dr. Hsu
Presenter: Ai-Chen Liao
Authors: Show-Jane Yen, Yue-Shi Lee, Cheng-Han Lin and Jia-Ching Ying
2006, ICSMC, pages 4163-4168

2 Outline
- Motivation
- Objective
- Method
  - Strategies for handling imbalanced data
  - Cluster-based under-sampling approach
- Experimental Result
- Conclusion
- Comments

3 Motivation
- Classification is an important and well-known machine learning technique, and the training data strongly influence classification accuracy.
- Classification techniques usually assume that the training samples are evenly distributed among the classes.
- In real-world applications the training data often have an imbalanced class distribution, e.g. fraud detection, risk management, and medical research.

4 Objective
- We propose a cluster-based sampling approach that selects representative data as training data to improve classification accuracy.
- We investigate the effect of under-sampling methods on the imbalanced class distribution problem.

5 Method: Cluster-based under-sampling approach
- The main idea is that a dataset contains different clusters, and each cluster seems to have its own distinct characteristics.
- Example dataset: 1,100 records in total, of which 1,000 belong to the majority class (MA) and 100 to the minority class (MI).

Cluster    MA     MI
1          500    10
2          300    50
3          200    40

6 Method: Cluster-based under-sampling approach
- Clusters: Cluster 1 (MA = 500, MI = 10), Cluster 2 (MA = 300, MI = 50), Cluster 3 (MA = 200, MI = 40).
- Assume that the ratio of Size_MA to Size_MI in the training data is set to 1:1.
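The allocation rule itself did not survive in this transcript. As a hedged sketch, assume the rule described in the cited paper: each cluster contributes majority samples in proportion to its MA-to-MI ratio, scaled so that the total equals m times the number of minority samples (m = 1 for a 1:1 ratio). The function name `sbc_sample_sizes`, the rounding to the nearest integer, and the cap at the cluster size are illustrative assumptions, not part of the slides.

```python
from typing import List, Tuple


def sbc_sample_sizes(clusters: List[Tuple[int, int]], m: float = 1.0) -> List[int]:
    """Number of majority (MA) samples to draw from each cluster.

    clusters: one (MA count, MI count) pair per cluster; this sketch assumes
              every cluster holds at least one MI sample.
    m:        desired MA:MI ratio in the training set (1.0 means 1:1).
    """
    total_mi = sum(mi for _, mi in clusters)
    ratios = [ma / mi for ma, mi in clusters]  # MA-to-MI ratio per cluster
    ratio_sum = sum(ratios)
    sizes = []
    for (ma, _), ratio in zip(clusters, ratios):
        wanted = round(m * total_mi * ratio / ratio_sum)
        sizes.append(min(ma, wanted))  # cannot draw more than the cluster holds
    return sizes


# The three clusters from the slide, as (MA, MI) pairs.
clusters = [(500, 10), (300, 50), (200, 40)]
sizes = sbc_sample_sizes(clusters)
print(sizes)  # [82, 10, 8]
```

Under these assumptions the 100 selected majority samples come mostly from Cluster 1, where majority samples dominate, and they are then combined with all 100 minority samples to form the 1:1 training set.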

7 Experimental Results

8 Conclusion
- We propose a cluster-based under-sampling approach that addresses the imbalanced class distribution problem, using a backpropagation neural network as the classifier.
- SBC (under-sampling based on clustering) not only achieves high classification accuracy on minority class samples but also runs quickly.

9 Comments
- Advantage: a novel approach
- Drawback: the necessary parameters must be set
- Application: handling imbalanced data

10 Method: Strategies for handling imbalanced data
- Modify the learning algorithm to handle imbalanced data
  - Cost-sensitive learning
- Preprocess the data beforehand
  - Multi-classifier committee
  - Resampling
    - Upsizing the minority class (oversampling), e.g. SMOTE
    - Downsizing the majority class (undersampling)
- Example: MA = 48 samples, MI = 2 samples; with MA size : MI size = 1:1, 48/2 = 24, followed by voting.
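One way to read the 48/2 = 24 figure on this slide is as a multi-classifier committee: the 48 majority samples are split into 24 disjoint groups of 2, each group is paired with the 2 minority samples to form a balanced 1:1 training set, and the 24 resulting classifiers vote. The sketch below illustrates only that reading, which is an assumption rather than something the slide states; the helper names `committee_training_sets` and `majority_vote` are hypothetical, and classifier training is left out.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def committee_training_sets(majority: Sequence, minority: Sequence, seed: int = 0) -> List[list]:
    """Build balanced training sets for a committee of classifiers.

    The majority samples are shuffled and cut into disjoint chunks of
    len(minority); each chunk is combined with all minority samples.
    Leftover majority samples that do not fill a chunk are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    chunks = [shuffled[i:i + size] for i in range(0, len(shuffled) - size + 1, size)]
    return [chunk + list(minority) for chunk in chunks]


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most committee members."""
    return Counter(predictions).most_common(1)[0][0]


# Placeholder samples matching the slide's counts: 48 majority, 2 minority.
majority = [("MA", i) for i in range(48)]
minority = [("MI", i) for i in range(2)]
sets = committee_training_sets(majority, minority)
print(len(sets), len(sets[0]))  # 24 4
```

Each of the 24 sets holds 2 majority and 2 minority samples, so no majority sample is discarded, at the cost of training 24 classifiers instead of one.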

