Published by Bethany Woods. Modified over 8 years ago.

Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

Cost-sensitive boosting for classification of imbalanced data
Advisor: Dr. Hsu
Presenter: Hsin-Yi Huang
Authors: Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang
2007.PR.21

Outline
Motivation
Objective
Methodology
    AdaBoost
    Cost-sensitive boosting algorithms
Experiment
Conclusion
Comments

Motivation
Standard classifiers are designed to generalize from training data and output the simplest hypothesis that best fits the data.
In an imbalanced data set, the simplest hypothesis pays less attention to rare cases.
AdaBoost is an accuracy-oriented algorithm; its learning strategy may be biased towards the prevalent class, because that class contributes more to the overall classification accuracy.

Objective
The AdaBoost algorithm is adapted to improve the classification of imbalanced data.
The authors propose three cost-sensitive boosting algorithms, which introduce cost items into the learning framework of AdaBoost.
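
Methodology
[Figure: AdaBoost schematic for a "Man? Woman?" classification example. Weak classifiers h1, h2, h3, …, ht are trained in sequence on the sample weight distributions D1(i), D2(i), D3(i), …, Dt(i); their outputs are combined with the weights α1, α2, α3, …, αt-1 into the final classifier H, which outputs "man" or "woman".]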

Methodology: AdaBoost algorithm
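
The algorithm listing on this slide did not survive extraction. As a rough illustration only, the following is a minimal sketch of standard discrete AdaBoost for labels in {-1, +1}, using one-feature threshold stumps as the weak learner. The stump learner and the names fit_stump, stump_predict, adaboost_fit and adaboost_predict are assumptions made for this example; they are not taken from the slides, which use C4.5 and HPWR as base classifiers.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best

def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost: returns a list of (alpha_t, h_t) pairs."""
    n = len(y)
    D = np.full(n, 1.0 / n)              # D_1(i): uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, D)       # weak learner h_t trained under D_t
        pred = stump_predict(stump, X)
        err = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        D = D * np.exp(-alpha * y * pred)  # raise weights of misclassified samples
        D /= D.sum()                     # renormalize so D_{t+1} sums to 1
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    """Final classifier H(x) = sign(sum_t alpha_t * h_t(x))."""
    score = np.zeros(len(X))
    for alpha, stump in ensemble:
        score += alpha * stump_predict(stump, X)
    return np.where(score >= 0, 1, -1)

# Toy data that no single stump can separate.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([1, 1, -1, -1, 1, 1])
model = adaboost_fit(X_toy, y_toy, n_rounds=3)
```

Note that the weight update depends only on whether a sample was classified correctly, not on which class it belongs to. This is the accuracy-oriented behaviour that the Motivation slide identifies as a problem for imbalanced data.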

Methodology: Cost-sensitive boosting algorithms
Cost setups
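
The formulas on this slide were lost in extraction. As a hedged illustration of how a cost item can enter the boosting update, the sketch below implements one round of the AdaC2 variant as it is commonly stated: the cost item C_i multiplies the sample weight outside the exponent, and the classifier weight is computed from cost-weighted sums. This is an assumption to be checked against the original paper, not a transcription of the slide. The function name adac2_round and the cost values in the example are hypothetical.

```python
import numpy as np

def adac2_round(D, C, y, pred):
    """One AdaC2-style update (assumed form, see lead-in).

    D    : current sample weights D_t(i), summing to 1
    C    : per-sample cost items C_i (higher = costlier to misclassify)
    y    : true labels in {-1, +1}
    pred : weak classifier outputs h_t(x_i) in {-1, +1}
    """
    correct = pred == y
    s_correct = np.sum(C[correct] * D[correct])    # cost-weighted mass classified correctly
    s_wrong = np.sum(C[~correct] * D[~correct])    # cost-weighted mass misclassified
    eps = 1e-10                                    # guards against division by zero
    alpha = 0.5 * np.log((s_correct + eps) / (s_wrong + eps))
    D_new = C * D * np.exp(-alpha * y * pred)      # cost sits outside the exponent
    return alpha, D_new / D_new.sum()

# Hypothetical setup: positive (rare) class gets cost 1.0, negative class 0.5.
y_ex = np.array([1, 1, -1, -1])
C_ex = np.where(y_ex == 1, 1.0, 0.5)
D_ex = np.full(4, 0.25)
pred_ex = np.array([1, -1, -1, -1])                # one positive sample is misclassified
alpha_ex, D_next = adac2_round(D_ex, C_ex, y_ex, pred_ex)
```

In this toy setup the misclassified positive sample ends up with the largest weight, and a correctly classified positive sample keeps more weight than a correctly classified negative one, so the next weak learner is pushed towards the costly class.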

Experiment
Dataset
The authors use four medical diagnosis data sets taken from the UCI Machine Learning Database:
    Breast cancer data (Cancer)
    Hepatitis data (Hepatitis)
    Pima Indian's diabetes database (Pima)
    Sick-euthyroid data (Sick)
All data sets have two output labels: one denotes the disease category, which is treated as the positive class, and the other represents the normal category.
Base classifiers
    C4.5
    HPWR

Experiment

Conclusion
The authors investigate cost-sensitive boosting algorithms for improving the classification of imbalanced data.
Experimental results indicate that AdaC2 is superior to its rivals.
Some research issues remain open for future investigation:
    To fix cost factors using more efficient methods.
    To explore the algorithms' effectiveness in other specific domains.
    To integrate cost values into the framework of RealBoost and develop cost-sensitive boosting algorithms.

Comments
Advantage
    …
Drawback
    …
Application
    Classification of imbalanced data