Published by Preston Carter. Modified over 9 years ago.
1
Intelligent Database Systems Lab, National Yunlin University of Science and Technology
An Empirical Study of Learning from Imbalanced Data Using Random Forest
Presenter: Ai-Chen Liao
Authors: Taghi M. Khoshgoftaar, Moiz Golawala, and Jason Van Hulse
ICTAI 2007, pp. 310-317
2
Outline: Motivation, Objective, Method, Experiment, Experimental Results, Conclusion, Comments
3
Motivation
[Figure: a single tree; a forest]
4
Motivation
Random forest (RF) is a relatively new learner, and previous work has reported only preliminary experiments on building RF classifiers from imbalanced data. This leaves three open questions:
What should the recommended default number of trees in the ensemble be?
What should the recommended value be for the number of attributes randomly selected at each split?
How does RF perform on imbalanced data compared with other commonly used learners, such as NB, SVM, KNN, and C4.5?
5
Objective
This work is the first to conduct comprehensive experiments with the RF learner in Weka and to recommend empirically validated default values for the numTrees and numFeatures parameters.
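For context on numFeatures: as we understand Weka's RandomTree documentation, leaving the number of randomly chosen attributes at 0 makes Weka fall back to int(log2(#predictors) + 1). The Python sketch below mirrors only that fallback; `default_num_features` is a hypothetical helper, not a Weka API, and the slide does not say which values the authors ended up recommending.

```python
import math


def default_num_features(num_predictors: int) -> int:
    """Hypothetical helper: mirrors the fallback we understand Weka to use
    when the number of random attributes is left at 0, i.e. int(log2(M) + 1)."""
    if num_predictors < 1:
        raise ValueError("need at least one predictor attribute")
    return int(math.log2(num_predictors) + 1)


# How the fallback grows with the number of predictors M.
for m in (4, 8, 10, 42):
    print(f"M = {m:2d} -> numFeatures = {default_num_features(m)}")
```

The fallback grows only logarithmically: 8 predictors give 4 candidate attributes per split, and 42 predictors give 6.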
6
Method: RF
[Figure: the dataset is sampled with replacement to form bootstrap samples 1, 2, …; trees numbered 1 to 5]
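The figure's two sources of randomness, bootstrap sampling (drawing rows with replacement, one sample per tree) and considering only a random subset of attributes at each split, can be sketched in plain Python. This is a minimal illustration under those assumptions, not Weka's implementation: it grows no actual trees, and all function names are hypothetical.

```python
import random


def bootstrap_indices(n: int, rng: random.Random) -> list[int]:
    """One bootstrap sample: n row indices drawn uniformly with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n: int, sample: list[int]) -> list[int]:
    """Rows never drawn into the bootstrap sample (the out-of-bag rows)."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


def split_candidates(num_attributes: int, num_features: int,
                     rng: random.Random) -> list[int]:
    """Attributes one split may consider: num_features of them, chosen at
    random without replacement (the idea behind numFeatures)."""
    return sorted(rng.sample(range(num_attributes), num_features))


rng = random.Random(0)
n_rows, n_attrs, num_features = 10, 8, 4
for tree_id in range(1, 6):  # five trees, as in the slide's figure
    sample = bootstrap_indices(n_rows, rng)
    print(f"tree {tree_id}: rows {sorted(sample)}, "
          f"OOB {out_of_bag(n_rows, sample)}, "
          f"split attributes {split_candidates(n_attrs, num_features, rng)}")
```

Because each tree sees a different bootstrap sample and a different attribute subset at every split, the trees disagree, and the forest's vote averages out their individual errors.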
7
Method: Experimental Datasets
Metrics:
The area under the ROC curve (AUC)
The Kolmogorov-Smirnov (KS) statistic
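Both metrics can be computed from a classifier's scores for the positive (minority) class. A minimal sketch, assuming labels are 1 for the positive class and 0 otherwise: AUC is the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and KS is the largest gap between the two classes' empirical score distributions. These are the standard definitions; the slide does not show how the authors computed them.

```python
def _split(labels: list[int], scores: list[float]) -> tuple[list[float], list[float]]:
    """Separate scores by class; both classes must be present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def auc(labels: list[int], scores: list[float]) -> float:
    """AUC via pairwise comparison (O(P*N), fine for a sketch)."""
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: list[int], scores: list[float]) -> float:
    """Maximum distance between the two classes' empirical CDFs.

    The CDFs are step functions that change only at observed scores,
    so checking those scores as thresholds is sufficient."""
    pos, neg = _split(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        f_pos = sum(1 for s in pos if s <= t) / len(pos)
        f_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(f_pos - f_neg))
    return best


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))           # 0.75
print(ks_statistic(labels, scores))  # 0.5
```

Both metrics depend only on how the scores order the two classes, not on a single classification threshold, which makes them less sensitive to class imbalance than plain accuracy.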
8
Experimental Results
Phase 1: Selecting an Appropriate RF Learner
[Figure: results by numFeatures and numTrees]
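Phase 1 sweeps combinations of numTrees and numFeatures. One plausible way to organize such a sweep is sketched below, assuming each setting has already been evaluated on every dataset with a higher-is-better metric such as AUC. The selection rule (highest mean, ties broken toward fewer trees and then fewer features) is our assumption, not the authors' documented procedure, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean

Setting = tuple[int, int]  # (numTrees, numFeatures)


def parameter_grid(num_trees_values: list[int],
                   num_features_values: list[int]) -> list[Setting]:
    """Every (numTrees, numFeatures) combination to evaluate."""
    return list(product(num_trees_values, num_features_values))


def best_setting(results: dict[Setting, list[float]]) -> Setting:
    """Pick the setting with the highest mean per-dataset score.

    Ties go to the smaller ensemble, then to fewer features,
    since those settings are cheaper to train."""
    return max(results, key=lambda s: (mean(results[s]), -s[0], -s[1]))
```

Averaging over many datasets favors a setting that is reliably good rather than one that wins on a single dataset, which matches the goal of a default value.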
9
Experimental Results
Phase 2: Comparison of RF-100 (RF with 100 trees) to Other Learners
[Figure: comparison results, annotated "Good!"]
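One common way to summarize a multi-learner comparison such as Phase 2 is to rank the learners on each dataset and average the ranks. The sketch below assumes one higher-is-better metric value (for example AUC) per learner per dataset; it is an illustrative summary technique, not necessarily the one used in the paper, and `average_ranks` is a hypothetical name.

```python
def average_ranks(scores: dict[str, list[float]]) -> dict[str, float]:
    """Average rank of each learner across datasets (1 = best).

    `scores` maps a learner name to its per-dataset metric values,
    higher being better. Learners tied on a dataset share the mean
    of the ranks they occupy."""
    if not scores:
        return {}
    learners = list(scores)
    n_datasets = len(scores[learners[0]])
    if any(len(v) != n_datasets for v in scores.values()):
        raise ValueError("every learner needs one score per dataset")
    totals = {name: 0.0 for name in learners}
    for d in range(n_datasets):
        ordered = sorted(learners, key=lambda name: scores[name][d], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of learners tied with ordered[i].
            j = i
            while (j + 1 < len(ordered)
                   and scores[ordered[j + 1]][d] == scores[ordered[i]][d]):
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for name in ordered[i:j + 1]:
                totals[name] += shared_rank
            i = j + 1
    return {name: totals[name] / n_datasets for name in learners}
```

A lower average rank means a learner placed near the top on most datasets, which is more robust to a few datasets with unusually high or low metric values than averaging raw scores.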
10
Conclusion
This study contributes an extensive empirical evaluation of RF learners built from imbalanced data. The recommended RF parameter values were chosen to perform well across many different circumstances and to be reasonable for imbalanced datasets.
11
Comments
Advantage: Because the experiments build a large number of learners, I find the experimental results reliable.
Drawback: Due to space restrictions, many experimental results are not included in the paper.
Application: Handling imbalanced data.