
Effect of Subsampling Rate on Subbagging and Related Ensembles of Stable Classifiers
Zaman Faisal, Kyushu Institute of Technology, Fukuoka, Japan

Contents
Ensemble learning
Bagging
What is subsampling?
Subagging
Double bagging
Subsampling in double bagging = double subagging
Bias and variance of a learning algorithm
What is a stable learning algorithm?
Experiments and results
Conclusion

Ensemble Learning
Ensemble learning refers to a collection of methods that learn a target function by training a number of individual learners and combining their predictions.
Accuracy: a more reliable mapping can be obtained by combining the outputs of multiple "experts".
Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve.
Examples of ensemble methods: bagging, boosting, double bagging, random forest, rotation forest.
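As a concrete illustration of combining the outputs of multiple "experts", here is a minimal sketch (not taken from the slides) that combines three different classifiers by simple majority voting with scikit-learn; the dataset and the three base learners are arbitrary choices for illustration.

```python
# Minimal sketch: combine three "experts" by majority vote (hard voting).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # simple majority rule over the predicted labels
)

print("ensemble 10-fold CV accuracy:",
      cross_val_score(ensemble, X, y, cv=10).mean())
```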

Bagging

Bagging (Bootstrap Aggregating)
Bagging uses bootstrapping to generate multiple versions of the training set and builds a predictor on each version; the predictions of these classifiers are then combined (aggregated) to obtain the final decision rule. Bagging is executed as follows:
1. Repeat for b = 1, ..., B:
   a) Take a bootstrap replicate Xb of the training set XTRAIN.
   b) Construct a base classifier Cb(x).
2. Combine the base classifiers Cb(x), b = 1, 2, ..., B, by the simple majority rule into the final decision rule CCOMB.
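A minimal Python sketch of this loop, assuming a decision tree as the base classifier and integer class labels; the function and variable names are illustrative, not the author's code.

```python
# Bagging: B bootstrap replicates, one base classifier per replicate, majority vote.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X_train, y_train, base=DecisionTreeClassifier(), B=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    classifiers = []
    for b in range(B):
        idx = rng.integers(0, n, size=n)      # bootstrap replicate X_b: n out of n, with replacement
        C_b = clone(base).fit(X_train[idx], y_train[idx])
        classifiers.append(C_b)
    return classifiers

def bagging_predict(classifiers, X):
    # Simple majority rule (CCOMB): most frequent predicted label per sample.
    # Assumes integer class labels 0, 1, ...
    votes = np.stack([C_b.predict(X) for C_b in classifiers])   # shape (B, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```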

Bagging (Bootstrap Aggregating): architecture
[Diagram: the standard bagging procedure. Bootstrapping creates multiple training sets T1, ..., TB, each with an out-of-bag sample O1, ..., OB; base classifiers C1, ..., CB are built on the training sets, and the classifier outputs are combined by majority voting into the final classifier CCOMB.]

Subsampling

Subsampling: definition
Subsampling is a computationally intensive resampling method. In the bootstrap we draw samples of size n out of n, where n is the size of the training sample, whereas in subsampling we draw samples of size m out of n (m < n). Unlike bootstrapping, the sampling is done without replacement.

Subsampling: example
Let T be a training set with n elements. A subsample Tb is created by choosing m elements from T at random, without replacement. The following example shows 5 subsamples, each with 3 instances, i.e. half of the original training sample of size 6.

T  = {X(1), X(2), X(3), X(4), X(5), X(6)}
T1 = {X(3), X(2), X(5)}
T2 = {X(2), X(3), X(1)}
T3 = {X(1), X(6), X(4)}
T4 = {X(5), X(2), X(1)}
T5 = {X(6), X(4), X(5)}
Example of 5 subsamples

Subsampling ratio: definition
In the example, each subsample contains half of the training instances; this fraction is called the subsampling ratio and is denoted by ρ. So if ρ = 0.4 and the training sample size is N, each subsample has ρ × N instances.
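A small Python sketch of drawing one subsample of ratio ρ without replacement; the helper name is hypothetical.

```python
# Draw m = rho * n indices out of n, without replacement (as opposed to an n-out-of-n bootstrap).
import numpy as np

def subsample_indices(n, rho, rng):
    m = int(round(rho * n))
    return rng.choice(n, size=m, replace=False)   # without replacement

rng = np.random.default_rng(0)
idx = subsample_indices(n=6, rho=0.5, rng=rng)    # e.g. 3 of the 6 training instances
print(idx)
```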

Subagging

Subagging (SUBsample AGGregatING)
Subagging, also written subbagging, was proposed by P. Bühlmann in 2003. In subagging:
1) Subsamples, rather than bootstrap samples, are used to generate the multiple training sets.
2) With CART as the base learner, it performs quite similarly to bagging.
3) When each subsample is half the size of the training set, subagging with CART performs about as well as bagging with CART.
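A minimal subagging sketch under the same assumptions as the bagging sketch above (decision trees as base learners, illustrative names): only the sampling step changes, from an n-out-of-n bootstrap to an m-out-of-n draw without replacement.

```python
# Subagging: like bagging, but each training set is a subsample drawn without replacement.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def subagging_fit(X_train, y_train, rho=0.5, base=DecisionTreeClassifier(), B=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    m = int(round(rho * n))
    classifiers = []
    for b in range(B):
        idx = rng.choice(n, size=m, replace=False)   # subsample T_b, without replacement
        classifiers.append(clone(base).fit(X_train[idx], y_train[idx]))
    return classifiers   # combine with the same majority rule as in the bagging sketch
```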

Double Bagging

Double Bagging
Double bagging was first proposed by Torsten Hothorn in 2002. Its main idea is to augment the original predictors with additional predictors; Hothorn used LDA as the additional classifier model. These additional predictors are generated from the out-of-bag sample: in bagging, each bootstrap replicate contains about 63% of the original training instances, while the remaining ~37% are left out; these instances are called the out-of-bag sample (OOBS). In double bagging, classifier models are built on the OOBS and then applied to the bootstrap replicates to generate the additional predictors.

Double Bagging: algorithm
In general, the double bagging algorithm proceeds as follows:
Loop: for b = 1, 2, ..., B
  Step 1: Generate the bth bootstrap sample from the training set.
  Step 2: Construct a classifier model from its out-of-bag sample.
  Step 3a: Apply this additional classifier to the bth bootstrap sample to generate additional predictors.
  Step 3b: Do the same for a test instance x, generating additional predictors for x.
  Step 4: Build a tree classifier on the bth bootstrap replicate augmented with the additional predictors.
End loop.
Step 5: Combine all the tree models using the "average" rule.
Step 6: Classify a test instance x, together with its additional predictors, using the combined tree ensemble.
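The following is one possible reading of these steps in Python, with LDA as the additional classifier and a decision tree as the base learner. It is a hedged sketch, not the author's implementation, and it assumes every bootstrap replicate contains all classes (which holds with high probability for datasets of this size).

```python
# Double bagging sketch: OOB-trained LDA provides additional predictors for each tree.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

def double_bagging_fit(X, y, B=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    ensemble = []
    for b in range(B):
        boot = rng.integers(0, n, size=n)                  # Step 1: bootstrap replicate
        oob = np.setdiff1d(np.arange(n), boot)             # out-of-bag sample (~37% of instances)
        lda = LinearDiscriminantAnalysis().fit(X[oob], y[oob])   # Step 2
        Z = lda.decision_function(X[boot]).reshape(n, -1)  # Step 3a: additional predictors
        tree = DecisionTreeClassifier().fit(np.hstack([X[boot], Z]), y[boot])  # Step 4
        ensemble.append((lda, tree))
    return ensemble

def double_bagging_predict_proba(ensemble, X_test):
    # Steps 3b, 5, 6: augment the test instances with each LDA's outputs,
    # then average the tree class probabilities ("average" rule).
    probs = [tree.predict_proba(
                 np.hstack([X_test, lda.decision_function(X_test).reshape(len(X_test), -1)]))
             for lda, tree in ensemble]
    return np.mean(probs, axis=0)
```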

Double Bagging: architecture
[Diagram: the data (N observations, a fraction α held out) is split into a training set and a test set. Step 1: multiple bootstrap sets T1, ..., TB with out-of-bag samples O1, ..., OB are drawn from the training set. Step 2: classifier models Model1, ..., ModelB are trained on the out-of-bag samples. Steps 3a/3b: these classifiers are applied to the bootstrap samples and to the test set to obtain additional predictors. Decision trees DT1, ..., DTB are built on the augmented bootstrap samples and combined with the average rule into the final classifier CCOMB.]

Subsampling in Double Bagging (Double Subagging)
In double bagging, subsamples can be used instead of bootstrap samples. This has two major advantages:
a) it enlarges the out-of-bag sample, which allows the additional classifier to be learned better;
b) it reduces the time complexity of the ensemble learning.
With N observations and subsampling ratio ρ, sampling without replacement gives a subsample of size N × ρ and an out-of-bag sample of size N × (1 − ρ). So if ρ = 0.5, the OOBS is larger than the usual bagging OOBS, and in addition the subsample itself is smaller, which ensures that the training time of the ensemble is reduced.
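A small numeric illustration (assumed values, not from the slides) of the enlarged out-of-bag sample: with ρ = 0.5 the OOBS holds 50% of the training data, versus the roughly 36.8% left out on average by an n-out-of-n bootstrap.

```python
# Compare OOBS size under subsampling (rho = 0.5) and under bootstrapping.
import numpy as np

n, rho = 1000, 0.5
rng = np.random.default_rng(0)

sub = rng.choice(n, size=int(rho * n), replace=False)
oob_sub = np.setdiff1d(np.arange(n), sub)
print("subsampling OOBS size:", len(oob_sub))                # exactly 500

boot = rng.integers(0, n, size=n)
oob_boot = np.setdiff1d(np.arange(n), boot)
print("bootstrap OOBS size:", len(oob_boot), "(~0.368 * n)") # about 368
```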

Bias-Variance of a Learning Algorithm

Bias and Variance of a Learning Algorithm
Bias: the systematic error component (independent of the learning sample).
Variance: the error due to the variability of the model with respect to the randomness of the learning sample.
Intrinsic error: the irreducible noise in the data.
The expected error thus has a component due to bias and a component due to variance:
Error = Bias² + Variance + Intrinsic Error
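For reference, the standard decomposition of the expected squared error can be written as follows (a textbook form consistent with the slide; f is the target, f̂ the model learned from a random training sample, σ² the intrinsic noise):

```latex
% Bias-variance decomposition of the expected squared error at a point x.
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat{f}(x)]\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Intrinsic error}}
```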

Stable Learning Algorithm: a Bias-Variance Point of View
A learning algorithm is called stable if it has high bias but low variance: its predictions change little when the learning sample changes. Examples: linear classifiers, nearest-neighbour classifiers, support vector machines, etc. Conversely, a learning algorithm is called unstable if it has low bias but high variance. Example: decision trees.

Experiments and Results

Experiments and Results
We used three additional classifier models in double bagging, with different subsampling ratios:
Linear Support Vector Machine (LSVM)
Stable Linear Discriminant Classifier (sLDA)
Logistic Linear Classifier (LogLC)

Experiments and Results
In the experiments we used the subsampling ratios ρ = 0.2, 0.3, 0.5, 0.65, 0.75, and 0.8. We used five datasets from the UCI Machine Learning Repository and 10-fold cross-validation to estimate the errors of the methods.

Table: Description of the datasets
Dataset    N     Classes  Features
Diabetes   768   2        8
German     1000  2        20
Glass      214   7        9
Heart      297   5        13
Ion        351   2        34
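A hedged sketch of the evaluation protocol (10-fold cross-validation error for the three stable classifiers): scikit-learn estimators stand in for the classifiers named above, and a bundled dataset stands in for the UCI datasets, since the original experimental code is not part of this transcript.

```python
# Estimate 10-fold CV misclassification error for the three stable (linear) classifiers.
from sklearn.datasets import load_breast_cancer            # placeholder for the UCI datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
stable_classifiers = {
    "LSVM": make_pipeline(StandardScaler(), LinearSVC()),
    "sLDA": LinearDiscriminantAnalysis(),
    "LogLC": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
for name, clf in stable_classifiers.items():
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print(f"{name}: 10-fold CV misclassification error = {1 - acc:.3f}")
```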

Experiments and Results: Diabetes Dataset
[Figure: misclassification error of double subagging (left) and subagging (right) on the Diabetes data, for the different stable classifiers, plotted against the subsampling ratio.]

Experiments and Results: German Dataset
[Figure: misclassification error of double subagging (left) and subagging (right) on the German data, for the different stable classifiers, plotted against the subsampling ratio.]

Experiments and Results: Glass Dataset
[Figure: misclassification error of double subagging (left) and subagging (right) on the Glass data, for the different stable classifiers, plotted against the subsampling ratio.]

Experiments and Results: Heart Dataset
[Figure: misclassification error of double subagging (left) and subagging (right) on the Heart data, for the different stable classifiers, plotted against the subsampling ratio.]

Experiments and Results: Ion Dataset
[Figure: misclassification error of double subagging (left) and subagging (right) on the Ion data, for the different stable classifiers, plotted against the subsampling ratio.]

Conclusion

Conclusion
On almost all datasets, double subagging performed considerably better than subagging. Double subagging performed well with small subsampling ratios (around ρ = 0.3); with ρ = 0.65 to 0.8 its performance was poor. As additional classifiers in double subagging, LSVM and LogLC were competitive with each other. In subagging, all the classifiers performed very competitively, with sLDA slightly better than LSVM and LogLC. Subagging performed well with larger subsampling ratios (ρ = 0.75 and 0.8) on almost all datasets (the exception being the Heart dataset), and very poorly with very small ratios. There is thus an opposite relationship between the performance of double subagging and subagging: for each dataset and each classifier, double subagging performed best at the subsampling ratio ρ_LOW = 1 − ρ_HIGH, where ρ_HIGH is the ratio at which subagging performed best.

Thank you