Combining Bagging and Random Subspaces to Create Better Ensembles


1 Combining Bagging and Random Subspaces to Create Better Ensembles
Panče Panov, Sašo Džeroski (Jožef Stefan Institute)

2 Outline
- Motivation
- Overview of randomization methods for constructing ensembles (bagging, random subspace method, random forests)
- Combining bagging and random subspaces
- Experiments and results
- Summary and further work

3 Motivation
Random Forests is one of the best performing ensemble methods:
- it uses random subsamples of the training data
- it uses a randomized base-level algorithm
Our proposal is to use a similar approach: a combination of bagging and the random subspace method, to achieve a similar effect.
Advantages:
- the method is applicable to any base-level algorithm
- there is no need to randomize the base-level algorithm

4 Randomization methods for constructing ensembles
Find a set of base-level classifiers that are diverse in their decisions and complement each other.
Different possibilities:
- bootstrap sampling
- random subsets of features
- randomized versions of the base-level algorithm

5 Bagging
- Introduced by Breiman in 1996
- Based on bootstrap sampling with replacement
- Useful with unstable algorithms (e.g., decision trees)
[Diagram: the training set S (examples X1 ... Xn) is resampled into bootstrap samples S1 ... Sb; a learning algorithm is trained on each sample, yielding classifiers C1 ... Cb, which form the ensemble.]
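
To make the procedure concrete, here is a minimal Python sketch of bagging. It assumes scikit-learn-style estimators, numpy arrays, and integer class labels; the base learner, the ensemble size b, and plurality voting are illustrative choices, not details fixed by the slides.

```python
# Minimal bagging sketch (assumes numpy arrays and integer class labels).
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base=DecisionTreeClassifier(), b=10, seed=0):
    """Train b classifiers, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    ensemble = []
    for _ in range(b):
        idx = rng.integers(0, n, size=n)  # n indices, sampled with replacement
        ensemble.append(clone(base).fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    """Combine the members' predictions by plurality vote."""
    votes = np.stack([m.predict(X) for m in ensemble])  # shape (b, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```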

6 Random Subspace Method
- Introduced by Ho in 1998
- The training data is modified in the feature space: each member is trained on a random subset of the features
- Useful with high-dimensional data
[Diagram: from the training set S with P features, random feature subsets of size P' < P yield reduced sets S'1 ... S'b; a learning algorithm is trained on each, yielding classifiers C1 ... Cb, which form the ensemble.]
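
A matching sketch of the random subspace method, under the same assumptions as above. The subspace size p_sub (P' in the slides) is left as a free parameter; the half-of-the-features default is an assumption, not a value from the presentation.

```python
# Minimal random subspace sketch: all rows, a random feature subset per member.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def random_subspace_fit(X, y, base=DecisionTreeClassifier(), b=10,
                        p_sub=None, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    p_sub = p_sub or max(1, p // 2)  # assumed default subspace size P' < P
    ensemble = []
    for _ in range(b):
        feats = rng.choice(p, size=p_sub, replace=False)  # features, no repeats
        ensemble.append((feats, clone(base).fit(X[:, feats], y)))
    return ensemble

def random_subspace_predict(ensemble, X):
    votes = np.stack([m.predict(X[:, feats]) for feats, m in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```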

7 Random Forest
- Introduced by Breiman in 2001
- A particular implementation of bagging in which the base-level algorithm is a random tree (a decision tree that considers only a random subset of the features at each split)
[Diagram: bootstrap samples S1 ... Sb of the training set S are each given to a random tree learner, yielding classifiers C1 ... Cb, which form the ensemble.]
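
Random forests are also available off the shelf; the snippet below uses scikit-learn's RandomForestClassifier purely as an illustration (the experiments later in the talk used the WEKA implementation).

```python
# Off-the-shelf random forest, evaluated with 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(rf, X, y, cv=10).mean())
```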

8 Combining Bagging and Random Subspaces
- Training sets are generated on the basis of bagging and random subspaces: first we perform bootstrap sampling with replacement, then we perform random feature subset selection on the bootstrap samples
- The new algorithm is named SubBag
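
A minimal sketch of SubBag under the same assumptions as the sketches above: bootstrap sampling with replacement first, then a random feature subset selected on each bootstrap replicate. Since the base learner is just a parameter, any base-level algorithm can be plugged in, which is the advantage stated in the motivation.

```python
# Minimal SubBag sketch: bootstrap rows first, then a random subspace per replicate.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def subbag_fit(X, y, base=DecisionTreeClassifier(), b=10, p_sub=None, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_sub = p_sub or max(1, p // 2)  # assumed subspace size P' < P
    ensemble = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)                 # bootstrap replicate S_i
        feats = rng.choice(p, size=p_sub, replace=False)  # subspace of S_i -> S'_i
        ensemble.append((feats, clone(base).fit(X[rows][:, feats], y[rows])))
    return ensemble

def subbag_predict(ensemble, X):
    votes = np.stack([m.predict(X[:, feats]) for feats, m in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```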

9–14 SubBag, step by step
[Diagram, built up across slides 9–14: starting from the training set S (examples X1 ... Xn, P features), bootstrap sampling with replacement produces b bootstrap replicates S1 ... Sb; random subspace selection then keeps P' < P features of each replicate, giving S'1 ... S'b; a learning algorithm is trained on each S'i, yielding the classifiers C1 ... Cb of the ensemble.]

15 Experiments
- 19 datasets from the UCI repository
- The WEKA environment was used for the experiments
- Comparison of SubBag (the proposed method) to the random subspace method, bagging, and random forests
- Three different base-level algorithms were used: J48 (decision trees), JRip (rule learning), IBk (nearest neighbor)
- 10-fold cross-validation was performed
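
The experiments themselves were run in WEKA; as a rough, self-contained illustration of the protocol, the sketch below evaluates a SubBag-like ensemble with 10-fold cross-validation. scikit-learn's BaggingClassifier with max_features < 1.0 combines bootstrap sampling with a per-member feature subset, which is close in spirit to SubBag; the dataset and parameter values are placeholders, not the paper's setup.

```python
# 10-fold CV of a SubBag-like ensemble (bootstrap rows + feature subsets).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
subbag_like = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                                max_features=0.5,  # random feature subset per member
                                bootstrap=True,    # bootstrap sampling of the rows
                                random_state=0)
print(cross_val_score(subbag_like, X, y, cv=10).mean())
```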

16 Results
[Results table. Legend: bold marks the best performance for a given dataset; symbols mark statistically significant degradation or improvement compared to SubBag.]

17 Results
[Results table, continued. Same legend as above.]

18 Results
[Results table, continued. Same legend as above.]

19 Results – Wilcoxon test
- Predictive performance using J48 as the base-level classifier
- Predictive performance using JRip as the base-level classifier
- Predictive performance using IBk as the base-level classifier
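
A sketch of how such a comparison can be computed: the Wilcoxon signed-rank test over paired per-dataset accuracies of two methods, via scipy.stats.wilcoxon. The accuracy vectors below are hypothetical placeholders, not numbers from the paper.

```python
# Wilcoxon signed-rank test on paired per-dataset accuracies.
from scipy.stats import wilcoxon

acc_subbag  = [0.91, 0.85, 0.78, 0.88, 0.95]  # hypothetical values
acc_bagging = [0.89, 0.84, 0.79, 0.85, 0.93]  # hypothetical values
stat, p = wilcoxon(acc_subbag, acc_bagging)
print(f"W = {stat}, p = {p:.3f}")  # a small p indicates a significant difference
```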

20 Summary
- With J48 as the base-level algorithm, SubBag is comparable to Random Forests and better than Bagging and Random Subspaces
- With JRip, SubBag is comparable to Bagging and better than Random Subspaces
- With IBk, SubBag is better than both Bagging and Random Subspaces

21 Further work
- Investigate the diversity of the ensembles and compare it with that of other methods
- Use different combinations of bagging and random subspaces (e.g., bags of RSM ensembles and RSM ensembles of bags)
- Compare with bagged ensembles of randomized algorithms (e.g., randomized rule learners)

