Combining Bagging and Random Subspaces to Create Better Ensembles
Panče Panov, Sašo Džeroski
Jožef Stefan Institute
Outline
- Motivation
- Overview of randomization methods for constructing ensembles (bagging, random subspace method, random forests)
- Combining bagging and random subspaces
- Experiments and results
- Summary and further work
Motivation
- Random Forests is one of the best-performing ensemble methods
  - uses random subsamples of the training data
  - uses a randomized base-level algorithm
- Our proposal: a similar approach, combining bagging and the random subspace method to achieve a similar effect
- Advantages:
  - the method is applicable to any base-level algorithm
  - there is no need to randomize the base-level algorithm
Randomization methods for constructing ensembles
- Goal: obtain a set of base-level classifiers that are diverse in their decisions and complement each other
- Different possibilities:
  - bootstrap sampling
  - random subsets of features
  - randomized versions of the base-level algorithm
Bagging
- Introduced by Breiman in 1996
- Based on bootstrap sampling with replacement
- Useful with unstable algorithms (e.g. decision trees)
[Diagram: training set S with examples X1 … Xn → b bootstrap replicates S1 … Sb → a learning algorithm trained on each replicate → classifiers C1 … Cb → ensemble]
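A minimal sketch of bagging in scikit-learn (not the paper's WEKA setup); the dataset, ensemble size, and tree base learner are illustrative choices, and the `estimator` parameter name assumes scikit-learn 1.2 or later.

```python
# Bagging: each base tree is trained on a bootstrap replicate of the
# training data; the ensemble predicts by majority vote over b classifiers.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # an unstable base learner
    n_estimators=50,                     # b bootstrap replicates
    bootstrap=True,                      # sample with replacement
    random_state=1,
)
print(cross_val_score(bagging, X, y, cv=10).mean())
```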
Random Subspace Method
- Introduced by Ho in 1998
- The training data is modified in the feature space
- Useful with high-dimensional data
[Diagram: training set S with features 1 … P → b random feature subsets of size P' → training sets S'1 … S'b → a learning algorithm trained on each → classifiers C1 … Cb → ensemble]
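The same scikit-learn class can sketch the random subspace method, under the same illustrative conventions: disabling bootstrap sampling while setting `max_features` below 1.0 gives every base learner the full training set but only a random subset of the features.

```python
# Random subspace method: all training examples, a random P'-subset of
# the P features for each base learner.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rsm = BaggingClassifier(
    n_estimators=50,
    bootstrap=False,   # keep the full training set ...
    max_samples=1.0,
    max_features=0.5,  # ... but a random 50% of the features per learner
    random_state=1,
)
print(cross_val_score(rsm, X, y, cv=10).mean())
```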
Random Forest
- Introduced by Breiman in 2001
- A particular implementation of bagging in which the base-level algorithm is a random tree
[Diagram: training set S → b bootstrap replicates S1 … Sb → a random tree trained on each replicate → classifiers C1 … Cb → ensemble]
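For comparison, a random forest under the same illustrative conventions; note that scikit-learn's random trees re-draw the candidate feature subset at every split rather than once per tree.

```python
# Random forest: bagging whose base learner is a randomized decision tree.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=1)
print(cross_val_score(rf, X, y, cv=10).mean())
```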
Combining Bagging and Random Subspaces
- Training sets are generated on the basis of bagging and random subspaces:
  - first we perform bootstrap sampling with replacement
  - then we perform random feature subset selection on the bootstrap samples
- The new algorithm is named SubBag (a minimal sketch follows the diagram below)
[Diagram, built up over several slides: the SubBag procedure. The training set S has P features. Bootstrap sampling with replacement produces b replicates S1 … Sb. Random subspace selection then keeps P' < P features of each replicate, yielding S'1 … S'b. A learning algorithm is trained on each S'i, producing classifiers C1 … Cb, which together form the ensemble.]
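A from-scratch sketch of the SubBag procedure just diagrammed. The ensemble size b, the subset fraction P'/P = 0.75, the decision-tree base learner, and the majority-vote combination are assumptions made for illustration; the slides do not fix these choices.

```python
# SubBag sketch: bootstrap sampling with replacement first, then random
# feature subset selection on each bootstrap replicate.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def subbag_fit(X, y, base=DecisionTreeClassifier(), b=50, p_frac=0.75, seed=1):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_sub = max(1, int(p_frac * p))  # P' < P features per classifier (assumed ratio)
    models = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)                # bootstrap replicate S_i
        cols = rng.choice(p, size=p_sub, replace=False)  # random subspace -> S'_i
        models.append((clone(base).fit(X[np.ix_(rows, cols)], y[rows]), cols))
    return models

def subbag_predict(models, X):
    # combine the b classifiers by unweighted majority vote
    # (assumes integer-coded class labels)
    votes = np.stack([clf.predict(X[:, cols]) for clf, cols in models])
    return np.array([np.bincount(v.astype(int)).argmax() for v in votes.T])

X, y = load_iris(return_X_y=True)
models = subbag_fit(X, y)
print((subbag_predict(models, X) == y).mean())  # resubstitution accuracy
```

Scikit-learn's BaggingClassifier with bootstrap=True and max_features below 1.0 performs the same two randomization steps per base learner, which the experiment sketch below exploits.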
Experiments
- 19 datasets from the UCI repository
- The WEKA environment was used for the experiments
- SubBag (the proposed method) compared to:
  - the random subspace method
  - bagging
  - random forests
- Three different base-level algorithms used:
  - J48 (decision trees)
  - JRip (rule learning)
  - IBk (nearest neighbor)
- 10-fold cross-validation was performed (a sketch of the protocol follows)
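A rough scikit-learn approximation of this protocol (the paper used WEKA's J48, JRip and IBk, which have no exact scikit-learn counterparts): 10-fold cross-validation of the four methods with a shared stand-in base learner on a stand-in UCI dataset.

```python
# Approximate evaluation protocol: 10-fold CV of the four ensemble methods.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for a UCI dataset
tree = DecisionTreeClassifier()             # stand-in for J48

methods = {
    "Bagging":          BaggingClassifier(estimator=tree, n_estimators=50,
                                          random_state=1),
    "Random subspaces": BaggingClassifier(estimator=tree, n_estimators=50,
                                          bootstrap=False, max_features=0.5,
                                          random_state=1),
    "SubBag":           BaggingClassifier(estimator=tree, n_estimators=50,
                                          bootstrap=True, max_features=0.75,
                                          random_state=1),
    "Random forest":    RandomForestClassifier(n_estimators=50, random_state=1),
}
for name, clf in methods.items():
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```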
Results
[Three tables of per-dataset predictive accuracy, one for each base-level algorithm (J48, JRip, IBk). Bold marks the best performance on a given dataset; symbols mark statistically significant degradation or improvement relative to SubBag.]
Results – Wilcoxon test
[Wilcoxon test comparisons of predictive performance using J48, JRip, and IBk as base-level classifiers.]
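For readers reproducing such a comparison: pairing the per-dataset accuracies of two methods and applying the Wilcoxon signed-rank test is a one-liner in scipy. The accuracy values below are hypothetical placeholders, not the paper's numbers.

```python
# Wilcoxon signed-rank test on paired per-dataset accuracies (hypothetical data).
from scipy.stats import wilcoxon

subbag_acc  = [0.91, 0.85, 0.78, 0.95, 0.88, 0.82, 0.90]
bagging_acc = [0.89, 0.84, 0.77, 0.94, 0.86, 0.80, 0.89]

stat, p_value = wilcoxon(subbag_acc, bagging_acc)
print(f"W = {stat}, p = {p_value:.3f}")  # small p -> significant difference
```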
Summary
- With J48 as the base-level algorithm, SubBag is comparable to Random Forests and better than bagging and random subspaces
- With JRip, SubBag is comparable to bagging and better than random subspaces
- With IBk, SubBag is better than both bagging and random subspaces
Further work
- Investigate the diversity of the ensembles and compare it to that of other methods
- Try other combinations of bagging and random subspaces (e.g. bags of RSM ensembles and RSM ensembles of bags)
- Compare with bagged ensembles of randomized algorithms (e.g. randomized rule learners)