1
Combining Bagging and Random Subspaces to Create Better Ensembles
Panče Panov, Sašo Džeroski Jožef Stefan Institute
2
Outline
- Motivation
- Overview of randomization methods for constructing ensembles (bagging, random subspace method, random forests)
- Combining Bagging and Random Subspaces
- Experiments and results
- Summary and further work
3
Motivation
- Random Forests is one of the best-performing ensemble methods
  - uses random subsamples of the training data
  - uses a randomized base-level algorithm
- Our proposal uses a similar approach: a combination of bagging and the random subspace method to achieve a similar effect
- Advantages:
  - the method is applicable to any base-level algorithm
  - there is no need to randomize the base-level algorithm
4
Randomization methods for constructing ensembles
- Find a set of base-level classifiers that are diverse in their decisions and complement each other
- Different possibilities:
  - bootstrap sampling
  - random subsets of features
  - randomized versions of the base-level algorithm
5
Bagging
- Introduced by Breiman in 1996
- Based on bootstrap sampling with replacement
- Useful with unstable algorithms (e.g. decision trees)
[Diagram: training set S (instances X1 … Xn) → bootstrap samples S1 … Sb → learning algorithm → classifiers C1 … Cb → ensemble]
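A minimal sketch of this procedure, assuming scikit-learn-style estimators and non-negative integer class labels (the function names and defaults are illustrative, not from the slides):

```python
import numpy as np
from sklearn.base import clone

def bagging_fit(X, y, base_estimator, b=10, seed=0):
    """Train b classifiers, each on a bootstrap sample drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    ensemble = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)  # bootstrap replicate S_i
        ensemble.append(clone(base_estimator).fit(X[rows], y[rows]))
    return ensemble

def bagging_predict(ensemble, X):
    """Combine the b classifiers by plurality vote (assumes integer labels)."""
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

With an unstable learner such as a decision tree as base_estimator, small changes between bootstrap samples yield diverse classifiers, which is where bagging helps most.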
6
Random Subspace Method
- Introduced by Ho in 1998
- Modifies the training data in the feature space: each classifier is trained on a random subset of the features
- Useful with high-dimensional data
[Diagram: training set S with P features → random subspace samples S'1 … S'b, each restricted to P' features → learning algorithm → classifiers C1 … Cb → ensemble]
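A matching sketch of the random subspace method under the same assumptions; the subspace size P' is a free parameter (half the features is a common illustrative default):

```python
import numpy as np
from sklearn.base import clone

def rsm_fit(X, y, base_estimator, b=10, p_prime=None, seed=0):
    """Train b classifiers, each on a random subset of P' of the P features."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    p_prime = p_prime or max(1, p // 2)  # illustrative default
    ensemble = []
    for _ in range(b):
        feats = np.sort(rng.choice(p, size=p_prime, replace=False))
        ensemble.append((clone(base_estimator).fit(X[:, feats], y), feats))
    return ensemble

def rsm_predict(ensemble, X):
    """Each classifier sees only its own feature subset at prediction time."""
    votes = np.stack([clf.predict(X[:, feats]) for clf, feats in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```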
7
Random Forest
- Introduced by Breiman in 2001
- A particular implementation of bagging where the base-level algorithm is a random tree
[Diagram: training set S → bootstrap samples S1 … Sb → random tree → classifiers C1 … Cb → ensemble]
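For reference, this combination of bootstrap samples and randomized trees is what scikit-learn's RandomForestClassifier implements (Python is used here only for illustration; the experiments below use WEKA):

```python
from sklearn.ensemble import RandomForestClassifier

# Bagging over randomized trees: each tree is grown on a bootstrap sample,
# and each split considers only a random subset of the features.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
```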
8
Combining Bagging and Random Subspaces
- Training sets are generated on the basis of bagging and random subspaces
- First we perform bootstrap sampling with replacement, then random feature subset selection on the bootstrap samples
- The new algorithm is named SubBag (a code sketch follows below; slides 9–14 illustrate the procedure step by step)
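A minimal sketch of SubBag under the same assumptions as the bagging and RSM sketches above (scikit-learn-style estimators, integer labels); the subspace fraction is an illustrative choice, not a value from the slides:

```python
import numpy as np
from sklearn.base import clone

def subbag_fit(X, y, base_estimator, b=10, frac=0.75, seed=0):
    """SubBag: bootstrap sampling first, then random feature subset
    selection on each bootstrap replicate."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    p_prime = max(1, int(frac * p))  # P' < P, illustrative fraction
    ensemble = []
    for _ in range(b):
        rows = rng.integers(0, n, size=n)  # bootstrap with replacement
        feats = np.sort(rng.choice(p, size=p_prime, replace=False))  # subspace
        clf = clone(base_estimator).fit(X[np.ix_(rows, feats)], y[rows])
        ensemble.append((clf, feats))
    return ensemble

def subbag_predict(ensemble, X):
    votes = np.stack([clf.predict(X[:, feats]) for clf, feats in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Because the randomization lives entirely in the data (rows and columns), the base-level algorithm itself needs no modification, which is the advantage stated in the motivation.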
9
[Diagram, step 1: the training set S with instances X1 … Xn]
10
[Diagram, step 2: bootstrap sampling with replacement produces b bootstrap replicates S1 … Sb from the training set S, where b is the number of bootstrap replicates]
11
[Diagram, step 3: random subspace selection is applied to each bootstrap replicate S1 … Sb]
12
[Diagram, step 4: random subspace selection turns each bootstrap replicate into a subspace sample S'1 … S'b over P' of the original P features]
13
[Diagram, step 5: the learning algorithm is applied to each subspace sample S'1 … S'b]
14
[Diagram, step 6: the b runs yield classifiers C1 … Cb, each built on P' < P features, which together form the ensemble]
15
Experiments
- 19 datasets from the UCI repository
- WEKA environment used for the experiments
- Comparison of SubBag (the proposed method) to:
  - Random Subspace Method
  - Bagging
  - Random Forest
- Three different base-level algorithms used:
  - J48 (decision trees)
  - JRip (rule learning)
  - IBk (nearest neighbor)
- 10-fold cross-validation was performed
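A comparable setup can be approximated with scikit-learn's BaggingClassifier, whose bootstrap and max_features options cover all three data-randomization schemes (a CART tree stands in for WEKA's J48; the dataset and parameter values are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)
schemes = {
    # all features, samples drawn with replacement:
    "Bagging": BaggingClassifier(tree, n_estimators=10, random_state=0),
    # all samples, random half of the features:
    "RSM": BaggingClassifier(tree, n_estimators=10, bootstrap=False,
                             max_features=0.5, random_state=0),
    # bootstrap samples, then a random subset of features:
    "SubBag": BaggingClassifier(tree, n_estimators=10, bootstrap=True,
                                max_features=0.75, random_state=0),
}
for name, model in schemes.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV, as above
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```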
16
Results
[Results table. Note: bold marks the best performance for a given dataset; markers flag statistically significant improvement or degradation compared to SubBag.]
17
Results
[Results table. Bold marks the best performance for a given dataset; markers flag statistically significant improvement or degradation compared to SubBag.]
18
Results
[Results table. Bold marks the best performance for a given dataset; markers flag statistically significant improvement or degradation compared to SubBag.]
19
Results – Wilcoxon test
[Three tables: predictive performance using J48, JRip, and IBk as the base-level classifier]
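The underlying comparison is a Wilcoxon signed-rank test over paired per-dataset accuracies; a sketch with scipy, where the numbers are placeholders rather than the paper's results:

```python
from scipy.stats import wilcoxon

# Paired cross-validated accuracies of SubBag vs. Bagging across datasets
# (placeholder values for illustration only):
subbag  = [0.91, 0.85, 0.78, 0.88, 0.95, 0.81]
bagging = [0.89, 0.84, 0.79, 0.85, 0.93, 0.80]
stat, p_value = wilcoxon(subbag, bagging)
print(f"Wilcoxon signed-rank statistic = {stat}, p = {p_value:.3f}")
```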
20
Summary
- With J48 as the base-level algorithm, SubBag is comparable to Random Forests and better than Bagging and Random Subspaces
- With JRip, SubBag is comparable to Bagging and better than Random Subspaces
- With IBk, SubBag is better than both Bagging and Random Subspaces
21
Further work
- Investigate the diversity of the ensembles and compare it with that of other methods
- Use different combinations of bagging and random subspaces (e.g. bags of RSM ensembles and RSM ensembles of bags)
- Compare against bagged ensembles of randomized algorithms (e.g. rule learners)