Popular Ensemble Methods: An Empirical Study
David Opitz and Richard Maclin
Presented by Scott Wespi, 5/22/07
Outline
–Ensemble methods
–Classifier ensembles
–Bagging vs. boosting
–Results
–Conclusion
Ensemble Methods
Sets of individually trained classifiers whose predictions are combined when classifying new data
–Bagging (1996)
–Boosting (1996)
How are bagging and boosting influenced by the learning algorithm?
–Decision trees
–Neural networks
*Note: the paper is from 1999
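Where the slide says predictions are "combined", the usual mechanism is a plurality vote over the members' outputs. Below is a minimal sketch in Python, assuming numpy and scikit-learn-style classifiers exposing a .predict method; that interface is an assumption for illustration, not something the slides specify.

```python
import numpy as np

def majority_vote(classifiers, X):
    """Combine an ensemble's predictions by plurality vote.

    `classifiers` is any list of trained models exposing .predict(X)
    (a scikit-learn-style interface, assumed here for illustration).
    """
    # votes[k] holds classifier k's predicted labels for every row of X.
    votes = np.array([clf.predict(X) for clf in classifiers])
    combined = []
    for column in votes.T:  # one column of votes per example
        labels, counts = np.unique(column, return_counts=True)
        combined.append(labels[np.argmax(counts)])  # most frequent label wins
    return np.array(combined)
```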
Classifier Ensembles
Goal: highly accurate individual classifiers that disagree with one another as much as possible
Bagging and boosting create that disagreement by training each member on a different resampling of the data
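To see why disagreement matters: a vote can only correct one member's mistake if the other members do not make the same mistake. A toy sketch of a pairwise disagreement rate, with made-up labels purely for illustration:

```python
import numpy as np

def disagreement(preds_a, preds_b):
    """Fraction of examples on which two ensemble members predict differently."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

# Toy labels, purely illustrative: both members are 5/6 accurate,
# but each makes its single mistake on a different example.
truth = np.array([0, 1, 1, 0, 1, 0])
m1 = np.array([0, 1, 1, 0, 0, 0])  # errs on the fifth example
m2 = np.array([0, 1, 0, 0, 1, 0])  # errs on the third example
print(disagreement(m1, m2))  # 0.33...; a third member's vote could fix both errors
```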
Bagging vs. Boosting
Training data: 1, 2, 3, 4, 5, 6, 7, 8
Bagging training sets (bootstrap samples, drawn uniformly with replacement):
–Set 1: 2, 7, 8, 3, 7, 6, 3, 1
–Set 2: 7, 8, 5, 6, 4, 2, 7, 1
–Set 3: 3, 6, 2, 7, 5, 6, 2, 2
–Set 4: 4, 5, 1, 4, 6, 4, 3, 8
Boosting training sets (later sets oversample previously misclassified examples, here example 1):
–Set 1: 2, 7, 8, 3, 7, 6, 3, 1
–Set 2: 1, 4, 5, 4, 1, 5, 6, 4
–Set 3: 7, 1, 5, 8, 1, 8, 1, 4
–Set 4: 1, 1, 6, 1, 1, 3, 1, 5
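A minimal sketch of how training sets like those above could be drawn, assuming numpy; the boosting weights are invented for illustration, as if example 1 kept being misclassified, which is what the slide's later boosting sets suggest:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(1, 9)  # the slide's training data: examples 1..8

# Bagging: every training set is a bootstrap sample, drawn uniformly with
# replacement, so duplicates and omissions are expected.
bag_set = rng.choice(data, size=len(data), replace=True)

# Boosting: each new set is drawn from a distribution that up-weights hard
# examples.  These weights are invented for illustration, as if example 1
# kept being misclassified.
weights = np.array([5.0, 1, 1, 1, 1, 1, 1, 1])
weights /= weights.sum()
boost_set = rng.choice(data, size=len(data), replace=True, p=weights)

print(bag_set)    # roughly uniform draw, like the bagging sets above
print(boost_set)  # example 1 appears disproportionately often, like Set 4
```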
Ada-Boosting vs. Arcing
Ada-Boosting
–Every example starts with weight 1/N; an example's weight is increased each time it is misclassified, so later classifiers focus on the hard examples
Arcing
–If m_i is the number of times the ith example has been misclassified so far, the probability of selecting it for the next training set is p_i = (1 + m_i^4) / Σ_j (1 + m_j^4)
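A sketch of both rules in Python, assuming numpy. The arcing probabilities follow the slide's definition of m_i; the Ada-Boosting update shown is one common resampling formulation (scale correctly classified examples by ε/(1−ε), then renormalize), which the slide does not spell out:

```python
import numpy as np

def arcing_probs(m):
    """Arcing selection probabilities p_i = (1 + m_i^4) / sum_j (1 + m_j^4),
    where m[i] counts how often example i has been misclassified so far."""
    w = 1.0 + np.asarray(m, dtype=float) ** 4
    return w / w.sum()

def adaboost_reweight(weights, misclassified, epsilon):
    """One Ada-Boosting-style update: scale each *correctly* classified
    example's weight by beta = epsilon / (1 - epsilon) and renormalize,
    which raises the relative weight of the misclassified examples.
    `epsilon` is the current classifier's weighted error (0 < epsilon < 0.5)."""
    beta = epsilon / (1.0 - epsilon)
    w = np.where(misclassified, weights, weights * beta)
    return w / w.sum()

# 8 examples; example 0 has been misclassified three times so far.
print(arcing_probs([3, 0, 0, 0, 0, 0, 0, 0]))  # ~0.92 for example 0

w0 = np.full(8, 1 / 8)                  # initial uniform 1/N weights
flags = np.array([True] + [False] * 7)  # only example 0 was misclassified
print(adaboost_reweight(w0, flags, epsilon=0.2))  # its weight rises to ~0.36
```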
Results: test-set error rates (%). NN = neural network, DT = decision tree; Stan = single classifier, Simple = simple ensemble, Bag = bagging, Arc = arcing, Ada = Ada-boosting. A dash (–) marks a cell left blank in the source.

Dataset           NN-Stan  NN-Simple  NN-Bag  NN-Arc  NN-Ada  DT-Stan  DT-Bag  DT-Arc  DT-Ada
breast-cancer-w   3.4      3.5        3.4     3.8     4       5        3.7     3.5     –
credit-a          14.8     13.7       13.8    15.8    15.7    14.9     13.4    14      13.7
credit-g          27.9     24.7       24.2    25.2    25.3    29.6     25.2    25.9    26.7
diabetes          23.9     23         22.8    24.4    23.3    27.8     24.4    26      25.7
glass             38.6     35.2       33.1    32      31.1    31.3     25.8    25.5    23.3
heart-cleveland   18.6     17.4       17      20.7    21.1    24.3     19.5    21.5    20.8
hepatitis         20.1     19.5       17.8    19      19.7    21.2     17.3    16.9    17.2
house-votes-84    4.9      4.8        4.1     5.1     5.3     3.6      –       5       4.8
hypo              6.4      6.2        –       –       –       0.5      0.4     –       –
ionosphere        9.7      7.5        9.2     7.6     8.3     8.1      6.4     6       6.1
iris              4.3      3.9        4       3.7     3.9     5.2      4.9     5.1     5.6
kr-vs-kp          2.3      0.8        –       0.4     0.3     0.6      –       0.3     0.4
labor             6.1      3.2        4.2     3.2     –       16.5     13.7    13      11.6
letter            18       12.8       10.5    5.7     4.6     14       7       4.1     3.9
promoters-936     5.3      4.8        4       4.5     4.6     12.8     10.6    6.8     6.4
ribosome-bind     9.3      8.5        8.4     8.1     8.2     11.2     10.2    9.3     9.6
satellite         13       10.9       10.6    9.9     10      13.8     9.9     8.6     8.4
segmentation      6.6      5.3        5.4     3.5     3.3     3.7      3       1.7     1.5
sick              5.9      5.7        –       4.7     4.5     1.3      1.2     1.1     1
sonar             16.6     15.9       16.8    12.9    13      29.7     25.3    21.5    21.7
soybean           9.2      6.7        6.9     6.7     6.3     8        7.9     7.2     6.7
splice            4.7      4          3.9     4       4.2     5.9      5.4     5.1     5.3
vehicle           24.9     21.2       20.7    19.1    19.7    29.4     27.1    22.5    22.9
[Figure: Neural networks, comparing Ada-Boosting, Arcing, and Bagging; the white bar represents one standard deviation.]
[Figure: Decision trees, the same comparison as the previous slide.]
[Figure: Composite error rates.]
[Figure: Neural networks, bagging vs. the simple ensemble.]
[Figure: Ada-Boost, neural networks vs. decision trees; NN and DT panels, where each box represents the reduction in error.]
[Figure: Arcing, neural networks vs. decision trees.]
[Figure: Bagging, neural networks vs. decision trees.]
Noise
–Hurts boosting the most
Conclusions
–Performance depends on the data and the classifier
–In some cases, ensembles can overcome the bias of the component learning algorithm
–Bagging is more consistent than boosting
–Boosting can give much better results on some data sets