1 Popular Ensemble Methods: An Empirical Study David Opitz and Richard Maclin Presented by Scott Wespi 5/22/07

2 Outline: Ensemble Methods; Classifier Ensembles; Bagging vs. Boosting; Results; Conclusion

3 Ensemble Methods: sets of individually trained classifiers whose predictions are combined when classifying new data. Two popular examples: Bagging (1996) and Boosting (1996). The question studied: how are bagging and boosting influenced by the learning algorithm, i.e., decision trees vs. neural networks? (Note: the paper is from 1999.)

4 Classifier Ensembles. Goal: highly accurate individual classifiers that disagree with one another as much as possible. Bagging and boosting create this disagreement by training each member on a different resampling of the data.
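The combination step on this slide can be sketched as a plurality vote over the members' predictions. This is a minimal illustration, not the paper's code; the three toy "classifiers" are hypothetical stand-ins chosen so they disagree on some inputs.

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    """Combine member predictions by plurality (majority) vote."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three toy threshold "classifiers" that disagree near the boundary.
clf_a = lambda x: 1 if x > 0 else 0
clf_b = lambda x: 1 if x > 1 else 0
clf_c = lambda x: 1 if x > -1 else 0

print(ensemble_predict([clf_a, clf_b, clf_c], 0.5))  # two of three vote 1
```

The vote only helps when the members err on different inputs, which is why the slide stresses disagreement.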

5 Bagging vs. Boosting
Training data: 1, 2, 3, 4, 5, 6, 7, 8
Bagging training sets:
Set 1: 2, 7, 8, 3, 7, 6, 3, 1
Set 2: 7, 8, 5, 6, 4, 2, 7, 1
Set 3: 3, 6, 2, 7, 5, 6, 2, 2
Set 4: 4, 5, 1, 4, 6, 4, 3, 8
Boosting training sets:
Set 1: 2, 7, 8, 3, 7, 6, 3, 1
Set 2: 1, 4, 5, 4, 1, 5, 6, 4
Set 3: 7, 1, 5, 8, 1, 8, 1, 4
Set 4: 1, 1, 6, 1, 1, 3, 1, 5
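The contrast on this slide can be sketched as follows: bagging draws each training set uniformly with replacement (a bootstrap sample), while boosting draws examples with probability proportional to a weight that grows for hard examples. A minimal sketch; the weight values below are made up for illustration, mimicking how example 1 dominates Set 4 above.

```python
import random

def bootstrap_sample(data, rng):
    """Bagging: draw len(data) examples uniformly WITH replacement."""
    return [rng.choice(data) for _ in data]

def weighted_sample(data, weights, rng):
    """Boosting-style resampling: draw examples in proportion to their weights."""
    return rng.choices(data, weights=weights, k=len(data))

rng = random.Random(0)
data = [1, 2, 3, 4, 5, 6, 7, 8]
print(bootstrap_sample(data, rng))

# Suppose example 1 has been misclassified repeatedly, so its weight has grown:
weights = [5, 1, 1, 1, 1, 1, 1, 1]
print(weighted_sample(data, weights, rng))  # example 1 tends to appear often
```

Both samplers return sets the same size as the original data, matching the slide's eight-element sets.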

6 Ada-Boosting vs. Arcing
Ada-Boosting: every example starts with weight 1/N; an example's weight increases every time it is skipped or misclassified.
Arcing (arc-x4): if m_i is the number of times the ith example has been misclassified so far, the probability of selecting example i for the next training set is p_i = (1 + m_i^4) / sum_j (1 + m_j^4).
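The arc-x4 selection rule above can be sketched in a few lines (the function name is ours, not the paper's):

```python
def arc_x4_probs(miscounts):
    """Arc-x4 resampling probabilities: p_i = (1 + m_i**4) / sum_j (1 + m_j**4),
    where miscounts[i] is how often example i has been misclassified so far."""
    weights = [1 + m ** 4 for m in miscounts]
    total = sum(weights)
    return [w / total for w in weights]

# Example 0 misclassified twice, the rest never:
# weights = [17, 1, 1, 1], total = 20, so p = [0.85, 0.05, 0.05, 0.05]
print(arc_x4_probs([2, 0, 0, 0]))
```

The fourth power makes repeatedly misclassified examples dominate the next sample quickly, which is one reason arcing and Ada-boosting concentrate on hard (and noisy) examples.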

7 Test-set error rates (%). Left five columns are neural networks (stan = single network, simple = simple network ensemble, bag = bagging, arc = arcing, ada = Ada-boosting); right four are decision trees (stan = single tree, bag, arc, ada). A few cells in the hypo row could not be recovered from the slide (marked ?).

Dataset         | NN stan | simple | bag  | arc  | ada  | DT stan | bag  | arc  | ada
breast-cancer-w |     3.4 |    3.5 |  3.4 |  3.8 |  4.0 |     5.0 |  3.7 |  3.5 |  3.5
credit-a        |    14.8 |   13.7 | 13.8 | 15.8 | 15.7 |    14.9 | 13.4 | 14.0 | 13.7
credit-g        |    27.9 |   24.7 | 24.2 | 25.2 | 25.3 |    29.6 | 25.2 | 25.9 | 26.7
diabetes        |    23.9 |   23.0 | 22.8 | 24.4 | 23.3 |    27.8 | 24.4 | 26.0 | 25.7
glass           |    38.6 |   35.2 | 33.1 | 32.0 | 31.1 |    31.3 | 25.8 | 25.5 | 23.3
heart-cleveland |    18.6 |   17.4 | 17.0 | 20.7 | 21.1 |    24.3 | 19.5 | 21.5 | 20.8
hepatitis       |    20.1 |   19.5 | 17.8 | 19.0 | 19.7 |    21.2 | 17.3 | 16.9 | 17.2
house-votes-84  |     4.9 |    4.8 |  4.1 |  5.1 |  5.3 |     3.6 |  3.6 |  5.0 |  4.8
hypo            |     6.4 |    6.2 |  6.2 |    ? |    ? |     0.5 |    ? |    ? |  0.4
ionosphere      |     9.7 |    7.5 |  9.2 |  7.6 |  8.3 |     8.1 |  6.4 |  6.0 |  6.1
iris            |     4.3 |    3.9 |  4.0 |  3.7 |  3.9 |     5.2 |  4.9 |  5.1 |  5.6
kr-vs-kp        |     2.3 |    0.8 |  0.8 |  0.4 |  0.3 |     0.6 |  0.6 |  0.3 |  0.4
labor           |     6.1 |    3.2 |  4.2 |  3.2 |  3.2 |    16.5 | 13.7 | 13.0 | 11.6
letter          |    18.0 |   12.8 | 10.5 |  5.7 |  4.6 |    14.0 |  7.0 |  4.1 |  3.9
promoters-936   |     5.3 |    4.8 |  4.0 |  4.5 |  4.6 |    12.8 | 10.6 |  6.8 |  6.4
ribosome-bind   |     9.3 |    8.5 |  8.4 |  8.1 |  8.2 |    11.2 | 10.2 |  9.3 |  9.6
satellite       |    13.0 |   10.9 | 10.6 |  9.9 | 10.0 |    13.8 |  9.9 |  8.6 |  8.4
segmentation    |     6.6 |    5.3 |  5.4 |  3.5 |  3.3 |     3.7 |  3.0 |  1.7 |  1.5
sick            |     5.9 |    5.7 |  5.7 |  4.7 |  4.5 |     1.3 |  1.2 |  1.1 |  1.0
sonar           |    16.6 |   15.9 | 16.8 | 12.9 | 13.0 |    29.7 | 25.3 | 21.5 | 21.7
soybean         |     9.2 |    6.7 |  6.9 |  6.7 |  6.3 |     8.0 |  7.9 |  7.2 |  6.7
splice          |     4.7 |    4.0 |  3.9 |  4.0 |  4.2 |     5.9 |  5.4 |  5.1 |  5.3
vehicle         |    24.9 |   21.2 | 20.7 | 19.1 | 19.7 |    29.4 | 27.1 | 22.5 | 22.9

8 Neural Networks: Ada-Boosting, Arcing, Bagging (the white bar represents one standard deviation)

9 Decision Trees

10 Composite Error Rates

11 Neural Networks: Bagging vs Simple

12 Ada-Boost: Neural Networks vs. Decision Trees (the box represents the reduction in error)

13 Arcing

14 Bagging

15 Noise: noise hurts boosting the most

16 Conclusions
Performance depends on the data set and the base classifier.
In some cases, an ensemble can overcome the bias of its component learning algorithm.
Bagging is more consistent than boosting.
Boosting can give much better results than bagging on some data sets.

