Slide 1: Bagging and Boosting in Data Mining
Carolina Ruiz
ruiz@cs.wpi.edu
http://www.cs.wpi.edu/~ruiz
Slide 2: Motivation and Background
Problem Definition:
Given: a dataset of instances and a target concept.
Find: a model (e.g., a set of association rules, a decision tree, or a neural network) that helps predict the classification of unseen instances.
Difficulties:
The model should be stable (i.e., it should not depend too much on the input data used to construct it).
The model should be a good predictor (difficult to achieve when the input dataset is small).
Slide 3: Two Approaches
Bagging (Bootstrap Aggregating)
Leo Breiman, UC Berkeley
Boosting
Rob Schapire, AT&T Research
Jerry Friedman, Stanford University
Slide 4: Bagging
Model creation: create bootstrap replicates of the dataset and fit a model to each one.
Prediction: average/vote the predictions of the individual models.
Advantages:
Stabilizes "unstable" methods.
Easy to implement and to parallelize.
Slide 5: Bagging Algorithm
1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
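The slides give no code, so the following is only a minimal illustrative sketch of the bagging procedure in Python. The base model (scikit-learn's DecisionTreeClassifier), the function and parameter names (bagged_fit, bagged_predict, k), and the assumption that X and y are NumPy arrays with non-negative integer class labels are all mine, not from the presentation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_fit(X, y, k=25, random_state=0):
        """Fit k models, each on a bootstrap replicate of (X, y)."""
        rng = np.random.default_rng(random_state)
        n = len(X)
        models = []
        for _ in range(k):
            idx = rng.integers(0, n, size=n)   # sample n rows with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagged_predict(models, X):
        """Predict by majority vote over the k models."""
        votes = np.array([m.predict(X) for m in models])   # shape (k, n_samples)
        # Majority vote per instance; assumes non-negative integer class labels.
        return np.array([np.bincount(col).argmax() for col in votes.T])

For a numeric target (regression), the vote would be replaced by an average of the k predictions, e.g. np.mean(votes, axis=0).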
Slide 6: Boosting
Creating the model: construct a sequence of datasets and models such that each dataset in the sequence weights an instance heavily when the previous model misclassified it.
Prediction: "merge" the models in the sequence.
Advantages:
Improves classification accuracy.
Slide 7: Generic Boosting Algorithm
1. Equally weight all instances in the dataset.
2. For i = 1 to T:
   2.1. Fit a model to the current (weighted) dataset.
   2.2. Upweight poorly predicted instances.
   2.3. Downweight well-predicted instances.
3. Merge the models in the sequence to obtain the final model.
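As one concrete instance of this generic loop, here is a minimal AdaBoost-style sketch in Python. The specific weight-update rule, the use of decision stumps as base models, the restriction to labels in {-1, +1}, and the names boost_fit, boost_predict, and T are assumptions of mine; the slides only describe the loop abstractly.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boost_fit(X, y, T=50):
        """AdaBoost-style boosting; y is a NumPy array with labels in {-1, +1}."""
        n = len(X)
        w = np.full(n, 1.0 / n)                       # step 1: equal weights
        models, alphas = [], []
        for _ in range(T):                            # step 2
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)          # 2.1: fit to weighted dataset
            pred = stump.predict(X)
            err = np.clip(np.sum(w[pred != y]) / np.sum(w), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)     # weight of this model
            w *= np.exp(-alpha * y * pred)            # 2.2/2.3: up-/down-weight instances
            w /= w.sum()
            models.append(stump)
            alphas.append(alpha)
        return models, alphas

    def boost_predict(models, alphas, X):
        """Step 3: merge the sequence by a weighted vote."""
        scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
        return np.sign(scores)

This particular weight-update and weighted-vote scheme is AdaBoost (Freund and Schapire), the best-known realization of the generic boosting loop above.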
Slide 8: Conclusions and References
A boosted naïve Bayes classifier tied for first place in the 1997 KDD Cup.
Reference: John F. Elder and Greg Ridgeway, "Combining Estimators to Improve Performance", KDD-99 tutorial notes.