Bagging and Boosting in Data Mining Carolina Ruiz


1 Bagging and Boosting in Data Mining
Carolina Ruiz
ruiz@cs.wpi.edu
http://www.cs.wpi.edu/~ruiz

2 Motivation and Background
Problem Definition:
  - Given: a dataset of instances and a target concept
  - Find: a model (e.g. a set of association rules, a decision tree, a neural network) that helps predict the classification of unseen instances
Difficulties:
  - The model should be stable (i.e. it shouldn't depend too much on the particular input data used to construct it)
  - The model should be a good predictor (difficult to achieve when the input dataset is small)

3 Two Approaches
  - Bagging (Bootstrap Aggregating): Leo Breiman, UC Berkeley
  - Boosting: Rob Schapire, AT&T Research; Jerry Friedman, Stanford U.

4 Bagging
Model Creation: Create bootstrap replicates of the dataset and fit a model to each one
Prediction: Average/vote the predictions of the models
Advantages:
  - Stabilizes "unstable" methods
  - Easy to implement, parallelizable

5 Bagging Algorithm
1. Create k bootstrap replicates of the dataset
2. Fit a model to each of the replicates
3. Average/vote the predictions of the k models
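The three bagging steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a base learner, so the one-feature threshold rule (`fit_stump`), the toy dataset, the choice of k = 25, and every function name here are hypothetical.

```python
import random
from collections import Counter

def fit_stump(data):
    """Fit a one-feature threshold rule to (x, label) pairs with labels 0/1.

    Hypothetical base learner chosen for brevity; any classifier would do.
    """
    best = None
    for threshold in sorted({x for x, _ in data}):
        for left, right in ((0, 1), (1, 0)):
            # Count training errors of the rule "left if x <= threshold else right".
            errors = sum((left if x <= threshold else right) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(data, k, rng):
    models = []
    for _ in range(k):
        # Step 1: a bootstrap replicate samples len(data) instances with replacement.
        replicate = [rng.choice(data) for _ in data]
        # Step 2: fit one model per replicate.
        models.append(fit_stump(replicate))

    def predict(x):
        # Step 3: majority vote over the k models.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (an assumption for illustration): class 0 for small x, class 1 for large x.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
bagged_predict = bagging(data, k=25, rng=random.Random(0))
```

Because each replicate is fit independently of the others, the loop over replicates is the part that can be parallelized. An odd k avoids tied votes for two-class problems.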

6 Boosting
Creating the model: Construct a sequence of datasets and models in such a way that each dataset in the sequence weights an instance heavily when the previous model has misclassified it.
Prediction: "Merge" the models in the sequence
Advantages:
  - Improves classification accuracy

7 Generic Boosting Algorithm
1. Equally weight all instances in the dataset
2. For i = 1 to T
   2.1. Fit a model to the current (weighted) dataset
   2.2. Upweight poorly predicted instances
   2.3. Downweight well-predicted instances
3. Merge the models in the sequence to obtain the final model
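One way to make the generic algorithm concrete is an AdaBoost-style sketch. The slides do not commit to a particular reweighting or merging rule, so the exponential weight update, the weighted vote, the threshold-rule base learner, the toy dataset, and all names below are assumptions made for illustration.

```python
import math

def fit_weighted_stump(data, weights):
    """Return (weighted_error, model) for the best threshold rule.

    data holds (x, y) pairs with y in {-1, +1}. Hypothetical base learner.
    """
    best = None
    for threshold in sorted({x for x, _ in data}):
        for sign in (1, -1):
            # Rule: predict -sign at or below the threshold, +sign above it.
            err = sum(w for (x, y), w in zip(data, weights)
                      if (sign if x > threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    err, threshold, sign = best
    return err, (lambda x: sign if x > threshold else -sign)

def boost(data, rounds):
    n = len(data)
    weights = [1.0 / n] * n                      # Step 1: equal weights
    ensemble = []
    for _ in range(rounds):                      # Step 2: for i = 1 to T
        err, model = fit_weighted_stump(data, weights)    # Step 2.1
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # model's say in the final vote
        # Steps 2.2 and 2.3: misclassified instances (y * model(x) == -1) are
        # scaled up by exp(alpha); correctly classified ones are scaled down.
        weights = [w * math.exp(-alpha * y * model(x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, model))

    def predict(x):
        # Step 3: merge the models with a weighted vote.
        score = sum(alpha * model(x) for alpha, model in ensemble)
        return 1 if score >= 0 else -1

    return predict

# Toy data (an assumption): no single threshold separates the +1/-1/+1 pattern.
data = [(1, 1), (2, 1), (3, 1), (4, -1), (5, -1), (6, -1), (7, 1), (8, 1), (9, 1)]
boosted_predict = boost(data, rounds=3)
```

On this toy dataset every single threshold rule misclassifies at least three of the nine instances, while the weighted vote of three rules classifies all nine correctly.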

8 Conclusions and References
  - Boosted naïve Bayes tied for first place in KDD Cup 1997
  - Reference: "Combining Estimators to Improve Performance", KDD-99 tutorial notes, John F. Elder and Greg Ridgeway

