Bagging and Boosting in Data Mining Carolina Ruiz

Bagging and Boosting in Data Mining Carolina Ruiz ruiz@cs.wpi.edu http://www.cs.wpi.edu/~ruiz

2 Motivation and Background
Problem Definition:
  Given: a dataset of instances and a target concept
  Find: a model (e.g. a set of association rules, a decision tree, a neural network) that helps predict the classification of unseen instances.
Difficulties:
  The model should be stable (i.e. it shouldn’t depend too much on the input data used to construct it).
  The model should be a good predictor (difficult to achieve when the input dataset is small).

3 Two Approaches
Bagging (Bootstrap Aggregating)
  Leo Breiman, UC Berkeley
Boosting
  Rob Schapire, AT&T Research
  Jerry Friedman, Stanford U.

4 Bagging
Model Creation: Create bootstrap replicates of the dataset and fit a model to each one.
Prediction: Average/vote the predictions of the models.
Advantages:
  Stabilizes “unstable” methods
  Easy to implement, parallelizable
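A bootstrap replicate is a sample of the same size as the dataset, drawn with replacement. The sketch below illustrates that idea only; the helper name `bootstrap_replicate`, the seed, and the stand-in dataset of integer ids are assumptions made for this example. For large datasets, roughly 63% of the distinct instances are expected to appear in any one replicate (about 1 - 1/e), with the rest of the slots filled by repeats.

```python
import random


def bootstrap_replicate(data, rng):
    """Draw len(data) instances from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


rng = random.Random(42)        # fixed seed, chosen arbitrarily for repeatability
data = list(range(10000))      # stand-in dataset: instance ids 0..9999
replicate = bootstrap_replicate(data, rng)

# Fraction of the original instances that appear at least once in the replicate.
unique_fraction = len(set(replicate)) / len(data)
print(len(replicate), round(unique_fraction, 2))
```

Because each replicate omits a different subset of instances, models fit to different replicates differ, and that variation is what the averaging or voting step smooths out.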

5 Bagging Algorithm
1. Create k bootstrap replicates of the dataset
2. Fit a model to each of the replicates
3. Average/vote the predictions of the k models
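The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the base learner (a one-feature threshold classifier, or "decision stump"), the function names, the toy dataset, and the seed are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter


def bootstrap_replicate(data, rng):
    """Step 1 helper: sample len(data) instances with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def fit_stump(data):
    """Hypothetical base learner: choose the threshold t and the label
    predicted for x >= t that make the fewest errors on (x, label) pairs,
    where labels are 0 or 1."""
    best = None
    for t in sorted({x for x, _ in data}):
        for above in (0, 1):
            errors = sum(1 for x, y in data
                         if (above if x >= t else 1 - above) != y)
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, t, above = best
    return t, above


def predict_stump(model, x):
    t, above = model
    return above if x >= t else 1 - above


def bagging_fit(data, k, seed=0):
    """Steps 1 and 2: create k bootstrap replicates and fit a model to each."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_replicate(data, rng)) for _ in range(k)]


def bagging_predict(models, x):
    """Step 3: majority vote over the k models' predictions."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy one-dimensional dataset (an assumption for illustration).
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
models = bagging_fit(data, k=25)
print(bagging_predict(models, 0.5), bagging_predict(models, 5.0))
```

Each call to `fit_stump` is independent of the others, so the loop in `bagging_fit` could be run in parallel. For regression, the vote in `bagging_predict` would be replaced by an average.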

6 Boosting
Creating the model: Construct a sequence of datasets and models such that each dataset in the sequence weights an instance heavily when the previous model has misclassified it.
Prediction: “Merge” the models in the sequence.
Advantages: Improves classification accuracy

7 Generic Boosting Algorithm
1. Weight all instances in the dataset equally
2. For i = 1 to T
   2.1. Fit a model to the current dataset
   2.2. Upweight poorly predicted instances
   2.3. Downweight well-predicted instances
3. Merge the models in the sequence to obtain the final model
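The generic algorithm leaves the reweighting and merging rules open. One well-known concrete instance is AdaBoost (Freund and Schapire), sketched below. The choice of AdaBoost, the weighted decision stump used as the base model, the function names, and the toy dataset are all assumptions made for this illustration; the slides do not prescribe them.

```python
import math


def stump_predict(t, sign, x):
    """Predict `sign` when x >= t, otherwise `-sign` (labels are -1 or +1)."""
    return sign if x >= t else -sign


def fit_weighted_stump(data, weights):
    """Step 2.1: pick the stump with the smallest weighted error."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (+1, -1):
            err = sum(w for (x, y), w in zip(data, weights)
                      if stump_predict(t, sign, x) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best  # (weighted_error, threshold, sign)


def adaboost_fit(data, T):
    n = len(data)
    weights = [1.0 / n] * n                      # step 1: equal weights
    ensemble = []
    for _ in range(T):                           # step 2
        err, t, sign = fit_weighted_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this model's vote weight
        ensemble.append((alpha, t, sign))
        # Steps 2.2 and 2.3: misclassified instances (y * prediction = -1)
        # are multiplied by e^alpha, correct ones by e^-alpha, then renormalized.
        weights = [w * math.exp(-alpha * y * stump_predict(t, sign, x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Step 3: merge the models by a weighted vote."""
    score = sum(alpha * stump_predict(t, sign, x) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can fit: +1 on both ends, -1 in the middle.
data = [(float(x), 1 if x < 3 or x >= 6 else -1) for x in range(9)]
ensemble = adaboost_fit(data, T=3)
print([adaboost_predict(ensemble, x) for x, _ in data])
```

On this toy data a single stump must misclassify at least three of the nine instances, while the weighted vote of three stumps fits all of them, which illustrates the accuracy gain boosting aims for.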

8 Conclusions and References
Boosted naïve Bayes tied for first place in the KDD Cup 1997.
Reference: “Combining Estimators to Improve Performance”, KDD-99 tutorial notes, John F. Elder and Greg Ridgeway.
