1
Ensemble Learning – Bagging, Boosting, Stacking, and Other Topics
Professor Carolina Ruiz, Department of Computer Science, WPI, Worcester, Massachusetts
2
Constructing predictors/models
1. Given labeled data, use a data mining technique to train a model.
2. Given a new unlabeled data instance, use the trained model to predict its label.
[Diagram: data -> trained model; new data -> model -> prediction]
Techniques: decision trees, Bayesian nets, neural nets, …
Wish list:
- Good predictor: low error
- Stable: small variations in the training data => small variations in the resulting model
3
Looking for a good model
Varying the data used:
- subset of the attributes
- subset of the data instances
- …
Varying the DM technique/parameters:
- different parameters for a technique
- different techniques
- …
[Diagram: data -> several candidate models -> one prediction per model]
Repeat until a "good" (low error, stable, …) model is found.
But what if a good model is not found? And even if one is found, how can we improve it?
4
Approach: Ensemble of models
Form an ensemble of models and combine their predictions into a single prediction.
[Diagram: data -> ensemble of models -> combined prediction]
5
Constructing Ensembles – How?
1. Given labeled data, how to construct an ensemble of models?
2. Given a new unlabeled data instance, how to use the ensemble to predict its label?
[Diagram: data -> ensemble of models; new data -> ensemble -> prediction]
Data: what (part of the) data to use to train each model in the ensemble?
Data mining techniques: what technique and/or what parameters to use to train each model?
How to combine the individual model predictions into a unified prediction?
6
Several Approaches
- Bagging (Bootstrap Aggregating): Breiman, UC Berkeley
- Boosting: Schapire, AT&T Research (now at Princeton U.); Friedman, Stanford U.
- Stacking: Wolpert, NASA Ames Research Center
- Model Selection Meta-learning: Floyd, Ruiz, Alvarez, WPI and Boston College
- Mixture of Experts in Neural Nets: Alvarez, Ruiz, Kawato, Kogel, Boston College and WPI
- …
7
Bagging (Bootstrap Aggregation) Breiman, UC Berkeley
1. Create bootstrap replicates of the data (i.e., random samples of the data instances, drawn with replacement) and train a model on each replicate.
2. Given a new unlabeled data instance, input it to each model: the ensemble prediction is the (weighted) average of the individual model predictions (voting system).
[Diagram: data replicates R1, R2, …, Rn -> one model each -> combined prediction for the new instance]
Usually the same data mining technique is used to train each model.
May help stabilize models.
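As a concrete illustration (not part of the original slides), the minimal sketch below bags decision trees with scikit-learn; the library, dataset, and number of replicates are illustrative assumptions.

```python
# A minimal bagging sketch using scikit-learn (assumed available); the dataset
# and the number of replicates are illustrative choices, not from the slides.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 25 base models (decision trees by default) is trained on a
# bootstrap replicate of the training data; the ensemble prediction is a
# vote over the individual model predictions.
bagger = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)
print("bagged accuracy:", bagger.score(X_test, y_test))
```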
8
Boosting Schapire, AT&T Research/Princeton U.; Friedman, Stanford U.
1. Assign equal weights to all data instances.
2. Train a model; increase (decrease) the weight of incorrectly (correctly) predicted data instances. Repeat step 2.
3. Given a new unlabeled data instance, run it through the merged model: the ensemble prediction is the prediction of the merged model (e.g., majority vote, weighted average, …).
[Diagram: data -> data' -> data'' -> … (one reweighted dataset and model per round) -> merged model -> prediction]
Usually the same data mining technique is used for each model.
May help decrease prediction error.
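For concreteness (again, not from the original slides), a minimal boosting sketch using scikit-learn's AdaBoost implementation; the dataset and number of boosting rounds are illustrative.

```python
# A minimal boosting sketch using scikit-learn's AdaBoostClassifier (assumed
# available); the dataset and number of boosting rounds are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each successive weak learner is trained on the reweighted data (misclassified
# instances upweighted); the final prediction is a weighted vote of the learners.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("boosted accuracy:", booster.score(X_test, y_test))
```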
9
Stacking Wolpert, NASA Ames Research Center
1. Train different models on the same data ("Level-0 models").
2. Train a new ("Level-1") model on the outputs of the Level-0 models.
3. Given a new unlabeled data instance, input it to each Level-0 model: the ensemble prediction is the Level-1 model's prediction based on the Level-0 model predictions.
[Diagram: data -> Level-0 models -> Level-1 model -> prediction for the new instance]
Use different parameters and/or different data mining techniques for the Level-0 models.
May help reduce prediction error.
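A minimal stacking sketch follows, using scikit-learn's StackingClassifier; the choice of Level-0 learners, the logistic-regression Level-1 model, and the dataset are illustrative assumptions, not prescribed by the slides.

```python
# A minimal stacking sketch using scikit-learn (assumed available); learners
# and dataset are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0: different techniques trained on the same data.
level0 = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier()),
]
# Level-1: a model trained on the Level-0 outputs (via internal cross-validation).
stacker = StackingClassifier(estimators=level0,
                             final_estimator=LogisticRegression(max_iter=1000))
stacker.fit(X_train, y_train)
print("stacked accuracy:", stacker.score(X_test, y_test))
```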
10
Model Selection Meta-learning Floyd, Ruiz, Alvarez, WPI and Boston College
1. Train different Level-0 models.
2. Train a Level-1 model to predict which is the best Level-0 model for a given data instance.
3. Given a new unlabeled data instance, input it to the Level-1 model: the ensemble prediction is the prediction of the Level-0 model selected by the Level-1 model for that instance.
[Diagram: data -> Level-0 models; the Level-1 model selects which Level-0 model predicts each new instance]
Use different parameters and/or different data mining techniques for the Level-0 models.
May help determine what technique/model works best on given data.
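A simplified from-scratch sketch of this idea follows (not the exact procedure of Floyd et al.): the Level-1 "selector" is trained to predict which Level-0 model classifies each training instance correctly, and new instances are routed to the selected model. The dataset, learners, and tie-breaking rule are illustrative assumptions.

```python
# A simplified model-selection meta-learning sketch (not the exact procedure
# of Floyd et al.); scikit-learn, the dataset, and the learners are assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0: different techniques trained on the same data.
level0 = [DecisionTreeClassifier(random_state=0).fit(X_train, y_train),
          GaussianNB().fit(X_train, y_train),
          KNeighborsClassifier().fit(X_train, y_train)]

# Level-1 target: index of the first Level-0 model that predicts each training
# instance correctly (instances no model gets right default to model 0).
correct = np.stack([m.predict(X_train) == y_train for m in level0], axis=1)
best = correct.argmax(axis=1)
selector = DecisionTreeClassifier(random_state=0).fit(X_train, best)

# Prediction: route each new instance to the Level-0 model chosen by the selector.
chosen = selector.predict(X_test)
level0_preds = np.stack([m.predict(X_test) for m in level0], axis=1)
y_pred = level0_preds[np.arange(len(X_test)), chosen]
print("model-selection accuracy:", (y_pred == y_test).mean())
```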
11
Mixture of Experts Architecture Alvarez, Ruiz, Kawato, Kogel, Boston College and WPI
1. Split the data attributes into domain-meaningful subgroups: A', A'', …
2. Create and train a Mixture of Experts feed-forward neural net.
3. Given a new unlabeled data instance, feed it forward through the mixture of experts: the mixture-of-experts prediction is the output produced by the network.
[Diagram: ANN layers (input, hidden, output); each attribute subgroup A', A'', A''' connects only to its own block of hidden nodes]
Note that not all connections between input and hidden nodes are included.
May help speed up ANN training without increasing prediction error.
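For concreteness, here is a small PyTorch sketch of this kind of architecture (an assumption for illustration only; it is not the authors' original implementation): each attribute subgroup feeds its own hidden block, and only those blocks connect to the output layer. Group sizes, activation, and layer widths are illustrative.

```python
# A sketch of the mixture-of-experts feed-forward architecture using PyTorch
# (assumed available): each attribute subgroup A', A'', ... feeds its own
# hidden block, so there are no connections between a hidden block and the
# inputs of other subgroups. Sizes and groupings are illustrative.
import torch
import torch.nn as nn

class MixtureOfExpertsNet(nn.Module):
    def __init__(self, group_sizes, hidden_per_group, n_outputs):
        super().__init__()
        # One "expert" hidden block per attribute subgroup.
        self.experts = nn.ModuleList(
            [nn.Linear(g, hidden_per_group) for g in group_sizes]
        )
        self.group_sizes = group_sizes
        # The output layer sees the concatenated hidden activations of all experts.
        self.out = nn.Linear(hidden_per_group * len(group_sizes), n_outputs)

    def forward(self, x):
        # Split the input attributes into the subgroups A', A'', ...
        groups = torch.split(x, self.group_sizes, dim=1)
        hidden = [torch.sigmoid(expert(g)) for expert, g in zip(self.experts, groups)]
        return self.out(torch.cat(hidden, dim=1))

# Example: 10 attributes split into subgroups of sizes 4, 3, and 3.
net = MixtureOfExpertsNet(group_sizes=[4, 3, 3], hidden_per_group=5, n_outputs=2)
logits = net(torch.randn(8, 10))   # batch of 8 instances
print(logits.shape)                # torch.Size([8, 2])
```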
12
Conclusions
Ensemble methods construct and/or combine a collection of predictors with the purpose of improving upon the properties of the individual predictors:
- stabilize models
- reduce prediction error
- aggregate individual predictors that make different errors
- more resistant to noise
13
References
J.F. Elder and G. Ridgeway. "Combining Estimators to Improve Performance." KDD-99 tutorial notes.
L. Breiman. "Bagging Predictors." Machine Learning, 24(2), 1996.
R.E. Schapire. "The Strength of Weak Learnability." Machine Learning, 5(2), 1990.
Y. Freund and R. Schapire. "Experiments with a New Boosting Algorithm." Proc. of the 13th Intl. Conf. on Machine Learning, 1996.
J. Friedman, T. Hastie, and R. Tibshirani. "Additive Logistic Regression: A Statistical View of Boosting." Annals of Statistics, 2000.
D.H. Wolpert. "Stacked Generalization." Neural Networks, 5(2), 1992.
S. Floyd, C. Ruiz, S.A. Alvarez, J. Tseng, and G. Whalen. "Model Selection Meta-Learning for the Prognosis of Pancreatic Cancer." Proc. 3rd Intl. Conf. on Health Informatics (HEALTHINF 2010).
S.A. Alvarez, C. Ruiz, T. Kawato, and W. Kogel. "Faster Neural Networks for Combined Collaborative and Content-Based Recommendation." Journal of Computational Methods in Sciences and Engineering (JCMSE), IOS Press, 11(4).
14
The End. Questions?
15
Bagging (Bootstrap Aggregation)
Model creation: create bootstrap replicates of the dataset and fit a model to each one.
Prediction: average/vote the predictions of each model.
Advantages:
- Stabilizes "unstable" methods
- Easy to implement, parallelizable
16
Bagging Algorithm
1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
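As one concrete rendering of these three steps (not part of the original slides), the Python sketch below implements bagging from scratch on top of scikit-learn base learners; the function names, default base learner, and k are illustrative choices.

```python
# A from-scratch sketch of the bagging algorithm above; assumes NumPy arrays,
# non-negative integer class labels, and scikit-learn-style base learners.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base_model=DecisionTreeClassifier(), k=25, seed=0):
    """Steps 1-2: create k bootstrap replicates and fit a model to each."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)      # sample n instances with replacement
        models.append(clone(base_model).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Step 3: majority vote of the k models' predictions."""
    preds = np.stack([m.predict(X) for m in models], axis=1)
    return np.array([np.bincount(row).argmax() for row in preds])
```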
17
Boosting
Creating the model: construct a sequence of datasets and models in such a way that a dataset in the sequence weights an instance heavily when the previous model has misclassified it.
Prediction: "merge" the models in the sequence.
Advantages: improves classification accuracy.
18
Generic Boosting Algorithm
1. Equally weight all instances in the dataset.
2. For i = 1 to T:
   2.1. Fit a model to the current dataset.
   2.2. Upweight poorly predicted instances.
   2.3. Downweight well-predicted instances.
3. Merge the models in the sequence to obtain the final model.
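One concrete way to fill in this generic loop (not given on the slides) is the AdaBoost-style reweighting and weighted-vote merge sketched below; it assumes binary labels in {-1, +1}, NumPy arrays, and scikit-learn base learners, and all names are illustrative.

```python
# A from-scratch sketch of the generic boosting loop above, instantiated with
# AdaBoost-style reweighting and a weighted-vote merge (one possible choice);
# assumes NumPy arrays and binary labels in {-1, +1}.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def boosting_fit(X, y, T=50, base_model=DecisionTreeClassifier(max_depth=1)):
    n = len(X)
    w = np.full(n, 1.0 / n)                       # 1. equally weight all instances
    models, alphas = [], []
    for _ in range(T):                            # 2. for i = 1 to T
        m = clone(base_model).fit(X, y, sample_weight=w)   # 2.1 fit to weighted data
        miss = m.predict(X) != y
        err = np.dot(w, miss)
        if err == 0:                              # perfect weak learner: keep it, stop
            models.append(m); alphas.append(1.0)
            break
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # 2.2 upweight misclassified / 2.3 downweight correctly classified instances
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()
        models.append(m); alphas.append(alpha)
    return models, alphas

def boosting_predict(models, alphas, X):
    # 3. merge the models: sign of the weighted vote of the individual predictions
    votes = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(votes)
```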