CLASSIFICATION: Ensemble Methods Combines multiple models Constructs multiple classifiers from the training set Aggregates their predictions on the test set A meta-algorithm
CLASSIFICATION: Ensemble Methods Improves stability and accuracy Reduces variance Helps avoid overfitting Can compensate for weak base learners At the cost of extra computation
ENSEMBLE METHODS: Examples Bagging (bootstrap aggregation) Bagging with MetaCost Random forests Boosting Stacked generalization (usually combines different learning algorithms) Bayesian model combination
ENSEMBLE METHODS: Bagging Randomly draw samples (with replacement) from the training set Train a classifier (of the same type) on each sample Run every classifier on the test instances Use majority voting to determine the final classification
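A minimal sketch of this procedure, assuming decision trees as the base learner and a synthetic data set; the library calls (NumPy, scikit-learn) and parameter values such as `n_models` are illustrative, not from the slides:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: 400 training and 100 test instances.
X, y = make_classification(n_samples=500, random_state=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # Draw a bootstrap sample: sample indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Majority vote over the individual classifiers' predictions.
votes = np.stack([m.predict(X_test) for m in models])          # (n_models, n_test)
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("bagged accuracy:", (y_pred == y_test).mean())
```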
ENSEMBLE METHODS: Bagging with MetaCost Used when the bagged models can output probability estimates The probability estimates give the expected cost of each possible prediction Relabels each training instance with the class that minimizes its expected cost Learns a new classifier on the relabelled training set
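A rough MetaCost-style sketch, continuing the bagging snippet above; the cost matrix is an illustrative assumption rather than anything prescribed by the method:

```python
# Assumed cost matrix: cost[i, j] = cost of predicting class j when the true class is i.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])   # here, missing class 1 is five times as costly

# Average the bagged models' probability estimates on the *training* instances.
proba = np.mean([m.predict_proba(X_train) for m in models], axis=0)   # (n_train, n_classes)

# Relabel each training instance with its minimum-expected-cost class ...
expected_cost = proba @ cost                                          # (n_train, n_classes)
y_relabelled = expected_cost.argmin(axis=1)

# ... then learn one new classifier on the relabelled training set.
meta_model = DecisionTreeClassifier().fit(X_train, y_relabelled)
```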
ENSEMBLE METHODS: Random Forests A modification of bagging applied to tree learners Considers only a random subset of the features at each split Promotes diversity among the trees
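A minimal scikit-learn sketch, reusing the data from the bagging snippet; `max_features="sqrt"` is the setting that restricts each split to a random subset of the features (the other parameter values are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("forest accuracy:", forest.score(X_test, y_test))
```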
ENSEMBLE METHODS: Boosting Seeks models that complement one another Combines models of the same type New models are constructed to better handle the instances previous models classified incorrectly – focuses on hard-to-classify examples Uses weighted (rather than simple majority) voting, often with adaptively chosen weights
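A compact sketch of discrete AdaBoost, one common adaptive boosting algorithm; choosing AdaBoost and decision stumps as the weak learner is an assumption for illustration, and the data come from the bagging snippet above:

```python
y_pm = np.where(y_train == 1, 1, -1)           # map {0, 1} labels to {-1, +1}
w = np.full(len(X_train), 1 / len(X_train))    # start with uniform instance weights
stumps, alphas = [], []
for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_pm, sample_weight=w)
    pred = stump.predict(X_train)
    err = w[pred != y_pm].sum()
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # weight of this model in the vote
    # Increase the weights of the instances this model got wrong.
    w *= np.exp(-alpha * y_pm * pred)
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: a weighted (not simple majority) vote of all weak models.
score = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
print("boosted accuracy:", (np.where(score > 0, 1, 0) == y_test).mean())
```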
ENSEMBLE METHODS: Stacked Generalization Introduced by David Wolpert, 1992 Several base algorithms are first trained on the training set The stacking (“level-1”) algorithm then uses the predictions of the base (“level-0”) algorithms as its inputs
ENSEMBLE METHODS: Stacked Generalization Employs j-fold cross-validation of the training set Train each level-0 algorithm on the training folds to create the level-0 models Test each model on its held-out fold; the collected predictions form the level-1 training data
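A minimal stacking sketch along these lines, again reusing the data from the bagging snippet; the particular level-0 and level-1 learners are illustrative assumptions:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

level0 = [DecisionTreeClassifier(random_state=0), GaussianNB()]

# Level-1 training data: out-of-fold predictions of each level-0 model,
# obtained by cross-validation so no model predicts instances it was trained on.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in level0
])
level1 = LogisticRegression().fit(train_meta, y_train)

# At test time, the level-0 models (refit on all training data) feed the level-1 model.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in level0
])
print("stacked accuracy:", level1.score(test_meta, y_test))
```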
ENSEMBLE METHODS: Stacked Generalization Can be used for both supervised and unsupervised learning The best performers in the Netflix Prize competition were forms of stacked generalization Multiple levels of stacking can even be created (“level-2”, etc.) Works best when the level-0 models supply class probabilities rather than hard predictions (Ting and Witten, 1999)
ENSEMBLE METHODS: Bayesian Model Combination Built upon the Bayes Optimal Classifier and Bayes Model Averaging Bayes Optimal Classifier: an ensemble (weighted via Bayes’ rule) of all hypotheses in the hypothesis space On average, no other ensemble can outperform it, so it is the ideal ensemble
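In symbols (a standard statement of the idea, not taken from these slides): given training data T, hypothesis space H, and classes C, the Bayes optimal classifier labels an instance x by

y^{*} = \arg\max_{c_j \in C} \sum_{h_i \in H} P(c_j \mid x, h_i)\, P(T \mid h_i)\, P(h_i)

i.e., every hypothesis votes, weighted by its posterior probability given the training data.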
ENSEMBLE METHODS: Bayesian Model Combination Bayes Model Averaging Approximates the Bayes optimal classifier Samples hypotheses from the hypothesis space (Monte Carlo sampling) Tends to promote overfitting Performs worse in practice than simpler techniques (e.g., bagging)
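A minimal sketch of the averaging step, assuming a uniform prior over a small set of already-trained models (the level-0 models from the stacking snippet), so that each model's posterior weight is proportional to the likelihood it assigns to held-out data:

```python
X_val, y_val = X_test, y_test        # illustrative: reuse the earlier held-out split

# Per-model class probabilities on the held-out data.
probs = np.stack([m.predict_proba(X_val) for m in level0])     # (n_models, n_val, n_classes)

# Log-likelihood of the held-out labels under each individual model.
model_ll = np.log(probs[:, np.arange(len(y_val)), y_val] + 1e-12).sum(axis=1)
bma_weights = np.exp(model_ll - model_ll.max())
bma_weights /= bma_weights.sum()
# BMA tends to concentrate nearly all weight on the single best-looking model.
print("BMA model weights:", bma_weights)
```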
ENSEMBLE METHODS: Bayesian Model Combination Bayes Model Combination A correction to Bayes Model Averaging Samples over ensemble weightings rather than over individual models Overcomes BMA’s drawback of giving nearly all weight to a single model Better performance than either BMA or bagging
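A rough sketch of the combination step, continuing the snippet above; drawing candidate ensemble weightings from a Dirichlet and scoring each weighting by the likelihood of the resulting mixture is one illustrative reading of the method, not a definitive implementation:

```python
rng = np.random.default_rng(1)
weightings = rng.dirichlet(np.ones(len(level0)), size=200)     # candidate ensemble weightings

# Score each *weighting* (not each single model) by the log-likelihood that the
# weighted mixture of model probabilities assigns to the held-out labels.
mix_ll = np.array([
    np.log(np.einsum("m,mnc->nc", w, probs)[np.arange(len(y_val)), y_val] + 1e-12).sum()
    for w in weightings
])
post = np.exp(mix_ll - mix_ll.max())
post /= post.sum()

# Final ensemble weights: a posterior-weighted average of the sampled weightings,
# which avoids BMA's tendency to collapse onto a single model.
bmc_weights = post @ weightings
print("BMC model weights:", bmc_weights)
```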