Ensemble Methods
“No free lunch theorem” (Wolpert and Macready, 1995): no single learner is best on all problems
Searching for a solution therefore also means searching over learners, which can differ by:
Different algorithms
Different parameters
Different input representations/features
Different data
Each such choice yields a different base learner
Diversity among base learners matters more than individual accuracy: each need only do better than chance
Model combination
Voting
Bagging
Boosting
Cascading
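Voting is the simplest combination scheme: each base learner predicts a label and the majority wins. A minimal sketch, assuming integer class labels and NumPy (the function name and implementation are illustrative, not from the slides):

```python
import numpy as np

def majority_vote(predictions):
    """Combine label predictions from several base learners.

    predictions: shape (n_learners, n_samples), integer class labels.
    Returns the most common label per sample (ties go to the lowest label).
    """
    predictions = np.asarray(predictions)
    return np.array([np.bincount(col).argmax() for col in predictions.T])

# Three learners vote on four samples; the majority label wins each column.
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(majority_vote(votes))  # -> [0 1 1 0]
```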
Bagging example (sampling with replacement):
Data set = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Input to learner 1 = [10, 2, 5, 10, 3]
Input to learner 2 = [4, 5, 2, 7, 6, 3]
Input to learner 3 = [8, 8, 4, 9, 1]
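The samples above are drawn with replacement (note the repeated 10s and 8s), i.e., bootstrap sampling, which is the heart of bagging. A minimal sketch assuming NumPy; the fixed sample size of 5 is illustrative (bootstrap samples are usually the same size as the original set):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(1, 11)  # the data set [1..10] above

# Each base learner gets its own bootstrap sample: drawn uniformly at
# random WITH replacement, so repeats (like the two 10s above) are normal.
for j in range(1, 4):
    sample = rng.choice(data, size=5, replace=True)
    print(f"Input to learner {j} = {sample.tolist()}")
```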
Boosting: create complementary learners
Train successive learners on the mistakes of their predecessors
Many weak learners combine into a single strong learner
AdaBoost (Adaptive Boosting)
Allows for a smaller training set
Uses simple (weak) classifiers
Binary classification
Modifies the probability of drawing each training example based on errors: misclassified examples become more likely to be drawn by the next learner
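A minimal AdaBoost sketch to make the reweighting step concrete, assuming binary labels in {-1, +1} and scikit-learn decision stumps as the weak learners (the function names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    """Minimal AdaBoost for binary labels y in {-1, +1}.

    Decision stumps (depth-1 trees) serve as the weak learners; each
    round reweights the examples so that misclassified ones count more
    heavily for the next learner.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with uniform example weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()          # weighted training error
        if err >= 0.5:                    # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        w *= np.exp(-alpha * y * pred)    # up-weight mistakes, down-weight hits
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """Weighted vote of the weak learners."""
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(scores)
```

The weight update is exactly the "modify probability of drawing examples" step: correct examples are scaled by exp(-alpha), mistakes by exp(+alpha).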
Cascading
Sequence classifiers by increasing complexity/cost
Use classifier j+1 only if classifier j does not meet a confidence threshold
Train each cascaded classifier on the instances the previous classifier is not confident about
Most examples are classified quickly by the cheap early classifiers; harder ones are passed on to more expensive classifiers (see the sketch below)
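A sketch of the inference-time control flow, assuming each classifier returns a (label, confidence) pair; the names and the two-stage example are hypothetical:

```python
def cascade_predict(x, classifiers, thresholds):
    """Run classifiers ordered by increasing cost.

    A stage's answer is accepted only if its confidence clears that
    stage's threshold; otherwise the instance moves to the next stage.
    """
    for clf, theta in zip(classifiers, thresholds):
        label, confidence = clf(x)
        if confidence >= theta:
            return label          # easy case: decided early and cheaply
    return label                  # no stage was confident; keep the last answer

# Hypothetical two-stage cascade: a cheap rule, then a costly model
# (threshold 0.0 so the final stage always decides).
stages = [lambda x: (int(x > 0), abs(x)),   # cheap; confident only on easy x
          lambda x: (int(x > 0), 1.0)]      # stand-in for an expensive model
print(cascade_predict(0.1, stages, thresholds=[0.5, 0.0]))  # -> 1
```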
Boosting and Cascading
Applications:
Object detection/tracking
Collaborative filtering
Neural networks
Optical character recognition
Biometrics
Data mining
…and more
Ensemble methods are empirically effective, but why do they work?