Download presentation
Presentation is loading. Please wait.
1
A “Holy Grail” of Machine Learing
Outputs Just a Data Set or just an explanation of the problem Automated Learner Hypothesis Input Features
2
Ensembles Multiple diverse models (Inductive Biases) are trained on the same problem and then their outputs are combined The specific overfit of each learning model is averaged out If models are diverse (uncorrelated errors) then even if the individual models are weak generalizers, the ensemble can be very accurate Many different Ensemble approaches Stacking, Gating/Mixture of Experts, Bagging, Boosting, Wagging, Mimicking, Combinations Combining Technique M1 M2 M3 Mn
3
Bias vs. Variance Multiple trained models can average out the variance
Leaving just the Bias Weak learners?
4
Bagging Bootstrap aggregating (Bagging)
Each TS chosen uniformly at random with replacement from the original data set All hypotheses have an equal vote Bagging mostly focused on getting rid of variance Consistent strong empirical improvement Does not overfit (whereas boosting may), but may be more conservative overall on accuracy improvements Often used with the same learning algorithms and thus best for those which tend to give more diverse hypotheses based on initial random conditions
5
Boosting Many variations
Boosting more aggressive on accuracy but in some cases could overfit and do worse – can theoretically converge to training set – similar to a constructive neural network (DMP) Boosting by resampling - (Each TS chosen randomly with distribution Di with replacement from the original data set. Typically the same size as the original data set.) Some learning algorithms can handle the weighted samples directly Potential to overfit, but empirically quite good
6
Ensemble Creation Approaches
Goal is to get less correlated errors Injecting randomness – initial weights, different learning parameters, etc. Different Training sets – Bagging, Boosting, different features, etc. Forcing differences – different objective functions, auxiliary tasks Different machine learning models
7
Ensemble Combining Approaches
Stacking Unweighted Voting Weighted voting – learned (single layer), based on accuracy, training set, Gating function – The gating function uses the input features to decide which combination (weights) of expert voting to use
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.