Published by Christopher Burns. Modified over 9 years ago.
1
[Figure: base algorithms Algo 1, Algo 2, Algo 3, …, Algo N combined by a meta-learning algorithm]
2
No Free Lunch theorem: there is no algorithm that is always the most accurate. The idea is therefore to generate a group of algorithms which, when combined, display higher accuracy.
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
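Why combining can help is easy to quantify under a strong idealization: suppose each of n base classifiers is correct with probability p on a two-class problem and their errors are independent. A majority vote is then correct whenever more than half of them are right, which is a binomial tail sum. The function name below is illustrative only. With p = 0.7 and three voters this gives 0.343 + 0.441 = 0.784.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n independent two-class voters,
    each correct with probability p (n odd, so there are no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters")
    k_min = n // 2 + 1  # fewest correct votes that still form a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 11, 21):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

The same formula shows that voting hurts when p < 0.5, and real base learners are correlated, so the actual gain is smaller than this estimate. This is why ensemble methods try to make the base learners differ, through different algorithms or different datasets.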
3
[Figure: Different Algorithms vs. Different Datasets]
4
Bagging
5
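Bagging (bootstrap aggregating) trains each base learner on a different bootstrap sample, drawn with replacement from the training set, and combines the learners by voting. Bagging extends easily to regression: the outputs are averaged instead of voted. It is most effective when the base learner is unstable, i.e., when small changes in the training set cause large changes in the learned model, as with decision trees. Bagging typically increases accuracy, but interpretability is lost.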
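A minimal sketch of bagging for classification, assuming a decision stump (a single-feature threshold rule) as the base learner. The stump, the function names and the toy data are hypothetical choices for illustration; any learner with the same `fit(X, y) -> predictor` shape could be plugged in.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], int]


def fit_stump(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Base learner (hypothetical choice): a one-feature threshold rule
    that predicts the majority label on each side of the threshold."""
    best = None  # (training errors, feature, threshold, left label, right label)
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[j] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, thr, left_label, right_label)
    _, j, thr, lo, hi = best
    return lambda x: lo if x[j] <= thr else hi


def bagging(X: Sequence[Point], y: Sequence[int],
            fit: Callable[[List[Point], List[int]], Model],
            n_models: int = 25, seed: int = 0) -> Model:
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: n draws with replacement from the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x: Point) -> int:
        # Classification: plurality vote (for regression, average instead).
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


X = [(0.1,), (0.2,), (0.3,), (0.7,), (0.8,), (0.9,)]
y = [0, 0, 0, 1, 1, 1]
clf = bagging(X, y, fit_stump)
print(clf((0.05,)), clf((0.95,)))
```

Each stump sees a slightly different sample, so the thresholds vary from model to model; the vote smooths out that variance, which is why bagging pays off most for unstable learners.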
6
“Breiman's work bridged the gap between statisticians and computer scientists, particularly in the field of machine learning. Perhaps his most important contributions were his work on classification and regression trees and ensembles of trees fit to bootstrap samples. Bootstrap aggregation was given the name bagging by Breiman. Another of Breiman's ensemble approaches is the random forest.” (Extracted from Wikipedia.)
7
Boosting
8
Boosting combines weak learners (each only slightly better than random guessing) into a strong learner. Initially all examples have the same weight; in each subsequent iteration, the weights of misclassified examples are increased so that the next learner concentrates on them. Boosting can be applied to any base learner.
9
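Boosting (the AdaBoost algorithm), with labels $y_i \in \{-1, +1\}$:
1. Initialize all weights $w_i = 1/N$ ($N$: number of examples).
2. Repeat for $t = 1, 2, \ldots$ until the error is $\geq 0.5$ or the maximum number of iterations is reached:
   - Train a classifier on the weighted examples and get the hypothesis $h_t(x)$.
   - Compute the error as the sum of the weights of the misclassified examples: $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} w_i$.
   - Set $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
   - Update the weights: $w_i \leftarrow w_i \, e^{-\alpha_t y_i h_t(x_i)} / Z_t$, where $Z_t$ normalizes the weights so that they sum to 1.
3. Output $f(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$.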
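The boosting loop can be sketched as follows, assuming a weighted decision stump as the weak learner and labels in {-1, +1}. The stump and all names are illustrative choices, not part of the lecture. The toy labels form an interval (+1 only for x = 3, 4), which no single stump can represent, so several rounds are needed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Hypothesis = Callable[[Point], int]


def fit_weighted_stump(X: Sequence[Point], y: Sequence[int],
                       w: Sequence[float]) -> Hypothesis:
    """Weak learner (hypothetical choice): the threshold rule with the
    smallest weighted error. Labels are -1/+1."""
    best = None  # (weighted error, feature, threshold, polarity)
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                # Predict `pol` when x[j] > thr, otherwise -pol.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    _, j, thr, pol = best
    return lambda x: pol if x[j] > thr else -pol


def adaboost(X, y, max_rounds=20):
    n = len(X)
    w = [1.0 / n] * n                      # all weights start at 1/N
    ensemble: List[Tuple[float, Hypothesis]] = []
    for _ in range(max_rounds):
        h = fit_weighted_stump(X, y, w)
        error = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
        if error >= 0.5:
            break                          # no better than chance: stop
        eps = max(error, 1e-10)            # guard against log(1/0) for a perfect h
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        if error == 0:
            break                          # h is already perfect on the training set
        # Misclassified examples (y_i h_t(x_i) = -1) are scaled up, correct ones down.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)                         # normalizer Z_t
        w = [wi / z for wi in w]

    def f(x: Point) -> int:
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1     # sign(), with ties sent to +1
    return f


X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [-1, -1, 1, 1, -1, -1]                 # an interval: no single stump fits it
clf = adaboost(X, y, max_rounds=3)
print([clf(x) for x in X])
```

The factor 1/2 in $\alpha_t$ is what makes the exponential update raise the total weight of the misclassified examples to exactly one half after normalization, so the next weak learner cannot simply repeat the previous hypothesis.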
10
[Figure: Boosting; misclassified examples have their weights increased]
12
Mixture of Experts: voting where the weights are input-dependent (gating) (Jacobs et al., 1991). The output is $y = \sum_j w_j(x)\, d_j(x)$, where $d_j$ are the experts and a gating model computes the weights $w_j(x)$ from the input. The experts or the gating can be nonlinear.
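A minimal sketch of how a mixture of experts combines its members at prediction time, assuming linear experts and a softmax gating function over a linear score. The parameter values are hypothetical and hand-set; training the experts and the gate (as Jacobs et al. did) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)                                   # shift for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def mixture_of_experts(x, experts, gate) -> Tuple[float, List[float]]:
    """experts[j]: weights of linear expert j; gate[j]: weights of its gating
    score. Returns the combined output and the input-dependent weights w_j(x)."""
    outputs = [dot(v, x) for v in experts]       # d_j(x)
    g = softmax([dot(m, x) for m in gate])       # w_j(x) >= 0, sums to 1
    return dot(g, outputs), g


# Hypothetical parameters; x = [1, x1] includes a bias term.
experts = [[0.0, 0.0],   # expert 0: y = 0
           [0.0, 1.0]]   # expert 1: y = x1
gate = [[0.0, -5.0],     # favours expert 0 when x1 < 0
        [0.0, 5.0]]      # favours expert 1 when x1 > 0
for x1 in (-2.0, 0.0, 2.0):
    y, g = mixture_of_experts([1.0, x1], experts, gate)
    print(x1, round(y, 4), [round(gj, 4) for gj in g])
```

Unlike fixed-weight voting, the weights change with the input: here the gate hands the region $x_1 < 0$ to one expert and $x_1 > 0$ to the other, so two linear experts together produce a nonlinear output.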
13
[Photos: Robert Jacobs, University of Rochester; Michael Jordan, UC Berkeley]
14
Stacking
15
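The variation is among the learners: different algorithms are trained on the same data. The predictions of the base learners form a new meta-dataset, on which a meta-learner is trained. A test example is first transformed into a meta-example (the vector of base-learner predictions) and then classified by the meta-learner. Several variations of stacking have been proposed.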
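A minimal stacking sketch, assuming two hypothetical base learners (1-nearest-neighbour and nearest class mean) and a deliberately simple meta-learner that looks up the most frequent true label for each combination of base predictions. The meta-dataset is built from out-of-fold predictions, a common practice so that the meta-learner does not learn from base learners that have already seen the example. All names and data are illustrative.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], int]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_1nn(X, y) -> Model:
    """Base learner 1: one-nearest-neighbour."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: sq_dist(p[0], x))[1]


def fit_centroid(X, y) -> Model:
    """Base learner 2: nearest class mean."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    means = {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}
    return lambda x: min(means, key=lambda c: sq_dist(means[c], x))


def fit_table(M, y) -> Callable[[Sequence[int]], int]:
    """Meta-learner (hypothetical choice): the most frequent true label for
    each combination of base predictions; unseen combinations fall back to a vote."""
    table = defaultdict(Counter)
    for m, yi in zip(M, y):
        table[tuple(m)][yi] += 1

    def predict(m):
        m = tuple(m)
        votes = table[m] if m in table else Counter(m)
        return votes.most_common(1)[0][0]
    return predict


def stacking(X, y, base_fits, meta_fit, k=3) -> Model:
    n = len(X)
    meta_X: List[List[int]] = [[] for _ in range(n)]
    # Meta-examples come from out-of-fold predictions, so the meta-learner
    # never sees a base learner's prediction on its own training data.
    for f in range(k):
        held = set(range(f, n, k))
        train = [i for i in range(n) if i not in held]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fits]
        for i in held:
            meta_X[i] = [mdl(X[i]) for mdl in models]
    meta = meta_fit(meta_X, y)
    final = [fit(X, y) for fit in base_fits]   # base learners refit on all data
    # A test example becomes a meta-example, which the meta-learner classifies.
    return lambda x: meta([mdl(x) for mdl in final])


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
clf = stacking(X, y, [fit_1nn, fit_centroid], fit_table)
print(clf((0.5, 0.5)), clf((5.5, 5.5)))
```

A linear or logistic model is a more usual meta-learner; the lookup table just keeps the sketch dependency-free while preserving the two-level structure.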
16
Cascade Generalization
17
As in stacking, the variation is among the learners, but the classifiers are used in sequence rather than in parallel. The prediction of the first classifier is appended to each example's feature vector to form an extended dataset, on which the next classifier is trained. The process can be repeated over several levels.
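A minimal sketch of cascade generalization, assuming a nearest-class-mean classifier at the first level and a decision stump at the second. Both learners and all names are hypothetical. Here only the predicted label is appended; appending class-probability estimates instead is a common variant.

```python
from collections import Counter, defaultdict
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], int]


def fit_centroid(X, y) -> Model:
    """Level-1 learner (hypothetical choice): nearest class mean."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    means = {c: [sum(col) / len(p) for col in zip(*p)] for c, p in groups.items()}
    return lambda x: min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(means[c], x)))


def fit_stump(X, y) -> Model:
    """Level-2 learner (hypothetical choice): best single-feature threshold."""
    best = None  # (errors, feature, threshold, left label, right label)
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[j] > thr]
            lo = Counter(left).most_common(1)[0][0]
            hi = Counter(right).most_common(1)[0][0] if right else lo
            errors = sum(v != lo for v in left) + sum(v != hi for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, thr, lo, hi)
    _, j, thr, lo, hi = best
    return lambda x: lo if x[j] <= thr else hi


def cascade_generalization(X, y, fits) -> Model:
    """Train fits[0], append its prediction to every example, train fits[1]
    on the extended data, and so on; the last learner makes the decision."""
    models = []
    Xk = [list(x) for x in X]
    for fit in fits[:-1]:
        m = fit(Xk, y)
        models.append(m)
        Xk = [xi + [m(xi)] for xi in Xk]       # extended dataset for the next level
    last = fits[-1](Xk, y)

    def predict(x) -> int:
        xk = list(x)
        for m in models:                       # rebuild the same extended vector
            xk = xk + [m(xk)]
        return last(xk)
    return predict


X = [(0.0,), (1.0,), (4.0,), (5.0,)]
y = [0, 0, 1, 1]
clf = cascade_generalization(X, y, [fit_centroid, fit_stump])
print(clf((0.2,)), clf((4.5,)))
```

At prediction time a test example must go through the same chain, so each earlier level's output is recomputed and appended before the final classifier sees it.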
18
Cascading
19
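As in boosting, the data distribution changes from one stage to the next, but unlike boosting, the classifiers themselves also vary. Classification is based on prediction confidence: a classifier's output is used only if its confidence is high enough; otherwise the example is passed on to the next classifier. Cascading thus creates rules that account for most instances, catching the exceptions at the final step.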
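A simplified cascading sketch, assuming a first-stage threshold rule whose confidence is the purity of the leaf it falls in, a 1-nearest-neighbour final stage, and a hypothetical confidence threshold of 0.9. Each later stage is trained only on the examples the previous stage was not confident about, so the training distribution shifts from stage to stage. The learners, thresholds and data are all illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Scored = Callable[[Point], Tuple[int, float]]   # x -> (label, confidence)


def leaf(labels: Sequence[int]) -> Tuple[int, float]:
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)           # confidence = leaf purity


def fit_rule(X, y) -> Scored:
    """Stage-1 learner (hypothetical): one threshold on feature 0."""
    best = None  # (errors, threshold, left leaf, right leaf)
    for thr in sorted({x[0] for x in X})[:-1]:  # keep both sides non-empty
        left = [yi for xi, yi in zip(X, y) if xi[0] <= thr]
        right = [yi for xi, yi in zip(X, y) if xi[0] > thr]
        errors = sum(len(s) - Counter(s).most_common(1)[0][1] for s in (left, right))
        if best is None or errors < best[0]:
            best = (errors, thr, leaf(left), leaf(right))
    if best is None:                            # a single distinct value: constant rule
        const = leaf(y)
        return lambda x: const
    _, thr, lo, hi = best
    return lambda x: lo if x[0] <= thr else hi


def fit_1nn(X, y) -> Scored:
    """Final-stage learner (hypothetical): 1-NN, always accepted."""
    data = list(zip(X, y))
    return lambda x: (min(data, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1], 1.0)


def cascade(X, y, fits: List[Callable], thresholds: List[float]) -> Callable[[Point], int]:
    stages: List[Scored] = []
    Xj, yj = list(X), list(y)
    for j, fit in enumerate(fits):
        model = fit(Xj, yj)
        stages.append(model)
        if j == len(fits) - 1:
            break
        # The next stage sees only the examples this stage is not confident on.
        unsure = [i for i, x in enumerate(Xj) if model(x)[1] < thresholds[j]]
        if not unsure:
            break
        Xj, yj = [Xj[i] for i in unsure], [yj[i] for i in unsure]

    def predict(x: Point) -> int:
        for j, model in enumerate(stages):
            label, confidence = model(x)
            if j == len(stages) - 1 or confidence >= thresholds[j]:
                return label
    return predict


X = [(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,)]
y = [0, 0, 0, 0, 1, 1, 0, 1]                    # x = 7 is an exception to "x > 4 -> 1"
clf = cascade(X, y, [fit_rule, fit_1nn], thresholds=[0.9])
print([clf(x) for x in [(2,), (5.2,), (7,)]])
```

The rule "x > 4 gives class 1" covers most of the data, and the instance-based last stage handles the exception at x = 7 that the rule alone would misclassify.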
20
Error-Correcting Output Codes: $K$ classes; $L$ two-class problems (Dietterich and Bakiri, 1995). The code matrix $W$ codes the classes in terms of the learners. One per class: $L = K$. Pairwise: $L = K(K-1)/2$.
21
Full code: $L = 2^{K-1} - 1$. With a reasonable $L$, find $W$ such that the Hamming distances between rows and between columns are maximized.
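A minimal sketch of output coding with entries in {-1, +1}: it builds the one-per-class and full code matrices, measures the minimum Hamming distance between rows, and decodes by choosing the closest row. The pairwise code needs a third "don't care" entry (0) and is omitted. All function names are illustrative. A code whose rows are at least d apart can correct up to ⌊(d-1)/2⌋ learner errors.

```python
from itertools import combinations
from typing import List, Sequence

Code = List[List[int]]   # K rows (classes) x L columns (learners), entries +1/-1


def one_per_class(K: int) -> Code:
    return [[1 if i == j else -1 for j in range(K)] for i in range(K)]


def full_code(K: int) -> Code:
    """All 2**(K-1) - 1 distinct two-class splits: class 0 is fixed to +1
    (so no column is the complement of another) and the all-+1 column is dropped."""
    cols = [[1] + [1 if (b >> (i - 1)) & 1 else -1 for i in range(1, K)]
            for b in range(2 ** (K - 1) - 1)]
    return [[col[i] for col in cols] for i in range(K)]   # transpose to rows


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai != bi for ai, bi in zip(a, b))


def min_row_distance(W: Code) -> int:
    return min(hamming(W[i], W[j]) for i, j in combinations(range(len(W)), 2))


def decode(W: Code, outputs: Sequence[int]) -> int:
    """Assign the class whose code word (row) is closest to the L learners' outputs."""
    return min(range(len(W)), key=lambda i: hamming(W[i], outputs))


W = full_code(4)
print(len(W[0]), min_row_distance(W), min_row_distance(one_per_class(4)))  # 7 4 2
noisy = list(W[2])
noisy[0] = -noisy[0]          # one learner makes a mistake
print(decode(W, noisy))       # 2
```

With $K = 4$, one-per-class rows are only 2 apart, so a single wrong learner can already cause a tie, while the full code's rows are 4 apart and one error is corrected, at the cost of training 7 learners instead of 4.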