
1 Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

2 Combining Multiple Learners

3 Rationale
- No Free Lunch Theorem: there is no single algorithm that is always the most accurate.
- Generate a group of base-learners which, when combined, has higher accuracy.
- Different learners can differ in their
  - algorithms,
  - hyperparameters,
  - representations / modalities / views,
  - training sets,
  - subproblems.
- Trade-off between the diversity and the individual accuracy of the base-learners.

4 Voting
- Linear combination of the base-learner outputs:
  y = Σ_j w_j d_j, with w_j >= 0 and Σ_j w_j = 1
- Classification: y_i = Σ_j w_j d_ji for each class i; choose the class with the highest y_i.
- Simple (unweighted) voting is the special case w_j = 1/L.
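A minimal NumPy sketch of the voting rule above; the array values and weights are illustrative assumptions, not from the slides. Each of the L base-learners outputs a support d_ji per class, and the ensemble score per class is the weighted sum.

import numpy as np

# d[j, i] = output of base-learner j for class i (e.g., posterior estimates)
d = np.array([[0.7, 0.2, 0.1],    # learner 1
              [0.5, 0.3, 0.2],    # learner 2
              [0.2, 0.6, 0.2]])   # learner 3

w = np.array([0.5, 0.3, 0.2])     # w_j >= 0, sum to 1 (simple voting: w_j = 1/L)
y = w @ d                         # y_i = sum_j w_j * d_ji
print("combined scores:", y, "-> predicted class:", int(np.argmax(y)))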

5 Bayesian perspective
- View each base-learner as one model M_j: P(C_i | x) = Σ_j P(C_i | x, M_j) P(M_j).
- If the d_j are iid with E[d_j] = E[d] and Var(d_j) = σ²:
  E[y] = E[(1/L) Σ_j d_j] = E[d]  (bias does not change)
  Var(y) = (1/L²) Σ_j Var(d_j) = σ²/L  (variance decreases by a factor of L)
- If the d_j are dependent, error increases with positive correlation:
  Var(y) = (1/L²) [ Σ_j Var(d_j) + 2 Σ_j Σ_{i<j} Cov(d_j, d_i) ]
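A small NumPy simulation of the iid case; the noise scale, L, and number of trials are arbitrary assumptions. Averaging L independent estimators leaves the expected value (bias) unchanged while the variance shrinks by roughly a factor of L.

import numpy as np

rng = np.random.default_rng(0)
L, sigma, trials = 10, 1.0, 100_000

d = rng.normal(loc=0.0, scale=sigma, size=(trials, L))  # L iid estimators per trial
y = d.mean(axis=1)                                      # ensemble output per trial

print("Var(d_j) ~", d.var())    # ~ sigma^2
print("Var(y)   ~", y.var())    # ~ sigma^2 / L
print("E[y]     ~", y.mean())   # bias unchanged (~ 0)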

6 Fixed Combination Rules
- Combine the learner outputs d_ji with rules that have no trained parameters: sum (average), weighted sum, median, minimum, maximum, and product rules.
- The sum (average) rule is the most widely used and tends to be robust in practice.
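A NumPy sketch of these fixed rules; the input matrix is a made-up example. Each rule maps the L x K matrix of base-learner supports d_ji to one combined score per class, with nothing to train.

import numpy as np

# d[j, i]: support of base-learner j for class i
d = np.array([[0.6, 0.3, 0.1],
              [0.4, 0.4, 0.2],
              [0.7, 0.1, 0.2]])

rules = {
    "sum":     d.mean(axis=0),        # average of supports
    "median":  np.median(d, axis=0),  # robust to one outlier learner
    "minimum": d.min(axis=0),         # pessimistic
    "maximum": d.max(axis=0),         # optimistic
    "product": d.prod(axis=0),        # veto: any zero support kills the class
}
for name, y in rules.items():
    print(f"{name:8s} scores={np.round(y, 3)} -> class {int(np.argmax(y))}")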

7 Error-Correcting Output Codes
- K classes, L binary problems (Dietterich and Bakiri, 1995).
- The K x L code matrix W codes the classes in terms of the base-learners.
- One per class: L = K, with w_ij = +1 if i = j and -1 otherwise.
- Pairwise: L = K(K-1)/2; each learner separates one pair of classes and ignores (codes as 0) the remaining classes.

8 Error-Correcting Output Codes (continued)
- Full code: L = 2^(K-1) - 1 columns.
- With a reasonable L, find a W such that the Hamming distance between rows, and between columns, is maximized.
- Voting scheme: y_i = Σ_j w_ij d_j; choose the class with the highest y_i.
- The binary subproblems may be more difficult than the one-per-class subproblems.
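A minimal NumPy sketch of ECOC decoding with the one-per-class code matrix; K = 4 and the learner outputs are arbitrary assumptions. Each column of W defines one binary subproblem, and a test instance is assigned to the class whose row best agrees with the L learner outputs, y_i = Σ_j w_ij d_j.

import numpy as np

K = 4
W = -np.ones((K, K)) + 2 * np.eye(K)   # one-per-class code: +1 on diagonal, -1 elsewhere

# Hypothetical outputs d_j in [-1, +1] of the L = K binary base-learners
d = np.array([0.9, -0.8, -0.6, 0.2])

y = W @ d                              # y_i = sum_j w_ij * d_j
print("class scores:", y, "-> predicted class:", int(np.argmax(y)))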

9 Bagging
- Use bootstrapping to generate L training sets and train one base-learner with each (Breiman, 1996).
- Combine by voting (average or median for regression).
- Unstable algorithms, whose outputs change significantly with small changes in the training set, profit from bagging.
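A short bagging sketch; the scikit-learn trees, the synthetic dataset, and L = 25 are assumptions, not part of the slides. Draw L bootstrap samples, fit one tree per sample, and take a majority vote at prediction time.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
L, n = 25, len(X)

learners = []
for _ in range(L):
    idx = rng.integers(0, n, size=n)   # bootstrap sample: draw n instances with replacement
    learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in learners])    # shape (L, n)
y_hat = (votes.mean(axis=0) > 0.5).astype(int)        # majority vote (binary labels)
print("training accuracy of bagged ensemble:", (y_hat == y).mean())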

10 AdaBoost
- Generate a sequence of base-learners, each focusing on the errors of the previous ones (Freund and Schapire, 1996).
- Training instances misclassified by earlier learners receive higher probability of being drawn for the next learner; at test time the learners vote with weights that reflect their training accuracies.
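A compact sketch of the AdaBoost idea for binary labels in {-1, +1}; it uses instance reweighting rather than the resampling of the original AdaBoost.M1, and the scikit-learn stumps and synthetic dataset are assumptions. Each round fits a weak learner under the current instance weights, up-weights the instances it got wrong, and the final decision is a weighted vote.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=10, random_state=1)
y = 2 * y01 - 1                                   # labels in {-1, +1}
n, T = len(X), 20

p = np.full(n, 1.0 / n)                           # instance weights
stumps, alphas = [], []
for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=p)
    pred = stump.predict(X)
    eps = p[pred != y].sum()                      # weighted error
    if eps >= 0.5:                                # no better than chance: stop
        break
    alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))
    p *= np.exp(-alpha * y * pred)                # up-weight mistakes, down-weight hits
    p /= p.sum()
    stumps.append(stump); alphas.append(alpha)

score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy of boosted ensemble:", (np.sign(score) == y).mean())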

11 Mixture of Experts
- Voting where the weights are input-dependent (gating) (Jacobs et al., 1991):
  y = Σ_j w_j(x) d_j(x)
- The experts or the gating network can be linear or nonlinear.
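A forward-pass sketch of a mixture of experts; the linear experts, softmax gating, and random parameters are all illustrative assumptions, and training of both parts is omitted. The gating network produces input-dependent weights w_j(x) that combine the expert outputs.

import numpy as np

rng = np.random.default_rng(0)
D, L = 5, 3                                 # input dimension, number of experts

V = rng.normal(size=(L, D))                 # expert parameters: d_j(x) = V[j] @ x
U = rng.normal(size=(L, D))                 # gating parameters

def moe_output(x):
    d = V @ x                               # expert outputs d_j(x)
    g = np.exp(U @ x); w = g / g.sum()      # softmax gating: w_j(x) >= 0, sums to 1
    return w @ d                            # y = sum_j w_j(x) d_j(x)

x = rng.normal(size=D)
print("mixture output:", moe_output(x))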

12 Stacking
- The combiner f(·) is itself another learner (Wolpert, 1992):
  y = f(d_1, d_2, ..., d_L | Φ)
- The combiner should be trained on data not used to train the base-learners, e.g., on cross-validation (out-of-fold) predictions.
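A stacking sketch with scikit-learn; the particular base-learners, the logistic-regression combiner, and the dataset are assumptions. The combiner is fit on out-of-fold predictions of the base-learners, so it never learns from predictions made on the base-learners' own training folds.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
base = [DecisionTreeClassifier(max_depth=3), GaussianNB()]

# d_j(x) estimated out-of-fold, so the combiner does not see "seen" predictions
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1] for m in base
])
combiner = LogisticRegression().fit(Z, y)   # f(d_1, ..., d_L | Phi)

for m in base:                              # refit base-learners on all data for use at test time
    m.fit(X, y)
print("combiner weights:", combiner.coef_)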

13 Fine-Tuning an Ensemble
Given an ensemble of dependent classifiers, do not use it as is; try to obtain independence.
1. Subset selection: forward (growing) / backward (pruning) approaches to improve accuracy, diversity, and independence (a forward-selection sketch follows below).
2. Train metaclassifiers: from the outputs of correlated classifiers, extract new combinations that are uncorrelated; using PCA, we get "eigenlearners."
This is analogous to feature selection vs. feature extraction.
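A sketch of the forward (growing) subset-selection idea; the candidate pool, validation split, and accuracy criterion are illustrative assumptions. Starting from an empty ensemble, repeatedly add the candidate learner that most improves the voted accuracy on held-out data, and stop when nothing helps.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, test_size=0.3, random_state=0)

pool = [DecisionTreeClassifier(max_depth=d, random_state=s).fit(Xtr, ytr)
        for d in (1, 2, 3, 5) for s in (0, 1)]

def voted_acc(members):
    votes = np.mean([m.predict(Xval) for m in members], axis=0)
    return ((votes > 0.5).astype(int) == yval).mean()

chosen, best = [], 0.0
while True:
    candidates = [m for m in pool if m not in chosen]
    if not candidates:
        break
    acc, m = max(((voted_acc(chosen + [m]), m) for m in candidates),
                 key=lambda t: t[0])
    if acc <= best:                         # stop when no candidate improves the ensemble
        break
    chosen.append(m); best = acc
print(f"selected {len(chosen)} of {len(pool)} learners, validation accuracy {best:.3f}")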

14 Cascading
- Use learner d_j only if the preceding learners are not confident enough.
- Cascade the learners in order of increasing complexity (and cost), so that simple learners handle most instances and complex ones are consulted only for the difficult few.
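A two-stage cascading sketch; the confidence threshold, the particular learners, and the dataset are assumptions. The cheap first learner answers whenever its posterior is confident enough, and only the remaining instances are passed to the more complex second learner.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
simple = GaussianNB().fit(X, y)                               # cheap first stage
complex_ = RandomForestClassifier(random_state=0).fit(X, y)   # used only when needed

theta = 0.9                                                   # confidence threshold
proba = simple.predict_proba(X)
confident = proba.max(axis=1) >= theta

y_hat = np.empty(len(X), dtype=int)
y_hat[confident] = simple.classes_[proba[confident].argmax(axis=1)]   # d_1 answers
if (~confident).any():
    y_hat[~confident] = complex_.predict(X[~confident])              # d_2 handles the rest
print(f"{confident.mean():.0%} handled by the simple learner, "
      f"accuracy {(y_hat == y).mean():.3f}")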

15 Combining Multiple Sources
- Early integration: concatenate all features and train a single learner.
- Late integration: train one learner per feature set, then combine their decisions with a fixed rule or by stacking.
- Intermediate integration: compute one kernel per feature set, then use a single SVM with multiple kernels.
- Combining features vs. decisions vs. kernels.
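A sketch contrasting early and late integration for two hypothetical feature sets Xa and Xb; the split of the features, the learners, and the fixed averaging rule are assumptions, and the multiple-kernel (intermediate) case is not shown. Early integration concatenates the features and trains one learner; late integration trains one learner per feature set and averages their posteriors.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xa, Xb = X[:, :5], X[:, 5:]                    # two "sources" / feature sets

# Early integration: concatenate all features, train a single learner
early = LogisticRegression(max_iter=1000).fit(np.hstack([Xa, Xb]), y)

# Late integration: one learner per source, then a fixed rule (average) on the posteriors
la = LogisticRegression(max_iter=1000).fit(Xa, y)
lb = LogisticRegression(max_iter=1000).fit(Xb, y)
p_late = (la.predict_proba(Xa) + lb.predict_proba(Xb)) / 2
y_late = la.classes_[p_late.argmax(axis=1)]

print("early integration accuracy:", early.score(np.hstack([Xa, Xb]), y))
print("late integration accuracy: ", (y_late == y).mean())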

