Combining Base Learners

Combining Base Learners (figure): Algo 1, Algo 2, Algo 3, …, Algo N feed their outputs into a Meta-Learning Algo.

Rationale (from lecture notes for E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004) The No Free Lunch theorem says there is no single algorithm that is always the most accurate. Instead, generate a group of base learners which, when combined, achieve higher accuracy.

Data and Algorithm Variation (figure): ensembles can be built from different datasets (data variation) or from different algorithms (algorithm variation).

Bagging Bagging trains each base learner on a bootstrap sample of the training set and combines their predictions by voting (or averaging, so it extends easily to regression). Bagging is most effective when the base learner is unstable, i.e., when small changes in the training data produce large changes in the learned model. Bagging typically increases accuracy, but interpretability is lost.
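
A minimal sketch of bagging, assuming numpy arrays, integer class labels, and scikit-learn decision trees as the unstable base learner; the function names and the number of learners are illustrative, not from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=25, seed=0):
    # Train one tree per bootstrap sample of the training set.
    rng = np.random.default_rng(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
        learners.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    # Majority vote over the base learners' predictions (assumes integer labels).
    votes = np.stack([m.predict(X) for m in learners]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])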

“Breiman's work bridged the gap between statisticians and computer scientists, particularly in the field of machine learning. Perhaps his most important contributions were his work on classification and regression trees and ensembles of trees fit to bootstrap samples. Bootstrap aggregation was given the name bagging by Breiman. Another of Breiman's ensemble approaches is the random forest.” (Extracted from Wikipedia).

Boosting Boosting combines weak learners into a strong learner. Initially all examples carry the same weight; in subsequent iterations the weights of wrongly classified examples are increased so that later learners focus on them. Boosting can be applied to any base learner.

Boosting (AdaBoost) Initialize all weights w_i to 1/N (N: number of examples). Repeat until the error exceeds 0.5 or the maximum number of iterations is reached: train a classifier on the weighted data to get hypothesis h_t(x); compute the error as the sum of the weights of the misclassified examples, error_t = Σ_i w_i over the x_i that h_t misclassifies; set α_t = log((1 − error_t) / error_t); update the weights w_i ← w_i exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t renormalizes the weights so they sum to 1. Output f(x) = sign(Σ_t α_t h_t(x)).
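
A sketch of the boosting loop above in Python, following the slide's α and weight-update formulas; it assumes labels in {−1, +1}, numpy arrays, and scikit-learn decision stumps as the weak learner (the choice of weak learner and the number of rounds are illustrative).

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                     # initialize weights w_i = 1/N
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        error = w[pred != y].sum()              # weighted error of this round
        if error == 0 or error >= 0.5:          # stop, as in the slide's loop condition
            break
        alpha = np.log((1 - error) / error)     # α_t from the slide
        w = w * np.exp(-alpha * y * pred)       # up-weight misclassified examples
        w /= w.sum()                            # Z_t: renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def adaboost_predict(hypotheses, alphas, X):
    # f(x) = sign(Σ_t α_t h_t(x))
    return np.sign(sum(a * h.predict(X) for h, a in zip(hypotheses, alphas)))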

Boosting (figure): misclassified examples get their weights increased.

Mixture of Experts Voting where the weights are input-dependent (gating) (Jacobs et al., 1991). Both the experts and the gating network can be nonlinear.

Mixture of Experts Robert Jacobs (University of Rochester) and Michael Jordan (UC Berkeley).
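
A sketch of only the mixture-of-experts combination rule, assuming the experts and a gating score function have already been trained; joint training, as in Jacobs et al. (1991), is not shown, and all names here are illustrative.

import numpy as np

def moe_predict(experts, gate_scores, X):
    # experts: list of callables mapping X -> (n_samples,) outputs.
    # gate_scores: callable mapping X -> (n_samples, n_experts) raw scores.
    scores = gate_scores(X)
    g = np.exp(scores - scores.max(axis=1, keepdims=True))
    g /= g.sum(axis=1, keepdims=True)                     # softmax gating weights per input
    outputs = np.stack([e(X) for e in experts], axis=1)   # (n_samples, n_experts)
    return (g * outputs).sum(axis=1)                      # input-dependent weighted vote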

Stacking Variation is among the learners. The predictions of the base learners form a new meta-dataset on which a meta-learner is trained. A test example is first transformed into a meta-example (the vector of base-learner predictions) and then classified by the meta-learner. Several variations of stacking have been proposed.
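
A sketch of stacking under one common convention, where out-of-fold predictions of the base learners build the meta-dataset; it assumes scikit-learn estimators and numeric class labels, and the particular base and meta learners in the usage comment are illustrative, not prescribed by the slides.

import numpy as np
from sklearn.model_selection import cross_val_predict

def stacking_fit(X, y, base_learners, meta_learner):
    # Out-of-fold predictions keep the meta-learner from seeing predictions
    # made on the same data the base learners were fit on.
    meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_learners])
    fitted_bases = [m.fit(X, y) for m in base_learners]
    return fitted_bases, meta_learner.fit(meta_X, y)

def stacking_predict(fitted_bases, meta_learner, X):
    # A test example becomes a meta-example: the vector of base predictions.
    meta_X = np.column_stack([m.predict(X) for m in fitted_bases])
    return meta_learner.predict(meta_X)

# Illustrative usage:
# from sklearn.tree import DecisionTreeClassifier
# from sklearn.naive_bayes import GaussianNB
# from sklearn.linear_model import LogisticRegression
# bases, meta = stacking_fit(X, y, [DecisionTreeClassifier(), GaussianNB()], LogisticRegression())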

Cascade Generalization Variation is among the learners, but the classifiers are used in sequence rather than in parallel as in stacking. The prediction of the first classifier is appended to the example's feature vector to form an extended dataset, on which the next classifier is trained. The process can continue through several levels.
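
A sketch of cascade generalization, assuming each level appends class-probability outputs to the features (the slide's hard predictions could be appended instead); scikit-learn estimators with predict_proba are assumed, and the learners in the usage comment are illustrative.

import numpy as np

def cascade_fit(X, y, classifiers):
    # Each classifier is trained on the features extended with the
    # previous classifiers' predictions.
    fitted, X_ext = [], X
    for clf in classifiers:
        clf.fit(X_ext, y)
        fitted.append(clf)
        X_ext = np.column_stack([X_ext, clf.predict_proba(X_ext)])
    return fitted

def cascade_predict(fitted, X):
    # Rebuild the same feature extensions, then predict with the last level.
    X_ext = X
    for clf in fitted[:-1]:
        X_ext = np.column_stack([X_ext, clf.predict_proba(X_ext)])
    return fitted[-1].predict(X_ext)

# Illustrative usage:
# from sklearn.naive_bayes import GaussianNB
# from sklearn.tree import DecisionTreeClassifier
# fitted = cascade_fit(X, y, [GaussianNB(), DecisionTreeClassifier()])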

Cascading Like boosting, the distribution over examples changes from one stage to the next, but unlike boosting the classifiers themselves vary across stages. Classification is based on prediction confidence: an example is passed on to the next classifier only when the current one is not confident enough. Cascading thus creates rules that account for most instances early, catching the exceptions at the final step.
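
One possible reading of the confidence-based routing in cascading, sketched in Python: already-fitted classifiers with predict_proba are assumed, and the confidence threshold is illustrative.

import numpy as np

def cascading_predict(fitted_classifiers, X, threshold=0.9):
    # An example is answered at the first stage whose confidence exceeds
    # the threshold; otherwise it falls through to the next stage, and the
    # final stage answers whatever remains.
    labels = np.empty(len(X), dtype=object)
    remaining = np.arange(len(X))
    for i, clf in enumerate(fitted_classifiers):
        proba = clf.predict_proba(X[remaining])
        conf = proba.max(axis=1)
        last = i == len(fitted_classifiers) - 1
        decide = np.ones(len(remaining), dtype=bool) if last else conf >= threshold
        chosen = clf.classes_[proba.argmax(axis=1)]
        labels[remaining[decide]] = chosen[decide]
        remaining = remaining[~decide]
        if remaining.size == 0:
            break
    return labels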

Error-Correcting Output Codes (Dietterich and Bakiri, 1995) K classes; L binary problems. A code matrix W codes the classes in terms of the learners. One per class: L = K. Pairwise: L = K(K − 1)/2.

Full code: L = 2^(K−1) − 1. With a reasonable L, find W such that the Hamming distance between the rows (and between the columns) is maximized.
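
A sketch of ECOC with the simplest (one-per-class) code matrix and decoding by Hamming distance; the function names are illustrative.

import numpy as np

def one_per_class_code(K):
    # Row k is +1 for learner k and -1 elsewhere, so L = K.
    return 2 * np.eye(K, dtype=int) - 1

def ecoc_decode(W, learner_outputs):
    # W: (K, L) code matrix with entries in {-1, +1}.
    # learner_outputs: length-L vector of the binary learners' predictions.
    hamming = (W != np.asarray(learner_outputs)).sum(axis=1)
    return int(hamming.argmin())            # class whose code row is closest

# Example: with K = 4 and W = one_per_class_code(4), if the learners output
# [-1, +1, -1, -1], ecoc_decode(W, [-1, +1, -1, -1]) returns class 1.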