Ensemble Learning – Bagging, Boosting, and Stacking, and other topics


[Diagram: base learners Algo 1, Algo 2, Algo 3, …, Algo N combined by a meta-learning algorithm]

No Free Lunch theorem: there is no single algorithm that is the most accurate on every problem. Instead, generate a group of base learners that, when combined, attain higher accuracy than any one of them alone. (Lecture Notes for E. Alpaydın, Introduction to Machine Learning, © 2004 The MIT Press, V1.1)

[Diagram: combining learners by training different algorithms on different datasets]

Bagging

Bagging extends easily to regression: average the base learners' outputs instead of taking a majority vote. Bagging is most effective when the base learner is unstable, i.e., when small changes in the training sample cause large changes in the learned model (decision trees are the classic example). It typically increases accuracy, but the interpretability of a single model is lost.
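A minimal sketch of bagging for classification, assuming NumPy and scikit-learn are available; the function names and the choice of decision trees as the base learner are illustrative, not prescribed by the slides:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    # Train one tree per bootstrap sample (n rows drawn with replacement).
    rng = np.random.RandomState(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Majority vote over the base learners (assumes integer class labels).
    # For regression, replace the vote with preds.mean(axis=0).
    preds = np.array([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in preds.T])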

“Breiman's work bridged the gap between statisticians and computer scientists, particularly in the field of machine learning. Perhaps his most important contributions were his work on classification and regression trees and ensembles of trees fit to bootstrap samples. Bootstrap aggregation was given the name bagging by Breiman. Another of Breiman's ensemble approaches is the random forest.” (extracted from Wikipedia)

Boosting

Boosting combines weak learners into a strong learner. Initially all examples carry the same weight; in each subsequent iteration, the examples that were wrongly classified have their weights increased, so that the next learner concentrates on them. Boosting can be applied to any base learner.

Boosting (AdaBoost)
Initialize all weights w_i to 1/N (N: number of examples).
Repeat (until error > 0.5 or the maximum number of iterations is reached):
  Train a classifier on the weighted data and get hypothesis h_t(x).
  Compute the error as the sum of the weights of the misclassified examples: error_t = Σ w_i over all i with h_t(x_i) ≠ y_i.
  Set α_t = ½ log((1 − error_t) / error_t).
  Update the weights: w_i ← w_i e^(−α_t y_i h_t(x_i)) / Z_t, where Z_t normalizes the weights to sum to 1.
Output f(x) = sign(Σ_t α_t h_t(x)).
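A compact AdaBoost sketch following the steps above, with depth-1 decision trees (stumps) as the weak learner; it assumes labels in {−1, +1}, and the names adaboost_fit/adaboost_predict are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # y must be in {-1, +1}.
    n = len(X)
    w = np.full(n, 1.0 / n)                  # w_i = 1/N
    hyps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()             # sum of weights of misclassified examples
        if err >= 0.5:                       # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w = w * np.exp(-alpha * y * pred)    # raise weights of mistakes, lower the rest
        w /= w.sum()                         # the normalizer Z_t
        hyps.append(h)
        alphas.append(alpha)
    return hyps, alphas

def adaboost_predict(hyps, alphas, X):
    # f(x) = sign( sum_t alpha_t * h_t(x) )
    return np.sign(sum(a * h.predict(X) for h, a in zip(hyps, alphas)))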

[Diagram: boosting increases the weights of misclassified examples for the next round]


Mixture of Experts

Voting where the weights are input-dependent (gating) (Jacobs et al., 1991). Both the experts and the gating model can be nonlinear.
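A minimal forward-pass sketch of input-dependent voting: a softmax gating function assigns each input its own expert weights. The gating parameters V are illustrative and untrained here; a real mixture of experts learns them jointly with the experts:

import numpy as np

def softmax(z):
    z = z - z.max()                          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mixture_predict(x, experts, V):
    # x: input vector of shape (d,); experts: list of callables x -> scalar prediction;
    # V: gating weights of shape (n_experts, d).
    g = softmax(V @ x)                       # input-dependent weights, summing to 1
    preds = np.array([f(x) for f in experts])
    return g @ preds                         # weighted vote; the weights depend on x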

[Slide: photographs of Robert Jacobs, University of Rochester, and Michael Jordan, UC Berkeley]

Stacking

Here the variation is among the learning algorithms. The predictions of the base learners form a new meta-dataset, on which a meta-learner is trained. A test example is first transformed into a meta-example (the vector of base-learner predictions) and then classified by the meta-learner. Several variations on stacking have been proposed.
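A stacking sketch that builds the meta-dataset from out-of-fold predictions, so the meta-learner never sees predictions made on data the base learner was trained on; scikit-learn is assumed, and the particular base and meta learners are illustrative choices:

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

def stacking_fit(X, y):
    base = [DecisionTreeClassifier(max_depth=3), GaussianNB()]
    # Meta-dataset: one column per base learner, holding its out-of-fold predictions.
    Z = np.column_stack([cross_val_predict(b, X, y, cv=5) for b in base])
    meta = LogisticRegression().fit(Z, y)
    base = [b.fit(X, y) for b in base]       # refit the base learners on all the data
    return base, meta

def stacking_predict(base, meta, X):
    # Transform each test example into a meta-example, then classify it.
    Z = np.column_stack([b.predict(X) for b in base])
    return meta.predict(Z)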

Cascade Generalization

Again the variation is among the learning algorithms, but the classifiers are used in sequence rather than in parallel as in stacking. The prediction of the first classifier is appended to the example's feature vector to form an extended dataset, on which the next classifier is trained; the process can continue through many iterations.
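A two-level cascade generalization sketch: the first classifier's predicted class probabilities are appended to the features before the second classifier is trained. The choice of Naive Bayes followed by a decision tree is an illustrative assumption:

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def cascade_gen_fit(X, y):
    c1 = GaussianNB().fit(X, y)
    # Extend the dataset with the first classifier's class probabilities.
    X_ext = np.hstack([X, c1.predict_proba(X)])
    c2 = DecisionTreeClassifier().fit(X_ext, y)
    return c1, c2

def cascade_gen_predict(c1, c2, X):
    X_ext = np.hstack([X, c1.predict_proba(X)])
    return c2.predict(X_ext)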

Cascading

As in boosting, the distribution changes across the stages' datasets; unlike boosting, the classifiers themselves vary, typically ordered from simple to complex. Whether an example is passed on depends on the current classifier's prediction confidence. Cascading thus creates rules that account for most instances early and catches the exceptions at the final step.
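A two-stage cascading sketch: a cheap classifier answers whenever its confidence clears a threshold, and only the remaining exceptions are sent to a more expensive second stage. The 0.9 threshold and the two learners are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def cascading_fit(X, y, threshold=0.9):
    s1 = LogisticRegression(max_iter=1000).fit(X, y)
    conf = s1.predict_proba(X).max(axis=1)
    hard = conf < threshold                  # instances stage 1 is unsure about
    # Sketch assumption: the low-confidence subset is nonempty and contains >1 class.
    s2 = RandomForestClassifier().fit(X[hard], y[hard])
    return s1, s2

def cascading_predict(s1, s2, X, threshold=0.9):
    proba = s1.predict_proba(X)
    pred = s1.classes_[proba.argmax(axis=1)]
    unsure = proba.max(axis=1) < threshold   # defer the exceptions to stage 2
    if unsure.any():
        pred[unsure] = s2.predict(X[unsure])
    return pred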

Error-Correcting Output Codes

K classes, L binary problems (Dietterich and Bakiri, 1995). A code matrix W codes the classes in terms of the learners:
One per class: L = K
Pairwise: L = K(K − 1)/2

Full code: L = 2^(K−1) − 1. With a reasonable L, find W such that the Hamming distance between the rows and between the columns is maximized.
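A small sketch of the pairwise code matrix and a Hamming-style decoding, assuming the base learners output ±1 and that a 0 entry means the learner was not trained on that class; the function names are illustrative:

import numpy as np
from itertools import combinations

def pairwise_code_matrix(K):
    # One column per pairwise problem: +1 for class i, -1 for class j,
    # 0 for the classes that learner never sees. L = K(K-1)/2 columns.
    pairs = list(combinations(range(K), 2))
    W = np.zeros((K, len(pairs)), dtype=int)
    for l, (i, j) in enumerate(pairs):
        W[i, l], W[j, l] = 1, -1
    return W

def ecoc_decode(W, outputs):
    # outputs: the L learners' +/-1 predictions for one example.
    # Pick the class whose code word (row of W) agrees most with the outputs;
    # zero entries contribute nothing, so they are ignored, as in Hamming decoding.
    return int((W * outputs).sum(axis=1).argmax())

W = pairwise_code_matrix(4)                  # K = 4 classes -> L = 6 learners
print(W)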