COMP61011 : Machine Learning Ensemble Models

The Usual Supervised Learning Approach: data + labels → Learning Algorithm → Model; at test time, the Model takes the testing data and outputs a predicted label.

Ensemble Methods: data + labels → Learning Algorithm → a "committee" of models (Model 1, Model 2, …, Model m); at test time, the testing data goes to every model and the committee outputs the predicted label.

Ensemble Methods: like a "committee" of models (Model 1, Model 2, …, Model m). We don't want all the models to be identical, but we also don't want them to be different just for the sake of it, sacrificing individual performance. This is the "diversity" trade-off. It is the ensemble equivalent of underfitting/overfitting: we need just the right level.

Parallel Ensemble Methods: from the data + labels, build Dataset 1, Dataset 2, …, Dataset m; train Model 1, Model 2, …, Model m, one per dataset; the ensemble then predicts by voting.

Generating a new dataset by “Bootstrapping” - sample N items with replacement from the original N
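A minimal sketch of bootstrapping with NumPy (the array names X, y and the function below are illustrative, not from the slides):

    import numpy as np

    def bootstrap(X, y, rng):
        """Sample N items with replacement from the original N."""
        N = len(X)
        idx = rng.integers(0, N, size=N)   # indices drawn with replacement, duplicates allowed
        return X[idx], y[idx]

    # usage: rng = np.random.default_rng(0); X_boot, y_boot = bootstrap(X, y, rng)

On average a bootstrap sample contains roughly 63% of the distinct original examples, with the rest of its N slots filled by duplicates.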

"Bagging" : Bootstrap AGGregatING. From the data + labels, draw a bootstrap (sample N examples, with replacement, from the original N) to form Dataset 1, and train Model 1 on it.

"Bagging" : Bootstrap AGGregatING. Repeat to get Bootstrap 1, Bootstrap 2, …, Bootstrap m and train Model 1, Model 2, …, Model m, one per bootstrap; the bagged ensemble predicts by voting.

The “Bagging” algorithm
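The algorithm itself is not reproduced in this transcript. The following is a hedged sketch of the bagging procedure, assuming scikit-learn decision trees as the (unstable) base model; the function names and defaults are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, m=50, seed=0):
        """Train m models, each on its own bootstrap sample of (X, y)."""
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(m):
            idx = rng.integers(0, len(X), size=len(X))       # bootstrap: N draws with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Ensemble vote: most common prediction per test point (assumes integer labels 0..K-1)."""
        votes = np.stack([mdl.predict(X) for mdl in models])  # shape (m, n_test)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)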

Model Stability. Some models are almost completely unaffected by bootstrapping; these are stable models, and are not so suitable for ensemble methods. Roughly from highly stable to highly unstable: SVM, KNN, Naïve Bayes, Logistic Regression / Perceptrons, Neural Networks, Decision Trees.

Use bagging on a set of decision trees. How can we make them even more diverse?

Random Forests: bagged decision trees in which each split considers only a random subset of the features, making the trees more diverse.
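As a sketch of how this looks in practice, using scikit-learn's off-the-shelf implementation (the toy dataset and parameter values are illustrative assumptions, not from the lecture):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)   # toy data
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)               # 100 bagged trees, each split limited to a random feature subset
    print(forest.predict(X[:5]))   # committee vote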

Real-time human pose recognition in parts from single depth images, Shotton et al., Microsoft Research, Computer Vision and Pattern Recognition (CVPR) 2011. The basis of the Kinect controller. Features are simple image properties. Test phase: 200 frames per second on a GPU; the train phase is more complex, but still parallel.

Sequential Ensemble Methods: from the data + labels, train Model 1; then build Dataset 2 and train Model 2, build Dataset 3 and train Model 3, build Dataset 4 and train Model 4, and so on. Each model corrects the mistakes of its predecessor.

“Boosting”

Half the distribution (~effort) goes on the incorrect examples.
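This is a property of the standard AdaBoost weight update (the algorithm appears later in this deck): with weighted error ε_t and voting strength α_t = ½ ln((1−ε_t)/ε_t), the renormalised distribution puts exactly half its mass on the examples the current model got wrong. A sketch of the calculation:

\[
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t},
\qquad
\sum_{i:\, h_t(x_i) \neq y_i} D_{t+1}(i)
= \frac{\epsilon_t\, e^{\alpha_t}}{\epsilon_t\, e^{\alpha_t} + (1-\epsilon_t)\, e^{-\alpha_t}}
= \frac{\sqrt{\epsilon_t(1-\epsilon_t)}}{2\sqrt{\epsilon_t(1-\epsilon_t)}}
= \frac{1}{2}.
\]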

The "weak model" requirement: each model should be slightly better than random guessing on its own dataset, i.e. it is a "weak model".

Can your classifier naturally take advantage of a weighted distribution? If yes, boost by "reweighting" (e.g. using Naive Bayes); if no, boost by "resampling" (e.g. using linear discriminants).
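For the resampling branch, a minimal sketch with NumPy (D is the current distribution over the N training examples; the names are illustrative):

    import numpy as np

    def resample_by_weight(X, y, D, rng):
        """Boosting by resampling: draw a new N-example dataset according to the distribution D."""
        idx = rng.choice(len(X), size=len(X), replace=True, p=D)
        return X[idx], y[idx]

For the reweighting branch, D is instead passed straight to the learner (e.g. as a sample_weight argument), as in the AdaBoost sketch further below.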

The Boosting Family of Algorithms. Very rich history in computational learning theory; the existence of such an algorithm was predicted before it was invented! The most well known is "AdaBoost" (ADAptive BOOSTing). It is naturally a two-class method, with multiclass extensions possible, and there are many, many further extensions, e.g. to regression, ranking, etc.

Boosting – an informal description
Input: a training dataset T, and a probability distribution D over the examples in T.
For j = 1 to M:
  1. Sample a new dataset from T according to D.
  2. Build a new model h(j) using this sampled set.
  3. Calculate the voting strength for h(j) according to its errors on the sample.
  4. Update D according to where h(j) makes mistakes on the original T.
End
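A hedged sketch of the best-known instance of this loop, two-class AdaBoost, assuming labels in {-1, +1} and scikit-learn decision stumps as the weak models; it takes the "reweighting" route via sample_weight rather than resampling, and all names and defaults are illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, M=20):
        """y is a NumPy array with values in {-1, +1}; returns weak models and voting strengths."""
        N = len(X)
        D = np.full(N, 1.0 / N)                          # start with a uniform distribution
        models, alphas = [], []
        for _ in range(M):
            h = DecisionTreeClassifier(max_depth=1)      # a weak model (decision stump)
            h.fit(X, y, sample_weight=D)                 # boosting by reweighting
            pred = h.predict(X)
            eps = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)   # weighted error
            if eps >= 0.5:                               # no better than random guessing: stop
                break
            alpha = 0.5 * np.log((1 - eps) / eps)        # voting strength
            D = D * np.exp(-alpha * y * pred)            # up-weight mistakes, down-weight correct
            D = D / D.sum()                              # renormalise to a probability distribution
            models.append(h)
            alphas.append(alpha)
        return models, alphas

    def adaboost_predict(models, alphas, X):
        """Weighted committee vote: sign of the alpha-weighted sum of the weak predictions."""
        return np.sign(sum(a * h.predict(X) for h, a in zip(models, alphas)))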

Boosting

Ensemble Methods: Summary. Treat the models as a committee. Ensembles are responsible for major advances in industrial ML: Random Forests power the Kinect, and Boosting powers face recognition in phone cameras. They tend to work extremely well "off-the-shelf", with no parameter tuning.