Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University

Outline Introduction – Bias and variance problems The Netflix Prize – Success of ensemble methods in the Netflix Prize Why Ensemble Methods Work Algorithms – AdaBoost – BrownBoost – Random forests

1-Slide Intro to Supervised Learning We want to approximate a function f. Given examples (x_i, y_i), find a hypothesis h among a fixed subclass of functions for which the error E(h) is minimal. Under squared loss the error decomposes into three parts: E(h) = E[(y - h(x))^2] = σ^2 + (E[h(x)] - f(x))^2 + E[(h(x) - E[h(x)])^2], that is, noise (independent of h), squared bias (the distance of h from f), and the variance of the predictions.

Bias and Variance Bias Problem – The hypothesis space made available by a particular classification method does not include sufficient hypotheses Variance Problem – The hypothesis space made available is too large for the training data, and the selected hypothesis may not be accurate on unseen data

Decision Trees Small trees have high bias. Large trees have high variance. Why? (from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods)
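One quick way to see this claim, as a sketch assuming scikit-learn (the dataset and depth choices are arbitrary illustrations, not from the lecture): a depth-1 tree scores poorly even on its own training data (bias), while an unrestricted tree fits the training data perfectly but drops off on held-out folds (variance).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
for depth in (1, None):  # a small stump vs. a fully grown tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)          # accuracy on the training set
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()  # accuracy on held-out folds
    print(f"max_depth={depth}: train={train_acc:.2f}, cross-val={cv_acc:.2f}")
```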

Definition Ensemble classification is the aggregation of the predictions of multiple classifiers with the goal of improving accuracy.

Teaser: How good are ensemble methods? Let’s look at the Netflix Prize Competition…

Supervised learning task – Training data is a set of users and the ratings (1, 2, 3, 4, or 5 stars) those users have given to movies. – Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars $1 million prize for a 10% improvement over Netflix’s current movie recommender/classifier (RMSE = 0.9514) Began October 2006
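The competition’s yardstick is RMSE (root mean squared error); a minimal sketch of the metric, assuming NumPy:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error, the Netflix Prize's evaluation metric."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

print(rmse([4, 3, 5], [5, 3, 4]))  # ≈ 0.816
```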

Just three weeks after it began, at least 40 teams had bested the Netflix classifier. Top teams showed about 5% improvement.

However, improvement slowed…

Today, the top team has posted an 8.5% improvement. Ensemble methods are the best performers…

“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies

“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post-processed with kernel ridge regression” Arek Paterek
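A toy sketch of this linear-blending idea, assuming NumPy (the data is synthetic and the names illustrative, not Paterek’s actual code): fit least-squares weights over several methods’ held-out predictions, then combine.

```python
import numpy as np

rng = np.random.default_rng(0)
true_ratings = rng.integers(1, 6, size=1000).astype(float)

# hypothetical held-out predictions from three base methods with different noise levels
preds = np.column_stack(
    [true_ratings + rng.normal(0, s, size=1000) for s in (0.9, 1.0, 1.1)])

# least-squares weights for combining the base methods
weights, *_ = np.linalg.lstsq(preds, true_ratings, rcond=None)
blend = preds @ weights

for i in range(preds.shape[1]):
    print("method", i, np.sqrt(np.mean((preds[:, i] - true_ratings) ** 2)))
print("blend   ", np.sqrt(np.mean((blend - true_ratings) ** 2)))  # lower than any single method
```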

“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto

Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf

“Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name… When Gravity and Dinosaurs Unite

And, yes, the top team, which is from AT&T… “Our final solution (RMSE = 0.8712) consists of blending 107 individual results.” BellKor / KorBell

Some Intuitions on Why Ensemble Methods Work…

Intuitions Utility of combining diverse, independent opinions in human decision-making – Protective mechanism (e.g. stock portfolio diversity) Violation of Ockham’s Razor – Identifying the best model requires identifying the proper “model complexity” See Domingos, P. Occam’s two razors: the sharp and the blunt. KDD 1998.

Intuitions Majority vote Suppose we have 5 completely independent classifiers… – If each has 70% accuracy, the majority vote is correct with probability 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) ≈ 83.7% – With 101 such classifiers, majority vote accuracy reaches 99.9%
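As a quick check, a few lines of Python (standard library only) reproduce these numbers by summing the binomial probabilities of a correct majority:

```python
from math import comb

def majority_vote_accuracy(k, p):
    """Probability that a majority of k independent classifiers,
    each correct with probability p, votes for the right label."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

print(majority_vote_accuracy(5, 0.7))    # 0.83692 -> ~83.7%
print(majority_vote_accuracy(101, 0.7))  # > 0.999
```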

Strategies Boosting – Make examples currently misclassified more important (or less, in some cases) Bagging – Use different samples or attributes of the examples to generate diverse classifiers

Boosting Types: AdaBoost, BrownBoost Make examples currently misclassified more important (or less, if there is lots of noise). Then combine the hypotheses given…

AdaBoost Algorithm 1. Initialize the weights uniformly: w_i = 1/m. 2. Construct a classifier h_t; compute its weighted error ε_t. 3. Update the weights, increasing those of misclassified examples, and repeat step 2. … 4. Finally, sum the hypotheses, each weighted by its accuracy: H(x) = sign(Σ_t α_t h_t(x)), with α_t = ½ ln((1 − ε_t)/ε_t).
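A minimal sketch of these four steps, assuming NumPy, with decision stumps as the weak learner (names like fit_stump are illustrative, not from the lecture; labels are assumed to be +1/−1):

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump over all features and thresholds."""
    m, d = X.shape
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, sign, weighted error)
    for j in range(d):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=20):
    m = len(y)
    w = np.full(m, 1.0 / m)                  # 1. initialize weights
    hypotheses = []
    for _ in range(rounds):
        j, t, s, err = fit_stump(X, y, w)    # 2. construct a classifier, compute error
        err = max(err, 1e-10)                # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)       # 3. up-weight misclassified examples
        w /= w.sum()
        hypotheses.append((alpha, j, t, s))
    return hypotheses

def predict(hypotheses, X):
    """4. Sign of the accuracy-weighted sum of the weak hypotheses."""
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1)
                for a, j, t, s in hypotheses)
    return np.sign(score)
```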

Classifications (colors) and weights (sizes) after 1, 3, and 20 iterations of AdaBoost (from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods)

AdaBoost Advantages – Very little code – Reduces variance Disadvantages – Sensitive to noise and outliers. Why?

BrownBoost Reduces the weight given to examples that are repeatedly misclassified, treating them as likely noise. Good (only) for very noisy data.

Bagging (Constructing for Diversity) 1. Use random samples of the examples to construct the classifiers 2. Use random attribute sets to construct the classifiers (Random Decision Forests, Leo Breiman)
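A minimal sketch of strategy 1 (bootstrap samples of the examples), assuming NumPy and scikit-learn; the helper names are illustrative, and integer class labels are assumed for the vote:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, base=DecisionTreeClassifier(), n_estimators=50, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        # random sample of the examples, drawn with replacement
        idx = rng.integers(0, len(y), size=len(y))
        models.append(clone(base).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # plurality vote over the diverse classifiers
    preds = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```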

Random forests At every node, choose a random subset of the attributes (not examples) and choose the best split among those attributes Breiman’s claim: does not overfit as more trees are added

Random forests Let the number of training cases be M, and the number of variables in the classifier be N. For each tree, 1. Choose a training set by sampling M times with replacement from all M available training cases. 2. For each node, randomly choose n of the N variables on which to base the decision at that node. Calculate the best split based on these.
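In practice this two-step recipe is what scikit-learn’s RandomForestClassifier implements; a usage sketch assuming scikit-learn, with a synthetic dataset standing in for real data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,     # trees, each grown on its own bootstrap sample (step 1)
    max_features="sqrt",  # random subset of variables considered at each node (step 2)
    random_state=0,
)
print(cross_val_score(forest, X, y, cv=5).mean())
```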

Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1), 5-32

Questions / Comments?

Sources
David Mease. Statistical Aspects of Data Mining. Lecture videos: http://video.google.com/videoplay?docid= &q=stats+202+engEDU&total=13&start=0&num=10&so=0&type=search&plindex=8
Dietterich, T. G. Ensemble Learning. In The Handbook of Brain Theory and Neural Networks, Second edition (M. A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002.
Elder, John and Seni, Giovanni. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. KDD 2007. videolectures.net/kdd07_elder_ftfr/
Netflix Prize.
Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.