Introduction to Boosting
Slides adapted from Che Wanxiang (车万翔) at HIT and Robin Dhamankar. Many thanks!

Ideas
- Boosting is considered one of the most significant developments in machine learning.
- Finding many weak rules of thumb is easier than finding a single, highly accurate prediction rule.
- The key is in how the weak rules are combined.

Boosting (Algorithm)
W(x) is the distribution of weights over the N training points, with ∑ W(x_i) = 1.
Initially assign uniform weights W_0(x) = 1/N for all x, and set step k = 0.
At each iteration k:
- Find the best weak classifier C_k(x) using weights W_k(x), with error rate ε_k.
- Based on a loss function, set the weight α_k, the classifier C_k's weight in the final hypothesis.
- For each x_i, update the weights based on ε_k to get W_{k+1}(x_i).
C_FINAL(x) = sign[ ∑ α_i C_i(x) ]

Boosting (Algorithm)

Boosting As Additive Model
The final prediction in boosting, f(x), can be expressed as an additive expansion of the individual classifiers. The process is iterative and can be expressed as follows; typically we try to minimize a loss function on the training examples.
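The slide's equations are not reproduced in the transcript; a standard statement of the additive expansion and the forward stage-wise step (as in Hastie, Tibshirani and Friedman, Ch. 10) is

f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m), \qquad f_m(x) = f_{m-1}(x) + \beta_m \, b(x; \gamma_m),

where each (\beta_m, \gamma_m) is chosen to minimize \sum_{i=1}^{N} L\big(y_i, \, f_{m-1}(x_i) + \beta \, b(x_i; \gamma)\big) with f_{m-1} held fixed.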

Boosting As Additive Model
Simple case: squared-error loss. Forward stage-wise modeling then amounts to just fitting the residuals from the previous iteration. However, squared-error loss is not robust for classification.
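A one-line check of the residual-fitting claim (notation follows the additive expansion above): with L(y, f) = (y - f)^2,

L\big(y_i, f_{m-1}(x_i) + \beta \, b(x_i; \gamma)\big) = \big(y_i - f_{m-1}(x_i) - \beta \, b(x_i; \gamma)\big)^2 = \big(r_{im} - \beta \, b(x_i; \gamma)\big)^2,

where r_{im} = y_i - f_{m-1}(x_i) is the residual of the current model on the i-th example, so the new basis function is simply fit to the residuals.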

Boosting As Additive Model
AdaBoost for classification uses the exponential loss function: L(y, f(x)) = exp(−y · f(x)).
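Plugging the exponential loss into the forward stage-wise step gives the objective the next slides minimize (a standard derivation, e.g., ESL Ch. 10):

(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\big[-y_i \big(f_{m-1}(x_i) + \beta \, G(x_i)\big)\big] = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\big(-\beta \, y_i \, G(x_i)\big),

with per-example weights w_i^{(m)} = \exp(-y_i f_{m-1}(x_i)) that depend only on the previous iterations.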

Boosting As Additive Model
First assume that β is constant, and minimize with respect to G:
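The solution (the slide's equation is not in the transcript; this is the standard result) is the classifier that minimizes the weighted misclassification error:

G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} \, I\big(y_i \ne G(x_i)\big).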

Boosting As Additive Model
Here err_m is the training error on the weighted samples. The last equation tells us that at each iteration we must find the classifier that minimizes the training error on the weighted samples.
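For reference (the formula itself is not in the transcript), the weighted training error is

\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i^{(m)} \, I\big(y_i \ne G(x_i)\big)}{\sum_{i=1}^{N} w_i^{(m)}}.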

Boosting As Additive Model
Now that we have found G, we minimize with respect to β:
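The minimizer (again the standard result from the same derivation) and the resulting updates are

\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}, \qquad f_m(x) = f_{m-1}(x) + \beta_m G_m(x), \qquad w_i^{(m+1)} = w_i^{(m)} \exp\big(-\beta_m \, y_i \, G_m(x_i)\big),

which, up to a constant factor on the weights, is exactly the AdaBoost update on the next slide with \alpha_k = 2\beta_m.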

AdaBoost (Algorithm)
W(x) is the distribution of weights over the N training points, with ∑ W(x_i) = 1.
Initially assign uniform weights W_0(x) = 1/N for all x.
At each iteration k:
- Find the best weak classifier C_k(x) using weights W_k(x).
- Compute the error rate ε_k as ε_k = [ ∑ W(x_i) · I(y_i ≠ C_k(x_i)) ] / [ ∑ W(x_i) ].
- Set the classifier's weight in the final hypothesis: α_k = log((1 − ε_k)/ε_k).
- For each x_i, update W_{k+1}(x_i) = W_k(x_i) · exp[α_k · I(y_i ≠ C_k(x_i))].
C_FINAL(x) = sign[ ∑ α_i C_i(x) ]
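A minimal runnable sketch of this procedure, assuming scikit-learn decision stumps (DecisionTreeClassifier with max_depth=1) as the weak classifiers C_k and labels y in {−1, +1}; X and y are placeholder names for the training data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)               # W_0(x_i) = 1/N
    classifiers, alphas = [], []
    for k in range(n_rounds):
        # Find the best weak classifier C_k using the current weights W_k
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        miss = (pred != y).astype(float)   # I(y_i != C_k(x_i))
        # eps_k = sum_i W_k(x_i) * I(y_i != C_k(x_i)) / sum_i W_k(x_i)
        eps = np.dot(w, miss) / w.sum()
        if eps == 0 or eps >= 0.5:         # perfect, or no better than chance: stop
            break
        # alpha_k = log((1 - eps_k) / eps_k), the slide's convention (twice beta_m)
        alpha = np.log((1 - eps) / eps)
        # W_{k+1}(x_i) = W_k(x_i) * exp(alpha_k * I(y_i != C_k(x_i))), then renormalize
        w = w * np.exp(alpha * miss)
        w /= w.sum()
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # C_FINAL(x) = sign(sum_k alpha_k * C_k(x))
    agg = sum(a * c.predict(X) for a, c in zip(alphas, classifiers))
    return np.sign(agg)
```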

AdaBoost (Example)
Original training set: equal weights for all training samples.
Taken from "A Tutorial on Boosting" by Yoav Freund and Rob Schapire.

AdaBoost(Example) ROUND 1

AdaBoost(Example) ROUND 2

AdaBoost(Example) ROUND 3

AdaBoost(Example)

AdaBoost (Characteristics)
Why the exponential loss function?
- Computational: simple modular re-weighting; the derivative is easy, so determining the optimal parameters is relatively easy.
- Statistical: in the two-label case its minimizer is one half the log-odds of P(Y=1|x), so we can use the sign as the classification rule.
Accuracy depends on the number of iterations (how sensitive it is, we will see soon).
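The "one half the log-odds" statement is the population minimizer of the exponential loss; the standard derivation gives

f^*(x) = \arg\min_{f(x)} \; E_{Y \mid x}\big[e^{-Y f(x)}\big] = \frac{1}{2} \log \frac{P(Y = 1 \mid x)}{P(Y = -1 \mid x)},

so sign(f^*(x)) reproduces the Bayes classification rule.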

Boosting performance
Decision stumps are very simple rules of thumb that test a condition on a single attribute. In this experiment, decision stumps formed the individual classifiers whose predictions were combined to generate the final prediction, and the misclassification rate of the boosting algorithm was plotted against the number of iterations performed.
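A rough sketch of this kind of experiment, assuming scikit-learn; make_hastie_10_2 is a synthetic ten-feature problem used for a similar stumps-vs-iterations plot in Hastie et al. and stands in for whatever data the original slide used. Note that the stump parameter is named estimator in scikit-learn >= 1.2 and base_estimator in older releases.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_hastie_10_2(n_samples=4000, random_state=0)
X_train, y_train, X_test, y_test = X[:2000], y[:2000], X[2000:], y[2000:]

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump as the weak learner
    n_estimators=200,
)
model.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators rounds,
# so the misclassification rate can be plotted against the iteration count.
test_error = [(pred != y_test).mean() for pred in model.staged_predict(X_test)]
print(test_error[0], test_error[-1])
```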

Boosting performance Steep decrease in error

Boosting performance
Pondering how many iterations would be sufficient. Observations:
- The first few (about 50) iterations increase the accuracy substantially, as seen in the steep decrease in the misclassification rate.
- As the iterations increase further, does the training error keep decreasing? Does the generalization error keep decreasing?

Can Boosting do well if...?
- There is limited training data? Probably not.
- There are many missing values?
- There is noise in the data?
- The individual classifiers are not very accurate? It could, if the individual classifiers have considerable mutual disagreement.

Application: Data Mining
Challenges in real-world data mining problems:
- Data has a large number of observations and a large number of variables per observation.
- Inputs are a mixture of various different kinds of variables.
- There are missing values, outliers, and variables with skewed distributions.
- Results must be obtained fast, and they should be interpretable.
So off-the-shelf techniques are difficult to come up with. Boosted decision trees (AdaBoost or MART) come close to an off-the-shelf technique for data mining; see the sketch below.
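As an illustration (not from the slides), a minimal sketch of using a MART-style boosted-tree learner as a near off-the-shelf tool, assuming scikit-learn; HistGradientBoostingClassifier is chosen because it natively accepts missing values (NaN) in numeric features, and the breast-cancer dataset is just a stand-in for a real data-mining problem.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in tabular dataset; a real problem would have mixed types, outliers, NaNs.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees with largely default settings: close to "off the shelf".
gbm = HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1)
gbm.fit(X_train, y_train)
print("held-out accuracy:", gbm.score(X_test, y_test))
```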

AT&T “May I help you?”