A Brief Introduction to Adaboost

A Brief Introduction to Adaboost. Hongbo Deng, 6 Feb 2007. Some of the slides are borrowed from Derek Hoiem & Jan Šochman.

Outline: Background; the Adaboost algorithm; theory and interpretations.

What’s So Good About Adaboost? It can be used in conjunction with many different classifiers and learning algorithms to improve their performance, it improves classification accuracy, it is commonly used in many areas, it is simple to implement, and it is not especially prone to overfitting in practice.

A Brief History. Resampling for estimating a statistic: bootstrapping. Resampling for classifier design: bagging, boosting (Schapire 1989), and Adaboost (Freund and Schapire 1995).

Bootstrap Estimation. Used to estimate a statistic (parameter) and its variance from a training set D: repeatedly draw n samples with replacement from D, estimate the statistic on each resampled set, and take the bootstrap estimate to be the mean of the individual estimates. A minimal sketch follows below.
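As a concrete illustration, here is a minimal Python sketch of bootstrap estimation; the function and variable names (bootstrap_estimate, n_rounds, the example data) are my own and not from the slides.

```python
import numpy as np

def bootstrap_estimate(data, statistic, n_rounds=1000, seed=None):
    """Estimate a statistic and its variance by resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_rounds)
    for b in range(n_rounds):
        # Draw n samples from the data with replacement.
        sample = data[rng.integers(0, n, size=n)]
        estimates[b] = statistic(sample)
    # The bootstrap estimate is the mean of the individual estimates;
    # their spread estimates the statistic's variance.
    return estimates.mean(), estimates.var(ddof=1)

# Example: bootstrap estimate of the median of a small data set.
data = np.array([2.1, 3.5, 4.0, 4.2, 5.9, 6.3, 7.7])
est, var = bootstrap_estimate(data, np.median, n_rounds=2000, seed=0)
print(f"bootstrap median estimate: {est:.2f}, variance: {var:.3f}")
```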

Bagging (Aggregate Bootstrapping). For i = 1 .. M: draw n* < n samples from D with replacement and learn classifier Ci; the final classifier is a vote of C1 .. CM. Bagging increases classifier stability and reduces variance. The previous slide addressed the use of resampling for estimating statistics; now we turn to general resampling methods for training classifiers. Bagging uses multiple versions of a training set, each created by drawing n* samples from D with replacement. Each of these bootstrap data sets is used to train a different “component classifier”, and the final classification decision is based on a vote of the component classifiers (see the sketch below).
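A minimal bagging sketch, assuming X and y are NumPy arrays with labels in {-1, +1} and using decision trees as the component classifiers; the helper names are hypothetical, not taken from the presentation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_classifiers=25, sample_frac=0.8, seed=None):
    """Train component classifiers on bootstrap samples of size n* < n."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = int(sample_frac * n)
    classifiers = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n_star)          # draw with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Final decision: majority vote of the component classifiers."""
    votes = np.sum([c.predict(X) for c in classifiers], axis=0)
    return np.where(votes >= 0, 1, -1)                 # ties go to +1
```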

Boosting (Schapire 1989). The goal of boosting is to improve the accuracy of any given learning algorithm. Consider creating three component classifiers for a two-category problem through boosting. Randomly select n1 < n samples from D without replacement to obtain D1 and train weak learner C1. Next, build a second training set D2 of n2 < n samples from D such that half of them are misclassified by C1 and half are classified correctly, and train weak learner C2. Finally, select all remaining samples from D on which C1 and C2 disagree to obtain D3 and train weak learner C3. The final classifier is a vote of the three weak learners (a rough sketch is given below).
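A rough sketch of this three-classifier scheme, assuming X and y are NumPy arrays with labels in {-1, +1} and decision stumps as weak learners; the helper names and the 50/50 construction of D2 are my own simplifications, and edge cases (for example an empty D3) are not handled.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, n1, seed=None):
    rng = np.random.default_rng(seed)
    n = len(X)
    # D1: n1 < n samples drawn without replacement; train C1.
    i1 = rng.choice(n, size=n1, replace=False)
    c1 = DecisionTreeClassifier(max_depth=1).fit(X[i1], y[i1])
    # D2: half misclassified by C1, half classified correctly; train C2.
    rest = np.setdiff1d(np.arange(n), i1)
    wrong = rest[c1.predict(X[rest]) != y[rest]]
    right = rest[c1.predict(X[rest]) == y[rest]]
    k = min(len(wrong), len(right))
    i2 = np.concatenate([wrong[:k], right[:k]])
    c2 = DecisionTreeClassifier(max_depth=1).fit(X[i2], y[i2])
    # D3: remaining samples on which C1 and C2 disagree; train C3.
    i3 = np.setdiff1d(rest, i2)
    i3 = i3[c1.predict(X[i3]) != c2.predict(X[i3])]
    c3 = DecisionTreeClassifier(max_depth=1).fit(X[i3], y[i3])
    return c1, c2, c3

def predict_three(classifiers, X):
    """Final classifier: majority vote of C1, C2, C3."""
    votes = sum(c.predict(X) for c in classifiers)
    return np.where(votes >= 0, 1, -1)
```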

Adaboost (Adaptive Boosting). Instead of resampling, Adaboost re-weights the training set: each training sample carries a weight that determines its probability of being selected for training a component classifier. AdaBoost constructs a “strong” classifier as a linear combination of simple “weak” classifiers, and the final classification is based on a weighted vote of the weak classifiers.

Adaboost Terminology. h_t(x) denotes a “weak” or basis classifier (classifier = learner = hypothesis); H(x) denotes the “strong” or final classifier. A weak classifier has less than 50% error over any distribution; the strong classifier is a thresholded linear combination of the weak classifier outputs.
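The slide's formula for the strong classifier is not reproduced in the transcript; the standard form, consistent with the description above, is:

```latex
H(x) \;=\; \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right),
\qquad h_t(x) \in \{-1, +1\},\quad \alpha_t \ge 0,
```

i.e. a thresholded (signed) linear combination of the weak classifier outputs.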

Discrete Adaboost Algorithm. Each training sample has a weight D(i), which determines its probability of being selected for training an individual component classifier; epsilon is the sum of the weights of the misclassified samples. Specifically, we initialize the weights across the training set to be uniform. On each iteration t, we find a classifier h_t(x) that minimizes the error with respect to the current distribution, then increase the weights of the training examples misclassified by h_t(x) and decrease the weights of the examples it classifies correctly. The new distribution is used to train the next classifier, and the process is iterated (a runnable sketch of the whole loop follows below).
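The algorithm box itself appears only as an image in the slides. Below is a minimal, self-contained sketch of Discrete AdaBoost using depth-1 decision trees (stumps) as weak learners, assuming X and y are NumPy arrays with labels in {-1, +1}; the function names and defaults are my own choices, not code from the presentation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Discrete AdaBoost for labels y in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                      # initialize weights uniformly
    learners, alphas = [], []
    for t in range(n_rounds):
        # Weak learner trained to minimize error w.r.t. the distribution D.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D[pred != y])               # sum of weights of misclassified samples
        eps = min(max(eps, 1e-10), 1 - 1e-10)    # guard against log(0) / division by zero
        if eps >= 0.5:                           # weak learner must beat chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        # Increase weights of misclassified examples, decrease the rest, renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, np.array(alphas)

def adaboost_predict(learners, alphas, X):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.where(scores >= 0, 1, -1)
```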

Find the Weak Classifier. On each round, the candidate weak classifiers are evaluated, and the error is determined with respect to the current distribution D (see the formula below).
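The corresponding equations are image-only in the original; in standard notation, the weak classifier chosen at round t minimizes the error weighted by the current distribution:

```latex
h_t \;=\; \arg\min_{h \in \mathcal{H}} \;\epsilon(h),
\qquad
\epsilon(h) \;=\; \sum_{i=1}^{n} D_t(i)\,\big[\,y_i \neq h(x_i)\,\big].
```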

The algorithm core
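The core computation shown on this slide is not captured by the transcript; in the standard Discrete AdaBoost formulation, each round assigns the chosen weak classifier a coefficient computed from its weighted error:

```latex
\alpha_t \;=\; \frac{1}{2}\,\ln\!\frac{1-\epsilon_t}{\epsilon_t},
```

so α_t > 0 whenever ε_t < 1/2, and it grows as the weak classifier becomes more accurate.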

Reweighting. The weight update depends on the sign of y · h(x): y · h(x) = 1 for examples the weak classifier got right (their weights are decreased) and y · h(x) = -1 for examples it misclassified (their weights are increased).
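The update formula itself appears only as an image; the standard AdaBoost reweighting rule, which matches this description, is:

```latex
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},
```

where Z_t is the normalization constant that makes D_{t+1} a distribution. Since α_t > 0, the weight of example i shrinks when y_i h_t(x_i) = 1 (correct) and grows when y_i h_t(x_i) = -1 (misclassified).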

Reweighting. In this way, AdaBoost “focuses on” the informative or “difficult” examples.

Algorithm recapitulation

Pros and cons of AdaBoost. Advantages: very simple to implement; performs feature selection, resulting in a relatively simple classifier; fairly good generalization. Disadvantages: the greedy procedure can yield a suboptimal solution; it is sensitive to noisy data and outliers, so be wary when using noisy or unlabeled data.

References
Duda, Hart, and Stork – Pattern Classification
Freund – “An adaptive version of the boost by majority algorithm”
Freund and Schapire – “Experiments with a new boosting algorithm”
Freund and Schapire – “A decision-theoretic generalization of on-line learning and an application to boosting”
Friedman, Hastie, and Tibshirani – “Additive Logistic Regression: A Statistical View of Boosting”
Jin, Liu, et al. (CMU) – “A New Boosting Algorithm Using Input-Dependent Regularizer”
Li, Zhang, et al. – “FloatBoost Learning for Classification”
Opitz and Maclin – “Popular Ensemble Methods: An Empirical Study”
Ratsch and Warmuth – “Efficient Margin Maximization with Boosting”
Schapire, Freund, et al. – “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods”
Schapire and Singer – “Improved Boosting Algorithms Using Confidence-rated Predictions”
Schapire – “The Boosting Approach to Machine Learning: An Overview”
Zhang, Li, et al. – “Multi-view Face Detection with FloatBoost”

Appendix: bound on training error; Adaboost variants.

Bound on Training Error (Schapire)
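The derivation on this slide is image-only in the transcript; the standard result states that the training error of the final classifier is bounded by the product of the per-round normalization factors:

```latex
\frac{1}{n}\sum_{i=1}^{n}\big[\,H(x_i) \neq y_i\,\big]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
\qquad \gamma_t = \tfrac{1}{2} - \epsilon_t,
```

so the training error drops exponentially fast as long as every weak classifier beats chance by some margin γ_t.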

Discrete Adaboost (DiscreteAB) (Friedman’s wording)

Discrete Adaboost (DiscreteAB) (Freund and Schapire’s wording)

Adaboost with Confidence-Weighted Predictions (RealAB). The larger the absolute value of the classifier's output, the more confident the classifier is in its prediction (see the standard form below).
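The update formulas are not captured in the transcript; in the standard confidence-rated (Real AdaBoost) formulation, the weak learner at round t outputs a real-valued prediction built from a weighted class-probability estimate (my reconstruction of the usual form):

```latex
f_t(x) \;=\; \frac{1}{2}\,\ln\frac{p_t(x)}{1-p_t(x)},
\qquad p_t(x) \;=\; P_{D_t}\!\left(y = +1 \mid x\right),
```

and the final classifier is H(x) = sign(Σ_t f_t(x)); |f_t(x)| is large exactly when the weak learner is confident.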

Adaboost Variants Proposed by Friedman. LogitBoost solves the additive logistic regression problem directly (via Newton-like steps) and requires care to avoid numerical problems. GentleBoost uses the update f_m(x) = P(y=1 | x) - P(y=0 | x) instead of a log-ratio, so the update stays bounded because the probabilities lie in [0, 1].

Adaboost Variants Proposed By Friedman LogitBoost

Adaboost Variants Proposed By Friedman GentleBoost

Thanks!!! Any comments or questions?