1
A Brief Introduction to Adaboost
Hongbo Deng, 6 Feb 2007. Some of the slides are borrowed from Derek Hoiem & Jan Šochman.
2
Outline
Background
Adaboost Algorithm
Theory/Interpretations
3
What’s So Good About Adaboost?
Can be used with many different classifiers
Improves classification accuracy
Commonly used in many areas
Simple to implement
Not prone to overfitting
Can be used in conjunction with many other learning algorithms to improve their performance
4
A Brief History
Resampling for estimating a statistic: Bootstrapping
Resampling for classifier design: Bagging, Boosting (Schapire 1989), AdaBoost (Freund & Schapire 1995)
5
Bootstrap Estimation
Used to estimate a statistic (parameter) and its variance:
Repeatedly draw n samples (with replacement) from the training set D
For each set of samples, estimate the statistic
The bootstrap estimate is the mean of the individual estimates
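A minimal sketch of this procedure, assuming NumPy and using the sample mean as the statistic (both are illustrative choices, not part of the slide):

```python
import numpy as np

def bootstrap_estimate(D, statistic=np.mean, B=1000, seed=None):
    """Bootstrap estimate of a statistic and its variance.

    D         : 1-D array of training data
    statistic : function mapping a sample to a scalar (e.g. np.mean)
    B         : number of bootstrap resamples
    """
    rng = np.random.default_rng(seed)
    n = len(D)
    # Repeatedly draw n samples from D with replacement and
    # evaluate the statistic on each resample.
    estimates = np.array([statistic(rng.choice(D, size=n, replace=True))
                          for _ in range(B)])
    # The bootstrap estimate is the mean of the individual estimates;
    # their spread estimates the variance of the statistic.
    return estimates.mean(), estimates.var(ddof=1)

# Example: estimate the mean of a small training set and its variance.
D = np.array([2.1, 3.5, 0.7, 4.2, 1.9, 3.3])
est, var = bootstrap_estimate(D, np.mean, B=2000, seed=0)
```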
6
Bagging - Bootstrap Aggregating
For i = 1..M: draw n* < n samples from D with replacement and learn classifier Ci
The final classifier is a vote of C1..CM
Increases classifier stability / reduces variance
The previous slide addressed the use of resampling for estimating statistics; now we turn to general resampling methods for training classifiers. Bagging uses multiple versions of the training set, each created by drawing n* samples from D with replacement. Each of these bootstrap data sets is used to train a different “component classifier”, and the final classification decision is based on the vote of the component classifiers.
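A short sketch of the bagging loop above; the choice of scikit-learn decision trees as component classifiers and the default values of M and n* are assumptions for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, seed=None):
    """Train M component classifiers, each on a bootstrap replicate of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = n_star if n_star is not None else n    # size of each bootstrap sample
    classifiers = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n_star)       # draw n* samples with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Final classifier: majority vote of C1..CM (assumes integer class labels)."""
    votes = np.stack([clf.predict(X) for clf in classifiers])   # shape (M, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```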
7
Boosting (Schapire 1989)
The goal of boosting is to improve the accuracy of any given learning algorithm. Consider creating three component classifiers for a two-category problem through boosting:
Randomly select n1 < n samples from D without replacement to obtain D1; train weak learner C1
Select n2 < n samples from D, with half of the samples misclassified by C1, to obtain D2; train weak learner C2
Select all remaining samples from D on which C1 and C2 disagree to obtain D3; train weak learner C3
The final classifier is a vote of the three weak learners
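A rough sketch of this three-classifier construction, assuming labels in {0, 1} and scikit-learn decision stumps as weak learners; the subset-selection details are simplified and each subset is assumed non-empty:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, n1, seed=None):
    """Schapire-style boosting with three component classifiers (y in {0, 1})."""
    rng = np.random.default_rng(seed)
    n = len(X)
    stump = lambda: DecisionTreeClassifier(max_depth=1)

    # D1: n1 < n samples drawn without replacement; train C1.
    idx1 = rng.choice(n, size=n1, replace=False)
    C1 = stump().fit(X[idx1], y[idx1])

    # D2: from the remaining data, an equal number of samples that C1
    # classifies correctly and incorrectly; train C2.
    rest = np.setdiff1d(np.arange(n), idx1)
    pred1 = C1.predict(X[rest])
    wrong, right = rest[pred1 != y[rest]], rest[pred1 == y[rest]]
    k = min(len(wrong), len(right))
    idx2 = np.concatenate([wrong[:k], right[:k]])
    C2 = stump().fit(X[idx2], y[idx2])

    # D3: remaining samples on which C1 and C2 disagree; train C3.
    rest3 = np.setdiff1d(rest, idx2)
    idx3 = rest3[C1.predict(X[rest3]) != C2.predict(X[rest3])]
    C3 = stump().fit(X[idx3], y[idx3])

    def predict(Xq):
        # Final classifier: majority vote of C1, C2, C3.
        votes = C1.predict(Xq) + C2.predict(Xq) + C3.predict(Xq)
        return (votes >= 2).astype(int)
    return predict
```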
8
Adaboost - Adaptive Boosting
Instead of resampling, AdaBoost uses training set re-weighting: each training sample has a weight that determines its probability of being selected for training a component classifier
AdaBoost constructs a “strong” classifier as a linear combination of “simple” “weak” classifiers
The final classification is based on a weighted vote of the weak classifiers
9
Adaboost Terminology
h_t(x) … “weak” or basis classifier (Classifier = Learner = Hypothesis)
H(x) … “strong” or final classifier
Weak classifier: < 50% error over any distribution
Strong classifier: thresholded linear combination of the weak classifier outputs
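The formula on the original slide is not preserved; in the usual notation, the strong classifier is the sign of a weighted (thresholded) linear combination of the weak classifiers:

```latex
H(x) \;=\; \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big),
\qquad h_t(x) \in \{-1, +1\}
```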
10
Discrete Adaboost Algorithm
Each training sample has a weight D(i), which determines the probability of being selected for training a component classifier (ε, the weighted error, is the sum of the weights of the misclassified samples)
Specifically, we initialize the weights across the training set to be uniform
On each iteration t, we find a classifier h_t(x) that minimizes the error with respect to the current distribution
Next we increase the weights of training examples misclassified by h_t(x), and decrease the weights of the examples it classifies correctly
The new distribution is used to train the next classifier, and the process is iterated
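A compact sketch of this loop, assuming labels in {-1, +1} and scikit-learn decision stumps trained with sample weights (a common stand-in for resampling according to D) as the weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Discrete AdaBoost; y must take values in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                 # initialize weights uniformly
    learners, alphas = [], []
    for t in range(T):
        # Find a weak classifier h_t minimizing error w.r.t. the distribution D.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()            # sum of weights of misclassified samples
        if eps >= 0.5:                      # weak learner no better than chance: stop
            break
        eps = max(eps, 1e-12)               # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)
        # Increase weights of misclassified examples, decrease the rest, renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.where(score >= 0, 1, -1)
```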
11
Find the Weak Classifier
The error is determined with respect to the current distribution D_t
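The formula on the original slide is not preserved; in the standard formulation, the weak classifier at round t minimizes the weighted error under the current distribution D_t:

```latex
h_t \;=\; \arg\min_{h \in \mathcal{H}} \; \epsilon_t(h),
\qquad
\epsilon_t(h) \;=\; \sum_{i=1}^{n} D_t(i)\, \mathbf{1}\big[h(x_i) \neq y_i\big],
\qquad \epsilon_t < \tfrac{1}{2}
```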
13
The algorithm core
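The equations on this slide are not preserved; the core of the algorithm in the standard formulation is the classifier weight α_t and the multiplicative re-weighting with normalizer Z_t:

```latex
\alpha_t \;=\; \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},
\qquad
Z_t \;=\; \sum_{i=1}^{n} D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)
```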
14
Reweighting
The update distinguishes correctly classified samples (y·h(x) = 1) from misclassified ones (y·h(x) = -1).
15
Reweighting
In this way, AdaBoost “focuses on” the informative or “difficult” examples.
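Since α_t > 0 whenever ε_t < 1/2, the update multiplies each weight by one of two factors (the figure accompanying these slides is not preserved; this is the standard form):

```latex
D_{t+1}(i) \;\propto\;
\begin{cases}
D_t(i)\, e^{-\alpha_t}, & y_i\, h_t(x_i) = +1 \quad \text{(correctly classified: weight decreases)}\\[2pt]
D_t(i)\, e^{+\alpha_t}, & y_i\, h_t(x_i) = -1 \quad \text{(misclassified: weight increases)}
\end{cases}
```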
17
Algorithm recapitulation (slides 17-24 stepped through the full AdaBoost algorithm graphically, one iteration at a time; the figures are not preserved)
25
Pros and cons of AdaBoost
Advantages:
Very simple to implement
Does feature selection, resulting in a relatively simple classifier
Fairly good generalization
Disadvantages:
Suboptimal solution
Sensitive to noisy data and outliers; be wary when using noisy/unlabeled data
26
References
Duda, Hart, Stork – Pattern Classification
Freund – “An adaptive version of the boost by majority algorithm”
Freund, Schapire – “Experiments with a new boosting algorithm”
Freund, Schapire – “A decision-theoretic generalization of on-line learning and an application to boosting”
Friedman, Hastie, Tibshirani – “Additive Logistic Regression: A Statistical View of Boosting”
Jin, Liu, et al. (CMU) – “A New Boosting Algorithm Using Input-Dependent Regularizer”
Li, Zhang, et al. – “FloatBoost Learning for Classification”
Opitz, Maclin – “Popular Ensemble Methods: An Empirical Study”
Rätsch, Warmuth – “Efficient Margin Maximization with Boosting”
Schapire, Freund, et al. – “Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods”
Schapire, Singer – “Improved Boosting Algorithms Using Confidence-rated Predictions”
Schapire – “The Boosting Approach to Machine Learning: An Overview”
Zhang, Li, et al. – “Multi-view Face Detection with FloatBoost”
27
Appendix
Bound on training error
Adaboost Variants
28
Bound on Training Error (Schapire)
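The derivation on the original slide is not preserved; the standard statement of the bound is that the training error of the strong classifier is at most the product of the normalizers Z_t, which shrinks exponentially as long as each weak classifier beats chance by a margin γ_t:

```latex
\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\big[H(x_i) \neq y_i\big]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
\qquad \gamma_t = \tfrac{1}{2} - \epsilon_t
```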
29
Discrete Adaboost (DiscreteAB) (Friedman’s wording)
30
Discrete Adaboost (DiscreteAB) (Freund and Schapire’s wording)
31
Adaboost with Confidence Weighted Predictions (RealAB)
The larger the absolute value of the classifier's output, the more confident the classifier is.
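The formulas for this variant are not preserved in the transcript; in the usual confidence-rated (Real AdaBoost) formulation, each round fits a class-probability estimate under the current weights and outputs half its log-odds, so the magnitude of the output carries the confidence:

```latex
f_t(x) \;=\; \tfrac{1}{2}\,\ln\frac{P_w(y=+1 \mid x)}{P_w(y=-1 \mid x)},
\qquad
H(x) \;=\; \operatorname{sign}\!\Big(\sum_{t} f_t(x)\Big)
```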
32
Adaboost Variants Proposed By Friedman
LogitBoost: fits an additive logistic regression model via Newton steps (the objective shown on the original slide is not preserved); requires care to avoid numerical problems
GentleBoost: the update is f_m(x) = P(y=1 | x) - P(y=0 | x) instead of a log-ratio, so it is bounded (in [-1, 1])
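For reference (the equations on the original slide are not preserved), LogitBoost's Newton step from Friedman, Hastie & Tibshirani fits f_m(x) to working responses z_i by weighted least squares, which is where the numerical care is needed: z_i blows up as p(x_i) approaches 0 or 1.

```latex
z_i \;=\; \frac{y_i^{*} - p(x_i)}{p(x_i)\,\big(1 - p(x_i)\big)},
\qquad
w_i \;=\; p(x_i)\,\big(1 - p(x_i)\big),
\qquad
y_i^{*} \in \{0, 1\},
\quad
p(x) \;=\; \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}}
```

After the weighted fit, the model is updated as F(x) ← F(x) + ½ f_m(x).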
33
Adaboost Variants Proposed By Friedman
LogitBoost
34
Adaboost Variants Proposed By Friedman
GentleBoost
35
Thanks!!! Any comments or questions?