A Brief Introduction to Adaboost

1 A Brief Introduction to Adaboost
Hongbo Deng, 6 Feb 2007. Some of the slides are borrowed from Derek Hoiem & Jan Šochman.

2 Outline: Background, Adaboost Algorithm, Theory/Interpretations

3 What’s So Good About Adaboost
Can be used with many different base classifiers, and can be combined with other learning algorithms to improve their performance. Improves classification accuracy. Commonly used in many areas. Simple to implement. Relatively resistant to overfitting in practice.

4 A Brief History
Resampling for estimating a statistic: Bootstrapping. Resampling for classifier design: Bagging, Boosting (Schapire 1989), AdaBoost (Freund & Schapire 1995).

5 Bootstrap Estimation Repeatedly draw n samples from D with replacement
For each set of samples, estimate the statistic. The bootstrap estimate is the mean of the individual estimates. Used to estimate a statistic (parameter) and its variance; D denotes the training set.
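A minimal sketch of this procedure in Python (the function name, parameters, and use of NumPy are illustrative, not from the slides):

```python
import numpy as np

def bootstrap_estimate(data, statistic=np.mean, n_resamples=1000, seed=None):
    """Bootstrap estimate of a statistic and of its variance.

    Repeatedly draws len(data) samples with replacement from `data`,
    computes the statistic on each resample, and returns the mean and
    the variance of the individual estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(n_resamples)
    ])
    return estimates.mean(), estimates.var(ddof=1)

# Example: bootstrap estimate of the median of a small sample.
x = np.array([2.1, 3.4, 1.7, 4.0, 2.8, 3.1, 2.5])
est, var = bootstrap_estimate(x, statistic=np.median, n_resamples=2000, seed=0)
```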

6 Bagging - Aggregate Bootstrapping
For i = 1 .. M: draw n* < n samples from D with replacement and learn classifier Ci. The final classifier is a vote of C1 .. CM; this increases classifier stability and reduces variance. The previous slide addressed the use of resampling for estimating statistics; now we turn to resampling methods for training classifiers. Bagging uses multiple versions of a training set, each created by drawing n* samples from D with replacement. Each of these bootstrap data sets is used to train a different "component classifier", and the final classification decision is based on the vote of the component classifiers.
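A short sketch of bagging under these assumptions: the slides do not prescribe a base learner, so scikit-learn decision trees stand in for the component classifiers, and labels are assumed to be in {-1, +1} for the vote; the function names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, seed=None):
    """Train M component classifiers, each on a bootstrap replicate of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = n if n_star is None else n_star          # size of each bootstrap set
    models = []
    for _ in range(M):
        idx = rng.choice(n, size=n_star, replace=True)  # draw with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final decision is a majority vote of C1 .. CM (labels assumed in {-1, +1})."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)
```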

7 Boosting (Schapire 1989) The goal of boosting is to improve the accuracy of any given learning algorithm. Consider creating three component classifiers for a two-category problem through boosting. First, randomly select n1 < n samples from D without replacement to obtain D1, and train weak learner C1 on it. Next, build a second training set D2 such that half of its samples are correctly classified by C1 and half are misclassified, and train weak learner C2 on it. Finally, collect the remaining samples from D on which C1 and C2 disagree into D3, and train weak learner C3 on it. The final classifier is a vote of the three weak learners.
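The slide gives no pseudocode, so the following is one possible realization of the three-stage construction, assuming decision stumps from scikit-learn as the weak learner, labels in {-1, +1}, and data rich enough that D2 and D3 are non-empty; the function name is illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, n1, seed=None):
    """Three-component boosting in the spirit of the slide; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # D1: n1 < n samples drawn from D without replacement; train C1.
    idx1 = rng.choice(n, size=n1, replace=False)
    c1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])

    # D2: half correctly classified by C1, half misclassified; train C2.
    rest = np.setdiff1d(np.arange(n), idx1)
    pred = c1.predict(X[rest])
    right, wrong = rest[pred == y[rest]], rest[pred != y[rest]]
    k = min(len(right), len(wrong))                   # assumed to be > 0
    idx2 = np.concatenate([rng.choice(right, size=k, replace=False),
                           rng.choice(wrong, size=k, replace=False)])
    c2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])

    # D3: remaining samples on which C1 and C2 disagree; train C3.
    idx3 = np.where(c1.predict(X) != c2.predict(X))[0]  # assumed non-empty
    c3 = DecisionTreeClassifier(max_depth=1).fit(X[idx3], y[idx3])

    def predict(X_new):
        # Final classifier is a vote of the three weak learners; with three
        # +/-1 votes the sum is never zero, so sign() gives a definite label.
        return np.sign(c1.predict(X_new) + c2.predict(X_new) + c3.predict(X_new))
    return predict
```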

8 Adaboost - Adaptive Boosting
Instead of resampling, AdaBoost uses training-set re-weighting: each training sample carries a weight that determines its probability of being selected for training a component classifier. AdaBoost constructs a "strong" classifier as a linear combination of "simple" "weak" classifiers; the final classification is based on a weighted vote of the weak classifiers.

9 Adaboost Terminology ht(x) … "weak" or basis classifier (Classifier = Learner = Hypothesis); H(x) … "strong" or final classifier. Weak classifier: less than 50% error over any distribution. Strong classifier: thresholded linear combination of the weak classifier outputs.
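The formula behind "thresholded linear combination" is not preserved in this transcript; the standard form, with weak-classifier weights α_t chosen by the algorithm, is:

```latex
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right),
\qquad h_t(x) \in \{-1, +1\}.
```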

10 Discrete Adaboost Algorithm
Each training sample has a weight D(i), which determines the probability of being selected for training an individual component classifier; epsilon is the sum of the weights of the misclassified samples. Specifically, we initialize the weights across the training set to be uniform. On each iteration t, we find a classifier ht(x) that minimizes the weighted error with respect to the current distribution. Next, we increase the weights of the training examples misclassified by ht(x) and decrease the weights of the examples it classifies correctly. The new distribution is used to train the next classifier, and the process is iterated.
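A compact sketch of this loop, under stated assumptions: labels in {-1, +1}, decision stumps from scikit-learn as the weak learner (trained with sample weights rather than resampling), and illustrative function names.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Discrete AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)               # initialize weights uniformly
    stumps, alphas = [], []
    for _ in range(T):
        # Weak learner that (approximately) minimizes the weighted error.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()           # weighted error under current distribution
        if eps >= 0.5:                     # no better than chance: stop early
            break
        eps = max(eps, 1e-10)              # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        # Increase weights of misclassified samples, decrease the rest, renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    agg = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(agg)
```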

11 Find the Weak Classifier
The error is determined with respect to the distribution D
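The slide's formula is an image and is not preserved here; the standard definition of the weighted error of weak classifier h_t under the current distribution D_t is:

```latex
\epsilon_t
= \Pr_{i \sim D_t}\bigl[ h_t(x_i) \neq y_i \bigr]
= \sum_{i=1}^{n} D_t(i) \, \mathbf{1}\bigl[ h_t(x_i) \neq y_i \bigr].
```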

12 Find the Weak Classifier

13 The algorithm core
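The core step shown on this image-only slide is, in the standard formulation, the choice of the classifier weight from the weighted error; α_t is positive whenever ε_t < 1/2 and grows as the weak classifier improves:

```latex
\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}.
```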

14 Reweighting: y * h(x) = 1 (correctly classified) vs. y * h(x) = -1 (misclassified)
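The standard weight update behind these two cases (the slide's formula is not preserved) multiplies a sample's weight by e^{-α_t} when it is correctly classified and by e^{α_t} when it is misclassified, then renormalizes:

```latex
D_{t+1}(i) = \frac{D_t(i) \, \exp\bigl(-\alpha_t \, y_i \, h_t(x_i)\bigr)}{Z_t},
\qquad
Z_t = \sum_{j=1}^{n} D_t(j) \, \exp\bigl(-\alpha_t \, y_j \, h_t(x_j)\bigr).
```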

15 Reweighting In this way, AdaBoost "focuses on" the informative or "difficult" examples.

17 Algorithm recapitulation

25 Pros and cons of AdaBoost
Advantages: Very simple to implement. Does feature selection, resulting in a relatively simple classifier. Fairly good generalization. Disadvantages: Suboptimal solution (greedy optimization). Sensitive to noisy data and outliers; be wary if using noisy/unlabeled data.

26 References
Duda, Hart, and Stork – Pattern Classification
Freund – "An adaptive version of the boost by majority algorithm"
Freund, Schapire – "Experiments with a new boosting algorithm"
Freund, Schapire – "A decision-theoretic generalization of on-line learning and an application to boosting"
Friedman, Hastie, Tibshirani – "Additive Logistic Regression: A Statistical View of Boosting"
Jin, Liu, et al. (CMU) – "A New Boosting Algorithm Using Input-Dependent Regularizer"
Li, Zhang, et al. – "FloatBoost Learning for Classification"
Opitz, Maclin – "Popular Ensemble Methods: An Empirical Study"
Rätsch, Warmuth – "Efficient Margin Maximization with Boosting"
Schapire, Freund, et al. – "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods"
Schapire, Singer – "Improved Boosting Algorithms Using Confidence-rated Predictions"
Schapire – "The Boosting Approach to Machine Learning: An Overview"
Zhang, Li, et al. – "Multi-view Face Detection with FloatBoost"

27 Appendix Bound on training error Adaboost Variants

28 Bound on Training Error (Schapire)
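The derivation on this slide is not preserved in the transcript; the standard statement of the bound, with edge γ_t = 1/2 − ε_t, is:

```latex
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\bigl[ H(x_i) \neq y_i \bigr]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
\;\le\; \exp\!\Bigl( -2 \sum_{t=1}^{T} \gamma_t^{2} \Bigr).
```

So the training error drops exponentially fast as long as each weak classifier is even slightly better than random guessing.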

29 Discrete Adaboost (DiscreteAB) (Friedman’s wording)

30 Discrete Adaboost (DiscreteAB) (Freund and Schapire’s wording)

31 Adaboost with Confidence Weighted Predictions (RealAB)
The larger the absolute value of the classifier's output, the more confident the classifier is in its prediction.
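In the confidence-rated formulation the weak classifier outputs a real value rather than a hard ±1 label; a standard choice (as in Friedman et al. and Schapire & Singer, the slide's own formula not being preserved) is the half log-odds under the current weights w:

```latex
h_t(x) = \frac{1}{2} \ln \frac{P_w(y = +1 \mid x)}{P_w(y = -1 \mid x)}.
```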

32 Adaboost Variants Proposed By Friedman
LogitBoost: takes Newton steps on the binomial log-likelihood; requires care to avoid numerical problems. GentleBoost: the update is fm(x) = P(y=1 | x) – P(y=0 | x) instead of the half log-odds update of Real AdaBoost, and is bounded in [-1, 1].

33 Adaboost Variants Proposed By Friedman
LogitBoost

34 Adaboost Variants Proposed By Friedman
GentleBoost

35 Thanks!!! Any comments or questions?

