1
ENSEMBLE LEARNING: ADABOOST
Jianping Fan, Dept of Computer Science, UNC-Charlotte
2
ENSEMBLE LEARNING
A machine learning paradigm where multiple learners are used to solve the problem.
[Figure: previously, one problem was handled by a single learner; in an ensemble, several learners are trained for the same problem and combined.]
The generalization ability of the ensemble is usually significantly better than that of an individual learner.
Boosting is one of the most important families of ensemble methods.
3
A BRIEF HISTORY
Resampling for estimating a statistic: Bootstrapping
Resampling for classifier design: Bagging, Boosting (Schapire 1989), AdaBoost (Schapire 1995)
4
BOOTSTRAP ESTIMATION
Repeatedly draw n samples from D.
For each set of samples, estimate a statistic.
The bootstrap estimate is the mean of the individual estimates.
Used to estimate a statistic (parameter) and its variance.
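A minimal sketch of this procedure, assuming NumPy is available and using the sample median as the statistic (both illustrative choices, not from the slides):

```python
# Bootstrap estimation sketch: repeatedly resample D, estimate the statistic,
# and report the mean and variance of the individual estimates.
import numpy as np

def bootstrap_estimate(D, statistic=np.median, B=1000, rng=None):
    """Return the bootstrap estimate of `statistic` and its variance."""
    rng = np.random.default_rng(rng)
    n = len(D)
    # Repeatedly draw n samples from D (with replacement) and estimate the statistic.
    estimates = np.array([statistic(rng.choice(D, size=n, replace=True))
                          for _ in range(B)])
    # The bootstrap estimate is the mean of the individual estimates;
    # their spread approximates the statistic's variance.
    return estimates.mean(), estimates.var(ddof=1)

D = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=200)
boot_est, boot_var = bootstrap_estimate(D)
print(f"bootstrap estimate of median: {boot_est:.3f}, variance: {boot_var:.4f}")
```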
5
BAGGING (AGGREGATE BOOTSTRAPPING)
For i = 1..M: draw n* < n samples from D with replacement and learn classifier C_i.
The final classifier is a vote of C_1..C_M.
Increases classifier stability / reduces variance.
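A minimal bagging sketch, assuming NumPy and scikit-learn are available and using decision trees as the base classifiers C_i (an illustrative choice, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, rng=None):
    """Train classifiers C_1..C_M, each on n* < n samples drawn with replacement."""
    rng = np.random.default_rng(rng)
    n = len(X)
    n_star = n_star or int(0.8 * n)
    classifiers = []
    for _ in range(M):
        idx = rng.choice(n, size=n_star, replace=True)   # bootstrap sample of D
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """The final classifier is a majority vote of C_1..C_M (labels in {-1, +1})."""
    votes = np.stack([c.predict(X) for c in classifiers])
    return np.where(votes.sum(axis=0) >= 0, 1, -1)
```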
6
BAGGING
[Figure: several random samples drawn with replacement from the training data are each fed to the learner ML, producing f_1, f_2, ..., f_T, which are combined into the final classifier f.]
7
BOOSTING
[Figure: the training sample yields f_1; each subsequent weighted sample yields f_2, ..., f_T via the learner ML; the weak learners are combined into the final classifier f.]
8
REVISIT BAGGING
9
BOOSTING CLASSIFIER
10
BAGGING VS BOOSTING
Bagging: the construction of complementary base-learners is left to chance and to the instability of the learning methods.
Boosting: actively seeks to generate complementary base-learners by training the next base-learner on the mistakes of the previous learners.
11
BOOSTING (SCHAPIRE 1989)
Randomly select n_1 < n samples from D without replacement to obtain D_1; train weak learner C_1.
Select n_2 < n samples from D, with half of the samples misclassified by C_1, to obtain D_2; train weak learner C_2.
Select all samples from D on which C_1 and C_2 disagree; train weak learner C_3.
The final classifier is a vote of the weak learners.
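A rough sketch of this three-classifier procedure, assuming NumPy and scikit-learn, decision stumps as the weak learners, and labels in {-1, +1}; the weak learner and the sample sizes n1, n2 are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_1989(X, y, n1, n2, rng=None):
    rng = np.random.default_rng(rng)
    n = len(X)
    stump = lambda: DecisionTreeClassifier(max_depth=1)

    # Train C1 on n1 < n samples drawn from D without replacement.
    i1 = rng.choice(n, size=n1, replace=False)
    C1 = stump().fit(X[i1], y[i1])

    # Build D2 with roughly half of its samples misclassified by C1, then train C2.
    pred1 = C1.predict(X)
    wrong, right = np.flatnonzero(pred1 != y), np.flatnonzero(pred1 == y)
    if len(wrong) == 0:                      # C1 is already perfect on D
        return lambda Xq: C1.predict(Xq)
    half = min(n2 // 2, len(wrong), len(right))
    i2 = np.concatenate([rng.choice(wrong, size=half, replace=False),
                         rng.choice(right, size=half, replace=False)])
    C2 = stump().fit(X[i2], y[i2])

    # Train C3 on all samples from D on which C1 and C2 disagree.
    i3 = np.flatnonzero(pred1 != C2.predict(X))
    C3 = stump().fit(X[i3], y[i3]) if len(i3) > 0 else C1

    # Final classifier: majority vote of the three weak learners.
    def predict(Xq):
        votes = C1.predict(Xq) + C2.predict(Xq) + C3.predict(Xq)
        return np.where(votes >= 0, 1, -1)
    return predict
```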
12
ADABOOST (SCHAPIRE 1995)
Instead of sampling, re-weight the training examples.
The previous weak learner has only 50% accuracy over the new distribution.
Can be used to learn weak classifiers.
Final classification is based on a weighted vote of the weak classifiers.
13
ADABOOST TERMS
Learner = Hypothesis = Classifier
Weak Learner: < 50% error over any distribution
Strong Classifier: thresholded linear combination of the weak learner outputs
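A tiny illustration of these terms, assuming NumPy: the weak classifier here is a decision stump that thresholds one feature, and the strong classifier is the thresholded (sign of the) weighted sum of weak classifier outputs.

```python
import numpy as np

def weak_predict(X, feature, threshold, polarity=1):
    """Weak classifier: +1 on one side of a threshold on a single feature."""
    return np.where(polarity * X[:, feature] < polarity * threshold, 1, -1)

def strong_predict(X, weak_params, alphas):
    """Strong classifier: sign of the weighted linear combination of weak outputs."""
    score = sum(a * weak_predict(X, *p) for a, p in zip(alphas, weak_params))
    return np.where(score >= 0, 1, -1)
```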
14
AdaBoost = Adaptive + Boosting
A learning algorithm for building a strong classifier out of a lot of weaker ones.
15
ADABOOST CONCEPT
[Figure: a sequence of weak classifiers h_1(x), h_2(x), ..., each only slightly better than random, is combined into a strong classifier H(x) = sign(Σ_t α_t h_t(x)).]
16
THE WEAK CLASSIFIERS
Each weak classifier learns by considering one simple feature.
The T most beneficial features for classification should be selected.
How do we
– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) the training samples?
– associate a weight with each weak classifier?
17
THE STRONG CLASSIFIER
[Figure: weak classifiers, each only slightly better than random, combined into a strong classifier.]
How good will the strong classifier be?
18-19
THE ADABOOST ALGORITHM
Given: (x_1, y_1), ..., (x_m, y_m), where x_i ∈ X and y_i ∈ {-1, +1}.
Initialization: D_1(i) = 1/m for i = 1, ..., m.
For t = 1, ..., T:
– Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t, i.e. h_t = arg min_{h_j} ε_j, where ε_j = Σ_{i=1}^m D_t(i) [y_i ≠ h_j(x_i)].
– Weight the classifier: α_t = ½ ln((1 - ε_t) / ε_t).
– Update the distribution: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor.
Output the final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x)).
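A self-contained sketch of the algorithm above, assuming NumPy, labels in {-1, +1}, and decision stumps (single-feature thresholds) as the weak classifiers; variable names follow the slide's notation (D, eps, alpha):

```python
import numpy as np

def fit_stump(X, y, D):
    """Find the stump (feature, threshold, polarity) minimizing the weighted error wrt D."""
    n, d = X.shape
    best = (None, np.inf)                        # ((feature, threshold, polarity), error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * X[:, j] < pol * thr, 1, -1)
                err = D[pred != y].sum()         # eps = sum_i D_t(i) [h(x_i) != y_i]
                if err < best[1]:
                    best = ((j, thr, pol), err)
    return best

def stump_predict(X, stump):
    j, thr, pol = stump
    return np.where(pol * X[:, j] < pol * thr, 1, -1)

def adaboost_fit(X, y, T=50):
    n = len(X)
    D = np.full(n, 1.0 / n)                      # D_1(i) = 1/m
    stumps, alphas = [], []
    for t in range(T):
        stump, eps = fit_stump(X, y, D)          # h_t minimizes error wrt D_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)     # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        pred = stump_predict(X, stump)
        D = D * np.exp(-alpha * y * pred)        # D_{t+1}(i) ~ D_t(i) exp(-alpha_t y_i h_t(x_i))
        D /= D.sum()                             # normalize by Z_t
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Final classifier: H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(a * stump_predict(X, s) for a, s in zip(alphas, stumps))
    return np.where(score >= 0, 1, -1)
```

Exhaustively searching every (feature, threshold, polarity) stump keeps the sketch short; a practical implementation would sort each feature once and scan candidate thresholds instead.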
20
BOOSTING ILLUSTRATION: Weak Classifier 1
21
BOOSTING ILLUSTRATION: Weights Increased
22
THE ADABOOST ALGORITHM
The distribution is updated as D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t = Σ_i D_t(i) exp(-α_t y_i h_t(x_i)) is a normalization factor and typically α_t = ½ ln((1 - ε_t) / ε_t), so the weights of incorrectly classified examples are increased and the base learner is forced to focus on the hard examples in the training set.
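A small numeric illustration of this update with hypothetical numbers (NumPy assumed, labels in {-1, +1}); note that after the update the previous weak classifier has exactly 50% weighted error on the new distribution, as stated on the earlier AdaBoost slide:

```python
import numpy as np

D_t  = np.array([0.25, 0.25, 0.25, 0.25])      # current distribution D_t(i)
y    = np.array([ 1, -1,  1, -1])              # true labels
h_t  = np.array([ 1, -1, -1, -1])              # weak classifier output (3rd example wrong)
eps  = D_t[h_t != y].sum()                     # weighted error = 0.25
alpha = 0.5 * np.log((1 - eps) / eps)          # ~0.549

D_new = D_t * np.exp(-alpha * y * h_t)         # up-weight the mistake, down-weight the rest
D_new /= D_new.sum()                           # normalize by Z_t
print(D_new)                                   # [0.167 0.167 0.5 0.167]
print(D_new[h_t != y].sum())                   # 0.5: h_t is only 50% accurate under D_{t+1}
```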
23
BOOSTING ILLUSTRATION: Weak Classifier 2
24
BOOSTING ILLUSTRATION: Weights Increased
25
BOOSTING ILLUSTRATION: Weak Classifier 3
26
BOOSTING ILLUSTRATION: The final classifier is a combination of the weak classifiers.
27-28
THE ADABOOST ALGORITHM
(The same algorithm as on slides 18-19.)
What goal does AdaBoost want to reach? The classifier weights α_t and the distribution update are goal dependent.
29-30
GOAL
Minimize the exponential loss Σ_{i=1}^m exp(-y_i f(x_i)), where f(x) = Σ_{t=1}^T α_t h_t(x).
Final classifier: H(x) = sign(f(x)).
Minimizing the exponential loss maximizes the margin y f(x) of the training examples.
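A tiny numeric illustration (NumPy assumed) of why minimizing exp(-y f(x)) pushes the margin y f(x) up: the loss blows up for negative margins and shrinks toward zero for large positive ones.

```python
import numpy as np

margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # y * f(x)
print(np.exp(-margins))                           # [7.389 1.649 1.    0.607 0.135]
```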
31-40
Final classifier: H(x) = sign(f(x)), with f(x) = Σ_{t=1}^T α_t h_t(x).
Minimize the exponential loss E = Σ_{i=1}^m exp(-y_i f(x_i)).
Define f_t(x) = Σ_{s=1}^t α_s h_s(x), with f_0(x) = 0 and f_T(x) = f(x), and define D_t(i) ∝ exp(-y_i f_{t-1}(x_i)), normalized to sum to 1, so that D_1(i) = 1/m.
Then the loss after adding the t-th weak classifier is proportional to
Σ_i D_t(i) exp(-α_t y_i h_t(x_i)) = e^{-α_t} (1 - ε_t) + e^{α_t} ε_t,
where ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i) is the weighted error of h_t.
Setting the derivative with respect to α_t to 0 gives -e^{-α_t} (1 - ε_t) + e^{α_t} ε_t = 0, i.e. α_t = ½ ln((1 - ε_t) / ε_t).
At this α_t the factor equals Z_t = 2 √(ε_t (1 - ε_t)), which is maximized when ε_t = 1/2; any weak classifier better than random (ε_t < 1/2) therefore reduces the loss.
At time 1, D_1(i) = 1/m; at time t, the weights are D_t(i); at time t+1, D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, which is exactly the AdaBoost distribution update.
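A quick numeric sanity check of this derivation (NumPy assumed): for a fixed weighted error eps, the per-round factor e^{-alpha}(1 - eps) + e^{alpha} eps is minimized at the closed-form alpha_t, where it equals 2 sqrt(eps (1 - eps)).

```python
import numpy as np

eps = 0.3
loss = lambda a: np.exp(-a) * (1 - eps) + np.exp(a) * eps
alphas = np.linspace(0.01, 2.0, 2000)
a_num = alphas[np.argmin(loss(alphas))]           # numeric minimizer on a grid
a_closed = 0.5 * np.log((1 - eps) / eps)          # closed-form alpha_t
print(a_num, a_closed)                            # ~0.424 in both cases
print(loss(a_closed), 2 * np.sqrt(eps * (1 - eps)))   # both ~0.917
```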
41
PROS AND CONS OF ADABOOST
Advantages:
– Very simple to implement
– Does feature selection, resulting in a relatively simple classifier
– Fairly good generalization
Disadvantages:
– Suboptimal solution
– Sensitive to noisy data and outliers
42
INTUITION
Train a set of weak hypotheses h_1, ..., h_T.
The combined hypothesis H is a weighted majority vote of the T weak hypotheses; each hypothesis h_t has a weight α_t.
During training, focus on the examples that are misclassified: at round t, example x_i has the weight D_t(i).
43
BASIC SETTING
Binary classification problem.
Training data: (x_1, y_1), ..., (x_m, y_m), with x_i ∈ X and y_i ∈ Y = {-1, +1}.
D_t(i): the weight of x_i at round t; D_1(i) = 1/m.
A learner L that finds a weak hypothesis h_t : X → Y given the training set and D_t.
The error of a weak hypothesis h_t: ε_t = Pr_{i ~ D_t}[h_t(x_i) ≠ y_i] = Σ_{i: h_t(x_i) ≠ y_i} D_t(i).
44
THE BASIC ADABOOST ALGORITHM
For t = 1, ..., T:
– Train the weak learner using the training data and D_t.
– Get h_t : X → {-1, +1} with error ε_t = Pr_{i ~ D_t}[h_t(x_i) ≠ y_i].
– Choose α_t = ½ ln((1 - ε_t) / ε_t).
– Update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t normalizes D_{t+1} to a distribution.
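For comparison, a usage sketch with an off-the-shelf implementation, assuming scikit-learn is installed (its AdaBoostClassifier boosts decision stumps by default); the dataset here is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=50)   # T = 50 boosting rounds
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```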
45
THE GENERAL ADABOOST ALGORITHM
46
PROS AND CONS OF ADABOOST
Advantages:
– Very simple to implement
– Does feature selection, resulting in a relatively simple classifier
– Fairly good generalization
Disadvantages:
– Suboptimal solution
– Sensitive to noisy data and outliers