1
ENSEMBLE LEARNING: ADABOOST
Jianping Fan, Dept of Computer Science, UNC-Charlotte
2
ENSEMBLE LEARNING
A machine learning paradigm where multiple learners are used to solve the problem.
[Figure: previously, one problem was handled by a single learner; in an ensemble, several learners are trained for the same problem and combined.]
The generalization ability of the ensemble is usually significantly better than that of an individual learner.
Boosting is one of the most important families of ensemble methods.
3
A BRIEF HISTORY
Resampling for estimating a statistic: Bootstrapping
Resampling for classifier design: Bagging, Boosting (Schapire 1989), AdaBoost (Schapire 1995)
4
BOOTSTRAP ESTIMATION
Repeatedly draw n samples from D.
For each set of samples, estimate a statistic.
The bootstrap estimate is the mean of the individual estimates.
Used to estimate a statistic (parameter) and its variance.
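A minimal sketch of this procedure, assuming NumPy is available and using the sample median as the statistic (both illustrative choices, not from the slides):

```python
# Bootstrap estimation sketch: repeatedly resample D, estimate the statistic,
# and report the mean and variance of the individual estimates.
import numpy as np

def bootstrap_estimate(D, statistic=np.median, B=1000, rng=None):
    """Return the bootstrap estimate of `statistic` and its variance."""
    rng = np.random.default_rng(rng)
    n = len(D)
    # Repeatedly draw n samples from D (with replacement) and estimate the statistic.
    estimates = np.array([statistic(rng.choice(D, size=n, replace=True))
                          for _ in range(B)])
    # The bootstrap estimate is the mean of the individual estimates;
    # their spread approximates the statistic's variance.
    return estimates.mean(), estimates.var(ddof=1)

D = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=200)
boot_est, boot_var = bootstrap_estimate(D)
print(f"bootstrap estimate of median: {boot_est:.3f}, variance: {boot_var:.4f}")
```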
5
BAGGING (AGGREGATE BOOTSTRAPPING)
For i = 1..M: draw n* < n samples from D with replacement and learn classifier C_i.
The final classifier is a vote of C_1..C_M.
Increases classifier stability / reduces variance.
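A minimal bagging sketch, assuming NumPy and scikit-learn are available and using decision trees as the base classifiers C_i (an illustrative choice, not from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, rng=None):
    """Train classifiers C_1..C_M, each on n* < n samples drawn with replacement."""
    rng = np.random.default_rng(rng)
    n = len(X)
    n_star = n_star or int(0.8 * n)
    classifiers = []
    for _ in range(M):
        idx = rng.choice(n, size=n_star, replace=True)   # bootstrap sample of D
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """The final classifier is a majority vote of C_1..C_M (labels in {-1, +1})."""
    votes = np.stack([c.predict(X) for c in classifiers])
    return np.where(votes.sum(axis=0) >= 0, 1, -1)
```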
6
BAGGING
[Figure: several random samples drawn with replacement from the training data are each fed to the learner ML, producing f_1, f_2, ..., f_T, which are combined into the final classifier f.]
7
BOOSTING
[Figure: the training sample yields f_1; each subsequent weighted sample yields f_2, ..., f_T via the learner ML; the weak learners are combined into the final classifier f.]
8
REVISIT BAGGING
9
BOOSTING CLASSIFIER
10
BAGGING VS BOOSTING
Bagging: the construction of complementary base-learners is left to chance and to the instability of the learning methods.
Boosting: actively seeks to generate complementary base-learners by training the next base-learner on the mistakes of the previous learners.
11
BOOSTING (SCHAPIRE 1989)
Randomly select n_1 < n samples from D without replacement to obtain D_1; train weak learner C_1.
Select n_2 < n samples from D, with half of the samples misclassified by C_1, to obtain D_2; train weak learner C_2.
Select all samples from D on which C_1 and C_2 disagree; train weak learner C_3.
The final classifier is a vote of the weak learners.
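A rough sketch of this three-classifier procedure, assuming NumPy and scikit-learn, decision stumps as the weak learners, and labels in {-1, +1}; the weak learner and the sample sizes n1, n2 are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_1989(X, y, n1, n2, rng=None):
    rng = np.random.default_rng(rng)
    n = len(X)
    stump = lambda: DecisionTreeClassifier(max_depth=1)

    # Train C1 on n1 < n samples drawn from D without replacement.
    i1 = rng.choice(n, size=n1, replace=False)
    C1 = stump().fit(X[i1], y[i1])

    # Build D2 with roughly half of its samples misclassified by C1, then train C2.
    pred1 = C1.predict(X)
    wrong, right = np.flatnonzero(pred1 != y), np.flatnonzero(pred1 == y)
    if len(wrong) == 0:                      # C1 is already perfect on D
        return lambda Xq: C1.predict(Xq)
    half = min(n2 // 2, len(wrong), len(right))
    i2 = np.concatenate([rng.choice(wrong, size=half, replace=False),
                         rng.choice(right, size=half, replace=False)])
    C2 = stump().fit(X[i2], y[i2])

    # Train C3 on all samples from D on which C1 and C2 disagree.
    i3 = np.flatnonzero(pred1 != C2.predict(X))
    C3 = stump().fit(X[i3], y[i3]) if len(i3) > 0 else C1

    # Final classifier: majority vote of the three weak learners.
    def predict(Xq):
        votes = C1.predict(Xq) + C2.predict(Xq) + C3.predict(Xq)
        return np.where(votes >= 0, 1, -1)
    return predict
```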
12
ADABOOST (SCHAPIRE 1995)
Instead of sampling, re-weight the training examples.
The previous weak learner has only 50% accuracy over the new distribution.
Can be used to learn weak classifiers.
Final classification is based on a weighted vote of the weak classifiers.
13
ADABOOST TERMS
Learner = Hypothesis = Classifier
Weak Learner: < 50% error over any distribution
Strong Classifier: thresholded linear combination of the weak learner outputs
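A tiny illustration of these terms, assuming NumPy: the weak classifier here is a decision stump that thresholds one feature, and the strong classifier is the thresholded (sign of the) weighted sum of weak classifier outputs.

```python
import numpy as np

def weak_predict(X, feature, threshold, polarity=1):
    """Weak classifier: +1 on one side of a threshold on a single feature."""
    return np.where(polarity * X[:, feature] < polarity * threshold, 1, -1)

def strong_predict(X, weak_params, alphas):
    """Strong classifier: sign of the weighted linear combination of weak outputs."""
    score = sum(a * weak_predict(X, *p) for a, p in zip(alphas, weak_params))
    return np.where(score >= 0, 1, -1)
```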
14
AdaBoost = Adaptive + Boosting
A learning algorithm for building a strong classifier out of a lot of weaker ones.
15
ADABOOST CONCEPT
[Figure: a sequence of weak classifiers h_1(x), h_2(x), ..., each only slightly better than random, is combined into a strong classifier H(x) = sign(Σ_t α_t h_t(x)).]
16
THE WEAK CLASSIFIERS
Each weak classifier learns by considering one simple feature.
The T most beneficial features for classification should be selected.
How do we
– define features?
– select beneficial features?
– train weak classifiers?
– manage (weight) the training samples?
– associate a weight with each weak classifier?
17
THE STRONG CLASSIFIER
[Figure: weak classifiers, each only slightly better than random, combined into a strong classifier.]
How good will the strong classifier be?
18-19
THE ADABOOST ALGORITHM
Given: (x_1, y_1), ..., (x_m, y_m), where x_i ∈ X and y_i ∈ {-1, +1}.
Initialization: D_1(i) = 1/m for i = 1, ..., m.
For t = 1, ..., T:
– Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t, i.e. h_t = arg min_{h_j} ε_j, where ε_j = Σ_{i=1}^m D_t(i) [y_i ≠ h_j(x_i)].
– Weight the classifier: α_t = ½ ln((1 - ε_t) / ε_t).
– Update the distribution: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor.
Output the final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x)).
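A self-contained sketch of the algorithm above, assuming NumPy, labels in {-1, +1}, and decision stumps (single-feature thresholds) as the weak classifiers; variable names follow the slide's notation (D, eps, alpha):

```python
import numpy as np

def fit_stump(X, y, D):
    """Find the stump (feature, threshold, polarity) minimizing the weighted error wrt D."""
    n, d = X.shape
    best = (None, np.inf)                        # ((feature, threshold, polarity), error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * X[:, j] < pol * thr, 1, -1)
                err = D[pred != y].sum()         # eps = sum_i D_t(i) [h(x_i) != y_i]
                if err < best[1]:
                    best = ((j, thr, pol), err)
    return best

def stump_predict(X, stump):
    j, thr, pol = stump
    return np.where(pol * X[:, j] < pol * thr, 1, -1)

def adaboost_fit(X, y, T=50):
    n = len(X)
    D = np.full(n, 1.0 / n)                      # D_1(i) = 1/m
    stumps, alphas = [], []
    for t in range(T):
        stump, eps = fit_stump(X, y, D)          # h_t minimizes error wrt D_t
        eps = np.clip(eps, 1e-10, 1 - 1e-10)     # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        pred = stump_predict(X, stump)
        D = D * np.exp(-alpha * y * pred)        # D_{t+1}(i) ~ D_t(i) exp(-alpha_t y_i h_t(x_i))
        D /= D.sum()                             # normalize by Z_t
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Final classifier: H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(a * stump_predict(X, s) for a, s in zip(alphas, stumps))
    return np.where(score >= 0, 1, -1)
```

Exhaustively searching every (feature, threshold, polarity) stump keeps the sketch short; a practical implementation would sort each feature once and scan candidate thresholds instead.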
20
BOOSTING ILLUSTRATION: Weak Classifier 1
21
BOOSTING ILLUSTRATION: Weights Increased
22
THE ADABOOST ALGORITHM
The distribution is updated as D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t = Σ_i D_t(i) exp(-α_t y_i h_t(x_i)) is a normalization factor and typically α_t = ½ ln((1 - ε_t) / ε_t), so the weights of incorrectly classified examples are increased and the base learner is forced to focus on the hard examples in the training set.
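A small numeric illustration of this update with hypothetical numbers (NumPy assumed, labels in {-1, +1}); note that after the update the previous weak classifier has exactly 50% weighted error on the new distribution, as stated on the earlier AdaBoost slide:

```python
import numpy as np

D_t  = np.array([0.25, 0.25, 0.25, 0.25])      # current distribution D_t(i)
y    = np.array([ 1, -1,  1, -1])              # true labels
h_t  = np.array([ 1, -1, -1, -1])              # weak classifier output (3rd example wrong)
eps  = D_t[h_t != y].sum()                     # weighted error = 0.25
alpha = 0.5 * np.log((1 - eps) / eps)          # ~0.549

D_new = D_t * np.exp(-alpha * y * h_t)         # up-weight the mistake, down-weight the rest
D_new /= D_new.sum()                           # normalize by Z_t
print(D_new)                                   # [0.167 0.167 0.5 0.167]
print(D_new[h_t != y].sum())                   # 0.5: h_t is only 50% accurate under D_{t+1}
```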
23
BOOSTING ILLUSTRATION: Weak Classifier 2
24
BOOSTING ILLUSTRATION: Weights Increased
25
BOOSTING ILLUSTRATION: Weak Classifier 3
26
BOOSTING ILLUSTRATION: The final classifier is a combination of the weak classifiers.
27-28
THE ADABOOST ALGORITHM
(The same algorithm as on slides 18-19.)
What goal does AdaBoost want to reach? The classifier weights α_t and the distribution update are goal dependent.
29-30
GOAL
Minimize the exponential loss Σ_{i=1}^m exp(-y_i f(x_i)), where f(x) = Σ_{t=1}^T α_t h_t(x).
Final classifier: H(x) = sign(f(x)).
Minimizing the exponential loss maximizes the margin y f(x) of the training examples.
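A tiny numeric illustration (NumPy assumed) of why minimizing exp(-y f(x)) pushes the margin y f(x) up: the loss blows up for negative margins and shrinks toward zero for large positive ones.

```python
import numpy as np

margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # y * f(x)
print(np.exp(-margins))                           # [7.389 1.649 1.    0.607 0.135]
```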
31-40
Final classifier: H(x) = sign(f(x)), with f(x) = Σ_{t=1}^T α_t h_t(x).
Minimize the exponential loss E = Σ_{i=1}^m exp(-y_i f(x_i)).
Define f_t(x) = Σ_{s=1}^t α_s h_s(x), with f_0(x) = 0 and f_T(x) = f(x), and define D_t(i) ∝ exp(-y_i f_{t-1}(x_i)), normalized to sum to 1, so that D_1(i) = 1/m.
Then the loss after adding the t-th weak classifier is proportional to
Σ_i D_t(i) exp(-α_t y_i h_t(x_i)) = e^{-α_t} (1 - ε_t) + e^{α_t} ε_t,
where ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i) is the weighted error of h_t.
Setting the derivative with respect to α_t to 0 gives -e^{-α_t} (1 - ε_t) + e^{α_t} ε_t = 0, i.e. α_t = ½ ln((1 - ε_t) / ε_t).
At this α_t the factor equals Z_t = 2 √(ε_t (1 - ε_t)), which is maximized when ε_t = 1/2; any weak classifier better than random (ε_t < 1/2) therefore reduces the loss.
At time 1, D_1(i) = 1/m; at time t, the weights are D_t(i); at time t+1, D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, which is exactly the AdaBoost distribution update.
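A quick numeric sanity check of this derivation (NumPy assumed): for a fixed weighted error eps, the per-round factor e^{-alpha}(1 - eps) + e^{alpha} eps is minimized at the closed-form alpha_t, where it equals 2 sqrt(eps (1 - eps)).

```python
import numpy as np

eps = 0.3
loss = lambda a: np.exp(-a) * (1 - eps) + np.exp(a) * eps
alphas = np.linspace(0.01, 2.0, 2000)
a_num = alphas[np.argmin(loss(alphas))]           # numeric minimizer on a grid
a_closed = 0.5 * np.log((1 - eps) / eps)          # closed-form alpha_t
print(a_num, a_closed)                            # ~0.424 in both cases
print(loss(a_closed), 2 * np.sqrt(eps * (1 - eps)))   # both ~0.917
```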
41
PROS AND CONS OF ADABOOST
Advantages:
– Very simple to implement
– Does feature selection, resulting in a relatively simple classifier
– Fairly good generalization
Disadvantages:
– Suboptimal solution
– Sensitive to noisy data and outliers
42
INTUITION
Train a set of weak hypotheses h_1, ..., h_T.
The combined hypothesis H is a weighted majority vote of the T weak hypotheses; each hypothesis h_t has a weight α_t.
During training, focus on the examples that are misclassified: at round t, example x_i has the weight D_t(i).
43
BASIC SETTING
Binary classification problem.
Training data: (x_1, y_1), ..., (x_m, y_m), with x_i ∈ X and y_i ∈ Y = {-1, +1}.
D_t(i): the weight of x_i at round t; D_1(i) = 1/m.
A learner L that finds a weak hypothesis h_t : X → Y given the training set and D_t.
The error of a weak hypothesis h_t: ε_t = Pr_{i ~ D_t}[h_t(x_i) ≠ y_i] = Σ_{i: h_t(x_i) ≠ y_i} D_t(i).
44
THE BASIC ADABOOST ALGORITHM
For t = 1, ..., T:
– Train the weak learner using the training data and D_t.
– Get h_t : X → {-1, +1} with error ε_t = Pr_{i ~ D_t}[h_t(x_i) ≠ y_i].
– Choose α_t = ½ ln((1 - ε_t) / ε_t).
– Update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t normalizes D_{t+1} to a distribution.
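For comparison, a usage sketch with an off-the-shelf implementation, assuming scikit-learn is installed (its AdaBoostClassifier boosts decision stumps by default); the dataset here is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=50)   # T = 50 boosting rounds
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```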
45
THE GENERAL ADABOOST ALGORITHM
46
PROS AND CONS OF ADABOOST
Advantages:
– Very simple to implement
– Does feature selection, resulting in a relatively simple classifier
– Fairly good generalization
Disadvantages:
– Suboptimal solution
– Sensitive to noisy data and outliers