
1 Boosting Rong Jin

2 Inefficiency with Bagging
Bagging: bootstrap sampling draws datasets D1, D2, …, Dk from the training set D, and a classifier h1, h2, …, hk is trained on each sample.
Inefficiency with bootstrap sampling:
- Every example has an equal chance to be sampled
- No distinction between "easy" examples and "difficult" examples
Inefficiency with model combination:
- A constant weight for each classifier
- No distinction between accurate classifiers and inaccurate classifiers
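Below is a minimal Python sketch of this bagging procedure, only to make the two inefficiencies concrete; the dataset D (a list of (x, y) pairs with y in {+1, -1}) and the function train_classifier are hypothetical names, not from the slides.

```python
import random

def bagging(D, train_classifier, k):
    """Train k classifiers on bootstrap samples of D and combine them
    by an unweighted majority vote."""
    classifiers = []
    for _ in range(k):
        # Bootstrap sampling: every example is drawn with equal probability,
        # with no distinction between easy and difficult examples.
        D_t = [random.choice(D) for _ in range(len(D))]
        classifiers.append(train_classifier(D_t))

    def H(x):
        # Constant weight for each classifier, accurate or not.
        vote = sum(h(x) for h in classifiers)
        return 1 if vote >= 0 else -1

    return H
```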

3 Improve the Efficiency of Bagging
Better sampling strategy: focus on the examples that are difficult to classify correctly.
Better combination strategy: accurate models should be assigned larger weights.

4 Intuition: Education in China
Classifier 1 is trained on the training examples (x1,y1), (x2,y2), (x3,y3), (x4,y4) and makes mistakes on (x1,y1) and (x3,y3). Classifier 2 concentrates on those mistakes and still misclassifies (x1,y1), so Classifier 3 concentrates on that remaining example. Adding the three classifiers together leaves no training mistakes, but the combination may overfit the training data.

5 AdaBoost Algorithm
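The algorithm box on this slide does not survive in the transcript. The following is a minimal Python sketch of the standard AdaBoost procedure, under the assumption that labels are in {+1, -1} and that train_weak(X, y, D) (a hypothetical name) returns a weak classifier trained with respect to the distribution D; it uses the usual data-dependent choice of alpha discussed on slides 7-9 rather than a constant.

```python
import math

def adaboost(X, y, train_weak, T):
    """Standard AdaBoost for labels y[i] in {+1, -1}."""
    n = len(X)
    D = [1.0 / n] * n                      # D_0: uniform distribution
    ensemble = []                          # list of (alpha_t, h_t)

    for t in range(T):
        h = train_weak(X, y, D)
        # Weighted training error of h_t under the current distribution D_t
        eps = sum(D[i] for i in range(n) if h(X[i]) != y[i])
        if eps == 0 or eps >= 0.5:         # perfect, or no better than chance
            break
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Increase the weights of misclassified examples, decrease the rest
        D = [D[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        Z = sum(D)                         # renormalize to a distribution
        D = [w / Z for w in D]

    def H(x):
        # Final classifier: sign of the weighted linear combination
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

    return H
```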

6 AdaBoost Example (αt = ln 2)
Training set: (x1,y1), (x2,y2), (x3,y3), (x4,y4), (x5,y5) with initial distribution D0 = (1/5, 1/5, 1/5, 1/5, 1/5).
Round 1: sample {(x5,y5), (x3,y3), (x1,y1)} from D0 and train h1. h1 misclassifies (x1,y1) and (x3,y3), so their weights are doubled and the distribution is renormalized: D1 = (2/7, 1/7, 2/7, 1/7, 1/7).
Round 2: sample {(x3,y3), (x1,y1)} from D1 and train h2. h2 still misclassifies (x3,y3): D2 = (2/9, 1/9, 4/9, 1/9, 1/9), and the combined classifier so far is 3/5·h1 + 2/5·h2.
Round 3: sample from D2 and continue …
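As a sanity check on the fractions above, here is a small Python computation assuming the update rule this example appears to use (double the weight of every misclassified example, then renormalize):

```python
from fractions import Fraction

def update(D, misclassified):
    """Double the weights of the misclassified examples and renormalize."""
    D = [w * 2 if i in misclassified else w for i, w in enumerate(D)]
    Z = sum(D)
    return [w / Z for w in D]

D0 = [Fraction(1, 5)] * 5      # x1..x5 start with equal weight 1/5
D1 = update(D0, {0, 2})        # h1 misclassifies x1 and x3
D2 = update(D1, {2})           # h2 misclassifies x3 again
print([str(w) for w in D1])    # ['2/7', '1/7', '2/7', '1/7', '1/7']
print([str(w) for w in D2])    # ['2/9', '1/9', '4/9', '1/9', '1/9']
```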

7 How To Choose αt in AdaBoost?
Problem with a constant weight αt: no distinction between accurate classifiers and inaccurate classifiers.
Consider how to construct the best distribution Dt+1(i) given Dt(i) and ht:
1. Dt+1(i) should be significantly different from Dt(i)
2. Dt+1(i) should create a situation in which classifier ht performs poorly
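The weight update that standard AdaBoost uses to meet these two requirements is not written out on this slide; for reference, it is

$$ D_{t+1}(i) = \frac{D_t(i)\,\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \qquad Z_t = \sum_{j} D_t(j)\,\exp\big(-\alpha_t\, y_j\, h_t(x_j)\big), $$

and with the optimal αt the weighted error of ht under Dt+1 is exactly 1/2, i.e. ht performs no better than random guessing on the new distribution.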

8 Optimization View for Choosing αt
ht(x): x → {+1, -1}; a basis (weak) classifier
HT(x): a linear combination of the basis classifiers
Goal: minimize the training error
Approximate the training error with an exponential function
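The formulas on this slide are lost in the transcript; the standard statement being referred to is the exponential upper bound on the training error:

$$ H_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \qquad \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\big[y_i \neq \operatorname{sign}(H_T(x_i))\big] \;\le\; \frac{1}{N}\sum_{i=1}^{N} \exp\big(-y_i H_T(x_i)\big), $$

which holds because the indicator is 1 only when $y_i H_T(x_i) \le 0$, in which case $\exp(-y_i H_T(x_i)) \ge 1$.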

9 AdaBoost: A Greedy Approach to Optimizing the Exponential Function
Exponential cost function: use the inductive form HT(x) = HT-1(x) + αT hT(x) and minimize the exponential function over αT.
The cost splits into two terms: the data points that hT(x) predicts correctly and the data points that hT(x) predicts incorrectly.
AdaBoost is a greedy approach → overfitting? Empirical studies show that AdaBoost is robust in general, but it tends to overfit with noisy data.
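The minimization sketched above has a standard closed form, reconstructed here since the slide's derivation is not in the transcript. Writing $w_i = \exp(-y_i H_{T-1}(x_i))$ and splitting the sum over the correctly and incorrectly classified points:

$$ \sum_i w_i\, e^{-\alpha_T y_i h_T(x_i)} \;=\; e^{-\alpha_T} \sum_{i:\, h_T(x_i) = y_i} w_i \;+\; e^{\alpha_T} \sum_{i:\, h_T(x_i) \neq y_i} w_i, $$

and setting the derivative with respect to $\alpha_T$ to zero gives

$$ \alpha_T = \frac{1}{2}\ln\frac{1-\epsilon_T}{\epsilon_T}, $$

where $\epsilon_T$ is the weighted error of $h_T$ under the normalized weights $w_i / \sum_j w_j$.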

10 Empirical Study of AdaBoost
AdaBoosting decision trees: generate 50 decision trees through the AdaBoost procedure and linearly combine them using the weights computed by the AdaBoost algorithm.
In general: AdaBoost ≈ Bagging > C4.5 in accuracy, and AdaBoost usually needs fewer classifiers than Bagging.
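A hedged illustration of the same setup using an off-the-shelf library: scikit-learn's AdaBoostClassifier boosts decision trees (decision stumps by default) and combines them with the AdaBoost weights. The synthetic dataset below is only a stand-in, not the benchmark data from the slide's study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a benchmark dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 50 boosted trees, linearly combined with the weights computed by AdaBoost.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```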

11 Bias-Variance Tradeoff for AdaBoost
AdaBoost can reduce both model variance and model bias.
[Chart: bias and variance of a single decision tree vs. bagging decision trees vs. AdaBoosting decision trees]

