1
A New Boosting Algorithm Using Input-Dependent Regularizer
Rong Jin (1), Yan Liu (2), Luo Si (2), Jamie Carbonell (2), Alex G. Hauptmann (2)
1. Michigan State University  2. Carnegie Mellon University
2
Outline
Introduction to the AdaBoost algorithm
Problems with AdaBoost
New boosting algorithm: input-dependent regularizer
Experiments
Conclusion and future work
3
AdaBoost Algorithm (I)
Boost a weak classifier into a strong classifier by linearly combining an ensemble of weak classifiers.
AdaBoost
Given: a weak classifier h(x) with a large classification error E_(x,y)~P(x,y)[h(x) ≠ y]
Output: H_T(x) = α_1 h_1(x) + α_2 h_2(x) + … + α_T h_T(x) with a low classification error E_(x,y)~P(x,y)[H_T(x) ≠ y]
Theoretically and empirically effective at improving performance.
4
AdaBoost Algorithm (II)
Sampling distribution D_t: focus on the examples that are misclassified or weakly classified by the previous weak classifiers.
Combining weak classifiers: the combination constants α_t are computed so as to minimize the training error.
The detailed algorithm involves two steps: updating the sampling distribution and combining the base classifiers. The distribution is initialized to be uniform, then updated so that it concentrates on the misclassified examples. The classifiers are combined linearly, with the combination constants chosen to minimize the training error.
Choice of α_t: in standard AdaBoost, α_t = (1/2) ln((1 − ε_t) / ε_t), where ε_t is the weighted error of h_t under D_t. (A minimal sketch of the whole loop follows below.)
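To make the two steps concrete, here is a minimal AdaBoost sketch in Python, using a scikit-learn decision stump as the weak learner; the function and variable names are illustrative, not taken from the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=50):
    # Minimal AdaBoost; labels y must be in {-1, +1}.
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                       # sampling distribution, initialized uniformly
    classifiers, alphas = [], []
    for _ in range(n_rounds):
        # Train a weak classifier (a decision stump) under the current distribution
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D * (pred != y))             # weighted training error
        if eps == 0 or eps >= 0.5:                # no useful weak learner left
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # combination constant
        D = D * np.exp(-alpha * y * pred)         # emphasize misclassified examples
        D = D / D.sum()
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # H_T(x) = sum_t alpha_t * h_t(x); predict its sign
    H = sum(a * h.predict(X) for h, a in zip(classifiers, alphas))
    return np.sign(H)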
5
Problem 1: Overfitting
AdaBoost seldom overfits: it not only minimizes the training error but also tends to maximize the classification margin (Ondar & Muller, 1998; Friedman et al., 1998).
However, AdaBoost does overfit when the data are noisy (Dietterich, 2000; Ratsch & Muller, 2000; Grove & Schuurmans, 1998).
The sampling distribution D_t(x) can put too much emphasis on noisy patterns, due to the "hard margin" criterion (Ratsch et al., 2000).
Since AdaBoost is a greedy algorithm, its overfitting behavior has been studied extensively. Early studies showed that it rarely overfits; more recent work found that it does overfit on noisy data. The sampling distribution may over-emphasize noisy patterns, so it is no longer representative of most of the data. The "hard margin" refers to insisting on a large margin even for the noisy data patterns; the margin on the remaining data may then decrease significantly, forcing the generalization error bound to increase.
6
Problem 1: Overfitting
Introduce regularization: do not just minimize the training error.
Typical solutions:
Smooth the combination constants (Schapire & Singer, 1998)
Epsilon boosting: equivalent to L1 regularization (Friedman & Tibshirani, 1998)
Boosting with a soft margin (Ratsch et al., 2000)
BrownBoost: a non-monotonic cost function (Freund, 2001)
To address the overfitting problem, several strategies have been proposed. Typical solutions either change the cost function to add a regularizer (a similar idea to ridge regression) or introduce a soft margin (as in SVMs). (A small epsilon-boosting sketch follows below.)
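As a rough illustration of epsilon boosting (not the authors' implementation), the only change to the AdaBoost loop above is that the fitted step α_t is replaced by a small fixed step ε, which shrinks the combination constants in an L1-like way; the value of ε below is purely illustrative.

import numpy as np

EPSILON = 0.01   # illustrative step size; in practice chosen by validation

def epsilon_boost_step(D, y, pred):
    # One re-weighting step with a small fixed alpha instead of
    # alpha = 0.5 * log((1 - eps) / eps) as in AdaBoost.
    alpha = EPSILON
    D = D * np.exp(-alpha * np.asarray(y) * pred)
    return D / D.sum(), alpha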
7
Problem 2: Why Linear Combination?
Each weak classifier h_t(x) is trained on a different sampling distribution D_t(x), so it is only good for particular types of input patterns; {h_t(x)} is therefore a diverse ensemble.
A linear combination with fixed constants cannot exploit the full strength of the diverse ensemble {h_t(x)}.
Solution: the combination constants should be input dependent.
A similar idea to the hierarchical mixtures of experts model by Michael Jordan.
8
Input Dependent Regularizer
Solves the two problems: overfitting and input-independent (constant) combination weights.
Input-dependent regularizer.
Main idea: use a different combination form, in which the weight given to each classifier depends on the input.
9
Role of Regularizer and Router
Regularizer: prevents |H_T(x)| from growing too fast.
Theorem: if all α_t are bounded by α_max, then |H_T(x)| ≤ a ln(bT + c). For the linear combination used in AdaBoost, |H_T(x)| ~ O(T).
Router: the combination constant is input dependent, so the prediction of h_t(x) is used only when |H_{t-1}(x)| is small. This is consistent with the training procedure: h_t(x) is trained on the examples on which H_{t-1}(x) is uncertain.
Regularizer: the effective loss function becomes polynomial instead of exponential, so noisy examples are emphasized less. (A small sketch of the input-dependent weight follows below.)
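A minimal sketch of what such an input-dependent weight could look like. The exact functional form is an assumption here (an exponential decay in |H_{t-1}(x)| with a hypothetical constant beta), chosen only to match the router behavior described on this slide.

import numpy as np

def input_dependent_weight(alpha_t, H_prev, beta=0.5):
    # Assumed form: the weight on h_t(x) shrinks as |H_{t-1}(x)| grows.
    #   alpha_t : combination constant of the t-th classifier
    #   H_prev  : H_{t-1}(x) evaluated at the input x (scalar or array)
    #   beta    : hypothetical regularization constant (not from the slides)
    return alpha_t * np.exp(-beta * np.abs(H_prev))

# When |H_{t-1}(x)| is small (the ensemble is uncertain), the weight is close to
# alpha_t, so h_t(x) is "routed in"; when |H_{t-1}(x)| is large, the weight is
# close to 0, which keeps |H_T(x)| from growing linearly in T.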
10
WeightBoost Algorithm (1)
Similar to AdaBoost: minimize the exponential cost function.
Training setup:
h_i(x): x → {1, −1}, a base (weak) classifier
H_T(x): a linear combination of the base classifiers
Goal: minimize the training error
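For reference, the exponential cost that AdaBoost-style methods minimize can be written as below. The input-dependent combination form shown next to it is an assumption pieced together from the router description on the previous slide (with a hypothetical constant β), not a formula quoted from the slides.

J(H_T) = \sum_{i=1}^{n} \exp\bigl(-y_i H_T(x_i)\bigr),
\qquad
H_T(x) = \sum_{t=1}^{T} \alpha_t \, e^{-\beta |H_{t-1}(x)|} \, h_t(x)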
11
WeightBoost Algorithm (2)
Emphasize misclassified data patterns.
Avoid over-emphasis on noisy data patterns.
As simple as AdaBoost!
Choice of α_t: (one possible choice is illustrated in the hedged sketch below)
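Putting the pieces together, here is a hedged sketch of what a WeightBoost-style training loop might look like. The input-dependent weight exp(-beta*|H_{t-1}(x)|), the re-weighting rule, the AdaBoost-style choice of alpha_t, and the constant beta are all illustrative assumptions; the paper's exact updates are not reproduced in the slide text.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weightboost_train(X, y, n_rounds=50, beta=0.5):
    # Hedged WeightBoost-style sketch; labels y in {-1, +1}.
    y = np.asarray(y)
    H = np.zeros(len(y))                          # running combined score H_{t-1}(x_i)
    classifiers, alphas = [], []
    for _ in range(n_rounds):
        # Assumed distribution: emphasize misclassified points, but damp the
        # emphasis on points with large |H| (noise resistance).
        D = np.exp(-y * H) * np.exp(-beta * np.abs(H))
        D = D / D.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D * (pred != y))
        if eps == 0 or eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # assumed AdaBoost-style choice of alpha_t
        # Input-dependent combination: the weight on h_t decays with |H_{t-1}(x)|
        H = H + alpha * np.exp(-beta * np.abs(H)) * pred
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def weightboost_predict(classifiers, alphas, X, beta=0.5):
    H = np.zeros(len(X))
    for h, alpha in zip(classifiers, alphas):
        H = H + alpha * np.exp(-beta * np.abs(H)) * h.predict(X)
    return np.sign(H)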
12
Empirical Studies
Datasets: eight different UCI datasets with binary classes only.
Methods compared against:
AdaBoost algorithm
WeightDecay Boost algorithm: close to L2 regularization
Epsilon Boosting: related to L1 regularization
13
Experiment 1: Effectiveness
Compared to AdaBoost: WeightBoost performs better than the AdaBoost algorithm, and in many cases substantially better.
14
Experiment 2: Beyond Regularization
Compared to other regularized boosting methods (WeightDecay Boost and Epsilon Boost): WeightBoost performs slightly better overall; in several cases it performs better than both of the other regularized boosting algorithms.
15
Experiment 3: Resistance to Noise
Randomly select 10%, 20%, and 30% of the training data and set their labels to random values.
Results for 10% noise: WeightBoost is more resistant to training noise than the AdaBoost algorithm. In several cases, when AdaBoost overfits the noisy labels, WeightBoost is still able to perform well.
16
Experiments with Text Categorization
Reuters corpus with the 10 most popular categories: WeightBoost improves performance on 7 out of 10 categories.
17
Conclusion and Future Work
Introduced an input-dependent regularizer into the combination form:
Prevents |H(x)| from increasing too fast, making the method resistant to training noise.
'Routes' a test data pattern to its appropriate classifier, improving classification accuracy beyond standard regularization.
Future research issues:
How to determine the constant in the regularizer?
Other input-dependent regularizers?