1
A New Boosting Algorithm Using Input-Dependent Regularizer
Rong Jin (1), Yan Liu (2), Luo Si (2), Jamie Carbonell (2), Alex G. Hauptmann (2)
1. Michigan State University  2. Carnegie Mellon University
2
Outline
Introduction to the AdaBoost algorithm
Problems with AdaBoost
New boosting algorithm: input-dependent regularizer
Experiments
Conclusion and future work
3
AdaBoost Algorithm (I)
Boost a weak classifier into a strong classifier by linearly combining an ensemble of weak classifiers.
AdaBoost
Given: a weak classifier h(x) with a large classification error E_(x,y)~P(x,y)[h(x) ≠ y]
Output: H_T(x) = α_1 h_1(x) + α_2 h_2(x) + … + α_T h_T(x) with a low classification error E_(x,y)~P(x,y)[H_T(x) ≠ y]
Theoretically and empirically effective at improving performance.
4
AdaBoost Algorithm (II)
Sampling distribution D_t: focus on the examples that are misclassified or weakly classified by the previous weak classifiers.
Combining weak classifiers: the combination constants α_t are computed so as to minimize the training error.
The detailed algorithm involves two steps: updating the sampling distribution and combining the base classifiers. The distribution is initialized to be uniform, then updated so that it concentrates on the misclassified examples. The classifiers are combined linearly, with the combination constants chosen to minimize the training error.
Choice of α_t: in standard AdaBoost, α_t = (1/2) ln((1 − ε_t) / ε_t), where ε_t is the weighted error of h_t under D_t. (A minimal sketch of the whole loop follows below.)
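To make the two steps concrete, here is a minimal AdaBoost sketch in Python, using a scikit-learn decision stump as the weak learner; the function and variable names are illustrative, not taken from the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, n_rounds=50):
    # Minimal AdaBoost; labels y must be in {-1, +1}.
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                       # sampling distribution, initialized uniformly
    classifiers, alphas = [], []
    for _ in range(n_rounds):
        # Train a weak classifier (a decision stump) under the current distribution
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D * (pred != y))             # weighted training error
        if eps == 0 or eps >= 0.5:                # no useful weak learner left
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # combination constant
        D = D * np.exp(-alpha * y * pred)         # emphasize misclassified examples
        D = D / D.sum()
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # H_T(x) = sum_t alpha_t * h_t(x); predict its sign
    H = sum(a * h.predict(X) for h, a in zip(classifiers, alphas))
    return np.sign(H)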
5
Problem 1: Overfitting
AdaBoost seldom overfits: it not only minimizes the training error but also tends to maximize the classification margin (Ondar & Muller, 1998; Friedman et al., 1998).
However, AdaBoost does overfit when the data are noisy (Dietterich, 2000; Ratsch & Muller, 2000; Grove & Schuurmans, 1998).
The sampling distribution D_t(x) can put too much emphasis on noisy patterns, due to the "hard margin" criterion (Ratsch et al., 2000).
Since AdaBoost is a greedy algorithm, its overfitting behavior has been studied extensively. Early studies showed that it rarely overfits; more recent work found that it does overfit on noisy data. The sampling distribution may over-emphasize noisy patterns, so it is no longer representative of most of the data. The "hard margin" refers to insisting on a large margin even for the noisy data patterns; the margin on the remaining data may then decrease significantly, forcing the generalization error bound to increase.
6
Problem 1: Overfitting
Introduce regularization: do not just minimize the training error.
Typical solutions:
Smooth the combination constants (Schapire & Singer, 1998)
Epsilon boosting: equivalent to L1 regularization (Friedman & Tibshirani, 1998)
Boosting with a soft margin (Ratsch et al., 2000)
BrownBoost: a non-monotonic cost function (Freund, 2001)
To address the overfitting problem, several strategies have been proposed. Typical solutions either change the cost function to add a regularizer (a similar idea to ridge regression) or introduce a soft margin (as in SVMs). (A small epsilon-boosting sketch follows below.)
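As a rough illustration of epsilon boosting (not the authors' implementation), the only change to the AdaBoost loop above is that the fitted step α_t is replaced by a small fixed step ε, which shrinks the combination constants in an L1-like way; the value of ε below is purely illustrative.

import numpy as np

EPSILON = 0.01   # illustrative step size; in practice chosen by validation

def epsilon_boost_step(D, y, pred):
    # One re-weighting step with a small fixed alpha instead of
    # alpha = 0.5 * log((1 - eps) / eps) as in AdaBoost.
    alpha = EPSILON
    D = D * np.exp(-alpha * np.asarray(y) * pred)
    return D / D.sum(), alpha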
7
Problem 2: Why Linear Combination?
Each weak classifier h_t(x) is trained on a different sampling distribution D_t(x), so it is only good for particular types of input patterns; {h_t(x)} is therefore a diverse ensemble.
A linear combination with fixed constants cannot exploit the full strength of the diverse ensemble {h_t(x)}.
Solution: the combination constants should be input dependent.
A similar idea to the hierarchical mixtures of experts model by Michael Jordan.
8
Input Dependent Regularizer
Solves the two problems: overfitting and input-independent (constant) combination weights.
Input-dependent regularizer.
Main idea: use a different combination form, in which the weight given to each classifier depends on the input.
9
Role of Regularizer and Router
Regularizer: prevents |H_T(x)| from growing too fast.
Theorem: if all α_t are bounded by α_max, then |H_T(x)| ≤ a ln(bT + c). For the linear combination used in AdaBoost, |H_T(x)| ~ O(T).
Router: the combination constant is input dependent, so the prediction of h_t(x) is used only when |H_{t-1}(x)| is small. This is consistent with the training procedure: h_t(x) is trained on the examples on which H_{t-1}(x) is uncertain.
Regularizer: the effective loss function becomes polynomial instead of exponential, so noisy examples are emphasized less. (A small sketch of the input-dependent weight follows below.)
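A minimal sketch of what such an input-dependent weight could look like. The exact functional form is an assumption here (an exponential decay in |H_{t-1}(x)| with a hypothetical constant beta), chosen only to match the router behavior described on this slide.

import numpy as np

def input_dependent_weight(alpha_t, H_prev, beta=0.5):
    # Assumed form: the weight on h_t(x) shrinks as |H_{t-1}(x)| grows.
    #   alpha_t : combination constant of the t-th classifier
    #   H_prev  : H_{t-1}(x) evaluated at the input x (scalar or array)
    #   beta    : hypothetical regularization constant (not from the slides)
    return alpha_t * np.exp(-beta * np.abs(H_prev))

# When |H_{t-1}(x)| is small (the ensemble is uncertain), the weight is close to
# alpha_t, so h_t(x) is "routed in"; when |H_{t-1}(x)| is large, the weight is
# close to 0, which keeps |H_T(x)| from growing linearly in T.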
10
WeightBoost Algorithm (1)
Similar to AdaBoost: minimize the exponential cost function.
Training setup:
h_i(x): x → {1, −1}, a base (weak) classifier
H_T(x): a linear combination of the base classifiers
Goal: minimize the training error
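For reference, the exponential cost that AdaBoost-style methods minimize can be written as below. The input-dependent combination form shown next to it is an assumption pieced together from the router description on the previous slide (with a hypothetical constant β), not a formula quoted from the slides.

J(H_T) = \sum_{i=1}^{n} \exp\bigl(-y_i H_T(x_i)\bigr),
\qquad
H_T(x) = \sum_{t=1}^{T} \alpha_t \, e^{-\beta |H_{t-1}(x)|} \, h_t(x)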
11
WeightBoost Algorithm (2)
Emphasize misclassified data patterns.
Avoid over-emphasis on noisy data patterns.
As simple as AdaBoost!
Choice of α_t: (one possible choice is illustrated in the hedged sketch below)
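Putting the pieces together, here is a hedged sketch of what a WeightBoost-style training loop might look like. The input-dependent weight exp(-beta*|H_{t-1}(x)|), the re-weighting rule, the AdaBoost-style choice of alpha_t, and the constant beta are all illustrative assumptions; the paper's exact updates are not reproduced in the slide text.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weightboost_train(X, y, n_rounds=50, beta=0.5):
    # Hedged WeightBoost-style sketch; labels y in {-1, +1}.
    y = np.asarray(y)
    H = np.zeros(len(y))                          # running combined score H_{t-1}(x_i)
    classifiers, alphas = [], []
    for _ in range(n_rounds):
        # Assumed distribution: emphasize misclassified points, but damp the
        # emphasis on points with large |H| (noise resistance).
        D = np.exp(-y * H) * np.exp(-beta * np.abs(H))
        D = D / D.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.sum(D * (pred != y))
        if eps == 0 or eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)     # assumed AdaBoost-style choice of alpha_t
        # Input-dependent combination: the weight on h_t decays with |H_{t-1}(x)|
        H = H + alpha * np.exp(-beta * np.abs(H)) * pred
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def weightboost_predict(classifiers, alphas, X, beta=0.5):
    H = np.zeros(len(X))
    for h, alpha in zip(classifiers, alphas):
        H = H + alpha * np.exp(-beta * np.abs(H)) * h.predict(X)
    return np.sign(H)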
12
Empirical Studies
Datasets: eight different UCI datasets with binary classes only.
Methods compared against:
AdaBoost algorithm
WeightDecay Boost algorithm: close to L2 regularization
Epsilon Boosting: related to L1 regularization
13
Experiment 1: Effectiveness
Compared to AdaBoost: WeightBoost performs better than the AdaBoost algorithm, and in many cases substantially better.
14
Experiment 2: Beyond Regularization
Compared to other regularized boosting methods (WeightDecay Boost and Epsilon Boost): WeightBoost performs slightly better overall; in several cases it performs better than both of the other regularized boosting algorithms.
15
Experiment 3: Resistance to Noise
Randomly select 10%, 20%, and 30% of the training data and set their labels to random values.
Results for 10% noise: WeightBoost is more resistant to training noise than the AdaBoost algorithm. In several cases, when AdaBoost overfits the noisy labels, WeightBoost is still able to perform well.
16
Experiments with Text Categorization
Reuters corpus with the 10 most popular categories: WeightBoost improves performance on 7 out of 10 categories.
17
Conclusion and Future Work
Introduced an input-dependent regularizer into the combination form:
Prevents |H(x)| from increasing too fast, making the method resistant to training noise.
'Routes' a test data pattern to its appropriate classifier, improving classification accuracy beyond standard regularization.
Future research issues:
How to determine the constant in the regularizer?
Other input-dependent regularizers?