1
Ensemble Methods for Machine Learning: The Ensemble Strikes Back
2
Outline
- Motivations and techniques
- Bias, variance: bagging
- Combining learners vs choosing between them: bucket of models, stacking & blending
- PAC-learning theory: boosting
- Relation of boosting to other learning methods: optimization, SVMs, …
3
Review Of Boosting
4
Sample with replacement
Increase the weight of x_i if h_t is wrong; decrease it if h_t is right. The final hypothesis is a linear combination of the base hypotheses; the best weight α_t depends on the error of h_t.
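To make the update concrete, here is a minimal AdaBoost sketch (not from the slides; the weak_learner interface and helper names are assumptions):

```python
# Minimal AdaBoost sketch, assuming a weak_learner(X, y, sample_weight)
# callable that returns an object whose .predict(X) gives labels in {-1, +1}.
import numpy as np

def adaboost(X, y, weak_learner, T=50):
    m = len(y)
    D = np.full(m, 1.0 / m)                     # D_1(i) = 1/m
    hyps, alphas = [], []
    for t in range(T):
        h = weak_learner(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error of h_t
        alpha = 0.5 * np.log((1 - eps) / eps)   # best alpha_t depends on error of h_t
        D = D * np.exp(-alpha * y * pred)       # up-weight mistakes, down-weight correct ones
        D = D / D.sum()                         # normalize (divide by Z_t)
        hyps.append(h)
        alphas.append(alpha)
    def H(Xnew):                                # H(x) = sign(sum_t alpha_t h_t(x))
        f = sum(a * h.predict(Xnew) for a, h in zip(alphas, hyps))
        return np.sign(f)
    return H
```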
5
Boosting: A toy example
Thanks, Rob Schapire
6
Boosting: A toy example
Thanks, Rob Schapire
7
Boosting: A toy example
Thanks, Rob Schapire
8
Boosting: A toy example
Thanks, Rob Schapire
9
Boosting: A toy example
Thanks, Rob Schapire
10
Boosting improved decision trees…
11
Analysis Of Boosting
12
Theorem 1: error rate
The training error of the final hypothesis H(x) = sign(f(x)), where f(x) = Σ_t α_t h_t(x), is at most ∏_t Z_t, the product of the per-round normalizers.
13
Theorem 1: error rate
Proof: recall H(x) = sign(f(x)) where f(x) = Σ_t α_t h_t(x). Unwinding the weight updates gives D_{T+1}(i) = exp(−y_i f(x_i)) / (m ∏_t Z_t). Since exp(−y_i f(x_i)) is an upper bound on "[error on i]", i.e. exp(−y_i f(x_i)) ≥ [H(x_i) ≠ y_i], summing over i and using Σ_i D_{T+1}(i) = 1 bounds the training error by ∏_t Z_t. QED!
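For reference, here is the same chain written out (the standard Freund and Schapire argument; D_t and Z_t are the round-t weights and normalizers):

```latex
% Standard AdaBoost training-error bound.
\begin{align*}
D_{T+1}(i) &= \frac{1}{m}\,\frac{\prod_{t=1}^{T} e^{-\alpha_t y_i h_t(x_i)}}{\prod_{t=1}^{T} Z_t}
            = \frac{e^{-y_i f(x_i)}}{m\prod_{t} Z_t},
  \qquad f(x) = \sum_t \alpha_t h_t(x) \\
\frac{1}{m}\sum_i \mathbf{1}\!\left[H(x_i)\neq y_i\right]
  &\le \frac{1}{m}\sum_i e^{-y_i f(x_i)}
   = \sum_i D_{T+1}(i)\prod_t Z_t
   = \prod_{t=1}^{T} Z_t .
\end{align*}
```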
14
Theorem 1: so, pick the h's and α's to minimize the Z's.
Simplified notation: drop the t's and let u_i = y_i h(x_i); remember that u_i = +1 or −1.
Claim: Z = Σ_i D(i) exp(−α u_i) ≤ Σ_i D(i) [ ((1+u_i)/2) e^{−α} + ((1−u_i)/2) e^{α} ], with equality for u = +1, −1; the inequality holds for −1 ≤ u ≤ +1 (convexity of exp).
So: let's minimize f(α) = Σ_i D(i) [ ((1+u_i)/2) e^{−α} + ((1−u_i)/2) e^{α} ] to pick a best α.
15
Minimize f(α) = (1−ε) e^{−α} + ε e^{α}, where ε = Σ_{i: u_i = −1} D(i) is the weighted error of h. Setting the derivative to zero gives α = ½ ln((1−ε)/ε), and plugging back in, Z = 2√(ε(1−ε)).
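The minimization, worked out (standard AdaBoost algebra, with ε the weighted error of h as above):

```latex
% Minimizing f(alpha) = (1-eps) e^{-alpha} + eps e^{alpha}.
\begin{align*}
f'(\alpha) &= -(1-\epsilon)\,e^{-\alpha} + \epsilon\,e^{\alpha} = 0
  \;\Longrightarrow\; e^{2\alpha} = \frac{1-\epsilon}{\epsilon}
  \;\Longrightarrow\; \alpha^{*} = \tfrac{1}{2}\ln\frac{1-\epsilon}{\epsilon} \\
Z = f(\alpha^{*}) &= (1-\epsilon)\sqrt{\tfrac{\epsilon}{1-\epsilon}}
   + \epsilon\sqrt{\tfrac{1-\epsilon}{\epsilon}}
   = 2\sqrt{\epsilon(1-\epsilon)} .
\end{align*}
```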
16
Theorem 1: so, pick the h's and α's to minimize the Z's.
Theorem 2: when α_t = ½ ln((1−ε_t)/ε_t), for ε_t = ½ − γ_t, then Z_t = 2√(ε_t(1−ε_t)) = √(1 − 4γ_t²) ≤ exp(−2γ_t²), and hence the training error is bounded by exp(−2 Σ_t γ_t²).
Comment: if h(x) = ±1, then the convexity bound from the previous slide holds with equality, so Z_t = 2√(ε_t(1−ε_t)) exactly.
18
Boosting as Optimization
19
Even boosting single features worked well…
(Experiments on the Reuters newswire corpus.)
20
Some background facts
Coordinate descent optimization to minimize f(w), where w = <w1, …, wN>:
For t = 1, …, T (or till convergence):
  For i = 1, …, N:
    Pick w* to minimize f(<w1, …, wi-1, w*, wi+1, …, wN>)
    Set wi = w*
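A minimal sketch of that loop (illustrative only; the per-coordinate "pick w* to minimize" step is done here by a simple grid search, which is an assumption, not part of the slide):

```python
import numpy as np

def coordinate_descent(f, w0, T=50, grid=np.linspace(-5, 5, 201)):
    """Minimize f(w) one coordinate at a time, via grid search per coordinate."""
    w = np.array(w0, dtype=float)
    for t in range(T):                    # for t = 1, ..., T (or till convergence)
        for i in range(len(w)):           # for i = 1, ..., N
            trials = []
            for w_star in grid:           # pick w* to minimize f(..., w*, ...)
                cand = w.copy()
                cand[i] = w_star
                trials.append(f(cand))
            w[i] = grid[int(np.argmin(trials))]   # set w_i = w*
    return w

# Example: f(w) = (w1 - 1)^2 + (w2 + 2)^2, minimized at (1, -2)
w_hat = coordinate_descent(lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2, [0.0, 0.0], T=5)
```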
21
Boosting as optimization using coordinate descent
With a small number of possible h's, you can think of boosting as finding a linear combination of these: f(x) = Σ_j α_j h_j(x). So boosting is sort of like stacking: the base hypotheses' predictions act as the features, and the α's are the combiner's weights. Boosting uses coordinate descent (adjusting one α_j per round) to minimize an upper bound on the error rate: (1/m) Σ_i exp(−y_i f(x_i)) = ∏_t Z_t.
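A sketch of that view, assuming a fixed pool of base hypotheses whose ±1 predictions are precomputed in a matrix preds[j, i] = h_j(x_i) (the names preds and boost_by_coordinate_descent are hypothetical; the greedy coordinate choice mirrors a weak learner that always returns the best h):

```python
# Coordinate descent on the exponential-loss upper bound over a fixed pool of h's.
import numpy as np

def exp_loss(alpha, preds, y):
    # The objective being minimized: (1/m) sum_i exp(-y_i f(x_i)) = prod_t Z_t
    return np.mean(np.exp(-y * (preds.T @ alpha)))

def boost_by_coordinate_descent(preds, y, rounds=20):
    J, m = preds.shape
    alpha = np.zeros(J)
    for _ in range(rounds):
        w = np.exp(-y * (preds.T @ alpha))                      # current example weights,
        w /= w.sum()                                            # like D_t(i)
        eps = np.array([np.clip(w[preds[j] != y].sum(), 1e-10, 1 - 1e-10)
                        for j in range(J)])
        j = int(np.argmax(np.abs(0.5 - eps)))                   # coordinate that shrinks Z most
        alpha[j] += 0.5 * np.log((1 - eps[j]) / eps[j])         # AdaBoost's alpha as the step
    return alpha
```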
22
Boosting and optimization
Jerome Friedman, Trevor Hastie and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 2000.
FHT compared using AdaBoost to set feature weights vs directly optimizing the feature weights to minimize log-likelihood, squared error, …
23
Boosting as Margin Learning
24
Boosting didn’t seem to overfit…(!)
[Plot: train error and test error vs. number of boosting rounds]
25
…because it turned out to be increasing the margin of the classifier
[Plot: margin distributions of the classifier after 100 rounds and after 1000 rounds]
26
Boosting movie
27
Some background facts
Coordinate descent optimization to minimize f(w), where w = <w1, …, wN>:
For t = 1, …, T (or till convergence):
  For i = 1, …, N:
    Pick w* to minimize f(<w1, …, wi-1, w*, wi+1, …, wN>)
    Set wi = w*
28
Boosting is closely related to margin classifiers like SVM, voted perceptron, … (!)
Boosting: the "coordinates" are being extended by one in each round of boosting (usually, unless you happen to generate the same tree twice).
29
Boosting is closely related to margin classifiers like SVM, voted perceptron, … (!)
Boosting: the margin of example (x, y) is y · Σ_t α_t h_t(x), with the weights α measured in the L1 norm and the vector of base-hypothesis predictions <h_1(x), …, h_T(x)> measured in the L∞ norm.
Linear SVMs: the margin is y · (w · x), with the weights w and the features x both measured in the L2 norm.
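In symbols (the usual way this contrast is written; the normalization details are the standard ones, not read off the slide):

```latex
% Margin maximized by boosting (L1 weights, Linf base predictions)
% vs. a linear SVM (L2 weights, L2 features).
\begin{align*}
\text{Boosting:}\quad
  \operatorname{margin}(x,y) &= \frac{y\sum_t \alpha_t h_t(x)}{\lVert\alpha\rVert_1},
  & \lVert(h_1(x),\dots,h_T(x))\rVert_\infty &\le 1 \\
\text{Linear SVM:}\quad
  \operatorname{margin}(x,y) &= \frac{y\,(w\cdot x)}{\lVert w\rVert_2},
  & &\text{features measured in } \lVert x\rVert_2 .
\end{align*}
```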
30
Wrapup On Boosting
31
Boosting in the real world
William's wrap-up:
- Boosting is not discussed much in the ML research community any more; it's much too well understood now.
- It's really useful in practice as a meta-learning method. E.g., boosted Naïve Bayes usually beats Naïve Bayes.
- Boosted decision trees are almost always competitive with respect to accuracy:
  - very robust against rescaling numeric features, extra features, non-linearities, …
  - somewhat slower to learn and use than many linear classifiers
  - but getting probabilities out of them is a little less reliable.
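As a practical aside (not from the slides), boosted decision stumps are easy to try in scikit-learn; the exact constructor keyword has changed across versions, so treat this as a sketch:

```python
# Boosted decision trees in practice (scikit-learn). Older releases use
# base_estimator=..., newer ones use estimator=...; adjust for your version.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # boosted decision stumps
    n_estimators=200,
)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```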