Published by Jemima Owens. Modified over 8 years ago.
1
Lecture 6: Ensemble Learning (1)
Boosting
AdaBoost
Boosting is an additive model
Brief intro to lasso
The relationship
2
Boosting
Combine multiple classifiers.
Construct a sequence of weak classifiers, and combine them into a strong classifier by a weighted majority vote.
"Weak": better than random coin-tossing.
Some properties:
Flexible.
Able to select features.
Good generalization.
Could fit noise.
3
Boosting
AdaBoost (Freund & Schapire, 1995)
4
Boosting
"A Tutorial on Boosting", Yoav Freund and Rob Schapire
8
Boosting
9
Boosting
Two weights appear in the algorithm.
The first is the weight of the current weak classifier in the final model.
The second is the weight of each individual observation. Notice that it accumulates from step 1: if an observation is correctly classified at this step, its weight doesn't change; if it is incorrectly classified, its weight increases.
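The two weights can be seen in a minimal Python sketch of AdaBoost with decision stumps. This is an illustration under assumptions, not the lecture's own code: the helper names (`fit_stump`, `stump_predict`, `adaboost`, `adaboost_predict`), the brute-force stump search, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def fit_stump(X, y, w):
    """Search all single-split stumps; return (feature, threshold, sign) with lowest weighted error."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, t, s)
    return best

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] > t, s, -s)

def adaboost(X, y, n_rounds=50):
    """y must take values in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))      # observation weights, uniform at step 1
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = stump_predict(stump, X) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-12, 1 - 1e-12)
        alpha = np.log((1 - err) / err)    # weight of this weak classifier in the final vote
        w = w * np.exp(alpha * miss)       # misclassified: weight grows; correct: unchanged
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.where(votes >= 0, 1, -1)     # weighted majority vote

# Synthetic data (an assumption for the demo): a diagonal boundary that no single stump can match.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, n_rounds=50)
acc = np.mean(adaboost_predict(stumps, alphas, X) == y)
```

The observation weights are left unnormalized here because the weighted error divides by their sum; normalizing after each round would give the same classifiers.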
10
Boosting
11
Boosting
12
Boosting
13
Boosting
10 predictors.
The weak classifier is a stump: a tree with a single split (two terminal nodes).
14
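Boosting
Boosting can be seen as fitting an additive model, with the general form:
$f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$
Expansion coefficients: $\beta_m$.
Basis functions: $b(x; \gamma)$, simple functions of the feature x, with parameters γ.
Examples of basis functions and their γ:
A sigmoidal function in neural networks (γ parameterizes the linear combination of inputs);
A split in a tree model (γ gives the split variable and split point).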
15
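Boosting
In general, such functions are fit by minimizing a loss function over the training data:
$\min_{\{\beta_m, \gamma_m\}_{1}^{M}} \sum_{i=1}^{N} L\left(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\right)$
This could be computationally intensive.
An alternative is to go stepwise, fitting a sub-problem of a single basis function:
$\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma))$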
16
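Boosting
Forward stagewise additive modeling: add new basis functions without adjusting previously added ones.
At step m, with $f_{m-1}$ the model built so far:
$(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))$, then $f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$
Example: with squared error loss,
$L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma)) = (y_i - f_{m-1}(x_i) - \beta b(x_i; \gamma))^2 = (r_{im} - \beta b(x_i; \gamma))^2$
where $r_{im} = y_i - f_{m-1}(x_i)$ is the residual of the current model, so each step fits the new basis function to the current residuals.
* Squared loss function is not good for classification.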
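Forward stagewise fitting under squared error loss can be sketched in Python as repeatedly fitting one basis function to the current residuals. The sketch below assumes a one-dimensional regression problem and uses a two-piece step function as the basis function; the function names and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def best_step_basis(x, r):
    """Least squares fit of a two-piece step function (threshold, left mean, right mean) to residuals r."""
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:            # drop the largest value so the right side is never empty
        left = x <= t
        c_left, c_right = r[left].mean(), r[~left].mean()
        sse = np.sum((r - np.where(left, c_left, c_right)) ** 2)
        if sse < best_sse:
            best_sse, best = sse, (t, c_left, c_right)
    return best

def forward_stagewise(x, y, n_steps=30):
    f = np.zeros(len(y))                   # current model f_{m-1} evaluated at the training points
    terms, mse_history = [], [np.mean(y ** 2)]
    for _ in range(n_steps):
        r = y - f                          # residuals of the current model
        t, c_left, c_right = best_step_basis(x, r)
        f = f + np.where(x <= t, c_left, c_right)   # add the new term; earlier terms stay fixed
        terms.append((t, c_left, c_right))
        mse_history.append(np.mean((y - f) ** 2))
    return terms, np.array(mse_history)

# Synthetic data (an assumption for the demo).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, size=150))
y = np.sin(3 * x) + 0.1 * rng.normal(size=150)
terms, mse_history = forward_stagewise(x, y)
```

The training error never increases from one step to the next, because the zero function is always among the candidate steps.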
17
Boosting
The version of AdaBoost we discussed uses the exponential loss function:
$L(y, f(x)) = \exp(-y f(x))$
The basis functions are the individual weak classifiers.
18
Boosting
Margin: y*f(x). A positive margin means a correct classification; a negative margin means an incorrect one.
The goal of classification is to produce positive margins as often as possible, so negative margins should be penalized more. The exponential loss penalizes negative margins more heavily.
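A small numeric comparison may help here. The sketch below evaluates three losses on a handful of margins, assuming labels in {-1, +1} so that squared error can be written as a function of the margin; the variable names are illustrative only.

```python
import numpy as np

margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # values of y * f(x)

exp_loss = np.exp(-margins)                # exponential loss used by AdaBoost
zero_one = (margins < 0).astype(float)     # misclassification loss
sq_loss = (1.0 - margins) ** 2             # (y - f)^2 = (1 - y f)^2 when y is +1 or -1

for m, e, z, s in zip(margins, exp_loss, zero_one, sq_loss):
    print(f"margin {m:+.1f}: exponential {e:.3f}, 0-1 {z:.0f}, squared {s:.1f}")
```

The exponential loss decreases monotonically in the margin, while squared error rises again for margins above 1, penalizing confidently correct predictions. That is one way to see why squared loss is a poor fit for classification.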
19
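Boosting
To be solved at step m, where $f_{m-1}$ is the model after m-1 steps:
$(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp[-y_i (f_{m-1}(x_i) + \beta G(x_i))]$
$= \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp(-\beta y_i G(x_i))$
where $w_i^{(m)} = \exp(-y_i f_{m-1}(x_i))$ is independent of β and G.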
20
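Boosting
Observations are either correctly or incorrectly classified, so the target function to be minimized is:
$e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \neq G(x_i)} w_i^{(m)} = (e^{\beta} - e^{-\beta}) \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i)) + e^{-\beta} \sum_{i=1}^{N} w_i^{(m)}$
For any β > 0, $G_m$ has to satisfy:
$G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G(x_i))$
That is, $G_m$ is the classifier that minimizes the weighted error rate.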
21
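Boosting
Solving for $G_m$ gives the minimized weighted error rate:
$\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i^{(m)} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} w_i^{(m)}}$
Plug it back to get β:
$\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}$
Update the overall classifier by plugging these in:
$f_m(x) = f_{m-1}(x) + \beta_m G_m(x)$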
22
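Boosting
The weight for the next iteration becomes:
$w_i^{(m+1)} = w_i^{(m)} e^{-\beta_m y_i G_m(x_i)}$
Using $-y_i G_m(x_i) = 2 I(y_i \neq G_m(x_i)) - 1$:
$w_i^{(m+1)} = w_i^{(m)} e^{\alpha_m I(y_i \neq G_m(x_i))} e^{-\beta_m}$, with $\alpha_m = 2 \beta_m$
The factor $e^{-\beta_m}$ is independent of i, so it can be ignored.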
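Both results can be checked numerically. The sketch below uses a made-up set of four weights and labels (assumptions for illustration) to confirm that β = ½ log((1 − err)/err) minimizes the weighted exponential loss, and that the two forms of the weight update differ only by a factor that is the same for every observation.

```python
import numpy as np

# Hypothetical current weights, labels, and weak-classifier outputs for four observations.
w = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([1, -1, 1, -1])
G = np.array([-1, -1, 1, -1])              # only the first observation is misclassified
miss = y != G

def weighted_exp_loss(beta):
    # sum_i w_i * exp(-beta * y_i * G(x_i))
    return np.sum(w * np.exp(-beta * y * G))

err = np.sum(w * miss) / np.sum(w)
beta_closed = 0.5 * np.log((1 - err) / err)

# Brute-force minimization over a grid of beta values.
grid = np.linspace(0.01, 3.0, 3000)
beta_grid = grid[np.argmin([weighted_exp_loss(b) for b in grid])]

# Two forms of the weight update.
w_exp_form = w * np.exp(-beta_closed * y * G)
w_indicator_form = w * np.exp(2 * beta_closed * miss)
ratio = w_exp_form / w_indicator_form      # should be the constant exp(-beta) for every i
```

Since the ratio is constant across observations, dropping it does not change which weak classifier minimizes the weighted error at the next step.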
23
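Lasso
$\hat{\beta}^{lasso} = \arg\min_{\beta} \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\right)^2$ subject to $\sum_{j=1}^{p} |\beta_j| \le t$
The equivalent Lagrangian form:
$\hat{\beta}^{lasso} = \arg\min_{\beta} \left\{ \frac{1}{2} \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$
Ridge regression:
$\hat{\beta}^{ridge} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$
Elastic Net: the penalty term is replaced by
$\lambda \sum_{j=1}^{p} \left(\alpha \beta_j^2 + (1 - \alpha) |\beta_j|\right)$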
24
Lasso
25
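Lasso
With orthonormal columns of X, both estimators have closed forms, where $\hat{\beta}_j$ are the least squares estimates:
Lasso: $\mathrm{sign}(\hat{\beta}_j) (|\hat{\beta}_j| - \lambda)_+$
Ridge: $\hat{\beta}_j / (1 + \lambda)$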
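The orthonormal case is easy to try out. The following sketch assumes no intercept, the Lagrangian form with a ½ factor on the residual sum of squares for the lasso, and synthetic data; under those assumptions the lasso solution is a soft-thresholding of the least squares estimates and the ridge solution is a proportional shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(50, 4)))     # design matrix with orthonormal columns
beta_true = np.array([3.0, -2.0, 0.0, 0.5])       # made-up coefficients for the demo
y = X @ beta_true + 0.1 * rng.normal(size=50)

lam = 1.0
beta_ls = X.T @ y                                 # least squares estimates when X^T X = I
beta_lasso = np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam, 0.0)   # soft-thresholding
beta_ridge = beta_ls / (1.0 + lam)                # proportional shrinkage

def lasso_objective(b):
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

# Sanity check: random perturbations of the soft-thresholded solution never lower the objective.
perturbed = [lasso_objective(beta_lasso + 0.1 * rng.normal(size=4)) for _ in range(200)]
is_minimum = all(lasso_objective(beta_lasso) <= v + 1e-12 for v in perturbed)
```

Small least squares coefficients are set exactly to zero by the lasso, whereas ridge only scales every coefficient toward zero.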
26
Lasso
[Figure: error contours in parameter space. Panels: Lasso, Ridge.]
27
Boosted linear regression
{Tk} : a collection of basis functions
28
Boosted linear regression
Here the T’s are X’s themselves in a linear regression setting.
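One common way to boost a linear regression is forward stagewise fitting with a small step size, where each step nudges the coefficient of the single column that best fits the current residual. The sketch below is an assumption about what the slide refers to, not a reproduction of it; the function name, the step size `eps`, and the synthetic data are hypothetical.

```python
import numpy as np

def forward_stagewise_linear(X, y, eps=0.01, n_steps=2000):
    """Repeatedly move one coefficient by eps toward the best single-column fit of the residual."""
    col_norm2 = np.sum(X ** 2, axis=0)
    alpha = np.zeros(X.shape[1])
    r = y.astype(float).copy()             # residual of the current model
    for _ in range(n_steps):
        beta = X.T @ r / col_norm2         # least squares coefficient of r on each column alone
        k = np.argmax(beta ** 2 * col_norm2)   # column giving the largest drop in squared error
        step = eps * np.sign(beta[k])
        alpha[k] += step                   # only coefficient k changes in this step
        r -= step * X[:, k]
    return alpha

# Synthetic, centered data (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X = X - X.mean(axis=0)
y = X @ np.array([2.0, -1.0, 0.0]) + 0.1 * rng.normal(size=200)
y = y - y.mean()

alpha = forward_stagewise_linear(X, y)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Stopping early leaves the coefficients shrunken toward zero. Run long enough, the coefficients settle within a few step sizes of the least squares solution.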