Announcements….
What’s left in this class?
4/17 (today): trees, matrix factorization, … I’m lecturing (also: the last assignment, due in 2 weeks, is up)
4/22 (Monday): scalable tensors. Guest lecture by Evangelos Papalexakis (a student of Christos Faloutsos)
4/24 (Wed), 4/29 (Mon), 5/1 (Wed): project reports, in random order
  each project: 9 min + 2 min for questions
  submit slides by noon before your presentation
  we understand about “future/ongoing work” at this point
  it’s fine if not everyone in the group speaks, but make sure your partner’s talk is good
5/3 (Fri): project report due. I am extending this to 9am Tuesday, May 7.
Gradient Boosting and Decision Trees
(non-stochastic) Gradient Descent
Suppose you use m iterations of gradient descent to learn parameters θm: then θm = θ0 + Δ1 + Δ2 + … + Δm, where Δ1 = -η∇L(θ0) is the first gradient step and Δm = -η∇L(θm-1) is the m-th gradient step.
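As a minimal sketch of this picture (my own illustration, not from the slides), the snippet below runs gradient descent on a made-up quadratic loss and checks that the final parameters are just the initial parameters plus the sum of the individual gradient steps; the target point, step size eta, and iteration count m are arbitrary choices.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * ||theta - target||^2 (illustrative only).
target = np.array([3.0, -2.0])   # hypothetical optimum
theta0 = np.zeros(2)             # initial parameters theta_0
eta, m = 0.1, 50                 # step size and number of iterations

theta = theta0.copy()
steps = []                       # Delta_1, ..., Delta_m
for _ in range(m):
    grad = theta - target        # gradient of the quadratic loss at theta
    delta = -eta * grad          # one gradient step
    steps.append(delta)
    theta = theta + delta

# theta_m is theta_0 plus the accumulated gradient steps
assert np.allclose(theta, theta0 + np.sum(steps, axis=0))
```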
Functional Gradient Descent
Instead, let’s define the learned function as a sum of functions, Ψm = Ψ0 + Δ1 + … + Δm, where each Δ is how we want the function to change.
Functional Gradient Descent
Each step Δm ≅ ηm times the functional gradient, i.e., how we want the function to change. We can find the desired change at each example: the gradient of log P(yi|xi; Ψm-1) is yi - P(Y|xi; Ψm-1).
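For concreteness, here is the one-line derivation behind that claim, written out as a sketch: it assumes a binary label yi in {0,1} and a logistic link P(Y=1|xi; Ψ) = σ(Ψ(xi)), which the slide does not state explicitly.

```latex
% Per-example functional gradient of the log likelihood, assuming
% y_i \in \{0,1\} and P(Y{=}1 \mid x_i;\Psi) = \sigma(\Psi(x_i)).
\frac{\partial}{\partial \Psi(x_i)} \log P(y_i \mid x_i;\Psi)
  = \frac{\partial}{\partial \Psi(x_i)}
      \Bigl[\, y_i \log \sigma(\Psi(x_i)) + (1-y_i)\log\bigl(1-\sigma(\Psi(x_i))\bigr) \Bigr]
  = y_i - \sigma(\Psi(x_i))
  = y_i - P(Y{=}1 \mid x_i;\Psi).
```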
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change. We could also define the desired change at each example: the gradient of log P(yi|xi; Ψm-1) is yi - P(Y|xi; Ψm-1). Put this together: we want to find a function Δm, and we know what value we’d like it to have on a bunch of examples… so…? Learn the next gradient-step function Δm.
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change; the desired change at each example is the gradient of log P(yi|xi; Ψm-1), i.e., yi - P(Y|xi; Ψm-1). Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1), plus a line search to find η. I.e., the examples are (xi, ỹi) where ỹi = yi - P(Y|xi; Ψm-1).
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change; the desired change at each example is the gradient of log P(yi|xi; Ψm-1), i.e., yi - P(Y|xi; Ψm-1). Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1). I.e., we’re fitting regression trees to residuals.
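A minimal sketch of that recipe in Python (my own illustration, assuming binary labels in {0,1} and a sigmoid link; a fixed learning rate eta stands in for the line search mentioned above, and all parameter values are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def gradient_boost(X, y, n_rounds=100, eta=0.1, max_depth=3):
    """Fit regression trees to residuals y_i - P(Y|x_i; Psi_{m-1})."""
    f = np.zeros(len(y))                  # Psi_0: score 0 for every example
    trees = []
    for _ in range(n_rounds):
        residual = y - sigmoid(f)         # desired change at each example
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        f = f + eta * tree.predict(X)     # fixed step instead of a line search
    return trees

def predict_proba(trees, X, eta=0.1):
    f = sum(eta * t.predict(X) for t in trees)
    return sigmoid(f)
```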
Gradient Boosting Algorithm
Note: not the same as Schapire & Freund’s boosting algorithm, AdaBoost.
The end result is a sum of many regression trees.
Advantages: all the advantages of regression trees (combinations of features, indifference to the scale of numeric values, …), plus flexibility about the loss function.
Disadvantages: the sequential nature of the boosting algorithm.
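For comparison, the same idea is available off the shelf; the snippet below is a usage sketch with scikit-learn’s GradientBoostingClassifier (the dataset and hyperparameters are invented for illustration), and it shows that the fitted model really is a collection of many small regression trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative data and hyperparameters, not from the lecture.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbm.fit(X, y)

print(gbm.estimators_.shape)   # (200, 1): one small DecisionTreeRegressor per round
print(gbm.score(X, y))         # training accuracy
```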
Functional Gradient Descent
More generally, this can be the loss of the previous classifier.
Gradient boosting with arbitrary loss
Gradient boosting with square loss
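A hedged sketch of the square-loss case (my own code, not the slide’s): with L = ½(yi - F(xi))², the negative functional gradient is the plain residual yi - Fm-1(xi), so each round just fits a regression tree to those residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_squared(X, y, n_rounds=100, eta=0.1, max_depth=3):
    """Gradient boosting for regression with square loss (illustrative)."""
    f0 = float(np.mean(y))                # F_0: predict the mean target
    f = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - f                  # negative gradient of the square loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        f = f + eta * tree.predict(X)
    return f0, trees

def predict(f0, trees, X, eta=0.1):
    return f0 + sum(eta * t.predict(X) for t in trees)
```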
Gradient boosting with log loss
Line search: a heuristically-sized step for each region of the learned tree.
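One common way to realize that per-region step is a one-step Newton update per leaf (Friedman-style); the sketch below is my assumption about what “heuristically-sized” means here, not necessarily the exact rule from the slide. Given residuals ri = yi - pi, each leaf R gets its own step gammaR = (sum of ri over R) / (sum of pi(1 - pi) over R).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def leaf_steps(tree, X, residual, p):
    """Per-leaf step sizes for log loss (one Newton step per tree region)."""
    leaf_of = tree.apply(X)                       # leaf index for each example
    gamma = {}
    for leaf in np.unique(leaf_of):
        in_leaf = leaf_of == leaf
        num = residual[in_leaf].sum()
        den = (p[in_leaf] * (1.0 - p[in_leaf])).sum() + 1e-12
        gamma[leaf] = num / den
    return gamma                                  # map: leaf id -> step size

# Inside one boosting round (f is the current score vector):
#   p = 1.0 / (1.0 + np.exp(-f)); residual = y - p
#   tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
#   gamma = leaf_steps(tree, X, residual, p)
#   f = f + np.array([gamma[leaf] for leaf in tree.apply(X)])
```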
Gradient boosting with log loss
ỹi = yi - P(Y|xi; Fm-1)
ỹi = yi - P(Y|xi; Fm-1); the step size is computed with a line search (e.g.)
Pr(Z|x) = F(x), Pr(Y|z,w) = G(w)
With z fixed: Pr(Z|x) = F(x), Pr(Y|z,w) = G(w)
Bagging regression trees using a learning-to-rank loss function… (SIGIR 2011)