Announcements….
What’s left in this class?
4/17 (today): trees, matrix factorization, … I’m lecturing (also: the last assignment, due in 2 weeks, is up)
4/22 (Monday): scalable tensors. Guest lecture by Evangelos Papalexakis (a student of Christos Faloutsos)
4/24 (Wed), 4/29 (Mon), 5/1 (Wed): project reports, in random order
  each project: 9 min + 2 min for questions
  submit slides by noon before your presentation
  we understand about “future/ongoing work” at this point
  it’s fine if not everyone in the group speaks, but make sure your partner’s talk is good
5/3 (Fri): project report due. I am extending this to 9am Tuesday, May 7.
Gradient Boosting and Decision Trees
(non-stochastic) Gradient Descent
Suppose you use m iterations of gradient descent to learn parameters θm: then θm = θ0 + Δ1 + Δ2 + … + Δm, where Δ1 = -η∇L(θ0) is the first gradient step and Δm = -η∇L(θm-1) is the m-th gradient step.
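As a minimal sketch of this picture (my own illustration, not from the slides), the snippet below runs gradient descent on a made-up quadratic loss and checks that the final parameters are just the initial parameters plus the sum of the individual gradient steps; the target point, step size eta, and iteration count m are arbitrary choices.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * ||theta - target||^2 (illustrative only).
target = np.array([3.0, -2.0])   # hypothetical optimum
theta0 = np.zeros(2)             # initial parameters theta_0
eta, m = 0.1, 50                 # step size and number of iterations

theta = theta0.copy()
steps = []                       # Delta_1, ..., Delta_m
for _ in range(m):
    grad = theta - target        # gradient of the quadratic loss at theta
    delta = -eta * grad          # one gradient step
    steps.append(delta)
    theta = theta + delta

# theta_m is theta_0 plus the accumulated gradient steps
assert np.allclose(theta, theta0 + np.sum(steps, axis=0))
```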
Functional Gradient Descent
Instead, let’s define the learned function as a sum of functions, Ψm = Ψ0 + Δ1 + … + Δm, where each Δ is how we want the function to change.
Functional Gradient Descent
Each step Δm ≅ ηm times the functional gradient, i.e., how we want the function to change. We can find the desired change at each example: the gradient of log P(yi|xi; Ψm-1) is yi - P(Y|xi; Ψm-1).
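For concreteness, here is the one-line derivation behind that claim, written out as a sketch: it assumes a binary label yi in {0,1} and a logistic link P(Y=1|xi; Ψ) = σ(Ψ(xi)), which the slide does not state explicitly.

```latex
% Per-example functional gradient of the log likelihood, assuming
% y_i \in \{0,1\} and P(Y{=}1 \mid x_i;\Psi) = \sigma(\Psi(x_i)).
\frac{\partial}{\partial \Psi(x_i)} \log P(y_i \mid x_i;\Psi)
  = \frac{\partial}{\partial \Psi(x_i)}
      \Bigl[\, y_i \log \sigma(\Psi(x_i)) + (1-y_i)\log\bigl(1-\sigma(\Psi(x_i))\bigr) \Bigr]
  = y_i - \sigma(\Psi(x_i))
  = y_i - P(Y{=}1 \mid x_i;\Psi).
```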
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change. We could also define the desired change at each example: the gradient of log P(yi|xi; Ψm-1) is yi - P(Y|xi; Ψm-1). Put this together: we want to find a function Δm, and we know what value we’d like it to have on a bunch of examples… so…? Learn the next gradient-step function Δm.
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change; the desired change at each example is the gradient of log P(yi|xi; Ψm-1), i.e., yi - P(Y|xi; Ψm-1). Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1), plus a line search to find η. I.e., the examples are (xi, ỹi) where ỹi = yi - P(Y|xi; Ψm-1).
Functional Gradient Descent
Instead, let’s define a sum of functions, where each functional gradient Δm is how we want the function to change; the desired change at each example is the gradient of log P(yi|xi; Ψm-1), i.e., yi - P(Y|xi; Ψm-1). Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1). I.e., we’re fitting regression trees to residuals.
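A minimal sketch of that recipe in Python (my own illustration, assuming binary labels in {0,1} and a sigmoid link; a fixed learning rate eta stands in for the line search mentioned above, and all parameter values are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def gradient_boost(X, y, n_rounds=100, eta=0.1, max_depth=3):
    """Fit regression trees to residuals y_i - P(Y|x_i; Psi_{m-1})."""
    f = np.zeros(len(y))                  # Psi_0: score 0 for every example
    trees = []
    for _ in range(n_rounds):
        residual = y - sigmoid(f)         # desired change at each example
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        f = f + eta * tree.predict(X)     # fixed step instead of a line search
    return trees

def predict_proba(trees, X, eta=0.1):
    f = sum(eta * t.predict(X) for t in trees)
    return sigmoid(f)
```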
Gradient Boosting Algorithm
Note: not the same as Schapire & Freund’s boosting algorithm, AdaBoost.
The end result is a sum of many regression trees.
Advantages: all the advantages of regression trees (combinations of features, indifference to the scale of numeric values, …), plus flexibility about the loss function.
Disadvantages: the sequential nature of the boosting algorithm.
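For comparison, the same idea is available off the shelf; the snippet below is a usage sketch with scikit-learn’s GradientBoostingClassifier (the dataset and hyperparameters are invented for illustration), and it shows that the fitted model really is a collection of many small regression trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative data and hyperparameters, not from the lecture.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbm.fit(X, y)

print(gbm.estimators_.shape)   # (200, 1): one small DecisionTreeRegressor per round
print(gbm.score(X, y))         # training accuracy
```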
Functional Gradient Descent
More generally, this can be the loss of the previous classifier.
Gradient boosting with arbitrary loss
Gradient boosting with square loss
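A hedged sketch of the square-loss case (my own code, not the slide’s): with L = ½(yi - F(xi))², the negative functional gradient is the plain residual yi - Fm-1(xi), so each round just fits a regression tree to those residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_squared(X, y, n_rounds=100, eta=0.1, max_depth=3):
    """Gradient boosting for regression with square loss (illustrative)."""
    f0 = float(np.mean(y))                # F_0: predict the mean target
    f = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - f                  # negative gradient of the square loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        trees.append(tree)
        f = f + eta * tree.predict(X)
    return f0, trees

def predict(f0, trees, X, eta=0.1):
    return f0 + sum(eta * t.predict(X) for t in trees)
```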
Gradient boosting with log loss
Line search: a heuristically-sized step for each region of the learned tree.
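One common way to realize that per-region step is a one-step Newton update per leaf (Friedman-style); the sketch below is my assumption about what “heuristically-sized” means here, not necessarily the exact rule from the slide. Given residuals ri = yi - pi, each leaf R gets its own step gammaR = (sum of ri over R) / (sum of pi(1 - pi) over R).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def leaf_steps(tree, X, residual, p):
    """Per-leaf step sizes for log loss (one Newton step per tree region)."""
    leaf_of = tree.apply(X)                       # leaf index for each example
    gamma = {}
    for leaf in np.unique(leaf_of):
        in_leaf = leaf_of == leaf
        num = residual[in_leaf].sum()
        den = (p[in_leaf] * (1.0 - p[in_leaf])).sum() + 1e-12
        gamma[leaf] = num / den
    return gamma                                  # map: leaf id -> step size

# Inside one boosting round (f is the current score vector):
#   p = 1.0 / (1.0 + np.exp(-f)); residual = y - p
#   tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
#   gamma = leaf_steps(tree, X, residual, p)
#   f = f + np.array([gamma[leaf] for leaf in tree.apply(X)])
```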
Gradient boosting with log loss
ỹi = yi - P(Y|xi; Fm-1)
ỹi = yi - P(Y|xi; Fm-1); the step size is computed with a line search (e.g.)
Pr(Z|x) = F(x), Pr(Y|z,w) = G(w)
With z fixed: Pr(Z|x) = F(x), Pr(Y|z,w) = G(w)
Bagging regression trees using a learning-to-rank loss function… (SIGIR 2011)