
1 Announcements….

2 What’s left in this class?
4/17 (today): trees, matrix factorization, … (I'm lecturing; also, the last assignment is up, due in two weeks)
4/22 (Monday): scalable tensors. Guest lecture by Evangelos Papalexakis (a student of Christos Faloutsos)
4/24 (Wed), 4/29 (Mon), 5/1 (Wed): project reports, in random order
  each project: 9 min + 2 min for questions
  submit slides by noon before your presentation
  we understand about "future/ongoing work" at this point
  it's fine if not everyone in the group speaks, but make sure your partner's talk is good
5/3 (Fri): project report due. I am extending this to 9am Tuesday, May 7.

3 Gradient Boosting and Decision Trees

4 (non-stochastic) Gradient Descent
Suppose you use m iterations of gradient descent to learn parameters θm. Then θm is the starting point plus the accumulated steps:

θm = θ0 - η ∇L(θ0) - η ∇L(θ1) - … - η ∇L(θm-1)

where -η ∇L(θ0) is the first gradient step and -η ∇L(θm-1) is the m-th gradient step.
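A minimal sketch of this view of ordinary (non-stochastic) gradient descent, where the final parameters are just the starting point plus a sum of m steps. The function and parameter names here are illustrative, not from the lecture, and grad_loss stands for any differentiable loss gradient:

```python
import numpy as np

def gradient_descent(theta0, grad_loss, eta=0.1, m=100):
    """Plain gradient descent: theta_m = theta_0 plus m accumulated gradient steps."""
    theta = theta0.copy()
    for _ in range(m):
        step = -eta * grad_loss(theta)   # one gradient step
        theta = theta + step             # accumulate it into the parameters
    return theta

# toy usage: minimize ||theta - 3||^2, whose gradient is 2 * (theta - 3)
theta_m = gradient_descent(np.zeros(2), lambda t: 2 * (t - 3.0))
```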

5 Functional Gradient Descent
Instead of a sum of parameter updates, let's define the learned model as a sum of functions:

Ψm = Ψ0 + Δ1 + Δ2 + … + Δm

where each Δ is a functional gradient step: how we want the function to change.

6 Functional Gradient Descent
The functional gradient is how we want the function to change (≅ a step of size ηm). We can find the desired change at each example: the gradient of log P(yi|xi; Ψm-1) with respect to the function's output, which works out to the residual yi - P(Y|xi; Ψm-1).
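A quick check of that last step (standard calculus under a sigmoid link, not taken verbatim from the slide): writing pi = P(Y=1|xi; Ψm-1) = σ(Ψm-1(xi)) for yi ∈ {0,1},

$$
\frac{\partial}{\partial \Psi_{m-1}(x_i)} \log P(y_i \mid x_i; \Psi_{m-1})
= \frac{\partial}{\partial \Psi_{m-1}(x_i)} \Big[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\Big]
= y_i - p_i .
$$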

7 Functional Gradient Descent
Put this together: we want to find a function Δm, and we know what value we'd like it to have on a bunch of examples… so….? Learn the next gradient-step function Δm from those examples, where the desired change at example i is the gradient of log P(yi|xi; Ψm-1), i.e. the residual yi - P(Y|xi; Ψm-1).

8 Functional Gradient Descent
Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1), plus a line search to find η. I.e., the training examples are (xi, ỹi) where ỹi = yi - P(Y|xi; Ψm-1).

9 Functional Gradient Descent
Learn the next gradient-step function Δm using a regression tree trained against the target value yi - P(Y|xi; Ψm-1). I.e., we're fitting regression trees to the residuals.
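A minimal sketch of this loop for binary labels, assuming scikit-learn regression trees, a sigmoid link, and a fixed step size eta standing in for the line search. The class and parameter names are illustrative, not from the lecture:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def boost(X, y, rounds=100, eta=0.1, depth=3):
    """Gradient boosting sketch: each round fits a regression tree to the
    residuals y_i - P(Y | x_i; Psi_{m-1}) and adds it to the ensemble."""
    y = np.asarray(y, dtype=float)
    F = np.zeros(len(y))                          # Psi_0: start from the zero function
    trees = []
    for m in range(rounds):
        residual = y - sigmoid(F)                 # desired change at each example
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        F = F + eta * tree.predict(X)             # gradient step in function space
        trees.append(tree)
    return trees

def predict_proba(trees, X, eta=0.1):             # eta must match the training step size
    F = sum(eta * t.predict(X) for t in trees)
    return sigmoid(F)
```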

10 Gradient Boosting Algorithm
Note: this is not the same as Schapire & Freund's boosting algorithm, AdaBoost. The end result is a sum of many regression trees.
Advantages: all the advantages of regression trees (combinations of features, indifference to the scale of numeric values, …), plus flexibility in the choice of loss function.
Disadvantages: the sequential nature of the boosting algorithm.

11 Functional Gradient Descent
More generally, this can be the loss of the previous classifier.

12 Gradient boosting with arbitrary loss
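The slide itself is an image; as a reference, the standard statement of gradient boosting for an arbitrary differentiable loss L (following Friedman's formulation, not necessarily the exact notation on the slide) is:

$$
\begin{aligned}
&F_0(x) = \arg\min_{\gamma} \textstyle\sum_i L(y_i, \gamma) \\
&\text{for } m = 1, \dots, M: \\
&\quad \tilde y_i = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}} \\
&\quad \text{fit a regression tree } \Delta_m \text{ to the pairs } (x_i, \tilde y_i) \\
&\quad \eta_m = \arg\min_{\eta} \textstyle\sum_i L\big(y_i,\, F_{m-1}(x_i) + \eta\, \Delta_m(x_i)\big) \quad \text{(line search)} \\
&\quad F_m = F_{m-1} + \eta_m \Delta_m
\end{aligned}
$$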

13 Gradient boosting with square loss
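For square loss the negative gradient is exactly the residual, which is why the regression-tree targets look like residuals. A one-line check (standard calculus, not taken from the slide):

$$
L\big(y_i, F(x_i)\big) = \tfrac{1}{2}\big(y_i - F(x_i)\big)^2
\quad\Rightarrow\quad
-\frac{\partial L}{\partial F(x_i)} = y_i - F(x_i).
$$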

14 Gradient boosting with log loss
Instead of a single line search, use a heuristically-sized step for each region (leaf) of the learned tree.
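One common way to size the per-region step for log loss is a Newton-style value per leaf (as in Friedman's TreeBoost); this is shown as an illustration of "heuristically-sized step per region," not necessarily the exact heuristic on the slide. Writing pi = P(Y|xi; Fm-1) and letting Rj be the leaves of the tree fit to the residuals:

$$
\gamma_j = \frac{\sum_{i \in R_j} (y_i - p_i)}{\sum_{i \in R_j} p_i (1 - p_i)},
\qquad
F_m(x) = F_{m-1}(x) + \sum_j \gamma_j \,\mathbb{1}[x \in R_j].
$$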

15 Gradient boosting with log loss

16–22 (image-only slides; no transcript text)

23 ỹi = yi - P(Y|xi; Fm-1)

24 ỹi = yi - P(Y|xi; Fm-1); the step size η is computed with a line search (e.g.)

25–30 (image-only slides; no transcript text)

31 Pr(Z|x) = F(x),  Pr(Y|z,w) = G(w)

32 With z fixed: Pr(Z|x) = F(x),  Pr(Y|z,w) = G(w)

33–34 (image-only slides; no transcript text)

35 Bagging regression trees using a learning-to-rank loss function (SIGIR 2011)

36 (image-only slide; no transcript text)

