Slide 1: A fast algorithm for learning large scale preference relations. Vikas C. Raykar and Ramani Duraiswami, University of Maryland, College Park; Balaji Krishnapuram, Siemens Medical Solutions USA. AISTATS 2007.
Slide 2: Learning. Many learning tasks can be viewed as function estimation.
Slide 3: Learning from examples. (Figure: training data fed to a learning algorithm.) Not all supervised learning procedures fit the standard classification/regression framework. This talk is mainly concerned with ranking/ordering.
Slide 4: Ranking/Ordering. For some applications ordering is more important. Example 1: information retrieval. Sort in the order of relevance.
Slide 5: Ranking/Ordering. Example 2: recommender systems. Sort in the order of preference.
Slide 6: Ranking/Ordering. Example 3: medical decision making. Decide among different treatment options.
Slide 7: Plan of the talk. Ranking formulation; Algorithm; Fast algorithm; Results.
Slide 8: Preference relations. Given a preference relation we can order/rank a set of instances. Goal: learn a preference relation. Training data: a set of pairwise preferences.
Slide 9: Ranking function. Goal: learn a preference relation. New goal: learn a ranking function, which provides a numerical score (it is not unique). Why not use a classifier/ordinal regressor as the ranking function?
Slide 10: Why is ranking different? (Figure: the learning algorithm trains on pairwise preference relations, and the loss is measured by pairwise disagreements.)
Slide 11: Training data, more formally. From these two we can derive a set of pairwise preference relations.
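The derivation of pairwise preferences from labeled data can be sketched as follows (a minimal illustration; the function name and the convention that a larger label means "preferred" are assumptions, not from the slides):

```python
def preference_pairs(labels):
    # Derive pairwise preference relations from ordinal labels.
    # A pair (i, j) means instance i should be ranked above instance j.
    return [(i, j)
            for i, yi in enumerate(labels)
            for j, yj in enumerate(labels)
            if yi > yj]
```

Each pair becomes one training constraint, so the number of constraints grows quadratically with the number of instances.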
Slide 12: Loss function: the generalized Wilcoxon-Mann-Whitney (WMW) statistic. Minimize the fraction of pairwise disagreements, i.e., maximize the fraction of pairwise agreements: WMW = (total # of pairwise agreements) / (total # of pairwise preference relations).
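The WMW statistic as defined above can be computed directly (a minimal sketch; the pair convention "(i, j) means i should rank above j" is an assumption):

```python
def wmw_statistic(scores, prefs):
    # Generalized WMW: fraction of pairwise preference relations (i, j),
    # meaning i should rank above j, on which the scores agree.
    agree = sum(1 for i, j in prefs if scores[i] > scores[j])
    return agree / len(prefs)
```

A perfect ranking gives 1.0, a fully reversed one gives 0.0.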
Slide 13: Consider a two-class problem. (Figure: positive and negative instances.)
Slide 14: Function class: linear ranking functions. Different algorithms use different function classes: RankNet uses a neural network, RankSVM an RKHS, RankBoost boosted decision stumps.
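A linear ranking function simply scores each instance with an inner product and sorts by the score (a minimal sketch; the function name is an assumption):

```python
def linear_rank(w, X):
    # Score each instance with f(x) = w . x and return the scores
    # together with the indices sorted in decreasing order of score.
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    order = sorted(range(len(X)), key=lambda i: -scores[i])
    return scores, order
```

Note the non-uniqueness mentioned on slide 9: any strictly increasing transform of the scores induces the same ordering.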
Slide 15: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Ideal loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
Fast algorithm
Results
Slide 16: The likelihood. Maximizing the WMW directly is a discrete optimization problem; instead use a sigmoid likelihood [Burges et al.]. Assumption: every pair is drawn independently. Choose w to maximize the log-likelihood.
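Under the independence assumption, the log-likelihood is a sum of log-sigmoids over the preferred pairs. A minimal sketch for a linear model (the function name is an assumption):

```python
import math

def pairwise_log_likelihood(w, X, prefs):
    # log L(w) = sum over preferred pairs (i, j) of log sigmoid(w . (x_i - x_j))
    ll = 0.0
    for i, j in prefs:
        z = sum(wk * (a - b) for wk, a, b in zip(w, X[i], X[j]))
        # log sigmoid(z) = min(z, 0) - log(1 + exp(-|z|)), computed stably
        ll += min(z, 0.0) - math.log1p(math.exp(-abs(z)))
    return ll
```

The stable form avoids overflow in exp for large negative z.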
Slide 17: The MAP estimator.
Slide 18: Another interpretation. What we want to maximize: the 0-1 indicator function. What we actually maximize: the log-sigmoid. The log-sigmoid is a lower bound for the indicator function.
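One common way to make the lower-bound statement concrete uses base-2 logs (the exact scaling constant is an assumption; the slide states only the qualitative bound): 1 + log2 sigmoid(z) <= indicator(z > 0) for all z, with equality at z = 0.

```python
import math

def log2_sigmoid(z):
    # log2(sigmoid(z)), computed stably
    return (min(z, 0.0) - math.log1p(math.exp(-abs(z)))) / math.log(2.0)

def indicator(z):
    # 0-1 indicator of a correctly ordered pair
    return 1.0 if z > 0 else 0.0
```

Since sigmoid(0) = 1/2, the bound touches the indicator exactly at z = 0 and falls away on either side.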
Slide 19: Lower bounding the WMW. Log-likelihood <= WMW.
Slide 20: Gradient-based learning. Use a nonlinear conjugate-gradient algorithm. It requires only gradient evaluations: no function evaluations and no second derivatives. The gradient is given by the expression on the slide.
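For the linear model, the gradient of the pairwise log-likelihood follows from d/dz log sigmoid(z) = 1 - sigmoid(z). A minimal sketch (function names are assumptions):

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood_gradient(w, X, prefs):
    # d/dw of sum log sigmoid(w . (x_i - x_j))
    #   = sum (1 - sigmoid(z_ij)) * (x_i - x_j),  with z_ij = w . (x_i - x_j)
    g = [0.0] * len(w)
    for i, j in prefs:
        d = [a - b for a, b in zip(X[i], X[j])]
        z = sum(wk * dk for wk, dk in zip(w, d))
        c = 1.0 - sigmoid(z)
        for k, dk in enumerate(d):
            g[k] += c * dk
    return g
```

Since the sum runs over all preference pairs, a direct evaluation is quadratic in the number of instances, which is exactly what the fast algorithm of the later slides attacks.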
Slide 21: RankNet. Pairwise preference relations; cross-entropy loss; neural net; trained by backpropagation.
Slide 22: RankSVM. Pairwise preference relations; pairwise disagreements; RKHS; SVM.
Slide 23: RankBoost. Pairwise preference relations; pairwise disagreements; decision stumps; boosting.
Slide 24: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on the WMW
– Use conjugate gradient
– Quadratic complexity
Fast algorithm
Results
Slide 25: Key idea. Use an approximate gradient that can be computed in linear time. The method converges to the same solution, though it may require a few more iterations.
Slide 26: Core computational primitive: weighted summation of erfc functions.
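The primitive can be stated as evaluating G(y_i) = sum_j q_j erfc(y_i - x_j) at M points y given N sources x. The direct evaluation is O(MN) (a minimal sketch; any scaling of the erfc argument is folded into x and y here):

```python
import math

def weighted_erfc_sum(x, q, y):
    # Direct O(M*N) evaluation of G(y_i) = sum_j q_j * erfc(y_i - x_j).
    return [sum(qj * math.erfc(yi - xj) for qj, xj in zip(q, x))
            for yi in y]
```

This quadratic cost is what the series-expansion tricks on the next slides reduce to linear time.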
Slide 27: Notion of approximation.
Slide 28: Example.
Slide 29: 1. Beaulieu's series expansion. Retain only the first few terms contributing to the desired accuracy; derive error bounds to choose the number of terms.
Slide 30: 2. Error bounds.
Slide 31: 3. Use the truncated series.
Slide 32: 4. Regrouping. The regrouped sums do not depend on y and can be computed in O(pN); once A and B are precomputed, evaluation at all M points costs O(pM). The total cost is reduced from O(MN) to O(p(M+N)).
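The regrouping can be sketched under a truncated Fourier-type expansion of erfc (the specific series form and the default h and p below are assumptions, chosen so the expansion is accurate when every difference y_i - x_j stays well inside (-pi/(2h), pi/(2h))):

```python
import math

def fast_erfc_sum(x, q, y, h=0.3, p=8):
    # Truncated expansion (assumed form):
    #   erfc(z) ~ 1 - (4/pi) * sum_{n odd, n <= 2p-1} exp(-n^2 h^2)/n * sin(2 n h z)
    # With z = y_i - x_j, the angle-addition identity
    #   sin(2nh(y - x)) = sin(2nh y) cos(2nh x) - cos(2nh y) sin(2nh x)
    # lets all x-dependent sums be precomputed once.
    odd = [2 * k - 1 for k in range(1, p + 1)]
    Q = sum(q)
    # A_n and B_n do not depend on y: O(pN) precomputation.
    A = [sum(qj * math.cos(2 * n * h * xj) for qj, xj in zip(q, x)) for n in odd]
    B = [sum(qj * math.sin(2 * n * h * xj) for qj, xj in zip(q, x)) for n in odd]
    out = []
    for yi in y:  # O(pM) evaluation once A and B are precomputed
        s = 0.0
        for n, An, Bn in zip(odd, A, B):
            c = math.exp(-n * n * h * h) / n
            s += c * (math.sin(2 * n * h * yi) * An - math.cos(2 * n * h * yi) * Bn)
        out.append(Q - 4.0 / math.pi * s)
    return out
```

The total cost is O(p(N + M)) instead of O(NM), at the price of a controlled approximation error set by h and p.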
Slide 33: 5. Other tricks. Rapid saturation of the erfc function; space subdivision; choosing the parameters to achieve the error bound. See the technical report.
Slide 34: Numerical experiments.
Slide 35: Precision vs. speedup.
Slide 36: Plan of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on the WMW
– Use conjugate gradient
– Quadratic complexity
Fast algorithm
– Use a fast approximate gradient
– Fast summation of erfc functions
Results
Slide 37: Datasets. 12 public benchmark datasets; five-fold cross-validation experiments; CG tolerance 1e-3; accuracy for the gradient computation 1e-6.
Slide 38: Direct vs. fast: WMW statistic.
Dataset 1: Direct 0.536, Fast 0.534
Dataset 2: 0.917
Dataset 3: 0.623
Dataset 4*: 0.979
WMW is similar for both the exact and the fast approximate version.
Slide 39: Direct vs. fast: time taken.
Dataset   Direct       Fast
1         1736 secs    2 secs
2         6731 secs    19 secs
3         2557 secs    4 secs
4*                     47 secs
Slide 40: Effect of gradient approximation.
Slide 41: Comparison with other methods: RankNet (neural network), RankSVM (SVM), RankBoost (boosting).
Slide 42: Comparison with other methods. WMW is similar for all the methods. The proposed method is faster than all the others; the next best time is achieved by RankBoost. Only the proposed method can handle large datasets.
Slide 43: Sample result. Dataset 8 (N=950, d=10, S=5).
Method              Time taken (secs)   WMW
RankNCG direct      333                 0.984
RankNCG fast        3                   0.984
RankNet linear      1264                0.951
RankNet two layer   2464                0.765
RankSVM linear      34                  0.984
RankSVM quadratic   1332                0.996
RankBoost           6                   0.958
Slide 44: Sample result. Dataset 11 (N=4177, d=9, S=3).
Method              Time taken (secs)   WMW
RankNCG direct      1736                0.536
RankNCG fast        2                   0.534
RankNet linear
RankNet two layer
RankSVM linear
RankSVM quadratic
RankBoost           63                  0.535
Slide 45: Application to collaborative filtering. Predict movie ratings for a user based on the ratings provided by other users. MovieLens dataset (www.grouplens.org): 1 million ratings (1-5), 3592 movies, 6040 users. Feature vector for each movie: the ratings provided by d other users.
Slide 46: Collaborative filtering results.
Slide 47: Collaborative filtering results (continued).
Slide 48: Plan/Conclusion of the talk.
Ranking formulation
– Training data: pairwise preference relations
– Loss function: WMW statistic
– Function class: linear ranking functions
Algorithm
– Maximize a lower bound on the WMW
– Use conjugate gradient
– Quadratic complexity
Fast algorithm
– Use a fast approximate gradient
– Fast summation of erfc functions
Results
– Similar accuracy to other methods, but much faster
Slide 49: Conclusion (continued). Future work: other applications (neural networks, probit regression). Code coming soon.
Slide 50: Conclusion (continued). Future work: other applications (neural networks, probit regression); nonlinear kernelized variants.
Slide 51: Thank you! Questions?