Learning to Rank – Theory and Algorithm. Fen Xia (夏粉), Baidu (百度), Institute of Automation (自动化所)




Similar presentations
A Support Vector Method for Optimizing Average Precision

Multi-label Classification using Adaptive Neighborhoods Tanwistha Saha, Huzefa Rangwala and Carlotta Domeniconi Department of Computer Science George.
Local Discriminative Distance Metrics and Their Real World Applications Local Discriminative Distance Metrics and Their Real World Applications Yang Mu,
Combining Classification and Model Trees for Handling Ordinal Problems D. Anyfantis, M. Karagiannopoulos S. B. Kotsiantis, P. E. Pintelas Educational Software.
SVM—Support Vector Machines
Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo.
1 Learning Dynamic Models from Unsequenced Data Jeff Schneider School of Computer Science Carnegie Mellon University joint work with Tzu-Kuo Huang, Le.
Jun Zhu Dept. of Comp. Sci. & Tech., Tsinghua University This work was done when I was a visiting researcher at CMU. Joint.
Measuring Model Complexity (Textbook, Sections ) CS 410/510 Thurs. April 27, 2007 Given two hypotheses (models) that correctly classify the training.
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
Learning to Rank from heuristics to theoretic approaches Guest Lecture by Hongning Wang
Bing Liu, CS Department, UIC. Learning from Positive and Unlabeled Examples Bing Liu Department of Computer Science University of Illinois at Chicago Joint.
Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
A k-Nearest Neighbor Based Algorithm for Multi-Label Classification Min-Ling Zhang
TransRank: A Novel Algorithm for Transfer of Rank Learning Depin Chen, Jun Yan, Gang Wang et al. University of Science and Technology of China, USTC Machine.
Learning to Rank for Information Retrieval
Learning from Imbalanced, Only Positive and Unlabeled Data Yetian Chen
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
Benk Erika Kelemen Zsolt
Special topics on text mining [ Part I: text classification ] Hugo Jair Escalante, Aurelio Lopez, Manuel Montes and Luis Villaseñor.
Using Entropy-Related Measures in Categorical Data Visualization  Jamal Alsakran The University of Jordan  Xiaoke Huang, Ye Zhao Kent State University.
M Machine Learning F# and Accord.net. Alena Dzenisenka Software architect at Luxoft Poland Member of F# Software Foundation Board of Trustees Researcher.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
1 A fast algorithm for learning large scale preference relations Vikas C. Raykar and Ramani Duraiswami University of Maryland College Park Balaji Krishnapuram.
A Novel Local Patch Framework for Fixing Supervised Learning Models Yilei Wang 1, Bingzheng Wei 2, Jun Yan 2, Yang Hu 2, Zhi-Hong Deng 1, Zheng Chen 2.
Artificial Intelligence Chapter 3 Neural Networks Artificial Intelligence Chapter 3 Neural Networks Biointelligence Lab School of Computer Sci. & Eng.
Supervised Clustering of Label Ranking Data Mihajlo Grbovic, Nemanja Djuric, Slobodan Vucetic {mihajlo.grbovic, nemanja.djuric,
Learning to Rank From Pairwise Approach to Listwise Approach.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 A self-organizing map for adaptive processing of structured.
Learning to Rank from heuristics to theoretic approaches Hongning Wang.
Is Top-k Sufficient for Ranking? Yanyan Lan, Shuzi Niu, Jiafeng Guo, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences.
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
1 Learning to Rank --A Brief Review Yunpeng Xu. 2 Ranking and sorting Rank: only has K structured categories Sorting: each sample has a distinct rank.
NTU & MSRA Ming-Feng Tsai
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Machine Learning in Practice Lecture 10 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Machine Learning Lecture 1: Intro + Decision Trees Moshe Koppel Slides adapted from Tom Mitchell and from Dan Roth.
Incremental Reduced Support Vector Machines Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang National Taiwan University of Science and Technology and Institute.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Page 1 CS 546 Machine Learning in NLP Review 2: Loss minimization, SVM and Logistic Regression Dan Roth Department of Computer Science University of Illinois.
Cross-modal Hashing Through Ranking Subspace Learning
Facial Smile Detection Based on Deep Learning Features Authors: Kaihao Zhang, Yongzhen Huang, Hong Wu and Liang Wang Center for Research on Intelligent.
Quadratic Perceptron Learning with Applications
Big data classification using neural network
Deep Feedforward Networks
Zeyu You, Raviv Raich, Yonghong Huang (presenter)
Machine Learning & Deep Learning
Intro to Machine Learning
Boosting and Additive Trees (2)
Restricted Boltzmann Machines for Classification
Source: Procedia Computer Science(2015)70:
Introductory Seminar on Research: Fall 2017
Learning to Rank from heuristics to theoretic approaches
Learning to Rank Shubhra kanti karmaker (Santu)
Artificial Intelligence Chapter 3 Neural Networks
Overview of Machine Learning
Artificial Intelligence Chapter 3 Neural Networks
Artificial Intelligence Chapter 3 Neural Networks
Feature Selection for Ranking
Artificial Intelligence Chapter 3 Neural Networks
Jonathan Elsas LTI Student Research Symposium Sept. 14, 2007
Learning to Rank with Ties
Learning to Rank from heuristics to theoretic approaches
Memory-Based Learning Instance-Based Learning K-Nearest Neighbor
Learning and Memorization
Artificial Intelligence Chapter 3 Neural Networks
Presentation transcript:

Learning to Rank – Theory and Algorithm. Fen Xia (夏粉), Baidu (百度), Institute of Automation (自动化所)

We Are Overwhelmed by a Flood of Information

Information Explosion?


Ranking Plays a Key Role in Many Applications

Numerous Applications. The ranking problem arises in information retrieval, collaborative filtering, ordinal regression, and other example applications.

Overview of My Work before 2010. Machine learning theory and principles for ranking problems (information retrieval, collaborative filtering, ordinal regression), covering both theory and algorithms: NIPS'09, PR'09, ICML'08, JCST'09, KAIS'08, IJICS'07, IJCNN'07, IEEE-IIB'06.

Outline: Listwise Approach to Learning to Rank – Theory and Algorithm
– Related Work
– Our Work
– Future Work

Ranking Problem. Example: document retrieval. A ranking system takes a query and a set of documents and returns a ranked list of documents.

Learning to Rank for Information Retrieval. Training data consists of queries, their associated documents, and labels. Labels can be 1) binary, 2) multiple-level and discrete, 3) pairwise preferences, or 4) a partial (or even total) order over the documents. A learning system minimizes a loss over the training data to produce a model, which the ranking system then applies to the documents of test queries.
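As a minimal, hypothetical sketch of how one query's worth of such training data might be laid out (the field names and numbers below are illustrative assumptions, not taken from the talk or from any specific benchmark):

```python
# Hypothetical layout of one training query for learning to rank.
# Field names and values are illustrative only.
training_example = {
    "query": "q17",
    "documents": [            # one feature vector per candidate document
        [0.12, 3.0, 0.7],
        [0.40, 1.0, 0.2],
        [0.05, 5.0, 0.9],
    ],
    "labels": [2, 0, 1],      # e.g. multi-level relevance grades (label type 2 on the slide)
}
```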

State-of-the-Art Approaches
– Pointwise: (ordinal) regression / classification – Pranking, MCRank, etc.
– Pairwise: preference learning – Ranking SVM, RankBoost, RankNet, etc.
– Listwise: takes the entire set of documents associated with a query as the learning instance.
  – Direct optimization of IR measures: AdaRank, SVM-MAP, SoftRank, LambdaRank, etc.
  – Listwise loss minimization: RankCosine, ListNet, etc.

Motivations. The listwise approach captures the ranking problem in a conceptually more natural way and performs better than other approaches on many benchmark datasets. However, the listwise approach lacks theoretical analysis.
– Existing work focuses more on algorithms and experiments than on theoretical analysis.
– While many existing theoretical results on regression and classification can be applied to the pointwise and pairwise approaches, the theoretical study of the listwise approach is not sufficient.

Our Work. We take listwise loss minimization as an example and perform a theoretical analysis of the listwise approach:
– Give a formal definition of the listwise approach.
– Analyze listwise ranking algorithms theoretically in terms of their loss functions.
– Propose a novel listwise ranking method with a good loss function.
– Validate the correctness of the theoretical findings through experiments.

Listwise Ranking
– Input space X: elements of X are sets of objects to be ranked.
– Output space Y: elements of Y are permutations of those objects.
– Joint probability distribution: P_XY.
– Hypothesis space: H.
– Expected loss and empirical loss.
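The loss formulas on this slide did not survive the transcript; a hedged reconstruction using the standard risk definitions (with a loss l comparing the predicted permutation h(x) to the ground truth y, and m training samples; the notation is mine and may differ from the slide) is:

```latex
% Expected and empirical loss for the listwise setting (hedged reconstruction)
R(h) = \int_{X \times Y} l\big(h(x), y\big)\,\mathrm{d}P_{XY}(x, y),
\qquad
\widehat{R}(h) = \frac{1}{m}\sum_{i=1}^{m} l\big(h(x^{(i)}), y^{(i)}\big).
```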

True Loss in Listwise Ranking. To analyze the theoretical properties of listwise loss functions, the "true" loss of ranking must first be defined.
– The true loss describes the difference between a given ranked list (permutation) and the ground-truth ranked list (permutation).
– Ideally, the true loss should be cost-sensitive, but for simplicity we start by investigating the 0-1 loss.
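In symbols, the 0-1 loss between the predicted permutation and the ground-truth permutation is simply (a standard definition, restated here because the slide's formula was lost):

```latex
% 0-1 "true" loss between the predicted permutation h(x) and the ground truth y
l_{0\text{-}1}\big(h(x), y\big) = \mathbb{1}\big[\, h(x) \neq y \,\big]
```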

Surrogate Loss in Listwise Ranking
– Widely used ranking function (defined via a scoring function and sorting).
– Corresponding empirical loss.
– Challenge: due to the sorting function and the 0-1 loss, the empirical loss is non-differentiable.
– To tackle this problem, a surrogate loss is used.
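The ranking-function and empirical-loss formulas were also lost in extraction. A hedged reconstruction, consistent with the scoring-and-sorting setup used in the rest of the talk (the later soundness slides score documents with g, e.g. g1 and g2; the notation below is mine):

```latex
% Sort-by-score ranking function and its 0-1 empirical loss (hedged reconstruction)
h_g(x) = \operatorname{sort}\big(g(x_1), \dots, g(x_n)\big)
\qquad
\widehat{R}_{0\text{-}1}(g) = \frac{1}{m}\sum_{i=1}^{m}
  \mathbb{1}\big[\, h_g(x^{(i)}) \neq y^{(i)} \,\big]
```

Because the sort and the indicator are piecewise constant in g, this objective has zero gradient almost everywhere, which is why a differentiable surrogate loss φ(g(x), y) is minimized instead.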

Surrogate Listwise Loss Minimization. RankCosine and ListNet can both be fitted into the framework of surrogate loss minimization.
– Cosine loss (RankCosine, IPM 2007)
– Cross entropy loss (ListNet, ICML 2007)
– A new loss function: likelihood loss (ListMLE, our method)
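The formulas of the three losses are not in the transcript. As a hedged reference, written with a scoring function g, the documents x_1..x_n of one query, the ground-truth permutation y (with y(i) the index of the object ranked at position i), and a mapping function ψ that turns ground-truth labels into scores:

```latex
% Cosine loss (RankCosine): one minus the cosine similarity between the model's
% score vector and a ground-truth score vector ψ(y), scaled to [0, 1].
\phi_{\cos}(g; x, y) = \frac{1}{2}\left(1 -
  \frac{\sum_{j=1}^{n} \psi_j(y)\, g(x_j)}
       {\lVert \psi(y) \rVert\, \lVert g(x) \rVert}\right)

% Cross-entropy loss (ListNet): cross entropy between the permutation
% distributions induced by ψ(y) and by g(x) under a Plackett--Luce model
% (summing over all n! permutations, hence the O(n·n!) cost in the next slide).
\phi_{\mathrm{CE}}(g; x, y) = -\sum_{\pi \in \Omega_n}
  P_{\psi(y)}(\pi)\, \log P_{g(x)}(\pi)

% Likelihood loss (ListMLE): negative Plackett--Luce log-likelihood of the
% ground-truth permutation under the model scores.
\phi_{\mathrm{MLE}}(g; x, y) = -\log \prod_{i=1}^{n}
  \frac{\exp\big(g(x_{y(i)})\big)}{\sum_{k=i}^{n} \exp\big(g(x_{y(k)})\big)}
```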

Analysis of the Surrogate Losses
– Continuity, differentiability, and convexity
– Computational efficiency
– Statistical consistency
– Soundness
These properties have been well studied in classification, but not sufficiently in ranking.

Continuity, Differentiability, Convexity, Efficiency

Loss                          | Continuity | Differentiability | Convexity | Efficiency
Cosine loss (RankCosine)      | √          | √                 | X         | O(n)
Cross-entropy loss (ListNet)  | √          | √                 | √         | O(n·n!)
Likelihood loss (ListMLE)     | √          | √                 | √         | O(n)
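To make the O(n) entry for the likelihood loss concrete, here is a minimal sketch (not the authors' code; the function name and the suffix log-sum-exp trick are my own illustration) of computing the ListMLE loss in a single pass over the ground-truth ordering:

```python
import math

def listmle_loss(scores, ranking):
    """Negative Plackett-Luce log-likelihood of `ranking` under `scores`.

    scores  -- model scores g(x_j), one per object.
    ranking -- ground-truth permutation: ranking[i] is the index of the
               object placed at position i (best first).

    Runs in O(n): one pass that maintains a running log-sum-exp of the
    scores of the objects not yet placed (the suffix normalizer).
    """
    ordered = [scores[j] for j in ranking]   # scores in ground-truth order
    loss, log_z = 0.0, None
    for s in reversed(ordered):              # accumulate suffix normalizers bottom-up
        if log_z is None:
            log_z = s
        else:                                # log(exp(log_z) + exp(s)), numerically stable
            log_z = max(s, log_z) + math.log1p(math.exp(-abs(s - log_z)))
        loss += log_z - s                    # -log( exp(s) / suffix normalizer )
    return loss
```

For example, with two documents where the second should come first, listmle_loss([0.2, 1.5], [1, 0]) is roughly 0.24, while the reversed scoring listmle_loss([1.5, 0.2], [1, 0]) costs roughly 1.54.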

Statistical Consistency. When minimizing the surrogate loss is equivalent to minimizing the expected 0-1 loss, we say the surrogate loss function is consistent. We give a theory for verifying consistency in ranking, based on two requirements:
– The ranking of an object is inherently determined by the object itself.
– Starting from a ground-truth permutation, the loss increases after exchanging the positions of two objects in it, and the speed of increase is sensitive to the positions of the objects.
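A hedged restatement of this consistency notion in symbols (writing R_0-1 for the expected 0-1 ranking risk of the sort-by-g ranker and R_φ for the expected surrogate risk; this is the standard definition, not copied from the slide):

```latex
% φ is consistent with the 0-1 ranking loss if, for every distribution P_{XY}
% and every sequence of scoring functions g_m,
R_\phi(g_m) \;\to\; \inf_{g} R_\phi(g)
\quad\Longrightarrow\quad
R_{0\text{-}1}(g_m) \;\to\; \inf_{g} R_{0\text{-}1}(g).
```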

Statistical Consistency (2). It has been proven that:
– the cosine loss is statistically consistent;
– the cross entropy loss is statistically consistent;
– the likelihood loss is statistically consistent.

Soundness. The cosine loss is not very sound.
– Suppose we have two documents with D2 ⊳ D1. (Figure: loss plotted over the two scores g1 and g2, with the line g1 = g2 separating the correct-ranking region from the incorrect-ranking region; α marks the angle on which the cosine loss depends.)

Soundness (2). The cross entropy loss is not very sound.
– Suppose we have two documents with D2 ⊳ D1. (Figure: loss plotted over the two scores g1 and g2, with the line g1 = g2 separating the correct-ranking region from the incorrect-ranking region.)

Soundness (3). The likelihood loss is sound.
– Suppose we have two documents with D2 ⊳ D1. (Figure: loss plotted over the two scores g1 and g2, with the line g1 = g2 separating the correct-ranking region from the incorrect-ranking region.)
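For the two-document case the likelihood loss has a closed form that makes the soundness claim easy to see. Under the Plackett-Luce form sketched earlier (a hedged reconstruction, with g2 the score of the preferred document D2):

```latex
% Two documents, ground truth D2 ⊳ D1: the likelihood loss reduces to a
% logistic loss on the score margin g2 - g1.
\phi_{\mathrm{MLE}} = -\log \frac{e^{g_2}}{e^{g_1} + e^{g_2}}
                    = \log\!\big(1 + e^{\,g_1 - g_2}\big)
```

This is small when g2 is much larger than g1 (correct ranking), equals log 2 on the boundary g1 = g2, and grows without bound as g1 - g2 increases (incorrect ranking), so a low loss is attainable only in the correct-ranking region.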

Discussions
– All three losses can be minimized using common optimization techniques (continuity and differentiability).
– When the number of training samples is very large, model learning can be effective (consistency).
– The cross entropy loss and the cosine loss are both sensitive to the mapping function (soundness).
– The cost of minimizing the cross entropy loss is high (complexity).
– The cosine loss is sensitive to the initial setting of its minimization (convexity).
– The likelihood loss is the best among the three losses.

Experimental Verification
– Synthetic data: different mapping functions (log, sqrt, linear, quadratic, and exp); different initial settings of the gradient descent algorithm (we report the mean and variance over 50 runs).
– Real data: the OHSUMED dataset in the LETOR benchmark.
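The slide does not spell out the training procedure, so purely as an illustrative sketch of the "different initial settings of gradient descent" protocol (the linear scorer, step size, data sizes, and every other detail below are my own assumptions, not the authors' setup):

```python
import numpy as np

def listmle_grad(w, X, ranking):
    """Gradient of the ListMLE loss for a linear scorer g(x) = w @ x.

    X       -- (n_docs, n_features) feature matrix of one query.
    ranking -- ground-truth permutation (best document first).
    """
    Xo = X[ranking]                         # features in ground-truth order
    s = Xo @ w                              # scores in ground-truth order
    grad = np.zeros_like(w)
    for i in range(len(s)):                 # position i normalizes over the suffix i..n-1
        p = np.exp(s[i:] - s[i:].max())
        p /= p.sum()                        # softmax over the remaining documents
        grad += p @ Xo[i:] - Xo[i]          # d/dw [ logsumexp(suffix) - s_i ]
    return grad

def train(X, ranking, w0, lr=0.1, steps=200):
    """Plain gradient descent from the initial point w0 (illustrative only)."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * listmle_grad(w, X, ranking)
    return w

# Repeat training from random starts and compare the rankings each run induces.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # 5 synthetic documents, 3 features
ranking = [2, 0, 4, 1, 3]                   # an arbitrary ground-truth order
runs = [train(X, ranking, rng.normal(size=3)) for _ in range(50)]
final_orders = [list(np.argsort(-(X @ w))) for w in runs]
```

Because the likelihood loss is convex, such runs should agree regardless of the starting point, whereas the non-convex cosine loss can end up at different solutions, which is what the mean/variance over 50 runs is meant to expose.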

Experimental Results on Synthetic Data

Experimental Results on OHSUMED

Conclusion and Future Work
– We have studied the listwise approach to learning to rank.
– The likelihood loss appears to be the best of the listwise loss functions under investigation, according to both the theoretical and the empirical studies.
Future work:
– In addition to consistency, the rate of convergence and the generalization ability should also be studied.
– In real ranking problems, the true loss should be cost-sensitive (e.g., NDCG in information retrieval).

References
– Fen Xia, Tie-Yan Liu and Hang Li. "Statistical Consistency of Top-k Ranking." In Proceedings of the 23rd Conference on Neural Information Processing Systems (NIPS 2009).
– Huiqian Li, Fen Xia, Fei-Yue Wang, Daniel Dajun Zeng and Wenjie Mao. "Exploring Social Annotations with the Application to Web Page Recommendation." Journal of Computer Science and Technology (JCST), accepted.
– Fen Xia, Yanwu Yang, Liang Zhou, Fuxin Li, Min Cai and Daniel Zeng. "A Closed-Form Reduction of Multi-class Cost-Sensitive Learning to Weighted Multi-class Learning." Pattern Recognition (PR), Vol. 42, No. 7, 2009.
– Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang and Hang Li. "Listwise Approach to Learning to Rank – Theory and Algorithm." In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland, July 5-9, 2008.
– Fen Xia, Wensheng Zhang, Fuxin Li and Yanwu Yang. "Ranking with Decision Tree." Knowledge and Information Systems (KAIS), Vol. 17, No. 3, 2008: 381–395.
– Fen Xia, Liang Zhou, Yanwu Yang and Wensheng Zhang. "Ordinal Regression as Multiclass Classification." International Journal of Intelligent Control and Systems (IJICS), Vol. 12, No. 3, Sep 2007.
– Fen Xia, Qing Tao, Jue Wang and Wensheng Zhang. "Recursive Feature Extraction for Ordinal Regression." In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2007), Orlando, Florida, USA, August 12-17, 2007.
– Fen Xia, Wensheng Zhang and Jue Wang. "An Effective Tree-Based Algorithm for Ordinal Regression." IEEE Intelligent Informatics Bulletin (IEEE-IIB), Vol. 7, No. 1, Dec 2006: 22–26.
– Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai and Hang Li. "Learning to Rank: from Pairwise Approach to Listwise Approach." In Proceedings of the 24th International Conference on Machine Learning (ICML 2007).
– Tao Qin, Xu-Dong Zhang, Ming-Feng Tsai, De-Sheng Wang, Tie-Yan Liu and Hang Li. "Query-level Loss Functions for Information Retrieval." Information Processing and Management, Vol. 44, 2008.

Thank You! Special thanks: supercomputing (超级计算). Fen Xia (夏粉), Baidu (百度)