1 Learning to Rank -- A Brief Review. Yunpeng Xu

2 Ranking and sorting. Ranking: there are only K ordered categories. Sorting: each sample has a distinct rank. Generally, there is no need to differentiate the two.

3 Overview. Rank aggregation; label ranking; query and rank by example; preference learning; open problems and what we can do.

4 Rank aggregation. The need to combine different ranking results arises in voting systems, welfare economics, and decision making. Example: 1. Hillary Clinton > John Edwards > Barack Obama; 2. Barack Obama > John Edwards > Hillary Clinton => ?

5 Rank aggregation (cont.) Arrow's impossibility theorem (Kenneth Arrow, 1951): if the decision-making body has at least two members and at least three options to decide among, it is impossible to design a social welfare function that satisfies all of the fairness conditions below at once.

6 Rank aggregation (cont.) Arrow's impossibility theorem: five fairness conditions (non-dictatorship; unrestricted domain, or universality; independence of irrelevant alternatives; positive association of social and individual values, or monotonicity; non-imposition, or citizen sovereignty) cannot all be satisfied simultaneously.

7 Rank aggregation (cont.) Borda's method (Jean-Charles de Borda, 1770). Given k ranked lists τ_1, ..., τ_k over the same n items, define B_{τ_i}(j) as the number of items ranked below item j in list τ_i, and rank all items by the total Borda score B(j) = Σ_i B_{τ_i}(j). In the example above: Hillary Clinton: 2, John Edwards: 2, Barack Obama: 2 (a three-way tie).
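
A minimal sketch of the Borda count (not from the slides; function and variable names are illustrative, and the example mirrors the slide's two opposed lists):

```python
def borda_aggregate(lists, items):
    """Borda count: an item's score in one list is the number of items ranked
    below it; its aggregate score is the sum of these counts across lists."""
    scores = {item: 0 for item in items}
    for ranking in lists:  # each ranking lists items best-first
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - 1 - pos  # items ranked below this one
    return sorted(items, key=lambda it: -scores[it]), scores

# The slide's example: two opposed lists give every candidate a score of 2.
lists = [["Clinton", "Edwards", "Obama"], ["Obama", "Edwards", "Clinton"]]
order, scores = borda_aggregate(lists, ["Clinton", "Edwards", "Obama"])
print(scores)  # {'Clinton': 2, 'Edwards': 2, 'Obama': 2} -- a three-way tie
```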

8 Rank aggregation (cont.) -- Borda. Condorcet criterion: if the majority prefers x to y, then x must be ranked above y. Borda's method does not satisfy the Condorcet criterion, and neither does any method that assigns fixed weights to rank positions.

9 Rank aggregation (cont.) Relaxing the assumptions: maximize a consensus criterion, which is equivalent to minimizing total disagreement (Kemeny, from social choice theory). This is NP-hard, so sub-optimal solutions are found using heuristics.

10 Rank aggregation (cont.) Basic idea: assign different weights to different experts. Supervised aggregation: weight the experts according to a final judge (ground truth). Unsupervised aggregation: aim to minimize the disagreement as measured by certain distances.

11 Rank aggregation (cont.) Distance measures: Spearman footrule distance; Kendall tau distance; Kendall tau distance for multiple lists; scaled footrule distance.
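
As a quick illustration (not from the slides), the two basic distances can be computed as follows; here a ranking is assumed to be a dict mapping item to rank position, 0 being best:

```python
from itertools import combinations

def spearman_footrule(r1, r2):
    """Sum over items of the absolute rank difference between two rankings."""
    return sum(abs(r1[x] - r2[x]) for x in r1)

def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings order differently."""
    return sum(
        1
        for x, y in combinations(r1.keys(), 2)
        if (r1[x] - r1[y]) * (r2[x] - r2[y]) < 0
    )

r1 = {"a": 0, "b": 1, "c": 2}
r2 = {"c": 0, "b": 1, "a": 2}
print(spearman_footrule(r1, r2), kendall_tau(r1, r2))  # 4 3
```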

12 Rank aggregation (cont.) -- Distance measures. Kemeny optimal ranking: minimizes the total Kendall tau distance to the input lists, but is still NP-hard to compute. Local Kemenization (a locally optimal aggregation) can be computed in O(kn log n).

13 Rank aggregation (cont.) Supervised Rank Aggregation (SRA, WWW07). Ground truth: a preference matrix H encoding which items should be ranked above which. Goal: rank items by a weighted combination of the experts' scores, learning the weights so that the induced ranking agrees with H exactly, or with relaxation.

14 Rank aggregation (cont.) -- SRA. Method: use Borda's scores as each expert's item scores, and optimize an objective over the expert weights so that the combined score respects the ground-truth preferences.

15 Rank aggregation (cont.) Markov Chain Rank Aggregation (MCRA, WWW05): map the ranked lists to a Markov chain M, compute the stationary distribution π of M, and rank items by their stationary probabilities. Example lists: B > C > D; A > D > E; A > B > E.

16 Rank aggregation (cont.) -- MCRA. Different transition strategies: MC1: all outgoing edges have uniform probability; MC2: first choose a list, then choose the next item from that list; and so on. For a disconnected graph, define transition probabilities based on item similarity.
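
A hedged sketch of the idea: the chain construction below is an MC1-flavored simplification (move uniformly to any item that beats the current one in some list), not the exact construction from the paper.

```python
import numpy as np

def mc1_transition(lists, n):
    """From item i, move uniformly to the set of items ranked above i in at
    least one list (self-loop if that set is empty). A simplified sketch."""
    beats = [set() for _ in range(n)]
    for lst in lists:                        # lst: item indices, best first
        for pos, hi in enumerate(lst):
            for lo in lst[pos + 1:]:
                beats[lo].add(hi)            # hi is ranked above lo
    P = np.zeros((n, n))
    for i in range(n):
        if beats[i]:
            P[i, list(beats[i])] = 1.0 / len(beats[i])
        else:
            P[i, i] = 1.0
    return P

def stationary(P, n_iter=1000, tol=1e-10):
    """Power iteration for the stationary distribution of P."""
    pi = np.ones(P.shape[0]) / P.shape[0]
    for _ in range(n_iter):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            break
        pi = new
    return pi

# The slide's example lists: B>C>D, A>D>E, A>B>E. A is never beaten, so it
# absorbs all the probability mass and comes out on top.
idx = {c: i for i, c in enumerate("ABCDE")}
lists = [[idx[c] for c in s] for s in ("BCD", "ADE", "ABE")]
pi = stationary(mc1_transition(lists, 5))
print(sorted("ABCDE", key=lambda c: -pi[idx[c]]))
```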

17 Rank aggregation (cont.) Unsupervised Learning Algorithm for Rank Aggregation (ULARA; Dan Roth et al., ECML07). Goal: learn a weighted combination of the input rankers, without supervision. Method: maximize agreement among the rankings.

18 Rank aggregation (cont.) -- ULARA. Algorithm: iterative gradient descent. Initially w is uniform; it is then updated iteratively.
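
A loudly hedged sketch of the general flavor only (down-weight experts that deviate from the weighted consensus), not ULARA's exact objective or update rule; all names and the step-size handling are illustrative assumptions:

```python
import numpy as np

def aggregate_weights(ranks, n_iters=100, lr=0.1):
    """ranks: (k, n) array, ranks[i, j] = rank that expert i gives item j.
    Repeatedly down-weights experts far from the weighted consensus.
    A sketch of the general idea only, not the paper's algorithm."""
    k, n = ranks.shape
    w = np.ones(k) / k                                # start uniform, as on the slide
    for _ in range(n_iters):
        consensus = w @ ranks                         # weighted average rank per item
        dev = ((ranks - consensus) ** 2).sum(axis=1)  # each expert's deviation
        dev = dev / (dev.max() + 1e-12)               # normalize for a stable step
        w = w * np.exp(-lr * dev)                     # multiplicative descent step
        w = w / w.sum()
    return w
```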

19 Overview. Rank aggregation; label ranking; query and rank by example; preference learning; open problems and what we can do.

20 Label ranking. Goal: learn a map from the input space to the set of total orders over a finite set of labels. Related to multi-label and multi-class problems. Input: customer information. Output: Porsche > Toyota > Ford, or Mountain > Sea > Beach.

21 Label ranking (cont.) Pairwise ranking (ECML03): train a classifier for each pair of labels. When judging an example, if the classifier for a pair predicts that the first label beats the second, count it as a vote for the first; then rank all labels by their vote counts. Requires k(k-1)/2 classifiers in total (see the sketch below).
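
A minimal sketch of pairwise label ranking (not from the slides); the logistic-regression base learner and data layout are assumptions for illustration:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def train_pairwise(X, orders, n_labels):
    """orders[i] lists sample i's labels best-first. One binary classifier per
    label pair (a, b) answers: is a preferred to b? Assumes both outcomes
    occur in the training data for every pair."""
    models = {}
    for a, b in combinations(range(n_labels), 2):
        y = np.array([int(o.index(a) < o.index(b)) for o in orders])
        models[(a, b)] = LogisticRegression().fit(X, y)
    return models

def rank_labels(models, x, n_labels):
    votes = np.zeros(n_labels)
    for (a, b), m in models.items():
        winner = a if m.predict(x.reshape(1, -1))[0] == 1 else b
        votes[winner] += 1
    return list(np.argsort(-votes))  # labels sorted by vote count
```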

22 Label ranking (cont.) Constraint Classification (NIPS02). Consider a linear sorting function: each label i has a weight vector w_i, and an example x is scored for label i by w_i · x. Goal: learn the w_i and rank all labels by these scores.

23 Label ranking (cont.) -- CC. Expand the feature vector into a k·d-dimensional space, one d-dimensional block per label; each pairwise label preference on an example generates a positive sample and its negation as a negative sample in R^{kd}.

24 Label ranking (cont.) -- CC. Learn a separating hyperplane in the expanded space; this can be solved by a standard SVM.
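
A sketch of this Kesler-style expansion (not from the slides; helper names are illustrative, and LinearSVC stands in for whichever linear learner is used):

```python
import numpy as np
from sklearn.svm import LinearSVC

def kesler_expand(x, a, b, n_labels):
    """Embed the preference 'label a beats label b on x' into R^{k*d}:
    +x in block a, -x in block b, zeros elsewhere."""
    d = x.shape[0]
    z = np.zeros(n_labels * d)
    z[a * d:(a + 1) * d] = x
    z[b * d:(b + 1) * d] = -x
    return z

def build_cc_dataset(X, prefs, n_labels):
    """Each preference (a, b) on example x yields a positive point z and its
    mirror -z as a negative point."""
    Z, y = [], []
    for x, (a, b) in zip(X, prefs):
        z = kesler_expand(x, a, b, n_labels)
        Z.append(z); y.append(1)
        Z.append(-z); y.append(0)
    return np.array(Z), np.array(y)

# Train one hyperplane in the expanded space, then read off per-label scorers:
# clf = LinearSVC().fit(*build_cc_dataset(X, prefs, k))
# W = clf.coef_.reshape(k, -1)   # row i scores label i: rank labels by W @ x
```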

25 Overview. Rank aggregation; label ranking; query and rank by example; preference learning; open problems and what we can do.

26 Query and rank by example. Given one query, rank the retrieved items according to their relevance w.r.t. the query.

27 Query and rank by example (cont.) Ranking on a data manifold: iterate a score-spreading step over the affinity graph; the iteration converges to the closed form f* ∝ (I - αS)^{-1} y, where S is the normalized affinity matrix and y indicates the query. Essentially, this is a one-class semi-supervised method.
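
A compact sketch of manifold ranking under the closed form above (assuming a precomputed symmetric affinity matrix W; names are illustrative):

```python
import numpy as np

def manifold_rank(W, query_idx, alpha=0.9):
    """W: symmetric affinity matrix (n, n); query_idx: indices of query items.
    Returns items sorted by the closed-form score f* = (I - alpha*S)^-1 y."""
    n = W.shape[0]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity
    y = np.zeros(n); y[query_idx] = 1.0      # one-class "labels": the query
    f = np.linalg.solve(np.eye(n) - alpha * S, y)
    return np.argsort(-f)                    # items sorted by relevance score
```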

28 Preference learning. Given a set of items and a set of user preferences over these items, rank all the items according to those preferences. Motivated by the need for personalized search.

29 Preference learning. Input: a set of partial orders (preferences) on X. Output: a total order on X, or a map from X onto a structured label space Y. A preference function PREF(u, v) gives the degree to which u should be ranked above v.

30 Existing methods. Learning to order things [W. Cohen 98]; Large margin ordinal regression [R. Herbrich 98]; PRanking with Ranking [K. Crammer 01]; Optimizing Search Engines using Clickthrough Data [T. Joachims 02]; Efficient boosting algorithm for combining preferences [Y. Freund 03]; Classification Approach towards Ranking and Sorting Problems [S. Rajaram 03].

31 Existing methods. Learning to Rank using Gradient Descent [C. Burges 05]; Stability and Generalization of Bipartite Ranking [S. Agarwal 05]; Generalization Bounds for k-Partite Ranking [S. Rajaram 05]; Ranking with a p-norm push [C. Rudin 05]; Magnitude-Preserving Ranking Algorithms [C. Cortes 07]; From Pairwise Approach to Listwise Approach [Z. Cao 07].

32 Large Margin Ordinal Regression. Map each sample onto the real line via the inner product w · x.

33 Large Margin Ordinal Regression. Consider a pair (x_i, x_j) where x_i should rank above x_j; then require w · x_i ≥ w · x_j + 1. Introduce soft-margin slacks ξ_ij and minimize ||w||² + C Σ ξ_ij; this can be solved using a standard SVM.
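
A minimal sketch of the standard pairwise-to-classification reduction (not from the slides): since w · x_i ≥ w · x_j + 1 is a margin constraint on the difference x_i - x_j, one can train an ordinary linear SVM on difference vectors. The helper name and pair format are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ranksvm_fit(X, pairs, C=1.0):
    """pairs: list of (i, j) meaning X[i] should rank above X[j].
    Trains a linear SVM on pairwise differences."""
    diffs, labels = [], []
    for i, j in pairs:
        diffs.append(X[i] - X[j]); labels.append(1)
        diffs.append(X[j] - X[i]); labels.append(0)  # mirrored for balance
    svm = LinearSVC(C=C).fit(np.array(diffs), np.array(labels))
    return svm.coef_.ravel()  # scoring vector w

# usage: w = ranksvm_fit(X, pairs); order = np.argsort(-(X @ w))
```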

34 Learn to order things. A greedy ordering algorithm: calculate a net preference score for each item, repeatedly emit the highest-scoring item, and rescore the remaining ones (see the sketch below).
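
A sketch of the greedy ordering step, assuming a pairwise preference matrix in the Cohen et al. style (an item's score is its outgoing minus incoming preference mass); names are illustrative:

```python
import numpy as np

def greedy_order(pref):
    """pref[u, v]: degree to which u should be ranked above v.
    Greedily emits the item with the largest net preference score."""
    n = pref.shape[0]
    remaining = list(range(n))
    order = []
    while remaining:
        p = pref[np.ix_(remaining, remaining)]
        score = p.sum(axis=1) - p.sum(axis=0)  # outgoing minus incoming preference
        best = remaining[int(np.argmax(score))]
        order.append(best)
        remaining.remove(best)
    return order
```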

35 Learn to order things (cont.) Combine different ranking functions, learning the combination weights iteratively.

36 Learn to order things. Combine preference functions, perform rank aggregation, and update the weights based on feedback.

37 Initially, w is uniform. At each step: compute the combined preference function; produce an aggregated ranking; measure the loss; and update w accordingly.

38 RankBoost. For bipartite ranking problems: combine weak rankers into H(x) = Σ_t α_t h_t(x), and sort items by the values of H(x).

39 RankBoost (cont.) Bipartite ranking problem: initialize a sampling distribution over (negative, positive) pairs; at each round, learn a weak ranker, then update and normalize the sampling distribution; finally combine the weak rankers.
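
A minimal sketch of bipartite RankBoost with simple threshold weak rankers (not from the slides; the weak-learner search and all names are illustrative simplifications of the Freund et al. algorithm):

```python
import numpy as np

def weak_rankers(X):
    """Candidate threshold rankers h(x) = 1[x[f] > theta], over all features."""
    for f in range(X.shape[1]):
        for theta in np.unique(X[:, f]):
            yield f, theta

def rankboost_bipartite(X_pos, X_neg, n_rounds=20):
    n_p, n_n = len(X_pos), len(X_neg)
    D = np.full((n_n, n_p), 1.0 / (n_n * n_p))  # distribution over (neg, pos) pairs
    ensemble = []
    X_all = np.vstack([X_pos, X_neg])
    for _ in range(n_rounds):
        best = None
        for f, theta in weak_rankers(X_all):
            hp = (X_pos[:, f] > theta).astype(float)   # h on positives
            hn = (X_neg[:, f] > theta).astype(float)   # h on negatives
            r = (D * (hp[None, :] - hn[:, None])).sum()  # weighted ranking accuracy
            if best is None or abs(r) > abs(best[0]):
                best = (r, f, theta, hp, hn)
        r, f, theta, hp, hn = best
        r = np.clip(r, -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        D *= np.exp(alpha * (hn[:, None] - hp[None, :]))  # up-weight hard pairs
        D /= D.sum()
        ensemble.append((alpha, f, theta))
    return ensemble

def score(ensemble, X):
    """H(x) = sum_t alpha_t * h_t(x); sort items by descending score."""
    return sum(a * (X[:, f] > t) for a, f, t in ensemble)
```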

40 Stability and Generalization. For bipartite ranking problems: defines the expected rank error and its empirical counterpart.

41 Stability and Generalization (cont.) Stability: remove one training sample and measure how much the learned ranking function changes; this yields generalization bounds, and the analysis generalizes to the k-partite ranking problem.

42 Rank on graph data. Objective: minimize the empirical rank error plus a graph regularizer enforcing smoothness of the ranking function over the graph.

43 P-norm push. Focus on the topmost ranked items: the top of the list (the top-left region of the ROC curve) is the most important.

44 P-norm push (cont.) The height of a negative example k is the number of positive examples ranked below it. Cost of sample k: g(height(k)), where g is convex and monotonically increasing; the p-norm push takes g(x) = x^p.
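
As a small worked illustration (not from the slides), the 0-1 version of this objective can be computed directly from the scores; the function name is illustrative:

```python
import numpy as np

def pnorm_push_objective(scores_pos, scores_neg, p=4):
    """For each negative example, its 'height' is the number of positives
    scored below it; the objective sums height**p over negatives, so a large
    p concentrates the penalty on negatives near the top of the list."""
    heights = np.array([(scores_pos < s_k).sum() for s_k in scores_neg])
    return (heights.astype(float) ** p).sum()
```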

45 P-norm push. Run a RankBoost-style boosting algorithm to solve the resulting problem.

46 Thanks!