A Support Vector Method for Optimizing Average Precision

SIGIR 2007. Yisong Yue, Cornell University. In collaboration with Thomas Finley, Filip Radlinski, and Thorsten Joachims (Cornell University).

Motivation
Learn to rank documents.
Optimize for IR performance measures, specifically Mean Average Precision (MAP).
Leverage structural SVMs [Tsochantaridis et al. 2005].

MAP vs Accuracy
Average precision is the average of the precision scores at the rank positions of each relevant document.
Ex: [example ranking and its average precision shown on slide]
Mean Average Precision (MAP) is the mean of the average precision scores for a group of queries.
A machine learning algorithm optimizing for accuracy might learn a very different model than one optimizing for MAP.
Ex: [second ranking shown on slide] has an average precision of about 0.64 but a maximum accuracy of 0.8, versus 0.6 for the ranking above.
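The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the example relevance lists are made up here, and each ranking is assumed to be given as a list of 0/1 relevance labels in rank order.

```python
def average_precision(relevance):
    """Average of precision@k taken at the rank k of each relevant document.

    `relevance` is a list of 0/1 labels in rank order (assumed input format).
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant doc
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision scores over a group of queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings (not the ones shown on the slide).
query_a = [1, 0, 0, 1, 1]  # precisions 1/1, 2/4, 3/5
query_b = [1, 1, 0, 0]     # all relevant documents ranked first
ap_a = average_precision(query_a)
ap_b = average_precision(query_b)
map_score = mean_average_precision([query_a, query_b])
```

For the first hypothetical ranking the three precision values are 1, 0.5 and 0.6, so its average precision is 0.7; the second ranking is perfect and scores 1.0.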

Recent Related Work
Greedy & Local Search
[Metzler & Croft 2005]: optimized for MAP using gradient descent; expensive for a large number of features.
[Caruana et al. 2004]: iteratively built an ensemble to greedily improve arbitrary performance measures.
Surrogate Performance Measures
[Burges et al. 2005]: used neural nets optimizing for cross entropy.
[Cao et al. 2006]: used SVMs optimizing for modified ROC-Area.
Relaxations
[Xu & Li 2007]: used boosting with an exponential loss relaxation.

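Conventional SVMs
Input examples are denoted by x (a high-dimensional point).
Output targets are denoted by y (either +1 or -1).
An SVM learns a hyperplane w; predictions are sign($w^T x$).
Training involves finding the w which minimizes $\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_{i=1}^{N} \xi_i$ subject to $y_i (w^T x_i) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all i.
The sum of slacks $\sum_i \xi_i$ upper bounds the accuracy loss.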
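A small sketch of why the sum of slacks upper bounds the number of misclassified examples, assuming the usual soft-margin constraint $y_i (w^T x_i) \ge 1 - \xi_i$ with $\xi_i \ge 0$. The helper names and the toy data are invented for illustration.

```python
def score(w, x):
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def slack(w, x, y):
    """Smallest xi >= 0 that satisfies y * (w . x) >= 1 - xi."""
    return max(0.0, 1.0 - y * score(w, x))


def is_error(w, x, y):
    """True when sign(w . x) disagrees with the target y (0 counts as +1)."""
    predicted = 1 if score(w, x) >= 0 else -1
    return predicted != y


# Toy data: (feature vector, label) pairs, purely illustrative.
w = [1.0, -1.0]
examples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([1.0, 0.0], -1), ([0.0, 2.0], -1)]

total_slack = sum(slack(w, x, y) for x, y in examples)
num_errors = sum(is_error(w, x, y) for x, y in examples)
```

Every misclassified example has a slack of at least 1, while correctly classified examples contribute a slack between 0 and 1, so `total_slack` can never fall below `num_errors`.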

Adapting to Average Precision
Let x denote the set of documents/query examples for a query.
Let y denote a (weak) ranking, with each $y_{ij} \in \{-1, 0, +1\}$.
Same objective function: $\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$.
Constraints are defined for each incorrect labeling y' over the set of documents x.
The joint discriminant score for the correct labeling must be at least as large as that of the incorrect labeling plus the performance loss.

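Adapting to Average Precision
Minimize $\frac{1}{2}\|w\|^2 + \frac{C}{N}\sum_i \xi_i$ (one slack per training query) subject to, for each training query with documents x, correct ranking y, and slack $\xi$: $w^T \Psi(x, y) \ge w^T \Psi(x, y') + \Delta(y, y') - \xi$ for all $y' \ne y$, where $\Psi(x, y)$ is the joint feature map of a query's documents and a ranking, and $\Delta(y, y') = 1 - \mathrm{AvgPrec}(y')$.
The sum of slacks upper bounds the MAP loss.
After learning w, a prediction is made by sorting the documents on $w^T x_i$.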
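The prediction rule can be illustrated with a short sketch. The transcript does not preserve the definition of the joint feature map, so the code below ASSUMES a pairwise form, $\Psi(x, y) = \frac{1}{|R||N|} \sum_{i \in R} \sum_{j \in N} y_{ij} (x_i - x_j)$ over relevant documents R and non-relevant documents N, with $y_{ij} = +1$ when document i is ranked above document j and $-1$ otherwise. All names and data are hypothetical.

```python
from itertools import permutations


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def predict_ranking(w, docs):
    """Document indices ordered by descending score w . x_i."""
    return sorted(range(len(docs)), key=lambda i: dot(w, docs[i]), reverse=True)


def discriminant(w, docs, relevant, ranking):
    """w . Psi(x, y') under the ASSUMED pairwise form of Psi (see lead-in)."""
    position = {doc: rank for rank, doc in enumerate(ranking)}
    rel = [i for i in range(len(docs)) if i in relevant]
    non = [j for j in range(len(docs)) if j not in relevant]
    total = 0.0
    for i in rel:
        for j in non:
            y_ij = 1 if position[i] < position[j] else -1
            total += y_ij * (dot(w, docs[i]) - dot(w, docs[j]))
    return total / (len(rel) * len(non))


# Toy query with four documents; documents 1 and 3 are relevant.
w = [1.0, 0.5]
docs = [[0.2, 0.1], [0.9, 0.4], [0.5, 0.5], [0.1, 0.9]]
relevant = {1, 3}

predicted = predict_ranking(w, docs)  # scores: 0.25, 1.1, 0.75, 0.55
predicted_value = discriminant(w, docs, relevant, predicted)
best_value = max(discriminant(w, docs, relevant, list(p))
                 for p in permutations(range(len(docs))))
```

Under this assumed form, each pairwise term is largest when the higher-scoring document is ranked first, which is why sorting by score attains the maximum discriminant over all rankings (`predicted_value` matches `best_value` on the toy data).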

Too Many Constraints!
For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front, e.g., [example ranking shown on slide].
An incorrect labeling would be any other ranking, e.g., [example ranking shown on slide].
This ranking has an Average Precision of about 0.8, with $\Delta(y, y') \approx 0.2$.
There is an exponential number of rankings, and thus an exponential number of constraints!

Structural SVM Training
STEP 1: Solve the SVM objective function using only the current working set of constraints.
STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints.
STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set by some small constant, add that constraint to the working set.
Repeat STEPS 1-3 until no additional constraints are added. Return the most recent model that was trained in STEP 1.
STEPS 1-3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005]
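The three steps can be sketched as a generic working-set loop. This is an illustrative skeleton, not the authors' implementation: the `solve`, `most_violated`, and `violation` callbacks are hypothetical stand-ins for the QP solver and the constraint oracle, and the toy problem at the end (minimize $w^2$ subject to $w \ge a$ for a few thresholds a) only exercises the control flow.

```python
def train_with_working_set(solve, most_violated, violation,
                           epsilon=1e-3, max_iters=1000):
    """Generic working-set loop (all three callbacks are hypothetical).

    solve(working_set)           -> model fitted on the working set (STEP 1)
    most_violated(model)         -> most violated constraint overall (STEP 2)
    violation(model, constraint) -> how strongly the constraint is violated
    """
    working_set = []
    model = solve(working_set)
    for _ in range(max_iters):
        candidate = most_violated(model)
        # Worst violation already accounted for by the working set.
        worst_in_set = max((violation(model, c) for c in working_set),
                           default=0.0)
        worst_in_set = max(worst_in_set, 0.0)
        # STEP 3: add only if violated by more than epsilon beyond that.
        if violation(model, candidate) > worst_in_set + epsilon:
            working_set.append(candidate)
            model = solve(working_set)
        else:
            break  # no constraint was added, so the loop terminates
    return model, working_set


# Toy problem: minimize w^2 subject to w >= a for each threshold a.
thresholds = [1.0, 3.0, 2.0]
model, working_set = train_with_working_set(
    solve=lambda ws: max(ws, default=0.0),
    most_violated=lambda w: max(thresholds, key=lambda a: a - w),
    violation=lambda w, a: a - w,
)
```

In the toy problem a single constraint (the largest threshold) dominates the others, so the loop stops with a working set of size one, which mirrors the idea that a small set of constraints can stand in for the full set.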

Illustrative Example
Original SVM Problem
Exponential number of constraints.
Most are dominated by a small set of "important" constraints.
Structural SVM Approach
Repeatedly finds the next most violated constraint...
...until the set of constraints is a good approximation.

Finding Most Violated Constraint
Structural SVM is an oracle framework.
It requires a subroutine to find the most violated constraint.
The subroutine depends on the formulation of the loss function and the joint feature representation.
Exponential number of constraints!
An efficient algorithm exists in the case of optimizing MAP.

Observation
MAP is invariant to the order of documents within a relevance class: swapping two relevant or two non-relevant documents does not change MAP.
The joint SVM score is optimized by sorting by document score, $w^T x$.
Finding the most violated constraint reduces to finding an interleaving between two sorted lists of documents.

Start with the perfect ranking.
Consider swapping adjacent relevant/non-relevant documents.
Find the best feasible ranking of the non-relevant document.
Repeat for the next non-relevant document.
Never want to swap past the previous non-relevant document.
Repeat until all non-relevant documents have been considered.
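The procedure above can be sketched in Python under stated assumptions. The code ASSUMES the loss $\Delta(y, y') = 1 - \mathrm{AvgPrec}(y')$ and a pairwise joint score of the form $\frac{1}{|R||N|} \sum_{i \in R} \sum_{j \in N} y'_{ij} (s_i - s_j)$, where $s$ denotes document scores $w^T x$; neither formula is preserved in this transcript. It is a straightforward, unoptimized version (it rescans the relevant list for every candidate position), with function names invented here and a brute-force check included for small inputs.

```python
from itertools import combinations_with_replacement


def objective(rel_scores, nonrel_scores, positions):
    """Loss plus ASSUMED pairwise joint score for one interleaving.

    positions[j] = number of relevant docs ranked above non-relevant doc j,
    with both lists taken in descending score order.
    """
    rel = sorted(rel_scores, reverse=True)
    non = sorted(nonrel_scores, reverse=True)
    R, N = len(rel), len(non)
    ap = sum(i / (i + sum(1 for p in positions if p < i))
             for i in range(1, R + 1)) / R
    joint = sum((1 if i < p else -1) * (rel[i] - non[j])
                for j, p in enumerate(positions) for i in range(R)) / (R * N)
    return (1.0 - ap) + joint


def most_violated_positions(rel_scores, nonrel_scores):
    """Place each non-relevant doc in turn, never above the previous one."""
    rel = sorted(rel_scores, reverse=True)
    non = sorted(nonrel_scores, reverse=True)
    R, N = len(rel), len(non)

    def contribution(j, p):
        # Share of the objective owed to the j-th non-relevant doc (1-based)
        # when p relevant docs are ranked above it.
        loss = sum(i / (i + j - 1) - i / (i + j)
                   for i in range(p + 1, R + 1)) / R
        above = sum(rel[i] - non[j - 1] for i in range(p))
        below = sum(rel[i] - non[j - 1] for i in range(p, R))
        return loss + (above - below) / (R * N)

    positions, lowest = [], 0
    for j in range(1, N + 1):
        best = max(range(lowest, R + 1), key=lambda p: contribution(j, p))
        positions.append(best)
        lowest = best  # later non-relevant docs cannot move above this one
    return positions


def brute_force_best(rel_scores, nonrel_scores):
    """Exhaustive maximum over all interleavings (small inputs only)."""
    R, N = len(rel_scores), len(nonrel_scores)
    return max(objective(rel_scores, nonrel_scores, p)
               for p in combinations_with_replacement(range(R + 1), N))


# Hypothetical document scores for one query.
rel_scores = [0.9, 0.3, 0.1]
nonrel_scores = [0.8, 0.5, 0.2, 0.05]
positions = most_violated_positions(rel_scores, nonrel_scores)
```

The per-document decomposition works because, once both lists are sorted, the loss can be written as a sum of terms that each depend on a single non-relevant document's position; the gain from moving a document upward only shrinks for lower-scored non-relevant documents, which is what justifies never searching above the previous one.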

Quick Recap
SVM Formulation
SVMs optimize a tradeoff between model complexity and MAP loss.
Exponential number of constraints (one for each incorrect ranking).
Structural SVMs find a small subset of important constraints.
Requires a sub-procedure to find the most violated constraint.
Find Most Violated Constraint
Loss function is invariant to re-ordering of relevant documents.
SVM score imposes an ordering of the relevant documents.
Finding an interleaving of two sorted lists.
Loss function has certain monotonic properties.
Efficient algorithm.

Experiments
Used the TREC 9 & 10 Web Track corpus.
Features of document/query pairs were computed from outputs of existing retrieval functions (Indri retrieval functions & TREC submissions).
Goal is to learn a recombination of outputs which improves mean average precision.

Moving Forward
Approach also works (in theory) for other measures.
Some promising results when optimizing for NDCG (with only 1 level of relevance).
Currently working on optimizing for NDCG with multiple levels of relevance.
Preliminary MRR results not as promising.

Conclusions
Principled approach to optimizing average precision (avoids difficult-to-control heuristics).
Performs at least as well as alternative SVM methods.
Can be generalized to a large class of rank-based performance measures.
Software available at
