A Support Vector Method for Optimizing Average Precision

Presentation transcript:

A Support Vector Method for Optimizing Average Precision SIGIR 2007 Yisong Yue Cornell University In Collaboration With: Thomas Finley, Filip Radlinski, Thorsten Joachims (Cornell University)

Motivation: Learn to rank documents. Optimize for IR performance measures, in particular Mean Average Precision. Leverage Structural SVMs [Tsochantaridis et al. 2005].

MAP vs Accuracy. Average precision is the average of the precision scores at the rank positions of each relevant document. Mean Average Precision (MAP) is the mean of the average precision scores over a group of queries. A machine learning algorithm optimizing for accuracy may learn a very different model than one optimizing for MAP: the example ranking on the slide has an average precision of about 0.64 but a maximum accuracy of 0.8, versus 0.6 for the ranking shown above it.
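The definitions above can be made concrete in a few lines. A minimal sketch; the relevance list used at the end is illustrative, not the one from the slide's (missing) example figure:

```python
def average_precision(ranking):
    """AP of a ranked list of binary relevance labels (1 = relevant)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranking, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant doc
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    """MAP: mean of AP over the rankings returned for a set of queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# A ranking placing relevant documents at positions 1, 4, and 5:
ap = average_precision([1, 0, 0, 1, 1])  # (1/1 + 2/4 + 3/5) / 3 = 0.7
```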

Recent Related Work. Greedy & local search: [Metzler & Croft 2005] optimized for MAP using gradient descent, which is expensive for large numbers of features; [Caruana et al. 2004] iteratively built an ensemble to greedily improve arbitrary performance measures. Surrogate performance measures: [Burges et al. 2005] used neural nets optimizing for cross entropy; [Cao et al. 2006] used SVMs optimizing for a modified ROC-Area. Relaxations: [Xu & Li 2007] used boosting with an exponential-loss relaxation.

Conventional SVMs. Input examples are denoted by x (a high-dimensional point) and output targets by y (either +1 or -1). SVMs learn a hyperplane w; predictions are sign(w^T x). Training finds the w that minimizes (1/2)||w||^2 + C Σ_i ξ_i subject to y_i (w^T x_i) ≥ 1 - ξ_i and ξ_i ≥ 0. The sum of slacks Σ_i ξ_i upper-bounds the accuracy loss.

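As a concrete (toy) instance of this objective, the following sketch trains a linear SVM by subgradient descent on the regularized hinge loss. The data and hyperparameters are illustrative and not from the talk, and a dedicated QP solver would normally be used instead:

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*<w, x_i>).
    The hinge terms are the slacks, which upper-bound the number of
    training errors, as the slide notes.  No bias term, for brevity."""
    dim = len(X[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5*||w||^2
        for x_i, y_i in zip(X, y):
            if y_i * sum(a * b for a, b in zip(w, x_i)) < 1:  # margin violated
                for j in range(dim):
                    grad[j] -= C * y_i * x_i[j]
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad)]
    return w

# Toy, linearly separable data:
X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
predictions = [1 if sum(a * b for a, b in zip(w, x)) > 0 else -1 for x in X]
```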
Adapting to Average Precision. Let x denote the set of documents for a query, and let y denote a (weak) ranking, encoded pairwise with each y_ij ∈ {-1, 0, +1}. The objective function is the same as before, but constraints are defined for each incorrect labeling y' over the document set x: the joint discriminant score of the correct labeling must be at least as large as that of the incorrect labeling plus the performance loss.

Adapting to Average Precision. Minimize (1/2)||w||^2 + C Σ_q ξ_q subject to, for every incorrect ranking y': w^T Ψ(x, y) ≥ w^T Ψ(x, y') + Δ(y, y') - ξ_q, where Ψ(x, y) is the joint feature map (a normalized sum of y_ij (x_i - x_j) over relevant/non-relevant document pairs) and Δ(y, y') = 1 - AP(y'). The sum of slacks upper-bounds the MAP loss. After learning w, a prediction is made by sorting documents on w^T x_i.

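Written out, the optimization problem on this slide (whose equation images are missing from the transcript) has the standard structural SVM form below. This is a reconstruction consistent with the definitions in the talk; φ(q, d) denotes the feature vector of a query/document pair, R the relevant and N the non-relevant documents:

```latex
% Structural SVM for MAP (reconstruction)
\min_{\mathbf{w},\,\xi \ge 0}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
    + \frac{C}{n}\sum_{q=1}^{n}\xi_q
\quad\text{s.t.}\quad
\forall q,\ \forall y' \neq y_q:\
\mathbf{w}^{\top}\Psi(\mathbf{x}_q, y_q)
  \ \ge\ \mathbf{w}^{\top}\Psi(\mathbf{x}_q, y') + \Delta(y_q, y') - \xi_q,

\text{where}\quad
\Psi(\mathbf{x}, y) = \frac{1}{\lvert R\rvert\,\lvert N\rvert}
  \sum_{i \in R}\sum_{j \in N} y_{ij}\,\bigl(\phi(q, d_i) - \phi(q, d_j)\bigr)
\quad\text{and}\quad
\Delta(y, y') = 1 - \mathrm{AP}(y').
```

The slack ξ_q measures how badly the best-scoring incorrect ranking beats the correct one, which is what makes the sum of slacks an upper bound on the MAP loss.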
Too Many Constraints! For average precision, the true labeling is a ranking with all the relevant documents ranked at the front. An incorrect labeling is any other ranking; for example, the incorrect ranking on the slide has an average precision of about 0.8, giving Δ(y, y') ≈ 0.2. There is an exponential number of rankings, and thus an exponential number of constraints!
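To get a feel for the scale: even counting only rankings that differ in where the relevant documents sit (AP ignores order within a relevance class), a query already induces a combinatorial number of distinct constraints. The numbers here (10 relevant of 100 retrieved) are illustrative:

```python
import math

# Ways to place r relevant documents among n ranked positions;
# each placement yields a distinct constraint with respect to AP.
n, r = 100, 10
num_rankings = math.comb(n, r)  # over 10^13 already at this modest size
```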

Structural SVM Training. STEP 1: Solve the SVM objective function using only the current working set of constraints. STEP 2: Using the model learned in STEP 1, find the most violated constraint from the exponential set of constraints. STEP 3: If the constraint returned in STEP 2 is violated by more than the most violated constraint in the working set by some small constant, add it to the working set. Repeat STEPS 1-3 until no additional constraints are added, then return the most recent model trained in STEP 1. STEPS 1-3 are guaranteed to loop for at most a polynomial number of iterations. [Tsochantaridis et al. 2005]
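The loop can be sketched on a deliberately tiny problem. Everything here is illustrative scaffolding rather than the talk's actual QP: the "exponential" constraint set is simulated by a plain list of one-dimensional constraints a·w ≥ 1 - ξ, so STEP 1 has a closed form and the STEP 2 oracle is a simple argmax:

```python
def solve_restricted(a_set, C):
    """STEP 1 for the 1-D toy: minimize 0.5*w^2 + C*max_k max(0, 1 - a_k*w)
    over w >= 0, where every a_k > 0.  The binding constraint is the
    smallest a, and the minimizer has a closed form."""
    a_min = min(a_set)
    return 1.0 / a_min if C * a_min ** 2 >= 1 else C * a_min

def cutting_plane(constraints, C=10.0, eps=1e-3, max_rounds=100):
    """Working-set training: solve on the working set, query the oracle for
    the most violated constraint, add it if it is violated by more than eps
    beyond the current worst, and repeat until nothing new is added."""
    def violation(w, a):
        return 1.0 - a * w  # slack demanded by constraint "a" at this w

    work = [constraints[0]]  # seed the working set with one constraint
    w = 0.0
    for _ in range(max_rounds):
        w = solve_restricted(work, C)                           # STEP 1
        most = max(constraints, key=lambda a: violation(w, a))  # STEP 2
        slack = max(0.0, max(violation(w, a) for a in work))
        if violation(w, most) > slack + eps:                    # STEP 3
            work.append(most)
        else:
            return w, work
    return w, work

# 50 candidate constraints; only a couple end up in the working set.
constraints = list(range(50, 0, -1))  # a = 50, 49, ..., 1
w, work = cutting_plane(constraints)
```

Even in this toy, the final working set is a small subset of the full constraint list, which is the point of the cutting-plane approach.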

Illustrative Example. Original SVM problem: exponentially many constraints, most of which are dominated by a small set of "important" constraints. Structural SVM approach: repeatedly find the next most violated constraint, until the working set of constraints is a good approximation.

Finding Most Violated Constraint. The structural SVM is an oracle framework: it requires a subroutine to find the most violated constraint, which depends on the formulation of the loss function and the joint feature representation. Despite the exponential number of constraints, an efficient algorithm exists when optimizing MAP.

Finding Most Violated Constraint. Observation: MAP is invariant to the order of documents within a relevance class; swapping two relevant (or two non-relevant) documents does not change MAP. The joint SVM score is optimized by sorting by document score w^T x. The search therefore reduces to finding an interleaving between two sorted lists of documents.

Finding Most Violated Constraint. Start with the perfect ranking. Consider swapping adjacent relevant/non-relevant documents, and find the best feasible position for the non-relevant document. Repeat for the next non-relevant document; it never pays to swap past the previous non-relevant document. Repeat until all non-relevant documents have been considered.
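The greedy procedure above computes an argmax over rankings; for a handful of documents the same argmax can be checked by brute force. This sketch enumerates every interleaving and maximizes Δ(y, y') + w^T Ψ(x, y'), taking w^T Ψ to be the normalized sum of pairwise score differences over relevant/non-relevant pairs. The scores are illustrative; the paper's greedy algorithm computes the same result in O(n log n):

```python
import itertools

def average_precision(labels):
    """AP of a list of binary labels ordered by rank (True = relevant)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def most_violated_ranking(rel_scores, non_scores):
    """Brute-force argmax over interleavings of Delta(y, y') + w^T Psi(x, y').
    w^T Psi is the mean over relevant/non-relevant pairs (i, j) of
    y'_ij * (s_i - s_j), with y'_ij = +1 when i is ranked above j.
    Documents stay score-sorted within each class (MAP is invariant there)."""
    rel = sorted(rel_scores, reverse=True)
    non = sorted(non_scores, reverse=True)
    n, pairs = len(rel) + len(non), len(rel) * len(non)
    best = None
    for rel_pos in itertools.combinations(range(n), len(rel)):
        non_pos = [p for p in range(n) if p not in rel_pos]
        labels = [p in rel_pos for p in range(n)]
        delta = 1.0 - average_precision(labels)  # Delta(y, y') = 1 - AP(y')
        score = sum((1 if pi < qj else -1) * (si - sj)
                    for pi, si in zip(rel_pos, rel)
                    for qj, sj in zip(non_pos, non)) / pairs
        if best is None or delta + score > best[0]:
            best = (delta + score, labels)
    return best

# A non-relevant document (score 2) outscores a relevant one (score 1),
# so the most violated ranking interleaves them: R, N, R, N.
h, labels = most_violated_ranking([3, 1], [2, 0])
```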

Quick Recap. SVM formulation: SVMs optimize a tradeoff between model complexity and MAP loss; there is an exponential number of constraints (one for each incorrect ranking); structural SVMs find a small subset of important constraints, given a sub-procedure for finding the most violated constraint. Finding the most violated constraint: the loss function is invariant to re-ordering within each relevance class; the SVM score imposes an ordering on the relevant documents; the search becomes finding an interleaving of two sorted lists; and the loss function's monotonic properties yield an efficient algorithm.

Experiments. Used the TREC 9 & 10 Web Track corpora. Features of document/query pairs were computed from the outputs of existing retrieval functions (Indri retrieval functions & TREC submissions). The goal is to learn a recombination of these outputs that improves mean average precision.

Moving Forward. The approach also works (in theory) for other measures. There are some promising results when optimizing for NDCG with only one level of relevance; optimizing for NDCG with multiple levels of relevance is work in progress. Preliminary MRR results are not as promising.

Conclusions. A principled approach to optimizing average precision that avoids difficult-to-control heuristics. Performs at least as well as alternative SVM methods. Can be generalized to a large class of rank-based performance measures. Software available at http://svmrank.yisongyue.com