Diversified Retrieval as Structured Prediction
Redundancy, Diversity, and Interdependent Document Relevance (IDR ’09), SIGIR 2009 Workshop
Yisong Yue, Cornell University
Joint work with Thorsten Joachims

Need for Diversity (in IR)
Ambiguous queries
–Different information needs expressed with the same query, e.g., “Jaguar” (the car, the animal, …)
–Want at least one relevant result for each information need
Learning queries
–User interested in “a specific detail or entire breadth of knowledge available” [Swaminathan et al., 2008]
–Want results with high information diversity

Optimizing Diversity
Growing interest in information retrieval
–[Carbonell & Goldstein, 1998; Zhai et al., 2003; Zhang et al., 2005; Chen & Karger, 2006; Zhu et al., 2007; Swaminathan et al., 2008]
Requires modeling inter-document dependencies
–Impossible under standard independence assumptions, e.g., the probability ranking principle
No consensus on how to measure diversity

This Talk
A method for representing and optimizing information coverage
Discriminative training algorithm
–Based on structural SVMs
Appropriate forms of training data
–Requires sufficient granularity (subtopic labels)
Empirical evaluation

Example: choose the top 3 documents
Individual relevance: D3 D4 D1
Pairwise similarity (MMR): D3 D1 D2
Best solution: D3 D1 D5

How to Represent Information?
Discrete feature space to represent information, decomposed into “nuggets”
For query q and its candidate documents:
–All the words (title words, anchor text, etc.)
–Cluster memberships (topic models / dimensionality reduction)
–Taxonomy memberships (ODP)
We will focus on words and title words.

Weighted Word Coverage
More distinct words = more information
Weight words by importance
Works automatically, without human labels
Goal: select K documents that collectively cover as many distinct (weighted) words as possible
–Budgeted max coverage problem (Khuller et al., 1997)
–Greedy selection yields a (1 − 1/e) approximation bound (see the sketch below)
–Need to find a good weighting function (the learning problem)
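To make the greedy step concrete, here is a minimal sketch (helper names are hypothetical; assumes unit costs and a fixed, given word-weight function):

```python
def greedy_select(docs, word_weight, k):
    """Greedily pick k documents maximizing weighted coverage of distinct words.

    docs: dict mapping doc id -> set of words it contains
    word_weight: dict mapping word -> nonnegative benefit
    With unit costs, greedy achieves the (1 - 1/e) approximation bound.
    """
    selected, covered = [], set()
    remaining = dict(docs)
    for _ in range(k):
        # Marginal benefit: total weight of words not yet covered.
        def gain(d):
            return sum(word_weight.get(w, 0.0) for w in remaining[d] - covered)
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # nothing new left to cover
        selected.append(best)
        covered |= remaining.pop(best)
    return selected
```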

Example
Word benefits: V1 = 1, V2 = 2, V3 = 3, V4 = 4, V5 = 5

Document word counts (X = document contains word; one consistent reconstruction of the original grid):

        V1  V2  V3  V4  V5
    D1           X   X   X
    D2       X   X   X
    D3   X   X   X   X

Greedy marginal benefits:

            D1   D2   D3   Best
    Iter 1  12    9   10   D1
    Iter 2  --    2    3   D3
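Running the greedy sketch above on this toy data reproduces the iterations (word sets as in the reconstructed grid):

```python
docs = {"D1": {"V3", "V4", "V5"},
        "D2": {"V2", "V3", "V4"},
        "D3": {"V1", "V2", "V3", "V4"}}
weights = {"V1": 1, "V2": 2, "V3": 3, "V4": 4, "V5": 5}

# Iter 1 benefits: D1 = 12, D2 = 9, D3 = 10       -> pick D1
# Iter 2 marginal: D2 = 2 (V2), D3 = 3 (V1 + V2)  -> pick D3
print(greedy_select(docs, weights, k=2))  # ['D1', 'D3']
```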

How to Weight Words?
Not all words are created equal, e.g., “the”
Weights should be conditioned on the query
–“computer” is normally fairly informative…
–…but not for the query “ACM”
Learn weights based on the candidate set for a query

Prior Work
Essential Pages [Swaminathan et al., 2008]
–Uses a fixed function of word benefit, depending on word frequency in the candidate set
–A local version of TF-IDF:
–Frequent words get low weight (not important for diversity)
–Rare words get low weight (not representative)
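As a point of reference, a hedged sketch of a fixed, frequency-based benefit in this spirit (illustrative only; this is not the Essential Pages formula):

```python
import math

def fixed_word_benefit(word, candidate_docs):
    """Non-learned benefit that is low for both very frequent and very
    rare words in the candidate set (a local, TF-IDF-like heuristic).
    Sketch under assumptions; not the Essential Pages formula."""
    n = len(candidate_docs)
    df = sum(1 for doc in candidate_docs if word in doc)
    if df == 0 or df == n:
        return 0.0
    p = df / n
    # Unimodal in p: vanishes as p -> 0 and p -> 1, peaks in between.
    return p * math.log(1.0 / p)
```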

Linear Discriminant
x = (x1, x2, …, xn): candidate documents
y: subset of x (the prediction)
v: an individual word; V(y): the union of words from the documents in y
φ(v, x): feature vector describing how v is covered in x; we will use thousands of such features
Discriminant function: F(x, y) = w^T Ψ(x, y), where Ψ(x, y) = Σ_{v ∈ V(y)} φ(v, x)
The benefit of covering word v is then w^T φ(v, x)
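A minimal sketch of the joint feature map and discriminant (the feature function phi is left abstract; dim is its output dimension):

```python
import numpy as np

def joint_feature_map(phi, x, y, dim):
    """Psi(x, y) = sum of phi(v, x) over the distinct words v covered by y.

    phi(v, x): feature vector (length dim) for word v given candidate set x
    x: list of documents, each a set of words; y: indices of selected docs
    Each word in V(y) is counted once, so redundancy is not rewarded.
    """
    covered = set().union(*(x[i] for i in y)) if y else set()
    psi = np.zeros(dim)
    for v in covered:
        psi += phi(v, x)
    return psi

def discriminant(w, phi, x, y):
    # F(x, y) = w^T Psi(x, y); the per-word benefit is w^T phi(v, x)
    return float(w @ joint_feature_map(phi, x, y, dim=len(w)))
```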

Linear Discriminant
Does NOT reward redundancy
–The benefit of each word is only counted once
Greedy selection has a (1 − 1/e) approximation bound
Linear in a joint feature space
–Suitable for SVM optimization

More Sophisticated Discriminant
Documents “cover” words to different degrees
–A document with 5 copies of “Thorsten” might cover it better than another document with only 2 copies
Use multiple word sets V1(y), V2(y), …, VL(y)
–Each Vi(y) contains only the words satisfying certain importance criteria
Requires a more sophisticated joint feature map (sketched below)
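One way to realize the importance levels (a sketch; the term-frequency thresholds are illustrative assumptions):

```python
def importance_levels(word, doc_tokens, thresholds=(1, 2, 5)):
    """Return the indices of the word sets V_1..V_L that this (word, doc)
    pair contributes to. Level i requires at least thresholds[i]
    occurrences, so a doc with 5 copies of a word covers it at more
    levels than a doc with only 2 copies. Thresholds are illustrative."""
    tf = doc_tokens.count(word)  # doc_tokens: the document as a token list
    return [i for i, t in enumerate(thresholds) if tf >= t]
```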

Conventional SVMs
Input: x (high-dimensional point)
Target: y (either +1 or −1)
Prediction: sign(w^T x)
Training:
  min_{w, ξ ≥ 0} (1/2)‖w‖² + (C/n) Σ_i ξ_i
subject to:
  ∀i: y_i (w^T x_i) ≥ 1 − ξ_i
The sum of slacks Σ_i ξ_i upper bounds the accuracy loss

Structural SVM Formulation
Input: x (candidate set of documents)
Target: y (subset of x of size K)
Same objective function:
  min_{w, ξ ≥ 0} (1/2)‖w‖² + (C/n) Σ_i ξ_i
Constraints for each incorrect labeling y′: the score of the best y must be at least as large as that of any incorrect y′ plus its loss:
  ∀i, ∀y′ ≠ y_i: w^T Ψ(x_i, y_i) ≥ w^T Ψ(x_i, y′) + Δ(y′, y_i) − ξ_i
Requires a new training algorithm [Tsochantaridis et al., 2005]
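The exponentially many constraints are handled with cutting planes: repeatedly find the most violated constraint and add it. A sketch of that step, using greedy selection as the approximate loss-augmented argmax (helper names hypothetical):

```python
def most_violated_constraint(w, phi, x, y_true, k, loss):
    """Approximately solve argmax_y [ Delta(y, y_true) + w^T Psi(x, y) ]
    by greedily selecting k documents; sketch only.
    loss(y) should return Delta(y, y_true)."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(x)) if i not in selected]
        best = max(candidates,
                   key=lambda i: loss(selected + [i])
                                 + discriminant(w, phi, x, selected + [i]))
        selected.append(best)
    return selected
```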

Weighted Subtopic Loss
Example:
–x1 covers t1
–x2 covers t1, t2, t3
–x3 covers t1, t3

    Subtopic  # Docs  Loss weight
    t1        3       1/2
    t2        1       1/6
    t3        2       1/3

Motivation
–Higher penalty for not covering popular subtopics
–Mitigates the effects of label noise in tail subtopics
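A sketch of this loss on the example (each subtopic's weight is its document count divided by the total number of subtopic labels):

```python
def weighted_subtopic_loss(y, doc_topics, all_docs):
    """Loss = total weight of subtopics NOT covered by selection y.
    A subtopic's weight is (# docs covering it) / (total labels),
    so missing a popular subtopic costs more. Sketch only."""
    counts = {}
    for d in all_docs:
        for t in doc_topics[d]:
            counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    covered = set().union(*(doc_topics[d] for d in y)) if y else set()
    return sum(c / total for t, c in counts.items() if t not in covered)

doc_topics = {"x1": {"t1"}, "x2": {"t1", "t2", "t3"}, "x3": {"t1", "t3"}}
# Weights: t1 = 3/6 = 1/2, t2 = 1/6, t3 = 2/6 = 1/3
print(weighted_subtopic_loss(["x1"], doc_topics, list(doc_topics)))
# misses t2 and t3 -> 1/6 + 1/3 = 0.5
```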

Diversity Training Data
TREC 6–8 Interactive Track
–Queries with explicitly labeled subtopics
–E.g., “Use of robots in the world today”, with subtopics such as nanorobots, space mission robots, and underwater robots
–A manual partitioning of the total information regarding a query

Experiments
TREC 6–8 Interactive Track queries, with documents labeled into subtopics
17 queries used
–Considered only relevant documents, which decouples the relevance problem from the diversity problem
45 docs/query, 20 subtopics/query, 300 words/doc
Trained using leave-one-out (LOO) cross-validation

TREC 6–8 Interactive Track: retrieving 5 documents
[results figure not preserved in transcript]

[learning-curve figure not preserved in transcript]
Can expect further benefit from having more training data.

Moving Forward
Larger datasets
–Evaluate relevance and diversity jointly
Different types of training data
–Our framework can define loss in different ways
–Can we leverage clickthrough data?
Different feature representations
–Build on top of topic-modeling approaches?
–Can we incorporate hierarchical retrieval?

References & Code/Data
“Predicting Diverse Subsets Using Structural SVMs”
–[Yue & Joachims, ICML 2008]
Source code and dataset available online
Work supported by an NSF IIS grant, a Microsoft Fellowship, and a Yahoo! KTC Grant.