Diversifying Search Results Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong Search Labs, Microsoft Research WSDM, February 10, 2009

Ambiguity and Diversification Many queries are ambiguous – “Barcelona” (City? Football team? Movie?) – “Michael Jordan” (Michael I. Jordan? Michael J. Jordan?)

Ambiguity and Diversification Many queries are ambiguous – “Barcelona” (City? Football team? Movie?) – “Michael Jordan” (which one?) How best to answer ambiguous queries? Use context, make suggestions, … Under the premise of returning a single (ordered) set of results, how best to diversify the search results so that a user will find something useful?

Intuition behind Our Approach Analyze click logs for classifying queries and docs Maximize the probability that the average user will find a relevant document in the retrieved results Use the analogy of marginal utility to determine whether to include more results from an already covered category

Outline Problem formulation Theoretical analysis Metrics to measure diversity Experiments

Assumptions A taxonomy (categorization of intents) C – For each query q, P(c | q) denotes the distribution over intents – ∑_{c ∈ C} P(c | q) = 1 Quality assessment of documents at the intent level – For each doc d, V(d | q, c) denotes the probability of the doc satisfying intent c – Conditional independence (documents satisfy an intent independently of one another) Users are interested in finding at least one satisfying document

Problem Statement DIVERSIFY(k): Given a query q, a set of documents D, distribution P(c | q), quality estimates V(d | q, c), and an integer k, find a set of docs S ⊆ D with |S| = k that maximizes P(S | q) = ∑_c P(c | q) ( 1 − ∏_{d ∈ S} (1 − V(d | q, c)) ), interpreted as the probability that the set S is relevant to the query over all possible intents (the product term captures finding at least one relevant doc; the sum over c covers multiple intents)
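
A minimal sketch of this objective in Python (not from the slides; the dictionary-based encoding of P(c | q) and V(d | q, c) is an assumption for illustration):

def prob_relevant(S, P_intent, V):
    # P(S | q) = sum_c P(c | q) * (1 - prod_{d in S} (1 - V(d | q, c)))
    total = 0.0
    for c, p_c in P_intent.items():
        prob_none = 1.0
        for d in S:
            prob_none *= 1.0 - V.get((d, c), 0.0)   # a missing (d, c) pair is treated as V = 0
        total += p_c * (1.0 - prob_none)
    return total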

Discussion of Objective Makes explicit use of taxonomy – In contrast, similarity-based: [CG98], [CK06], [RKJ08] Captures both diversification and doc relevance – In contrast, coverage-based: [Z+05], [C+08], [V+08] Specific form of “loss minimization” [Z02], [ZL06] “Diminishing returns” for docs w/ the same intent Objective is order-independent – Assumes that all users read k results – May want to optimize ∑_k P(k) P(S | q)

Outline Problem formulation Theoretical analysis Metrics to measure diversity Experiments

Properties of the Objective DIVERSIFY(k) is NP-hard – Reduction from Max-Cover No single ordering will optimize for all k Can we make use of “diminishing returns”?

A Greedy Algorithm
Input: k, q, C, D, P(c | q), V(d | q, c)
Output: set of documents S
S ← ∅
∀c ∈ C, U(c | q) ← P(c | q)
while |S| < k do
  for d ∈ D do
    g(d | q) ← ∑_{c ∈ C} U(c | q) V(d | q, c)
  end for
  d* ← argmax_{d ∈ D} g(d | q)
  S ← S ∪ {d*}
  ∀c ∈ C, U(c | q) ← (1 − V(d* | q, c)) U(c | q)   (update the posterior)
  D ← D \ {d*}
end while
U(c | q): conditional prob of intent c given query q, updated as documents are selected
g(d | q): current prob of document d satisfying q under the remaining intent mass
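
The same greedy procedure as a small Python function (a sketch, not the authors' code; the data layout matches the prob_relevant sketch above):

def diversify(k, docs, P_intent, V):
    # Greedy: repeatedly pick the doc with the largest marginal gain
    # g(d | q) = sum_c U(c | q) * V(d | q, c), then discount U for the intents it covers.
    S = []
    U = dict(P_intent)                     # U(c | q) starts at P(c | q)
    remaining = list(docs)
    while len(S) < k and remaining:
        best = max(remaining, key=lambda d: sum(U[c] * V.get((d, c), 0.0) for c in U))
        S.append(best)
        remaining.remove(best)
        for c in U:                        # update the posterior
            U[c] *= 1.0 - V.get((best, c), 0.0)
    return S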

A Greedy Algorithm (worked example) Intent distribution: P(R | q) = 0.8, P(B | q) = 0.2 [table showing, for each selected document, V(d | q, c), the marginal gain g(d | q, c), and the updated U(R | q) and U(B | q) after each selection] Observations: – Actually produces an ordered set of results – Results not proportional to intent distribution – Results not according to (raw) quality – Better results ⇒ less needed to be shown
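
A usage example in the spirit of this slide, reusing the diversify sketch above (the 0.8 / 0.2 intent split is from the slide; the document names and V values below are made up for illustration):

P_intent = {"R": 0.8, "B": 0.2}
docs = ["d1", "d2", "d3", "d4"]
V = {("d1", "R"): 0.9, ("d2", "R"): 0.5,
     ("d3", "B"): 0.8, ("d4", "B"): 0.6}
print(diversify(3, docs, P_intent, V))
# ['d1', 'd3', 'd2']: once intent R is largely covered by d1, the B document d3 jumps ahead of the second R document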

Formal Claims Lemma 1 P(S | q) is submodular. – Same intuition as diminishing returns – For sets of documents S ⊆ T and a document d, P(S ∪ {d} | q) − P(S | q) ≥ P(T ∪ {d} | q) − P(T | q) Theorem 1 Solution is a (1 − 1/e) approximation of the optimum. – Consequence of Lemma 1 and [NWF78] Theorem 2 Solution is optimal when each document can only satisfy one category. – Relative quality of docs does not change
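
A quick numerical check of the diminishing-returns property, reusing prob_relevant and the illustrative values above (again an assumption-laden sketch, not from the slides):

S, T, d = ["d1"], ["d1", "d4"], "d3"
gain_S = prob_relevant(S + [d], P_intent, V) - prob_relevant(S, P_intent, V)   # 0.16
gain_T = prob_relevant(T + [d], P_intent, V) - prob_relevant(T, P_intent, V)   # 0.064
assert gain_S >= gain_T   # adding d helps the smaller set at least as much as the superset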

Outline Problem formulation Theoretical analysis Metrics to measure diversity Experiments

How to Measure Success? Many metrics for relevance – Normalized discounted cumulative gain (NDCG) at k – Mean average precision (MAP) at k – Mean reciprocal rank (MRR) Some metrics for diversity – Maximal marginal relevance (MMR) [CG98] – Nugget-based instantiation of NDCG [C+08] Want a metric that takes into account both relevance and diversity [JK00]

Generalizing Relevance Metrics Take expectation over the distribution of intents – Interpretation: how will the average user feel? Consider NDCG – Classic: NDCG(S, k), computed against a single set of relevance judgments – Intent-aware: NDCG-IA(S, k) = ∑_c P(c | q) NDCG(S, k | c) – NDCG-IA depends on the intent distribution and the intent-specific NDCG (MAP-IA and MRR-IA are defined analogously)
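
A sketch of the intent-aware NDCG computation in Python, using the gain and discount given later on the “Experiment Detail” slide (gain 2^rel, discount 1 + log2(position)); the function names and graded-judgment encoding are assumptions:

import math

def dcg(rels, k):
    # DCG@k with gain 2^rel and discount 1 + log2(position)
    return sum((2 ** r) / (1 + math.log2(j)) for j, r in enumerate(rels[:k], start=1))

def ndcg_ia(ranking, k, P_intent, rel, all_docs):
    # NDCG-IA(S, k) = sum_c P(c | q) * NDCG(S, k | c)
    total = 0.0
    for c, p_c in P_intent.items():
        actual = dcg([rel.get((d, c), 0) for d in ranking], k)
        ideal = dcg(sorted((rel.get((d, c), 0) for d in all_docs), reverse=True), k)
        total += p_c * (actual / ideal if ideal > 0 else 0.0)
    return total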

Outline Problem formulation Theoretical analysis Metrics to measure diversity Experiments

Setup 10,000 queries randomly sampled from logs – Queries classified according to ODP (level 2) [F+08] – Keep only queries with at least two intents (~900) Top 50 results from Live, Google, and Yahoo! Documents are rated on a 5-point scale – >90% of docs have ratings – Docs w/o ratings are assigned a random grade according to the distribution of rated documents

Experiment Detail Documents are classified using a Rocchio classifier – Assumes that each doc belongs to only one category Quality scores of documents are estimated based on textual and link features of the webpage – Our approach is agnostic to how quality is determined – Can be interpreted as a re-ordering of search results that takes into account ambiguities in queries Evaluation using generalized NDCG, MAP, and MRR – f(relevance(d)) = 2^rel(d); discount(j) = 1 + log₂(j) – Take P(c | q) as ground truth

NDCG-IA

MAP-IA and MRR-IA

Evaluation using Mechanical Turk Created two types of HITs on Mechanical Turk – Query classification: workers are asked to choose among three interpretations – Document rating (under the given interpretation) Two additional evaluations – MT classification + current ratings – MT classification + MT document ratings

Evaluation using Mechanical Turk

Concluding Remarks Theoretical approach to diversification supported by empirical evaluation What to show is a function of both intent distribution and quality of documents – Less is needed when quality is high There is additional flexibility in our approach – Not tied to any taxonomy – Can make use of context as well

Future Work When is it right to diversify? – Users have certain expectations about the workings of a search engine What is the best way to diversify? – Evaluate approaches beyond diversifying the retrieved results Metrics that capture both relevance and diversity – Some preliminary work suggests that there will be certain trade-offs to make

Thanks {rakesha, sreenig, alanhal,