Automatic Term Mismatch Diagnosis for Selective Query Expansion. Le Zhao and Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.

Presentation transcript:

Automatic Term Mismatch Diagnosis for Selective Query Expansion
Le Zhao and Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
SIGIR 2012, Portland, OR

Main Points
- An important problem: term mismatch & a traditional solution
- New diagnostic intervention approach
- Simulated user studies
- Diagnosis & intervention effectiveness

Term Mismatch Problem
- Term mismatch: relevant docs are not returned by the query term
- Average term mismatch rate: 30-40% [Zhao10] (web collections, short queries, stemmed, inlinks included)
- A common cause of search failure [Harman03, Zhao10]
- Frequent user frustration [Feild10]
- Here: 50%-300% gain in retrieval accuracy
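To make the definition concrete, here is a minimal Python sketch of the mismatch-rate computation: the fraction of judged-relevant documents that do not contain the term, i.e. 1 - P(t | R). The toy documents below are hypothetical stand-ins, not data from the study.

```python
# Minimal sketch (hypothetical data): mismatch rate of a term over the
# relevant documents, i.e. the fraction of relevant docs that do NOT
# contain the term -- 1 - P(t | R).

def mismatch_rate(term: str, relevant_docs: list[set[str]]) -> float:
    """Share of relevant documents whose token set lacks `term`."""
    missing = sum(1 for doc in relevant_docs if term not in doc)
    return missing / len(relevant_docs)

# Hypothetical judged-relevant documents as token sets.
relevant_docs = [
    {"tv", "logos", "children"},            # says "tv", never "television"
    {"television", "approval", "kid"},
    {"network", "brand", "juvenile"},
]
for t in ["approval", "logos", "television", "watched", "children"]:
    print(f"{t}: {mismatch_rate(t, relevant_docs):.0%} mismatch")
```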

Term Mismatch Problem
Example query (TREC 2006 Legal discovery task):
  approval of (cigarette company) logos on television watched by children

  Term:      approval  logos  television  watched  children
  Mismatch:  94%       86%    79%         90%      82%

High mismatch rate for all query terms in this query; "approval" has the highest.

The Traditional Solution: Boolean Conjunctive Normal Form (CNF) Expansion
Keyword query: approval of logos on television watched by children
Manual CNF (TREC Legal track 2006):
  (approval OR guideline OR strategy) AND
  (logos OR promotion OR signage OR brand OR mascot OR marque OR mark) AND
  (television OR TV OR cable OR network) AND
  (watched OR view OR viewer) AND
  (children OR child OR teen OR juvenile OR kid OR adolescent)
- Expressive & compact (1 CNF == 100s of alternative phrasings)
- Highly effective (this work: 50%-300% over the base keyword query)
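A CNF query is just a conjunction of OR-groups, which is easy to represent directly as data. A small sketch (not the paper's implementation) of the manual query above, with Boolean matching against a document's token set:

```python
# Sketch (not the paper's implementation): the manual CNF above as data,
# with Boolean matching against a document's token set.

CNF = [
    {"approval", "guideline", "strategy"},
    {"logos", "promotion", "signage", "brand", "mascot", "marque", "mark"},
    {"television", "tv", "cable", "network"},
    {"watched", "view", "viewer"},
    {"children", "child", "teen", "juvenile", "kid", "adolescent"},
]

def matches_cnf(doc_tokens: set[str], cnf: list[set[str]]) -> bool:
    """A document matches iff every OR-group shares a term with it."""
    return all(group & doc_tokens for group in cnf)

doc = {"tv", "brand", "kid", "strategy", "viewer"}
print(matches_cnf(doc, CNF))  # True: each OR-group is covered by a synonym
```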

The Potential
Query: approval logos television watched children

  Term        Recall   Expanded with               Recall
  approval    6.49%    +guideline +strategy        12.8%
  logos       14.1%    +promotion +signage ...     19.7%
  television  21.3%    +tv +cable +network         22.4%
  watched     10.4%    +view +viewer               19.5%
  children    18.0%    +child +teen +kid ...       19.3%
  Overall     2.04%                                8.74%
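The "Overall" row follows from conjunctive matching: a keyword query pays every term's mismatch at once, while each expanded OR-group only needs one alternative to hit. A self-contained sketch with hypothetical judged-relevant documents:

```python
# Sketch with hypothetical judged-relevant documents: conjunctive recall of
# the keyword query vs. the expanded CNF. Not the paper's data or code.

def conjunctive_recall(cnf, relevant):
    """Fraction of relevant docs in which every OR-group matches."""
    hits = sum(1 for doc in relevant if all(group & doc for group in cnf))
    return hits / len(relevant)

relevant = [
    {"tv", "brand", "kid", "approval", "view"},
    {"television", "logos", "children", "guideline", "watched"},
    {"network", "signage", "teen", "strategy", "viewer"},
]
keyword = [{"approval"}, {"logos"}, {"television"}, {"watched"}, {"children"}]
expanded = [
    {"approval", "guideline", "strategy"},
    {"logos", "promotion", "signage", "brand"},
    {"television", "tv", "cable", "network"},
    {"watched", "view", "viewer"},
    {"children", "child", "teen", "kid"},
]
print(conjunctive_recall(keyword, relevant))   # 0.0: every term must match
print(conjunctive_recall(expanded, relevant))  # 1.0: any synonym suffices
```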

CNF Expansion
Widely used in practice:
- Librarians [Lancaster68, Harter86]
- Lawyers [Lawlor62, Blair85, Baron07]
- Search experts [Clarke95, Hearst96, Mitra98]
Less well studied in research:
- Users do not create effective free-form Boolean queries ([Hearst09] cites many studies).
Question: how to guide user effort in productive directions?
- Restricting to CNF expansion (targets the mismatch problem)
- Focusing on problem terms when expanding
See also: WikiQuery [Open Source IR Workshop]

Main Points
- An important problem: term mismatch & a traditional solution
- New diagnostic intervention approach
- Simulated user studies
- Diagnosis & intervention effectiveness

Diagnostic Intervention
Goal: least amount of user effort → near-optimal performance
- E.g., expand 2 terms → 90% of the total improvement
Query: approval of logos on television watched by children
Diagnosis (baseline: high-idf, i.e. rare, terms) picks which terms to expand.
CNF expansion of the diagnosed terms, e.g.:
  (approval OR guideline OR strategy) AND logos AND television AND (watch OR view OR viewer) AND children
  (approval OR guideline OR strategy) AND logos AND (television OR tv OR cable OR network) AND watch AND children

Diagnostic Intervention
Goal: least amount of user effort → near-optimal performance
- E.g., expand 2 terms → 90% of the total improvement
Query: approval of logos on television watched by children
Diagnosis (baseline: high-idf, i.e. rare, terms) picks which terms to expand.
Bag-of-words expansion (original query interpolated with weighted expansion terms), e.g.:
  [0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 view 0.4 viewer)]
  [0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 tv 0.4 cable 0.2 network)]
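The interpolated expansion queries above can be rendered as an Indri-style structured query string (assuming Indri's #weight / #combine operators; the weights and terms are the ones shown on the slide):

```python
# Sketch: the interpolated bag-of-words expansion rendered as an Indri-style
# structured query string (assumes Indri's #weight / #combine operators).

def bow_expansion_query(original, expansion, orig_w=0.9, exp_w=0.1):
    """original: list of terms; expansion: list of (weight, term) pairs."""
    exp = " ".join(f"{w} {t}" for w, t in expansion)
    return (f"#weight( {orig_w} #combine( {' '.join(original)} ) "
            f"{exp_w} #weight( {exp} ) )")

print(bow_expansion_query(
    ["approval", "logos", "television", "watch", "children"],
    [(0.4, "guideline"), (0.3, "strategy"), (0.5, "view"), (0.4, "viewer")],
))
```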

Diagnostic Intervention
Diagnosis methods:
- Baseline: rareness (high idf)
- High predicted term mismatch, P(t | R) [Zhao10]
Intervention methods:
- Baseline: bag of words (Relevance Model [Lavrenko01])
  - w/ manual expansion terms
  - w/ automatic expansion terms
- CNF expansion (probabilistic Boolean ranking), e.g.:
  (approval OR guideline OR strategy) AND_P logos AND_P television AND_P (watch OR view OR viewer) AND_P children
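The exact probabilistic Boolean (AND_P) scoring formula is not spelled out on this slide; as a hedged sketch, one plausible instantiation scores each OR-group as a noisy-OR over per-term match probabilities and combines groups with a log-space product:

```python
# Hedged sketch: not the paper's exact AND_P scoring. One plausible
# instantiation scores each OR-group as a noisy-OR over per-term match
# probabilities and multiplies the groups (sum in log space).
import math

def and_p_score(cnf_probs: list[list[float]]) -> float:
    """cnf_probs[g][i] = P(i-th term of OR-group g matches the document)."""
    score = 0.0
    for group in cnf_probs:
        group_p = 1.0 - math.prod(1.0 - p for p in group)  # noisy-OR
        score += math.log(max(group_p, 1e-12))             # log-space AND
    return score

# Two groups, e.g. (approval OR guideline) AND_P logos:
print(and_p_score([[0.3, 0.2], [0.6]]))
```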

Main Points
- An important problem: term mismatch & a traditional solution
- New diagnostic intervention approach
- Evaluation: Simulated user studies
- Diagnosis & intervention effectiveness

Diagnostic Intervention (We Hope to)
User → keyword query: (child AND cigar)
→ Diagnosis system (P(t | R) or idf) → problem query terms: (child > cigar)
→ User expansion → expansion terms: (child → teen)
→ Query formulation (CNF or keyword): (child OR teen) AND cigar
→ Retrieval engine → Evaluation

We Ended up Using Simulation
Offline: expert user → full CNF query: (child OR teen) AND (cigar OR tobacco)
Online simulation:
Keyword query (child AND cigar)
→ Diagnosis system (P(t | R) or idf) → problem query terms: (child > cigar)
→ Simulated user expansion (drawn from the full CNF) → expansion terms: (child → teen)
→ Query formulation (CNF or keyword): (child OR teen) AND cigar
→ Retrieval engine → Evaluation
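A sketch of this offline simulation loop: the expert's full CNF stands in for the user, so "expanding" a diagnosed term just copies that term's OR-group from the full CNF (see assumptions A1/A2 on a later slide). The `diagnose` and `evaluate_map` callables are hypothetical placeholders for the diagnosis method and the retrieval-plus-evaluation step:

```python
# Sketch of the offline simulation: `diagnose` and `evaluate_map` are
# hypothetical placeholders (diagnosis by predicted P(t | R) or idf;
# retrieval plus TREC evaluation). Expanding a diagnosed term copies its
# OR-group from the expert's full CNF query.

def simulate(keyword_terms, full_cnf, diagnose, evaluate_map, k):
    """full_cnf: dict mapping each query term to its OR-group."""
    ranked = diagnose(keyword_terms)      # most problematic terms first
    to_expand = set(ranked[:k])           # user-effort budget: k terms
    query = [full_cnf[t] if t in to_expand else {t} for t in keyword_terms]
    return evaluate_map(query)            # e.g. statAP or MAP
```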

Diagnostic Intervention Datasets
Document sets:
- TREC 2007 Legal track: 7 million tobacco-company documents; train on TREC 2006
- TREC 4 ad hoc track: 0.5 million newswire documents; train on TREC 3
CNF queries:
- TREC 2007 queries created by lawyers; TREC 4 queries by Univ. of Waterloo [Clarke95]
- 50 topics each, 2-3 keywords per query
Relevance judgments:
- TREC 2007 sparse; TREC 4 dense
Evaluation measures:
- TREC 2007: statAP; TREC 4: MAP

Main Points
- An important problem: term mismatch & a traditional solution
- New diagnostic intervention approach
- Simulated user studies
- Diagnosis & intervention effectiveness

Results – Diagnosis
P(t | R) vs. idf diagnosis
[Chart: diagnostic CNF expansion on TREC 4 and TREC 2007, plotted between the No Expansion and Full Expansion baselines]

Results – Expansion Intervention
CNF vs. bag-of-words expansion
[Chart: P(t | R)-guided expansion on TREC 4 and TREC 2007]
- 50% to 300% gain
- Similar level of gain in top precision

Main Points
- An important problem: term mismatch & a traditional solution
- New diagnostic intervention approach
- Simulated user studies
- Diagnosis & intervention effectiveness

Conclusions
One of the most effective ways to engage user interactions:
- CNF queries gain 50%-300% over the keyword baseline.
Mismatch diagnosis → simple & effective interactions:
- Automatic diagnosis saves user effort by 33%.
Expansion in CNF is easier and better than in bag of words:
- Bag of words requires balanced expansion of all terms.
New research questions:
- How to learn from manual CNF queries to improve automatic CNF expansion
- How to get ordinary users to create effective CNF expansions (with the help of interfaces or search tools)

Acknowledgements
- Helpful discussions & feedback: Chengtao Wen, Grace Hui Yang, Jin Young Kim, Charlie Clarke, SIGIR reviewers
- Access to data: Charlie Clarke, Gordon Cormack, Ellen Voorhees, NIST
- NSF grant IIS. Opinions are solely the authors'.

END

The Potential
Query: approval logos television watched children

  Expanding "logos":  logos   +promotion  +signage  +brand (all)
  Mismatch            85.9%   81.1%       80.9%     80.3%
  Recall              14.1%   18.9%       19.1%     19.7%

  Term        Recall   Expanded with               Recall
  approval    6.49%    +guideline +strategy        12.8%
  logos       14.1%    +promotion +signage ...     19.7%
  television  21.3%    +tv +cable +network         22.4%
  watched     10.4%    +view +viewer               19.5%
  children    18.0%    +child +teen +kid ...       19.3%
  Overall     2.04%                                8.74%

Failure Analysis (vs. baseline)
Diagnosis:
- 4 topics: wrong P(t | R) prediction, lower MAP
Intervention:
- 3 topics: right diagnosis, but lower MAP
  - 2 of the 3: no manual expansion for the selected term. Users do not always recognize which terms need help.
  - 1 of the 3: wrong expansion terms by the expert ("apatite rocks" in nature, not the chemical "apatite"). CNF expansion can be difficult without looking at retrieval results.

Failure Analysis – Comparing Diagnosis Methods: P(t | R) vs. idf
[Chart: per-topic comparison; legend: query, query with unexpanded term(s); annotated failure cases: "user didn't expand", "wrong expansion"]

Term Mismatch Diagnosis
Predicting term recall P(t | R) [Zhao10]:
- Query-dependent features (model the causes of mismatch):
  - synonyms of term t based on query q's context
  - how likely these synonyms are to occur in place of t
  - whether t is an abstract term
  - how rarely t occurs in the collection C
- Regression prediction: f_i(t, q, C) → P(t | R)
- Previously used in term weighting for long queries
Lower predicted P(t | R) → higher likelihood of mismatch → t is more problematic.
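A hedged sketch of this kind of predictor: features f_i(t, q, C) regressed onto the term recall observed on training topics. The feature values and training data below are made-up stand-ins, and the paper's actual feature extraction and regression model may differ:

```python
# Hedged sketch of a [Zhao10]-style predictor: features f_i(t, q, C)
# regressed onto P(t | R) observed on training topics. Feature values and
# training data are made-up stand-ins for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# One row per (term, query) pair: [synonym support in query context,
# replaceability by synonyms, abstractness, rareness (idf)].
X_train = np.array([
    [0.8, 0.7, 0.2, 2.1],   # many substitutes, often replaced -> mismatch
    [0.1, 0.2, 0.1, 5.3],   # rare, hard-to-replace term
    [0.6, 0.5, 0.9, 1.4],   # abstract term
])
y_train = np.array([0.10, 0.85, 0.18])  # observed P(t | R) from judgments

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(np.array([[0.7, 0.6, 0.3, 2.0]])).clip(0.0, 1.0)
print(pred)  # low predicted P(t | R) -> diagnose this term for expansion
```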

Online or Offline Study?
An online study would require controlling several confounding variables:
- quality of expansion terms
- user's prior knowledge of the topic
- interaction effectiveness & effort
and enrolling many users.
Offline simulations avoid all of this and still support reasonable observations.

Simulation Assumptions
Use the full expansion to simulate partial expansions, under 3 assumptions about the user's expansion process:
- Independent expansion of query terms:
  - A1: the same set of expansion terms for a given query term, no matter which subset of query terms gets expanded
  - A2: the same sequence of expansion terms, no matter which subset gets expanded
- A3: re-constructing the keyword query from the CNF, with a procedure that keeps the vocabulary faithful to that of the original keyword description
Highly effective CNF queries ensure a reasonable keyword baseline.

Results – Level of Expansion
More expansion per query term gives better retrieval, a consequence of the expansion terms being effective.
Queries with significant gain in retrieval after expanding more than 4 terms:
- Topic 84: cigarette sales in James Bond movies

Online simulation (detail):
Offline: expert user → full CNF query: (child OR youth) AND (cigar OR tobacco)
Keyword query (child AND cigar)
→ Diagnosis system (P(t | R) or idf; idf picks the most infrequent, i.e. rare, terms) → problem query terms: (child > cigar)
→ User expansion → expansion terms: (child → youth)
→ Query formulation (CNF or keyword): (child OR youth) AND cigar
→ Retrieval engine → Evaluation