Term Necessity Prediction P(t | R_q)
Le Zhao and Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Oct 15

Main Points
–Necessity is as important as idf (theory)
–Explains behavior of IR models (practice)
–Can be predicted
–Performance gain

Definition of Necessity P(t | R_q)
–Directly calculated given relevance judgements for q
–[Venn diagram: the collection, the docs that contain t, and the relevant docs for q; in the example the overlap yields P(t | R_q) = 0.4]
–Necessity == 1 − mismatch == term recall
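Since the slide only gestures at the computation, here is a minimal sketch of computing necessity (term recall) from relevance judgements; the qrels/document representations are illustrative assumptions, not the paper's data structures:

```python
from typing import Dict, Set

def term_necessity(term: str, relevant_docs: Set[str],
                   doc_terms: Dict[str, Set[str]]) -> float:
    """P(t | R_q): fraction of judged-relevant docs for q that contain t."""
    if not relevant_docs:
        return 0.0
    hits = sum(1 for d in relevant_docs if term in doc_terms.get(d, set()))
    return hits / len(relevant_docs)

# Toy example matching the slide's 0.4: 2 of 5 relevant docs contain the term.
docs = {f"d{i}": ({"party"} if i < 2 else set()) for i in range(5)}
print(term_necessity("party", set(docs), docs))  # 0.4
```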

Why Necessity? Roots in Probabilistic Models
–Binary Independence Model [Robertson and Spärck Jones 1976]: "Relevance Weight", "Term Relevance"
–P(t | R) is effectively the only part of the model that is about relevance.
–The term weight decomposes into necessity odds times an idf-like (sufficiency) factor.
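Reconstructing the standard BIM/RSJ algebra behind the slide's "necessity odds × idf (sufficiency)" annotation, with p = P(t | R) and s the term's probability in non-relevant documents:

```latex
w_t \;=\; \log \frac{p\,(1-s)}{(1-p)\,s}
    \;=\; \underbrace{\log \frac{p}{1-p}}_{\text{necessity odds}}
    \;+\; \underbrace{\log \frac{1-s}{s}}_{\approx\,\mathrm{idf}\ \text{(sufficiency)}}
```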

Without Necessity
–The emphasis problem for idf-only term weighting: it emphasizes high-idf terms in the query
–Example: "prognosis/viability of a political third party in U.S." (Topic 206)

Ground Truth (TREC 4 topic 206)
[Table: true P(t | R), idf, and the resulting emphasis for the terms party, political, third, viability, prognosis; idf-only weighting emphasizes the rare terms viability and prognosis, while true necessity is highest for party and political. The numeric values did not survive the transcript.]

Indri Top Results
1. (ZF ) Recession concerns lead to a discouraging prognosis for …
2. (AP ) Politics … party … Robertson's viability as a candidate
3. (WSJ ) political parties …
4. (AP ) there is no viable opposition …
5. (WSJ ) A third of the votes
6. (WSJ ) politics, party, two thirds
7. (AP ) third ranking political movement …
8. (AP ) political parties
9. (AP ) prognosis for the Sunday school
10. (ZF ) third party provider
(Google and Bing still have top-10 false positives; emphasis is also a problem for large search engines!)

Without Necessity
–The emphasis problem for idf-only term weighting: emphasizing high-idf terms in "prognosis/viability of a political third party in U.S." (Topic 206)
–False positives throughout the rank list, especially detrimental at top ranks
–Ignoring term recall hurts precision at all recall levels
–(This is true for BIM, and also for BM25 and language models that use tf.)
How significant is the emphasis problem?

Failure Analysis of 44 Topics from TREC
–RIA workshop 2003 (7 top research IR systems, >56 expert*weeks)
–Remedies: necessity term weighting; necessity-guided expansion (& bigrams, & term restriction using doc fields)
–Basis: term necessity prediction

Given True Necessity
–+100% over BIM (in precision at all recall levels) [Robertson and Spärck Jones 1976]
–30-80% over Language Model, BM25 (in MAP) [this work]
For a new query w/o relevance judgements, we need to predict necessity.
–Predictions don't need to be very accurate to show performance gain.

How Necessary are Words? (Examples from TREC 3 topics)
[Table: P(t | R) of a highlighted term in each query: Oil Spills; Term limitations for US Congress members; Insurance Coverage which pays for Long Term Care; School Choice Voucher System and its effects on the US educational program; Vitamin the cure or cause of human ailments. The numeric values did not survive the transcript.]

Mismatch Statistics
–Mismatch variation across terms [histograms: TREC 3 title queries, TREC 9 desc queries]
–Not constant, so prediction is needed

Mismatch Statistics (2)
–Mismatch variation for the same term in different queries [plot: TREC 3 recurring words]
–Query-dependent features are needed (1/3 of term occurrences have necessity variation > 0.1)

Prior Prediction Approaches
–Croft/Harper combination match (1979): treats P(t | R) as a tuned constant; when > 0.5, rewards docs that match more query terms
–Greiff's (1998) exploratory data analysis: used idf to predict overall term weighting; improved over BIM
–Metzler's (2008) generalized idf: used idf to predict P(t | R); improved over BIM
Years of relying on the simple idf feature, with limited success.
–Missing piece: P(t | R) = term necessity = term recall

Factors that Affect Necessity
What causes a query term to not appear in relevant documents?
–Topic centrality (concept necessity): e.g., "Laser research related or potentially related to US defense", "Welfare laws propounded as reforms"
–Synonyms: e.g., movie == film == …
–Abstractness: e.g., "ailments" in the vitamin query, "Dog Maulings", "Christian Fundamentalism"
–The worst case is a rare & abstract term, e.g. "prognosis"

Features
We need to identify synonyms/searchonyms of a query term, in a query-dependent way.
Use thesauri?
–Biased (not collection dependent)
–Static (not query dependent)
–Not promising, not easy
Instead: term-term similarity in concept space!
–Local LSI (Latent Semantic Indexing): LSI of the (e.g. 200) top-ranked documents, keeping (e.g. 150) dimensions (see the sketch below)
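The slides name local LSI but give no implementation; a minimal numpy sketch of the idea, assuming a dense term-document matrix built from the top-ranked documents (matrix construction and term weighting are assumptions):

```python
import numpy as np

def local_lsi_vectors(term_doc: np.ndarray, k: int = 150) -> np.ndarray:
    """Rows = terms, columns = top-ranked docs (e.g. the top 200).
    Returns each term's coordinates in the k-dimensional latent space."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]  # scale latent directions by singular values

def top_similar_terms(vecs: np.ndarray, t: int, n: int = 5):
    """Cosine similarity of every term to term index t; top n other terms."""
    norms = np.linalg.norm(vecs, axis=1) + 1e-12
    sims = (vecs @ vecs[t]) / (norms * norms[t])
    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order if i != t][:n]
```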

Features
–Topic Centrality: length of the term vector after dimension reduction (local LSI)
–Synonymy (Concept Necessity): average similarity score of the top 5 similar terms
–Replaceability: adjusts the synonymy measure by how many new documents the synonyms match
–Abstractness: users modify abstract terms with concrete terms, e.g. "effects on the US educational program", "prognosis of a political third party"
(The first two features are sketched in code below.)
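Under the same assumptions as the LSI sketch above (and reusing its top_similar_terms), the first two features might be computed as follows; this is a sketch, not the paper's exact formulation:

```python
import numpy as np

def centrality(vecs: np.ndarray, t: int) -> float:
    """Topic centrality: length of the term's vector after dimension reduction."""
    return float(np.linalg.norm(vecs[t]))

def synonymy(vecs: np.ndarray, t: int, n: int = 5) -> float:
    """Average similarity of the top-n most similar terms (searchonyms)."""
    return float(np.mean([s for _, s in top_similar_terms(vecs, t, n)]))
```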

Experiments
–Necessity prediction error: a regression problem; model M maps the features to P(t | R) via RBF kernel regression (sketch below)
–Necessity for term weighting: end-to-end retrieval performance
–How to weight terms by their necessity:
  –In BM25: Binary Independence Model weights
  –In language models: relevance model P_m(t | R), multinomial (Lavrenko and Croft 2001)
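The slides name RBF kernel regression without further detail; one standard instantiation is kernel ridge regression with an RBF kernel. The data below is a toy stand-in and the hyperparameters are assumptions:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# One row per training query term: (centrality, synonymy, replaceability,
# abstractness, idf); targets are true P(t | R) from relevance judgements.
X_train = np.array([[0.9, 0.6, 0.2, 0.1, 2.3],
                    [0.2, 0.3, 0.7, 0.9, 6.1],
                    [0.7, 0.5, 0.4, 0.3, 4.0]])
y_train = np.array([0.92, 0.31, 0.66])

M = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
M.fit(X_train, y_train)
pred = np.clip(M.predict(X_train), 0.0, 1.0)  # predicted necessity in [0, 1]
```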

Necessity Prediction Example
[Table: true P(t | R) vs. predicted necessity, and the resulting emphasis, for party, political, third, viability, prognosis; trained on TREC 3, tested on TREC 4. The numeric values did not survive the transcript.]

Necessity Prediction Error
[Chart: L1 loss of necessity predictions; the lower the better.]

Predicted Necessity Weighting
[Table: LM desc baseline vs. necessity-weighted MAP on TREC train/test and cross-validation sets; improvements of 26.38%, 23.52%, 20.33%, 21.32%. The baseline and necessity MAP values did not survive the transcript.]
–20-30% gain in MAP (necessity weighting)
–10-20% gain in top precision

Predicted Necessity Weighting (ctd.)
[Table: LM desc baseline vs. necessity-weighted MAP on further TREC test/cross-validation sets; improvements of 11.43%, 11.25%, 149.8%, 24.82%. The MAP values did not survive the transcript.]

vs. Relevance Model
[Table: MAP of Relevance Model (desc), RM reweight-only (desc), and RM reweight-trained (desc) on test/cross-validation sets.]
–Weight-only ≈ expansion
–Supervised > unsupervised (5-10%)
Relevance Model as an Indri query:
#weight( (1−λ) #combine( t1 t2 ) λ #weight( w1 t1 w2 t2 w3 t3 … ) ), with w1 ~ P(t1 | R), w2 ~ P(t2 | R), …
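To make the query template concrete, a hypothetical helper that emits the reweighting variant of the slide's Indri query (the function and its defaults are illustrative, not from the paper):

```python
def necessity_weighted_query(terms, necessities, expansion=None, lam=0.5):
    """Indri-style #weight query: original terms weighted by predicted
    P(t | R); optionally interpolated with expansion (term, weight) pairs."""
    orig = " ".join(f"{p:.3f} {t}" for t, p in zip(terms, necessities))
    if not expansion:
        return f"#weight( {orig} )"
    exp = " ".join(f"{w:.3f} {t}" for t, w in expansion)
    return (f"#weight( {1 - lam:.2f} #weight( {orig} ) "
            f"{lam:.2f} #weight( {exp} ) )")

print(necessity_weighted_query(["third", "party"], [0.51, 0.92]))
# -> #weight( 0.510 third 0.920 party )
```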

Take Home Messages
–Necessity is as important as idf (theory)
–Explains behavior of IR models (practice)
–Effective features can predict necessity
–Performance gain

Acknowledgements
–Reviewers from multiple venues
–Ni Lao, Frank Lin, Yiming Yang, Stephen Robertson, Bruce Croft, Matthew Lease: discussions & references
–David Fisher, Mark Hoy: maintaining the Lemur toolkit
–Andrea Bastoni and Lorenzo Clemente: maintaining LSI code for the Lemur toolkit
–SVM-light, Stanford parser
–TREC: all the data
–NSF Grant IIS and IIS
Feedback: Le Zhao

(Necessity here means term recall: not concept necessity, and not necessity for good performance.)

Related Work
P(t | R) or term weighting prediction
–Berkeley regression, Cooper et al. (1993)
–Regression rank, Lease et al. (2009)
–Exploratory data analysis, Greiff (1998)
–Generalized IDF, Metzler (2008)
Key concepts and long query reduction
–Bendersky and Croft (2008): "must be … in a retrieved document in order for it to be relevant."
–Kumaran and Carvalho (2009)

Future Research Directions
To improve necessity prediction
–Click-through data & query rewrites: close to relevance judgements; associate clicks w/ result snippets
–Better understanding of necessity & better features
For applying necessity in retrieval
–Ad hoc retrieval: predict recall for phrases and more complex structured terms
–Structured query formulation: ( (A1 OR A2 OR A3) AND (B1 OR B2 OR B3) ); Google's automatic synonym operator "~"; where to expand, which expansion terms to include
–Relevance feedback

Knowledge: How Necessity Explains the Behavior of IR Techniques
–Why weight query bigrams 0.1 but query unigrams 0.9? A bigram decreases term recall; the weight reflects recall.
–Why do bigrams not give stable improvements? Term recall is the larger problem.
–Why does using document structure (fields, semantic annotation) not improve performance? It improves precision; structural mismatch needs to be solved first.
–Word sense disambiguation enhances precision; instead, it should be used in mismatch modeling: identify the query term's sense for searchonym identification or for learning across queries, and disambiguate collection term senses for more accurate replaceability.
–Personalization biases results toward what a community/person likes to read (precision); it may work well in a mobile setting with short queries.

Why Necessity? System Failure Analysis
Reliable Information Access (RIA) workshop (2003)
–Failure analysis for 7 top research IR systems: 11 groups of researchers (academia & industry), 28 people directly involved in the analysis (senior & junior), >56 human*weeks (analysis + running experiments), 45 topics selected from 150 TREC 6-8 (difficult) topics
–Causes (necessity in various disguises):
  –Emphasize 1 aspect, missing another aspect (14+2 topics)
  –Emphasize 1 aspect, missing another term (7 topics)
  –Missing either 1 of 2 aspects, need both (5 topics)
  –Missing a difficult aspect that needs human help (7 topics)
  –Need to expand a general term, e.g. "Europe" (4 topics)
  –Precision problem, e.g. "euro", not "euro-…" (4 topics)


Recurring Words
How much is necessity term-dependent?
–Use a term's necessity in one query to predict the same term's necessity in another query
–Easy given industry-scale relevance judgements or query logs

Local LSI Top Similar Terms
–"Oil spills" (term: oil): spill 0.5828, oil 0.4210, tank 0.0986, crude 0.0972, water 0.0830
–"Insurance coverage which pays for long term care" (term: term): term 0.3310, long 0.2173, nurse 0.2114, care 0.1694, home 0.1268
–"Term limitations for US Congress members" (term: term): term 0.3339, limit 0.1696, ballot 0.1115, elect 0.1042, care 0.0997
–"Vitamin the cure of or cause for human ailments" (term: ail): ail, health, disease, basler, dr (scores did not survive the transcript)

Predicting Necessity
Problem definition
–Training samples: for each term t_i of a training query q, a feature vector f(t_i, q) with target P(t_i | R_q)
–Prediction using model M: P̂(t | R_q) = M( f(t, q) )
–Training objective (minimize prediction loss): M* = argmin_M Σ_i L( M(f(t_i, q)), P(t_i | R_q) )


Necessity Term Weighting (summary)
–Baseline vs. true necessity weighting: 30-80% gain in MAP
–Baseline vs. predicted necessity weighting: 10-25% gain in MAP
–Relevance Model vs. reweight-only & necessity prediction on RM weights: weighting matters more than expansion (long queries)
–Ablation study: all features help

How Necessary are Words? (Examples from TREC 3 topics)
[Backup table: P(t | R) and idf of a highlighted term in each query: Oil Spills; Term limitations for US Congress members; Insurance Coverage which pays for Long Term Care; School Choice Voucher System and its effects on the US educational program; Vitamin the cure or cause of human ailments. The numeric values did not survive the transcript.]

True Necessity Weighting
[Table: MAP by TREC document collection (disk 2,3; disk 4,5; disk 4,5 w/o CR; WT10g; .GOV; .GOV2) and topic set, for LM desc, Okapi desc, and LM title runs, baseline vs. true-necessity weighting, with randomization-test and sign-test p-values and multinomial-abs / multinomial RM variants. LM desc improvements: 51.09%, 77.05%, 58.97%, 29.14%, 36.20%, 261.7%, 49.47%. The remaining values did not survive the transcript.]

Feature Correlation
[Table: correlations among the features f1 Centrality, f2 Synonymy, f3 Replaceability, f4 DepLeaf, f5 idf, the relevance-model weight RMw, and the predicted necessity. Values did not survive the transcript.]

Prediction Based on RMw
[Plot: predicted necessity vs. relevance-model weight RMw (x-axis).]

Ablation Study

Features used       MAP     |  Features used        MAP
IDF only            0.1776  |  All 5 features       n/a
IDF + Centrality    0.2076  |  All but Centrality   n/a
IDF + Synonymy      0.2129  |  All but Synonymy     n/a
IDF + Replaceable   0.1699  |  All but Replaceable  n/a
IDF + DepLeaf       0.1900  |  All but DepLeaf      n/a

(The right-hand MAP values did not survive the transcript.)

Using Document Structure
–Stylistic: XML
–Syntactic/semantic: POS, semantic role labels
–Current approaches are all precision-oriented. Need to solve mismatch first?

Be mean! Apply necessity to your retrieval models!

Be mean! Is the term necessary for doc relevance?
[Closing diagram: IR theory, potential in reality, prediction, factors, term weighting, features.]