A Markov Random Field Model for Term Dependencies
Donald Metzler and W. Bruce Croft
Center for Intelligent Information Retrieval, University of Massachusetts, Amherst
Overview
Model: Markov Random Field model, types of dependence, features
Training: metric-based training
Results: newswire data, web data
Past Work
Co-occurrence: Boolean queries [Croft et al.], linked-dependence model [van Rijsbergen]
Sequential / structural: phrases [Fagan], n-gram language models [Song et al.]
Recent work: dependence language model [Gao et al.], query operations [Mishne et al.]
Motivation
Terms are less likely to co-occur 'by chance' in small collections; in larger collections, stricter dependencies must be enforced to filter out noise
Query term order carries a great deal of information: compare "white house rose garden" with "white rose house garden"
We want a model that generalizes both co-occurrence and phrase dependencies
Broad Overview
Three steps:
1. Create dependencies between query terms based on their order
2. Define a set of term and phrase features over query term / document pairs
3. Train the model parameters
Markov Random Field Model
Undirected graphical model representing the joint probability over a query Q and a document D, defined via potential functions (parameterized by feature functions) over the cliques of the graph
Node types: the document node D and query term nodes (i.e. Q = q_1 q_2 … q_N)
Rank documents by P(D | Q)
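A hedged reconstruction of this slide's equations in LaTeX, following the definitions above (the exponential potential form and the rank-equivalence step are taken from the accompanying paper): since the partition function Z_Λ and P(Q) are constant for a fixed query, ranking by P(D | Q) reduces to a weighted sum of clique feature functions.

```latex
P_\Lambda(Q, D) = \frac{1}{Z_\Lambda} \prod_{c \in C(G)} \psi(c; \Lambda),
\qquad
\psi(c; \Lambda) = \exp\!\big[\lambda_c f(c)\big]
```

```latex
P_\Lambda(D \mid Q) = \frac{P_\Lambda(Q, D)}{P_\Lambda(Q)}
\stackrel{rank}{=} \log P_\Lambda(Q, D)
\stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c)
```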
Types of Dependence
Independence (a): all terms are independent
Sequential dependence (b): each term is independent of all non-adjacent terms given its adjacent terms
Full dependence (c): no independence assumptions
[Figure: graphs (a), (b), (c) over the query term nodes and D]
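A minimal sketch (illustrative, not from the paper) of which query-term subsets join the document node D in a clique under each assumption, for the example query used throughout the talk:

```python
from itertools import combinations

terms = ["prostate", "cancer", "treatment"]

# (a) Full independence: each term forms a clique with D on its own.
fi_cliques = [(t,) for t in terms]

# (b) Sequential dependence: only adjacent terms are linked, so the
# cliques containing D are single terms plus contiguous pairs.
sd_cliques = [(t,) for t in terms] + \
             [tuple(terms[i:i + 2]) for i in range(len(terms) - 1)]

# (c) Full dependence: the graph is complete, so every non-empty
# subset of query terms forms a clique with D.
fd_cliques = [c for r in range(1, len(terms) + 1)
              for c in combinations(terms, r)]

print(fi_cliques)
print(sd_cliques)
print(fd_cliques)
```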
Features
The model allows us to define arbitrary features over the graph cliques
Aspects we want to capture:
Term occurrences
Query sub-phrases (example: for "prostate cancer treatment", the sub-phrases "prostate cancer" and "cancer treatment")
Query term proximity: query terms should occur within some proximity of each other in documents
Example query: prostate cancer treatment
[Graph: document node D connected to query term nodes prostate, cancer, treatment]
Term Occurrence Features
Features over cliques containing the document and a single query term node
How compatible is this term with this document? (a sketch of one such feature follows)
[Graph: the clique {D, term} highlighted for each query term in turn]
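A minimal sketch of a term-occurrence feature, assuming (as in the accompanying paper's experiments) a Dirichlet-smoothed language-model estimate; the parameter names are mine:

```python
import math

# tf: count of the term in D; dlen: |D|; cf: collection frequency of
# the term; clen: total number of terms in the collection; mu: the
# Dirichlet smoothing parameter.
def f_term(tf: int, dlen: int, cf: int, clen: int, mu: float = 2500.0) -> float:
    # Log of the smoothed maximum-likelihood estimate of P(term | D).
    return math.log((tf + mu * cf / clen) / (dlen + mu))
```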
Query Sub-phrase Features
Features over cliques containing the document and a contiguous set of (one or more) query terms
How compatible is this query sub-phrase with this document? (a sketch follows)
[Graph: D with a contiguous group of query term nodes highlighted]
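A minimal sketch of the match count behind an ordered-window (#1) feature: the number of positions where the sub-phrase occurs exactly, in order, in the document. The count can then be plugged into the same smoothed log estimate as the single-term feature; the helper name is illustrative, not the paper's API.

```python
def count_ordered(doc_terms: list[str], phrase: list[str]) -> int:
    # Slide a window of len(phrase) over the document and count
    # positions where the terms match exactly and in order.
    n, k = len(doc_terms), len(phrase)
    return sum(1 for i in range(n - k + 1) if doc_terms[i:i + k] == phrase)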
Query Term Proximity Features
Features over cliques containing the document and any non-empty, non-singleton set of query terms
How proximally compatible is this set of query terms? (a sketch follows)
[Graph: D with an arbitrary subset of query term nodes highlighted]
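A minimal sketch of one simple way to count unordered-window (#uwN) matches: the number of document positions that start a window of `width` terms containing every query term in the set, in any order. As with the ordered count, this is illustrative; actual #uwN counting semantics in Indri differ in details.

```python
def count_unordered(doc_terms: list[str], terms: set[str], width: int) -> int:
    # Count windows of `width` consecutive document terms that contain
    # all of the query terms, regardless of order.
    n = len(doc_terms)
    return sum(1 for i in range(n - width + 1)
               if terms <= set(doc_terms[i:i + width]))
```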
Ranking Function
Infeasible to have a parameter for every distinct feature, so weights are tied for each feature 'type' (term, ordered window, unordered window)
Final form of the (log) ranking function:
P(D | Q) rank= λ_T Σ_{c∈T} f_T(c) + λ_O Σ_{c∈O} f_O(c) + λ_U Σ_{c∈O∪U} f_U(c)
Need to tune λ_T, λ_O, λ_U (see the sketch below)
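A minimal sketch of the tied-weight combination; the feature lists are illustrative inputs (per-clique values computed as in the earlier sketches), and the default weights mirror the Indri example later in the talk:

```python
def score(term_feats: list[float], ordered_feats: list[float],
          unordered_feats: list[float],
          lambda_t: float = 0.8, lambda_o: float = 0.1,
          lambda_u: float = 0.1) -> float:
    # One tied weight per feature type, applied to the summed
    # log-feature values for all cliques of that type.
    return (lambda_t * sum(term_feats)
            + lambda_o * sum(ordered_feats)
            + lambda_u * sum(unordered_feats))
```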
Likelihood-Based Training
Maximum likelihood estimate: the parameter point estimate that maximizes the likelihood of the model generating the sample
Downfalls: small sample size (TREC relevance judgments), unbalanced data, 'metric divergence' [Morgan et al.]
Need an alternative training method
Metric-Based Training
Since systems are evaluated using mean average precision, why not maximize it directly?
Feasible because of the small number of parameters
Simple hill climbing (see the sketch below)
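A minimal coordinate-ascent sketch of metric-based training, assuming the three tied weights above: greedily adjust one weight at a time in the direction that improves mean average precision. `evaluate_map` is a stand-in for running the retrieval system and scoring the results against relevance judgments; the step schedule is mine, not the paper's.

```python
def hill_climb(evaluate_map, weights=(0.8, 0.1, 0.1), step=0.05, iters=20):
    best = list(weights)
    best_map = evaluate_map(best)
    for _ in range(iters):
        improved = False
        for i in range(len(best)):
            for delta in (+step, -step):
                cand = best[:]
                cand[i] = min(1.0, max(0.0, cand[i] + delta))
                s = sum(cand)
                cand = [w / s for w in cand]  # keep weights summing to 1
                m = evaluate_map(cand)
                if m > best_map:
                    best, best_map, improved = cand, m, True
        if not improved:
            step /= 2  # refine the step size when no move helps
    return best, best_map
```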
Alternate Views of the Model
The full independence model, using our features, is exactly language-model query likelihood
The full model is a linear combination of features, expressible as an Indri structured query:
#weight(
  0.8 #combine( prostate cancer treatment )
  0.1 #combine( #1(prostate cancer) #1(cancer treatment) #1(prostate cancer treatment) )
  0.1 #combine( #uw8(prostate cancer) #uw8(cancer treatment) #uw8(prostate treatment) #uw12(prostate cancer treatment) ) )
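A minimal sketch (the function name is mine, not from the paper) that emits this full-dependence Indri query for an arbitrary term list, mirroring the example above: #1 windows over contiguous sub-phrases of two or more terms, and unordered windows of width 4 × (subset size) over every term subset of size ≥ 2.

```python
from itertools import combinations

def full_dependence_query(terms, lt=0.8, lo=0.1, lu=0.1):
    # Ordered-window operators over contiguous sub-phrases.
    ordered = [f"#1({' '.join(terms[i:j])})"
               for i in range(len(terms))
               for j in range(i + 2, len(terms) + 1)]
    # Unordered-window operators over all subsets of size >= 2,
    # with window width proportional to the subset size.
    unordered = [f"#uw{4 * len(s)}({' '.join(s)})"
                 for r in range(2, len(terms) + 1)
                 for s in combinations(terms, r)]
    return (f"#weight( {lt} #combine( {' '.join(terms)} ) "
            f"{lo} #combine( {' '.join(ordered)} ) "
            f"{lu} #combine( {' '.join(unordered)} ) )")

print(full_dependence_query(["prostate", "cancer", "treatment"]))
```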
Experimental Setup
Search engine: Indri
Stemming: Porter
Stopping: query-time
[Table: collection statistics]
Sequential Dependence Results
Mean average precision for various unordered window lengths:
2 – 'unordered bigrams'
8 – roughly a sentence
50 – roughly a passage/paragraph
Unlimited – co-occurrence
Sentence-length windows appear to be a good choice
Full Dependence Results
Mean average precision using term, ordered, and unordered features
Trained parameters generalize well across collections
Summary of Results
[Table: mean average precision per collection]
* stat. sig. better than FI (full independence)
** stat. sig. better than SD (sequential dependence) and FI
† stat. sig. better than FI, but stat. sig. worse than FD (full dependence)
Conclusions
The model can take into account a wide range of dependencies and features over query terms
Metric-based training may be more appropriate than likelihood-based or geometric training methods
Incorporating even simple dependencies yields significant improvements
Past work may have failed to produce good results due to data sparsity
Questions?
References
W. B. Croft, H. R. Turtle, and D. D. Lewis. The Use of Phrases and Structured Queries in Information Retrieval. In Proceedings of SIGIR '91, pages 32-45, 1991.
J. L. Fagan. Automatic Phrase Indexing for Document Retrieval: An Examination of Syntactic and Non-Syntactic Methods. In Proceedings of SIGIR '87, 1987.
W. Morgan, W. Greiff, and J. Henderson. Direct Maximization of Average Precision by Hill-Climbing, with a Comparison to a Maximum Entropy Approach. In Proceedings of HLT-NAACL 2004: Short Papers, pages 93-96, 2004.
F. Song and W. B. Croft. A General Language Model for Information Retrieval. In Proceedings of CIKM '99, 1999.
C. J. van Rijsbergen. A Theoretical Basis for the Use of Co-occurrence Data in Information Retrieval. Journal of Documentation, 33(2), 1977.
Relevance Distribution
Population: every imaginable user enters every imaginable query and produces a list of documents (from every possible document) they find relevant
Intuition: the more "votes" a document has for a given query, the more relevant it is on average
This relevance distribution could also be applied on a per-user basis (a user-specific relevance distribution)
Given a large enough sample, we could directly estimate P(D | Q)
Metric Divergence Example
[Table: parameters, training data, and resulting rankings under likelihood vs. MAP for a logistic regression model]