İrem Arıkan, Srikanta Bedathur, Klaus Berberich Time Will Tell: Leveraging Temporal Expressions in IR.


Motivation
- Documents contain temporal information in the form of temporal expressions
- Users have temporal information needs
- Query: Prime Minister United Kingdom 2000
- Problem: traditional information retrieval systems do not exploit the temporal content in documents
- Temporal expressions are more than common terms
- Our approach: integrates the temporal dimension into a language-model-based retrieval framework

Outline
- Motivation
- Model
- Our Approach
- Experimental Evaluation

Document Model
- Document d = { d_text, d_temp }
- d_text: a bag of textual terms
- d_temp: a bag of temporal expressions
- A temporal expression is treated as a time interval T = [begin, end]
- [Figure: time axis starting at 0, with the interval T marked between begin and end]

Query Model
- Query q = { q_text, q_temp }
- q_text: set of textual terms
- q_temp: set of temporal expressions
- Example: in "Prime Minister United Kingdom 2000", q_text is "Prime Minister United Kingdom" and q_temp is "2000"
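The document and query models above can be sketched as plain data structures. This is a minimal illustration, not the authors' implementation: the class names `Interval`, `Document`, and `Query`, the day-level granularity, and the helper `year` are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    """A temporal expression T = [begin, end] (hypothetical class, day granularity assumed)."""
    begin: date
    end: date

    def __post_init__(self):
        if self.begin > self.end:
            raise ValueError("begin must not be after end")


@dataclass
class Document:
    text: Counter          # d_text: bag of textual terms (term -> count)
    temp: list[Interval]   # d_temp: bag of temporal expressions (duplicates allowed)


@dataclass
class Query:
    text: set[str]         # q_text: set of textual terms
    temp: set[Interval]    # q_temp: set of temporal expressions


def year(y: int) -> Interval:
    """Interpret a bare year as the interval covering that whole year (an assumption)."""
    return Interval(date(y, 1, 1), date(y, 12, 31))


# The example query from the slides: "Prime Minister United Kingdom 2000"
query = Query(text={"prime", "minister", "united", "kingdom"}, temp={year(2000)})
```

A bag is modelled with `Counter` or `list` so that repeated terms and repeated temporal expressions are kept, while the query side uses sets, as on the slides.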

Outline
- Motivation
- Model
- Our Approach
  - Filtering Approach
  - Weighted Approach
- Experimental Evaluation

Our Baseline: Ponte and Croft's Model (LM)
- Each document has an associated language model
- The query is treated as the output of a random process
- Documents are ranked by the likelihood that the query would be generated by the language model estimated for each document
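To make the ranking principle concrete, here is a small query-likelihood sketch. It deliberately swaps in Jelinek-Mercer smoothing (a linear mix of document and collection frequencies) rather than Ponte and Croft's own estimator, which the slides do not spell out; the function name, the parameter `lam`, and the toy documents are assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, lam=0.5):
    """log P(q | d) under a unigram model with Jelinek-Mercer smoothing (not Ponte & Croft's estimator)."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = collection_tf.get(term, 0) / collection_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


# Toy collection (hypothetical data)
docs = {
    "a": ["prime", "minister", "blair"],
    "b": ["weather", "report", "london"],
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

query_terms = ["prime", "minister"]
ranked = sorted(
    docs,
    key=lambda d: query_log_likelihood(query_terms, docs[d], collection_tf, collection_len),
    reverse=True,
)
```

Documents that contain the query terms get a higher likelihood, while the collection frequency keeps documents that miss a term from scoring zero.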

Filtering Approach (LMF)
- Idea: discard all documents that do not contain any temporal expression relevant to the user's query
- Our definition of temporal relevance: a temporal expression is relevant only if it overlaps with a temporal expression from the query
- [Figure: time axis t with the query interval 2000 and document intervals with begin and end points; the interval May 1997 – 27 June 2007 overlaps the query and is marked relevant, an earlier interval starting 28 Nov is marked irrelevant]
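The overlap test and the resulting filter can be sketched in a few lines. Intervals are represented here as `(begin, end)` tuples of dates with inclusive endpoints, which is an assumption; the function names and the example intervals are illustrative and not taken from the slides.

```python
from datetime import date


def overlaps(a, b):
    """True if closed intervals a = (begin, end) and b = (begin, end) share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def temporal_filter(docs, query_intervals):
    """Keep only documents with at least one temporal expression overlapping a query interval."""
    return [
        doc_id
        for doc_id, intervals in docs.items()
        if any(overlaps(t, q) for t in intervals for q in query_intervals)
    ]


# Illustrative data: one document overlaps the query year 2000, the other does not
docs = {
    "doc_a": [(date(1995, 1, 1), date(2005, 12, 31))],
    "doc_b": [(date(1985, 1, 1), date(1990, 12, 31))],
    "doc_c": [],  # no temporal expressions at all
}
query_2000 = (date(2000, 1, 1), date(2000, 12, 31))

kept = temporal_filter(docs, [query_2000])
```

In an LMF-style pipeline the surviving documents would then be ranked by the text-only language model; documents without any temporal expression are always discarded.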

Filtering Approach
- Problem: it has a black-and-white view of the world
- It does not take into account
  - how many relevant temporal expressions a document contains
  - how closely they match the temporal expressions specified in the user's query
- Example: query: 1980 – […]; […] – 1989 is more relevant than 23 March 1984

Weighted Approach (LMW)
- Idea: assign higher relevance to a document if it contains more temporal expressions that match the temporal expressions from the user's query more closely
- We assume that q_text and q_temp are produced independently
- Temporal expressions occur independently
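The slides state the two independence assumptions without a formula. One natural reading, given here as an assumption rather than the authors' exact definition, is that the query likelihood factorizes into a textual and a temporal part, and that the temporal part is a product over the query's temporal expressions Q:

```latex
\[
P(q \mid d) = P(q_{text} \mid d_{text}) \cdot P(q_{temp} \mid d_{temp}),
\qquad
P(q_{temp} \mid d_{temp}) = \prod_{Q \in q_{temp}} P(Q \mid d_{temp})
\]
```

Under this reading the textual factor can be the baseline language model score, so the temporal factor acts as a per-document weight on top of it.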

Weighted Approach
- Each temporal expression T in d is a sample from a different generative model
- Generating a temporal expression Q = [qBegin, qEnd] given d_temp:
  1. Draw a single temporal expression T = [dBegin, dEnd] uniformly at random from d
  2. Generate Q by the generative model that is associated with T
- The likelihood of generating Q is defined over the set of generative models that produced d_temp
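The formula on this slide is not preserved in the transcript. The two-step process (draw T uniformly, then generate Q from the model attached to T) corresponds to a uniform mixture, which can be written as follows; treat this as a reconstruction from the stated steps, with |d_temp| denoting the number of temporal expressions in the document:

```latex
\[
P(Q \mid d_{temp}) = \frac{1}{|d_{temp}|} \sum_{T \in d_{temp}} P(Q \mid T)
\]
```

Because every T contributes its own term, a document with many closely matching temporal expressions receives a higher likelihood than one with a single match among many unrelated expressions.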

Weighted Approach
- Generate Q = [qBegin, qEnd] from the query by the generative model that is associated with T = [dBegin, dEnd] from a document
- [Figure: densities P(qBegin) and P(qEnd | qBegin) over a time axis marked dBegin - α(dEnd - dBegin), dBegin, dEnd, and dEnd + α(dEnd - dBegin), with qBegin and qEnd indicated]
- The model produces only temporal expressions that are relevant to T
- P(Q|T) gets smaller as the length of their overlap decreases
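The exact densities P(qBegin) and P(qEnd | qBegin) are not recoverable from the transcript, so the sketch below only mirrors the two stated properties: the weight is zero without overlap and shrinks as the overlap gets shorter. It uses the raw overlap length in days as an unnormalized stand-in for P(Q|T), ignores the α extension, and averages uniformly over the document's temporal expressions. All names and the example intervals are assumptions.

```python
from datetime import date


def overlap_days(t, q):
    """Number of days shared by closed intervals t and q (0 if they do not overlap)."""
    begin = max(t[0], q[0])
    end = min(t[1], q[1])
    return max(0, (end - begin).days + 1)


def temporal_weight(doc_intervals, q):
    """Unnormalized stand-in for P(Q | d_temp): mean overlap length over the document's intervals."""
    if not doc_intervals:
        return 0.0
    return sum(overlap_days(t, q) for t in doc_intervals) / len(doc_intervals)


# Illustrative query covering the 1980s
query_1980s = (date(1980, 1, 1), date(1989, 12, 31))

doc_decade = [(date(1980, 1, 1), date(1989, 12, 31))]      # matches the whole query interval
doc_single_day = [(date(1984, 3, 23), date(1984, 3, 23))]  # a single day inside the query interval
doc_unrelated = [(date(1950, 1, 1), date(1959, 12, 31))]   # no overlap

weights = {
    "decade": temporal_weight(doc_decade, query_1980s),
    "single_day": temporal_weight(doc_single_day, query_1980s),
    "unrelated": temporal_weight(doc_unrelated, query_1980s),
}
```

Unlike the filter, which would treat the first two documents the same, this weight separates them, and a proper implementation would additionally normalize it into a probability.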

Outline
- Motivation
- Model
- Our Approach
- Experimental Evaluation

Experimental Evaluation
Dataset
- HTML snapshot of English Wikipedia from May 2007, containing ~2M documents
Implementation
- Terrier Information Retrieval Platform: provides an implementation of Ponte & Croft's approach
- LMF, LMW: Java + MySQL
- A set of regular expressions for extracting temporal information
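The slides do not list the regular expressions that were used. As a rough idea of what regex-based extraction can look like, the sketch below recognizes only two patterns, "day Month year" and a bare four-digit year, and maps each match to an interval. The patterns, the year range, and the function name are assumptions, and the authors' Java implementation is not shown on the slides.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# "27 June 2007" style dates and bare years between 1000 and 2099 (assumed coverage)
DAY_MONTH_YEAR = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_intervals(text):
    """Return (begin, end) date tuples: full dates first, then remaining bare years."""
    intervals = []
    consumed = []  # character spans already covered by a full date
    for m in DAY_MONTH_YEAR.finditer(text):
        try:
            d = date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
        except ValueError:
            continue  # e.g. "31 February 2000"
        intervals.append((d, d))
        consumed.append(m.span())
    for m in YEAR.finditer(text):
        if any(start <= m.start() < end for start, end in consumed):
            continue  # this year belongs to a full date matched above
        y = int(m.group(1))
        intervals.append((date(y, 1, 1), date(y, 12, 31)))
    return intervals


found = extract_intervals("He took office in 1997 and resigned on 27 June 2007.")
```

A realistic extractor would need many more patterns (month-year pairs, decades, centuries, ranges), which is presumably why the slides speak of a set of regular expressions.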

Experimental Evaluation: Anecdotal query results - 1
Query: Spanish painter 18th century

Rank | LM | LMF | LMW
1 | Art in Puerto Rico | Jose del Castillo [only one entry survives for LMF/LMW at this rank]
2 | Spanish Art | List of Spanish Artists | Roybal
3 | Plazzo Bianco (Genoa) | Roybal | Augustine Esteve
4 | Caprichos | Augustine Esteve | Maldonado
5 | Portrait Painting | Francisco Eduardo Tresguerras | Luis Egidio Melendez

Experimental Evaluation: Anecdotal query results - 2
Query: Sea Battle

Rank | LM | LMF | LMW
1 | Battle of Dunbar (1650) | List of Norwegian Battles | Battle of Gabbard
2 | Monte Mataiur | Battle of Portland [only one entry survives for LMF/LMW at this rank]
3 | St. George Caye | Action of 22 February 1812 | Battle of Schveningen
4 | Culrain Scottland | Naval Strategy | Battle of Kentish Knock
5 | First Anglo-Dutch War | Battle of Gabbard | Battle of Dungeness

Experimental Evaluation: User Study
- 20 queries
- Pooling of the top-10 results returned by the three methods
- Relevance assessment by 15 users
  - highly relevant: 2
  - marginally relevant: 1
  - irrelevant: 0
- NDCG as the measure of effectiveness
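For readers unfamiliar with NDCG, the sketch below computes it for one ranked list of graded judgments on the 0/1/2 scale above. The slides do not say which NDCG variant was used, so the linear gain, the log2 discount, and the choice to build the ideal ranking from the same list of judgments are assumptions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal reordering of the same judgments."""
    ideal_dcg = dcg(sorted(gains, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0  # no relevant result at all
    return dcg(gains[:k]) / ideal_dcg


# Graded judgments for the results of one query, in ranked order (hypothetical values)
good_ranking = [2, 2, 1, 0, 0]
poor_ranking = [0, 0, 1, 2, 2]

scores = {"good": ndcg(good_ranking), "poor": ndcg(poor_ranking)}
```

A ranking that places highly relevant results first scores 1.0, and the score drops as relevant results move down the list; per-method effectiveness would then be the mean over the 20 queries.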

Thank you! Questions?

Conclusion
- Documents are rich in temporal expressions, but existing retrieval models ignore their inherent semantics
- Our work proposes two methods that address this problem
- Initial experimental evidence shows that our methods improve retrieval effectiveness for temporal information needs

Experimental Evaluation

Queries
1. Mergers and Acquisitions
2. United States Railway
3. Folklore Music
4. Earthquake
5. Sea Battle
6. United States Secretary of State
7. Native Americans
8. German Architecture
9. Internet
10. Olympic Games
11. Blues Music
12. Personal Computer
13. Clint Eastwood
14. Black Death Spain
15. Italian Fascism
16. George Bush
17. Flying Machine
18. Spanish Painter
19. Economic Situation Germany
20. Ford Motor Company

Weighted Approach
- Generative model associated with T = [b, e]
- [Figure: densities P(b') and P(e') over a time axis marked b - α(e - b), b, e, and e + α(e - b)]
- Only generates intervals that overlap T
- P(b', e') ~ |overlap|

Our Baseline: Ponte and Croft's Model (LM)
- Query likelihood: the likelihood that a query q and a document d are generated by the same language model
- It depends on the term frequency of the query words in the document and on their collection frequency