İrem Arıkan, Srikanta Bedathur, Klaus Berberich Time Will Tell: Leveraging Temporal Expressions in IR.


Motivation
• Documents contain temporal information in the form of temporal expressions
• Users have temporal information needs
  • Query: Prime Minister United Kingdom 2000
• Temporal expressions are more than common terms
• PROBLEM: Traditional information retrieval systems do not exploit the temporal content in documents
• OUR APPROACH: Integrates the temporal dimension into a language-model-based retrieval framework

Outline
• Motivation
• Model
• Our Approach
• Experimental Evaluation

Document Model
• Document d = { d_text, d_temp }
  • d_text: a bag of textual terms
  • d_temp: a bag of temporal expressions
• A temporal expression is treated as a time interval T = [begin, end]
[Figure: timeline starting at 0 with the interval T marked from begin to end]

Query Model
• Query q = { q_text, q_temp }
  • q_text: set of textual terms
  • q_temp: set of temporal expressions
• Example: Prime Minister United Kingdom 2000
  • q_text: Prime Minister United Kingdom
  • q_temp: 2000

Outline
• Motivation
• Model
• Our Approach
  • Filtering Approach
  • Weighted Approach
• Experimental Evaluation

Our Baseline: Ponte and Croft's Model (LM)
• Each document has an associated language model
• The query is treated as a random process
• Documents are ranked by the likelihood that the query would be generated by the language model estimated for each document
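
As a rough illustration of query-likelihood ranking, the sketch below scores each document by the log-probability of generating the query terms from a smoothed unigram model. It uses plain linear interpolation with collection statistics, which is a stand-in and not Ponte and Croft's exact estimator; the function name `query_log_likelihood`, the parameter `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query_terms, doc_terms, collection_tf, collection_len, lam=0.5):
    """Log-likelihood of the query under a smoothed unigram model of the document.

    Assumption: linear interpolation between document and collection
    frequencies; this is not Ponte and Croft's exact estimator.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score

# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "tony blair prime minister united kingdom".split(),
    "d2": "spanish painter goya".split(),
}
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())

query = ["prime", "minister"]
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], collection_tf, collection_len),
    reverse=True,
)
```

Here `ranking` lists the document that contains the query terms first, since the other document can only rely on the collection-frequency part of the estimate.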

Filtering Approach (LMF)
• Idea: Discard all documents that do not contain any temporal expression relevant to the user's query
• Our definition of temporal relevance: a temporal expression is relevant only if it overlaps with a temporal expression from the query
• Example (query: 2000):
  • 28 Nov 1990 - 2 May 1997: irrelevant
  • 2 May 1997 - 27 June 2007: relevant
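
The overlap test behind the filter can be sketched in a few lines. This is a minimal sketch that assumes temporal expressions are closed intervals of calendar dates and that a year such as 2000 expands to 1 January through 31 December; the names `overlaps` and `passes_filter` are hypothetical.

```python
from datetime import date

def overlaps(a, b):
    """True if the closed intervals a = (begin, end) and b = (begin, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

def passes_filter(d_temp, q_temp):
    """Keep a document if any of its temporal expressions overlaps any query expression."""
    return any(overlaps(t, q) for t in d_temp for q in q_temp)

# The two intervals and the query from the slide example.
q_2000 = (date(2000, 1, 1), date(2000, 12, 31))
interval_1990_1997 = (date(1990, 11, 28), date(1997, 5, 2))
interval_1997_2007 = (date(1997, 5, 2), date(2007, 6, 27))

keep_a = passes_filter([interval_1997_2007], [q_2000])  # overlaps the query year
keep_b = passes_filter([interval_1990_1997], [q_2000])  # ends before the query year
```

A document without any temporal expression is always discarded under this test, because `any` over an empty sequence is false.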

Filtering Approach
• Problem: it has a black-and-white view of the world
• It does not take into account
  • how many relevant temporal expressions a document contains
  • how closely they match the temporal expressions specified in the user's query
• Example: for the query 1980 - 1990, the expression 1980 - 1989 is more relevant than 23 March 1984

Weighted Approach (LMW)
• Idea: Assign higher relevance to a document if it contains more temporal expressions that match the temporal expressions from the user's query more closely
• We assume that q_text and q_temp are produced independently
• Temporal expressions occur independently
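
One way to write the two independence assumptions as equations is shown below. This is a reading of the bullets rather than a formula taken from the slides: it assumes the second assumption refers to the temporal expressions Q in the query, and the notation P(· | ·) is introduced here for illustration.

```latex
P(q \mid d) = P(q_{text} \mid d_{text}) \cdot P(q_{temp} \mid d_{temp}),
\qquad
P(q_{temp} \mid d_{temp}) = \prod_{Q \in q_{temp}} P(Q \mid d_{temp})
```

Under this reading, the textual factor can be estimated by the baseline language model, and only the temporal factor needs a new model.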

Weighted Approach
• Each temporal expression T in d is a sample from a different generative model
• Generating a temporal expression Q = [qBegin, qEnd] given d_temp:
  1. Draw a single temporal expression T = [dBegin, dEnd] uniformly at random from d
  2. Generate Q by the generative model that is associated with T
• The likelihood of generating Q by the set of generative models that produced d_temp [equation not preserved]
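
The two generation steps suggest a uniform mixture over the document's temporal expressions. The equation below is a reconstruction from those steps, not a formula preserved from the slides; it treats d_temp as a bag, so repeated expressions are counted once per occurrence.

```latex
P(Q \mid d_{temp}) = \frac{1}{|d_{temp}|} \sum_{T \in d_{temp}} P(Q \mid T)
```

Step 1 contributes the factor 1/|d_temp| (each T is equally likely to be drawn), and step 2 contributes P(Q | T).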

Weighted Approach
• Generate Q = [qBegin, qEnd] from the query by the generative model that is associated with T = [dBegin, dEnd] from a document
[Figure: timeline marking dBegin - α(dEnd - dBegin), dBegin, dEnd, and dEnd + α(dEnd - dBegin), with qBegin and qEnd drawn according to P(qBegin) and P(qEnd | qBegin)]
• The model produces only temporal expressions that are relevant to T
• P(Q|T) gets smaller as the length of their overlap decreases
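
The slides do not preserve the exact form of P(Q|T), so the sketch below substitutes an unnormalized score that is simply the number of overlapping days, averaged uniformly over the document's temporal expressions. It ignores the α-extended boundaries shown in the figure and is not the authors' distribution; `overlap_days` and `temporal_score` are hypothetical names. It only illustrates the two stated properties: non-overlapping intervals score zero, and the score shrinks with the overlap.

```python
from datetime import date

def overlap_days(q, t):
    """Number of days shared by the closed date intervals q and t (0 if disjoint)."""
    begin = max(q[0], t[0])
    end = min(q[1], t[1])
    return max(0, (end - begin).days + 1)

def temporal_score(q, d_temp):
    """Uniform average of the overlap-based stand-in score over d_temp.

    Assumption: overlap length replaces the real P(Q|T), so the result
    is a ranking score, not a probability.
    """
    if not d_temp:
        return 0.0
    return sum(overlap_days(q, t) for t in d_temp) / len(d_temp)

# Example from the Filtering Approach slide: query 1980 - 1990.
query = (date(1980, 1, 1), date(1990, 12, 31))
decade = (date(1980, 1, 1), date(1989, 12, 31))
single_day = (date(1984, 3, 23), date(1984, 3, 23))
outside = (date(1995, 1, 1), date(1995, 12, 31))

score_decade = temporal_score(query, [decade])
score_day = temporal_score(query, [single_day])
score_outside = temporal_score(query, [outside])
```

With these inputs the decade-long expression scores higher than the single day, and the expression outside the query interval scores zero, matching the ordering the slides describe.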

Outline
• Motivation
• Model
• Our Approach
• Experimental Evaluation

Experimental Evaluation
Dataset
• HTML snapshot of English Wikipedia from May 2007 containing ~2M documents
Implementation
• Terrier Information Retrieval Platform
  • provides an implementation of Ponte & Croft's approach
• LMF, LMW
  • Java + MySQL
• A set of regular expressions for extracting temporal information
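
The slides do not list the regular expressions that were used, so the following is only a small illustrative sketch of the idea, written in Python rather than the Java of the original implementation. It recognizes four-digit years and year ranges and maps them to date intervals; the patterns, the year limits (1000 to 2099), and the name `extract_intervals` are assumptions.

```python
import re
from datetime import date

# Assumed patterns: years 1000-2099, ranges joined by a hyphen or en dash.
YEAR = r"(1\d{3}|20\d{2})"
YEAR_RANGE_RE = re.compile(r"\b" + YEAR + r"\s*[-–]\s*" + YEAR + r"\b")
YEAR_RE = re.compile(r"\b" + YEAR + r"\b")

def extract_intervals(text):
    """Return (begin, end) date intervals for year ranges and single years in text."""
    intervals = []
    consumed = []  # character spans already covered by a range match
    for m in YEAR_RANGE_RE.finditer(text):
        first, last = int(m.group(1)), int(m.group(2))
        if first <= last:
            intervals.append((date(first, 1, 1), date(last, 12, 31)))
            consumed.append(m.span())
    for m in YEAR_RE.finditer(text):
        if any(start <= m.start() < end for start, end in consumed):
            continue  # this year is part of a range that was already extracted
        year = int(m.group(1))
        intervals.append((date(year, 1, 1), date(year, 12, 31)))
    return intervals

range_example = extract_intervals("Sea Battle 1650 - 1670")
year_example = extract_intervals("Prime Minister United Kingdom 2000")
```

A production extractor would also need patterns for full dates, months, decades, and centuries, which this sketch leaves out.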

Experimental Evaluation
Anecdotal query results - 1
Query: Spanish painter 18th century
Rank | LM | LMF | LMW
1 | Art in Puerto Rico | Jose del Castillo [LMF or LMW; one cell is missing]
2 | Spanish Art | List of Spanish Artists | Roybal
3 | Palazzo Bianco (Genoa) | Roybal | Augustine Esteve
4 | Caprichos | Augustine Esteve | Maldonado
5 | Portrait Painting | Francisco Eduardo Tresguerras | Luis Egidio Melendez

Experimental Evaluation
Anecdotal query results - 2
Query: Sea Battle 1650 - 1670
Rank | LM | LMF | LMW
1 | Battle of Dunbar (1650) | List of Norwegian Battles | Battle of Gabbard
2 | Monte Mataiur | Battle of Portland [LMF or LMW; one cell is missing]
3 | St. George Caye | Action of 22 February 1812 | Battle of Scheveningen
4 | Culrain, Scotland | Naval Strategy | Battle of Kentish Knock
5 | First Anglo-Dutch War | Battle of Gabbard | Battle of Dungeness

Experimental Evaluation
User Study
• 20 queries
• Pooling of the top-10 results returned by the three methods
• Relevance assessment by 15 users
  • highly relevant: 2
  • marginally relevant: 1
  • irrelevant: 0
• NDCG as a measure of effectiveness
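
For readers unfamiliar with NDCG, the sketch below computes it for one query from graded judgments on the 0/1/2 scale above. The slides do not say which NDCG variant was used, so the gain (the grade itself), the log2 rank discount, and the cutoff k = 10 are assumptions, as are the function names and the toy judgments.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(ranked_gains, pooled_gains, k=10):
    """DCG of a method's top-k ranking, normalized by the best ordering of the pool.

    ranked_gains: grades of the results in the order the method returned them.
    pooled_gains: grades of all judged results for the query (any order).
    """
    ideal = dcg(sorted(pooled_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query (not data from the study).
pool = [2, 1, 0]
perfect = ndcg([2, 1, 0], pool)
reversed_order = ndcg([0, 1, 2], pool)
```

Normalizing against the pooled judgments rather than a single method's own results keeps the scores of the three methods comparable for the same query.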

Thank you! Questions?

Conclusion
• Documents are rich in temporal expressions, but existing retrieval models ignore their inherent semantics
• Our work proposes two methods that address this problem
• Initial experimental evidence shows that our methods improve retrieval effectiveness for temporal information needs

Experimental Evaluation

Queries
1. Mergers and Acquisitions
2. United States Railway
3. Folklore Music
4. Earthquake
5. Sea Battle
6. United States Secretary of State
7. Native Americans
8. German Architecture
9. Internet
10. Olympic Games
11. Blues Music
12. Personal Computer
13. Clint Eastwood
14. Black Death Spain
15. Italian Fascism
16. George Bush
17. Flying Machine
18. Spanish Painter
19. Economic Situation Germany
20. Ford Motor Company

Weighted Approach
• Generative model associated with T = [b, e]
[Figure: timeline marking b - α(e - b), b, e, and e + α(e - b), with b' and e' drawn according to P(b') and P(e')]
• Only generates intervals that overlap T
• P(b', e') ~ |overlap|

Our Baseline: Ponte and Croft's Model (LM)
• Query likelihood: the likelihood that a query q and a document d are generated by the same language model
• Depends on the term frequency of query words in the document and their collection frequency
