© 2004 Chris Staff, CSAW’04, University of Malta
Expanding Query Terms in Context
Chris Staff and Robert Muscat, Department of Computer Science & AI, University of Malta

Aims of this presentation
Background
  – The Vocabulary Problem in IR
Scenario
  – Using retrieved documents to determine how to expand a query
Approach
Evaluation

The Vocabulary Problem
Furnas et al. (1987) found that any two people describe the same concept or object using the same term with a probability of less than 0.2
This is a huge problem for IR
  – High probability of finding some documents about your term (but watch out for ambiguous terms!)
  – Low probability of finding all documents about your concept (so low ‘coverage’)

What’s Query Expansion?
Adding terms to a query to improve recall while keeping precision high
Recall is 1 when all relevant docs are retrieved
Precision is 1 when all retrieved docs are relevant
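
The two measures on this slide can be written down directly. The following is a minimal Python sketch using the standard set-based definitions; the function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids: two of the four retrieved docs are relevant,
# and one relevant doc (d5) was missed.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Expanding the query so that d5 is also retrieved would raise recall to 1, but every irrelevant document the extra terms pull in lowers precision.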

What’s Query Expansion?
Attempts to improve recall (by adding synonyms) usually rely on a constructed thesaurus (Qiu et al., 1995; Mandala et al., 1999; Voorhees, 1994)
Attempts to improve precision (by adding restricting terms) are now based around automatic relevance feedback (e.g., Mitra et al., 1998)
Indiscriminate query expansion can lead to loss of precision (Voorhees, 1994) or hurt recall

Scenario
Two users search for information related to the same concept $C$
User queries $Q_1$ and $Q_2$ have no terms in common
$R_1$ and $R_2$ are the result sets of $Q_1$ and $Q_2$ respectively
$R_{\text{common}} = R_1 \cap R_2$

Scenario
We assume that $R_{\text{common}}$ is small and non-empty (Furnas, 1985; Furnas et al., 1987)
If $R_{\text{common}}$ is large, then $Q_1$ and $Q_2$ will both retrieve the same set of documents
We can determine (using WordNet) whether any term in $Q_1$ is a synonym of a term in $Q_2$
  – Some doc $D_k$ in $R_{\text{common}}$ probably includes both terms (because of the way Web IR works)!

Scenario
If $t_1$ in $Q_1$ and $t_2$ in $Q_2$ are synonyms
  – Can expand either in future queries containing $t_1$ or $t_2$
  – As long as doc $D_k$ appears in the result set (the context)
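
The scenario can be sketched in a few lines of Python. This is an illustration only: `TOY_SYNSETS` is a hypothetical stand-in for a WordNet lookup, and the queries, synset ids and document ids are invented.

```python
from itertools import product

# Hypothetical stand-in for WordNet: term -> ids of the synsets it belongs to.
TOY_SYNSETS = {
    "car": {"syn_vehicle"},
    "automobile": {"syn_vehicle"},
    "repair": {"syn_mend"},
    "fix": {"syn_mend", "syn_fasten"},
}


def are_synonyms(t1, t2, synsets=TOY_SYNSETS):
    """Two distinct terms count as synonyms if they share at least one synset."""
    return t1 != t2 and bool(synsets.get(t1, set()) & synsets.get(t2, set()))


def synonym_pairs_in_context(q1, r1, q2, r2):
    """Return (common docs, synonym pairs) for two queries and their results."""
    r_common = set(r1) & set(r2)
    if not r_common:
        # No shared document means no shared context to learn from.
        return r_common, []
    pairs = [(t1, t2) for t1, t2 in product(q1, q2) if are_synonyms(t1, t2)]
    return r_common, pairs


# Two queries with no terms in common whose result sets overlap in d3.
q1, r1 = ["car", "repair"], ["d1", "d2", "d3"]
q2, r2 = ["automobile", "fix"], ["d3", "d4"]
r_common, pairs = synonym_pairs_in_context(q1, r1, q2, r2)
print(r_common, pairs)
```

Each pair found this way is only trusted in the context of the shared documents, which is what keeps an ambiguous term such as "fix" from being expanded with the wrong sense elsewhere.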

Approach
‘Learning’ synonyms in context
Query Expansion

‘Learning’ Synonyms in Context
A document is associated with a “bag of words”: the terms ever used to retrieve the doc
A (term, document) pair is associated with a synset for the term in the context of the doc
  – Word sense from WordNet is also recorded to reduce ambiguity
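
One way to picture the two associations on this slide is as a pair of dictionaries. The sketch below is an assumption about how they could be stored, not the authors' implementation; the class and method names are hypothetical.

```python
from collections import defaultdict


class ContextStore:
    """Toy store for what has been 'learned' about each document."""

    def __init__(self):
        # doc id -> every query term ever used to retrieve that doc
        self.bag_of_words = defaultdict(set)
        # (term, doc id) -> (word sense, synonyms) for the term in that doc
        self.synsets = {}

    def record_retrieval(self, query_terms, doc_id):
        """Add the terms of a query that retrieved doc_id to its bag of words."""
        self.bag_of_words[doc_id].update(query_terms)

    def record_synset(self, term, doc_id, sense, synonyms):
        """Remember which sense and synset apply to term in the context of doc_id."""
        self.synsets[(term, doc_id)] = (sense, frozenset(synonyms))

    def synset_for(self, term, doc_id):
        """Return (sense, synonyms), or None if nothing was learned for the pair."""
        return self.synsets.get((term, doc_id))


store = ContextStore()
store.record_retrieval(["car", "repair"], "d3")
store.record_retrieval(["automobile", "fix"], "d3")
store.record_synset("car", "d3", sense=1, synonyms={"car", "automobile"})
print(sorted(store.bag_of_words["d3"]), store.synset_for("car", "d3"))
```

Keying the synset on the (term, document) pair rather than on the term alone is what lets the same term carry different senses in different documents.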

Query Expansion in Context
Submit the unexpanded original user query $Q$ to obtain result set $R$
For each document $D_k$ in $R$ ($k$ is the rank), retrieve synsets for the terms in $Q$
The same query term in the context of different docs in $R$ may yield inconsistent synsets
  – Countered using Inverse Document Relevance

Inverse Document Relevance
IDR is the relative frequency with which doc $d$ is retrieved in rank $k$ when term $q$ occurs in the query
$IDR_{q,d} = W_{q,d} / W_d$
(where $W_d$ is the number of times $d$ is retrieved, and $W_{q,d}$ is the number of times $d$ is retrieved when $q$ occurs in the query)
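
The ratio W q,d / W d can be computed from a log of past queries and their results. The following Python sketch assumes a log of (query terms, retrieved doc ids) pairs; the log format, the function name and the data are illustrative assumptions.

```python
from collections import Counter


def idr_table(query_log):
    """Compute IDR for every (term, doc) pair seen in the log.

    query_log: iterable of (query_terms, retrieved_doc_ids) pairs.
    """
    w_d = Counter()   # W_d: number of times doc d was retrieved
    w_qd = Counter()  # W_{q,d}: times d was retrieved with q in the query
    for terms, docs in query_log:
        for d in set(docs):
            w_d[d] += 1
            for q in set(terms):
                w_qd[(q, d)] += 1
    return {(q, d): count / w_d[d] for (q, d), count in w_qd.items()}


# Hypothetical log: d3 is retrieved three times, twice by a query with "car".
log = [
    (["car", "repair"], ["d1", "d3"]),
    (["automobile", "fix"], ["d3", "d4"]),
    (["car", "hire"], ["d3"]),
]
idr = idr_table(log)
print(round(idr[("car", "d3")], 3))
```

A value near 1 means the document is almost only ever retrieved when that term is in the query, so the synset recorded for the pair is more likely to reflect the document's actual topic.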

Term Document Relevance
We then re-rank the documents in $R$ based on their TDR
$TDR_{q,d,k} = IDR_{q,d} \times W_{q,d,k} / W_{d,k}$
Synsets of the top-10 re-ranked documents are merged according to word category and sense
The synset of the most frequently occurring (word category, word sense) pair is used to expand $q$ in the query
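
The re-ranking and merging steps can be sketched as follows. The slide does not define the rank-specific counts, so this sketch assumes, by analogy with IDR, that W d,k is the number of times d was retrieved at rank k and W q,d,k the number of those times in which q occurred in the query. The function names, the voting scheme (one vote per document) and the data are likewise assumptions.

```python
from collections import Counter


def tdr(idr_qd, w_qdk, w_dk):
    """TDR_{q,d,k} = IDR_{q,d} * W_{q,d,k} / W_{d,k} (0.0 if d was never seen at rank k)."""
    return idr_qd * w_qdk / w_dk if w_dk else 0.0


def pick_expansion(scored_docs, synset_of, top_n=10):
    """Choose expansion terms for one query term.

    scored_docs: list of (doc_id, TDR score) for the docs in the result set.
    synset_of:   doc_id -> (category, sense, synonyms) learned for the term.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:top_n]
    votes = Counter()  # (category, sense) -> number of top docs voting for it
    merged = {}        # (category, sense) -> union of the synonyms seen
    for doc_id, _score in ranked:
        entry = synset_of.get(doc_id)
        if entry is None:
            continue  # nothing learned for this term in this document
        category, sense, synonyms = entry
        votes[(category, sense)] += 1
        merged.setdefault((category, sense), set()).update(synonyms)
    if not votes:
        return set()
    best, _count = votes.most_common(1)[0]
    return merged[best]


# Hypothetical scores and learned synsets for the query term "car".
scores = [("d1", tdr(1.0, 1, 2)), ("d3", tdr(2 / 3, 2, 2)), ("d4", tdr(0.5, 1, 4))]
synsets = {
    "d1": ("noun", 1, {"car", "automobile"}),
    "d3": ("noun", 1, {"car", "auto"}),
    "d4": ("noun", 2, {"car", "railcar"}),
}
expansion = pick_expansion(scores, synsets)
print(sorted(expansion))
```

Here two of the three documents agree on (noun, sense 1), so the merged synset for that pair is chosen and the minority reading from d4 is ignored.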

Evaluation
Ideally, we need a huge query log with relevance judgements for the queries
We have the TREC QA collection, but we’ll need to index it before running the test queries through it (using, e.g., SMART)
  – Disadvantage: there might not be enough queries
User Studies

Thank you!