1
Compact Query Term Selection Using Topically Related Text
Date: 2013/10/09
Source: SIGIR '13
Authors: K. Tamsin Maxwell, W. Bruce Croft
Advisor: Dr. Jia-ling Koh
Speaker: Shun-Chen Cheng
2
Outline
- Introduction
- The PhRank Algorithm
  - Graph Construction
  - Edge Weight
  - Random Walk
  - Vertex Weights
  - Term Ranking
- Diversity Filter
- Experiment
- Conclusions
3
Introduction
Example query: "Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories."
4
Introduction
Long queries contain words that are peripheral or shared across many topics, so expansion is prone to query drift.
Past approaches jointly optimize weights and term selection using both global statistics and local syntactic features.
Shortcomings: they fail to detect or differentiate informative terms, do not reflect local query context, and do not identify all the informative relations.
5
Introduction
Goal: a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within a query itself.
6
Outline
- Introduction
- The PhRank Algorithm
  - Graph Construction
  - Edge Weight
  - Random Walk
  - Vertex Weights
  - Term Ranking
- Diversity Filter
- Experiment
- Conclusions
7
Principles for Term Selection
An informative term:
- Is informative relative to the query: it accurately represents the meaning of the query.
- Is related to other informative words: if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this.
- Contains informative words: all component words of the term must be informative.
- Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal.
8
Graph Construction
C: the retrieval collection plus English Wikipedia.
Example: Q = "a b"; top-k documents d1, d2 (if k = 2); neighborhood set N = {d0, d1, d2}, where d0 encodes the query itself.
With d1 = "c b e" and d2 = "a f b", the graph G has vertices {a, b, c, e, f}.
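The slide's example can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes stems are whitespace tokens and connects every pair of stems that co-occur in a document of N.

```python
from itertools import combinations

# Sketch of affinity-graph construction: vertices are word stems from the
# query (d0) and the top-k feedback documents; an edge joins two stems
# that co-occur in some document of the neighborhood set N.
def build_graph(query, top_k_docs):
    # N = {d0, d1, ..., dk}, where d0 encodes the query itself
    neighborhood = [query.split()] + [d.split() for d in top_k_docs]
    vertices, edges = set(), set()
    for doc in neighborhood:
        stems = set(doc)
        vertices |= stems
        for i, j in combinations(sorted(stems), 2):
            edges.add((i, j))
    return vertices, edges

# Example from the slide: Q = "a b", d1 = "c b e", d2 = "a f b"
vertices, edges = build_graph("a b", ["c b e", "a f b"])
print(sorted(vertices))  # ['a', 'b', 'c', 'e', 'f']
```

In the actual algorithm the edges additionally carry the weights described on the next slide; here they are unweighted for brevity.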
9
Edge Weight
Edge weights combine the counts of stem co-occurrence in window sizes 2 and 10 in N with the probability, given Q, of the document in which stems i and j co-occur.
Counts are idf-weighted, and a factor r confirms the importance of a connection between i and j in N.
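The windowed co-occurrence counts can be illustrated as follows. This is only a sketch of the counting step, assuming a co-occurrence is a pair of token positions less than w apart; the idf weighting and the r factor from the slide are omitted.

```python
# Count co-occurrences of stems i and j within a window of w tokens,
# i.e. pairs of positions strictly less than w apart (an assumption;
# the paper's exact window definition may differ).
def cooccur_count(tokens, i, j, w):
    pos_i = [p for p, t in enumerate(tokens) if t == i]
    pos_j = [p for p, t in enumerate(tokens) if t == j]
    return sum(1 for a in pos_i for b in pos_j
               if a != b and abs(a - b) < w)

tokens = "a b c a".split()
print(cooccur_count(tokens, "a", "b", 2))   # adjacent pair only -> 1
print(cooccur_count(tokens, "a", "b", 10))  # both 'a' positions count -> 2
```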
10
Random Walk
Example transition matrix over three nodes:
H = | 0.009  0.9    0.01  |
    | 0.1    0.8    0.1   |
    | 0.6    0.005  0.395 |
If the walk starts from node 1 at time 0, the distribution at time 1 is [1 0 0] H = [0.009 0.9 0.01], so the probability of reaching node 3 at time 1 is 0.01.
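The one-step walk above is a plain vector-matrix product; a minimal sketch, using the matrix entries as they appear on the slide:

```python
# One step of the random walk: the starting distribution [1, 0, 0]
# (node 1 at time 0) multiplied by the transition matrix H gives the
# distribution at time 1; the entry for node 3 is H[0][2].
H = [[0.009, 0.9,   0.01],
     [0.1,   0.8,   0.1],
     [0.6,   0.005, 0.395]]

def step(dist, H):
    n = len(H)
    return [sum(dist[i] * H[i][j] for i in range(n)) for j in range(n)]

print(step([1.0, 0.0, 0.0], H))  # [0.009, 0.9, 0.01]
```

Iterating `step` until the distribution converges yields the stationary affinity scores used by PhRank.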
11
Vertex Weights
Factor s balances exhaustivity with global saliency to identify stems that are poor discriminators between relevant and non-relevant documents.
It uses the frequency of a word w_n in N, averaged over the k + 1 documents and normalized by the maximum average frequency of any term in N, together with the number of documents in C containing w_n.
TREC query #840: "Give the definition, locations, or characteristics of geysers." => "definition geysers" is not more informative than "geysers" alone.
12
Example
w_n = geysers: average frequency of "geysers" in N = 12/3; |N| = 3, |C| = 35; maximum average frequency of any term in N = 4; df_wn = 3.
w_n = definition: average frequency of "definition" in N = 2/3; maximum average frequency of any term in N = 4; df_wn = 1.
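The slide's numbers can be plugged into a tf-idf style weight. Note the formula below is one plausible reading of the slide (normalized average frequency in N times idf in C), not necessarily the paper's exact definition:

```python
import math

# Hypothetical tf-idf style vertex weight: normalized average frequency
# in the neighborhood N times the idf of the word in collection C.
def vertex_weight(avg_freq, max_avg_freq, df, collection_size):
    return (avg_freq / max_avg_freq) * math.log(collection_size / df)

# Numbers from the slide: |C| = 35, max average frequency in N = 4
geysers = vertex_weight(avg_freq=12/3, max_avg_freq=4, df=3, collection_size=35)
definition = vertex_weight(avg_freq=2/3, max_avg_freq=4, df=1, collection_size=35)
print(geysers > definition)  # 'geysers' gets the higher weight
```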
13
Term Ranking
Input: all combinations of 1-3 non-stopword words in a query.
Output: a ranked list sorted by f(x, Q) score.
To avoid a bias towards longer terms, a term x is scored by averaging the affinity scores of its component words.
A factor z_x represents the degree to which the term is discriminative in the collection, based on the frequency of x_e in C.
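The length-normalized scoring step can be sketched as follows. The affinity values and the z_x value here are illustrative placeholders, not figures from the paper:

```python
# Term score: average the per-word affinity (random-walk) scores so that
# longer terms gain no length advantage, then scale by a
# collection-discrimination factor z_x (hypothetical value below).
def term_score(term_words, affinity, z_x):
    avg = sum(affinity[w] for w in term_words) / len(term_words)
    return z_x * avg

# Illustrative affinity scores for words of the example query
affinity = {"volcanic": 0.32, "activity": 0.11, "locations": 0.05}
print(term_score(["volcanic", "activity"], affinity, z_x=1.5))
```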
14
Example
Query: "Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories."
Term x = volcanic boundaries; term x = volcanic U.S.
15
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
16
Diversity Filter
PhRank often assigns a high rank to multi-word terms that contain only one highly informative word.
For example, for the query "the destruction of Pan Am Flight 103 over Lockerbie, Scotland", the term "pan flight 103" is informative, but "pan" is uninformative by itself.
Two filtering strategies for overlapping terms such as "birth rate", "declining birth", and "declining birth rate":
Way 1 assumes the longer term better represents the information need, keeping "declining birth rate".
Way 2 assumes the shorter terms better represent the information need and the longer term is redundant, so "declining birth rate" is discarded.
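The redundancy check in Way 2 can be sketched as a pass over the ranked list. This is a simplified illustration, assuming a term is redundant exactly when all of its words are covered by already-kept higher-ranked terms:

```python
# Sketch of a diversity filter: walk the ranked list top-down and discard
# a term whose words are all covered by already-kept higher-ranked terms
# (Way 2 on the slide: shorter terms win, the longer term is redundant).
def diversity_filter(ranked_terms):
    kept, seen = [], set()
    for term in ranked_terms:
        words = set(term.split())
        if words <= seen:       # every word already covered -> redundant
            continue
        kept.append(term)
        seen |= words
    return kept

ranked = ["birth rate", "declining birth", "declining birth rate"]
print(diversity_filter(ranked))  # ['birth rate', 'declining birth']
```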
17
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
18
Experiment
Datasets: F = excluded from features, T = included in features.
19
Experiment
20
TREC description topics
TREC title queries
21
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
22
Conclusions
We have presented PhRank, a novel term ranking algorithm that extends work on Markov chain frameworks for query expansion to select focused and succinct terms from within a query.
For all collections, around 26% of queries show more than a 5% decrease in MAP compared to SD.
Efficiency concerns around the time to construct an affinity graph may be ameliorated by off-line indexing to precompute a language model for each document in the collection.