1
Compact Query Term Selection Using Topically Related Text
Date: 2013/10/09
Source: SIGIR '13
Authors: K. Tamsin Maxwell, W. Bruce Croft
Advisor: Dr. Jia-ling Koh
Speaker: Shun-Chen Cheng
2
Outline
- Introduction
- The PhRank Algorithm
  - Graph Construction
  - Edge Weight
  - Random Walk
  - Vertex Weights
  - Term Ranking
- Diversity Filter
- Experiment
- Conclusions
3
Introduction
Example query: "Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories."
4
Introduction
Long queries contain words that are peripheral or shared across many topics, so expansion is prone to query drift.
Past approaches jointly optimize weights and term selection using both global statistics and local syntactic features.
Shortcomings: they fail to detect or differentiate informative terms, do not reflect local query context, and do not identify all the informative relations.
5
Introduction
Goal: a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within a query itself.
6
Outline
- Introduction
- The PhRank Algorithm
  - Graph Construction
  - Edge Weight
  - Random Walk
  - Vertex Weights
  - Term Ranking
- Diversity Filter
- Experiment
- Conclusions
7
Principles for Term Selection
An informative term:
- Is informative relative to the query: it accurately represents the meaning of the query.
- Is related to other informative words: if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this.
- Contains informative words: all component words of the term must be informative.
- Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal.
8
Graph Construction
C: the retrieval collection plus English Wikipedia.
Example: Q = "a b"; top-k documents d1, d2 (if k = 2); neighborhood set N = {d0, d1, d2}, where d0 encodes the query itself.
With d1 = "c b e" and d2 = "a f b", the graph G has vertices {a, b, c, e, f}.
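The slide's example can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes stems are whitespace tokens and connects every pair of stems that co-occur in a document of N.

```python
from itertools import combinations

# Sketch of affinity-graph construction: vertices are word stems from the
# query (d0) and the top-k feedback documents; an edge joins two stems
# that co-occur in some document of the neighborhood set N.
def build_graph(query, top_k_docs):
    # N = {d0, d1, ..., dk}, where d0 encodes the query itself
    neighborhood = [query.split()] + [d.split() for d in top_k_docs]
    vertices, edges = set(), set()
    for doc in neighborhood:
        stems = set(doc)
        vertices |= stems
        for i, j in combinations(sorted(stems), 2):
            edges.add((i, j))
    return vertices, edges

# Example from the slide: Q = "a b", d1 = "c b e", d2 = "a f b"
vertices, edges = build_graph("a b", ["c b e", "a f b"])
print(sorted(vertices))  # ['a', 'b', 'c', 'e', 'f']
```

In the actual algorithm the edges additionally carry the weights described on the next slide; here they are unweighted for brevity.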
9
Edge Weight
Edge weights combine the counts of stem co-occurrence in window sizes 2 and 10 in N with the probability, given Q, of the document in which stems i and j co-occur.
Counts are idf-weighted, and a factor r confirms the importance of a connection between i and j in N.
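The windowed co-occurrence counts can be illustrated as follows. This is only a sketch of the counting step, assuming a co-occurrence is a pair of token positions less than w apart; the idf weighting and the r factor from the slide are omitted.

```python
# Count co-occurrences of stems i and j within a window of w tokens,
# i.e. pairs of positions strictly less than w apart (an assumption;
# the paper's exact window definition may differ).
def cooccur_count(tokens, i, j, w):
    pos_i = [p for p, t in enumerate(tokens) if t == i]
    pos_j = [p for p, t in enumerate(tokens) if t == j]
    return sum(1 for a in pos_i for b in pos_j
               if a != b and abs(a - b) < w)

tokens = "a b c a".split()
print(cooccur_count(tokens, "a", "b", 2))   # adjacent pair only -> 1
print(cooccur_count(tokens, "a", "b", 10))  # both 'a' positions count -> 2
```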
10
Random Walk
Example transition matrix over three nodes:
H = | 0.009  0.9    0.01  |
    | 0.1    0.8    0.1   |
    | 0.6    0.005  0.395 |
If the walk starts from node 1 at time 0, the distribution at time 1 is [1 0 0] H = [0.009 0.9 0.01], so the probability of reaching node 3 at time 1 is 0.01.
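The one-step walk above is a plain vector-matrix product; a minimal sketch, using the matrix entries as they appear on the slide:

```python
# One step of the random walk: the starting distribution [1, 0, 0]
# (node 1 at time 0) multiplied by the transition matrix H gives the
# distribution at time 1; the entry for node 3 is H[0][2].
H = [[0.009, 0.9,   0.01],
     [0.1,   0.8,   0.1],
     [0.6,   0.005, 0.395]]

def step(dist, H):
    n = len(H)
    return [sum(dist[i] * H[i][j] for i in range(n)) for j in range(n)]

print(step([1.0, 0.0, 0.0], H))  # [0.009, 0.9, 0.01]
```

Iterating `step` until the distribution converges yields the stationary affinity scores used by PhRank.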
11
Vertex Weights
Factor s balances exhaustivity with global saliency to identify stems that are poor discriminators between relevant and non-relevant documents.
It uses the frequency of a word w_n in N, averaged over the k + 1 documents and normalized by the maximum average frequency of any term in N, together with the number of documents in C containing w_n.
TREC query #840: "Give the definition, locations, or characteristics of geysers." => "definition geysers" is not more informative than "geysers" alone.
12
Example
w_n = geysers: average frequency of "geysers" in N = 12/3; |N| = 3, |C| = 35; maximum average frequency of any term in N = 4; df_wn = 3.
w_n = definition: average frequency of "definition" in N = 2/3; maximum average frequency of any term in N = 4; df_wn = 1.
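The slide's numbers can be plugged into a tf-idf style weight. Note the formula below is one plausible reading of the slide (normalized average frequency in N times idf in C), not necessarily the paper's exact definition:

```python
import math

# Hypothetical tf-idf style vertex weight: normalized average frequency
# in the neighborhood N times the idf of the word in collection C.
def vertex_weight(avg_freq, max_avg_freq, df, collection_size):
    return (avg_freq / max_avg_freq) * math.log(collection_size / df)

# Numbers from the slide: |C| = 35, max average frequency in N = 4
geysers = vertex_weight(avg_freq=12/3, max_avg_freq=4, df=3, collection_size=35)
definition = vertex_weight(avg_freq=2/3, max_avg_freq=4, df=1, collection_size=35)
print(geysers > definition)  # 'geysers' gets the higher weight
```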
13
Term Ranking
Input: all combinations of 1-3 non-stopword words in a query.
Output: a ranked list sorted by f(x, Q) score.
To avoid a bias towards longer terms, a term x is scored by averaging the affinity scores of its component words.
A factor z_x represents the degree to which the term is discriminative in the collection, based on the frequency of x_e in C.
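The length-normalized scoring step can be sketched as follows. The affinity values and the z_x value here are illustrative placeholders, not figures from the paper:

```python
# Term score: average the per-word affinity (random-walk) scores so that
# longer terms gain no length advantage, then scale by a
# collection-discrimination factor z_x (hypothetical value below).
def term_score(term_words, affinity, z_x):
    avg = sum(affinity[w] for w in term_words) / len(term_words)
    return z_x * avg

# Illustrative affinity scores for words of the example query
affinity = {"volcanic": 0.32, "activity": 0.11, "locations": 0.05}
print(term_score(["volcanic", "activity"], affinity, z_x=1.5))
```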
14
Example
Query: "Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories."
Term x = volcanic boundaries; term x = volcanic U.S.
15
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
16
Diversity Filter
PhRank often assigns a high rank to multi-word terms that contain only one highly informative word.
For example, for the query "the destruction of Pan Am Flight 103 over Lockerbie, Scotland", the term "pan flight 103" is informative, but "pan" is uninformative by itself.
Two filtering strategies for overlapping terms such as "birth rate", "declining birth", and "declining birth rate":
Way 1 assumes the longer term better represents the information need, keeping "declining birth rate".
Way 2 assumes the shorter terms better represent the information need and the longer term is redundant, so "declining birth rate" is discarded.
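The redundancy check in Way 2 can be sketched as a pass over the ranked list. This is a simplified illustration, assuming a term is redundant exactly when all of its words are covered by already-kept higher-ranked terms:

```python
# Sketch of a diversity filter: walk the ranked list top-down and discard
# a term whose words are all covered by already-kept higher-ranked terms
# (Way 2 on the slide: shorter terms win, the longer term is redundant).
def diversity_filter(ranked_terms):
    kept, seen = [], set()
    for term in ranked_terms:
        words = set(term.split())
        if words <= seen:       # every word already covered -> redundant
            continue
        kept.append(term)
        seen |= words
    return kept

ranked = ["birth rate", "declining birth", "declining birth rate"]
print(diversity_filter(ranked))  # ['birth rate', 'declining birth']
```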
17
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
18
Experiment
Datasets: F = excluded from features, T = included in features.
19
Experiment
20
TREC description topics
TREC title queries
21
Outline
- Introduction
- The PhRank Algorithm
- Diversity Filter
- Experiment
- Conclusions
22
Conclusions
We have presented PhRank, a novel term ranking algorithm that extends work on Markov chain frameworks for query expansion to select focused and succinct terms from within a query.
For all collections, around 26% of queries show more than a 5% decrease in MAP compared to SD.
Efficiency concerns around the time to construct an affinity graph may be ameliorated by off-line indexing to precompute a language model for each document in the collection.