1 Compact Query Term Selection Using Topically Related Text
K. Tamsin Maxwell, W. Bruce Croft SIGIR 2013

2 Outline Introduction Related Work Principle for Term Selection
PhRank Algorithm Evaluation Framework Experiments Conclusion

3 Introduction Recent query reformulation techniques usually rely on pseudo-relevance feedback (PRF). But because they consider words that are not in the original query, the expansion may include peripheral words and cause query drift. PhRank also uses PRF, but uses it for in-query term selection: each candidate term contains 1-3 words and is ranked with a score derived from a co-occurrence graph. Advantages of PhRank: It is the first method to use PRF for in-query term selection. Only a small number of terms are selected, retaining the flexibility to use more or longer terms if required. The affinity graph captures aspects of both syntactic and non-syntactic word associations.

4 Related Work Markov chain framework
The Markov chain framework uses the stationary distribution of a random walk over an affinity graph G to estimate the importance of the vertices in the graph. A random walk describes a succession of random or semi-random steps between vertices v_i and v_j in G. If we define the transition probability between v_i and v_j as h_ij, and π_j^t as the affinity score of v_j at time t, then π_j^{t+1} is the sum of the scores flowing from each v_i connected to v_j: π_j^{t+1} = Σ_i h_ij · π_i^t.

5 Related Work Sometimes v_i must step to a vertex v_j that is unconnected, so we define a minimum probability u = 1/n, where n is the number of vertices in G. A factor α then controls the balance between the transition probability and the minimum probability: π_j^{t+1} = α Σ_i h_ij · π_i^t + (1 − α) u.
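A minimal sketch of this damped walk as power iteration (NumPy); the alpha, tol and max_iter values are illustrative assumptions, not values from the paper:

```python
import numpy as np

def random_walk(H, alpha=0.85, tol=1e-6, max_iter=100):
    """Power iteration over a row-stochastic transition matrix H.

    Mixes graph transitions with a uniform minimum probability u = 1/n,
    balanced by alpha, and returns the stationary affinity scores pi.
    """
    n = H.shape[0]
    u = np.full(n, 1.0 / n)      # minimum probability for unconnected vertices
    pi = np.full(n, 1.0 / n)     # start from a uniform distribution
    for _ in range(max_iter):
        pi_next = alpha * (pi @ H) + (1 - alpha) * u
        if np.abs(pi_next - pi).max() < tol:   # no vertex changed by more than tol
            return pi_next
        pi = pi_next
    return pi
```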

6 Principle for Term Selection
For an informative word: Is informative relative to a query: a word should represent the meaning of the query, but a query alone usually does not carry enough information, so PRF is used to enhance the query representation. Is related to other informative words: the Association Hypothesis states that "if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this". With an affinity graph, we can estimate this by counting how many words connect to a target word and the strength of those connections.

7 Principle for Term Selection
For an informative term: Contains informative words: we assume all informative terms must contain informative words, so we consider individual words when ranking terms. Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal, so we weight terms with a normalized tf.idf-inspired weight.

8 The PhRank Algorithm Graph construction
For a query, we first retrieve the top k documents, then define the set N as the query itself plus its feedback documents. The documents in N are stemmed, and each unique word becomes a vertex in graph G. An edge connects vertices v_i and v_j if words i and j are adjacent in N. Edge weights: transition probabilities are based on a linear combination of the counts with which words i and j co-occur within windows of size 2 and 10, as sketched below.
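A sketch of the graph construction under stated assumptions: the uniform document weighting and the mixing weight lam are illustrative only (the paper's full edge weight also folds in p(d_k|Q) and an idf-style factor r, described on the next slide):

```python
from collections import defaultdict

def build_affinity_graph(docs, w_small=2, w_large=10, lam=0.5):
    """Build undirected co-occurrence edges over the stemmed feedback set N.

    docs: list of stemmed token lists (the query plus the top-k documents).
    Edge weights linearly combine window-2 and window-10 co-occurrence counts.
    """
    edges = defaultdict(float)

    def window_counts(tokens, w):
        counts = defaultdict(int)
        for i, a in enumerate(tokens):
            for b in tokens[i + 1:i + w]:   # partners within a window of w tokens
                if a != b:
                    counts[tuple(sorted((a, b)))] += 1
        return counts

    for tokens in docs:
        c_small = window_counts(tokens, w_small)
        c_large = window_counts(tokens, w_large)
        for pair in set(c_small) | set(c_large):
            edges[pair] += lam * c_small[pair] + (1 - lam) * c_large[pair]
    return edges
```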

9 The PhRank Algorithm Edge weights are defined in terms of the following quantities:
p(d_k|Q) is the probability, given Q, of the document d_k in which words i and j co-occur; c_ij^{w2} and c_ij^{w10} are the counts of their co-occurrence within windows of 2 and 10; and r is an idf-style weight that confirms the importance of the association between i and j in N.
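A plausible form consistent with these definitions, offered only as a sketch (an assumption, not a verified transcription of the paper; λ is an assumed mixing weight):

$$ w_{ij} \;=\; r_{ij} \sum_{d_k \in N} p(d_k \mid Q)\,\bigl(\lambda\, c_{ij}^{w2} + (1-\lambda)\, c_{ij}^{w10}\bigr) $$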

10 The PhRank Algorithm Random walk
A random walk over G proceeds as described in Related Work. The edge weights are normalized to sum to one, and iteration stops when the change at every vertex does not exceed a small threshold. Vertex weights: words are also weighted by how exhaustively they represent the query. A word like "make" can score highly in the affinity graph even though it is not informative.

11 The PhRank Algorithm We define s as a factor that balances exhaustiveness with global saliency, to identify stems that are poor discriminators between relevant and non-relevant documents. For a word w_n: s(w_n) = tf_avg(w_n) · idf(w_n), where tf_avg(w_n) is the average frequency of w_n across the documents in N, and idf(w_n) is the idf of w_n in N.
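One plausible reading of this weight as a sketch (the exact frequency normalisation in the paper may differ):

```python
import math

def saliency(word, feedback_docs):
    """s(w_n) = average frequency of the word in N times its idf in N.

    feedback_docs: list of stemmed token lists for N (query + top-k docs).
    """
    n = len(feedback_docs)
    tf_avg = sum(doc.count(word) for doc in feedback_docs) / n
    df = sum(1 for doc in feedback_docs if word in doc)
    idf = math.log(n / df) if df else 0.0
    return tf_avg * idf
```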

12 The PhRank Algorithm Term ranking
For a term x, factor z represents the degree to which the term is discriminative in the collection. z is defined from: x_e^f, the frequency with which the words of x co-occur in the collection within a window of 4 × (the number of words in the term); idf(x_e), defined analogously to idf(w_n); and l_x, a normalization based on the length of x. Finally, the rank of a term x for Q combines z with the affinity and saliency scores of the term's words.
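A plausible shape for these formulas, reconstructed from the definitions on this and the previous slides (an assumption, not a verified transcription of the paper; π(w_n) and s(w_n) are the affinity and saliency scores defined earlier):

$$ z_x = \frac{x_e^f \cdot \mathit{idf}(x_e)}{l_x}, \qquad \mathrm{rank}(x, Q) \;\propto\; z_x \sum_{w_n \in x} \pi(w_n)\, s(w_n) $$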

13 The PhRank Algorithm After ranking, some selected terms still include uninformative words. This is because we rank terms by their whole score, so several terms may contain similar words, which decreases diversity. We apply a simple filter with top-down constraints, as sketched below: for a term x, if a higher-ranked term contains all the words in x, or x contains all the words in a higher-ranked term, we discard x.
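A minimal sketch of this subsumption filter, assuming terms arrive sorted by descending score and each term is given as a collection of its stemmed words:

```python
def filter_terms(ranked_terms):
    """Top-down filter: drop a term if a higher-ranked kept term
    contains all of its words, or it contains all of theirs."""
    kept = []
    for term in ranked_terms:
        words = set(term)
        if any(words <= set(k) or set(k) <= words for k in kept):
            continue   # subsumed by (or subsumes) a higher-ranked term
        kept.append(term)
    return kept
```

For example, filter_terms([["query", "drift"], ["query"]]) keeps only ["query", "drift"], since the lower-ranked single word is contained in the higher-ranked term.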

14 Evaluation Framework Robustness Precision Succinctness
Robustness: compare with the sequential dependence variant of the Markov random field model, which linearly combines query likelihood with bigram features over windows of 2 and 8. Precision: the subset distribution model achieves high mean average precision. Succinctness: we use Key Concepts as the succinctness baseline; this approach linearly combines a bag-of-words query representation with a weighted bag-of-words query representation.
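A toy sketch of a sequential-dependence style scorer, for orientation only: it uses simple add-one smoothing rather than the Dirichlet smoothing normally used with Indri, and the 0.85/0.10/0.05 weights are common defaults assumed here, not values from the paper:

```python
import math

def sd_score(query_terms, doc, weights=(0.85, 0.10, 0.05)):
    """Linear combination of unigram, adjacent-bigram (ordered, window 2)
    and unordered window-8 evidence over a tokenized document."""
    lt, lo, lu = weights
    n = len(doc)

    def p_term(t):
        return (doc.count(t) + 1) / (n + 1)

    def p_ordered(a, b):
        hits = sum(1 for i in range(n - 1) if doc[i] == a and doc[i + 1] == b)
        return (hits + 1) / (n + 1)

    def p_unordered(a, b, w=8):
        hits = sum(1 for i in range(n - w + 1)
                   if a in doc[i:i + w] and b in doc[i:i + w])
        return (hits + 1) / (n + 1)

    pairs = list(zip(query_terms, query_terms[1:]))
    return (lt * sum(math.log(p_term(t)) for t in query_terms)
            + lo * sum(math.log(p_ordered(a, b)) for a, b in pairs)
            + lu * sum(math.log(p_unordered(a, b)) for a, b in pairs))
```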

15 Evaluation Framework Word dependence
We refer to four models of phrase belief, as shown in the figure.

16 Experiments We use Indri on Robust04, WT10G and GOV2 for evaluation.
Feature analysis: here we list the results of using the individual features in PhRank.

17 Experiments

18 Experiments Comparison with other models

19 Conclusion PhRank is a novel method to select succinct terms within a query, built on the Markov chain framework. Although the selected terms are succinct, the strategy is risky and causes a decrease in MAP compared with sequential dependence.

