A Markov Random Field Model for Term Dependencies
Hongyu Li & Chaorui Chang

Background
Dependencies exist between terms in a collection of text.
Estimating statistical models for general term dependencies is infeasible due to data sparsity.
Most past work on modeling term dependencies has focused on phrases, proximity, or term co-occurrences.

Hypothesis and Solution
Hypothesis 1: dependence models will be more effective for larger collections than for smaller collections.
Hypothesis 2: incorporating several types of evidence into a dependence model will further improve effectiveness.
Solution: introduce a Markov random field (MRF) model.

Markov Random Field
Markov random fields, also called undirected graphical models, model joint distributions.
In the paper, an MRF is used to model the joint distribution $P_\Lambda(Q, D)$ over queries $Q$ and documents $D$.
Assume the graph $G$ consists of query nodes $q_i$ and a document node $D$.
The joint distribution is defined by
$$P_\Lambda(Q, D) = \frac{1}{Z_\Lambda} \prod_{c \in C(G)} \psi(c; \Lambda)$$
where $C(G)$ is the set of cliques in $G$, each $\psi(\cdot; \Lambda)$ is a non-negative potential function, and $Z_\Lambda$ is the normalizing constant.

3 variants of MRF model
Full independence: query terms $q_i$ are independent.
Sequential dependence: dependence exists between neighboring query terms.
Full dependence: all query terms are in some way dependent on each other.
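
The three variants differ only in which groups of query terms form cliques with the document node. A minimal sketch of that enumeration follows. The function name `enumerate_cliques`, the variant labels, and the tuple representation are assumptions made for illustration, and the sketch assumes the sequential variant links adjacent term pairs only.

```python
from itertools import combinations


def enumerate_cliques(query_terms, variant):
    """Split query-term cliques into (T, O, U) for one model variant.

    T: single terms; O: contiguous runs of two or more terms;
    U: non-contiguous subsets of two or more terms.
    Every clique implicitly also contains the document node D.
    """
    n = len(query_terms)
    T = [(term,) for term in query_terms]
    O, U = [], []
    if variant == "sequential":
        # Assumption: only neighboring terms are linked, so the
        # multi-term cliques are the adjacent pairs.
        O = [tuple(query_terms[i:i + 2]) for i in range(n - 1)]
    elif variant == "full":
        # Every subset of two or more terms is a clique; classify it
        # by whether its positions in the query are contiguous.
        for size in range(2, n + 1):
            for idx in combinations(range(n), size):
                terms = tuple(query_terms[i] for i in idx)
                is_contiguous = idx[-1] - idx[0] == size - 1
                (O if is_contiguous else U).append(terms)
    elif variant != "independent":
        raise ValueError(f"unknown variant: {variant}")
    return T, O, U


T, O, U = enumerate_cliques(["markov", "random", "field"], "full")
print(len(T), len(O), len(U))
```

The number of cliques in the full dependence variant grows exponentially with query length, which is one practical reason to compare it against the cheaper sequential variant.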
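
Potential functions
Potential function for a 2-clique (one query term node and the document node):
$$\psi_T(c) = \lambda_T \log P(q_i \mid D) = \lambda_T \log\left[(1 - \alpha_D)\frac{tf_{q_i, D}}{|D|} + \alpha_D \frac{cf_{q_i}}{|C|}\right]$$
Cliques containing contiguous sets of query terms (ordered window):
$$\psi_O(c) = \lambda_O \log P(\#1(q_i, \ldots, q_{i+k}) \mid D) = \lambda_O \log\left[(1 - \alpha_D)\frac{tf_{\#1(q_i, \ldots, q_{i+k}), D}}{|D|} + \alpha_D \frac{cf_{\#1(q_i, \ldots, q_{i+k})}}{|C|}\right]$$
Cliques containing non-contiguous sets of query terms (unordered window):
$$\psi_U(c) = \lambda_U \log P(\#\text{uw}N(q_i, \ldots, q_j) \mid D) = \lambda_U \log\left[(1 - \alpha_D)\frac{tf_{\#\text{uw}N(q_i, \ldots, q_j), D}}{|D|} + \alpha_D \frac{cf_{\#\text{uw}N(q_i, \ldots, q_j)}}{|C|}\right]$$
Here $tf$ is the number of matches in document $D$, $cf$ is the number of matches in the whole collection, $|D|$ is the document length, $|C|$ is the collection length, and $\alpha_D$ is a smoothing parameter.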
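
All three potentials share one shape: a weight times the log of a document estimate smoothed with a collection estimate. A minimal sketch of that shared computation, assuming the match counts have already been gathered; the names `smoothed_log_prob` and `potential` and the numbers in the example are hypothetical.

```python
import math


def smoothed_log_prob(tf, doc_len, cf, coll_len, alpha):
    """log[(1 - alpha) * tf / |D| + alpha * cf / |C|].

    tf and cf are match counts for a term, an ordered window, or an
    unordered window, depending on which potential is being computed.
    """
    return math.log((1 - alpha) * tf / doc_len + alpha * cf / coll_len)


def potential(weight, tf, doc_len, cf, coll_len, alpha):
    """Weight (lambda_T, lambda_O, or lambda_U) times the smoothed log probability."""
    return weight * smoothed_log_prob(tf, doc_len, cf, coll_len, alpha)


# Illustrative numbers only: a term seen twice in a 100-token document
# and 50 times in a 10,000-token collection, with alpha = 0.5.
print(potential(1.0, tf=2, doc_len=100, cf=50, coll_len=10_000, alpha=0.5))
```

Because of the collection term, a document with no match (`tf = 0`) still gets a finite value as long as the feature occurs somewhere in the collection.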
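
Ranking
Define the ranking function
$$P_\Lambda(D \mid Q) = \frac{P_\Lambda(Q, D)}{P_\Lambda(Q)} \stackrel{rank}{=} \log P_\Lambda(Q, D) - \log P_\Lambda(Q)$$
The potential functions can be parameterized as $\psi(c; \Lambda) = \exp(\lambda_c f(c))$.
Terms that do not depend on $D$, namely $\log P_\Lambda(Q)$ and $\log Z_\Lambda$, do not affect the ranking and can be dropped, giving
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c) = \sum_{c \in T} \lambda_T f_T(c) + \sum_{c \in O} \lambda_O f_O(c) + \sum_{c \in O \cup U} \lambda_U f_U(c)$$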
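
The ranking function is a weighted sum of feature values, so scoring a document is simple once the per-clique features are available. A minimal sketch, assuming the feature values for each document were computed elsewhere; `mrf_score`, `rank_documents`, and the numbers are hypothetical.

```python
def mrf_score(f_T, f_O, f_U, lam_T, lam_O, lam_U):
    """Rank-equivalent score: weighted sum of feature values.

    f_T: term features, one per clique in T.
    f_O: ordered-window features, one per clique in O.
    f_U: unordered-window features, one per clique in O union U.
    """
    return lam_T * sum(f_T) + lam_O * sum(f_O) + lam_U * sum(f_U)


def rank_documents(doc_features, lam_T, lam_O, lam_U):
    """Return document ids sorted by descending score."""
    scores = {
        doc_id: mrf_score(f_T, f_O, f_U, lam_T, lam_O, lam_U)
        for doc_id, (f_T, f_O, f_U) in doc_features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up log-probability features for two documents.
doc_features = {
    "d1": ([-2.0, -3.0], [-4.0], [-3.5]),
    "d2": ([-2.5, -3.5], [-1.0], [-1.5]),
}
print(rank_documents(doc_features, 0.8, 0.1, 0.1))
print(rank_documents(doc_features, 0.0, 0.5, 0.5))
```

In this toy example the two weight settings produce different orderings, which illustrates why the weights need to be tuned.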

Training
Set the parameter values $(\lambda_T, \lambda_O, \lambda_U)$.
Train the model by directly maximizing mean average precision.
The ranking function is invariant to parameter scale, so the constraint $\lambda_T + \lambda_O + \lambda_U = 1$ is imposed.
Figure: example mean average precision surface for the GOV2 collection using the full dependence model.
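
With the weights constrained to sum to 1, only two free parameters remain, so the search space is small. The sketch below uses an exhaustive grid search over that simplex as a stand-in; it is not necessarily the optimizer used in the paper. The input layout (per-query summed features and relevance sets) and all function names are assumptions.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def simplex_grid(steps=10):
    """Yield (lam_T, lam_O, lam_U) on a grid where the three values sum to 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            yield i / steps, j / steps, (steps - i - j) / steps


def train(queries, steps=10):
    """Return (best MAP, best weights).

    queries: list of (features, relevant) pairs, where features maps a
    document id to its summed (f_T, f_O, f_U) values for that query.
    """
    best_map, best_lam = -1.0, None
    for lam in simplex_grid(steps):
        aps = []
        for features, relevant in queries:
            ranking = sorted(
                features,
                key=lambda d: sum(l * f for l, f in zip(lam, features[d])),
                reverse=True,
            )
            aps.append(average_precision(ranking, relevant))
        mean_ap = sum(aps) / len(aps)
        if mean_ap > best_map:
            best_map, best_lam = mean_ap, lam
    return best_map, best_lam
```

A grid is easy to reason about and makes the shape of the mean average precision surface visible, but it re-ranks every query at every grid point, so a local search would likely be preferable on large collections.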

3. Experimental Results
We analyze retrieval effectiveness across different collections.
Newswire collections (AP, WSJ): small and homogeneous.
Web collections (WT10g, GOV2): larger and less homogeneous.

3.1 Full Independence variant
The cliques are only members of the set $T$, so we set $\lambda_O = \lambda_U = 0$ and $\lambda_T = 1$.
Ranking function:
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in T} f_T(c)$$
AvgP refers to mean average precision, P@10 is precision at 10 ranked documents, and µ is the smoothing parameter used.
These results provide a baseline against which to compare the sequential and full dependence variants.
Table: full independence variant results.

3.2 Sequential Dependence variant
Models of this form have cliques in $T$, $O$, and $U$.
Ranking function:
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in T} \lambda_T f_T(c) + \sum_{c \in O} \lambda_O f_O(c) + \sum_{c \in O \cup U} \lambda_U f_U(c)$$
The unordered feature function, $f_U$, has a free parameter $N$ that allows the size of the unordered window (the scope of proximity) to vary.
We explore window sizes of 2, 8 (sentence-sized), 50, and "unlimited" to see what impact they have on effectiveness.
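
To make the role of the window size $N$ concrete, here is a minimal sketch of counting ordered and unordered window matches in a tokenized document. The counting rule is a simplification chosen for illustration and may differ from the exact match semantics of any particular retrieval engine; the function names and the toy document are hypothetical.

```python
def count_ordered(doc_tokens, terms):
    """Count positions where the terms occur contiguously and in order."""
    terms = list(terms)
    k = len(terms)
    return sum(
        1
        for i in range(len(doc_tokens) - k + 1)
        if doc_tokens[i:i + k] == terms
    )


def count_unordered(doc_tokens, terms, window=None):
    """Count windows that start on a query term and contain every term.

    window is the number of tokens in the window (N); None means
    "unlimited", so the window runs to the end of the document.
    """
    needed = set(terms)
    count = 0
    for i, token in enumerate(doc_tokens):
        end = None if window is None else i + window
        if token in needed and needed <= set(doc_tokens[i:end]):
            count += 1
    return count


doc = ["markov", "random", "field", "random", "x", "markov"]
query = ["markov", "random"]
print(count_ordered(doc, query))        # 1
print(count_unordered(doc, query, 2))   # 1
print(count_unordered(doc, query, 4))   # 2
print(count_unordered(doc, query))      # 3
```

Widening the window can only add matches under this rule, so larger $N$ trades strictness for recall of loosely co-occurring terms.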
The results show very little difference across the various window sizes.
For the AP, WT10g, and GOV2 collections, the sentence-sized window performed best.
For the WSJ collection, $N = 2$ performed best.
The sequential dependence variant outperforms the full independence variant.
Table: sequential dependence variant results.
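
3.3 Full Dependence variant
Models of this form consist of cliques in $T$, $O$, and $U$.
Ranking function:
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{c \in T} \lambda_T f_T(c) + \sum_{c \in O} \lambda_O f_O(c) + \sum_{c \in O \cup U} \lambda_U f_U(c)$$
We set the parameter $N$ in the feature function $f_U$ to four times the number of query terms in the clique $c$.
We analyze the impact that the ordered and unordered window feature functions have on effectiveness.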
For the AP collection, there is very little difference between the feature settings.
For the WSJ collection, the ordered features produce a clear improvement over the unordered features, but there is very little difference between using ordered features alone and combining ordered and unordered features.
The results for the two web collections, WT10g and GOV2, are similar to each other. In both, unordered features perform better than ordered features, and combining ordered and unordered features leads to noticeable improvements in mean average precision.
Table: full dependence variant results.

Strict matching via ordered window features is more important for the smaller newswire collections, due to the homogeneous, clean nature of the documents.
For the web collections, the opposite is true: unordered window features matter more than ordered ones.

4. Conclusions
Three dependence model variants are described, each capturing different dependencies between query terms.
Modeling dependencies can significantly improve retrieval effectiveness across a range of collections.
Possible future work includes exploring a wider range of potential functions and applying the model to other retrieval tasks.

Thank you