A General Optimization Framework for Smoothing Language Models on Graph Structures
Qiaozhu Mei, Duo Zhang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
Kullback-Leibler Divergence Retrieval Method
– Document d (a text mining paper) → document language model θ_d: p(w|d), e.g. text 4/100 = 0.04, mining 3/100 = 0.03, clustering 1/100 = 0.01, …, data = 0, computing = 0
– Query q "data mining" → query language model θ_q: p(w|q), e.g. data 1/2 = 0.5, mining 1/2 = 0.5
– Documents are ranked by a similarity function between the query model and the document model
– Smoothing can be applied to either side: a smoothed query model θ_q' (e.g. data = 0.4, mining = 0.4, clustering = 0.1, …) or a smoothed document model θ_d' (e.g. text = 0.039, mining = 0.028, clustering = 0.01, …, with data and computing now non-zero)
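The ranking step above can be sketched in a few lines: score a document by the negative KL divergence between the query model and the document model, which reduces to a cross-entropy sum over query words. The toy probabilities mirror the slide; the `epsilon` floor is an illustrative assumption to show why an unsmoothed model breaks.

```python
import math

def kl_divergence_score(query_lm, doc_lm, epsilon=1e-12):
    """Rank score: negative KL divergence between query and document LMs
    (up to a query-only constant).  Only words with non-zero query
    probability contribute to the sum."""
    score = 0.0
    for w, p_q in query_lm.items():
        p_d = doc_lm.get(w, 0.0)
        # An unsmoothed doc LM assigns zero to unseen words, driving the
        # score toward -inf -- this is exactly why smoothing is essential.
        score += p_q * math.log(max(p_d, epsilon))
    return score

# Toy models echoing the slide's numbers
query_lm = {"data": 0.5, "mining": 0.5}
doc_lm = {"text": 0.04, "mining": 0.03, "clustering": 0.01}
```

A smoothed document model that gives "data" even a small probability scores far better against this query than the unsmoothed one.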
Smoothing a Document Language Model
– Retrieval performance depends on how well we estimate and smooth the LM
– MLE from the document: text 4/100 = 0.04, mining 3/100 = 0.03, Assoc. 1/100 = 0.01, clustering 1/100 = 0.01, …, data = 0, computing = 0
– Smoothing goal 1: assign non-zero probability to unseen words (e.g. data, computing)
– Smoothing goal 2: estimate a more accurate distribution from sparse data
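A standard way to meet both goals is Dirichlet-prior smoothing against a collection model, which is also the "additional Dirichlet smoothing" step used later in the talk. A minimal sketch, with toy counts and a μ value that are illustrative assumptions rather than values from the slides:

```python
def dirichlet_smooth(doc_counts, collection_lm, mu=2000.0):
    """Dirichlet-prior smoothing: p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu).
    Unseen words inherit a scaled-down share of the collection probability,
    so no word in the vocabulary ends up with probability zero."""
    doc_len = sum(doc_counts.values())
    vocab = set(doc_counts) | set(collection_lm)
    return {w: (doc_counts.get(w, 0) + mu * collection_lm.get(w, 0.0))
               / (doc_len + mu)
            for w in vocab}
```

With a valid collection model the smoothed values still form a probability distribution, and seen words keep more mass than the background alone would give them.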
Previous Work on Smoothing
– Estimate a reference language model θ_ref and interpolate the MLE with it
– θ_ref from the whole collection (corpus): [Ponte & Croft 98]
– θ_ref from document clusters: [Liu & Croft 04]
– θ_ref from nearest-neighbor documents: [Kurland & Lee 04]
Problems of Existing Methods
– Smoothing with the global background: ignores collection structure
– Smoothing with document clusters: ignores local structure inside each cluster
– Smoothing using neighbor documents: ignores global structure
– Different heuristics for θ_ref and interpolation: no clear objective function to optimize, and no guidance on how to further improve the existing methods
Research Questions
– What is the right corpus structure to use?
– What are the criteria for a good smoothing method? A more accurate language model? What are we actually optimizing?
– Could there be a general optimization framework?
Our Contribution
– Formulation of smoothing as optimization over graph structures
– A general optimization framework for smoothing both document LMs and query LMs
– Novel instantiations of the framework that lead to more effective smoothing methods
A Graph-based Formulation of Smoothing
A novel and general view of smoothing:
– Collection = a graph of documents (d_1, d_2, …)
– For a word w, the values p(w|d) over all documents form a surface on top of the graph; the MLE projects to a bumpy surface on the plane
– Smoothed LM = smoothed surface
Covering Existing Models
Each existing method corresponds to a particular graph structure:
– Smoothing with the global background = star graph around a background node
– Smoothing with document clusters = forest with pseudo-documents as cluster roots (C_1 … C_4)
– Smoothing with nearest neighbors = local graph around each document
In every case: collection = graph, smoothed LM = smoothed surface
Instantiations of the Formulation
Types of graphs × language models to be smoothed:
– Star graph w/ background node (document graph): Document LM – [Ponte & Croft 98], [Hiemstra & Kraaij 98], [Miller et al. 99], [Zhai & Lafferty 01], …; Query LM – N/A
– Forest w/ cluster roots (document graph): Document LM – [Liu & Croft 04]; Query LM – N/A
– Local kNN graph (document graph): Document LM – [Kurland & Lee 04], [Tao et al. 06]; Query LM – N/A
– Document similarity graph: Document LM – novel; Query LM – N/A
– Word similarity graph: Document LM – novel; Query LM – novel
– Other graphs: ???
Smoothing over Word Graphs
– Build a similarity graph of words
– Given d, the probabilities {p(w|d)} form a surface over the word graph: p(w_u|d) at vertex u, p(w_v|d) at vertex v, optionally normalized by degree as p(w_u|d)/Deg(u)
– Smoothed LM = smoothed surface
The General Objective of Smoothing
Two competing terms, combined over the graph:
– Fidelity to the MLE
– Smoothness of the surface
weighted by the importance of vertices w(u) and the weights of edges w(u,v) (e.g. inverse distance)
The Optimization Framework
Criteria:
– Fidelity: keep close to the MLE
– Surface smoothness: local and global consistency
– Constraint: the smoothed values remain a valid probability distribution
Unified optimization objective: a weighted combination of fidelity to the MLE and smoothness of the surface
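The objective function itself appears only as an image in the original slides; a hedged reconstruction in the usual style of graph regularization (cf. Zhou et al. 04), with a trade-off parameter λ that is an assumption here, would read:

```latex
O(\mathbf{f}) = (1-\lambda)\sum_{u \in V} w(u)\,\bigl(f_u - \hat{f}_u\bigr)^{2}
              + \lambda \sum_{(u,v) \in E} w(u,v)\,\bigl(f_u - f_v\bigr)^{2},
\qquad \text{s.t. } \mathbf{f} \text{ remains a valid probability distribution,}
```

where \(\hat{f}_u\) is the MLE surface value at vertex \(u\): the first term enforces fidelity, the second penalizes differences between strongly connected vertices.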
The Procedure of Smoothing
1. Define the graph: construct a document/word graph; define reasonable w(u) and w(u,v)
2. Define the surfaces: define a reasonable f_u
3. Smooth the surfaces: iterative updating, followed by additional Dirichlet smoothing
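The iterative-updating step can be sketched as a fixed-point iteration that repeatedly mixes each vertex's original value with a weighted average of its neighbors' current values. A minimal sketch, where the mixing weight `lam` and the toy graph are illustrative assumptions, not the tuned settings from the talk:

```python
def smooth_on_graph(f_hat, weights, lam=0.5, iters=100):
    """Iteratively smooth a surface f over a weighted graph.
    f_hat:   {node: initial value, e.g. the MLE p(w|d_u)}
    weights: {node: {neighbor: edge weight w(u,v)}}
    Fixed point of  f_u = (1 - lam) * f_hat_u + lam * weighted_avg(neighbors)."""
    f = dict(f_hat)
    for _ in range(iters):
        new_f = {}
        for u, val in f_hat.items():
            nbrs = weights.get(u, {})
            total = sum(nbrs.values())
            avg = sum(w * f[v] for v, w in nbrs.items()) / total if total else val
            new_f[u] = (1 - lam) * val + lam * avg
        f = new_f
    return f
```

On a toy path graph a–b–c with all the initial mass on a, the iteration spreads probability to b and c while keeping a's value highest, which is exactly the "smoothed surface" picture.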
Smoothing Language Models Using a Document Graph
– Construct a kNN graph of documents; w(u) = Deg(u), w(u,v) = cosine similarity
– Surface: the document language model, f_u = p(w|d_u)
– Alternative surface: a document relevance score, f_u = s(q, d_u), e.g. (Diaz 05)
– Followed by additional Dirichlet smoothing
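Building the kNN document graph with cosine edge weights can be sketched as follows; the brute-force all-pairs comparison and the term-frequency vectors are simplifying assumptions for illustration (a real system would use an index):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_graph(docs, k=2):
    """docs: {doc_id: {word: tf}}.  Returns {doc_id: {neighbor: cosine sim}},
    keeping only each document's k most similar (non-zero) neighbors."""
    graph = {}
    for d in docs:
        sims = sorted(((cosine(docs[d], docs[e]), e) for e in docs if e != d),
                      reverse=True)
        graph[d] = {e: s for s, e in sims[:k] if s > 0}
    return graph
```

The resulting edge weights serve directly as w(u,v) in the smoothing iteration, and each node's degree as w(u).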
Smoothing Language Models Using a Word Graph
– Construct a kNN graph of words; w(u) = Deg(u), w(u,v) = pointwise mutual information (PMI)
– Surfaces: the document language model p(w|d) and the query language model p(w|q)
– Followed by additional Dirichlet smoothing
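The PMI edge weights can be estimated from document-level co-occurrence. A minimal sketch, assuming co-occurrence within whole documents and keeping only positively associated pairs (both are simplifying assumptions; the talk does not specify the co-occurrence window):

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(docs):
    """Word-graph edges weighted by pointwise mutual information.
    docs: list of sets of words.  PMI(u, v) = log[ p(u, v) / (p(u) p(v)) ],
    with probabilities estimated from document (co-)occurrence counts."""
    n = len(docs)
    word_df = Counter(w for d in docs for w in d)
    pair_df = Counter(p for d in docs for p in combinations(sorted(d), 2))
    edges = {}
    for (u, v), c in pair_df.items():
        val = math.log((c / n) / ((word_df[u] / n) * (word_df[v] / n)))
        if val > 0:  # keep only positively associated word pairs
            edges[(u, v)] = val
    return edges
```

Pruning each word's edge list to its top-k PMI neighbors then gives the kNN word graph used above.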
Intuitive Interpretation – Smoothing Using a Word Graph
– The smoothed distribution is the stationary distribution of a Markov chain over words
– Writing a document = a random walk on the word Markov chain: write down w whenever the walk passes w
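The stationary distribution of such a word chain can be computed by simple power iteration; a minimal sketch with a hypothetical two-word chain (the transition probabilities are illustrative, not derived from any corpus):

```python
def stationary(transition, iters=200):
    """Stationary distribution of a Markov chain by power iteration.
    transition: {u: {v: p(v|u)}} with each row summing to 1.
    Starts from the uniform distribution and repeatedly applies the chain."""
    nodes = list(transition)
    pi = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        new = {u: 0.0 for u in nodes}
        for u, p_u in pi.items():
            for v, p in transition[u].items():
                new[v] += p_u * p
        pi = new
    return pi
```

For an ergodic chain the iteration converges to the unique distribution satisfying the balance equations, which is the "word written by a long random walk" picture on the slide.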
Intuitive Interpretation – Smoothing Using a Document Graph
– The smoothed value is the probability of absorption into the "1" state of a Markov chain over documents
– Writing a word w in a document = a random walk on the document Markov chain: write down w if the walk reaches "1"
– Intuition: act as your neighbors do
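Absorption probabilities of this kind solve a harmonic system: each interior node's value is the weighted average of its neighbors', while the absorbing "0"/"1" nodes stay fixed. A minimal sketch solving it by relaxation (the toy graph is an illustrative assumption):

```python
def absorption_prob(weights, labels, iters=500):
    """Probability that a random walk from each node is absorbed at a
    '1'-labeled node rather than a '0'-labeled one.
    weights: {node: {neighbor: edge weight}};  labels: {node: 0 or 1}
    for the absorbing nodes.  Interior nodes relax to the weighted
    average of their neighbors (the harmonic property)."""
    f = {u: float(labels.get(u, 0)) for u in weights}
    for _ in range(iters):
        for u in weights:
            if u in labels:
                continue  # absorbing states keep their label
            total = sum(weights[u].values())
            f[u] = sum(w * f[v] for v, w in weights[u].items()) / total
    return f
```

On a path 1–b–0 with equal edge weights, the interior node b lands at 0.5, i.e. it "acts as its neighbors do".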
Experiments
Data sets (# docs): AP (…k), LA (132k), SJMN (90k), TREC8 (528k), with average document length, queries, and # relevant docs per set
Baselines include Liu and Croft '04 and Tao '06
Methods evaluated:
– DMDG: smooth document LM on document graph
– DMWG: smooth document LM on word graph
– DSDG: smooth relevance score on document graph
– QMWG: smooth query LM on word graph
Evaluation metric: MAP
Effectiveness of the Framework
MAP improvement over the Dirichlet baseline:
– AP: DMDG +17.1%***, DMWG† +16.1%***, DSDG +10.1%***, QMWG +10.1%
– LA: DMDG +4.5%**, DMWG† +4.5%**, DSDG +1.6%**
– SJMN: DMDG +13.2%***, DMWG† +12.3%***, DSDG +10.3%***, QMWG +7.4%
– TREC8: DMDG +5.4%***, DMWG† +5.4%**, DSDG +1.6%, QMWG +1.2%
† DMWG reranks the top 3000 results, which usually gives lower performance than ranking all documents
Wilcoxon test: *, **, *** denote significance levels 0.1, 0.05, 0.01
Findings: graph-based smoothing >> baseline; smoothing the document LM >> the relevance score >> the query LM
Comparison with Existing Models
Methods compared on AP, LA, SJMN, and TREC8: CBDM (Liu and Croft), DELM (Tao et al.), DMDG with 1 iteration (one entry N/A on TREC8)
– Graph-based smoothing > state of the art
– More iterations > a single iteration (a single iteration is similar to DELM)
Combined with Pseudo-Feedback
Pipeline: query q → retrieve top docs → smooth over the word graph w → rerank
– Query side: FB vs. FB+QMWG on AP, LA, SJMN, TREC8
– Document side: DMWG vs. FB vs. FB+DMWG; FB+DMWG significantly improves over FB (** on AP, LA, SJMN; *** on TREC8)
Related Work
– Language modeling in information retrieval; smoothing using the collection model: (Ponte & Croft 98); (Hiemstra & Kraaij 98); (Miller et al. 99); (Zhai & Lafferty 01), etc.
– Smoothing using corpus structures: cluster structure (Liu & Croft 04), etc.; nearest neighbors (Kurland & Lee 04), (Tao et al. 06)
– Relevance score propagation: (Diaz 05), (Qin et al. 05)
– Graph-based learning: (Zhu et al. 03); (Zhou et al. 04), etc.
Conclusions
– Smoothing language models using document/word graphs
– A general optimization framework with various effective instantiations
– Improved performance over the state of the art
Future work:
– Combine document graphs with word graphs
– Study alternative ways of constructing graphs
Thanks!
Parameter Tuning (backup slide)
– The iterative smoothing shows fast convergence