A General Optimization Framework for Smoothing Language Models on Graph Structures
Qiaozhu Mei, Duo Zhang, ChengXiang Zhai
University of Illinois at Urbana-Champaign

Kullback-Leibler Divergence Retrieval Method
Document d ("... a text mining paper ... data mining ...") is used to estimate a document language model (LM) θ_d, p(w|θ_d): text 4/100 = 0.04, mining 3/100 = 0.03, clustering 1/100 = 0.01, ..., data = 0, computing = 0, ...
Query q ("data mining") is used to estimate a query LM θ_q, p(w|θ_q): data 1/2 = 0.5, mining 1/2 = 0.5. The query LM may also be expanded into a smoothed query LM θ_q': data = 0.4, mining = 0.4, clustering = 0.1, ...
Smoothing the document LM gives θ_d', p(w|θ_d'): text = 0.039, mining = 0.028, clustering = 0.01, ..., with small non-zero probabilities for previously unseen words such as data and computing.
Documents are ranked by a similarity function between the query LM and the smoothed document LM.
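The similarity function behind this picture is the standard KL-divergence ranking used in language-model retrieval (general background, not specific to this deck): a document is scored by the negative KL divergence between the query model and the smoothed document model, which is rank-equivalent to a cross-entropy sum:

\mathrm{score}(q,d) \;=\; -D\!\left(\theta_q \,\|\, \theta_{d'}\right) \;\overset{\mathrm{rank}}{=}\; \sum_{w} p(w \mid \theta_q)\,\log p(w \mid \theta_{d'})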

Smoothing a Document Language Model
Retrieval performance depends on the estimated LM, which in turn depends on how the LM is smoothed.
Maximum-likelihood estimate: text 4/100 = 0.04, mining 3/100 = 0.03, Assoc. 1/100 = 0.01, clustering 1/100 = 0.01, ..., data = 0, computing = 0, ...
Smoothed LM: the seen words (text, mining, Assoc., clustering = 0.01, ...) are discounted slightly, and unseen words such as data and computing receive small non-zero probabilities.
Smoothing thus serves two purposes: assign non-zero probability to unseen words, and estimate a more accurate distribution from sparse data.
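For reference, the Dirichlet prior smoothing used as the baseline later in these slides has the standard form, with c(w,d) the count of w in d, |d| the document length, p(w|C) the collection LM, and μ the prior parameter:

p(w \mid \theta_{d'}) \;=\; \frac{c(w,d) + \mu\, p(w \mid C)}{|d| + \mu}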

Previous Work on Smoothing
Estimate a reference language model θ_ref and interpolate the maximum-likelihood estimate with it. The reference model has been estimated from:
– the whole collection (corpus) [Ponte & Croft 98]
– document clusters [Liu & Croft 04]
– nearest-neighbor documents [Kurland & Lee 04]
The shared interpolation form is sketched below.
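All of these methods share the interpolation form below, where λ is the interpolation weight and θ_ref is the collection, cluster, or neighborhood model depending on the method:

p(w \mid \theta_{d'}) \;=\; (1-\lambda)\, p_{ml}(w \mid d) \;+\; \lambda\, p(w \mid \theta_{\mathrm{ref}})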

Problems of Existing Methods
Smoothing with a global background ignores the collection structure.
Smoothing with document clusters ignores local structure inside each cluster.
Smoothing with neighbor documents ignores the global structure.
Different heuristics are used for θ_ref and for the interpolation: there is no clear objective function to optimize, and no guidance on how to further improve the existing methods.

Research Questions
What is the right corpus structure to use?
What are the criteria for a good smoothing method? An accurate language model?
What are we actually ending up optimizing?
Could there be a general optimization framework?

Our Contribution
Formulation of smoothing as optimization over graph structures.
A general optimization framework for smoothing both document LMs and query LMs.
Novel instantiations of the framework that lead to more effective smoothing methods.

A Graph-based Formulation of Smoothing
A novel and general view of smoothing: the collection is a graph of documents, and for each word w the values p(w|d) over the documents (e.g., p(w|d_1), p(w|d_2), ...) form a surface on top of the graph, with the graph itself as the projection on a plane. The MLE values give a rugged surface; smoothing the LM means smoothing this surface. Smoothed LM = smoothed surface.

Covering Existing Models
Existing methods are special cases of this view (collection = graph, smoothed LM = smoothed surfaces):
Smoothing with the global background corresponds to a star graph centered on the background node.
Smoothing with document clusters corresponds to a forest whose roots are pseudo-documents (cluster nodes C1, C2, C3, C4).
Smoothing with nearest neighbors corresponds to a local kNN graph.

Instantiations of the Formulation
Types of graphs versus the language models to be smoothed:

Type of graph                 | Document LM                                                                              | Query LM
Star graph w/ background node | [Ponte & Croft 98], [Hiemstra & Kraaij 98], [Miller et al. 99], [Zhai & Lafferty 01], ... | N/A
Forest w/ cluster roots       | [Liu and Croft 04]                                                                       | N/A
Local kNN graph               | [Kurland and Lee 04], [Tao et al. 06]                                                    | N/A
Document similarity graph     | Novel                                                                                    | N/A
Word similarity graph         | Novel                                                                                    | Novel
Other graphs                  | ???                                                                                      | ???

The first four rows are document graphs; the word similarity graph is a word graph.

Smoothing over Word Graphs
Alternatively, build a similarity graph of words. Given a document d, the values {p(w|d)} form a surface over the word graph, with node values such as p(w_u|d) and p(w_v|d) and degree-normalized values p(w_u|d)/Deg(u). As before, smoothed LM = smoothed surface.

The General Objective of Smoothing
The objective combines fidelity to the MLE with smoothness of the surface. Vertices carry importance weights, and edges carry weights (e.g., the inverse of a distance).

The Optimization Framework
Criteria:
– Fidelity: stay close to the MLE.
– Surface smoothness: local and global consistency.
– Constraint on the smoothed values.
Unified optimization objective: fidelity to the MLE plus smoothness of the surface; a sketch of this objective is given below.
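A hedged reconstruction of the objective (the exact weighting and normalization in the paper may differ; this is the generic graph-regularization form in the spirit of Zhu et al. 03 and Zhou et al. 04, with the degree normalization suggested by the word-graph slide):

O(\{f_u\}) \;=\; (1-\lambda)\sum_{u \in V} w(u)\,\bigl(f_u - \hat{f}_u\bigr)^2 \;+\; \lambda \sum_{(u,v)\in E} w(u,v)\left(\frac{f_u}{\mathrm{Deg}(u)} - \frac{f_v}{\mathrm{Deg}(v)}\right)^2

Here \hat{f}_u is the maximum-likelihood value at vertex u (e.g., p_{ml}(w|d_u)), w(u) is the importance weight of vertex u, w(u,v) is the edge weight (e.g., similarity or inverse distance), and λ trades fidelity against smoothness. The constraint is assumed to keep each smoothed language model a valid probability distribution (summing to one).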

The Procedure of Smoothing
1. Define the graph: construct a document or word graph.
2. Define the surfaces: choose reasonable vertex weights w(u), edge weights w(u,v), and surface values f_u.
3. Smooth the surfaces: iterative updating until convergence.
4. Apply additional Dirichlet smoothing to the result.

Smoothing Language Models Using a Document Graph
Construct a kNN graph of documents, with vertex weights w(u) = Deg(u) and edge weights w(u,v) = cosine similarity.
Surface values: f_u = p(w|d_u) to smooth the document language model, or alternatively f_u = s(q, d_u) to smooth the document relevance score, e.g., as in (Diaz 05).
Apply additional Dirichlet smoothing afterwards. A code sketch of this instantiation follows below.
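A minimal sketch in Python of the document-graph instantiation, assuming simple choices throughout: the helper names, the propagation-style update, and the parameter defaults (k, lam, n_iter, mu) are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def knn_cosine_graph(doc_term, k=10):
    """Symmetric kNN graph over documents with cosine-similarity edge weights.
    doc_term: (n_docs x n_terms) term-count matrix."""
    unit = doc_term / (np.linalg.norm(doc_term, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    W = np.zeros_like(sim)
    for u in range(sim.shape[0]):
        nbrs = np.argsort(sim[u])[-k:]          # k most similar documents
        W[u, nbrs] = sim[u, nbrs]
    return np.maximum(W, W.T)                   # symmetrize

def smooth_surfaces(W, F_hat, lam=0.5, n_iter=20):
    """Iteratively pull each surface (one column per word) toward the
    degree-normalized neighbor average while staying close to the MLE F_hat."""
    deg = W.sum(axis=1, keepdims=True) + 1e-12
    F = F_hat.copy()
    for _ in range(n_iter):
        F = (1.0 - lam) * F_hat + lam * (W @ F) / deg
    return F

def smooth_doc_lms(doc_term, k=10, lam=0.5, mu=1000.0):
    """Graph smoothing of every document LM, followed by a Dirichlet step."""
    counts = doc_term.astype(float)
    doc_len = counts.sum(axis=1, keepdims=True)
    p_ml = counts / (doc_len + 1e-12)                      # MLE p(w|d)
    W = knn_cosine_graph(counts, k)
    P = smooth_surfaces(W, p_ml, lam)
    P /= P.sum(axis=1, keepdims=True)                      # renormalize rows
    p_coll = counts.sum(axis=0) / counts.sum()             # collection LM p(w|C)
    return (doc_len * P + mu * p_coll) / (doc_len + mu)    # additional Dirichlet smoothing

Here lam plays the role of the fidelity/smoothness trade-off in the objective, and the final line is the additional Dirichlet step mentioned on the slide.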

Smoothing Language Models Using a Word Graph
Construct a kNN graph of words, with vertex weights w(u) = Deg(u) and edge weights w(u,v) given by pointwise mutual information (PMI).
Surface values f_u: the document language model p(w_u|d), or the query language model p(w_u|q).
Apply additional Dirichlet smoothing afterwards.
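PMI is a standard word-association weight; one common instantiation is shown below (which co-occurrence statistics, e.g., document-level or window-level counts, are used to estimate the probabilities is an assumption here):

\mathrm{PMI}(w_u, w_v) \;=\; \log \frac{p(w_u, w_v)}{p(w_u)\,p(w_v)}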

Intuitive Interpretation: Smoothing Using a Word Graph
The smoothed model corresponds to the stationary distribution of a Markov chain over words: writing a document is viewed as a random walk on the word Markov chain, writing down the word w whenever the walk passes through w.
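If P is the row-stochastic transition matrix obtained by normalizing the word-graph edge weights, the stationary distribution π is characterized by the standard fixed-point condition below (how this chain is combined with the original document LM is a detail of the framework):

\pi^{\top} P = \pi^{\top}, \qquad \sum_u \pi_u = 1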

Intuitive Interpretation: Smoothing Using a Document Graph
The smoothed probability corresponds to an absorption probability: writing a word w in a document is viewed as a random walk on the document Markov chain with absorbing states "1" (write w) and "0" (do not write w), and the smoothed p(w|d) is the probability of being absorbed in the "1" state. Intuitively, a document acts as its neighbors do.

Experiments
Data sets (the slide also listed average document length, number of queries, and number of relevant documents): AP (…k documents), LA (132k documents), SJMN (90k documents), TREC8 (528k documents). The setup follows Liu and Croft '04 and Tao '06.
Methods evaluated, using MAP:
– DMDG: smooth the document LM on the document graph.
– DMWG: smooth the document LM on the word graph.
– DSDG: smooth the relevance score on the document graph.
– QMWG: smooth the query LM on the word graph.

Effectiveness of the Framework
Improvement in MAP over the Dirichlet baseline:

Data set | DMDG      | DMWG†     | DSDG      | QMWG
AP       | +17.1%*** | +16.1%*** | +10.1%*** | +10.1%
LA       | +4.5%**   | +4.5%**   | +1.6%**   | (value missing)
SJMN     | +13.2%*** | +12.3%*** | +10.3%*** | +7.4%
TREC8    | +5.4%***  | +5.4%**   | +1.6%     | +1.2%

† DMWG reranks the top 3000 results, which usually yields lower performance than ranking all documents.
Wilcoxon test: *, **, *** denote significance levels 0.1, 0.05, 0.01.
Graph-based smoothing >> baseline; smoothing the document LM >> smoothing the relevance score >> smoothing the query LM.

Comparison with Existing Models
Compared methods: CBDM (Liu and Croft), DELM (Tao et al.), and DMDG restricted to one iteration, on AP, LA, SJMN, and TREC8 (one TREC8 entry is N/A).
Graph-based smoothing > state of the art; running more iterations > a single iteration, which behaves similarly to DELM.

Combined with Pseudo-Relevance Feedback
Pipeline: for a query q, retrieve the top documents, smooth on the word graph, and rerank.
Comparisons: FB vs. FB+QMWG, and DMWG vs. FB vs. FB+DMWG, on AP, LA, SJMN, and TREC8; the combined run FB+DMWG gives significant improvements (Wilcoxon: ** on AP, LA, and SJMN, *** on TREC8).

Related Work
Language modeling in information retrieval and smoothing with the collection model: (Ponte & Croft 98), (Hiemstra & Kraaij 98), (Miller et al. 99), (Zhai & Lafferty 01), etc.
Smoothing using corpus structures: cluster structure (Liu & Croft 04), etc.; nearest neighbors (Kurland & Lee 04), (Tao et al. 06).
Relevance score propagation: (Diaz 05), (Qin et al. 05).
Graph-based learning: (Zhu et al. 03), (Zhou et al. 04), etc.

Conclusions
Smoothing language models using document and word graphs, within a general optimization framework that admits various effective instantiations and improves performance over the state of the art.
Future work: combine document graphs with word graphs; study alternative ways of constructing the graphs.

Thanks!

Parameter Tuning
Fast convergence of the iterative updating.