Cumulative Progress in Language Models for Information Retrieval Antti Puurula 6/12/2013 Australasian Language Technology Workshop University of Waikato
Ad-hoc Information Retrieval
- Ad-hoc Information Retrieval (IR) forms the basic task in IR: given a query, retrieve and rank documents in a collection
- Origins: Cranfield 1 ( ), Cranfield 2 ( ), SMART ( )
- Major evaluations: TREC Ad-hoc ( ), TREC Robust ( ), CLEF ( ), INEX ( ), NTCIR ( ), FIRE ( )
Illusionary Progress in Ad-hoc IR
- TREC ad-hoc evaluations stopped in 1999, as progress plateaued
- More diverse tasks became the foci of research
- “There is little evidence of improvement in ad-hoc retrieval technology over the past decade” (Armstrong et al. 2009)
- Weak baselines, non-cumulative improvements
⟶ “no way of using LSI achieves a worthwhile improvement in retrieval accuracy over BM25” (Atreya & Elkan, 2010)
⟶ “there remains very little room for improvement in ad hoc search” (Trotman & Keeler, 2011)
Progress in Language Models for IR?
- Language Models (LM) form one of the main approaches to IR
- Many improvements to LMs have not been adopted generally or evaluated systematically:
  - TF-IDF feature weighting
  - Pitman-Yor Process smoothing
  - Feedback models
- Are these improvements consistent across standard datasets, are they cumulative, and do they improve on a strong baseline?
Query Likelihood Language Models
Query Likelihood Language Models 2
Pitman-Yor Process Smoothing
- The standard smoothing methods for IR LMs are Dirichlet Prior (DP) and 2-Stage Smoothing (2SS) (Zhai & Lafferty 2004, Smucker & Allan 2007)
- A recently suggested improvement is Pitman-Yor Process smoothing (PYP), an approximation to inference on a Pitman-Yor Process (Momtazi & Klakow 2010, Huang & Renals 2010)
- All methods interpolate unsmoothed parameters with a background distribution; PYP additionally discounts the unsmoothed counts
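The difference between interpolation alone and interpolation with discounting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presentation's formulation: `dirichlet_prob` uses the textbook Dirichlet Prior form, while `discounted_prob` is a simplified stand-in for PYP that subtracts a fixed, hypothetical discount `delta` from every seen count and hands the removed mass to the background model. The function names, the toy collection, and the values of `mu` and `delta` are all made up for the example.

```python
from collections import Counter

def dirichlet_prob(word, doc_counts, background, mu=1000.0):
    """Dirichlet Prior smoothing: document counts interpolated with the background."""
    doc_len = sum(doc_counts.values())
    return (doc_counts.get(word, 0) + mu * background[word]) / (doc_len + mu)

def discounted_prob(word, doc_counts, background, mu=1000.0, delta=0.5):
    """Simplified discounting (assumption): subtract delta from each seen count
    and move the removed mass, delta per distinct word, to the background."""
    doc_len = sum(doc_counts.values())
    n_distinct = len(doc_counts)
    discounted = max(doc_counts.get(word, 0) - delta, 0.0)
    return (discounted + (mu + delta * n_distinct) * background[word]) / (doc_len + mu)

# Toy collection and document (hypothetical data).
collection = "the cat sat on the mat the dog sat".split()
background = {w: c / len(collection) for w, c in Counter(collection).items()}
doc = Counter("the cat sat".split())

# Both estimators should define proper distributions over the vocabulary.
dp_total = sum(dirichlet_prob(w, doc, background) for w in background)
pyp_total = sum(discounted_prob(w, doc, background) for w in background)
```

In this sketch both totals come out to 1, and a word absent from the document (such as `"dog"`) receives slightly more probability under the discounted variant, because the mass taken from seen words is redistributed through the background distribution.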
Pitman-Yor Process Smoothing 2 All methods share the form: DP: 2SS: PYP:, and
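The slide's own equations did not survive extraction. As a hedged reminder of what a shared interpolation form typically looks like, the block below gives the textbook versions of DP and 2SS. The notation is an assumption: $c(w,d)$ is the count of word $w$ in document $d$, $|d|$ the document length, $p(w \mid C)$ the collection model, and $\mu$, $\lambda$ the smoothing parameters. The 2SS weight assumes the same collection model is used in both stages. The PYP parameterisation is not reconstructed here.

```latex
\begin{align*}
p(w \mid d) &= (1 - \alpha_d)\, p_u(w \mid d) + \alpha_d\, p(w \mid C),
  \qquad p_u(w \mid d) = \frac{c(w,d)}{|d|} \\
\text{DP:}\quad \alpha_d &= \frac{\mu}{|d| + \mu} \\
\text{2SS:}\quad \alpha_d &= \frac{\mu + \lambda\,|d|}{|d| + \mu}
\end{align*}
```

Setting $\lambda = 0$ in the 2SS weight recovers the DP weight, which is one way to check the two lines against each other.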
Pitman-Yor Process Smoothing 3
TF-IDF Feature Weighting
- Multinomial modelling assumptions of text can be corrected with TF-IDF weighting (Rennie et al. 2003, Frank & Bouckaert 2006)
- Traditional view: IDF-weighting unnecessary with IR LMs (Zhai & Lafferty 2004)
- Recent view: combination is complementary (Smucker & Allan 2007, Momtazi et al. 2010)
TF-IDF Feature Weighting 2
TF-IDF Feature Weighting 3
- IDF has a function that overlaps with collection smoothing (Hiemstra & Kraaij 1998)
- The interaction is taken into account by replacing the collection model with a uniform model in smoothing
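A minimal sketch of the idea, assuming one common choice of transforms: counts are reweighted with a log-damped term frequency times `log(N / df)`, and the weighted counts are then smoothed against a uniform background `1 / |V|` instead of the collection model. The exact transforms used in the presentation are not preserved, so `tfidf_counts`, `uniform_smoothed_prob`, the toy documents, and the value of `mu` are hypothetical.

```python
import math
from collections import Counter

def tfidf_counts(doc_counts, doc_freq, n_docs):
    """Reweight raw counts: log-damped TF times IDF (one common choice, assumed here)."""
    return {w: math.log(1.0 + c) * math.log(n_docs / doc_freq[w])
            for w, c in doc_counts.items()}

def uniform_smoothed_prob(word, weights, vocab_size, mu=1.0):
    """Dirichlet-style smoothing of weighted counts against a uniform background."""
    total = sum(weights.values())
    return (weights.get(word, 0.0) + mu / vocab_size) / (total + mu)

# Toy collection (hypothetical data).
docs = ["the cat sat", "the dog sat", "the bird flew"]
doc_freq = Counter(w for d in docs for w in set(d.split()))
vocab = sorted(doc_freq)

weights = tfidf_counts(Counter(docs[0].split()), doc_freq, len(docs))
probs = {w: uniform_smoothed_prob(w, weights, len(vocab)) for w in vocab}
prob_total = sum(probs.values())
```

Here `"the"` occurs in every document, so its IDF is zero and it ends up with the same probability as an unseen word, while the rarer `"cat"` outweighs `"sat"`. With a uniform background, down-weighting of common words comes from IDF alone rather than from both IDF and the collection model.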
Model-based Feedback
Model-based Feedback 2
Model-based Feedback 3
Experimental Setup
Experimental Setup 2
Results
- Significant differences:
  - PYP > DP
  - PYP+TI > 2SS
  - PYP+TI+FB > PYP+TI
- PYP+TI+FB improves on 2SS by 4.07 absolute, a 17.1% relative improvement
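The two reported figures together imply the approximate scores behind them, since relative gain equals absolute gain divided by the baseline score. The short calculation below is only back-of-the-envelope arithmetic on the numbers quoted above; the derived scores are implied values, subject to rounding in the reported 17.1%, and are not results taken from the presentation.

```python
absolute_gain = 4.07   # reported absolute improvement over 2SS
relative_gain = 0.171  # reported relative improvement over 2SS

# relative_gain = absolute_gain / baseline  =>  baseline = absolute_gain / relative_gain
implied_baseline = absolute_gain / relative_gain
implied_best = implied_baseline + absolute_gain

print(round(implied_baseline, 2), round(implied_best, 2))
```

Under these assumptions the 2SS baseline sits near 23.8 and PYP+TI+FB near 27.9 on the evaluation measure's scale.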
Discussion
- The 3 evaluated improvements in language models for IR:
  - require little additional computation
  - can be implemented with small modifications to existing IR systems
  - are substantial, significant and cumulative across 13 standard datasets, compared to DP and 2SS baselines (4.07 absolute, 17.1% relative)
- Improvements requiring more computation are possible: document neighbourhood smoothing, word correlation models, passage-based LMs, bigram LMs, …
- More extensive evaluations are needed to confirm progress