
Slide 1: Risk Minimization and Language Modeling in Text Retrieval. ChengXiang Zhai. Thesis Committee: John Lafferty (Chair), Jamie Callan, Jaime Carbonell, David A. Evans, W. Bruce Croft (Univ. of Massachusetts, Amherst).

Slide 2: Information Overflow. [Chart: web site growth.]

Slide 3: Text Retrieval (TR). [Diagram: a user issues the query “Tips on thesis defense” to a retrieval system, which returns relevant docs from a database/collection of text docs.]

Slide 4: Challenges in TR. Relevance (treated as independent and topical); ad hoc parameter tuning; utility.

Slide 5: Sophisticated Parameter Tuning in the Okapi System (Robertson et al. 1999). “k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite).”
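
The quoted parameters plug into the BM25 ranking function; a minimal sketch of how k1, b, and k3 interact (the function name, argument layout, and the simple RSJ-style idf below are illustrative, not from the talk):

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75, k3=7.0):
    """Score one document against one query with Okapi BM25.

    query_tf / doc_tf map term -> frequency in the query / document;
    df maps term -> document frequency in a collection of n_docs docs.
    Defaults follow the quote above (k1=1.2, b=0.75, k3=7).
    """
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)  # length-normalized k1
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        # Robertson-Sparck Jones style idf weight
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        doc_part = (k1 + 1) * tf / (K + tf)
        query_part = (k3 + 1) * qtf / (k3 + qtf)  # matters only for long queries
        score += idf * doc_part * query_part
    return score
```

Note how b only enters through K: larger b penalizes long documents more, which is why smaller b is "sometimes advantageous" on collections with meaningful length variation.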

Slide 6: More Than “Relevance”. [Diagram contrasts a pure relevance ranking with the desired ranking, which also accounts for redundancy and readability.]

Slide 7: Meeting the Challenges. Bayesian decision theory and statistical language models combine into the risk minimization framework, which supports utility-based retrieval and parameter estimation.

Slide 8: Map of Thesis. New TR framework: Risk Minimization Framework. New TR models: Two-stage Language Model, KL-divergence Retrieval Model, Aspect Retrieval Model. Features: automatic parameter setting, natural incorporation of feedback, non-traditional ranking.

Slide 9: Retrieval as Decision-Making. Given a query: which documents should be selected (D), and how should these docs be presented to the user (π)? Choose (D, π). [Diagram: candidate presentations include a ranked list, an unordered subset, and clustering.]

Slide 10: Generative Model of Document & Query. [Diagram: the user U and source S are partially observed; the query q and document d they generate are observed; the underlying models are inferred.]

Slide 11: Bayesian Decision Theory. Observed: query q, user U, doc set C, source S. Hidden: the query model θ_Q and document models θ_1, ..., θ_N. Each choice (D_i, π_i) incurs a loss L; risk minimization picks the choice (D, π) with minimum Bayes risk.
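
The Bayes risk on this slide can be written out; a sketch in standard notation, where θ collects the hidden query and document models (this reconstruction follows the framework's usual form rather than the exact slide):

```latex
R(D, \pi \mid q, U, C, S) = \int_{\Theta} L(D, \pi, \theta)\, p(\theta \mid q, U, C, S)\, d\theta,
\qquad
(D^*, \pi^*) = \arg\min_{D,\pi} R(D, \pi \mid q, U, C, S)
```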

Slide 12: Special Cases.
- Set-based models (choose D): Boolean model.
- Ranking models (choose π):
  - Independent loss (⇒ PRP): relevance-based loss (probabilistic relevance model, two-stage LM, KL-divergence model); distance-based loss (vector-space model).
  - Dependent loss: MMR loss, MDR loss (aspect retrieval model).

Slide 13: Map of Existing TR Models.
- Relevance as similarity Δ(R(q), R(d)): different representations & similarity measures — vector space model (Salton et al., 75), prob. distr. model (Wong & Yao, 89), ...
- Relevance as probability of relevance P(r=1|q,d), r ∈ {0,1}:
  - Generative models, P(d→q) or P(q→d): doc generation — classical prob. model (Robertson & Sparck Jones, 76); query generation — LM approach (Ponte & Croft, 98; Lafferty & Zhai, 01a).
  - Regression model (Fox, 83).
  - Probabilistic inference: prob. concept space model (Wong & Yao, 95); different inference systems — inference network model (Turtle & Croft, 91).

Slide 14: Where Are We? Roadmap: Risk Minimization Framework; Two-stage Language Model; KL-divergence Retrieval Model; Aspect Retrieval Model.

Slide 15: Two-stage Language Models. The loss function yields a risk ranking formula with two stages: Stage 1 computes the document model (Dirichlet prior smoothing); Stage 2 computes the query noise model (mixture model). Together: two-stage smoothing.

Slide 16: The Need for Query Modeling (Dual Role of Smoothing). [Plots contrast smoothing behavior on verbose queries vs. keyword queries.]

Slide 17: Interaction of the Two Roles of Smoothing.

Slide 18: Two-stage Smoothing.
Stage 1 (explains unseen words; Dirichlet prior, Bayesian):
  p_μ(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ)
Stage 2 (explains noise in the query; two-component mixture):
  p(w|d) = (1−λ)·p_μ(w|d) + λ·p(w|U)
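
The two stages can be sketched in a few lines; mu=2000 and lam=0.5 below are illustrative defaults, not values from the talk:

```python
from collections import Counter

def two_stage_prob(word, doc_tokens, coll_model, user_model, mu=2000.0, lam=0.5):
    """Two-stage smoothed probability p(w|d).

    Stage 1: Dirichlet-prior smoothing of the document model with the
    collection model p(w|C) and pseudo-count mu.
    Stage 2: mix the smoothed document model with a user/query background
    model p(w|U) using weight lam.
    """
    counts = Counter(doc_tokens)
    d_len = len(doc_tokens)
    stage1 = (counts[word] + mu * coll_model.get(word, 0.0)) / (d_len + mu)
    return (1 - lam) * stage1 + lam * user_model.get(word, 0.0)
```

With mu=0 and lam=0 this degenerates to the unsmoothed maximum-likelihood document model, which is the easiest way to sanity-check an implementation.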

Slide 19: Estimating μ Using Leave-One-Out. Delete each word occurrence w_i from d in turn and compute P(w_i | d − w_i); maximize the resulting leave-one-out log-likelihood over μ (maximum likelihood estimator, solved with Newton's method).
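
A sketch of the leave-one-out objective; the talk optimizes it with Newton's method, while a simple grid search stands in for the optimizer here (the grid values are illustrative):

```python
import math
from collections import Counter

def loo_log_likelihood(mu, docs, coll_model):
    """Leave-one-out log-likelihood of mu for Dirichlet smoothing.

    Each occurrence of w in d is scored by the document model built from
    d minus that one occurrence: p(w | d - w).
    docs: list of token lists; coll_model: dict word -> p(w|C).
    """
    ll = 0.0
    for doc in docs:
        counts = Counter(doc)
        n = len(doc)
        for w, c in counts.items():
            p = (c - 1 + mu * coll_model[w]) / (n - 1 + mu)
            ll += c * math.log(p)
    return ll

def estimate_mu(docs, coll_model, grid=(10, 50, 100, 500, 1000, 2000, 5000)):
    # Stand-in for Newton's method: pick the grid point maximizing the LOO
    # log-likelihood.
    return max(grid, key=lambda mu: loo_log_likelihood(mu, docs, coll_model))
```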

Slide 20: Estimating λ Using a Mixture Model. Stage 1 supplies smoothed document models p(w|d_1), ..., p(w|d_N); Stage 2 treats the query as drawn from the mixture (1−λ)·p(w|d_i) + λ·p(w|U) and fits λ by maximum likelihood with the Expectation-Maximization (EM) algorithm.
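
A simplified EM sketch of this estimator (one shared λ across the N document models; the 1e-10 floors and iteration count are implementation conveniences, not from the talk):

```python
def estimate_lambda(query_tokens, doc_models, user_model, n_iter=50):
    """EM estimate of the query-noise weight lambda.

    Each query word is assumed drawn from (1-lam)*p(w|d_i) + lam*p(w|U);
    doc_models: list of dicts word -> p(w|d_i) (already Stage-1 smoothed).
    """
    lam = 0.5
    for _ in range(n_iter):
        # E-step: expected fraction of query words explained by p(w|U)
        total, z_sum = 0, 0.0
        for d in doc_models:
            for w in query_tokens:
                noise = lam * user_model.get(w, 1e-10)
                z = noise / (noise + (1 - lam) * d.get(w, 1e-10))
                z_sum += z
                total += 1
        # M-step: lambda is the average noise responsibility
        lam = z_sum / total
    return lam
```

Intuitively, query words well explained by the documents drive λ down; words only the background explains drive it up.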

Slide 21: Automatic two-stage results are comparable to optimal one-stage results. [Table: average precision over 3 databases and 4 query types, 150 topics.]

Slide 22: Where Are We? Roadmap: Risk Minimization Framework; Two-stage Language Model; KL-divergence Retrieval Model; Aspect Retrieval Model.

Slide 23: KL-divergence Retrieval Models. The loss function yields a risk ranking formula that ranks documents by the KL divergence D(θ_Q || θ_D) between the query model and the document model.
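
Ranking by −D(θ_Q || θ_D) reduces, after dropping document-independent constants, to a cross-entropy sum; a minimal scoring sketch:

```python
import math

def kl_score(query_model, doc_model, eps=1e-12):
    """Rank score under the KL-divergence retrieval model.

    Documents are ranked by -D(theta_Q || theta_D); the query-entropy term
    is constant per query, leaving sum_w p(w|theta_Q) * log p(w|theta_D).
    doc_model should already be smoothed so p(w|theta_D) > 0 for query words;
    eps is a safety floor, not part of the model.
    """
    return sum(p_q * math.log(doc_model.get(w, eps))
               for w, p_q in query_model.items())
```

With a query model that just puts mass 1/|q| on each query word, this recovers query-likelihood scoring, which is what makes the KL model a strict generalization.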

Slide 24: Expansion-based vs. Model-based Feedback. Expansion-based feedback (query likelihood): the feedback docs modify the query Q itself, and the modified query is scored against document D. Model-based feedback (KL-divergence): the feedback docs update the query model, which is then scored against the doc model.

Slide 25: Feedback as Model Interpolation. Estimate a feedback model θ_F from the feedback docs F = {d_1, d_2, ..., d_n} (via a generative model or divergence minimization), then interpolate it with the query model; α = 0 gives no feedback, α = 1 gives full feedback.
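
The interpolation itself is a one-liner over the union vocabulary; a sketch (α = 0.5 is just a default here):

```python
def interpolate(query_model, feedback_model, alpha=0.5):
    """New query model: (1-alpha)*theta_Q + alpha*theta_F.

    alpha=0 reproduces the original query model (no feedback);
    alpha=1 replaces it with the feedback model (full feedback).
    """
    words = set(query_model) | set(feedback_model)
    return {w: (1 - alpha) * query_model.get(w, 0.0)
               + alpha * feedback_model.get(w, 0.0)
            for w in words}
```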

Slide 26: θ_F Estimation Method I: Generative Mixture Model. Each word in F = {d_1, ..., d_n} is generated either by the topic model P(w|θ_F) (topic words, with probability 1−λ) or by the background P(w|C) (background words, with probability λ); θ_F is fit by maximum likelihood.
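
A sketch of the EM fit for this two-component mixture (λ held fixed, θ_F estimated; pooling all feedback tokens and the 30-iteration cap are simplifications):

```python
from collections import Counter

def estimate_feedback_model(fb_tokens, coll_model, lam=0.5, n_iter=30):
    """EM for the feedback mixture: words come from p(w|theta_F) with
    probability (1-lam) and from the background p(w|C) with probability lam.

    fb_tokens: all tokens of the feedback docs pooled together;
    coll_model: dict word -> p(w|C). Returns the ML estimate of theta_F.
    """
    counts = Counter(fb_tokens)
    theta = {w: c / len(fb_tokens) for w, c in counts.items()}  # init
    for _ in range(n_iter):
        # E-step: probability each occurrence of w is a topic word
        z = {}
        for w in counts:
            topic = (1 - lam) * theta[w]
            z[w] = topic / (topic + lam * coll_model.get(w, 1e-10))
        # M-step: re-estimate theta_F from expected topic counts
        norm = sum(counts[w] * z[w] for w in counts)
        theta = {w: counts[w] * z[w] / norm for w in counts}
    return theta
```

The effect matches the slide's picture: common words are absorbed by the background component, so θ_F concentrates on the discriminative topic words.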

Slide 27: θ_F Estimation Method II: Empirical Divergence Minimization. Choose θ_F to minimize the empirical divergence: close to the feedback documents d_1, ..., d_n, yet far from the background model C.

Slide 28: Example of Feedback Query Model. TREC topic 412, “airport security”; mixture model approach on a Web database using the top 10 docs, shown for λ = 0.9 and λ = 0.7.

Slide 29: Model-based Feedback vs. Simple LM. [Results table.]

Slide 30: Where Are We? Roadmap: Risk Minimization Framework; Two-stage Language Model; KL-divergence Retrieval Model; Aspect Retrieval Model.

Slide 31: Aspect Retrieval. Query: “What are the applications of robotics in the world today?” Find as many DIFFERENT applications as possible. Example aspects: A1 spot-welding robotics; A2 controlling inventory; A3 pipe-laying robots; A4 talking robot; A5 robots for loading & unloading memory tapes; A6 robot [telephone] operators; A7 robot cranes; ... Aspect judgments form a binary doc-by-aspect matrix:
      A1 A2 A3 ... Ak
  d1   1  1  0 ...  0
  d2   0  1  1 ...  0
  d3   0  0  0 ...  1
  ...
  dk   1  0  1 ...  1

Slide 32: Evaluation Measures.
- Aspect Coverage (AC): measures per-doc coverage, #distinct-aspects/#docs. Maximizing it is equivalent to the “set cover” problem, NP-hard.
- Aspect Uniqueness (AU): measures redundancy, #distinct-aspects/#aspects. Equivalent to the “volume cover” problem, NP-hard.
Example (accumulated counts over a ranking d1, d2, d3, ...):
  #docs:        1     2     3
  #aspects:     2     5     8
  #unique-asp:  2     4     5
  AC:  2/1 = 2.0   4/2 = 2.0   5/3 = 1.67
  AU:  2/2 = 1.0   4/5 = 0.8   5/8 = 0.625
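
Both measures fall out of one pass over the ranking; a sketch that reproduces the worked example above:

```python
def aspect_coverage_uniqueness(ranked_aspects):
    """Accumulated Aspect Coverage (AC) and Aspect Uniqueness (AU).

    ranked_aspects: for each ranked doc, the set of aspects it covers.
    After k docs: AC = #distinct aspects / k (per-doc coverage),
                  AU = #distinct aspects / total aspect occurrences (redundancy).
    Returns parallel lists of AC and AU values at each rank.
    """
    seen, total = set(), 0
    ac, au = [], []
    for k, aspects in enumerate(ranked_aspects, start=1):
        seen |= set(aspects)
        total += len(aspects)
        ac.append(len(seen) / k)
        au.append(len(seen) / total)
    return ac, au
```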

Slide 33: Loss Function L(θ_{k+1} | θ_1 ... θ_k). The models θ_1 ... θ_k of the already-ranked docs d_1 ... d_k are known; the loss scores the next candidate d_{k+1}. Maximal Marginal Relevance (MMR): combines relevance Rel(θ_{k+1}) with novelty/redundancy Nov(θ_{k+1} | θ_1 ... θ_k); the best d_{k+1} is novel and relevant. Maximal Diverse Relevance (MDR): uses an aspect coverage distribution p(a|θ_i); the best d_{k+1} is complementary in coverage.

Slide 34: Maximal Marginal Relevance (MMR) Models. Maximize aspect coverage indirectly through redundancy elimination. Elements: a redundancy/novelty measure and a combination of novelty and relevance. Proposed & studied six novelty measures and four combination strategies.
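
The greedy reranking loop shared by all such combinations can be sketched generically (a linear combination with weight beta stands in here for the talk's specific strategies, which this code does not reproduce):

```python
def mmr_rerank(candidates, rel, nov, beta=0.5, k=10):
    """Greedy MMR-style reranking.

    rel(d): relevance score of d; nov(d, selected): novelty of d given
    the documents already selected; beta trades off the two.
    At each step, pick the candidate maximizing the combined score.
    """
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda d: beta * rel(d)
                                       + (1 - beta) * nov(d, selected))
        selected.append(best)
        pool.remove(best)
    return selected
```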

Slide 35: Comparison of Novelty Measures (Aspect Coverage). [Results plot.]

Slide 36: Comparison of Novelty Measures (Aspect Uniqueness). [Results plot.]

Slide 37: A Mixture Model for Redundancy. Model the candidate document as a mixture of P(w|Old), the reference document, with weight λ, and P(w|Background), the collection, with weight 1−λ; the maximum-likelihood λ, found by Expectation-Maximization, measures redundancy.

Slide 38: Cost-based Combination of Relevance and Novelty. [Formula combining the relevance score with the novelty score; not captured in the transcript.]

Slide 39: Maximal Diverse Relevance (MDR) Models. Maximize aspect coverage directly through aspect modeling. Elements: an aspect loss function and a generative aspect model. Proposed & studied a KL-divergence aspect loss function; explored two aspect models (PLSI, LDA).

Slide 40: Aspect Generative Model of Document & Query. The user U generates the query q and the source S generates the document d through aspect mixing weights λ = (λ_1, ..., λ_k); PLSI treats the weights as free parameters, while LDA places a Dirichlet prior on them.

Slide 41: Aspect Loss Function. [Derivation over U, q, S, d; formula not captured in the transcript.]

Slide 42: Aspect Loss Function: Illustration. Compare the desired coverage p(a|θ_Q) with the combined coverage of the “already covered” distributions p(a|θ_1), ..., p(a|θ_{k−1}) plus a new candidate p(a|θ_k); candidates can be non-relevant, redundant, or a perfect complement.
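
The illustrated idea can be sketched as a collective KL-divergence loss (a simplified stand-in for the thesis's exact formula: the combined coverage here is a plain average of the covered models plus the candidate):

```python
import math

def aspect_loss(desired, covered_models, candidate, eps=1e-12):
    """KL(desired || combined coverage) after adding the candidate.

    desired: p(a|theta_Q); covered_models: list of p(a|theta_i) for docs
    already ranked; candidate: p(a|theta_k). Lower loss means the candidate
    complements the existing coverage better; eps guards log(0).
    """
    models = covered_models + [candidate]
    combined = {a: sum(m.get(a, 0.0) for m in models) / len(models)
                for a in desired}
    return sum(p * math.log(p / max(combined[a], eps))
               for a, p in desired.items() if p > 0)
```

A redundant candidate leaves some desired aspects uncovered and so scores a higher loss than a complementary one, which is exactly the non-relevant / redundant / perfect distinction on the slide.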

Slide 43: Preliminary Evaluation: MMR vs. MDR. On the relevant data set, both MMR and MDR are effective, but they complement each other: MMR improves AU more than AC, while MDR improves AC more than AU. On the mixed data set, however, MMR is effective only when relevance ranking is accurate, whereas MDR improves AC even though relevance ranking is degraded.

Slide 44: Further Work Is Needed. Controlled experiments with synthetic data (level of redundancy, density of relevant documents, per-document aspect counts); alternative loss functions; aspect language models, especially along the line of LDA, including aspect-based feedback.

Slide 45: Summary of Contributions.
New TR framework: Risk Minimization Framework — unifies existing models, incorporates LMs, serves as a map for exploring new models.
New TR models: Two-stage Language Model, KL-divergence Retrieval Model, Aspect Retrieval Model.
Specific contributions:
- Empirical study of smoothing (dual role of smoothing); new smoothing method (two-stage smoothing); automatic parameter setting (leave-one-out, mixture).
- Query/document distillation; feedback with LMs (mixture model & divergence minimization).
- Evaluation criteria (AC, AU); redundancy/novelty measures (mixture weight); MMR with LMs (cost combination); aspect-based loss function (“collective KL-div”).

Slide 46: Future Research Directions. Better approximation of the risk integral. More effective LMs for “traditional” retrieval: can we beat TF-IDF without increasing computational complexity? Automatic parameter setting, especially for feedback models; flexible passage retrieval, especially with HMMs; beyond unigrams (more linguistics).

Slide 47: More Future Research Directions. Aspect retrieval models: document structure/sub-topic modeling, aspect-based feedback. Interactive information retrieval models: risk minimization for information filtering, personalized & context-sensitive retrieval.

Slide 48: Thank you!

