1 Risk Minimization and Language Modeling in Text Retrieval ChengXiang Zhai Thesis Committee: John Lafferty (Chair), Jamie Callan Jaime Carbonell David A. Evans W. Bruce Croft (Univ. of Massachusetts, Amherst)
2 Information Overflow Web Site Growth
3 Text Retrieval (TR) Retrieval System User “Tips on thesis defense” query relevant docs database/collection text docs
4 Challenges in TR: (independent, topical) Relevance vs. Utility; ad hoc parameter tuning
5 Sophisticated Parameter Tuning in the Okapi System (Robertson et al. 1999): "k1, b and k3 are parameters which depend on the nature of the queries and possibly on the database; k1 and b default to 1.2 and 0.75 respectively, but smaller values of b are sometimes advantageous; in long queries k3 is often set to 7 or 1000 (effectively infinite)."
6 More Than “Relevance” Relevance Ranking Desired Ranking Redundancy Readability
7 Meeting the Challenges Bayesian Decision Theory Statistical Language Models Risk Minimization Framework Utility-based Retrieval Parameter Estimation
8 Map of Thesis. New TR Framework: Risk Minimization Framework. New TR Models: Two-stage Language Model, KL-divergence Retrieval Model, Aspect Retrieval Model. Features: automatic parameter setting, natural incorporation of feedback, non-traditional ranking.
9 Retrieval as Decision-Making. Given a query: which documents should be selected (D)? how should these docs be presented to the user (π)? Choose (D, π). Presentation options: ranked list? unordered subset? clustering?
10 Generative Model of Document & Query. Observed: document d, query q. Partially observed: user U, source S. The underlying models are inferred.
11 Bayesian Decision Theory. Choices: (D_1, π_1), (D_2, π_2), …, (D_n, π_n). Observed: query q, user U, doc set C, source S. Hidden: models θ_q, θ_1, …, θ_N. Loss L. Compute the Bayes risk for each choice (D, π) → RISK MINIMIZATION.
12 Special Cases. Set-based models (choose D); Ranking models (choose π): independent loss (→ PRP) with relevance-based or distance-based loss, dependent loss with MMR or MDR loss. Instances: Boolean model, probabilistic relevance model, vector-space model, two-stage LM, KL-divergence model, aspect retrieval model.
13 Map of Existing TR Models. Relevance (R(q), R(d)). Similarity (different rep & similarity): vector space model (Salton et al., 1975), prob. distr. model (Wong & Yao, 1989). Probability of relevance P(r=1|q,d), r ∈ {0,1}: generative models via doc generation (classical prob. model, Robertson & Sparck Jones, 1976) or query generation (LM approach, Ponte & Croft, 1998; Lafferty & Zhai, 2001a); regression model (Fox, 1983). Probabilistic inference P(d→q) or P(q→d): prob. concept space model (Wong & Yao, 1995); different inference system: inference network model (Turtle & Croft, 1991).
14 Where Are We? Risk Minimization Framework Two-stage Language Model KL-divergence Retrieval Model Aspect Retrieval Model
15 Two-stage Language Models: from loss function to risk to a ranking formula. Stage 1: compute the document model (Dirichlet prior smoothing). Stage 2: compute the query model (mixture model). Together: two-stage smoothing.
16 The Need for Query Modeling (Dual Role of Smoothing): verbose queries vs. keyword queries.
17 Interaction of the Two Roles of Smoothing
18 Two-stage Smoothing:
P(w|d) = (1-λ) · [c(w,d) + μ·p(w|C)] / (|d| + μ) + λ·p(w|U)
Stage 1 (Dirichlet prior, Bayesian): explains unseen words in the document.
Stage 2 (two-component mixture): explains noise in the query.
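The two-stage smoothing formula above can be sketched in Python; the helper name and the default values of mu and lam below are illustrative, not from the slide:

```python
def two_stage_prob(w, doc_counts, doc_len, p_wC, p_wU, mu=2000.0, lam=0.1):
    """Two-stage smoothed document model (sketch).

    Stage 1: Dirichlet-prior smoothing with the collection model p(w|C)
             explains unseen words in the document.
    Stage 2: linear interpolation with a user background model p(w|U)
             explains noise (common words) in the query.
    """
    c_wd = doc_counts.get(w, 0)
    stage1 = (c_wd + mu * p_wC) / (doc_len + mu)   # Dirichlet prior smoothing
    return (1.0 - lam) * stage1 + lam * p_wU       # mixture with p(w|U)
```

Note that even a word absent from the document gets a nonzero probability through both the collection and user background models.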
19 Estimating μ using leave-one-out: remove each word occurrence in turn, compute P(w_1|d−w_1), P(w_2|d−w_2), …, P(w_n|d−w_n), sum the log-likelihood, and maximize over μ (maximum likelihood estimator, solved with Newton's method).
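A minimal sketch of the leave-one-out estimate; a coarse grid search stands in for the Newton iteration on the slide, and all names are illustrative:

```python
import math

def loo_loglik(mu, docs, p_C):
    """Leave-one-out log-likelihood of the Dirichlet prior mu.
    docs: list of word-count dicts; p_C: collection language model."""
    ll = 0.0
    for counts in docs:
        d_len = sum(counts.values())
        for w, c in counts.items():
            # Probability of w under the document with one occurrence of w held out.
            ll += c * math.log((c - 1 + mu * p_C[w]) / (d_len - 1 + mu))
    return ll

def estimate_mu(docs, p_C, grid=(1, 10, 100, 500, 1000, 2000, 5000)):
    """Pick the mu on the grid maximizing the leave-one-out likelihood."""
    return max(grid, key=lambda mu: loo_loglik(mu, docs, p_C))
```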
20 Estimating λ using a mixture model: each query word is generated from a two-component mixture of a stage-1 smoothed document model and the user background model, (1-λ)p(w|d_1)+λp(w|U), …, (1-λ)p(w|d_N)+λp(w|U); λ is fit by maximum likelihood using the Expectation-Maximization (EM) algorithm.
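The EM loop for the stage-2 mixture weight can be sketched as follows; the update shown is the standard EM fixed point for a two-component mixture, and all names are illustrative:

```python
def estimate_lambda(query, doc_models, p_U, iters=20, lam=0.5):
    """EM estimate of the stage-2 noise weight lambda (sketch).
    query: list of terms; doc_models: list of dicts p(w|d_i), already
    stage-1 smoothed; p_U: user/background model dict."""
    for _ in range(iters):
        post_sum, n = 0.0, 0
        for p_d in doc_models:
            for w in query:
                noise = lam * p_U[w]
                total = (1 - lam) * p_d[w] + noise
                post_sum += noise / total   # E-step: P(w came from the noise component)
                n += 1
        lam = post_sum / n                  # M-step: average noise posterior
    return lam
```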
21 Average precision (3 databases + 4 query types, 150 topics): automatic two-stage results vs. optimal one-stage results.
22 Where Are We? Risk Minimization Framework Two-stage Language Model KL-divergence Retrieval Model Aspect Retrieval Model
23 KL-divergence Retrieval Models: from loss function to risk to a ranking formula over the U, S, d, q generative model.
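For ranking purposes the KL-divergence criterion reduces to a cross-entropy sum, since the query-entropy term is constant across documents. A sketch, assuming the document model is already smoothed so that every query term has nonzero probability:

```python
import math

def kl_score(query_model, doc_model):
    """Rank-equivalent KL-divergence score: -D(theta_Q || theta_D) with the
    query entropy dropped, i.e. sum_w p(w|theta_Q) * log p(w|theta_D).
    Assumes doc_model assigns nonzero probability to every query term."""
    return sum(p_q * math.log(doc_model[w])
               for w, p_q in query_model.items() if p_q > 0)
```

When the query model is simply the empirical query word distribution, ranking by this score coincides with query-likelihood ranking.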
24 Expansion-based vs. Model-based Feedback. Expansion-based: feedback docs modify the query Q itself; document D is scored by query likelihood. Model-based: feedback docs modify the query model; document D is scored by KL-divergence.
25 Feedback as Model Interpolation: the query model is interpolated with a feedback model estimated from F = {d_1, d_2, …, d_n} (by a generative model or by divergence minimization); interpolation weight α = 0 gives no feedback, α = 1 gives full feedback.
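Feedback as model interpolation is a one-line mixture; a sketch with illustrative names:

```python
def interpolate(theta_q, theta_f, alpha):
    """Interpolate the query model with the feedback model (sketch):
    alpha = 0 keeps the original query model, alpha = 1 uses the
    feedback model alone."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in vocab}
```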
26 θ_F Estimation Method I: Generative Mixture Model. The feedback documents F = {d_1, …, d_n} are generated word by word: topic words from p(w|θ_F) with probability 1-λ, background words from p(w|C) with probability λ; θ_F is fit by maximum likelihood.
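Method I's maximum-likelihood fit of the topic model can be sketched with EM; the fixed background weight lam and all names are illustrative:

```python
def estimate_feedback_model(feedback_counts, p_C, lam=0.5, iters=30):
    """EM for the topic model theta_F in the mixture
    (1-lam)*p(w|theta_F) + lam*p(w|C), with lam (background weight) fixed.
    feedback_counts: term counts pooled over the feedback documents (sketch)."""
    vocab = list(feedback_counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}            # uniform init
    for _ in range(iters):
        expected = {}
        for w, c in feedback_counts.items():
            topic = (1 - lam) * theta[w]
            post = topic / (topic + lam * p_C[w])           # E-step: P(topic | w)
            expected[w] = c * post
        total = sum(expected.values())
        theta = {w: e / total for w, e in expected.items()} # M-step: renormalize
    return theta
```

The background component absorbs common words, so the learned θ_F concentrates on discriminative topic terms.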
27 θ_F Estimation Method II: Empirical Divergence Minimization. Choose θ_F to be close to the feedback documents d_1, …, d_n and far from the background (collection) model C, by minimizing the empirical divergence.
28 Example of Feedback Query Model. TREC topic 412: "airport security"; mixture model approach on a Web database using the top 10 docs, shown for two weight settings (λ = 0.9 and λ = 0.7).
29 Model-based feedback vs. Simple LM
30 Where Are We? Risk Minimization Framework Two-stage Language Model KL-divergence Retrieval Model Aspect Retrieval Model
31 Aspect Retrieval. Query: What are the applications of robotics in the world today? Find as many DIFFERENT applications as possible. Example aspects: A1: spot-welding robotics; A2: controlling inventory; A3: pipe-laying robots; A4: talking robot; A5: robots for loading & unloading memory tapes; A6: robot [telephone] operators; A7: robot cranes; …
Aspect judgments (rows = docs, columns = aspects A1 A2 A3 … Ak):
d1: 1 1 0 0 … 0 0
d2: 0 1 1 1 … 0 0
d3: 0 0 0 0 … 1 0
…
dk: 1 0 1 0 … 0 1
32 Evaluation Measures.
Aspect Coverage (AC): measures per-doc coverage = #distinct-aspects/#docs; equivalent to the "set cover" problem, NP-hard.
Aspect Uniqueness (AU): measures redundancy = #distinct-aspects/#aspects; equivalent to the "volume cover" problem, NP-hard.
Example (accumulated counts down a ranked list of three documents):
#docs: 1, 2, 3; #aspects: 2, 5, 8; #distinct aspects: 2, 4, 5
AC: 2/1 = 2.0, 4/2 = 2.0, 5/3 = 1.67
AU: 2/2 = 1.0, 4/5 = 0.8, 5/8 = 0.625
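AC and AU can be computed directly from aspect judgments; this sketch reproduces the accumulated-count example above (function name is illustrative):

```python
def aspect_coverage_uniqueness(ranked_aspects, k):
    """Aspect Coverage and Aspect Uniqueness at rank k (sketch).
    ranked_aspects: list of aspect-sets, one per ranked document.
    AC = #distinct aspects / #docs; AU = #distinct aspects / #aspects."""
    seen = set()     # distinct aspects covered so far
    total = 0        # total aspect occurrences (with repetition)
    for aspects in ranked_aspects[:k]:
        seen |= set(aspects)
        total += len(aspects)
    return len(seen) / k, len(seen) / total
```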
33 Loss Function L(θ_{k+1} | θ_1 … θ_k): the models θ_1 … θ_k of the already-selected documents d_1 … d_k are known; score a candidate d_{k+1}. Maximal Marginal Relevance (MMR): combine novelty Nov(θ_{k+1} | θ_1 … θ_k) with relevance Rel(θ_{k+1}); the best d_{k+1} is novel AND relevant. Maximal Diverse Relevance (MDR): model aspect coverage distributions p(a|θ_i); the best d_{k+1} is complementary in coverage.
34 Maximal Marginal Relevance (MMR) Models Maximizing aspect coverage indirectly through redundancy elimination Elements –Redundancy/Novelty measure –Combination of novelty and relevance Proposed & studied six novelty measures Proposed & studied four combination strategies
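A generic greedy MMR-style loop; the six novelty measures and four combination strategies studied in the thesis are not reproduced here, so the novelty callback, the linear combination, and beta are all illustrative stand-ins:

```python
def mmr_rerank(candidates, relevance, novelty, beta=0.5):
    """Greedy MMR-style reranking (sketch): at each step pick the
    candidate maximizing a linear combination of its relevance score
    and its novelty with respect to the documents already selected.
    relevance: dict doc -> score; novelty(doc, selected) -> score."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining,
                   key=lambda d: beta * relevance[d]
                               + (1 - beta) * novelty(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```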
35 Comparison of Novelty Measures (Aspect Coverage)
36 Comparison of Novelty Measures (Aspect Uniqueness)
37 A Mixture Model for Redundancy: a document is modeled as a mixture of the reference-document model P(w|Old) and the collection background model P(w|Background); the mixing weight (λ = ?) is estimated by maximum likelihood with Expectation-Maximization.
38 Cost-based Combination of Relevance and Novelty: combine the relevance score and the novelty score.
39 Maximal Diverse Relevance (MDR) Models Maximizing aspect coverage directly through aspect modeling Elements –Aspect loss function –Generative Aspect Model Proposed & studied KL-divergence aspect loss function Explored two aspect models (PLSI, LDA)
40 Aspect Generative Model of Document & Query: user U generates query q, source S generates document d from aspect parameters θ = (θ_1, …, θ_k), parameterized as in PLSI or LDA.
41 Aspect Loss Function (defined over the U, S, d, q generative model).
42 Aspect Loss Function: Illustration. Compare the desired coverage p(a|θ_Q) with the combined coverage of the already-covered distributions p(a|θ_1) … p(a|θ_{k-1}) plus a new candidate p(a|θ_k); candidates range from non-relevant, to redundant, to perfect.
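One way to realize a KL-style aspect loss is to compare the desired coverage with a combined coverage of the selected documents plus the candidate; the simple averaging scheme here is an illustrative assumption, not the thesis's exact formula:

```python
import math

def aspect_loss(desired, covered_list, candidate):
    """Sketch of a KL-style aspect loss: KL divergence from the desired
    aspect distribution to the average coverage of the already-selected
    documents plus the new candidate. All inputs are dicts over the same
    aspect vocabulary; averaging is an illustrative combination choice."""
    n = len(covered_list) + 1
    combined = {a: (sum(c[a] for c in covered_list) + candidate[a]) / n
                for a in desired}
    # KL(desired || combined); lower loss = better combined coverage.
    return sum(p * math.log(p / combined[a])
               for a, p in desired.items() if p > 0)
```

A candidate that fills in the missing aspects drives the combined coverage toward the desired distribution and the loss toward zero; a redundant candidate leaves the loss high.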
43 Preliminary Evaluation: MMR vs. MDR On the relevant data set, both MMR and MDR are effective, but they complement each other - MMR improves AU more than AC - MDR improves AC more than AU On the mixed data set, however, - MMR is only effective when relevance ranking is accurate - MDR improves AC, even though relevance ranking is degraded.
44 Further Work is Needed Controlled experiments with synthetic data –Level of redundancy –Density of relevant documents –Per-document aspect counts Alternative loss functions Aspect language models, especially along the line of LDA –Aspect-based feedback
45 Summary of Contributions Two-stage Language Model KL-divergence Retrieval Model Aspect Retrieval Model New TR Models Risk Minimization Framework New TR Framework Unifies existing models Incorporates LMs Serves as a map for exploring new models Specific Contributions Empirical study of smoothing (dual role of smoothing) New smoothing method (two-stage smoothing) Automatic parameter setting (leave-one-out, mixture) Query/document distillation Feedback with LMs (mixture model & div. min.) Evaluation criteria (AC, AU) Redundancy/novelty measures (mixture weight) MMR with LMs (cost-comb.) Aspect-based loss function (“collective KL-div”)
46 Future Research Directions Better Approximation of the risk integral More effective LMs for “traditional” retrieval –Can we beat TF-IDF without increasing computational complexity? –Automatic parameter setting, especially for feedback models –Flexible passage retrieval, especially with HMM –Beyond unigrams (more linguistics)
47 More Future Research Directions Aspect Retrieval Models –Document structure/sub-topic modeling – Aspect-based feedback Interactive information retrieval models –Risk minimization for information filtering –Personalized & context-sensitive retrieval
48 Thank you!