Modeling Diversity in Information Retrieval
ChengXiang ("Cheng") Zhai
Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, and Department of Statistics, University of Illinois at Urbana-Champaign
ACM SIGIR 2009 Workshop on Redundancy, Diversity, and Interdependent Document Relevance, July 23, 2009, Boston, MA
Different Needs for Diversification
– Redundancy reduction
– Diverse information needs (e.g., overview, subtopic retrieval)
– Active relevance feedback
– …
Outline
– Risk minimization framework
– Capturing different needs for diversification
– Language models for diversification
IR as Sequential Decision Making
[Diagram: a dialogue between the User (information need) and the System (model of the information need)]
– A1: Enter a query → system decides which documents to present and how to present them → Ri: results (i = 1, 2, 3, …) → user decides which documents to view
– A2: View a document → system decides which part of the document to show and how → R': document content → view more?
– A3: Click on "Back" button → …
Retrieval Decisions
– User U issues actions A1, A2, …, At-1, At; the system returns responses R1, R2, …, Rt-1 and must now choose Rt.
– History H = {(Ai, Ri)}, i = 1, …, t-1; document collection C.
– Given U, C, At, and H, choose the best Rt ∈ r(At), the set of all possible responses to At.
– Example: At = query "Jaguar" → r(At) = all possible rankings of C; the best Rt = the best ranking for the query.
– Example: At = click on "Next" button → r(At) = all possible size-k subsets of unseen docs; the best Rt = the best k unseen docs.
A Risk Minimization Framework
– Observed: user U, interaction history H, current user action At, document collection C, and the set of all possible responses r(At) = {r1, …, rn}.
– Inferred: user model M = (θU, S, …), where θU is the information need and S the set of seen docs.
– Loss function L(ri, At, M) measures the loss of responding with ri.
– Optimal response r*: the response with minimum expected loss (Bayes risk) under the posterior distribution of M.
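Spelled out as a formula (a reconstruction from the definitions on this slide, in the spirit of the risk minimization framework rather than a verbatim copy of the original equation):

% Optimal response = minimizer of the expected loss (Bayes risk)
% under the posterior distribution of the user model M
r^* = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM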
A Simplified Two-Step Decision-Making Procedure
– Approximate the Bayes risk by the loss at the mode of the posterior distribution.
– Step 1: Compute an updated user model M* based on the currently available information.
– Step 2: Given M*, choose a response to minimize the loss function.
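A minimal sketch of this two-step procedure in code; the callables for model updating, the loss, and the candidate-response generator are placeholders, not names from the original system:

def two_step_response(user, history, action, collection,
                      update_user_model, loss, candidate_responses):
    """Step 1: point-estimate the user model; Step 2: pick the minimum-loss response."""
    # Step 1: M* approximates the mode of P(M | U, H, A_t, C).
    m_star = update_user_model(user, history, action, collection)
    # Step 2: choose the response r in r(A_t) that minimizes L(r, A_t, M*).
    return min(candidate_responses(action, collection),
               key=lambda r: loss(r, action, m_star))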
Optimal Interactive Retrieval
[Diagram: the user and the IR system interact in rounds over collection C. For each user action At (A1, A2, A3, …), the system infers M*t from P(Mt | U, H, At, C) and returns the response Rt that minimizes L(r, At, M*t).]
Refinement of Risk Minimization
Rt ∈ r(At): the system response to a user action At (query, clickthrough, feedback, …)
r(At): decision space (At-dependent)
– r(At) = all possible subsets of C + presentation strategies
– r(At) = all possible rankings of docs in C
– r(At) = all possible rankings of unseen docs
– …
M: user model
– Essential component: θU = user information need
– S = seen documents
– n = "Topic is new to the user"
L(Rt, At, M): loss function
– Generally measures the utility of Rt for a user modeled as M
– Often encodes retrieval criteria (e.g., using M to select a ranking of docs)
P(M | U, H, At, C): user model inference
– Often involves estimating a unigram language model θU
Generative Model of Document & Query [Lafferty & Zhai 01]
[Diagram: the user U generates a query q through an inferred query model; the source S generates a document d through an inferred document model; relevance R, which links query and document, is only partially observed. U, S, q, and d are observed; the query and document models are inferred.]
Risk Minimization with Language Models [Lafferty & Zhai 01, Zhai & Lafferty 06]
[Diagram: the query q (from user U) and the document set C (from source S) are observed; the query model θq and the document models θ1, …, θN are hidden. Each choice (D1, π1), (D2, π2), …, (Dn, πn) of a document subset plus presentation strategy incurs a loss L, and risk minimization selects the choice with the smallest Bayes risk.]
Optimal Ranking for Independent Loss
– Decision space = {rankings}
– Sequential browsing + independent loss → independent risk = independent scoring
– "Risk ranking principle" [Zhai 02, Zhai & Lafferty 06]: rank documents by their individual risk (equivalently, their individual scores)
Risk Minimization for Diversification
Redundancy reduction: loss function includes a redundancy measure
– Special case: list presentation + MMR [Zhai et al. 03]
Diverse information needs: loss function defined on latent topics
– Special case: PLSA/LDA + topic retrieval [Zhai 02]
Active relevance feedback: loss function considers both relevance and benefit for feedback
– Special case: hard queries + feedback only [Shen & Zhai 05]
Subtopic Retrieval
Query: What are the applications of robotics in the world today? Find as many DIFFERENT applications as possible.
Example subtopics:
– A1: spot-welding robotics
– A2: controlling inventory
– A3: pipe-laying robots
– A4: talking robot
– A5: robots for loading & unloading memory tapes
– A6: robot [telephone] operators
– A7: robot cranes
– …
Subtopic judgments (rows = documents, columns = subtopics A1 A2 A3 … Ak):
d1: 1 1 0 0 … 0 0
d2: 0 1 1 1 … 0 0
d3: 0 0 0 0 … 1 0
…
dk: 1 0 1 0 … 0 1
This is a non-traditional retrieval task …
Diversify = Remove Redundancy
– Loss function with cost constants; C2 is the cost of a redundant relevant document and reflects the user's "willingness to tolerate redundancy"
– C2 < C3, since a redundant relevant doc is better than a non-relevant doc
– Greedy algorithm for ranking: Maximal Marginal Relevance (MMR); a sketch follows below
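A minimal sketch of MMR-style greedy re-ranking, assuming a relevance scorer and a pairwise similarity function are given; the function names and the interpolation weight lam are illustrative, not taken from the original work:

def mmr_rerank(candidates, relevance, similarity, lam=0.7, k=10):
    """Greedy re-ranking: trade off relevance against redundancy with already-selected docs."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def marginal(d):
            # Redundancy = max similarity to anything already shown (0 if nothing shown yet).
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1.0 - lam) * redundancy
        best = max(pool, key=marginal)
        selected.append(best)
        pool.remove(best)
    return selected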
A Mixture Model for Redundancy
[Diagram: a new document d is modeled as a two-component mixture of P(w|Old), estimated from the reference document, and P(w|Background), estimated from the collection, with mixing weights λ and 1-λ.]
p(New|d) = λ = probability of "new" (estimated using EM)
p(New|d) can also be estimated using KL-divergence
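A rough sketch of the EM estimate of the mixing weight, assuming unigram models p(w|Old) and p(w|Background) are given as word-to-probability dicts and that λ is the weight on the background ("new") component; smoothing and convergence checks are simplified:

def estimate_novelty(doc_words, p_old, p_bg, iters=50):
    """EM for a two-component mixture: each word of the new document is attributed
    to the reference ('old') model or the background model; the learned background
    weight lam serves as p(New|d)."""
    lam = 0.5  # initial mixing weight on the background component
    for _ in range(iters):
        # E-step: posterior probability that each word was drawn from the background.
        post = []
        for w in doc_words:
            p_new = lam * p_bg.get(w, 1e-9)
            p_red = (1.0 - lam) * p_old.get(w, 1e-9)
            post.append(p_new / (p_new + p_red))
        # M-step: re-estimate the mixing weight.
        lam = sum(post) / len(post)
    return lam  # interpreted as p(New|d)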
Evaluation metrics
Intuitive goals:
– Should see documents from many different subtopics appear early in a ranking (subtopic coverage/recall)
– Should not see many different documents that cover the same subtopics (redundancy)
How do we quantify these?
– One problem: the "intrinsic difficulty" of queries can vary.
Evaluation metrics: a proposal
Definition: Subtopic recall at rank K is the fraction of subtopics a such that one of d1, …, dK is relevant to a.
Definition: minRank(S, r) is the smallest rank K such that the ranking produced by IR system S has subtopic recall r at rank K.
Definition: Subtopic precision at recall level r for IR system S is minRank(Sopt, r) / minRank(S, r), where Sopt is an optimal system.
This generalizes ordinary recall-precision metrics. It does not explicitly penalize redundancy.
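A small sketch of these definitions, assuming subtopic judgments are available as a set of subtopic IDs per document (the function and variable names are illustrative):

def subtopic_recall(ranking, judgments, k, n_subtopics):
    """Fraction of all subtopics covered by the top-k documents."""
    covered = set()
    for d in ranking[:k]:
        covered |= judgments.get(d, set())
    return len(covered) / n_subtopics

def min_rank(ranking, judgments, n_subtopics, target_recall):
    """Smallest K at which the ranking reaches the target subtopic recall (None if never)."""
    for k in range(1, len(ranking) + 1):
        if subtopic_recall(ranking, judgments, k, n_subtopics) >= target_recall:
            return k
    return None

Subtopic precision at recall r is then min_rank of the optimal ranking divided by min_rank of the system ranking.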
Evaluation metrics: rationale
[Plot: precision (from 0.0 to 1.0) and rank K against recall, comparing minRank(Sopt, r) with minRank(S, r).]
For subtopics, the shape of the minRank(Sopt, r) curve is neither predictable nor linear.
Evaluating redundancy
Definition: the cost of a ranking d1, …, dK is cost(d1, …, dK) = Σi=1..K (b + a · #subtopics(di)), where b is the cost of seeing a document and a is the cost of seeing a subtopic inside a document (the earlier minRank metric is the special case a = 0).
Definition: minCost(S, r) is the minimal cost at which recall r is obtained.
Definition: weighted subtopic precision at r is minCost(Sopt, r) / minCost(S, r).
We will use a = b = 1.
Evaluation Metrics Summary
– Measure performance (size of ranking, minRank; cost of ranking, minCost) relative to optimal.
– Generalizes ordinary precision/recall.
Possible problems:
– Computing minRank, minCost is NP-hard!
– A greedy approximation seems to work well for our data set (see the sketch below).
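One plausible reading of the greedy approximation, sketched here as an assumption rather than the authors' actual code: approximate the optimal ranking behind minRank/minCost by greedy set cover, repeatedly taking the judged document that covers the most still-uncovered subtopics.

def greedy_optimal_ranking(judgments):
    """Greedy set-cover approximation of an 'optimal' ranking for minRank/minCost:
    judgments maps each document to the set of subtopics it covers."""
    remaining = dict(judgments)
    uncovered = set().union(*judgments.values()) if judgments else set()
    ranking = []
    while remaining and uncovered:
        best = max(remaining, key=lambda d: len(remaining[d] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining document adds a new subtopic
        ranking.append(best)
        uncovered -= remaining.pop(best)
    return ranking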
Experiment Design
Dataset: TREC "interactive track" data
– London Financial Times: 210k docs, 500MB
– 20 queries from TREC 6-8
– Subtopics: average 20, min 7, max 56
– Judged docs: average 40, min 5, max 100
– Non-judged docs assumed not relevant to any subtopic
Baseline: relevance-based ranking (using language models)
Two experiments
– Ranking only relevant documents
– Ranking all documents
S-Precision: re-ranking relevant docs
WS-precision: re-ranking relevant docs
Results for ranking all documents
"Upper bound": use subtopic names to build an explicit subtopic model.
Summary: Remove Redundancy
– Mixture model is effective for identifying novelty in relevant documents
– Trading off novelty and relevance is hard
– Relevance seems to be the dominating factor in TREC interactive-track data
Diversity = Satisfy Diverse Info. Need [Zhai 02]
– Need to directly model latent aspects and then optimize results based on aspect/topic matching
– Reducing redundancy doesn't ensure complete coverage of diverse aspects
Aspect Generative Model of Document & Query
[Diagram: the user U generates query q and the source S generates document d through an aspect model θ = (θ1, …, θk); under PLSI the aspect mixing weights are document-specific parameters, while under LDA they are drawn from a Dirichlet prior.]
Aspect Loss Function
[Diagram: the aspect loss is defined by comparing the query model (from user U and query q) with the document models (from source S and documents d).]
Aspect Loss Function: Illustration
[Illustration: the desired coverage p(a|θQ) is compared with what is "already covered" by the first k-1 documents, p(a|θ1), …, p(a|θk-1). A new candidate p(a|θk) changes the combined coverage; example candidates range from non-relevant to redundant to perfect.]
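A rough sketch of the intuition behind this picture, as an illustrative greedy criterion rather than the exact loss from [Zhai 02]: pick the next document whose aspect distribution, combined with what is already covered, comes closest to the desired coverage (closeness measured here by KL divergence; the equal-weight combination is an assumption).

import math

def kl(p, q, eps=1e-9):
    """KL divergence between two aspect distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pick_next(candidates, covered, desired):
    """Greedy aspect-coverage step: choose the candidate whose average with the
    current coverage is closest (in KL) to the desired coverage."""
    def loss(cand):
        combined = [(c + x) / 2.0 for c, x in zip(covered, cand)]
        return kl(desired, combined)
    return min(candidates, key=loss)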
Evaluation Measures
Aspect Coverage (AC): measures per-doc coverage
– #distinct-aspects / #docs
– Equivalent to the "set cover" problem
Aspect Uniqueness (AU): measures redundancy
– #distinct-aspects / #aspects
– Equivalent to the "volume cover" problem
Example (three documents d1, d2, d3 with aspect bit vectors 00010010001001, 01011000101100, 10001011000101):
#doc:       1     2     3
#asp:       2     5     8
#uniq-asp:  2     4     5
AC: 2/1 = 2.0   4/2 = 2.0   5/3 = 1.67
AU: 2/2 = 1.0   4/5 = 0.8   5/8 = 0.625
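A small sketch of these two measures computed at each rank, assuming aspect judgments are given as sets of aspect IDs per document (illustrative code, not the original evaluation scripts):

def aspect_coverage_uniqueness(ranking, aspects):
    """Return a list of (AC, AU) after each rank:
    AC = #distinct aspects / #docs seen;
    AU = #distinct aspects / #aspect occurrences seen (counted with repetition)."""
    distinct, total, results = set(), 0, []
    for i, d in enumerate(ranking, start=1):
        a = aspects.get(d, set())
        distinct |= a
        total += len(a)
        ac = len(distinct) / i
        au = len(distinct) / total if total else 0.0
        results.append((ac, au))
    return results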
Effectiveness of Aspect Loss Function (PLSI)
Effectiveness of Aspect Loss Function (LDA)
Comparison of 4 MMR Methods
– CC: Cost-based Combination
– QB: Query Background Model
– MQM: Query Marginal Model
– MDM: Document Marginal Model
Summary: Diverse Information Need
– Mixture model is effective for capturing latent topics
– Direct modeling of latent aspects/topics is more effective than indirect modeling through MMR in improving aspect coverage, but MMR is better for improving aspect uniqueness
– With direct topic modeling and matching, aspect coverage can be improved at the price of lower relevance-based precision
Diversify = Active Feedback [Shen & Zhai 05]
Decision problem: decide the subset of documents to present for relevance judgment
Independent Loss
Independent Loss (cont.)
– Uncertainty Sampling
– Top K
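An illustrative contrast between the two selection strategies named on this slide, assuming each candidate document comes with an estimated probability of relevance (the names here are my own, not from the original formulation):

def top_k(docs, p_rel, k):
    """Top-K: select the k documents most likely to be relevant."""
    return sorted(docs, key=lambda d: p_rel[d], reverse=True)[:k]

def uncertainty_sampling(docs, p_rel, k):
    """Uncertainty sampling: select the k documents whose relevance is most uncertain
    (probability closest to 0.5), which tend to be most informative to judge."""
    return sorted(docs, key=lambda d: abs(p_rel[d] - 0.5))[:k]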
Dependent Loss
Heuristics: consider relevance first, then diversity (sketches of the first two heuristics follow below)
– Gapped Top K
– K Cluster Centroid: select the top N documents, cluster the N docs into K clusters, and use the K cluster centroids
– MMR
– …
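A hypothetical sketch of two of these heuristics; the gap size, the vector representation of documents, and the use of scikit-learn's KMeans are assumptions for illustration only:

import numpy as np
from sklearn.cluster import KMeans

def gapped_top_k(ranked_docs, k, gap=2):
    """Gapped Top-K: walk down the ranking taking every (gap+1)-th document until k are chosen."""
    return ranked_docs[::gap + 1][:k]

def k_cluster_centroids(ranked_docs, doc_vectors, k, n=100):
    """K Cluster Centroid: cluster the top-N documents into k clusters and return,
    from each cluster, the document closest to its centroid."""
    top = ranked_docs[:n]
    X = np.array([doc_vectors[d] for d in top])
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(km.labels_) if lab == c]
        center = km.cluster_centers_[c]
        best = min(members, key=lambda i: np.linalg.norm(X[i] - center))
        chosen.append(top[best])
    return chosen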
Illustration of Three AF Methods
[Illustration over a ranked list 1, 2, 3, …, 16, …: Top-K (normal feedback) takes the first K documents; Gapped Top-K skips a fixed number of documents between selections; K-cluster centroid picks cluster representatives, aiming at high diversity.]
Evaluating Active Feedback
[Flow: a query first produces initial results with no feedback; an active feedback method (Top-k, gapped, or clustering) selects K docs; their +/- judgments are taken from the judgment file; the judged docs are then used for feedback to produce the feedback results.]
Retrieval Methods (Lemur toolkit)
[Flow: query Q and document D are scored with Kullback-Leibler divergence scoring to produce results; active feedback selects feedback docs F = {d1, …, dn}; mixture-model feedback updates the query model, learning only from the relevant docs.]
Default parameter settings unless otherwise stated.
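For reference, a minimal sketch of KL-divergence scoring of a document against a query language model; this is a generic illustration of the method named on the slide, not the Lemur implementation, and the Dirichlet smoothing with prior mu is an assumption:

import math

def kl_score(query_model, doc_tf, doc_len, collection_model, mu=2000.0):
    """Rank-equivalent KL-divergence score: sum over query-model words of
    p(w|theta_Q) * log p(w|theta_D), with a Dirichlet-smoothed document model."""
    score = 0.0
    for w, p_q in query_model.items():
        p_d = (doc_tf.get(w, 0) + mu * collection_model.get(w, 1e-9)) / (doc_len + mu)
        score += p_q * math.log(p_d)
    return score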
Comparison of Three AF Methods (including judged docs)
Collection   Active FB Method   #Rel   MAP      Pr@10doc
HARD         Top-K              146    0.325    0.527
HARD         Gapped             150    0.330    0.548
HARD         Clustering         105    0.332    0.565
AP88-89      Top-K              198    0.228    0.351
AP88-89      Gapped             180    0.234*   0.389*
AP88-89      Clustering         118    0.237    0.393
(bold font = worst, * = best)
Top-K is the worst! Clustering uses fewest relevant docs.
Appropriate Evaluation of Active Feedback
– Original DB with judged docs (AP88-89, HARD): can't tell if the ranking of un-judged documents is improved
– Original DB without judged docs: different methods have different test documents
– New DB (AP88-89, AP90): shows the learning effect more explicitly, but the docs must be similar to the original docs
Comparison of Different Test Data
Test Data                      Active FB Method   #Rel   MAP     Pr@10doc
AP88-89 (incl. judged docs)    Top-K              198    0.228   0.351
AP88-89 (incl. judged docs)    Gapped             180    0.234   0.389
AP88-89 (incl. judged docs)    Clustering         118    0.237   0.393
AP90                           Top-K              198    0.220   0.321
AP90                           Gapped             180    0.222   0.326
AP90                           Clustering         118    0.223   0.325
Top-K is consistently the worst! Clustering generates fewer, but higher quality examples.
Summary: Active Feedback
– Presenting the top-k is not the best strategy
– Clustering can generate fewer, higher quality feedback examples
Conclusions
– There are many reasons for diversifying search results (redundancy, diverse information needs, active feedback)
– Risk minimization framework can model all these cases of diversification
– Different scenarios may need different techniques and different evaluation measures
Thank You!