IR Theory: Relevance Feedback
Relevance Feedback: Example - Initial Results [screenshot slide from the "Search Engine" deck]
Relevance Feedback: Example - Relevance Feedback [screenshot slide]
Relevance Feedback: Example - Revised Results [screenshot slide]
Relevance Feedback: What Is It?
1. Query formulation: "What is IR?"
2. Search results: a (ranked) document list
3. Relevance judgments: a (selected) document list
4. Query reformulation: "IR, search, retrieval"
5. Refined search results: a (re-ranked) document list
Relevance Feedback: Why?
Anomalous State of Knowledge (ASK)
- Information needs arise because the user does not know something: "an anomaly in his state of knowledge with respect to the problem faced".
- ASK suggests that the user:
  - may not know what he/she is looking for
  - may lack the knowledge to properly express his/her information need
Relevance Feedback Assumptions
- The user will know a relevant document when he/she sees one.
- Relevant documents will contain useful information (e.g., related terms).
Searching Is an Iterative Process
- Improve query formulation via feedback.
- Facilitate vocabulary and concept discovery via search iteration.
Relevance Feedback: How?
Utilize relevance judgments to improve search performance. The idea: modify the current query based on relevance judgments.
1. Relevance judgments: identify relevant documents in the initial search result.
2. Query reformulation: construct a better representation of the information need based on the feedback.
3. Re-ranking: generate a refined search result using the reformulated query.
Approaches
- How to collect feedback? Explicit, implicit, or blind/pseudo feedback.
- How to formulate the feedback query? Probabilistic, vector-based, etc.
  - Add terms from relevant documents to the query (query expansion).
  - Modify query term weights based on their occurrences in relevant documents.
Relevance Feedback: Approaches
Manual-Explicit RF (interactive)
1. The user explicitly identifies relevant documents.
2. The user selects terms from a system-generated term list, OR the system reformulates the query automatically.
3. The system re-retrieves/re-ranks documents.
Manual-Implicit RF (interactive-automatic)
1. The system identifies relevant documents based on user data (e.g., click-through, profile).
2. The system reformulates the query and re-retrieves/re-ranks documents.
Pseudo/Blind RF (fully automatic)
1. The top n documents of the initial retrieval results are assumed to be relevant.
2. The system reformulates the query and re-retrieves/re-ranks documents.
Limitations
- Uses binary, document-level relevance.
- Does not accommodate multi-dimensional (e.g., aspectual, contextual) relevance.
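The pseudo/blind variant can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: `search` and `reformulate` are hypothetical callables supplied by the caller (e.g., a Rocchio-style reformulation).

```python
import numpy as np

def pseudo_relevance_feedback(query_vec, search, reformulate, top_n=10):
    """Blind RF: assume the top-n hits of the initial search are relevant,
    reformulate the query once, then search again."""
    initial = search(query_vec)                       # ranked list of (doc_id, doc_vec)
    assumed_relevant = [vec for _, vec in initial[:top_n]]
    revised_query = reformulate(query_vec, assumed_relevant)
    return search(revised_query)
```

No user interaction is required, which is why this variant is "fully automatic"; the corresponding risk is query drift when the top-n documents are not actually relevant.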
Relevance Feedback: Algorithm
Rocchio formula (vector-based):

  Q1 = Q0 + (β / n1) Σ Ri − (γ / n2) Σ Si

where
- Q1 = feedback query vector
- Q0 = initial/original query vector
- Ri = vector for relevant document i (summed over i = 1..n1)
- Si = vector for non-relevant document i (summed over i = 1..n2)
- n1 = number of judged relevant documents
- n2 = number of judged non-relevant documents
- β and γ are coefficients that tune the importance of relevant and non-relevant terms (e.g., β = 0.75, γ = 0.25)

The feedback query Q1 moves toward the relevant document vectors and away from the non-relevant document vectors.
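A minimal implementation of the formula above, assuming dense term vectors (an illustrative sketch; the parameter defaults follow the slide's example values β = 0.75, γ = 0.25):

```python
import numpy as np

def rocchio(q0, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Q1 = Q0 + (beta/n1) * sum(Ri) - (gamma/n2) * sum(Si),
    i.e. move the query toward the mean relevant vector and
    away from the mean non-relevant vector."""
    q1 = np.asarray(q0, dtype=float)
    if len(relevant) > 0:
        q1 = q1 + beta * np.mean(relevant, axis=0)
    if len(nonrelevant) > 0:
        q1 = q1 - gamma * np.mean(nonrelevant, axis=0)
    return q1
```

Note that Σ Ri / n1 is just the mean of the relevant vectors, which is why the sketch uses `np.mean`.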
Relevance Feedback: Algorithm
Robertson-Sparck Jones weight (probabilistic):
1. Generate an initial ranking with the IDF formula.
2. Present the top n documents to the user.
3. Re-compute the query term weights for each term k:

  w_k = log [ ((r + 0.5) (N − n − R + r + 0.5)) / ((n − r + 0.5) (R − r + 0.5)) ]

(0.5 is added to each cell to avoid zero counts), where
- N = number of evaluated documents
- n = number of evaluated documents in which term k appears
- R = number of evaluated documents that are relevant
- N − R = number of evaluated documents that are non-relevant
- r = number of evaluated documents in which term k appears and that are relevant
- n − r = number of evaluated documents in which term k appears and that are non-relevant

Contingency table for term k:

                             Relevant    Non-relevant     Total
  Documents with term k      r           n − r            n
  Documents without term k   R − r       N − n − R + r    N − n
  Total                      R           N − R            N
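In code, the re-weighting step might look like the sketch below. The slide does not reproduce the exact formula, so this assumes the standard Robertson-Sparck Jones weight with the common 0.5 correction:

```python
import math

def rsj_weight(N, n, R, r):
    """Robertson-Sparck Jones relevance weight for term k.
    0.5 smoothing keeps empty contingency-table cells from producing log(0)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))
```

A term concentrated in the judged-relevant documents receives a positive weight; a term found mostly in non-relevant documents receives a negative one.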
RF: Problem
Inverted index with tf·idf weights:

          D1       D2       D3       D4       D5
  t1    0.222    0        0        0.222    0.444
  t2    0        0.222    0.444    0.444    0
  t3    0        0        1.194    0.398    0

Relevance feedback (Rocchio method), with β = 0.75 and γ = 0.25:
- D3 was judged relevant to Q0 = (0, 0.3, 0.4); the remaining documents are treated as non-relevant.

1. Formulate the feedback query Q1 = Q0 + (β / n1) Σ Ri − (γ / n2) Σ Si, where
- Q0 = the initial query vector
- Ri = the vector for relevant document i
- Si = the vector for non-relevant document i
- n1 = the number of judged relevant documents (here 1)
- n2 = the number of judged non-relevant documents (here 4)

  t1: 0   + 0.75 · 0     − 0.25 · (0.222 + 0.222 + 0.444) / 4 = −0.056
  t2: 0.3 + 0.75 · 0.444 − 0.25 · (0.222 + 0.444) / 4         =  0.591
  t3: 0.4 + 0.75 · 1.194 − 0.25 · 0.398 / 4                   =  1.271

So Q1 = (−0.056, 0.591, 1.271).

2. Re-rank the documents by their cosine similarity sim(Di, Q1):

          D1       D2       D3       D4       D5
  Score  −0.040    0.421    0.996    0.847   −0.040
  Rank    4        3        1        2        4
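The worked example above can be checked numerically. The sketch below recomputes the scores as cosine similarities against the rounded feedback query Q1 = (−0.056, 0.591, 1.271), which reproduces the slide's score table:

```python
import numpy as np

# Document vectors from the tf*idf inverted index (rows t1..t3 transposed).
docs = {
    "D1": [0.222, 0.0,   0.0],
    "D2": [0.0,   0.222, 0.0],
    "D3": [0.0,   0.444, 1.194],
    "D4": [0.222, 0.444, 0.398],
    "D5": [0.444, 0.0,   0.0],
}
q1 = np.array([-0.056, 0.591, 1.271])  # feedback query from the Rocchio step

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = {d: round(cosine(v, q1), 3) for d, v in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

As on the slide, the judged-relevant D3 rises to rank 1, D4 (which shares terms t2 and t3 with D3) to rank 2, and D1/D5 (which contain only the down-weighted t1) fall to the bottom.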