IR Theory: Relevance Feedback
Relevance Feedback: Example - Initial Results [screenshot slide from the "Search Engine" deck]
Relevance Feedback: Example - Relevance Feedback [screenshot slide]
Relevance Feedback: Example - Revised Results [screenshot slide]
Relevance Feedback: What Is It?
1. Query formulation: "What is IR?"
2. Search results: a (ranked) document list
3. Relevance judgments: a (selected) document list
4. Query reformulation: "IR, search, retrieval"
5. Refined search results: a (re-ranked) document list
Relevance Feedback: Why?
Anomalous State of Knowledge (ASK)
- Information needs arise because the user does not know something: "an anomaly in his state of knowledge with respect to the problem faced".
- ASK suggests that the user:
  - may not know what he/she is looking for
  - may lack the knowledge to properly express his/her information need
Relevance Feedback Assumptions
- The user will know a relevant document when he/she sees one.
- Relevant documents will contain useful information (e.g., related terms).
Searching Is an Iterative Process
- Improve query formulation via feedback.
- Facilitate vocabulary and concept discovery via search iteration.
Relevance Feedback: How?
Utilize relevance judgments to improve search performance. The idea: modify the current query based on relevance judgments.
1. Relevance judgments: identify relevant documents in the initial search result.
2. Query reformulation: construct a better representation of the information need based on the feedback.
3. Re-ranking: generate a refined search result using the reformulated query.
Approaches
- How to collect feedback? Explicit, implicit, or blind/pseudo feedback.
- How to formulate the feedback query? Probabilistic, vector-based, etc.
  - Add terms from relevant documents to the query (query expansion).
  - Modify query term weights based on their occurrences in relevant documents.
Relevance Feedback: Approaches
Manual-Explicit RF (interactive)
1. The user explicitly identifies relevant documents.
2. The user selects terms from a system-generated term list, OR the system reformulates the query automatically.
3. The system re-retrieves/re-ranks documents.
Manual-Implicit RF (interactive-automatic)
1. The system identifies relevant documents based on user data (e.g., click-through, profile).
2. The system reformulates the query and re-retrieves/re-ranks documents.
Pseudo/Blind RF (fully automatic)
1. The top n documents of the initial retrieval results are assumed to be relevant.
2. The system reformulates the query and re-retrieves/re-ranks documents.
Limitations
- Uses binary, document-level relevance.
- Does not accommodate multi-dimensional (e.g., aspectual, contextual) relevance.
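The pseudo/blind variant can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: `search` and `reformulate` are hypothetical callables supplied by the caller (e.g., a Rocchio-style reformulation).

```python
import numpy as np

def pseudo_relevance_feedback(query_vec, search, reformulate, top_n=10):
    """Blind RF: assume the top-n hits of the initial search are relevant,
    reformulate the query once, then search again."""
    initial = search(query_vec)                       # ranked list of (doc_id, doc_vec)
    assumed_relevant = [vec for _, vec in initial[:top_n]]
    revised_query = reformulate(query_vec, assumed_relevant)
    return search(revised_query)
```

No user interaction is required, which is why this variant is "fully automatic"; the corresponding risk is query drift when the top-n documents are not actually relevant.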
Relevance Feedback: Algorithm
Rocchio formula (vector-based):

  Q1 = Q0 + (β / n1) Σ Ri − (γ / n2) Σ Si

where
- Q1 = feedback query vector
- Q0 = initial/original query vector
- Ri = vector for relevant document i (summed over i = 1..n1)
- Si = vector for non-relevant document i (summed over i = 1..n2)
- n1 = number of judged relevant documents
- n2 = number of judged non-relevant documents
- β and γ are coefficients that tune the importance of relevant and non-relevant terms (e.g., β = 0.75, γ = 0.25)

The feedback query Q1 moves toward the relevant document vectors and away from the non-relevant document vectors.
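A minimal implementation of the formula above, assuming dense term vectors (an illustrative sketch; the parameter defaults follow the slide's example values β = 0.75, γ = 0.25):

```python
import numpy as np

def rocchio(q0, relevant, nonrelevant, beta=0.75, gamma=0.25):
    """Q1 = Q0 + (beta/n1) * sum(Ri) - (gamma/n2) * sum(Si),
    i.e. move the query toward the mean relevant vector and
    away from the mean non-relevant vector."""
    q1 = np.asarray(q0, dtype=float)
    if len(relevant) > 0:
        q1 = q1 + beta * np.mean(relevant, axis=0)
    if len(nonrelevant) > 0:
        q1 = q1 - gamma * np.mean(nonrelevant, axis=0)
    return q1
```

Note that Σ Ri / n1 is just the mean of the relevant vectors, which is why the sketch uses `np.mean`.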
Relevance Feedback: Algorithm
Robertson-Sparck Jones weight (probabilistic):
1. Generate an initial ranking with the IDF formula.
2. Present the top n documents to the user.
3. Re-compute the query term weights for each term k:

  w_k = log [ ((r + 0.5) (N − n − R + r + 0.5)) / ((n − r + 0.5) (R − r + 0.5)) ]

(0.5 is added to each cell to avoid zero counts), where
- N = number of evaluated documents
- n = number of evaluated documents in which term k appears
- R = number of evaluated documents that are relevant
- N − R = number of evaluated documents that are non-relevant
- r = number of evaluated documents in which term k appears and that are relevant
- n − r = number of evaluated documents in which term k appears and that are non-relevant

Contingency table for term k:

                             Relevant    Non-relevant     Total
  Documents with term k      r           n − r            n
  Documents without term k   R − r       N − n − R + r    N − n
  Total                      R           N − R            N
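In code, the re-weighting step might look like the sketch below. The slide does not reproduce the exact formula, so this assumes the standard Robertson-Sparck Jones weight with the common 0.5 correction:

```python
import math

def rsj_weight(N, n, R, r):
    """Robertson-Sparck Jones relevance weight for term k.
    0.5 smoothing keeps empty contingency-table cells from producing log(0)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))
```

A term concentrated in the judged-relevant documents receives a positive weight; a term found mostly in non-relevant documents receives a negative one.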
RF: Problem
Inverted index with tf·idf weights:

          D1       D2       D3       D4       D5
  t1    0.222    0        0        0.222    0.444
  t2    0        0.222    0.444    0.444    0
  t3    0        0        1.194    0.398    0

Relevance feedback (Rocchio method), with β = 0.75 and γ = 0.25:
- D3 was judged relevant to Q0 = (0, 0.3, 0.4); the remaining documents are treated as non-relevant.

1. Formulate the feedback query Q1 = Q0 + (β / n1) Σ Ri − (γ / n2) Σ Si, where
- Q0 = the initial query vector
- Ri = the vector for relevant document i
- Si = the vector for non-relevant document i
- n1 = the number of judged relevant documents (here 1)
- n2 = the number of judged non-relevant documents (here 4)

  t1: 0   + 0.75 · 0     − 0.25 · (0.222 + 0.222 + 0.444) / 4 = −0.056
  t2: 0.3 + 0.75 · 0.444 − 0.25 · (0.222 + 0.444) / 4         =  0.591
  t3: 0.4 + 0.75 · 1.194 − 0.25 · 0.398 / 4                   =  1.271

So Q1 = (−0.056, 0.591, 1.271).

2. Re-rank the documents by their cosine similarity sim(Di, Q1):

          D1       D2       D3       D4       D5
  Score  −0.040    0.421    0.996    0.847   −0.040
  Rank    4        3        1        2        4
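The worked example above can be checked numerically. The sketch below recomputes the scores as cosine similarities against the rounded feedback query Q1 = (−0.056, 0.591, 1.271), which reproduces the slide's score table:

```python
import numpy as np

# Document vectors from the tf*idf inverted index (rows t1..t3 transposed).
docs = {
    "D1": [0.222, 0.0,   0.0],
    "D2": [0.0,   0.222, 0.0],
    "D3": [0.0,   0.444, 1.194],
    "D4": [0.222, 0.444, 0.398],
    "D5": [0.444, 0.0,   0.0],
}
q1 = np.array([-0.056, 0.591, 1.271])  # feedback query from the Rocchio step

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = {d: round(cosine(v, q1), 3) for d, v in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

As on the slide, the judged-relevant D3 rises to rank 1, D4 (which shares terms t2 and t3 with D3) to rank 2, and D1/D5 (which contain only the down-weighted t1) fall to the bottom.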