Retrieval Utilities: relevance feedback, clustering, passage-based retrieval, parsing, n-grams, thesauri, semantic networks, regression analysis.
Relevance Feedback: Do the retrieval in multiple steps. The user refines the query at each step with respect to the results of the previous queries. The user tells the IR system which documents are relevant. New terms are added to the query based on the feedback. Term weights may be updated based on the user feedback.
Relevance Feedback: Bypass the user for relevance feedback by assuming that the top-k results in the ranked list are relevant, then modify the original query in the same way as with explicit user feedback.
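The step of assuming the top-k results are relevant can be sketched as follows. This is a minimal illustration, not a definitive method: the function name `expand_query`, the parameters `k` and `n_new_terms`, the choice of raw term frequency for picking new terms, and the toy documents are all assumptions made for the example.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=2, n_new_terms=1):
    """Treat the top-k ranked documents as relevant and add their most
    frequent terms that are not already in the query (hypothetical helper)."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        # Count only terms that the query does not already contain.
        counts.update(t for t in doc if t not in query_terms)
    new_terms = [t for t, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms

# Toy ranked list (made-up data): each document is a list of terms.
ranked = [
    ["kennedy", "assassination", "oswald", "dallas"],
    ["oswald", "conspiracy", "kennedy"],
    ["weather", "report"],
]
expanded = expand_query(["kennedy", "assassination", "conspiracy"], ranked)
print(expanded)
```

With `k=2`, only the first two documents contribute candidate terms, so the unrelated third document cannot influence the expanded query.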
Relevance Feedback: Example: “find information surrounding the various conspiracy theories about the assassination of John F. Kennedy” (example from your textbook). If a highly ranked document contains the term “Oswald”, then that term should be added to the initial query. If the term “assassination” appears in the top-ranked document, then its weight should be increased.
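Relevance Feedback in Vector Space Model: $Q$ is the original query, $R$ is the set of relevant documents, and $S$ is the set of irrelevant documents selected by the user, with $|R| = n_1$ and $|S| = n_2$. The modified query is $Q' = Q + \frac{1}{n_1}\sum_{i=1}^{n_1} R_i - \frac{1}{n_2}\sum_{i=1}^{n_2} S_i$, where $R_i$ and $S_i$ are the vectors of the $i$-th relevant and irrelevant document. In general, $Q' = \alpha Q + \frac{\beta}{n_1}\sum_{i=1}^{n_1} R_i - \frac{\gamma}{n_2}\sum_{i=1}^{n_2} S_i$. The weights $\alpha$, $\beta$, and $\gamma$ are referred to as Rocchio weights.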
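A Rocchio-style update can be sketched in a few lines, assuming the commonly used form $Q' = \alpha Q + \frac{\beta}{n_1}\sum R_i - \frac{\gamma}{n_2}\sum S_i$. The sketch represents vectors as term-to-weight dictionaries; the function name `rocchio`, the default weights, the clipping of negative weights to zero, and the sample vectors are assumptions for illustration only.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Return the modified query as a term -> weight dict (hypothetical helper)."""
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            # Add the scaled average weight of the term over relevant documents.
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            # Subtract the scaled average weight over irrelevant documents.
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        # Assumption: negative weights are clipped to zero.
        new_query[t] = max(w, 0.0)
    return new_query

# Made-up example vectors.
q = {"kennedy": 1.0, "assassination": 1.0}
R = [{"kennedy": 1.0, "oswald": 1.0}, {"assassination": 1.0, "oswald": 1.0}]
S = [{"kennedy": 1.0, "airport": 1.0}]
new_q = rocchio(q, R, S, alpha=1.0, beta=0.5, gamma=0.25)
print(sorted(new_q.items()))
```

Here "oswald" enters the query because it occurs in the relevant documents, while "airport" occurs only in the irrelevant one and ends with zero weight.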
Relevance Feedback in Vector Space Model: What if the original query retrieves only non-relevant documents (as determined by the user)? Then increase the weight of the most frequently occurring term in the document collection.
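The fallback for a query that retrieves only non-relevant documents might look like the sketch below. It assumes raw occurrence counts over the whole collection decide which term is "most frequent" and that a fixed amount is added to its weight; the name `boost_most_frequent_term`, the `boost` parameter, and the toy collection are hypothetical.

```python
from collections import Counter

def boost_most_frequent_term(query, collection, boost=1.0):
    """Add `boost` to the query weight of the collection's most frequent term."""
    counts = Counter(t for doc in collection for t in doc)
    term, _ = counts.most_common(1)[0]
    updated = dict(query)  # leave the original query untouched
    updated[term] = updated.get(term, 0.0) + boost
    return updated

# Made-up collection: each document is a list of terms.
collection = [
    ["kennedy", "oswald"],
    ["oswald", "dallas"],
    ["oswald", "kennedy"],
]
boosted = boost_most_frequent_term({"kennedy": 1.0}, collection)
print(boosted)
```

In a real collection the most frequent term would usually be a stop word, so a sketch like this presumably only makes sense after stop-word removal.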
Relevance Feedback in Vector Space Model: Result set clustering can be used as a utility for relevance feedback. Hierarchical clustering can be used for that purpose, with the distance between documents defined in terms of cosine similarity.
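Hierarchical clustering of a result set can be sketched as below. The sketch makes several assumptions not stated in the slides: distance is taken as 1 minus cosine similarity, clusters are merged bottom-up with the single-link criterion, and merging stops at a requested number of clusters. The function names and the sample documents are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def single_link_clusters(docs, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(1.0 - cosine_similarity(docs[i], docs[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Made-up result set: two documents about Kennedy, two about weather.
results = [
    {"kennedy": 1.0, "oswald": 1.0},
    {"kennedy": 1.0, "oswald": 1.0, "dallas": 1.0},
    {"weather": 1.0, "rain": 1.0},
    {"weather": 1.0, "forecast": 1.0},
]
clusters = single_link_clusters(results, n_clusters=2)
print(clusters)
```

The user could then mark a whole cluster as relevant or irrelevant instead of judging documents one by one, and the members of that cluster would serve as the feedback set.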