Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign
2 Normal Relevance Feedback (RF) [Diagram: the user issues a query; the retrieval system returns the top K results d_1, d_2, …, d_k from the document collection; the user judges them (d_1 +, d_2 −, …, d_k −); the feedback judgments go back to the retrieval system.]
3 Document Selection in RF [Diagram: the same feedback loop, but the retrieval system now has to decide which K docs to present for judgment.] Can we do better than just presenting the top K? (Consider diversity…)
4 Active Feedback (AF) An IR system actively selects documents for obtaining relevance judgments If a user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?
5 Outline Framework and specific methods Experiment design and results Summary and future work
6 A Framework for Active Feedback Consider active feedback as a decision problem – decide which K documents (D) to present for relevance judgment. Formalize it as an optimization problem – minimize the expected loss (i.e., maximize the expected learning benefit) of requesting relevance judgments on D from the user. Consider two cases of the loss function, depending on whether the selected documents interact with each other (independent vs. dependent loss).
7 Formula of the Framework [Formula with annotations: value of documents for learning; independent judgment; different judgments.]
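The following is a hedged sketch of the kind of objective slides 6–8 describe; the notation (C, Θ, θ, l and the posterior over θ) is assumed here for illustration, not quoted from the slides.

```latex
% Hedged sketch: choose the K-document set D whose expected loss, taken over
% the uncertainty about the underlying model \theta, is smallest (notation
% assumed, not quoted from the slides).
D^{*} = \arg\min_{D \subseteq \mathcal{C},\, |D| = K}
        \int_{\Theta} L(D,\theta)\, p(\theta \mid q, U, \mathcal{C})\, d\theta
% Independent-loss case (slide 8): the loss of the set decomposes into the
% expected loss of each selected document.
L(D,\theta) = \sum_{d \in D} l(d,\theta)
```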
8 Independent Loss [Formula: the loss of the selected set decomposes into the expected loss of each document.]
9 Independent Loss (cont.) Two instantiations: Top K (relevant docs are more useful than non-relevant docs) and Uncertainty Sampling (the more uncertain a document, the more useful its judgment). A sketch of both selection rules follows.
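A minimal sketch of the two selection rules, assuming baseline relevance scores in [0, 1]; the variable names (`scored_docs`, `prob_relevant`) are illustrative, not from the slides.

```python
# Hedged sketch of the two independent-loss selection rules.
# `scored_docs` is a list of (doc_id, prob_relevant) pairs.

def top_k(scored_docs, k):
    """Present the k documents the baseline ranks as most likely relevant."""
    return sorted(scored_docs, key=lambda x: x[1], reverse=True)[:k]

def uncertainty_sampling(scored_docs, k):
    """Present the k documents the baseline is least certain about
    (probability of relevance closest to 0.5)."""
    return sorted(scored_docs, key=lambda x: abs(x[1] - 0.5))[:k]

# Example:
# docs = [("d1", 0.9), ("d2", 0.55), ("d3", 0.2), ("d4", 0.48)]
# top_k(docs, 2)                -> [("d1", 0.9), ("d2", 0.55)]
# uncertainty_sampling(docs, 2) -> [("d4", 0.48), ("d2", 0.55)]
```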
10 Dependent Loss Heuristics that reward both relevance and diversity (more relevant, more useful; more diverse, more useful): Gapped Top K – pick one doc out of every G+1 in the baseline ranking; K Cluster Centroid – first select the top N docs of the baseline retrieval, cluster them into K clusters, and present the K cluster centroids; MMR; … (a sketch of Gapped Top K follows).
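A hedged sketch of Gapped Top K under the description above; `ranked_docs` and `gap` are illustrative names, not from the slides.

```python
# Gapped Top-K: walk down the baseline ranking and keep one document out of
# every G+1, so the presented set is spread over the ranked list instead of
# being concentrated at the very top.

def gapped_top_k(ranked_docs, k, gap):
    """Pick ranks 1, G+2, 2(G+1)+1, ... until k documents are selected."""
    return ranked_docs[::gap + 1][:k]

# Example: with gap G = 2, the first three picks are ranks 1, 4, and 7.
# gapped_top_k(["d1", "d2", "d3", "d4", "d5", "d6", "d7"], 3, 2)
# -> ["d1", "d4", "d7"]
```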
11 Illustration of Three AF Methods [Diagram contrasting the documents selected by Top-K (normal feedback), Gapped Top-K, and K-Cluster Centroid; the latter two aim at high diversity.]
12 Evaluating Active Feedback [Diagram: for each query, produce the initial results with no feedback; select K docs with an AF method (Top-K, Gapped, Clustering); look up the K docs in the judgment file; feed the judged docs back to produce the feedback results.]
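A rough sketch of the simulated-feedback loop the diagram describes; `run_retrieval`, `select_docs`, and `judgments` are hypothetical stand-ins (not Lemur APIs).

```python
# Hedged sketch of the evaluation loop.
# run_retrieval(query, feedback_docs) -> ranked list of doc ids,
# select_docs(ranking, k) -> one of the AF methods (Top-K, Gapped, Clustering),
# judgments -> dict mapping (query_id, doc_id) to True/False relevance.

def evaluate_active_feedback(query_id, query, run_retrieval, select_docs,
                             judgments, k):
    # 1. Initial retrieval with no feedback.
    initial_ranking = run_retrieval(query, feedback_docs=[])
    # 2. Actively select K documents to "show the user".
    presented = select_docs(initial_ranking, k)
    # 3. Look up their relevance in the judgment file.
    judged_relevant = [d for d in presented
                       if judgments.get((query_id, d), False)]
    # 4. Re-run retrieval using only the judged relevant docs as feedback
    #    (slide 13: learn from relevant docs only).
    feedback_ranking = run_retrieval(query, feedback_docs=judged_relevant)
    return initial_ranking, feedback_ranking
```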
13 Retrieval Methods (Lemur toolkit) [Diagram: query Q and document D are matched with the KL-divergence retrieval model; active feedback selects the feedback docs F = {d_1, …, d_n}; the query model is updated with mixture-model feedback, learning only from the relevant docs.] Default parameter settings unless otherwise stated.
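For reference, a sketch of the usual KL-divergence scoring and mixture-model feedback as implemented in Lemur; the symbols (α, θ_F) and the exact form are assumptions based on the standard formulation, not quoted from the slides.

```latex
% Hedged sketch: documents are ranked by the (negative) KL divergence between
% the query language model and the document language model.
\mathrm{score}(Q, D) \;\propto\; -\,D\!\left(\theta_Q \,\|\, \theta_D\right)
  \;=\; -\sum_{w} p(w \mid \theta_Q)\,
        \log \frac{p(w \mid \theta_Q)}{p(w \mid \theta_D)}
% Mixture-model feedback (assumed form): the feedback docs F are modeled as a
% mixture of a topic model \theta_F and the collection background; \theta_F is
% estimated (e.g., by EM) and interpolated into the query model.
\theta_Q' \;=\; (1 - \alpha)\,\theta_Q + \alpha\,\theta_F
```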
14 Comparison of Three AF Methods [Table: MAP on HARD 2003 and AP88-89 for Baseline, Pseudo FB, Top-K, Gapped, and Clustering, along with #AFRel per topic and results that include the judged docs; * / ** mark statistically significant differences (numbers omitted).] Top-K is the worst! Clustering uses the fewest relevant docs.
15 Appropriate Evaluation of Active Feedback Three test settings: the original DB including the judged docs (AP88-89, HARD) – can't tell whether the ranking of un-judged documents is improved; the original DB excluding the judged docs – different methods end up with different test documents; a new DB (learn on AP88-89, test on AP90) – shows the learning effect more explicitly, but the new docs must be similar to the original docs.
16 Retrieval Performance on AP90 Dataset [Table: MAP of Baseline, Pseudo FB, Top K, Gapped Top K, and K Cluster Centroid (numbers omitted).] Top-K is consistently the worst!
17 Mixture Model Parameter Factor [Plot: effect of the mixture-model parameter on feedback performance.]
18 Summary Introduce the active feedback problem Propose a preliminary framework and three methods (Top-K, Gapped Top-K, Clustering) Study the evaluation strategy Experimental results show that – presenting the top K is not the best strategy – clustering can generate fewer but higher-quality feedback examples
19 Future Work Explore other methods for active feedback Develop a general framework Combine pseudo feedback and active feedback
20 Thank you! The End