Evaluating Information Retrieval Systems
Sergei Kostousov (sergkosto94@gmail.com)
Hannover, 14 June 2016
I. Novelty and Diversity in Information Retrieval Evaluation
- An IR system should penalize redundancy and reward novelty, and should handle ambiguity by rewarding diversity.
- Common measures: MAP, bpref, nDCG.
- These evaluation measures may produce unsatisfactory results when redundancy and ambiguity are taken into account.
Web Search Example (figure)
Question Answering Example (figure)
Evaluation Framework
Principle: «If an IR system's response to each query is a ranking of documents in order of decreasing probability of relevance, the overall effectiveness of the system to its user will be maximized».
- Relevance of a document is modelled as a binary random variable.
- An information need occasions a user to formulate the query q.
- Information nuggets: any binary property of a document, e.g. an answer to a question, topicality, an indicator of a part of a site, or a specific fact.
Objective Function and Relevance Judgments
- J(d, i) = 1 if the assessor has judged that document d contains nugget n_i, and J(d, i) = 0 otherwise.
- The framework allows for the possibility of assessor error: a positive judgment is treated as evidence, not certainty, that the nugget is present (see the sketch below).
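A minimal sketch of how an objective function can be formed from these judgments: model the probability that a document is relevant as "at least one nugget the user needs is present", with an assessor-accuracy parameter accounting for judging error. The function name, the alpha parameter, and the example values are assumptions for illustration, not taken from the slides.

```python
def prob_relevant(judgments, alpha=0.9):
    """
    Probability that a document is relevant to the information need,
    modelled as "at least one nugget is present in the document".
    judgments: list of J(d, i) in {0, 1}, one entry per nugget n_i.
    alpha: assumed probability that a positive judgment is correct
           (accounts for assessor error); a hypothetical default.
    """
    p_no_nugget = 1.0
    for j in judgments:
        p_contains_nugget = alpha * j       # P(n_i in d) given the judgment
        p_no_nugget *= (1.0 - p_contains_nugget)
    return 1.0 - p_no_nugget

# Example: three nuggets, the first and third judged present.
print(prob_relevant([1, 0, 1], alpha=0.9))
```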
Ambiguity and Diversity: «queries are linguistically ambiguous», so a ranking may need to cover several interpretations of the query.
Redundancy and Novelty: a document that repeats nuggets already seen earlier in the ranking should contribute less.
Normalized Discounted Cumulative Gain (nDCG)
1. Gain vector
2. Cumulative gain vector
3. Discounted cumulative gain vector
Normalization: computing the ideal gain (sketch below).
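A minimal sketch of these steps: accumulate gain with a rank discount and normalize by the DCG of the ideal (best possible) ranking. The log2(k+1) discount is the common textbook choice, and the example gain values are illustrative; neither is specified on the slide.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at rank k (1-based) is discounted by log2(k + 1)."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

def ndcg(gains, ideal_gains):
    """Normalize the DCG of the actual ranking by the DCG of the ideal ordering of the gains."""
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical gain vector for a ranked list of five documents.
gains = [3, 2, 3, 0, 1]
print(ndcg(gains, gains))  # the ideal ranking would be [3, 3, 2, 1, 0]
```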
Conclusion (Part I)
- The goal was to define a workable evaluation framework for information retrieval that accounts for novelty and diversity in a sound fashion.
- Serious criticism could be applied to many links in the chain of assumptions.
- Despite these concerns, the authors believe substantial progress has been made towards the goal. Unusual features of the approach include the recognition of judging error and the ability to incorporate a user model.
II. Adaptive Effort for Search Evaluation Metrics
- Searchers wish to find more but spend less effort.
- We need to measure accurately both the amount of relevant information they find (gain) and the effort they spend (cost).
- Metrics considered: nDCG, GAP, RBP and ERR.
- Two suggested approaches: (1) a parameter for the ratio of effort between relevant and non-relevant entries; (2) a time-based approach that measures effort by the expected time to examine the results.
Existing IR Evaluation Metrics: two ways to combine gain and effort
- M1: E(gain) / E(effort)
- M2: E(gain / effort)
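The two formulations differ in where the expectation is taken: M1 divides the expected gain by the expected effort, while M2 averages the gain/effort ratio over the possible stopping positions. A small sketch under assumed stopping probabilities; the function names and example numbers are illustrative, not values from the slides.

```python
def m1(gains, efforts, stop_probs):
    """Ratio of expectations: E[gain] / E[effort], expectation taken over
    the rank position at which the user stops reading."""
    cum_gain = cum_effort = 0.0
    e_gain = e_effort = 0.0
    for g, e, p in zip(gains, efforts, stop_probs):
        cum_gain += g
        cum_effort += e
        e_gain += p * cum_gain
        e_effort += p * cum_effort
    return e_gain / e_effort

def m2(gains, efforts, stop_probs):
    """Expectation of ratios: E[gain / effort] over the stopping positions."""
    cum_gain = cum_effort = 0.0
    value = 0.0
    for g, e, p in zip(gains, efforts, stop_probs):
        cum_gain += g
        cum_effort += e
        value += p * (cum_gain / cum_effort)
    return value

# Illustrative values: three results, equal stopping probability at each rank.
gains = [1, 0, 1]
efforts = [1.0, 1.0, 1.0]
stop_probs = [1 / 3, 1 / 3, 1 / 3]
print(m1(gains, efforts, stop_probs), m2(gains, efforts, stop_probs))
```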
Adaptive Effort Metrics
1. Parameterized: a parameter for the ratio of effort between relevant and non-relevant entries.
2. Time-based: for relevance grades 0, 1, ..., rmax, define an effort vector [e0, e1, e2, ..., ermax]; with relevance grades r = 0, 1, 2 the effort vector is [e0, e1, e2] (see the sketch below).
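A sketch of how the time-based variant could feed the gain/effort computation above: each result's effort is looked up from the effort vector by its relevance grade (expected examination time), instead of counting a constant cost per rank. The specific effort values and the graded-gain mapping below are hypothetical.

```python
# Hypothetical expected examination times (seconds) for grades r = 0, 1, 2.
effort_vector = [4.0, 14.0, 30.0]  # e0, e1, e2 -- illustrative values only

def efforts_from_grades(grades, effort_vector):
    """Map each result's relevance grade to its expected examination effort."""
    return [effort_vector[r] for r in grades]

grades = [2, 0, 1]                     # relevance grades of the ranked results
efforts = efforts_from_grades(grades, effort_vector)
gains = [2 ** r - 1 for r in grades]   # an assumed graded-gain mapping
print(efforts, gains)                  # these vectors would then be fed into M1 or M2
```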
Computation: example mapping from relevance grade to effort vector (table).
Experiment (results figure/table)
Conclusion (Part II)
- Adaptive effort metrics can better indicate users' search experience compared with static ones.
- Future research: evaluate on a broad set of queries of different types, and explore the effect of different effort levels.