1
Evaluating Information Retrieval Systems
Kostousov Sergei, Hannover, 14 June 2016
2
I. Novelty and Diversity in Information Retrieval Evaluation
An IR system has to cope with redundancy (addressed by rewarding novelty) and ambiguity (addressed by rewarding diversity). Standard evaluation measures such as MAP, bpref, and nDCG may produce unsatisfactory results when redundancy and ambiguity are taken into account.
3
I. Novelty and Diversity in Information Retrieval Evaluation
Web Search Example
4
I. Novelty and Diversity in Information Retrieval Evaluation
Question Answering Example
5
I. Novelty and Diversity in Information Retrieval Evaluation
EVALUATION FRAMEWORK
Principle: «If an IR system’s response to each query is a ranking of documents in order of decreasing probability of relevance, the overall effectiveness of the system to its user will be maximized»
Relevance of a document is modeled as a binary random variable; u denotes the information need occasioning a user to formulate the query q.
Information nuggets – any binary property of a document: an answer for a query, topicality, an indicator of part of a site, a specific fact.
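A minimal Python sketch of how relevance could be scored in this nugget-based framework, assuming (as an illustrative simplification) that nuggets are independent and that relevance means the document contains at least one nugget the information need asks for; the function name and example numbers are not from the paper.

def prob_relevant(p_nugget_in_need, p_nugget_in_doc):
    # Probability that document d is relevant to information need u,
    # modeled through independent information nuggets:
    # P(R = 1 | u, d) = 1 - prod_i (1 - P(n_i in u) * P(n_i in d))
    p_not_relevant = 1.0
    for p_u, p_d in zip(p_nugget_in_need, p_nugget_in_doc):
        p_not_relevant *= 1.0 - p_u * p_d
    return 1.0 - p_not_relevant

# Example: three nuggets; the document likely contains the first and third.
print(prob_relevant([0.8, 0.5, 0.9], [0.9, 0.0, 0.7]))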
6
I. Novelty and Diversity in Information Retrieval Evaluation
Formulate the objective function and relevance judgments: J(d, i) = 1 if the assessor has judged that d contains nugget n_i, and J(d, i) = 0 if not, allowing for the possibility of assessor error.
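One way the assessor-error assumption could be encoded, as a hedged sketch: a positive judgment J(d, i) = 1 maps to a probability alpha < 1 that the nugget is really present, rather than to certainty. The value of alpha and the function name are illustrative.

def p_nugget_in_doc(judgment, alpha=0.5):
    # Map a binary judgment J(d, i) to a probability that d actually
    # contains nugget n_i; alpha < 1 reflects possible assessor error.
    return alpha if judgment == 1 else 0.0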
7
I. Novelty and Diversity in Information Retrieval Evaluation
Ambiguity and Diversity («queries are linguistically ambiguous»); Redundancy and Novelty.
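A simplified sketch of how redundancy could be penalized and novelty rewarded under the nugget model: a nugget already covered by earlier-ranked documents contributes less to the gain of later documents. The decay factor and names are illustrative, not the paper's exact formula.

def novelty_aware_gains(judgments, alpha=0.5):
    # judgments[k][i] = J(d_k, n_i) for the document at rank k.
    # A nugget already seen in earlier documents is discounted, so
    # redundant documents earn less gain than novel ones.
    num_nuggets = len(judgments[0]) if judgments else 0
    seen = [0] * num_nuggets          # times each nugget appeared so far
    gains = []
    for row in judgments:
        p_irrelevant = 1.0
        for i, j in enumerate(row):
            p_contains = alpha * j * (1.0 - alpha) ** seen[i]
            p_irrelevant *= 1.0 - p_contains
        gains.append(1.0 - p_irrelevant)
        for i, j in enumerate(row):
            seen[i] += j
    return gains

# Example: the second document repeats the first document's only nugget.
print(novelty_aware_gains([[1, 0], [1, 0], [0, 1]]))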
8
I. Novelty and Diversity in Information Retrieval Evaluation
Normalized Discounted Cumulative Gain (nDCG):
1. gain vector
2. cumulative gain vector
3. discounted cumulative gain vector
Computing the ideal gain
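A short sketch of these steps for a standard (non-diversity) nDCG, assuming the usual log2 rank discount; in the diversity-aware setting computing the ideal gain exactly is hard, so a greedy reordering is typically used instead.

import math

def dcg(gains):
    # Discounted cumulative gain: rank k (0-based) is discounted by log2(k + 2).
    return sum(g / math.log2(k + 2) for k, g in enumerate(gains))

def ndcg(gains, all_gains):
    # Normalize by the DCG of an ideal ranking: the best available gains
    # sorted in decreasing order, truncated to the same depth.
    ideal = dcg(sorted(all_gains, reverse=True)[:len(gains)])
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Example: gain vector of one ranking vs. all gains available in the collection.
print(ndcg([3, 2, 3, 0, 1], [3, 3, 2, 2, 1, 0]))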
9
I. Novelty and Diversity in Information Retrieval Evaluation
Conclusion
The goal was to define a workable evaluation framework for information retrieval that accounts for novelty and diversity in a sound fashion. Serious criticism could be applied to many links in our chain of assumptions. Despite these concerns, we believe we have made substantial progress towards our goal. Unusual features of our approach include recognition of judging error and the ability to incorporate a user model.
10
II. Adaptive Effort for Search Evaluation Metrics
Searchers wish to find more but spend less. We need to accurately measure the amount of relevant information they found (gain) and the effort they spent (cost).
Metrics: nDCG, GAP, RBP and ERR.
Two suggested approaches: a parameter for the ratio of effort between relevant and non-relevant entries; a time-based approach that measures effort by the expected time to examine the results.
11
II. Adaptive Effort for Search Evaluation Metrics
Existing IR Evaluation Metrics
M1: E(gain)/E(effort)
M2: E(gain/effort)
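The two forms differ in where the expectation is taken. Below is a hedged sketch of one plausible reading, assuming the expectation is over the rank at which the user stops (given by a stopping distribution p_stop); the stopping model and names are illustrative, not the paper's definitions.

def m1(gain, effort, p_stop):
    # M1 = E(gain) / E(effort): expected cumulated gain over expected
    # cumulated effort, both taken over the user's stopping rank.
    e_gain = sum(p * sum(gain[:k + 1]) for k, p in enumerate(p_stop))
    e_effort = sum(p * sum(effort[:k + 1]) for k, p in enumerate(p_stop))
    return e_gain / e_effort

def m2(gain, effort, p_stop):
    # M2 = E(gain / effort): expectation of the cumulated gain-to-effort ratio.
    return sum(p * sum(gain[:k + 1]) / sum(effort[:k + 1])
               for k, p in enumerate(p_stop))

gain, effort, p_stop = [1, 0, 1], [1, 1, 1], [0.5, 0.3, 0.2]
print(m1(gain, effort, p_stop), m2(gain, effort, p_stop))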
12
II. Adaptive Effort for Search Evaluation Metrics
13
II. Adaptive Effort for Search Evaluation Metrics
Adaptive Effort Metrics
1. Parameter-based: a parameter for the ratio of effort between relevant and non-relevant entries.
2. Time-based: for relevance grades 0, 1, …, r_max, an effort vector [e_0, e_1, e_2, ..., e_{r_max}]. With relevance grades r = 0, 1, 2, the effort vector is [e_0, e_1, e_2].
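A minimal sketch of the time-based idea: the effort charged for examining a result depends on its relevance grade via the effort vector. The concrete values below are purely illustrative assumptions, not the measured times from the paper.

def per_result_effort(relevance, effort_by_grade):
    # effort_by_grade = [e0, e1, ..., e_rmax]: expected examination effort
    # (e.g. time) for each relevance grade.
    return [effort_by_grade[r] for r in relevance]

# Example with grades r = 0, 1, 2: more relevant results are assumed
# to take longer to examine, so the effort vector is [e0, e1, e2].
relevance = [2, 0, 1, 0]
print(per_result_effort(relevance, effort_by_grade=[1.0, 2.0, 3.0]))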
14
II. Adaptive Effort for Search Evaluation Metrics
Computation: relevance grades and the corresponding effort vectors.
15
II. Adaptive Effort for Search Evaluation Metrics
16
II. Adaptive Effort for Search Evaluation Metrics
Experiment
17
II. Adaptive Effort for Search Evaluation Metrics
Conclusion
Adaptive effort metrics can better indicate users' search experience compared with static ones.
Future research: a broad set of queries of different types; explore the effect of different effort levels.