1
Evaluating Evaluation Measure Stability
Authors: Chris Buckley, Ellen M. Voorhees
Presenters: Burcu Dal, Esra Akbaş
2
Retrieval System Evaluation
Experiments on the accuracy of evaluation measures
Requirements for acceptable experiments:
- a reasonable number of requests (topics)
- a reasonable evaluation measure
- a reasonable notion of difference
A test collection consists of a set of documents, a set of topics, and a set of relevance judgments.
3
Retrieval System Evaluation (2)
Each retrieval strategy produces a ranked list of documents for each topic.
The list is ordered by decreasing likelihood of relevance.
The effectiveness of a strategy is computed as a function of the ranks at which the relevant documents are retrieved.
4
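IR Measures
- Prec(λ): precision after λ documents have been retrieved
- Recall(1000): recall after 1000 documents have been retrieved
- Prec at .5 Recall: precision at a recall level of 0.5
- R-Prec: precision after R documents have been retrieved, where R is the number of relevant documents
- Average Precision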
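The measures listed above can be sketched in Python for a single topic, given a ranked list of document IDs and the set of relevant IDs. This is a minimal illustration, not the authors' evaluation code; the function names are hypothetical, and "Prec at .5 Recall" is assumed to use the common interpolation rule (best precision at any rank whose recall is at least 0.5).

```python
from typing import Sequence, Set


def precision_at(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Prec(k): fraction of the top k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at(ranked: Sequence[str], relevant: Set[str], k: int = 1000) -> float:
    """Recall(k): fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """R-Prec: precision after R documents, where R = number of relevant documents."""
    r = len(relevant)
    return precision_at(ranked, relevant, r) if r else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document;
    relevant documents that are never retrieved contribute 0."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def prec_at_recall(ranked: Sequence[str], relevant: Set[str], level: float = 0.5) -> float:
    """Interpolated precision at a recall level: the best precision reached at
    any rank whose recall is at least `level` (assumed interpolation rule)."""
    if not relevant:
        return 0.0
    best, hits = 0.0, 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            if hits / len(relevant) >= level:
                best = max(best, hits / rank)
    return best
```

For example, with `ranked = ["d1", "d2", "d3", "d4", "d5"]` and `relevant = {"d1", "d3", "d6"}`, average precision is (1/1 + 2/3) / 3 ≈ 0.556, because the unretrieved document d6 contributes 0. The effectiveness of a method on a query set would then be the mean of such per-topic values.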
5
Computing error rate
Goal: quantify the error rate associated with deciding that one retrieval method is better than another.
The error rate is based on an experiment with:
- a particular number of topics
- a specific evaluation measure
- a particular value used as the fuzziness value
6
- Select an evaluation measure and a fuzziness value.
- Pick a query set and evaluate each of the nine retrieval methods on it; repeat for every query set.
- For each pair of methods, decide whether the first method is better than, worse than, or equal to the second with respect to the fuzziness value.
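The comparison step above can be sketched as follows. The tie rule is an assumption made for illustration: two scores count as equal when their difference is smaller than the fuzziness value times the larger score (5% by default). The names `compare` and `count_outcomes` are hypothetical, and the input is assumed to already hold one mean score per method per query set.

```python
from itertools import combinations
from typing import Dict, List


def compare(score_a: float, score_b: float, fuzziness: float = 0.05) -> str:
    """Return '>', '<' or '=' for method A versus method B.
    Assumed tie rule: |a - b| < fuzziness * max(a, b)."""
    larger = max(score_a, score_b)
    if larger == 0 or abs(score_a - score_b) < fuzziness * larger:
        return "="
    return ">" if score_a > score_b else "<"


def count_outcomes(scores: Dict[str, List[float]], fuzziness: float = 0.05):
    """scores[method][q] = mean measure value of `method` on query set q.
    Returns {(A, B): {'>': n, '<': n, '=': n}} for every pair of methods,
    counting one decision per query set."""
    counts = {}
    for a, b in combinations(sorted(scores), 2):
        tally = {">": 0, "<": 0, "=": 0}
        for sa, sb in zip(scores[a], scores[b]):
            tally[compare(sa, sb, fuzziness)] += 1
        counts[(a, b)] = tally
    return counts
```

With two methods scored on four query sets, `{"A": [0.30, 0.25, 0.31, 0.30], "B": [0.28, 0.27, 0.20, 0.29]}`, the pair (A, B) gets two "better" decisions, one "worse", and one tie (0.30 vs 0.29 differs by less than 5% of 0.30). This corresponds to one cell of a count table like Figure 1.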
7
Figure 1: Counts of the number of times the retrieval method of the row was better than, worse than, or equal to the method of the column. Counts were computed using a fuzziness factor of 5% and the original 21 query sets.
8
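Error rate = Σ min(|A > B|, |A < B|) / Σ (|A > B| + |A < B| + |A == B|)
|A > B| is the number of times method A is better than method B in an entry of the count table.
The number of times methods are deemed equivalent reflects the power of a measure to discriminate among systems.
Proportion of ties = Σ |A == B| / Σ (|A > B| + |A < B| + |A == B|)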
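The two ratios above can be computed directly from the per-pair tallies. This is a minimal sketch that assumes each tally is a dict with the keys '>', '<' and '='; the function name is hypothetical. The smaller of |A > B| and |A < B| is counted as the number of erroneous decisions for that pair.

```python
from typing import Dict, Iterable, Tuple


def error_rate_and_ties(tallies: Iterable[Dict[str, int]]) -> Tuple[float, float]:
    """Return (error rate, proportion of ties) summed over all method pairs.
    error rate = sum(min(|A>B|, |A<B|)) / sum(|A>B| + |A<B| + |A==B|)
    ties       = sum(|A==B|)            / sum(|A>B| + |A<B| + |A==B|)"""
    errors = ties = total = 0
    for t in tallies:
        errors += min(t[">"], t["<"])
        ties += t["="]
        total += t[">"] + t["<"] + t["="]
    if total == 0:
        return 0.0, 0.0
    return errors / total, ties / total
```

For the tallies `[{">": 2, "<": 1, "=": 1}, {">": 0, "<": 3, "=": 1}]`, there is 1 error and there are 2 ties out of 8 decisions, giving an error rate of 0.125 and a proportion of ties of 0.25.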
9
Average error rate and average proportion of ties for different evaluation measures.
10
Varying topic set size
Investigate how changing the number of topics used in a test affects the error rate of the evaluation measures.
Look at topic set sizes of 5, 10, 15, 20, 25, 30, 40, and 50.
Run 100 trials for each topic set size.
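The trial setup above can be sketched as repeated random sampling of topic subsets. The slide gives the sizes and the 100 trials, but it does not say how subsets are drawn. This sketch assumes each trial draws topics uniformly at random, without replacement, from a 50-topic collection. The function names and the fixed seed are hypothetical choices made for reproducibility.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple

TOPIC_SET_SIZES = [5, 10, 15, 20, 25, 30, 40, 50]
TRIALS = 100


def sample_topic_sets(n_topics: int = 50,
                      sizes: Sequence[int] = TOPIC_SET_SIZES,
                      trials: int = TRIALS,
                      seed: int = 0) -> Iterator[Tuple[int, List[int]]]:
    """Yield (size, topic_indices) for `trials` random subsets of each size.
    Topics within one subset are distinct (sampling without replacement)."""
    rng = random.Random(seed)
    for size in sizes:
        for _ in range(trials):
            yield size, rng.sample(range(n_topics), size)


def mean_over_topics(per_topic: Sequence[float], topics: Sequence[int]) -> float:
    """Mean score of one method on one query set, restricted to a topic subset."""
    return mean(per_topic[t] for t in topics)
```

For each sampled subset, each method's mean score would be recomputed on every query set using only those topics. The pairwise decisions and the error rate then follow as before, and the results are averaged over the 100 trials of each size.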
12
Varying fuzziness values
Larger fuzziness values decrease the error rate, but they also decrease the discrimination power of the measure.
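The trade-off above can be illustrated with a small sweep. As the fuzziness value grows, a decision can only switch from "better" or "worse" to "tie", never the reverse. Under the assumed tie rule |a − b| < fuzziness × max(a, b), the proportion of ties therefore never falls and the error rate never rises. The data and function names here are hypothetical toy values, not results from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def outcome(a: float, b: float, fuzziness: float) -> str:
    """'>', '<' or '='; assumed tie rule: |a - b| < fuzziness * max(a, b)."""
    larger = max(a, b)
    if larger == 0 or abs(a - b) < fuzziness * larger:
        return "="
    return ">" if a > b else "<"


def sweep(pairs: List[Tuple[Sequence[float], Sequence[float]]],
          fuzz_values: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """pairs: (scores_a, scores_b) per method pair, one score per query set.
    Returns {fuzziness: (error_rate, proportion_of_ties)}."""
    results = {}
    for f in fuzz_values:
        errors = ties = total = 0
        for sa, sb in pairs:
            tally = {">": 0, "<": 0, "=": 0}
            for a, b in zip(sa, sb):
                tally[outcome(a, b, f)] += 1
            errors += min(tally[">"], tally["<"])
            ties += tally["="]
            total += sum(tally.values())
        results[f] = (errors / total, ties / total)
    return results
```

On the toy pair `([0.30, 0.25, 0.31, 0.30], [0.28, 0.27, 0.20, 0.29])`, raising the fuzziness from 0 to 5% to 10% drops the error rate from 0.25 to 0.25 to 0.0. Over the same steps, the proportion of ties rises from 0.0 to 0.25 to 0.75. The errors disappear only because the measure stops distinguishing most of the pairs.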
13
The effect of fuzziness value on average error rate.
14
Conclusion
The error rate depends on:
- topic set size
- query size
- fuzziness value
- evaluation measure
15
Thanks