1
Lecture 3: Retrieval Evaluation
Maya Ramanath
2
Benchmarking IR Systems: Result Quality
– Data collection: e.g., archives of the NYTimes
– Query set: provided by experts, identified from real search logs, etc.
– Relevance judgements: for a given query, is the document relevant?
3
Evaluation for Large Collections
– Cranfield/TREC paradigm: pooling of results
– A/B testing: possible for search engines
– Crowdsourcing: let users decide
4
Precision and Recall
– Relevance judgements are binary (“relevant” or “not relevant”) and partition the collection into 2 parts.
– Precision = |relevant ∩ retrieved| / |retrieved|
– Recall = |relevant ∩ retrieved| / |relevant|
– Can a search engine guarantee 100% recall?
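A minimal sketch of both measures, assuming the retrieved results and the relevance judgements for a query are available as Python sets of document IDs (all names below are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    retrieved -- set of document IDs returned by the system
    relevant  -- set of document IDs judged relevant for the query
    """
    hits = len(retrieved & relevant)                      # relevant AND retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of 5 retrieved documents are relevant; 8 documents are relevant overall.
p, r = precision_recall({"d1", "d2", "d3", "d4", "d5"},
                        {"d1", "d3", "d5", "d7", "d8", "d9", "d10", "d11"})
print(p, r)  # 0.6 0.375
```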
5
F-measure
– Weighted harmonic mean of precision and recall: F_β = (1 + β²)·P·R / (β²·P + R); with β = 1, F1 = 2·P·R / (P + R).
– Why use the harmonic mean instead of the arithmetic mean?
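A short sketch of the weighted F-measure under the formula above; the example after the function hints at why the harmonic mean is used: it stays near zero when either precision or recall is near zero, whereas the arithmetic mean can still look respectable.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A system that retrieves everything: recall is 1.0 but precision is tiny.
p, r = 0.01, 1.0
print((p + r) / 2)      # 0.505  -- arithmetic mean looks misleadingly good
print(f_measure(p, r))  # ~0.0198 -- harmonic mean stays close to zero
```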
6
Precision-Recall Curves
– Using precision and recall to evaluate ranked retrieval: compute precision and recall after each rank position and plot precision against recall.
– Source: Introduction to Information Retrieval. Manning, Raghavan and Schütze, 2008.
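One way to obtain the points of such a curve from a single ranked result list and the set of judged-relevant documents; a sketch with illustrative names (the plotting itself is omitted):

```python
def pr_curve_points(ranking, relevant):
    """Return (recall, precision) pairs measured after each rank position."""
    points, hits = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    return points

ranking = ["d3", "d9", "d1", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
for recall, precision in pr_curve_points(ranking, relevant):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```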
7
Single Measures
– Precision at k: P@10, P@100, etc.
– …and others…
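P@k only inspects the top k positions of the ranking; a minimal sketch, reusing illustrative document IDs:

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

print(precision_at_k(["d3", "d9", "d1", "d4", "d5"], {"d1", "d3", "d5"}, k=3))  # 0.666...
```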
8
Graded Relevance – NDCG
– Highly relevant documents should have more importance than marginally relevant ones.
– The higher a relevant document is ranked, the more valuable it is to the user.
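A sketch of NDCG under one common formulation; it assumes the gain is the raw relevance grade and the discount is log2(rank + 1) (other variants, e.g. 2^rel − 1 gains, are also used), and it normalises by the ideal reordering of the same grade list for simplicity:

```python
import math

def dcg(grades):
    """Discounted cumulative gain of relevance grades given in rank order."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))

def ndcg(grades, k=None):
    """DCG normalised by the DCG of the ideal (descending-grade) ordering."""
    grades = grades[:k] if k else grades
    ideal = sorted(grades, reverse=True)
    return dcg(grades) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Graded judgements (e.g. 0 = not relevant, 1 = marginal, 2 = highly relevant)
print(ndcg([2, 0, 1, 2, 0], k=5))  # ~0.89
```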
9
Inter-judge Agreement – Fleiss’ Kappa
– N: number of results
– n: number of ratings per result
– k: number of grades
– n_ij: number of judges who agree that the i-th result should have grade j
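A sketch of Fleiss’ kappa computed directly from the counts defined above, assuming the judgements arrive as an N × k matrix whose (i, j) entry is n_ij and every result receives exactly n ratings:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an N x k matrix of judge counts per (result, grade)."""
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()                        # ratings per result (assumed constant)
    p_j = counts.sum(axis=0) / (N * n)         # proportion of all ratings in grade j
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # agreement per result
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# 4 results, 3 judges, 2 grades (relevant / not relevant)
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))  # ~0.33
```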
10
Tests of Statistical Significance
– Wilcoxon signed-rank test
– Student’s paired t-test
– …and more
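Both tests compare two systems on paired per-query scores (for example, average precision per query); a sketch using SciPy, with made-up score lists purely for illustration:

```python
from scipy import stats

# Hypothetical per-query effectiveness scores (e.g., average precision) for two systems.
system_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.73, 0.22, 0.51]
system_b = [0.45, 0.58, 0.30, 0.66, 0.52, 0.75, 0.28, 0.55]

t_stat, t_p = stats.ttest_rel(system_a, system_b)   # Student's paired t-test
w_stat, w_p = stats.wilcoxon(system_a, system_b)    # Wilcoxon signed-rank test

print(f"paired t-test: p = {t_p:.3f}")
print(f"wilcoxon:      p = {w_p:.3f}")
```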
11
END OF MODULE “IR FROM 20000FT”