1
Lecture 3: Retrieval Evaluation
Maya Ramanath
2
Benchmarking IR Systems: Result Quality
– Data collection: e.g., archives of the NYTimes
– Query set: provided by experts, identified from real search logs, etc.
– Relevance judgements: for a given query, is the document relevant?
3
Evaluation for Large Collections
– Cranfield/TREC paradigm: pooling of results
– A/B testing: possible for search engines
– Crowdsourcing: let users decide
4
Precision and Recall
– Relevance judgements are binary (“relevant” or “not relevant”) and partition the collection into 2 parts.
– Precision = |relevant ∩ retrieved| / |retrieved|
– Recall = |relevant ∩ retrieved| / |relevant|
– Can a search engine guarantee 100% recall?
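A minimal sketch of both measures, assuming the retrieved results and the relevance judgements for a query are available as Python sets of document IDs (all names below are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    retrieved -- set of document IDs returned by the system
    relevant  -- set of document IDs judged relevant for the query
    """
    hits = len(retrieved & relevant)                      # relevant AND retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of 5 retrieved documents are relevant; 8 documents are relevant overall.
p, r = precision_recall({"d1", "d2", "d3", "d4", "d5"},
                        {"d1", "d3", "d5", "d7", "d8", "d9", "d10", "d11"})
print(p, r)  # 0.6 0.375
```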
5
F-measure
– Weighted harmonic mean of precision and recall: F_β = (1 + β²)·P·R / (β²·P + R); with β = 1, F1 = 2·P·R / (P + R).
– Why use the harmonic mean instead of the arithmetic mean?
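A short sketch of the weighted F-measure under the formula above; the example after the function hints at why the harmonic mean is used: it stays near zero when either precision or recall is near zero, whereas the arithmetic mean can still look respectable.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A system that retrieves everything: recall is 1.0 but precision is tiny.
p, r = 0.01, 1.0
print((p + r) / 2)      # 0.505  -- arithmetic mean looks misleadingly good
print(f_measure(p, r))  # ~0.0198 -- harmonic mean stays close to zero
```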
6
Precision-Recall Curves
– Using precision and recall to evaluate ranked retrieval: compute precision and recall after each rank position and plot precision against recall.
– Source: Introduction to Information Retrieval. Manning, Raghavan and Schütze, 2008.
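One way to obtain the points of such a curve from a single ranked result list and the set of judged-relevant documents; a sketch with illustrative names (the plotting itself is omitted):

```python
def pr_curve_points(ranking, relevant):
    """Return (recall, precision) pairs measured after each rank position."""
    points, hits = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    return points

ranking = ["d3", "d9", "d1", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
for recall, precision in pr_curve_points(ranking, relevant):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```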
7
Single Measures
– Precision at k: P@10, P@100, etc.
– …and others…
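P@k only inspects the top k positions of the ranking; a minimal sketch, reusing illustrative document IDs:

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

print(precision_at_k(["d3", "d9", "d1", "d4", "d5"], {"d1", "d3", "d5"}, k=3))  # 0.666...
```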
8
Graded Relevance – NDCG
– Highly relevant documents should have more importance than marginally relevant ones.
– The higher a relevant document is ranked, the more valuable it is to the user.
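A sketch of NDCG under one common formulation; it assumes the gain is the raw relevance grade and the discount is log2(rank + 1) (other variants, e.g. 2^rel − 1 gains, are also used), and it normalises by the ideal reordering of the same grade list for simplicity:

```python
import math

def dcg(grades):
    """Discounted cumulative gain of relevance grades given in rank order."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades, start=1))

def ndcg(grades, k=None):
    """DCG normalised by the DCG of the ideal (descending-grade) ordering."""
    grades = grades[:k] if k else grades
    ideal = sorted(grades, reverse=True)
    return dcg(grades) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Graded judgements (e.g. 0 = not relevant, 1 = marginal, 2 = highly relevant)
print(ndcg([2, 0, 1, 2, 0], k=5))  # ~0.89
```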
9
Inter-judge Agreement – Fleiss’ Kappa
– N: number of results
– n: number of ratings per result
– k: number of grades
– n_ij: number of judges who agree that the i-th result should have grade j
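A sketch of Fleiss’ kappa computed directly from the counts defined above, assuming the judgements arrive as an N × k matrix whose (i, j) entry is n_ij and every result receives exactly n ratings:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for an N x k matrix of judge counts per (result, grade)."""
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()                        # ratings per result (assumed constant)
    p_j = counts.sum(axis=0) / (N * n)         # proportion of all ratings in grade j
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # agreement per result
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# 4 results, 3 judges, 2 grades (relevant / not relevant)
print(fleiss_kappa([[3, 0], [2, 1], [0, 3], [1, 2]]))  # ~0.33
```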
10
Tests of Statistical Significance
– Wilcoxon signed-rank test
– Student’s paired t-test
– …and more
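Both tests compare two systems on paired per-query scores (for example, average precision per query); a sketch using SciPy, with made-up score lists purely for illustration:

```python
from scipy import stats

# Hypothetical per-query effectiveness scores (e.g., average precision) for two systems.
system_a = [0.42, 0.55, 0.31, 0.60, 0.48, 0.73, 0.22, 0.51]
system_b = [0.45, 0.58, 0.30, 0.66, 0.52, 0.75, 0.28, 0.55]

t_stat, t_p = stats.ttest_rel(system_a, system_b)   # Student's paired t-test
w_stat, w_p = stats.wilcoxon(system_a, system_b)    # Wilcoxon signed-rank test

print(f"paired t-test: p = {t_p:.3f}")
print(f"wilcoxon:      p = {w_p:.3f}")
```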
11
END OF MODULE “IR FROM 20000FT”