1
Retrieval Evaluation
2
Introduction
Implementations in computer science are often evaluated in terms of time and space complexity. With large document sets or large content types, such performance evaluations remain valid. In information retrieval, we also care about retrieval performance evaluation, that is, how well the retrieved documents match the user's goal.
3
Retrieval Performance Evaluation
We discussed overall system evaluation previously:
– Traditional vs. berry-picking models of retrieval activity
– Metrics include time to complete task, user satisfaction, user errors, time to learn system
But how can we compare how well different algorithms do at retrieving documents?
4
Precision and Recall
Consider a document collection, a query and its results, and a task and its relevant documents. Within the document collection:
– Relevant documents: |R|
– Retrieved documents (the answer set): |A|
– Relevant documents in the answer set: |Ra|
5
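Precision
Precision: the fraction of retrieved documents that are relevant, usually quoted as a percentage.
Precision = |Ra| / |A|
[Figure: the document collection, containing the relevant documents |R|, the retrieved documents |A|, and their overlap, the relevant documents in the answer set |Ra|]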
6
Recall
Recall: the fraction of relevant documents that are retrieved, usually quoted as a percentage.
Recall = |Ra| / |R|
[Figure: the document collection, containing the relevant documents |R|, the retrieved documents |A|, and their overlap, the relevant documents in the answer set |Ra|]
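The two definitions can be sketched in a few lines of Python. This is only an illustration: the function name and the sample document IDs are assumptions, not part of the slides.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query.

    relevant  -- set R of documents relevant to the task
    retrieved -- set A of documents returned for the query
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # Ra: relevant documents in the answer set
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant documents, 5 retrieved, 2 in common.
R = {"d1", "d2", "d3", "d4"}
A = {"d2", "d4", "d7", "d8", "d9"}
p, r = precision_recall(R, A)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.40, recall = 0.50
```

Returning 0.0 for an empty answer set or an empty relevant set is a convention chosen here for convenience; other conventions are possible.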
7
Precision/Recall Trade-Off
We can guarantee 100% recall by returning all documents in the collection …
– Obviously, this is a bad idea: precision falls to the share of the whole collection that is relevant.
We can get a high precision rate by returning only documents that we are sure of.
– Maybe a bad idea, since recall is then likely to be low.
So, retrieval algorithms are characterized by their precision/recall curve.
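A small sketch can make the trade-off concrete. The ranking and relevance judgments below are hypothetical, invented only for illustration; the function simply cuts the ranked list at the top k documents.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when only the top-k ranked documents are returned."""
    top_k = ranking[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical collection of 10 documents, 3 of them relevant.
ranking = ["d3", "d8", "d1", "d5", "d9", "d2", "d7", "d4", "d6", "d10"]
relevant = {"d3", "d1", "d7"}

for k in (1, 3, 10):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Returning only the top document gives precision 1.00 but recall 0.33, while returning the whole collection gives recall 1.00 but precision 0.30.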
8
Plotting Precision/Recall Curve
11-Level Precision/Recall Graph
– Plot precision at 0%, 10%, 20%, …, 100% recall.
– Normally, averages over a set of standard queries are used: $P_{\mathrm{avg}}(r) = \sum_{i=1}^{N_q} P_i(r) / N_q$, where $N_q$ is the number of queries and $P_i(r)$ is the precision of query $i$ at recall level $r$.
Example (using one query):
Relevant Documents: $R_q = \{d_1, d_2, d_3, d_4, d_5, d_6, d_7, d_8, d_9, d_{10}\}$
Ordered Ranking by Retrieval Algorithm: $A_q = \{d_{10}, d_{27}, d_7, d_{44}, d_{35}, d_3, d_{73}, d_{82}, d_{19}, d_4, d_{29}, d_{33}, d_{48}, d_{54}, d_1\}$
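The points of the curve for this one-query example can be worked out by walking down the ranking and recording recall and precision each time a relevant document appears. The following is a minimal sketch; the function name is an assumption, and the data is the example above.

```python
def precision_at_recall_points(ranking, relevant):
    """Return (recall, precision) each time a relevant document appears in the ranking."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


relevant_q1 = {f"d{i}" for i in range(1, 11)}  # R_q = {d1, ..., d10}
ranking = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
           "d19", "d4", "d29", "d33", "d48", "d54", "d1"]

for recall, precision in precision_at_recall_points(ranking, relevant_q1):
    print(f"recall {recall:.0%}: precision {precision:.0%}")
```

For this query the relevant documents appear at ranks 1, 3, 6, 10, and 15, giving precision of 100%, 67%, 50%, 40%, and 33% at recall levels 10% through 50%. Recall never exceeds 50% because only five of the ten relevant documents are retrieved.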
9
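Plotting Precision/Recall Curve
Example (second query):
Relevant Documents: $R_q = \{d_1, d_7, d_{82}\}$
Ordered Ranking by Retrieval Algorithm: $A_q = \{d_{10}, d_{27}, d_7, d_{44}, d_{35}, d_3, d_{73}, d_{82}, d_{19}, d_4, d_{29}, d_{33}, d_{48}, d_{54}, d_1\}$
With three relevant documents, the observed recall levels are 33.3%, 66.7%, and 100%, none of which except 100% falls on the 11 standard levels, so we need to interpolate.
Then plot the average over a set of queries that matches the expected usage and distribution.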
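The slides do not spell out the interpolation rule. The sketch below assumes the commonly used one: the interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r, and 0 if that level is never reached. The function names are illustrative, and averaging the two example queries is done only to show the $P_{\mathrm{avg}}(r)$ computation.

```python
def interpolated_11_point(ranking, relevant):
    """11-level interpolated precision for one query.

    Assumed rule: precision at recall level r is the highest precision
    observed at any recall >= r (0.0 if that level is never reached).
    """
    points = []  # (recall, precision) at each relevant document in the ranking
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    # The small tolerance avoids floating-point misses at exact levels.
    return [max((p for r, p in points if r >= level - 1e-9), default=0.0)
            for level in levels]


def average_over_queries(curves):
    """P_avg(r): mean of the per-query precision at each recall level."""
    return [sum(values) / len(curves) for values in zip(*curves)]


ranking = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
           "d19", "d4", "d29", "d33", "d48", "d54", "d1"]
q1 = interpolated_11_point(ranking, {f"d{i}" for i in range(1, 11)})
q2 = interpolated_11_point(ranking, {"d1", "d7", "d82"})
avg = average_over_queries([q1, q2])

for level, p in zip(range(0, 101, 10), avg):
    print(f"recall {level}%: average precision {p:.3f}")
```

Under this rule the second query has interpolated precision 33.3% at recall levels 0% to 30%, 25% at 40% to 60%, and 20% at 70% to 100%.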
10
Evaluating Interactive Systems
Empirical data involving human users is time-consuming to gather and difficult to draw universal conclusions from.
Evaluation metrics for user interfaces:
– Time required to learn the system
– Time to achieve goals on benchmark tasks
– Error rates
– Retention of the use of the interface over time
– User satisfaction