
Retrieval Evaluation

Introduction
Evaluation of implementations in computer science is often stated in terms of time and space complexity. With large document sets, or large content types, such performance evaluations remain valid. In information retrieval, however, we also care about retrieval performance evaluation, that is, how well the retrieved documents match the user's information need.

Retrieval Performance Evaluation
We discussed overall system evaluation previously:
– Traditional vs. berry-picking models of retrieval activity
– Metrics include time to complete the task, user satisfaction, user errors, and time to learn the system
But how can we compare how well different algorithms do at retrieving documents?

Precision and Recall
Consider a document collection, a query and the set of documents it retrieves, and a task and the set of documents relevant to it.
[Figure: the document collection, containing the relevant documents R, the retrieved documents A (the answer set), and their intersection Ra, the relevant documents in the answer set.]

Precision
Precision – the percentage of retrieved documents that are relevant:
Precision = |Ra| / |A|

Recall
Recall – the percentage of relevant documents that are retrieved:
Recall = |Ra| / |R|
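As a small illustration (the document IDs and sets below are hypothetical, not from the slides), both measures can be computed directly from the three sets:

```python
# Illustration with hypothetical document IDs: precision and recall
# computed directly from the relevant set R and the answer set A.

relevant = {"d1", "d2", "d3", "d4"}           # R: documents relevant to the task
retrieved = {"d2", "d4", "d7", "d9", "d11"}   # A: documents returned by the system

ra = relevant & retrieved                     # Ra: relevant documents in the answer set

precision = len(ra) / len(retrieved)          # |Ra| / |A| = 2/5 = 0.40
recall = len(ra) / len(relevant)              # |Ra| / |R| = 2/4 = 0.50

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```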

Precision/Recall Trade-Off
We can guarantee 100% recall by returning every document in the collection…
– Obviously, this is a bad idea!
We can get high precision by returning only documents we are very sure of…
– Maybe also a bad idea, since recall suffers.
So retrieval algorithms are characterized by their precision/recall curve.

Plotting the Precision/Recall Curve
11-level precision/recall graph:
– Plot precision at 0%, 10%, 20%, …, 100% recall.
– Normally, averages over a set of standard queries are used:
P_avg(r) = Σ_i P_i(r) / N_q, where N_q is the number of queries and P_i(r) is the precision of query i at recall level r.
Example (using one query):
Relevant documents (R_q) = {d1, d2, d3, d4, d5, d6, d7, d8, d9, d10}
Ordered ranking by the retrieval algorithm (A_q) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
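To make the example concrete, here is a short sketch (in Python, not from the slides) of how the raw (recall, precision) points for this single query might be computed; precision is recorded at each rank where a relevant document appears:

```python
# Sketch for the single-query example above: record precision at every rank
# where a relevant document appears, together with the recall reached there.

relevant = {f"d{i}" for i in range(1, 11)}   # Rq = {d1, ..., d10}
ranking = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
           "d19", "d4", "d29", "d33", "d48", "d54", "d1"]  # Aq

found = 0
points = []                                   # (recall, precision) pairs
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        points.append((found / len(relevant), found / rank))

for recall, precision in points:
    print(f"recall = {recall:.0%}, precision = {precision:.0%}")
# Relevant documents appear at ranks 1, 3, 6, 10, and 15, giving precision
# 100%, 67%, 50%, 40%, and 33% at recall 10%, 20%, 30%, 40%, and 50%.
```

Since only five of the ten relevant documents are retrieved, precision at recall levels above 50% is taken as zero for this query.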

Plotting the Precision/Recall Curve
Example (second query):
Relevant documents (R_q) = {d1, d7, d82}
Ordered ranking by the retrieval algorithm (A_q) = {d10, d27, d7, d44, d35, d3, d73, d82, d19, d4, d29, d33, d48, d54, d1}
Here the observed recall values (33%, 67%, 100%) do not fall on the standard recall levels, so we need to interpolate: the interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r.
Now plot the average over a set of queries that matches the expected usage and query distribution.
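A minimal sketch (again in Python, not from the slides) of the interpolation and averaging step for the two example queries, assuming the standard max-to-the-right interpolation rule just described:

```python
# Sketch: 11-point interpolated precision, averaged over the two example queries.
# Interpolated precision at a standard recall level r is the maximum precision
# observed at any actual recall >= r (0 if no such point exists).

def recall_precision_points(relevant, ranking):
    """(recall, precision) at each rank where a relevant document appears."""
    found, points = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            points.append((found / len(relevant), found / rank))
    return points

def interpolate_11_levels(points):
    """Precision at recall 0.0, 0.1, ..., 1.0 using the max-to-the-right rule."""
    return [max((p for r, p in points if r >= level / 10), default=0.0)
            for level in range(11)]

ranking = ["d10", "d27", "d7", "d44", "d35", "d3", "d73", "d82",
           "d19", "d4", "d29", "d33", "d48", "d54", "d1"]
queries = [
    ({f"d{i}" for i in range(1, 11)}, ranking),   # query 1: Rq = {d1, ..., d10}
    ({"d1", "d7", "d82"}, ranking),               # query 2: Rq = {d1, d7, d82}
]

curves = [interpolate_11_levels(recall_precision_points(rel, rk))
          for rel, rk in queries]
p_avg = [sum(vals) / len(vals) for vals in zip(*curves)]  # Pavg(r) over Nq = 2 queries

for level, p in zip(range(0, 101, 10), p_avg):
    print(f"recall {level:3d}%: average precision {p:.2f}")
```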

Evaluating Interactive Systems
Empirical data involving human users is time-consuming to gather, and it is difficult to draw universal conclusions from it.
Evaluation metrics for user interfaces:
– Time required to learn the system
– Time to achieve goals on benchmark tasks
– Error rates
– Retention of the use of the interface over time
– User satisfaction