WXGB6106 INFORMATION RETRIEVAL Week 3 RETRIEVAL EVALUATION.

INTRODUCTION
Evaluation is necessary. Why evaluate? What to evaluate? How to evaluate?

WHY EVALUATE
We need to know the advantages and disadvantages of using a particular IRS. Based on evaluation results, the user should be able to decide whether he or she wants to use a particular IRS, and whether it is cost-effective to do so.

WHAT TO EVALUATE
What is evaluated must be measurable and should reflect the ability of the IRS to satisfy user needs:
- Coverage of the system – to what extent the IRS includes relevant material.
- Time lag – the average interval between the time the user's query request is made and the time an answer set is obtained.
- Form of presentation of the output.
- Effort required of the user in getting answers to his or her query request.
- Recall of the IRS – the percentage of relevant material actually retrieved in answer to a query request.
- Precision of the IRS – the percentage of retrieved material that is actually relevant.

HOW TO EVALUATE?
Various methods are available.

EVALUATION
The IR process involves two main elements:
- The user query request (also called the information query, retrieval strategy, or search request).
- The answer set (the hits).
We need to know whether the documents retrieved in the answer set fulfill the user query request. This evaluation process is known as retrieval performance evaluation. Evaluation is based on 2 main components:
- A test reference collection.
- An evaluation measure.

EVALUATION
A test reference collection (illustrated by the sketch below) consists of:
- A collection of documents.
- A set of example information requests.
- A set of relevant documents (provided by specialists) for each information request.
Two interrelated measures are used: RECALL and PRECISION.
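As an illustration of these three components, the sketch below shows one way such a collection might be represented in Python; all document texts, requests and judgements in it are invented for the example.

    # A test reference collection: documents, example information requests,
    # and the documents judged relevant (by specialists) for each request.
    # All identifiers and texts below are made up for illustration.
    documents = {
        "d1": "text of the first document ...",
        "d2": "text of the second document ...",
        "d3": "text of the third document ...",
    }
    information_requests = {
        "q1": "an example information request",
    }
    relevant_documents = {      # request id -> set of relevant document ids
        "q1": {"d1", "d3"},
    }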

RETRIEVAL PERFORMANCE EVALUATION
Relevance, recall and precision. Parameters defined:
- I = an information request
- R = the set of documents relevant to I
- |R| = the number of documents in this set
- A = the document answer set retrieved for the information request
- |A| = the number of documents in this set
- Ra = the intersection of the sets R and A
- |Ra| = the number of documents in this set

RETRIEVAL PERFORMANCE EVALUATION
Recall = the fraction of the relevant documents (set R) which has been retrieved:
    Recall = |Ra| / |R|
Precision = the fraction of the retrieved documents (set A) which is relevant:
    Precision = |Ra| / |A|
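A minimal Python sketch of the two formulas, assuming the relevant set R and the answer set A are given as sets of document identifiers (the function name is only illustrative):

    def recall_precision(relevant, answer):
        # Set-based recall and precision; relevant and answer are sets of doc ids.
        ra = relevant & answer                      # Ra = R intersection A
        recall = len(ra) / len(relevant) if relevant else 0.0
        precision = len(ra) / len(answer) if answer else 0.0
        return recall, precision

    # Example: |R| = 10, |A| = 15, |Ra| = 5 gives recall 0.5 and precision ~0.33,
    # the numbers used in the worked example below.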

[Figure: precision and recall for a given example information request, showing the document collection, the relevant documents |R|, the answer set |A|, and the relevant documents in the answer set |Ra|.]

RETRIEVAL PERFORMANCE EVALUATION
Recall and precision are expressed as percentages. The documents in the answer set are sorted by degree of relevance (ranking), so the user sees a ranked list.

RETRIEVAL PERFORMANCE EVALUATION
a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123.
b. A query request was submitted and the following documents were retrieved, ranked according to relevance.

RETRIEVAL PERFORMANCE EVALUATION
1. d123*  2. d84  3. d56*  4. d6  5. d8  6. d9*  7. d511  8. d129  9. d187  10. d25*  11. d38  12. d48  13. d250  14. d113  15. d3*
(* marks a document relevant to the query request)

RETRIEVAL PERFORMANCE EVALUATION
c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query, i.e. they match the ones in (a).

d123 ranked 1st: R = 1/10 x 100% = 10%, P = 1/1 x 100% = 100%
d56 ranked 3rd: R = 2/10 x 100% = 20%, P = 2/3 x 100% = 66%
d9 ranked 6th: R = 3/10 x 100% = 30%, P = 3/6 x 100% = 50%
d25 ranked 10th: R = 4/10 x 100% = 40%, P = 4/10 x 100% = 40%
d3 ranked 15th: R = 5/10 x 100% = 50%, P = 5/15 x 100% = 33%
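The calculation above can be reproduced with a short Python sketch that walks down the ranked list and reports recall and precision at every rank where a relevant document appears; only the ranks of the relevant documents affect the result.

    relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
    ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
               "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            recall = found / len(relevant)
            precision = found / rank
            print(f"{doc} at rank {rank}: R = {recall:.0%}, P = {precision:.0%}")
    # Prints 10%/100%, 20%/67%, 30%/50%, 40%/40%, 50%/33%
    # (2/3 is shown as 66% on the slide and 67% here because of rounding).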

Contingency table notation:
A = the relevant documents, Ā = the non-relevant documents
C = the retrieved documents, Ĉ = the not-retrieved documents
N = the total number of documents in the system

                  Relevant    Non-relevant
    Retrieved     A ∩ C       Ā ∩ C
    Not retrieved A ∩ Ĉ       Ā ∩ Ĉ

RETRIEVAL PERFORMANCE EVALUATION
Contingency table for the example: N = 100, |A| = 10, |Ā| = 90, |C| = 15, |Ĉ| = 85

                  Relevant        Non-relevant
    Retrieved     5               15 - 5 = 10
    Not retrieved 10 - 5 = 5      90 - 10 = 80

Recall = 5/10 x 100% = 50%, Precision = 5/15 x 100% = 33%
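A small Python sketch that fills in the same contingency table from the example counts and derives recall and precision from it (variable names are illustrative):

    N = 100                      # total documents in the system
    relevant_total = 10          # |A|, the relevant documents
    retrieved_total = 15         # |C|, the retrieved documents
    relevant_retrieved = 5       # |A ∩ C|

    nonrelevant_retrieved = retrieved_total - relevant_retrieved            # 10
    relevant_not_retrieved = relevant_total - relevant_retrieved            # 5
    nonrelevant_not_retrieved = N - relevant_total - nonrelevant_retrieved  # 80

    recall = relevant_retrieved / relevant_total       # 5/10 = 50%
    precision = relevant_retrieved / retrieved_total   # 5/15 = 33%
    print(f"Recall = {recall:.0%}, Precision = {precision:.0%}")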

OTHER ALTERNATIVE MEASURES
- Harmonic mean – a single measure which combines R and P.
- E measure – a single measure which combines R and P; the user specifies whether recall or precision is of greater interest.
- User-oriented measures – based on the user's own interpretation of which documents are relevant and which are not.
- Expected search length.
- Satisfaction – focuses only on the relevant documents.
- Frustration – focuses only on the non-relevant documents.
The harmonic mean and the E measure are sketched after this list.
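A minimal sketch of those two measures, assuming the usual formulas F = 2PR/(P + R) for the harmonic mean and E = 1 - (1 + b^2)PR/(b^2*P + R) for the E measure, where b = 1 weighs recall and precision equally:

    def harmonic_mean(recall, precision):
        # F = 2PR / (P + R); taken to be 0 when either value is 0.
        if recall == 0 or precision == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    def e_measure(recall, precision, b=1.0):
        # E = 1 - (1 + b^2)PR / (b^2 P + R); with b = 1 this equals 1 - F.
        if recall == 0 or precision == 0:
            return 1.0
        return 1 - (1 + b**2) * precision * recall / (b**2 * precision + recall)

    # With the values from the worked example (R = 0.5, P = 0.33):
    print(harmonic_mean(0.5, 0.33))    # about 0.40
    print(e_measure(0.5, 0.33, b=1))   # about 0.60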

REFERENCE COLLECTION
Experimentation in IR is done on test collections. An example of a test collection effort is the yearly Text REtrieval Conference (TREC), dedicated to experimentation with a large test collection of over 1 million documents; with a collection this size, testing is time consuming. For each TREC conference a set of reference experiments is designed, and research groups use these reference experiments to compare their IRSs. See the TREC NIST site.

REFERENCE COLLECTION
The TREC document collection is known as TIPSTER (the TIPSTER/TREC test collection). The collection is composed of:
- The documents.
- A set of example information requests, or topics.
- A set of relevant documents for each example information request.
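In TREC-style collections the per-topic relevance judgements are distributed as plain-text "qrels" files; the sketch below reads one into a mapping from topic to its set of relevant documents, assuming the standard four-field line format (topic id, iteration, document id, relevance). The file name in the usage comment is hypothetical.

    from collections import defaultdict

    def read_qrels(path):
        # Each line of a TREC qrels file has the form:
        #   <topic-id> <iteration> <document-id> <relevance>
        relevant = defaultdict(set)
        with open(path) as f:
            for line in f:
                topic, _iteration, doc_id, relevance = line.split()
                if int(relevance) > 0:       # keep only documents judged relevant
                    relevant[topic].add(doc_id)
        return relevant

    # judgements = read_qrels("qrels.txt")   # the file name is hypothetical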

OTHER TEST COLLECTIONS
- ADI – documents on information science.
- CACM – computer science.
- INSPEC – abstracts on electronics, computing and physics.
- ISI – library science.
- Medlars – medical articles.
These collections were assembled by E. A. Fox for his PhD thesis at Cornell University, Ithaca, New York, in 1983: "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types".