Measuring How Good Your Search Engine Is

Information System Evaluation
- Before 1993, evaluations were done using a few small, well-known corpora of test documents such as the Cranfield collection.
  - 1400 documents, 225 queries, exhaustive relevance judgements.
- Now systems are evaluated at the annual Text REtrieval Conference (TREC).

Reasons to evaluate the effectiveness of an IR system
- To aid in the selection of a system to procure.
- To evaluate query generation processes for improvements.
- To determine the effects of changes made to an existing information system, e.g. changing the system's algorithms.

Relevance
- The most important evaluation metrics of information systems will always be biased by human subjectivity.
- Relevance is not always binary, but a spectrum ranging from exactly what is being looked for to the totally unrelated.
- Relevance may be:
  - Subjective, depending on a specific user's judgement: inter-annotator agreement.
  - Situational, related to a user's requirements: is information we already know relevant to our information need?
  - Temporal, changing over time: pertinence.

The System View
- Relates to a match between query terms and index terms within an item. It can be objectively tested without relying on human judgement, e.g.:
  - time to index an item
  - computer memory requirements
  - response time from query input to the first set of items retrieved for the user to view
- An objective measure involving the user is the time required to create a query.
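Measures like response time need no relevance judgements at all. A minimal sketch of timing a single query round-trip, assuming a hypothetical `engine` object with a `search()` method (both are illustrative, not a specific system's API):

```python
import time

def measure_response_time(engine, query):
    """Wall-clock seconds from query input to the first result set."""
    start = time.perf_counter()
    results = engine.search(query)  # hypothetical search call, for illustration only
    elapsed = time.perf_counter() - start
    return elapsed, results
```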

Recall and Precision
- Precision = number_retrieved_and_relevant / total_number_retrieved
- Recall = number_retrieved_and_relevant / total_number_relevant
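These two formulas translate directly into code. A minimal sketch in Python, assuming the retrieved results and the relevance judgements are both available as collections of document identifiers (all names and data below are illustrative):

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Illustrative example: 10 documents retrieved, 4 of them relevant,
# out of 8 relevant documents in total.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant  = ["d1", "d3", "d5", "d9", "d11", "d12", "d13", "d14"]
print(precision(retrieved, relevant))  # 0.4
print(recall(retrieved, relevant))     # 0.5
```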

Estimating Recall
- In controlled environments with small databases, the number of relevant documents can be found.
- But for "open" searching on the internet, total_number_relevant is not known.
- Two approaches to estimating total_number_relevant:
  - a) Use a sampling technique: what percentage of documents are relevant?
  - b) Technique used by TREC: use the aggregate pool of documents retrieved by several search engines.
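A rough sketch of the pooling idea in (b), assuming ranked result lists from several engines are available; the pool depth of 100 is only an illustrative choice:

```python
def build_pool(ranked_lists, depth=100):
    """Union of the top `depth` documents from each system's ranked list.
    Only pooled documents are judged; the judged-relevant ones serve as
    the estimate of total_number_relevant."""
    pool = set()
    for ranking in ranked_lists:
        pool.update(ranking[:depth])
    return pool

# pool = build_pool([results_engine_a, results_engine_b, results_engine_c])
```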

- Do you want to find all the pages which are relevant to your query?
- Do you want all pages returned in the first screen of results to be relevant to your query?
- Precision@10 (P@10) is precision considering only the top 10 ranked hits.
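A sketch of that top-10 measure; the function below is illustrative and divides by k even when fewer than k documents are returned, which is the usual convention:

```python
def precision_at_k(ranked, relevant, k=10):
    """Precision considering only the top-k ranked hits (P@10 when k == 10)."""
    relevant = set(relevant)
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k  # denominator stays k even for short result lists
```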

User Satisfaction (Platt et al., 2002)
- Used five-point Likert scale questionnaires to determine the degree of user satisfaction with each browser:
  1. I like this image browser.
  2. This browser is easy to use.
  3. This browser feels familiar.
  4. It is easy to find the photo I am looking for.
  5. A month from now, I would still be able to find these photos.
  6. I was satisfied with how the pictures were organised.
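One simple way to summarise such questionnaires is a mean rating per item; a sketch with made-up numbers (not data from Platt et al.):

```python
# Illustrative 1-5 Likert ratings per questionnaire item (invented data).
responses = {
    "I like this image browser.": [4, 5, 3, 4],
    "This browser is easy to use.": [5, 4, 4, 5],
}

for item, ratings in responses.items():
    print(f"{item} mean = {sum(ratings) / len(ratings):.2f}")
```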

Text REtrieval Conference (TREC)
- Contents of the TREC database:
  - Wall Street Journal
  - Associated Press Newswire
  - Articles from Computer Select discs
  - Federal Register
  - Short abstracts from DOE publications
  - San Jose Mercury News
  - US Patents

Five new areas of testing, called Tracks, at TREC
- Multilingual (e.g. El Norte newspaper in Spanish)
- Interactive (e.g. relevance feedback, rather than batch mode)
- Database merging track: merging hit files of several subcollections
- Confusion track, to deal with corrupted data
- Routing (dissemination): long-standing queries

Qualitative and Quantitative Methods
- Qualitative evaluation: what is it like?
- Quantitative evaluation: how much is it?
- A traditional comparison involves the following stages:
  - Qualitative assessment of relevance, at the level of question-document pairs.
  - Quantitative analysis covering the different documents and the different questions, e.g. recall and precision.
  - A final qualitative assessment of which system(s) perform better than other(s).
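The quantitative stage can be scripted by averaging per-query scores for each system and comparing the means; a sketch reusing the `precision` and `recall` helpers sketched earlier (system and variable names are illustrative):

```python
def average_scores(runs, judgements):
    """Mean precision and recall over all queries for one system.
    `runs` maps query id -> ranked list of retrieved documents;
    `judgements` maps query id -> set of relevant documents."""
    p = [precision(runs[q], judgements[q]) for q in judgements]
    r = [recall(runs[q], judgements[q]) for q in judgements]
    return sum(p) / len(p), sum(r) / len(r)

# mean_p_a, mean_r_a = average_scores(system_a_runs, judgements)
# mean_p_b, mean_r_b = average_scores(system_b_runs, judgements)
```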