"A Task-Oriented Non-Interactive Evaluation Methodology for IR Systems" by Jane Reid. Presented by Alyssa Katz, LIS 551, March 30, 2004.

Introduction
- Reid offers an alternative to evaluating IR systems based on the objective, static concept of relevance
- Instead, she proposes the concept of "task-oriented relevance"
  - Non-topical relevance
  - A shift toward more user-centered evaluation methods
  - Attempts to bridge the systems-oriented view with a user-oriented view

Non-Interactive Evaluation
Advantages:
- Cheap
- Simple to conduct
- Statistical measures allow for easy comparison of systems
Disadvantages:
- Relevance judgments are too simplistic
- Does not account for the interactive nature of current IR systems
- "Systems" framework, not "situational"

Task-Oriented IR
- A task is a purpose for seeking information
- Features common to all tasks can be identified as the "task framework"
- The framework (shown as a diagram on the slide) links a setter and a performer to a task model, comprising the task representation, task requirements, and task outcome
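To make the framework concrete, here is a minimal sketch of how its components might be represented as data. All class and field names are illustrative assumptions, not definitions from Reid's paper.

```python
from dataclasses import dataclass, field


@dataclass
class TaskModel:
    """The performer's evolving picture of the task, refined as searching proceeds."""
    representation: str                                    # how the task is described
    requirements: list[str] = field(default_factory=list)  # information needed to complete it
    outcome: str = ""                                      # what completing the task should produce


@dataclass
class Task:
    """One task: who set it, who performs it, and the performer's current model of it."""
    setter: str        # person or context that generated the task
    performer: str     # person carrying the task out
    model: TaskModel   # the performer's task model
```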

Task-Oriented Evaluation
- The task model is continuously refined
- Takes place within an external context
- Externally vs. internally generated tasks
  - Different criteria for judgment

A Task-Oriented Test Collection
1. A textual or mixed-media collection of documents
2. A description of the task, which can include performer, outcome, and completion information
3. Natural-language queries created by the performers and submitted to the IR system
4. Relevance judgments
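As a companion sketch, the four components listed above might be bundled as follows; the names and types are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaskDescription:
    performer: str          # who carries out the task
    outcome: str            # what completing it should produce
    completion_info: str    # how/when the task counts as complete


@dataclass
class TaskOrientedTestCollection:
    documents: dict[str, str]                 # component 1: doc_id -> text (or a media reference)
    task: TaskDescription                     # component 2: the task description
    queries: dict[str, str]                   # component 3: query_id -> a performer's query text
    judgments: dict[tuple[str, str], float]   # component 4: (query_id, doc_id) -> relevance weight
```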

Relevance Judgments (cont'd)
- Objective (query to document) vs. subjective (query to end need)
- Task-oriented judgments are viewed as subjective, but more narrowly construed: document to the performer's task model
- The paper goes further because it accounts for the feedback and learning stage
  - The notion of relevance will be modified throughout the entire process
  - Only after feedback will a definitive answer about the relevance of a document be determined

Implementation of Test Collection
Task descriptions:
- Use real tasks
- Categorize to allow for variety
  - Expert- vs. novice-generated tasks
  - Externally vs. internally generated tasks
  - Simple (well-defined) vs. complex (poorly defined)

Implementation of Test Collection (2)
Queries:
- Will be created by the task performers
- All queries should be included

Implementation of Test Collection (3)
Relevance judgments:
- Use individual weighted relevance judgments, not binary judgments
- Ask performers to judge relevance on a scale
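One way this could be recorded, as a sketch: each performer rates each judged document on a small graded scale, and the ratings are mapped to weights. The 0-3 scale and the particular weight mapping below are assumptions, not values given in the paper.

```python
# Illustrative scale-to-weight mapping; the 0-3 scale and the weights are assumed.
SCALE_TO_WEIGHT = {0: 0.0, 1: 0.33, 2: 0.67, 3: 1.0}

# judgments[(performer, query_id, doc_id)] = rating on the assumed 0-3 scale
judgments: dict[tuple[str, str, str], int] = {
    ("p1", "q1", "d1"): 3,
    ("p1", "q1", "d2"): 1,
    ("p2", "q1", "d1"): 2,
}


def relevance_weight(performer: str, query_id: str, doc_id: str) -> float:
    """Weighted relevance of a document for one performer's query (0.0 if unjudged)."""
    rating = judgments.get((performer, query_id, doc_id), 0)
    return SCALE_TO_WEIGHT[rating]
```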

Statistical Measures: Recall
- A different definition of recall and precision
- Recall
  - Old: (number of relevant documents retrieved) / (total number of relevant documents)
  - New: (relevance weight of documents retrieved) / (total relevance weight of all documents)

Statistical Measures: Recall (cont'd)
- Accumulate the amount of relevance for each document across all task performers
- Gives the most intuitive view of recall
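A minimal sketch of the new recall measure under the assumptions above: document weights are accumulated across task performers, then the weight of the retrieved documents is divided by the total weight of all documents. The function and parameter names are illustrative, not from the paper.

```python
from collections import defaultdict


def accumulated_weights(judgments: dict[tuple[str, str], float]) -> dict[str, float]:
    """Sum each document's relevance weight across all task performers for one query.

    judgments maps (performer, doc_id) -> that performer's relevance weight.
    """
    totals: dict[str, float] = defaultdict(float)
    for (_performer, doc_id), weight in judgments.items():
        totals[doc_id] += weight
    return dict(totals)


def weighted_recall(retrieved: list[str], judgments: dict[tuple[str, str], float]) -> float:
    """New recall: relevance weight of retrieved documents / total relevance weight of all documents."""
    totals = accumulated_weights(judgments)
    total_weight = sum(totals.values())
    if total_weight == 0:
        return 0.0
    retrieved_weight = sum(totals.get(doc_id, 0.0) for doc_id in retrieved)
    return retrieved_weight / total_weight
```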

Statistical Measures: Precision
- Precision
  - Old: (number of relevant documents retrieved) / (total number of documents retrieved)
  - New: (relevance weight of documents retrieved) / (potential relevance weight of documents retrieved)
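A companion sketch of the new precision measure. "Potential relevance weight" is read here as the maximum weight the retrieved documents could have carried (top-of-scale weight from every performer for every retrieved document); that reading, and the names used, are assumptions rather than the paper's exact formulation.

```python
from collections import defaultdict


def weighted_precision(retrieved: list[str],
                       judgments: dict[tuple[str, str], float],
                       max_weight: float = 1.0) -> float:
    """New precision: relevance weight of retrieved documents divided by their
    potential relevance weight (assumed: max_weight per performer per document)."""
    performers = {performer for (performer, _doc_id) in judgments}
    if not retrieved or not performers:
        return 0.0
    totals: dict[str, float] = defaultdict(float)
    for (_performer, doc_id), weight in judgments.items():
        totals[doc_id] += weight                          # accumulate across performers
    retrieved_weight = sum(totals.get(doc_id, 0.0) for doc_id in retrieved)
    potential_weight = len(retrieved) * len(performers) * max_weight
    return retrieved_weight / potential_weight
```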

Statistical Measures: Precision (cont'd)
For an overall picture of an IR system's performance:
- Calculate a precision score every time another document is retrieved
- Average the recall and precision points across all queries
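A sketch of that procedure under the same assumptions: compute a (recall, precision) point after each retrieved document for each query, then average the precision values at each rank cutoff across queries. Averaging by rank cutoff (rather than interpolating at fixed recall levels) is a simplification chosen here for clarity.

```python
def rp_points(ranking: list[str], weights: dict[str, float],
              potential_per_doc: float = 1.0) -> list[tuple[float, float]]:
    """(recall, precision) after each retrieved document for one query.

    weights holds each document's accumulated relevance weight; potential_per_doc
    is the assumed maximum weight a single retrieved document could carry.
    """
    total_weight = sum(weights.values()) or 1.0
    points: list[tuple[float, float]] = []
    gained = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        gained += weights.get(doc_id, 0.0)
        recall = gained / total_weight
        precision = gained / (rank * potential_per_doc)
        points.append((recall, precision))
    return points


def average_precision_by_rank(per_query_points: list[list[tuple[float, float]]]) -> list[float]:
    """Average the precision values at each rank cutoff across all queries
    (truncated to the shortest ranking, for simplicity)."""
    if not per_query_points:
        return []
    depth = min(len(points) for points in per_query_points)
    return [
        sum(points[rank][1] for points in per_query_points) / len(per_query_points)
        for rank in range(depth)
    ]
```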

Advantages of Task-Oriented Evaluation
- Easy to use
- Results can still be compared across systems
- Incorporates elements of interaction
- Grounded in the real world, keeping in mind the user's task and the dynamic nature of information seeking

Implications for Future Research
- Apply the methodology to multimedia searches, which are not as clear-cut
- How do we arrive at weighted relevance judgments?
- Explore measures other than recall and precision