Do Batch and User Evaluations Give the Same Results? Authors: William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, Daniel Olson

Presentation transcript:

Do Batch and User Evaluations Give the Same Results? Authors: William Hersh, Andrew Turpin, Susan Price, Benjamin Chan, Dale Kraemer, Lynetta Sacherek, Daniel Olson Presenters: Buğra M. Yıldız, Emre Varol

Introduction-1 There is a continuing debate about whether the results of batch evaluations, which measure recall and precision in a non-interactive environment, can be generalized to the real world. Some IR researchers argue that searching in the real world is much more complex.
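For reference, a minimal sketch (not from the slides) of how the two batch measures named above are computed for a single query, given the set of retrieved documents and the set of documents judged relevant:

    def precision_recall(retrieved, relevant):
        # Standard set-based definitions: precision = relevant retrieved / retrieved,
        # recall = relevant retrieved / total relevant.
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # Example with hypothetical document ids: 3 of 5 retrieved documents are relevant
    # and 6 relevant documents exist, so precision = 0.6 and recall = 0.5.
    precision_recall(["d1", "d2", "d3", "d4", "d5"], ["d1", "d3", "d5", "d7", "d8", "d9"])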

Introduction-2 If batch searching results do not reflect the real-world case, then system design decisions and measurements of results based on them are misleading. The purpose of this study is to determine whether IR approaches that showed good performance in laboratory (batch) settings can translate that effectiveness to the real world.

Experiment 1 Purpose: Establish the best weighting approach for batch searching using previous TREC interactive data. Setup: The MG retrieval system is used. The prior interactive data (from TREC-6 and TREC-7) is converted into a test collection.
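The weighting approaches compared in this experiment are TFIDF and Okapi (BM25). The transcript does not give the exact formulations or parameters used in the MG runs, so the following is only a sketch of the textbook term-weighting formulas, with assumed default BM25 parameters:

    import math

    def tfidf_weight(tf, df, N):
        # Classic tf-idf: raw term frequency times log inverse document frequency.
        return tf * math.log(N / df)

    def okapi_bm25_weight(tf, df, N, dl, avgdl, k1=1.2, b=0.75):
        # Okapi BM25 term weight; k1 and b are common defaults, assumed here rather
        # than taken from the slides or the MG configuration. The "+ 1.0" inside the
        # log is a common variant that keeps the idf component non-negative.
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        return idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * dl / avgdl))

A document's score for a query is then the sum of these term weights over the query terms it contains.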

Experiment 1 Results: The experiment set out to determine a baseline run and a run with maximum improvement that could be used in subsequent user experiments.

    Q-Expression    Weighting Type    Average Precision    % Improvement
    BB-ACB-BAA      TFIDF
    AB-BFD-BAA      Okapi
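Average precision is the effectiveness measure in the table above. A minimal sketch of how it is computed for one topic (non-interpolated form; mean average precision averages this value over all topics):

    def average_precision(ranked, relevant):
        # ranked: documents in retrieval order; relevant: judged-relevant documents.
        relevant = set(relevant)
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank   # precision at each relevant document's rank
        return total / len(relevant) if relevant else 0.0

    # Example with hypothetical ids: relevant documents at ranks 1 and 3
    # give (1/1 + 2/3) / 2 = 0.83.
    average_precision(["d1", "d2", "d3", "d4"], ["d1", "d3"])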

Experiment 2 Purpose: Determine whether batch measures give results comparable to those of human searchers on the new TREC interactive data. Setup: The main performance measure used in the TREC-8 interactive track was instance recall, defined as the proportion of true instances identified by a user searching on the topic.
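Following the definition above, instance recall (and the analogous instance precision reported on the next slide) can be sketched as:

    def instance_recall(identified, true_instances):
        # Proportion of the topic's true instances that the searcher identified.
        return len(set(identified) & set(true_instances)) / len(set(true_instances))

    def instance_precision(identified, true_instances):
        # Proportion of the searcher's identified instances that are true instances.
        # The exact operationalization is an assumption here; it is not spelled out
        # in the slides.
        return len(set(identified) & set(true_instances)) / len(set(identified))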

Experiment 2 Results: 12 librarians and 12 graduate students participated in the experiment. While there was essentially no difference between searcher types, the Okapi system showed an 18.2% improvement in instance recall and an 8.1% improvement in instance precision, neither of which was statistically significant.
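The slides do not say which significance test was applied to these differences; purely as an illustration, a paired non-parametric comparison of per-topic instance recall for the two systems might look like this, using hypothetical scores:

    from scipy.stats import wilcoxon

    # Hypothetical per-topic instance recall for the baseline (TFIDF) and Okapi systems.
    tfidf_recall = [0.40, 0.55, 0.30, 0.62, 0.48, 0.51]
    okapi_recall = [0.45, 0.60, 0.27, 0.70, 0.44, 0.58]

    # Wilcoxon signed-rank test on the paired differences; a large p-value means the
    # observed improvement cannot be called statistically significant.
    result = wilcoxon(tfidf_recall, okapi_recall)
    print(result.statistic, result.pvalue)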

Experiment 3 and 4 Experiment 3 verified that the experimental results were not an artifact of the data sets used. Experiment 4:

Conclusion The experiments in this study showed that batch and user searching evaluations do not give the same results. However, since this is a limited study, further research is needed to reach a clearer conclusion.