Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements
Walid Magdy, Gareth Jones
Dublin City University
CLEF, 21 Sep 2010

Patent Retrieval
- Search a collection of patents for relevant ones
- Objective: find all possible relevant documents
- Search: takes much longer
- Users: professionals, and more patient
- IR campaigns: NTCIR, TREC, CLEF
- Evaluation metrics: MAP, recall, PRES (MAP and recall are sketched in code below)
  - MAP: focuses on finding relevant documents earlier (at top ranks)
  - Recall: focuses on finding more relevant documents
  - PRES: focuses on finding more relevant documents at relatively good ranks

W. Magdy and G. Jones. PRES: a score metric for evaluating recall-oriented information retrieval applications. SIGIR 2010.
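For reference, below is a minimal Python sketch (not part of the original slides) of how average precision and recall at a fixed cutoff are commonly computed per topic; MAP is then the mean of average precision over all topics. Function and variable names are illustrative, and PRES itself is omitted here since its exact definition is given in the SIGIR 2010 paper.

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one topic: mean of the precision values
    observed at the ranks where relevant documents appear."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def recall_at(ranked_docs, relevant, n_max=1000):
    """Fraction of the relevant documents found in the top n_max results."""
    found = sum(1 for doc in ranked_docs[:n_max] if doc in relevant)
    return found / len(relevant) if relevant else 0.0


# Toy example: 2 of 3 relevant documents retrieved, at ranks 1 and 4.
run = ["d7", "d2", "d9", "d3", "d5"]
qrels = {"d7", "d3", "d8"}
print(average_precision(run, qrels))   # (1/1 + 2/4) / 3 = 0.5
print(recall_at(run, qrels, n_max=5))  # 2/3
```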

What's up?
- Missing a relevant document in patent search is harmful
- What about missing it in the relevance judgements?
- How will the evaluation metrics be affected?
- Are the metrics robust for evaluating systems?

Bompada et al. On the robustness of relevance measures with incomplete judgements. SIGIR 2007.

Data Used
- CLEF-IP 2009 qrels for 400 topics (a qrels/run loading sketch follows this slide)
- Average number of relevant documents per topic = 6
- 48 runs submitted by 15 participants
- Runs ranked according to MAP, recall, and PRES
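As an illustration only (not from the slides), here is one way such data could be loaded in Python, assuming TREC-style qrels and run files; the actual CLEF-IP 2009 file layout may differ, so the field order below is an assumption.

```python
from collections import defaultdict

def load_qrels(path):
    """Parse TREC-style qrels lines: 'topic_id iteration doc_id relevance'."""
    qrels = defaultdict(set)
    with open(path) as f:
        for line in f:
            topic, _, doc, rel = line.split()
            if int(rel) > 0:          # keep only positively judged documents
                qrels[topic].add(doc)
    return qrels

def load_run(path):
    """Parse TREC-style run lines: 'topic_id Q0 doc_id rank score tag'."""
    ranked = defaultdict(list)
    with open(path) as f:
        for line in f:
            topic, _, doc, rank, _score, _tag = line.split()
            ranked[topic].append((int(rank), doc))
    # Sort by rank and keep only the document ids, per topic.
    return {t: [d for _, d in sorted(pairs)] for t, pairs in ranked.items()}
```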

Experimental Setup
- Create versions of incomplete judgements (20%, 40%, 60%, 80% of the qrels)
- Re-compute the scores with the new judgements
- Re-rank the runs according to the new scores
- Monitor the change in ranking
- Measure the correlation between rankings using Kendall's tau (see the sketch below)
- The higher the correlation, the more robust the metric
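A minimal sketch of this procedure (not from the slides), under simplifying assumptions: incomplete qrels are simulated by randomly sampling a fraction of each topic's relevant documents, runs are re-scored with a per-topic metric such as the average_precision function sketched earlier, and Kendall's tau is computed between the system scores under the full and reduced judgements. Helper names like score_run are illustrative.

```python
import random
from scipy.stats import kendalltau  # tau between the two system orderings

def subsample_qrels(qrels, fraction, seed=0):
    """Randomly keep a fraction of each topic's relevant documents
    (at least one per topic, a simplifying choice)."""
    rng = random.Random(seed)
    reduced = {}
    for topic, docs in qrels.items():
        docs = sorted(docs)
        k = max(1, round(fraction * len(docs)))
        reduced[topic] = set(rng.sample(docs, k))
    return reduced

def score_run(run, qrels, metric):
    """Mean of a per-topic metric (e.g. average_precision) over all topics."""
    scores = [metric(run.get(topic, []), relevant)
              for topic, relevant in qrels.items()]
    return sum(scores) / len(scores)

def ranking_correlation(runs, qrels, fraction, metric):
    """Kendall's tau between system scores under full and reduced qrels."""
    reduced_qrels = subsample_qrels(qrels, fraction)
    full = [score_run(run, qrels, metric) for run in runs]
    reduced = [score_run(run, reduced_qrels, metric) for run in runs]
    tau, _p_value = kendalltau(full, reduced)
    return tau
```

Repeating this for each metric and each qrels fraction, and averaging over several random samples, gives the kind of robustness comparison reported in the Results slide.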

Results
Interpretation of Kendall's tau (Voorhees, E. M. Evaluation by highly relevant documents. SIGIR 2001):
- Kendall tau > 0.9: nearly equivalent ranking
- Kendall tau < 0.8: noticeable change in ranking

Conclusion
- MAP is not a robust score for evaluating patent search when relevance judgements are incomplete
- PRES and recall are more robust

Recommendation
Based on the robustness and performance of the metrics for patent search evaluation:
- Stop using MAP
  - does not reflect system recall
  - not robust with incomplete judgements
- Start using PRES
  - reflects system recall and the quality of ranking
  - highly robust with incomplete judgements

Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm

Thank you
Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm

Number of Relevant Docs per Topic