Thomas Mandl: Robust CLEF 2006 - Overview. Cross-Language Evaluation Forum (CLEF). Thomas Mandl, Information Science, Universität Hildesheim

Presentation transcript:

Thomas Mandl: Robust CLEF Overview 1 Cross-Language Evaluation Forum (CLEF). Thomas Mandl, Information Science, Universität Hildesheim. 7th Workshop of the Cross-Language Evaluation Forum (CLEF), Alicante, 23 Sept 2006. CLEF Robust Overview

Thomas Mandl: Robust CLEF Overview 2 Why Robust? The user never sees the averaged perspective of an evaluation (= MAP), but only the performance on his or her own request(s).
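To make the distinction concrete, here is a minimal sketch (invented rankings and relevance sets, not CLEF data) of the two perspectives: the average precision (AP) of the single query a user actually issues versus the mean average precision (MAP) that an evaluation reports over a whole topic set.

```python
# Minimal sketch with invented data: a user experiences the AP of one query,
# while an evaluation campaign reports the mean (MAP) over many topics.

def average_precision(ranking, relevant):
    """AP of one ranked list: mean of precision@k over the ranks of relevant hits."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Hypothetical per-topic results: the system shines on topic A but fails on topic B.
topics = {
    "topic_A": (["d3", "d1", "d7", "d2"], {"d3", "d1"}),  # AP = 1.00
    "topic_B": (["d9", "d8", "d4", "d5"], {"d5"}),        # AP = 0.25
}
ap = {t: average_precision(r, rel) for t, (r, rel) in topics.items()}
map_score = sum(ap.values()) / len(ap)

print(ap)         # the user behind topic_B only ever sees the 0.25
print(map_score)  # the evaluation reports the comfortable 0.625 average
```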

Thomas Mandl: Robust CLEF Overview 3 Why Robust? «The unhappy customer, on average, will tell 27 other people about their experience. With the use of the internet, whether web pages or e-mail, that number can increase to the thousands …» «Dissatisfied customers tell an average of ten other people about their bad experience. Twelve percent tell up to twenty people.» → Bad news travels fast. (Slide credit: Jacques Savoy)

Thomas Mandl: Robust CLEF Overview 4 Why Robust? On the other hand, satisfied customers will tell an average of five people about their positive experience. → Good news travels somewhat slower.
– Your system should produce less bad news! → improve on the hard topics
– Don't worry too much about the good news → the best topics
(Slide credit: Jacques Savoy)

Thomas Mandl: Robust CLEF Overview 5 Which system is better? [Chart: per-topic results; x-axis: Topics]
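One way to make "better" precise: the sketch below (purely invented AP values) compares a system with a high average but one near-failure topic against a steadier system, using plain MAP and the geometric MAP (GMAP) popularised by the TREC Robust track as a robustness-oriented measure.

```python
# Sketch with invented per-topic AP values for two hypothetical systems.
import math

ap_system_a = [0.90, 0.85, 0.80, 0.75, 0.02]  # strong on average, one near-failure
ap_system_b = [0.60, 0.58, 0.55, 0.52, 0.40]  # consistently moderate

def mean_ap(aps):
    return sum(aps) / len(aps)

def gmap(aps, eps=1e-5):
    # the small epsilon keeps a single zero-AP topic from zeroing out the product
    return math.exp(sum(math.log(max(a, eps)) for a in aps) / len(aps))

print(f"System A: MAP={mean_ap(ap_system_a):.3f}  GMAP={gmap(ap_system_a):.3f}")
print(f"System B: MAP={mean_ap(ap_system_b):.3f}  GMAP={gmap(ap_system_b):.3f}")
# System A wins on MAP (0.664 vs 0.530), System B wins on GMAP (0.525 vs 0.391):
# which one is "better" depends on how much the hard topics are allowed to count.
```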

Thomas Mandl: Robust CLEF Overview 6 Purpose: Robustness in CLIR
Robustness in multilingual retrieval:
– Stable performance over all topics instead of high average performance (like at TREC, just for other languages)
– Stable performance over all topics for multi-lingual retrieval
– Stable performance over different languages (so far at CLEF?)

Thomas Mandl: Robust CLEF Overview 7 Ultimate Goal
„More work needs to be done on customizing methods for each topic" (Harman 2005)
Hard, but some work is done:
– RIA Workshop
– SIGIR Workshop on Topic Difficulty
– TREC Robust Track (until 2005)
– ...

Thomas Mandl: Robust CLEF Overview 8 Data Collection

Language   Collection
English    LA Times 94, Glasgow Herald 95
French     ATS (SDA) 94/95, Le Monde 94
Italian    La Stampa 94, AGZ (SDA) 94/95
Dutch      NRC Handelsblad 94/95, Algemeen Dagblad 94/95
German     Frankfurter Rundschau 94/95, Spiegel 94/95, SDA 94
Spanish    EFE 94/95

Six languages, 1.35 million documents, 3.6 gigabytes of text.

Thomas Mandl: Robust CLEF Overview 9 Data Collections [Table: number of topics (e.g. #91–140), relevance judgements and documents per CLEF year; some relevance judgements are missing]

Thomas Mandl: Robust CLEF Overview 10 Test and Training Set
– Arbitrary split of the 160 topics
– 60 training topics
– 100 test topics
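For illustration only, a tiny sketch of such an arbitrary split; the topic-id range and the random seed are assumptions, not the official, published split of the track.

```python
# Illustration only: an arbitrary, reproducible 60/100 split of 160 topic ids.
# The id range 41-200 and the seed are assumptions, not the official split.
import random

topic_ids = [str(n) for n in range(41, 201)]   # 160 topics
random.seed(2006)                              # fixed seed -> reproducible "arbitrary" split
training = set(random.sample(topic_ids, 60))
test = [t for t in topic_ids if t not in training]
print(len(training), len(test))                # 60 100
```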

Thomas Mandl: Robust CLEF Overview 11 Identification of Difficult Topics
Finding a set of difficult topics... was difficult:
– Before the campaign, no topics were found which were difficult for more than one language or task at CLEF 2001, CLEF 2002 and CLEF 2003
– Several definitions of difficulty were tested (one possible definition is sketched below)
Just as at Robust TREC:
– Topics are not difficult by themselves
– but only in interaction with a collection
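A sketch of one such possible definition, stated per collection because difficulty only arises in interaction with a collection. The threshold and scores are made up, and this is not necessarily a definition used in the track.

```python
# One possible operationalisation (an assumption, not the track's definition):
# a topic counts as difficult for a given collection if the median AP over all
# runs submitted for that collection stays below a threshold.
from statistics import median

def difficult_topics(ap_by_run, threshold=0.10):
    """ap_by_run: {run_id: {topic_id: AP}} for one collection/language pair."""
    topic_ids = next(iter(ap_by_run.values())).keys()
    return sorted(
        t for t in topic_ids
        if median(run[t] for run in ap_by_run.values()) < threshold
    )

# Toy scores: topic 41 is hard for every run on this collection, 42 and 43 are not.
runs_de = {
    "run1": {"41": 0.03, "42": 0.40, "43": 0.22},
    "run2": {"41": 0.08, "42": 0.35, "43": 0.18},
    "run3": {"41": 0.01, "42": 0.51, "43": 0.30},
}
print(difficult_topics(runs_de))  # ['41']
```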

Thomas Mandl: Robust CLEF Overview 12 Task Design for Robustness
Sub-Tasks:
– Mono-lingual: only for the six document languages as topic language
– Bi-lingual: it->es, fr->nl, en->de
– Multi-lingual

Thomas Mandl: Robust CLEF Overview 13 Sub-Tasks
Submission of training data was encouraged:
– Further analysis of topic difficulty
Overall, systems did better on the test topics.

Thomas Mandl: Robust CLEF Overview 14 Participants
– U. Coruna & U. Sunderland (Spain & UK)
– U. Jaen (Spain)
– DAEDALUS & Madrid Univs. (Spain)
– U. Salamanca – REINA (Spain)
– Hummingbird Core Tech. (Canada)
– U. Neuchatel (Switzerland)
– Dublin City U. – Computing (Ireland)
– U. Hildesheim – Inf. Sci. (Germany)

Thomas Mandl: Robust CLEF Overview 15 More than 100 Runs...

Task    Language   Test runs   Training runs   Groups
mono    en         13          7               6
mono    fr         18          10              7
mono    nl         7           3               3
mono    de         7           3               3
mono    es         11          5               5
mono    it         11          5               5
bi      it->es     8           2               3
bi      fr->nl     4           0               1
bi      en->de     5           1               2
multi              10          3               4

Thomas Mandl: Robust CLEF Overview 16 Results – Mono-lingual

Thomas Mandl: Robust CLEF Overview 17 Results – Mono-lingual

Thomas Mandl: Robust CLEF Overview 18 Results – Mono-lingual English

Thomas Mandl: Robust CLEF Overview 19 Results – Mono-lingual Italian

Thomas Mandl: Robust CLEF Overview 20 Results – Mono-lingual Spanish

Thomas Mandl: Robust CLEF Overview 21 Results – Mono-lingual French

Thomas Mandl: Robust CLEF Overview 22 Results – Mono-lingual German

Thomas Mandl: Robust CLEF Overview 23 Results – Bi-lingual

Thomas Mandl: Robust CLEF Overview 24 Results – Multi-lingual

Thomas Mandl: Robust CLEF Overview 25 Results – Multi-lingual

Thomas Mandl: Robust CLEF Overview 26 Topic Analysis
Example: topic 64
– Easiest topic for mono and bi German
– Hardest topic for mono Italian
Example: topic 144
– Easiest topic for four bi Dutch runs
– Hardest for mono and bi German
Example: topic 146
– Among the three hardest topics for four sub-tasks
– Mid-range for all other sub-tasks

Thomas Mandl: Robust CLEF Overview 27 Approaches
– SINAI expanded with terms gathered from a web search engine [Martinez-Santiago et al. 2006]
– REINA used a heuristic for determining hard topics during training; different expansion techniques were applied [Zazo et al. 2006] (a generic sketch of this idea follows below)
– Hummingbird experimented with other evaluation measures than those used in the track [Tomlinson 2006]
– MIRACLE tried to find a fusion scheme which is good for the robust measure [Goni-Menoyo et al. 2006]
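As a rough illustration of the selective-expansion idea, loosely in the spirit of the REINA entry but not their actual heuristic: the IDF cutoff, the toy document frequencies and the expansion terms below are all assumptions.

```python
# Hedged sketch: guess whether a topic looks hard and expand only those queries.
import math

def looks_hard(query_terms, df, n_docs, idf_cutoff=2.0):
    """Crude difficulty guess: low average IDF = query made of common, unspecific words."""
    idfs = [math.log(n_docs / df.get(t, 1)) for t in query_terms]
    return sum(idfs) / len(idfs) < idf_cutoff

def maybe_expand(query_terms, get_expansion_terms, df, n_docs):
    """Expand only queries that look hard; get_expansion_terms can be any source,
    e.g. blind-feedback terms or terms gathered from web search snippets."""
    if looks_hard(query_terms, df, n_docs):
        return query_terms + get_expansion_terms(query_terms)
    return query_terms

# Toy usage with invented document frequencies in a 100,000-document collection.
df = {"european": 40_000, "union": 35_000, "nobel": 800, "laureates": 300}
print(maybe_expand(["european", "union"], lambda q: ["eu", "brussels"], df, 100_000))
print(maybe_expand(["nobel", "laureates"], lambda q: ["prize"], df, 100_000))
# -> the vague query gets expanded, the specific one is left untouched
```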

Thomas Mandl: Robust CLEF Overview 28 Outlook
– What can we do with the data?
– Have people improved in comparison to CLEF 2001 through 2003?
– Are low MAP scores a good indicator of topic difficulty?

Thomas Mandl: Robust CLEF Overview 29 Acknowledgements
– Giorgio di Nunzio & Nicola Ferro (U Padua)
– Robust Committee: Donna Harman (NIST), Carol Peters (ISTI-CNR), Jacques Savoy (U Neuchâtel), Gareth Jones (Dublin City U)
– Ellen Voorhees (NIST)
– Participants

Thomas Mandl: Robust CLEF Overview 30 Thanks for your Attention
I am looking forward to the discussion. Please come to the breakout session.