CLEF 2008 Multilingual Question Answering Track
UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo
CELCT: Danilo Giampiccolo, Pamela Forner

Presentation transcript:

CLEF 2008 Multilingual Question Answering Track
UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo
CELCT: Danilo Giampiccolo, Pamela Forner

2 QA 2008 Task and Exercises
- QA Main task (6th edition). Pilot: QA-WSD, English newswire collections with Word Sense Disambiguation
- Answer Validation Exercise, AVE (3rd edition)
- QA on Speech Transcripts, QAST (2nd edition)

3 Main Task QA 2008 Organizing Committee
- CELCT (D. Giampiccolo, P. Forner): Italian
- UNED (A. Peñas): Spanish
- U. Groningen (G. Bosma): Dutch
- U. Limerick (R. Sutcliffe): English
- DFKI (B. Sacaleanu): German
- ELDA/ELRA (N. Moreau): French
- Linguateca (P. Rocha): Portuguese
- Bulgarian Academy of Sciences (P. Osenova): Bulgarian
- IASI (C. Forascu): Romanian
- U. Basque Country (I. Alegria): Basque
- ILSP (P. Prokopidis): Greek

4 Evolution of the Track
[year-by-year table, only partially recoverable]
- Target languages
- Collections: News 1994, + News 1995, + Wikipedia (Nov. snapshot)
- Type of questions: 200 per edition; Factoid, + Temporal restrictions, + Definitions, + Lists, + Linked questions, + Closed lists
- Supporting information: Document, then Snippet
- Pilots and Exercises: Temporal restrictions, Lists, AVE, Real Time, WiQA, QAST, WSD QA

5 200 questions
- FACTOID (loc, mea, org, per, tim, cnt, obj, oth)
- DEFINITION (per, org, obj, oth)
- CLOSED LIST
  - Who were the components of The Beatles?
  - Who were the last three presidents of Italy?
- LINKED QUESTIONS
  - Who was called the "Iron Chancellor"? When was he born? Who was his first wife?
- Temporal restrictions: by date, by period, by event
- NIL questions (without known answer in the collection)
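Purely as an illustration of the typology above (not the official test-set format), a minimal Python sketch of one such question record; the class and field names are invented for this example:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class QuestionType(Enum):
    FACTOID = "factoid"        # answer types: loc, mea, org, per, tim, cnt, obj, oth
    DEFINITION = "definition"  # answer types: per, org, obj, oth
    CLOSED_LIST = "closed_list"
    LINKED = "linked"          # interpreted against the preceding question(s)

@dataclass
class Question:
    qid: int
    text: str
    qtype: QuestionType
    answer_type: Optional[str] = None           # e.g. "per" for a factoid about a person
    temporal_restriction: Optional[str] = None  # "date", "period" or "event", if any
    is_nil: bool = False                        # True if no answer exists in the collection

# Example taken from the slide above (illustrative field values):
q = Question(qid=1, text="Who were the last three presidents of Italy?",
             qtype=QuestionType.CLOSED_LIST)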

6 43 Activated Language Combinations (at least one registered participant)
- Target languages (corpus and answers): BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO
- Source languages (questions): BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO
[the slide shows the activation matrix of source x target pairs; individual cells are not recoverable]
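For scale: with 11 source and 11 target languages there are 11 x 11 = 121 possible source-target pairs (11 monolingual plus 110 cross-lingual), of which 43 had at least one registered participant.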

7 Activated Tasks
[table of activated tasks per CLEF edition (2003-2008), broken down into monolingual, cross-lingual and total; the cell values are not recoverable]

8 Submitted runs
            Monolingual  Cross-lingual
CLEF 2003:  [values not recoverable]
CLEF 2004:  20           28   (+182%)
CLEF 2005:  43           24   (+40%)
CLEF 2006:  42           35   (+15%)
CLEF 2007:  20           17   (-52%)
CLEF 2008:  31           20   (+38%)
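The percentages read as the year-on-year change in the total number of submitted runs; for instance, under the reconstruction above, 37 runs in 2007 rising to 51 in 2008 gives (51 - 37) / 37, roughly +38%.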

9 Participant groups
[table of participant groups per CLEF edition (2003-2008): newcomers, veterans, total and registered; only the year-on-year changes (+125%, +33%, +25%, -26%) and a few registered counts (22, 27, 36, 29) are recoverable]

10 List of Participants (random order)
[table of participating groups and their countries; only the first country label, Bulgaria, is recoverable]

11 Groups per year and target collection
[chart; slide annotations: "Task change", "Natural selection?", "Above 20 groups"]

12 Groups per target collection

13 2008 participation: Comparative evaluation?
Shortcoming from the evaluation perspective: 4 languages without comparison between different groups (breakout session)

Language     Runs  Different groups
Portuguese    9     6
Spanish      10     4
English       5     4
German       11     3
Romanian      4     2
Dutch         4     1
Basque        4     1
French        3     1
Bulgarian     1     1
Italian       0     0
Greek         0     0

14 Results: Best and Average scores

15 Best scores by language

16 Best scores by participant

17 Results depend on type of questions
- Definitions: almost solved for several systems (80%-95%)
- Factoids: 50%-65% for several systems
- Temporal restrictions: same level of difficulty as factoids for some systems
- Closed lists: still very difficult
- Linked questions: still very difficult
- Wikipedia now provides more answers

18 Conclusion
- Same task as 2007
- Same level of participation (slightly better):
  - 11 target languages (9 with participation)
  - 43 activated subtasks
  - 21 participants
  - 51 runs
- Same results (slightly better)

19 Future direction
- Fewer participants per language means poor comparison; change the methodology: one task for all
- Criticism of QA over Wikipedia (easier to find questions with IR, no user model); change the collection
- QA proposal for 2009 (SC and breakout session)