CLEF 2008 Multilingual Question Answering Track


CLEF 2008 Multilingual Question Answering Track
UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo
CELCT: Danilo Giampiccolo, Pamela Forner

QA 2008 Task and Exercises
- QA Main Task (6th edition)
- Pilot: QA-WSD, English newswire collections with Word Sense Disambiguation
- Answer Validation Exercise, AVE (3rd edition)
- QA on Speech Transcripts, QAST (2nd edition)

Main Task QA 2008 Organizing Committee
- CELCT (D. Giampiccolo, P. Forner): Italian
- UNED (A. Peñas): Spanish
- U. Groningen (G. Bosma): Dutch
- U. Limerick (R. Sutcliffe): English
- DFKI (B. Sacaleanu): German
- ELDA/ELRA (N. Moreau): French
- Linguateca (P. Rocha): Portuguese
- Bulgarian Academy of Sciences (P. Osenova): Bulgarian
- IASI (C. Forascu): Romanian
- U. Basque Country (I. Alegria): Basque
- ILSP (P. Prokopidis): Greek

Evolution of the Track (2003-2008)
- Target languages: 9, then 10, then 11
- Collections: news 1994; + news 1995; + Wikipedia (Nov. 2006)
- Questions: 200 per campaign; factoids; + temporal restrictions; + definitions; - type of question; + lists; + linked questions; + closed lists
- Supporting information: from full document to snippet
- Pilots and exercises: temporal restrictions, lists, AVE, Real Time, WiQA, QAST, WSD QA

200 questions per task:
- FACTOID (loc, mea, org, per, tim, cnt, obj, oth)
- DEFINITION (per, org, obj, oth)
- CLOSED LIST, e.g. "Who were the members of The Beatles?", "Who were the last three presidents of Italy?"
- LINKED QUESTIONS, e.g. "Who was called the 'Iron Chancellor'?", "When was he born?", "Who was his first wife?"
- Temporal restrictions: by date, by period, by event
- NIL questions (without a known answer in the collection)
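
As a rough illustration only, the typology above can be written down as a small lookup structure; the type and subtype names follow the slide, while the helper function itself is a hypothetical sketch:

```python
# Hypothetical sketch of the CLEF 2008 question typology (names from the slide).
QUESTION_TYPES = {
    "FACTOID": {"loc", "mea", "org", "per", "tim", "cnt", "obj", "oth"},
    "DEFINITION": {"per", "org", "obj", "oth"},
    "CLOSED_LIST": set(),   # e.g. "Who were the last three presidents of Italy?"
    "LINKED": set(),        # follow-up questions about the same topic
}
TEMPORAL_RESTRICTIONS = {"date", "period", "event"}  # optional extra constraint

def is_valid(qtype, subtype=None):
    """Check that a (type, subtype) pair exists in the typology."""
    subtypes = QUESTION_TYPES.get(qtype)
    if subtypes is None:
        return False
    return subtype is None or subtype in subtypes

assert is_valid("FACTOID", "per")
assert not is_valid("DEFINITION", "loc")
```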

Source languages (questions) x target languages (corpus and answers): BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO.
43 activated language combinations (combinations with at least one registered participant).
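
To make the counting concrete: a combination is activated as soon as one participant registers for it, so the tally is a set operation over (source, target) pairs. A minimal sketch with hypothetical registrations:

```python
# Hypothetical registrations as (source language, target language) pairs.
registrations = [("ES", "ES"), ("ES", "ES"), ("EN", "ES"), ("RO", "RO"), ("EN", "PT")]

# A combination counts once, no matter how many participants registered for it.
activated = set(registrations)
monolingual = {pair for pair in activated if pair[0] == pair[1]}
cross_lingual = activated - monolingual

print(len(activated), len(monolingual), len(cross_lingual))  # 4 2 2
```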

Activated tasks per year:

            Monolingual  Cross-lingual  Total
CLEF 2003        3             5           8
CLEF 2004        6            13          19
CLEF 2005        8            15          23
CLEF 2006        7            17          24
CLEF 2007        8            29          37
CLEF 2008       10            33          43

Submitted runs per year:

            Total         Monolingual  Cross-lingual
CLEF 2003   17                 6            11
CLEF 2004   48 (+182%)        20            28
CLEF 2005   67 (+40%)         43            24
CLEF 2006   77 (+15%)         42            35
CLEF 2007   37 (-52%)         29             8
CLEF 2008   51 (+38%)         31            20
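
The percentages in the table are plain year-over-year changes in the run totals; a quick sketch to reproduce them:

```python
# Total submitted runs per year, from the table above.
runs = {2003: 17, 2004: 48, 2005: 67, 2006: 77, 2007: 37, 2008: 51}

years = sorted(runs)
for prev, curr in zip(years, years[1:]):
    change = 100 * (runs[curr] - runs[prev]) / runs[prev]
    print(f"{curr}: {change:+.0f}%")  # 2004: +182% ... 2007: -52%, 2008: +38%
```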

Participant groups per year:

            Newcomers  Veterans  Total        Registered
CLEF 2003       -          -       8              -
CLEF 2004      13          5      18 (+125%)     22
CLEF 2005       9         15      24 (+33%)      27
CLEF 2006      10         20      30 (+25%)      36
CLEF 2007      14          8      22 (-26%)      29
CLEF 2008       -          -      21             33

List of Participants (random order)

Groups per year and target collection (chart; annotations: "Natural selection?", "Task change", "Above 20 groups")

Groups per target collection

2008 participation: comparative evaluation?

Language     Runs  Different groups
Portuguese     9          6
Spanish       10          4
English        5          -
German        11          3
Romanian       -          2
Dutch          -          1
Basque         -          -
French         -          -
Bulgarian      -          -
Italian        -          -
Greek          -          -

Shortcoming from the evaluation perspective: 4 languages had no comparison between different groups. (Breakout session.)
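
The comparability complaint is easy to state as a check: comparative evaluation needs at least two different groups per target language. A sketch with group counts per language, where the 1s for Basque, French and Bulgarian are hypothetical values chosen to match the "4 languages" remark:

```python
# Distinct participating groups per target language (partly hypothetical, see above).
groups_per_language = {"Portuguese": 6, "Spanish": 4, "German": 3, "Romanian": 2,
                       "Dutch": 1, "Basque": 1, "French": 1, "Bulgarian": 1}

# Cross-group comparison is only meaningful with two or more groups per language.
no_comparison = [lang for lang, n in groups_per_language.items() if n < 2]
print(no_comparison)  # ['Dutch', 'Basque', 'French', 'Bulgarian']
```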

Results: Best and Average scores

Best scores by language

Best scores by participant

Results depend on the type of question:
- Definitions: almost solved for several systems (80%-95%)
- Factoids: 50%-65% for several systems
- Temporal restrictions: same level of difficulty as factoids for some systems
- Closed lists: still very difficult
- Linked questions: Wikipedia now provides more answers
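
Those per-type figures are just accuracies computed over subsets of the 200 questions; a minimal sketch with made-up assessments:

```python
from collections import defaultdict

# Made-up assessments: (question type, judged correct?) per question.
judged = [("DEFINITION", True), ("DEFINITION", True), ("FACTOID", True),
          ("FACTOID", False), ("CLOSED_LIST", False)]

totals = defaultdict(int)
correct = defaultdict(int)
for qtype, ok in judged:
    totals[qtype] += 1
    correct[qtype] += ok

for qtype, n in totals.items():
    print(f"{qtype}: {correct[qtype] / n:.0%}")  # DEFINITION: 100%, FACTOID: 50%, ...
```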

Conclusion
- Same task as in 2007
- Same level of participation (slightly better): 21 participants, 51 runs
- 11 target languages (9 with participation), 43 activated subtasks
- Same results (slightly better)

Future direction
- Fewer participants per language means poor comparison; change the methodology: one task for all languages
- Criticisms of QA over Wikipedia: questions are easier to find with IR, and there is no user model; change the collection
- QA proposal for 2009 (SC and breakout session)