CLEF – Cross Language Evaluation Forum Question Answering at CLEF 2003 (http://clef-qa.itc.it) The Multiple Language Question Answering Track at CLEF 2003.

Presentation transcript:

CLEF – Cross Language Evaluation Forum
Question Answering at CLEF 2003 (http://clef-qa.itc.it)
The Multiple Language Question Answering Track at CLEF 2003
Bernardo Magnini*, Simone Romagnoli*, Alessandro Vallin*
Jesús Herrera**, Anselmo Peñas**, Víctor Peinado**, Felisa Verdejo**
Maarten de Rijke***
* ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Trento - Italy
** UNED, Spanish Distance Learning University, Madrid - Spain
*** Language and Inference Technology Group, ILLC, University of Amsterdam - The Netherlands

Outline
- Overview of the Question Answering track at CLEF 2003
- Report on the organization of the QA tasks
- Presentation and discussion of the participants' results
- Perspectives for future QA campaigns

Question Answering
QA: find the answer to an open-domain question in a large collection of documents.
- INPUT: questions (instead of keyword-based queries)
- OUTPUT: answers (instead of documents)
QA track at TREC:
- mostly fact-based questions
  Question: Who invented the electric light?
  Answer: Edison
Scientific community:
- NLP and IR
- AQUAINT program in the USA
QA as an applicative scenario.

Multilingual QA
Purposes:
- Answers may be found in languages different from the language of the question
- Interest in QA systems for languages other than English
- Force the QA community to design real multilingual systems
- Check/improve the portability of the technologies implemented in current English QA systems
- Creation of reusable resources and benchmarks for further multilingual QA evaluation

QA at CLEF - Organization
- Web site (http://clef-qa.itc.it)
- CLEF QA mailing list
- Guidelines for the track (following the model of TREC 2001)

Tasks at CLEF
(task diagram) For each task, a set of questions is run against a target corpus; systems return either exact answers or 50-byte answer strings.

QA Tasks at CLEF 2003
Question-set preparation and assessment for the monolingual tasks and the bilingual tasks against English (Q-set / Assessment):
- Italian: ITC-irst; NIST
- Dutch: U. Amsterdam, ITC-irst, U. Amsterdam; NIST
- Spanish: UNED, ITC-irst, UNED; NIST
- French: ITC-irst, U. Montreal; NIST
- German: ITC-irst, DFKI; NIST
NIST assessed the English answers of all bilingual runs.

Bilingual against English
(task preparation and evaluation workflow) English questions are extracted from the English text collection (about 1 person-month for 200 questions) and translated into the source language, e.g. Italian (about 2 person-days for 200 questions); the participants' QA systems process the translated questions against the English text collection and return English answers, which are then assessed (about 4 person-days for 1 run, i.e. 600 answers).

Document Collections
Corpora licensed by CLEF in 2002:
- Dutch: Algemeen Dagblad and NRC Handelsblad (years 1994 and 1995) - monolingual task
- Italian: La Stampa and SDA press agency (1994) - monolingual task
- Spanish: EFE press agency (1994) - monolingual task
- English: Los Angeles Times (1994) - target collection of the bilingual tasks

Creating the Test Collection
(process diagram) Starting from the CLEF topics, ILLC, ITC-irst and UNED each produced 150 question/answer pairs in Dutch, Italian and Spanish, giving the monolingual test sets. Through question sharing, the questions were translated into English (150 Dutch/English, 150 Italian/English, 150 Spanish/English) and into the new target languages (300 Ita+Spa, 300 Dut+Spa, 300 Ita+Dut). Merging these data produced the DISEQuA corpus.

Questions
200 fact-based questions for each task:
- queries related to events that occurred in 1994 and/or 1995, i.e. the years covered by the target corpora;
- coverage of different categories of questions: date, location, measure, person, object, organization, other;
- questions were not guaranteed to have an answer in the corpora: 10% of the test sets required the answer string "NIL";
- no definition questions ("Who/What is X?"), yes/no questions or list questions.
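For illustration only, a minimal sketch of how a test-set entry with the properties listed above (question category, possible NIL answer) could be represented; the field names and the sample questions are hypothetical, not the actual CLEF test-set format:

```python
# Hypothetical representation of a CLEF-style factoid test question.
# Field names and sample data are illustrative only, not the official format.
from dataclasses import dataclass
from typing import Optional

CATEGORIES = {"date", "location", "measure", "person",
              "object", "organization", "other"}

@dataclass
class TestQuestion:
    text: str                      # the fact-based question
    category: str                  # one of CATEGORIES
    answer: Optional[str] = None   # None models a NIL question (no answer in the corpus)

    def gold_answer(self) -> str:
        """Return the expected answer string, using "NIL" for unanswerable questions."""
        return self.answer if self.answer is not None else "NIL"

questions = [
    TestQuestion("Who invented the electric light?", "person", "Edison"),
    TestQuestion("Which city hosted the 1994 Winter Olympics?", "location", "Lillehammer"),
    TestQuestion("Who was the king of Atlantis in 1994?", "person"),   # NIL question
]
nil_share = sum(q.answer is None for q in questions) / len(questions)
print(f"{nil_share:.0%} of this toy set are NIL questions")
```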

Answers
Participants were allowed to submit up to three answers per question and up to two runs:
- answers must be either exact (i.e. contain just the minimal information) or 50-byte-long strings;
- answers must be supported by a document;
- answers must be ranked by confidence.
Answers were judged by human assessors according to four categories: CORRECT (R), UNSUPPORTED (U), INEXACT (X), INCORRECT (W).

Judging the Answers
Examples of judged responses (run irstex031bi):
- What museum is directed by Henry Hopkins?
  W: "Modern Art"; U: "UCLA"; X: "Cultural Center".
  The second answer was correct but the document retrieved was not relevant; the third response missed bits of the name and was judged inexact.
- Where did the Purussaurus live before becoming extinct?
  W: "NIL".
  The system erroneously "believed" that the query had no answer in the corpus, or could not find one.
- When did Shapour Bakhtiar die?
  R: (a date); W: "Monday".
  In questions that asked for the date of an event, the year was often regarded as sufficient.
- Who is John J. Famalaro accused of having killed?
  W: "Clark"; R: "Huber"; W: "Department".
  The second answer, which returned the victim's last name, was considered sufficient and correct, since no other people named "Huber" were mentioned in the document retrieved.

Evaluation Measures
The score for each question was the reciprocal of the rank of the first answer found to be correct; if no correct answer was returned, the score was 0. The total score, the Mean Reciprocal Rank (MRR), was the mean score over all questions.
In STRICT evaluation only correct (R) answers scored points; in LENIENT evaluation unsupported (U) answers were considered correct as well.
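To make the scoring concrete, here is a minimal sketch (with hypothetical judgment data, not actual CLEF 2003 results) of how strict and lenient MRR can be computed from the ranked R/U/X/W judgments of each question:

```python
# Minimal sketch of the MRR scoring described above.
# Each question maps to the ranked judgments of its submitted answers, using the
# CLEF categories: R (correct), U (unsupported), X (inexact), W (incorrect).
# The example data are hypothetical, not actual CLEF 2003 results.

def reciprocal_rank(judgments, accepted):
    """Return 1/rank of the first accepted judgment, or 0.0 if none is accepted."""
    for rank, label in enumerate(judgments, start=1):
        if label in accepted:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(run, lenient=False):
    """Average the per-question scores; lenient evaluation also accepts U answers."""
    accepted = {"R", "U"} if lenient else {"R"}
    scores = [reciprocal_rank(judgments, accepted) for judgments in run]
    return sum(scores) / len(scores)

# Hypothetical judgments for three questions (up to three ranked answers each).
judged_run = [
    ["W", "U", "X"],   # only an unsupported answer: counts in lenient evaluation only
    ["R"],             # correct at rank 1
    ["W", "W", "W"],   # no correct answer: score 0
]

print("strict MRR: ", mean_reciprocal_rank(judged_run))         # (0 + 1 + 0) / 3
print("lenient MRR:", mean_reciprocal_rank(judged_run, True))   # (1/2 + 1 + 0) / 3
```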

Participants
Group, task(s) and run names:
- DLSI-UA (University of Alicante, Spain): Monolingual Spanish - alicex031ms, alicex032ms
- UVA (University of Amsterdam, The Netherlands): Monolingual Dutch - uamsex031md, uamsex032md
- ITC-irst (Italy): Monolingual Italian - irstex031mi, irstst032mi; Bilingual Italian - irstex031bi, irstex032bi
- ISI (University of Southern California, USA): Bilingual Spanish - isixex031bs, isixex032bs; Bilingual Dutch
- DFKI (Germany): Bilingual German - dfkist031bg
- CS-CMU (Carnegie Mellon University, USA): Bilingual French - lumoex031bf, lumoex032bf
- DLTG (University of Limerick, Ireland): Bilingual French - dltgex031bf, dltgex032bf
- RALI (University of Montreal, Canada): Bilingual French - udemst031bf, udemex032bf

Participants in past QA tracks
Comparison between the number and place of origin (United States, Canada, Europe, Asia, Australia) of the participants in the past TREC QA tracks and in this year's CLEF QA track, together with the number of submitted runs: the later TREC editions counted 27 groups (75 runs), 35 groups (67 runs) and 32 groups (67 runs), while the CLEF 2003 QA track attracted 8 groups submitting 17 runs.

Performances at TREC-QA
Evaluation metric: Mean Reciprocal Rank (MRR), i.e. 1 / rank of the first correct answer.
Best result and average over the submitted runs (67 runs / 500 questions at TREC-10):
- TREC-8: best 66%, average 25%
- TREC-9: best 58%, average 24%
- TREC-10: best 67%, average 23%

Results - EXACT ANSWER RUNS, MONOLINGUAL TASKS
(per run: MRR strict/lenient; number of questions with at least one right answer, strict/lenient; NIL questions returned/correctly returned)
- DLSI-UA, Monolingual Spanish: alicex031ms, alicex032ms
- ITC-irst, Monolingual Italian: irstex031mi
- UVA, Monolingual Dutch: uamsex031md, uamsex032md

Results - EXACT ANSWER RUNS, CROSS-LANGUAGE TASKS
(per run: MRR strict/lenient; number of questions with at least one right answer, strict/lenient; NIL questions returned/correctly returned)
- ISI, Bilingual Spanish: isixex031bs, isixex032bs
- ITC-irst, Bilingual Italian: irstex031bi, irstex032bi
- CS-CMU, Bilingual French: lumoex031bf, lumoex032bf
- DLTG, Bilingual French: dltgex031bf, dltgex032bf
- RALI, Bilingual French: udemex032bf

Results - 50-BYTE ANSWER RUNS, MONOLINGUAL TASKS
(per run: MRR strict/lenient; number of questions with at least one right answer, strict/lenient; NIL questions returned/correctly returned)
- ITC-irst, Monolingual Italian: irstst032mi

Results - 50-BYTE ANSWER RUNS, CROSS-LANGUAGE TASKS
(per run: MRR strict/lenient; number of questions with at least one right answer, strict/lenient; NIL questions returned/correctly returned)
- DFKI, Bilingual German: dfkist031bg
- RALI, Bilingual French: udemst031bf

Average Results in Different Tasks

Approaches in CL QA
Two main approaches were used by the cross-language QA systems:
1. Translation of the question into the target language (i.e. the language of the document collection), followed by question processing and answer extraction.
2. Preliminary question processing in the source language to retrieve information (such as keywords, question focus, expected answer type, etc.), followed by translation and expansion of the retrieved data before answer extraction.
The participating cross-language systems (ITC-irst, RALI, DFKI, ISI, CS-CMU, Limerick) adopted one of these two approaches.
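For illustration, a schematic sketch of the two strategies; the toy dictionary, the word-by-word translation and the substring retrieval below are hypothetical placeholders chosen only to show where translation happens in each pipeline, not the methods of any participating system:

```python
# Schematic sketch of the two cross-language QA strategies described above.
# Everything here (toy dictionary, word-by-word "translation", substring retrieval)
# is a hypothetical placeholder, not the approach of any participating system.

TOY_DICTIONARY = {"chi": "who", "ha": "has", "inventato": "invented",
                  "la": "the", "luce": "light", "elettrica": "electric"}

def tokenize(text):
    return [t.strip("?.,!").lower() for t in text.split()]

def translate_terms(terms):
    """Word-by-word lookup in a toy Italian-English dictionary."""
    return [TOY_DICTIONARY.get(t, t) for t in terms]

def retrieve(collection, keywords):
    """Return the documents that contain at least one of the keywords."""
    return [doc for doc in collection
            if any(k in doc.lower() for k in keywords)]

def approach_1(question, collection):
    """Approach 1: translate the whole question into the target language,
    then run an ordinary monolingual pipeline (analysis + retrieval)."""
    translated = translate_terms(tokenize(question))
    keywords = [w for w in translated if len(w) > 3]
    return retrieve(collection, keywords)

def approach_2(question, collection):
    """Approach 2: analyse the question in the source language first,
    then translate (and expand) only the extracted keywords before retrieval."""
    source_keywords = [w for w in tokenize(question) if len(w) > 3]
    keywords = translate_terms(source_keywords)
    return retrieve(collection, keywords)

english_docs = ["Edison invented the electric light in 1879.",
                "The museum opened to the public in 1994."]
question_it = "Chi ha inventato la luce elettrica?"
print(approach_1(question_it, english_docs))   # both pipelines find the Edison document
print(approach_2(question_it, english_docs))
```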

Conclusions
- A pilot evaluation campaign for multiple-language Question Answering systems has been carried out.
- Five European languages were considered: three monolingual tasks and five bilingual tasks against an English collection were activated.
- Considering the differences between the tasks, the results are comparable with those of QA at TREC.
- A corpus of 450 questions, each in four languages, with at least one known answer in the respective text collection, has been built.
- This year's experience was very positive: we intend to continue with QA at CLEF 2004.

Perspectives for Future QA Campaigns
Organizational issues:
- Promote larger participation
- Collaboration with NIST
Financial issues:
- Find a sponsor: ELRA, the new CELCT center, …
Tasks (to be discussed):
- Update to TREC 2003: definition questions, list questions
- Consider only "exact answers": the 50-byte answers did not find much favor
- Introduce new languages: in the cross-language task this is easy to do
- New steps toward multilinguality: English questions against other-language collections; a small set of full cross-language tasks (e.g. Italian/Spanish)

Creation of the Question Set
1. Find 200 questions for each language (Dutch, Italian, Spanish), based on the CLEF 2002 topics, with at least one answer in the respective corpus.
2. Translate each question into English, and from English into the other two languages.
3. Find answers in the corpora of the other languages (e.g. a Dutch question was translated and processed in the Italian text collection).
4. The result is a corpus of 450 questions, each in four languages, with at least one known answer in the respective text collection.
5. Questions with at least one answer in all the corpora were selected for the final question set.
More details in the paper and in the poster.