Overview of the Multilingual Question Answering Track
QA@CLEF 2006 Workshop, Alicante, September 22, 2006
Danilo Giampiccolo
Outline
♦ Tasks
♦ Test set preparation
♦ Participants
♦ Evaluation
♦ Results
♦ Final considerations
♦ Future perspectives
QA 2006: Organizing Committee
♦ ITC-irst (Bernardo Magnini): main coordinator
♦ CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
♦ DFKI (B. Sacaleanu): German
♦ ELDA/ELRA (C. Ayache): French
♦ Linguateca (P. Rocha): Portuguese
♦ UNED (A. Peñas): Spanish
♦ U. Amsterdam (Valentin Jijkoun): Dutch
♦ U. Limerick (R. Sutcliffe): English
♦ Bulgarian Academy of Sciences (P. Osenova): Bulgarian
Source languages only:
♦ Depok University of Indonesia (M. Adriani): Indonesian
♦ IASI, Romania (D. Cristea): Romanian
♦ Wrocław University of Technology (J. Pietraszko): Polish
QA@CLEF-06: Tasks
Main task:
♦ Monolingual: the language of the question (source language) and the language of the news collection (target language) are the same
♦ Cross-lingual: the questions were formulated in a language different from that of the news collection
One pilot task:
♦ WiQA: coordinated by Maarten de Rijke
Two exercises:
♦ Answer Validation Exercise (AVE): coordinated by Anselmo Peñas
♦ Real Time: a “time-constrained” QA exercise, coordinated by Fernando Llopis (University of Alicante)
Data set: question format
200 questions of three kinds:
♦ FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to?
♦ DEFINITION (ca. 40): Who is Josef Paul Kleihues?
  – reduced in number (-25%)
  – two new categories added:
    – Object: What is a router?
    – Other: What is a tsunami?
♦ LIST (ca. 10): Name works by Tolstoy.
Additionally:
♦ Temporally restricted (ca. 40): by date, by period, by event
♦ NIL (ca. 20): questions that do not have any known answer in the target document collection
NEW: the question type (F, D, L) is not indicated in the input format.
Data set: run format (NEW)
Multiple answers: from one to ten exact answers per question
♦ exact = neither more nor less than the information required
♦ each answer has to be supported by:
  – a docid
  – one to ten text snippets justifying the answer (substrings of the specified document giving the actual context)
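The run constraints above (one to ten answers per question, each backed by a docid and one to ten snippets) can be sketched as a small checker. This is only an illustration: the field names `docid` and `snippets` are assumptions, not the official CLEF run syntax.

```python
def validate_answer(answer):
    """Check one answer: it must carry a supporting docid and
    one to ten justifying text snippets (assumed field names)."""
    if not answer.get("docid"):
        return False
    snippets = answer.get("snippets", [])
    return 1 <= len(snippets) <= 10


def validate_question_answers(answers):
    """A run may return from one to ten exact answers per question,
    each of which must itself be well-formed."""
    return 1 <= len(answers) <= 10 and all(validate_answer(a) for a in answers)
```

A question with a single answer supported by one snippet passes; an empty answer list, or an answer without snippets, is rejected.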
Activated Tasks (at least one registered participant)
[Source × target activation matrix: source languages BG, DE, EN, ES, FR, IN, IT, NL, PT, PL, RO; target languages BG, DE, EN, ES, FR, IT, NL, PT]
♦ 11 source languages (10 in 2005)
♦ 8 target languages (9 in 2005)
♦ No Finnish task; new languages: Polish and Romanian
Activated Tasks

            Monolingual  Cross-lingual  Total
CLEF 2003        3             5           8
CLEF 2004        6            13          19
CLEF 2005        8            15          23
CLEF 2006        7            17          24

NEW: questions were not translated into all the languages; the Gold Standard contains questions in multiple languages only for tasks where there was at least one registered participant.
More interest in cross-linguality.
Participants

            America  Europe  Asia  Total        Registered  Newcomers  Veterans  Absent veterans
CLEF 2003      3        5     –      8               –           –         –           –
CLEF 2004      1       17     –     18 (+125%)      22          13         5           3
CLEF 2005      1       22     1     24 (+33%)       27           9        15           4
CLEF 2006      4       24     2     30 (+25%)       36          10        20           4
List of participants (acronym: name, country)
♦ SYNAPSE: Synapse Developpement (France)
♦ Ling-Comp: U. Rome “La Sapienza” (Italy)
♦ Alicante: U. Alicante, Informatica (Spain)
♦ Hagen: U. Hagen, Informatics (Germany)
♦ Daedalus: Daedalus Consortium (Spain)
♦ Jaen: U. Jaen, Intelligent Systems (Spain)
♦ ISLA: U. Amsterdam (Netherlands)
♦ INAOE: Inst. Astrophysics, Optics & Electronics (Mexico)
♦ DEPOK: U. Indonesia, Comp. Sci. (Indonesia)
♦ DFKI: DFKI, Lang. Tech. (Germany)
♦ FURUI Lab.: Tokyo Inst. Technology (Japan)
♦ Linguateca: Linguateca-Sintef (Norway)
♦ LIC2M-CEA: Centre CEA Saclay (France)
♦ LINA: U. Nantes, LINA (France)
♦ Priberam: Priberam Informatica (Portugal)
♦ U. Porto: U. Porto, AI (Portugal)
♦ U. Groningen: U. Groningen, Letters (Netherlands)
♦ Lab. Inf. d'Avignon: Lab. Inf. d'Avignon (France)
♦ U. Sao Paulo: U. Sao Paulo, Math (Brazil)
♦ Vanguard: Vanguard Engineering (Mexico)
♦ LCC: Language Computer Corp. (USA)
♦ UAIC: U. “A.I. Cuza” Iasi (Romania)
♦ Wroclaw U.: Wroclaw U. of Tech. (Poland)
♦ RFIA-UPV: Univ. Politècnica de Valencia (Spain)
♦ LIMSI: CNRS Lab, Orsay Cedex (France)
♦ U. Stuttgart: U. Stuttgart, NLP (Germany)
♦ ITC: ITC-irst (Italy)
♦ JRC-ISPRA: Institute for the Protection and the Security of the Citizen (Italy)
♦ BTB: BulTreeBank Project (Sofia, Bulgaria)
♦ dltg: University of Limerick (Ireland)
Submitted runs

            Total          Monolingual  Cross-lingual
CLEF 2003     17                 6            11
CLEF 2004     48 (+182%)        20            28
CLEF 2005     67 (+39.5%)       43            24
CLEF 2006     77 (+13%)         42            35
Number of answers and snippets per question
[Chart: number of runs by answers returned per question: one answer, between two and five answers, more than five answers]
[Chart: number of snippets given for each answer: one, two, three, four or more]
Evaluation
As in previous campaigns:
♦ runs manually judged by native speakers
♦ each answer judged as Right, Wrong, ineXact, or Unsupported
♦ up to two runs for each participating group
Evaluation measures:
♦ Accuracy (for F, D): main evaluation score, calculated for the FIRST answer only (NEW)
  – Owing to the excessive workload, some groups could manually assess only part of the answers per question:
    – 1 answer: Spanish and English
    – 3 answers: French
    – 5 answers: Dutch
    – all answers: Italian, German, Portuguese
♦ P@N for List questions
Additional evaluation measures:
♦ K1 measure
♦ Confidence-Weighted Score (CWS)
♦ Mean Reciprocal Rank (MRR)
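The slides name the measures but do not define them; under their commonly used definitions (an assumption, not taken from the slides), they can be sketched over the R/W/X/U judgment labels as:

```python
def accuracy(first_answer_judgments):
    """Fraction of questions whose FIRST answer was judged Right."""
    return sum(j == "R" for j in first_answer_judgments) / len(first_answer_judgments)


def mrr(ranked_judgments_per_question):
    """Mean Reciprocal Rank: 1/rank of the first Right answer
    per question (0 if none), averaged over questions."""
    total = 0.0
    for judgments in ranked_judgments_per_question:
        for rank, j in enumerate(judgments, start=1):
            if j == "R":
                total += 1.0 / rank
                break
    return total / len(ranked_judgments_per_question)


def precision_at_n(judgments, n):
    """P@N for a List question: Right answers among the first N returned."""
    return sum(j == "R" for j in judgments[:n]) / n


def cws(judgments_by_confidence):
    """Confidence-Weighted Score: with answers sorted by system
    confidence, average over positions i of (Rights in the first i) / i."""
    correct, total = 0, 0.0
    for i, j in enumerate(judgments_by_confidence, start=1):
        if j == "R":
            correct += 1
        total += correct / i
    return total / len(judgments_by_confidence)
```

For example, two Right judgments out of four first answers give an accuracy of 0.5, and a Right answer at rank 2 contributes 0.5 to the MRR sum for that question.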
Question overlapping among languages, 2005-2006
Results: best and average scores
[Chart; the top score shown is 49.47*]
* This result is still under validation.
Best results in 2004-2005-2006
[Chart; one score shown is 22.63*]
* This result is still under validation.
Participants in 2004-2005-2006: compared best results
List questions
♦ Best: 0.8333 (Priberam, monolingual PT)
♦ Average: 0.138
Problems:
♦ Wrong classification of List questions in the Gold Standard
  – “Mention a Chinese writer” is not a List question!
♦ Definition of List questions:
  – “closed” List questions ask for a finite number of answers:
    Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?
    A: Romeo and Juliet.
  – “open” List questions require a list of items as answer:
    Q: Name books by Jules Verne.
    A: Around the World in 80 Days.
    A: Twenty Thousand Leagues Under the Sea.
    A: Journey to the Centre of the Earth.
Final considerations
♦ Increasing interest in multilingual QA:
  – more participants (30, +25%)
  – two new source languages (Romanian and Polish)
  – more activated tasks (24, vs. 23 in 2005)
  – more submitted runs (77, +13%)
  – more cross-lingual tasks (35, +31.5%)
♦ Gold Standard: questions not translated into all languages
  – no possibility of activating tasks at the last minute
  – useful as a reusable resource: available in the near future
Final considerations: 2006 main task innovations
♦ Multiple answers: good response, but limited capacity for assessing large numbers of answers; feedback from participants is welcome
♦ Supporting snippets: faster evaluation; feedback from participants is welcome
♦ “F/D/L” labels not given in the input format: positive, as apparently there was no real impact
♦ List questions (problems discussed above)
Future perspectives: main task
For discussion:
♦ Romanian as a target language
♦ very hard questions (implying reasoning and answers drawn from multiple documents)
♦ allowing collaboration among different systems
♦ partially automated evaluation (of right answers)