Alicante, September, 22, Workshop
Overview of the Multilingual Question Answering Track
Danilo Giampiccolo
Outline
♦Tasks
♦Test set preparation
♦Participants
♦Evaluation
♦Results
♦Final considerations
♦Future perspectives
QA 2006: Organizing Committee
ITC-irst (Bernardo Magnini): main coordinator
CELCT (D. Giampiccolo, P. Forner): general coordination, Italian
DFKI (B. Sacaleanu): German
ELDA/ELRA (C. Ayache): French
Linguateca (P. Rocha): Portuguese
UNED (A. Peñas): Spanish
U. Amsterdam (Valentin Jijkoun): Dutch
U. Limerick (R. Sutcliffe): English
Bulgarian Academy of Sciences (P. Osenova): Bulgarian
♦Only Source Languages:
♦Depok University of Indonesia (M. Adriani): Indonesian
♦IASI, Romania (D. Cristea): Romanian
♦Wrocław University of Technology (J. Pietraszko): Polish
Tasks
Main task:
♦Monolingual: the language of the question (Source language) and the language of the news collection (Target language) are the same
♦Cross-lingual: the questions are formulated in a language different from that of the news collection
One pilot task:
♦WiQA: coordinated by Maarten de Rijke
Two exercises:
♦Answer Validation Exercise (AVE): coordinated by Anselmo Peñas
♦Real Time: a “time-constrained” QA exercise organized by the University of Alicante (coordinated by Fernando Llopis)
Data set: Question format
200 Questions of three kinds
FACTOID (loc, mea, org, oth, per, tim; ca. 150): What party did Hitler belong to?
DEFINITION (ca. 40): Who is Josef Paul Kleihues?
♦reduced in number (-25%)
♦two new categories added:
–Object: What is a router?
–Other: What is a tsunami?
LIST (ca. 10): Name works by Tolstoy
♦Temporally restricted (ca. 40): by date, by period, by event
♦NIL (ca. 20): questions that do not have any known answer in the target document collection
input format: question type (F, D, L) not indicated NEW!
Data set: run format
Multiple answers: from one to ten exact answers per question NEW!
♦exact = neither more nor less than the information required
♦each answer has to be supported by
–docid
–one to ten text snippets justifying the answer (substrings of the specified document giving the actual context) NEW!
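The run-format constraints can be checked mechanically. The following is a minimal sketch in Python, assuming a hypothetical in-memory representation: the `Answer` record, the `validate_answers` function and the `documents` lookup are illustrative names, not the official submission format, and NIL answers are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_ANSWERS = 10   # from the slide: one to ten exact answers per question
MAX_SNIPPETS = 10  # from the slide: one to ten snippets per answer


@dataclass
class Answer:
    # Hypothetical record layout (assumption); the real run file format is not shown.
    text: str
    docid: str
    snippets: List[str]


def validate_answers(answers: List[Answer], documents: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the answers are valid."""
    problems = []
    if not 1 <= len(answers) <= MAX_ANSWERS:
        problems.append(f"expected 1 to {MAX_ANSWERS} answers, got {len(answers)}")
    for i, ans in enumerate(answers, start=1):
        doc = documents.get(ans.docid)
        if doc is None:
            problems.append(f"answer {i}: unknown docid {ans.docid!r}")
            continue
        if not 1 <= len(ans.snippets) <= MAX_SNIPPETS:
            problems.append(
                f"answer {i}: expected 1 to {MAX_SNIPPETS} snippets, got {len(ans.snippets)}"
            )
        for snippet in ans.snippets:
            # Each snippet must be a substring of the supporting document.
            if snippet not in doc:
                problems.append(f"answer {i}: snippet is not a substring of {ans.docid}")
    return problems
```

A run would pass each question's answer list through `validate_answers` before submission; whether the answer is "exact" still requires a human assessor.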
Activated Tasks (at least one registered participant)
[Matrix of source (S) by target (T) languages]
Source languages: BG, DE, EN, ES, FR, IN, IT, NL, PT, PL, RO
Target languages: BG, DE, EN, ES, FR, IT, NL, PT
11 Source languages (10 in 2005)
8 Target languages (9 in 2005)
No Finnish task / New languages: Polish and Romanian
Activated Tasks
[Table: activated tasks per CLEF campaign; columns MONOLINGUAL, CROSS-LINGUAL, TOTAL]
♦questions were not translated in all the languages
♦Gold Standard: questions in multiple languages only for tasks where there was at least one registered participant NEW!
♦More interest in cross-linguality
Participants
[Table: participants per CLEF campaign; columns America, Europe, Asia, TOTAL, Registered participants, Newcomers, Veterans, Absent veterans]
Growth in total participants over the last three campaigns: +125%, +33%, +25%
List of participants
ACRONYM             NAME                                                          COUNTRY
SYNAPSE             SYNAPSE Developpement                                         France
Ling-Comp           U. Rome-La Sapienza                                           Italy
Alicante            U. Alicante-Informatica                                       Spain
Hagen               U. Hagen-Informatics                                          Germany
Daedalus            Daedalus Consortium                                           Spain
Jaen                U. Jaen-Intell. Systems                                       Spain
ISLA                U. Amsterdam                                                  Netherlands
INAOE               Inst. Astrophysics, Optics & Electronics                      Mexico
DEPOK               U. Indonesia-Comp. Sci.                                       Indonesia
DFKI                DFKI-Lang. Tech.                                              Germany
FURUI Lab.          Tokyo Inst Technology                                         Japan
Linguateca          Linguateca-Sintef                                             Norway
LIC2M-CEA           Centre CEA Saclay                                             France
LINA                U. Nantes-LINA                                                France
Priberam            Priberam Informatica                                          Portugal
U.Porto             U. Porto-AI                                                   Portugal
U.Groningen         U. Groningen-Letters                                          Netherlands
Lab.Inf.D'Avignon   Lab. Inf. D'Avignon                                           France
U.Sao Paulo         U. Sao Paulo – Math                                           Brazil
Vanguard            Vanguard Engineering                                          Mexico
LCC                 Language Comp. Corp.                                          USA
UAIC                U. "Al. I. Cuza" Iasi                                         Romania
Wroclaw U.          Wroclaw U. of Tech                                            Poland
RFIA-UPV            Univ. Politècnica de Valencia                                 Spain
LIMSI               CNRS Lab-Orsay Cedex                                          France
U.Stuttgart         U. Stuttgart-NLP                                              Germany
ITC                 ITC-irst                                                      Italy
JRC-ISPRA           Institute for the Protection and the Security of the Citizen  Italy
BTB                 BulTreeBank Project                                           Sofia
dltg                University of Limerick                                        Ireland
Legend: Industrial Companies
Submitted runs
Campaign   Submitted runs #   Monolingual #   Cross-lingual #
CLEF
CLEF       (+182%)            20              28
CLEF       (+39.5%)           43              24
CLEF       (+13%)             42              35
Number of answers and snippets per question
[Chart: Number of RUNS with respect to number of answers; categories: 1 answer, between 2 and 5 answers, more than 5 answers]
[Chart: Number of SNIPPETS for each answer; categories: 1 snippet, 2 snippets, 3 snippets, > 4 snippets]
Evaluation
As in previous campaigns
♦runs manually judged by native speakers
♦each answer: Right, Wrong, ineXact, Unsupported
♦up to two runs for each participating group
Evaluation measures
♦Accuracy (for F, D); main evaluation score, calculated for the FIRST ANSWER only
excessive workload: some groups could manually assess only one answer (the first one) per question
–1 answer: Spanish and English
–3 answers: French
–5 answers: Dutch
–all answers: Italian, German, Portuguese
♦for List questions
Additional evaluation measures
♦K1 measure
♦Confidence Weighted Score (CWS)
♦Mean Reciprocal Rank (MRR) NEW!
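Three of the measures named above can be sketched in a few lines. This is a minimal illustration in Python using the commonly cited definitions of accuracy, MRR and CWS; the slides do not give the exact formulas used by the track, so the function names, the `"R"` label encoding and the per-question confidence values are assumptions. K1 is left out because its definition is not shown here.

```python
from typing import Sequence

RIGHT = "R"  # assumed encoding of the judgments: R, W, X (ineXact), U (Unsupported)


def accuracy(first_judgments: Sequence[str]) -> float:
    """Fraction of questions whose FIRST answer was judged Right."""
    return sum(j == RIGHT for j in first_judgments) / len(first_judgments)


def mean_reciprocal_rank(ranked_judgments: Sequence[Sequence[str]]) -> float:
    """Mean over questions of 1/rank of the first Right answer (0 if there is none)."""
    total = 0.0
    for judgments in ranked_judgments:
        for rank, j in enumerate(judgments, start=1):
            if j == RIGHT:
                total += 1.0 / rank
                break
    return total / len(ranked_judgments)


def confidence_weighted_score(
    first_judgments: Sequence[str], confidences: Sequence[float]
) -> float:
    """Sort questions by decreasing confidence, then average the running precision."""
    order = sorted(range(len(first_judgments)), key=lambda i: confidences[i], reverse=True)
    correct_so_far = 0
    total = 0.0
    for position, i in enumerate(order, start=1):
        if first_judgments[i] == RIGHT:
            correct_so_far += 1
        total += correct_so_far / position
    return total / len(order)
```

Accuracy and CWS look only at the first answer per question, while MRR rewards a Right answer anywhere in the ranked list, which is why it is only meaningful for languages where more than one answer per question was assessed.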
[Figure: Question Overlapping among Languages]
Results: Best and Average scores
[Chart: best and average scores; labelled value 49,47 *]
* This result is still under validation.
Best results in
[Chart: best results; labelled value ,63 *]
* This result is still under validation.
Participants in : compared best results
List questions
Best: (Priberam, Monolingual PT)
Average:
Problems
Wrong classification of List Questions in the Gold Standard
♦"Mention a Chinese writer" is not a List question!
Definition of List Questions
♦“closed” List questions asking for a finite number of answers
Q: What are the names of the two lovers from Verona separated by family issues in one of Shakespeare’s plays?
A: Romeo and Juliet.
♦“open” List questions requiring a list of items as answer
Q: Name books by Jules Verne.
A: Around the World in 80 Days.
A: Twenty Thousand Leagues Under The Sea.
A: Journey to the Centre of the Earth.
Final considerations
–Increasing interest in multilingual QA
♦More participants (30, +25%)
♦Two new languages as source (Romanian and Polish)
♦More activated tasks (24, up from 23 in 2005)
♦More submitted runs (77, +13%)
♦More cross-lingual tasks (35, +31.5%)
–Gold Standard: questions not translated in all languages
♦No possibility of activating tasks at the last minute
♦Useful as a reusable resource: available in the near future.
Final considerations: 2006 main task innovations
–Multiple answers:
♦good response
♦limited capacity of assessing large numbers of answers
♦feedback welcome from participants
–Supporting snippets:
♦faster evaluation
♦feedback from participants
–“F/D/L/” labels not given in the input format:
♦positive, as apparently there was no real impact on
–List questions
Future perspective: main task
For discussion:
♦Romanian as target
♦Very hard questions (implying reasoning and multiple document answers)
♦Allow collaboration among different systems
♦Partial automated evaluation (right answers)