ResPubliQA 2010: QA on European Legislation
Anselmo Peñas, UNED, Spain
Pamela Forner, CELCT, Italy
Richard Sutcliffe, U. Limerick, Ireland
Alvaro Rodrigo, UNED, Spain
Outline
– The Multiple Language Question Answering Track at CLEF: a bit of history
– ResPubliQA this year: what is new
– Participation, runs and languages
– Assessment and metrics
– Results
– Conclusions
Multiple Language Question Answering at CLEF
Started in 2003: this is the eighth year.
– Era I: ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
– Era II: grouped questions asked against newspapers and Wikipedia; exact answers returned
– Era III (ResPubliQA): ungrouped questions against multilingual, parallel-aligned EU legislative documents; passages returned
ResPubliQA 2010 – Second Year
Key points:
– same set of questions in all languages
– same document collections: parallel-aligned documents
Same objectives:
– to move towards a domain of potential users
– to allow the direct comparison of performance across languages
– to allow QA technologies to be evaluated against IR approaches
– to promote the use of validation technologies
But also some novelties…
What’s new
1. New task (Answer Selection)
2. New document collection (EuroParl)
3. New question types
4. Automatic evaluation
The Tasks
Paragraph Selection (PS)
– extract a relevant paragraph of text that completely satisfies the information need expressed by a natural-language question
Answer Selection (AS) – NEW
– demarcate the shorter string of text within the paragraph that corresponds to the exact answer, supported by the entire paragraph
The Collections
Subset of JRC-Acquis (10,700 docs per language)
– EU treaties, EU legislation, agreements and resolutions
– published between 1950 and 2006
– parallel-aligned at the document level (not always at the paragraph level)
– XML-TEI.2 encoding
Small subset of EuroParl (~150 docs per language) – NEW
– proceedings of the European Parliament; translations into Romanian available from January 2009
– Debates (CRE) from 2009 and Texts Adopted (TA) from 2007
– parallel-aligned at the document level (not always at the paragraph level)
– XML encoding
EuroParl Collection
– compatible with the Acquis domain
– widens the scope of the questions
Unfortunately:
– small number of texts
– documents are not fully translated
The specific fragments of JRC-Acquis and EuroParl used by ResPubliQA are available at
Questions
Two new question categories:
– OPINION: What did the Council think about the terrorist attacks on London?
– OTHER: What is the e-Content program about?
Reason and Purpose categories merged together:
– Why was Perwiz Kambakhsh sentenced to death?
And also: Factoid, Definition, Procedure
ResPubliQA Campaigns

Task              Registered groups   Participant groups   Submitted runs        Organizing people
ResPubliQA 2009   –                   –                    – (+ baseline runs)   9
ResPubliQA 2010   –                   13                   49 (42 PS and 7 AS)   6 (+ 6 additional translators/assessors)

More participants and more submissions
ResPubliQA 2010 Participants
13 participants, 8 countries, 4 new participants

System   Team                                                                              Reference
bpac     SZTAKI, Hungary                                                                   Nemeskey
dict     Dhirubhai Ambani Institute of Information and Communication Technology, India     Sabnani et al.
elix     University of the Basque Country, Spain                                           Agirre et al.
icia     RACAI, Romania                                                                    Ion et al.
iles     LIMSI-CNRS, France                                                                Tannier et al.
ju_c     Jadavpur University, India                                                        Pakray et al.
loga     University of Koblenz, Germany                                                    Glöckner and Pelzer
nlel     U. Politecnica Valencia, Spain                                                    Correa et al.
prib     Priberam, Portugal                                                                –
uaic     Al. I. Cuza University of Iasi, Romania                                           Iftene et al.
uc3m     Universidad Carlos III de Madrid, Spain                                           Vicente-Díez et al.
uiir     University of Indonesia, Indonesia                                                Toba et al.
uned     UNED, Spain                                                                       Rodrigo et al.
Submissions by Task and Language
Source → target language pairs, as runs (PS, AS):
– DE→DE: 4 (4, 0)
– EN→EN: 19 (16, 3); EN→RO: 2 (2, 0)
– ES→ES: 7 (6, 1)
– EU→EN: 2 (2, 0)
– FR→FR: 7 (5, 2)
– IT→IT: 3 (2, 1)
– PT→PT: 1 (1, 0)
– RO→RO: 4 (4, 0)
Total: 49 runs (42 PS, 7 AS)
System Output
Two options:
– give an answer (paragraph or exact answer)
– return NOA as response = no answer is given, because the system is not confident about the correctness of its candidate answer
Objective:
– avoid returning incorrect answers
– reduce the proportion of wrong answers without giving up correct ones
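A minimal sketch of this answer/NOA decision, assuming a threshold on some confidence score (both the score and the threshold are hypothetical; each participating system used its own validation criteria):

```python
from typing import Optional


def respond(candidate: Optional[str], confidence: float,
            threshold: float = 0.5) -> str:
    """Return the candidate paragraph only when the system trusts it;
    otherwise return the NOA marker (counted as Unanswered, not Wrong)."""
    if candidate is not None and confidence >= threshold:
        return candidate
    return "NOA"
```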
Evaluation Measure
c@1, defined as:

c@1 = (n_R + n_U · (n_R / n)) / n

where
– n_R: number of questions correctly answered
– n_U: number of questions unanswered
– n: total number of questions (200 this year)

If n_U = 0, then c@1 = n_R / n (plain accuracy).
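A minimal sketch of the measure in Python (the counts in the usage example are hypothetical):

```python
def c_at_1(n_correct: int, n_unanswered: int, n: int = 200) -> float:
    """c@1: unanswered questions are rewarded in proportion to the
    accuracy the system shows on the questions it did answer.
    With n_unanswered == 0 this reduces to plain accuracy."""
    return (n_correct + n_unanswered * (n_correct / n)) / n


# A system answering 100 questions correctly and leaving 40 unanswered
# scores 0.60, versus 0.50 had it answered those 40 incorrectly.
print(c_at_1(100, 40))  # 0.6
```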
Assessment
Two steps:
1) Automatic evaluation
– responses automatically compared against a manually produced gold standard
– answers that exactly match the gold standard are given the value correct (R)
– correctness of a response: exact match of the document identifier, the paragraph identifier, and the text retrieved by the system with respect to the gold standard
– 31% of the answers were automatically marked as correct
2) Manual assessment
– non-matching paragraphs/answers judged by human assessors
– judgments anonymous and simultaneous for the same question
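A minimal sketch of the automatic matching step (the Response layout is an assumption for illustration, not the official run/gold format):

```python
from typing import NamedTuple


class Response(NamedTuple):
    doc_id: str   # document identifier
    par_id: str   # paragraph identifier
    text: str     # paragraph (or answer string) returned


def auto_right(run: Response, gold: Response) -> bool:
    """Step 1: a response is automatically marked Right (R) only when
    document id, paragraph id and retrieved text all match the gold
    standard exactly; anything else is passed to the human assessors."""
    return (run.doc_id == gold.doc_id
            and run.par_id == gold.par_id
            and run.text == gold.text)
```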
Assessment for Paragraph Selection (PS)
Binary assessment:
– Right (R)
– Wrong (W)
NOA answers:
– automatically filtered and marked as U (Unanswered)
– discarded candidate answers were also evaluated:
  NoA R: NoA, but the candidate answer was correct
  NoA W: NoA, and the candidate answer was incorrect
  NoA Empty: NoA, and no candidate answer was given
Evaluators were guided by the initial “gold” paragraph, used only as a hint.
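A minimal sketch of this bookkeeping, assuming each judged response is reduced to an (answered, candidate, candidate_correct) triple (an assumed data layout, not the official evaluation format):

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (answered, candidate paragraph, candidate judged correct)
Judged = Tuple[bool, Optional[str], bool]


def ps_breakdown(runs: Iterable[Judged]) -> Counter:
    """Tally the PS assessment categories, including the judgments of the
    candidate answers that systems discarded when returning NOA."""
    counts: Counter = Counter()
    for answered, candidate, correct in runs:
        if answered:
            counts["R" if correct else "W"] += 1
        elif candidate is None:
            counts["NoA Empty"] += 1
        else:
            counts["NoA R" if correct else "NoA W"] += 1
    return counts
```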
Assessment for Answer Selection (AS)
– R (Right): the answer string is an exact and correct answer, supported by the returned paragraph
– X (ineXact): the answer string contains either part of a correct answer present in the returned paragraph, or all of the correct answer plus unnecessary additional text
– M (Missed): the answer string does not contain a correct answer even in part, but the returned paragraph does contain one
– W (Wrong): the answer string does not contain a correct answer and neither does the returned paragraph; or it contains an unsupported answer
Monolingual Results for PS
[Table: monolingual PS results per target language (DE, EN, ES, FR, IT, PT, RO), ranking a combination of systems, the participant runs (uiir, dict, bpac, loga, prib, nlel, elix, uned, uc3m, ju_c, iles, uaic, icia) and an IR baseline (uned); the scores themselves are not recoverable here.]
Improvement in the Performance
[Tables: best and average monolingual PS scores for ResPubliQA 2009 vs. 2010, and 2010 best/average broken down by collection (JRC-Acquis vs. EuroParl); the scores themselves are not recoverable here.]
Cross-language Results for PS

Run           Languages   Score
elix102euen   EU→EN       0.36
elix101euen   EU→EN       0.33
icia101enro   EN→RO       0.29
icia102enro   EN→RO       0.29

In comparison to ResPubliQA 2009:
– more cross-language runs (+2)
– improvement in the best performance: from 0.18 to 0.36
Results for the AS Task
[Table: for each AS run (ju_c101ASenen, iles101ASenen, iles101ASfrfr, nlel101ASenen, nlel101ASeses, nlel101ASitit, nlel101ASfrfr) and their combination, counts of R, W, M and X judgments plus the corresponding #NoA and empty-NoA counts; the numbers themselves are not recoverable here.]
Conclusions
– Successful continuation of ResPubliQA 2009
– AS task: few groups and poor results
– Overall improvement of results
– New document collection and new question types
– The evaluation metric encourages the use of a validation module
More on System Analyses and Approaches
MLQA’10 Workshop on Wednesday, 14:30–18:00
ResPubliQA 2010: QA on European Legislation
Thank you!