
1 ResPubliQA 2010: QA on European Legislation
Anselmo Peñas, UNED, Spain; Pamela Forner, CELCT, Italy; Richard Sutcliffe, U. Limerick, Ireland; Alvaro Rodrigo, UNED, Spain
http://celct.isti.cnr.it/ResPubliQA/
ResPubliQA 2010, 22 September, Padua, Italy

2 Outline
- The Multiple Language Question Answering Track at CLEF: a bit of history
- ResPubliQA this year: what is new
- Participation, runs and languages
- Assessment and metrics
- Results
- Conclusions

3 Multiple Language Question Answering at CLEF
Started in 2003; now in its eighth year.
- Era I (2003-2006): ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
- Era II (2007-2008): grouped questions asked against newspapers and Wikipedia; exact answers returned
- Era III (2009-2010): ResPubliQA - ungrouped questions against multilingual parallel-aligned EU legislative documents; passages returned

4 ResPubliQA 2010 – Second Year
- Key points:
  – same set of questions in all languages
  – same document collections: parallel-aligned documents
- Same objectives:
  – to move towards a domain of potential users
  – to allow direct comparison of performance across languages
  – to allow QA technologies to be evaluated against IR approaches
  – to promote the use of validation technologies
But also some novelties…

5 What’s new
1. New task (Answer Selection)
2. New document collection (EuroParl)
3. New question types
4. Automatic evaluation

6 The Tasks
- Paragraph Selection (PS): extract a relevant paragraph of text that completely satisfies the information need expressed by a natural-language question
- Answer Selection (AS) [NEW]: demarcate the shortest string of text corresponding to the exact answer, supported by the entire paragraph

7 The Collections
- Subset of JRC-Acquis (10,700 docs per language)
  – EU treaties, EU legislation, agreements and resolutions
  – dated between 1950 and 2006
  – parallel-aligned at the document level (not always at the paragraph level)
  – XML-TEI.2 encoding
- Small subset of EuroParl (~150 docs per language) [NEW]
  – proceedings of the European Parliament, with translations into Romanian from January 2009: Debates (CRE) from 2009 and Texts Adopted (TA) from 2007
  – parallel-aligned at the document level (not always at the paragraph level)
  – XML encoding

8 EuroParl Collection
- compatible with the JRC-Acquis domain
- widens the possible scope of the questions
- Unfortunately:
  – small number of texts
  – documents are not fully translated
The specific fragments of JRC-Acquis and EuroParl used by ResPubliQA are available at http://celct.isti.cnr.it/ResPubliQA/Downloads

9 Questions
- Two new question categories:
  – OPINION: What did the Council think about the terrorist attacks on London?
  – OTHER: What is the e-Content program about?
- The Reason and Purpose categories were merged:
  – Why was Perwiz Kambakhsh sentenced to death?
- Plus the existing Factoid, Definition and Procedure categories

10 ResPubliQA Campaigns

Task             Registered groups   Participant groups   Submitted runs            Organizing people
ResPubliQA 2009  20                  11                   28 + 16 (baseline runs)   9
ResPubliQA 2010  24                  13                   49 (42 PS and 7 AS)       6 (+ 6 additional translators/assessors)

More participants and more submissions.

11 ResPubliQA 2010 Participants

System   Team                                                                           Reference
bpac     SZTAKI, Hungary                                                                Nemeskey
dict     Dhirubhai Ambani Institute of Information and Communication Technology, India  Sabnani et al.
elix     University of the Basque Country, Spain                                        Agirre et al.
icia     RACAI, Romania                                                                 Ion et al.
iles     LIMSI-CNRS, France                                                             Tannier et al.
ju_c     Jadavpur University, India                                                     Pakray et al.
loga     University of Koblenz, Germany                                                 Glöckner and Pelzer
nlel     U. Politecnica Valencia, Spain                                                 Correa et al.
prib     Priberam, Portugal                                                             -
uaic     Al. I. Cuza University of Iasi, Romania                                        Iftene et al.
uc3m     Universidad Carlos III de Madrid, Spain                                        Vicente-Díez et al.
uiir     University of Indonesia, Indonesia                                             Toba et al.
uned     UNED, Spain                                                                    Rodrigo et al.

13 participants, 8 countries, 4 new participants.

12 Submissions by Task and Language

Cells give the number of runs, with (PS runs, AS runs) in parentheses.

Source \ Target   DE        EN         ES        FR        IT        PT        RO        Total
DE                4 (4,0)                                                                4 (4,0)
EN                          19 (16,3)                                          2 (2,0)   21 (18,3)
ES                                     7 (6,1)                                           7 (6,1)
EU                          2 (2,0)                                                      2 (2,0)
FR                                               7 (5,2)                                 7 (5,2)
IT                                                         3 (2,1)                       3 (2,1)
PT                                                                   1 (1,0)             1 (1,0)
RO                                                                             4 (4,0)   4 (4,0)
Total             4 (4,0)   21 (18,3)  7 (6,1)   7 (5,2)   3 (2,1)   1 (1,0)   6 (6,0)   49 (42,7)

13 System Output
- Two options for each question:
  – give an answer (a paragraph or an exact answer), or
  – return NOA as the response: no answer is given because the system is not confident about the correctness of its candidate answer
- Objective:
  – avoid returning incorrect answers
  – reduce the proportion of wrong answers without giving up correct ones
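The NOA option is essentially a thresholded validation step. A minimal sketch of that decision rule, assuming a hypothetical confidence score from the system's validation module (the threshold is illustrative, not part of the track specification):

```python
def respond(candidate, confidence, threshold=0.5):
    """Answer only when the validation score clears the threshold;
    otherwise return NOA but keep the discarded candidate, since
    candidates withheld under NOA were also assessed (NoA R / NoA W)."""
    if confidence >= threshold:
        return candidate, None       # answered
    return "NOA", candidate          # unanswered; candidate kept for assessment
```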

14 Evaluation Measure
c@1 = ( n_R + n_U * (n_R / n) ) / n
where
  n_R: number of questions correctly answered
  n_U: number of questions unanswered
  n: total number of questions (200 this year)
If n_U = 0 then c@1 = n_R / n, i.e. plain accuracy; unanswered questions are rewarded in proportion to the accuracy achieved on the answered ones.
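In code the measure is a one-liner. A minimal sketch, assuming per-question judgements encoded as "R" (right), "W" (wrong) and "U" (unanswered):

```python
def c_at_1(judgements):
    """c@1: unanswered questions earn partial credit equal to the
    accuracy achieved on the questions the system did answer."""
    n = len(judgements)              # total questions (200 this year)
    n_r = judgements.count("R")      # correctly answered
    n_u = judgements.count("U")      # unanswered (NOA)
    return (n_r + n_u * (n_r / n)) / n

# 100 right, 60 wrong, 40 unanswered:
# c@1 = (100 + 40 * 100/200) / 200 = 0.60, versus plain accuracy 0.50
print(c_at_1(["R"] * 100 + ["W"] * 60 + ["U"] * 40))
```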

15 Assessment
Two steps:
1) Automatic evaluation
   – responses are automatically compared against a manually produced gold standard
   – answers that exactly match the gold standard are given the correct value (R)
   – correctness requires an exact match of the document identifier, the paragraph identifier, and the text retrieved by the system against the gold standard
2) Manual assessment
   – non-matching paragraphs/answers are judged by human assessors
   – judgements are anonymous and simultaneous for the same question
31% of the answers were automatically marked as correct.
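A sketch of that first, automatic pass; the record layout is hypothetical, but the matching criterion (document identifier, paragraph identifier and text must all match exactly) is the one described above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    doc_id: str    # document identifier
    para_id: str   # paragraph identifier
    text: str      # paragraph or answer string returned by the system

def auto_judge(system: Response, gold: Response) -> str:
    """Mark Right (R) only on an exact three-way match against the
    gold standard; everything else is deferred to human assessors."""
    exact = (system.doc_id == gold.doc_id
             and system.para_id == gold.para_id
             and system.text == gold.text)
    return "R" if exact else "manual assessment"
```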

16 Assessment for Paragraph Selection (PS)
- Binary assessment:
  – Right (R)
  – Wrong (W)
- NOA answers:
  – automatically filtered and marked as U (Unanswered)
  – discarded candidate answers were also evaluated:
    NoA R: NOA, but the candidate answer was correct
    NoA W: NOA, and the candidate answer was incorrect
    NoA Empty: NOA, and no candidate answer was given
- Evaluators were guided by the initial "gold" paragraph, used only as a hint

17 Assessment for Answer Selection (AS)
- R (Right): the answer string is an exact, correct answer, supported by the returned paragraph
- X (ineXact): the answer string contains either part of a correct answer present in the returned paragraph, or all of the correct answer plus unnecessary additional text
- M (Missed): the answer string does not contain a correct answer even in part, but the returned paragraph does contain one
- W (Wrong): the answer string does not contain a correct answer and neither does the returned paragraph; or it contains an unsupported answer

18 Monolingual Results for PS (c@1)

Combination of all runs per language:
DE 0.75, EN 0.94, ES 0.82, FR 0.74, IT 0.73, PT 0.56, RO 0.70

Individual runs (one c@1 score per language covered, in the column order DE, EN, ES, FR, IT, PT, RO):
uiir101             0.73
dict102             0.68
bpac102             0.68
loga102             0.62
loga101             0.59
prib101             0.56
nlel101             0.49  0.65  0.56  0.55  0.63
bpac101             0.65
elix101             0.65
IR baseline (uned)  0.65  0.54
uned102             0.54
uc3m102             0.52
uc3m101             0.51
dict101             0.64
uiir102             0.64
uned101             0.63
elix102             0.62
nlel102             0.59  0.62  0.20  0.55  0.53
ju_c101             0.50
iles102             0.48  0.36
uaic102             0.46  0.24  0.55
uaic101             0.43  0.30  0.52
icia102             0.49

19 Improvement in the Performance

Monolingual PS task:
                  BEST   AVERAGE
ResPubliQA 2009   0.68   0.39
ResPubliQA 2010   0.73   0.54

2010, by collection:
                  BEST   AVERAGE
JRC-Acquis        0.71   0.53
EuroParl          0.77   0.55

20 Cross-language Results for PS

Run       Source→Target   c@1
elix102   EU→EN           0.36
elix101   EU→EN           0.33
icia101   EN→RO           0.29
icia102   EN→RO           0.29

In comparison to ResPubliQA 2009:
- more cross-language runs (+2)
- improvement in the best performance: from c@1 0.18 to 0.36

21 Results for the AS Task

System           c@1
combination      0.30  (60 R, 140 W out of 200)
ju_c101ASenen    0.26
iles101ASenen    0.09
iles101ASfrfr    0.08
nlel101ASenen    0.07
nlel101ASeses    0.06
nlel101ASitit    0.03
nlel101ASfrfr    0.02

22 Conclusions
- Successful continuation of ResPubliQA 2009
- AS task: few groups and poor results
- Overall improvement of results
- New document collection and new question types
- The c@1 evaluation metric encourages the use of a validation module

23 More on System Analyses and Approaches
MLQA’10 Workshop on Wednesday, 14:30 – 18:00

24 ResPubliQA 2010: QA on European Legislation
Thank you!

