CLEF 2008 Multilingual Question Answering Track
UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo
CELCT: Danilo Giampiccolo, Pamela Forner
QA 2008: Task and Exercises
QA Main Task (6th edition)
Pilot: QA WSD, English newswire collections with Word Sense Disambiguation
Answer Validation Exercise, AVE (3rd edition)
QA on Speech Transcripts, QAST (2nd edition)
Main Task QA 2008 Organizing Committee
CELCT (D. Giampiccolo, P. Forner): Italian
UNED (A. Peñas): Spanish
U. Groningen (G. Bouma): Dutch
U. Limerick (R. Sutcliffe): English
DFKI (B. Sacaleanu): German
ELDA/ELRA (N. Moreau): French
Linguateca (P. Rocha): Portuguese
Bulgarian Academy of Sciences (P. Osenova): Bulgarian
IASI (C. Forascu): Romanian
U. Basque Country (I. Alegria): Basque
ILSP (P. Prokopidis): Greek
Evolution of the Track (2003-2008)
Target languages: 9, 10, 11
Collections: News 1994; + News 1995; + Wikipedia (Nov. 2006)
Question types: 200 factoid; + temporal restrictions; + definitions; - type of question; + lists; + linked questions; + closed lists
Supporting information: document; snippet
Pilots and exercises: Temporal restrictions, Lists, AVE, Real Time, WiQA, QAST, WSDQA
200 questions
FACTOID (loc, mea, org, per, tim, cnt, obj, oth)
DEFINITION (per, org, obj, oth)
CLOSED LIST: Who were the components of The Beatles? Who were the last three presidents of Italy?
LINKED QUESTIONS: Who was called the “Iron-Chancellor”? When was he born? Who was his first wife?
Temporal restrictions: by date, by period, by event
NIL questions (without a known answer in the collection)
43 activated language combinations: source language (questions) × target language (corpus and answers); a combination is activated when at least one participant registered for it
Target languages: BG, DE, EL, EN, ES, EU, FR, IT, NL, PT, RO
Activated Tasks
Year        Monolingual  Cross-lingual  Total
CLEF 2003   3            5              8
CLEF 2004   6            13             19
CLEF 2005   8            15             23
CLEF 2006   7            17             24
CLEF 2007   8            29             37
CLEF 2008   10           33             43
Submitted runs
Year        Total        Monolingual  Cross-lingual
CLEF 2003   17           6            11
CLEF 2004   48 (+182%)   20           28
CLEF 2005   67 (+40%)    43           24
CLEF 2006   77 (+15%)    42           35
CLEF 2007   37 (-52%)    -            -
CLEF 2008   51 (+38%)    31           20
Participant groups
Year        Newcomers  Veterans  Total        Registered
CLEF 2003   -          -         8            -
CLEF 2004   13         5         18 (+125%)   22
CLEF 2005   9          15        24 (+33%)    27
CLEF 2006   10         20        30 (+25%)    36
CLEF 2007   -          14        - (-26%)     29
CLEF 2008   -          -         21           33
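The percentages in the submitted-runs and participant-groups tables read as year-over-year changes of the totals, rounded to the nearest whole percent. A minimal Python sketch, assuming exactly that formula (the helper name `yoy_change` is hypothetical, not part of any CLEF tooling), reproduces the reported figures from the totals:

```python
def yoy_change(series):
    """Year-over-year change between consecutive totals, in whole percent.

    Assumption: change = (current - previous) / previous, rounded to the
    nearest integer.
    """
    return [round(100 * (curr - prev) / prev) for prev, curr in zip(series, series[1:])]


# Total submitted runs, CLEF 2003-2008
runs = [17, 48, 67, 77, 37, 51]
# Total participating groups, CLEF 2003-2006 (the 2007 total is not listed)
groups = [8, 18, 24, 30]

print(yoy_change(runs))    # [182, 40, 15, -52, 38]
print(yoy_change(groups))  # [125, 33, 25]
```

Every computed value matches the slide, which supports reading the percentages as simple relative changes against the previous campaign rather than against 2003.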
List of Participants (random order) Bulgaria
Groups per year and target collection: natural selection? Task change. Above 20 groups.
Groups per target collection
2008 participation: comparative evaluation?
Language / Runs / Different groups
Portuguese 9 6
Spanish 10 4
English 5
German 11 3
Romanian 2
Dutch 1
Basque
French
Bulgarian
Italian
Greek
Weakness from an evaluation perspective: 4 languages without comparison between different groups
Breakout session
Results: Best and Average scores
Best scores by language
Best scores by participant
Results depend on the type of question
Definitions: almost solved for several systems (80%-95%)
Factoids: 50%-65% for several systems
Temporal restrictions: same level of difficulty as factoids for some systems
Closed lists: still very difficult
Linked questions: Wikipedia now provides more answers
Conclusion
Same task as in 2007
Same level of participation (slightly better)
11 target languages (9 with participation)
43 activated subtasks
21 participants
51 runs
Same results (slightly better)
Future direction
Fewer participants per language, hence poor comparison between groups: change methodology (one task for all)
Criticism of QA over Wikipedia: questions are easier to find with IR, and there is no user model: change collection
QA proposal for 2009: SC and breakout