3rd Answer Validation Exercise (AVE 2008)
QA subtrack at the Cross-Language Evaluation Forum 2008
UNED: Anselmo Peñas, Álvaro Rodrigo, Felisa Verdejo
Thanks to… the Main Task QA organizing committee

Answer Validation Exercise 2008: validate the correctness of real systems' answers.
The Question Answering system receives a question and returns a candidate answer together with a supporting text. The Answer Validation module takes the triple (question, candidate answer, supporting text) and decides between two outcomes: the answer is correct, or the answer is not correct (or there is not enough evidence).
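A minimal sketch of the task interface this slide describes, with illustrative names only (not the official AVE data format or any participant's API):

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    CORRECT = "answer is correct"
    REJECTED = "answer is not correct, or not enough evidence"

@dataclass
class ValidationItem:
    question: str
    candidate_answer: str
    supporting_text: str

def validate(item: ValidationItem) -> Decision:
    """Toy placeholder: a real AVE system would apply RTE-style checks here."""
    # Trivial heuristic just to make the sketch runnable: require the answer
    # to appear literally in the supporting text.
    if item.candidate_answer.lower() in item.supporting_text.lower():
        return Decision.CORRECT
    return Decision.REJECTED
```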

Collections. Example question: "What is Zanussi?", with candidate answers grouped by question, each with its supporting snippet:
- "was an Italian producer of home appliances", supported by: "Zanussi. For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought"
- "who had also been in Cassibile since August 31", supported by: "Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August"
- "(1985) 3 Out of 5 Live (1985) What Is This?"
The task: candidate answers are grouped by question; accept or reject each answer, and select one of the accepted answers.

nlp.uned.es/clef-qa/ave Collections  Remove duplicated answers inside the same question group  Discard NIL answers, void answers and answers with too long supporting snippet  This processing lead to a reduction in the number of answers to be validated

AVE Collections 2008 (number of answers to validate), available for CLEF participants at nlp.uned.es/clef-qa/ave/:

Language      Testing: # answers (% correct)   Development: # answers (% correct)
Spanish       1528 (10%)                       551 (23%)
English
German
Portuguese
Dutch                                          78 (15.8%)
French
Romanian                                       82 (43.7%)
Basque        104 (7.2%)                       --
Bulgarian                                      --
Italian

Evaluation:
- Collections are not balanced (real world)
- Approach: detect if there is enough evidence to accept an answer
- Measures: precision, recall and F over ACCEPTED answers
- Baseline system: accept all answers
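A minimal sketch of this scoring, with illustrative field names (not the official AVE scorer): precision, recall and F are computed over the answers the system accepts, and the baseline is a validator that accepts everything.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    predicted_accept: bool   # system decision: accept (validate) or reject the answer
    gold_correct: bool       # gold standard: the answer is actually correct

def precision_recall_f(judgements, beta=1.0):
    """P, R and F over ACCEPTED answers (illustrative helper)."""
    accepted = [j for j in judgements if j.predicted_accept]
    correct = [j for j in judgements if j.gold_correct]
    tp = sum(1 for j in accepted if j.gold_correct)
    p = tp / len(accepted) if accepted else 0.0
    r = tp / len(correct) if correct else 0.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if (p + r) > 0 else 0.0
    return p, r, f

def accept_all_baseline(gold_flags):
    """'Accept all answers' baseline: recall is 1, precision equals the
    proportion of correct answers in the (unbalanced) collection."""
    return [Judgement(predicted_accept=True, gold_correct=g) for g in gold_flags]
```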

Participants and runs (languages: DE, EN, ES, FR, RO):
- Fernuniversität in Hagen: 2 runs
- LIMSI: 2 runs
- U. Iasi: 4 runs
- DFKI: 2 runs
- INAOE: 2 runs
- U. Alicante: 3 runs
- UNC: 2 runs
- U. Jaén (UJA): 6 runs
- LINA: 1 run
Total: 24 runs

Evaluation: P, R, F (precision, recall and F measure over correct answers, English):

Group   System            F      Precision   Recall
DFKI    ltqa
UA      ofe
UNC     jota_
IASI    uaic_
UNC     jota_
IASI    uaic_
        100% VALIDATED (baseline)
UJA     magc_2 (bbr)
UJA     magc_1 (timbl)    0      0           0

Additional measures:
- Compare AVE systems with QA systems' performance
- Count the answers SELECTED correctly
- Reward the detection of question groups in which all answers are incorrect (this allows a new, justified attempt to answer the question)
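A sketch of these question-level measures. The formulas below are reconstructions chosen to be consistent with the column names and values in the "estimated performance" table further down (e.g. 0,34 × (1 + 0,66) ≈ 0,56 for perfect selection); they are not the official definitions.

```python
def qa_measures(n_questions, n_selected_correct, n_rejected_correctly):
    """Question-level measures (names and formulas are reconstructions).
      n_questions           total questions in the collection
      n_selected_correct    questions for which the selected answer is correct
      n_rejected_correctly  questions whose answers are all incorrect and which
                            the system rejected entirely
    """
    qa_accuracy = n_selected_correct / n_questions
    qa_rej_accuracy = n_rejected_correctly / n_questions
    # A correctly rejected group permits a new attempt at the question; assume
    # that attempt succeeds at the system's current accuracy.
    estimated_qa_performance = qa_accuracy * (1 + qa_rej_accuracy)
    # Upper bound if every new attempt on a rejected group succeeded.
    qa_accuracy_max = qa_accuracy + qa_rej_accuracy
    return qa_accuracy, qa_rej_accuracy, estimated_qa_performance, qa_accuracy_max
```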

Additional measures (continued)

Evaluation: estimated performance

System             Type   estimated_qa_performance   qa_accuracy (% best combination)   qa_rej_accuracy   qa_accuracy_max
Perfect selection         0,56                       0,34 (100%)                        0,66              1
ltqa               AV     0,34                       0,24 (70,37%)                      0,44              0,68
ofe                AV     0,27                       0,19 (57,41%)                      0,4               0,59
uaic_2             AV     0,24                       0,24 (70,37%)                      0,01              0,25
wlvs081roen        QA     0,21                       0,21 (62,96%)                      0                 0,21
uaic_1             AV     0,19                       0,19 (57,41%)                      0                 0,19
jota_2             AV     0,17                       0,16 (46,30%)                      0,1               0,26
dfki081deen        QA     0,17                       0,17 (50%)                         0                 0,17
jota_1             AV     0,16                       0,16 (46,30%)                      0                 0,16
dcun081deen        QA     0,10                       0,10 (29,63%)                      0                 0,10
Random                    0,09                       0,09 (25,25%)                      0                 0,09
nlel081enen        QA     0,06                       0,06 (18,52%)                      0                 0,06
nlel082enen        QA     0,05                       0,05 (14,81%)                      0                 0,05
ilkm081nlen        QA     0,04                       0,04 (12,96%)                      0                 0,04
magc_2(bbr)        AV     0,01                       0,01 (1,85%)                       0,64              0,65
dcun082deen        QA     0,01                       0,01 (1,85%)                       0                 0,01
magc_1(timbl)      AV     0                          0 (0%)                             0,63              0,63

Comparing AV systems' performance with QA systems (English)

Techniques reported at AVE 2007 & 2008 (10 reports in 2007, 9 reports in 2008):

Technique                                2007   2008
Generates hypotheses                     6      2
WordNet                                  3      5
Chunking                                 3      4
n-grams, longest common subsequences     5      4
Phrase transformations                   2      2
NER                                      5      7
Num. expressions                         6      7
Temp. expressions                        4      5
Coreference resolution                   2      0
Dependency analysis                      3      3
Syntactic similarity                     4      0
Functions (sub, obj, etc.)               3      3
Syntactic transformations                1      2
Word-sense disambiguation                2      1
Semantic parsing                         4      2
Semantic role labeling                   2      1
First order logic representation         3      2
Theorem prover                           3      1
Semantic similarity                      2      0
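As an illustration of one technique family in this list (n-grams / longest common subsequences), here is a generic sketch of a lexical-overlap feature between a question-plus-answer hypothesis and the supporting text; it is not any particular participant's system.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a):
        for j, tok_b in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if tok_a == tok_b else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def overlap_feature(question, answer, supporting_text):
    """Normalized LCS between a naive hypothesis (question + answer) and the snippet."""
    hypothesis = (question + " " + answer).lower().split()
    text = supporting_text.lower().split()
    if not hypothesis:
        return 0.0
    return lcs_length(hypothesis, text) / len(hypothesis)
```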

Conclusion (of AVE):
- Three years of evaluation in a real environment: real systems' outputs are the AVE input
- Developed methodologies: build collections from QA responses, evaluate in chain with a QA track, compare results with QA systems
- Introduction of RTE techniques into QA: more NLP, more machine learning
- New testing collections for the QA and RTE communities, in 8 languages, not only English

Many Thanks!! CLEF, AVE QA Organizing Committee, AVE participants, UNED team.