Evaluating Answer Validation in multi-stream Question Answering. Álvaro Rodrigo, Anselmo Peñas, Felisa Verdejo. UNED NLP & IR group, nlp.uned.es. The Second International Workshop on Evaluating Information Access (EVIA-NTCIR 2008), Tokyo, 16 December 2008.
Content: 1. Context and motivation (Question Answering at CLEF; the Answer Validation Exercise at CLEF). 2. Evaluating the validation of answers. 3. Evaluating the selection of answers (correct selection; correct rejection). 4. Analysis and discussion. 5. Conclusion.
Evolution of the CLEF-QA Track. Target languages: EU official languages. Collections: news (1994+), Wikipedia (Nov.), JRC-Acquis. Type of questions: 200 questions; factoid + temporal restrictions + definitions; + lists + linked questions + closed lists; factoid, definition, motive, purpose, procedure. Supporting information: document, snippet, paragraph. Pilots and exercises: temporal restrictions, lists, AVE, Real Time, WiQA, QAST, WSDQA, GikiCLEF.
Evolution of Results (Spanish). Overall: best result below 60%. Definitions: best result above 80%, obtained with an approach that is not IR-based.
Pipeline Upper Bounds. Use Answer Validation to break the pipeline. [Pipeline diagram: Question → Question analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer; stage accuracies multiply, and "Not enough evidence" is a possible outcome.]
Results in CLEF-QA 2006 (Spanish). Perfect combination of systems: 81%. Best single system: 52.5%. Different systems were the best for ORGANIZATION, for PERSON and for TIME questions.
Collaborative architectures. Different systems answer different types of questions better: specialisation and collaboration. [Architecture: QA sys 1, QA sys 2, QA sys 3, …, QA sys n receive the question and produce candidate answers; an Answer Validation & Selection module chooses the final answer.] Answer Evaluation Framework.
Collaborative architectures. How to select the right answer? Redundancy, voting, confidence scores, performance history. Why not a deeper analysis?
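A minimal sketch of the shallow strategies just listed (redundancy/voting with confidence scores as tie-breaker); the function name, data layout and normalisation are illustrative, not from the slides:

```python
from collections import defaultdict

def select_answer(candidates):
    """Pick one answer from several QA streams by redundancy/voting,
    breaking ties with the streams' own confidence scores.

    `candidates` is a list of (answer_text, confidence) pairs, one per stream.
    """
    votes = defaultdict(lambda: {"count": 0, "conf": 0.0, "surface": None})
    for answer, confidence in candidates:
        key = answer.strip().lower()          # naive normalisation for redundancy
        entry = votes[key]
        entry["count"] += 1
        entry["conf"] += confidence
        entry["surface"] = entry["surface"] or answer
    # Most-voted answer wins; accumulated confidence breaks ties.
    best = max(votes.values(), key=lambda e: (e["count"], e["conf"]))
    return best["surface"]

# Example: three streams answer the same question
print(select_answer([("Akira Kurosawa", 0.7), ("Kurosawa", 0.4), ("Akira Kurosawa", 0.9)]))
# -> "Akira Kurosawa"
```

Such surface-level combination is exactly what the slide questions: it ignores the supporting texts, which is what motivates a deeper, validation-based analysis.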
Answer Validation Exercise (AVE). Objective: validate the correctness of answers given by real QA systems, namely the participants at CLEF QA.
Answer Validation Exercise (AVE). [Diagram: from a QA system, AVE takes the question, a candidate answer and its supporting text; a hypothesis is generated automatically from the question and the answer, and a textual entailment decision over the hypothesis and the supporting text yields either "answer is correct" or "answer is not correct or not enough evidence". Textual entailment alone corresponds to AVE 2006; hypothesis generation plus entailment is the full AVE answer validation task.]
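To make the hypothesis-generation step concrete, here is a toy sketch; real AVE systems used proper question analysis, so the patterns, names and the `entails` stand-in below are assumptions for illustration only:

```python
def build_hypothesis(question: str, answer: str) -> str:
    """Turn a question plus a candidate answer into a declarative hypothesis.
    Naive pattern substitution, just to show the idea."""
    q = question.rstrip("?").strip()
    if q.lower().startswith("what is "):
        return f"{q[len('What is '):]} is {answer}"
    if q.lower().startswith("who "):
        return f"{answer} {q[len('Who '):]}"
    return f"{q}: {answer}"

def validate(question, answer, supporting_text, entails) -> str:
    """Accept the answer only if the supporting text entails the hypothesis.
    `entails(text, hypothesis)` stands in for a textual-entailment engine."""
    hypothesis = build_hypothesis(question, answer)
    return "YES" if entails(supporting_text, hypothesis) else "NO"

# e.g. build_hypothesis("What is Zanussi?", "an Italian producer of home appliances")
# -> "Zanussi is an Italian producer of home appliances"
```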
Techniques in AVE 2007 (from the AVE 2007 overview), with the number of systems per technique: generates hypotheses (6), WordNet (3), chunking (3), n-grams / longest common subsequences (5), phrase transformations (2), NER (5), numerical expressions (6), temporal expressions (4), coreference resolution (2), dependency analysis (3), syntactic similarity (4), functions (subj, obj, etc.) (3), syntactic transformations (1), word-sense disambiguation (2), semantic parsing (4), semantic role labeling (2), first-order logic representation (3), theorem prover (3), semantic similarity (2).
Evaluation linked to the main QA task. [Diagram: the Question Answering Track supplies the questions, the systems' answers and the systems' supporting texts to the Answer Validation Exercise; AVE systems return validations (YES, NO); the human judgements from the QA Track (R, W, X, U) are mapped to (YES, NO) and used for the AVE evaluation, yielding QA Track results and AVE Track results.] Human assessments are reused.
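The slide only states that the judgements (R, W, X, U) are mapped to (YES, NO); one possible mapping is sketched below, and the treatment of X (inexact) and U (unsupported) in particular is an assumption, not something the slides specify:

```python
# Hypothetical mapping from CLEF-QA human judgements to AVE gold labels.
# R = Right, W = Wrong, X = ineXact, U = Unsupported.
# How X and U are handled here is an assumption for illustration.
QA_TO_AVE = {
    "R": "YES",  # correct answer -> should be validated
    "W": "NO",   # wrong answer -> should be rejected
    "X": "NO",   # inexact answer -> treated as not validated
    "U": "NO",   # unsupported answer -> treated as not validated
}

def gold_label(judgement: str) -> str:
    return QA_TO_AVE[judgement]
```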
Content: 1. Context and motivation. 2. Evaluating the validation of answers. 3. Evaluating the selection of answers. 4. Analysis and discussion. 5. Conclusion.
Proposed evaluation of Answer Validation & Selection. [Setting: participant systems in a CLEF-QA campaign (QA sys 1 … QA sys n) receive the question and produce candidate answers; the Answer Validation & Selection module under evaluation selects the final answer.]
Collections: an example. Question: What is Zanussi? Candidate answers with their supporting texts:
1. Answer: "was an Italian producer of home appliances". Supporting text: "Zanussi. For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought …"
2. Answer: "who had also been in Cassibile since August 31". Supporting text: "Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August".
3. Answer: "(1985)". Supporting text: "3 Out of 5 Live (1985) What Is This?"
Evaluating the Validation. Validation: decide whether each candidate answer is correct or not (YES | NO). The collections are not balanced. Approach: detect whether there is enough evidence to accept an answer. Measures: precision, recall and F over correct answers. Baseline system: accept all answers.
Evaluating the Validation.
                  Correct answer   Incorrect answer
Answer accepted   n_CA             n_WA
Answer rejected   n_WR             n_CR
(n_CA: correctly accepted; n_WA: wrongly accepted; n_WR: wrongly rejected; n_CR: correctly rejected.)
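With these counts, the measures over correct answers named on the previous slide take their usual form (a reconstruction from the slide text; no formula survives in the extraction):

\[
\mathrm{precision} = \frac{n_{CA}}{n_{CA}+n_{WA}}
\qquad
\mathrm{recall} = \frac{n_{CA}}{n_{CA}+n_{WR}}
\qquad
F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}
\]

Under this reading, the accept-all baseline has recall 1 and precision equal to the proportion of correct answers in the (unbalanced) collection.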
Evaluating the Selection. Goals: quantify the potential gain of Answer Validation in Question Answering; compare AV systems with QA systems; develop measures more comparable to QA accuracy.
Evaluating the Selection. Given a question with several candidate answers, there are two options. Selection: select an answer, i.e. try to answer the question (correct selection: the selected answer was correct; incorrect selection: the selected answer was incorrect). Rejection: reject all candidate answers, i.e. leave the question unanswered (correct rejection: all candidate answers were incorrect; incorrect rejection: not all candidate answers were incorrect).
Evaluating the Selection. Over n questions, n = n_CA + n_WA + n_WS + n_WR + n_CR:
                                                    Question with a correct answer   Question without a correct answer
Question answered correctly (one answer selected)   n_CA                             -
Question answered incorrectly                       n_WA                             n_WS
Question unanswered (all answers rejected)          n_WR                             n_CR
Not comparable to qa_accuracy.
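As an illustration of how these five counts could be tallied for a validation-and-selection system (the function name and the data layout are assumptions, not part of the slides):

```python
def tally_selection_counts(questions):
    """Tally n_CA, n_WA, n_WS, n_WR, n_CR for a selection system.

    `questions` is a list of dicts with:
      - "candidates": list of (answer_id, is_correct) pairs for one question
      - "selected":   the answer_id the system selected, or None if it
                      rejected all candidates.
    """
    counts = {"CA": 0, "WA": 0, "WS": 0, "WR": 0, "CR": 0}
    for q in questions:
        has_correct = any(ok for _, ok in q["candidates"])
        if q["selected"] is None:                 # all answers rejected
            counts["WR" if has_correct else "CR"] += 1
        else:
            selected_ok = dict(q["candidates"])[q["selected"]]
            if selected_ok:
                counts["CA"] += 1                 # correct selection
            else:
                counts["WA" if has_correct else "WS"] += 1
    return counts
```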
Evaluating the Selection. A measure that rewards rejection (the collections are not balanced). Interpretation for QA: all questions correctly rejected by the AV system will be answered correctly.
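The formula itself did not survive the extraction; a measure matching the interpretation just stated (each correct rejection counted as a correctly answered question) would be:

\[
\frac{n_{CA} + n_{CR}}{n}
\]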
Evaluating the Selection. Interpretation for QA: questions correctly rejected by the AV system will be answered correctly in the qa_accuracy proportion.
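Again the formula is not in the extracted text; an estimated performance matching this interpretation, taking qa_accuracy as the proportion of correct selections, would be:

\[
\frac{n_{CA} + \mathrm{qa\_accuracy} \cdot n_{CR}}{n}
\qquad\text{with}\qquad
\mathrm{qa\_accuracy} = \frac{n_{CA}}{n}
\]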
Content: 1. Context and motivation. 2. Evaluating the validation of answers. 3. Evaluating the selection of answers. 4. Analysis and discussion. 5. Conclusion.
Analysis and discussion (AVE 2007 English). Validation and selection results. qa_accuracy is correlated with recall (R); the "estimated" measure adjusts for this.
Multi-stream QA performance (AVE 2007 English).
Analysis and discussion (AVE 2007 Spanish). Validation and selection results; comparison between AV and QA systems.
Conclusion. An evaluation framework for Answer Validation & Selection systems, with measures that reward not only correct selection but also correct rejection. The framework promotes the improvement of QA systems, allows comparison between AV and QA systems, and shows under which conditions multi-stream QA performs better, the room for improvement available just by using multi-stream QA, and the potential gain that AV systems can provide to QA.
Thanks! Acknowledgement: EU project T-CLEF (ICT )