
Overview of the Fourth Recognising Textual Entailment Challenge
TAC 2008, NIST, Nov. 17, 2008
Danilo Giampiccolo (coordinator, CELCT), Hoa Trang Dang (NIST), Ido Dagan (Bar Ilan University), Bill Dolan (Microsoft Research), Bernardo Magnini (FBK-irst)

Textual Entailment

Textual entailment is a directional relation between two text fragments, the entailing text, called t(ext), and the entailed text, called h(ypothesis), such that a human being, with common understanding of language and common background knowledge, can infer that h is most likely true on the basis of the content of t.

What was new in RTE 4

RTE was organised jointly by NIST and CELCT and was proposed as a track of the Text Analysis Conference.
Three-way annotation, introduced by NIST as a pilot task at the ACL 2007 Workshop on Textual Entailment and Paraphrasing, was proposed in the main task, where systems were required to make a further distinction between pairs where the entailment does not hold because the content of H is contradicted by the content of T, and pairs where the entailment cannot be determined because the truth of H cannot be verified on the basis of the content of T.

Definition of the task

Given two text snippets, t and h, the system must decide whether:
3-way task:
–T entails H, in which case the pair is marked as ENTAILMENT
–T contradicts H, in which case the pair is marked as CONTRADICTION
–The truth of H cannot be determined on the basis of T, in which case the pair is marked as UNKNOWN
2-way task:
–T entails H, in which case the pair is marked as ENTAILMENT
–T does not entail H, in which case the pair is marked as NO ENTAILMENT
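To make the relation between the two tasks concrete, here is a minimal Python sketch (an illustration, not part of the original slides) of the label sets and of the collapse of three-way judgments into two-way ones described under the evaluation measures below:

```python
from enum import Enum

class Judgment(Enum):
    """The three-way judgments a system can assign to a t-h pair."""
    ENTAILMENT = "ENTAILMENT"
    CONTRADICTION = "CONTRADICTION"
    UNKNOWN = "UNKNOWN"

def collapse_to_two_way(judgment: Judgment) -> str:
    """CONTRADICTION and UNKNOWN both become NO ENTAILMENT,
    as done when scoring 3-way submissions on the 2-way task."""
    return "ENTAILMENT" if judgment is Judgment.ENTAILMENT else "NO ENTAILMENT"

print(collapse_to_two_way(Judgment.CONTRADICTION))  # NO ENTAILMENT
```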

Examples

YES (entailment holds):
–T: Spencer Dryden, the drummer of the legendary American rock band Jefferson Airplane, passed away on Tuesday, Jan. 11. He was 66. Dryden suffered from stomach cancer and heart disease.
–H: Spencer Dryden died at 66.
CONTRADICTION (T contradicts H):
–T: Lower food prices pushed the UK's inflation rate down to 1.1% in August, the lowest level since …. The headline rate of inflation fell to 1.1% in August, pushed down by falling food prices.
–H: Food prices are on the increase.
UNKNOWN (not possible to determine the entailment):
–T: Four people were killed and at least 20 injured when a tornado tore through an Iowa boy scout camp on Wednesday, where dozens of scouts were gathered for a summer retreat, state officials said.
–H: Four boy scouts were killed by a tornado.

The Data Set

No development set this year.
1000 t-h pairs (more pairs for IE and IR, which proved to be more difficult):
–300 IE
–300 IR
–200 QA
–200 SUM
Longer t's with respect to RTE-3.
Distribution according to the entailment judgment:
–50% ENTAILMENT
–35% UNKNOWN
–15% CONTRADICTION

Text sources

The same as last year:
–Output data (both correct and incorrect) of Web-based systems
–Input data publicly released by official competitions
–Freely available sources such as WikiNews and Wikipedia

Pair collection: IE setting

Inspired by Information Extraction, where texts and structured templates are turned into t-h pairs.
This setting simulates the need of IE systems to recognize that the given text entails the semantic relation that is expected to hold between the candidate template slot fillers.

Pair collection: QA setting

From Question-Answer pairs to t-h pairs:
–An answer term of the expected answer type is picked from the answer passage.
–The question is turned into an affirmative sentence by plugging in the answer term.
–t-h pairs are generated, using the affirmative sentences as hypotheses and the original answer passages as texts.
This process simulates the need of a QA system to verify that the retrieved passage text entails the provided answer.
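A toy illustration of the plugging-in step (hypothetical helper; the slides do not specify how the affirmative rewrite of the question is obtained):

```python
def make_qa_pair(affirmative_template: str, answer_term: str, answer_passage: str) -> dict:
    """Build a t-h pair from a QA example: the answer term is plugged into
    an affirmative rewrite of the question, and the original answer
    passage becomes the text."""
    return {"t": answer_passage, "h": affirmative_template.format(answer=answer_term)}

pair = make_qa_pair(
    affirmative_template="{answer} was the drummer of Jefferson Airplane.",
    answer_term="Spencer Dryden",
    answer_passage="Spencer Dryden, the drummer of the legendary American "
                   "rock band Jefferson Airplane, passed away on Tuesday.",
)
print(pair["h"])  # Spencer Dryden was the drummer of Jefferson Airplane.
```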

Pair collection: SUM setting

Given sentence pairs from the output of multi-document summarization systems, hypotheses are generated by removing sentence parts:
–For positive examples, the hypothesis is simplified by removing sentence parts until it is fully entailed by T.
–Negative examples, i.e. where the entailment does not hold, are produced in a similar way, taking away parts of T so that the information finally contained in H either contradicts the content of T or is not enough to determine the entailment.
This process simulates the need of a summarization system to identify information redundancy, which should be avoided in the summary.

Evaluation measures

Automatic evaluation:
–Accuracy (main evaluation measure): percentage of correct judgments against the Gold Standard.
–Average precision (for systems which returned a confidence score): average of the system's precision values at all points in the ranked list at which recall increases, that is, at all points in the ranked list for which the gold standard annotation is YES.
In the case of three-way judgment submissions, the pairs tagged as CONTRADICTION and UNKNOWN were conflated and retagged as NO ENTAILMENT.
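A minimal sketch of both measures in Python (assuming, as in the slide, two-way gold labels with YES marking entailment; this is an illustration, not the official scorer):

```python
def accuracy(gold, predicted):
    """Fraction of judgments that match the gold standard."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def average_precision(ranked_gold):
    """Average of the precision values at every rank where the gold label
    is YES, i.e. at every point in the ranked list where recall increases.
    ranked_gold: gold labels ordered by system confidence, highest first."""
    precisions, hits = [], 0
    for rank, label in enumerate(ranked_gold, start=1):
        if label == "YES":
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy ranked list: the YES pairs sit at ranks 1, 3 and 4.
print(average_precision(["YES", "NO", "YES", "YES", "NO"]))
# (1/1 + 2/3 + 3/4) / 3 ≈ 0.806
```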

Participants

Participants at RTE-4: 26
–RTE-1: 18
–RTE-2: 23
–RTE-3: 26
Provenance:
–USA: 9
–EU: 13
–Asia: 4
Participation per task:
–8 at 3-way only
–13 at 2-way only
–5 at both

Results: Average Accuracy

THREE-WAY TASK, scored both 3-way and as collapsed 2-way: Overall / IE / SUM / IR / QA

TWO-WAY TASK (RTE-3 averages in parentheses):
–Overall: 0.57 (0.61 at RTE-3)
–IE: (0.52)
–SUM: (0.58)
–IR: (0.66)
–QA: (0.71)

Results: Best Results

Ranking, THREE-WAY TASK (3-way accuracy / collapsed 2-way accuracy):
–UAIC
–OAQA1: 0.616 / 0.688
–DFKI1: 0.614 / 0.687
–DFKI2: 0.606 / 0.67
–QUANTA1: 0.588 / 0.664
–DFKI3: 0.56 / 0.633
–UMD1: 0.556 / 0.619
–UMD2: 0.556 / 0.617

Ranking, TWO-WAY TASK:
–lcc
–UAIC
–DFKI
–DFKI
–DFKI
–QUANTA
–QUANTA
–DLSIUAES1: 0.608

Resources

–WordNet, Extended WordNet, Extended WordNet Knowledge Base
–DIRT
–FrameNet, PropBank, VerbNet
–Entailment pairs
–Corpora (e.g. for estimating IDF)
–Antonym expressions
–Gazetteers
–Wikipedia

Methods

–Lexical similarity: word overlap, edit distance, etc.
–Alignment based on syntactic representations: tree edit distance, tree kernels
–Alignment based on committees
–Transformation-based approaches: probabilistic settings
–Identifying contradictions
–Machine learning: classifiers take the final decision
–Logical inference: ontology-based reasoning
–Combining specialized entailment engines: voting, etc.
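As a flavour of the simplest family above, a minimal word-overlap baseline for the 2-way task (an illustrative sketch with an arbitrary threshold, not any participant's actual system):

```python
import re

def word_overlap_entails(t: str, h: str, threshold: float = 0.75) -> str:
    """Predict ENTAILMENT when a large enough fraction of the hypothesis
    tokens also occur in the text; the 0.75 threshold is illustrative."""
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    t_tokens, h_tokens = tokenize(t), tokenize(h)
    overlap = len(h_tokens & t_tokens) / len(h_tokens)
    return "ENTAILMENT" if overlap >= threshold else "NO ENTAILMENT"

print(word_overlap_entails(
    "Spencer Dryden died at 66 after suffering from stomach cancer.",
    "Spencer Dryden died at 66.",
))  # ENTAILMENT (overlap = 5/5)
```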

Conclusion

–RTE-4 organization moved to NIST, with CELCT involved as coordinator.
–Textual entailment shows a high level of maturity and widespread adoption.
–3-way evaluation has been introduced.