FATE: a FrameNet Annotated corpus for Textual Entailment Marco Pennacchiotti, Aljoscha Burchardt Computerlinguistik Saarland University, Germany LREC 2008,

Presentation transcript:

FATE: a FrameNet Annotated corpus for Textual Entailment Marco Pennacchiotti, Aljoscha Burchardt Computerlinguistik Saarland University, Germany LREC 2008, Marrakech, 28 May 2008 SALSA II - The Saarbrücken Lexical Semantics Acquisition Project

Summary
FrameNet and Textual Entailment
FATE annotation schema
Annotation examples and statistics
Conclusions
28/05/2008 2 / 17 FATE - Marco Pennacchiotti

Frame Semantics [Fillmore 1976, 2003]
Frame: conceptual structure modeling a prototypical situation
Frame Elements (FE): participants of the situation
Frame Evoking Elements (FEE): predicates evoking the situation
FrameNet Berkeley Project
– Database of frames for the core lexicon of English
– 800 frames, lemmas, annotated sentences
Predicate-argument level normalizations
“Evelyn spoke about her past”
“Evelyn’s statement about her past”
STATEMENT(Speaker: Evelyn; Topic: her past)
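The normalization in the example above can be pictured with a small data structure. This is only a sketch, not something taken from the slides: the class `FrameInstance`, the helper `make` and the field names are hypothetical, and only the frame name, the role names and the two example phrases come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """A frame with its frame-element fillers (hypothetical helper type)."""
    frame: str            # frame name, e.g. STATEMENT
    fee: str              # frame-evoking element (the predicate word)
    elements: frozenset   # set of (role, filler) pairs

    def normalized(self):
        # The predicate-argument structure ignores which word evoked the frame.
        return (self.frame, self.elements)


def make(frame, fee, **roles):
    """Build a FrameInstance from keyword arguments role=filler."""
    return FrameInstance(frame, fee, frozenset(roles.items()))


# "Evelyn spoke about her past"
verbal = make("STATEMENT", "spoke", Speaker="Evelyn", Topic="her past")
# "Evelyn's statement about her past"
nominal = make("STATEMENT", "statement", Speaker="Evelyn", Topic="her past")

# The surface forms differ, the normalized structures do not.
print(verbal == nominal)
print(verbal.normalized() == nominal.normalized())
```

The two instances differ only in their frame-evoking element, which is the point of the slide: a verbal and a nominal realization map to the same STATEMENT structure.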

Textual Entailment (TE)
Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people [Dagan 2005]
T: “Yahoo has recently acquired Overture”
H: “Yahoo owns Overture”
T → H
Recognizing Textual Entailment (RTE)
– Recognize whether entailment holds for a given (T, H) pair
– Models core inferences of many NLP applications (QA, IE, MT, …)
RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]
– Compare systems for RTE
– Corpus: 800 training pairs, 800 test pairs, evenly split between positive and negative pairs

Predicate-argument and RTE
Predicate-level inference plays a relevant role in TE (20% of positive examples in RTE-2 [Garoufi, 2007])
T: “An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.”
H: “Humans died in an avalanche.”
DEATH(Protagonist: 11 people / humans; Cause: avalanche / avalanche)
Implementation gap:
– [Burchardt et al., 2007]: FrameNet system comparable to lexical overlap
– [Hickl et al., 2006]: PropBank-based features are not effective
– [Rana et al., 2005]: DIRT paraphrase repository does not help
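To make "lexical overlap" concrete, here is a minimal sketch of a word-overlap entailment baseline applied to the avalanche example. It is not the baseline used in the cited work: the tokenizer, the function names and the threshold of 0.75 are assumptions chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(t, h):
    """Fraction of hypothesis tokens that also occur in the text."""
    h_tokens = tokens(h)
    if not h_tokens:
        return 0.0
    return len(h_tokens & tokens(t)) / len(h_tokens)


def entails(t, h, threshold=0.75):
    # The threshold is an arbitrary illustrative value, not a tuned one.
    return overlap_score(t, h) >= threshold


T = ("An avalanche has struck a popular skiing resort in Austria, "
     "killing at least 11 people.")
H = "Humans died in an avalanche."

score = overlap_score(T, H)
print(score, entails(T, H))
```

Only "in", "an" and "avalanche" are shared, so three of the five hypothesis tokens match and this sketch rejects a pair that is in fact a positive one. The missing links (killing/died, people/humans) are exactly what the DEATH frame with its Protagonist and Cause roles captures.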

FATE corpus
FATE: a manually frame-annotated Textual Entailment corpus, to study the role of frame semantics in RTE
Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens
Frame resource: FrameNet version 1.3
Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]
Pre-processing:
– Annotation on top of Collins parser syntactic analysis
– T and H are randomly reordered to avoid biases
Annotation:
– Performed by one highly experienced annotator
– Inter-annotator agreement over 5% of the corpus: FEE-agreement 82%, Frame-agreement 88%, Role-agreement 91%
– Annotation carried out using the SALTO tool
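The slide reports the agreement figures without saying how they were computed. Assuming plain percentage agreement (an assumption, since a chance-corrected measure is equally possible), the calculation could look like the sketch below. The function name and the toy labels are invented for illustration.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items (in percent) on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Invented toy data: frames chosen by two annotators for five targets.
annotator_1 = ["DEATH", "KILLING", "STATEMENT", "PEOPLE", "LEADERSHIP"]
annotator_2 = ["DEATH", "KILLING", "STATEMENT", "PEOPLE", "KIDNAPPING"]

agreement = percent_agreement(annotator_1, annotator_2)
print(agreement)  # 4 of the 5 labels match
```

The same function would apply to FEE, frame and role decisions alike, as long as both annotators' decisions are aligned item by item.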

FATE annotation process: an example
[Figure: Collins syntactic analysis; full-text annotation (all words considered) [Ruppenhofer, 2007]]

FATE annotation process: an example
[Figure: frame and FEE annotated on top of the Collins syntactic analysis]

FATE annotation process: an example
[Figure: frame, FEE, FE and FE filler annotated on top of the Collins syntactic analysis]
Maximization principle: choose the largest constituent possible when annotating

Annotation Schema: Relevance Principle
Intuition: annotate as FEE only those words evoking a relevant situation (frame) in the sentence at hand
– Very intuitive flavor, but high agreement: 83% on a pilot set of 15 sentences
“Authorities in Brazil hold 200 people as hostage”
[Figure: frames LEADERSHIP, DETAIN, PEOPLE, KIDNAPPING; roles Victim, Place, Perpetrator]

Annotation Schema: Span Annotation
On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process
– Spans are obtained from the ARTE annotation [Garoufi, 2007]
– For negative pairs it is not straightforward to derive spans, hence we do full annotation
T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”
H: “EZLN is a political group.”
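The span rule above amounts to a simple selection policy, sketched below. Everything here is an assumption made for illustration: the function name, the representation of spans as (start, end) token offsets with an exclusive end, the fallback to full annotation when a positive pair has no spans, and the example span itself, which is not taken from the ARTE annotation.

```python
def tokens_to_annotate(text_tokens, is_positive, spans=None):
    """Return the tokens of T that the annotator should look at.

    spans: list of non-overlapping (start, end) token offsets, end exclusive.
    """
    if not is_positive or not spans:
        # Negative pairs (or no spans available): full annotation.
        return list(text_tokens)
    selected = []
    for start, end in sorted(spans):
        selected.extend(text_tokens[start:end])
    return selected


# Final clause of the example text, split on whitespace.
clause = "this was seen as a betrayal by the EZLN and other political groups".split()

# Invented span covering "the EZLN and other political groups".
positive = tokens_to_annotate(clause, is_positive=True, spans=[(7, 13)])
negative = tokens_to_annotate(clause, is_positive=False)

print(" ".join(positive))
print(len(negative))
```

For the positive case only the six tokens inside the span are returned, while the negative case returns all thirteen tokens of the clause.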

Annotation Schema: Other guidelines
Unknown frames: use an Unknown frame for words evoking situations not present in the FrameNet database
Anaphora
Copula and support verbs
Modal expressions
Metaphors
Existential constructions
…

Corpus statistics
Annotated pairs: 800 (400 positive, 400 negative)
Annotated frames: 4,500
– avg. 5.6 frames per pair
– 1,600 frames in positive pairs
– 2,800 frames in negative pairs
Annotated roles: 9,500
– avg. 2.1 roles per frame
Annotation time: 230 hours
– 90 h for positive pairs (13 min/pair)
– 140 h for negative pairs (21 min/pair)
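The averages above follow directly from the totals, as this short check shows. The variable names are illustrative and the inputs are the figures given on the slide.

```python
# Totals as reported on the slide.
pairs = 800
pairs_positive = pairs_negative = 400
frames = 4500
roles = 9500
hours_positive, hours_negative = 90, 140

frames_per_pair = frames / pairs
roles_per_frame = roles / frames
minutes_per_positive = hours_positive * 60 / pairs_positive
minutes_per_negative = hours_negative * 60 / pairs_negative
total_hours = hours_positive + hours_negative

print(f"{frames_per_pair:.1f} frames per pair")
print(f"{roles_per_frame:.1f} roles per frame")
print(f"{minutes_per_positive} min per positive pair")
print(f"{minutes_per_negative} min per negative pair")
print(f"{total_hours} hours in total")
```

The results are 5.6 frames per pair, 2.1 roles per frame, 13.5 and 21 minutes per pair, and 230 hours in total. The positive-pair figure is slightly above the 13 min/pair on the slide, and 1,600 + 2,800 gives 4,400 rather than 4,500, which suggests that the reported totals are rounded.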

FrameNet and RTE (simple case)
Syntactic normalization
– Active / Passive
EDUCATIONAL_TEACHING(Student: ground soldiers / soldiers; Material: virtual reality / virtual reality)

Implementation gap insights
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE
FrameNet coverage is good:
– 373 Unknown frames (8% of total frames)
– Unknown roles: 1% of total roles
Coverage is unlikely to be a limiting factor for using FrameNet in applications

Why should you use FATE?
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE
To better study predicate-argument inference in RTE
To experiment with frame-RTE models on a gold-standard corpus
To learn better SRL models, by training on FATE
Corpus is freely available online

Thank you! Questions?
FATE download:


FrameNet and RTE
Syntactic normalization
– Apposition to copula
PEOPLE_BY_VOCATION(Person: Andreotti / Andreotti; Place: Italy / Italy; Age: elder / elder)

FrameNet and RTE
Frame-to-frame inference
Sentencing --- HR ---> Imprisonment
Convict maps to Prisoner
Place maps to Place
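A frame-to-frame relation with role mappings, as on this slide, can be used to rewrite a frame instance found in T into the frame evoked in H before comparing the two. The sketch below is one possible reading under stated assumptions, not the authors' model: the table `ROLE_MAP`, the functions `project` and `matches`, and the fillers "Smith", "Jones" and "Texas" are all invented for illustration.

```python
# Role mapping for the relation on the slide (Sentencing -> Imprisonment).
ROLE_MAP = {
    ("Sentencing", "Imprisonment"): {"Convict": "Prisoner", "Place": "Place"},
}


def project(frame, roles, target):
    """Rewrite a frame instance into the target frame if a mapping is known."""
    mapping = ROLE_MAP.get((frame, target))
    if mapping is None:
        return None
    return target, {mapping[r]: f for r, f in roles.items() if r in mapping}


def matches(t_frame, t_roles, h_frame, h_roles):
    """True if H's frame instance is covered by T's, directly or via ROLE_MAP."""
    candidates = [(t_frame, t_roles)]
    projected = project(t_frame, t_roles, h_frame)
    if projected is not None:
        candidates.append(projected)
    return any(
        frame == h_frame and all(roles.get(k) == v for k, v in h_roles.items())
        for frame, roles in candidates
    )


# Invented fillers: T evokes Sentencing, H evokes Imprisonment.
t_roles = {"Convict": "Smith", "Place": "Texas"}
print(matches("Sentencing", t_roles, "Imprisonment", {"Prisoner": "Smith"}))
print(matches("Sentencing", t_roles, "Imprisonment", {"Prisoner": "Jones"}))
```

The first call succeeds because Convict is projected onto Prisoner with the same filler; the second fails because the fillers differ. A real system would also need to compare fillers more loosely than by string identity.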

Annotation Schema: Anaphora
Locality principle
– Annotate the local referent of a role filler
– Link the local referent to the external referent via the Anaphora frame

Annotation Schema: Support and Copula Verbs
Verbs carrying minimal semantic content (e.g. be, seem)
Annotate the noun as FEE, instead of the verb [Ruppenhofer, 2007]

Annotation Schema: Modal Expressions
Modal expressions (e.g. modal verbs, particles, modal triggers) are annotated only when the modal meaning is prevalent in the sentence

Annotation Schema: Other guidelines
Metaphors are annotated with their figurative meaning
Existential constructions (e.g. “there is”) are annotated with the frame Existence, only when it is the only meaning conveyed in the sentence (e.g. “There are 11 official languages”)
Unknown frames: use an Unknown frame for words evoking situations not present in the FrameNet database
Maximization principle: choose the largest constituent possible when annotating

Motivations
Semantic knowledge at the predicate-argument level is critical in NLP tasks:
“From whom did BMW buy Rover?”
“Rover was bought by BMW from British Aerospace”
“BMW acquired Rover from British Aerospace”
“BMW’s purchase of Rover from British Aerospace”
“British Aerospace sold Rover to BMW”
Predicate-argument resources (e.g. PropBank and FrameNet) make it possible to map meaning-preserving alternations to the same predicative structure
BUY_EVENT(Buyer: BMW, Seller: British Aerospace, Good: Rover)

Motivations
Implementation gap: predicate-argument resources have had very little impact on NLP applications [Fliedner, 2007; Frank et al., 2006]
Possible reasons:
(1) Resource coverage is too low
(2) Modeling predicate knowledge is too hard
(3) Automatic annotation (SRL) is not good enough
Our goal: create a gold-standard corpus, manually annotated with predicate-argument structure, to investigate (1)-(3)
– Corpus: Second Recognizing Textual Entailment (RTE) Challenge
– Annotation: FrameNet

FATE Corpus annotation: an example
[Figure: Collins syntactic analysis; full-text annotation (all words considered) [Ruppenhofer, 2007]]

Frame Semantics [Fillmore 1976, 2003]
Frames are organized in a hierarchy with various frame-to-frame relations
[Figure: frame hierarchy with legend]
FrameNet Berkeley Project
– Database of frames for the core lexicon of English
– 800 frames, lemmas, annotated sentences
– Hierarchy: 7 frame relations, 1136 edges, 86 roots

FATE Corpus annotation: an example
[Figure: frame and FEE annotated on top of the Collins syntactic analysis]

FATE Corpus annotation: an example
[Figure: frame, FEE, FE and FE filler annotated on top of the Collins syntactic analysis]
Maximization principle: choose the largest constituent possible when annotating

FATE Corpus annotation: an example
[Figure: frame, FEE, FE and FE filler annotated on top of the Collins syntactic analysis]
DEATH(Protagonist: Hiddleston / person; Cause: avalanche)

FrameNet and SALSA Project
FrameNet Berkeley Project
– Database of frames for the core lexicon of English
– 800 frames, lemmas, annotated sentences from BNC
SALSA Project
– A German corpus with frame annotation (verbal instances)
– Semantic frame-based lexicon for German
– Methods for automation and application of frame-semantic information (SRL, RTE, discourse interpretation, etc.)


FrameNet and RTE
Frame-to-frame inference
KILLING --- cause ---> DEATH
Cause maps to Cause
Victim maps to Protagonist