Combining Lexical Resources: Mapping Between PropBank and VerbNet
Edward Loper, Szu-ting Yi, Martha Palmer
September 2006

Using Lexical Information
Many interesting tasks require information about lexical items, and how they relate to each other. E.g., question answering:
Q: Where are the grape arbors located?
A: Every path from back door to yard was covered by a grape-arbor, and every yard had fruit trees.

Lexical Resources
Wide variety of lexical resources available: VerbNet, PropBank, FrameNet, WordNet, etc.
– Each resource was created with different goals and different theoretical backgrounds.
– Each resource has a different approach to defining word senses.

SemLink: Mapping Lexical Resources
Different lexical resources provide us with different information. To make useful inferences, we need to combine this information. In particular:
– PropBank: How does a verb relate to its arguments? Includes annotated text.
– VerbNet: How do verbs with shared semantic & syntactic features (and their arguments) relate?
– FrameNet: How do verbs that describe a common scenario relate?
– WordNet: What verbs are synonymous?
– Cyc: How do verbs relate to a knowledge-based ontology?
*Martha Palmer, Edward Loper, Andrew Dolbey, Derek Trumbo, Karin Kipper, Szu-Ting Yi

PropBank
1M words of WSJ annotated with predicate-argument structures for verbs.
– The location & type of each verb's arguments.
Argument types are defined on a per-verb basis.
– Consistent across uses of a single verb (sense), but the same tags are used (Arg0, Arg1, Arg2, …).
– Arg0 → prototypical agent (Dowty)
– Arg1 → prototypical patient

PropBank: cover (smear, put over)
Arguments:
– Arg0 = causer of covering
– Arg1 = thing covered
– Arg2 = covered with
Example: John covered the bread with peanut butter.
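To make the annotation concrete, here is a minimal sketch of how such a predicate-argument structure could be represented in code. The dictionary layout and field names are hypothetical and simplified, not the actual PropBank release format:

```python
# Hypothetical, simplified representation of a PropBank-style annotation
# for "John covered the bread with peanut butter" (roleset cover.01).
# Field names are illustrative, not the actual PropBank file format.
annotation = {
    "sentence": "John covered the bread with peanut butter",
    "predicate": "covered",
    "roleset": "cover.01",
    "arguments": [
        {"label": "Arg0", "text": "John"},               # causer of covering
        {"label": "Arg1", "text": "the bread"},           # thing covered
        {"label": "Arg2", "text": "with peanut butter"},  # covered with
    ],
}

for arg in annotation["arguments"]:
    print(arg["label"], "=", arg["text"])
```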

PropBank: Trends in Argument Numbering
– Arg0 = prototypical agent (Dowty): Agent (85%), Experiencer (7%), Theme (2%), …
– Arg1 = prototypical patient (Dowty): Theme (47%), Topic (23%), Patient (11%), …
– Arg2 = Recipient (22%), Extent (15%), Predicate (14%), …
– Arg3 = Asset (33%), Theme2 (14%), Recipient (13%), …
– Arg4 = Location (89%), Beneficiary (5%), …
– Arg5 = Location (94%), Destination (6%)

PropBank: Adjunct Tags
Variety of ArgM's:
– TMP: when?
– LOC: where at?
– DIR: where to?
– MNR: how?
– PRP: why?
– REC: himself, themselves, each other
– PRD: this argument refers to or modifies another
– ADV: others

Limitations to PropBank as Training Data
Args2-5 seriously overloaded → poor performance.
– VerbNet and FrameNet both provide more fine-grained role labels.
Examples:
– Rudolph Agnew, …, was named [ARG2/Predicate a nonexecutive director of this British industrial conglomerate].
– … the latest results appear in today's New England Journal of Medicine, a forum likely to bring new attention [ARG2/Destination to the problem].

Limitations to PropBank as Training Data (2)
WSJ is too domain specific & too financial. Need broader-coverage genres for more general annotation.
– Additional Brown corpus annotation, also GALE data.
– FrameNet has selected instances from the BNC.

How Can SemLink Help?
In PropBank, Arg2-Arg5 are overloaded.
– But in VerbNet, the same thematic roles are used across verbs.
PropBank training data is too domain specific.
– Use VerbNet as a bridge to merge PropBank with FrameNet → expand the size and variety of the training data.

VerbNet
Organizes verbs into classes that have common syntax/semantics linking behavior. Classes include:
– A list of member verbs (with WordNet senses)
– A set of thematic roles (with selectional restrictions)
– A set of frames, which define both syntax & semantics using thematic roles.
Classes are organized hierarchically.
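As an illustration of this structure, here is a minimal sketch of how a VerbNet class could be modeled in code. The field names and example values are assumptions for exposition, not the actual VerbNet XML schema or any released API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, simplified model of a VerbNet class; the real resource is
# distributed as XML with richer structure (selectional restrictions,
# semantic predicates, nested subclasses, etc.).
@dataclass
class Frame:
    syntax: List[str]     # e.g. ["Agent", "V", "Destination", "with Theme"]
    semantics: List[str]  # e.g. ["cause(Agent, E)", "covered(result(E), Destination)"]

@dataclass
class VerbNetClass:
    class_id: str                      # e.g. "fill-9.8"
    members: List[str]                 # member verbs, with WordNet senses
    thematic_roles: List[str]          # e.g. ["Agent", "Theme", "Destination"]
    frames: List[Frame] = field(default_factory=list)
    parent: Optional["VerbNetClass"] = None  # classes are organized hierarchically

    def inherited_roles(self) -> List[str]:
        """Roles of this class plus any inherited from ancestor classes."""
        roles = list(self.thematic_roles)
        if self.parent is not None:
            roles += [r for r in self.parent.inherited_roles() if r not in roles]
        return roles
```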

VerbNet Example

What do mappings look like?
Two types of mappings:
– Type mappings describe which entries from two resources might correspond, and how their fields (e.g. arguments) relate. Potentially many-to-many; generated manually or semi-automatically.
– Token mappings tell us, for a given sentence or instance, which type mapping applies. Can often be thought of as a type of classifier, built from a single corpus with parallel annotations. Can also be thought of as word sense disambiguation, because each resource defines word senses differently!
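A rough, hypothetical picture of the two kinds of mapping as data structures follows. The example entry, argument correspondences, and function names are purely illustrative; SemLink's actual file formats differ:

```python
# Type mapping: which PropBank rolesets can correspond to which VerbNet
# classes, and how their arguments line up.  Potentially many-to-many.
# The single entry below is illustrative only.
type_mapping = {
    ("cover.01", "fill-9.8"): {   # (PropBank roleset, VerbNet class)
        "Arg0": "Agent",
        "Arg1": "Destination",
        "Arg2": "Theme",
    },
    # ... one entry per (roleset, class) pair that may correspond
}

def token_mapping(sentence, predicate, roleset, candidate_classes):
    """Token mapping: for one annotated instance, decide which type mapping applies.

    In practice this behaves like a word-sense-disambiguation classifier, since
    each resource carves up verb senses differently.  Here we simply return the
    first candidate with a known type mapping, as a placeholder.
    """
    keys = [(roleset, vn_class) for vn_class in candidate_classes
            if (roleset, vn_class) in type_mapping]
    return keys[0] if keys else None
```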

Mapping Issues
Mappings are often many-to-many.
– Different resources focus on different distinctions.
Incomplete coverage:
– A resource may be missing a relevant lexical item entirely.
– A resource may have the relevant lexical item, but not in the appropriate category or with the appropriate sense.
Field mismatches:
– It may not be possible to map the field information for corresponding entries (e.g., predicate arguments): extra fields, missing fields, mismatched fields.

VerbNet ↔ PropBank Mapping: Type Mapping
Verb class ↔ frame mapping was created when PropBank was created.
– Doesn't cover all verbs in the intersection of PropBank & VerbNet; this intersection has grown significantly since PropBank was created.
Argument mapping created semi-automatically.
Work is underway to extend coverage of both.

VerbNet ↔ PropBank Mapping: Token Mapping
Built using parallel VerbNet/PropBank training data.
– Also allows direct training of VerbNet-based SRL.
VerbNet annotations generated semi-automatically.
– Two automatic methods: use WordNet as an intermediary; check syntactic similarities.
– Followed by hand correction.
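A hedged sketch of how the two automatic filtering steps could look in code; the helper names and matching criteria are assumptions, not the actual SemLink pipeline, and in practice the output is hand-corrected:

```python
# Illustrative sketch: propose VerbNet classes for a PropBank verb instance by
# (1) intersecting WordNet senses and (2) checking the syntactic frame.
def candidate_verbnet_classes(verb_senses, observed_frame, verbnet_classes):
    """verb_senses: WordNet senses of the annotated verb token.
    observed_frame: the instance's syntactic frame description.
    verbnet_classes: objects with .members (sense keys) and .frames (each with .syntax).
    """
    candidates = []
    for vn_class in verbnet_classes:
        # Method 1: WordNet as intermediary -- the verb's sense should appear
        # among the class's member senses.
        if not set(verb_senses) & set(vn_class.members):
            continue
        # Method 2: syntactic similarity -- the observed frame should match one
        # of the class's frames (a real system would match more loosely).
        if any(frame.syntax == observed_frame for frame in vn_class.frames):
            candidates.append(vn_class)
    return candidates  # remaining ambiguity is resolved by hand correction
```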

Using SemLink: Semantic Role Labeling
Overall goal:
– Identify the semantic entities in a document & determine how they relate to one another.
As a machine learning task:
– Find the predicate words (verbs) in a text.
– Identify the predicates' arguments.
– Label each argument with its semantic role.
Train & test using PropBank.
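Viewed as code, this machine-learning formulation is roughly a three-stage pipeline. The sketch below is schematic: the classifiers are passed in as placeholders and all contiguous spans stand in for parse-tree constituents; it is not the actual system described in the slides:

```python
# Schematic SRL pipeline: predicate identification, argument identification,
# argument classification.  Real systems train the two classifiers on PropBank.

def find_predicates(pos_tags):
    """Stage 1 (sketch): treat every verb token as a predicate."""
    return [i for i, tag in enumerate(pos_tags) if tag.startswith("VB")]

def candidate_spans(n_tokens):
    """Placeholder candidate generator: all contiguous spans.
    Real systems use parse-tree constituents instead."""
    return [(i, j) for i in range(n_tokens) for j in range(i + 1, n_tokens + 1)]

def label_roles(tokens, pos_tags, is_argument, classify_role):
    """Stages 2-3: argument identification, then role classification.
    is_argument and classify_role stand in for trained classifiers."""
    labeled = []
    for pred in find_predicates(pos_tags):
        for span in candidate_spans(len(tokens)):
            if is_argument(tokens, pred, span):            # stage 2: argument or not?
                role = classify_role(tokens, pred, span)   # stage 3: Arg0, Arg1, ArgM-TMP, ...
                labeled.append((pred, span, role))
    return labeled
```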

Current Problems for SRL
PropBank role labels (Arg2-5) are not consistent across different verbs.
– If we train within verbs, data is too sparse.
– If we train across verbs, the output tags are too heterogeneous.
Existing systems do not generalize well to new genres.
– The training corpus (WSJ) contains a highly specialized genre, with many domain-specific verb senses.
– Because of the verb-dependent nature of PropBank role labels, systems are forced to learn based on verb-specific features.
– These features do not generalize well to new genres, where verbs are used with different word senses.
– System performance drops on the Brown corpus.

Improving SRL Performance with SemLink
Existing PropBank role labels are too heterogeneous, so subdivide them into new role label sets based on the SemLink mapping.
Experimental paradigm:
– Subdivide existing PropBank roles based on which VerbNet thematic role (Agent, Patient, etc.) each is mapped to.
– Compare the performance of the original SRL system (trained on PropBank) and the mapped SRL system (trained with subdivided roles).

Subdividing PropBank Roles
Subdividing based on individual VerbNet theta roles leads to very sparse data. Instead, subdivide PropBank roles based on groups of VerbNet roles.
Groupings created manually, based on analysis of argument use & suggestions from Karin Kipper.
Two groupings:
1. Subdivide Arg1 into 6 new roles: Arg1 Group1, Arg1 Group2, …, Arg1 Group6
2. Subdivide Arg2 into 5 new roles: Arg2 Group1, Arg2 Group2, …, Arg2 Group5
Two test genres: Wall Street Journal & Brown Corpus.

Arg1 groupings (Total count 59,710)
– Group1 (53.11%): Theme; Theme1; Theme2; Predicate; Stimulus; Attribute
– Group2 (23.04%): Topic
– Group3 (16%): Patient; Product; Patient1; Patient2
– Group4 (4.67%): Agent; Actor2; Cause; Experiencer
– Group5 (0.20%): Asset

Arg2 groupings (Total count 11,068)
– Group1 (43.93%): Recipient; Destination; Location; Source; Material; Beneficiary
– Group2 (14.74%): Extent; Asset
– Group3 (32.13%): Predicate; Attribute; Theme; Theme2; Theme1; Topic
– Group4 (6.81%): Patient2; Product
– Group5 (2.39%): Instrument; Actor2; Cause; Experiencer
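Given the SemLink mapping and the groupings above, the relabeling step itself is mechanical. A minimal sketch: the dictionaries restate the group tables, while the function name and output label format are assumptions for illustration:

```python
# Illustrative relabeling of PropBank Arg1/Arg2 instances into grouped roles,
# keyed by the VerbNet thematic role each instance maps to via SemLink.
ARG1_GROUPS = {
    "Theme": 1, "Theme1": 1, "Theme2": 1, "Predicate": 1, "Stimulus": 1, "Attribute": 1,
    "Topic": 2,
    "Patient": 3, "Product": 3, "Patient1": 3, "Patient2": 3,
    "Agent": 4, "Actor2": 4, "Cause": 4, "Experiencer": 4,
    "Asset": 5,
}
ARG2_GROUPS = {
    "Recipient": 1, "Destination": 1, "Location": 1, "Source": 1, "Material": 1, "Beneficiary": 1,
    "Extent": 2, "Asset": 2,
    "Predicate": 3, "Attribute": 3, "Theme": 3, "Theme2": 3, "Theme1": 3, "Topic": 3,
    "Patient2": 4, "Product": 4,
    "Instrument": 5, "Actor2": 5, "Cause": 5, "Experiencer": 5,
}

def subdivide(propbank_label, verbnet_role):
    """Map e.g. ("Arg1", "Topic") -> "Arg1 Group2"; leave other labels unchanged."""
    groups = {"Arg1": ARG1_GROUPS, "Arg2": ARG2_GROUPS}.get(propbank_label)
    if groups and verbnet_role in groups:
        return f"{propbank_label} Group{groups[verbnet_role]}"
    return propbank_label
```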

Experimental Results: What do we expect?
– By subdividing PropBank roles, we make them more coherent, so they should be easier to learn. But by creating more role categories, we increase data sparseness, so they should be harder to learn.
– Arg1 is more coherent than Arg2, so we expect more improvement from the Arg2 experiments.
– WSJ is the same genre we trained on; Brown is a new genre. So we expect more improvement from the Brown corpus experiments.

Experimental Results: Wall Street Journal Corpus
(Table: precision, recall, and F1 for Arg1-Original, Arg1-Mapped, and their difference, and for Arg2-Original, Arg2-Mapped, and their difference; the numeric values were not preserved in this transcript.)

Experimental Results: Brown Corpus
(Table: precision, recall, and F1 for Arg1-Original, Arg1-Mapped, and their difference, and for Arg2-Original, Arg2-Mapped, and their difference; the numeric values were not preserved in this transcript.)
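For reference, the scores in these tables are standard span-level precision, recall, and F1. A small sketch of how they could be computed from predicted and gold argument sets; the (predicate, span, role) tuple format is an assumption:

```python
def srl_scores(predicted, gold):
    """predicted, gold: sets of (predicate, span, role) tuples."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```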

Conclusions
By using more coherent semantic role labels, we can improve machine learning performance.
– Can we use learnability to help evaluate role label sets?
The process of mapping resources helps us improve them.
– Helps us see what information is missing (e.g., roles).
– Semi-automatically extend coverage.
Mapping lexical resources allows us to combine information in a single system.
– Useful for QA, entailment, IE, etc.