Presentation transcript:

Proposition Bank: a resource of predicate-argument relations
Martha Palmer, University of Pennsylvania
October 9, 2001, Columbia University

Outline
- Overview (ACE consensus: BBN, NYU, MITRE, Penn)
- Motivation
- Approach: guidelines, lexical resources, frame sets; tagging process, hand correction of automatic tagging
- Status: accuracy, progress
- Colleagues: Joseph Rosenzweig, Paul Kingsbury, Hoa Dang, Karin Kipper, Scott Cotton, Laren Delfs, Christiane Fellbaum

Proposition Bank: Generalizing from Sentences to Propositions
Powell met Zhu Rongji.
Powell met with Zhu Rongji.
Powell and Zhu Rongji met.
Powell and Zhu Rongji had a meeting.
...
Proposition: meet(Powell, Zhu Rongji)
meet(Somebody1, Somebody2), and similarly debate, consult, join, wrestle, battle
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
  meet(Powell, Zhu)
  discuss([Powell, Zhu], return(X, plane))
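Read as data, the point of the slide is that several surface sentences collapse into one predicate-argument structure, and a proposition can itself be an argument of another. A minimal Python sketch of that idea (the Proposition class and the instances below are illustrative only, not a PropBank data structure):

from dataclasses import dataclass

@dataclass
class Proposition:
    predicate: str
    args: tuple  # each arg is a string, a list of strings, or another Proposition

# "Powell met Zhu Rongji", "Powell met with Zhu Rongji",
# "Powell and Zhu Rongji met", "Powell and Zhu Rongji had a meeting"
# all reduce to the same proposition:
meet = Proposition("meet", ("Powell", "Zhu Rongji"))

# "When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane."
met = Proposition("meet", ("Powell", "Zhu"))
discussed = Proposition("discuss", (["Powell", "Zhu"],
                                    Proposition("return", ("X", "plane"))))

print(meet)
print(discussed)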

Penn English Treebank
- 1.3 million words
- Wall Street Journal and other sources
- Tagged with part-of-speech
- Syntactically parsed
- Widely used in the NLP community
- Available from the Linguistic Data Consortium

A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
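The bracketed notation above can be inspected with standard tools; for example, NLTK's Tree class reads this format directly. This is only an illustration of the notation, not part of the original annotation pipeline:

from nltk.tree import Tree

ptb = """
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
"""

tree = Tree.fromstring(ptb)
print(" ".join(tree.leaves()))                    # surface string, traces included
print([sub.label() for sub in tree.subtrees()])   # constituent labels, e.g. NP-SBJ, PP-LOC
tree.pretty_print()                                # ASCII rendering of the parse tree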

The same sentence, PropBanked
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S Arg0 (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    Arg2 (NP the U.S. car maker)
                                    Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

Motivation
- Why do we need accurate predicate-argument relations? They have a major impact on information processing.
- Example: Korean/English machine translation (ARL/SBIR; CoGenTex, Penn, Systran)
  - K/E bilingual lexicon, 20K; 4K words (< 500 words from Systran, military messages)
  - Plug-and-play architecture based on DsyntS (rich dependency structure)
  - A converter bug led to random relabeling of predicate arguments
  - Correction of the predicate-argument labels alone led to a tripling of acceptable sentence output

Focusing on Parser Comparisons
- 200 sentences hand-selected to represent "good" translations given a correct parse
- Used to compare:
  - Corrected DsyntS output
  - Juntae's parser output (off-the-shelf)
  - Anoop's parser output (Treebank-trained, 95% F)

Evaluating Translation Quality
- Compare DLI human translations to system output (200 sentences)
- Criteria used by human judges (2 or more, not blind):
  - [g] = good, exactly right
  - [f1] = fairly good, but small grammatical mistakes
  - [f2] = needs fixing, but vocabulary basically there
  - [f3] = needs quite a bit of fixing, usually some untranslated vocabulary, but most vocabulary is right
  - [m] = seems grammatical, but semantically wrong, actually misleading
  - [i] = irredeemable, really wrong, major problems

Results Comparison (200 sentences)

Plug and play?
- Converter used to map parser outputs into the MT DsyntS format
  - A bug in the converter affected both systems: predicate-argument structure labels were being lost in the conversion process and relabeled randomly
- The converter was also still tuned to Juntae's parser output and needed to be customized to Anoop's

Anoop's parse -> MTW DsyntS
- 0010 Target: Unit designations are normally transmitted in code.
- 0010 Corrected: Normally unit designations are notified in the code.
- 0010 Anoop: Normally it is notified unit designations in code.
(Dependency tree on the slide: notified -> unit designations, normally, code; label mismatch: C = Arg1 vs. P = Arg0)

Anoop's parse -> MTW DsyntS
- 0022 Target: Under what circumstances does radio interference occur?
- 0022 Corrected: In what circumstances does the interference happen in the radio?
- 0022 Anoop: Do in what circumstance happen interference in radio?
(Dependency tree on the slide: happen -> what circumstances, interference, radio; label mismatches: C = Arg0 vs. P = ArgM, and C = Arg1 vs. P = Arg0)

New and Old Results Comparison

English PropBank
- 1M words of Treebank over 2 years, May '01-'03
- New semantic augmentations
  - Predicate-argument relations for verbs; label arguments: Arg0, Arg1, Arg2, ...
  - First subtask: 300K-word financial subcorpus (12K sentences, 35K+ predicates)
- Spin-offs:
  - Guidelines (necessary for annotators)
  - English lexical resource: verbs with labeled examples, rich semantics

Task: not just undoing passives
- The earthquake shook the building.
- The walls shook; the building rocked.
- The guidelines = a lexicon with examples: Frames Files

Guidelines: Frames Files
- Created manually; Paul Kingsbury working on semi-automatic expansion
- Refer to VerbNet, WordNet and FrameNet
- Currently in place for 230 verbs
  - Can expand using VerbNet
  - Will need hand correction
- Use "semantic role glosses" unique to each verb (mapped to Arg0, Arg1 labels appropriate to the class)

Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example: transitive, active:
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates

Frames File Example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: double object:
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
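The two entries above have the same shape: a set of numbered roles plus labelled example sentences. A minimal sketch of that shape as an in-memory structure (the Python dictionary layout is my own illustration; the actual frames files are separate per-verb documents):

# Hypothetical in-memory layout for frames-file entries; only a sketch.
frames = {
    "expect": {
        "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
        "example": {
            "name": "transitive, active",
            "text": "Portfolio managers expect further declines in interest rates.",
            "labels": {
                "Arg0": "Portfolio managers",
                "REL": "expect",
                "Arg1": "further declines in interest rates",
            },
        },
    },
    "give": {
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
        "example": {
            "name": "double object",
            "text": "The executives gave the chefs a standing ovation.",
            "labels": {
                "Arg0": "The executives",
                "REL": "gave",
                "Arg2": "the chefs",
                "Arg1": "a standing ovation",
            },
        },
    },
}

for verb, frame in frames.items():
    print(verb, "->", frame["roles"])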

The same sentence, PropBanked
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S Arg0 (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    Arg2 (NP the U.S. car maker)
                                    Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

Complete Sentence
Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that *T*-2 would produce an executive-model range of cars.

How are arguments numbered?
- Examination of example sentences
- Determination of required / highly preferred elements
- Sequential numbering; Arg0 is the typical first argument, except:
  - Ergative/unaccusative verbs (shake example)
  - Arguments mapped for "synonymous" verbs

Additional tags (arguments or adjuncts?)
- Variety of ArgMs (Arg# > 4):
  - TMP - when?
  - LOC - where at?
  - DIR - where to?
  - MNR - how?
  - PRP - why?
  - REC - himself, themselves, each other
  - PRD - this argument refers to or modifies another
  - ADV - others
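In annotation tooling these functional tags are naturally kept in a small lookup table; a minimal sketch (the table restates the list above, while the validation helper and the exact label spellings such as "ArgM-TMP" are my own illustrative assumptions):

# Functional tags on ArgM, as listed on the slide.
ARGM_FUNCTIONS = {
    "TMP": "when?",
    "LOC": "where at?",
    "DIR": "where to?",
    "MNR": "how?",
    "PRP": "why?",
    "REC": "himself, themselves, each other",
    "PRD": "refers to or modifies another argument",
    "ADV": "others",
}

def is_valid_label(label: str) -> bool:
    """Accept numbered arguments (Arg0, Arg1, ...), rel, or ArgM with a known function tag."""
    if label == "rel" or (label.startswith("Arg") and label[3:].isdigit()):
        return True
    if label.startswith("ArgM-"):
        return label.split("-", 1)[1] in ARGM_FUNCTIONS
    return False

print(is_valid_label("Arg0"), is_valid_label("ArgM-TMP"), is_valid_label("ArgM-FOO"))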

Tense/aspect
- Verbs also marked for tense/aspect:
  - Passive
  - Perfect
  - Progressive
  - Infinitival
- Modals and negation marked as ArgMs

Ergative/Unaccusative Verbs: rise
Roles:
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point
Example: Sales rose 4% to $3.28 billion from $3.16 billion.
*Note: have to mention the preposition explicitly (Arg3-from, Arg4-to), or could have used ArgM-Source, ArgM-Goal. Arbitrary distinction.

Synonymous Verbs: add in the sense of rise
Roles:
  Arg1 = logical subject, patient, thing rising/gaining/being added to
  Arg2 = EXT, amount risen
  Arg4 = end point
Example: The Nasdaq composite index added 1.01 to on paltry volume.

Phrasal Verbs
- put together
- put in
- put off
- put on
- put out
- put up
- ...

Frames: Multiple Rolesets
- Rolesets are not necessarily consistent between different senses of the same verb
  - A verb with multiple senses can have multiple frames, but not necessarily
- Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures
  - Similar to FrameNet
  - Levin / VerbNet classes
- Out of the 179 most frequent verbs: 1 roleset – 92, 2 rolesets – , rolesets – 42 (includes light verbs)

Annotation Procedure
- Extraction of all sentences with a given verb
- First pass: automatic tagging
- Second pass: double-blind hand correction
  - Annotators from a variety of backgrounds, with less syntactic training than for treebanking
- Script to discover discrepancies (see the sketch below)
- Third pass: Solomonization (adjudication)
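The discrepancy script can be pictured as a per-instance comparison of the two annotators' argument labels; a hedged sketch under assumed data structures (the instance keys, dictionaries, and function below are illustrative, not the actual script):

def find_discrepancies(pass_a: dict, pass_b: dict) -> dict:
    """For every predicate instance, list the argument labels on which the
    two annotation passes disagree (hypothetical data layout)."""
    diffs = {}
    for instance in sorted(set(pass_a) | set(pass_b)):
        a = pass_a.get(instance, {})
        b = pass_b.get(instance, {})
        disagreements = {label: (a.get(label), b.get(label))
                         for label in set(a) | set(b)
                         if a.get(label) != b.get(label)}
        if disagreements:
            diffs[instance] = disagreements
    return diffs

# Toy example mirroring the "Intel told analysts ..." Solomonization slide below:
kate = {"tell-ex1": {"Arg0": "Intel",
                     "Arg1": "the company will resume shipments ...",
                     "Arg2": "analysts"}}
erwin = {"tell-ex1": {"Arg0": "Intel",
                      "Arg1": "that the company will resume shipments ...",
                      "Arg2": "analysts"}}
print(find_discrepancies(kate, erwin))
# Only Arg1 differs (whether the complementizer "that" is included).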

Inter-annotator agreement

Annotator Accuracy vs. Gold Standard
- One version of the annotation is chosen (senior annotator)
- Solomon modifies => Gold Standard

Status
- 179 verbs framed (+ Senseval-2 verbs)
- 97 verbs first-passed
  - 12,300+ predicates
  - Does not include ~3,000 predicates tagged for Senseval
- 54 verbs second-passed
  - 6,600+ predicates
- 9 verbs solomonized
  - 885 predicates

Throughput
- Framing: approximately 2 verbs per hour
- Annotation: approximately 50 sentences per hour
- Solomonization: approximately 1 hour per verb

Automatic Predicate Argument Tagger
- Predicate-argument labels
  - Uses Treebank "cues"
  - Consults a lexical semantic KB:
    - Hierarchically organized verb subcategorization frames and alternations associated with tree templates
    - Ontology of noun-phrase referents
    - Multi-word lexical items
  - Matches annotated tree templates against the parse, in Tree-Adjoining Grammar style
  - Standoff annotation in an external file referencing tree nodes (see the reader sketch below)
- Preliminary accuracy rate of 83.7% (800+ predicates)
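The slide does not spell out the standoff file format. Assuming a simplified line format in the spirit of the later PropBank releases (node pointers of the form terminal:height-LABEL into the Treebank parse), a minimal reader might look like this; the field layout and the sample line are illustrative assumptions, not the 2001 format:

# Parse one standoff proposition line of the (assumed) form:
#   <file> <sentence#> <pred-terminal#> <tagger> <frameset> <pointer>...
# where each pointer "terminal:height-LABEL" names a node in the parse tree.
def parse_prop_line(line: str) -> dict:
    fields = line.split()
    file_id, sent_no, term_no, tagger, frameset = fields[:5]
    args = []
    for pointer in fields[5:]:
        location, label = pointer.split("-", 1)
        terminal, height = (int(x) for x in location.split(":"))
        args.append({"terminal": terminal, "height": height, "label": label})
    return {"file": file_id, "sentence": int(sent_no),
            "terminal": int(term_no), "tagger": tagger,
            "frameset": frameset, "args": args}

# Fabricated example line for the "expecting" predicate in the sentence above.
sample = "wsj_9999.mrg 0 3 gold expect.01 0:1-ARG0 3:0-rel 4:3-ARG1"
print(parse_prop_line(sample))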

Summary
- Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks
- Automatic tagging as a first pass makes the task feasible
- Agreement and accuracy figures are reassuring

Solomonization
Source tree: Intel told analysts that the company will resume shipments of the chips within two to three weeks.
*** kate said:
  arg0: Intel
  arg1: the company will resume shipments of the chips within two to three weeks
  arg2: analysts
*** erwin said:
  arg0: Intel
  arg1: that the company will resume shipments of the chips within two to three weeks
  arg2: analysts

Solomonization
Source tree: Such loans to Argentina also remain classified as non-accruing, *TRACE*-1 costing the bank $ 10 million *TRACE*-*U* of interest income in the third period.
*** kate said:
  argM-TMP: in the third period
  arg3: the bank
  arg2: $ 10 million *TRACE*-*U* of interest income
  arg1: *TRACE*-1
*** erwin said:
  argM-TMP: in the third period
  arg3: the bank
  arg2: $ 10 million *TRACE*-*U* of interest income
  arg1: *TRACE*-1 Such loans to Argentina

Solomonization
Source tree: Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.
*** kate said:
  argM-MNR: relative to earnings growth
  arg3-PRD: flat
  arg1: its tax outlay
  arg0: the company
*** katherine said:
  argM-ADV: relative to earnings growth
  arg3-PRD: flat
  arg1: its tax outlay
  arg0: the company