CIS630 1 Penn Putting Meaning Into Your Trees Martha Palmer Collaborators: Paul Kingsbury, Olga Babko-Malaya, Bert Xue, Scott Cotton, Karin Kipper, Hoa Dang, Szuting Yi, Edward Loper, Jinying Chen, Tom Morton, William Schuler, Fei Xia, Joseph Rosenzweig, Dan Gildea, Christiane Fellbaum. September 8, 2003

CIS630 2 Penn Elusive nature of “meaning”
- Natural Language Understanding
- Natural Language Processing or Natural Language Engineering
- Empirical techniques rule!

CIS630 3 Penn Statistical Machine Translation results  CHINESE TEXT  The japanese court before china photo trade huge & lawsuit.  A large amount of the proceedings before the court dismissed workers.  japan’s court, former chinese servant industrial huge disasters lawsuit.  Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

CIS630 4 Penn Leverage from shallow techniques?  Still need an approximation of meaning for accurate MT, IR, Q&A, IE  Sense tagging  Labeled dependency structures  What do we have as available resources?  What can we do with them?

CIS630 5 Penn Outline
- Introduction – need for semantics
- Sense tagging: issues highlighted by Senseval1
- VerbNet
- Senseval2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank: framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

CIS630 6 Penn WordNet - Princeton
- On-line lexical reference (dictionary)
- Words organized into synonym sets = concepts
- Hypernyms (ISA), antonyms, meronyms (PART)
- Useful for checking selectional restrictions (doesn’t tell you what they should be)
- Typical top nodes - 5 out of 25: (act, action, activity), (animal, fauna), (artifact), (attribute, property), (body, corpus)
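
Not from the talk: a minimal sketch of browsing WordNet's synsets, hypernym (ISA) and meronym (PART) links in Python, assuming NLTK and its WordNet data are installed (today's WordNet release differs from the pre-release 1.7 discussed in these slides).

# Minimal sketch: browse WordNet synsets and relations with NLTK.
from nltk.corpus import wordnet as wn

for synset in wn.synsets('president', pos=wn.NOUN):
    # Each synset is a "synonym set = concept" with a gloss.
    print(synset.name(), '-', synset.definition())
    # ISA links (hypernyms) climb toward the top nodes.
    print('  hypernyms:', [h.name() for h in synset.hypernyms()])
    # PART links (meronyms) and other relations are also available.
    print('  part meronyms:', [m.name() for m in synset.part_meronyms()])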

CIS630 7 Penn WordNet – president, 6 senses 1.president -- (an executive officer of a firm or corporation) -->CORPORATE EXECUTIVE, BUSINESS EXECUTIVE…  LEADER 2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") -->HEAD OF STATE, CHIEF OF STATE 3. president -- (the chief executive of a republic) -->HEAD OF STATE, CHIEF OF STATE 4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER  LEADER 5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR ….  LEADER 6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP  POSITION

CIS630 8 Penn Limitations to WordNet  Poor inter-annotator agreement (73%)  Just sense tags - no representations  Very little mapping to syntax  No predicate argument structure  no selectional restrictions  No generalizations about sense distinctions  No hierarchical entries

CIS630 9 Penn SIGLEX98/SENSEVAL  Workshop on Word Sense Disambiguation  54 attendees, 24 systems, 3 languages  34 Words ( Nouns, Verbs, Adjectives )  Both supervised and unsupervised systems  Training data, Test data  Hector senses - very corpus based (mapping to WordNet)  lexical samples - instances, not running text  Inter-annotator agreement over 90% ACL-SIGLEX98,SIGLEX99, CHUM00

CIS Penn Hector - bother, 10 senses  1. intransitive verb, - (make an effort), after negation, usually with to infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. “About 70 percent of the shareholders did not bother to vote at all.”  1.1 (can't be bothered), idiomatic, be unwilling to make the effort needed (to do something), Ex. ``The calculations needed are so tedious that theorists cannot be bothered to do them.''  2. vi; after neg; with `about" or `with"; rarely cont – (of a person) to concern oneself (about something or someone) “He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor.”  2.1 v-passive; with `about" or `with“ - (of a person) to be concerned about or interested in (something) “The only thing I'm bothered about is the well-being of the club.”

CIS Penn Mismatches between lexicons: Hector - WordNet, shake

CIS Penn Levin classes (3100 verbs)  47 top level classes, 193 second and third level  Based on pairs of syntactic frames. John broke the jar. / Jars break easily. / The jar broke. John cut the bread. / Bread cuts easily. / *The bread cut. John hit the wall. / *Walls hit easily. / *The wall hit.  Reflect underlying semantic components contact, directed motion, exertion of force, change of state  Synonyms, syntactic patterns (conative), relations

CIS Penn Confusions in Levin classes?
- Not semantically homogeneous: {braid, clip, file, powder, pluck, etc.}
- Multiple class listings: homonymy or polysemy?
- Alternation contradictions? Carry verbs disallow the Conative, but include {push, pull, shove, kick, draw, yank, tug}, which also appear in the Push/Pull class, which does take the Conative

CIS Penn Intersective Levin classes

CIS Penn Regular Sense Extensions
John pushed the chair. +force, +contact
John pushed the chairs apart. +ch-state
John pushed the chairs across the room. +ch-loc
John pushed at the chair. -ch-loc
The train whistled into the station. +ch-loc
The truck roared past the weigh station. +ch-loc
AMTA98, ACL98, TAG98

CIS Penn Intersective Levin Classes  More syntactically and semantically coherent  sets of syntactic patterns  explicit semantic components  relations between senses VERBNET

CIS Penn VerbNet  Computational verb lexicon  Clear association between syntax and semantics  Syntactic frames (LTAGs) and selectional restrictions (WordNet)  Lexical semantic information – predicate argument structure  Semantic components represented as predicates  Links to WordNet senses  Entries based on refinement of Levin Classes  Inherent temporal properties represented explicitly  during(E), end(E), result(E) TAG00, AAAI00, Coling00

CIS Penn VerbNet Class entries:  Verb classes allow us to capture generalizations about verb behavior  Verb classes are hierarchically organized  Members have common semantic elements, thematic roles, syntactic frames and coherent aspect Verb entries:  Each verb can refer to more than one class (for different senses)  Each verb sense has a link to the appropriate synsets in WordNet (but not all senses of WordNet may be covered)  A verb may add more semantic information to the basic semantics of its class

Hit class – hit-18.1
MEMBERS: [bang(1,3), bash(1), ... hit(2,4,7,10), kick(3), ...]
THEMATIC ROLES: Agent, Patient, Instrument
SELECT RESTRICTIONS: Agent(int_control), Patient(concrete), Instrument(concrete)
FRAMES and PREDICATES:
Basic Transitive (A V P): cause(Agent,E) /\ manner(during(E),directedmotion,Agent) /\ manner(end(E),forceful,Agent) /\ contact(end(E),Agent,Patient)
Conative (A V at P): manner(during(E),directedmotion,Agent) /\ ¬contact(end(E),Agent,Patient)
With/against alternation (A V I against/on P): cause(Agent,E) /\ manner(during(E),directedmotion,Instr) /\ manner(end(E),forceful,Instr) /\ contact(end(E),Instr,Patient)
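
A rough sketch (not from the talk) of inspecting a VerbNet class such as hit-18.1 programmatically, assuming NLTK's VerbNet corpus is installed; the class ids and XML element names below are assumptions that depend on the VerbNet release NLTK bundles.

# Sketch: look up a VerbNet class and list its members, roles and frames.
from nltk.corpus import verbnet

for classid in verbnet.classids('hit'):          # e.g. 'hit-18.1'
    vnclass = verbnet.vnclass(classid)           # an ElementTree element
    members = [m.get('name') for m in vnclass.findall('MEMBERS/MEMBER')]
    roles = [r.get('type') for r in vnclass.findall('THEMROLES/THEMROLE')]
    frames = [f.get('primary')
              for f in vnclass.findall('FRAMES/FRAME/DESCRIPTION')]
    print(classid, members, roles, frames)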

CIS Penn VERBNET

CIS Penn VerbNet/WordNet

CIS Penn Mapping WN-Hector via VerbNet SIGLEX99, LREC00

CIS Penn SENSEVAL2 – ACL’01 Adam Kilgarriff, Phil Edmonds and Martha Palmer
All-words task: Czech, Dutch, English, Estonian
Lexical sample task: Basque, Chinese, English, Italian, Japanese, Korean, Spanish, Swedish

CIS Penn English Lexical Sample - Verbs
- Preparation for Senseval 2: manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
- WordNet (pre-release version 1.7)
- To handle unclear sense distinctions: detect and eliminate redundant senses; detect and cluster closely related senses – NOT ALLOWED (the WordNet sense inventory itself could not be changed)

CIS Penn WordNet – call, 28 senses 1.name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") -> LABEL 2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") ->TELECOMMUNICATE 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") -> LABEL

CIS Penn WordNet – call, 28 senses 4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER 5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") -> UTTER 6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") -> MEET

CIS Penn Groupings Methodology  Double blind groupings, adjudication  Syntactic Criteria (VerbNet was useful)  Distinct subcategorization frames  call him a bastard  call him a taxi  Recognizable alternations – regular sense extensions:  play an instrument  play a song  play a melody on an instrument

CIS Penn Groupings Methodology (cont.)  Semantic Criteria  Differences in semantic classes of arguments  Abstract/concrete, human/animal, animate/inanimate, different instrument types,…  Differences in the number and type of arguments  Often reflected in subcategorization frames  John left the room.  I left my pearls to my daughter-in-law in my will.  Differences in entailments  Change of prior entity or creation of a new entity?  Differences in types of events  Abstract/concrete/mental/emotional/….  Specialized subject domains

CIS Penn WordNet: call, 28 senses (diagram of the 28 fine-grained WordNet senses of call, shown before grouping)

CIS Penn WordNet: call, 28 senses, groups (diagram: the 28 senses clustered into groups labeled Phone/radio, Label, Loud cry, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, and Bid)

CIS Penn WordNet – call, 28 senses, Group1 1.name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL 19. call -- (consider or regard as being; "I would not call her beautiful")--> SEE 22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

CIS Penn Sense Groups: verb ‘develop’ (diagram of WordNet senses WN1–WN14, WN19 and WN20 arranged into sense groups)

CIS Penn Results – averaged over 28 verbs
             Call     Develop   Total
WN/corpus    28/14    21/16     16.28/10.83
Grp/corp     11/7     9/6       8.07/5.90
Entropy                         2.81
ITA-fine     69%      67%       71%
ITA-coarse   89%      85%       82%

CIS Penn Maximum Entropy WSD Hoa Dang (in progress)
- Maximum entropy framework: combines different features with no assumption of independence
- Estimates the conditional probability that W has sense X in context Y, where Y is a conjunction of linguistic features
- Feature weights are determined from training data
- The weights produce a maximum entropy probability distribution

CIS Penn Features used  Topical contextual linguistic feature for W:  presence of automatically determined keywords in S  Local contextual linguistic features for W:  presence of subject, complements  words in subject, complement positions, particles, preps  noun synonyms and hypernyms for subjects, complements  named entity tag (PERSON, LOCATION,..) for proper Ns  words within +/- 2 word window
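
To make the setup concrete, here is a minimal sketch of a maximum-entropy WSD classifier over feature dictionaries like those listed above, using scikit-learn's LogisticRegression as the maximum-entropy model; the feature names and training examples are invented for illustration and are not the Penn system's.

# Sketch: maximum-entropy (logistic regression) WSD over dict features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ({'kw_phone': 1, 'subj_head': 'I', 'obj_head': 'you', 'w+1': 'you'}, 'call.phone'),
    ({'kw_name': 1, 'subj_head': 'they', 'obj_head': 'son', 'w+1': 'their'}, 'call.label'),
]
X, y = zip(*train)

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(list(X), list(y))

test = {'kw_phone': 1, 'subj_head': 'she', 'obj_head': 'me', 'w+1': 'me'}
print(clf.predict([test])[0])   # highest conditional p(sense | features)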

CIS Penn Grouping improved sense identification for MxWSD  75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses  Most commonly confused senses suggest grouping:  (1) name, call--assign a specified proper name to; ``They called their son David''  (2) call--ascribe a quality to or give a name that reflects a quality; ``He called me a bastard'';  (3) call--consider or regard as being; ``I would not call her beautiful''  (4) address, call--greet, as with a prescribed form, title, or name; ``Call me Mister''; ``She calls him by his first name''
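
A small sketch of the coarse-grained scoring idea: map fine-grained senses to their groups and count a prediction correct when it falls in the gold sense's group. The group mapping shown is a made-up fragment, not the official call grouping.

# Sketch: coarse-grained accuracy via a fine-sense-to-group mapping.
GROUPS = {'call.wn1': 'Label', 'call.wn3': 'Label', 'call.wn19': 'Label',
          'call.wn22': 'Label', 'call.wn2': 'Phone/radio'}

def coarse_accuracy(gold, predicted):
    pairs = list(zip(gold, predicted))
    hits = sum(GROUPS.get(g) == GROUPS.get(p) for g, p in pairs)
    return hits / len(pairs)

print(coarse_accuracy(['call.wn1', 'call.wn2'], ['call.wn3', 'call.wn2']))  # 1.0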

CIS Penn Results – averaged over 28 verbs
             Total
WN/corpus    16.28/10.83
Grp/corp     8.07/5.90
Entropy      2.81
ITA-fine     71%
ITA-coarse   82%
MX-fine      59%
MX-coarse    69%

CIS Penn Results – first Senseval2 verbs
             Begin   Call    Carry   Develop   Draw    Dress
WN/corpus    10/9    28/14   39/22   21/16     35/21   15/8
Grp/corp     10/9    11/7    16/11   9/6       15/9    7/4
(Entropy, ITA-fine, ITA-coarse, MX-fine and MX-coarse values for these verbs were not preserved in the transcript)

CIS Penn Summary of WSD
- Choice of features is more important than choice of machine learning algorithm
- Importance of syntactic structure (for English WSD but not Chinese)
- Importance of dependencies
- Importance of a hierarchical approach to sense distinctions, and of quick adaptation to new usages

CIS Penn Outline
- Introduction – need for semantics
- Sense tagging: issues highlighted by Senseval1
- VerbNet
- Senseval2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank: framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

CIS Penn Proposition Bank: From Sentences to Propositions Powell met Zhu Rongji Proposition: meet(Powell, Zhu Rongji ) Powell met with Zhu Rongji Powell and Zhu Rongji met Powell and Zhu Rongji had a meeting... When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. meet(Powell, Zhu) discuss([Powell, Zhu], return(X, plane)) debate consult join wrestle battle meet(Somebody1, Somebody2)

CIS Penn Capturing semantic roles*
- Charles broke [ARG1 the LCD projector].
- [ARG1 The windows] were broken by the hurricane.
- [ARG1 The vase] broke into pieces when it toppled over.
(In the last two sentences the ARG1 surfaces as the syntactic SUBJ.)
*See also FrameNet

CIS Penn A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP-SBJ Analysts) (VP have (VP been (VP expecting (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S (NP-SBJ *T*-1) (VP would (VP give (NP the U.S. car maker) (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))

CIS Penn The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S Arg0 (NP-SBJ *T*-1) (VP would (VP give Arg2 (NP the U.S. car maker) Arg1 (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)

CIS Penn English PropBank  1M words of Treebank over 2 years, May’01-03  New semantic augmentations  Predicate-argument relations for verbs  label arguments: Arg0, Arg1, Arg2, …  First subtask, 300K word financial subcorpus (12K sentences, 29K predicates,1700 lemmas)  Spin-off: Guidelines  FRAMES FILES - (necessary for annotators)  verbs with labeled examples, rich semantics, 118K predicates
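
For readers who want to look at the data, a sketch of iterating over PropBank predicate-argument instances with NLTK's reader; NLTK ships only a small sample of PropBank, and the attribute names follow that reader rather than anything in the talk.

# Sketch: read PropBank predicate-argument annotations via NLTK's sample.
from nltk.corpus import propbank

for inst in propbank.instances()[:5]:
    print(inst.roleset,                      # frameset id, e.g. 'expect.01'
          inst.fileid, inst.sentnum,
          [(str(loc), argid) for loc, argid in inst.arguments])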

CIS Penn Frames Example: expect Roles: Arg0: expecter Arg1: thing expected Example: Transitive, active: Portfolio managers expect further declines in interest rates. Arg0: Portfolio managers REL: expect Arg1: further declines in interest rates

CIS Penn Frames File example: give Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation
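
A toy rendering of how an annotator (or a program) might consult frames-file entries like the two above; the real frames files are XML, so this in-memory dict is purely illustrative.

# Sketch: frames-file role descriptions as a simple lookup table.
FRAMES = {
    'expect.01': {'Arg0': 'expecter', 'Arg1': 'thing expected'},
    'give.01':   {'Arg0': 'giver', 'Arg1': 'thing given', 'Arg2': 'entity given to'},
}

def describe(roleset, argid):
    """Return the human-readable role description an annotator would see."""
    return FRAMES.get(roleset, {}).get(argid, 'unknown role')

print(describe('give.01', 'Arg2'))   # 'entity given to'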

CIS Penn How are arguments numbered?  Examination of example sentences  Determination of required / highly preferred elements  Sequential numbering, Arg0 is typical first argument, except  ergative/unaccusative verbs (shake example)  Arguments mapped for "synonymous" verbs

CIS Penn Trends in Argument Numbering  Arg0 = agent  Arg1 = direct object / theme / patient  Arg2 = indirect object / benefactive / instrument / attribute / end state  Arg3 = start point / benefactive / instrument / attribute  Arg4 = end point

CIS Penn Additional tags (arguments or adjuncts?)  Variety of ArgM’s (Arg#>4):  TMP - when?  LOC - where at?  DIR - where to?  MNR - how?  PRP -why?  REC - himself, themselves, each other  PRD -this argument refers to or modifies another  ADV -others

CIS Penn Inflection  Verbs also marked for tense/aspect  Passive/Active  Perfect/Progressive  Third singular (is has does was)  Present/Past/Future  Infinitives/Participles/Gerunds/Finites  Modals and negation marked as ArgMs

CIS Penn Phrasal Verbs  Put together  Put in  Put off  Put on  Put out  Put up ...

CIS Penn Ergative/Unaccusative Verbs: rise Roles Arg1 = Logical subject, patient, thing rising Arg2 = EXT, amount risen Arg3* = start point Arg4 = end point Sales rose 4% to $3.28 billion from $3.16 billion. *Note: Have to mention prep explicitly, Arg3-from, Arg4-to, or could have used ArgM-Source, ArgM-Goal. Arbitrary distinction.

CIS Penn Synonymous Verbs: add in the sense of rise
Roles: Arg1 = logical subject, patient, thing rising/gaining/being added to; Arg2 = EXT, amount risen; Arg4 = end point
The Nasdaq composite index added 1.01 to on paltry volume.

CIS Penn Annotation procedure  Extraction of all sentences with given verb  First pass: Automatic tagging (Joseph Rosenzweig)   Second pass: Double blind hand correction  Variety of backgrounds  Less syntactic training than for treebanking  Tagging tool highlights discrepancies  Third pass: Solomonization (adjudication)

CIS Penn Inter-Annotator Agreement

CIS Penn Solomonization Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth. *** Kate said: arg0 : the company arg1 : its tax outlay arg3-PRD : flat argM-MNR : relative to earnings growth *** Katherine said: arg0 : the company arg1 : its tax outlay arg3-PRD : flat argM-ADV : relative to earnings growth
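
A sketch of the double-blind comparison step behind adjudication: collect each annotator's argument labels and flag the ones that differ, mirroring the Kate/Katherine example above; the dict representation is an assumption, not the actual tagging tool's format.

# Sketch: flag disagreements between two annotators for adjudication.
kate      = {'arg0': 'the company', 'arg1': 'its tax outlay',
             'arg3-PRD': 'flat', 'argM-MNR': 'relative to earnings growth'}
katherine = {'arg0': 'the company', 'arg1': 'its tax outlay',
             'arg3-PRD': 'flat', 'argM-ADV': 'relative to earnings growth'}

def discrepancies(a, b):
    labels = set(a) | set(b)
    return {lab: (a.get(lab), b.get(lab))
            for lab in labels if a.get(lab) != b.get(lab)}

print(discrepancies(kate, katherine))
# flags the argM-MNR / argM-ADV disagreement for the adjudicator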

CIS Penn Automatic Labelling of Semantic Relations Features:  Predicate  Phrase Type  Parse Tree Path  Position (Before/after predicate)  Voice (active/passive)  Head Word
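
As an illustration of one of these features, here is a sketch of computing the parse-tree path from a candidate constituent to the predicate over an NLTK tree; the path notation (^ for upward steps, ! for downward steps) is an assumption, not necessarily the original system's.

# Sketch: the parse-tree-path feature between a constituent and the predicate.
from nltk import Tree

def tree_path(tree, from_pos, to_pos):
    """Climb to the lowest common ancestor, then descend to the target node."""
    common = 0
    while (common < min(len(from_pos), len(to_pos))
           and from_pos[common] == to_pos[common]):
        common += 1
    up = [tree[from_pos[:i]].label() for i in range(len(from_pos), common - 1, -1)]
    down = [tree[to_pos[:i]].label() for i in range(common + 1, len(to_pos) + 1)]
    return '^'.join(up) + '!' + '!'.join(down)

t = Tree.fromstring('(S (NP-SBJ (NNS Analysts)) (VP (VBP have) (VP (VBN been) '
                    '(VP (VBG expecting) (NP (DT a) (NN pact))))))')
# Path from the subject NP to the predicate "expecting":
print(tree_path(t, (0,), (1, 1, 1, 0)))   # NP-SBJ^S!VP!VP!VP!VBG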

CIS Penn Labelling Accuracy – Known Boundaries
(Table comparing automatic vs. gold-standard parses on PropBank, PropBank restricted to predicates with > 10 instances, and FrameNet; the accuracy figures did not survive in this transcript.)
Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify. FrameNet examples (training/test) are handpicked to be unambiguous.

CIS Penn Labelling Accuracy – Unknown Boundaries
(Table of precision and recall for automatic vs. gold-standard parses on PropBank and FrameNet; the figures did not survive in this transcript.)
Accuracy of semantic role prediction for unknown boundaries – the system must identify the constituents as arguments and give them the correct roles.

CIS Penn Additional Automatic Role Labelers
- Szuting Yi: EM clustering, unsupervised; Conditional Random Fields
- Jinying Chen: using role labels as features for WSD; decision trees, supervised; EM clustering, unsupervised

CIS Penn Outline
- Introduction – need for semantics
- Sense tagging: issues highlighted by Senseval1
- VerbNet
- Senseval2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank: framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

CIS Penn Frames: Multiple Framesets  Framesets are not necessarily consistent between different senses of the same verb  Verb with multiple senses can have multiple frames, but not necessarily  Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures, Similar to Framenet  Levin / VerbNet classes   Out of the 720 most frequent verbs:  1 frameset  2 framesets  3+ framesets - 95 (includes light verbs)

CIS Penn Word Senses in PropBank  Orders to ignore word sense not feasible for 700+ verbs  Mary left the room  Mary left her daughter-in-law her pearls in her will Frameset leave.01 "move away from": Arg0: entity leaving Arg1: place left Frameset leave.02 "give": Arg0: giver Arg1: thing given Arg2: beneficiary How do these relate to traditional word senses as in WordNet?

CIS Penn WordNet: leave, 14 senses (diagram of the 14 fine-grained WordNet senses of leave)

CIS Penn WordNet: leave, groups (diagram: the 14 WordNet senses of leave clustered into sense groups)

CIS Penn WordNet: leave, framesets (diagram: the 14 WordNet senses of leave partitioned into the two framesets, leave.01 and leave.02)

CIS Penn Overlap between Groups and Framesets – 95% (diagram for develop: the WordNet senses are partitioned into Frameset1 and Frameset2, and the Senseval-2 sense groups fall almost entirely within a single frameset)

CIS Penn Sense Hierarchy  Framesets – coarse grained distinctions  Sense Groups (Senseval-2) intermediate level (includes Levin classes) – 95% overlap  WordNet – fine grained distinctions
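
The hierarchy can be pictured as a nested mapping from framesets down to WordNet senses; the sketch below uses the frameset-to-WordNet links given on the leave.01/leave.02 slides that follow and omits the intermediate sense-group level for brevity.

# Sketch: coarse framesets indexing fine-grained WordNet senses.
SENSE_HIERARCHY = {
    'leave': {
        'leave.01': {'name': 'move away from', 'wn_senses': [1, 5, 8]},
        'leave.02': {'name': 'give',           'wn_senses': [2, 10, 13]},
    },
}

def frameset_for_wn_sense(lemma, wn_sense):
    """Climb from a fine-grained WordNet sense up to its coarse frameset."""
    for frameset, info in SENSE_HIERARCHY.get(lemma, {}).items():
        if wn_sense in info['wn_senses']:
            return frameset
    return None

print(frameset_for_wn_sense('leave', 10))   # 'leave.02'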

CIS Penn leave.01 - move away from
VerbNet – Levin class: escape; WordNet senses: WN 1, 5, 8
Thematic roles: Location[+concrete], Theme[+concrete]
Frames with semantics:
Basic Intransitive: "The convict escaped" motion(during(E),Theme) direction(during(E),Prep,Theme,?Location)
Intransitive (+ path PP): "The convict escaped from the prison"
Locative Preposition Drop: "The convict escaped the prison"

CIS Penn leave.02 - give
VerbNet – Levin class: future_having-13.3; WordNet senses: WN 2, 10, 13
Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
Frames with semantics:
Dative: "I promised somebody my time" Agent V Recipient Theme has_possession(start(E),Agent,Theme) future_possession(end(E),Recipient,Theme) cause(Agent,E)
Transitive (+ Recipient PP): "We offered our paycheck to her" Agent V Theme Prep(to) Recipient
Transitive (Theme Object): "I promised my house (to somebody)" Agent V Theme

CIS Penn Propbank to VN mapping from Text meaning workshop  Cluster verbs based on frames of arg labels  K-nearest neighbors  EM  Compare derived clusters to VerbNet classes  sim(X, Y) =  Only a rough measure  Not all verbs in VerbNet are attested in PropBank  Not all verbs in PropBank are treated in VerbNet

PropBank Frame for Clustering
For [Arg4 Mr. Sherwin], [Arg0 a conviction] could [Rel carry] [Arg1 penalties of five years in prison and a $250,000 fine on each count] (wsj_1331)
reduces to: arg4 arg0 rel arg1
Frameset tags, ~7K annotations, 200 schemas, 921 verbs
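
A sketch of the clustering step: reduce each annotation to its argument-label schema as above, build a verb-by-schema count matrix, and cluster it. KMeans stands in for the k-nearest-neighbor and EM methods actually used, and the toy annotations are invented.

# Sketch: cluster verbs by the distribution of their argument-label schemas.
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

annotations = [
    ('give',  'arg0 rel arg2 arg1'), ('give', 'arg0 rel arg1 arg2'),
    ('hit',   'arg0 rel arg1'),      ('hit',  'arg0 rel arg1 arg2'),
    ('break', 'arg1 rel'),           ('break', 'arg0 rel arg1'),
]

schemas = {}
for verb, schema in annotations:
    schemas.setdefault(verb, Counter())[schema] += 1

verbs = sorted(schemas)
X = DictVectorizer().fit_transform([schemas[v] for v in verbs])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(verbs, labels)))   # derived clusters, to compare with VerbNet classes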

Derived clusters: 1 – transitive, 2 – ditransitive, 3 – unaccusative

Adding to VerbNet Classes
- 36.3 'combative meetings': fight, consult, ...
- Clustering analysis adds hedge: hedge one's bets against ...
- But some investors might prefer a simpler strategy than hedging their individual holdings (wsj_1962)
- Thus, buying puts after a big market slide can be an expensive way to hedge against risk (wsj_2415)

CIS Penn Lexical Semantics at Penn  Annotation of Penn Treebank with semantic role labels (propositions) and sense tags  Links to VerbNet and WordNet  Provides additional semantic information that clearly distinguishes verb senses  Class based to facilitate extension to previously unseen usages

CIS Penn PropBank I
Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
help: REL = help, Arg0 = tax rates, Arg1 = the company keep its tax outlay flat relative to earnings growth
keep: REL = keep, Arg0 = the company, Arg1 = its tax outlay, Arg3-PRD = flat, ArgM-ADV = relative to earnings growth
Event variables (ID# h23, k16); nominal reference; sense tags (help2,5, tax rate1, keep1, company1); discourse connectives