English Proposition Bank: Status Report
Olga Babko-Malaya, Paul Kingsbury, Scott Cotton, Martha Palmer, Mitch Marcus
March 25, 2003
Outline
- Overview
- Status Report
- Mapping of PropBank framesets to other sense distinctions
Example
He sent merchants around the country a form asking them to check one of three answers.
- Arg0: He
- REL: sent
- Arg2: merchants around the country
- Arg1: a form asking them to check one of three answers
Predicate-argument structure
send(Agent, Goal, Theme)
- Agent: He
- Goal: merchants
- Theme: form
He sent merchants around the country a form asking them to check one of three answers.
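The same labeling, rendered as a minimal Python sketch of a proposition record. The Proposition class, its field names, and the frameset id "send.01" are illustrative assumptions, not PropBank's actual storage format:

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    rel: str                                   # the predicate token
    frameset: str                              # e.g. "send.01" (assumed id)
    args: dict = field(default_factory=dict)   # role label -> text span

prop = Proposition(
    rel="sent",
    frameset="send.01",
    args={
        "Arg0": "He",
        "Arg2": "merchants around the country",
        "Arg1": "a form asking them to check one of three answers",
    },
)
print(prop.frameset, "-", prop.args["Arg0"])
```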
Used At
- MITRE, Xerox PARC, Sheffield University, BBN, Syracuse University, IBM, NYU, SRA, CMU, MIT, University of Texas at Dallas, University of Toronto, Columbia University, SPAWAR, and the JHU summer workshop
- Also provided to JK Davis, John Josef Costandi, and Steve Maiorano
- Improvements in IE reported in ACL’03 submission
Annotation procedure
- Extraction of all sentences with a given verb
- First pass: automatic tagging (Joseph Rosenzweig)
  http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
- Second pass: double-blind hand annotation
- Third pass: adjudication; the tagging tool highlights inconsistencies
Given these guidelines, a number of annotators, mostly undergraduate students majoring in linguistics, extend the templates in the frames to examples from the corpus. The rate of annotation is approximately 50 sentences per annotator-hour.
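To make the adjudication step concrete, here is a hypothetical sketch of how a tool might surface disagreements between the two blind passes; the find_disagreements helper and the span-keyed dictionaries are illustrative assumptions, not the actual tool's interface:

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets

def find_disagreements(ann_a: Dict[Span, str],
                       ann_b: Dict[Span, str]) -> List[tuple]:
    """Return (span, label_a, label_b) triples where the annotators differ."""
    flagged = []
    for span in sorted(set(ann_a) | set(ann_b)):
        la, lb = ann_a.get(span), ann_b.get(span)
        if la != lb:
            flagged.append((span, la, lb))
    return flagged

# Example: annotator B labeled the recipient Arg1 instead of Arg2.
a = {(0, 1): "Arg0", (2, 5): "Arg2", (5, 14): "Arg1"}
b = {(0, 1): "Arg0", (2, 5): "Arg1", (5, 14): "Arg1"}
print(find_disagreements(a, b))   # [((2, 5), 'Arg2', 'Arg1')]
```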
Projected delivery dates
Financial subcorpus:
- alpha release: December 2001 -- DONE!
- beta release: July 2002 -- DONE!
- adjudicated release: summer 2003
PropBank corpus:
- beta release: summer 2003
- adjudicated release: December 2003
English PropBank - Current Status
- 3183 frame files, corresponding to 3625 distinct predicates (including phrasal variants): finished!
- At least singly annotated: 2915 verbs, 94.5K instances (80% of the TreeBank)
- At least doubly annotated: 2250 verbs, 60K instances (67% of the TreeBank)
- Adjudicated: 1032 verbs, 25K instances (20% of the TreeBank)
- Coordinating with NYU on nominalizations, using the Penn tagger and frames files
Word Sense in PropBank
The original plan to ignore word sense proved infeasible for 700+ verbs.
- Mary left the room.
- Mary left her daughter-in-law her pearls in her will.
Frameset leave.01, "move away from": Arg0: entity leaving; Arg1: place left
Frameset leave.02, "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary
How do these relate to traditional word senses as in WordNet?
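As an illustration, the two ‘leave’ framesets above could be encoded as plain dictionaries; this is a sketch only, not the format of the real frames files:

```python
FRAMESETS = {
    "leave.01": {"sense": "move away from",
                 "Arg0": "entity leaving",
                 "Arg1": "place left"},
    "leave.02": {"sense": "give",
                 "Arg0": "giver",
                 "Arg1": "thing given",
                 "Arg2": "beneficiary"},
}

# "Mary left her daughter-in-law her pearls in her will" -> leave.02
print(FRAMESETS["leave.02"]["Arg2"])   # beneficiary
```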
Fine-grained WordNet Senses
Senseval-2 (WSD bakeoff) used WordNet 1.7. Verb ‘develop’:
- WN1: CREATE, MAKE SOMETHING NEW: "They developed a new technique"
- WN2: CREATE BY MENTAL ACT: "They developed a new theory of evolution"; "develop a better way to introduce crystallography techniques"
WN Senses: verb ‘develop’
[Diagram: the fine-grained WordNet 1.7 senses of ‘develop’ (WN1-WN14, WN19, WN20), shown individually]
Sense Groups: verb ‘develop’
[Diagram: the same WordNet senses clustered into Senseval-2 sense groups]
PropBank Framesets for verb ‘develop’
- Frameset 1 (sense: create/improve): Arg0: agent; Arg1: thing developed
  Example: They developed a new technique.
- Frameset 2 (sense: come about): Arg1: non-intentional theme
  Example: The plot develops slowly.
This verb has two rolesets, ‘come about’ and ‘create’, distinguished by whether or not the development process had to be instigated by an outside causal agent, marked as Arg0 in PropBank. Outside-agent usages are more likely to be transitive, whereas the internally controlled ones are more likely to be intransitive, but alternations do occur (see the toy heuristic below).
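A toy sketch of the transitivity observation as a baseline guess; the frameset ids follow the slide's numbering and the helper is hypothetical, and since alternations occur this heuristic will sometimes be wrong:

```python
def guess_develop_frameset(has_direct_object: bool) -> str:
    """Guess the external-agent frameset for transitive uses and the
    'come about' frameset for intransitive ones (illustrative only)."""
    return "develop.01" if has_direct_object else "develop.02"

print(guess_develop_frameset(True))    # "They developed a new technique"
print(guess_develop_frameset(False))   # "The plot develops slowly"
```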
Mapping between Groups and Framesets
[Diagram: the ‘develop’ sense groups (over WN1-WN14, WN19, WN20) aligned with the two PropBank framesets]
Sense Hierarchy
- Framesets: coarse-grained distinctions
- Sense Groups (Senseval-2): intermediate level (includes Levin classes); 95% overlap
- WordNet: fine-grained distinctions
We have been investigating whether the sense groups developed for Senseval-2 can provide an intermediate level of hierarchy between the PropBank rolesets and the WordNet 1.7 senses. Our preliminary results show that 95% of the verb instances map directly from sense groups to rolesets, with each roleset typically corresponding to two or more sense groups.
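A sketch of the kind of consistency check behind the 95% figure, assuming instances tagged with both a sense group and a frameset; the function and the toy data are illustrative, not our actual evaluation code:

```python
from collections import Counter, defaultdict

def direct_mapping_rate(instances):
    """instances: iterable of (sense_group, frameset) pairs for one verb.
    Returns the fraction of instances whose frameset matches the majority
    frameset of their sense group (a crude stand-in for 'maps directly')."""
    by_group = defaultdict(Counter)
    for group, frameset in instances:
        by_group[group][frameset] += 1
    agree = sum(counts.most_common(1)[0][1] for counts in by_group.values())
    total = sum(sum(counts.values()) for counts in by_group.values())
    return agree / total

data = ([("group1", "develop.01")] * 9 +
        [("group2", "develop.02")] * 10 +
        [("group2", "develop.01")])     # one instance crossing the mapping
print(direct_mapping_rate(data))        # 0.95
```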
Sense-Tagging of PropBank
- Sense tagging is primarily confined to the financial subcorpus, covers about 90% of the polysemous instances in that corpus, and spans 415 verbs.
- Single-tagged: 12K polysemous instances with roleset identifiers
- Double-tagged: 3K polysemous instances
- 94% agreement between annotators
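A minimal sketch of a plain observed-agreement computation over aligned double-tagged instances (the tags are made up, and the actual 94% figure may be computed differently):

```python
def observed_agreement(tags_a, tags_b):
    """Raw observed agreement over aligned tags (not chance-corrected)."""
    assert len(tags_a) == len(tags_b)
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)

print(observed_agreement(
    ["send.01", "leave.02", "develop.01", "develop.01"],
    ["send.01", "leave.01", "develop.01", "develop.01"]))   # 0.75
```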
Training Automatic Taggers
- Stochastic tagger (Dan Gildea)
- Results:
  - Gold-standard parses: 73.5 P / 71.7 R
  - Automatic parses: 59.0 P / 55.4 R
- New results:
  - Using argument labels as features for WSD
  - EM clustering for assigning argument labels
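For context, a sketch of how labeled precision (P) and recall (R) over predicted argument spans are typically scored; the set-of-(span, label) representation is an assumption, not the tagger's actual evaluation harness:

```python
def precision_recall(gold, pred):
    """Labeled precision/recall over (span, label) argument sets."""
    correct = len(gold & pred)           # spans with the right label
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r

gold = {((0, 1), "Arg0"), ((2, 5), "Arg2"), ((5, 14), "Arg1")}
pred = {((0, 1), "Arg0"), ((2, 5), "Arg1")}   # one arg missed, one mislabeled
print(precision_recall(gold, pred))            # (0.5, 0.333...)
```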