1
CIS630, 9/13/04 1 Penn Putting Meaning into Your Trees Martha Palmer CIS630 September 13, 2004
2
CIS630, 9/13/04 2 Penn Meaning? A complete representation of real-world knowledge would require full Natural Language Understanding (NLU), but NLU systems can only build useful representations for small vocabularies. This is a major impediment to accurate Machine Translation, Information Retrieval, and Question Answering.
3
CIS630, 9/13/04 3 Penn Outline Introduction Background: WordNet, Levin classes, VerbNet Proposition Bank Captures shallow semantics Associated lexical frame files Supports training of an automatic tagger Mapping PropBank to VerbNet Mapping PropBank to WordNet Future directions
4
CIS630, 9/13/04 4 Penn Ask Jeeves – A Q/A, IR ex. What do you call a successful movie? Tips on Being a Successful Movie Vampire... I shall call the police. Successful Casting Call & Shoot for ``Clash of Empires''... thank everyone for their participation in the making of yesterday's movie. Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague... VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer. Blockbuster
5
CIS630, 9/13/04 5 Penn Ask Jeeves – filtering w/ POS tag What do you call a successful movie? Tips on Being a Successful Movie Vampire... I shall call the police. Successful Casting Call & Shoot for ``Clash of Empires''... thank everyone for their participation in the making of yesterday's movie. Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague... VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
6
CIS630, 9/13/04 6 Penn Filtering out "call the police": call(you, movie, what) ≠ call(you, police). Different senses - different syntax, different participants.
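To make the contrast concrete, here is a minimal sketch, assuming nothing beyond plain Python, of how a system that compares predicate-argument structures could reject the mismatched hit; the tuples and the helper function are illustrative and not part of the slides.

# Hypothetical predicate-argument tuples for the two uses of "call" above.
query_call = ("call", ("you", "movie", "what"))   # call(you, movie, what)
hit_call = ("call", ("you", "police"))            # call(you, police)

def same_proposition(p, q):
    """Propositions match only if the predicate and its arguments line up."""
    (pred_p, args_p), (pred_q, args_q) = p, q
    return pred_p == pred_q and args_p == args_q

print(same_proposition(query_call, hit_call))  # False: different arity, different participants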
7
CIS630, 9/13/04 7 Penn Machine Translation Lexical Choice - Word Sense Disambiguation Iraq lost the battle. Ilakuka centwey ciessta. [Iraq] [battle] [lost]. John lost his computer. John-i computer-lul ilepelyessta. [John] [computer] [misplaced].
8
CIS630, 9/13/04 8 Penn Cornerstone: an English lexical resource that provides sets of possible syntactic frames for verbs and clear, replicable sense distinctions. AskJeeves: Who do you call for a good electronic lexical database for English?
9
CIS630, 9/13/04 9 Penn WordNet – Princeton (Miller 1985, Fellbaum 1998) On-line lexical reference (dictionary) Nouns, verbs, adjectives, and adverbs grouped into synonym sets Other relations include hypernyms (ISA), antonyms, meronyms Typical top nodes - 5 out of 25 (act, action, activity) (animal, fauna) (artifact) (attribute, property) (body, corpus)
10
CIS630, 9/13/04 10 Penn WordNet – call, 28 senses 1.name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL 2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) ->TELECOMMUNICATE 3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL 4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
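For reference, the verb senses can be pulled out of WordNet programmatically; a hedged sketch using NLTK's WordNet reader (this assumes nltk and its wordnet data are installed, and sense counts and numbering in current WordNet versions differ from the WordNet 1.7 senses shown above).

from nltk.corpus import wordnet as wn

# List the verb synsets of "call" with their lemmas and glosses.
for i, synset in enumerate(wn.synsets("call", pos=wn.VERB), start=1):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    print(f"{i:2d}. {lemmas} -- {synset.definition()}")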
11
CIS630, 9/13/04 11 Penn WordNet – Princeton (Miller 1985, Fellbaum 1998) Limitations as a computational lexicon: contains little syntactic information (COMLEX has syntax but no sense distinctions); no explicit lists of participants; sense distinctions are very fine-grained and definitions often vague. This causes problems when creating training data for supervised Machine Learning – SENSEVAL-2 verbs with > 16 senses (including call): Inter-annotator Agreement (ITA) 73%, automatic Word Sense Disambiguation (WSD) 60.2%. Dang & Palmer, SIGLEX02
12
CIS630, 9/13/04 12 Penn WordNet: call, 28 senses [Diagram: the 28 WordNet senses of call (WN1–WN28) shown as an unstructured scatter of nodes]
13
CIS630, 9/13/04 13 Penn WordNet: call, 28 senses, Senseval-2 groups (engineering!) [Diagram: the 28 senses clustered into groups labelled Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid]
14
CIS630, 9/13/04 14 Penn Grouping improved scores: ITA 82%, MaxEnt WSD 69% Call: 31% of errors were due to confusion between senses within the same group 1: name, call -- (assign a specified, proper name to; They named their son David) call -- (ascribe a quality to or give a name of a common noun that reflects a quality; He called me a bastard) call -- (consider or regard as being; I would not call her beautiful) 75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses Palmer, Dang & Fellbaum, submitted, NLE
15
CIS630, 9/13/04 15 Penn The groups are based on VerbNet, an English lexical resource under development, which is in turn based on Levin's English verb classes…
16
CIS630, 9/13/04 16 Penn Levin classes (Levin, 1993) 3100 verbs, 47 top level classes, 193 second and third level Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily. John cut the bread. / *The bread cut. / Bread cuts easily. John hit the wall. / *The wall hit. / *Walls hit easily.
17
CIS630, 9/13/04 17 Penn Levin classes (Levin, 1993) Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes Each class has a syntactic signature based on alternations. John broke the jar. / The jar broke. / Jars break easily. - change-of-state John cut the bread. / *The bread cut. / Bread cuts easily. - change-of-state, recognizable action, sharp instrument John hit the wall. / *The wall hit. / *Walls hit easily. - contact, exertion of force
18
CIS630, 9/13/04 18 Penn Limitations to Levin Classes Coverage of only half of the verbs (types) in the Penn Treebank (1M words, WSJ) Usually only one or two basic senses are covered for each verb Confusing sets of alternations: different classes have almost identical "syntactic signatures" or, worse, contradictory signatures Dang, Kipper & Palmer, ACL98
19
CIS630, 9/13/04 19 Penn Multiple class listings Homonymy or polysemy? draw a picture, draw water from the well Conflicting alternations? Carry verbs disallow the Conative (*she carried at the ball), but include {push, pull, shove, kick, yank, tug}, which also appear in the Push/Pull class, which does take the Conative (she kicked at the ball)
20
CIS630, 9/13/04 20 Penn Intersective Levin Classes [Diagram: overlapping classes distinguished by frames - "at" (no change of location), "across the room" (change of location), "apart" (change of state)] Dang, Kipper & Palmer, ACL98
21
CIS630, 9/13/04 21 Penn Intersective Levin Classes are more syntactically and semantically coherent: sets of syntactic patterns, explicit semantic components, relations between senses. VERBNET www.cis.upenn.edu/verbnet
22
CIS630, 9/13/04 22 Penn VerbNet – Karin Kipper Class entries: Capture generalizations about verb behavior Organized hierarchically Members have common semantic elements, semantic roles and syntactic frames Verb entries: Refer to a set of classes (different senses) each class member linked to WN synset(s) (not all WN senses are covered) Dang, Kipper & Palmer, IJCAI00, Coling00
23
CIS630, 9/13/04 23 Penn Semantic role labels: Grace broke the LCD projector. break(agent(Grace), patient(LCD-projector)) cause(agent(Grace), change-of-state(LCD-projector)) -> broken(LCD-projector) agent(A) -> intentional(A), sentient(A), causer(A), affector(A) patient(P) -> affected(P), change(P), …
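As an illustration only (not from the slides), the labelled analysis above can be written down as plain Python data structures; the field names are hypothetical.

broke = {
    "predicate": "break",
    "roles": {"agent": "Grace", "patient": "LCD-projector"},
    # cause(agent(Grace), change-of-state(LCD-projector)) -> broken(LCD-projector)
    "semantics": ("cause", "Grace", ("change-of-state", "LCD-projector", "broken")),
}

# Role labels carry entailments, e.g. agents are intentional, sentient causers.
entailments = {
    "agent": ["intentional", "sentient", "causer", "affector"],
    "patient": ["affected", "changed"],
}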
24
CIS630, 9/13/04 24 Penn VerbNet entry for leave Levin class: future_having-13.3 WordNet senses: leave (WN 2, 10, 13), promise, offer, … Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[] Frames with semantic roles: "I promised somebody my time" Agent V Recipient Theme; "I left my fortune to Esmerelda" Agent V Theme Prep(to) Recipient; "I offered my services" Agent V Theme
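The class can also be inspected with NLTK's VerbNet corpus reader; a hedged sketch (assumes nltk and its verbnet data are installed; class identifiers and members vary across VerbNet releases, so the id below may need adjusting).

from nltk.corpus import verbnet

print(verbnet.classids("leave"))                  # VerbNet class ids that contain "leave"
vnclass = verbnet.vnclass("future_having-13.3")   # id assumed from the slide; check your release
print(verbnet.pprint(vnclass))                    # thematic roles, frames, and members of the class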
25
CIS630, 9/13/04 25 Penn Handmade resources vs. Real data VerbNet is based on linguistic theory – how useful is it? How well does it correspond to syntactic variations found in naturally occurring text? PropBank
26
CIS630, 9/13/04 26 Penn Proposition Bank: From Sentences to Propositions (Predicates!) Powell met Zhu Rongji → meet(Powell, Zhu Rongji) Powell met with Zhu Rongji / Powell and Zhu Rongji met / Powell and Zhu Rongji had a meeting... When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. → meet(Powell, Zhu), discuss([Powell, Zhu], return(X, plane)) The same pattern generalizes across verbs such as debate, consult, join, wrestle, battle: meet(Somebody1, Somebody2)
27
CIS630, 9/13/04 27 Penn Capturing semantic roles* Jerry broke [PATIENT the laser pointer]. [PATIENT The windows] were broken by the hurricane. [PATIENT The vase] broke into pieces when it toppled over.
28
CIS630, 9/13/04 28 Penn Capturing semantic roles* Jerry broke [ARG1 the laser pointer]. [ARG1 The windows] were broken by the hurricane. [ARG1 The vase] broke into pieces when it toppled over. *See also FrameNet, http://www.icsi.berkeley.edu/~framenet/
29
CIS630, 9/13/04 29 Penn A TreeBanked phrase A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company. [Parse tree: (S (NP a GM-Jaguar pact) (VP would (VP give (NP the US car maker) (NP an eventual 30% stake (PP-LOC in (NP the British company))))))]
30
CIS630, 9/13/04 30 Penn The same phrase, PropBanked A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company. Arg0 = a GM-Jaguar pact, Arg2 = the US car maker, Arg1 = an eventual 30% stake in the British company give(GM-J pact, US car maker, 30% stake)
31
CIS630, 9/13/04 31 Penn Frames File example: give Roles: Arg0: giver Arg1: thing given Arg2: entity given to Example: double object The executives gave the chefs a standing ovation. Arg0: The executives REL: gave Arg2: the chefs Arg1: a standing ovation
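A minimal sketch of the same roleset as Python data (the released frame files are XML; this dict form, the field names, and the roleset id are my own illustration).

give_roleset = {
    "lemma": "give",
    "roleset_id": "give.01",   # hypothetical id for illustration
    "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

# The labelled double-object example from the frames file:
labelled_example = [
    ("Arg0", "The executives"),
    ("REL", "gave"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]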
32
CIS630, 9/13/04 32 Penn Annotation procedure PTB II – Extract all sentences of a verb Create Frame File for that verb - Paul Kingsbury (3100+ lemmas, 4700 framesets, 120K predicates) First pass: automatic tagging - Joseph Rosenzweig Second pass: double-blind hand correction, inter-annotator agreement 84% Third pass: Solomonization (adjudication) - Olga Babko-Malaya
33
CIS630, 9/13/04 33 Penn Annotator accuracy – ITA 84%
34
CIS630, 9/13/04 34 Penn Trends in Argument Numbering Arg0 = prototypical agent (Dowty) Arg1 = prototypical patient Arg2 = indirect object / benefactive / instrument / attribute / end state Arg3 = start point / benefactive / instrument / attribute Arg4 = end point
35
CIS630, 9/13/04 35 Penn Additional tags (arguments or adjuncts?) Variety of ArgMs (Arg# > 4): TMP - when? LOC - where at? DIR - where to? MNR - how? PRP - why? REC - himself, themselves, each other PRD - this argument refers to or modifies another ADV - others
36
CIS630, 9/13/04 36 Penn Inflection, etc. Verbs also marked for tense/aspect: Passive/Active, Perfect/Progressive, Third singular (is, has, does, was), Present/Past/Future, Infinitives/Participles/Gerunds/Finites Modals and negations marked as ArgMs
37
CIS630, 9/13/04 37 Penn PropBank/FrameNet
Buy: Arg0 = buyer, Arg1 = goods, Arg2 = seller, Arg3 = rate, Arg4 = payment
Sell: Arg0 = seller, Arg1 = goods, Arg2 = buyer, Arg3 = rate, Arg4 = payment
PropBank roles are broader, more neutral, more syntactic - they map readily to VN, TR, FN Rambow et al., PMLB03
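A small sketch (my own illustration, plain Python) of how the numbered arguments line up across the two rolesets: goods, rate, and payment keep the same numbers, while buyer and seller swap between Arg0 and Arg2.

buy = {"Arg0": "buyer", "Arg1": "goods", "Arg2": "seller", "Arg3": "rate", "Arg4": "payment"}
sell = {"Arg0": "seller", "Arg1": "goods", "Arg2": "buyer", "Arg3": "rate", "Arg4": "payment"}

shared = sorted(arg for arg in buy if buy[arg] == sell[arg])
print(shared)  # ['Arg1', 'Arg3', 'Arg4'] -- the numbered roles that carry over directly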
38
CIS630, 9/13/04 38 Penn Outline Introduction Background: WordNet, Levin classes, VerbNet Proposition Bank Captures shallow semantics Associated lexical frame files Supports training of an automatic tagger Mapping PropBank to VerbNet Mapping PropBank to WordNet
39
CIS630, 9/13/04 39 Penn Approach Pre-processing: A heuristic which filters out unwanted constituents with significant confidence Argument Identification A binary SVM classifier which identifies arguments Argument Classification A multi-class SVM classifier which tags arguments as ARG0-5, ARGA, and ARGM
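A hedged sketch of this three-stage pipeline using scikit-learn stand-ins (LinearSVC over hand-built feature dicts); the pruning rule and feature names are hypothetical, and the actual system's kernels and features differ.

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def prune(constituents):
    # Stage 0: heuristic pre-processing that filters out unlikely candidates
    # (illustrative rule only).
    return [c for c in constituents if c.get("phrase_type") not in {",", ".", "``", "''"}]

# Stage 1: binary argument identification (argument vs. non-argument).
identifier = make_pipeline(DictVectorizer(), LinearSVC())

# Stage 2: multi-class argument classification (ARG0-5, ARGA, ARGM).
classifier = make_pipeline(DictVectorizer(), LinearSVC())

# Training would look roughly like:
#   identifier.fit(candidate_feature_dicts, is_argument_labels)
#   classifier.fit(argument_feature_dicts, role_labels)
# where each feature dict holds the predicate, phrase type, path, position, voice, etc.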
40
CIS630, 9/13/04 40 Penn Automatic Semantic Role Labeling Stochastic Model Basic features: Predicate (verb), Phrase Type (NP or S-BAR), Parse Tree Path, Position (before/after predicate), Voice (active/passive), Head Word of constituent, Subcategorization Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02
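As an illustration of the parse-tree path feature listed above, here is a small sketch (my own implementation, not the original system's) that computes it over an NLTK Tree; the bracketed example sentence is made up.

from nltk import Tree

def path_feature(tree, constituent_pos, predicate_pos):
    # Category path from the constituent up to the lowest common ancestor
    # and back down to the predicate, e.g. "NP^S/VP/VBD".
    i = 0
    while (i < len(constituent_pos) and i < len(predicate_pos)
           and constituent_pos[i] == predicate_pos[i]):
        i += 1
    up = [tree[constituent_pos[:j]].label() for j in range(len(constituent_pos), i, -1)]
    down = [tree[predicate_pos[:j]].label() for j in range(i, len(predicate_pos) + 1)]
    return ("^".join(up) + "^" + "/".join(down)) if up else "/".join(down)

sent = Tree.fromstring("(S (NP (NNP Jerry)) (VP (VBD broke) (NP (DT the) (NN pointer))))")
print(path_feature(sent, (0,), (1, 0)))  # NP^S/VP/VBD: from the subject NP up to S, down to the verb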
41
CIS630, 9/13/04 41 Penn Discussion Part I – Szuting Yi Comparisons between Pradhan and Penn (SVM) Both systems are SVM-based Kernel: Pradhan uses a degree 2 polynomial kernel; Penn uses a degree 3 RGB kernel Multi-classification: Pradhan uses a one-versus-others approach; Penn uses a pairwise approach Features: Pradhan includes rich features including NE, head word POS, partial path, verb classes, verb sense, head word of PP, first or last word/pos in the constituent, constituent tree distance, constituent relative features, temporal cue words, dynamic class context (Pradhan et al, 2004)
42
CIS630, 9/13/04 42 Penn Discussion Part II Different features for different subtasks - analysis of the basic features:
Path - bad feature for classification (doesn't discriminate constituents at the same level; doesn't give a full view of the subcat frame), but the best feature for identification (captures the syntactic configuration between a constituent and the predicate)
Sub-cat, Voice - bad features for identification (they do not vary with the current constituent)
HW (head word) - good for classification if combined with the predicate
PT (phrase type)
Xue & Palmer, EMNLP04
43
CIS630, 9/13/04 43 Penn Discussion Part III (New Features - Bert Xue) Syntactic frame: uses NPs as "pivots", varying with position within the frame; lexicalization with the predicate Predicate + head word Phrase type Head word of PP parent Position + voice
44
CIS630, 9/13/04 44 Penn Results
Data  System (feature set)                P    R    F1    A
2002  G&P                                 71   64   67    77.0
2002  G&H                                 76   68   72    -
2002  Pradhan (basic)                     83   79   81    87.9
2002  SVM-RGB Penn (basic)                -    -    -     93.1
2002  Pradhan (rich features)             89   85   87    91.0
2004  SVM-RGB Penn (basic)                89   88   -     93.5
2004  Pradhan (rich features)             90   89   -     93.0
2004  MaxEnt Penn (designated features)   -    -    90.6  95.4
45
CIS630, 9/13/04 45 Penn Word Senses in PropBank Instructions to ignore word sense proved not feasible for 700+ verbs: Mary left the room / Mary left her daughter-in-law her pearls in her will Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary How do these relate to traditional word senses in VerbNet and WordNet?
46
CIS630, 9/13/04 46 Penn Frames: Multiple Framesets Out of the 787 most frequent verbs: 1 frameset – 521, 2 framesets – 169, 3+ framesets – 97 (includes light verbs); 90% ITA Framesets are not necessarily consistent between different senses of the same verb, but they are consistent between different verbs that share similar argument structures (like FrameNet)
47
CIS630, 9/13/04 47 Penn Ergative/Unaccusative Verbs Roles (no ARG0 for unaccusative verbs) Arg1 = Logical subject, patient, thing rising Arg2 = EXT, amount risen Arg3* = start point Arg4 = end point Sales rose 4% to $3.28 billion from $3.16 billion. The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
48
CIS630, 9/13/04 48 Penn Mapping from PropBank to VerbNet Frameset id = leave.02 Sense = give VerbNet class = future_having-13.3 Arg0 = Giver = Agent, Arg1 = Thing given = Theme, Arg2 = Benefactive = Recipient
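The mapping above, written out as a plain Python dict for illustration (the field names are my own):

leave_02_to_verbnet = {
    "frameset": "leave.02",            # PropBank sense "give"
    "vn_class": "future_having-13.3",  # VerbNet class
    "roles": {
        "Arg0": ("giver", "Agent"),
        "Arg1": ("thing given", "Theme"),
        "Arg2": ("benefactive", "Recipient"),
    },
}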
49
CIS630, 9/13/04 49 Penn Mapping from PB to VerbNet
50
CIS630, 9/13/04 50 Penn Mapping from PropBank to VerbNet Overlap with PropBank framesets: 50,000 PropBank instances, 85% covered by VerbNet classes Results: MATCH – 78.63% (80.90% relaxed) (VerbNet isn't just linguistic theory!) Benefits: thematic role labels and semantic predicates; can extend PropBank coverage with VerbNet classes; WordNet sense tags Kingsbury & Kipper, NAACL03, Text Meaning Workshop http://www.cs.rochester.edu/~gildea/VerbNet/
51
CIS630, 9/13/04 51 Penn Word Senses in PropBank Instructions to ignore word sense proved not feasible for 700+ verbs: Mary left the room / Mary left her daughter-in-law her pearls in her will Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary How do these relate to traditional word senses in WordNet?
52
CIS630, 9/13/04 52 Penn WordNet: call, 28 senses, groups [Diagram: the 28 senses of call clustered into groups labelled Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid]
53
CIS630, 9/13/04 53 Penn Overlap with PropBank Framesets [Diagram: the same sense groups of call, showing how they overlap with PropBank framesets]
54
CIS630, 9/13/04 54 Penn Overlap between Senseval-2 Groups and Framesets – 95% [Diagram: for the verb develop, the WordNet senses (WN1–WN20) are partitioned between Frameset1 and Frameset2] Palmer, Babko-Malaya & Dang, SNLU 2004
55
CIS630, 9/13/04 55 Penn Sense Hierarchy
PropBank Framesets – ITA 94%: coarse-grained distinctions; 20 Senseval-2 verbs with > 1 frameset; MaxEnt WSD system, 73.5% baseline, 90% accuracy
Sense Groups (Senseval-2) – ITA 82% -> 89%; intermediate level (includes Levin classes) – 69%
WordNet – ITA 71%: fine-grained distinctions, 60.2%
56
CIS630, 9/13/04 56 Penn Maximum Entropy WSD Hoa Dang - best performer on verbs Maximum entropy framework, p(sense|context) Contextual linguistic features: Topical features for W: +2.5% - keywords (determined automatically) Local syntactic features for W: +1.5 to +5% - presence of subject, complements, passive?; words in subject and complement positions, particles, preps, etc. Local semantic features for W: +6% - semantic class info from WordNet (synsets, etc.), Named Entity tags (PERSON, LOCATION, ...) for proper nouns, words within a +/- 2 word window
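A hedged sketch of a maximum-entropy WSD classifier in this spirit, using scikit-learn's LogisticRegression (a maxent model) over feature dicts; the feature names, sense labels, and tiny training set below are invented for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_features = [
    {"kw:police": 1, "has_obj": 1, "obj_head": "police"},           # "Call the police!"
    {"kw:phone": 1, "has_obj": 1, "obj_head": "office", "w-1": "will"},  # "I will call the office"
]
train_senses = ["call.summon", "call.telephone"]   # hypothetical sense labels

wsd = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
wsd.fit(train_features, train_senses)              # models p(sense | context features)
print(wsd.predict([{"kw:police": 1, "has_obj": 1, "obj_head": "police"}]))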
57
CIS630, 9/13/04 57 Penn A Chinese Treebank Sentence 国会 /Congress 最近 /recently 通过 /pass 了 /ASP 银行法 /banking law “The Congress passed the banking law recently.” (IP (NP-SBJ (NN 国会 /Congress)) (VP (ADVP (ADV 最近 /recently)) (VP (VV 通过 /pass) (AS 了 /ASP) (NP-OBJ (NN 银行法 /banking law)))))
58
CIS630, 9/13/04 58 Penn The Same Sentence, PropBanked Predicate: 通过 (pass), frameset f2; arg0 = 国会 (Congress), argM = 最近 (recently), arg1 = 银行法 (banking law) (IP (NP-SBJ arg0 (NN 国会 )) (VP argM (ADVP (ADV 最近 )) (VP f2 (VV 通过 ) (AS 了 ) arg1 (NP-OBJ (NN 银行法 )))))
59
CIS630, 9/13/04 59 Penn A Korean Treebank Sentence (S (NP-SBJ 그 /NPN+ 은 /PAU) (VP (S-COMP (NP-SBJ 르노 /NPR+ 이 /PCA) (VP (VP (NP-ADV 3/NNU 월 /NNX+ 말 /NNX+ 까지 /PAU) (VP (NP-OBJ 인수 /NNC+ 제의 /NNC 시한 /NNC+ 을 /PCA) 갖 /VV+ 고 /ECS)) 있 /VX+ 다 /EFN+ 고 /PAD) 덧붙이 /VV+ 었 /EPF+ 다 /EFN)./SFN) 그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다. He added that Renault has a deadline until the end of March for a merger proposal.
60
CIS630, 9/13/04 60 Penn The same sentence, PropBanked 덧붙이다 (add): Arg0 = 그는 (he), Arg2 = 르노가 3 월말까지 인수제의 시한을 갖고 있다 (Renault has a deadline until the end of March for a merger proposal) 갖다 (has): Arg0 = 르노가 (Renault), ArgM = 3 월말까지 (until the end of March), Arg1 = 인수제의 시한을 (a deadline for a merger proposal) (S Arg0 (NP-SBJ 그 /NPN+ 은 /PAU) (VP Arg2 (S-COMP (Arg0 NP-SBJ 르노 /NPR+ 이 /PCA) (VP (VP (ArgM NP-ADV 3/NNU 월 /NNX+ 말 /NNX+ 까지 /PAU) (VP (Arg1 NP-OBJ 인수 /NNC+ 제의 /NNC 시한 /NNC+ 을 /PCA) 갖 /VV+ 고 /ECS)) 있 /VX+ 다 /EFN+ 고 /PAD) 덧붙이 /VV+ 었 /EPF+ 다 /EFN)./SFN)
61
CIS630, 9/13/04 61 Penn PropBank I Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
help: REL = helped, Arg0 = tax rates, Arg1 = the company keep its tax outlay flat relative to earnings growth
keep: REL = keep, Arg0 = the company, Arg1 = its tax outlay, Arg3-PRD = flat, ArgM-ADV = relative to earnings growth
[Also shown: event variables (ID# h23, k16), nominal reference, sense tags (help2,5; tax rate1; keep1; company1), discourse connectives]
62
CIS630, 9/13/04 62 Penn PropBank II Nominalizations (NYU lexical frames) – DONE Event variables (including temporals and locatives) More fine-grained sense tagging: tagging nominalizations with WordNet senses; selected verbs and nouns Nominal coreference (not names) Clausal discourse connectives – selected subset
63
CIS630, 9/13/04 63 Penn Summary “Meaning” is shallow semantic annotation that captures critical dependencies, semantic role labels and sense distinctions Supports training of accurate, supervised automatic taggers Methodology ports readily to other languages English PropBank release – spring 2004 Chinese PropBank release – fall 2004 Korean PropBank release – summer 2005