
Slide 1: Putting Meaning into Your Trees. Martha Palmer, CIS630, September 13, 2004.

Slide 2: Meaning?
- A complete representation of real-world knowledge: Natural Language Understanding (NLU)?
- We can only build useful representations for small vocabularies.
- This is a major impediment to accurate Machine Translation, Information Retrieval, and Question Answering.

Slide 3: Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank
  - Captures shallow semantics
  - Associated lexical frame files
  - Supports training of an automatic tagger
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet
- Future directions

Slide 4: Ask Jeeves - a Q/A, IR example
Query: What do you call a successful movie?
- Tips on Being a Successful Movie Vampire... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires"... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer. (Blockbuster)

Slide 5: Ask Jeeves - filtering with POS tags
Query: What do you call a successful movie?
- Tips on Being a Successful Movie Vampire... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires"... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.

Slide 6: Filtering out "call the police"
call(you, movie, what) ≠ call(you, police)
Different senses: different syntax, different participants.

Slide 7: Machine Translation - lexical choice and Word Sense Disambiguation
Iraq lost the battle.
  Ilakuka centwey ciessta. (Korean)  [Iraq] [battle] [lost]
John lost his computer.
  John-i computer-lul ilepelyessta. (Korean)  [John] [computer] [misplaced]

Slide 8: Cornerstone: an English lexical resource
- that provides sets of possible syntactic frames for verbs, and
- that provides clear, replicable sense distinctions.
Ask Jeeves: Who do you call for a good electronic lexical database for English?

Slide 9: WordNet - Princeton (Miller 1985, Fellbaum 1998)
An on-line lexical reference (dictionary):
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets (synsets)
- Other relations include hypernyms (ISA), antonyms, meronyms
- Typical top nodes (5 out of 25):
  - (act, action, activity)
  - (animal, fauna)
  - (artifact)
  - (attribute, property)
  - (body, corpus)

Slide 10: WordNet - call, 28 senses (first four shown)
1. name, call -- (assign a specified, proper name to; "They named their son David"; ...) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; ...) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; ...) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
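[Editor's note: a minimal sketch of how this sense inventory can be inspected programmatically, using NLTK's WordNet interface. It assumes NLTK and its WordNet data are installed; sense counts in current WordNet releases may differ from the 28 cited on this 2004 slide.]

# List the verb senses of "call" from WordNet via NLTK.
# Assumes `pip install nltk` and `nltk.download('wordnet')` have been run.
from nltk.corpus import wordnet as wn

for i, syn in enumerate(wn.synsets("call", pos=wn.VERB), start=1):
    lemmas = ", ".join(lemma.name() for lemma in syn.lemmas())
    print(f"{i:2d}. {lemmas} -- {syn.definition()}")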

Slide 11: WordNet - Princeton (Miller 1985, Fellbaum 1998)
Limitations as a computational lexicon:
- Contains little syntactic information (Comlex has syntax but no sense distinctions)
- No explicit lists of participants
- Sense distinctions are very fine-grained, and definitions are often vague
- This causes problems when creating training data for supervised machine learning (SENSEVAL-2):
  - Verbs with > 16 senses (including call)
  - Inter-annotator agreement (ITA) 73%
  - Automatic Word Sense Disambiguation (WSD) 60.2% (Dang & Palmer, SIGLEX02)

Slide 12: WordNet - call, 28 senses
(figure: the 28 WordNet senses of call shown as an unstructured scatter, WN1-WN28)

Slide 13: WordNet - call, 28 senses, Senseval-2 groups (engineering!)
(figure: the 28 senses clustered into groups labeled Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid)

Slide 14: Grouping improved scores: ITA 82%, MaxEnt WSD 69%
- Call: 31% of errors were due to confusion between senses within group 1:
  - name, call -- (assign a specified, proper name to; "They named their son David")
  - call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard")
  - call -- (consider or regard as being; "I would not call her beautiful")
- 75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
(Palmer, Dang & Fellbaum, submitted, NLE)

Slide 15: Groups
- Based on VerbNet, an English lexical resource under development,
- which is in turn based on Levin's English verb classes...

Slide 16: Levin classes (Levin, 1993)
- 3100 verbs; 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.
  John cut the bread. / *The bread cut. / Bread cuts easily.
  John hit the wall. / *The wall hit. / *Walls hit easily.

Slide 17: Levin classes (Levin, 1993)
- Verb class hierarchy: 3100 verbs; 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.  (change-of-state)
  John cut the bread. / *The bread cut. / Bread cuts easily.  (change-of-state, recognizable action, sharp instrument)
  John hit the wall. / *The wall hit. / *Walls hit easily.  (contact, exertion of force)
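[Editor's note: a minimal sketch of the "syntactic signature" idea from this slide. The class names and alternation labels below are illustrative simplifications, not Levin's actual inventory; the point is only that a class is identified by which alternations its members license.]

# Illustrative sketch: a verb class is characterized by the alternations it allows.
SIGNATURES = {
    "break-class": {"causative/inchoative": True,  "middle": True},   # The jar broke. / Jars break easily.
    "cut-class":   {"causative/inchoative": False, "middle": True},   # *The bread cut. / Bread cuts easily.
    "hit-class":   {"causative/inchoative": False, "middle": False},  # *The wall hit. / *Walls hit easily.
}

def matching_classes(observed):
    """Return the classes whose signatures agree with the observed alternation judgments."""
    return [cls for cls, sig in SIGNATURES.items()
            if all(sig.get(alt) == ok for alt, ok in observed.items())]

# A verb that allows both the inchoative and the middle patterns behaves like "break":
print(matching_classes({"causative/inchoative": True, "middle": True}))  # ['break-class']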

Slide 18: Limitations of Levin classes
- Coverage of only about half of the verb types in the Penn Treebank (1M words, WSJ)
- Usually only one or two basic senses are covered for each verb
- Confusing sets of alternations:
  - Different classes have almost identical "syntactic signatures"
  - or, worse, contradictory signatures
(Dang, Kipper & Palmer, ACL98)

Slide 19: Multiple class listings
- Homonymy or polysemy? (draw a picture vs. draw water from the well)
- Conflicting alternations?
  - Carry verbs disallow the conative (*she carried at the ball), but include {push, pull, shove, kick, yank, tug},
  - which are also in the Push/Pull class, which does take the conative (she kicked at the ball).

Slide 20: Intersective Levin classes
(figure: overlapping classes distinguished by frames - "at" (no change of location), "across the room" (change of location), "apart" (change of state))
(Dang, Kipper & Palmer, ACL98)

Slide 21: Intersective Levin classes
- More syntactically and semantically coherent:
  - sets of syntactic patterns
  - explicit semantic components
  - relations between senses
VerbNet: www.cis.upenn.edu/verbnet

Slide 22: VerbNet - Karin Kipper
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles, and syntactic frames
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member is linked to WordNet synset(s) (not all WordNet senses are covered)
(Dang, Kipper & Palmer, IJCAI00, Coling00)

Slide 23: Semantic role labels
Grace broke the LCD projector.
  break(agent(Grace), patient(LCD-projector))
  cause(agent(Grace), change-of-state(LCD-projector))
  broken(LCD-projector)
  agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
  patient(P) -> affected(P), change(P), ...

Slide 24: VerbNet entry for leave
Levin class: future_having-13.3
- WordNet senses: leave (WN 2, 10, 13), promise, offer, ...
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantic roles:
  "I promised somebody my time"        Agent V Recipient Theme
  "I left my fortune to Esmerelda"     Agent V Theme Prep(to) Recipient
  "I offered my services"              Agent V Theme
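[Editor's note: a rough illustration of what such a class entry contains, rendering the slide's leave entry as a plain data structure. The field names are mine, not VerbNet's schema; real VerbNet entries are distributed as XML.]

# Sketch of the future_having-13.3 entry from the slide as a Python dict.
future_having_13_3 = {
    "class": "future_having-13.3",
    "members": {"leave": ["WN 2", "WN 10", "WN 13"], "promise": [], "offer": []},
    "thematic_roles": {
        "Agent": "[+animate OR +organization]",
        "Recipient": "[+animate OR +organization]",
        "Theme": "[]",
    },
    "frames": [
        {"example": "I promised somebody my time",
         "syntax": ["Agent", "V", "Recipient", "Theme"]},
        {"example": "I left my fortune to Esmerelda",
         "syntax": ["Agent", "V", "Theme", "Prep(to)", "Recipient"]},
        {"example": "I offered my services",
         "syntax": ["Agent", "V", "Theme"]},
    ],
}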

Slide 25: Hand-made resources vs. real data
- VerbNet is based on linguistic theory - how useful is it?
- How well does it correspond to the syntactic variations found in naturally occurring text?
=> PropBank

Slide 26: Proposition Bank: from sentences to propositions (predicates!)
Powell met Zhu Rongji                 -> Proposition: meet(Powell, Zhu Rongji)
Powell met with Zhu Rongji
Powell and Zhu Rongji met
Powell and Zhu Rongji had a meeting
...
(similarly for debate, consult, join, wrestle, battle: meet(Somebody1, Somebody2))
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
  meet(Powell, Zhu)
  discuss([Powell, Zhu], return(X, plane))

Slide 27: Capturing semantic roles*
- Jerry broke [PATIENT the laser pointer].
- [PATIENT The windows] were broken by the hurricane.
- [PATIENT The vase] broke into pieces when it toppled over.
(The PATIENT surfaces as object, passive subject, or intransitive subject.)

Slide 28: Capturing semantic roles*
- Jerry broke [Arg1 the laser pointer].
- [Arg1 The windows] were broken by the hurricane.
- [Arg1 The vase] broke into pieces when it toppled over.
*See also FrameNet: http://www.icsi.berkeley.edu/~framenet/

Slide 29: A TreeBanked phrase
A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP a GM-Jaguar pact)
   (VP would
       (VP give
           (NP the US car maker)
           (NP (NP an eventual 30% stake)
               (PP-LOC in (NP the British company))))))

Slide 30: The same phrase, PropBanked
A GM-Jaguar pact would give the U.S. car maker an eventual 30% stake in the British company.
  Arg0: a GM-Jaguar pact
  Arg2: the US car maker
  Arg1: an eventual 30% stake in the British company
  give(GM-J pact, US car maker, 30% stake)

Slide 31: Frames file example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
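[Editor's note: a frames-file entry pairs a roleset with its argument descriptions, and an annotated instance simply fills those slots. The sketch below is a simplified rendering; actual PropBank frame files are XML, and the roleset id give.01 is assumed here for the sense shown.]

# Simplified sketch of the roleset for "give" and the labels it licenses for the
# double-object example on the slide. Real frame files are XML; "give.01" is assumed.
GIVE_ROLESET = {
    "id": "give.01",
    "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

annotation = {
    "REL":  "gave",
    "Arg0": "The executives",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

# Check that every labeled argument is licensed by the roleset.
assert all(arg in GIVE_ROLESET["roles"] for arg in annotation if arg != "REL")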

Slide 32: Annotation procedure
- PTB II: extract all sentences containing a given verb
- Create a frame file for that verb (Paul Kingsbury)
  - 3100+ lemmas, 4700 framesets, 120K predicates
- First pass: automatic tagging (Joseph Rosenzweig)
- Second pass: double-blind hand correction
  - Inter-annotator agreement 84%
- Third pass: "Solomonization" (adjudication) (Olga Babko-Malaya)

Slide 33: Annotator accuracy - ITA 84%

Slide 34: Trends in argument numbering
- Arg0 = prototypical agent (Dowty)
- Arg1 = prototypical patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point

Slide 35: Additional tags (arguments or adjuncts?)
A variety of ArgM's (Arg# > 4):
- TMP: when?
- LOC: where at?
- DIR: where to?
- MNR: how?
- PRP: why?
- REC: himself, themselves, each other
- PRD: this argument refers to or modifies another
- ADV: others

Slide 36: Inflection, etc.
- Verbs are also marked for tense/aspect:
  - Passive/active
  - Perfect/progressive
  - Third singular (is, has, does, was)
  - Present/past/future
  - Infinitives/participles/gerunds/finites
- Modals and negations are marked as ArgMs

Slide 37: PropBank/FrameNet
Buy:  Arg0: buyer   Arg1: goods   Arg2: seller   Arg3: rate   Arg4: payment
Sell: Arg0: seller  Arg1: goods   Arg2: buyer    Arg3: rate   Arg4: payment
PropBank is broader, more neutral, and more syntactic - it maps readily to VN, TR, FN.
(Rambow et al., PMLB03)

Slide 38: Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank
  - Captures shallow semantics
  - Associated lexical frame files
  - Supports training of an automatic tagger
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet

Slide 39: Approach (automatic semantic role labeling)
- Pre-processing: a heuristic that filters out unwanted constituents with high confidence
- Argument identification: a binary SVM classifier that identifies argument constituents
- Argument classification: a multi-class SVM classifier that tags arguments as ARG0-5, ARGA, or ARGM
(A sketch of this pipeline follows.)
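[Editor's note: a minimal sketch of the three-stage architecture, using scikit-learn SVMs as stand-ins. The `featurize` function and the constituent dictionaries are hypothetical placeholders, and both classifiers would have to be fit on labeled PropBank training data before `predict` can be called.]

# Sketch of the prune -> identify -> classify pipeline with scikit-learn SVMs.
from sklearn.svm import SVC

identifier = SVC(kernel="poly", degree=2)   # binary: argument vs. non-argument
labeler = SVC(kernel="poly", degree=2)      # multi-class: ARG0-5, ARGA, ARGM

def prune(constituents):
    """Stage 1: heuristically discard constituents that are very unlikely arguments."""
    return [c for c in constituents if c.get("phrase_type") not in {"-NONE-", "PUNCT"}]

def label_arguments(constituents, featurize):
    """Stages 2 and 3: identify argument constituents, then assign role labels."""
    candidates = prune(constituents)
    keep = identifier.predict([featurize(c) for c in candidates])
    args = [c for c, k in zip(candidates, keep) if k]
    labels = labeler.predict([featurize(c) for c in args]) if args else []
    return list(zip(args, labels))

Note that scikit-learn's SVC handles multi-class problems with a pairwise (one-vs-one) scheme, which is the strategy attributed to the Penn system on the discussion slide below.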

Slide 40: Automatic semantic role labeling - basic features of the stochastic model
- Predicate (the verb)
- Phrase type (e.g., NP or SBAR)
- Parse tree path (see the sketch below)
- Position (before/after the predicate)
- Voice (active/passive)
- Head word of the constituent
- Subcategorization
(Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02)
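[Editor's note: of these, the parse tree path deserves a concrete example. The helper below is my own sketch (not Gildea & Jurafsky's code), computed over an NLTK tree: it records the chain of constituent labels from the candidate node up to the lowest common ancestor and down to the predicate.]

# Sketch of the parse-tree-path feature over an nltk.Tree.
# "^" marks upward steps, "_" downward steps, e.g. NP^S_VP_VBD for a subject NP.
from nltk import Tree

def path_feature(tree, constituent_pos, predicate_pos):
    """Path of node labels from the constituent to the predicate via their lowest common ancestor."""
    i = 0
    while (i < min(len(constituent_pos), len(predicate_pos))
           and constituent_pos[i] == predicate_pos[i]):
        i += 1
    up = [tree[constituent_pos[:j]].label() for j in range(len(constituent_pos), i - 1, -1)]
    down = [tree[predicate_pos[:j]].label() for j in range(i + 1, len(predicate_pos) + 1)]
    return "^".join(up) + "_" + "_".join(down)

t = Tree.fromstring("(S (NP (NNP Jerry)) (VP (VBD broke) (NP (DT the) (NN pointer))))")
print(path_feature(t, (0,), (1, 0)))   # NP^S_VP_VBD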

Slide 41: Discussion Part I - Szuting Yi
Comparisons between the Pradhan and Penn (SVM) systems:
- Both systems are SVM-based
- Kernel: Pradhan uses a degree-2 polynomial kernel; Penn uses a degree-3 RGB kernel
- Multi-classification: Pradhan uses a one-versus-others approach; Penn uses a pairwise approach
- Features: Pradhan adds rich features, including named entities, head-word POS, partial path, verb classes, verb sense, head word of PP, first or last word/POS in the constituent, constituent tree distance, constituent relative features, temporal cue words, and dynamic class context (Pradhan et al., 2004)

Slide 42: Discussion Part II - different features for different subtasks (analysis of the basic features)
- Path:
  - Best feature for identification: captures the syntactic configuration between a constituent and the predicate
  - Bad feature for classification: doesn't discriminate constituents at the same level, and doesn't give a full view of the subcat frame
- Subcat, voice:
  - Bad features for identification: they do not vary with the current constituent
- Head word, phrase type:
  - Good for classification if represented in combination with the predicate
(Xue & Palmer, EMNLP04)

Slide 43: Discussion Part III - new features (Bert Xue)
- Syntactic frame:
  - uses NPs as "pivots"
  - varies with position within the frame
  - lexicalized with the predicate
- Predicate combined with:
  - head word
  - phrase type
  - head word of the PP parent
- Position combined with voice

Slide 44: Results
Data  System (feature set)                P    R    F1    A
2002  G&P                                 71   64   67    77.0
2002  G&H                                 76   68   72    -
2002  Pradhan (basic)                     83   79   81    87.9
2002  SVM-RGB Penn (basic)                -    -    -     93.1
2002  Pradhan (rich features)             89   85   87    91.0
2004  SVM-RGB Penn (basic)                89   88   -     93.5
2004  Pradhan (rich features)             90   89   -     93.0
2004  MaxEnt Penn (designated features)   -    -    90.6  95.4

Slide 45: Word senses in PropBank
- Orders to ignore word sense were not feasible for 700+ verbs:
  Mary left the room.
  Mary left her daughter-in-law her pearls in her will.
- Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
- Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?

Slide 46: Frames: multiple framesets
- Out of the 787 most frequent verbs:
  - 1 frameset: 521
  - 2 framesets: 169
  - 3+ framesets: 97 (includes light verbs)
- 90% ITA
- Framesets are not necessarily consistent between different senses of the same verb
- Framesets are consistent between different verbs that share similar argument structures (like FrameNet)

Slide 47: Ergative/unaccusative verbs
Roles (no Arg0 for unaccusative verbs):
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3 = start point
  Arg4 = end point
Examples:
  Sales rose 4% to $3.28 billion from $3.16 billion.
  The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

Slide 48: Mapping from PropBank to VerbNet
Frameset id = leave.02, sense = "give", VerbNet class = future_having-13.3
  PropBank Arg0 (giver)       -> VerbNet Agent
  PropBank Arg1 (thing given) -> VerbNet Theme
  PropBank Arg2 (benefactive) -> VerbNet Recipient
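[Editor's note: a minimal sketch of the frameset-to-class mapping on this slide as a data structure. The dictionary layout is illustrative, not the released mapping format.]

# Sketch: relabel PropBank numbered arguments with VerbNet thematic roles.
PB_TO_VN = {
    "leave.02": {
        "verbnet_class": "future_having-13.3",
        "roles": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"},
    }
}

def to_thematic_roles(frameset_id, pb_args):
    """Replace Arg0/Arg1/... labels with VerbNet thematic roles where a mapping exists."""
    roles = PB_TO_VN.get(frameset_id, {}).get("roles", {})
    return {roles.get(arg, arg): filler for arg, filler in pb_args.items()}

print(to_thematic_roles("leave.02",
                        {"Arg0": "I", "Arg1": "my fortune", "Arg2": "Esmerelda"}))
# {'Agent': 'I', 'Theme': 'my fortune', 'Recipient': 'Esmerelda'}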

Slide 49: Mapping from PB to VerbNet

Slide 50: Mapping from PropBank to VerbNet
- Overlap with PropBank framesets:
  - 50,000 PropBank instances
  - 85% covered by VerbNet classes
- Results: MATCH 78.63% (80.90% relaxed) - VerbNet isn't just linguistic theory!
- Benefits:
  - Thematic role labels and semantic predicates
  - Can extend PropBank coverage with VerbNet classes
  - WordNet sense tags
(Kingsbury & Kipper, NAACL03 Text Meaning Workshop; http://www.cs.rochester.edu/~gildea/VerbNet/)

Slide 51: Word senses in PropBank
- Orders to ignore word sense were not feasible for 700+ verbs:
  Mary left the room.
  Mary left her daughter-in-law her pearls in her will.
- Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
- Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses in WordNet?

Slide 52: WordNet - call, 28 senses, groups
(figure: the 28 senses clustered into the Senseval-2 groups - Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid)

Slide 53: Overlap with PropBank framesets
(figure: the same sense groups for call, showing how they overlap with PropBank framesets)

Slide 54: Overlap between Senseval-2 groups and framesets - 95%
(figure: for "develop", WordNet senses WN1-WN20 partitioned into Frameset1 and Frameset2)
(Palmer, Babko-Malaya & Dang, SNLU 2004)

Slide 55: Sense hierarchy
- PropBank framesets - ITA 94%; coarse-grained distinctions
  - 20 Senseval-2 verbs with > 1 frameset; MaxEnt WSD system: 73.5% baseline, 90% accuracy
- Sense groups (Senseval-2) - ITA 82% -> 89%; intermediate level (includes Levin classes); WSD 69%
- WordNet - ITA 71%; fine-grained distinctions; WSD 60.2%

Slide 56: Maximum entropy WSD (Hoa Dang), best performer on verbs
- Maximum entropy framework: p(sense | context)
- Contextual linguistic features:
  - Topical features for W (+2.5%): keywords (determined automatically)
  - Local syntactic features for W (+1.5 to +5%): presence of subject, complements, passive; words in subject and complement positions, particles, prepositions, etc.
  - Local semantic features for W (+6%): semantic class information from WordNet (synsets, etc.), Named Entity tag (PERSON, LOCATION, ...) for proper nouns, words within a +/- 2 word window
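[Editor's note: a rough sketch of how those three feature families might be assembled for one target verb instance. The `instance` fields and helper layout are hypothetical placeholders for the output of a parser, a named-entity tagger, and WordNet lookups; the resulting feature dictionary is what a maximum entropy model would condition on to estimate p(sense | context).]

# Sketch of feature extraction for maximum-entropy verb WSD.
def wsd_features(instance):
    feats = {}
    # Topical features: automatically selected keywords from the wider context.
    for kw in instance.get("keywords", []):
        feats[f"topic={kw}"] = 1
    # Local syntactic features: subject/complement presence, voice, particles, prepositions.
    feats["has_subject"] = int(bool(instance.get("subject")))
    feats["passive"] = int(bool(instance.get("passive")))
    for prep in instance.get("preps", []):
        feats[f"prep={prep}"] = 1
    # Local semantic features: WordNet class of arguments, NE tags, +/-2 word window.
    for role, wn_class in instance.get("arg_synsets", {}).items():
        feats[f"{role}_class={wn_class}"] = 1
    for ne in instance.get("named_entities", []):
        feats[f"ne={ne}"] = 1
    for offset, word in instance.get("window", {}).items():   # e.g. {-2: "I", -1: "tried", 1: "you"}
        feats[f"w[{offset}]={word}"] = 1
    return feats

These dictionaries can be vectorized (for example with scikit-learn's DictVectorizer) and fed to a logistic regression classifier as a stand-in for the MaxEnt model.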

Slide 57: A Chinese Treebank sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
"The Congress passed the banking law recently."
(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))

Slide 58: The same sentence, PropBanked
通过 (pass), frameset f2:
  arg0: 国会 (Congress)
  argM: 最近 (recently)
  arg1: 银行法 (banking law)
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))

Slide 59: A Korean Treebank sentence
그는 르노가 3월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."
(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

Slide 60: The same sentence, PropBanked
덧붙이다 (add): 덧붙이다(그는, 르노가 3월말까지 인수제의 시한을 갖고 있다)
  Arg0: 그는 (he)
  Arg2: 르노가 3월말까지 인수제의 시한을 갖고 있다 (that Renault has a deadline until the end of March for a merger proposal)
갖다 (have): 갖다(르노가, 3월말까지, 인수제의 시한을)
  Arg0: 르노가 (Renault)
  ArgM: 3월말까지 (until the end of March)
  Arg1: 인수제의 시한을 (a deadline for a merger proposal)
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP Arg0 (NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP ArgM (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP Arg1 (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

Slide 61: PropBank I
Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
  help: Arg0 = tax rates, REL = help, Arg1 = the company keep its tax outlay flat relative to earnings growth
  keep: Arg0 = the company, REL = keep, Arg1 = its tax outlay, Arg3-PRD = flat, ArgM-ADV = relative to earnings growth
Also shown (looking ahead to PropBank II): event variables (ID# h23, k16), nominal reference, sense tags (help2,5; tax rate1; keep1; company1), discourse connectives.

Slide 62: PropBank II
- Nominalizations (NYU)
- Lexical frames: DONE
- Event variables (including temporals and locatives)
- More fine-grained sense tagging
  - Tagging nominalizations with WordNet senses
  - Selected verbs and nouns
- Nominal coreference (not names)
- Clausal discourse connectives - a selected subset

Slide 63: Summary
- "Meaning" here is shallow semantic annotation that captures critical dependencies, semantic role labels, and sense distinctions
- Supports training of accurate, supervised automatic taggers
- The methodology ports readily to other languages
- English PropBank release: spring 2004
- Chinese PropBank release: fall 2004
- Korean PropBank release: summer 2005

