Published by Jesse Fletcher. Modified over 9 years ago.
1
CIS630 1 Penn Different Sense Granularities Martha Palmer, Olga Babko-Malaya September 20, 2004
2
Statistical Machine Translation results
  CHINESE TEXT
  The japanese court before china photo trade huge & lawsuit.
  A large amount of the proceedings before the court dismissed workers.
  japan’s court, former chinese servant industrial huge disasters lawsuit.
  Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.
3
Outline
  MT example
  Sense tagging
  Issues highlighted by Senseval1
  Senseval2
  Groupings, impact on ITA
  Automatic WSD, impact on scores
4
WordNet - Princeton
  On-line lexical reference (dictionary)
  Words organized into synonym sets (concepts)
  Hypernyms (ISA), antonyms, meronyms (PART)
  Useful for checking selectional restrictions (doesn’t tell you what they should be)
  Typical top nodes - 5 out of 25
    (act, action, activity)
    (animal, fauna)
    (artifact)
    (attribute, property)
    (body, corpus)
5
WordNet – president, 6 senses
1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE … LEADER
2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE
3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE
4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER LEADER
5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR … LEADER
6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP POSITION
6
Limitations to WordNet
  Poor inter-annotator agreement (73%)
  Just sense tags - no representations
    Very little mapping to syntax
    No predicate argument structure
    No selectional restrictions
  No generalizations about sense distinctions
  No hierarchical entries
7
SIGLEX98/SENSEVAL
  Workshop on Word Sense Disambiguation
  54 attendees, 24 systems, 3 languages
  34 words (nouns, verbs, adjectives)
  Both supervised and unsupervised systems
  Training data, test data
  Hector senses - very corpus based (mapping to WordNet)
  Lexical samples - instances, not running text
  Replicability over 90%, ITA 85%
  ACL-SIGLEX98, SIGLEX99, CHUM00
8
Hector - bother, 10 senses
1. intransitive verb - (make an effort), after negation, usually with to-infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. "About 70 percent of the shareholders did not bother to vote at all."
1.1 (can't be bothered), idiomatic; be unwilling to make the effort needed (to do something). Ex. "The calculations needed are so tedious that theorists cannot be bothered to do them."
2. vi; after neg; with "about" or "with"; rarely cont - (of a person) to concern oneself (about something or someone). "He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor."
2.1 v-passive; with "about" or "with" - (of a person) to be concerned about or interested in (something). "The only thing I'm bothered about is the well-being of the club."
9
Mismatches between lexicons: Hector - WordNet, shake
10
VERBNET
11
VerbNet/WordNet
12
Mapping WN-Hector via VerbNet
  SIGLEX99, LREC00
13
SENSEVAL2 – ACL’01
  Adam Kilgarriff, Phil Edmonds and Martha Palmer
  Tasks: All-words task, Lexical sample task
  Languages: Czech, Basque, Dutch, Chinese, English, Estonian, Italian, Japanese, Korean, Spanish, Swedish
14
English Lexical Sample - Verbs
  Preparation for Senseval 2
    manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
    WordNet (pre-release version 1.7)
  To handle unclear sense distinctions
    detect and eliminate redundant senses
    detect and cluster closely related senses
  NOT ALLOWED
15
WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") -> LABEL
16
WordNet – call, 28 senses
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") -> UTTER
6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") -> MEET
17
Groupings Methodology
  Double blind groupings, adjudication
  Syntactic Criteria (VerbNet was useful)
    Distinct subcategorization frames
      call him a bastard
      call him a taxi
    Recognizable alternations – regular sense extensions:
      play an instrument
      play a song
      play a melody on an instrument
18
Groupings Methodology (cont.)
  Semantic Criteria
    Differences in semantic classes of arguments
      Abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
    Differences in entailments
      Change of prior entity or creation of a new entity?
    Differences in types of events
      Abstract/concrete/mental/emotional/...
    Specialized subject domains
19
WordNet: - call, 28 senses
  [Figure: the 28 senses laid out as a diagram]
  WN2, WN13, WN28 WN15 WN26 WN3 WN19 WN4 WN7 WN8 WN9 WN1 WN22 WN20 WN25 WN18 WN27 WN5 WN16 WN6 WN23 WN12 WN17, WN11 WN10, WN14, WN21, WN24
20
WordNet: - call, 28 senses, groups
  [Figure: the same 28 senses, clustered into labeled groups]
  WN2, WN13, WN28 WN15 WN26 WN3 WN19 WN4 WN7 WN8 WN9 WN1 WN22 WN20 WN25 WN18 WN27 WN5 WN16 WN6 WN23 WN12 WN17, WN11 WN10, WN14, WN21, WN24
  Group labels: Phone/radio, Label, Loud cry, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid
21
WordNet – call, 28 senses, Group1
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL
19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE
22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS
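To make the effect of grouping on agreement concrete, here is a minimal sketch of fine-grained versus coarse-grained scoring. Only the Label group (WN1, WN3, WN19, WN22) comes from the slide; the function names, the choice of simple percent agreement, and the two annotation lists are assumptions made for illustration, and the ITA figures in this deck may have been computed differently.

```python
# Senses of "call" that the slide places in Group 1 (Label).
LABEL_GROUP = {"WN1", "WN3", "WN19", "WN22"}


def to_group(sense):
    """Map a fine-grained sense to its group.

    Assumption: senses outside the Label group are kept as singleton groups,
    because this sketch only encodes the one group listed on the slide.
    """
    return "Label" if sense in LABEL_GROUP else sense


def agreement(tags_a, tags_b, coarse=False):
    """Fraction of instances on which two taggers chose the same sense."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must label the same instances")
    if coarse:
        tags_a = [to_group(t) for t in tags_a]
        tags_b = [to_group(t) for t in tags_b]
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical annotations for five instances of "call".
annotator_a = ["WN1", "WN3", "WN2", "WN19", "WN5"]
annotator_b = ["WN3", "WN3", "WN2", "WN22", "WN6"]

fine = agreement(annotator_a, annotator_b)
coarse = agreement(annotator_a, annotator_b, coarse=True)
print(f"fine: {fine:.2f}, coarse: {coarse:.2f}")
```

Disagreements inside the Label group (WN1 vs. WN3, WN19 vs. WN22) disappear under coarse scoring, while the WN5 vs. WN6 disagreement remains. The same function can score a system against a gold standard by passing system output as one of the two lists.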
22
Sense Groups: verb ‘develop’
  [Figure: WordNet senses of develop arranged into groups]
  WN1 WN2 WN3 WN4 WN6 WN7 WN8 WN5 WN9 WN10 WN11 WN12 WN13 WN14 WN19 WN20
23
Groups 1 and 2 of Develop
  Columns: Group | Sense No. | Gloss | Hypernym
  1 – Abstract | WN1, WN2 | Products, or mental creations; Mental creations – “new theory”; Gradually unfold – “the plot …” | create
  2 – New (property) | WN3, WN4 | Personal attribute – “a passion for …”; Physical characteristic – “a beard” | change
24
Group 3 of Develop
  Group: 3 – New (self)
  Sense No.  Gloss                                      Hypernym
  WN5        Originate – “new religious movement”       become
  WN9        Gradually unfold – “the plot …”            occur
  WN10       Grow – “a flower developed …”              grow
  WN14       Mature – “The child developed …”           change
  WN20       Happen – “report the news as it …”         occur
25
Group 4 of Develop
  Group: 4 – Improve item
  Sense No.  Gloss
  WN6        Resources – “natural resources”
  WN7        Ideas – “ideas in your thesis”
  WN8        Train animate beings – “violinists”
  WN11       Civilize – “developing countries”
  WN12       Make, grow – “develop the grain”
  WN13       Business – “develop the market”
  WN19       Music – “develop the melody”
  Hypernyms (as listed): improve, theorize, teach, change, generate, complicate
26
Maximum Entropy WSD
  Hoa Dang (in progress)
  Maximum entropy framework
    combines different features with no assumption of independence
    estimates conditional probability that W has sense X in context Y (where Y is a conjunction of linguistic features)
    feature weights are determined from training data
    weights produce a maximum entropy probability distribution
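The framework above can be sketched as a small log-linear classifier that estimates p(sense | context) from binary features. This is an illustration under stated assumptions, not the system described in the slides: it trains with plain batch gradient ascent (the slides do not say which estimation procedure was used), applies no regularization, and the feature names and four training examples are invented. The sense labels "request" and "label" are borrowed from the group names for call.

```python
import math
from collections import defaultdict


def predict_proba(weights, senses, features):
    """Return p(sense | features) under a log-linear (maximum entropy) model."""
    scores = {s: sum(weights[(s, f)] for f in features) for s in senses}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def train(data, epochs=200, lr=0.5):
    """Fit one weight per (sense, feature) pair by batch gradient ascent.

    The gradient of the conditional log-likelihood is the observed feature
    count minus the count expected under the current model.
    """
    senses = sorted({gold for _, gold in data})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in data:
            probs = predict_proba(weights, senses, features)
            for f in features:
                for s in senses:
                    observed = 1.0 if s == gold else 0.0
                    grad[(s, f)] += observed - probs[s]
        for key, g in grad.items():
            weights[key] += lr * g / len(data)
    return weights, senses


# Hypothetical training instances of "call": (feature set, sense label).
data = [
    ({"obj=taxi", "has_indirect_obj"}, "request"),
    ({"obj=police"}, "request"),
    ({"obj=son", "has_complement"}, "label"),
    ({"obj=her", "has_complement"}, "label"),
]

weights, senses = train(data)
probs = predict_proba(weights, senses, {"obj=him", "has_complement"})
best = max(probs, key=probs.get)
print(best, probs)
```

The unseen feature `obj=him` contributes nothing, so the prediction rests on `has_complement`, which the toy data associates with the labeling sense. Overlapping features can be added freely because the model never assumes they are independent.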
27
Features used
  Topical contextual linguistic feature for W:
    presence of automatically determined keywords in S
  Local contextual linguistic features for W:
    presence of subject, complements
    words in subject, complement positions, particles, preps
    noun synonyms and hypernyms for subjects, complements
    named entity tag (PERSON, LOCATION, ...) for proper Ns
    words within +/- 2 word window
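Two of the feature types listed above are simple enough to sketch directly: the words within a +/- 2 word window of W, and the presence of topical keywords in the sentence S. The function names and the feature-string format are assumptions for illustration. The keyword set here is hand-picked, whereas the slides say keywords were determined automatically. Syntactic and WordNet-based features need a parser and a lexicon and are left out.

```python
def window_features(tokens, target_index, size=2):
    """Words within +/- `size` positions of the target, tagged with their offset."""
    feats = set()
    for offset in range(-size, size + 1):
        if offset == 0:
            continue  # skip the target word itself
        i = target_index + offset
        if 0 <= i < len(tokens):
            feats.add(f"w[{offset:+d}]={tokens[i].lower()}")
    return feats


def topical_features(tokens, keywords):
    """One feature per keyword that occurs anywhere in the sentence."""
    present = {t.lower() for t in tokens}
    return {f"kw={k}" for k in keywords if k in present}


# Example sentence taken from the WordNet gloss for sense 3 of "call".
sentence = "She called her children lazy and ungrateful".split()
local = window_features(sentence, target_index=1)
topical = topical_features(sentence, keywords={"children", "bank"})
print(sorted(local), sorted(topical))
```

The resulting string sets can be used as the binary features of a log-linear classifier; keeping the offset in the feature name lets the model distinguish a word before the verb from the same word after it.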
28
Maximum Entropy WSD
  Hoa Dang, Senseval2 Verbs (best)
  Maximum entropy framework, p(sense|context)
  Contextual Linguistic Features
    Topical feature for W: +2.5%
      keywords (determined automatically)
    Local syntactic features for W: +1.5 to +5%
      presence of subject, complements, passive?
      words in subject, complement positions, particles, preps, etc.
    Local semantic features for W: +6%
      Semantic class info from WordNet (synsets, etc.)
      Named Entity tag (PERSON, LOCATION, ...) for proper Ns
      words within +/- 2 word window
29
Results - first 5 Senseval2 verbs
  Verb         Begin   Call    Carry   Develop  Draw    Dress
  WN/corpus    10/9    28/14   39/22   21/16    35/21   15/8
  Grp/corp     10/9    11/7    16/11   9/6      15/9    7/4
  Entropy      1.76    3.68    3.97    3.17     4.60    2.89
  ITA-fine     .812    .693    .607    .678     .767    .865
  ITA-coarse   .814    .892    .753    .852     .825    1.00
  MX-fine      .832    .470    .379    .493     .366    .610
  MX-coarse    .832    .636    .485    .681     .512    .898
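The Entropy row summarizes how evenly a verb's instances are spread over its senses. A minimal sketch of that kind of measure follows, assuming Shannon entropy with base-2 logarithms over the sense tags of one verb; the slides do not state the base or the exact set of tags used, and the two tag lists below are invented.

```python
import math
from collections import Counter


def sense_entropy(sense_tags):
    """Shannon entropy, in bits, of the sense distribution for one verb."""
    counts = Counter(sense_tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical corpora: four senses used equally often vs. one dominant sense.
uniform_four = ["s1", "s2", "s3", "s4"] * 5
skewed = ["s1"] * 17 + ["s2", "s3", "s4"]

h_uniform = sense_entropy(uniform_four)
h_skewed = sense_entropy(skewed)
print(f"uniform: {h_uniform:.2f} bits, skewed: {h_skewed:.2f} bits")
```

Higher entropy means no single sense dominates, so a most-frequent-sense guess helps less. That is consistent with the table, where the verbs with the highest entropy (Carry, Draw) also have the lowest fine-grained system scores.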
30
Results – averaged over 28 verbs
               Total
  WN/corpus    16.28/10.83
  Grp/corp     8.07/5.90
  Entropy      2.81
  ITA-fine     71%
  ITA-coarse   82%
  MX-fine      59%
  MX-coarse    69%
31
Grouping improved sense identification for MxWSD
  75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
  Most commonly confused senses suggest grouping:
    (1) name, call -- assign a specified proper name to; "They called their son David"
    (2) call -- ascribe a quality to or give a name that reflects a quality; "He called me a bastard"
    (3) call -- consider or regard as being; "I would not call her beautiful"
    (4) address, call -- greet, as with a prescribed form, title, or name; "Call me Mister"; "She calls him by his first name"
32
Criteria to split Framesets
  Semantic classes of arguments, such as animacy vs. inanimacy
  Serve 01. Act, work
    Group 1: function (His freedom served him well)
    Group 2: work (He served in Congress)
33
Criteria to split Framesets
  Semantic type of event (abstract vs. concrete)
  See 01. View
    Group 1: Perceive by sight (Can you see the bird?)
    Group 5: determine, check (See whether it works)
34
Overlap with PropBank Framesets
  [Figure: grouped senses of call overlaid with PropBank Framesets]
  WN5, WN16, WN12 WN15 WN26 WN3 WN19 WN4 WN7 WN8 WN9 WN1 WN22 WN20 WN25 WN18 WN27 WN2 WN13 WN6 WN23 WN28 WN17, WN11 WN10, WN14, WN21, WN24
  Group labels: Loud cry, Label, Phone/radio, Bird or animal cry, Request, Call a loan/bond, Visit, Challenge, Bid
35
Overlap between Senseval2 Groups and Framesets – 95%
  [Figure: senses of develop divided between Frameset1 and Frameset2]
  WN1 WN2 WN3 WN4 WN6 WN7 WN8 WN5 WN9 WN10 WN11 WN12 WN13 WN14 WN19 WN20
36
Framesets → Groups → WordNet
  [Figure: senses of drop divided among Frameset1, Frameset2 and Frameset3]
  WN1 WN2 WN9 WN8 WN3 WN4 WN12 WN5 WN16 WN18 WN14 WN7 WN15 WN10 WN6 WN13 WN11
39
Translations of Develop groups
  Columns: Group | Sense No. | Portuguese | German
  Group: G4, G1, G2, G4, G3
  Sense No.: WN13 markets, WN1 products, WN2 ways, WN2 theory, WN3 understanding, WN2 character, WN10 bacteria, WN5 movements
  Portuguese: desenvolver, desenvolver-se
  German: entwickeln, bilden, ausbilden, bilden, sich bilden
40
Translations of Develop groups
  Columns: Group | Sense No. | Chinese | Korean
  Group: G4, G1, G2, G4, G3
  Sense No.: WN13 markets, WN1 products, WN2 ways, WN2 theory, WN3 understanding, WN2 character, WN10 bacteria, WN5 movements
  Chinese: kai1-fa1, fa1-zhan3, pei2-yang3-chu1, pei2-yang3, fa1-yu4, xing2-cheng2
  Korean: hyengsengha-ta, kaypalha-ta, palcensikhi-ta, yangsengha-ta, paltalha-ta, hyengsengtoy-ta
41
An Example of Mapping: verb ‘serve’
  Assignment: Do you agree?
  Frameset id = serve.01
    serve 01: Act, work
    Roles: Arg0: worker; Arg1: job, project; Arg2: employer
  Sense Groups
    GROUP 1: WN1 (function), WN3 (contribute to), WN12 (answer)
    GROUP 2: WN2 (do duty), WN13 (do military service)
    GROUP 3: WN4 (be used by), WN8 (serve well), WN14 (service)
    GROUP 5: WN7 (devote one’s efforts), WN10 (attend to)
42
Frameset Tagging Results: overall accuracy 90%* (baseline 73.5%)
  Verb      Framesets  Instances  Accuracy
  call      11         522        0.835
  carry     4          195        0.933
  develop   2          240        0.938
  draw      3          94         0.926
  leave     3          147        0.762
  pull      6          88         0.784
  serve     2          150        0.967
  use       2          820        0.988
  work      7          398        0.955
  * Gold Standard parses
43
Sense Hierarchy
  PropBank Framesets – ITA 94%
    coarse grained distinctions
    20 Senseval2 verbs w/ > 1 Frameset
    Maxent WSD system, 73.5% baseline, 90% accuracy
  Sense Groups (Senseval-2) - ITA 82% (now 89%)
    Intermediate level (includes Levin classes) – 69%
  WordNet – ITA 71%
    fine grained distinctions, 60.2%
44
Summary of WSD
  Choice of features is more important than choice of machine learning algorithm
  Importance of syntactic structure (English WSD but not Chinese)
  Importance of dependencies
  Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages