1 Coarse-grained Word Sense Disambiguation
Jinying Chen, Martha Palmer March 25th, 2003

2 Outline
Motivation
Supervised Verb Sense Grouping
Unsupervised Verb Frameset Tagging
Future Work

3 Motivation
Fine-grained WSD is difficult for both humans and machines; well-defined sense groups can alleviate this problem (Palmer, Dang, and Fellbaum, 2002)
Potential application in Machine Translation
When building a WSD corpus, the sense hierarchy can improve annotators' tagging speed and accuracy (hopefully?)

4 Outline
Motivation
Supervised Verb Sense Grouping
  What's VSG?
  Using Semantic Features for VSG
  Building Decision Tree for VSG
  Experiment Results
Unsupervised Verb Frameset Tagging
Future Work

5 What's VSG?
[Diagram: WordNet senses (WN1 ... WN20) aggregated into verb sense groups, which in turn map to Frameset1 and Frameset2]

6 Using Semantic Features for VSG
PropBank: each verb is defined by several framesets
All verb instances belonging to the same frameset share a common set of roles
Roles can be ARGn (n=0,1,…) and ARGM-f
Framesets are consistent with verb sense groups
Frameset tags and roles therefore serve as semantic features for VSG

7 Building Decision Tree for VSG
Use C5.0 for the decision tree (DT)
Three feature sets; SF (Simple Feature set) works best (see the sketch after this list):
  VOICE: PAS, ACT
  FRAMESET: 01, 02, …
  ARGn (n=0,1,2,…): 0 (does not occur), 1 (occurs)
  CoreFrame: 01-ARG0-ARG1, 02-ARG0-ARG2, …
  ARGM: 0 (no ARGM), 1 (has ARGM)
  ARGM-f (f=DIS, ADV, …): i (occurs i times)
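A minimal sketch of this SF encoding feeding a decision-tree learner. The slides use C5.0; scikit-learn's DecisionTreeClassifier is only a stand-in here, and the toy instances, feature values, and group labels are invented for illustration.

# Toy SF-style feature records for three hypothetical verb instances.
# DictVectorizer one-hot encodes the categorical values (VOICE, FRAMESET)
# and treats missing keys (e.g. ARG2) as 0, matching the occur/not-occur
# encoding on slide 7.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

instances = [
    {"VOICE": "ACT", "FRAMESET": "01", "ARG0": 1, "ARG1": 1, "ARGM": 0},
    {"VOICE": "PAS", "FRAMESET": "01", "ARG1": 1, "ARGM": 1, "ARGM-TMP": 1},
    {"VOICE": "ACT", "FRAMESET": "02", "ARG0": 1, "ARG2": 1, "ARGM": 0},
]
labels = ["GROUP1", "GROUP1", "GROUP2"]   # gold sense-group tags (invented)

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(instances)
tree = DecisionTreeClassifier(random_state=0).fit(X, labels)

test = {"VOICE": "ACT", "FRAMESET": "02", "ARG0": 1, "ARG2": 1}
print(tree.predict(vec.transform([test])))   # -> ['GROUP2']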

8 Experiment Results Table 2 Error rate of Decision Tree on five verbs

9 Discussion
A simple feature set and a simple DT algorithm work well
Potential sparse-data problem
Complicated DT algorithms (e.g., with boosting) tend to overfit the data
Complex features are not utilized by the model
Solution: use a large corpus, e.g., the parsed BNC corpus without frameset annotation

10 Outline
Task Description
Methodology
Unsupervised Verb Frameset Tagging
  EM Clustering for Frameset Tagging
  Features
  Preliminary Experiment Results
Future Work

11 EM Clustering for Frameset Tagging
We treat a set of features extracted from the parsed sentences as observed variables and assume they are independent given a hidden variable c:

P(c, f_1, …, f_m) = P(c) ∏_i P(f_i | c)    (1)

[Diagram: graphical model in which cluster c is the parent of the observed features f_1, f_2, …, f_m]

12 In the expectation step, we compute the probability of c conditioned on the set of observed features:

P(c | f_1, …, f_m) = P(c) ∏_i P(f_i | c) / Σ_{c'} P(c') ∏_i P(f_i | c')    (2)

In the maximization step, we re-estimate P(c) and P(f_i | c) by maximizing the log-likelihood of all of the observed data. The expectation and maximization steps are repeated for a fixed number of rounds or until the change in the probability parameters P(c) and P(f_i | c) falls below a threshold.
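A minimal numpy sketch of this EM loop, assuming binary (0/1) features; the function and parameter names are assumptions for this sketch, not from the original slides.

import numpy as np

def em_cluster(X, n_clusters=2, n_iter=50, tol=1e-6, seed=0):
    """X: (n_instances, n_features) binary feature matrix."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    p_c = np.full(n_clusters, 1.0 / n_clusters)          # P(c)
    p_f = rng.uniform(0.25, 0.75, size=(n_clusters, m))  # P(f_i = 1 | c)
    for _ in range(n_iter):
        # E-step: posterior P(c | f_1..f_m) per instance, as in eq. (2),
        # computed in log space for numerical stability.
        log_post = (np.log(p_c)
                    + X @ np.log(p_f).T
                    + (1 - X) @ np.log(1 - p_f).T)       # (n, n_clusters)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate P(c) and P(f_i | c) from expected counts,
        # with small additive smoothing to keep probabilities off 0 and 1.
        new_p_c = (post.sum(axis=0) + 1e-3) / (n + n_clusters * 1e-3)
        new_p_f = (post.T @ X + 1e-3) / (post.sum(axis=0)[:, None] + 2e-3)
        done = (np.abs(new_p_c - p_c).max() < tol and
                np.abs(new_p_f - p_f).max() < tol)
        p_c, p_f = new_p_c, new_p_f
        if done:
            break
    return p_c, p_f, post

# Usage: assign each instance to its maximal-posterior cluster (slide 13).
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)
_, _, post = em_cluster(X)
print(post.argmax(axis=1))   # e.g. [0 0 1 1] (cluster IDs are arbitrary)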

13 To do clustering, we compute P(c | f_1, …, f_m) for each verb instance with the same formula as in (2) and assign the instance to the cluster with the maximal posterior probability. To evaluate, we count the majority of the instances in a single cluster that share the same gold-standard frameset; instances outside a cluster's majority are treated as misclassified.
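A small sketch of this evaluation, assuming a list of predicted cluster IDs and gold frameset tags (both inputs here are hypothetical):

# For each cluster, instances carrying the cluster's majority gold
# frameset count as correct; the rest are misclassified.
from collections import Counter

def cluster_accuracy(assignments, gold):
    correct = 0
    for c in set(assignments):
        labels = [g for a, g in zip(assignments, gold) if a == c]
        correct += Counter(labels).most_common(1)[0][1]  # majority size
    return correct / len(gold)

print(cluster_accuracy([0, 0, 1, 1], ["01", "01", "02", "01"]))  # 0.75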

14 Features (an example encoding of one instance follows the list)
WordNet classes for Subject: Person, Animate, State, Event, …
WordNet classes for Object
Passivization: 0, 1
Transitivity: 0, 1
PPs as adjuncts: location, direction, beneficiary, …
Double objects: 0, 1
Clausal complements: 0, 1
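For illustration, one verb instance might be encoded as the record below for the sentence "She lives in Boston." (live 01); the field names are assumptions for this sketch, not from the slides.

# Hypothetical encoding of the slide-14 features for one verb instance.
instance = {
    "subj_wn_class": "Person",    # WordNet class of the subject "She"
    "obj_wn_class": None,         # no direct object
    "passive": 0,                 # not passivized
    "transitive": 0,              # intransitive use
    "pp_adjunct": "location",     # "in Boston" is a locative PP adjunct
    "double_object": 0,
    "clausal_complement": 0,
}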

15 Preliminary Experiment Results
Table 3 Accuracy of EM clustering on five verbs

16 Outline
Task Description
Methodology
Unsupervised Verb Frameset Tagging
Future Work

17 Future Work
Improve the current model by:
  Refining subcategorization extraction
  Using more features
  Example:
    a. He has to live with this programming work. (live 02, endure)
    b. He lived with his relatives. (live 01, inhabit)
Cluster nouns automatically instead of using WordNet to group nouns

18 Thanks!

19 Table 4 Lower bound on Decision Tree error rate

20 Table 5 Error rate of DT with different feature sets

21 Table 6 Accuracy of EM clustering on five verbs

22 What's VSG? Aggregate the senses of a verb into several groups according to their similarities
Example: learn
  GROUP 1: WN1, WN3 (acquire a skill)
  GROUP 2: WN2, WN6 (find out)
  SINGLETON: WN4 (be a student)
  SINGLETON: WN5 (teach)
WordNet meanings (simplified):
  1. acquire or gain knowledge or skills ("She learned dancing")
  2. hear, find out ("I learned that she has two grown-up children")
  3. memorize, con (commit to memory; learn by heart)
  4. study, read, take (be a student of a certain subject; "She is learning for the bar exam")
  5. teach, learn, instruct ("I learned them French")
  6. determine, find out ("I want to learn whether she speaks French")

23 Table 7 Portuguese and German translations of develop
Groups | Senses     | Portuguese     | German
G1     | WN1, WN2   | desenvolver    | entwickeln
G2     |            |                | bilden
G3     | WN8, WN13  |                | ausbilden
G4     | WN5, WN10  | desenvolver-se | sich bilden

