WSD using Optimized Combination of Knowledge Sources Authors: Yorick Wilks and Mark Stevenson Presenter: Marian Olteanu.


1 WSD using Optimized Combination of Knowledge Sources
Authors: Yorick Wilks and Mark Stevenson
Presenter: Marian Olteanu

2 Introduction
Regular approaches
  - All words
  - Sample (small trial section)
Problems
  - Ambiguity, especially at fine granularity
  - New senses in text that are not in the dictionary

3 Approach
Integrates partial sources of information
  - Part-of-speech
  - Dictionary definitions
  - Pragmatic codes
  - Selectional restrictions
Integration
  - Filters
  - Partial selectors (taggers)

4 Dictionary for senses
Longman Dictionary of Contemporary English (LDOCE)
Two levels:
  - Homograph
  - Sense

5 Methodology
Preprocessing
  - Part-of-speech tagger (Brill)
Part-of-speech
  - Filter: eliminate all incompatible homographs
  - If no sense remains, keep all senses
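The part-of-speech filter with its fallback can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Sense` record, the `pos_filter` function, and the toy entries for "bank" are all invented here and are not taken from LDOCE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    homograph: int  # homograph number (illustrative, not real LDOCE data)
    sense: int      # sense number within the homograph
    pos: str        # part of speech attached to the homograph
    gloss: str      # toy definition text


def pos_filter(senses, tagged_pos):
    """Keep only senses whose homograph matches the tagged part of speech.

    If the filter would remove every sense (for example because the tagger
    made an error), fall back to the full list, as the slide describes.
    """
    compatible = [s for s in senses if s.pos == tagged_pos]
    return compatible if compatible else list(senses)


# Hypothetical entries for "bank", invented for this example.
bank = [
    Sense(1, 1, "noun", "land along the side of a river"),
    Sense(2, 1, "noun", "a place where money is kept"),
    Sense(3, 1, "verb", "to put money in a bank"),
]

verb_only = pos_filter(bank, "verb")       # one verb sense survives
fallback = pos_filter(bank, "adjective")   # nothing matches, so all are kept
```

The fallback is what makes this a safe filter: a tagging error can never leave a word with no candidate sense.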

6 Methodology (cont.)
Dictionary definitions
  - Partial tagger:
    - Count number of words that appear both in definition and the context
    - Normalize by the length of the definition
    - Return a list of candidate senses
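The overlap count on this slide can be sketched as follows. The tokenization (lowercased whitespace split), the choice to count definition tokens rather than distinct words, and the absence of stop-word removal are assumptions made for the example; the slide does not specify them. The function names and the two toy definitions are invented.

```python
def overlap_score(definition, context):
    """Fraction of definition tokens that also occur in the context."""
    def_words = definition.lower().split()
    context_words = set(context.lower().split())
    if not def_words:
        return 0.0
    shared = sum(1 for word in def_words if word in context_words)
    # Normalizing by definition length stops long definitions from
    # winning simply because they contain more words.
    return shared / len(def_words)


def rank_senses(definitions, context):
    """Return (sense label, score) pairs, best score first."""
    scored = [(label, overlap_score(text, context))
              for label, text in definitions.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy definitions, invented for illustration.
definitions = {
    "river": "land along the side of a river",
    "money": "a place where money is kept",
}
context = "she deposited money at the bank where her salary is paid"
ranking = rank_senses(definitions, context)
```

Here "money" scores 3/6 (it shares "where", "money", and "is" with the context) and "river" scores 1/7 (only "the"), so "money" is listed first. The example also shows why function words can inflate the score when no stop list is used.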

7 Methodology (cont.)
Pragmatic codes
  - Partial tagger - Uses the hierarchy of LDOCE pragmatic codes (subject area)
  - Modified simulated annealing
  - Optimize the number of pragmatic codes of the same type in the sentence
  - Whole paragraph - Only for nouns ?
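The optimization idea can be sketched with plain simulated annealing. Note the assumptions: this is the textbook algorithm, not the modified variant the slide mentions; the objective (number of word pairs whose chosen senses share a code) is one possible reading of "number of pragmatic codes of the same type"; the code hierarchy is ignored; and the subject codes and function names below are invented, not LDOCE's.

```python
import math
import random
from collections import Counter


def agreement(assignment, codes):
    """Number of word pairs whose chosen senses carry the same subject code."""
    counts = Counter(codes[word][idx] for word, idx in assignment.items())
    return sum(n * (n - 1) // 2 for n in counts.values())


def anneal(codes, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a sense assignment that maximizes code agreement."""
    rng = random.Random(seed)
    words = sorted(codes)
    current = {w: rng.randrange(len(codes[w])) for w in words}
    cur_score = agreement(current, codes)
    best, best_score = dict(current), cur_score
    temp = t0
    for _ in range(steps):
        # Propose changing the sense of one randomly chosen word.
        word = rng.choice(words)
        proposal = dict(current)
        proposal[word] = rng.randrange(len(codes[word]))
        delta = agreement(proposal, codes) - cur_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = proposal, cur_score + delta
            if cur_score > best_score:
                best, best_score = dict(current), cur_score
        temp = max(temp * cooling, 1e-6)
    return best, best_score


# Hypothetical subject codes per sense, invented for illustration.
codes = {
    "bank": ["GEOGRAPHY", "ECONOMICS"],
    "interest": ["PSYCHOLOGY", "ECONOMICS"],
    "deposit": ["GEOLOGY", "ECONOMICS"],
}
best, best_score = anneal(codes)
```

With three words the search space has only eight states, so exhaustive search would do; annealing becomes relevant when the unit is a whole paragraph and the number of sense combinations grows multiplicatively.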

8 Methodology (cont.)
Selectional Restrictions
  - Filter
  - LDOCE senses: 35 semantic classes (H = human, M = human male, P = plant, etc.)
  - Nouns: their own type
  - Adjectives: the type of the object they modify
  - Adverbs: the type of their modifier
  - Verbs: the types of their subject (S), direct object (DO), and indirect object (IO)
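A selectional-restriction filter of this kind might look like the sketch below. Only the class letters H, M, and P come from the slide; the parent link from M to H, the sense labels, and the function names are assumptions made for the example, and only the verb-subject case is shown.

```python
# Assumed fragment of a class hierarchy: M (human male) is a kind of H (human).
PARENT = {"M": "H", "H": None, "P": None}


def satisfies(noun_class, required):
    """True if noun_class equals the required class or descends from it."""
    while noun_class is not None:
        if noun_class == required:
            return True
        noun_class = PARENT.get(noun_class)
    return False


def filter_subject_senses(verb_senses, noun_senses):
    """Keep (verb sense, noun sense) pairs that respect the subject restriction.

    verb_senses maps a verb sense label to the class it requires of its subject;
    noun_senses maps a noun sense label to the noun's own class.
    """
    return [
        (verb, noun)
        for verb, required in verb_senses.items()
        for noun, noun_class in noun_senses.items()
        if satisfies(noun_class, required)
    ]


# Hypothetical senses for "the boy grows".
verb_senses = {"grow_1": "P", "grow_2": "H"}  # a plant grows / a person grows up
noun_senses = {"boy_1": "M"}
surviving = filter_subject_senses(verb_senses, noun_senses)
```

Only the pairing of `grow_2` with `boy_1` survives, because M is accepted wherever H is required, while the plant reading is filtered out.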

9 Methodology (cont.)
Combine knowledge sources
  - Decision lists
  - Can assign sense to unknown words, if there is a definition in LDOCE
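A decision list is an ordered set of rules in which the first matching rule decides the outcome. The sketch below shows that mechanism applied to the outputs of the knowledge sources. The rules are hand-written and the feature names are invented for illustration; the slides do not give the actual features or rule order used by the system.

```python
def first_match(rules, features, default="reject"):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(features.get(name) == value for name, value in conditions.items()):
            return label
    return default


# Hypothetical rules, most specific or most reliable first.
RULES = [
    ({"passes_selectional_filter": False}, "reject"),
    ({"top_by_definition": True, "top_by_pragmatic_code": True}, "accept"),
    ({"top_by_definition": True}, "accept"),
    ({"top_by_pragmatic_code": True}, "accept"),
]


def choose_sense(candidates):
    """Return the first candidate sense the decision list accepts, else None."""
    for sense, features in candidates.items():
        if first_match(RULES, features) == "accept":
            return sense
    return None


# Hypothetical knowledge-source outputs for two candidate senses.
candidates = {
    "bank_1": {"passes_selectional_filter": False,
               "top_by_definition": True,
               "top_by_pragmatic_code": False},
    "bank_2": {"passes_selectional_filter": True,
               "top_by_definition": False,
               "top_by_pragmatic_code": True},
}
chosen = choose_sense(candidates)
```

`bank_1` is rejected by the first rule even though the definition tagger prefers it, which illustrates how rule order encodes the relative trust placed in each knowledge source.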

10 Evaluation
Create a corpus based on SemCor (200,000 words; tagged with WordNet senses)
  - SENSUS: merging between LDOCE and WordNet (for Machine Translation)
  - Still ambiguity
  - 36,869 out of 85,747 words (personal opinion: strongly biased)

11 Results
Baseline: 49.8%
70% of the 1st sense: correctly tagged
83.4% accuracy = 92.8% accuracy on all words (!!!)
Test by voting:
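The two accuracy figures on slide 11 can be reconciled with the word counts on slide 10. The check below assumes that 36,869 is the number of ambiguous words, that 83.4% is the accuracy on those words only, and that every remaining word counts as trivially correct. These are readings of the slides, not statements taken from them, and the function name is invented.

```python
def accuracy_on_all_words(total_words, ambiguous_words, accuracy_on_ambiguous):
    """Overall accuracy when unambiguous words are counted as correct."""
    unambiguous = total_words - ambiguous_words
    correct = unambiguous + accuracy_on_ambiguous * ambiguous_words
    return correct / total_words


overall = accuracy_on_all_words(85_747, 36_869, 0.834)
ambiguous_share = 36_869 / 85_747

print(f"ambiguous share: {ambiguous_share:.3f}")
print(f"overall accuracy: {overall:.4f}")
```

Under these assumptions about 43% of the words are ambiguous and the overall figure comes out near 0.9286, in line with the 92.8% on the slide. The gap between the two numbers comes entirely from the unambiguous words, so it says nothing extra about how well the ambiguous ones are handled.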

