1
WSD using Optimized Combination of Knowledge Sources
Authors: Yorick Wilks and Mark Stevenson
Presenter: Marian Olteanu
2
Introduction
  Regular approaches
    All words
    Sample (small trial section)
  Problems
    Ambiguity, especially at fine granularity
    New senses in text that are not in the dictionary
3
Approach
  Integrates partial sources of information
    Part-of-speech
    Dictionary definitions
    Pragmatic codes
    Selectional restrictions
  Integration
    Filters
    Partial selectors (taggers)
4
Dictionary for senses
  Longman Dictionary of Contemporary English (LDOCE)
  Two levels:
    Homograph
    Sense
5
Methodology
  Preprocessing
    Part-of-speech tagger (Brill)
  Part-of-speech
    Filter – eliminate all incompatible homographs
    If no sense remains – keep all senses
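The part-of-speech filter with its fallback can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `pos_filter` function, the dictionary layout, and the toy `bank` entries are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical representation: each homograph is a dict with an id and a coarse POS.
Homograph = Dict[str, str]


def pos_filter(homographs: List[Homograph], tagged_pos: str) -> List[Homograph]:
    """Keep only the homographs compatible with the tagger's POS.

    If nothing survives the filter, keep every homograph (the fallback
    described on the slide).
    """
    compatible = [h for h in homographs if h["pos"] == tagged_pos]
    return compatible if compatible else list(homographs)


# Toy entries, invented for illustration (not taken from LDOCE).
bank = [
    {"id": "bank_1", "pos": "noun"},
    {"id": "bank_2", "pos": "verb"},
]

kept = pos_filter(bank, "noun")           # only the noun homograph remains
fallback = pos_filter(bank, "adjective")  # no match, so all homographs are kept
```

The fallback guards against tagger errors: a wrong POS tag would otherwise remove every candidate for the word.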
6
Methodology (cont.)
  Dictionary definitions
    Partial tagger:
      Count number of words that appear both in definition and the context
      Normalize by the length of the definition
      Return a list of candidate senses
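The three steps of the definition-overlap tagger can be sketched as follows. This is a simplified illustration under stated assumptions: tokenization is plain whitespace splitting, no stop words are removed, and the function names and the two toy definitions are invented for the example rather than taken from LDOCE or the paper.

```python
from typing import Dict, List, Set, Tuple


def overlap_score(definition: str, context: Set[str]) -> float:
    """Count definition words that also occur in the context,
    normalized by the length of the definition."""
    words = definition.lower().split()
    if not words:
        return 0.0
    shared = sum(1 for w in words if w in context)
    return shared / len(words)


def rank_senses(definitions: Dict[str, str], context_text: str) -> List[Tuple[str, float]]:
    """Return candidate senses ordered by normalized overlap, best first."""
    context = set(context_text.lower().split())
    scored = [(sense, overlap_score(text, context)) for sense, text in definitions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy definitions, invented for illustration.
definitions = {
    "bank_money": "place where money is kept",
    "bank_river": "land along the side of a river",
}
ranked = rank_senses(definitions, "he sat on the land by the river")
```

Here `bank_river` shares three of its seven definition words with the context (`land`, `the`, `river`), while `bank_money` shares none. Dividing by definition length keeps long definitions from winning simply because they contain more words.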
7
Methodology (cont.)
  Pragmatic codes
    Partial tagger
    Uses the hierarchy of LDOCE pragmatic codes (subject area)
    Modified simulated annealing
    Optimize the number of pragmatic codes of the same type in the sentence
    Whole paragraph
    Only for nouns ?
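The idea of choosing senses so that as many words as possible share a subject code can be illustrated with plain simulated annealing. This sketch does not reproduce the modified algorithm the slide mentions: the objective (number of word pairs sharing a code), the linear cooling schedule, and the code labels `ECON`, `GEO`, `SPORT`, and `MED` are all assumptions made for the example.

```python
import math
import random
from collections import Counter
from typing import List


def agreement(codes: List[str]) -> int:
    """Number of word pairs that share the same subject code."""
    return sum(n * (n - 1) // 2 for n in Counter(codes).values())


def anneal(candidates: List[List[str]], steps: int = 2000,
           t0: float = 2.0, seed: int = 0) -> List[str]:
    """Pick one code per word, trying to maximize agreement()."""
    rng = random.Random(seed)
    state = [rng.choice(options) for options in candidates]
    best = list(state)
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temp = t0 * (1 - step / steps) + 1e-9
        i = rng.randrange(len(candidates))
        proposal = list(state)
        proposal[i] = rng.choice(candidates[i])
        delta = agreement(proposal) - agreement(state)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state = proposal
        if agreement(state) > agreement(best):
            best = list(state)
    return best


# Three nouns, each with two candidate subject codes (hypothetical labels).
candidates = [["ECON", "GEO"], ["ECON", "SPORT"], ["ECON", "MED"]]
best = anneal(candidates)
```

In this toy case the only assignment in which all three nouns agree is `ECON` for each of them, so that is the state the search should settle on.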
8
Methodology (cont.)
  Selectional restrictions
    Filter
    LDOCE senses – 35 semantic classes (H = human, M = human male, P = plant, etc.)
    Nouns – their type
    Adjectives – the type of the object they modify
    Adverbs – the type of their modifier
    Verbs – the types of the subject (S), direct object (DO), and indirect object (IO)
9
Methodology (cont.)
  Combine knowledge sources
    Decision lists
    Can assign sense to unknown words, if there is a definition in LDOCE
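A decision list is an ordered sequence of rules in which the first rule that fires decides the outcome. The sketch below only shows how such a list is applied to the outputs of the partial taggers. The rules, their order, the source names `definitions` and `pragmatic`, and the `first_sense` default are hand-written assumptions for illustration; in practice a decision list is learned from tagged data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each knowledge source proposes a sense, or None if it has no opinion.
Proposals = Dict[str, Optional[str]]
# A rule is a test on the proposals plus the name of the source to trust.
Rule = Tuple[Callable[[Proposals], bool], str]


def apply_decision_list(rules: List[Rule], proposals: Proposals, default: str) -> str:
    """Return the sense chosen by the first rule whose test fires."""
    for test, source in rules:
        if test(proposals):
            sense = proposals[source]
            if sense is not None:
                return sense
    return default


# Hypothetical rule order, most trusted first.
rules: List[Rule] = [
    # Both sources propose the same sense.
    (lambda p: p["definitions"] is not None and p["definitions"] == p["pragmatic"],
     "definitions"),
    # The pragmatic-code tagger has an opinion.
    (lambda p: p["pragmatic"] is not None, "pragmatic"),
    # Fall back on definition overlap.
    (lambda p: p["definitions"] is not None, "definitions"),
]

agreed = apply_decision_list(rules, {"definitions": "bank_1", "pragmatic": "bank_1"}, "first_sense")
only_defs = apply_decision_list(rules, {"definitions": "bank_2", "pragmatic": None}, "first_sense")
no_opinion = apply_decision_list(rules, {"definitions": None, "pragmatic": None}, "first_sense")
```

Because the rules test the taggers' outputs rather than the word itself, the same list applies to any word that has an LDOCE entry, including words absent from the training data.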
10
Evaluation
  Create a corpus based on SemCor (200,000 words; tagged with WordNet senses)
  SENSUS – merging between LDOCE and WordNet (for Machine Translation)
  Still ambiguity
  36,869 out of 85,747 words (personal opinion: strongly biased)
11
Results
  Baseline: 49.8%
  70% of the 1st sense – correctly tagged
  83.4% accuracy = 92.8% accuracy on all words (!!!)
  Test by voting: