1. Word Sense Disambiguation
– Many words have multiple meanings, e.g., river bank vs. financial bank
– Problem: assign the proper sense to each ambiguous word in text
– Applications: machine translation, information retrieval, semantic interpretation of text
2. Tagging?
– Idea: treat sense disambiguation like POS tagging, just with “semantic tags”
– The problems differ: POS tags depend on specific structural cues (mostly neighboring tags), while senses depend on semantic context – less structured, with longer-distance dependencies
3. Approaches
– Supervised learning: learn senses from a pretagged corpus
– Dictionary-based learning: learn to distinguish senses from dictionary entries
– Unsupervised learning: automatically cluster word occurrences into different senses
4. Evaluation
– Train and test on pretagged texts – difficult to come by
– Artificial data: ‘merge’ two words to form an ‘ambiguous’ pseudo-word with two ‘senses’, e.g., replace all occurrences of door and of window with doorwindow and see if the system figures out which is which
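The pseudo-word construction above is easy to script. A minimal sketch (the function and variable names are illustrative, not from the slides): every occurrence of either word is replaced by the merged token, and the original word is recorded as the gold-standard “sense” for later scoring.

```python
def make_pseudoword_corpus(sentences, word_a, word_b, merged):
    """Replace each occurrence of word_a or word_b with the merged
    pseudo-word, recording the true 'sense' for later evaluation."""
    corpus, gold = [], []
    for sent in sentences:
        new_tokens = []
        for tok in sent.split():
            if tok in (word_a, word_b):
                new_tokens.append(merged)
                gold.append(tok)          # remember which word it really was
            else:
                new_tokens.append(tok)
        corpus.append(" ".join(new_tokens))
    return corpus, gold

corpus, gold = make_pseudoword_corpus(
    ["open the door quietly", "look out the window"],
    "door", "window", "doorwindow")
```

A disambiguation system is then run on `corpus` and its sense assignments are compared against `gold`.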
5. Performance Bounds
– How good is (say) 83.2%?
– Evaluate performance relative to lower and upper bounds:
– Baseline performance: how well does the simplest “reasonable” algorithm do?
– Human performance: what percentage of the time do people agree on the classification?
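One common baseline is to always guess the most frequent sense in the annotated data. A minimal sketch (function name and example counts are illustrative):

```python
from collections import Counter

def mfs_baseline_accuracy(gold_senses):
    """Lower bound: accuracy of always guessing the most frequent sense."""
    counts = Counter(gold_senses)
    _, freq = counts.most_common(1)[0]
    return freq / len(gold_senses)

# e.g. 7 'financial' vs. 3 'river' taggings of "bank"
acc = mfs_baseline_accuracy(["financial"] * 7 + ["river"] * 3)
```

If a system cannot beat this number, its context model is adding nothing.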
6. Supervised Learning
– Each ambiguous word token w_i = w^k in the training data is tagged with a sense from s_1^k, …, s_{n_k}^k
– Each word token occurs in a context c_i (usually defined as a window around the word occurrence – up to ~100 words long)
– Each context contains a set of words used as features v_j
7. Bayesian Classification
– Bayes decision rule: classify s(w_i) = argmax_s P(s | c_i)
– This minimizes the probability of error
– How to compute P(s | c_i)? Use Bayes’ theorem:
  P(s | c_i) = P(c_i | s) P(s) / P(c_i)
8. Bayes’ Classifier (cont.)
– Note that P(c_i) is constant over all senses, therefore:
  s(w_i) = argmax_s P(c_i | s) P(s)
9. Naïve Bayes
– Assume features are conditionally independent given the example’s class, and that feature order doesn’t matter (bag-of-words model – repetition counts):
  P(c_i | s) = ∏_{v_j ∈ c_i} P(v_j | s)
10. Naïve Bayes Training
– For all senses s_k of w and all words v_j in the vocabulary, estimate:
  P(v_j | s_k) = C(v_j, s_k) / C(s_k)
– For all senses s_k of w, estimate:
  P(s_k) = C(s_k) / C(w)
(C(v_j, s_k) counts occurrences of v_j in contexts tagged with sense s_k; C(s_k) and C(w) count occurrences of the sense and of the word.)
11. Naïve Bayes Classification
– For all senses s_k of w_i:
  score(s_k) = log P(s_k)
  for all words v_j in the context window c_i: score(s_k) += log P(v_j | s_k)
– Choose s(w_i) = argmax_{s_k} score(s_k)
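The training and classification steps above fit in a few lines of code. A minimal sketch, using add-one smoothing in place of the unsmoothed maximum-likelihood estimates (the function names and the tiny training set are illustrative):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, alpha=1.0):
    """examples: list of (context_words, sense) pairs.
    Returns add-alpha smoothed log-priors and log-likelihoods."""
    sense_counts = Counter(s for _, s in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for ctx, s in examples:
        word_counts[s].update(ctx)
        vocab.update(ctx)
    log_prior = {s: math.log(c / len(examples))
                 for s, c in sense_counts.items()}
    log_like = {}
    for s in sense_counts:
        total = sum(word_counts[s].values())
        log_like[s] = {v: math.log((word_counts[s][v] + alpha) /
                                   (total + alpha * len(vocab)))
                       for v in vocab}
    return log_prior, log_like

def classify(context, log_prior, log_like):
    """score(s) = log P(s) + sum of log P(v | s); unseen words are skipped."""
    def score(s):
        return log_prior[s] + sum(log_like[s].get(v, 0.0) for v in context)
    return max(log_prior, key=score)

examples = [(["prices", "prescription", "patent"], "medication"),
            (["abuse", "illicit", "cocaine"], "illegal_substance"),
            (["consumer", "pharmaceutical", "prices"], "medication")]
lp, ll = train_naive_bayes(examples)
sense = classify(["illicit", "abuse"], lp, ll)
```

Working in log space, as on the slide, avoids underflow when multiplying many small probabilities.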
12. Significant Features
– Senses of drug (Gale et al. 1992):
  ‘medication’: prices, prescription, patent, increase, consumer, pharmaceutical
  ‘illegal substance’: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers
13. Dictionary-Based Disambiguation
– Idea: choose between the senses of a word given in a dictionary, based on the words in their definitions
– Cone:
  1. A mass of ovule-bearing or pollen-bearing scales in trees of the pine family or in cycads that are arranged usually on a somewhat elongated axis
  2. Something that resembles a cone in shape: as a crisp cone-shaped wafer for holding ice cream
14. Algorithm (Lesk 1986)
– Define D_i(w) as the bag of words in the i-th definition of w
– Define E(w) = ∪_i D_i(w)
– For all senses s_k of w:
  score(s_k) = similarity(D_k(w), ∪_{v_j ∈ c} E(v_j))
– Choose s = argmax_{s_k} score(s_k)
15. Similarity Metrics
– Common choices for similarity(X, Y) between word sets:
  matching coefficient: |X ∩ Y|
  Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)
  Jaccard coefficient: |X ∩ Y| / |X ∪ Y|
  cosine: |X ∩ Y| / √(|X| · |Y|)
16. Simple Example
– ash:
  s_1: a tree of the olive family
  s_2: the solid residue left when combustible material is burned
– This cigar burns slowly and creates a stiff ash → s_2
– The ash is one of the last trees to come into leaf → s_1
– After being struck by lightning the olive tree was reduced to ash → ? (the context word olive misleadingly matches the definition of s_1)
17. Some Improvements
– Lesk obtained accuracies of 50–70%
– Possible improvements:
  Run iteratively, each time using only the definitions of the “appropriate” senses for context words
  Expand each word to a set of synonyms, using a thesaurus
18. Thesaurus-Based Disambiguation
– A thesaurus assigns subject codes to different words, with multiple codes for ambiguous words
– t(s_k) = subject code of sense s_k of word w in the thesaurus
– δ(t, v) = 1 iff t is a subject code for word v (and 0 otherwise)
19. Simple Algorithm
– Count the context words carrying the same subject code:
  for each sense s_k of w_i: score(s_k) = Σ_{v_j ∈ c_i} δ(t(s_k), v_j)
  s(w_i) = argmax_{s_k} score(s_k)
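The subject-code count above is straightforward to implement. A minimal sketch (the sense names, subject codes, and dictionaries are invented for illustration):

```python
def thesaurus_disambiguate(senses, subject_codes, context):
    """senses: {sense: its subject code t(s_k)};
    subject_codes: {word: set of subject codes in the thesaurus}.
    score(s_k) counts context words sharing s_k's subject code."""
    def score(s):
        code = senses[s]
        return sum(1 for v in context
                   if code in subject_codes.get(v, set()))
    return max(senses, key=score)

senses = {"mammal": "ANIMAL", "device": "COMPUTING"}
codes = {"keyboard": {"COMPUTING", "MUSIC"},
         "click": {"COMPUTING"},
         "cat": {"ANIMAL"}}
sense = thesaurus_disambiguate(senses, codes, ["click", "the", "keyboard"])
```

Words absent from the thesaurus (here "the") simply contribute nothing to any score, which is exactly the coverage problem raised on the next slide.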
20. Some Issues
– Domain dependence: in computer manuals, “mouse” will not be evidence for the topic “mammal”
– Coverage: “Michael Jordan” is not likely to be in a thesaurus, but is an excellent indicator for the topic “sports”
21. Tuning for a Corpus
– Use a naïve-Bayes formulation:
  P(t | c) = P(t) ∏_{v_j ∈ c} P(v_j | t) / ∏_{v_j ∈ c} P(v_j)
– Initialize the probabilities as uniform
– Re-estimate P(t) and P(v_j | t) for each topic t and word v_j by evaluating all contexts in the corpus, assuming a context has topic t if P(t | c) > α (where α is a predefined threshold)
– Disambiguate by choosing the highest-probability topic
22. Training (based on the paper)
– For all topics t_l, let W_l be the set of words listed under the topic
– For all topics t_l, let T_l = { c(w) | w ∈ W_l }
– For all words v_j, let V_j = { c | v_j ∈ c }
– For all words v_j and topics t_l:
  P(v_j | t_l) = |V_j ∩ T_l| / Σ_j |V_j ∩ T_l|  (with smoothing)
– For all topics t_l:
  P(t_l) = Σ_j |V_j ∩ T_l| / Σ_l Σ_j |V_j ∩ T_l|
– For all words v_j:
  P(v_j) = |V_j| / N_c  (where N_c is the total number of contexts)
23. Using a Bilingual Corpus
– Use correlations between phrases in two languages to disambiguate, e.g.:
  interest = ‘legal share’ (acquire an interest) → German: Beteiligung erwerben
  interest = ‘attention’ (show interest) → German: Interesse zeigen
– Depending on where the translations of related words occur, determine which sense applies
24. Scoring
– Given a context c in which a syntactic relation R(w, v) holds between w and a context word v:
  The score of sense s_k is the number of contexts c′ in the second language such that R(w′, v′) ∈ c′, where w′ is a translation of s_k and v′ is a translation of v
  Choose the highest-scoring sense
25. Global Constraints
– One sense per discourse: the sense of a word tends to be consistent within a given document
– One sense per collocation: nearby words provide strong and consistent clues to the sense, depending on distance, order, and syntactic relationship
26. Examples
– D_1:
  living – existence of plant and animal life
  living – classified as either plant or animal
  ??? – bacterial and plant cells are enclosed
– D_2:
  living – contains varied plant and animal life
  living – the most common plant life
  factory – protected by plant parts remaining
27. Collocations
– Collocation: a pair of words that tend to occur together
– Rank the collocations f that occur with different senses by:
  rank(f) = P(s_1 | f) / P(s_2 | f)
– A higher rank means the collocation f is more strongly indicative of its sense
28. Bootstrapping Algorithm
– for all senses s_k of w: F_k = all collocations in s_k’s definition; E_k = {}
– while at least one E_k changes:
  for all senses s_k of w: E_k = { c_i | ∃ f_m ∈ c_i such that f_m ∈ F_k }
  for all senses s_k of w: F_k = { f_m | P(s_k | f_m) / P(s_n | f_m) > α }
– for all documents d:
  determine the majority sense s* of w in d
  assign all occurrences of w in d to s*
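The loop above alternates between labeling contexts with the current collocation sets and growing those sets from the contexts just labeled. A heavily simplified sketch (growing feature sets by a raw frequency cutoff instead of the likelihood-ratio ranking, and omitting the one-sense-per-discourse pass; all names and data are illustrative):

```python
from collections import Counter

def bootstrap(contexts, seed_collocations, max_iters=10):
    """contexts: list of word lists around the ambiguous word.
    seed_collocations: {sense: set of seed feature words}."""
    features = {s: set(f) for s, f in seed_collocations.items()}
    labels = {}
    for _ in range(max_iters):
        # E-step analogue: label each context by its best-matching sense
        new_labels = {}
        for i, ctx in enumerate(contexts):
            hits = {s: len(features[s] & set(ctx)) for s in features}
            best = max(hits, key=hits.get)
            if hits[best] > 0:            # leave unmatched contexts unlabeled
                new_labels[i] = best
        if new_labels == labels:          # converged: no E_k changed
            break
        labels = new_labels
        # grow F_k from words frequent in the contexts labeled s_k
        for s in features:
            counts = Counter(w for i, lab in labels.items() if lab == s
                             for w in contexts[i])
            features[s] |= {w for w, c in counts.items() if c >= 2}
    return labels

contexts = [["animal", "life"], ["animal", "cells"],
            ["factory", "parts"], ["factory", "equipment"]]
seeds = {"living": {"animal"}, "factory": {"factory"}}
labels = bootstrap(contexts, seeds)
```

Each iteration lets confidently labeled contexts contribute new collocations, so the labeled sets E_k expand outward from the seeds.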