
1 Word Sense Disambiguation
Many words have multiple meanings, e.g. river bank vs. financial bank.
Problem: assign the proper sense to each ambiguous word in text.
Applications:
– Machine translation
– Information retrieval
– Semantic interpretation of text

2 Tagging?
Idea: treat sense disambiguation like POS tagging, just with "semantic tags".
But the problems differ:
– POS tags depend on specific structural cues (mostly neighboring tags).
– Senses depend on semantic context, which is less structured and involves longer-distance dependencies.

3 Approaches
– Supervised learning: learn from a pretagged corpus.
– Dictionary-based learning: learn to distinguish senses from dictionary entries.
– Unsupervised learning: automatically cluster word occurrences into different senses.

4 Evaluation
Train and test on pretagged texts, which are difficult to come by.
Artificial data: 'merge' two words to form an 'ambiguous' word with two 'senses'.
– E.g. replace all occurrences of door and of window with doorwindow and see if the system figures out which is which.
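A minimal sketch of building such a pseudoword corpus (the door/window pair is from the slide; the tokenization and example text are placeholders):

```python
import re

def make_pseudoword_corpus(text, w1="door", w2="window", pseudo="doorwindow"):
    """Replace every occurrence of w1 or w2 with an ambiguous pseudoword,
    keeping the original word as the gold 'sense' label."""
    tokens, gold = [], []
    for tok in text.split():
        bare = re.sub(r"\W", "", tok.lower())
        if bare in (w1, w2):
            tokens.append(pseudo)
            gold.append(bare)  # gold sense = which word it really was
        else:
            tokens.append(tok)
            gold.append(None)
    return tokens, gold

tokens, gold = make_pseudoword_corpus("Open the door and look out the window .")
# A disambiguator is then scored on how often it recovers door vs. window.
```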

5 Performance Bounds
How good is (say) 83.2%? Evaluate performance relative to lower and upper bounds:
– Baseline performance: how well does the simplest "reasonable" algorithm do?
– Human performance: what percentage of the time do people agree on the classification?

6 Supervised Learning
Each ambiguous word token w_i = w^k in the training data is tagged with a sense from s^k_1, …, s^k_{n_k}.
Each word token occurs in a context c_i (usually defined as a window around the word occurrence, up to ~100 words long).
Each context contains a set of words used as features v_j.

7 Bayesian Classification
Bayes decision rule: classify s(w_i) = argmax_s P(s | c_i).
This minimizes the probability of error.
How to compute P(s | c_i)? Use Bayes' theorem:
  P(s | c_i) = P(c_i | s) P(s) / P(c_i)

8 Bayes' Classifier (cont.)
Note that P(c_i) is constant for all senses, therefore:
  s(w_i) = argmax_s P(c_i | s) P(s)

9 Naïve Bayes
Assume:
– Features are conditionally independent, given the example class:
  P(c_i | s_k) = ∏_{v_j in c_i} P(v_j | s_k)
– Feature order doesn't matter (bag-of-words model; repetition counts).

10 Naïve Bayes Training
For all senses s_k of w and all words v_j in the vocabulary, estimate:
  P(v_j | s_k) = C(v_j, s_k) / Σ_{j'} C(v_{j'}, s_k)
For all senses s_k of w, estimate:
  P(s_k) = C(s_k) / C(w)
where C(v_j, s_k) counts occurrences of v_j in contexts tagged with sense s_k, C(s_k) counts occurrences tagged s_k, and C(w) counts all occurrences of w.

11 Naïve Bayes Classification
For all senses s_k of w_i, do:
– score(s_k) = log P(s_k)
– For all words v_j in the context window c_i, do: score(s_k) += log P(v_j | s_k)
Choose s(w_i) = argmax_{s_k} score(s_k).
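A minimal sketch of slides 10–11 in Python, assuming a toy hand-tagged training set for bank (the example contexts are invented, and add-one smoothing is one choice the slides leave open):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(tagged_contexts):
    """tagged_contexts: list of (sense, context_words) pairs for one word.
    Returns log P(s_k) and add-one-smoothed log P(v_j | s_k)."""
    sense_counts = Counter(s for s, _ in tagged_contexts)
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in tagged_contexts:
        word_counts[sense].update(context)
        vocab.update(context)
    log_prior = {s: math.log(n / len(tagged_contexts))
                 for s, n in sense_counts.items()}
    log_likelihood = {
        s: {v: math.log((word_counts[s][v] + 1)
                        / (sum(word_counts[s].values()) + len(vocab)))
            for v in vocab}
        for s in sense_counts}
    return log_prior, log_likelihood, vocab

def classify(context, log_prior, log_likelihood, vocab):
    """score(s_k) = log P(s_k) + sum of log P(v_j | s_k) over context words."""
    scores = {s: log_prior[s] + sum(log_likelihood[s][v]
                                    for v in context if v in vocab)
              for s in log_prior}
    return max(scores, key=scores.get)

data = [("river", "the muddy bank of the river".split()),
        ("finance", "the bank raised interest rates".split()),
        ("river", "fishing from the river bank".split()),
        ("finance", "deposit money at the bank".split())]
model = train_naive_bayes(data)
print(classify("boat moored at the bank of the river".split(), *model))
```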

12 Significant Features
Senses of drug (Gale et al. 1992):
– 'medication': prices, prescription, patent, increase, consumer, pharmaceutical
– 'illegal substance': abuse, paraphernalia, illicit, alcohol, cocaine, traffickers

13 Dictionary-Based Disambiguation
Idea: choose between the senses of a word given in a dictionary, based on the words in the definitions.
Cone:
1. A mass of ovule-bearing or pollen-bearing scales in trees of the pine family or in cycads that are arranged usually on a somewhat elongated axis
2. Something that resembles a cone in shape: as a crisp cone-shaped wafer for holding ice cream

14 Algorithm (Lesk 1986)
Define D_i(w) as the bag of words in the i-th dictionary definition of w.
Define E(w) = ∪_i D_i(w), the union of all of w's definition words.
For all senses s_k of w, do:
  score(s_k) = similarity(D_k(w), ∪_{v_j in c} E(v_j))
Choose s = argmax_{s_k} score(s_k).

15 Similarity Metrics
similarity(X, Y) can be any standard measure of overlap between the two bags, e.g.:
– matching coefficient: |X ∩ Y|
– Dice coefficient: 2|X ∩ Y| / (|X| + |Y|)
– Jaccard coefficient: |X ∩ Y| / |X ∪ Y|
– overlap coefficient: |X ∩ Y| / min(|X|, |Y|)
– cosine: |X ∩ Y| / sqrt(|X| · |Y|)

16 Simple Example
ash:
– s1: a tree of the olive family
– s2: the solid residue left when combustible material is burned
"This cigar burns slowly and creates a stiff ash" → s2
"The ash is one of the last trees to come into leaf" → s1
"After being struck by lightning the olive tree was reduced to ash" → ? (the definition words 'olive' and 'tree' wrongly point to s1)
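A minimal sketch of Lesk scoring on the ash example, using the matching coefficient as the similarity and a tiny hand-picked stopword list (both simplifying assumptions):

```python
STOPWORDS = {"a", "an", "and", "the", "of", "is", "this", "to", "was", "when"}

def lesk(definitions, sentence):
    """Score each sense by word overlap between its definition and the
    context (the matching coefficient |X ∩ Y| from the previous slide)."""
    context = set(sentence.lower().split()) - STOPWORDS
    scores = {s: len((set(d.lower().split()) - STOPWORDS) & context)
              for s, d in definitions.items()}
    return max(scores, key=scores.get), scores

ash = {"s1 (tree)": "a tree of the olive family",
       "s2 (residue)": "the solid residue left when combustible material is burned"}

print(lesk(ash, "The ash is one of the last trees to come into leaf"))
# Tie at 0: 'trees' != 'tree' without stemming -- one reason plain Lesk
# tops out at 50-70% accuracy.
print(lesk(ash, "After being struck by lightning the olive tree was reduced to ash"))
# 'olive' and 'tree' favor s1 even though s2 is correct: the '?' case above.
```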

17 Some Improvements
Lesk obtained results of 50–70% accuracy. Possible improvements:
– Run iteratively, each time using only the definitions of the "appropriate" senses for context words.
– Expand each word to a set of synonyms, using a thesaurus as well.

18 Thesaurus-Based Disambiguation
A thesaurus assigns subject codes to different words, assigning multiple codes to ambiguous words.
– t(s_k) = the subject code of sense s_k of word w in the thesaurus
– δ(t, v) = 1 iff t is a subject code for word v (0 otherwise)

19 Simple Algorithm
Count the number of context words with the same subject code:
for each sense s_k of w_i, do:
  score(s_k) = Σ_{v_j in c_i} δ(t(s_k), v_j)
s(w_i) = argmax_{s_k} score(s_k)
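A minimal sketch of this count; the thesaurus, subject codes, and word lists below are hypothetical stand-ins for a real resource:

```python
# Hypothetical thesaurus: word -> set of subject codes.
THESAURUS = {
    "click": {"COMPUTING"}, "screen": {"COMPUTING"},
    "cat": {"MAMMAL"}, "fur": {"MAMMAL"},
    "mouse": {"MAMMAL", "COMPUTING"},
}

def delta(code, word):
    """1 iff `code` is a subject code for `word` in the thesaurus."""
    return 1 if code in THESAURUS.get(word, set()) else 0

def disambiguate(sense_codes, context):
    """sense_codes: {sense: subject code t(s_k)}; context: surrounding words."""
    scores = {s: sum(delta(t, v) for v in context)
              for s, t in sense_codes.items()}
    return max(scores, key=scores.get)

senses_of_mouse = {"animal": "MAMMAL", "device": "COMPUTING"}
print(disambiguate(senses_of_mouse,
                   "click the left mouse button on the screen".split()))
# -> 'device' (two COMPUTING context words, no MAMMAL ones)
```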

20 Some Issues
– Domain dependence: in computer manuals, "mouse" will not be evidence for the topic "mammal".
– Coverage: "Michael Jordan" is unlikely to be in a thesaurus, but is an excellent indicator for the topic "sports".

21 Tuning for a Corpus
Use a naïve-Bayes formulation:
  P(t | c) = [ P(t) ∏_{v_j in c} P(v_j | t) ] / ∏_{v_j in c} P(v_j)
– Initialize the probabilities as uniform.
– Re-estimate P(t) and P(v_j | t) for each topic t and each word v_j by evaluating all contexts in the corpus, taking a context to have topic t if P(t | c) > α (where α is a predefined threshold).
– Disambiguate by choosing the highest-probability topic.

22 Training (based on the paper):
for all topics t_l, let W_l be the set of words listed under the topic
for all topics t_l, let T_l = { c(w) | w ∈ W_l } (the contexts of those words)
for all words v_j, let V_j = { c | v_j ∈ c }
for all words v_j and topics t_l, let P(v_j | t_l) = |V_j ∩ T_l| / Σ_j |V_j ∩ T_l| (with smoothing)
for all topics t_l, let P(t_l) = Σ_j |V_j ∩ T_l| / Σ_l Σ_j |V_j ∩ T_l|
for all words v_j, let P(v_j) = |V_j| / N_c (where N_c is the total number of contexts)
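A minimal sketch of these counts over a toy corpus, treating each context as a set of words; add-one smoothing is one possible reading of the slide's "with smoothing":

```python
def estimate(contexts, topic_words):
    """contexts: list of word sets c; topic_words: {topic t_l: word set W_l}.
    Returns P(v_j | t_l), P(t_l), and P(v_j) from the counts on this slide."""
    # T_l: the contexts of the words listed under topic t_l.
    T = {t: [c for c in contexts if c & W] for t, W in topic_words.items()}
    vocab = set().union(*contexts)
    # n[t][v] ~ |V_j ∩ T_l|, with add-one smoothing.
    n = {t: {v: sum(1 for c in cs if v in c) + 1 for v in vocab}
         for t, cs in T.items()}
    p_v_given_t = {t: {v: n[t][v] / sum(n[t].values()) for v in vocab}
                   for t in T}
    totals = {t: sum(n[t].values()) for t in T}
    p_t = {t: tot / sum(totals.values()) for t, tot in totals.items()}
    p_v = {v: sum(1 for c in contexts if v in c) / len(contexts) for v in vocab}
    return p_v_given_t, p_t, p_v

contexts = [{"mouse", "click", "screen"}, {"mouse", "fur", "cat"}]
topics = {"COMPUTING": {"click", "screen"}, "MAMMAL": {"cat", "fur"}}
p_v_given_t, p_t, p_v = estimate(contexts, topics)
```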

23 Using a Bilingual Corpus
Use correlations between phrases in two languages to disambiguate.
E.g. interest:
– 'legal share' (acquire an interest) → in German: Beteiligung erwerben
– 'attention' (show interest) → in German: Interesse zeigen
Depending on where the translations of related words occur, determine which sense applies.

24 Scoring
Given a context c in which a syntactic relation R(w, v) holds between w and a context word v:
– The score of sense s_k is the number of contexts c' in the second language such that R(w', v') ∈ c', where w' is a translation of s_k and v' is a translation of v.
– Choose the highest-scoring sense.
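A minimal sketch, assuming relation tuples have already been extracted from the second-language corpus; the German tuples and the extra translation Anteil are hypothetical:

```python
# Hypothetical (verb, object) pairs extracted from a German corpus.
GERMAN_RELATIONS = [("erwerben", "Beteiligung"), ("zeigen", "Interesse"),
                    ("zeigen", "Interesse"), ("erwerben", "Anteil")]

# Translations of each sense of 'interest', and of possible context verbs.
SENSE_TRANSLATIONS = {"legal share": {"Beteiligung", "Anteil"},
                      "attention": {"Interesse"}}
CONTEXT_TRANSLATIONS = {"show": {"zeigen"}, "acquire": {"erwerben"}}

def score_senses(context_verb):
    """Count second-language contexts where a translation of the sense stands
    in the same relation to a translation of the context word."""
    verbs = CONTEXT_TRANSLATIONS[context_verb]
    return {s: sum(1 for v, o in GERMAN_RELATIONS if v in verbs and o in trans)
            for s, trans in SENSE_TRANSLATIONS.items()}

print(score_senses("show"))     # 'attention' wins for "show interest"
print(score_senses("acquire"))  # 'legal share' wins for "acquire an interest"
```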

25 Global Constraints
– One sense per discourse: the sense of a word tends to be consistent within a given document.
– One sense per collocation: nearby words provide strong and consistent clues to sense, depending on distance, order, and syntactic relationship.

26 Examples
Occurrences of plant in document D1:
– living: "existence of plant and animal life"
– living: "classified as either plant or animal"
– ???: "bacterial and plant cells are enclosed"
Occurrences of plant in document D2:
– living: "contains varied plant and animal life"
– living: "the most common plant life"
– factory: "protected by plant parts remaining"
The one-sense-per-discourse constraint suggests resolving the unknown (???) and inconsistent (factory) tags to the document's majority sense, living.

27 Collocations
Collocation: a pair of words that tend to occur together.
Rank the collocations that occur with different senses by:
  rank(f) = P(s_1 | f) / P(s_2 | f)
The higher the rank of a collocation f, the more strongly it indicates its associated sense.

28
for all senses s_k of w do:
  F_k = all collocations in s_k's definition
  E_k = { }
while at least one E_k changed, do:
  for all senses s_k of w do:
    E_k = { c_i | there exists f_m in c_i with f_m in F_k }
  for all senses s_k of w do:
    F_k = { f_m | P(s_k | f_m) / P(s_n | f_m) > α }
for all documents d do:
  determine the majority sense s* of w in d
  assign all occurrences of w in d to s*
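A minimal Python sketch of this bootstrapping loop; the ratio test is approximated with add-one-smoothed counts, and the final one-sense-per-discourse pass over documents is omitted:

```python
def bootstrap(contexts, seed_collocations, alpha=2.0, max_iters=10):
    """contexts: list of word sets around occurrences of w.
    seed_collocations: {sense: collocations taken from its definition}."""
    F = {s: set(f) for s, f in seed_collocations.items()}
    E = {s: set() for s in F}
    vocab = set().union(*contexts)
    for _ in range(max_iters):
        # E_k: contexts containing at least one collocation tied to sense k.
        new_E = {s: {i for i, c in enumerate(contexts) if c & F[s]} for s in F}
        if new_E == E:
            break
        E = new_E
        # F_k: keep features much more frequent in E_k than in the other E_n.
        for s in F:
            others = set().union(*(E[t] for t in F if t != s))
            F[s] = {f for f in vocab
                    if (sum(1 for i in E[s] if f in contexts[i]) + 1)
                       > alpha * (sum(1 for i in others if f in contexts[i]) + 1)}
    return E
```

A context matching both senses' collocation sets would need a tie-break in a full implementation, and the per-document majority vote from the last three lines of the slide would then relabel stragglers.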

