1
Unsupervised Word Sense Disambiguation Using the Lesk Algorithm
Su Weifeng, Sep. 22, 2004
2
The SENSEVAL–3 English Lexical Sample Task
Rada Mihalcea, Timothy Chklovski (Senseval-3)
Disambiguating Noun Groupings with Respect to WordNet Senses: Philip Resnik (1995)
Finding Predominant Word Senses in Untagged Text: Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll (ACL 2004, Best Paper Award)
3
An unsupervised WSD method
The Lesk algorithm assigns a target word the sense that is most related to the senses of its neighboring words.
Intuition: related word senses will be defined using similar words, so there will be overlaps in their definitions that indicate their relatedness.
We disambiguate a polysemous word by picking the sense of the target word whose definition has the most words in common with the definitions of the other words in a given window of context.
Key points: the similarity or relatedness measure between senses, and the definition used for each sense.
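A minimal sketch of this overlap idea in Python, using the simplified variant that compares each sense's gloss directly against the context words (assuming NLTK's WordNet interface; the function name and example context are illustrative, not from the original algorithm):

```python
# Simplified Lesk sketch: score each sense of the target word by how many
# of its gloss words also appear in the surrounding context.
from nltk.corpus import wordnet as wn

def lesk_overlap_sense(target, context_words):
    """Return the WordNet sense of `target` whose gloss overlaps the context most."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(target):
        gloss_words = set(sense.definition().lower().split())
        overlap = len(gloss_words & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Example: disambiguate "bank" in a financial context.
print(lesk_overlap_sense("bank", ["deposit", "money", "loan", "interest"]))
```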
4
A gloss-based similarity method is proposed for WSD.
The SENSEVAL-3 English Lexical Sample Task (Rada Mihalcea, Timothy Chklovski)
A precision of 66.1% and a recall of 65.7% (fine-grained scoring) can be obtained on the Senseval-3 English lexical sample task.
The baseline precision for this task was 53.5%.
The second-best unsupervised method on the Senseval-3 English lexical sample task reaches a precision/recall of 56.3%.
5
A gloss-centered method is proposed.
The definition of each sense of the target word is represented as the concatenation of two glosses of that sense:
Descriptive gloss: the description of the sense of the word, e.g. for "tape":
1. (10) tape -- (a long thin piece of cloth or paper as used for binding or fastening; "he used a piece of tape for a belt"; "he wrapped a tape around the package")
2. (2) tape, tape recording, taping -- (a recording made on magnetic tape; "the several recordings were combined on a master tape")
Hypernymy gloss: the hypernym hierarchy the sense belongs to, e.g.:
Vesuvius#n#1 => volcano => mountain, mount => natural elevation, elevation => geological formation => natural object => object, physical object => entity
Cosine similarity between the words of the context and the words of the definition is used as the similarity measure.
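A minimal sketch of this gloss-based scoring, assuming NLTK's WordNet interface; the extended definition is built from the descriptive gloss plus the hypernym chain's glosses, and the helper names are illustrative:

```python
# Sketch: score a sense by the cosine similarity between its extended
# definition (descriptive gloss + hypernym glosses) and the context words.
from collections import Counter
from math import sqrt
from nltk.corpus import wordnet as wn

def extended_definition(sense):
    """Descriptive gloss concatenated with the glosses of all hypernyms."""
    words = sense.definition().lower().split()
    for hyper in sense.closure(lambda s: s.hypernyms()):
        words += hyper.definition().lower().split()
    return words

def cosine(words_a, words_b):
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gloss_based_sense(target, context_words):
    """Pick the sense whose extended definition is most similar to the context."""
    context = [w.lower() for w in context_words]
    return max(wn.synsets(target),
               key=lambda s: cosine(extended_definition(s), context),
               default=None)
```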
6
Disambiguating Noun Groupings with Respect to WordNet Senses (Philip Resnik, 1995)
Task statement: automatic sense disambiguation of nouns appearing within sets of related nouns.
Example: given the group burglars, thief, rob, mugging, stray, robbing, lookout, chase, crate, thieves, we expect to assign a high score to the sense "lookout, lookout man, sentinel, sentry, watch, scout: a person employed to watch for something to happen" and a low score to the senses referring to an observation tower or to the activity of watching.
7
An information-content-based semantic similarity measure is used.
Information content is a measure of specificity that is assigned to each sense in a hierarchy.
A sense with high information content is very specific to a particular topic; senses with lower information content are more general and less specific. For example, carving fork has a higher information content than entity.
The information content of a sense is estimated by counting the frequency of that sense in a large corpus and thereby determining its probability via a maximum likelihood estimate: IC(sense) = -log P(sense).
The similarity between two senses c1 and c2 is the information content of their least common subsumer: sim(c1, c2) = IC(lcs(c1, c2)), where lcs(c1, c2) is the most specific concept that subsumes both c1 and c2 in the hierarchy.
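A minimal sketch of computing these quantities with NLTK's WordNet interface and its Brown-corpus information-content counts; the example synsets are illustrative:

```python
# Resnik-style similarity: the information content of the least common subsumer.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Information-content counts estimated from the Brown corpus (shipped with NLTK).
brown_ic = wordnet_ic.ic('ic-brown.dat')

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# The least common subsumer of dog and cat (e.g. carnivore).
print(dog.lowest_common_hypernyms(cat))

# sim(c1, c2) = IC(lcs(c1, c2))
print(wn.res_similarity(dog, cat, brown_ic))
```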
8
The related words of the target word are treated as context.
Besides performing well at the task of assigning a high score to the best sense, the algorithm also does a good job of assigning low scores to senses that are clearly inappropriate.
9
Finding Predominant Word Senses in Untagged Text (Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll, 2004)
Task statement: use a thesaurus acquired from raw textual corpora to find predominant noun senses automatically.
For each target noun:
Input: the k nearest neighbors with their distributional similarity scores to the target word, and the WordNet similarity package, which gives a semantic similarity measure.
Output: the predominant sense of the target noun.
12
An adapted Lesk algorithm is applied
An information-content-based semantic similarity measure is used: jcn (Jiang-Conrath), where the distance between two senses c1 and c2 is IC(c1) + IC(c2) - 2 IC(lcs(c1, c2)) and the similarity is its inverse.
The semantic similarity is multiplied by the distributional similarity of each neighbor.
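A minimal sketch of this weighting idea, assuming NLTK's jcn similarity and an illustrative, hand-picked neighbor list; the paper's exact prevalence formula also normalizes the semantic similarity over the target's senses, which is omitted here:

```python
# Sketch: rank each noun sense of the target by summing, over distributional
# neighbors, the neighbor's distributional similarity (dss) weighted by the
# best jcn semantic similarity between the sense and any sense of the neighbor.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

def best_jcn(sense, neighbor):
    """Highest jcn similarity between `sense` and any noun sense of `neighbor`."""
    scores = [wn.jcn_similarity(sense, ns, brown_ic)
              for ns in wn.synsets(neighbor, pos=wn.NOUN)]
    return max(scores, default=0.0)

def predominant_sense(target, neighbors):
    """`neighbors` maps each distributionally similar word to its dss score."""
    return max(wn.synsets(target, pos=wn.NOUN),
               key=lambda s: sum(dss * best_jcn(s, n) for n, dss in neighbors.items()),
               default=None)

# Illustrative neighbors and dss scores for the noun "plant".
print(predominant_sense("plant", {"factory": 0.28, "refinery": 0.22, "flower": 0.12}))
```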
13
The method can effectively discover appropriate predominant senses for words within a corpus.
The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL-2 English all-words task. This is nearly as good as using the first sense provided by SemCor.
The method discovers appropriate predominant senses for words from two domain-specific corpora.
When evaluated on SemCor, the method gets a precision of 54%, while random selection reaches only 32%; the first sense in SemCor provides an upper bound of 67% for this task.
14
Conclusions
The gloss and hierarchy of a sense provide valuable information for sense disambiguation.
Unsupervised WSD methods can reach a precision comparable with supervised WSD methods.