1
Semi-Supervised Natural Language Learning Reading Group
I set up a site at: http://www.cs.cmu.edu/~acarlson/semisupervised/
- Cover other applications of semi-supervised learning? Volunteers?
- Every week or bi-weekly?
- Time change? 1pm? Noon?
2
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Author: David Yarowsky (1995)
Presented by: Andy Carlson
3
Word Sense Disambiguation
Determining what sense of a word is meant in a given sentence:
- “Toyota is considering opening a plant in Detroit.”
- “The banana plant is grown all over the tropics for its fruit.”
This is different from sense induction: we assume we already know the distinct senses.
4
Using unlabeled data
Two properties of language let us use unlabeled data:
- One sense per collocation: nearby words provide strong and consistent clues
- One sense per discourse: within a document, the sense of a word is highly consistent
We can base an iterative bootstrapping algorithm on these two properties.
5
One sense per discourse
- How accurate is it?
- How frequently does it apply?
7
Decision Lists
List of rules of the form “collocation => sense”
Example: life (within 2-10 words) => biological sense of plant
Rules are ordered by log-likelihood ratio
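As a rough sketch (rule strings and scores below are illustrative stand-ins, not values from the paper), a decision list can be held as rules sorted by the absolute log-likelihood ratio log(P(sense_A | collocation) / P(sense_B | collocation)), with the first matching rule deciding:

```python
# Sketch of a decision list for "plant"; scores are made-up illustrations.
rules = [
    ("life",          "living",  8.1),   # (collocation, sense, log-likelihood)
    ("manufacturing", "factory", 7.6),
    ("growth",        "living",  5.2),
]
rules.sort(key=lambda r: r[2], reverse=True)  # strongest evidence first

def classify(context_words, rules):
    """Return the sense of the first (highest-scoring) rule that fires."""
    for word, sense, score in rules:
        if word in context_words:
            return sense
    return None  # abstain when no rule matches

print(classify({"open", "manufacturing", "detroit"}, rules))  # -> "factory"
```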
8
The algorithm: step 1
Find all occurrences of the given polysemous word.
We follow examples for the word plant.
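A minimal sketch of this step, assuming the corpus is a list of plain-text documents; the regex tokenizer and the window size of 10 are simplifying choices, not details from the paper:

```python
import re

def find_occurrences(documents, target="plant", window=10):
    """Collect a token window around every occurrence of the target word."""
    examples = []
    for doc_id, text in enumerate(documents):
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
                examples.append({"doc": doc_id, "context": context, "sense": None})
    return examples
```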
10
Step 2: Initial Labeling
For each sense of the word, identify a small number of training examples.
Strategies: dictionary words, human labelling of the most frequent collocates, or human-chosen collocates.
Example: the words life and manufacturing are used as seed collocations.
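A minimal sketch of the seeding step, using the talk's seed collocations ("life" for the living sense, "manufacturing" for the factory sense) and the example records built above:

```python
seeds = {"life": "living", "manufacturing": "factory"}

def seed_label(examples, seeds):
    """Label an example when exactly one seed sense appears in its context."""
    for ex in examples:
        hits = {seeds[w] for w in ex["context"] if w in seeds}
        if len(hits) == 1:
            ex["sense"] = hits.pop()  # everything else stays unlabeled
    return examples
```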
11
Labeled as ‘living’ plant
12
Unlabeled examples
13
Labeled as ‘factory’ plant
14
Sample initial state
15
Step 3a
Train the decision list based on the current labeling of the examples.
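A minimal sketch of training, assuming the example records from the earlier steps: each context word becomes a candidate rule, scored by a smoothed log-likelihood ratio between the two senses. The smoothing constant alpha is an assumed parameter, not a value from the paper:

```python
import math
from collections import defaultdict

def train_decision_list(examples, senses=("living", "factory"), alpha=0.1):
    counts = defaultdict(lambda: defaultdict(float))
    for ex in examples:
        if ex["sense"] is None:
            continue  # only currently labeled examples contribute evidence
        for w in set(ex["context"]):
            counts[w][ex["sense"]] += 1
    a, b = senses
    rules = []
    for w, c in counts.items():
        score = math.log((c[a] + alpha) / (c[b] + alpha))  # smoothing keeps it finite
        rules.append((w, a if score > 0 else b, abs(score)))
    rules.sort(key=lambda r: r[2], reverse=True)  # strongest rules first
    return rules
```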
16
Step 3b
Apply the learned classifier to all examples.
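A minimal sketch of this step, reusing the rule and example shapes above; the confidence threshold is an assumed parameter. Each example is relabeled by the first rule that fires, and weak matches leave it unlabeled:

```python
def apply_decision_list(examples, rules, threshold=1.0):
    for ex in examples:
        ex["sense"] = None
        context = set(ex["context"])
        for word, sense, score in rules:
            if word in context:
                if score >= threshold:  # below threshold: abstain
                    ex["sense"] = sense
                break  # first matching rule decides
    return examples
```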
17
Step 3c
Optionally, apply the one-sense-per-discourse constraint.
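A minimal sketch of the constraint, again assuming the example records above: within each document, labeled and unlabeled occurrences are pulled to the document's majority sense.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(examples):
    by_doc = defaultdict(list)
    for ex in examples:
        by_doc[ex["doc"]].append(ex)
    for doc_examples in by_doc.values():
        votes = Counter(ex["sense"] for ex in doc_examples if ex["sense"])
        if votes:
            majority, _ = votes.most_common(1)[0]
            for ex in doc_examples:
                ex["sense"] = majority  # override minority labels, fill in gaps
    return examples
```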
18
Step 3c
20
After steps 3b and 3c
21
Step 3d
Repeat step 3 iteratively.
Details: grow the window size for collocations, and randomly perturb the class-inclusion threshold.
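A sketch of the full loop, reusing the helper sketches from the earlier steps. The paper grows the collocation window and randomly perturbs the threshold; this version only decays the threshold deterministically, a simplification:

```python
def bootstrap(documents, seeds, max_iters=20, threshold=3.0):
    examples = find_occurrences(documents)
    seed_label(examples, seeds)
    previous, rules = None, []
    for _ in range(max_iters):
        rules = train_decision_list(examples)
        apply_decision_list(examples, rules, threshold)
        one_sense_per_discourse(examples)
        labeling = tuple(ex["sense"] for ex in examples)
        if labeling == previous:
            break  # converged: the residual set is stable
        previous = labeling
        threshold = max(0.5, threshold * 0.9)  # gradually admit weaker rules
    return rules
```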
22
Step 4
Stop. The algorithm converges to a stable residual set.
23
Sample final state
24
Final decision list
25
Results