1
Semi-Supervised Natural Language Learning Reading Group
I set up a site at: http://www.cs.cmu.edu/~acarlson/semisupervised/
- Cover other applications of semi-supervised learning? Volunteers?
- Every week or bi-weekly?
- Time change? 1pm? Noon?
2
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Author: David Yarowsky (1995)
Presented by: Andy Carlson
3
Word Sense Disambiguation
Determining what sense of a word is meant in a given sentence:
- “Toyota is considering opening a plant in Detroit.”
- “The banana plant is grown all over the tropics for its fruit.”
This is different from sense induction: we assume we already know the distinct senses.
4
Using unlabeled data
Two properties of language let us use unlabeled data:
- One sense per collocation: nearby words provide strong and consistent clues
- One sense per discourse: within a document, the sense of a word is highly consistent
We can base an iterative bootstrapping algorithm on these two properties.
5
One sense per discourse
- How accurate is it?
- How frequently does it apply?
7
Decision Lists
List of rules of the form “collocation => sense”
Example: life (within 2-10 words) => biological sense of plant
Rules are ordered by log-likelihood ratio
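As a rough sketch (rule strings and scores below are illustrative stand-ins, not values from the paper), a decision list can be held as rules sorted by the absolute log-likelihood ratio log(P(sense_A | collocation) / P(sense_B | collocation)), with the first matching rule deciding:

```python
# Sketch of a decision list for "plant"; scores are made-up illustrations.
rules = [
    ("life",          "living",  8.1),   # (collocation, sense, log-likelihood)
    ("manufacturing", "factory", 7.6),
    ("growth",        "living",  5.2),
]
rules.sort(key=lambda r: r[2], reverse=True)  # strongest evidence first

def classify(context_words, rules):
    """Return the sense of the first (highest-scoring) rule that fires."""
    for word, sense, score in rules:
        if word in context_words:
            return sense
    return None  # abstain when no rule matches

print(classify({"open", "manufacturing", "detroit"}, rules))  # -> "factory"
```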
8
The algorithm: step 1
Find all occurrences of the given polysemous word.
We follow examples for the word plant.
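A minimal sketch of this step, assuming the corpus is a list of plain-text documents; the regex tokenizer and the window size of 10 are simplifying choices, not details from the paper:

```python
import re

def find_occurrences(documents, target="plant", window=10):
    """Collect a token window around every occurrence of the target word."""
    examples = []
    for doc_id, text in enumerate(documents):
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
                examples.append({"doc": doc_id, "context": context, "sense": None})
    return examples
```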
10
Step 2: Initial Labeling
For each sense of the word, identify a small number of training examples.
Strategies: dictionary words, human labelling of the most frequent collocates, or human-chosen collocates.
Example: the words life and manufacturing are used as seed collocations.
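A minimal sketch of the seeding step, using the talk's seed collocations ("life" for the living sense, "manufacturing" for the factory sense) and the example records built above:

```python
seeds = {"life": "living", "manufacturing": "factory"}

def seed_label(examples, seeds):
    """Label an example when exactly one seed sense appears in its context."""
    for ex in examples:
        hits = {seeds[w] for w in ex["context"] if w in seeds}
        if len(hits) == 1:
            ex["sense"] = hits.pop()  # everything else stays unlabeled
    return examples
```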
11
Labeled as ‘living’ plant
12
Unlabeled examples
13
Labeled as ‘factory’ plant
14
Sample initial state
15
Step 3a
Train the decision list based on the current labeling of the examples.
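A minimal sketch of training, assuming the example records from the earlier steps: each context word becomes a candidate rule, scored by a smoothed log-likelihood ratio between the two senses. The smoothing constant alpha is an assumed parameter, not a value from the paper:

```python
import math
from collections import defaultdict

def train_decision_list(examples, senses=("living", "factory"), alpha=0.1):
    counts = defaultdict(lambda: defaultdict(float))
    for ex in examples:
        if ex["sense"] is None:
            continue  # only currently labeled examples contribute evidence
        for w in set(ex["context"]):
            counts[w][ex["sense"]] += 1
    a, b = senses
    rules = []
    for w, c in counts.items():
        score = math.log((c[a] + alpha) / (c[b] + alpha))  # smoothing keeps it finite
        rules.append((w, a if score > 0 else b, abs(score)))
    rules.sort(key=lambda r: r[2], reverse=True)  # strongest rules first
    return rules
```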
16
Step 3b
Apply the learned classifier to all examples.
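A minimal sketch of this step, reusing the rule and example shapes above; the confidence threshold is an assumed parameter. Each example is relabeled by the first rule that fires, and weak matches leave it unlabeled:

```python
def apply_decision_list(examples, rules, threshold=1.0):
    for ex in examples:
        ex["sense"] = None
        context = set(ex["context"])
        for word, sense, score in rules:
            if word in context:
                if score >= threshold:  # below threshold: abstain
                    ex["sense"] = sense
                break  # first matching rule decides
    return examples
```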
17
Step 3c
Optionally, apply the one-sense-per-discourse constraint.
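A minimal sketch of the constraint, again assuming the example records above: within each document, labeled and unlabeled occurrences are pulled to the document's majority sense.

```python
from collections import Counter, defaultdict

def one_sense_per_discourse(examples):
    by_doc = defaultdict(list)
    for ex in examples:
        by_doc[ex["doc"]].append(ex)
    for doc_examples in by_doc.values():
        votes = Counter(ex["sense"] for ex in doc_examples if ex["sense"])
        if votes:
            majority, _ = votes.most_common(1)[0]
            for ex in doc_examples:
                ex["sense"] = majority  # override minority labels, fill in gaps
    return examples
```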
18
Step 3c
20
After steps 3b and 3c
21
Step 3d
Repeat step 3 iteratively.
Details: grow the window size for collocations, and randomly perturb the class-inclusion threshold.
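A sketch of the full loop, reusing the helper sketches from the earlier steps. The paper grows the collocation window and randomly perturbs the threshold; this version only decays the threshold deterministically, a simplification:

```python
def bootstrap(documents, seeds, max_iters=20, threshold=3.0):
    examples = find_occurrences(documents)
    seed_label(examples, seeds)
    previous, rules = None, []
    for _ in range(max_iters):
        rules = train_decision_list(examples)
        apply_decision_list(examples, rules, threshold)
        one_sense_per_discourse(examples)
        labeling = tuple(ex["sense"] for ex in examples)
        if labeling == previous:
            break  # converged: the residual set is stable
        previous = labeling
        threshold = max(0.5, threshold * 0.9)  # gradually admit weaker rules
    return rules
```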
22
Step 4
Stop. The algorithm converges to a stable residual set.
23
Sample final state
24
Final decision list
25
Results