Word Sense Disambiguation
Ling571: Deep Processing Techniques for NLP
February 28, 2011

Word Sense Disambiguation: Robust Approaches
- Similarity-based approaches
  - Thesaurus-based techniques: Resnik/Lin similarity
- Unsupervised, distributional approaches
  - Word-space (Infomap)
- Why they work; why they don't

Resnik's Similarity Measure
- Information content of a node: IC(c) = -log P(c)
- Least common subsumer (LCS): the lowest node in the hierarchy subsuming two nodes
- Similarity measure: sim_Resnik(c1, c2) = -log P(LCS(c1, c2))
- Issue: the measure reflects only the content of the LCS, not the difference between each node and its LCS (the gap Lin's measure addresses)
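
A minimal sketch of this measure using NLTK's WordNet interface (assuming the nltk 'wordnet' and 'wordnet_ic' data packages are installed):

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Information content estimated from the Brown corpus
brown_ic = wordnet_ic.ic('ic-brown.dat')

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# sim_Resnik(c1, c2) = -log P(LCS(c1, c2)): the IC of the lowest common subsumer
print(dog.res_similarity(cat, brown_ic))
```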

Application to WSD
- Calculate informativeness for each node in WordNet:
  - Sum occurrences of the concept and all its children
  - Compute IC
- Disambiguate with WordNet:
  - Assume a set of words in context, e.g. {plants, animals, rainforest, species} from an article
  - Find the most informative subsumer I for each pair: find the LCS for each pair of senses, pick the highest similarity
  - For each subsumed sense, Vote += I
  - Select the sense with the highest vote
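
The voting scheme can be sketched roughly as follows; this is a simplification (votes here go only to the senses that realize the best pair), not Resnik's exact formulation, and the context words follow the slide's example:

```python
from itertools import combinations
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

def disambiguate_group(words):
    """Each pair of context words votes for the noun senses realizing
    their most informative subsumer (highest Resnik similarity)."""
    votes = {w: {s: 0.0 for s in wn.synsets(w, pos='n')} for w in words}
    for w1, w2 in combinations(words, 2):
        best, best_pairs = 0.0, []
        for s1 in wn.synsets(w1, pos='n'):
            for s2 in wn.synsets(w2, pos='n'):
                sim = s1.res_similarity(s2, brown_ic)  # IC of the pair's LCS
                if sim > best:
                    best, best_pairs = sim, [(s1, s2)]
                elif sim == best:
                    best_pairs.append((s1, s2))
        for s1, s2 in best_pairs:
            votes[w1][s1] += best  # Vote += I
            votes[w2][s2] += best
    return {w: max(ss, key=ss.get) for w, ss in votes.items() if ss}

print(disambiguate_group(['plant', 'animal', 'rainforest', 'species']))
```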

IC Example

Label the First Use of "Plant"

Biological example: "There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered."

Industrial example: "The Paulus company was founded in … Since those days the product range has been the subject of constant expansion and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our product range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…"

Sense Labeling Under WordNet
- Use local content words as clusters
  - Biology: plants, animals, rainforests, species…
  - Industry: company, products, range, systems…
- Find common ancestors in WordNet
  - Biology: plant & animal isa living thing
  - Industry: product & plant isa artifact isa entity
- Use the most informative result: correct selection in both cases
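
In NLTK, the common-ancestor step looks roughly like this (the WordNet sense numbers below are assumptions; check wn.synsets('plant') and wn.synsets('product') for the actual inventory):

```python
from nltk.corpus import wordnet as wn

living_plant = wn.synset('plant.n.02')    # assumed: the living-organism sense
factory_plant = wn.synset('plant.n.01')   # assumed: the industrial-building sense
animal = wn.synset('animal.n.01')
product = wn.synset('product.n.02')       # assumed: the artifact sense

# Biology context: plant & animal meet low in the hierarchy (a living-thing-like node)
print(living_plant.lowest_common_hypernyms(animal))

# Industry context: plant & product meet only at a generic artifact/entity-level node
print(factory_plant.lowest_common_hypernyms(product))
```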

Thesaurus Similarity Issues
- Coverage:
  - Few languages have large thesauri
  - Few languages have large sense-tagged corpora
- Thesaurus design:
  - Works well for the noun IS-A hierarchy
  - Verb hierarchy is shallow, bushy, and less informative

Distributional Similarity
- Unsupervised approach: clustering, WSD, automatic thesaurus enrichment
- Insight: "You shall know a word by the company it keeps!" (Firth, 1957)
  - A bottle of tezguino is on the table.
  - Everybody likes tezguino.
  - Tezguino makes you drunk.
  - We make tezguino from corn.
  - Tezguino: a corn-based, alcoholic beverage

Distributional Similarity
- Represent the 'company' of a word such that similar words have similar representations
  - 'Company' = context
- Word represented by a context feature vector; many alternatives for the vector
- Initial representation: 'bag of words' binary feature vector
  - Feature vector of length N, where N is the size of the vocabulary
  - f_i = 1 if word i occurs within the window of w, 0 otherwise

Binary Feature Vector
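
A minimal sketch of such a binary vector (the toy vocabulary and window size are illustrative assumptions):

```python
import numpy as np

vocab = ['bottle', 'table', 'likes', 'drunk', 'corn', 'make']
index = {w: i for i, w in enumerate(vocab)}

def binary_context_vector(tokens, target, window=2):
    """f_i = 1 if vocabulary word i occurs within `window` tokens of the target."""
    vec = np.zeros(len(vocab), dtype=int)
    for pos, tok in enumerate(tokens):
        if tok == target:
            for ctx in tokens[max(0, pos - window):pos + window + 1]:
                if ctx in index and ctx != target:
                    vec[index[ctx]] = 1
    return vec

tokens = 'we make tezguino from corn'.split()
print(binary_context_vector(tokens, 'tezguino'))  # fires for 'make' and 'corn'
```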

Distributional Similarity Questions
- What is the right neighborhood? What is the context?
- How should we weight the features?
- How can we compute similarity between vectors?

Feature Vector Design
- Window size: how many words in the neighborhood? Tradeoff:
  - +/- 500 words: 'topical context'
  - +/- 1 or 2 words: collocations, predicate-argument structure
- Alternative: only words in some grammatical relation (see the sketch below)
  - Parse text (dependency)
  - Include subj-verb, verb-obj, adj-mod
  - N x R vector: word x relation
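
A sketch of relation-based features using spaCy's dependency parser (assumes the en_core_web_sm model is installed; the kept-relation set is illustrative):

```python
import spacy

nlp = spacy.load('en_core_web_sm')

KEPT = {'nsubj', 'dobj', 'amod'}  # subj-verb, verb-obj, adj-mod

def relation_features(text):
    """Return (word, relation, head-word) triples usable as sparse feature keys."""
    feats = []
    for tok in nlp(text):
        if tok.dep_ in KEPT:
            feats.append((tok.lemma_, tok.dep_, tok.head.lemma_))
    return feats

print(relation_features('The astronomer married the bright star.'))
# e.g. [('astronomer', 'nsubj', 'marry'), ('star', 'dobj', 'marry'), ('bright', 'amod', 'star')]
```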

Example: Lin Relation Vector

Weighting Features
- Baseline: binary (0/1)
  - Minimally informative
  - Can't capture the intuition that frequent features are informative
- Frequency or probability:
  - Better, but can overweight a priori frequent features (chance cooccurrence)

Pointwise Mutual Information
- PMI(w, f) = log2 [ P(w, f) / (P(w) P(f)) ]
  - Contrasts observed cooccurrence with that expected by chance (if independent)
- Generally only positive values are used
  - Negative values are inaccurate unless the corpus is huge
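
A minimal positive-PMI computation over a toy count matrix (the counts are illustrative):

```python
import numpy as np

def ppmi(counts):
    """Positive PMI from a word-by-feature cooccurrence count matrix."""
    total = counts.sum()
    p_wf = counts / total
    p_w = p_wf.sum(axis=1, keepdims=True)   # marginal over features
    p_f = p_wf.sum(axis=0, keepdims=True)   # marginal over words
    with np.errstate(divide='ignore', invalid='ignore'):
        pmi = np.log2(p_wf / (p_w * p_f))
    return np.maximum(pmi, 0)                # discard negative values

counts = np.array([[10., 0., 3.],
                   [ 2., 8., 1.]])
print(ppmi(counts))
```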

Vector Similarity
- Euclidean or Manhattan distances: too sensitive to extreme values
- Dot product: favors long vectors (more features or higher values)
- Cosine: sim_cosine(v, w) = (v . w) / (|v| |w|), the dot product normalized by vector length
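
The cosine measure in a few lines (toy vectors):

```python
import numpy as np

def cosine(v, w):
    """Dot product normalized by vector lengths; insensitive to vector magnitude."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

v = np.array([1.0, 2.0, 0.0])
w = np.array([2.0, 4.0, 0.0])
print(cosine(v, w))  # 1.0: same direction regardless of length
```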

Schutze's Word Space
- Build a co-occurrence matrix
  - Restrict vocabulary to 4-letter sequences (similar effect to stemming)
  - Exclude very frequent items: articles, affixes
  - Apply Singular Value Decomposition (SVD) to the matrix entries; reduce to 97 dimensions
- Word context:
  - 4-grams within 1,001 characters
  - Sum & normalize vectors for each 4-gram
  - Distances between vectors by dot product
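
A sketch of the SVD reduction step on a toy co-occurrence matrix (all dimensions are illustrative except k = 97, the value Schutze used):

```python
import numpy as np

# Toy cooccurrence matrix: rows = items (e.g. 4-letter sequences), cols = contexts
rng = np.random.default_rng(0)
cooc = rng.poisson(1.0, size=(200, 300)).astype(float)

# Reduce to k dimensions with a truncated SVD
k = 97
U, S, Vt = np.linalg.svd(cooc, full_matrices=False)
reduced = U[:, :k] * S[:k]        # each row: a dense k-dimensional vector

print(reduced.shape)              # (200, 97)
```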

Schutze's Word Space: Word Sense Disambiguation
- Compute context vectors for all instances of the word
- Automatically cluster the context vectors
- Hand-label the clusters with sense tags
- Tag each new instance with the nearest cluster
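
A rough sketch of this cluster-then-label pipeline with scikit-learn (the vectors are random stand-ins; the sense labels would come from human inspection of each cluster):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
context_vectors = rng.normal(size=(50, 97))   # one 97-dim vector per occurrence of 'plant'

# Cluster the occurrences; a human then labels each cluster with a sense tag
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(context_vectors)
sense_labels = {0: 'plant/living', 1: 'plant/factory'}  # assigned by hand

# Tag a new instance with the sense of its nearest cluster
new_vector = rng.normal(size=(1, 97))
print(sense_labels[int(kmeans.predict(new_vector)[0])])
```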

Label the First Use of "Plant" (the biological and industrial example passages repeated from above)

Sense Selection in "Word Space"
- Build a context vector: 1,001-character window (the whole article)
- Compare vector distances to sense clusters
  - Only 3 content words in common between the two articles: distant context vectors
  - Clusters: built automatically, labeled manually
- Result: 2 different, correct senses; 92% on pair-wise tasks

Odd Cluster Examples
- The "Ste." cluster: Dry, Oyster, Whisky, Hot, Float, Ice
  - Why? A river name: learning the corpus, not the sense
- The "Keeping" cluster: Bring, Hoping, Wiping, Could, Should, Some, Them, Rest
  - Uninformative: the wide context misses the verb sense

The Question of Context
- Shared intuition: context -> sense
- Area of disagreement: what is context?
  - Wide vs. narrow window
  - Word co-occurrences

Taxonomy of Contextual Information
- Topical content
- Word associations
- Syntactic constraints
- Selectional preferences
- World knowledge & inference

Limits of Wide Context
- Comparison of wide-context techniques (LTV '93): neural net, context vector, Bayesian classifier, simulated annealing
- Results: 2 senses: 90+%; 3+ senses: ~70%
  - Nouns: 92%; verbs: 69%
- People: full sentences ~100%; bag of words ~70%: wide context alone is inadequate
- Need narrow context:
  - Local constraints override
  - Retain order and adjacency

Interactions Below the Surface
- Constraints are not all created equal:
  - "The astronomer married the star."
  - Selectional restrictions override topic
- No surface regularities:
  - "The emigration/immigration bill guaranteed passports to all Soviet citizens."
- No substitute for understanding