600.465 - Intro to NLP - J. Eisner: Splitting Words, a.k.a. "Word Sense Disambiguation"

Presentation transcript:

Slide 1: Splitting Words, a.k.a. "Word Sense Disambiguation"

(Slides 2-8: figure slides courtesy of D. Yarowsky; no text transcribed.)

Slide 9: Representing Word as Vector
- Could average over many occurrences of the word...
- Each word type has a different vector?
- Each word token has a different vector?
- Each word sense has a different vector?
  (for this one, we need sense-tagged training data)
  (is this more like a type vector or a token vector?)
- What is each of these good for?
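To make the three options concrete, here is a minimal Python sketch (not from the slides; the toy bag-of-words context representation and all function names are assumptions): a token vector counts nearby words, a type vector averages every token vector for the word, and a sense vector averages only the token vectors whose sense tag matches.

```python
from collections import Counter
import numpy as np

def token_vector(tokens, i, vocab, window=3):
    """Context vector for the single token at position i (toy bag-of-words scheme)."""
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    counts = Counter(w for w in context if w in vocab)
    return np.array([counts[w] for w in vocab], dtype=float)

def type_vector(token_vecs):
    """One vector per word type: average over all of its token vectors."""
    return np.mean(token_vecs, axis=0)

def sense_vectors(token_vecs, sense_tags):
    """One vector per sense: average only the sense-tagged tokens of that sense."""
    grouped = {}
    for vec, tag in zip(token_vecs, sense_tags):
        grouped.setdefault(tag, []).append(vec)
    return {tag: np.mean(vecs, axis=0) for tag, vecs in grouped.items()}
```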

Slide 10: Each word type has a different vector
- We saw this yesterday
- It's good for grouping words
  - similar semantics?
  - similar syntax?
  - depends on how you build the vector

Slide 11: Each word token has a different vector
- Good for splitting words - unsupervised WSD
- Cluster the tokens: each cluster is a sense!
Example concordance lines for "party":
  ... have turned it into the hot dinner-party topic. The comedy is the ...
  ... selection for the World Cup party, which will be announced on May 1 ...
  ... the by-pass there will be a street party. "Then," he says, "we are going ...
  ... in the 1983 general election for a party which, when it could not bear to ...
  ... to attack the Scottish National Party, who look set to seize Perth and ...
  ... number-crunchers within the Labour party, there now seems little doubt ...
  ... that had been passed to a second party who made a financial decision ...
  ... A future obliges each party to the contract to fulfil it by ...
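One way to act on "cluster the tokens" is sketched below, assuming the token vectors from the earlier sketch are stacked into a matrix; the choice of k-means (via scikit-learn), of two clusters, and the stand-in random data are my assumptions, since the slide does not name an algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: one context vector per occurrence of "party"
# (in practice these would come from something like token_vector above).
rng = np.random.default_rng(0)
X = rng.random((40, 50))
X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # length-normalize rows

# Assume two senses; the slide does not fix the number of clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)   # each cluster is then treated as one sense
print(cluster_ids)
```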

Slide 12: Each word sense has a different vector
- Represent each new word token as a vector, too
- Now assign each token the closest sense
- (could lump together all tokens of the word in the same document: assume they all have the same sense)
Example concordance lines for "party" (the "?" marks in the original slide flag tokens whose sense is still to be assigned):
  ... have turned it into the hot dinner-party topic. The comedy is the ...
  ... selection for the World Cup party, which will be announced on May 1 ...
  ... the by-pass there will be a street party. "Then," he says, "we are going ...
  ... let you know that there's a party at my house tonight. Directions: Drive ...
  ... in the 1983 general election for a party which, when it could not bear to ...
  ... to attack the Scottish National Party, who look set to seize Perth and ...
  ... number-crunchers within the Labour party, there now seems little doubt ...
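A sketch of the "assign each token the closest sense" step, under the same assumptions as the earlier sketches; cosine similarity is my choice of closeness measure, and the document-level lumping mentioned in parentheses is shown as an optional averaging step.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def assign_sense(token_vec, sense_vecs):
    """Pick the sense whose vector is closest to this token's context vector."""
    return max(sense_vecs, key=lambda s: cosine(token_vec, sense_vecs[s]))

def assign_sense_per_document(doc_token_vecs, sense_vecs):
    """Optional lumping: average a document's token vectors for the word and
    give every one of those tokens the same (single) closest sense."""
    doc_vec = np.mean(doc_token_vecs, axis=0)
    return assign_sense(doc_vec, sense_vecs)
```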

Slide 13: Where can we get sense-labeled training data?
- To do supervised WSD, we need many examples of each sense in context
- (the concordance lines for "party" from the previous slide are shown again here)

Slide 14: Where can we get sense-labeled training data?
- To do supervised WSD, we need many examples of each sense in context
- Sources of sense-labeled training text:
  - Human-annotated text - expensive
  - Bilingual text (Rosetta stone) - can figure out which sense of "plant" is meant by how it translates
  - Dictionary definition of a sense is one sample context
  - Roget's thesaurus entry of a sense is one sample context
  - (the last two give hardly any data per sense - but we'll use them later to get unsupervised training started)

Slide 15: A problem with the vector model
- Bad idea to treat all context positions equally.
- Possible solutions:
  - Faraway words don't count as strongly?
  - Words in different positions relative to plant are different elements of the vector?
    (i.e., (pesticide, -1) and (pesticide, +1) are different features)
  - Words in different syntactic relationships to plant are different elements of the vector?
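The last two bullets can be implemented as feature templates. The sketch below (hypothetical helper names; window sizes are arbitrary) shows a positional variant, where (pesticide, -1) and (pesticide, +1) become distinct features, and a distance-weighted variant, where faraway words simply count less.

```python
def positional_features(tokens, i, window=3):
    """Keyed by (word, offset): ('pesticide', -1) and ('pesticide', +1) are different features."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[(tokens[j], offset)] = 1.0
    return feats

def distance_weighted_features(tokens, i, window=10):
    """Keyed by word only, but weighted so faraway words don't count as strongly."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[tokens[j]] = feats.get(tokens[j], 0.0) + 1.0 / abs(offset)
    return feats
```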

Slide 16: Just one cue is sometimes enough...
(slide courtesy of D. Yarowsky, modified)

Slide 17: An assortment of possible cues...
(slide courtesy of D. Yarowsky, modified)
- Generates a whole bunch of potential cues - use data to find out which ones work best

Slide 18: An assortment of possible cues...
(slide courtesy of D. Yarowsky, modified)
- Merged ranking of all cues of all these types
- Some cues are only weak... but we'll trust a weak cue if there's nothing better

Slide 19: Final decision list for "lead" (abbreviated)
(slide courtesy of D. Yarowsky, modified)
To disambiguate a token of "lead":
- Scan down the sorted list
- The first cue that is found gets to make the decision all by itself
- Not as subtle as combining cues, but works well for WSD
A cue's score is its log-likelihood ratio: log [ p(cue | sense A) [smoothed] / p(cue | sense B) ]
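A minimal decision-list sketch along these lines (my own simplification, not Yarowsky's exact implementation): score every cue by a smoothed log-likelihood ratio, sort by absolute score, and let the first cue found in a test context decide by itself. The add-alpha smoothing and the cue-set representation are assumptions.

```python
import math
from collections import Counter

def train_decision_list(examples, alpha=0.1):
    """examples: list of (set_of_cues, sense) pairs with sense in {'A', 'B'}.
    Returns (cue, score) pairs sorted by |log-likelihood ratio|, strongest first."""
    count_a, count_b = Counter(), Counter()
    n_a = n_b = 0
    for cues, sense in examples:
        if sense == 'A':
            n_a += 1
            count_a.update(cues)
        else:
            n_b += 1
            count_b.update(cues)
    scores = {}
    for cue in set(count_a) | set(count_b):
        p_a = (count_a[cue] + alpha) / (n_a + 2 * alpha)   # smoothed p(cue | sense A)
        p_b = (count_b[cue] + alpha) / (n_b + 2 * alpha)   # smoothed p(cue | sense B)
        scores[cue] = math.log(p_a / p_b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

def classify_with_decision_list(decision_list, cues, default='A'):
    """Scan down the sorted list; the first cue present makes the decision by itself."""
    for cue, score in decision_list:
        if cue in cues:
            return 'A' if score > 0 else 'B'
    return default
```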

Slide 20
(slide courtesy of D. Yarowsky, modified)
- A very readable paper; its approach is sketched on the following slides...
- Unsupervised learning!

Slide 21: First, find a pair of "seed words" that correlate well with the 2 senses
(unsupervised learning!)
- If "plant" really has 2 senses, it should appear in:
  - 2 dictionary entries: pick content words from those
  - 2 thesaurus entries: pick synonyms from those
  - 2 different clusters of documents: pick representative words from those
  - 2 translations in parallel text: use the translations as seed words
- Or just have a human name the seed words (maybe from a list of words that occur unusually often near "plant") - make that "minimally supervised"

Slide 22: Yarowsky's bootstrapping algorithm (seed words: life, manufacturing)
(unsupervised learning! target word: plant; table taken from Yarowsky (1995))
- To start, only the contexts containing a seed word are labeled: about 1% get sense A ("life") and about 1% get sense B ("manufacturing"); the remaining 98% are still unlabeled.

Slide 23: Yarowsky's bootstrapping algorithm (seed words: life, manufacturing)
(unsupervised learning! figure taken from Yarowsky (1995))
- Learn a classifier that distinguishes A from B. It will notice features like "animal" -> A, "automate" -> B.

Slide 24: Yarowsky's bootstrapping algorithm (seed words: life, manufacturing)
(unsupervised learning! table taken from Yarowsky (1995))
- Learn a classifier that distinguishes A from B. It will notice features like "animal" -> A, "automate" -> B.
- No surprise what the top cues are, but other cues are also good for discriminating these seed examples.

Slide 25: Yarowsky's bootstrapping algorithm (seed words: life, manufacturing)
(unsupervised learning! slide courtesy of D. Yarowsky, modified)
- The strongest of the new cues help us classify more examples...
- ...from which we can extract and rank even more cues that discriminate them...
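Putting the loop together, here is a rough sketch of the bootstrapping procedure (it reuses the hypothetical train_decision_list from the earlier sketch; the confidence threshold, the round limit, and the convergence test are my assumptions rather than details from Yarowsky 1995).

```python
def yarowsky_bootstrap(contexts, seed_a='life', seed_b='manufacturing',
                       max_rounds=10, threshold=2.0):
    """contexts: one set of cues per occurrence of the target word (e.g., 'plant').
    Returns a final decision list plus the labels assigned to each context."""
    # Step 1: label only the contexts that contain a seed word.
    labels = {}
    for i, cues in enumerate(contexts):
        if seed_a in cues:
            labels[i] = 'A'
        elif seed_b in cues:
            labels[i] = 'B'

    for _ in range(max_rounds):
        # Step 2: train a decision list on the currently labeled contexts.
        dlist = train_decision_list([(contexts[i], s) for i, s in labels.items()])
        strong = [(cue, sc) for cue, sc in dlist if abs(sc) >= threshold]

        # Step 3: relabel every context whose strongest matching cue is confident enough.
        new_labels = dict(labels)
        for i, cues in enumerate(contexts):
            for cue, sc in strong:            # strongest matching cue decides
                if cue in cues:
                    new_labels[i] = 'A' if sc > 0 else 'B'
                    break

        if new_labels == labels:              # no change: stop iterating
            break
        labels = new_labels

    final_list = train_decision_list([(contexts[i], s) for i, s in labels.items()])
    return final_list, labels
```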

Slide 26: Yarowsky's bootstrapping algorithm (seed words: life, manufacturing)
(unsupervised learning! figure taken from Yarowsky (1995))
- Should be a good classifier, unless we accidentally learned some bad cues along the way that polluted the original sense distinction.

Slide 27: Now use the final decision list to classify test examples
(slide courtesy of D. Yarowsky, modified)
- The top-ranked cue appearing in a test example makes the decision.
- "life" and "manufacturing" are no longer even in the top cues!
- Many unexpected cues were extracted, without supervised training.

Slide 28: "One sense per discourse"
(unsupervised learning! slide courtesy of D. Yarowsky, modified)
- A final trick: all tokens of plant in the same document probably have the same sense.
- 3 tokens in the same document gang up on the 4th.

Slide 29: "One sense per discourse"
(unsupervised learning! slide courtesy of D. Yarowsky, modified)
- A final trick: all tokens of plant in the same document probably have the same sense.
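The "one sense per discourse" trick can be applied as a simple post-processing majority vote, sketched below (function and variable names are hypothetical).

```python
from collections import Counter

def one_sense_per_discourse(doc_ids, predicted_senses):
    """After classifying each token, let a document's majority sense
    override the minority: 3 tokens gang up on the 4th."""
    votes = {}
    for doc, sense in zip(doc_ids, predicted_senses):
        votes.setdefault(doc, Counter())[sense] += 1
    majority = {doc: counts.most_common(1)[0][0] for doc, counts in votes.items()}
    return [majority[doc] for doc in doc_ids]
```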

Slide 30: A Note on Combining Cues
(slide courtesy of D. Yarowsky, modified)
- These stats come from term papers of known authorship (i.e., supervised training).

Slide 31: A Note on Combining Cues
(slide courtesy of D. Yarowsky, modified)
- "Naive Bayes" model for classifying text (note the naive independence assumptions!)
- We'll look at it again in a later lecture.
- Would this kind of sentence be more typical of a student A paper or a student B paper?
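The underlying decision rule (a standard statement of Naive Bayes, not copied from the slide) chooses the class whose prior times the product of per-word likelihoods is largest, under the naive assumption that the words are conditionally independent given the class:

\[
\hat{s} \;=\; \arg\max_{s}\; p(s)\prod_{i=1}^{n} p(w_i \mid s)
\]

where w_1, ..., w_n are the observed words (here, the words of the paper) and s ranges over the classes (student A vs. student B, or later sense A vs. sense B of "plant").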

Slide 32: A Note on Combining Cues
(slide courtesy of D. Yarowsky, modified)
- "Naive Bayes" model for classifying text, used here for word sense disambiguation.
- Would this kind of sentence be more typical of a plant A context or a plant B context?
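Here is a minimal Naive Bayes sketch for this WSD use (my own illustration; add-one smoothing and the bag-of-words context are assumptions): train per-sense word counts, then score a new context for plant A vs. plant B.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """examples: list of (context_words, sense) pairs, e.g. sense in {'plantA', 'plantB'}."""
    word_counts = {}            # sense -> Counter over context words
    sense_counts = Counter()    # sense -> number of training contexts
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
        vocab.update(words)
    return word_counts, sense_counts, vocab

def classify_naive_bayes(words, word_counts, sense_counts, vocab):
    """Return argmax_s [ log p(s) + sum_w log p(w | s) ], with add-one smoothing."""
    total = sum(sense_counts.values())
    best, best_score = None, float('-inf')
    for sense in sense_counts:
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best
```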