CS 4705 Lecture 19 Word Sense Disambiguation

Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised –Dictionary-based techniques

Disambiguation via Selectional Restrictions Eliminates ambiguity by eliminating ill-formed semantic representations much as syntactic parsing eliminates ill-formed syntactic analyses –Different verbs select for different thematic roles wash the dishes (takes washable-thing as patient) serve delicious dishes (takes food-type as patient) Method: rule-to-rule syntactico-semantic analysis –Semantic attachment rules are applied as sentences are syntactically parsed –Selectional restriction violation: no parse
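As a rough illustration of the rule-to-rule idea, here is a minimal Python sketch (the tiny hypernym table, the sense names like dish.food, and the restriction classes are all invented for illustration): a reading survives only if the argument's semantic type falls under the class the verb sense selects for.

```python
# Toy hypernym hierarchy and selectional restrictions (illustrative only).
HYPERNYMS = {
    "dish.crockery": ["crockery", "artifact", "physical-object"],
    "dish.food": ["food", "substance", "physical-object"],
}

# Each verb sense names the class its patient argument must fall under.
RESTRICTIONS = {
    "wash": "physical-object",   # "wash the dishes" wants a washable physical object
    "serve.food": "food",        # "serve delicious dishes" wants a food patient
}

def satisfies(verb_sense, arg_sense):
    """A reading survives only if the argument's ancestors include
    the class the verb sense selects for."""
    required = RESTRICTIONS[verb_sense]
    return required in HYPERNYMS[arg_sense]

# "serve delicious dishes": only the food sense of 'dish' yields a parse.
for sense in ("dish.crockery", "dish.food"):
    print(sense, "->", "ok" if satisfies("serve.food", sense) else "ruled out")
```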

Requires: –Selectional restrictions for each sense of each predicate –Hierarchical type information about each argument (a la WordNet) Limitations: –Sometimes not sufficiently constraining to disambiguate (Which dishes do you like?) –Violations that are intentional (Eat dirt, worm!) –Metaphor and metonymy

Selectional Restrictions as Preferences Resnik ('97, '98) selectional association: –Probabilistic measure of the strength of association between a predicate and the class dominating its argument –Derive predicate/argument relations from a tagged corpus –Derive hyponymy relations from WordNet –Select the sense whose ancestor class has the highest selectional association with the predicate (44% correct) Brian ate the dish. WN: dish is a kind of crockery and a kind of food Tagged corpus counts: ate with food arguments vs. ate with crockery arguments
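A small sketch of how a Resnik-style selectional association could be computed: each class's share of the KL divergence between P(c | predicate) and the prior P(c). The probabilities below are invented toy numbers, not corpus estimates.

```python
import math

def selectional_association(p_c_given_pred, p_c):
    """Resnik-style selectional association for every class c of a predicate.
    p_c_given_pred: P(c | predicate); p_c: prior P(c); both dicts over classes."""
    # Selectional preference strength: KL divergence between the two distributions.
    strength = sum(pc_p * math.log(pc_p / p_c[c])
                   for c, pc_p in p_c_given_pred.items() if pc_p > 0)
    # Association of each class: its share of that divergence.
    return {c: pc_p * math.log(pc_p / p_c[c]) / strength
            for c, pc_p in p_c_given_pred.items() if pc_p > 0}

# "Brian ate the dish": does 'eat' prefer food or crockery as its object?
assoc = selectional_association(
    p_c_given_pred={"food": 0.9, "crockery": 0.1},   # from a tagged corpus (toy numbers)
    p_c={"food": 0.3, "crockery": 0.3})              # class priors (toy numbers)
print(max(assoc, key=assoc.get))   # -> 'food', so the food sense of 'dish' is chosen
```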

Machine Learning Approaches Learn a classifier to assign one of the possible word senses to each word –Acquire knowledge from a labeled or unlabeled corpus –Human intervention only in labeling the corpus and selecting the set of features to use in training Input: feature vectors –Target (dependent variable) –Context (set of independent variables) Output: classification rules for unseen text

Input Features for WSD POS tags of target and neighbors Surrounding context words (stemmed or not) Partial parsing to identify thematic/grammatical roles and relations Collocational information: –How likely are the target and its left/right neighbors to co-occur? Is the bass fresh today? [w-2, w-2/pos, w-1, w-1/pos, w+1, w+1/pos, w+2, w+2/pos, …] → [is, V, the, DET, fresh, JJ, today, N, …]
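A minimal sketch of extracting the collocational feature vector above; the helper name, the padding symbol, and the POS tags in the example are assumptions for illustration.

```python
def collocational_features(tokens, tags, i, window=2):
    """Build the [w-2, w-2/pos, ..., w+2, w+2/pos] vector for the word at index i.
    tokens/tags are parallel lists; out-of-range positions get a padding symbol."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue                      # skip the target itself
        j = i + offset
        if 0 <= j < len(tokens):
            feats += [tokens[j].lower(), tags[j]]
        else:
            feats += ["<PAD>", "<PAD>"]
    return feats

# "Is the bass fresh today ?", target = 'bass' at index 2 (tags are illustrative).
tokens = ["Is", "the", "bass", "fresh", "today", "?"]
tags   = ["V",  "DET", "N",   "JJ",    "N",     "."]
print(collocational_features(tokens, tags, 2))
# ['is', 'V', 'the', 'DET', 'fresh', 'JJ', 'today', 'N']
```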

Co-occurrence of neighboring words –How often does sea, or a word with the root sea- (e.g. seashore, seafood, seafaring), occur in a window of size N around the target? –How to choose these words? Take the M most frequent content words occurring within the window in the training data
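A hedged sketch of these co-occurrence (bag-of-words) features: pick the M most frequent content words seen near the target in training data, then record which of them appear in the window around a new occurrence. The stopword list, the value of M, and the toy contexts are invented.

```python
from collections import Counter

def pick_vocabulary(training_contexts, m=12):
    """Choose the M most frequent content words seen near the target in training data."""
    stop = {"the", "a", "an", "is", "of", "to", "in", "and", "it", "i"}
    counts = Counter(w for ctx in training_contexts for w in ctx if w not in stop)
    return [w for w, _ in counts.most_common(m)]

def cooccurrence_vector(context_words, vocabulary):
    """Binary vector: does each chosen word occur in the window around the target?"""
    window = set(context_words)
    return [1 if w in window else 0 for w in vocabulary]

# Toy training contexts for 'bass' (invented), each a window of nearby words.
train = [["sea", "fishing", "caught"], ["guitar", "play", "band"],
         ["seafood", "fresh", "sea"], ["bass", "player", "guitar"]]
vocab = pick_vocabulary(train, m=6)
print(vocab)
print(cooccurrence_vector(["fresh", "sea", "today"], vocab))
```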

Supervised Learning Training and test sets with words labeled as to their correct sense (It was the biggest [fish: bass] I’ve seen.) –Obtain independent variables automatically (POS, co-occurrence information, etc.) –Train the classifier on the training data –Test on the test data –Result: a classifier for use on unlabeled data

Types of Classifiers Naïve Bayes –Choose ŝ = argmax_s P(s|V), where s is one of the possible senses and V is the input vector of features –By Bayes’ rule, P(s|V) = P(V|s) P(s) / P(V) –Assume the features are independent, so the probability of V given s is the product of the probabilities of each feature given s: P(V|s) = ∏_j P(v_j|s); and P(V) is the same for any s –So, with P(s) as the prior, ŝ = argmax_s P(s) ∏_j P(v_j|s)
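A compact Naïve Bayes sense classifier along these lines, working in log space; the add-one smoothing and the toy training data are assumptions not specified on the slide.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Naive Bayes sense classifier: pick argmax_s P(s) * prod_j P(v_j | s).
    Add-one smoothing keeps unseen features from zeroing out a sense."""
    def fit(self, feature_vectors, senses):
        self.sense_counts = Counter(senses)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for v, s in zip(feature_vectors, senses):
            for f in v:
                self.feat_counts[s][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, features):
        best_sense, best_score = None, float("-inf")
        total = sum(self.sense_counts.values())
        for s, n_s in self.sense_counts.items():
            score = math.log(n_s / total)                 # log prior P(s)
            denom = sum(self.feat_counts[s].values()) + len(self.vocab)
            for f in features:                            # add log P(f | s)
                score += math.log((self.feat_counts[s][f] + 1) / denom)
            if score > best_score:
                best_sense, best_score = s, score
        return best_sense

# Toy labeled data (invented): features are context words, labels are senses.
clf = NaiveBayesWSD().fit(
    [["fish", "sea"], ["guitar", "band"], ["fishing", "sea"]],
    ["bass-fish", "bass-music", "bass-fish"])
print(clf.predict(["sea", "fresh"]))   # -> 'bass-fish'
```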

Decision lists: –like case statements, applying tests to the input in turn fish within window --> bass 1 striped bass --> bass 1 guitar within window --> bass 2 bass player --> bass 2 … –Yarowsky ‘96’s approach orders tests by individual accuracy on the entire training set, based on the log-likelihood ratio
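A small decision-list sketch: score each feature test by a (smoothed) log-likelihood ratio, sort the tests by that score, and apply them in order like a case statement. The smoothing constant and the toy examples are invented.

```python
import math
from collections import Counter, defaultdict

def learn_decision_list(examples, alpha=0.1):
    """Order feature tests by log-likelihood ratio, Yarowsky-style.
    examples: (features, sense) pairs with exactly two senses;
    alpha is a small smoothing constant (a detail the slide leaves open)."""
    senses = sorted({s for _, s in examples})
    counts = defaultdict(Counter)
    for feats, sense in examples:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        p = [c[s] + alpha for s in senses]
        llr = abs(math.log(p[0] / p[1]))          # how strongly f favors one sense
        predicted = senses[0] if p[0] > p[1] else senses[1]
        rules.append((llr, f, predicted))
    return sorted(rules, reverse=True)            # strongest evidence first

def classify(rules, features, default):
    for _, f, sense in rules:                     # apply tests in order, like a case statement
        if f in features:
            return sense
    return default

# Toy training data (invented) for the two senses of 'bass'.
rules = learn_decision_list([
    ({"fish", "caught"}, "bass-1"), ({"striped"}, "bass-1"),
    ({"guitar", "player"}, "bass-2"), ({"guitar", "band"}, "bass-2")])
print(classify(rules, {"guitar", "stage"}, default="bass-1"))   # -> 'bass-2'
```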

Bootstrapping I –Start with a few labeled instances of target item as seeds to train initial classifier, C –Use high confidence classifications of C on unlabeled data as training data –Iterate Bootstrapping II –Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), either intuitively or from corpus or from dictionary entries –One Sense per Discourse hypothesis
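A minimal self-training loop in the spirit of Bootstrapping I; the count-based seed classifier, the 0.9 confidence threshold, and the example contexts are all invented for illustration.

```python
from collections import Counter, defaultdict

def train(labeled):
    """Tiny count-based sense model built from labeled context windows."""
    model = defaultdict(Counter)
    for words, sense in labeled:
        for w in words:
            model[w][sense] += 1
    return model

def predict(model, words):
    """Return (best sense, confidence) by voting over the context words."""
    votes = Counter()
    for w in words:
        votes.update(model.get(w, {}))
    if not votes:
        return None, 0.0
    sense, n = votes.most_common(1)[0]
    return sense, n / sum(votes.values())

# Seeds: a few labeled instances; pool: unlabeled context windows (all invented).
seeds = [({"sea", "fishing"}, "bass-fish"), ({"guitar", "music"}, "bass-music")]
pool = [{"sea", "caught"}, {"guitar", "band"}, {"fresh", "today"}]

labeled = list(seeds)
for _ in range(3):                                 # a few self-training rounds
    model = train(labeled)
    confident, rest = [], []
    for ctx in pool:
        sense, conf = predict(model, ctx)
        if conf >= 0.9:
            confident.append((ctx, sense))         # trust the classifier here
        else:
            rest.append(ctx)                       # leave for a later round
    if not confident:
        break
    labeled += confident
    pool = rest
print(labeled)
```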

Unsupervised Learning Cluster automatically derived feature vectors to ‘discover’ word senses using some similarity metric –Represent each cluster as average of feature vectors it contains –Label clusters by hand with known senses –Classify unseen instances by proximity to these known and labeled clusters Evaluation problem –What are the ‘right’ senses?
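A toy version of this unsupervised route: cluster context vectors with a tiny k-means, hand-label the resulting clusters, and classify new instances by proximity to the nearest labeled centroid. The vectors, vocabulary, and initialization scheme are invented for illustration.

```python
def centroid(vectors):
    """Average of a group of feature vectors (each a list of numbers)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=10):
    """Very small k-means: 'discover' senses by grouping similar context vectors."""
    centroids = vectors[:k]                      # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda i: distance(v, centroids[i]))].append(v)
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

# Toy co-occurrence vectors over the vocabulary [sea, fishing, guitar, band] (invented).
vectors = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
centroids = kmeans(vectors, k=2)
# A human now inspects each cluster and labels it with a known sense;
# an unseen instance is classified by proximity to the nearest labeled centroid.
new = [1, 0, 0, 1]
print(min(range(2), key=lambda i: distance(new, centroids[i])))
```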

–Cluster impurity –How do you know how many clusters to create? –Some clusters may not map to ‘known’ senses

Dictionary Approaches Problem of scale for all ML approaches –Build a classifier for each sense ambiguity Machine readable dictionaries (Lesk ‘86) –Retrieve all definitions of content words in context of target –Compare for overlap with sense definitions of target –Choose sense with most overlap Limitations –Entries are short --> expand entries to ‘related’ words using subject codes
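A sketch of the simplified Lesk variant, which overlaps the target's sense glosses directly with the surrounding context words (the full Lesk '86 method described above also pulls in the definitions of the context words); the glosses below are paraphrases, not real dictionary entries.

```python
def lesk(context_words, sense_definitions):
    """Simplified Lesk: pick the sense whose dictionary gloss overlaps most
    with the words surrounding the target."""
    stop = {"the", "a", "an", "of", "to", "in", "and", "or", "is"}
    context = {w.lower() for w in context_words} - stop
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_definitions.items():
        overlap = len(context & ({w.lower() for w in gloss.split()} - stop))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy glosses (paraphrased, not actual dictionary entries).
definitions = {
    "bass-fish": "an edible freshwater or sea fish",
    "bass-music": "the lowest part in music or a low-pitched instrument",
}
print(lesk("he caught a huge fish like bass in the sea".split(), definitions))
# -> 'bass-fish'
```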