CS 4705: Word Sense Disambiguation

Overview
– A problem for semantic attachment approaches: what happens when a given lexeme has multiple ‘meanings’?
  – Flies [V] vs. Flies [N]
  – He robbed the bank. vs. He sat on the bank.
– How do we determine the correct ‘sense’ of the word?
– Machine Learning
  – Supervised methods
  – Evaluation
  – Lightly supervised and unsupervised: bootstrapping
– Dictionary-based techniques
– Selection restrictions
– Clustering

Supervised WSD
Approaches:
– Tag a corpus with the correct senses of particular words (lexical sample) or of all words (all-words task)
  – E.g. the SENSEVAL corpora
– Lexical sample:
  – Extract features which might predict word sense: POS? Word identity? Punctuation after? Previous word? Its POS?
  – Use a Machine Learning algorithm to produce a classifier which can predict the senses of one word or many
– All-words:
  – Use a semantic concordance: each open-class word labeled with a sense from a dictionary or thesaurus
  – E.g. SemCor (Brown Corpus), tagged with WordNet senses

What Features Are Useful?
“Words are known by the company they keep”
– How much ‘company’ do we need to look at?
– What do we need to know about the ‘friends’? POS, lemmas/stems/syntactic categories, …
– Collocations: words that frequently appear with the target, identified from large corpora
  – E.g. federal government, honor code, baked potato
  – Position is key
– Bag-of-words: words that appear somewhere in a context window
  – E.g. I want to play a musical instrument so I chose the bass.
  – Ordering/proximity not critical
– Punctuation, capitalization, formatting
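To make the two main feature types above concrete, here is a minimal sketch (not from the lecture) of extracting position-sensitive collocational features and position-insensitive bag-of-words features around one occurrence of a target word. The window sizes, feature names, and whitespace tokenization are illustrative assumptions; a real system would also add POS tags, lemmas, and the other cues listed above.

```python
# Illustrative feature extraction for WSD: collocational features keyed by
# position relative to the target, and an unordered bag-of-words window.

def collocational_features(tokens, target_index, window=2):
    """Position-sensitive features: the words at fixed offsets around the target."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        word = tokens[i] if 0 <= i < len(tokens) else "<PAD>"
        feats[f"w{offset:+d}"] = word.lower()
    return feats

def bag_of_words_features(tokens, target_index, window=5):
    """Position-insensitive features: which words co-occur with the target
    anywhere inside the window, ordering ignored."""
    lo, hi = max(0, target_index - window), min(len(tokens), target_index + window + 1)
    return {f"bow={tokens[i].lower()}": 1
            for i in range(lo, hi) if i != target_index}

if __name__ == "__main__":
    sent = "I want to play a musical instrument so I chose the bass".split()
    t = sent.index("bass")
    print(collocational_features(sent, t))
    print(bag_of_words_features(sent, t))
```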

Rule Induction Learners and WSD
– Given a feature vector of values for independent variables associated with observations in the training set
– Top-down greedy search driven by information gain: how much will the entropy of the (remaining) data be reduced if we split on this feature?
– Produce a set of rules that perform best on the training data, e.g.
  – bank_2 if w-1 == ‘river’ & pos == NP & src == ‘Fishing News’ …
  – …
– Easy-to-understand result, but many passes to reach each decision; susceptible to over-fitting
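The information-gain criterion that drives the greedy search can be written down directly. The sketch below is an illustration, not the lecture's code: it computes the entropy of the sense labels and the gain from splitting the remaining examples on one feature; the toy feature dictionaries and sense names are invented for the example.

```python
# Information gain for one candidate feature split over sense-tagged examples.
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy (in bits) of a list of sense labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, feature):
    """examples: list of (feature_dict, sense) pairs.
    Gain = H(senses) - weighted average H(senses | feature value)."""
    labels = [sense for _, sense in examples]
    by_value = defaultdict(list)
    for feats, sense in examples:
        by_value[feats.get(feature)].append(sense)
    remainder = sum(len(subset) / len(examples) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

if __name__ == "__main__":
    # Toy sense-tagged examples for 'bank' (illustrative, not real data).
    data = [({"w-1": "river"}, "bank/shore"),
            ({"w-1": "river"}, "bank/shore"),
            ({"w-1": "the"}, "bank/finance"),
            ({"w-1": "savings"}, "bank/finance")]
    print(information_gain(data, "w-1"))   # 1.0 bit: the split fully separates the senses
```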

Naïve Bayes
– Choose ŝ = argmax_{s ∈ S} p(s|V)
  – where s is one of the senses S possible for a word w, and V is the input vector of feature values for w
– By Bayes’ rule, p(s|V) = p(V|s) p(s) / p(V); p(V) is the same for every candidate ŝ, so it can be dropped from the argmax
– Assume the features are independent given the sense, so the probability of V is the product of the probabilities of each feature given s
– Then ŝ = argmax_{s ∈ S} p(s) ∏_j p(v_j|s)

How do we estimate p(s) and p(v_j|s)?
– p(s_i) is the maximum likelihood estimate from a sense-tagged corpus: count(s_i, w_j) / count(w_j) – how likely is bank to mean ‘financial institution’ over all instances of bank?
– p(v_j|s) is the maximum likelihood estimate of each feature given a candidate sense: count(v_j, s) / count(s) – how likely is the previous word to be ‘river’ when the sense of bank is ‘financial institution’?
– Calculate for each possible sense and take the highest-scoring sense as the most likely choice
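Putting the last two slides together, a Naïve Bayes word-sense classifier is only a few lines. The sketch below is a minimal illustration, not the lecture's code: p(s) and p(v_j|s) come from maximum-likelihood counts over a tiny invented sense-tagged sample, add-one smoothing is added (an assumption beyond the slide) so an unseen feature does not zero out a sense, and the argmax is taken in log space.

```python
# Naïve Bayes WSD from counts over feature dicts like those built earlier.
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    def __init__(self):
        self.sense_counts = Counter()             # count(s)
        self.feat_counts = defaultdict(Counter)   # count(v_j, s)
        self.vocab = set()                        # all observed (feature, value) pairs

    def train(self, examples):
        """examples: list of (feature_dict, sense) pairs for one target word."""
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            for f, v in feats.items():
                self.feat_counts[sense][(f, v)] += 1
                self.vocab.add((f, v))

    def classify(self, feats):
        """Return argmax_s of log p(s) + sum_j log p(v_j|s), with add-one smoothing."""
        total = sum(self.sense_counts.values())
        best, best_score = None, float("-inf")
        for sense, s_count in self.sense_counts.items():
            score = math.log(s_count / total)
            for f, v in feats.items():
                num = self.feat_counts[sense][(f, v)] + 1
                den = s_count + len(self.vocab)
                score += math.log(num / den)
            if score > best_score:
                best, best_score = sense, score
        return best

if __name__ == "__main__":
    train = [({"w-1": "river"}, "bank/shore"),
             ({"w-1": "savings"}, "bank/finance"),
             ({"w-1": "the", "bow=loan": 1}, "bank/finance")]
    clf = NaiveBayesWSD()
    clf.train(train)
    print(clf.classify({"w-1": "river"}))   # bank/shore
```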

Decision List Classifiers
– Transparent: like case statements applying tests to the input in turn
  – fish within window → bass_1
  – striped bass → bass_1
  – guitar within window → bass_2
  – bass player → bass_2
– Yarowsky ’96’s approach orders the tests by their individual accuracy on the entire training set, based on the log-likelihood ratio
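A decision-list learner of this kind can be built by scoring every (feature, value) test with a smoothed log-likelihood ratio between the two senses and applying the tests in descending order of that score; the first matching test decides. The code below is an assumption-laden sketch, not Yarowsky's implementation: the smoothing constant, sense names, and toy training pairs are all illustrative.

```python
# Decision list for a two-sense target, ordered by smoothed log-likelihood ratio.
import math
from collections import Counter, defaultdict

def build_decision_list(examples, senses=("bass/fish", "bass/music"), alpha=0.1):
    """examples: list of (feature_dict, sense) pairs for a two-sense target word."""
    counts = defaultdict(Counter)                  # (feature, value) -> sense counts
    for feats, sense in examples:
        for item in feats.items():
            counts[item][sense] += 1
    rules = []
    for test, c in counts.items():
        a = c[senses[0]] + alpha                   # smoothed count for sense 1
        b = c[senses[1]] + alpha                   # smoothed count for sense 2
        llr = abs(math.log(a / b))                 # strength of the test
        predicted = senses[0] if a > b else senses[1]
        rules.append((llr, test, predicted))
    return sorted(rules, key=lambda r: r[0], reverse=True)   # strongest test first

def classify(rules, feats, default="bass/music"):
    for _, (f, v), sense in rules:
        if feats.get(f) == v:
            return sense                           # first matching rule wins
    return default

if __name__ == "__main__":
    train = [({"bow=fish": 1}, "bass/fish"),
             ({"w-1": "striped"}, "bass/fish"),
             ({"bow=guitar": 1}, "bass/music"),
             ({"w+1": "player"}, "bass/music")]
    rules = build_decision_list(train)
    print(classify(rules, {"w-1": "striped", "bow=sea": 1}))   # bass/fish
```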

Bootstrapping to Get More Labeled Data
– Bootstrapping I
  – Start with a few labeled instances of the target item as seeds to train an initial classifier, C
  – Use high-confidence classifications of C on unlabeled data as new training data
  – Iterate
– Bootstrapping II
  – Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), chosen intuitively, from a corpus, or from dictionary entries, and label those automatically
  – One Sense per Discourse hypothesis
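Bootstrapping I amounts to a short loop. The skeleton below is a sketch under stated assumptions rather than a particular published algorithm: `train_fn` and `predict_proba_fn` are placeholder interfaces (any classifier that can report a confidence would do), and the 0.95 confidence threshold and round limit are arbitrary.

```python
# Generic bootstrapping loop: train on seeds, promote confident labels, repeat.

def bootstrap(seed_examples, unlabeled_features, train_fn, predict_proba_fn,
              threshold=0.95, max_rounds=10):
    """seed_examples: list of (feature_dict, sense) seeds.
    unlabeled_features: list of feature_dicts with no sense labels.
    train_fn(examples) -> classifier.
    predict_proba_fn(clf, feats) -> (sense, confidence).
    Both are assumed interfaces, not a particular library's API."""
    labeled = list(seed_examples)
    pool = list(unlabeled_features)
    for _ in range(max_rounds):
        clf = train_fn(labeled)
        newly_labeled, remaining = [], []
        for feats in pool:
            sense, confidence = predict_proba_fn(clf, feats)
            if confidence >= threshold:
                newly_labeled.append((feats, sense))   # promote to training data
            else:
                remaining.append(feats)                # stay unlabeled for now
        if not newly_labeled:
            break                                      # converged: nothing confident to add
        labeled.extend(newly_labeled)
        pool = remaining
    return train_fn(labeled), labeled
```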

Evaluating WSD
– In vivo / end-to-end / task-based / extrinsic vs. in vitro / stand-alone / intrinsic: evaluation within some task (parsing? Q/A? an IVR system?) vs. application-independent evaluation
  – In vitro metrics: classification accuracy on a held-out test set, or precision/recall/F-measure if not all instances must be labeled
– Baselines:
  – Most frequent sense?
  – Lesk algorithms
– Ceiling: human annotator agreement
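For the in vitro setting, the metrics above are simple to compute. The sketch below (illustrative data, not from the slides) reports precision, recall, and F1 when the system is allowed to abstain on some instances, plus the most-frequent-sense baseline taken from the gold labels.

```python
# Intrinsic WSD evaluation: precision/recall/F1 with abstentions, and an MFS baseline.
from collections import Counter

def precision_recall_f1(gold, predicted):
    """gold: list of senses; predicted: list of senses or None (abstain)."""
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

def most_frequent_sense_baseline(gold):
    """Accuracy of always guessing the commonest gold sense."""
    mfs, count = Counter(gold).most_common(1)[0]
    return mfs, count / len(gold)

if __name__ == "__main__":
    gold = ["fish", "fish", "music", "fish"]
    pred = ["fish", None, "music", "music"]
    print(precision_recall_f1(gold, pred))        # (0.667, 0.5, 0.571)
    print(most_frequent_sense_baseline(gold))     # ('fish', 0.75)
```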

Dictionary Approaches
– Problem of scale for all ML approaches: building a classifier for each word with multiple senses
– Machine-readable dictionaries with senses identified and examples
– Simplified Lesk:
  – Retrieve all content words occurring in the context of the target (e.g. Sailors love to fish for bass.)
  – Compute the overlap with the sense definitions of the target entry
    » bass_1: a musical instrument…
    » bass_2: a type of fish that lives in the sea…
  – Choose the sense with the most content-word overlap
– Original Lesk: compare the dictionary entries of all content words in the context with the entries for each sense
– Limits:
  – Dictionary entries are short; performance is best with longer entries, so…
    – Expand with the entries of ‘related’ words that appear in the entry
    – If a tagged corpus is available, collect all the words appearing in the context of each sense of the target word (e.g. all words appearing in sentences with bass_1) into a signature for bass_1
      – Weight each by its frequency of occurrence in all ‘documents’ (e.g. all senses of bass) to capture how discriminating a word is for the target word’s senses
  – Corpus Lesk performs best of all Lesk approaches
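Simplified Lesk, as described above, fits in a few lines. The sketch below uses the slide's two toy bass glosses; the stopword list, whitespace tokenizer, and tie-breaking toward the first listed sense are illustrative assumptions, and Corpus Lesk would additionally weight the signature words (e.g. by how discriminating they are across senses) as the slide describes.

```python
# Simplified Lesk: pick the sense whose gloss shares the most content words
# with the context sentence. Glosses and stopwords are toy/illustrative.

STOPWORDS = {"a", "the", "to", "of", "in", "for", "that", "is", "it"}

SENSES = {
    "bass/music": "a musical instrument with a low-pitched sound",
    "bass/fish": "a type of fish that lives in the sea",
}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context_sentence, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context;
    ties fall back to the first listed sense (a stand-in for a
    most-frequent-sense default)."""
    context = content_words(context_sentence)
    best_sense, best_overlap = next(iter(senses)), -1
    for sense, gloss in senses.items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

if __name__ == "__main__":
    print(simplified_lesk("Sailors love to fish for bass in the sea"))   # bass/fish
```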

Summary
– Many useful approaches developed to do WSD
  – Supervised and unsupervised ML techniques
  – Novel uses of existing resources (WordNet, dictionaries)
– Future
  – More tagged training corpora becoming available
  – New learning techniques being tested, e.g. co-training
– Next class: Ch 18:6-9