Knowledge-based Methods for Word Sense Disambiguation From a tutorial at AAAI by Ted Pedersen and Rada Mihalcea [edited by J. Wiebe]


2 Our last topic
– NLP at a more fine-grained level; so far, we've only worked with document-level classification
– The question of polysemy (a term having more than one meaning) came up in the last topic; word sense disambiguation addresses that problem
– Includes various measures of semantic similarity, which can be used for clustering, search, paraphrase recognition, etc.
– Introduces you to resources you can use if you ever work with text
– Note: Ted Pedersen's group created: Very useful!

3 Definitions
Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.
– The sense inventory usually comes from a dictionary or thesaurus.
– Knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping approaches
Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.
– Unsupervised techniques

4 Computers versus Humans
Polysemy – most words have many possible meanings. A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human…
Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases…

5 Ambiguity for Humans - Newspaper Headlines!
DRUNK GETS NINE YEARS IN VIOLIN CASE
FARMER BILL DIES IN HOUSE
PROSTITUTES APPEAL TO POPE
STOLEN PAINTING FOUND BY TREE
RED TAPE HOLDS UP NEW BRIDGE
RESIDENTS CAN DROP OFF TREES
INCLUDE CHILDREN WHEN BAKING COOKIES
MINERS REFUSE TO WORK AFTER DEATH
[mixtures of part-of-speech, word-sense, and syntactic ambiguities]

6 Ambiguity for a Computer
The fisherman jumped off the bank and into the water.
The bank down the street was robbed!
Back in the day, we had an entire bank of computers devoted to this problem.
The bank in that road is entirely too steep and is really dangerous.
The plane took a bank to the left, and then headed off towards the mountains.

7 Outline
Task definition
– Machine Readable Dictionaries
Algorithms based on Machine Readable Dictionaries
Selectional Restrictions
Measures of Semantic Similarity
Heuristic-based Methods

8 Task Definition
Knowledge-based WSD = the class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text
Resources:
– Yes: Machine Readable Dictionaries, raw corpora
– No: manually annotated corpora
Combinations of these techniques with machine learning are, of course, also possible

9 Machine Readable Dictionaries
In recent years, most dictionaries have been made available in Machine Readable format (MRD)
– Oxford English Dictionary
– Collins
– Longman Dictionary of Contemporary English (LDOCE)
Thesauruses – add synonymy information
– Roget's Thesaurus
Semantic networks – add more semantic relations
– WordNet
– EuroWordNet

10 MRD – A Resource for Knowledge-based WSD
For each word in the language vocabulary, an MRD provides:
– A list of meanings
– Definitions (for all word meanings)
– Typical usage examples (for most word meanings)
WordNet definitions/examples for the noun plant:
1. buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"
2. a living organism lacking the power of locomotion
3. something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
4. an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience

11 MRD – A Resource for Knowledge-based WSD
A thesaurus adds:
– An explicit synonymy relation between word meanings
A semantic network adds:
– Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.
WordNet synsets for the noun "plant":
1. plant, works, industrial plant
2. plant, flora, plant life
WordNet related concepts for the meaning "plant life":
{plant, flora, plant life}
– hypernym: {organism, being}
– hyponym: {house plant}, {fungus}, …
– meronym: {plant tissue}, {plant part}
– holonym: {Plantae, kingdom Plantae, plant kingdom}
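To make the WordNet structure above concrete, here is a minimal sketch of these lookups using NLTK's WordNet interface (assuming nltk is installed and its WordNet data downloaded; the sense identifier plant.n.02 for "plant life" reflects WordNet 3.0 and may differ in other versions):

```python
# Query WordNet for the noun "plant": definitions, examples, and relations.
from nltk.corpus import wordnet as wn

for synset in wn.synsets('plant', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())
    print('   examples:', synset.examples())

# Relations for the "plant life" sense (assumed to be plant.n.02 in WordNet 3.0):
flora = wn.synset('plant.n.02')
print('hypernyms:', flora.hypernyms())        # e.g. {organism, being}
print('hyponyms: ', flora.hyponyms()[:3])     # e.g. {house plant}, {fungus}, ...
print('meronyms: ', flora.part_meronyms())    # e.g. {plant part}, per the slide
print('holonyms: ', flora.member_holonyms())  # e.g. {Plantae, plant kingdom}
```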

12 Outline
Task definition
– Machine Readable Dictionaries
Algorithms based on Machine Readable Dictionaries
Selectional Restrictions
Measures of Semantic Similarity
Heuristic-based Methods

13 Lesk Algorithm
(Michael Lesk 1986): identify senses of words in context using definition overlap
Algorithm:
1. Retrieve from the MRD all sense definitions of the words to be disambiguated
2. Determine the definition overlap for all possible sense combinations
3. Choose senses that lead to highest overlap
Example: disambiguate PINE CONE
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees
Pine#1 ∩ Cone#1 = 0    Pine#2 ∩ Cone#1 = 0
Pine#1 ∩ Cone#2 = 1    Pine#2 ∩ Cone#2 = 0
Pine#1 ∩ Cone#3 = 2    Pine#2 ∩ Cone#3 = 0
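A minimal sketch of this pairwise overlap computation, using the PINE/CONE glosses from the slide. Plain token overlap is used; the slide's exact counts assume stopword removal and stemming, so individual scores differ slightly, but the winning pair (Pine#1, Cone#3) is the same:

```python
# Original Lesk over two words: score every sense pair by gloss overlap.
from itertools import product

pine = {1: "kinds of evergreen tree with needle-shaped leaves",
        2: "waste away through sorrow or illness"}
cone = {1: "solid body which narrows to a point",
        2: "something of this shape whether solid or hollow",
        3: "fruit of certain evergreen trees"}

def overlap(gloss1, gloss2):
    """Number of word types the two glosses share."""
    return len(set(gloss1.split()) & set(gloss2.split()))

for p, c in product(pine, cone):
    print(f"Pine#{p} x Cone#{c} = {overlap(pine[p], cone[c])}")

best = max(product(pine, cone), key=lambda pc: overlap(pine[pc[0]], cone[pc[1]]))
print("Chosen senses:", best)  # (1, 3)
```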

14 Lesk Algorithm for More than Two Words?
"I saw a man who is 98 years old and can still walk and tell jokes"
– nine open-class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
– 43,929,600 sense combinations! How do we find the optimal sense combination?
Simulated annealing (Cowie, Guthrie, Guthrie 1992)
– Define a function E over a combination of word senses in a given text: the definition overlap (redundancy); find the combination of senses that maximizes it
1. Start with the most frequent sense for each word
2. At each iteration, replace the sense of a random word in the set with a different sense, and measure E
3. Stop iterating when there is no change in the configuration of senses
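A minimal sketch of this search, assuming definitions[w] maps each word to its list of sense glosses and measuring E as the number of repeated tokens across the chosen glosses. This is the greedy, hill-climbing flavor of the procedure; full simulated annealing would also accept some worse moves early on with decreasing probability:

```python
# Search over sense configurations, maximizing gloss redundancy E.
import random
from collections import Counter

def redundancy(config, definitions, words):
    """E: count of repeated tokens across the glosses chosen by config."""
    tokens = []
    for w in words:
        tokens.extend(definitions[w][config[w]].split())
    return sum(c - 1 for c in Counter(tokens).values() if c > 1)

def search(words, definitions, iterations=1000, seed=0):
    rng = random.Random(seed)
    # 1. Start with sense 0 for every word (standing in for "most frequent sense").
    config = {w: 0 for w in words}
    best = redundancy(config, definitions, words)
    for _ in range(iterations):
        # 2. Re-assign a random sense to a random word and re-measure E.
        w = rng.choice(words)
        candidate = dict(config, **{w: rng.randrange(len(definitions[w]))})
        e = redundancy(candidate, definitions, words)
        # 3. Keep the change only if E improves (hill-climbing simplification).
        if e > best:
            config, best = candidate, e
    return config, best
```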

15 Lesk Algorithm: A Simplified Version
Original Lesk: measure overlap between sense definitions for all words in context
– Identifies the correct senses for all words in context simultaneously
Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between the sense definitions of a word and the current context
– Identifies the correct sense for one word at a time
– Search space significantly reduced

16-17 Lesk Algorithm: A Simplified Version
Example: disambiguate PINE in "Pine cones hanging in a tree"
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
[Actually, would a WSD system be choosing between these? Typically, no – they are different parts of speech. While POS taggers do make mistakes, they make fewer than WSD systems. Combined with an ML approach, one could assign the best overall interpretation, considering POS and sense together.]
Pine#1 ∩ Sentence = 1
Pine#2 ∩ Sentence = 0
Algorithm for simplified Lesk:
1. Retrieve from the MRD all sense definitions of the word to be disambiguated
2. Determine the overlap between each sense definition and the current context
3. Choose the sense that leads to the highest overlap
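NLTK ships an implementation of simplified Lesk (nltk.wsd.lesk) that scores each WordNet gloss against the context words. A minimal sketch follows; note that with real WordNet glosses the sense chosen may differ from the two-gloss toy example above:

```python
# Simplified Lesk via NLTK: one target word, scored against its context.
from nltk.wsd import lesk

context = "Pine cones hanging in a tree".lower().split()
sense = lesk(context, 'pine', pos='n')
print(sense, '-', sense.definition() if sense else 'no sense found')
```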

18 Outline
Task definition
– Machine Readable Dictionaries
Algorithms based on Machine Readable Dictionaries
Selectional Preferences
Measures of Semantic Similarity
Heuristic-based Methods

19 Selectional Preferences
A way to constrain the possible meanings of words in a given context
E.g. "wash a dish" vs. "cook a dish" – WASH-OBJECT vs. COOK-FOOD
Capture information about possible relations between semantic classes
– Common-sense knowledge
Alternative terminology:
– Selectional Restrictions
– Selectional Preferences
– Selectional Constraints

20 Acquiring Selectional Preferences
From annotated corpora
– But sense-annotated data are not plentiful
From raw corpora
– Frequency counts
– Information theory measures
– Class-to-class relations

21 Preliminaries: Learning Word-to-Word Relations
An indication of the semantic fit between two words
1. Frequency counts
– Counts of word pairs connected by a syntactic relation
2. Conditional probabilities
– Condition on one of the words
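A minimal sketch of both statistics over a made-up list of parser-extracted (verb, object) pairs (the pairs are illustrative, not from any corpus):

```python
# Word-to-word relations: frequency counts and conditional probabilities.
from collections import Counter

pairs = [('drink', 'coffee'), ('drink', 'tea'), ('drink', 'coffee'),
         ('cook', 'dish'), ('wash', 'dish'), ('drink', 'water')]

# 1. Frequency counts of syntactically related word pairs.
freq = Counter(pairs)
print(freq[('drink', 'coffee')])              # 2

# 2. Conditional probability of the object given the verb.
verb_totals = Counter(v for v, _ in pairs)
p = freq[('drink', 'coffee')] / verb_totals['drink']
print(f"P(coffee | drink) = {p:.2f}")         # 0.50
```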

22 From Resnik 1993 The alternative view of selectional constraints I am proposing can be phrased as follows: rather than restrictions or hard constraints on applicability, a predicate preferentially associates with certain kinds of arguments, and these preferences constitute the effect that the predicate has on what appears in an argument position. For example, the predicate blue does not restrict itself to arguments having a tangible surface — the sky is blue, and so is ocean water even deep below any apparent surface — but its arguments are still far from arbitrary. The effect of the predicate is that its arguments tend to be physical entities and to have surfaces. Similarly, the verb admire, interpreted in the particular sense “to have a high opinion of,” has an effect on what appears as its subject; these tend to be physical, animate, human, capable of the higher psychological functions, and so forth. In some cases the effect a predicate has on its argument is quite strong: one is unlikely to find the (numerical) predicate even applied to anything but positive integers. In other cases — e.g. the predicate smooth — the effect is less dramatic.

23 Bringing in Information Theory
Entropy – how uncertain the outcome is (on average)
"The cook basted the [which noun?]" – the entropy of the missing noun is low, since it is likely to be one of a small set of words, such as "turkey" or "roast".
But the entropy is much higher in "The cook enjoyed the [which noun?]", since a much wider range of words is likely (the opera, the company of the butler, a certain book, a particular food, …).
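A minimal sketch of the entropy computation, with made-up filler distributions for the two slots (the probabilities are purely illustrative):

```python
# Entropy of the noun slot: low for "basted the ...", high for "enjoyed the ...".
import math

def entropy(dist):
    """H = -sum p * log2(p) over the outcome distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

baste = {'turkey': 0.6, 'roast': 0.3, 'chicken': 0.1}       # few likely fillers
enjoy = {f'noun{i}': 0.02 for i in range(50)}               # 50 equally likely fillers
print(f"H(basted object)  = {entropy(baste):.2f} bits")     # ~1.30
print(f"H(enjoyed object) = {entropy(enjoy):.2f} bits")     # ~5.64
```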

24 Learning Selectional Preferences
Word-to-class relations (Resnik 1993)
– Quantify the contribution of a semantic class using all the concepts subsumed by that class
– The selectional association of a verb v with a class c is
    A(v, c) = (1 / S(v)) · P(c|v) · log( P(c|v) / P(c) )
  where S(v), the selectional preference strength of v, is
    S(v) = Σ_c P(c|v) · log( P(c|v) / P(c) )
  and P(c|v) is estimated from corpus counts of v's arguments, with each noun's count spread over the classes that subsume it

25 Learning Selectional Preferences (2)
Determine the contribution of a word sense based on the assumption of equal sense distributions:
– e.g. "plant" has two senses → 50% of its occurrences are counted as sense 1, 50% as sense 2
– That is, when you count co-occurrences in a corpus, count a word with 3 senses as 1/3 per sense, and a word with 5 senses as 1/5 per sense
Example: learning restrictions for the verb "to drink"
– Find high-scoring verb-object pairs
– Find "prototypical" object classes (high association score)
These classes are synsets in WordNet, i.e. lists of words that also identify a sense; they are hypernyms of the words above. Look them up in WordNet in class.
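A minimal sketch of this fractional counting over WordNet classes, using a made-up list of objects observed with "drink". Very general classes such as entity accumulate the most raw mass; Resnik's association score corrects for that by comparing P(c|v) against the class prior P(c):

```python
# Fractional counts: each noun contributes 1/n to each of its n senses,
# and each sense's share is credited to every class (hypernym) above it.
from collections import Counter
from nltk.corpus import wordnet as wn

objects_of_drink = ['coffee', 'tea', 'water', 'beer']   # made-up observations
class_counts = Counter()

for noun in objects_of_drink:
    senses = wn.synsets(noun, pos=wn.NOUN)
    for sense in senses:
        weight = 1.0 / len(senses)          # equal split across the noun's senses
        for cls in {sense} | set(sense.closure(lambda s: s.hypernyms())):
            class_counts[cls] += weight

# Classes like beverage.n.01 should score well; entity.n.01 dominates raw counts.
for cls, count in class_counts.most_common(5):
    print(f"{cls.name():25} {count:.2f}")
```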

26 Learning Selectional Preferences (3)
Other algorithms:
– Learn class-to-class relations (Agirre and Martinez, 2002), e.g. "ingest food" is a class-to-class relation for "eat chicken"
– Bayesian networks (Ciaramita and Johnson, 2000)
– Tree cut model (Li and Abe, 1998)

27 Using Selectional Preferences for WSD
Algorithm:
1. Learn a large set of selectional preferences for a given syntactic relation R
2. Given a pair of words W1 – W2 connected by a relation R
3. Find all selectional preferences W1 – C (word-to-class) or C1 – C2 (class-to-class) that apply
4. Select the meanings of W1 and W2 based on the selected semantic class
Example: disambiguate "coffee" in "drink coffee"
1. (beverage) a beverage consisting of an infusion of ground coffee beans
2. (tree) any of several small trees native to the tropical Old World
3. (color) a medium to dark brown color
Given the selectional preference "DRINK BEVERAGE": coffee#1
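A minimal sketch of step 4 for this example: given a learned preference class such as beverage.n.01 (assumed here rather than learned), keep only the senses of "coffee" whose hypernym chain passes through that class:

```python
# Sense selection: filter word senses by a learned preference class.
from nltk.corpus import wordnet as wn

beverage = wn.synset('beverage.n.01')   # stand-in for the learned DRINK preference

def senses_under(word, cls):
    """Senses of `word` whose hypernym closure contains `cls`."""
    return [s for s in wn.synsets(word, pos=wn.NOUN)
            if cls == s or cls in s.closure(lambda x: x.hypernyms())]

for s in senses_under('coffee', beverage):
    print(s.name(), '-', s.definition())  # expect the beverage sense (coffee#1)
```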