1 CS 224U Autumn 2007 CS 224U LINGUIST 288/188 Natural Language Understanding Jurafsky and Manning Lecture 2: WordNet, word similarity, and sense relations.

Presentation transcript:

1 CS 224U Autumn 2007 CS 224U LINGUIST 288/188 Natural Language Understanding Jurafsky and Manning Lecture 2: WordNet, word similarity, and sense relations Sep 27, 2007 Dan Jurafsky

2 CS 224U Autumn 2007 Outline: Mainly useful background for today’s papers 1)Lexical Semantics, word-word-relations 2)WordNet 3)Word Similarity: Thesaurus-based Measures 4)Word Similarity: Distributional Measures 5)Background: Dependency Parsing

3 CS 224U Autumn 2007 Three Perspectives on Meaning 1.Lexical Semantics The meanings of individual words 2.Formal Semantics (or Compositional Semantics or Sentential Semantics) How those meanings combine to make meanings for individual sentences or utterances 3.Discourse or Pragmatics How those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse – Dialog or Conversation is often lumped together with Discourse

4 CS 224U Autumn 2007 Relationships between word meanings Homonymy Polysemy Synonymy Antonymy Hypernymy Hyponymy Meronymy

5 CS 224U Autumn 2007 Homonymy Homonymy: Lexemes that share a form –Phonological, orthographic or both But have unrelated, distinct meanings Clear example: – Bat (wooden stick-like thing) vs – Bat (flying scary mammal thing) –Or bank (financial institution) versus bank (riverside) Can be homophones, homographs, or both: –Homophones: Write and right; Piece and peace

6 CS 224U Autumn 2007 Homonymy causes problems for NLP applications Text-to-Speech Same orthographic form but different phonological form –bass vs bass Information retrieval Different meanings same orthographic form –QUERY: bat care Machine Translation Speech recognition Why?

7 CS 224U Autumn 2007 Polysemy The bank is constructed from red brick I withdrew the money from the bank Are those the same sense? Or consider the following WSJ example While some banks furnish sperm only to married women, others are less restrictive Which sense of bank is this? –Is it distinct from (homonymous with) the river bank sense? –How about the savings bank sense?

8 CS 224U Autumn 2007 Polysemy A single lexeme with multiple related meanings (bank the building, bank the financial institution) Most non-rare words have multiple meanings The number of meanings is related to its frequency Verbs tend more to polysemy Distinguishing polysemy from homonymy isn’t always easy (or necessary)

9 CS 224U Autumn 2007 Metaphor and Metonymy Specific types of polysemy Metaphor: Germany will pull Slovenia out of its economic slump. I spent 2 hours on that homework. Metonymy The White House announced yesterday. This chapter talks about part-of-speech tagging Bank (building) and bank (financial institution)

10 CS 224U Autumn 2007 Synonyms Words that have the same meaning in some or all contexts. filbert / hazelnut couch / sofa big / large automobile / car vomit / throw up Water / H2O Two lexemes are synonyms if they can be successfully substituted for each other in all situations If so they have the same propositional meaning

11 CS 224U Autumn 2007 Synonyms But there are few (or no) examples of perfect synonymy. Why should that be? Even if many aspects of meaning are identical Still may not preserve the acceptability based on notions of politeness, slang, register, genre, etc. Example: Water and H2O

12 CS 224U Autumn 2007 Some terminology Lemmas and wordforms A lexeme is an abstract pairing of meaning and form A lemma or citation form is the grammatical form that is used to represent a lexeme. –Carpet is the lemma for carpets –Dormir is the lemma for duermes. Specific surface forms carpets, sung, duermes are called wordforms The lemma bank has two senses: Instead, a bank can hold the investments in a custodial account in the client’s name But as agriculture burgeons on the east bank, the river will shrink even more. A sense is a discrete representation of one aspect of the meaning of a word

13 CS 224U Autumn 2007 Synonymy is a relation between senses rather than words Consider the words big and large Are they synonyms? How big is that plane? Would I be flying on a large or small plane? How about here: Miss Nelson, for instance, became a kind of big sister to Benjamin. ?Miss Nelson, for instance, became a kind of large sister to Benjamin. Why? big has a sense that means being older, or grown up large lacks this sense

14 CS 224U Autumn 2007 Antonyms Senses that are opposites with respect to one feature of their meaning Otherwise, they are very similar! dark / light short / long hot / cold up / down in / out More formally: antonyms can define a binary opposition or be at opposite ends of a scale (long/short, fast/slow) Be reversives: rise/fall, up/down

15 CS 224U Autumn 2007 Hyponymy One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other car is a hyponym of vehicle dog is a hyponym of animal mango is a hyponym of fruit Conversely vehicle is a hypernym/superordinate of car animal is a hypernym of dog fruit is a hypernym of mango
superordinate | vehicle | fruit | furniture | mammal
hyponym | car | mango | chair | dog

16 CS 224U Autumn 2007 Hypernymy more formally Extensional: The class denoted by the superordinate extensionally includes the class denoted by the hyponym Entailment: A sense A is a hyponym of sense B if being an A entails being a B Hyponymy is usually transitive (A hypo B and B hypo C entails A hypo C)

17 CS 224U Autumn 2007 II. WordNet A hierarchically organized lexical database On-line thesaurus + aspects of a dictionary –Versions for other languages are under development
Category | Unique Forms
Noun | 117,097
Verb | 11,488
Adjective | 22,141
Adverb | 4,601

18 CS 224U Autumn 2007 WordNet Where it is:

19 CS 224U Autumn 2007 Format of Wordnet Entries

20 CS 224U Autumn 2007 WordNet Noun Relations

21 CS 224U Autumn 2007 WordNet Verb Relations

22 CS 224U Autumn 2007 WordNet Hierarchies

23 CS 224U Autumn 2007 How is “sense” defined in WordNet? The set of near-synonyms for a WordNet sense is called a synset (synonym set); it’s their version of a sense or a concept Example: chump as a noun to mean ‘a person who is gullible and easy to take advantage of’ Each of these senses shares this same gloss Thus for WordNet, the meaning of this sense of chump is this list.

24 CS 224U Autumn 2007 Word Similarity Synonymy is a binary relation Two words are either synonymous or not We want a looser metric Word similarity or Word distance Two words are more similar If they share more features of meaning Actually these are really relations between senses: Instead of saying “bank is like fund” We say –Bank1 is similar to fund3 –Bank2 is similar to slope5 We’ll compute them over both words and senses

25 CS 224U Autumn 2007 Why word similarity Spell Checking Information retrieval Question answering Machine translation Natural language generation Language modeling Automatic essay grading

26 CS 224U Autumn 2007 Two classes of algorithms Thesaurus-based algorithms Based on whether words are “nearby” in Wordnet Distributional algorithms By comparing words based on their distributional context in corpora

27 CS 224U Autumn 2007 Thesaurus-based word similarity We could use anything in the thesaurus Meronymy, hyponymy, troponymy Glosses and example sentences Derivational relations and sentence frames In practice By “thesaurus-based” we often mean these 2 cues: –the is-a/subsumption/hypernym hierarchy –Sometimes using the glosses too Word similarity versus word relatedness Similar words are near-synonyms Related could be related any way –Car, gasoline: related, not similar –Car, bicycle: similar

28 CS 224U Autumn 2007 Path based similarity Two words are similar if nearby in thesaurus hierarchy (i.e. short path between them)

29 CS 224U Autumn 2007 Refinements to path-based similarity pathlen(c1,c2) = number of edges in the shortest path in the thesaurus graph between the sense nodes c1 and c2
$\mathrm{sim}_{\mathrm{path}}(c_1,c_2) = -\log \mathrm{pathlen}(c_1,c_2)$
$\mathrm{wordsim}(w_1,w_2) = \max_{c_1 \in \mathrm{senses}(w_1),\, c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}(c_1,c_2)$
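The definitions on this slide can be tried out on a tiny graph. The sketch below is only an illustration: the hierarchy is a hand-made toy (not real WordNet data), the function names are made up for this example, and identical nodes are given infinite similarity because -log 0 is undefined.

```python
import math
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parent); not real WordNet data.
HYPERNYM = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "currency",
    "currency": "medium_of_exchange",
    "money": "medium_of_exchange",
    "medium_of_exchange": "standard",
}

def neighbors(node):
    """Parent and children of a node, treating is-a edges as undirected."""
    out = [child for child, parent in HYPERNYM.items() if parent == node]
    if node in HYPERNYM:
        out.append(HYPERNYM[node])
    return out

def pathlen(c1, c2):
    """Number of edges on the shortest path between c1 and c2 (breadth-first search)."""
    seen, queue = {c1}, deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no path between the two nodes")

def sim_path(c1, c2):
    """-log pathlen; identical nodes (path length 0) are treated as infinitely similar."""
    d = pathlen(c1, c2)
    return math.inf if d == 0 else -math.log(d)

def word_sim(senses1, senses2):
    """Maximum sense-to-sense similarity over two lists of sense nodes."""
    return max(sim_path(c1, c2) for c1 in senses1 for c2 in senses2)
```

In this toy graph nickel and dime are two edges apart, while nickel is four edges from both money and standard, so the path measure cannot tell the last two pairs apart.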

30 CS 224U Autumn 2007 Problem with basic path-based similarity Assumes each link represents a uniform distance Nickel to money seem closer than nickel to standard Instead: Want a metric which lets us Represent the cost of each edge independently

31 CS 224U Autumn 2007 Information content similarity metrics Let’s define P(C) as: The probability that a randomly selected word in a corpus is an instance of concept c Formally: there is a distinct random variable, ranging over words, associated with each concept in the hierarchy P(root)=1 The lower a node in the hierarchy, the lower its probability

32 CS 224U Autumn 2007 Information content similarity Train by counting in a corpus 1 instance of “dime” could count toward frequency of coin, currency, standard, etc More formally:
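The formula that followed "More formally:" did not survive in this transcript. A minimal sketch of the counting idea, assuming the common formulation in which P(c) is the number of corpus tokens subsumed by c divided by the total token count N. The hierarchy, the counts, and the one-sense-per-word simplification are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent) and corpus counts; not real data.
HYPERNYM = {
    "nickel": "coin",
    "dime": "coin",
    "coin": "currency",
    "currency": "standard",
    "scale": "standard",
}
WORD_COUNTS = {"nickel": 3, "dime": 2, "coin": 5, "currency": 4, "scale": 6}

def ancestors_and_self(concept):
    """Yield the concept and every hypernym above it, up to the root."""
    while concept is not None:
        yield concept
        concept = HYPERNYM.get(concept)

def concept_probabilities(word_counts):
    """Each token counts toward its own concept and every concept above it."""
    total = sum(word_counts.values())
    concept_counts = Counter()
    for word, n in word_counts.items():
        for c in ancestors_and_self(word):
            concept_counts[c] += n
    return {c: n / total for c, n in concept_counts.items()}

P = concept_probabilities(WORD_COUNTS)
```

Here every token ends up counted at the root, so the root gets probability 1, and probabilities shrink as you move down the hierarchy, matching the properties listed on the previous slide.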

33 CS 224U Autumn 2007 Information content similarity WordNet hierarchy augmented with probabilities P(C)

34 CS 224U Autumn 2007 Information content: definitions Information content: IC(c)=-logP(c) Lowest common subsumer LCS(c1,c2) = the lowest common subsumer –I.e. the lowest node in the hierarchy –That subsumes (is a hypernym of) both c1 and c2 We are now ready to see how to use information content IC as a similarity metric

35 CS 224U Autumn 2007 Resnik method The similarity between two words is related to their common information The more two words have in common, the more similar they are Resnik: measure the common information as: The info content of the lowest common subsumer of the two nodes
$\mathrm{sim}_{\mathrm{resnik}}(c_1,c_2) = -\log P(\mathrm{LCS}(c_1,c_2))$

36 CS 224U Autumn 2007 Dekang Lin method Similarity between A and B needs to do more than measure common information The more differences between A and B, the less similar they are: Commonality: the more info A and B have in common, the more similar they are Difference: the more differences between the info in A and B, the less similar Commonality: IC(common(A,B)) Difference: IC(description(A,B)) - IC(common(A,B))

37 CS 224U Autumn 2007 Dekang Lin method Similarity theorem: The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are
$\mathrm{sim}_{\mathrm{Lin}}(A,B) = \frac{\log P(\mathrm{common}(A,B))}{\log P(\mathrm{description}(A,B))}$
Lin furthermore shows (modifying Resnik) that info in common is twice the info content of the LCS

38 CS 224U Autumn 2007 Lin similarity function
$\mathrm{sim}_{\mathrm{Lin}}(c_1,c_2) = \frac{2 \log P(\mathrm{LCS}(c_1,c_2))}{\log P(c_1) + \log P(c_2)}$
$\mathrm{sim}_{\mathrm{Lin}}(\mathit{hill},\mathit{coast}) = \frac{2 \log P(\text{geological-formation})}{\log P(\mathit{hill}) + \log P(\mathit{coast})} = 0.59$
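A small sketch of the Resnik and Lin measures on the hill/coast example. The hierarchy fragment and the probabilities are assumptions supplied for illustration (the figure with the actual values is not in this transcript); they were chosen so that the Lin score lands near the .59 quoted on the slide.

```python
import math

# Assumed fragment of the hierarchy (child -> parent); illustrative only.
HYPERNYM = {
    "hill": "natural_elevation",
    "coast": "shore",
    "natural_elevation": "geological_formation",
    "shore": "geological_formation",
    "geological_formation": "natural_object",
}
# Assumed concept probabilities P(c); illustrative only.
P = {
    "hill": 0.0000189,
    "coast": 0.0000216,
    "natural_elevation": 0.000113,
    "shore": 0.0000836,
    "geological_formation": 0.00176,
    "natural_object": 0.0163,
}

def path_to_root(c):
    """The concept followed by all of its hypernyms, lowest first."""
    path = [c]
    while c in HYPERNYM:
        c = HYPERNYM[c]
        path.append(c)
    return path

def lcs(c1, c2):
    """Lowest common subsumer: first node above c1 that is also above c2."""
    above_c2 = set(path_to_root(c2))
    for node in path_to_root(c1):
        if node in above_c2:
            return node
    raise ValueError("no common subsumer")

def sim_resnik(c1, c2):
    """Information content of the lowest common subsumer."""
    return -math.log(P[lcs(c1, c2)])

def sim_lin(c1, c2):
    """Twice the log-probability of the LCS over the summed log-probabilities."""
    return 2 * math.log(P[lcs(c1, c2)]) / (math.log(P[c1]) + math.log(P[c2]))
```

With these assumed numbers, `round(sim_lin("hill", "coast"), 2)` gives 0.59. Note that the Lin score of a concept with itself is 1, whereas the Resnik score of a concept with itself is just its own information content.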

39 CS 224U Autumn 2007 Extended Lesk Two concepts are similar if their glosses contain similar words Drawing paper: paper that is specially prepared for use in drafting Decal: the art of transferring designs from specially prepared paper to a wood or glass or metal surface For each n-word phrase that occurs in both glosses Add a score of n^2 “paper” and “specially prepared”: 1 + 4 = 5…
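The n-squared overlap score can be sketched as a greedy loop: find the longest phrase shared by the two glosses, add the square of its length, mask it out, and repeat. This is a simplified illustration with made-up function names; it compares only the two glosses given and does not look at glosses of related synsets or filter function words.

```python
def longest_common_phrase(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared run of words."""
    best = (0, 0, 0)
    for i in range(len(a)):
        for j in range(len(b)):
            n = 0
            while (i + n < len(a) and j + n < len(b)
                   and a[i + n] is not None and a[i + n] == b[j + n]):
                n += 1
            if n > best[0]:
                best = (n, i, j)
    return best

def overlap_score(gloss1, gloss2):
    """Sum of n**2 over shared phrases, longest phrases first."""
    a, b = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n
        # Mask the matched words so they cannot be counted again.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n

drawing_paper = "paper that is specially prepared for use in drafting"
decal = ("the art of transferring designs from specially prepared paper "
         "to a wood or glass or metal surface")
score = overlap_score(drawing_paper, decal)
```

For the two glosses on the slide the shared phrases are "specially prepared" (2 words, score 4) and "paper" (1 word, score 1), so `score` is 5.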

40 CS 224U Autumn 2007 Summary: thesaurus-based similarity

41 CS 224U Autumn 2007 Evaluating thesaurus-based similarity Intrinsic Evaluation: Correlation coefficient between –algorithm scores –word similarity ratings from humans Extrinsic (task-based, end-to-end) Evaluation: Embed in some end application –Malapropism (spelling error) detection –WSD –Essay grading –Language modeling in some application
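For the intrinsic evaluation, one way to compute a correlation coefficient is Pearson's r between the algorithm's scores and the human ratings for the same word pairs. The sketch below uses invented numbers; a rank correlation such as Spearman's could be substituted, and the slide does not say which coefficient was intended.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for the same five word pairs (not real data).
human_ratings = [3.9, 3.5, 2.1, 0.8, 0.2]
algorithm_scores = [0.92, 0.80, 0.55, 0.30, 0.05]
r = pearson(algorithm_scores, human_ratings)
```

A value of `r` close to 1 means the algorithm ranks and spaces the pairs much as the human judges do.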

42 CS 224U Autumn 2007 Problems with thesaurus-based methods We don’t have a thesaurus for every language Even if we do, many words are missing They rely on hyponym info: Strong for nouns, but lacking for adjectives and even verbs Alternative Distributional methods for word similarity

43 CS 224U Autumn 2007 Distributional methods for word similarity Firth (1957): “You shall know a word by the company it keeps!” Nida example noted by Lin: A bottle of tezgüino is on the table Everybody likes tezgüino Tezgüino makes you drunk We make tezgüino out of corn. Intuition: just from these contexts a human could guess meaning of tezgüino So we should look at the surrounding contexts, see what other words have similar context.

44 CS 224U Autumn 2007 Context vector Consider a target word w Suppose we had one binary feature $f_i$ for each of the N words in the lexicon $v_i$ Which means “word $v_i$ occurs in the neighborhood of w” $w = (f_1, f_2, f_3, \ldots, f_N)$ If w = tezgüino, $v_1$ = bottle, $v_2$ = drunk, $v_3$ = matrix: $w = (1, 1, 0, \ldots)$

45 CS 224U Autumn 2007 Intuition Define two words by these sparse feature vectors Apply a vector distance metric Say that two words are similar if two vectors are similar

46 CS 224U Autumn 2007 Distributional similarity So we just need to specify 3 things 1.How the co-occurrence terms are defined 2.How terms are weighted –(frequency? Logs? Mutual information?) 3.What vector distance metric should we use? –Cosine? Euclidean distance?
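One concrete combination of the three choices, as a sketch: co-occurrence within a fixed word window, binary weights, and cosine as the vector measure. The corpus, the window size of 3, and the function names are assumptions made for this example (the corpus spells the word without the umlaut to keep the strings plain ASCII).

```python
import math

# Toy corpus; illustrative sentences only.
SENTENCES = [
    "a bottle of tezguino is on the table",
    "tezguino makes you drunk",
    "a bottle of wine is on the table",
    "wine makes you drunk",
    "the matrix is on the board",
]
VOCAB = sorted({tok for s in SENTENCES for tok in s.split()})

def context_vector(target, sentences, vocab, window=3):
    """Binary vector: 1 if the vocabulary word occurs within `window` words of target."""
    seen = set()
    for sent in sentences:
        toks = sent.split()
        for i, tok in enumerate(toks):
            if tok == target:
                seen.update(toks[max(0, i - window):i])
                seen.update(toks[i + 1:i + 1 + window])
    return [1 if v in seen else 0 for v in vocab]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

tez = context_vector("tezguino", SENTENCES, VOCAB)
wine = context_vector("wine", SENTENCES, VOCAB)
matrix = context_vector("matrix", SENTENCES, VOCAB)
```

In this toy corpus tezguino and wine occur in identical contexts, so their cosine is 1, while tezguino and matrix share only a few function-word contexts and score lower. That residual overlap on words like "the" and "is" is one reason stopwords are usually removed.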

47 CS 224U Autumn 2007 Defining co-occurrence vectors We could have windows of neighboring words Bag-of-words We generally remove stopwords But the vectors are still very sparse So instead of using ALL the words in the neighborhood Let’s just use the words occurring in particular relations

48 CS 224U Autumn 2007 Defining co-occurrence vectors Zellig Harris (1968) The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities Idea: parse the sentence, extract syntactic dependencies:

49 CS 224U Autumn 2007 Quick background: Dependency Parsing Among the earliest kinds of parsers in NLP Drew linguistic insights from the work of L. Tesniere (1959) David Hays, one of the founders of computational linguistics, built early (first?) dependency parser (Hays 1962) The idea dates back to the ancient Greek and Indian grammarians of “parsing” into subject and predicate A sentence is parsed by relating each word to other words in the sentence which depend on it.

50 CS 224U Autumn 2007 A sample dependency parse

51 CS 224U Autumn 2007 Dependency parsers MINIPAR is Lin’s parser Another one is the Link Grammar parser: Standard “CFG” parsers like the Stanford parser parser.shtml can also produce dependency representations, as follows

52 CS 224U Autumn 2007 The relationship between a CFG parse and a dependency parse (1)

53 CS 224U Autumn 2007 The relationship between a CFG parse and a dependency parse (2)

54 CS 224U Autumn 2007 Conversion from CFG to dependency parse CFG’s include “head rules” The head of a Noun Phrase is a noun The head of a Verb Phrase is a verb. Etc. The head rules can be used to extract a dependency parse from a CFG parse (follow the heads).
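"Follow the heads" can be sketched in a few lines. The tree encoding, the tiny head-rule table, and the fallback to the rightmost child are all simplifying assumptions for illustration; real head-rule tables are much larger and the resulting dependencies usually carry relation labels.

```python
# Hypothetical, heavily simplified head rules: for each phrase label,
# the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD", "VB"],
    "NP": ["NN", "NNS", "NNP"],
}

def head_child(label, children):
    """Pick the head child of a phrase using the rule table."""
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]  # fallback assumption: rightmost child

def dependencies(tree):
    """Return (root word, list of (head word, dependent word) pairs)."""
    deps = []

    def head_word(node):
        label, body = node
        if isinstance(body, str):  # preterminal: (tag, word)
            return body
        head = head_child(label, body)
        word = head_word(head)
        for child in body:
            if child is not head:
                deps.append((word, head_word(child)))
        return word

    return head_word(tree), deps

# A phrase is (label, [children]); a preterminal is (tag, word).
TREE = ("S", [
    ("NP", [("DT", "the"), ("NN", "dog")]),
    ("VP", [("VBD", "chased"),
            ("NP", [("DT", "a"), ("NN", "cat")])]),
])
root, deps = dependencies(TREE)
```

For this tree the verb "chased" becomes the root, with "dog" and "cat" depending on it and each determiner depending on its noun.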

55 CS 224U Autumn 2007 Popping back: Co-occurrence vectors based on dependencies For the word “cell”: vector of NxR features R is the number of dependency relations

56 CS 224U Autumn 2007 Weighting the counts (“Measures of association with context”) We have been using the frequency of some feature as its weight or value But we could use any function of this frequency Let’s consider one feature f = (r, w’) = (obj-of, attack) $P(f \mid w) = \mathrm{count}(f,w) / \mathrm{count}(w)$ $\mathrm{assoc}_{\mathrm{prob}}(w,f) = P(f \mid w)$

57 CS 224U Autumn 2007 Intuition: why not frequency “drink it” is more common than “drink wine” But “wine” is a better “drinkable” thing than “it” Idea: We need to control for chance (expected frequency) We do this by normalizing by the expected frequency we would get assuming independence

58 CS 224U Autumn 2007 Weighting: Mutual Information Mutual information: between 2 random variables X and Y Pointwise mutual information: measure of how often two events x and y occur, compared with what we would expect if they were independent:
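The formulas for this slide are missing from the transcript. The standard textbook definitions, given here as an assumption about what the slide showed, are:

```latex
I(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log_2 \frac{P(x,y)}{P(x)\,P(y)}
\qquad\qquad
\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}
```

The denominator P(x)P(y) is the joint probability the two events would have if they were independent, so PMI is positive when x and y co-occur more often than chance and negative when they co-occur less often.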

59 CS 224U Autumn 2007 Weighting: Mutual Information Pointwise mutual information: measure of how often two events x and y occur, compared with what we would expect if they were independent: PMI between a target word w and a feature f :
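A sketch of PMI between a target word and a dependency feature, assuming the usual form log2 of P(w,f) over P(w)P(f) with probabilities estimated from co-occurrence counts. The counts are invented to mirror the "drink it" versus "drink wine" intuition; the function assumes the pair was observed at least once.

```python
import math
from collections import Counter

# Features are (relation, word) pairs, as in f = (r, w').
DRINK = ("obj-of", "drink")
SEE = ("obj-of", "see")

# Hypothetical counts of (target word, feature) pairs; not real data.
PAIR_COUNTS = Counter({
    ("it", DRINK): 3,
    ("wine", DRINK): 2,
    ("it", SEE): 40,
    ("wine", SEE): 1,
})

def pmi(word, feature, pair_counts=PAIR_COUNTS):
    """log2 of P(w,f) / (P(w) P(f)); assumes the pair has a nonzero count."""
    total = sum(pair_counts.values())
    p_wf = pair_counts[(word, feature)] / total
    p_w = sum(n for (w, _), n in pair_counts.items() if w == word) / total
    p_f = sum(n for (_, f), n in pair_counts.items() if f == feature) / total
    return math.log2(p_wf / (p_w * p_f))
```

With these counts "it" is the more frequent object of "drink", yet `pmi("wine", DRINK)` is positive and `pmi("it", DRINK)` is negative, because "it" is frequent everywhere and its co-occurrence with "drink" is below what chance would predict.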

60 CS 224U Autumn 2007 Mutual information intuition Objects of the verb drink

61 CS 224U Autumn 2007 Lin is a variant on PMI Pointwise mutual information: measure of how often two events x and y occur, compared with what we would expect if they were independent: PMI between a target word w and a feature f : Lin measure: breaks down expected value for P(f) differently:

62 CS 224U Autumn 2007 Summary: weightings See Manning and Schuetze (1999) for more

63 CS 224U Autumn 2007 Defining similarity between vectors

64 CS 224U Autumn 2007 Summary of similarity measures

65 CS 224U Autumn 2007 Evaluating similarity Intrinsic Evaluation: Correlation coefficient between algorithm scores –And word similarity ratings from humans Extrinsic (task-based, end-to-end) Evaluation: –Malapropism (spelling error) detection –WSD –Essay grading –Taking TOEFL multiple-choice vocabulary tests –Language modeling in some application

66 CS 224U Autumn 2007 Part III: Natural Language Processing An example of detected plagiarism

67 CS 224U Autumn 2007 What about other relations? Similarity can be used for adding new links to a thesaurus, and Lin used thesaurus induction as his motivation But thesauruses have more structure than just similarity In particular, hyponym/hypernym structure

68 CS 224U Autumn 2007 Detecting hyponymy and other relations Could we discover new hyponyms, and add them to a taxonomy under the appropriate hypernym? Why is this important? Some examples from Rion Snow: “insulin” and “progesterone” are in WN 2.1, but “leptin” and “pregnenolone” are not. “combustibility” and “navigability”, but not “affordability”, “reusability”, or “extensibility”. “HTML” and “SGML”, but not “XML” or “XHTML”. “Google” and “Yahoo”, but not “Microsoft” or “IBM”. This unknown word problem occurs throughout NLP

69 CS 224U Autumn 2007 Hearst Approach Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use. What does Gelidium mean? How do you know?

70 CS 224U Autumn 2007 Hearst’s hand-built patterns
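The table of patterns is not in this transcript, but the "such as" construction from the agar example can be sketched with a regular expression. This is a crude illustration: noun phrases are approximated as one or two words, only the first item after "such as" is captured, and the function name is made up for the example.

```python
import re

# One Hearst-style pattern: "<hypernym NP>, such as <hyponym>".
# Noun phrases are approximated by one or two words (a strong simplification).
SUCH_AS = re.compile(r"((?:\w+ )?\w+), such as (\w+)")

def extract_hyponym_pairs(text):
    """Return (hyponym, hypernym) pairs suggested by the 'such as' pattern."""
    return [(hypo, hyper) for hyper, hypo in SUCH_AS.findall(text)]

sentence = ("Agar is a substance prepared from a mixture of red algae, "
            "such as Gelidium, for laboratory or industrial use.")
pairs = extract_hyponym_pairs(sentence)
```

On the agar sentence this yields the pair ("Gelidium", "red algae"), which is the inference a reader makes without knowing the word beforehand. A practical version would run over parsed or chunked noun phrases rather than raw word sequences.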

71 CS 224U Autumn 2007 What to do for the data assignments Some things people did last year on the wordnet assignment Notice interesting inconsistencies or incompleteness in Wordnet There is no link in the WordNet synset between "kitten" or "kitty" and "cat". –But the entry for "puppy" lists "dog" as a direct hypernym but does not list "young mammal" as one. "Sister term" relation is nontransitive and nonsymmetric "entailment" relation incomplete; "snore" entails "sleep," but "die" doesn't entail "live." antonymy is not a reflexive relation in WordNet Notice potential problems in wordnet Lots of rare senses Lots of senses are very very similar, hard to distinguish Lack of rich detail about each entry (focus only on rich relational info)

72 CS 224U Autumn 2007 Notice interesting things It appears that WordNet verbs do not follow as strict a hierarchy as the nouns. What percentage of words have one sense?