Lecture 26 Lexical Semantics


Lecture 26 Lexical Semantics October 6, 2005 11/28/2018

Meaning
Traditionally, meaning in language has been studied from three perspectives:
- The meaning of a text or discourse
- The meanings of individual sentences or utterances
- The meanings of individual words
We started in the middle; now we'll look at the meanings of individual words.

Word Meaning
We didn't assume much about the meaning of words when we talked about sentence meanings:
- Verbs provided a template-like predicate-argument structure
- Nouns were practically meaningless constants
There has to be more to it than that: the internal structure of a word determines where it can go and what it can do (syntagmatic).

What's a word?
Words?: types, tokens, stems, roots, inflected forms?
- Lexeme: an entry in a lexicon, consisting of a pairing of a form with a single meaning representation
- Lexicon: a collection of lexemes

Lexical Semantics
- The linguistic study of the systematic, meaning-related structure of lexemes is called lexical semantics.
- A lexeme is an individual entry in the lexicon.
- A lexicon is a meaning structure that holds the meaning relations of lexemes.
- A lexeme may have different meanings; each meaning component of a lexeme is known as one of its senses.
  - Different senses of the lexeme duck: an animal, to lower the head, ...
  - Different senses of the lexeme yüz (Turkish): face, to swim, to skin, the front of something, hundred, ...
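The lexeme/lexicon definitions above can be sketched in code. This is a minimal illustration only; the `Lexeme` and `Lexicon` classes and their methods are invented for this sketch, not a standard API.

```python
from dataclasses import dataclass, field

@dataclass
class Lexeme:
    """A lexicon entry: an orthographic form paired with its senses."""
    form: str
    senses: list  # one short gloss string per sense

@dataclass
class Lexicon:
    """A collection of lexemes, indexed by form."""
    entries: dict = field(default_factory=dict)

    def add(self, lexeme):
        self.entries[lexeme.form] = lexeme

    def senses_of(self, form):
        return self.entries[form].senses if form in self.entries else []

lex = Lexicon()
lex.add(Lexeme("duck", ["an aquatic bird", "to lower the head quickly"]))
lex.add(Lexeme("yüz", ["face", "to swim", "to skin", "hundred"]))

print(lex.senses_of("duck"))  # ['an aquatic bird', 'to lower the head quickly']
```

A lexeme with more than one sense (like duck or yüz) is exactly the ambiguity that word sense disambiguation, discussed later, must resolve.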

Outline: Lexical Semantics
- Concepts about word meaning, including homonymy, polysemy, synonymy
- Thematic roles
- Two computational areas:
  - Enabling resource: WordNet
  - Enabling technology: word sense disambiguation

Relations Among Lexemes and Their Senses
- Homonymy
- Polysemy
- Synonymy
- Hyponymy
- Hypernymy

Homonymy
- Homonymy is a relation that holds between words that have the same form but unrelated meanings.
  - Bank: financial institution vs. river bank. For this, we should create two senses of the lexeme bank.
  - Bat: (wooden stick-like thing) vs. (flying scary mammal thing)
- The problematic part of understanding homonymy isn't the forms, it's the meanings. Homonymy causes ambiguity.
- Nothing particularly important would happen to anything else in English if we used a different word for the little flying mammal things.

Homonymy causes problems for NLP applications:
- Text-to-speech: same orthographic form but different phonological form (bass vs. bass)
- Information retrieval: different meanings, same orthographic form (QUERY: bat care)
- Machine translation
- Speech recognition: why?

Polysemy
- Polysemy is the phenomenon of multiple related meanings within a single lexeme.
  - Bank: financial institution, blood bank. These senses are related. Are we going to create a single sense or two different senses?
  - While some banks furnish sperm only to married women, others are less restrictive.
  - Serve: Which flights serve breakfast? Does America West serve Philadelphia? Does United serve breakfast and San Jose?
- Most non-rare words have multiple meanings; the number of meanings is related to a word's frequency.
- Verbs tend more to polysemy.
- Distinguishing polysemy from homonymy isn't always easy (or necessary).

Synonymy
- Synonymy is the phenomenon of two different lexemes having the same meaning (big and large). More precisely, one sense of each of the two lexemes is the same.
- There aren't any true synonyms. Two lexemes are synonyms if they can be successfully substituted for each other in all situations.
- What does successfully mean? It preserves the meaning, but may not preserve the acceptability, based on notions of politeness, slang, ...
- Example: big and large?
  - That's my big sister / a big plane
  - That's my large sister / a large plane

Hyponymy and Hypernymy
- Hyponymy: one lexeme denotes a subclass of the other lexeme.
  - The more specific lexeme is a hyponym of the more general lexeme.
  - The more general lexeme is a hypernym of the more specific lexeme.
- A hyponymy relation can be asserted between two lexemes when the meanings of the lexemes entail a subset relation.
  - Since dogs are canids, dog is a hyponym of canid and canid is a hypernym of dog.
  - Car is a hyponym of vehicle; vehicle is a hypernym of car.
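Because hyponymy entails a subset relation, it is transitive, and a hypernym chain can be followed upward mechanically. A minimal sketch, using a tiny hand-built hierarchy (a real system would use WordNet instead):

```python
# Toy is-a hierarchy: child -> immediate hypernym. Hand-built for illustration.
HYPERNYM = {
    "dog": "canid",
    "canid": "mammal",
    "mammal": "animal",
    "car": "vehicle",
}

def is_hyponym_of(specific, general):
    """True if `specific` lies below `general` in the hypernym chain."""
    node = specific
    while node in HYPERNYM:
        node = HYPERNYM[node]
        if node == general:
            return True
    return False

print(is_hyponym_of("dog", "animal"))  # True, via canid -> mammal -> animal
print(is_hyponym_of("car", "animal"))  # False
```

The transitive closure computed here is exactly what the later slides exploit when they use WordNet hyponyms to encode selectional restrictions.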

Metaphor and Metonymy
Specific types of polysemy.
- Metaphor:
  - Germany will pull Slovenia out of its economic slump.
  - I spent 2 hours on that homework.
- Metonymy:
  - The White House announced yesterday.
  - This chapter talks about part-of-speech tagging.
  - Bank (building) and bank (financial institution)

Antonyms
Words that are opposites with respect to one feature of their meaning; otherwise, they are very similar!
- dark / light
- boy / girl
- hot / cold
- up / down
- in / out

Ontology
- The term ontology refers to a set of distinct objects resulting from analysis of a domain.
- A taxonomy is a particular arrangement of the elements of an ontology into a tree-like class-inclusion structure.
- A lexicon holds different senses of lexemes, together with other relations among lexemes.

Lexical Resources
There are lots of lexical resources available:
- Word lists
- On-line dictionaries
- Corpora
The most ambitious one is WordNet:
- A database of lexical relations for English
- Versions for other languages are under development

WordNet
WordNet is a widely used lexical database for English.
Web page: http://www.cogsci.princeton.edu/~wn/
It holds:
- The senses of the lexemes
- Relations among nouns, such as hypernym, hyponym, MemberOf, ...
- Relations among verbs, such as hypernym, ...
Relations are held for each different sense of a lexeme.

WordNet Demo: http://www.cogsci.princeton.edu/cgi-bin/webwn

Format of WordNet Entries

WordNet Relations
Some of the WordNet relations (for nouns)

WordNet Hierarchies
Hyponymy chains for the senses of the lexeme bass

WordNet - bass
The noun "bass" has 8 senses in WordNet.
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with the lowest voice)
4. sea bass, bass -- (the lean flesh of a saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American freshwater fish with lean flesh (especially of the genus Micropterus))
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
The adjective "bass" has 1 sense in WordNet.
1. bass, deep -- (having or denoting a low vocal or instrumental range; "a deep voice"; "a bass voice is lower than a baritone voice"; "a bass clarinet")

WordNet - bass Hyponyms
Results for "Hyponyms (...is a kind of this), full" search of noun "bass" (6 of 8 senses of bass)
Sense 2
bass, bass part -- (the lowest part in polyphonic music)
  => ground bass -- (a short melody in the bass that is constantly repeated)
  => thorough bass, basso continuo -- (a bass part written out in full and accompanied by figures for successive chords)
  => figured bass -- (a bass part in which the notes have numbers under them to indicate the chords to be played)
Sense 4
sea bass, bass -- (the lean flesh of a saltwater fish of the family Serranidae)
  => striped bass, striper -- (caught along the Atlantic coast of the United States)
Sense 5
freshwater bass, bass -- (any of various North American freshwater fish with lean flesh (especially of the genus Micropterus))
  => largemouth bass -- (flesh of largemouth bass)
  => smallmouth bass -- (flesh of smallmouth bass)
Sense 6
bass, bass voice, basso -- (the lowest adult male singing voice)
  => basso profundo -- (a very deep bass voice)
Sense 7
bass -- (the member with the lowest range of a family of musical instruments)
  => bass fiddle, bass viol, bull fiddle, double bass, contrabass, string bass -- (largest and lowest member of the violin family)
  => bass guitar -- (the lowest six-stringed guitar)
  => bass horn, sousaphone, tuba -- (the lowest brass wind instrument)
    => euphonium -- (a bass horn (brass wind instrument) that is the tenor of the tuba family)
    => helicon, bombardon -- (a tuba that coils over the shoulder of the musician)
  => bombardon, bombard -- (a large shawm; the bass member of the shawm family)
Sense 8
bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
  => freshwater bass -- (North American food and game fish)

WordNet - bass Synonyms
Results for "Synonyms, ordered by estimated frequency" search of noun "bass" (8 senses of bass)
Sense 1
bass -- (the lowest part of the musical range)
  => low pitch, low frequency -- (a pitch that is perceived as below other pitches)
Sense 2
bass, bass part -- (the lowest part in polyphonic music)
  => part, voice -- (the melody carried by a particular voice or instrument in polyphonic music; "he tried to sing the tenor part")
Sense 3
bass, basso -- (an adult male singer with the lowest voice)
  => singer, vocalist, vocalizer, vocaliser -- (a person who sings)
Sense 4
sea bass, bass -- (the lean flesh of a saltwater fish of the family Serranidae)
  => saltwater fish -- (flesh of fish from the sea used as food)
Sense 5
freshwater bass, bass -- (any of various North American freshwater fish with lean flesh (especially of the genus Micropterus))
  => freshwater fish -- (flesh of fish from fresh water used as food)
Sense 6
bass, bass voice, basso -- (the lowest adult male singing voice)
  => singing voice -- (the musical quality of the voice while singing)
Sense 7
bass -- (the member with the lowest range of a family of musical instruments)
  => musical instrument, instrument -- (any of various devices or contrivances that can be used to produce musical tones or sounds)
Sense 8
bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
  => percoid fish, percoid, percoidean -- (any of numerous spiny-finned fishes of the order Perciformes)

Internal Structure of Words
Paradigmatic relations connect lexemes together in particular ways, but they don't say anything about what the meaning representation of a particular lexeme should consist of.
Various approaches have been followed to describe the semantics of lexemes:
- Thematic roles in predicate-bearing lexemes
- Selectional restrictions on thematic roles
- Decompositional semantics of predicates
- Feature structures for nouns

Thematic Roles
Thematic roles provide a shallow semantic language for characterizing certain arguments of verbs. For example:
- Ali broke the glass. / Veli opened the door.
- Ali is the Breaker and the glass is the BrokenThing of a Breaking event; Veli is the Opener and the door is the OpenedThing of an Opening event. These are deep roles of the arguments of the events.
- Both of these events have actors that are the doers of a volitional event, and things affected by the action. A thematic role is a way of expressing this kind of commonality: AGENT and THEME are thematic roles.

Some Thematic Roles
- AGENT: the volitional causer of an event (She broke the door.)
- EXPERIENCER: the experiencer of an event (Ali has a headache.)
- FORCE: the non-volitional causer of an event (The wind blows debris.)
- THEME: the participant most directly affected by an event (She broke the door.)
- INSTRUMENT: an instrument used in an event (He opened it with a knife.)
- BENEFICIARY: a beneficiary of an event (I bought it for her.)
- SOURCE: the origin of the object of a transfer event (I flew from Rome.)
- GOAL: the destination of the object of a transfer event (I flew to Ankara.)
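An event with its role-labeled arguments can be represented very simply. A minimal sketch; the `Event` class and `describe` helper are invented for illustration, and the role inventory follows the list above:

```python
from dataclasses import dataclass

@dataclass
class Event:
    """A predicate with a mapping from thematic roles to their fillers."""
    predicate: str
    roles: dict  # thematic role name -> filler phrase

e1 = Event("break", {"AGENT": "she", "THEME": "the door"})
e2 = Event("open", {"AGENT": "he", "THEME": "it", "INSTRUMENT": "a knife"})

def describe(event):
    """Render an event as predicate(ROLE=filler, ...)."""
    parts = [f"{role}={filler}" for role, filler in event.roles.items()]
    return f"{event.predicate}({', '.join(parts)})"

print(describe(e1))  # break(AGENT=she, THEME=the door)
```

Note that the same role set covers both verbs: this is the commonality across predicates that thematic roles are meant to capture.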

Thematic Roles (cont.)
- Takes some of the work away from the verbs: it's not the case that every verb is unique and has to completely specify how all of its arguments uniquely behave.
- Provides a mechanism to organize semantic processing.
- Permits us to distinguish near surface-level semantics from deeper semantics.

Linking
Thematic roles, syntactic categories, and their positions in larger syntactic structures are all intertwined in complicated ways. For example:
- AGENTs are often subjects.
- In a VP -> V NP NP rule, the first NP is often a GOAL and the second a THEME.

More on Linking
- John opened the door (AGENT, THEME)
- The door was opened by John (THEME, AGENT)
- The door opened (THEME)
- John opened the door with the key (AGENT, THEME, INSTRUMENT)

Deeper Semantics
- He melted her reserve with a husky-voiced paean to her eyes.
- If we label the constituents He and her reserve as the Melter and the Melted, then those labels lose any meaning they might have had.
- If we make them AGENT and THEME, then we don't have the same problems.

Selectional Restrictions
- A selectional restriction augments thematic roles by allowing lexemes to place certain semantic restrictions on the lexemes and phrases that can accompany them in a sentence.
  - I want to eat someplace near Bilkent.
  - Now we can say that eat is a predicate that has an AGENT and a THEME, and that the AGENT must be capable of eating and the THEME must be capable of being eaten.
- Each sense of a verb can be associated with its own selectional restrictions:
  - THY serves NewYork. -- the direct object (THEME) is a place.
  - THY serves breakfast. -- the direct object (THEME) is a meal.
- We may use these selectional restrictions to disambiguate a sentence.
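The serve example can be sketched as code: each sense restricts the semantic type of its THEME, and the type of the observed argument selects the sense. The tiny type table, the sense names `serve-food` / `serve-place`, and the `disambiguate_serve` helper are all hand-built for illustration; a real system would consult WordNet types instead.

```python
# Hypothetical semantic types for some headwords.
TYPE_OF = {"breakfast": "meal", "dinner": "meal",
           "NewYork": "place", "Philadelphia": "place"}

# Each sense of "serve" restricts the semantic type of its THEME.
SERVE_SENSES = {
    "serve-food": "meal",    # THY serves breakfast.
    "serve-place": "place",  # THY serves NewYork.
}

def disambiguate_serve(theme):
    """Return the unique sense whose restriction matches the THEME's type."""
    theme_type = TYPE_OF.get(theme)
    matches = [s for s, req in SERVE_SENSES.items() if req == theme_type]
    return matches[0] if len(matches) == 1 else None

print(disambiguate_serve("breakfast"))  # serve-food
print(disambiguate_serve("NewYork"))    # serve-place
```

When the THEME's type is unknown or matches no restriction, the function returns None, which corresponds to the "no possible meanings" situation discussed in the later WSD slides.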

As Logical Statements
For eat:
Eating(e) ∧ Agent(e, x) ∧ Theme(e, y) ∧ Isa(y, Food)
(adding in all the right quantifiers and lambdas)

WordNet
Use WordNet hyponyms (types) to encode the selectional restrictions.

Specificity of Restrictions
- What can you say about the THEME in each case with respect to the verb? Some will be high up in the WordNet hierarchy, others not so high.
- Problems: unfortunately, verbs are polysemous and language is creative:
  - ... ate glass on an empty stomach accompanied only by water and tea
  - ... you can't eat gold for lunch if you're hungry
  - ... get it to try to eat Afghanistan

Discovering the Restrictions
Instead of hand-coding the restrictions for each verb, can we discover a verb's restrictions by using a corpus and WordNet?
1. Parse sentences and find heads
2. Label the thematic roles
3. Collect statistics on the co-occurrence of particular headwords with particular thematic roles
4. Use the WordNet hypernym structure to find the most meaningful level to use as a restriction

Motivation
Find the lowest (most specific) common ancestor that covers a significant number of the examples.
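The "lowest common ancestor" step can be sketched on a toy hypernym table (hand-built here; a real system would walk WordNet's hypernym structure):

```python
# Toy hypernym table: child -> immediate hypernym. Illustrative only.
HYPERNYM = {"breakfast": "meal", "dinner": "meal", "meal": "food",
            "apple": "fruit", "fruit": "food", "food": "entity"}

def ancestors(word):
    """The word itself followed by its hypernym chain, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def lowest_common_ancestor(words):
    """Most specific node that lies on every word's hypernym chain."""
    common = set(ancestors(words[0]))
    for w in words[1:]:
        common &= set(ancestors(w))
    for node in ancestors(words[0]):  # chains are ordered specific -> general
        if node in common:
            return node
    return None

print(lowest_common_ancestor(["breakfast", "dinner"]))  # meal
print(lowest_common_ancestor(["breakfast", "apple"]))   # food
```

Observed THEME headwords for serve such as breakfast and dinner would thus generalize to "meal", while a more mixed set of headwords would force a more general (less informative) restriction.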

Word-Sense Disambiguation
Word sense disambiguation refers to the process of selecting the right sense for a word from among the senses that the word is known to have.
Semantic selectional restrictions can be used to disambiguate:
- Ambiguous arguments to unambiguous predicates
- Ambiguous predicates with unambiguous arguments
- Ambiguity all around

Word-Sense Disambiguation
We can use selectional restrictions for disambiguation:
- He cooked simple dishes. / He broke the dishes.
But sometimes selectional restrictions will not be enough to disambiguate:
- What kind of dishes do you recommend? -- we cannot know which sense is used.
- There can be two (or more) lexemes with multiple senses: They serve vegetarian dishes.
Selectional restrictions may also block the finding of any meaning:
- If you want to kill Turkey, eat its banks.
- Kafayı yedim. (Turkish idiom; literally "I ate the head", i.e. "I lost my mind.")
These situations leave the system with no possible meanings, and they can indicate a metaphor.

WSD and Selection Restrictions
- Ambiguous arguments: prepare a dish / wash a dish
- Ambiguous predicates: serve Denver / serve breakfast
- Both: serves vegetarian dishes

WSD and Selection Restrictions
- This approach is complementary to the compositional analysis approach.
- You need a parse tree and some form of predicate-argument analysis derived from:
  - the tree and its attachments
  - all the word senses coming up from the lexemes at the leaves of the tree
- Ill-formed analyses are eliminated by noting any selectional restriction violations.

Problems
- As we saw last time, selectional restrictions are violated all the time. This doesn't mean that the sentences are ill-formed or preferred less than others.
- This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated.

WSD Tags
What's a tag? A dictionary sense?
For example, for WordNet an instance of "bass" in a text has 8 possible tags or labels (bass1 through bass8).

WordNet Bass
The noun "bass" has 8 senses in WordNet:
1. bass -- (the lowest part of the musical range)
2. bass, bass part -- (the lowest part in polyphonic music)
3. bass, basso -- (an adult male singer with the lowest voice)
4. sea bass, bass -- (flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass -- (any of various North American lean-fleshed freshwater fishes especially of the genus Micropterus)
6. bass, bass voice, basso -- (the lowest adult male singing voice)
7. bass -- (the member with the lowest range of a family of musical instruments)
8. bass -- (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Representations
- Most supervised ML approaches require a very simple representation for the input training data: vectors of sets of feature/value pairs, i.e. files of comma-separated values.
- So our first task is to extract training data from a corpus with respect to a particular instance of a target word. This typically consists of a characterization of the window of text surrounding the target.

Representations
This is where ML and NLP intersect:
- If you stick to trivial surface features that are easy to extract from a text, then most of the work is in the ML system.
- If you decide to use features that require more analysis (say, parse trees), then the ML part may be doing less work (relatively) if these features are truly informative.

Surface Representations
Collocational and co-occurrence information:
- Collocational: encode features about the words that appear in specific positions to the right and left of the target word. Often limited to the words themselves as well as their part of speech.
- Co-occurrence: features characterizing the words that occur anywhere in the window, regardless of position. Typically limited to frequency counts.

Collocational
Position-specific information about the words in the window:
- guitar and bass player stand
- [guitar, NN, and, CJC, player, NN, stand, VVB]
In other words, a vector consisting of [position n word, position n part-of-speech, ...]
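The feature vector above can be extracted mechanically from a tagged sentence. A minimal sketch, assuming a ±2 word window; the POS tags are hand-assigned here (a real pipeline would run a tagger first), and the `<pad>` symbol for out-of-range positions is an invented convention:

```python
def collocational_features(tagged, target_index, window=2):
    """Return [word, tag] pairs for positions -window..+window around the
    target word, skipping the target itself and padding past the edges."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the target word itself is not a feature
        i = target_index + offset
        if 0 <= i < len(tagged):
            feats.extend(tagged[i])
        else:
            feats.extend(["<pad>", "<pad>"])
    return feats

sent = [("guitar", "NN"), ("and", "CJC"), ("bass", "NN"),
        ("player", "NN"), ("stand", "VVB")]
print(collocational_features(sent, 2))
# ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']
```

With the target "bass" at index 2, this reproduces exactly the vector on the slide.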

Co-occurrence
Information about the words that occur within the window:
- First derive a set of terms to place in the vector.
- Then note how often each of those terms occurs in a given window.
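The two steps above can be sketched directly: fix a vocabulary, then count occurrences within the window, ignoring position. The vocabulary and window here are invented toy data:

```python
from collections import Counter

def cooccurrence_features(window_words, vocabulary):
    """Frequency count of each vocabulary term anywhere in the window."""
    counts = Counter(w.lower() for w in window_words)
    return [counts[v] for v in vocabulary]

vocab = ["fish", "guitar", "play", "river"]
window = ["guitar", "and", "bass", "player", "stand", "guitar"]
print(cooccurrence_features(window, vocab))  # [0, 2, 0, 0]
```

Unlike the collocational vector, this representation discards word order entirely; only the frequencies survive.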

Classifiers
Once we cast the WSD problem as a classification problem, all sorts of techniques are possible:
- Naïve Bayes (the right thing to try first)
- Decision lists
- Decision trees
- Neural nets
- Support vector machines
- Nearest neighbor methods, ...

Classifiers
The choice of technique depends, in part, on the set of features that have been used:
- Some techniques work better/worse with features with numerical values.
- Some techniques work better/worse with features that have large numbers of possible values. For example, the feature the word to the left has a fairly large number of possible values.

Statistical Word-Sense Disambiguation
Choose the most probable sense given the context, where s ranges over the senses and V is the vector representation of the input:
  s* = argmax_s P(s | V)
By Bayes' rule:
  s* = argmax_s P(V | s) P(s) / P(V) = argmax_s P(V | s) P(s)
By making an independence assumption among the features v_j of V:
  P(V | s) ≈ Π_j P(v_j | s)
This means that the result is the product of the probabilities of its individual features given its sense.
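The decision rule above is exactly Naïve Bayes, and can be sketched on toy counts. The senses, feature words, and training examples are invented for illustration; add-one smoothing is used so unseen features don't zero out the product (the slide's formula does not specify a smoothing scheme).

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sense, feature_list) pairs."""
    sense_counts = Counter(s for s, _ in examples)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for sense, feats in examples:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab

def classify(feats, sense_counts, feat_counts, vocab):
    """argmax_s log P(s) + sum_j log P(v_j | s), with add-one smoothing."""
    total = sum(sense_counts.values())
    best, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for f in feats:
            score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

# Toy training data for two senses of "bass".
data = [("fish", ["river", "caught"]), ("fish", ["caught", "rod"]),
        ("music", ["guitar", "play"]), ("music", ["play", "band"])]
model = train(data)
print(classify(["guitar", "band"], *model))  # music
```

Working in log space avoids underflow when the product runs over many features, which is standard practice for this classifier.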

Problems
- Given these general ML approaches, how many classifiers do I need to perform WSD robustly? One for each ambiguous word in the language.
- How do you decide what set of tags/labels/senses to use for a given word? Depends on the application.