CS 4705 Relationships among Words, Semantic Roles, and Word-Sense Disambiguation


Today
Lexical Relations
–WordNet
Semantic Roles
–Review: Semantic Roles
–Selectional Restrictions
–Selectional Association
Word-Sense Disambiguation
–Supervised
–Unsupervised
–Evaluation

Lexical Relations
Semantic Networks: used to represent lexical relationships
–e.g. WordNet (George Miller et al.)
–Most widely used hierarchically organized lexical database for English
–Synset: a set of synonyms, a dictionary-style definition (or gloss), and some example uses --> a concept
–Databases for nouns, verbs, and modifiers
Applications can traverse the network to find synonyms, antonyms, hyper- and hyponyms…
–Available for download or online use
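As a minimal sketch of the kind of traversal applications do, here is WordNet access through NLTK. This assumes NLTK is installed and the WordNet corpus has been downloaded; the synset names are real WordNet 3.0 identifiers.

```python
# Minimal sketch: traversing WordNet with NLTK (assumes nltk is
# installed and the 'wordnet' corpus has been downloaded).
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# Each synset bundles synonyms, a gloss, and example uses -- a "concept".
for synset in wn.synsets("bank"):
    print(synset.name(), "->", synset.definition())

# Traverse the hierarchy: hypernyms of the 'financial institution' sense.
depository = wn.synset("depository_financial_institution.n.01")
print([h.name() for h in depository.hypernyms()])

# Lemmas in a synset are the synonyms; antonyms hang off lemmas.
good = wn.synset("good.a.01")
print(good.lemmas()[0].antonyms())  # [Lemma('bad.a.01.bad')]
```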

Homonymy
Homonyms: words with the same form -- orthography and pronunciation -- but different, unrelated meanings, or senses
–A bank1 holds investments in a custodial account in the client's name.
–As agriculture is burgeoning on the east bank2, the river will shrink even more.

bank 1 "financial institution," 1474, from either O.It. banca or M.Fr. banque (itself from the O.It. term), both meaning "table" (the notion is of the moneylender's exchange table), from a Gmc. source (cf. O.H.G. bank "bench"); see bank (2). The verb meaning "to put confidence in" (U.S. colloquial) is attested from Bank holiday is from 1871, though the tradition is as old as the Bank of England. Bankroll (v.) "to finance" is 1920s. To cry all the way to the bank was coined 1956 by flamboyant pianist Liberace, after a Madison Square Garden concert that was packed with patrons but panned by critics.bank bank 2 "earthen incline, edge of a river," c.1200, probably in O.E., from O.N. banki, from P.Gmc. *bangkon "slope," cognate with P.Gmc. *bankiz "shelf."

Related Phenomena
–Homophones (same pronunciation / different orthography): read/red
–Homographs (same orthography / different pronunciation): bass/bass

Polysemy
Words with multiple but related meanings
–They rarely serve red meat.
–He served as U.S. ambassador.
–He might have served his time in prison.
–idea bank, sperm bank, blood bank, bank bank
–Can the two candidate senses be conjoined? ?He served his time and as ambassador to Norway.
–Same etymology
–Often a domain-dependent specialization

Synonymy
Substitutability: different words, same meaning
–old/aged, pretty/attractive, food/sustenance, money
–How big is that plane? / How large is that plane?
–How big are you? / How large are you?
What makes words substitutable -- and not?
–Polysemy (the large vs. old senses of big)
–Register: He's really cheap/?parsimonious.
–Collocational constraints: roast beef, ?baked beef; economy fare, ?economy price

How could we find synonyms and collocations automatically?
Synonyms: identify words appearing frequently in similar contexts
–Blast victims were helped by civic-minded passersby.
–Public-spirited passersby came to the aid of this bombing victim.
Collocations: identify synonyms or closely related words that do and don't appear in similar contexts
–Flu victims, flu sufferers vs. ?Cold victims, ?cold sufferers
–Roast turkey vs. ?Baked turkey
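A toy sketch of the distributional idea behind this: build co-occurrence vectors from context windows and compare them with cosine similarity. The corpus, crude prefix matching, and window size are all illustrative assumptions, not a real system.

```python
# Toy sketch of synonym discovery via distributional similarity:
# words occurring in similar contexts get similar co-occurrence vectors.
from collections import Counter
from math import sqrt

corpus = [
    "blast victims were helped by civic minded passersby",
    "public spirited passersby came to the aid of this bombing victim",
    "flu victims stayed home while flu sufferers rested",
]

def context_vector(target, sentences, window=2):
    vec = Counter()
    for sent in sentences:
        words = sent.split()
        for i, w in enumerate(words):
            if w.startswith(target):          # crude stemming for the toy
                lo, hi = max(0, i - window), i + window + 1
                vec.update(words[lo:i] + words[i + 1:hi])
    return vec

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# 'victim' and 'sufferer' share context words, so similarity is non-zero.
print(cosine(context_vector("victim", corpus), context_vector("sufferer", corpus)))
```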

Hyponymy
General: hypernym (superordinate)
–dog is a hypernym of poodle
–Test: 'That is a poodle' implies 'that is a dog'
Specific: hyponym (underneath)
–poodle is a hyponym of dog
–Test: 'That is a poodle' implies 'that is a dog'
Ontology: a set of domain objects
Taxonomy: a specification of the relations between those objects
Object hierarchy: a structured hierarchy that supports feature inheritance (e.g. poodle inherits some properties of dog)

Tropes, or Figures of Speech
Metaphor: one entity is given the attributes of another (tenor/vehicle/ground)
–Life is a bowl of cherries. Don't take it serious….
–We are the eyelids of defeated caves. ??
–GM killed the Fiero. (conventional metaphor: corporation as person)
Metonymy: one entity used to stand for another (replacive)
–GM killed the Fiero.
–The ham sandwich wants his check. (deferred reference)
Both extend an existing sense to a new meaning
–Metaphor: a completely different concept
–Metonymy: related concepts

Sum
Many definable word relations are useful to NLP in different ways
–Homonymy, polysemy, synonymy, hypernymy
–Homography, homophony
–Metaphor, metonymy
–Collocations
Resources are available to aid in processing
–WordNet, FrameNet, online dictionaries, …
A Huge Problem for NLP?

Ambiguity and Word Sense Disambiguation
Recall, for semantic attachment approaches: what happens when a given lexeme has multiple 'meanings'?
–Flies [V] vs. flies [N]
–He robbed the bank. / He sat on the bank.
How do we determine the correct sense of the word? Machine learning:
–Supervised methods
–Lightly supervised and unsupervised methods: bootstrapping, dictionary-based techniques, selectional association

Supervised WSD
Approaches:
–Tag a corpus with the correct senses of particular words (lexical sample) or of all words (all-words task), e.g. the SENSEVAL corpora
Lexical sample:
–Extract features which might predict word sense: POS? Word identity? Punctuation after? Previous word? Its POS?
–Use a machine learning algorithm to produce a classifier which can predict the senses of one word or many
All-words:
–Use a semantic concordance: each open-class word labeled with its sense from a dictionary or thesaurus, e.g. SemCor (Brown Corpus), tagged with WordNet senses

What Features Are Useful?
"Words are known by the company they keep"
–How much 'company' do we need to look at?
–What do we need to know about the 'friends'? POS, lemmas/stems/syntactic categories, …
Collocations: words that frequently appear with the target, identified from large corpora
–federal government, honor code, baked potato
–Position is key
Bag-of-words: words that appear somewhere in a context window
–I want to play a musical instrument so I chose the bass.
–Ordering/proximity not critical
Also: punctuation, capitalization, formatting (see the feature-extraction sketch below)
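A minimal sketch of the two standard feature types for supervised WSD: position-specific collocational features and an unordered bag-of-words window. The tokenized input and feature-name format are illustrative assumptions; a real system would add POS tags from a tagger.

```python
# Sketch: collocational (position-specific) and bag-of-words features
# around a target word, the two feature families described above.
def extract_features(tokens, target_index, window=2):
    feats = {}
    # Collocational: identity of words at fixed offsets from the target.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        word = tokens[i] if 0 <= i < len(tokens) else "<PAD>"
        feats[f"w{offset:+d}={word}"] = 1
    # Bag-of-words: which words appear anywhere in the window;
    # ordering and proximity are ignored.
    lo, hi = max(0, target_index - window), target_index + window + 1
    for word in tokens[lo:target_index] + tokens[target_index + 1:hi]:
        feats[f"bow={word}"] = 1
    return feats

tokens = "I want to play a musical instrument so I chose the bass".split()
print(extract_features(tokens, tokens.index("bass")))
```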

Rule Induction Learners and WSD
Given a feature vector of values for the independent variables associated with each observation in the training set:
–Top-down greedy search driven by information gain: how much will the entropy of the (remaining) data be reduced if we split on this feature? (See the sketch below.)
–Produce the set of rules that performs best on the training data, e.g.
  bank2 if w-1=='river' & pos==NP & src=='Fishing News' …
–Easy-to-understand results, but many passes to reach each decision, and susceptible to over-fitting
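A minimal sketch of the information-gain criterion: the entropy of the sense labels before a split, minus the weighted entropy after splitting on one feature. The toy training pairs are an assumption for illustration.

```python
# Sketch of the information-gain criterion that drives top-down rule
# induction: how much does splitting on a feature reduce label entropy?
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    # examples: list of (feature_dict, sense_label) pairs
    labels = [sense for _, sense in examples]
    base = entropy(labels)
    splits = {}
    for feats, sense in examples:
        splits.setdefault(feats.get(feature), []).append(sense)
    remainder = sum(len(subset) / len(examples) * entropy(subset)
                    for subset in splits.values())
    return base - remainder

data = [
    ({"w-1": "river"}, "bank2"), ({"w-1": "river"}, "bank2"),
    ({"w-1": "the"}, "bank1"),   ({"w-1": "the"}, "bank2"),
]
print(information_gain(data, "w-1"))  # splitting on w-1 reduces entropy
```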

Naïve Bayes
Choose ŝ = argmax_{s ∈ S} P(s|V), where s is one of the senses S possible for a word w and V is the input vector of feature values for w.
Assume the features are independent, so the probability of V is the product of the probabilities of its features given s; and since P(V) is the same for every candidate sense, it can be dropped. Then:
ŝ = argmax_{s ∈ S} P(s) ∏_j P(v_j | s)

How do we estimate P(s) and P(v_j|s)?
–P(s_i) is the maximum likelihood estimate from a sense-tagged corpus, count(s_i, w_j) / count(w_j) -- how likely is bank to mean 'financial institution' over all instances of bank?
–P(v_j|s) is the maximum likelihood estimate of each feature given a candidate sense, count(v_j, s) / count(s) -- how likely is the previous word to be 'river' when the sense of bank is 'financial institution'?
Calculate this for each possible sense and take the highest-scoring sense as the most likely choice. (See the sketch below.)
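A minimal sketch of this decision rule with the MLE counts above. Laplace (add-one) smoothing is added so unseen feature values do not zero out a sense -- an assumption beyond the slide -- and the training pairs are a toy illustration.

```python
# Sketch of Naive Bayes WSD: s_hat = argmax_s P(s) * prod_j P(v_j | s),
# with maximum-likelihood estimates from a sense-tagged corpus.
from collections import Counter, defaultdict
from math import log

def train(tagged):
    # tagged: list of (feature_list, sense) pairs
    sense_counts = Counter(sense for _, sense in tagged)
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, sense in tagged:
        feat_counts[sense].update(feats)
        vocab.update(feats)
    return sense_counts, feat_counts, vocab

def classify(feats, sense_counts, feat_counts, vocab):
    total = sum(sense_counts.values())
    def score(s):
        logp = log(sense_counts[s] / total)               # log P(s)
        denom = sum(feat_counts[s].values()) + len(vocab)  # add-one smoothing
        for v in feats:                                    # log P(v_j | s)
            logp += log((feat_counts[s][v] + 1) / denom)
        return logp
    return max(sense_counts, key=score)

tagged = [
    (["prev=river", "bow=water"], "bank2"),
    (["prev=the", "bow=loan"], "bank1"),
    (["prev=the", "bow=deposit"], "bank1"),
]
model = train(tagged)
print(classify(["prev=river", "bow=fishing"], *model))  # -> bank2
```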

Decision List Classifiers
Transparent: like case statements applying tests to the input in turn
–fish within window --> bass1
–striped bass --> bass1
–guitar within window --> bass2
–bass player --> bass2
(here bass1 is the fish sense and bass2 the music sense)
–Yarowsky (1996)'s approach orders the tests by their individual accuracy on the entire training set, based on the log-likelihood ratio (see the sketch below)
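A minimal sketch of such a decision list: each test is scored by a smoothed log-likelihood ratio between the two senses, tests are sorted by the magnitude of that score, and the first matching test decides. The counts and feature names are toy assumptions.

```python
# Sketch of a Yarowsky-style decision list ordered by log-likelihood ratio.
from math import log

def build_decision_list(counts):
    # counts: {feature: (count_with_bass1, count_with_bass2)}
    rules = []
    for feat, (c1, c2) in counts.items():
        llr = log((c1 + 0.1) / (c2 + 0.1))   # smoothing avoids log(0)
        sense = "bass1" if llr > 0 else "bass2"
        rules.append((abs(llr), feat, sense))
    return sorted(rules, reverse=True)        # strongest evidence first

def classify(features, decision_list, default="bass1"):
    for _, feat, sense in decision_list:
        if feat in features:                  # first matching test wins
            return sense
    return default

counts = {"fish_in_window": (28, 0), "striped_bass": (16, 0),
          "guitar_in_window": (0, 22), "bass_player": (1, 14)}
dlist = build_decision_list(counts)
print(classify({"guitar_in_window", "bass_player"}, dlist))  # -> bass2
```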

Lightly Supervised Methods: Bootstrapping
Bootstrapping I
–Start with a few labeled instances of the target item as seeds to train an initial classifier C
–Use high-confidence classifications of C on unlabeled data as additional training data
–Iterate (see the sketch below)
Bootstrapping II
–Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), chosen intuitively, from a corpus, or from dictionary entries, and label those automatically
–One Sense per Discourse hypothesis
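A minimal sketch of the Bootstrapping I loop. `train_classifier` and `predict_with_confidence` are placeholders standing in for any supervised learner (e.g. the Naïve Bayes sketch above); the confidence threshold and iteration cap are illustrative assumptions.

```python
# Sketch of the bootstrapping loop: train on seeds, absorb only
# high-confidence predictions from the unlabeled pool, and iterate.
def bootstrap(seed_labeled, unlabeled, train_classifier,
              predict_with_confidence, threshold=0.9, max_iters=10):
    labeled = list(seed_labeled)              # (features, sense) pairs
    pool = list(unlabeled)                    # feature sets only
    for _ in range(max_iters):
        model = train_classifier(labeled)
        newly_labeled, still_unlabeled = [], []
        for feats in pool:
            sense, conf = predict_with_confidence(model, feats)
            if conf >= threshold:             # trust only confident calls
                newly_labeled.append((feats, sense))
            else:
                still_unlabeled.append(feats)
        if not newly_labeled:                 # converged: nothing added
            break
        labeled.extend(newly_labeled)
        pool = still_unlabeled
    return train_classifier(labeled)
```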

Dictionary Approaches
Problem of scale for all ML approaches
–Building a classifier for each word with multiple senses
Machine-readable dictionaries, with senses identified and examples
–Simplified Lesk: retrieve all content words occurring in the context of the target (e.g. Sailors love to fish for bass.)
–Compute the overlap with the sense definitions of the target's entry
  bass1: a musical instrument…
  bass2: a type of fish that lives in the sea…

bass1 /beɪs/ [beys] Music.
–adjective
  1. low in pitch; of the lowest pitch or range: a bass voice; a bass instrument.
  2. of or pertaining to the lowest part in harmonic music.
–noun
  3. the bass part.
  4. a bass voice, singer, or instrument.
  5. double bass.
[Origin: 1400–50; late ME, var. of base2 with ss of basso]

bass2 /bæs/ [bas]
–noun, plural (especially collectively) bass, (especially referring to two or more kinds or species) basses.
  1. any of numerous edible, spiny-finned, freshwater or marine fishes of the families Serranidae and Centrarchidae.
  2. (originally) the European perch, Perca fluviatilis.
[Origin: 1375–1425; late ME bas, earlier bærs, OE bærs (with loss of r before s as in ass2, passel, etc.); c. D baars, G Barsch, OSw agh-borre]

–Choose the sense with the most content-word overlap
Original Lesk: compare the dictionary entries of all content words in the context with the entries for each sense
But… dictionary entries are short
–Expand them with the entries of 'related' words that appear in the original entry
–If a tagged corpus is available, collect all the words appearing in the context of each sense of the target word
  e.g. all words appearing in sentences with bass1 are added to the signature for bass1
–Weight each word by its frequency of occurrence with that sense tag in the corpus (relative to e.g. all senses of bass), to capture how discriminating the word is for the target word's senses
–Corpus Lesk performs best of all the Lesk approaches (see the sketch below)
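A minimal sketch of Simplified Lesk with NLTK's WordNet: pick the sense whose gloss and examples share the most words with the sentence context. Stopword filtering stands in for content-word selection, and no lemmatization is done -- both simplifications. Assumes the 'wordnet' and 'stopwords' corpora are downloaded.

```python
# Sketch of Simplified Lesk: maximize gloss/context word overlap.
from nltk.corpus import wordnet as wn, stopwords

STOP = set(stopwords.words("english"))

def simplified_lesk(word, sentence):
    context = {w.lower() for w in sentence.split() if w.lower() not in STOP}
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        signature = sense.definition().lower().split()
        for example in sense.examples():
            signature += example.lower().split()
        # Count content words shared by context and sense signature.
        overlap = len(context & (set(signature) - STOP - {word}))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# 'fish' and 'sea' in the context should favor a fish sense of bass.
sense = simplified_lesk("bass", "Sailors love to fish for bass in the sea")
print(sense, "->", sense.definition())
```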

Disambiguation via Selectional Restrictions
"Verbs are known by the company they keep"
–Different verbs select for different thematic roles
  wash the dishes (takes washable-thing as patient)
  serve delicious dishes (takes food-type as patient)
Method: another semantic attachment in the grammar
–Semantic attachment rules are applied as sentences are syntactically parsed, e.g.
  VP --> V NP
  V --> serve {theme: food-type}
–A selectional restriction violation means no parse

But this means we must:
–Write selectional restrictions for each sense of each predicate -- or use FrameNet
  serve alone has 15 verb senses
–Obtain hierarchical type information about each argument (using WordNet)
  How many hypernyms does dish have? How many words are hyponyms of dish?
But also:
–Sometimes selectional restrictions don't restrict enough (Which dishes do you like?)
–Sometimes they restrict too much (Eat dirt, worm! I'll eat my hat!)
Can we take a statistical approach?

Selectional Association (Resnik 1997)
Selectional preference strength: how much does a predicate tell us about the word class of its argument?
–George is a monster vs. George cooked a steak
–S_R(v): how different is p(c), the probability that any direct object will be a member of some class c, from p(c|v), the probability that a direct object of the specific verb v will fall into that class?
1. Estimate the conditional probabilities of word senses from a parsed corpus, counting how often each predicate occurs with an object argument
  –e.g. How likely is dish to be an object of serve? (Jane served/V the dish/Obj)
2. Then estimate the strength of association between the predicate and each superordinate class (hypernym) of the argument in WordNet

–E.g. for each object x of serve (e.g. ragout, Mary, dish):
  Look up all of x's hypernym classes in WordNet (e.g. dish isa piece of crockery, dish isa food item, ragout isa food item, Mary isa person, …)
  Distribute "credit" for each of x's senses occurring with serve among all the hypernym classes (≈ senses) to which x belongs (1/n for n classes)
–P(c|v) is estimated as count(c, v) / count(v)
–Why does this work? Ambiguous words have many superordinate classes
  John served food/the dish/tuna/curry
  The most common sense across all objects of the verb should eventually dominate the likelihood score (see the sketch below)
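A minimal sketch of this credit-distribution estimate: each observed direct object spreads 1/n credit over the n WordNet classes it belongs to, and P(c|v) is the accumulated credit divided by the number of objects. The verb-object pairs are a toy assumption; assumes the WordNet corpus is downloaded.

```python
# Sketch of Resnik-style credit distribution over WordNet classes.
from collections import Counter
from nltk.corpus import wordnet as wn

def hypernym_classes(noun):
    classes = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        classes.add(synset)
        for path in synset.hypernym_paths():
            classes.update(path)
    return classes

def class_probabilities(objects):
    credit = Counter()
    for noun in objects:
        classes = hypernym_classes(noun)
        for c in classes:                 # 1/n credit to each of n classes
            credit[c] += 1.0 / len(classes)
    total = len(objects)
    return {c: score / total for c, score in credit.items()}

# Objects observed with 'serve': classes shared across the objects
# (food and its ancestors) accumulate credit from every one of them,
# while idiosyncratic classes get credit from only a single object.
probs = class_probabilities(["food", "dish", "ragout", "curry"])
for synset, p in sorted(probs.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{p:.3f}  {synset.name()}")
```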

–How can we use this in WSD? Choose the class (sense) of the direct object with the highest probability given the verb: Mary served the dish proudly.
Results:
–Baselines: random choice of word sense is 26.8%; choosing the most frequent sense (NB: requires a sense-labeled training corpus) is 58.2%
–Resnik's approach: 44% correct, from a corpus with only predicate/argument relations labeled

Evaluating WSD
In vivo / end-to-end / task-based / extrinsic vs. in vitro / stand-alone / intrinsic: evaluation within some task (parsing? Q/A? an IVR system?) vs. application-independent evaluation
–In vitro metrics: classification accuracy on a held-out test set, or precision/recall/F-measure if not all instances must be labeled (see the sketch below)
Baselines:
–Most frequent sense?
–Lesk algorithms
Ceiling: human annotator agreement
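A minimal sketch of these in vitro metrics: plain accuracy over all instances, and precision/recall/F1 when the system may abstain (predict None). Treating abstentions as errors for accuracy is a convention chosen here; the gold and predicted labels are toy assumptions.

```python
# Sketch of intrinsic WSD evaluation: accuracy and precision/recall/F1.
def evaluate(gold, predicted):
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold)          # abstentions hurt recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(gold), "precision": precision,
            "recall": recall, "f1": f1}

gold      = ["bass1", "bass2", "bass1", "bass2", "bass1"]
predicted = ["bass1", "bass2", None,    "bass1", "bass1"]
print(evaluate(gold, predicted))  # precision 0.75, recall 0.6
```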

Summing Up
–Word relations: how can we identify the different types?
–Disambiguating among word senses
Next time: Ch. 17: 3-5