11 Chapter 19 Lexical Semantics. 2 Lexical Ambiguity Most words in natural languages have multiple possible meanings. –“pen” (noun) The dog is in the.

Slides:



Advertisements
Similar presentations
Semantics (Representing Meaning)
Advertisements

Computational Lexicography Frank Van Eynde Centre for Computational Linguistics.
The SALSA experience: semantic role annotation Katrin Erk University of Texas at Austin.
WSD for Applications Bill Dolan SenseEval Where is WSD useful?  Lots of work in the field, but still no clear answer Where WSD = classical, dictionary-sense.
For Friday No reading Homework –Chapter 23, exercises 1, 13, 14, 19 –Not as bad as it sounds –Do them IN ORDER – do not read ahead here.
Section 4: Language and Intelligence Overview Instructor: Sandiway Fong Department of Linguistics Department of Computer Science.
Steven Schoonover.  What is VerbNet?  Levin Classification  In-depth look at VerbNet  Evolution of VerbNet  What is FrameNet?  Applications.
Linguistic Theory Lecture 8 Meaning and Grammar. A brief history In classical and traditional grammar not much distinction was made between grammar and.
Chapter 17. Lexical Semantics From: Chapter 17 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
Creating a Bilingual Ontology: A Corpus-Based Approach for Aligning WordNet and HowNet Marine Carpuat Grace Ngai Pascale Fung Kenneth W.Church.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
PSY 369: Psycholinguistics Some basic linguistic theory part3.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics.
ELN – Natural Language Processing Giuseppe Attardi
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Syntax Lecture 8: Verb Types 1. Introduction We have seen: – The subject starts off close to the verb, but moves to specifier of IP – The verb starts.
PropBank, VerbNet & SemLink Edward Loper. PropBank 1M words of WSJ annotated with predicate- argument structures for verbs. –The location & type of each.
WordNet ® and its Java API ♦ Introduction to WordNet ♦ WordNet API for Java Name: Hao Li Uni: hl2489.
Word Sense Disambiguation (WSD)
Assessing the Impact of Frame Semantics on Textual Entailment Authors: Aljoscha Burchardt, Marco Pennacchiotti, Stefan Thater, Manfred Pinkal Saarland.
Interpreting Dictionary Definitions Dan Tecuci May 2002.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Use of WordNet and on-line dictionaries to build EN-SK synsets (experimental tool) Ján GENČI Technical University of Košice, Slovakia
Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.
Information Retrieval and Web Search Cross Language Information Retrieval Instructor: Rada Mihalcea Class web page:
1 Query Operations Relevance Feedback & Query Expansion.
Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Péter Schönhofen – Ad Hoc Hungarian → English – CLEF Workshop 20 Sep 2007 Performing Cross-Language Retrieval with Wikipedia Participation report for Ad.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
Using a Lemmatizer to Support the Development and Validation of the Greek WordNet Harry Kornilakis 1, Maria Grigoriadou 1, Eleni Galiotou 1,2, Evangelos.
Lexical Semantics Chapter 16
WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University.
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
WordNet: Connecting words and concepts Peng.Huang.
인공지능 연구실 황명진 FSNLP Introduction. 2 The beginning Linguistic science 의 4 부분 –Cognitive side of how human acquire, produce, and understand.
Sense clusters versus sense relations Irina Chugur, Julio Gonzalo UNED (Spain)
For Wednesday No reading Homework –Chapter 23, exercise 15 –Process: 1.Create 5 sentences 2.Select a language 3.Translate each sentence into that language.
Combining Lexical Resources: Mapping Between PropBank and VerbNet Edward Loper,Szu-ting Yi, Martha Palmer September 2006.
Natural Language Processing for Information Retrieval -KVMV Kiran ( )‏ -Neeraj Bisht ( )‏ -L.Srikanth ( )‏
WordNet Enhancements: Toward Version 2.0 WordNet Connectivity Derivational Connections Disambiguated Definitions Topical Connections.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Friday Finish chapter 24 No written homework.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 24 (14/04/06) Prof. Pushpak Bhattacharyya IIT Bombay Word Sense Disambiguation.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.
Lecture 19 Word Meanings II Topics Description Logic III Overview of MeaningReadings: Text Chapter 189NLTK book Chapter 10 March 27, 2013 CSCE 771 Natural.
Zdroje jazykových dat Word senses Sense tagged corpora.
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Approaches to Machine Translation
SENSEVAL: Evaluating WSD Systems
Semantics (Representing Meaning)
CS 388: Natural Language Processing: Word Sense Disambiguation
Rapidly Retargetable Translingual Detection
CSC 594 Topics in AI – Applied Natural Language Processing
Unit-4 Lexical Semantics M.B.Chandak, HoD CSE,
WordNet: A Lexical Database for English
WordNet WordNet, WSD.
Approaches to Machine Translation
Lecture 19 Word Meanings II
CS224N Section 3: Corpora, etc.
CS224N Section 3: Project,Corpora
Information Retrieval
Presentation transcript:

11 Chapter 19 Lexical Semantics

2 Lexical Ambiguity Most words in natural languages have multiple possible meanings. –“pen” (noun) The dog is in the pen. The ink is in the pen. –“take” (verb) Take one pill every morning. Take the first right past the stoplight. Syntax helps distinguish meanings for different parts of speech of an ambiguous word. –“conduct” (noun or verb) John’s conduct in class is unacceptable. John will conduct the orchestra on Thursday.

3 Motivation for Word Sense Disambiguation (WSD) Many tasks in natural language processing require disambiguation of ambiguous words. –Question Answering –Information Retrieval –Machine Translation –Text Mining –Phone Help Systems

4 Sense Inventory What is a “sense” of a word? –Homonyms (disconnected meanings) bank: financial institution bank: sloping land next to a river –Polysemes (related meanings with joint etymology) bank: financial institution as corporation bank: a building housing such an institution Sources of sense inventories –Dictionaries –Lexical databases

5 WordNet A detailed database of semantic relationships between English words. Developed by famous cognitive psychologist George Miller and a team at Princeton University. About 144,000 English words. Nouns, adjectives, verbs, and adverbs grouped into about 109,000 synonym sets called synsets.

6 WordNet Synset Relationships See Figures 19.2, 19.3, and WordNet (in lecture)

7 EuroWordNet WordNets for –Dutch –Italian –Spanish –German –French –Czech –Estonian

8 WordNet Senses WordNets senses (like many dictionary senses) tend to be very fine-grained. “play” as a verb has 35 senses, including –play a role or part: “Gielgud played Hamlet” –pretend to have certain qualities or state of mind: “John played dead.” Difficult to disambiguate to this level for people and computers. Not clear such fine-grained senses are useful for NLP. Several proposals for grouping senses into coarser, easier to identify senses (e.g. homonyms only).

9 Senses Based on Needs of Translation Only distinguish senses that translate to different words in some other language. –play: tocar vs. jugar –know: conocer vs. saber –be: ser vs. estar –leave: salir vs dejar –take: llevar vs. tomar vs. sacar

Cross-Lingual Sense Distinctions: A specific idea See Section 3.4 : Resnik & Yarowsky (2000), Natural Language Engineering 5(3) (on the schedule) Preliminaries: “Parallel” corpora are texts that are translations of each other In order to exploit a parallel corpus, the sentences must be aligned; that is, the sentences that are translations of each other must be matched up. If they are not already aligned after the translation process, automatic methods are used to align the sentences. –A search for “alignment” in the ACL anthology, which contains all ACL publications, gives 4,440 results. ACL are the main publications in Natural Language Processing (NLP). Several aligned parallel corpora are available and used by the NLP research community. One large repository of NLP data, tools and standards: Linguistic Data Consortium. Click on catalog; by type and source; search for “parallel”: 34 hits. Linguistic Data Consortium 10

Cross-Lingual Sense Distinctions: A specific idea Preliminaries (continued): Table 4 is the result of combining information from electronic bilingual dictionaries, with mappings to English WordNet senses. Resources to create a resource such as Table 4 are made available by the NLP research community. 11

Cross-Lingual Sense Distinctions: A specific idea This specific case study involves creating word-sense training data. Suppose our target word is “interest”. The training data is a corpus of sentences which contain “interest” + sense tags. Where will these sentences come from? From the English sides of the parallel corpora. 12

Cross-Lingual Sense Distinctions: A specific idea See Section 3.4 : Resnik & Yarowsky (2000), Natural Language Engineering 5(3) (on the schedule) –Restrict sense inventory to distinctions that are lexicalized cross-lingually –Middle ground between homographs within a single language (too course-grained) and all the fine-grained distinctions found in monolingual dictionaries.. Table 4: linking existing lexical resources such as WordNet with cross-lingual information from bilingual dictionaries. –“This table suggests how multiple parallel bilingual corpora can be used to yield sets of training data [i.e., training data for English word senses without requiring human manual annotation] covering different subsets of the English sense inventory, that in aggregate may yield tagged data for all given sense distinctions when any one language alone may not be adequate” 13

Cross-Lingual Sense Distinctions: A specific idea A lexicon such as Table 4 can help us refine the English sense inventory for the purpose of translation between English and these languages: Senses maybe dropped, split, or eliminated (which in Table 4?) 14

Semantic Roles Also called Thematic Roles We recently looked at PropBank again Revisit FrameNet 15

Semantic Roles: Example of Usefulness Shallow meaning representation that support simple inferences Can generalize over different surface realizations of predicate arguments. For example, the subject fill different roles: –John (agent) broke the window (theme) –John (agent) broke the window (theme) with a rock (instrument) –The rock (instrument) broke the window (theme) –The window (theme) broke –The window (theme) was broken by John (agent) 16

Selectional Restrictions Restrictions placed by a verb on the fillers of its semantic roles “I want to eat X” (X is the theme) Selectional restriction? Theme is edible. “I want to eat someplace that’s close to ICSI” (see Lecture) 17

Selectional Restrictions Associated with senses, not words: –“They served apple pie” (THEME: edible) –“Which airlines serve Denver?” (THEME: city) Selectional restrictions vary widely. –“hate”: Experiencer: person; THEME: could be almost anything –“multiply” (sense 1): THEME: numbers We will use selectional restictions later to help us with Word-Sense Diambiguation 18

Selectional Restrictions Can sometimes identify metaphor –My computer died –The computer ate my disk –Thoughts danced in my head –The death of democracy was imminent –John choked on the exam –He was drowning in self pity Metaphor’s relationship to word sense? 19