CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 2– Wordnet and Word Sense Disambiguation) Pushpak Bhattacharyya CSE Dept., IIT.

Slides:



Advertisements
Similar presentations
Building Wordnets Piek Vossen, Irion Technologies.
Advertisements

Ontological Resources and Top-Level Ontologies Nicola Guarino LADSEB-CNR, Padova, Italy
User Interface Design for Wordnet Website Indowordnet Workshop(1-2 nd Jan 2011 IITB, Mumbai)
Corpus Processing and NLP
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 2 (06/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Part of Speech (PoS)
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 3– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept.,
Jing-Shin Chang National Chi Nan University, IJCNLP-2013, Nagoya 2013/10/15 ACLCLP – Activities ( ) & Text Corpora.
INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING NLP-AI IIIT-Hyderabad CIIL, Mysore ICON DECEMBER, 2003.
Ewa Rudnicka, Wojciech Witkowski, Maciej Piasecki G4.19 Research Group Institute of Informatics, Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl.
CSE Department, I.I.T. Bombay Automatic Lexicon Generation through WordNet by Nitin Verma and Pushpak Bhattacharyya Jan 21, 2004.
CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 29– AI and Probability (exemplified through NLP) 4 th Oct, 2010.
Building an Ontology-based Multilingual Lexicon for Word Sense Disambiguation in Machine Translation Lian-Tze Lim & Tang Enya Kong Unit Terjemahan Melalui.
1 Lecture 35 Brief Introduction to Main AI Areas (cont’d) Overview  Lecture Objective: Present the General Ideas on the AI Branches Below  Introduction.
1/27 Semantics Going beyond syntax. 2/27 Semantics Relationship between surface form and meaning What is meaning? Lexical semantics Syntax and semantics.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Hindi Wordnet at IIT Bombay Current Team: Pushpak Bhattacharyya, Prabhakar Pandey, Laxmi Kashyap, Salil Joshi, Arun Karthikeyan, Prachur Goel and many.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Using resources WordNet and the BNC. WordNet: History 1985: a group of psychologists and linguists start to develop a “lexical database” –Princeton University.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
CS : Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Topic: Hindi Wordnet, Formalization.
1 Indo WordNet A WordNet for Hindi Centre for Technology Development for Indian Languages Computer Science and Engineering Department, IIT Bombay.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Pushpak Bhattacharyya CSE Dept., IIT Bombay
WORDNET Approach on word sense techniques - AKILAN VELMURUGAN.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
1 Natural Language Processing (2a) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
“How much context do you need?” An experiment about context size in Interactive Cross-language Question Answering B. Navarro, L. Moreno-Monteagudo, E.
Jiuling Zhang  Why perform query expansion?  WordNet based Word Sense Disambiguation WordNet Word Sense Disambiguation  Conceptual Query.
Dr. Monira Al-Mohizea MORPHOLOGY & SYNTAX WEEK 12.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 37– Semantics; Universal Networking Language) Pushpak Bhattacharyya CSE Dept.,
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Suléne Pilon & Danie Prinsloo Overview: Teaching and Training in South Africa 25 November 2008;
Annotating Words using WordNet Semantic Glosses Julian Szymański Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Development of NE Wordnet: An Integrated Wordnet for Languages of the North-East India Assamese & Bodo by Utpal Saikia Biswajit Brahma Dibyajyoti Sarmah.
CS460/449 : Speech, Natural Language Processing and the Web/Topics in AI Programming (Lecture 1 – Introduction) Pushpak Bhattacharyya CSE Dept., IIT Bombay.
11 Chapter 19 Lexical Semantics. 2 Lexical Ambiguity Most words in natural languages have multiple possible meanings. –“pen” (noun) The dog is in the.
CS : NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 35: Semantic Relations; UNL; Towards Dependency Parsing.
CS460/IT632 Natural Language Processing/Language Technology for the Web Guest Lecture (31/03/06) Prof. Niladri Chatterjee IIT Delhi Guest Lecture on Machine.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
CS621 : Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 1 - Introduction.
CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture–39: Recap.
What is Wordnet Coimbatore Workshop at Amrita University Pushpak Bhattacharyya CSE Dept., IIT Bombay.
Computational Linguistics. The Subject Computational Linguistics is a branch of linguistics that concerns with the statistical and rule-based natural.
Lecture 21 Computational Lexical Semantics Topics Features in NLTK III Computational Lexical Semantics Semantic Web USCReadings: NLTK book Chapter 10 Text.
© Copyright 2013 STI INNSBRUCK “How to put an annotation in HTML?” Ioannis Stavrakantonakis.
Wordnet - A lexical database for the English Language.
Semantic distance & WordNet Serge B. Potemkin Moscow State University Philological faculty.
IndoWordNet Database Design Presented By: Konkani NLP Team Goa University IndoWordNet Database Design 1.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 24 (14/04/06) Prof. Pushpak Bhattacharyya IIT Bombay Word Sense Disambiguation.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
1 Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition Ryu Iida Nara Institute of Science and Technology Diana McCarthy and Rob Koeling.
Zdroje jazykových dat Word senses Sense tagged corpora.
Semantics Lecture 5. Semantics Language uses a system of linguistic signs, each of which is a combination of meaning and phonological and/or orthographic.
In this lecture, we will learn about: Translation.
Word Sense Disambiguation Algorithms in Hindi
Natural Language Processing [05 hours/week, 09 Credits] [Theory]
ArtsSemNet: From Bilingual Dictionary To Bilingual Semantic Network
Lecture 21 Computational Lexical Semantics
CSC 594 Topics in AI – Applied Natural Language Processing
Technology Development
WordNet: A Lexical Database for English
WordNet WordNet, WSD.
Language and Statistics
Shraddha Kalele MARATHI WORDNET Presented by: Madhu Prasad Sharma
Knowledge Representation for Natural Language Understanding
Semantics Going beyond syntax.
Information Retrieval
Presentation transcript:

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 2– Wordnet and Word Sense Disambiguation) Pushpak Bhattacharyya CSE Dept., IIT Bombay 6 th Jan, 2011

Perpectivising NLP: Areas of AI and their inter-dependencies Search Vision Planning Machine Learning Knowledge Representation Logic Expert Systems RoboticsNLP

Books etc. Main Text(s): Natural Language Understanding: James Allan Speech and NLP: Jurafsky and Martin Foundations of Statistical NLP: Manning and Schutze Other References: NLP a Paninian Perspective: Bharati, Cahitanya and Sangal Statistical NLP: Charniak Journals Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC Conferences ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML

Wordnet A lexical knowledgebase based on conceptual lookup Organizing concepts in a semantic network. Organize lexical information in terms of word meaning, rather than word form Wordnet can also be used as a thesaurus.

Psycholinguistic Theory Human lexical memory for nouns as a hierarchy. Can canary sing? - Pretty fast response. Can canary fly? - Slower response. Does canary have skin? – Slowest response. (can move, has skin) (can fly) (can sing) Wordnet - a lexical reference system based on psycholinguistic theories of human lexical memory. Animal Bird canary

Lexical Matrix

Wordnet Wordnet is a network of words linked by lexical and semantic relations. The first wordnet in the world was for English developed at Princeton over 15 years. The Eurowordnet- linked structure of European language wordnets was built in 1998 over 3 years with funding from the EC as a a mission mode project. Wordnets for Hindi and Marathi being built at IIT Bombay are amongst the first IL wordnets. All these are proposed to be linked into the IndoWordnet which eventually will be linked to the English and the Euro wordnets.

Hindi Wordnet Dravidian Language Wordnet North East Language Wordnet Marathi Wordnet Sanskrit Wordnet English Wordnet Bengali Wordnet Punjabi Wordnet Konkani Wordnet Urdu Wordnet INDOWORDNET

Fundamental Design Question Syntagmatic vs. Paradigmatic realtions? Psycholinguistics is the basis of the design. When we hear a word, many words come to our mind by association. For English, about half of the associated words are syntagmatically related and half are paradignatically related. For cat animal, mammal- paradigmatic mew, purr, furry- syntagmatic

Stated Fundamental Application of Wordnet: Sense Disambiguation Determination of the correct sense of the word The crane ate the fish vs. The crane was used to lift the load bird vs. machine

The problem of Sense tagging Given a corpora To Assign correct sense to the words. This is sense tagging. Needs Word Sense Disambiguation (WSD) Highly important for Question Answering, Machine Translation, Text Mining tasks.

Classification of Words Word Content Word Function Word VerbNounAdjectiveAdverb Prepo sition Conjun ction Pronoun Interjection

Example of sense marking: its need _4187 _1138 _3123 _1189 _43540 _ _48029 _16168 _4187 _ _42403 _ (According to a new research, those people who have a busy social life, have larger space in a part of their brain). _4187 _1138 _3123 _4118 _1189 _16168 _11431 _16168 _4187 _ _43540 _1438 _ _166 _38861 _25368 _ _1189 _13159 _16168 _ _ _14077 _ _1189 _42403 _16168 _ _ _1189 _ _38220 _42403 _ _16168 _ _1912 _42151 _1652 _212436

Ambiguity of (People),,,, - " " (English synset) multitude, masses, mass, hoi_polloi, people, the_great_unwashed - the common people generally "separate the warriors from the mass" "power to the people",,,,,,,,,,,, - " / / " (English synset) populace, public, world - people in general considered as a whole "he is a hero in the eyes of the public

Basic Principle Words in natural languages are polysemous. However, when synonymous words are put together, a unique meaning often emerges. Use is made of Relational Semantics. Componential Semantics where each word is a bundle of semantic features (as in the Schankian Conceptual Dependency system or Lexical Componential Semantics) is to be examined as a viable alternative.

Componential Semantics Consider cat and tiger. Decide on componential attributes. For cat (Y, Y, N, Y) For tiger (Y,Y,Y,N) Complete and correct Attributes are difficult to design. FurryCarnivorousHeavyDomesticable

Semantic relations in wordnet 1. Synonymy 2. Hypernymy / Hyponymy 3. Antonymy 4. Meronymy / Holonymy 5. Gradation 6. Entailment 7. Troponymy 1, 3 and 5 are lexical (word to word), rest are semantic (synset to synset).

Synset: the foundation (house) 1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house") 2. house -- (an official assembly having legislative powers; "the legislature has two houses") 3. house -- (a building in which something is sheltered or located; "they had a large carriage house") 4. family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the teacher asked how many people made up his home") 5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can be presented; "the house was full") 6. firm, house, business firm -- (members of a business organization that owns or operates one or more establishments; "he worked for a brokerage house") 7. house -- (aristocratic family line; "the House of York") 8. house -- (the members of a religious community living together) 9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he counted the house") 10. house -- (play in which children take the roles of father or mother or children and pretend to interact like adults; "the children were playing house") 11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided) 12. house -- (the management of a gambling house or casino; "the house gets a percentage of every bet")

Creation of Synsets Three principles: Minimality Coverage Replacability

Synset creation ( continued) Home Johns home was decorated with lights on the occasion of Christmas. Having worked for many years abroad, John Returned home. House Johns house was decorated with lights on the occasion of Christmas. Mercury is situated in the eighth house of Johns horoscope.

Synsets ( continued) {house} is ambiguous. {house, home} has the sense of a social unit living together; Is this the minimal unit? {family, house, home} will make the unit completely unambiguous. For coverage: {family, household, house, home} ordered according to frequency. Replacability of the most frequent words is a requirement.

Synset creation From first principles Pick all the senses from good standard dictionaries. Obtain synonyms for each sense. Needs hard and long hours of work.

Synset creation (continued) From the wordnet of another language in the same family Pick the synset and obtain the sense from the gloss. Get the words of the target language. Often same words can be used- especially for t%sama words. Translation, Insertion and deletion. Hindi Synset: AnauBavaI jaanakar maMjaa huAa (experienced person) Marathi Synset: AnauBavaI t& jaaNata &ata