Using resources WordNet and the BNC. WordNet: History 1985: a group of psychologists and linguists start to develop a “lexical database” –Princeton University.

Slides:



Advertisements
Similar presentations
Building Wordnets Piek Vossen, Irion Technologies.
Advertisements

The WordNet Lexical Database Bernardo Magnini ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica Trento - Italy.
The Meaning of Language
SEMANTICS.
Statistical NLP: Lecture 3
Creating a Similarity Graph from WordNet
Introduction to Computational Linguisitics The Lexicon.
Ewa Rudnicka, Wojciech Witkowski, Maciej Piasecki G4.19 Research Group Institute of Informatics, Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl.
Building an Ontology-based Multilingual Lexicon for Word Sense Disambiguation in Machine Translation Lian-Tze Lim & Tang Enya Kong Unit Terjemahan Melalui.
1/27 Semantics Going beyond syntax. 2/27 Semantics Relationship between surface form and meaning What is meaning? Lexical semantics Syntax and semantics.
A STUDY ON THE KNOWLEDGE SOURCES OF TURKISH EFL LEARNERS IN LEXICAL INFERENCING İlknur İSTİFÇİ Anadolu University Eskişehir, TURKEY Eskişehir, TURKEY.
June 19-21, 2006WMS'06, Chania, Crete1 Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies.
Structured lexicons and Lexical semantics Especially WordNet ® See D Jurafsky & JH Martin: Speech and Language Processing, Upper Saddle River NJ (2000):
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.
From Semantic Similarity to Semantic Relations Georgeta Bordea, November 25 Based on a talk by Alessandro Lenci titled “Will DS ever become Semantic?”,
1 Indo WordNet A WordNet for Hindi Centre for Technology Development for Indian Languages Computer Science and Engineering Department, IIT Bombay.
Course G Web Search Engines 3/9/2011 Wei Xu
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Adam Pease and Christiane Fellbaum Presenter: 吳怡安
BT Exact Technologies - Adastral Park, Ipswich July - October 2003 Linguistic Web Services for Semantic Web Dr. Vassil T. Vassilev London Metropolitan.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
1 Natural Language Processing (2a) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University
Wordnet, Raw Text Pinker, continuing Chapter 2
COMP423.  Query expansion  Two approaches ◦ Relevance feedback ◦ Thesaurus-based  Most Slides copied from ◦
“How much context do you need?” An experiment about context size in Interactive Cross-language Question Answering B. Navarro, L. Moreno-Monteagudo, E.
Intelligent Systems Lecture 20 Examples of NLP in searching systems.
ArchiWordNet Integrating WordNet with Domain-Specific Knowledge Luisa Bentivogli 1, Andrea Bocco 2, Emanuele Pianta 1 1 ITC-irst Trento, Italy 2 Politecnico.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Meaning. Semantics (the study of meaning) Semantics: the study of meaning, or to be more specific, the study of the meaning of linguistic units, words.
1 Query Operations Relevance Feedback & Query Expansion.
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Lexical Semantics Chapter 16
Quality Control for Wordnet Development in BalkaNet Pavel Smrž Faculty of Informatics, Masaryk University in Brno, Czech.
10/22/2015ACM WIDM'20051 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis Voutsakis.
WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University.
WordNet: Connecting words and concepts Peng.Huang.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
Linguistic Essentials
LECTURE 2: SEMANTICS IN LINGUISTICS
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
Natural Language Processing for Information Retrieval -KVMV Kiran ( )‏ -Neeraj Bisht ( )‏ -L.Srikanth ( )‏
Wordnet - A lexical database for the English Language.
Ontology Engineering: from Cognitive Science to the Semantic Web Maria Teresa Pazienza University of Roma Tor Vergata, Italy 1.
Word Relations Slides adapted from Dan Jurafsky, Jim Martin and Chris Manning.
Word Meaning and Similarity
Using Wikipedia for Hierarchical Finer Categorization of Named Entities Aasish Pappu Language Technologies Institute Carnegie Mellon University PACLIC.
2/10/2016Semantic Similarity1 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis.
Information Retrieval Search Engine Technology (7) Prof. Dragomir R. Radev.
Semantic search-based image annotation Petra Budíková, FI MU CEMI meeting, Plzeň,
Lexical Semantics and Word Senses Hongning Wang
Detecting and Exploiting Figurative Language in WordNet Wim Peters Department of Computer Science University of Sheffield.
SEMANTICS Chapter 10 Ms. Abrar Mujaddidi. What is semantics?  Semantics is the study of the conventional meaning conveyed by the use of words, phrases.
Query expansion COMP423. Menu Query expansion Two approaches Relevance feedback Thesaurus-based Most Slides copied from
Introduction to Computational Linguisitics The Lexicon.
Lexicons, Concept Networks, and Ontologies
Statistical NLP: Lecture 3
Generating sets of synonyms between languages
Word Relations Slides adapted from Dan Jurafsky, Jim Martin and Chris Manning.
Comparing Two Thesaurus Representations for Russian
What is Linguistics? The scientific study of human language
CSC 594 Topics in AI – Applied Natural Language Processing
WordNet: A Lexical Database for English
Bulgarian WordNet Svetla Koeva Institute for Bulgarian Language
WordNet WordNet, WSD.
Word Relations Slides adapted from Dan Jurafsky, Jim Martin and Chris Manning.
Linguistic Essentials
Semantic Similarity Methods in WordNet and their Application to Information Retrieval on the Web Yizhe Ge.
Text Mining Application Programming Chapter 3 Explore Text
Giannis Varelas Epimenidis Voutsakis Paraskevi Raftopoulou
Presentation transcript:

Using resources WordNet and the BNC

WordNet: History 1985: a group of psychologists and linguists start to develop a “lexical database” –Princeton University –theoretical basis: results from psycholinguistics and psycholexicology –What are properties of the “mental lexicon”?

Global organisation division of the lexicon into five categories: –Nouns –Verbs –Adjectives –Adverbs –function words (“probably stored separately as part of the syntactic component of language” [Miller et al.]

Global organization nouns: organized as topical hierarchies verbs: entailment relations adjectives: N-dimensional hyperspaces adverbs: N-dimensional hyperspaces [Miller et al.]: “Each of these lexical structures reflects a different way of categorizing experience; attempts to impose a single organizing principle on all syntactic categories would badly misrepresent the psychological complexity of lexical knowledge.”

Basic principles organize lexical information in terms of word meaning, rather than word forms –“In this respect, WordNet resembles athesaurus more than a dictionary,...” [Miller et al.] “... a word is a conventional association between a lexicalized concept and an utterance that plays a syntactic role.” –word form: refers to physical utterance or inscription –word meaning: refers to the lexicalized concept that a form can be used to express

Lexical semantics How are word meanings represented in WordNet? –synsets (synonym sets) as basic units –a word meaning is represented by simply listing the word forms that can be used to express it example: senses of board –a piece of lumber vs. a group of people assembled for some purpose –synsets as unambiguous designators: –{board, plank} vs. {board, committee}

Synsets synsets often sufficient for differential purposes –if an appropriate synonym is not available a short gloss may be used –e.g. {board, (a person’s meals, provided regularly for money)}

Lexical Relations in WordNet “WordNet is organized by semantic relations.” –It is characteristic of semantic relations that they are reciprocated; –if there is a semantic relation R between meaning {x, x’,...} and meaning {y, y’,...}, then there is a relation R’ between {y,y’,...} and {x, x’,...}.

Lexical relations: synonymy similarity of meaning –Leibniz: two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made such global synonymy is rare (it would be redundant) –synonymy relative to a context: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value –consequence of this synonymy in terms of substitutability: words in different syntactic categories cannot be synonyms

Lexical relations: antonymy antonym of a word x is sometimes not-x, but not always –rich and poor are antonyms –but: not rich does not imply poor –(because many people consider them neither rich nor poor) antonymy is a lexical relation between word forms, not a semantic relation between word meanings –meanings {rise,ascend} and {fall, descend} are conceptual opposites, but they are not antonyms [rise/fall] and [ascend/descend] are pairs of antonyms –{w 1 w 2 }  S 1 & {w 3 w 4 }  S 2 & ant(w 1,w 3 )  ant(w 2,w 4 )

Lexcial relations: hyponymy hyponymy is a semantic relation between word meanings –{maple} is a hyponym of {tree} inverse: hypernymy –{tree} is a hypernym of {maple} also called: subordination/superordination; subset/superset; ISA relation test for hyponomy: –native speaker must accept sentences built from the frame “An x is a (kind of) y”

Lexcial relations: meronymy A concept represented by the synset {x, x’,...} is a meronym of a concept represented by the synset {y, y’,...} if native speakers of English accept sentences constructed from such frames as “A y has an x (as a part)”, “An x is a part of y”. inverse relation: holonymy HAS-AS-PART –part hierarchy –part-of is asymmetric and (with caution) transitive

Lexical relations: meronymy failures of transitivity caused by different part- whole relations, e.g. –A musician has an arm. –An orchestra has a musician. –but: ? An orchestra has an arm. Types of meronymy in WordNet: –component [most frequently found] –member –composition –phase process

WordNet’s noun hierarchy noun hierarchy partitioned into separate hierarchies with unique top hypernyms vague abstractions would be semantically empty, e.g. {entity} with immediate hyponyms {object, thing} and {idea}

{act,action,activity} {animal,fauna} {artifact} {attribute,property} {body,corpus} {cognition,knowledge} {communication} {event,happening} {feeling,emotion} {food} {group,collection} {location,place} {motive} {natural object} {natural phenomenon} {person,human being} {plant,flora} {possession} {process} {quantity,ammount} {relation} {shape} {state, condition} {substance} {time}

Nouns in WordNet noun hierarchy as lexical inheritance system –“... seldom goes more than ten levels deep, and the deepest examples usually contain technical levels that are not part of everyday vocabulary.” –Shetland pony → pony → horse → equid → odd-toed ungulate → herbivore → mammal → vertebrate → animal

Nouns in WordNet man-made artifacts: sometimes six or seven levels deep –roadster → car → motor vehicle → wheeled vehicle → vehicle → conveyance → artifact hierarchy of persons: about three or four levels –televangelist → evangelist → preacher → clergyman → spiritual leader → person Like all thesaurus structures, words can have multiple hypernyms

WordNets for other languages Idea has been widely copied Sometimes by “translating” Princeton WordNet –Lexical relations in general are universal... –But are they in practice? –Are synsets universal? EuroWordNet: combining multilingual WordNets to include cross-language equivalence –Inherent difficulties, as above

BNC One of the most widely used corpora (esp. in Britain, but also elsewhere) A balanced synchronic text corpus containing 100 million words (POS tagged) Collected in late 1980s 90% text, 10% transcribed speech Encoded according to TEI standards Associated tools (mainly for searching), but many users write their own (eg in Perl)

Using the BNC Just looking up words More interesting to construct queries that exploit the mark-up (see Allan’s slides) Already becoming dated (e.g. “numpty”) Results often contradict “authorities” such as dictionaries, especially in revealing primary senses/uses of words.

WWW as a corpus Standard Google search engine used with individual words does not always give good word collocations: after all, Google is document retrieval Try:

Lexical research Use corpus resource such as BNc together with WordNet to get interesting results → Allan’s slides