Semantic distance & WordNet. Serge B. Potemkin, Moscow State University, Philological Faculty.


Distance and metrics
- Fundamental concept: the distance between the entities under consideration
- Semantic distance between words or concepts
- Do the metric space axioms hold?
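
For reference, the standard metric space axioms for a distance function ρ on a set X are listed below. The symbols X, a, b, c are notation chosen here for illustration; ρ follows the notation of the later slides. Whether a given semantic distance satisfies all three is the open question the slide raises.

```latex
\begin{align*}
\rho(a, b) &\ge 0, \qquad \rho(a, b) = 0 \iff a = b && \text{(non-negativity, identity)} \\
\rho(a, b) &= \rho(b, a) && \text{(symmetry)} \\
\rho(a, c) &\le \rho(a, b) + \rho(b, c) && \text{(triangle inequality)}
\end{align*}
```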

Distance is needed for: word sense disambiguation, determining the structure of texts, text summarization and annotation, information extraction and retrieval, automatic indexing, lexical selection, the automatic correction of word errors in text …

Approaches to distance measuring:
- Corpora-based
- Dictionary-based
- Roget-structured thesauri
- WordNet and other semantic networks

WordNet
- Synonym sets (synsets)
- Subsumption hierarchy (hyponymy / hypernymy)
- 3 meronymic (PART-OF) relations: COMPONENT-OF, MEMBER-OF, SUBSTANCE-OF, and their inverses
- Antonymy, COMPLEMENT-OF

WordNet shortcomings:
- synsets: inadequate coverage
- Non-English versions cover 20–70% of the English one ( synsets for Russian)
- Extension is hard
- Distance measuring is controversial

Corpora-based approach
Two words wa and wb are the closer, the more often their neighbors (within +/- 5 words) coincide.
Example (distributional profile of the word star): space 0.28, movie 0.2, famous 0.13, light 0.09, rich 0.04, ...
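
A minimal sketch of how such a distributional profile might be computed and compared. The function names `profile` and `cosine`, the normalization to relative frequencies, and the use of cosine similarity to compare two profiles are assumptions made for illustration; the slide only states that closeness grows with the number of coinciding neighbors.

```python
from collections import Counter
import math


def profile(tokens, target, window=5):
    """Relative frequencies of the words found within +/- window of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Count every neighbor in the window except the target position itself.
        counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity of two profiles (assumed comparison, not from the slide)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


# Toy usage on a made-up sentence.
tokens = "the star shines in space".split()
star_profile = profile(tokens, "star", window=1)
```

With a real corpus the window would be 5, as on the slide, and the profiles of two words would be passed to `cosine` to estimate how close they are.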

Dictionary-based approach
Two words wa and wb are the closer, the more often the words in their definitions coincide.
Example: wa = linguistics, wb = stylistics
{the, study, of, language, in, general, and, of, particular, languages, and, their, structure, and, grammar, and, history}
{the, study, of, style, in, written, or, spoken, language}
2 words coincide in the definitions.
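
The count of 2 on the slide appears to refer to content words only (study, language); the function words the, of and in also occur in both definitions. A small sketch under that assumption follows. The stopword list and the function name `definition_overlap` are hypothetical, and no lemmatization is applied, so languages and language are treated as different words.

```python
# Assumed stopword list; the slide does not say which words are ignored.
STOPWORDS = {"the", "of", "in", "and", "or", "their"}


def definition_overlap(def_a, def_b, stopwords=STOPWORDS):
    """Content words shared by two dictionary definitions."""
    content_a = {w.lower() for w in def_a} - stopwords
    content_b = {w.lower() for w in def_b} - stopwords
    return content_a & content_b


linguistics = ["the", "study", "of", "language", "in", "general", "and", "of",
               "particular", "languages", "and", "their", "structure", "and",
               "grammar", "and", "history"]
stylistics = ["the", "study", "of", "style", "in", "written", "or", "spoken",
              "language"]

shared = definition_overlap(linguistics, stylistics)
print(sorted(shared))  # ['language', 'study']
```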

Bilingual dictionary approach
Two words Wa and Wb are the closer, the more often their equivalents coincide.
$\rho(W_a, W_b) = 1 / \sum_i n_i$, where the sum runs over all coinciding Russian equivalents and $n_i$ is the number of dictionaries in which equivalent $i$ occurs,
or
$\rho(W_a, W_b) = \sum_i n_{ai} n_{bi} / (\|a_R\| \, \|b_R\|)$
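
Both formulas can be sketched as follows. The slide does not define the count vectors precisely, so this reads n_ai and n_bi as the number of dictionaries listing Russian equivalent i for Wa and Wb, and ||aR||, ||bR|| as the Euclidean norms of those count vectors; that reading, the function names, and the placeholder keys r1, r2, r3 are all assumptions. Note that the first formula decreases as words get closer, while the second one increases.

```python
import math


def inverse_count_measure(shared_counts):
    """First formula: 1 / sum of n_i over the coinciding equivalents.

    shared_counts maps each coinciding equivalent to n_i (assumed input shape).
    Returns infinity when no equivalents coincide.
    """
    total = sum(shared_counts.values())
    return math.inf if total == 0 else 1.0 / total


def normalized_product_measure(a_counts, b_counts):
    """Second formula: sum of n_ai * n_bi divided by the product of the norms."""
    dot = sum(n * b_counts[eq] for eq, n in a_counts.items() if eq in b_counts)
    norm_a = math.sqrt(sum(n * n for n in a_counts.values()))
    norm_b = math.sqrt(sum(n * n for n in b_counts.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy counts with placeholder equivalents (not real dictionary data).
a_counts = {"r1": 2, "r2": 1}
b_counts = {"r1": 2, "r3": 1}
value = normalized_product_measure(a_counts, b_counts)  # 4 / 5
```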

Multidimensional scaling
- The semantic network is a graph: nodes are words, edges are links between words via the bilingual lexicon
- Edge length: ||edge|| = ρ(Wa, Wb)
- The graph can be embedded in an N-dimensional space, where N is the number of words in the lexicon (>100000)
- Multidimensional scaling is used for visualization
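
As an illustration of the visualization step, here is a sketch of classical (Torgerson) multidimensional scaling with NumPy. The slide does not say which MDS variant was used, so the choice of the classical algorithm and the function name `classical_mds` are assumptions. The input is a full matrix of pairwise distances; for a graph, these would first have to be derived from the edge lengths, for example as shortest-path distances.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives caused by
    # non-Euclidean input or rounding.
    order = np.argsort(eigvals)[::-1][:dims]
    top = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top)


# Toy check: three points on a line at positions 0, 1 and 3.
dist = [[0, 1, 3],
        [1, 0, 2],
        [3, 2, 0]]
coords = classical_mds(dist, dims=1)
```

For exactly Euclidean input the pairwise distances are recovered up to rotation and reflection; for a semantic network they are only approximated in two or three dimensions.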

New synonyms

1-neighborhood of accolade
- Links between synonyms (black)
- Links between synonyms from the dictionary (green)
- 2 isolated clusters

Dominant in the acerbity neighborhood
- acerbity (терпкость, "tartness") is excluded
- The cluster (bold lines) is derived by a Markovian process
- asperity (резкость, "harshness") is the centre of the cluster

2 dominants for bicycling (wheel+crook)

Adjustable parameters
- space dimension;
- minimal number of dictionaries linking synonyms;
- maximal distance from the word under consideration;
- maximal number of displayed words;
- word excluded from clustering;
- …
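
A sketch of how several of these parameters could restrict the neighborhood of a word in the network. It assumes distance along the graph is the shortest-path length (computed here with Dijkstra's algorithm), which the slides do not state; the function name, the graph layout, and the toy numbers are hypothetical.

```python
import heapq


def neighborhood(graph, start, max_distance=1.0, max_words=20,
                 min_dictionaries=1, excluded=frozenset()):
    """Words reachable from start, nearest first.

    graph maps word -> {neighbor: (edge_length, n_dictionaries)}.
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    done = set()
    result = []
    while heap and len(result) < max_words:
        dist, word = heapq.heappop(heap)
        if word in done:
            continue
        done.add(word)
        if word != start:
            result.append((word, dist))
        for nb, (length, n_dicts) in graph.get(word, {}).items():
            # Skip excluded words and links supported by too few dictionaries.
            if nb in excluded or n_dicts < min_dictionaries:
                continue
            new_dist = dist + length
            if new_dist <= max_distance and new_dist < best.get(nb, float("inf")):
                best[nb] = new_dist
                heapq.heappush(heap, (new_dist, nb))
    return result


# Toy graph with invented edge lengths and dictionary counts.
graph = {
    "accolade": {"praise": (0.25, 3), "bracket": (0.5, 1)},
    "praise": {"applause": (0.5, 2)},
}
near = neighborhood(graph, "accolade", min_dictionaries=2)
```

Here `near` contains praise at 0.25 and applause at 0.75; the link to bracket is dropped because only one dictionary supports it.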

Compare LDB with WordNet (accolade)
Synset | WordNet # of syn. | LDB # of syn. | Synonyms in LDB
award | 3n+2v | 80 |
accolade | 1n | 8 | commendation, praise, approbation, applause, + honorable mention, mention, positive mention
honor = honour | 4n+3v | >100 |
laurels | 2n | 15 |
n = noun, v = verb

Controversy 1
- The immediate hypernym of the accolade synset in WordNet is symbol (an arbitrary sign (written or printed) that has acquired a conventional significance).
- The immediate hypernym of commendation (which is more frequent than accolade) is the accolade synset.
- In fact, accolade is a hyponym of commendation.
- It is impossible to disambiguate accolade (bracket) from accolade (praise).

Controversy 2
- WordNet: dog 1, "domestic dog", has the hypernym canine, canid; further up come mammal, …, entity.
- Neither animal nor pet is linked with dog as a hypernym.
- A tree structure is inadequate for semantic coding.

Conclusion
- Each meaning of a polysemous word can be coded as a pair (wE, wR) of an English word and its Russian equivalent, in contrast to synset coding.
- A metric superimposed on the LDB enables homograph disambiguation and the extraction of dominants.
- A network has particular advantages over a hierarchical representation of semantic relations.