ArtsSemNet: From Bilingual Dictionary To Bilingual Semantic Network

Slides:



Advertisements
Similar presentations
Building Wordnets Piek Vossen, Irion Technologies.
Advertisements

Advanced Information Systems Laboratory Department of Computer Science and Systems Engineering GI-DAYS MÜNSTER A software tool.
Ewa Rudnicka, Wojciech Witkowski, Maciej Piasecki G4.19 Research Group Institute of Informatics, Wrocław University of Technology nlp.pwr.wroc.pl plwordnet.pwr.wroc.pl.
Union Catalog and Knowledge Engineering for TELDAP Keh-Jiann Chen Principal Investigator Core Platforms for Digital Contents Project, TELDAP Research Fellow.
CS 4705 Relations Between Words. Today Word Clustering Words and Meaning Lexical Relations WordNet Clustering for word sense discovery.
ISP 433/633 Week 10 Vocabulary Problem & Latent Semantic Indexing Partly based on G.Furnas SI503 slides.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
Creating a Bilingual Ontology: A Corpus-Based Approach for Aligning WordNet and HowNet Marine Carpuat Grace Ngai Pascale Fung Kenneth W.Church.
SLIDE 1IS 240 – Spring 2007 Prof. Ray Larson University of California, Berkeley School of Information Tuesday and Thursday 10:30 am - 12:00.
Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.
Semantic Video Classification Based on Subtitles and Domain Terminologies Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades P ervasive C.
Punjabi WordNet. Development of Punjabi WordNet User enter a word in textbox and click submit button. The system will display the corresponding result.
1 MODULE 2 Meaning and discourse in English LEXICAL RELATIONS Lesson 2.
1 Analysing and teaching meaning SSIS Lazio - Lesson 1 prof. Hugo Bowles January 2007.
NATURAL LANGUAGE TOOLKIT(NLTK) April Corbet. Overview 1. What is NLTK? 2. NLTK Basic Functionalities 3. Part of Speech Tagging 4. Chunking and Trees 5.
SI485i : NLP Set 10 Lexical Relations slides adapted from Dan Jurafsky and Bill MacCartney.
WORDNET Approach on word sense techniques - AKILAN VELMURUGAN.
BT Exact Technologies - Adastral Park, Ipswich July - October 2003 Linguistic Web Services for Semantic Web Dr. Vassil T. Vassilev London Metropolitan.
Machine translation Context-based approach Lucia Otoyo.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
DW-1: Introduction to Data Warehousing. Overview What is Database What Is Data Warehousing Data Marts and Data Warehouses The Data Warehousing Process.
WordNet ® and its Java API ♦ Introduction to WordNet ♦ WordNet API for Java Name: Hao Li Uni: hl2489.
Multi-Prototype Vector Space Models of Word Meaning __________________________________________________________________________________________________.
Oana Adriana Şoica Building and Ordering a SenDiS Lexicon Network.
ICS-FORTH January 11, Thesaurus Mapping Martin Doerr Foundation for Research and Technology - Hellas Institute of Computer Science Bath, UK, January.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Ontologies and Lexical Semantic Networks, Their Editing and Browsing Pavel Smrž and Martin Povolný Faculty of Informatics,
WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G.
Latent Semantic Analysis Hongning Wang Recap: vector space model Represent both doc and query by concept vectors – Each concept defines one dimension.
WORDNET. THE WORDNET SYSTEM  Lexicographer files  Code: Lexico files  database  Search Routines and Interfaces.
Quality Control for Wordnet Development in BalkaNet Pavel Smrž Faculty of Informatics, Masaryk University in Brno, Czech.
Application of INTEX in refinement and validation of Serbian WordNet Ivan Obradović, Ranka Stanković Cvetana Krstev, Gordana Pavlović-Lažetić University.
Katrin Erk Vector space models of word meaning. Geometric interpretation of lists of feature/value pairs In cognitive science: representation of a concept.
WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University.
WordNet: Connecting words and concepts Peng.Huang.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
SINGULAR VALUE DECOMPOSITION (SVD)
LEXICAL RELATIONS Presented by ‘the big family’ group 3 Rauwan Harahap (Opung) Riza Nirmala Putri Salmah Silih Warni Siti Anifah Siti Juariyah.
Wordnet - A lexical database for the English Language.
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
1 Latent Concepts and the Number Orthogonal Factors in Latent Semantic Analysis Georges Dupret
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Detecting and Exploiting Figurative Language in WordNet Wim Peters Department of Computer Science University of Sheffield.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Basics Components of Web Design & Development Basics, Components, Design and Development.
Automatic Writing Evaluation
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
Talp Research Center, UPC, Barcelona, Spain
Sentiment analysis algorithms and applications: A survey
Lesson 11 Lexical semantics 1
LEXICAL RELATIONS Presented by ‘the big family’ group 3
Generating sets of synonyms between languages
LEXICAL RELATIONS IN DISCOURSE
Exploring and Navigating: Tools for GermaNet

Word Relations Slides adapted from Dan Jurafsky, Jim Martin and Chris Manning.
Vector-Space (Distributional) Lexical Semantics
European Network of e-Lexicography
WordNet: A Lexical Database for English
Bulgarian WordNet Svetla Koeva Institute for Bulgarian Language
WordNet WordNet, WSD.
An Empirical Study of Property Collocation on Large Scale of Knowledge Base 龚赛赛
Lesson 11 Lexical semantics 1
Word Relations Slides adapted from Dan Jurafsky, Jim Martin and Chris Manning.
Improved Word Alignments Using the Web as a Corpus
The ultimate in data organization
Relations Between Words
Unsupervised learning of visual sense models for Polysemous words
Work Session on Statistical Metadata (Geneva, Switzerland May 2013)
CH 4 - Language semantics
Presentation transcript:

ArtsSemNet: From Bilingual Dictionary To Bilingual Semantic Network Ivanka Atanassova University of Veliko Tarnovo “St. Cyril and Methodius” Svetlin Nakov Sofia University “St. Kliment Ohridski” Preslav Nakov University of California at Berkley

Introduction Computer dictionaries replace the paper ones Electronic dictionaries Thesauri Semantic networks – the next evolutionary step Incorporate dictionaries and thesauri Our work progressed in the same fashion: We started with a bilingual electronic dictionary. Later we transformed it into a semantic network with various terminological relations – polysemy, homonymy, synonymy, antonymy and hyponymy.

ArtsDict: Bilingual Terminological Dictionary Tool for creation, editing and usage of parallel bilingual electronic terminological dictionaries Used to build domain-specific terminological dictionaries Languages supported: Bulgarian Russian Dictionary entries consist of: Terms Glosses explaining term sense(s) The most complete terminological dictionaries of fine arts (not only electronic but also in general)

ArtsDict: Structure ArtsDict workspace is split horizontally to separate the two parallel dictionaries Each of the dictionaries is split horizontally: Terms stay on the left side Glosses stay on the right side Terms are: Single-term words Multiple-term words Terms listed along with their: Origin, singular, doublets, variants, stylistic synonyms Support for polysemy

Screenshot From ArtsDict

ArtsSemNet – Terminological Semantic Network ArtsSemNet is a lexical reference system for the fine arts terminology in Bulgarian and Russian The terms are organized into a semantic network on the base of several important lexical relations: Provides a specialised term browser for search and navigation between the terms and the corresponding terminological relations Polysemy Homonymy Synonymy Antonymy Hyponymy

ArtsSemNet: Bulgarian Fine Arts Terminology The Bulgarian lexical information consists of: 2900 lexemes 276 hyponym chains 157 antonym chains 483 absolute synonym chains 136 relative synonym chains 14 homonyms Polysemous terms distribution: 2571 with 1 sense 273 with 2 senses 49 with 3 senses 4 with 4 senses 2 with 5 senses 1 with 6 senses

ArtsSemNet: Russian Fine Arts Terminology The Russian lexical information consists of: 2664 lexemes 283 hyponym chains 134 antonym chains 458 absolute synonym chains 114 relative synonym chains 6 homonyms Polysemous terms distribution: 2313 with 1 sense 263 with 2 senses 56 with 3 senses 9 with 4 senses 2 with 5 senses 1 with 7 senses

ArtsSemNet: Creation We used ArtsDict for the extraction of homonyms, synonyms and polysemous terms For the extraction of hyponyms and antonyms we used two techniques: A formal technique for extraction of hyponyms sharing a common term-element (suffix/stem, affix, …) Given a target term-element ArtsDict generates a list of terms from the dictionary that contain it The list is manually examined afterwards A semantic technique (based on LSA) for extraction of hyponyms/antonyms with no common term-element

Latent Semantic Analysis Latent Semantic Analysis (LSA) Popular statistic technique for indexing, retrieval and analysis of textual data Assumes mutual latent dependency between the term and its context LSA works in two stages Learning: Given a collection of texts LSA builds a real-valued vector for each term and for each text Analysis: Тhe proximity between two texts/terms is calculated as the dot product between their normalised vectors (i.e. cosine).

Finding Hyponyms and Antonyms: The Semantic Technique Semantic technique for hyponyms extraction LSA is given as target a hypernym or its glosses LSA produces a ranked list of terms sorted according to the semantic proximity to the target hypernym The list is manually examined for hyponyms Passing to LSA the terms only, the glosses only or both are several variants we found useful

ArtsSemNet: Functionality Searching for terms: Exact and inexact searching Browsing the terms: Term glosses list Homonyms list Absolute synonyms list, relative synonyms list Antonym chains Hyponym chains (with target term hypernym or cohyponym) Supports changes between languages: Russian and Bulgarian are supported

Screenshot from ArtsSemNet

Related Work WordNet EuroWordNet and BalkaNet WordNet is a general lexical reference system for English language organised as a semantic network The lexemes in WordNet are represented as synsets A synset groups a term along with some of its synonyms which as a whole represent some particular sense of that term. Different senses of the same term have different synsets Relations between synsets include: hyponymy, meronymy, synonymy, antonymy and others EuroWordNet and BalkaNet Similar systems for European and Balkan languages Support cross-language synset index

ArtsSemNet and WordNet Terms are represented as synsets, no polysemy General English lexica Supports more relations: e.g. meronymy ArtsSemNet: Term-centered, partial synset support Domain-specific terminology Supports polysemy Distinguishes between absolute and relative synonyms Supports homonymy The user interface supports hyperlink navigation

Availability of ArtsDict and ArtsSemNet Both ArtsDict and ArtsSemNet are freely available for download from: http://cs.berkeley.edu/~nakov/artssemnet/ The lexical database is available in two variants: As a Microsoft Access .mdb file As a set of SQL-scripts for creating the database schema and populating the data in the tables Future Work Create cross-language index More powerful user interface, e.g. hierarchical visualisation, editor for terms and relations, etc.