
Bulgarian WordNet Svetla Koeva Institute for Bulgarian Language Bulgarian Academy of Sciences

Second Wordnet Conference, Brno
Bulgarian WordNet
The Bulgarian WordNet (BulNet) has been under development for two years within the framework of the BalkaNet project. The BalkaNet project (Multilingual Semantic Network for the Balkan Languages) aims to develop a multilingual resource representing semantic relationships in five Balkan languages (Bulgarian, Greek, Serbian, Romanian and Turkish). Each set of synonymous words (synset) in a given language is linked to the closest synset in the Princeton WordNet 2.0 via its ID number.
2 December 2018 Second Wordnet Conference, Brno
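The linking by ID described above can be pictured with a minimal sketch. It is only an illustration: the `Synset` class, the `align_by_id` helper and the ID string are hypothetical placeholders, not the actual BalkaNet data format or real Princeton WordNet identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One set of synonymous words in a single language."""
    pwn_id: str    # ID of the closest Princeton WordNet 2.0 synset (placeholder format)
    language: str  # language code of the monolingual wordnet
    literals: list = field(default_factory=list)


def align_by_id(*wordnets):
    """Group the literals of several monolingual wordnets by shared Princeton ID."""
    aligned = {}
    for wordnet in wordnets:
        for synset in wordnet:
            aligned.setdefault(synset.pwn_id, {})[synset.language] = synset.literals
    return aligned


# Toy data: the ID below is invented for the example.
bulgarian = [Synset("ENG20-00000001-n", "bg", ["жена"])]
greek = [Synset("ENG20-00000001-n", "el", ["γυναίκα"])]

aligned = align_by_id(bulgarian, greek)
print(aligned["ENG20-00000001-n"])
```

Because every monolingual synset carries the ID of its Princeton counterpart, translation equivalents across languages fall out of a simple grouping by that ID.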

BulNet – DCMB team
The Bulgarian partners are the Bulgarian Academy of Sciences and Plovdiv University. The Bulgarian WordNet is being developed by the Department of Computer Modeling of Bulgarian Language (DCMB) within the Institute for Bulgarian Language, Bulgarian Academy of Sciences. http://ibl.bas.bg/departments_en6.htm
The DCMB BulNet team consists of a small group of researchers: linguists, computational linguists, logicians and mathematicians.

BulNet – current state
The Bulgarian WordNet models nouns, verbs, and adjectives and already contains 17 291 word senses (synsets) as of 20.01.2003, comprising 31 164 literals (a ratio of 1.8 literals per synset).
Distribution of synsets by part of speech:
- Nouns – 12 223 synsets
- Verbs – 3 408 synsets
- Adjectives – 1 656 synsets
- Adverbs – 4 synsets
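The figures on this slide are mutually consistent, which a few lines of arithmetic confirm. The variable names are illustrative only; the numbers are those quoted above.

```python
# Synset counts per part of speech, as quoted on the slide.
synsets_by_pos = {"nouns": 12223, "verbs": 3408, "adjectives": 1656, "adverbs": 4}

total_synsets = sum(synsets_by_pos.values())  # should match the quoted 17 291
literals = 31164

# Average number of literals per synset.
ratio = literals / total_synsets

print(total_synsets, round(ratio, 1))
```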

BulNet – current state
- Hypernym – 14 999
- Near_antonym – 1371
- Holo_part – 989
- Holo_member – 798
- Derived – 778
- Verb_group – 710
- Also_see – 187
- Subevent – 149
- Be_in_state – 386
- Cause – 105
- Holo_portion – 63
- Similar_to – 49

Completeness
- All members of the Base Concepts selected so far within the BalkaNet project are present:
  - Base Concepts 1 (1218 members)
  - BC2 (3471 members)
  - BC3 (4855 members)
- No "dangling relations"
- No "gaps"
- An appropriate explanatory definition (gloss) for each synset

Consistency
- There are no duplicated literals in a given synset.
- There are no identical or almost identical glosses for different synsets.
- There are no literals that coincide with their glosses.
- There are no duplicated relations between two synsets.
- Every difference in relations with respect to EWN is language specific and linguistically grounded.
- There are no hypernym cycles or any other relation loops inside BulNet.
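Some of the consistency conditions above are simple enough to sketch in code. The following is a minimal illustration, not the team's implementation: the dictionary layout (`id`, `literals`, `gloss`) and the function names are assumptions made for the example, and "almost identical" glosses are not handled, only exact matches after trimming and lowercasing.

```python
from collections import Counter


def duplicated_literals(synset):
    """Literals that occur more than once in one synset."""
    counts = Counter(synset["literals"])
    return sorted(lit for lit, n in counts.items() if n > 1)


def literal_equals_gloss(synset):
    """True if some literal coincides with the gloss of its own synset."""
    gloss = synset["gloss"].strip().lower()
    return any(lit.strip().lower() == gloss for lit in synset["literals"])


def identical_glosses(synsets):
    """Groups of synset IDs that share exactly the same (normalised) gloss."""
    by_gloss = {}
    for synset in synsets:
        by_gloss.setdefault(synset["gloss"].strip().lower(), []).append(synset["id"])
    return [ids for ids in by_gloss.values() if len(ids) > 1]


# Toy data with one violation of each kind (hypothetical IDs and glosses).
synsets = [
    {"id": "s1", "literals": ["man", "man", "adult male"], "gloss": "an adult male person"},
    {"id": "s2", "literals": ["coffee"], "gloss": "Coffee"},
    {"id": "s3", "literals": ["java"], "gloss": "coffee"},
]

print(duplicated_literals(synsets[0]))
print(literal_equals_gloss(synsets[1]))
print(identical_glosses(synsets))
```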

Main achievements
- Theoretical linguistic work
  - Validation tests
  - Dependencies between relations
  - Combination of Bulgarian language resources
  - Descriptive logic
- Design and development of tools
  - WordNet Explorer
  - WordNet Validator

Validation tests
Our approach to the validation of WordNets comprises three separate levels:
- Checking the syntax of the XML files
- Checking the completeness of the WordNets
- Checking the consistency of the semantic relations and glosses
The levels differ in:
- Degree of complexity and significance
- Possibilities for automatic data correction

Validation tests
The lowest level, which is also the easiest to process and correct, is the syntax of the XML files. In the following cases both automatic checking and automatic data correction are possible:
- Optional empty tags
- Duplicated literals in a synset
- Sense numbers
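Two of the automatic corrections listed above can be sketched with Python's standard `xml.etree.ElementTree`. This is a hedged illustration only: the tag names (`SYNSET`, `SYNONYM`, `LITERAL`, `SENSE`, `USAGE`, `SNOTE`), the set of optional tags, and the rule that a duplicate means the same literal text with the same sense number are all assumptions, not the documented BulNet schema. Sense-number correction is not covered.

```python
import xml.etree.ElementTree as ET

# Assumed names of optional tags that may be dropped when empty.
OPTIONAL_TAGS = {"USAGE", "SNOTE"}


def autocorrect(synset):
    """Remove empty optional tags and duplicated literals from one synset element."""
    for child in list(synset):
        is_empty = len(child) == 0 and not (child.text or "").strip()
        if child.tag in OPTIONAL_TAGS and is_empty:
            synset.remove(child)

    synonym = synset.find("SYNONYM")
    if synonym is not None:
        seen = set()
        for literal in list(synonym):
            # A literal is identified by its text plus its sense number.
            key = ((literal.text or "").strip(), literal.findtext("SENSE", "").strip())
            if key in seen:
                synonym.remove(literal)
            else:
                seen.add(key)
    return synset


xml_text = (
    "<SYNSET><ID>X-1</ID><SYNONYM>"
    "<LITERAL>man<SENSE>1</SENSE></LITERAL>"
    "<LITERAL>man<SENSE>1</SENSE></LITERAL>"
    "</SYNONYM><USAGE></USAGE></SYNSET>"
)

# ET.fromstring raises ET.ParseError on malformed XML, which doubles as a basic syntax check.
synset = autocorrect(ET.fromstring(xml_text))
print(ET.tostring(synset, encoding="unicode"))
```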

Validation tests
In other cases automatic correction is possible, but manual confirmation of the replacements is necessary:
- Accepted ID standard
- Missing values of the obligatory tags
- Correspondence of BCS tags
- At least one literal in a synset

Validation tests
In some cases only validation is possible:
- No duplicated <ID> numbers
- No duplicated relations between two synsets
- No “gaps”
- No “dangling relations”
- No loops
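Three of these validation-only checks can be sketched as follows. The in-memory layout (a list of dictionaries with `id` and a list of `(relation, target_id)` pairs) and the function names are assumptions for illustration. The check for "gaps" is omitted because the slides do not define it precisely, and loop detection is shown for the hypernym relation only.

```python
from collections import Counter


def duplicate_ids(synsets):
    """IDs used by more than one synset."""
    counts = Counter(s["id"] for s in synsets)
    return sorted(i for i, n in counts.items() if n > 1)


def dangling_relations(synsets):
    """Relations whose target ID does not exist in the wordnet."""
    known = {s["id"] for s in synsets}
    return [(s["id"], rel, target)
            for s in synsets
            for rel, target in s["relations"]
            if target not in known]


def has_hypernym_cycle(synsets):
    """Depth-first search for a cycle along hypernym links."""
    hypernyms = {}
    for s in synsets:
        hypernyms.setdefault(s["id"], []).extend(
            target for rel, target in s["relations"] if rel == "hypernym")

    in_progress, done = set(), set()

    def visit(node):
        in_progress.add(node)
        for nxt in hypernyms.get(node, []):
            if nxt in in_progress:  # back edge: the current path returns to itself
                return True
            if nxt not in done and visit(nxt):
                return True
        in_progress.discard(node)
        done.add(node)
        return False

    return any(node not in done and visit(node) for node in hypernyms)


good = [{"id": "a", "relations": [("hypernym", "b")]},
        {"id": "b", "relations": []}]
bad = [{"id": "a", "relations": [("hypernym", "b")]},
       {"id": "b", "relations": [("hypernym", "a"), ("holo_part", "zzz")]},
       {"id": "b", "relations": []}]

print(duplicate_ids(bad), dangling_relations(bad), has_hypernym_cycle(bad))
```

These checks only report problems. Deciding which of two duplicated IDs is correct, or where a dangling relation should point, needs a human editor, which matches the slide's point that only validation is possible here.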

Dependencies between relations
Description of the dependencies between the relations:
- Hyponyms of two antonyms (nouns) should also be antonyms (woman – man; female actor – actor).
- Antonyms (nouns) should have equivalent holo_parts (woman – arm, head; man – arm, head).
- A hyponym should have the same mero_parts (for concrete nouns) as its hypernym (man – head, arm, …; woman – head, arm, …).
- Collective nouns that are holo/mero_members should share the same hypernym, not necessarily the immediate one (football team is an organization, as is football league).
- Nouns that are holo/mero_portions should share the same hypernym, not necessarily the immediate one (coffee – substance; caffeine – substance).
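Two of the dependencies above lend themselves to a mechanical check. The sketch below is illustrative only: the flat dictionaries, the function names, and the choice of "person" as the hypernym of "man" and "woman" are assumptions for the example, and the real relation names and their direction in BulNet may differ.

```python
def antonyms_with_different_parts(antonym_pairs, parts):
    """Antonym pairs whose sets of parts are not equivalent."""
    return [(a, b) for a, b in antonym_pairs
            if parts.get(a, set()) != parts.get(b, set())]


def hyponyms_missing_parts(hypernym_of, parts):
    """For each hyponym, the parts its hypernym has but the hyponym lacks."""
    problems = {}
    for hyponym, hypernym in hypernym_of.items():
        missing = parts.get(hypernym, set()) - parts.get(hyponym, set())
        if missing:
            problems[hyponym] = missing
    return problems


# Toy data: "woman" deliberately lacks the part "arm".
parts = {
    "person": {"head", "arm"},
    "man": {"head", "arm"},
    "woman": {"head"},
}
hypernym_of = {"man": "person", "woman": "person"}
antonym_pairs = [("woman", "man")]

print(antonyms_with_different_parts(antonym_pairs, parts))
print(hyponyms_missing_parts(hypernym_of, parts))
```

Violations found this way are candidates for review rather than certain errors, since some differences may be language specific and linguistically grounded.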

Combining language resources
Three large Bulgarian resources:
- BulNet
- Bulgarian Syntax Dictionary – encoding the arguments of the verbs and their semantic features
- Bulgarian Grammatical Dictionary – encoding over 83 000 lemmas and their corresponding word forms
Benefits of combining them:
- Mutual supplementation
- Expansion of the resources
- Validation of the resources
- Uniform grammatical characteristics

WordNet logic
The DCMB team developed WordNet logic, a uniform, efficient and powerful utility system for querying and exploring WordNet. It is:
- Tailored to the needs of WordNet developers
- Powerful enough to express complex statements and queries
- Fully decidable
The formal background consists of WordNet Structure, WN Language, WN Semantics, WN Logic and WN Logic theorems.
Tinko Tinchev, Stoyan Mihov, Svetla Koeva, Angel Genov: Logic for WordNet, Annual Journal of Sofia University, 2003

WordNet Validator
The WordNet Validator (WNV) is a Web-based system for validating (and correcting) the completeness and consistency of WordNets. It has the following main functions:
- Automatic correction of XML syntax
- Validation of WordNet completeness and consistency
- Search for a given synset
- Visualization of semantic trees
The WordNet Validator can be used for practical work while constructing monolingual WordNets of the Balkan languages, as well as for evaluating the completeness and consistency of different WordNets.


Future directions