Published byΛυσιμάχη Αγγελοπούλου Modified over 6 years ago
Bulgarian WordNet
Svetla Koeva
Institute for Bulgarian Language, Bulgarian Academy of Sciences
Second Wordnet Conference, Brno
Bulgarian WordNet
The Bulgarian WordNet (BulNet) has been under development for two years within the framework of the BalkaNet project. The BalkaNet project (Multilingual Semantic Network for the Balkan Languages) aims to develop a multilingual resource representing semantic relationships in five Balkan languages (Bulgarian, Greek, Serbian, Romanian and Turkish). Each set of synonymous words (synset) in a given language is linked to the closest synset in the Princeton WordNet 2.0 via its ID number.
2 December 2018, Second Wordnet Conference, Brno
BulNet – DCMB team
The Bulgarian partners are the Bulgarian Academy of Sciences and Plovdiv University. The Bulgarian WordNet is being developed by the Department of Computer Modeling of Bulgarian Language (DCMB) within the Institute for Bulgarian Language, Bulgarian Academy of Sciences. The DCMB BulNet team is a small group of researchers: linguists, computational linguists, logicians and mathematicians.
BulNet – current state
The Bulgarian WordNet models nouns, verbs, and adjectives, and already contains word senses (towards ), where literals have been included (the ratio is 1.8).
The distribution of synsets into parts of speech:
Nouns – synsets
Verbs – synsets
Adjectives – synsets
Adverbs – 4 synsets
BulNet – current state
Hypernym –
Near_antonym – 1371
Holo_part – 989
Holo_member – 798
Derived – 778
Verb_group – 710
Also_see – 187
Subevent – 149
Be_in_state – 386
Cause – 105
Holo_portion – 63
Similar_to – 49
Completeness
All Base Concepts selected so far within the framework of the BalkaNet project are present:
Base Concepts 1 (BC1, 1218 members)
BC2 (3471 members)
BC3 (4855 members)
There are no "dangling relations".
There are no "gaps".
Every synset has an appropriate explanatory definition (gloss).
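A minimal sketch of how such completeness checks could be automated. The in-memory layout, the synset IDs and the function names are illustrative assumptions, not the BulNet format or tooling. A "dangling relation" is taken here to mean a relation whose target synset does not exist in the wordnet; the slides do not define "gaps", so that check is left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory form of a wordnet (not the BulNet file format):
# synset ID -> {"gloss": text, "relations": [(relation type, target ID), ...]}
Wordnet = Dict[str, dict]


def missing_base_concepts(wordnet: Wordnet, base_concepts: Set[str]) -> Set[str]:
    """Base Concept IDs that have no synset in the wordnet."""
    return base_concepts - set(wordnet)


def dangling_relations(wordnet: Wordnet) -> List[Tuple[str, str, str]]:
    """Relations (source, type, target) whose target synset is absent."""
    return [
        (source, rel_type, target)
        for source, entry in wordnet.items()
        for rel_type, target in entry["relations"]
        if target not in wordnet
    ]


def missing_glosses(wordnet: Wordnet) -> List[str]:
    """IDs of synsets whose gloss is empty or whitespace only."""
    return [sid for sid, entry in wordnet.items() if not entry["gloss"].strip()]


# Toy data with made-up IDs, only to exercise the checks.
sample: Wordnet = {
    "s1": {"gloss": "a domesticated canine", "relations": [("hypernym", "s2")]},
    "s2": {"gloss": "", "relations": [("hypernym", "s9")]},
}

print(missing_base_concepts(sample, {"s1", "s3"}))
print(dangling_relations(sample))
print(missing_glosses(sample))
```

Each function returns the offending items rather than a yes/no answer, so a report can point the lexicographer to the exact synsets to fix.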
Consistency
There are no duplicated literals in a given synset.
There are no identical or almost identical glosses of different synsets.
There are no literals that coincide with their glosses.
There are no duplicated relations between two synsets.
Every difference in relations with respect to EuroWordNet (EWN) is language specific and linguistically grounded.
There are no hypernym cycles or any other relation loops inside BulNet.
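Two of these conditions can be sketched in a few lines. The code below is an illustration under assumed data structures (plain dictionaries keyed by made-up synset IDs), not the method used by the BulNet team. It groups synsets whose glosses are identical after case and whitespace normalization, and searches the hypernym graph for a cycle with a depth-first traversal. Detecting "almost identical" glosses would need a similarity measure and is not attempted.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def identical_glosses(glosses: Dict[str, str]) -> List[List[str]]:
    """Groups of synset IDs whose glosses match after normalization."""
    by_text = defaultdict(list)
    for sid, gloss in glosses.items():
        by_text[" ".join(gloss.lower().split())].append(sid)
    return [sorted(ids) for ids in by_text.values() if len(ids) > 1]


def find_hypernym_cycle(hypernyms: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of IDs (first ID repeated at the end), or None."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str, path: List[str]) -> Optional[List[str]]:
        state[node] = ACTIVE
        path.append(node)
        for parent in hypernyms.get(node, []):
            if state.get(parent, UNSEEN) == ACTIVE:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state.get(parent, UNSEEN) == UNSEEN:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in hypernyms:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


glosses = {"s1": "A large  feline", "s2": "a large feline", "s3": "a small rodent"}
print(identical_glosses(glosses))

acyclic = {"dog": ["animal"], "animal": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_hypernym_cycle(acyclic))
print(find_hypernym_cycle(cyclic))
```

The traversal is recursive, which is adequate for hierarchies of wordnet depth; an explicit stack would be the safer choice for very deep graphs.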
Main achievements
Theoretical linguistic work
  Validation tests
  Dependencies between relations
  Combination of Bulgarian language resources
  Descriptive logic
Design and development of tools
  WordNet Explorer
  WordNet Validator
Validation tests
Our approach to the validation of WordNets includes three separate levels:
Checking the syntax of the XML files
Checking the completeness of the WordNets
Checking the consistency of the semantic relations and glosses
The levels differ in:
Degree of complexity and significance
Possibilities for automatic data correction
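The first level, XML syntax, can be illustrated with Python's standard XML parser. This is only a sketch of a well-formedness test; the tag names in the sample strings are placeholders, and it does not check the files against any schema.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_well_formed(xml_text: str) -> Tuple[bool, str]:
    """Return (True, "") if the text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


good = "<SYNSET><ID>x1</ID><LITERAL>example</LITERAL></SYNSET>"
bad = "<SYNSET><ID>x1</SYNSET>"  # the ID element is never closed

print(check_well_formed(good))
print(check_well_formed(bad)[0])
```

The parser message includes the line and column of the first error, which is usually enough to locate the broken tag by hand.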
Validation tests
The lowest level, which is also the easiest to process and correct, is the syntax of the XML files. In the following cases both automatic checking and automatic data correction are possible:
Optional empty tags
Duplicated literals in a synset
Sense numbers
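A sketch of two such automatic corrections on a single synset element. The flat layout (LITERAL elements directly under SYNSET) and the set of optional tag names are simplifying assumptions for illustration, not the actual BulNet XML schema. Sense-number repair is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed names of optional tags; the real list depends on the schema in use.
OPTIONAL_TAGS = {"USAGE", "SNOTE"}


def clean_synset(synset: ET.Element) -> ET.Element:
    """Drop empty optional tags and repeated literals, keeping first occurrences."""
    for child in list(synset):
        is_empty = len(child) == 0 and not (child.text or "").strip()
        if child.tag in OPTIONAL_TAGS and is_empty:
            synset.remove(child)

    seen = set()
    for literal in synset.findall("LITERAL"):
        text = (literal.text or "").strip()
        if text in seen:
            synset.remove(literal)
        else:
            seen.add(text)
    return synset


raw = (
    "<SYNSET><ID>x1</ID>"
    "<LITERAL>куче</LITERAL><LITERAL>куче</LITERAL><LITERAL>пес</LITERAL>"
    "<USAGE></USAGE></SYNSET>"
)
cleaned = clean_synset(ET.fromstring(raw))
print([lit.text for lit in cleaned.findall("LITERAL")])
print(cleaned.find("USAGE"))
```

Both corrections are safe to apply without review because they remove only redundant material; nothing a lexicographer entered once is lost.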
Validation tests
In other cases automatic correction is possible, but the replacements must be confirmed manually:
Accepted ID standard
Missing values of the obligatory tags
Correspondence of BCS tags
At least one literal in a synset
Validation tests
In some cases only validation is possible:
No duplicated <ID> numbers
No duplicated relations between two synsets
No “gaps”
No “dangling relations”
No loops
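The two duplicate checks reduce to counting repeated items, which can be sketched as follows. The IDs and relation triples are made up, and a "duplicated relation" is assumed here to mean the same (source, type, target) triple recorded more than once. The checks only report; deciding which copy to keep is left to a person.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicated(items: Iterable[Hashable]) -> List:
    """Items that occur more than once, each reported once, in sorted order."""
    counts = Counter(items)
    return sorted(item for item, n in counts.items() if n > 1)


synset_ids = ["s1", "s2", "s3", "s1"]
relations = [
    ("s1", "hypernym", "s2"),
    ("s3", "near_antonym", "s1"),
    ("s1", "hypernym", "s2"),
]

print(duplicated(synset_ids))
print(duplicated(relations))
```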
Relations’ dependencies
Description of the dependencies between the relations:
Hyponyms of two antonyms (nouns) should also be antonyms (woman – man; female actor – actor).
Antonyms (nouns) should have equivalent holo_parts (woman – arm, head; man – arm, head).
A hyponym should have the same mero_parts (for concrete nouns) as its hypernym (man – head, arm, ...; woman – head, arm, ...).
Collective nouns that are holo/mero_members should share the same hypernym, not necessarily the immediate one (football team is an organization, as is football league).
Nouns that are holo/mero_portions should share the same hypernym, not necessarily the immediate one (coffee – substance; caffeine – substance).
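The "shared hypernym, not necessarily the immediate one" condition can be sketched as an intersection of ancestor sets. The small hierarchy below is invented for illustration (only the football team / football league / organization example comes from the slide), and the function names are hypothetical.

```python
from typing import Dict, List, Set


def ancestors(node: str, hypernyms: Dict[str, List[str]]) -> Set[str]:
    """All direct and indirect hypernyms of a node."""
    seen: Set[str] = set()
    stack = list(hypernyms.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hypernyms.get(current, []))
    return seen


def share_hypernym(a: str, b: str, hypernyms: Dict[str, List[str]]) -> bool:
    """True if the two nodes have at least one hypernym in common."""
    return bool(ancestors(a, hypernyms) & ancestors(b, hypernyms))


# Invented toy hierarchy: child -> list of immediate hypernyms.
toy = {
    "football team": ["team"],
    "team": ["organization"],
    "football league": ["league"],
    "league": ["organization"],
    "coffee": ["substance"],
}

print(share_hypernym("football team", "football league", toy))
print(share_hypernym("football team", "coffee", toy))
```

In a hierarchy where every noun descends from one root, any two nouns trivially share that root, so a practical check would probably ignore the topmost levels or limit the search depth.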
Combining language resources
Three large Bulgarian resources:
BulNet
Bulgarian Syntax Dictionary – encoding the arguments of the verbs and their semantic features
Bulgarian Grammatical Dictionary – encoding over lemmas and their corresponding word forms
Mutual supplement:
  Expansion of the resources
  Validation of the resources
  Uniform grammatical characteristics
WordNet logic
The DCMB team developed WordNet logic, a uniform, efficient and powerful utility system for querying and exploring WordNet. It is:
Tailored to the needs of WordNet developers
Powerful enough to express complex statements and queries
Fully decidable
The formal background consists of WordNet Structure, WN Language, WN Semantics, WN Logic and WN Logic theorems.
Tinko Tinchev, Stoyan Mihov, Svetla Koeva, Angel Genov: Logic for WordNet, Annual Journal of Sofia University, 2003
WordNet Validator
The WordNet Validator (WNV) is a Web-based system for validating (and correcting) the completeness and consistency of WordNets.
The WordNet Validator has the following main functions: automatic correction of XML syntax, validation of WordNet completeness and consistency, search for a given synset, and visualization of semantic trees.
The WordNet Validator can be used in practical work on constructing monolingual WordNets for the Balkan languages, as well as for evaluating the completeness and consistency of different WordNets.
Future directions