The Bulgarian National Corpus and Its Application in Bulgarian Academic Lexicography Diana Blagoeva, Sia Kolkovska, Nadezhda Kostova, Cvetelina Georgieva.


1 The Bulgarian National Corpus and Its Application in Bulgarian Academic Lexicography Diana Blagoeva, Sia Kolkovska, Nadezhda Kostova, Cvetelina Georgieva Institute for Bulgarian Language, Bulgarian Academy of Sciences

2 The Bulgarian National Corpus (BulNC) – General Information
The BulNC is developed by the Institute for Bulgarian Language at the Bulgarian Academy of Sciences
Participants in the BulNC: the Department of Computational Linguistics and the Department for Bulgarian Lexicology and Lexicography
The BulNC project aims at creating a large-scale, representative, general corpus of Bulgarian

3 http://www.ibl.bas.bg/BGNC_bg.htm

4 The Bulgarian National Corpus – General Information
a large general monolingual corpus
400 million graphic words and more than 10 000 electronic documents
fully morpho-syntactically and partially semantically annotated
equipped with a user-friendly online search tool, developed at the Department of Computational Linguistics

5 Application of the BulNC
in computational linguistics
in theoretical research in the fields of morphology, syntax, semantics, text linguistics, etc.
in lexicography
in teaching Bulgarian
in translation practice

6 The BulNC and Bulgarian Lexicography
Application of the BulNC in creating different types of dictionaries, e.g.:
the multi-volume explanatory Dictionary of the Bulgarian Language
Dictionary of the New Words in Bulgarian (from the end of the 20th century and the first decade of the 21st century)
Bulgarian-English bilingual dictionaries (The Oxford Bulgarian Pocket Dictionary and The Oxford Bulgarian Mini Dictionary)

7 Dictionary of the Bulgarian Language (DBL) – General Information
The DBL will consist of 20 volumes and will include approximately 160 000 words
13 volumes published (since the 1970s), with more than 105 000 entries in total
The next two volumes (vol. 14 and 15) are being compiled and edited
The specificity of the DBL: it presents in detail, and as fully as possible, the rich vocabulary of the Bulgarian language over the last 200 years

8 Dictionary of the Bulgarian Language – Lexical Resources
A lexical card index (7 million index cards), used as the main lexicographical resource until recently
An electronic corpus for lexicographic purposes, compiled in the period 2005-2009 and integrated into the Bulgarian National Corpus in 2009

9 The application of the BulNC in the creation of the Dictionary of the Bulgarian Language
The academic multi-volume DBL is the first general explanatory dictionary of the Bulgarian language based on corpus data
Several volumes of the DBL have been created, or are being compiled, using the Bulgarian National Corpus (vol. 13, 14 and 15; vol. 5 in the second, extended and revised edition)

10 Improvements in the dictionary-making process due to the application of the Bulgarian National Corpus
1. Methods concerning the creation of the list of words of the DBL
2. Acquisition of lexicographically relevant information when defining the meaning of lexical units
3. Extraction and selection of illustrative material

11 Methods concerning the creation of the list of words of the DBL
Automatically generated alphabetical-frequency list of the word forms in the BulNC:
as a source for expanding the list of words of the DBL
as a basis for selecting the words to be included in the DBL as entries
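As an illustration of the idea on this slide, here is a minimal Python sketch of building an alphabetical-frequency list of word forms and comparing it with an existing word list. The function names, the `\w+` tokenisation, the `min_count` threshold and the sample sentence are assumptions made for this example; the slides do not describe how the BulNC list is actually generated.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word form, count) pairs in alphabetical order."""
    # Assumption: a word form is any run of letters or digits.
    tokens = re.findall(r"\w+", text.lower())
    return sorted(Counter(tokens).items())


def candidate_entries(freq_list, headwords, min_count=2):
    """Word forms that are frequent enough and absent from the word list."""
    return [form for form, count in freq_list
            if count >= min_count and form not in headwords]


sample = "Вода и небе, и вода."
freq = frequency_list(sample)
print(freq)
print(candidate_entries(freq, headwords={"вода", "и"}, min_count=1))
```

Sorting here is by Unicode code point, which agrees with Bulgarian alphabetical order for lowercase letters. A realistic workflow would probably also map word forms to lemmas before comparing them with dictionary entries; that step is left out of the sketch.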

12 Acquisition of lexicographically relevant information when defining the meaning of lexical units
The BulNC is a useful resource which facilitates:
the recognition of the separate meanings of lexical units
the presentation of the Bulgarian lexis at a high level of granularity
the extraction of linguistic data using a user-friendly search tool

13 Searching for units in the BulNC with regular expressions
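The query syntax of the BulNC search tool is not shown in this transcript, so the following is only a rough Python analogue of a regular-expression search that returns each hit with its context. The `concordance` function, the pattern for some forms of вода ‘water’ and the sample sentences are assumptions for illustration.

```python
import re


def concordance(pattern, sentences, width=20):
    """Return (left context, match, right context) for every hit."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for sentence in sentences:
        for m in regex.finditer(sentence):
            left = sentence[max(0, m.start() - width):m.start()]
            right = sentence[m.end():m.end() + width]
            hits.append((left, m.group(0), right))
    return hits


# Hypothetical pattern covering four forms: вода, водата, води, водите
pattern = r"\bвод(?:а|ата|и|ите)\b"
sentences = ["Водата е студена.", "Пия вода.", "Небето е синьо."]
for left, match, right in concordance(pattern, sentences):
    print(f"{left}[{match}]{right}")
```

The word boundaries (`\b`) keep the pattern from matching inside longer words, and `re.IGNORECASE` lets a sentence-initial capitalised form match as well.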

14 Acquisition of lexicographically relevant information when defining the meaning of lexical units
Information about the collocations of words and their frequency
Frequency data as a criterion for determining the order of the various meanings of a word in the dictionary entry (Čermák 2010)

15 Query for collocations of the noun небе ‘sky’
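A collocation query of this kind can be approximated by counting the words that occur within a fixed window around the node word. The sketch below assumes a pre-tokenised text, a symmetric window and raw co-occurrence counts; the function name and the sample tokens are invented for illustration, and the BulNC tool may well rank collocates differently.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count tokens found within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


tokens = "синьо небе над града и ясно небе и пак синьо небе".split()
counts = collocates(tokens, "небе", window=1)
print(counts.most_common())
```

Sorting the collocates by count, as `most_common()` does, gives the lexicographer a first ranking of typical combinations to check against the corpus examples.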

16 Acquisition of lexicographically relevant information when defining the meaning of lexical units
Information about some semantic relations of a given word, available thanks to the Bulgarian WordNet (http://dcl.bas.bg/en/wordnet_en.html)
Synonyms: a query returns the adjective буен ‘wild’ and its synonyms неукротим ‘untamable’, необуздан ‘unruly’, силен ‘strong’
Hypernyms: a query for the nouns бульон ‘bouillon’ and таратор (Bulgarian cold yoghurt soup) returns their hypernym супа ‘soup’

17 Query for the adjective буен ‘wild’ and its synonyms

18 Query for the hypernym of the noun таратор ‘Bulgarian cold yoghurt soup’
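The two kinds of WordNet query shown on slides 16-18 can be pictured with a toy lookup. The data below is limited to the examples given on the slides; the `SYNSETS` and `HYPERNYMS` structures and both functions are hypothetical and do not reflect the actual format or interface of the Bulgarian WordNet.

```python
# Toy data copied from the slide examples (not the real WordNet format).
SYNSETS = [
    {"буен", "неукротим", "необуздан", "силен"},
]
HYPERNYMS = {
    "бульон": "супа",
    "таратор": "супа",
}


def synonyms(word):
    """Return the other members of every synset that contains `word`."""
    found = set()
    for synset in SYNSETS:
        if word in synset:
            found |= synset - {word}
    return sorted(found)


def hypernym(word):
    """Return the direct hypernym of `word`, or None if it is not recorded."""
    return HYPERNYMS.get(word)


print(synonyms("буен"))
print(hypernym("таратор"))
```

Grouping words into synsets, rather than storing pairwise links, means that a lookup for any member of the set returns all the others.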

19 Acquisition of lexicographically relevant information when defining the meaning of lexical units
Information which helps the lexicographer decide on the usage labels (qualifiers) of some words in the DBL, e.g. an obsolete word, a new word, a rarely used word

20 Extraction and selection of illustrative material
The BulNC as a resource for optimizing the search for, and the systematization of, language material
Selection of sub-corpora in accordance with the user’s research purposes

21 Selection of a sub-corpus when looking for the forms of the word вода ‘water’
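Selecting a sub-corpus amounts to filtering documents by their metadata before searching. The sketch below assumes each document carries a genre and a year; these fields, the `Document` class and the three sample documents are invented for illustration, since the transcript does not list the metadata that the BulNC actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    genre: str  # hypothetical metadata field
    year: int   # hypothetical metadata field
    text: str


def select_subcorpus(docs, genre=None, year_from=None, year_to=None):
    """Keep only the documents that satisfy every criterion that was given."""
    selected = []
    for doc in docs:
        if genre is not None and doc.genre != genre:
            continue
        if year_from is not None and doc.year < year_from:
            continue
        if year_to is not None and doc.year > year_to:
            continue
        selected.append(doc)
    return selected


corpus = [
    Document("A", "fiction", 1985, "Водата в реката беше студена."),
    Document("B", "news", 2008, "Цената на водата се повиши."),
    Document("C", "fiction", 2007, "Небето беше ясно."),
]
recent_fiction = select_subcorpus(corpus, genre="fiction", year_from=2000)
print([doc.title for doc in recent_fiction])
```

Leaving a criterion as `None` means it is not applied, so calling the function with no criteria returns the whole corpus.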

22 Conclusions
Applying a corpus-based approach ensures greater objectivity of lexicographic decisions and, as a result, improves the quality of the dictionaries created

