Velina Slavova (Bulgaria), Vladimir Polyakov (Russia)
THE METRICS OF COMPLEXITY BASED ON SYSTEM OF CASE RELATIONS IN TYPOLOGICAL STRUCTURE OF THE LANGUAGE (ON THE DATA OF DB «LANGUAGES OF THE WORLD») (*)
* The research was supported by Russian Scientific Foundation of Humanities (grant № в)
Screenshots. Win Version
Source of Data for DB JM
- Encyclopedic series “Jaziki Mira” (Languages of the World): 18 volumes, published by the Institute of Linguistics of the Russian Academy of Sciences since 1993.
- Large Encyclopedic Dictionary. Linguistics (edited by Yarceva V.N.): includes interpretations of all terms of the DB model.
List of some Encyclopedic Publications “Jaziki Mira” (Languages of the World)
- Languages of the world: Uralic (1993).
- Languages of the world: Paleoasiatic languages. Moscow: Publ. “Indrik”. (1996).
- Languages of the world: Turkic. Moscow: Publ. “Indrik”. (1997).
- Languages of the world: Mongolic languages. Manchu-Tungus languages. Japanese. Korean. (Ed.: Kibrik A.A., Rogova N.B., Romanova O.I.). Moscow: Publ. “Indrik”. (1997).
- Languages of the world: Iranian languages. I. South-Western Iranian languages. Moscow: Publ. “Indrik”. (1997).
- Languages of the world: Iranian languages. II. North-Western Iranian languages. Moscow: Publ. “Indrik”. (1999). 302 p.
- Languages of the world: Dardic and Nuristani languages. Moscow: Publ. “Indrik”. (1998).
- Languages of the world: Iranian languages. III. East Iranian languages. Moscow: Publ. “Indrik”. (1999).
- Languages of the world: Germanic languages. Celtic languages. Moscow: Publ. “Academia”. (1999).
- Languages of the world: Caucasian languages. RAS. Institute of Linguistics. Moscow: Publ. “Academia”. (2001).
- Languages of the world: Romance languages. Moscow: Publ. “Academia”. (2001).
- Languages of the world: Indo-Aryan languages of Ancient and Middle Period. Moscow: Publ. “Academia”. (2004).
- Languages of the world: Slavonic languages. RAS. Institute of Linguistics. (Ed.: A.M. Moldovan, S.S. Skorvid, A.A. Kibrik). Moscow: Publ. “Academia”. (2005).
- Languages of the world: Baltic languages. RAS. Institute of Linguistics. (Ed.: V.N. Toporov, M.V. Zavyalov, A.A. Kibrik). Moscow: Publ. “Academia”. (2006). 224 p.
Dictionary and source books (pictured: the dictionary and two of the 18 source books)
Characteristics of Data Base “Languages of the World”
Content. The Data Base “Languages of the World” has the following quantitative characteristics:
- it contains more than 3800 features;
- it covers 315 Eurasian languages;
- it describes the following spheres of language: phonetics, morphology, syntax;
- the data are represented in binary form.
The following language families and unities are represented in the Data Base “Languages of the World”: Austroasiatic, Austronesian, Altaic, Afroasiatic, Indo-European, Caucasian, Paleoasiatic, Sino-Tibetan, Uralic, Hurro-Urartian. The DB also describes the language isolates Ainu, Nivkh, Burushaski, Sumerian and Elamite.
A unique feature of the Data Base “Languages of the World” is its large collection of descriptions of extinct languages, comprising 54 essays. There are no analogues of such a detailed and systematic description of extinct languages.
The main principles underlying the model of language description are binarity, hierarchicity and paradigmaticity.
Task Formulation
1. Different grammatical constructions are assumed to require different brain resources for processing.
2. It is further assumed that the total amount of brain resources spent on processing utterances of approximately equal meaning must be constant.
3. Semantic cases (Fillmore's cases) can serve as an example of a complex construction on which to verify these assumptions.
4. The DB “Jazyki Mira” contains semantic cases that form a rather wide paradigm.
Example
Consider an example with the accusative case:
“Суд обвинил Вас-ю в краже.”
“The court accused Basil of theft.”
In Russian, case is marked both by the form of the noun (Вас-ю) and by a preposition (в); in English it is marked only by a preposition (of).
Method of Data Processing
Velina Slavova used the data of the DB “Jazyki Mira” to obtain a more convenient representation of the case paradigm. After a rather sophisticated reduction of the data we obtained the first results, which show examples of correlation between different case systems.
Case description in DB. Scope of the research.
In DB JM, 405 grammar features are devoted to the case system (in the Part number of Model). In this research only actant case meanings were investigated (140 grammar features). They were divided into six fragments:
--subject/object
--contrastive case formation of subject
--contrastive case formation of object
--method of expressing subject-object meanings
--other actant cases
--case of nominal predicate
At the first step only four of these fragments were investigated.
Examples of case description
Part of the “subject/object” paradigms in the DB:
--subject/object
---absolutive
---absolutive/relative
---dative
---narrative
---nominative/accusative
---nominative/accusative/genitive
---nominative/accusative-genitive
---nominative/accusative/indefinite accusative
---nominative/accusative/genitive/partitive
---nominative/accusative/privative/sociative
---nominative/accusative/locative
---nominative/accusative/partitive
---nominative/dative-accusative
---nominative/narrative
---nominative/partitive
---nominative/genitive
---nominative/genitive/partitive
---nominative/general indirect
---nominative/ergative
---nominative/ergative/genitive
Fragment of the description of the English language:
*LANGUAGE DENOMINATION.English
…
CASE MEANINGS
.actant case meanings
..subjective/objective
...general case/accusative
..contrastive case formation
...of object
....nouns and pronouns
..method of expressing
...affixes
...word order
...auxiliary
of attributive relation
..prepositional
of possessive relation
..prepositional construction
..possessive affix at possessor's
of locative relations
..method of expression
...prepositions
…
Metrics of complexity
For each of the six parts its own metric of complexity was developed.
Part of case description (complex characteristic) | Type of feature coding | Metric
--subject/object | Paradigm (only one choice) | Maximal number of cases marked in the language
--contrastive case formation of subject | Multi-choice | Number of features present in the language
--contrastive case formation of object | Multi-choice | Number of features present in the language
--method of expressing subject-object meanings | Multi-choice | Number of features present in the language
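The two kinds of metric in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the key names (`subject_object`, `contrast_subject`, and so on) and the toy language entry are hypothetical and do not reproduce the actual DB format; counting cases by splitting a paradigm name on "/" is likewise an assumption about how "number of cases marked" could be read off a paradigm label.

```python
def paradigm_size(paradigm: str) -> int:
    """Count the cases named in a paradigm label such as
    'nominative/accusative/genitive' (assumption: '/' separates cases)."""
    return len([part for part in paradigm.split("/") if part.strip()])


def present_count(features: dict) -> int:
    """Multi-choice metric: number of binary features marked as present."""
    return sum(1 for present in features.values() if present)


def complexity_profile(language: dict) -> dict:
    """Compute the four metrics for one language description."""
    return {
        # Single-choice paradigm: maximal number of cases marked.
        "subject_object": max(
            (paradigm_size(p) for p in language["subject_object"]), default=0
        ),
        # Multi-choice fragments: number of features present.
        "contrast_subject": present_count(language["contrast_subject"]),
        "contrast_object": present_count(language["contrast_object"]),
        "expression_means": present_count(language["expression_means"]),
    }


# Hypothetical toy entry, invented only to exercise the functions.
toy_language = {
    "subject_object": ["nominative/accusative/genitive"],
    "contrast_subject": {"nouns": True, "pronouns": False},
    "contrast_object": {"nouns": True, "pronouns": True},
    "expression_means": {"affixes": True, "word order": False, "prepositions": True},
}

profile = complexity_profile(toy_language)
print(profile)
```

Keeping the single-choice and multi-choice metrics in separate helper functions mirrors the distinction in the "Type of feature coding" column.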
Correlation Analysis
The correlation table shows good correlations between three of the complex characteristics (highlighted in yellow).
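As a minimal sketch of how a correlation between two complexity metrics could be computed, the following uses the Pearson coefficient. The slides do not state which correlation coefficient was used, so Pearson is an assumption, and the two number lists are invented toy values, not data from the DB.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values for five languages (not taken from the DB):
# size of the subject/object paradigm and number of means of expression.
paradigm_sizes = [2, 3, 6, 4, 1]
expression_means = [5, 4, 1, 2, 5]

r = pearson(paradigm_sizes, expression_means)
print(f"r = {r:.3f}")
```

Applying the same function to every pair of the four metrics would give the full correlation matrix.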
Factor Analysis
The analysis yields two groups of factors (#1 in yellow, #2 in blue).
Tree Analysis
The distances between languages under this “SO syntactic rules complexity” measure appear to keep languages from some genealogical groups close together. Nevertheless, the Indo-European languages are very dispersed, and the old languages seem to stand apart.
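A tree of this kind can be built from pairwise distances between complexity profiles. The slides do not say which distance or clustering method was used, so the sketch below assumes Euclidean distance and single-linkage agglomerative clustering; the language names and profile values are hypothetical placeholders.

```python
from itertools import combinations
from math import dist


def single_linkage(profiles):
    """Agglomerative clustering of named profiles.

    Returns the list of merges in order, each as
    (cluster_a, cluster_b, distance_between_them).
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []

    def gap(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(dist(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, gap((a, b))))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical profiles: (paradigm size, contrast features, expression means).
toy_profiles = {
    "Lang A": (2, 1, 1),
    "Lang B": (2, 1, 2),
    "Lang C": (6, 4, 5),
}

merges = single_linkage(toy_profiles)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), f"at distance {d:.2f}")
```

The order of merges defines the tree: languages merged early at small distances end up on neighbouring branches.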
ANALYSIS OF RESULTS
1. The hypothesis that the complexity of a language's grammatical structure is preserved at a certain level was confirmed. The study showed that languages with a complex case paradigm have simpler grammatical means of expressing cases and fewer differences in the description of cases for subject/object. Languages with a simple case paradigm have more complex means of expressing case relations and more differences in the description of cases for subject/object. This dichotomy explains 76% of the variation in the content of the DB “Jazyki Mira”.
2. In general, this description of the case system (as two groups of factors) correlates well with the genealogical tree. The exception is the Indo-European language family, which may be due to the wide geographical spread of Indo-European languages and, consequently, intensive borrowing during areal contacts. This hypothesis requires additional verification.
AS A CONCLUSION
The present report is intended to show that the DB “Jazyki Mira” is an interesting resource for studying the complexity of different parts of the grammar of a language. We have gained only our first experience. The methods and approaches are still being established and developed. Work in this direction will be continued.
Thank you for your attention