INFuture 2009, Zagreb, 2009-11-5/7 17/2/19 Transcription and transliteration in a computer data processing Greta Šimičević Faculty of Humanities and Social.

Presentation transcript:

INFuture 2009, Zagreb, 2009-11-5/7 17/2/19 Transcription and transliteration in a computer data processing Greta Šimičević Faculty of Humanities and Social Sciences Ana Marija Boljanović Croatian Standards Institute INFuture 2009, Zagreb, 2009-11-3/6 Boljanović, Šimičević: Transcription and transliteration in a computer data processing

Scope
- Introduction
- Transcription and transliteration (possible procedures for the transfer of one script into another)
- Standardization and other systems in the area of transliteration
- Computer systems and transliteration
- Conclusion

Introduction
- In the second half of the 20th century, the digitalization of library operations changed the way libraries function.
- Maintenance problems in data management and search problems occurred.
- The problems increased when databases became accessible on the Internet.
- Among these various problems are those of transcription and transliteration.
- Databases were filled with data and became larger, clumsier and increasingly disorganized.

Transcription
Transcription is the transfer of the pronunciation and phonemes of one language into the graphical system used to record the phonemes of another language, i.e. the pronunciation of words in one language is adapted to the pronunciation and vocalization of the other language.

Transcription
Examples:
- The surname of the Russian author Цветаева: Tsvetaeva (Eng.), Zwetajewa (Ger.), Cvetaeva (Ita.), Cvjetajeva (Cro.), Tswetaewa (Pol.).
- The Russian surname Щедрин: Ščedrin in Croatian and Czech, Szczedrin in Polish, Shchedrin in English, Chtchedrine in French, Sjtsjedrin in Dutch and Schtschedrin in German (as many as seven letters of the Latin alphabet are needed for the single Cyrillic character щ).
- The four Russian Cyrillic diphthongs я, ю, ё, щ, for which there are no graphemes in the Latin alphabet (Тютчев, Маяковский).
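The Щедрин example can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it stores the transcriptions listed above and measures how many Latin letters each language spends on the single character Щ. The function names are hypothetical, and the code assumes that every language renders the rest of the name as "edrin" (French adds a final "e").

```python
import unicodedata

# Transcriptions of the Russian surname Щедрин, as listed in the slide.
SHCHEDRIN = {
    "Croatian": "Ščedrin",
    "Czech": "Ščedrin",
    "Polish": "Szczedrin",
    "English": "Shchedrin",
    "French": "Chtchedrine",
    "Dutch": "Sjtsjedrin",
    "German": "Schtschedrin",
}


def rendering_of_shcha(transcribed):
    """Return the Latin letters that stand for the initial Щ.

    Assumption: the remainder of the name is always written "edrin",
    so everything before that substring represents Щ.
    """
    name = unicodedata.normalize("NFC", transcribed)
    return name[: name.lower().index("edrin")]


def shcha_lengths():
    """Map each language to the number of Latin letters used for Щ."""
    return {lang: len(rendering_of_shcha(name)) for lang, name in SHCHEDRIN.items()}


if __name__ == "__main__":
    for lang, name in SHCHEDRIN.items():
        prefix = rendering_of_shcha(name)
        print(f"{lang:9} {name:13} Щ -> {prefix} ({len(prefix)} letters)")
```

Seven languages produce six different spellings of one name, which is the lack of uniformity the slide points to.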

Transcription
- Transcription is limited by the language system.
- The most frequent differences between systems lie in the diverse phonetisation of the graphemes being transferred.
- The differences arise because specific graphemes and phonemes exist in some systems and are lacking in others: ч, ж, ш, я, ю, ё, щ, э, ы, ь.
- Transferring one script into another by transcription could not bring about uniformity on the global level.

Transliteration
- Transliteration is the transfer of the characters (graphemes) of one script into the characters of another script (e.g. from Glagolitic into Latin, from Cyrillic into Latin, etc.).
- The transfer should occur almost automatically and in both directions, so that conversion back to the original text is possible.
- Clearly, however, with the 25 or 26 globally accepted characters of the Latin script it is not possible to transfer 40 or 50 Cyrillic characters without occasionally resorting to combinations of the usual Latin graphemes for the special characters.

Transliteration
- With 25 or 26 characters in the Latin script, it is not possible to transfer 40 or 50 Cyrillic characters without occasionally resorting to combinations of the usual Latin graphemes for the special characters.
- The same symbol should not be used to transliterate different characters in any language.
- Using two or more characters for one character is acceptable only when there is no better solution.
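The two rules above (no symbol reused for different characters, and multi-letter combinations only as a last resort) are what make automatic back-conversion possible. The Python sketch below illustrates this under stated assumptions: `CYR_TO_LAT` is a small lower-case subset believed to match ISO 9:1995 for the letters shown, `DIGRAPH_TABLE` is a hypothetical mixed table invented for the demonstration, and all function names are illustrative.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that accented letters compare reliably."""
    return unicodedata.normalize("NFC", text)


# Illustrative subset only (assumed to match ISO 9:1995 for these letters).
CYR_TO_LAT = {nfc(c): nfc(l) for c, l in {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "ë", "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k",
}.items()}

# Hypothetical table mixing single letters with a digraph for ж.
DIGRAPH_TABLE = {"з": "z", "х": "h", "ж": "zh"}


def is_one_to_one(table):
    """True when no Latin symbol stands for two different source characters."""
    return len(set(table.values())) == len(table)


def invert(table):
    """Build the reverse table; only valid for a one-to-one table."""
    if not is_one_to_one(table):
        raise ValueError("table is not one-to-one, cannot be reversed")
    return {lat: cyr for cyr, lat in table.items()}


def transliterate(text, table):
    """Replace character by character; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in nfc(text))


if __name__ == "__main__":
    latin = transliterate("ёжик", CYR_TO_LAT)
    print(latin, "->", transliterate(latin, invert(CYR_TO_LAT)))
    # With a digraph, two different source strings collapse into one:
    print(transliterate("ж", DIGRAPH_TABLE), transliterate("зх", DIGRAPH_TABLE))
```

With the one-character-per-character table the round trip restores the original word. With the digraph table, "ж" and "зх" both become "zh", so the original can no longer be recovered from the Latin form alone.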

Transliteration
- There are numerous "inconsistencies" in the application of different transliteration rules when transferring Russian Cyrillic into Latin: the diphthongs я, ю, ё, щ, the diacritical characters ч, ж, ш and graphemes such as ц, х.
- Nevertheless, on the global level transliteration has proven to be a much better procedure than transcription for harmonizing data entry from other scripts into Latin in databases, which led to attempts to create various international rules for transliteration.

Standardization in the area of transliteration
- To facilitate and improve communication and the exchange of data and information, the International Organization for Standardization (ISO) developed standards for the transliteration of Cyrillic characters into Latin characters.
- ISO 9:1995 Information and documentation - Transliteration of Cyrillic characters into Latin characters - Slavic and non-Slavic languages
- Almost all of the 162 ISO member states have adopted this standard (with some national modifications).
- The fact that ISO has 162 member states speaks best about the significance of this international organization.

Standardization in the area of transliteration

Other systems in the area of transliteration
Cyrillic | Scholarly | ISO/R 9:1968 | GOST 1971 | UN | ISO 9:1995; GOST 2002 | ALA-LC | BGN/PCGN
А а | a
Б б | b
В в | v
Г г | g
Д д | d
Е е | e | e, ye †
Ё ё | ë | yo | ë, yë †
Ж ж | ž | zh
З з | z
И и | i
Й й | j | ĭ | y
К к | k
(A single value applies to every system.)

Computer systems and transliteration
- Computer systems distinguish data items by the distinctiveness of their characters.
- As a result, one and the same piece of information entered into a computer system once through transcription and once through transliteration represents, for the computer, two different pieces of information.
- If several different rules for transcription and transliteration are applied at the same time, a single semantically identical data item can become, as far as the computer is concerned, a multitude of different data items.
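The effect can be sketched in a few lines of Python using the romanized forms of Цветаева from the transcription examples. The record layout (a romanized form stored next to the original-script form) and the function names are assumptions made for illustration, not a design described in the slides.

```python
from collections import defaultdict

# Hypothetical catalogue entries: (romanized form, original Cyrillic form).
RECORDS = [
    ("Tsvetaeva", "Цветаева"),   # English
    ("Zwetajewa", "Цветаева"),   # German
    ("Cvetaeva", "Цветаева"),    # Italian
    ("Cvjetajeva", "Цветаева"),  # Croatian
    ("Tswetaewa", "Цветаева"),   # Polish
]


def exact_match(query, entries):
    """Character-by-character comparison, as a plain database lookup would do."""
    return [latin for latin, _ in entries if latin == query]


def group_by_original(entries):
    """Collect all romanized variants under the original-script form."""
    groups = defaultdict(list)
    for latin, original in entries:
        groups[original].append(latin)
    return dict(groups)


if __name__ == "__main__":
    # A search on one variant finds only that variant.
    print(exact_match("Tsvetaeva", RECORDS))
    # Keyed on the original script, all five variants fall together.
    print(group_by_original(RECORDS))
```

For the computer the five romanized strings are five unrelated values; only a shared key, here the original Cyrillic form, shows that they name the same person.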

Computer systems and transliteration
Automatic transliteration of Russian:
http://www.russki-mat.net/trans2.html
http://www.russki-mat.net/trans.htm

Conclusion
- A single transliteration standard still does not exist:
  - bibliographic databases were established before this problem was recognized on the global level
  - international standards became an attack on the language traditions of large groups
  - it is a problem to reconcile all language traditions
- As a unique technical aid in the transfer of characters, the transliteration process has been completely accepted.
- The tendency towards greater unification of standards continues to exist.

Thank you!