Free Pascal compiler internationalisation
Rimgaudas Laucius
Institute of Mathematics and Informatics, Vilnius University
Lithuania
Introduction
Institute of Mathematics and Informatics, Informatics Methodology Department:
– Software localisation
– Teaching of informatics and programming
– E-learning and standards
– Informatics terminology
Vilnius University:
– Localisation course
Localisation in Lithuania
One of the four priorities emphasised in the strategic project for the development of the information society in Lithuania is: “to uphold the inheritance of Lithuanian language and culture implementing the information technologies and telecommunications”
Open Source in Lithuania
The 2004 research study “Open Source in Education” found that integrating open source software into education has a large positive economic and pedagogical effect
Education requires high-quality, fully localised software
Open source software is more flexible in terms of localisation
Free Pascal compiler
Excellent open source compiler
Works under all widely used operating systems:
– Windows, Linux and others
Widely used:
– Has been used for several years in the International, Baltic and Lithuanian national Olympiads in Informatics
Replacement for the obsolete Turbo Pascal system in Lithuanian schools
FPS
Compiler internationalisation
Internationalisation is part of the software development process, so internationalising the development tools themselves is very important
Most contemporary software development tools are not sufficiently internationalised
Although this research is based on the Free Pascal compiler, most of the issues presented are common to most compilers
Programming language standards
Internationalisation is related to programming language standards:
– Pascal programming language standards
– Standards of other languages
Examples of internationalised compilers
There are few such examples
One of the best-known internationalised programming systems is LOGO
Vector Pascal
Structure of Free Pascal
Free Pascal is a system made up of the compiler program itself and the run-time library (RTL)
Compiler and RTL interaction:
– Changing the compiler sometimes requires changing the RTL as well
Support for multilingual source code
This is the first stage of compiler internationalisation
Many scripts need more characters than an 8-bit character set can hold
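UTF-8 implementation
Unicode is represented as UTF-8
Some utilities used by compilers do not support raw 16-bit Unicode: a Unicode character may be treated as a pair of 8-bit characters, e.g. U+0900 stored as the bytes 09 00 is read as a tab followed by an end-of-string (NUL) character
UTF-8 leaves ASCII text unchanged, which allows the lexical extensions to be implemented step by step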
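The point about UTF-8 can be made concrete with a minimal Free Pascal sketch; the program and function names (EncodeUtf8, HasAsciiByte) are hypothetical. It encodes a code point as UTF-8 and checks that no byte of a multi-byte sequence falls below $80, which is the property that keeps 8-bit tools from misreading U+0900 as a tab and a NUL.

```pascal
program Utf8Demo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

// Encode one Unicode code point (assumed valid: at most $10FFFF and not a
// surrogate) as a sequence of UTF-8 bytes.
function EncodeUtf8(CodePoint: Cardinal): TBytes;
begin
  if CodePoint < $80 then
  begin
    SetLength(Result, 1);
    Result[0] := Byte(CodePoint);
  end
  else if CodePoint < $800 then
  begin
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (CodePoint shr 6));
    Result[1] := Byte($80 or (CodePoint and $3F));
  end
  else if CodePoint < $10000 then
  begin
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (CodePoint shr 12));
    Result[1] := Byte($80 or ((CodePoint shr 6) and $3F));
    Result[2] := Byte($80 or (CodePoint and $3F));
  end
  else
  begin
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (CodePoint shr 18));
    Result[1] := Byte($80 or ((CodePoint shr 12) and $3F));
    Result[2] := Byte($80 or ((CodePoint shr 6) and $3F));
    Result[3] := Byte($80 or (CodePoint and $3F));
  end;
end;

// True if a multi-byte sequence contains a byte that an 8-bit tool would
// read as an ASCII character (never the case for valid UTF-8).
function HasAsciiByte(const Bytes: TBytes): Boolean;
var
  i: Integer;
begin
  Result := False;
  if Length(Bytes) > 1 then
    for i := 0 to High(Bytes) do
      if Bytes[i] < $80 then
        Exit(True);
end;

var
  B: TBytes;
  i: Integer;
begin
  B := EncodeUtf8($0900);
  for i := 0 to High(B) do
    Write(IntToHex(B[i], 2), ' ');
  WriteLn;
end.
```

For U+0900 it prints the bytes E0 A4 80, all of them $80 or above, while ASCII characters keep their one-byte form.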
Lexical extensions
– Strings
– Identifiers
– Directives
– Reserved words
– Operators
– Numbers
Strings
WideString implementation issues:
– Compatibility with other systems
– Ambiguity
– Conversions between Unicode and other character sets
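Ambiguity example
program AmbiguityDemo;

procedure Go(const s: WideString); overload;
begin
  WriteLn('WideString version');
end;

procedure Go(const s: String); overload;
begin
  WriteLn('String version');
end;

begin
  Go('Hi');
end.
Which of the overloaded procedures should be called?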
Unicode support layer
The Unicode support layer wraps OS APIs in an OS-independent way
Under Win9x it is built on the Microsoft Layer for Unicode (MSLU)
Identifiers
Identifiers should convey the meaning of the object clearly and be easy to understand and remember
The best way to achieve this is to allow identifiers written in the programmer's native language
Unicode Standard Annex #31: Identifier and Pattern Syntax
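A sketch of the <Start> <Continue>* identifier rule from UAX #31, under a loudly simplified assumption: instead of the real XID_Start and XID_Continue property tables, it accepts ASCII letters, the underscore, ASCII digits after the first character and the Latin letters U+00C0..U+024F, which cover the Lithuanian alphabet. The function names are hypothetical.

```pascal
program IdentifierDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Simplified stand-in for XID_Start: ASCII letters, the underscore Pascal
// allows, and U+00C0..U+024F minus U+00D7 and U+00F7 (the two math signs).
// A real compiler would consult the full Unicode property tables.
function IsIdentStart(C: WideChar): Boolean;
begin
  case Ord(C) of
    Ord('A')..Ord('Z'), Ord('a')..Ord('z'), Ord('_'),
    $00C0..$00D6, $00D8..$00F6, $00F8..$024F:
      Result := True;
  else
    Result := False;
  end;
end;

// Simplified stand-in for XID_Continue: start characters plus ASCII digits.
function IsIdentContinue(C: WideChar): Boolean;
begin
  Result := IsIdentStart(C) or ((C >= '0') and (C <= '9'));
end;

// <Start> <Continue>*; surrogate pairs are not handled in this sketch.
function IsValidIdentifier(const S: UnicodeString): Boolean;
var
  i: Integer;
begin
  Result := (Length(S) > 0) and IsIdentStart(S[1]);
  if Result then
    for i := 2 to Length(S) do
      if not IsIdentContinue(S[i]) then
        Exit(False);
end;

var
  Name: UnicodeString;
begin
  Name := WideChar($0161) + UnicodeString('aknis');  // "šaknis" (root)
  WriteLn(IsValidIdentifier(Name));
  WriteLn(IsValidIdentifier('9x'));
end.
```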
Directives
Names
Parameters:
– Logical (ON, OFF)
– Strings ({$warning Possible malfunctioning})
– File names ({$includepath ..\inc})
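Directive names and logical switches can stay ASCII, while string and file-name parameters are where Unicode text appears. The hypothetical ParseDirective below sketches that split: it lower-cases the name (directives are case-insensitive) and passes the parameter through untouched, so a UTF-8 message or path survives as-is.

```pascal
program DirectiveDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

// Split the text of a directive comment into its name and parameter.
// The parameter is returned unchanged, so it may hold any UTF-8 text.
function ParseDirective(const Text: string; out Name, Param: string): Boolean;
var
  Body: string;
  SpacePos: Integer;
begin
  Name := '';
  Param := '';
  Result := (Length(Text) >= 4) and (Copy(Text, 1, 2) = '{$') and
            (Text[Length(Text)] = '}');
  if not Result then
    Exit;
  Body := Copy(Text, 3, Length(Text) - 3);  // drop the opening brace, '$' and closing brace
  SpacePos := Pos(' ', Body);
  if SpacePos = 0 then
    Name := LowerCase(Body)
  else
  begin
    Name := LowerCase(Copy(Body, 1, SpacePos - 1));
    Param := Trim(Copy(Body, SpacePos + 1, MaxInt));
  end;
end;

// Logical parameters accept only ON or OFF (case-insensitive).
function IsLogicalParam(const Param: string): Boolean;
begin
  Result := SameText(Param, 'ON') or SameText(Param, 'OFF');
end;

var
  DirName, DirParam: string;
begin
  if ParseDirective('{$warning Possible malfunctioning}', DirName, DirParam) then
    WriteLn(DirName, ' -> ', DirParam);
end.
```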
Reserved words
The unification myth:
– 13 similar programming languages were compared (Algol, Pascal, Modula, Ada, C, Java,…)
– Only ~3% of reserved words are the same in all of them
– 56% occur in only one language
Unambiguous translation is possible
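If localised reserved words are translated one-to-one, the scanner can map them back to the canonical keywords before parsing. The sketch below assumes a small hypothetical Lithuanian keyword table (the spellings are illustrative, not a proposed standard) and also checks that the table is a bijection, which is the condition for the translation to stay unambiguous.

```pascal
program KeywordDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

const
  // Hypothetical Lithuanian spellings, for illustration only; kept ASCII so
  // the example does not depend on the source file's code page.
  LocalWords: array[0..4] of string = ('jei', 'tada', 'kitaip', 'kol', 'daryk');
  CanonicalWords: array[0..4] of string = ('if', 'then', 'else', 'while', 'do');

// Map a localised reserved word to the canonical Pascal one. Pascal keywords
// are case-insensitive, so the lookup is too; other words are returned as-is.
function Canonicalise(const AWord: string): string;
var
  i: Integer;
begin
  for i := Low(LocalWords) to High(LocalWords) do
    if SameText(AWord, LocalWords[i]) then
      Exit(CanonicalWords[i]);
  Result := AWord;
end;

// The translation is unambiguous only if it is one-to-one: no localised
// word and no canonical word may appear twice.
function TranslationIsBijective: Boolean;
var
  i, j: Integer;
begin
  for i := Low(LocalWords) to High(LocalWords) do
    for j := i + 1 to High(LocalWords) do
      if SameText(LocalWords[i], LocalWords[j]) or
         SameText(CanonicalWords[i], CanonicalWords[j]) then
        Exit(False);
  Result := True;
end;

begin
  WriteLn(Canonicalise('JEI'), ' ', Canonicalise('Kol'), ' ', Canonicalise('x'));
  WriteLn(TranslationIsBijective);
end.
```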
Example of localised reserved words
Operators
Unicode contains all the mathematical symbols needed to express mathematical operations
Example:
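One way to accept Unicode operator symbols without changing the parser is to rewrite them to the existing ASCII operators in the scanner. The symbol selection and function names below are assumptions made for illustration; the code points themselves (for example U+2264 for ≤ and U+2260 for ≠) are the standard Unicode ones.

```pascal
program OperatorDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Map a Unicode mathematical symbol to the equivalent Pascal operator, or
// return '' if the character is not one of the supported symbols.
function OperatorToAscii(C: WideChar): string;
begin
  case Ord(C) of
    $2264: Result := '<=';     // LESS-THAN OR EQUAL TO
    $2265: Result := '>=';     // GREATER-THAN OR EQUAL TO
    $2260: Result := '<>';     // NOT EQUAL TO
    $00D7: Result := '*';      // MULTIPLICATION SIGN
    $00F7: Result := '/';      // DIVISION SIGN
    $2227: Result := ' and ';  // LOGICAL AND (padded: it becomes a word)
    $2228: Result := ' or ';   // LOGICAL OR
    $00AC: Result := ' not ';  // NOT SIGN
  else
    Result := '';
  end;
end;

// Rewrite every supported symbol in an expression as its ASCII operator.
function ToAsciiOperators(const S: UnicodeString): UnicodeString;
var
  i: Integer;
  Op: string;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    Op := OperatorToAscii(S[i]);
    if Op <> '' then
      Result := Result + UnicodeString(Op)
    else
      Result := Result + S[i];
  end;
end;

var
  Expr: UnicodeString;
begin
  Expr := UnicodeString('a ') + WideChar($2264) + UnicodeString(' b');
  WriteLn(ToAsciiOperators(Expr));
end.
```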
Numbers
Decimal numbers can be written in many different scripts
Example:
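Unicode encodes decimal digits in contiguous runs of ten code points, so a scanner can compute a digit's value by subtracting the code point of that run's zero. The sketch below covers a few scripts as an illustration (the list is not exhaustive); it does not check for overflow or for digits mixed from different scripts, and its names are hypothetical.

```pascal
program DigitDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// Decimal value of a digit in one of several scripts, or -1.
function DigitValue(C: WideChar): Integer;
begin
  case Ord(C) of
    $0030..$0039: Result := Ord(C) - $0030;  // European (ASCII) digits
    $0660..$0669: Result := Ord(C) - $0660;  // Arabic-Indic digits
    $06F0..$06F9: Result := Ord(C) - $06F0;  // Extended Arabic-Indic digits
    $0966..$096F: Result := Ord(C) - $0966;  // Devanagari digits
    $09E6..$09EF: Result := Ord(C) - $09E6;  // Bengali digits
    $FF10..$FF19: Result := Ord(C) - $FF10;  // Fullwidth digits
  else
    Result := -1;
  end;
end;

// Parse an unsigned decimal literal written in any of the scripts above.
function ParseUnsigned(const S: UnicodeString; out Value: Int64): Boolean;
var
  i, D: Integer;
begin
  Value := 0;
  Result := Length(S) > 0;
  for i := 1 to Length(S) do
  begin
    D := DigitValue(S[i]);
    if D < 0 then
      Exit(False);
    Value := Value * 10 + D;
  end;
end;

var
  S: UnicodeString;
  N: Int64;
begin
  S := WideChar($0662);       // Arabic-Indic digit two
  S := S + WideChar($0665);   // Arabic-Indic digit five
  if ParseUnsigned(S, N) then
    WriteLn(N);
end.
```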
Decimal separator
USA, UK: ‘.’
Most European countries: ‘,’
Localising the separator may cause ambiguity; resolving it requires extending the syntax of numbers:
– 25,88: one real number
– 25, 88: two numbers
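One possible extension of the number syntax, assumed here rather than taken from the presentation, is to treat a comma as a decimal separator only when a digit immediately precedes and follows it. The hypothetical ScanNumbers below applies that rule.

```pascal
program SeparatorDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

type
  TTokenList = array of string;

// Split a list of numbers where ',' is also the decimal separator.
// Assumed rule: a comma with a digit directly on both sides belongs to a
// real number; any other comma, a space or any other character ends the token.
function ScanNumbers(const S: string): TTokenList;
var
  Tokens: TTokenList;
  Current: string;
  i: Integer;

  procedure Flush;
  begin
    if Current <> '' then
    begin
      SetLength(Tokens, Length(Tokens) + 1);
      Tokens[High(Tokens)] := Current;
      Current := '';
    end;
  end;

begin
  Tokens := nil;
  Current := '';
  for i := 1 to Length(S) do
    case S[i] of
      '0'..'9':
        Current := Current + S[i];
      ',':
        begin
          if (Current <> '') and (i < Length(S)) and (S[i + 1] in ['0'..'9']) then
            Current := Current + S[i]
          else
            Flush;
        end;
    else
      Flush;
    end;
  Flush;
  Result := Tokens;
end;

var
  T: TTokenList;
  i: Integer;
begin
  T := ScanNumbers('25, 88');
  for i := 0 to High(T) do
    WriteLn(T[i]);
end.
```

With this rule '25,88' yields one token and '25, 88' yields two, matching the slide's examples. Note that '25,88,3' still comes out as a single token, which is one reason the number syntax itself needs extending rather than only swapping the separator character.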
Punctuation
Spaces: ordinary U+0020, no-break U+00A0, ideographic U+3000, etc.
Quotes: “English”, „Lithuanian“, etc.
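A scanner for Unicode source has to recognise several space characters and pair quotes by their opening character, since the same character can play different roles: U+201C opens an English quotation but closes a Lithuanian one. The following sketch uses illustrative, non-exhaustive character sets; the function names are hypothetical.

```pascal
program PunctuationDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

// White space a Unicode-aware scanner might accept (an illustrative subset
// of the Unicode space characters, plus tab).
function IsSpaceChar(C: WideChar): Boolean;
begin
  case Ord(C) of
    $0009, $0020, $00A0, $2000..$200A, $3000:
      Result := True;
  else
    Result := False;
  end;
end;

// Closing quote that matches a given opening quote, or #0 if C is not an
// opening quote. The closer depends on which opener was seen.
function MatchingQuote(C: WideChar): WideChar;
begin
  case Ord(C) of
    $0027: Result := WideChar($0027);  // Pascal apostrophe
    $201C: Result := WideChar($201D);  // English “...”
    $201E: Result := WideChar($201C);  // Lithuanian „...“
    $00AB: Result := WideChar($00BB);  // guillemets «...»
  else
    Result := WideChar(0);
  end;
end;

begin
  WriteLn(IsSpaceChar(WideChar($3000)));
  WriteLn(Ord(MatchingQuote(WideChar($201E))));
end.
```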
Bidirectional text
Bidirectional text is an issue of text representation, not of the compiler
Unicode file name support
File handling requires OS API calls, so it has to go through the RTL's Unicode support layer
Under Win9x, compilers have to use MSLU
Input/Output
File input/output requires additional support for Unicode encodings
The Windows console does not support Unicode:
– It can be replaced, but is that the best solution?
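Reading a Unicode text file first requires deciding how it is encoded. A common heuristic, sketched below with hypothetical names, is to inspect the byte order mark: EF BB BF for UTF-8, FF FE for UTF-16LE and FE FF for UTF-16BE.

```pascal
program BomDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

type
  TTextEncoding = (teUnknown, teUtf8, teUtf16LE, teUtf16BE);

// Guess the encoding of a text file from its first bytes. Files without a
// BOM are reported as teUnknown and the caller must fall back to a default.
// UTF-32 BOMs are not recognised: FF FE 00 00 is reported as UTF-16LE.
function DetectBom(const Head: array of Byte): TTextEncoding;
begin
  if (Length(Head) >= 3) and (Head[0] = $EF) and (Head[1] = $BB) and
     (Head[2] = $BF) then
    Result := teUtf8
  else if (Length(Head) >= 2) and (Head[0] = $FF) and (Head[1] = $FE) then
    Result := teUtf16LE
  else if (Length(Head) >= 2) and (Head[0] = $FE) and (Head[1] = $FF) then
    Result := teUtf16BE
  else
    Result := teUnknown;
end;

begin
  // First bytes of a UTF-16LE file containing the letter "A".
  WriteLn(DetectBom([$FF, $FE, $41, $00]) = teUtf16LE);
end.
```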
Localisation framework
Strings and other resources have to be externalised to make localisation easy
Localisation kits have to be prepared
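Free Pascal's own mechanism for externalising strings is the resourcestring section. The sketch below instead shows the general idea with a deliberately simple, hypothetical key=value catalogue read through TStringList.Values, falling back to the built-in text when a translation is missing; it is an illustration, not the FPC mechanism.

```pascal
program CatalogDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  Classes, SysUtils;

// Look up a message by key in a catalogue of name=value lines and fall back
// to the built-in text when there is no translation.
function Lookup(const CatalogText, Key, Fallback: string): string;
var
  Catalog: TStringList;
begin
  Catalog := TStringList.Create;
  try
    Catalog.Text := CatalogText;
    Result := Catalog.Values[Key];
  finally
    Catalog.Free;
  end;
  if Result = '' then
    Result := Fallback;
end;

const
  // In practice this would be read from an external file maintained by
  // translators; it is inlined here to keep the example self-contained.
  LithuanianCatalog = 'greeting=Labas, %s!'#10'bye=Viso gero';

begin
  WriteLn(Format(Lookup(LithuanianCatalog, 'greeting', 'Hello, %s!'), ['pasauli']));
  WriteLn(Lookup(LithuanianCatalog, 'thanks', 'Thank you'));
end.
```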
Questions?
Thank you
Contact