CIT3611 Software i18n Wk 4: Code sets, Online Help, Prototyping David Tuffley School of Computing & IT Griffith University
CIT3611 Week 5: Code sets
Internationalisation - Basic Rules
- Never hard-code translatable text
- Do not reuse the same string in different contexts
- 1 byte ≠ 1 character ≠ 1 glyph
- Watch for strings with several parameters: their order can change in translation
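The first and last rules above can be sketched together. This is a minimal illustration in Python, not a prescribed design: the catalogue `MESSAGES`, the key `messages_sent` and the helper `translate` are hypothetical names, and the German string is only illustrative. Named placeholders let a translator reorder the parameters without any change to the source code.

```python
# Hypothetical message catalogue. In a real application these strings would be
# loaded from external resource files rather than defined in the source code.
MESSAGES = {
    "en": {"messages_sent": "{user} sent {count} messages."},
    "de": {"messages_sent": "{count} Nachrichten von {user} gesendet."},
}


def translate(locale: str, key: str, **params: object) -> str:
    """Look up a message by key and fill in its named parameters."""
    catalogue = MESSAGES.get(locale, MESSAGES["en"])  # fall back to English
    template = catalogue.get(key, MESSAGES["en"][key])
    return template.format(**params)
```

A call such as `translate("de", "messages_sent", user="Ana", count=3)` fills the same two parameters in a different order from the English string.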
Internationalisation - Goals
Making sure that:
- Your application is able to process text from any locale
- The interface can be localised without changes in the source code
- The documents or data created by your application are easy to localise
Internationalisation - Code Sets
- A character set is like a "bag" of characters. Example: A, B, d, ñ
- A code set (also called a coded character set or code page) contains the same characters as the character set, but a specific value, the code (or code point), is assigned to each character. Example: A=65, B=66, d=100, ñ=241
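The example codes above can be checked in Python. This is only a sketch: the variable names are arbitrary, and the DOS code page `cp437` is brought in as an assumption to show that another code set gives the same character a different code.

```python
# Unicode code points match the codes given in the example above.
codes = {ch: ord(ch) for ch in "ABdñ"}

# The same character receives a different code in a different code set.
latin1_code = "ñ".encode("latin-1")[0]  # ISO 8859-1 (Latin-1)
cp437_code = "ñ".encode("cp437")[0]     # original IBM PC (DOS) code page
```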
Code Sets - Get Your Facts Straight
- The vocabulary pertaining to code sets is often used incorrectly.
- The terms "code set" and "code page" are interchangeable.
- Microsoft documentation is confusing regarding code sets.
- Nadine Kano's book helps.
Windows "ANSI" is not the real ANSI
- The first version of Windows used ISO 8859-1 (Latin-1) as its code set. Microsoft then introduced 24 extra characters (with codes in the range 0x80 to 0x9F) that are not part of Latin-1.
- This is noticeable in some of the fonts still shipped with Windows: MS Sans Serif has no glyph defined for these code points.
The code set for US Windows should be called Windows Latin-1, or code page 1252.
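The difference between the two code sets can be seen by decoding the same bytes both ways. This is a minimal sketch using Python's built-in `cp1252` and `latin-1` codecs; the three byte values are chosen here purely for illustration.

```python
import unicodedata

# Three bytes from the 0x80 to 0x9F range.
raw = bytes([0x85, 0x93, 0x94])

as_windows = raw.decode("cp1252")  # Windows Latin-1 (code page 1252)
as_iso = raw.decode("latin-1")     # ISO 8859-1

# In code page 1252 these bytes are printable punctuation ...
windows_names = [unicodedata.name(ch) for ch in as_windows]
# ... while in ISO 8859-1 the same bytes map to control codes (category "Cc").
iso_categories = [unicodedata.category(ch) for ch in as_iso]
```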
"ANSI" is not "Windows code set"
- Some documents call the Windows code set "ANSI" even when, in a differently localised version of Windows, it is actually the Windows Cyrillic, Windows Greek or Windows Turkish code set.
- Just as such a document uses "OEM" to refer to the DOS code page, it should use a generic term for the Windows code set rather than "ANSI."
Don’t use ‘character sets’ or ‘charsets’ when you mean code sets
- A code set is an implementation of a character set.
- Several code sets can implement the same character set. In this case, the list of characters supported is the same, but the codes are different. E.g. UCS-2 and UTF-8 are two different code sets, but they both implement the Unicode character set.
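The UCS-2 and UTF-8 example can be sketched in Python. One assumption is flagged in the code: Python has no codec named "ucs-2", so big-endian UTF-16 stands in for it, which yields the same bytes for characters in the Basic Multilingual Plane.

```python
text = "ñ"

utf8_bytes = text.encode("utf-8")

# Assumption: "utf-16-be" stands in for UCS-2, since Python has no "ucs-2"
# codec and the two agree for Basic Multilingual Plane characters.
ucs2_bytes = text.encode("utf-16-be")

# Different codes, same character: both byte strings decode to the same text.
same_character = utf8_bytes.decode("utf-8") == ucs2_bytes.decode("utf-16-be")
```

Both encodings need two bytes for this single character, which also illustrates why one byte cannot be assumed to equal one character.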
Don’t mix up file format and file code set
- People mix up the container and the content: the format of the file and its code set. They will say "I saved this file in ASCII" when they really mean "I saved this file as plain text." A plain text file could be in ASCII, but it can also contain extended characters.
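A rough way to tell the two cases apart is to look at the bytes. The helper `describe_plain_text` below is a hypothetical name and only a sketch: it can say whether the content stays within ASCII, but it cannot tell which code set any extended bytes belong to.

```python
def describe_plain_text(data: bytes) -> str:
    """Say whether the bytes of a plain text file are pure ASCII."""
    # ASCII only defines codes 0x00 to 0x7F.
    if all(byte < 0x80 for byte in data):
        return "plain text, ASCII only"
    return "plain text, with extended (non-ASCII) bytes"
```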
Code Set - Families
- DOS
- ISO
- Macintosh
- Windows
- IBM mainframe
Code Sets - Unicode
- Unicode is an international character set
- It covers the principal scripts of the world
- The Unicode standard is the foundation for the internationalisation and localisation of software
- There are three levels of support for Unicode:
  1. Combining characters are not allowed
  2. Only some combining characters are allowed, to avoid duplicate coded representations
  3. All combining characters are allowed
Han unification
- To fit the tens of thousands of Chinese, Japanese and Korean ideograms into a 16-bit code space (65,536 codes), Unicode uses Han unification: ideograms that Japanese and Korean share with Chinese, from which they are derived, are given a single code.
- In many cases the same symbol will mean the same thing.
Character Composition
- To support complex characters with diacritics, Unicode defines a generic way to encode them. Instead of coding a character in whole (precomposed) form, you can code any character with diacritics as a base character followed by non-spacing marks.
- Character composition is used, for example, to encode Vietnamese characters.
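Character composition can be observed with Python's standard `unicodedata` module. This is a sketch only; the Vietnamese letter U+1EBF is chosen here as an example, and Unicode normalisation (NFD and NFC) is used as a convenient way to move between the whole form and the composed form.

```python
import unicodedata

precomposed = "\u1ebf"  # Vietnamese letter e with circumflex and acute

# NFD normalisation splits the letter into a base character plus marks.
decomposed = unicodedata.normalize("NFD", precomposed)
parts = [unicodedata.name(ch) for ch in decomposed]

# NFC normalisation recombines the sequence into the whole form.
recomposed = unicodedata.normalize("NFC", decomposed)
```

The reader sees one letter in both cases, but the decomposed string contains three code points, so string lengths and comparisons differ unless text is normalised first.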
Surrogates
- Hopefully you will not have to deal with surrogates. They are the mechanism put in place in Unicode to access the additional planes of ISO 10646. You can see them as "double-bytes," except that they are double wide-chars (pairs of 16-bit units).
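The surrogate mechanism amounts to a small piece of arithmetic, sketched below. The function name `to_surrogates` is hypothetical; the constants are those of the UTF-16 encoding form.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point beyond U+FFFF into a high and a low surrogate."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low
```

The result can be cross-checked against Python's own encoder: the four bytes of `chr(0x1F600).encode("utf-16-be")` are the two surrogates returned by `to_surrogates(0x1F600)`.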
Code Sets - Conversion
- Converting from one code set to another is easy when you are only dealing with single-byte code sets.
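A single-byte conversion can be sketched as a decode followed by an encode, going through Unicode. The helper `convert` is a hypothetical name, and replacing unmappable characters with "?" is one possible policy assumed here, not the only one.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two code sets by way of Unicode."""
    # Characters that do not exist in the target code set become "?".
    return data.decode(source).encode(target, errors="replace")
```

For example, `convert(b"\xa4", "cp437", "cp1252")` maps the DOS code for ñ to its Windows code. A DOS box-drawing character has no Windows Latin-1 equivalent and is replaced, which shows that even the easy case can lose information.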
Screen-based help
- Plain text "Read Me" files
- Tutorial files
- Custom integrated help
- Sample files
- Stand-alone hypertext help
General Guidelines
- Text Expansion
- Jargon, Humor, Use of Gender- or Culture-Related Roles, Characteristics, or Issues
- Consistency with Software, Hardware, and Documentation
- Hypertext Links
- Text Styles and Formatting
General Guidelines cont.
- On-Screen Controls
- File Format
Windows Online Help
- "Title" Footnote Text
- "Keyword List" Footnote Text
- Definitions (Pop-up Topics)
Prototyping: the key to success
- Effective prototyping may be the most valuable core competence an innovative organisation can hope to have (Michael Schrage)
- ‘Spec Driven’: put much effort into developing a specification before proceeding with production
- ‘Prototype Driven’: begin with an early prototype, then proceed through many iterations
Prototyping: the essential medium of
- Information transmission
- Interaction
- Integration
- Collaboration
Work as play, play as work
- You can ‘play your way’ to successful, innovative product development
- This is at odds with traditional management models that champion predictability and control
Supported by research
- Research by Tabrizi & Eisenhardt (Stanford) looked at 72 product development projects in 36 companies across Asia, North America and Europe
- The most effective were those that iterated constantly
- The least effective were the hyper-organised "plan, plan, plan" planners
- Strong prototyping cultures therefore produce strong products