Computer Science and Software Engineering University of Wisconsin - Platteville Note 9. Internationalization Yan Shi SE 3730 / CS 5730 Lecture Notes Part of the contents are from Ibrahim Meru’s presentation slides
Terminology Internationalization (I18N) —the process of designing a software application so that it can be adapted to various languages and regions without engineering changes —Making an application independent of any particular language or culture Localization (L10N) —the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text. Globalization (G11N) —G11N = I18N + L10N + multilingual support —Application can handle users from multiple countries/regions and languages (simultaneously)
Scope of I18N Example
Special Attention for G11N Design and Implementation —DO NOT hard code your texts in the code —Be aware of language and cultural differences Testing —Must have testers who recognize language and cultural defects Deployment and Sales —Must follow business rules and regulations of the countries in which you sell —Copyrights and anti-piracy practices Installation —The installer must be multi-language to direct users to their native language. Support and maintenance —Must be able to communicate in the customers' language and during their regular business hours. —All documentation must be kept synchronized with the product in every supported language.
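The first design rule above, keeping text out of the code, can be sketched as a lookup into per-locale message tables. This is only an illustration under stated assumptions: the `MESSAGES` table, the `translate` helper, the message key, and the German wording are all hypothetical, and a real project would more likely use a resource-bundle or gettext-style facility.

```python
# Hypothetical message catalog: one table per locale, keyed by message ID.
# The German text is an illustrative translation, not taken from the slides.
MESSAGES = {
    "en": {"contact_support": "Contact customer support for help"},
    "de": {"contact_support": "Wenden Sie sich an den Kundendienst"},
}

DEFAULT_LOCALE = "en"


def translate(key: str, locale: str) -> str:
    """Return the text for `key` in `locale`, falling back to the default locale."""
    table = MESSAGES.get(locale, {})
    if key in table:
        return table[key]
    return MESSAGES[DEFAULT_LOCALE][key]


print(translate("contact_support", "de"))
print(translate("contact_support", "fr"))  # no French table, so English is used
```

The calling code only ever mentions the message ID, so adding a language means adding a table rather than editing program logic.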
Character Sets ASCII: —the most popular character standard. —uses only 7 bits, for a maximum of 128 characters —adequate for English Code Pages: —a table of values describing the character set for a particular language —One code page per language/set of languages —There are hundreds of code pages —Different vendors may use different code page numbering Unicode: —an effort to include all characters from previous code pages in a single character enumeration. —originally designed as a 2-byte (16-bit) code; it now defines code points up to U+10FFFF, stored as 1 to 4 bytes in UTF-8 or as 2 or 4 bytes in UTF-16
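A short sketch of how the character standards above behave in practice, using Python's built-in codecs. The sample characters and the `fits_ascii` helper are chosen here for illustration only.

```python
# Sample characters: ASCII letter, German umlaut, Chinese character, emoji.
samples = ["A", "ü", "中", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # little-endian, without a byte-order mark
    print(f"U+{ord(ch):04X}  UTF-8: {len(utf8)} bytes  UTF-16: {len(utf16)} bytes")


def fits_ascii(text: str) -> bool:
    """Return True if every character of `text` is in the 7-bit ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_ascii("Hello"))   # plain English text
print(fits_ascii("Grüße"))   # German text with non-ASCII letters
```

The byte counts differ per character and per encoding, which is why buffer sizes and field lengths should not assume a fixed number of bytes per character.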
Code Page 437 Standard in the U.S.; works for English and German 8-bit code points —0-127: ASCII —128-255: international text characters
Interesting to Know (Alt code): How to type German on a US keyboard?
These codes work with most fonts; some fonts may vary. For the PC codes, always use the numeric (extended) keypad on the right of your keyboard and not the row of numbers at the top. (On a laptop you may have to use "num lock" and the special number keys.)
German letter/symbol    PC code: Alt +    Mac code: option +
ä                       0228              u, then a
Ä                       0196              u, then A
é                       0233              E
ö                       0246              u, then o
Ö                       0214              u, then O
ü                       0252              u, then u
Ü                       0220              u, then U
ß                       0223              S
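The PC codes in the table line up with the Windows-1252 (Western "ANSI") code page, which can be checked with a few lines of Python. This is a sketch: the `ALT_CODES` dictionary simply restates the table, and the link between four-digit Alt codes and the ANSI code page is stated here as an assumption about Western Windows configurations.

```python
# PC Alt codes copied from the table above (the leading zero is dropped here).
ALT_CODES = {
    "ä": 228, "Ä": 196, "é": 233, "ö": 246,
    "Ö": 214, "ü": 252, "Ü": 220, "ß": 223,
}

for letter, code in ALT_CODES.items():
    # Interpret the number as a single byte in the Windows-1252 code page.
    decoded = bytes([code]).decode("cp1252")
    print(f"Alt+0{code} -> {decoded} (table says {letter})")
```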
Some Other Code Pages
Microsoft Windows OEM Code Pages:
—437 (US)
—720 (Arabic)
—737 (Greek)
—775 (Baltic)
—850 (Multilingual Latin I): works for most Western European languages.
—852 (Latin II): works for Central and Eastern European languages.
—855 (Cyrillic)
—857 (Turkish)
—858 (Multilingual Latin I + Euro)
—862 (Hebrew)
—866 (Russian)
—874 (Thai)
—932 (Japanese Shift-JIS)
—936 (Simplified Chinese GBK)
—949 (Korean)
—950 (Traditional Chinese Big5)
—1258 (Vietnamese)
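Because each code page assigns its own meaning to the values 128-255, the same byte can display as different characters depending on which code page is loaded. A minimal sketch using Python's built-in codecs; the `reencode` helper is a hypothetical name introduced for illustration.

```python
# One byte, three code pages from the list above.
raw = bytes([0x84])

for code_page in ("cp437", "cp850", "cp866"):
    print(code_page, raw.decode(code_page))


def reencode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another by going through Unicode."""
    return data.decode(source).encode(target)


# Converting legacy code page data to UTF-8 requires knowing the source code page.
print(reencode(raw, "cp437", "utf-8"))
```

Under code pages 437 and 850 the byte decodes to a Latin letter, while under 866 it decodes to a Cyrillic one, so data exchanged without its code page label can silently turn into the wrong text.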
Size of Text Messages English requires fewer characters than most other Western languages. As a rule of thumb, —French is 15% longer, —German is 25% longer. —East Asian languages such as traditional or simplified Chinese, Japanese, and Korean require far fewer characters (2-3 character positions per word). Special consideration must be given to UI design and functionality to handle the different message lengths of the supported languages. Message lengths also greatly complicate business form and report designs. E.g.: “Contact customer support for help”
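The rule of thumb above can be turned into a rough sizing check for UI fields. This is a sketch only: the percentages are the ones quoted on the slide, while the `EXPANSION_PERCENT` table and both helper functions are hypothetical names, and actual translations can be longer or shorter than the estimate.

```python
# Expansion relative to English, in percent, from the rule of thumb above.
EXPANSION_PERCENT = {"en": 100, "fr": 115, "de": 125}


def estimated_length(english_text: str, locale: str) -> int:
    """Estimate the translated length, rounding up to whole characters."""
    percent = EXPANSION_PERCENT[locale]
    return (len(english_text) * percent + 99) // 100  # integer ceiling


def widest_estimate(english_text: str) -> int:
    """Return the largest estimate across all configured locales."""
    return max(estimated_length(english_text, loc) for loc in EXPANSION_PERCENT)


message = "Contact customer support for help"
for locale in EXPANSION_PERCENT:
    print(locale, estimated_length(message, locale))
print("reserve at least", widest_estimate(message), "characters")
```

Integer arithmetic is used so that the rounding does not depend on floating-point representation.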
Keyboard Test Languages and cultures have different characters and special characters. Keyboards differ from country to country to support their character sets and usage patterns. These keyboards generate interrupts that must match the loaded code page.
German Keyboard
Arabic Keyboard
Traditional Chinese Keyboard
Hot Key Test We may want hot keys and shortcuts to be different because the words on the menus are different. —If “Copy” is Alt+C, what should it be for “kopieren”? Hot key conventions differ: sometimes applications just stick with the English hot key or shortcut regardless of what the local command starts with.
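One thing a hot key test can check is that the localized menu does not assign the same key twice. The sketch below picks the first unused letter of each label; the `assign_hot_keys` function and the sample German labels are hypothetical, and real toolkits usually let translators mark the hot key letter explicitly instead.

```python
def assign_hot_keys(labels):
    """Give each label the first of its letters that no earlier label has taken.

    Returns a dict mapping label -> hot key letter, or None if no letter is free.
    """
    used = set()
    result = {}
    for label in labels:
        result[label] = None
        for ch in label:
            key = ch.upper()
            # Skip non-letters and letters whose uppercase form is not one character.
            if len(key) == 1 and key.isalpha() and key not in used:
                used.add(key)
                result[label] = key
                break
    return result


print(assign_hot_keys(["Copy", "Cut", "Close"]))
print(assign_hot_keys(["Kopieren", "Kontakt", "Einfügen"]))
```

A tester can compare such a generated assignment with what the localized build actually ships to spot duplicate or missing hot keys.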
Text Filter and Special Character Test Sometimes software will block codes other than ASCII. These codes may be needed to support non-English languages. Special characters in the middle of names may cause problems. —For example “O’Kelly”, ñ, ß, Ü.
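The difference between an ASCII-only filter and a filter that accepts letters from any script can be sketched as follows. Both validators and the sample names are illustrative assumptions; the set of punctuation allowed in names is a policy decision that each product has to make for itself.

```python
import re
import unicodedata

# A typical ASCII-only name filter: letters, straight apostrophe, hyphen, space.
ASCII_ONLY = re.compile(r"^[A-Za-z' -]+$")

# Punctuation tolerated inside names, including the typographic apostrophe.
ALLOWED_PUNCTUATION = {"'", "\u2019", "-", " "}


def ascii_only_ok(name: str) -> bool:
    """Accept only names made of ASCII letters and basic punctuation."""
    return bool(ASCII_ONLY.match(name))


def unicode_aware_ok(name: str) -> bool:
    """Accept letters from any script plus the allowed punctuation."""
    name = unicodedata.normalize("NFC", name)  # compose base letter + accent
    return bool(name) and all(
        ch.isalpha() or ch in ALLOWED_PUNCTUATION for ch in name
    )


for name in ["O'Kelly", "O\u2019Kelly", "Peña", "Strauß", "Übel", "R2D2"]:
    print(name, ascii_only_ok(name), unicode_aware_ok(name))
```

Test data for this kind of filter should include names like the ones on the slide, since the ASCII-only version rejects them even though they are valid.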
Translation Test Typical English sentence structure is subject-verb-object (S-V-O). Sentence structure may differ from language to language. Therefore, the software must be language sensitive w.r.t. sentence structure. —use variables (placeholders) in messages so that translators can place them in any order.
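Named placeholders let each translation put the variables wherever its grammar needs them. A minimal sketch with Python's `str.format`; the `TEMPLATES` table, the `format_message` helper, and the sample sentences are hypothetical.

```python
# Each locale orders the placeholders differently: the English template names
# the user first, the German one names the count first.
TEMPLATES = {
    "en": "{user} sent you {count} new messages.",
    "de": "Sie haben {count} neue Nachrichten von {user}.",
}


def format_message(locale: str, **values) -> str:
    """Fill the locale's template using named values, independent of their order."""
    return TEMPLATES[locale].format(**values)


print(format_message("en", user="Ana", count=3))
print(format_message("de", user="Ana", count=3))
```

Building the sentence by concatenating fragments in code (user + " sent you " + count) would fix the English word order and make the German version impossible to express without changing the program.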
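Sorting Rules Where do the characters of a specific language need to fall into a collating sequence? This needs to be localized for people to use lists naturally. Plain English text can be sorted roughly by ASCII value, although raw ASCII order places all uppercase letters before lowercase ones. How to sort Chinese names (by pronunciation, by stroke count, or by radical)?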
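The gap between sorting by raw character value and sorting the way a reader expects can be sketched with a custom sort key. The key below strips accents and ignores case, which is only a crude approximation introduced for illustration: real collation rules are locale-specific, this approach does nothing useful for Chinese, and the function name and sample words are hypothetical.

```python
import unicodedata


def folded_key(text: str):
    """Sort key that ignores case and accents, with the original as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)  # split letters from accents
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), text)


words = ["Zimmer", "Äpfel", "apple", "Öl", "Ober"]

print(sorted(words))                  # raw code point order
print(sorted(words, key=folded_key))  # accent- and case-insensitive order
```

With precomposed characters, code point order pushes the umlauted words past "Zimmer" and puts lowercase "apple" after every uppercase word, which is rarely what a user browsing a list expects.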
Other Peripherals Printer: —Some printers do not support certain languages. —Testers must be aware of these non-I18N printers and test for compatibility. —Paper sizes may also cause issues: A4 or Letter? Mouse with non-standard drivers Wireless support: GSM, CDMA, 3G, LTE Data storage: DVD, flash drive…
OS Localization Test It is not just Windows 7; it is Windows 7 German, French, Chinese, etc. The product needs to be tested completely on all supported OS localizations.
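Data Format
—Dates: what does "01/02/03" mean?
—Time zones and daylight saving time
—Number formats: is 240,125 a thousands grouping or a decimal fraction? Separators vary by locale.
—Money symbols vary: $125, £125,000
—Address formats
—Phone number formats
—Calendar formats
—Measurement units! (the Mars Climate Orbiter was lost to a mismatch between metric and imperial units)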
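The ambiguity of a date such as "01/02/03" and of number separators can be demonstrated with a short sketch. The pattern labels, the `interpretations` helper, and the `format_number` helper are hypothetical names introduced here; production code would normally rely on a locale-aware formatting library rather than hand-written rules.

```python
from datetime import datetime

# Three plausible readings of the same text, labeled for illustration.
DATE_PATTERNS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def interpretations(text: str) -> dict:
    """Parse `text` under every pattern and return ISO dates keyed by label."""
    return {
        label: datetime.strptime(text, pattern).date().isoformat()
        for label, pattern in DATE_PATTERNS.items()
    }


def format_number(value: float, group_sep: str, decimal_sep: str) -> str:
    """Format with two decimals using the given grouping and decimal separators."""
    us_style = f"{value:,.2f}"  # e.g. 1,234.50
    swap = {",": group_sep, ".": decimal_sep}
    return "".join(swap.get(ch, ch) for ch in us_style)


for label, iso_date in interpretations("01/02/03").items():
    print(label, iso_date)

print(format_number(240125.5, ",", "."))  # comma groups, point decimal
print(format_number(240125.5, ".", ","))  # point groups, comma decimal
```

Storing and exchanging dates in an unambiguous form such as ISO 8601, and formatting only at display time, avoids having to guess which reading was intended.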
Colors Colors are interpreted differently among regions.
Icon Design Avoid humor, puns, slang, special, mythological, and religious symbols in icons. Do not require users to understand subtleties of the originating language or culture. Ensure your icons are not offensive. —Thumbs up: insulting in Turkey —“Ok” sign: insulting in Brazil and other countries
Summary I18N, L10N, G11N Design Considerations: