Software Globalization With Windows 2000/XP
Houman Pournasseh, Lead Program Manager
Agenda: Definitions; Why invest in World-Ready products?; Globalization – step-by-step: Universal encoding (Unicode), Locale aware, Handle different input methods, Complex script aware, Font independency, Multi-lingual UI aware, Mirroring aware; Conclusion & References
Definitions
World-Ready: properly globalized and localizable.
Globalization: the process of designing and implementing source code so that it can accommodate any local market (locale) or script.
Localizability: designing software code and resources such that the resources can be localized for any local market (locale) without changing the source code.
Localization: the process of adapting a product (including both text and non-text elements) to meet the language, cultural, and political expectations and/or requirements of a specific local market (locale).
Users and Locales
To define their geographical location, users set the location.
To define formatting for date, time, …, users set the user locale.
To run legacy (non-Unicode) applications, users set the system locale.
To enter text in different languages, users set the input locale.
To select a UI language, users set the UI language.
New to Windows XP
Nine (9) new locales added to the previous list of 126: Punjabi, Gujarati, Telugu, Kannada, Kyrgyz, Mongolian (Cyrillic), Galician, Divehi, Syriac.
New Indic and Arabic scripts: Gujarati, Gurmukhi, Telugu, Kannada, Syriac, Divehi.
More robust font display for East Asian languages.
Improved Regional Settings options.
Greatly improved MUI support.
New location (GEO) support.
Support for GB18030.
Why invest in World-Ready products?
Get into the international market (World Wide Web era).
Create a single binary for all markets to: reduce development effort and cost; ease support and maintenance pain; sim-ship (ship all language versions simultaneously) and avoid being your own competitor.
Transforms of Unicode
UTF-7: 7-bit transformation format (rare).
UTF-8: 8-bit transformation format, for transmission over unknown lines, e.g. web pages; codepage number CP_UTF8 = 65001.
UTF-16 and UCS-2: Microsoft uses UTF-16 little-endian as its standard Unicode encoding.
UTF-32 and UCS-4.
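To make the difference between the transforms concrete, here is a minimal portable C sketch (not a Windows API; the function names encode_utf8 and encode_utf16 are hypothetical) that encodes one code point in UTF-8 and in UTF-16. Supplementary characters need a surrogate pair in UTF-16, which is exactly what UCS-2 cannot represent. Byte order (little-endian on Windows) is a serialization detail and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value (U+0000..U+10FFFF, surrogates excluded)
   as UTF-8. Writes 1-4 bytes to out; returns the count, or 0 if invalid. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                       /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* rest of the BMP: 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* supplementary: 4 bytes */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode as UTF-16 code units: one unit for the BMP, a surrogate pair
   (high D800-DBFF, low DC00-DFFF) for supplementary characters. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                         /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    return 2;
}
```

For example, DEVANAGARI KA (U+0915) is three bytes in UTF-8 (E0 A4 95) but a single UTF-16 code unit, while U+1F600 becomes the surrogate pair D83D DE00.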
Windows 2000/XP: Unicode & Single Binary
Built-in support for hundreds of languages.
Any well-behaved Win32 application, whatever its language, can run on any language version of Windows 2000/XP.
Native Unicode support for new scripts.
Support for supplementary characters.
Unicode Encoding
The behavior of non-Unicode applications depends on the user's settings (the system locale), which makes data exchange between OS language versions impossible.
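Why the system locale matters: the same stored byte means a different character under each ANSI code page. The minimal C sketch below models only the byte range 0xE0 to 0xFF of three Windows code pages, where each maps to a contiguous run of Unicode; the function decode_high_byte is hypothetical and is not a replacement for MultiByteToWideChar.

```c
/* Hypothetical helper: map a byte in 0xE0..0xFF to a Unicode code point
   under one of three Windows ANSI code pages. Only this range is modeled,
   because there each code page is a simple contiguous run:
     1252 (Western):  0xE0..0xFF -> U+00E0..U+00FF (Latin-1 letters)
     1251 (Cyrillic): 0xE0..0xFF -> U+0430..U+044F (lowercase a..ya)
     1253 (Greek):    0xE0..0xFE -> U+03B0..U+03CE (0xFF is unassigned)
   Returns -1 for anything outside the modeled range. */
long decode_high_byte(unsigned char b, int codepage)
{
    if (b < 0xE0)
        return -1;
    switch (codepage) {
    case 1252: return 0x00E0L + (b - 0xE0);
    case 1251: return 0x0430L + (b - 0xE0);
    case 1253: return b == 0xFF ? -1 : 0x03B0L + (b - 0xE0);
    default:   return -1;
    }
}
```

A file containing the single byte 0xE4 therefore reads as ä on a Western system, д on a Cyrillic system, and δ on a Greek one; storing the Unicode code point removes that ambiguity.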
Legacy systems support
Few exceptions justify apps that are not fully Unicode: the app has to run on both Win9x and NT, or existing Internet protocols and standards require a special encoding.
Supporting apps that need to run on Win9x:
Create two separate binaries: one ANSI and one Unicode.
Register as ANSI and internally convert to/from Unicode as needed.
Use MSLU (Microsoft Layer for Unicode)!
Data types
Generic TCHAR becomes wchar_t (Unicode build) or char (ANSI build); generic LPTSTR becomes wchar_t * or char *.
For 8-bit and double-byte characters:
typedef char CHAR;     // 8-bit character
typedef char *LPSTR;   // pointer to 8-bit string
For Unicode ("wide") characters:
typedef wchar_t WCHAR; // 16-bit character
typedef WCHAR *LPWSTR; // pointer to 16-bit string
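Win32 API prototypes
Generic function prototypes:
// winuser.h
#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif // UNICODE
A routines' behavior under Windows 2000/XP: they convert string arguments to Unicode and call the corresponding W routine.
W routines' behavior under Win9x: apart from a small set, they are not implemented and simply fail, which is the gap MSLU fills.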
String manipulation functions and macros
Generic CRT | 8-bit codepage | Unicode
_tcscpy | strcpy | wcscpy
_tcscmp | strcmp | wcscmp
Compile with -D_UNICODE to get the Unicode version.
Generic Win32 | 8-bit codepage | Unicode
lstrcpy | lstrcpyA | lstrcpyW
lstrcmp | lstrcmpA | lstrcmpW
Compile with -DUNICODE to get the Unicode version.
Text macro:
#ifdef UNICODE
#define TEXT(string) L##string
#else
#define TEXT(string) string
#endif // UNICODE
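The generic-text pattern can be reproduced outside Windows to see how one source file yields both builds. This is a minimal sketch with hypothetical MY_-prefixed names standing in for TCHAR, TEXT, and _tcslen (so they cannot clash with the real headers); it assumes only standard C. The token-pasting operator ## is what turns "Hello" into L"Hello"; the stringizing operator # would not.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for <tchar.h>/<winnt.h> names. Build with
   -DUNICODE to switch every use site to wide characters. */
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT_IMPL(s) L##s     /* token pasting: "abc" becomes L"abc" */
#define my_tcslen       wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT_IMPL(s) s
#define my_tcslen       strlen
#endif
/* Extra level of indirection so macro arguments expand before pasting. */
#define MY_TEXT(s) MY_TEXT_IMPL(s)

/* Generic code: compiles to the char or the wchar_t version unchanged. */
size_t greeting_length(void)
{
    const MY_TCHAR *msg = MY_TEXT("Hello");
    return my_tcslen(msg);       /* counts characters, not bytes */
}
```

In either build greeting_length() returns 5, while sizeof(MY_TEXT("abc")) is 4 * sizeof(MY_TCHAR), which is why buffer sizes should be expressed in TCHARs.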
Converting between ANSI and Unicode
MultiByteToWideChar: codepage → Unicode.
WideCharToMultiByte: Unicode → codepage.
The codepage (CP) argument can be any valid codepage number or a predefined value such as CP_ACP, CP_SYMBOL, CP_UTF8, etc.
Tips for writing Unicode:
Use generic data types and function prototypes.
Replace p++/p-- with CharNext/CharPrev.
Compute buffer sizes in TCHARs, not bytes.
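The CharNext tip exists because in a multibyte encoding one character can span several bytes, so p++ can land in the middle of a character. The real CharNext works with Windows code pages; the sketch below applies the same idea to UTF-8 (CP_UTF8) in portable C. utf8_next and utf8_count are hypothetical names, and the code assumes well-formed, NUL-terminated input.

```c
#include <stddef.h>

/* Hypothetical UTF-8 analogue of CharNext: return a pointer to the start
   of the next character by skipping continuation bytes (10xxxxxx).
   Stays put on the terminating NUL. */
const char *utf8_next(const char *p)
{
    if (*p == '\0')
        return p;
    ++p;
    while ((*(const unsigned char *)p & 0xC0) == 0x80)
        ++p;
    return p;
}

/* Count characters (code points) rather than bytes. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s = utf8_next(s);
        ++n;
    }
    return n;
}
```

For "h\xC3\xA9llo" (héllo) strlen reports 6 bytes while utf8_count reports 5 characters; a loop using p++ would visit the stray byte 0xA9 as if it were a character.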
Porting an ANSI application to Unicode
Encodings in Web pages
ANSI codepages or ISO character encodings: mono-lingual, or restricted to one script.
Raw Unicode (UTF-16): OK for Windows NT networks.
Numeric character references (e.g. &#2325; for क): OK for occasional use.
UTF-8: recommended encoding; supported by IE 4.0+ and Netscape 4.0+.
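Setting web encoding
HTML/DHTML: a META tag (http-equiv="Content-Type" with a charset) in the head of the document.
XML: the encoding attribute of the <?xml ?> declaration.
ASP: specify the charset using ASP directives, per session (Session.CodePage) or per page (the @CodePage directive).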
Setting encodings for .NET
Namespace: System.Text.
Distinguish between file, request, and response encodings.
In code: Response.ContentEncoding = …
In the page directive: the ResponseEncoding attribute of @ Page.
In the configuration file: <globalization requestEncoding="…" responseEncoding="…" fileEncoding="…" />
Universally encoded page
Windows 2000/XP: NLS
NLS APIs let your application adjust automatically to the user's formatting preferences:
Date: 07/04/01 is 平成 13 年 7 月 4 日 in Japan.
Time: 9:00 PM is 21:00 in France.
Currency: $1,000.00 is 1.000,00 $ in Germany.
Large numbers: 123,456,789 is 12,34,56,789 in Hindi.
Sort order: in German, ä sorts after a; in Swedish, ä sorts after z.
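The Hindi example follows a simple rule: the last three digits form one group and every group to the left has two. On Windows this comes from locale data through the NLS APIs (for instance GetNumberFormat), so applications should not hard-code it; the portable sketch below, with the hypothetical function group_indian, only makes the rule explicit. It assumes a plain string of ASCII digits with no sign or decimal part.

```c
#include <stddef.h>
#include <string.h>

/* Minimal sketch of Indian-style digit grouping (3 digits, then groups of 2).
   digits: ASCII digits only. out must hold at least
   strlen(digits) + strlen(digits) / 2 + 1 bytes. */
void group_indian(const char *digits, char *out)
{
    size_t len = strlen(digits);
    size_t i, o = 0;
    for (i = 0; i < len; ++i) {
        size_t remaining = len - i;   /* digits left, including digits[i] */
        /* A separator starts each group: before the final 3 digits and
           before every 2-digit group to their left. */
        if (i > 0 && remaining >= 3 && (remaining - 3) % 2 == 0)
            out[o++] = ',';
        out[o++] = digits[i];
    }
    out[o] = '\0';
}
```

With this rule "123456789" becomes "12,34,56,789" and "100000" (one lakh) becomes "1,00,000", whereas the US grouping would give 123,456,789 and 100,000.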
Locale awareness
Eliminate implicit locale assumptions from code, such as this English-only case mapping:
#define ToUpper(ch) \
    ((ch) <= 'Z' ? (ch) : (ch) + 'A' - 'a')
Query the system to format locale-dependent data, using NLS APIs and LCIDs.
LCID layout (32 bits, high to low): Reserved (12 bits) | Sort ID (4 bits) | Language ID (16 bits), where the Language ID is Sub-language (6 bits) | Primary language (10 bits).
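The LCID bit layout can be checked with a portable restatement of the packing macros from winnt.h. The MY_-prefixed names below are hypothetical copies (so they cannot clash with the real MAKELANGID, MAKELCID, PRIMARYLANGID, and friends); the language, sublanguage, and sort constants are the values winnt.h uses.

```c
#include <stdint.h>

/* LCID layout: bits 0-9 primary language, 10-15 sublanguage
   (together the 16-bit LANGID), 16-19 sort ID, 20-31 reserved. */
#define MY_MAKELANGID(p, s)  ((uint16_t)(((uint16_t)(s) << 10) | (uint16_t)(p)))
#define MY_PRIMARYLANGID(l)  ((uint16_t)(l) & 0x3FF)
#define MY_SUBLANGID(l)      ((uint16_t)(l) >> 10)
#define MY_MAKELCID(l, sort) (((uint32_t)(uint16_t)(sort) << 16) | (uint32_t)(uint16_t)(l))
#define MY_LANGIDFROMLCID(c) ((uint16_t)(c))
#define MY_SORTIDFROMLCID(c) ((uint16_t)(((uint32_t)(c) >> 16) & 0xF))

/* Constant values as defined in winnt.h. */
enum {
    MY_LANG_ENGLISH = 0x09, MY_SUBLANG_ENGLISH_US = 0x01,
    MY_LANG_GERMAN  = 0x07, MY_SUBLANG_GERMAN     = 0x01,
    MY_SORT_DEFAULT = 0x0,  MY_SORT_GERMAN_PHONE_BOOK = 0x1
};
```

English (US) with the default sort packs to 0x0409, and German (Germany) with the phone-book sort packs to 0x10407: the same language ID, with the sort ID in bits 16-19.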
NLS APIs: getting and setting locales
Querying locales:
LCID GetSystemDefaultLCID()
LCID GetUserDefaultLCID()
LCID GetThreadLocale()
EnumSystemLocales
Setting locales:
BOOL SetThreadLocale(LCID dwNewLocale)
BOOL SetLocaleInfo(LCID, …) // works for standard locales only!
There are no APIs to set the system locale, the user locale, or the UI language.
NLS APIs: querying locale information
To retrieve information specific to a given locale: GetLocaleInfo. It gives information for any valid locale (takes an LCID); the LCTYPE input tells which type of information to retrieve for that locale (e.g. currency symbol, names of months…); it returns the information in a string buffer (LPTSTR).
To retrieve information specific to a location: GetGeoInfo. It gives information for any valid location (takes a GEOID); the SYSGEOTYPE input tells which type of information to retrieve for that location (e.g. LCID, time zones…).
NLS APIs: formatting data
To enumerate formats: EnumCalendarInfo(Ex), EnumDateFormats, EnumTimeFormats.
To format data directly: GetCurrencyFormat, GetDateFormat, GetTimeFormat.
String comparison
Locale-dependent comparison: lstrcmp or lstrcmpi.
Locale-independent comparison:
Windows 2000 and earlier:
LCID Locale = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
CompareString(Locale, …, …, …, …, …);
Windows XP:
CompareString(LOCALE_INVARIANT, …, …, …, …, …);
A locale aware application
Locales in web pages
The browser's language preference defaults to the user locale; supported by IE 4.x and Netscape 4.x.
A server variable that can be retrieved by: Request.ServerVariables("HTTP_ACCEPT_LANGUAGE").
A property of the navigator object: navigator.userLanguage.
Locale awareness in web pages
To retrieve the user locale: a server variable, Request.ServerVariables("HTTP_ACCEPT_LANGUAGE"), or a property of the navigator object, navigator.userLanguage.
To set a locale:
In DHTML (VBScript):
SetLocale("de")
DateData = FormatDateTime(Now(), vbShortDate)
In ASP:
Locale awareness in .NET
Namespace: System.Globalization.
Represented by CultureInfo, a set of preferences based on language and culture.
Pattern: xx-XX, such as fr-CA or de-AT (RFC 1766).
Setting the CultureInfo:
Implicit: picked up from the user locale.
Explicit: in code, Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE"); in the page directive (the Culture attribute of @ Page); or in the configuration file (the culture attribute of <globalization />).
Locale aware web site
Handling Input methods
Easiest: use edit controls (recommended).
Responding directly to user input:
Input locales (language + input method): HKL, GetKeyboardLayout, ActivateKeyboardLayout, LoadKeyboardLayout.
Windows messages: WM_INPUTLANGCHANGEREQUEST, WM_INPUTLANGCHANGE, WM_IME_* (for IME support only), WM_CHAR and WM_IME_CHAR.
Windows 2000/XP: Complex Scripts
Complex scripts have one or more of the following attributes:
Bidirectional (BiDi) reordering (Arabic, Hebrew)
Contextual shaping (Arabic, Indic family)
Display of combining characters (Arabic, Thai, Indic)
Specialized word-breaking (Thai)
Text justification (Arabic)
Complex Scripts: BiDi reordering
Complex Scripts: Contextual Shaping
Complex Scripts: Combining Characters
Complex Scripts: Justification
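Uniscribe
Clients: Windows 2000/XP, Trident (the Internet Explorer rendering engine), Microsoft Office 2000/XP.
A collection of exported APIs (high- and low-level).
Hides implementation details.
One shaping engine per script.
Layering (diagram): the application calls USER and GDI, which rely on LPK.DLL, which in turn calls Uniscribe (USP).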
Options to display text
Plain text in an application: standard edit control or Win32 API (ExtTextOut/DrawText).
Simple formatted text: in Win32 apps, use the RichEdit control; for web pages, use the Document Object Model (DHTML).
Advanced formatting: use Uniscribe (see the SDK and MSJ article).
Special considerations
When dealing with BiDi, set RTL reading order and alignment: SetTextAlign/GetTextAlign with TA_RIGHT; ExtTextOut with ETO_RTLREADING; DrawText with DT_RTLREADING.
To measure line lengths: do not sum cached character widths, because shaping, ligatures, and combining characters make glyph widths depend on context; do use a GetTextExtent function or Uniscribe.
When displaying typed text: do not output characters one at a time! Do save the text in a buffer and display the whole string with Uniscribe or the Win32 API.
Windows 2000/XP: Font support
Introduction of OpenType fonts: extended TTF with glyphs for PE, ME, Thai, Greek, Turkish, Cyrillic, …
Font fallback mechanism, used by Uniscribe, for complex scripts (CS) and East Asian scripts.
Font linking mechanism, used by GDI.
Font independency in Win32 programming
Not to do: hard-code font face names; assume a given font is installed; assume the selected font supports the desired script.
To do: use the MS Shell Dlg face name in dialog resources; use EnumFontFamiliesEx or ChooseFont to select fonts.
Font independency in Web pages
Avoid placing text formatting values in inline styles (style="…" attributes on elements).
Declare text styles in CSS files instead, e.g. .myStyle {font-size: 10pt; font-family: Arial;}, and apply them with class="myStyle".
Use WEFT (Web Embedding Fonts Tool) to embed fonts in your web pages (IE only).
Windows 2000/XP: Multilanguage UI
The Multilanguage version of Windows 2000/XP allows you to: switch the UI language without rebooting; set the UI language per user; add/remove language modules; offer your own solution for a multilingual UI.
Multilingual UI Applications: possible options
One localized .exe per target language (Eng.exe, Ger.exe, Jpn.exe).
One multilingual language resource DLL (Myapp.exe plus one DLL holding the Eng, Ger, and Jpn resources).
One resource DLL per target language (Myapp.exe plus Eng.dll, Ger.dll, Jpn.dll).
Satellite DLL
Initialize to the current UI language: on Windows 2000/XP, GetUserDefaultUILanguage(); on down-level platforms, see "Writing Multilingual User Interface Applications" on Globaldev.
Allow the user to select the UI language.
Use a naming convention, for example: res.dll
Find all resource DLLs using FindFirstFile and FindNextFile.
Use LoadLibrary(Ex) to load the DLL file.
Windows 2000/XP: Mirroring technology
Purpose: to create an automatic right-to-left layout of the user interface for versions localized into bidirectional languages (Arabic and Hebrew).
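Coordinate transformation
In a mirrored window, the origin (0,0) is in the upper RIGHT corner of the window, the x scale factor is -1, and x values increase from right to left.
(Diagram: in a default LTR window the origin is at the upper left and x increases to the right; in a mirrored RTL window the origin is at the upper right and x increases to the left.)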
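The effect of an x scale factor of -1 with the origin moved to the right edge can be written as a one-line mapping. This is a simplified model in portable C, not GDI's actual implementation: it assumes pixel columns numbered 0 to width-1 from the left and ignores borders, scrolling, and DC offsets. Both function names are hypothetical.

```c
/* Simplified model of a mirrored (RTL) window that is `width` pixels wide:
   logical x is measured from the RIGHT edge, so it lands at device column
   (width - 1) - x when columns are counted from the left edge. */
int mirrored_column(int x, int width)
{
    return (width - 1) - x;
}

/* A half-open span [left, right) keeps its length but swaps sides. */
void mirror_span(int left, int right, int width, int *out_left, int *out_right)
{
    *out_left  = width - right;
    *out_right = width - left;
}
```

In a 100-pixel-wide window, logical column 0 becomes the rightmost device column 99, and a control occupying [10, 30) is drawn at [70, 90). Because the whole coordinate system flips, application drawing code needs no changes, which is also why bitmaps come out reversed unless NOMIRRORBITMAP or LAYOUT_BITMAPORIENTATIONPRESERVED is used.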
Controlling the mirroring style
Per process: GetProcessDefaultLayout, SetProcessDefaultLayout(LAYOUT_RTL).
Per window: CreateWindowEx(WS_EX_LAYOUTRTL | WS_EX_NOINHERITLAYOUT), SetWindowLong.
Per DC: GetLayout/SetLayout, LAYOUT_BITMAPORIENTATIONPRESERVED.
Controlling the mirroring style (continued)
Dialog resources: set WS_EX_LAYOUTRTL in the dialog template.
Message boxes: use the MB_RTLLAYOUT option.
BitBlt/StretchBlt: use the NOMIRRORBITMAP flag.
Mirroring common issues (figure: a mirrored bitmap; off-screen BitBlt)
BiDi & mirroring in web pages
In a web context, mirroring and RTL reading order go hand-in-hand. Using the DIR attribute:
sets the right alignment of the text;
sets the right-to-left reading order of the text;
mirrors the page content;
leaves the orientation of stationary elements (such as images) unchanged.
To set the DIR attribute:
In HTML: on the HTML element, or at an element level.
DHTML object: document.dir = "RTL"
Tips for BiDi web pages
Watch out for directional images, which are not mirrored automatically.
Avoid explicit alignments: align="left" on tables and cells is obsolete usage.
Avoid absolute positioning of elements.
Remember: tables get mirrored automatically, so use them for robust reversibility!
Mirrored DHTML
Final Conclusions
The benefits of investing in the development of World-Ready applications are real.
Windows 2000/XP eases the pain and sets the standard.
The biggest task in implementing World-Ready applications is getting designers and engineers into the mind-set of thinking GLOBAL.
Resources
MSDN for the latest documentation about new APIs
Developing International Software for Windows 95 and Windows NT
Windows 2000/XP Globalization: World-Ready Guide
You are not World-Ready If…
aliases: