Supporting Complex Scripts (such as Arabic and Hebrew) in Your Windows 2000™ Application
F. Avery Bishop, Senior Program Manager, Microsoft Corporation

Agenda
- Overview of character encoding, Unicode
- Guidelines for supporting complex scripts
- Right-to-left layout of applications
- Multilingual User Interface

Overview of Character Encoding and Unicode
Why do character set differences matter?
- Historically, they fragmented code bases for both Windows and applications:
  - Single byte: European editions
  - Double byte: Far East editions
  - Bi-directional: Middle East editions
- They make it difficult to share data
- They make it difficult to develop multilingual applications

Example: Multiple Hebrew Character Encodings
- 8-bit Hebrew encodings still in use:
  - Windows code page 1255
  - OEM (DOS) code page 862
  - Visual Hebrew encodings (many exist)

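
To make the fragmentation concrete, the sketch below decodes a single byte as a Hebrew letter under two of the encodings listed above. The function names are hypothetical, and only the 27 letter positions (alef through tav, including final forms) are covered; it is an illustration of why the same byte cannot be interpreted without knowing the code page, not a full converter.

```c
#include <stdint.h>

/* Illustrative only: each function returns the Unicode value of a Hebrew
 * letter for the given byte, or 0 when the byte is not a Hebrew letter in
 * that code page. Function names are hypothetical. */

/* Windows code page 1255: the letters occupy bytes 0xE0..0xFA. */
uint16_t hebrew_from_cp1255(uint8_t b)
{
    return (b >= 0xE0 && b <= 0xFA) ? (uint16_t)(0x05D0 + (b - 0xE0)) : 0;
}

/* OEM (DOS) code page 862: the same letters occupy bytes 0x80..0x9A. */
uint16_t hebrew_from_cp862(uint8_t b)
{
    return (b >= 0x80 && b <= 0x9A) ? (uint16_t)(0x05D0 + (b - 0x80)) : 0;
}
```

The letter alef (U+05D0) is byte 0xE0 in one code page and byte 0x80 in the other, so data exchanged without an encoding label is easily misread.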
Example: Multiple Arabic Character Encodings
- 8-bit Arabic encodings supported in Internet Explorer 4.0/CS:
  - ASMO-708
  - DOS 720
  - ISO
  - Windows code page 1256
  - Other proprietary encodings

Logical vs Visual Encoding
- Logical:
  - Storage order is same as typing order
  - Allows natural text processing:
    - Search
    - Resizing (e.g., in web pages)
    - IPC: Select, cut & paste
- Visual:
  - Natural text processing difficult or impossible
  - Cannot always map back to logical order

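
The search problem can be seen in a small sketch. It assumes a deliberately simplified model in which a pure right-to-left line in visual order is just the logical string reversed; real visual encodings also involve line breaks and mixed-direction runs. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first occurrence of needle in hay, or -1. */
long find_u16(const uint16_t *hay, size_t hay_len,
              const uint16_t *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t j = 0;
        while (j < needle_len && hay[i + j] == needle[j])
            j++;
        if (j == needle_len)
            return (long)i;
    }
    return -1;
}

/* Simplified model (an assumption): store a pure right-to-left line in
 * visual order by reversing the logical buffer in place. */
void to_visual_order(uint16_t *text, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        uint16_t tmp = text[i];
        text[i] = text[len - 1 - i];
        text[len - 1 - i] = tmp;
    }
}
```

A word typed by the user arrives in logical order, so it matches a logically stored buffer directly. Against the visually stored buffer the same search fails unless the application first works out how to reorder the query.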
What is Unicode?
- A 16-bit character encoding:
  - A mapping of characters to numbers
  - Syntax rules for display of complex scripts
  - Not a font or glyph encoding!
  - Not a sort algorithm!
- Includes all characters in common use in modern scripts (and others)
- Basis for the ISO character encoding standard
- Native text encoding for Windows NT

Unicode™ / ISO 16-bit international character encoding
- Windows 2000 uses Unicode version 2.0
[Figure: layout of the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic and Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Private use, Compatibility, and Future use]

Relatives of Unicode
- ISO/IEC
  - 32-bit ISO standard of 64K x 64K “planes”
  - Unicode repertoire is plane 0
- UTF-7
  - 7-bit transformation format
  - Not widely used
- UTF-8
  - 8-bit transformation format
  - Used in web pages and some …

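
As an illustration of what an 8-bit transformation format does, the following sketch encodes one 16-bit Unicode value as UTF-8. The function name is hypothetical, and surrogate pairs are deliberately not handled: each 16-bit value is treated as a complete character, which is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit Unicode value as UTF-8.
 * Writes 1 to 3 bytes into out and returns the number of bytes written.
 * Assumption: c is not half of a surrogate pair. */
size_t utf8_from_u16(uint16_t c, uint8_t out[3])
{
    if (c < 0x80) {                 /* ASCII: one byte, unchanged */
        out[0] = (uint8_t)c;
        return 1;
    }
    if (c < 0x800) {                /* 11 bits: two bytes */
        out[0] = (uint8_t)(0xC0 | (c >> 6));
        out[1] = (uint8_t)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 16 bits: three bytes */
    out[0] = (uint8_t)(0xE0 | (c >> 12));
    out[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (uint8_t)(0x80 | (c & 0x3F));
    return 3;
}
```

ASCII text passes through unchanged, Hebrew and Arabic letters take two bytes each, and Thai or ideographic characters take three.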
Unicode in Win32: the W and A Entry Points
- Two kinds of window classes: Unicode, ANSI
- Win32 API has two versions of most functions:
  - “W” (wide) version handles Unicode
  - “A” (ANSI) version assumes the system default code page (character encoding)

Unicode in Win32 …
- Macros resolve to W or A entry point
- Example: Macro for RegisterClassEx
    #ifdef UNICODE
    #define RegisterClassEx RegisterClassExW
    #else
    #define RegisterClassEx RegisterClassExA
    #endif
- To create a Unicode application:
  - Compile with -DUNICODE, or
  - Use W routines explicitly

For Applications that Must Also Run on Windows 98…
- Use Unicode everywhere with a single binary and two code paths:
  - On Windows NT, use W entry points
  - On Windows 98, convert between Unicode and ANSI, and use A entry points
  - See sample GLOBALDV for an example
- See April Microsoft Systems Journal for details and other options

Summary: Use Unicode if you can!
- Represent all text with one unambiguous encoding
- Support multilingual text easily
- Avoid special processing for variable byte-length characters
- Use standard encoding recognized throughout the industry and the world
- Support new scripts that are only supported through Unicode

Guidelines for Supporting Complex Scripts in Applications
1. Displaying Complex Scripts in Plain-text
- In Win32 apps use standard edit control
- Use standard Win32 API display functions:
  - Win32 APIs: ExtTextOutW or DrawTextW
  - ScriptString API in Uniscribe

Pitfalls in Enabling for Complex Scripts
- When displaying typed text:
  - Do not output characters one by one!
  - Do save text in a buffer and display the whole string with Uniscribe or Win32 API
- To measure line lengths:
  - Do not sum cached character widths
  - Do use a GetTextExtent function or Uniscribe

2. Displaying Complex Scripts in Simple Formatted Text
- In Win32 applications use rich edit control
- In web pages for Internet Explorer 5.0, use Document Object Model

3. Displaying Complex Scripts in Text with Advanced Formatting and Layout
- Use script APIs (“Uniscribe”)
- See MSJ article of November 1998

Overview of Uniscribe
- Background and purpose of Uniscribe
- Low level APIs
- High level APIs
- For details see November 1998 MSJ article

The Uniscribe DLL: USP10.DLL
- Platforms:
  - Windows 2000
  - Windows NT 4
  - Windows 98
  - Windows 95 (excluding Far East)
- Single worldwide binary
- Installs with Windows 2000, IE5, Office 2000

Hides language details
- Syllable structure (Indian, Thai)
- Contextual shaping (Arabic, Indic)
- Caret placement (all)
- Wordbreak (Thai)
- National digits (Arabic, Indic, Thai)
- Bidirectional layout (Arabic, Hebrew)

Hides Unicode OS details
- APIs are Unicode on all platforms
- Hides glyph codes
- Hides font differences:
  - Shaping tables
  - Fixed repertoire fonts

Uniscribe Structure
[Figure: block diagram with three groups. Client: Measurer, Renderer, Display, Caret, Mouse. Uniscribe: Itemize; Shape, Place and TextOut; Layout; Justify; XtoCP & CPtoX; Unicode BiDi algorithm; Arabic, Hebrew, Hindi, Tamil, Thai, and Vietnamese shaping engines; CMAP & width tables; OpenType library. GDI: ExtTextOut with ETO_GLYPH_INDEX, GetCharABCWidthsI, GetGlyphOutline.]

Shaping engines
- Per script
- Understand language rules
- Understand font features:
  - OpenType provides full control
  - Many older fixed layout fonts

[Figure: diagram showing Application, USER, GDI, LPK.DLL, and Uniscribe]

Low level APIs
- Support formatting text:
  - Style runs
  - Measurement
  - Paragraph filling
  - Rendering
- Information needed for font fallback

Summary: Script… (each name below takes the Script prefix, e.g., ScriptItemize)
- Itemize
- Shape, Place
- Break, Layout
- TextOut
- CPtoX, XtoCP

High level APIs
- Purpose
- Analysis
- Display
- Font fallback

Purpose
- For Windows 2000:
  - ExtTextOut
  - DrawText
  - System edit control
- Cross-platform Unicode plaintext display
- Easier than low level APIs

Summary of ScriptString APIs: ScriptString…
- Analyse
- … query analysis …
- Out
- Free
- Provides simple font fallback

Implementing Right-to-left Layout in Applications
Background on RTL Layout (“Mirroring”) for BiDi Localization
- Localized Arabic and Hebrew Windows® is laid out from right to left
- In the past this was done “ad hoc” or not at all
- Windows 2000 and BiDi Windows 98 include mechanisms to “automatically” mirror the shell and applications
- Also helpful for multilingual user interface support

Mirroring in System Based on Coordinate Transformation
- Origin (0,0) in upper RIGHT corner of window
- X scale factor = -1, x values increase from right to left
[Figure: a default (LTR) window and a mirrored (RTL) window, each marking the origin and the direction of increasing x]

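
The transformation can be sketched as plain arithmetic. The function below is hypothetical and models only the idea on this slide: a pixel column counted from the left edge is re-expressed as a column counted from the right edge. The exact pixel convention used by the system is not stated in the slides, so the "width minus one" form is an assumption.

```c
/* Map a pixel column between left-origin and right-origin numbering for a
 * client area that is client_width pixels wide. The same function converts
 * in both directions because the mapping is its own inverse.
 * Assumption: columns are numbered 0 .. client_width - 1 in both systems. */
int mirror_x(int x, int client_width)
{
    return client_width - 1 - x;
}
```

Because the mapping is its own inverse, drawing code written against logical coordinates needs no changes; only the interpretation of x flips.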
More Background on Mirroring…
- Developers use programming interfaces and Windows style bits
- Automatic inheritance of RTL property:
  - Child window of RTL window defaults to RTL
  - You can disable inheritance of RTL property
- APIs provided to disable mirroring of bitmaps

Implementing Mirroring in Win32 Applications: Standard Windows
- Use SetProcessDefaultLayout: affects all windows created thereafter
    SetProcessDefaultLayout(LAYOUT_RTL) ;
    SetProcessDefaultLayout(0) ;  // Reset to LTR
- Or call CreateWindowEx:
  - Use extended style WS_EX_LAYOUTRTL
  - To inhibit mirroring in child windows, also set WS_EX_NOINHERITLAYOUT

Changing Layout of Existing Window
    BOOL IsRTLLayout ;  // TRUE iff window is to be mirrored
    // ... Get new value of IsRTLLayout
    LONG lExStyles = GetWindowLongA(hWnd, GWL_EXSTYLE) ;
    // Check whether new layout is opposite current layout
    if (!!(IsRTLLayout) != !!(lExStyles & WS_EX_LAYOUTRTL)) {
        lExStyles ^= WS_EX_LAYOUTRTL ;  // Toggle layout
        // Set extended styles to new value
        SetWindowLongA(hWnd, GWL_EXSTYLE, lExStyles) ;
        // Update client area
        InvalidateRect(hWnd, NULL, TRUE) ;
    }

Controlling Mirroring of a Device Context
- SetLayout(HDC hDc, DWORD dwLayout)
    dwLayout = 0 ;  // will lay out LTR
    dwLayout = LAYOUT_RTL ;  // will lay out RTL
    dwLayout = LAYOUT_RTL | LAYOUT_BITMAPORIENTATIONPRESERVED ;  // will lay out RTL, but not bitmaps
- GetLayout(HDC hDc)
  - Returns the layout settings for an hDc

Mirroring in Win32 Applications: Dialogs
- Set WS_EX_LAYOUTRTL in dialog template
- Visual Studio 6 dialog editor:
  - Has option for RTL layout
  - BUG in Visual Studio 6: writes WS_EX_LAYOUT_RTL (instead of WS_EX_LAYOUTRTL) to the RC file!
  - Must correct RC file by hand to compile
  - Will be fixed in future version

Mirroring in Win32 Applications: Message Boxes
- Set MB_RTLLAYOUT option bit

Guidelines for Using RTL Layout
- Using coordinates:
  - Use GetWindowRect with care
  - Use client, rather than screen coordinates
  - Do not mix screen coordinates and client coordinates
  - Use MapWindowPoints to map rectangles, instead of ClientToScreen and ScreenToClient
- Windows 95 does not support mirroring!

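
The reason rectangles need special care can be shown with a small model. Everything here is hypothetical (the structures, the function names, and the simplification to horizontal edges only); it is not the Win32 implementation, only a sketch of why converting two corners one at a time goes wrong in a mirrored window.

```c
#include <stdbool.h>

/* Horizontal extent of a rectangle; right edge is exclusive. */
struct rect_x { int left; int right; };

/* Hypothetical model of a window's horizontal placement on screen. */
struct window_x { int screen_left; int width; bool mirrored; };

/* Convert one client-area edge position to a screen edge position. */
int edge_client_to_screen(const struct window_x *w, int x)
{
    return w->mirrored ? w->screen_left + w->width - x
                       : w->screen_left + x;
}

/* Naive approach: convert each edge independently, point by point. */
struct rect_x rect_naive(const struct window_x *w, struct rect_x r)
{
    struct rect_x out;
    out.left = edge_client_to_screen(w, r.left);
    out.right = edge_client_to_screen(w, r.right);
    return out;
}

/* Rectangle-aware approach: convert, then swap so that left <= right. */
struct rect_x rect_normalized(const struct window_x *w, struct rect_x r)
{
    struct rect_x out = rect_naive(w, r);
    if (out.left > out.right) {
        int tmp = out.left;
        out.left = out.right;
        out.right = tmp;
    }
    return out;
}
```

In a mirrored window the naive result has left greater than right, which many rectangle routines treat as empty. A conversion that knows it is handling a rectangle can swap the edges, which is the behavior to rely on when mapping rectangles with MapWindowPoints.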
Implementing Multi-language User Interface in Applications
Guidelines for Multilanguage User Interface
- Initialize to current UI language:
  - Windows 2000: GetUserDefaultUILanguage()
  - Others: Use the language of the O/S
  - See function InitUiLang in Globaldev sample code

Guidelines for Multilanguage User Interface
- Allow user to select UI language:
  - Put language-dependent resources in resource DLLs
  - Use naming convention, e.g., res.dll
  - Find all resource DLLs, put up list box of choices
- See module UPDTLANG.CPP in Globaldev sample

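
One way a naming convention can tie a resource DLL to a language is sketched below. The pattern used here, "res" followed by four hexadecimal digits of the language identifier and ".dll", is purely an assumption for illustration; the slides do not spell out the convention, and the Globaldev sample may use a different one. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build a file name such as "res040d.dll" from a language identifier.
 * Returns 0 on success, -1 if the buffer is too small.
 * The "resXXXX.dll" pattern is a hypothetical convention. */
int resource_dll_name(uint16_t lang_id, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size, "res%04x.dll", (unsigned)lang_id);
    return (n > 0 && (size_t)n < buf_size) ? 0 : -1;
}

/* Recover the language identifier from a name that follows the pattern.
 * Returns 0 on success, -1 if the name does not match.
 * Simplification: only lowercase "res" and ".dll" are recognized. */
int lang_id_from_dll_name(const char *name, uint16_t *lang_id)
{
    if (strlen(name) != 11 || strncmp(name, "res", 3) != 0 ||
        strcmp(name + 7, ".dll") != 0)
        return -1;
    for (int i = 3; i < 7; i++) {
        if (!isxdigit((unsigned char)name[i]))
            return -1;
    }
    *lang_id = (uint16_t)strtoul(name + 3, NULL, 16);
    return 0;
}
```

With a scheme like this, an application can enumerate the files that match the pattern, parse each language identifier, and fill the list box of available UI languages without hard-coding the set of translations.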
Summary
- Use Unicode to encode if you can
- Use controls to display text and accept user input
- Use Uniscribe for advanced formatting
- Use new RTL layout API for applications localized to RTL languages
- Consider multilingual user interface

Further Information and Resources (Watch for updates!)
- MSJ articles, e.g.,
  - Uniscribe: multilangtop.htm
  - multilangtop.htm
  - Multilingual UI: nicode/multilangUnicodetop.htm
- Send suggestions to …