Supporting Complex Scripts (such as Arabic and Hebrew) in your Windows 2000™ Application
F. Avery Bishop, Senior Program Manager, Microsoft Corporation


Agenda:
• Overview of character encoding, Unicode
• Guidelines for supporting complex scripts
• Right-to-left layout of applications
• Multilingual User Interface

Overview of Character Encoding and Unicode

Why do character set differences matter?
• Historically, they fragmented code bases for both Windows and applications
    - Single byte: European editions
    - Double byte: Far East editions
    - Bi-directional: Middle East editions
• Make it difficult to share data
• Make it difficult to develop multilingual applications

Example: Multiple Hebrew Character Encodings
• 8-bit Hebrew encodings still in use
    - Windows codepage 1255
    - OEM (DOS) codepage 862
    - Visual Hebrew encodings (many exist)

Example: Multiple Arabic Character Encodings
• 8-bit Arabic encodings supported in Internet Explorer 4.0/CS
    - ASMO-708
    - DOS 720
    - ISO
    - Windows Codepage 1256
    - Other proprietary encodings

Logical vs Visual Encoding
• Logical:
    - Storage order is same as typing order
    - Allows natural text processing:
        - Search
        - Resizing (e.g., in web pages)
        - IPC: Select, cut & paste
• Visual:
    - Natural text processing difficult or impossible
    - Cannot always map back to logical order
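
As a rough illustration of why visual-order storage defeats search, the sketch below writes right-to-left letters as uppercase ASCII, a common shorthand in BiDi discussions. The helper names `to_visual` and `contains` are hypothetical, and `to_visual` is not the Unicode BiDi algorithm: it only reverses each run of uppercase letters, which approximates what a visual encoding would store for a single RTL word embedded in LTR text.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model: uppercase ASCII letters stand in for right-to-left letters.
 * Converts logical (typing) order to visual (display) order by reversing
 * each run of uppercase letters in place. Not the Unicode BiDi algorithm. */
void to_visual(char *s)
{
    size_t i = 0, n = strlen(s);
    while (i < n) {
        if (isupper((unsigned char)s[i])) {
            size_t start = i;
            while (i < n && isupper((unsigned char)s[i]))
                i++;
            /* Reverse the run s[start .. i-1]. */
            for (size_t a = start, b = i - 1; a < b; a++, b--) {
                char t = s[a];
                s[a] = s[b];
                s[b] = t;
            }
        } else {
            i++;
        }
    }
}

/* Returns 1 if word occurs in text, else 0. */
int contains(const char *text, const char *word)
{
    return strstr(text, word) != NULL;
}
```

With the logical string "open SHALOM now", a search for the word as typed ("SHALOM") succeeds. After `to_visual` the stored bytes read "open MOLAHS now", and the same search fails.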

What is Unicode?
• A 16-bit character encoding
    - A mapping of characters to numbers
    - Syntax rules for display of complex scripts
    - Not a font or glyph encoding!
    - Not a sort algorithm!
• Includes all characters in common use in modern scripts (and others)
• Basis for the ISO character encoding standard
• Native text encoding for Windows NT

Unicode™ / ISO
• 16-bit international character encoding
• Windows 2000 uses Unicode version 2.0
[Figure: map of the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic, Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Private use, Compatibility, and Future use]

Relatives of Unicode
• ISO/IEC
    - 32-bit ISO standard of 64K x 64K “planes”
    - Unicode repertoire is plane 0
• UTF-7
    - 7-bit transformation format
    - Not widely used
• UTF-8
    - 8-bit transformation format
    - Used in web pages and some
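
To make the "8-bit transformation format" concrete, here is a minimal sketch of UTF-8 encoding for the 16-bit code points this talk is concerned with. The function name `utf8_encode_bmp` is an assumption made for illustration. The sketch ignores code points above U+FFFF and does not treat surrogate values specially.

```c
#include <stddef.h>

/* Encodes one 16-bit code point (U+0000..U+FFFF) as UTF-8.
 * Writes 1 to 3 bytes into out[] and returns the number of bytes written.
 * Surrogate values are not treated specially in this sketch. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    cp &= 0xFFFFu;  /* keep only the low 16 bits */

    if (cp < 0x80) {
        /* 0xxxxxxx: ASCII passes through unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 110xxxxx 10xxxxxx: covers Hebrew and Arabic letters, among others */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx: the rest of the 16-bit range */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, U+05D0 (Hebrew letter alef) encodes as the two bytes D7 90, and U+0627 (Arabic letter alef) as D8 A7, while ASCII characters stay one byte each.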

Unicode in Win32: the W and A Entry Points
• Two kinds of window classes: Unicode, ANSI
• Win32 API has two versions of most functions:
    - “W” (wide) version handles Unicode
    - “A” (ANSI) version assumes the system default code page (character encoding)

Unicode in Win32 …
• Macros resolve to W or A entry point
• Example: Macro for RegisterClassEx
    #ifdef UNICODE
    #define RegisterClassEx RegisterClassExW
    #else
    #define RegisterClassEx RegisterClassExA
    #endif
• To create Unicode application:
    - Compile with -DUNICODE or
    - Use W routines explicitly

For Applications that Must Also Run on Windows 98…
• Use Unicode everywhere with single binary, two code paths:
    - On Windows NT use W entry points
    - On Windows 98, convert Unicode to ANSI, use A entry points
    - See sample GLOBALDV for example
• See April Microsoft Systems Journal for details and other options

Summary: Use Unicode if you can!
• Represent all text with one unambiguous encoding
• Support multilingual text easily
• Avoid special processing for variable byte-length characters
• Use standard encoding recognized throughout the industry and the world
• Support new scripts that are only supported through Unicode

Guidelines for Supporting Complex Scripts in Applications

1. Displaying Complex Scripts in Plain-text
• In Win32 apps use standard edit control
• Use standard Win32 API display functions
    - Win32 APIs: ExtTextOutW or DrawTextW
    - ScriptString API in Uniscribe

Pitfalls in Enabling for Complex Scripts
• When displaying typed text:
    - Do not output characters one by one!
    - Do save text in a buffer and display the whole string with Uniscribe or Win32 API
• To measure line lengths:
    - Do not sum cached character widths
    - Do use a GetTextExtent function or Uniscribe
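
The measurement pitfall can be sketched with a toy model. In Arabic, lam followed by alef is rendered as a single ligature glyph, so the width of the string is not the sum of the widths of its characters. All pixel widths below are made-up values for illustration, and `char_width`, `naive_width`, and `shaped_width` are hypothetical helpers. A real application would ask a GetTextExtent function or Uniscribe for the width of the whole string.

```c
#include <stddef.h>

/* Hypothetical per-character advance widths in pixels (illustrative only). */
static int char_width(unsigned int cp)
{
    switch (cp) {
    case 0x0644: return 6;  /* ARABIC LETTER LAM */
    case 0x0627: return 3;  /* ARABIC LETTER ALEF */
    default:     return 5;
    }
}

/* The pitfall: treats every character as an independent glyph. */
int naive_width(const unsigned int *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += char_width(s[i]);
    return total;
}

/* Toy whole-string measurement: LAM followed by ALEF is drawn as one
 * ligature glyph with its own (hypothetical) width of 7 pixels. */
int shaped_width(const unsigned int *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == 0x0644 && i + 1 < n && s[i + 1] == 0x0627) {
            total += 7;  /* lam-alef ligature replaces two characters */
            i++;         /* skip the ALEF that was consumed */
        } else {
            total += char_width(s[i]);
        }
    }
    return total;
}
```

For the four-character word seen, lam, alef, meem (U+0633 U+0644 U+0627 U+0645), the two functions disagree under these assumed widths, which is exactly the discrepancy that breaks line measurement based on cached character widths.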

2. Displaying Complex Scripts in Simple Formatted Text
• In Win32 applications use rich edit control
• In web pages for Internet Explorer 5.0, use Document Object Model

3. Displaying Complex Scripts (CS) in Text with Advanced Formatting and Layout
• Use script APIs (“Uniscribe”)
• See MSJ article of November 1998

Overview of Uniscribe
• Background and Purpose of Uniscribe
• Low level APIs
• High level APIs
• For details see November 1998 MSJ article

The Uniscribe DLL: USP10.DLL
• Platforms
    - Windows 2000
    - Windows NT 4
    - Windows 98
    - Windows 95 (excluding Far East)
• Single worldwide binary
• Installs with Windows 2000, IE5, Office 2000

Hides language details
• Syllable structure (Indian, Thai)
• Contextual shaping (Arabic, Indic)
• Caret placement (all)
• Wordbreak (Thai)
• National digits (Arabic, Indic, Thai)
• Bidirectional layout (Arabic, Hebrew)

Hides Unicode OS details
• APIs are Unicode on all platforms
• Hides glyph codes
• Hides font differences
    - Shaping tables
    - Fixed repertoire fonts

Uniscribe Structure
[Figure: block diagram of the Uniscribe architecture. Labels include: Client, Measurer, Renderer, Display, Caret, Mouse; Uniscribe, Itemize, Layout, Justify, XtoCP & CPtoX, Shape, Place and TextOut; Unicode BiDi algorithm; Arabic, Hebrew, Hindi, Tamil, Thai, and Vietnamese shaping engines; OpenType library; CMAP & width tables; GDI, ExtTextOut ETO_GLYPH_INDEX, GetCharABCWidthsI, GetGlyphOutline]

Shaping engines
• Per script
• Understand language rules
• Understand font features
    - OpenType provides full control
    - Many older fixed layout fonts

[Figure: component diagram with labels Application, USER, GDI, LPK.DLL, and Uniscribe]

Low level APIs Support
• Formatting text
    - Style runs
    - Measurement
    - Paragraph filling
    - Rendering
• Information needed for font fallback

Summary
• Script…
    - Itemize
    - Shape, Place
    - Break, Layout
    - TextOut
    - CPtoX, XtoCP

High level APIs
• Purpose
• Analysis
• Display
• Font fallback

Purpose
• For Windows 2000
    - ExtTextOut
    - DrawText
    - System edit control
• Cross-platform Unicode plaintext display
• Easier than low level APIs

Summary of ScriptString APIs:
• ScriptString…
    - Analyse
    - … query analysis ...
    - Out
    - Free
• Provides simple font fallback

Implementing Right-to-left Layout in Applications

Background On RTL Layout (“Mirroring”) For BiDi Localization
• Localized Arabic and Hebrew Windows® is laid out from Right to Left
• In the past was done “ad hoc” or not at all
• Windows 2000 and BiDi Windows 98 include mechanisms to “automatically” mirror shell and applications
• Also helpful for multilingual user interface support

Mirroring in System Based on Coordinate Transformation
• Origin (0,0) in upper RIGHT corner of window
• X scale factor = -1, x values increase from right to left
[Figure: two windows. Default (LTR) window: origin at upper left, x increasing to the right. Mirrored (RTL) window: origin at upper right, x increasing to the left.]
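
The transformation on this slide can be written down in a few lines. This is only a sketch of the arithmetic, assuming pixel columns are numbered 0 to width-1 from the physical left edge. The function name `mirrored_to_physical_x` is hypothetical and is not part of the Win32 API.

```c
/* Maps a logical x coordinate in a mirrored (RTL) window to the physical
 * pixel column counted from the left edge of the client area.
 * Logical x = 0 is the rightmost column; x grows toward the left.
 * The y coordinate is unaffected by mirroring. */
int mirrored_to_physical_x(int logical_x, int client_width)
{
    return (client_width - 1) - logical_x;
}
```

Because the mapping is its own inverse, applying it twice returns the original coordinate. This is why drawing code written against logical coordinates can run unchanged in a mirrored window.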

More Background on Mirroring…
• Developers use programming interfaces and Windows style bits
• Automatic inheritance of RTL property:
    - Child window of RTL window defaults to RTL
    - You can disable inheritance of RTL property
• APIs provided to disable mirroring of bitmaps

Implementing Mirroring in Win32 Applications: Standard Windows
• Use SetProcessDefaultLayout:
    - Affects all windows created thereafter
        SetProcessDefaultLayout(LAYOUT_RTL) ;
        SetProcessDefaultLayout(0) ; // Reset to LTR
• Or call CreateWindowEx:
    - Use extended style WS_EX_LAYOUTRTL
    - To inhibit mirroring in child windows, also set WS_EX_NOINHERITLAYOUT

Changing Layout of Existing Window
    BOOL IsRTLLayout ; // TRUE iff window is to be mirrored
    // ... Get new value of IsRTLLayout
    LONG lExStyles = GetWindowLongA(hWnd, GWL_EXSTYLE) ;
    // Check whether new layout is opposite current layout
    if (!!(IsRTLLayout) != !!(lExStyles & WS_EX_LAYOUTRTL)) {
        lExStyles ^= WS_EX_LAYOUTRTL ; // Toggle layout
        // Set extended styles to new value
        SetWindowLongA(hWnd, GWL_EXSTYLE, lExStyles) ;
        // Update client area
        InvalidateRect(hWnd, NULL, TRUE) ;
    }
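
The double negation in the comparison above matters because `lExStyles & WS_EX_LAYOUTRTL` yields either 0 or the bit value 0x00400000, never 1, so it cannot be compared directly with a BOOL. The following portable sketch isolates that logic. `EX_LAYOUTRTL` and `apply_layout` are local, hypothetical names so the snippet compiles without windows.h. The constant repeats the value of WS_EX_LAYOUTRTL.

```c
/* Same value as WS_EX_LAYOUTRTL, under a local name so that this sketch
 * does not depend on windows.h. */
#define EX_LAYOUTRTL 0x00400000L

/* Returns the extended-style word a window should have so that its layout
 * matches want_rtl. Both sides of the comparison are normalized to 0 or 1
 * with !! before they are compared. */
long apply_layout(long ex_styles, int want_rtl)
{
    if (!!want_rtl != !!(ex_styles & EX_LAYOUTRTL))
        ex_styles ^= EX_LAYOUTRTL;  /* toggle only when the layout differs */
    return ex_styles;
}
```

The function leaves all other style bits untouched and is idempotent: calling it again with the same `want_rtl` changes nothing, which corresponds to skipping SetWindowLongA and InvalidateRect when the window already has the requested layout.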

Controlling Mirroring of a Device Context
• SetLayout(HDC hDc, DWORD dwLayout)
    dwLayout = 0 ; // will layout LTR
    dwLayout = LAYOUT_RTL ; // will layout RTL
    dwLayout = LAYOUT_RTL | LAYOUT_BITMAPORIENTATIONPRESERVED ; // will layout RTL, but not bitmaps
• GetLayout(HDC hDc)
    - Returns the layout settings for a device context

Mirroring in Win32 Applications: Dialogs
• Set WS_EX_LAYOUTRTL in dialog template
• Visual Studio 6 Dialog editor:
    - Has option for RTL layout
    - BUG in Visual Studio 6:
        - Writes WS_EX_LAYOUT_RTL to RC file!
        - Must correct RC file by hand to compile
        - Will be fixed in future version

Mirroring in Win32 Applications: Message Boxes
• Set MB_RTLLAYOUT option bit

Guidelines for using RTL Layout
• Using coordinates
    - Use GetWindowRect with care
    - Use client, rather than screen coordinates
    - Do not mix screen coordinates and client coordinates
    - Use MapWindowPoints to map rectangles, instead of ClientToScreen and ScreenToClient
• Windows 95 does not support mirroring!
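
The advice about rectangles follows from the coordinate flip. If the two corners of a client rectangle in a mirrored window are converted to screen coordinates one at a time, the result has left greater than right. The sketch below is a toy model of that effect, not the Win32 implementation. `Rect`, `edge_to_screen_x`, `map_corners_separately`, and `map_rect` are hypothetical names, and x values are treated as edge positions, so the mirror of x in a client area of the given width is width - x.

```c
/* Hypothetical stand-in for the Win32 RECT structure. */
typedef struct { int left, top, right, bottom; } Rect;

/* Toy conversion of a vertical edge at client x to a screen x coordinate,
 * for a mirrored window whose client area starts at screen column win_left. */
static int edge_to_screen_x(int x, int win_left, int width)
{
    return win_left + (width - x);
}

/* Point-by-point conversion: each corner is mapped on its own, so the
 * horizontal order of the edges is reversed in a mirrored window. */
Rect map_corners_separately(Rect r, int win_left, int win_top, int width)
{
    Rect out;
    out.left   = edge_to_screen_x(r.left,  win_left, width);
    out.right  = edge_to_screen_x(r.right, win_left, width);
    out.top    = win_top + r.top;
    out.bottom = win_top + r.bottom;
    return out;
}

/* Rectangle-aware conversion: maps the corners, then restores left <= right. */
Rect map_rect(Rect r, int win_left, int win_top, int width)
{
    Rect out = map_corners_separately(r, win_left, win_top, width);
    if (out.left > out.right) {
        int t = out.left;
        out.left = out.right;
        out.right = t;
    }
    return out;
}
```

A rectangle with left > right breaks any later code that computes a width as right - left or tests whether a point lies inside it, which is the reason for mapping the rectangle as a unit.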

Implementing Multi-language User Interface in Applications

Guidelines for Multilanguage User Interface
• Initialize to current UI language
    - Windows 2000: GetUserDefaultUILanguage()
    - Others: Use the language of the O/S
    - See function InitUiLang in Globaldev sample code

Guidelines for Multilanguage User Interface
• Allow user to select UI language
    - Put language-dependent resources in resource DLLs
    - Use naming convention, e.g., res.dll
    - Find all resource DLLs, put up list box of choices
• See module UPDTLANG.CPP in Globaldev Sample

Summary
• Use Unicode to encode if you can
• Use controls to display text and accept user input
• Use Uniscribe for advanced formatting
• Use new RTL layout API for applications localized to RTL languages
• Consider multilingual user interface

Further Information and Resources
• (Watch for updates!)
• MSJ articles, e.g.,
    - Uniscribe: multilangtop.htm
    - Multilingual UI: nicode/multilangUnicodetop.htm
• Send suggestions to