DEV-10: Supporting Multiple Languages In Your Application
Salvador Viñals
Consultant Product Manager

© 2006 Progress Software Corporation

Agenda
International support with OpenEdge® 10
OpenEdge internationalization update
  – GB18030
  – Sorting and Collations
  – Unicode Normalization
  – Default word-break tables and double-byte
For more information, go to…
Summary
This presentation includes annotations with additional, complementary information

Code-Pages and Unicode
Code-pages
  – Many code-pages
  – Max 255 characters each
  – Each with regionally-limited repertoire of characters
Unicode
  – Uni code = One code
  – Uni code = Universal code
  – Virtually all the world's characters
  – Distinguishes characters by script, but not by language
UTF-8, UTF-16, UTF-32
  – Unicode binary representations (8, 16, 32 bits)

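The contrast between a regionally limited code page and the Unicode encoding forms can be sketched outside OpenEdge. The snippet below is only an illustration in Python, using its standard codecs. The sample characters and the helper names `encoded_lengths` and `fits_code_page` are assumptions made for this example, not part of the presentation.

```python
# Illustrative sketch only; not OpenEdge code. Sample characters are arbitrary.
samples = ["A", "é", "中"]


def encoded_lengths(ch):
    """Return how many bytes ch occupies in each Unicode encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # "-le" avoids counting a BOM
        "utf-32": len(ch.encode("utf-32-le")),  # "-le" avoids counting a BOM
    }


def fits_code_page(ch, code_page):
    """Return True if the code page's repertoire contains ch."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


for ch in samples:
    print(ch, encoded_lengths(ch), "in cp1252:", fits_code_page(ch, "cp1252"))
```

Under these assumptions, the Western European code page 1252 can represent "é" but not "中", while every Unicode encoding form can represent all three samples, at different byte costs.
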
OpenEdge Products: International readiness
OpenEdge 10 products support UTF-8 (Unicode)
  – Database (Personal, Workgroup, Enterprise)
  – Application Servers [AppServer, WebSpeed] (Basic, Enterprise)
  – GUI Clients (Client Networking, WebClient) and Batch Client
Exceptions
  – Character Client and DataServers: Use code-pages instead
Code-pages and Unicode can interoperate

Configurations
[Diagram: code-page and UTF-8 support across OpenEdge components. Components shown: OpenEdge Application Servers (AppServer™, WebSpeed®), OE Batch Client, OpenEdge RDBMS, OpenEdge DataServers (Oracle, MS SQL, ODBC), Web Service, GUI Client, Character Client, SQL Clients. Labels shown: "UTF-8 or Code-pages", "Code-pages", "UTF-8".]

Translation Products
Translation Manager (TranMan)
Visual Translator (VisTran)
Products life cycle
  – Progress V9: Functionally Stable
  – OpenEdge 10: Active
TranMan and VisTran run on Windows only; however, they can be used to manage translations of ChUI or GUI applications.

Support for GB18030 Code Page
Chinese code page
Required for all new software sold in mainland China

Support for GB18030 Code Page: Why is this code page unique?
Does not fit into lead-byte / trail-byte model
It has 1-, 2-, and 4-byte characters
Cannot tell from lead-byte if there are 2 or 4 bytes in the character

Support for GB18030 Code Page: How it is supported
Supported by making conversions of the GB18030 code page to and from UTF-8
Requires -cpinternal to be UTF-8
  – No -cpinternal for GB18030
Reading and writing a file in GB18030
  – Converts to/from UTF-8

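The variable-width behaviour of GB18030 and the convert-to-UTF-8 approach can be observed with any GB18030 codec. The following is a minimal sketch in Python, not OpenEdge code. The sample characters and variable names are assumptions picked for illustration.

```python
# Illustrative sketch only; uses Python's built-in gb18030 codec.
one = "A".encode("gb18030")        # ASCII range
two = "\u4e02".encode("gb18030")   # a CJK ideograph in the two-byte range
four = "\u0080".encode("gb18030")  # a character in the four-byte range

print(len(one), len(two), len(four))

# The two-byte and four-byte samples start with the same lead byte, so the
# lead byte alone cannot tell a decoder how long the character is.
same_lead = two[0] == four[0]
print("same lead byte:", same_lead)

# Working in UTF-8 internally: decode GB18030 input, process it as Unicode,
# and encode back to GB18030 only when writing the file out.
text = "中文 A"
gb_input = text.encode("gb18030")
as_utf8 = gb_input.decode("gb18030").encode("utf-8")
gb_output = as_utf8.decode("utf-8").encode("gb18030")
print("round trip intact:", gb_output == gb_input)
```

The round trip at the end mirrors the idea on the slide: GB18030 exists only at the file boundary, and everything in between is UTF-8.
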
Linguistic Sorting: The goal …
Unicode sorting for UTF-8
Language-sensitive collations
Tailor app to expectations of locale
  – Language
  – Location (country, region, etc.)
Easy to use
  – Functions just like any other collation for ABL, and OpenEdge Database or SQL users
Prior to 10.0B UTF-8 collation was binary sort

Some collation examples: Latin alphabet

Catalan, català (ca, cat)
  Catalan alphabet: Aa (Àà), Bb, Cc (Çç), Dd, Ee (Éé, Èè), Ff, Gg, Hh, Ii (Íí, Ïï), Jj, [Kk], Ll, Mm, Nn, Oo (Óó, Òò), Pp, Qq, Rr, Ss, Tt, Uu (Úú, Üü), Vv, [Ww], Xx, [Yy], Zz
  L·L is ordered as L+L.
  & LL << l·l <<< L·l <<< L·L

Finnish, suomi (fi, fin)
  Finnish alphabet: Aa, Bb, [Cc], Dd, Ee, Ff, Gg, Hh, Ii, Jj, Kk, Ll, Mm, Nn, Oo, Pp, [Qq], Rr, Ss (Šš), Tt, Uu, Vv [Ww], [Xx], Yy [Üü], Zz (Žž), [Åå], Ää [Ææ], Öö [Øø]
  & V << w <<< W
  & Y << ü <<< Ü
  & Z < å <<< Å < ä <<< Ä << æ <<< Æ < ö <<< Ö << ø <<< Ø

French, français (fr, fra)
  French alphabet: Aa (Àà, Ââ), (Ææ), Bb, Cc (Çç), Dd, Ee (Éé, Èè, Êê, Ëë), Ff, Gg, Hh, Ii (Îî, Ïï), Jj, [Kk], Ll, Mm, Nn (Ññ), Oo (Ôô), (Œœ), Pp, Qq, Rr, Ss, Tt, Uu (Ùù, Ûû), Vv, [Ww], Xx, Yy (Ÿÿ), Zz
  The ligatures Æ and Œ are ordered as A+E and O+E respectively.
  [accentorder backward]

Unicode 4.1 Default Collation Order (ISO/IEC)
  Unicode default latin alphabet: Aa, Bb, Cc, Dd, Ee, Əə, Ff, Gg, Hh, Ii, ı, Jj, Kk, Ll, Mm, Nn, Ŋŋ, Oo, Pp, Qq, ĸ, Rr, Ss, Tt, Ŧŧ, Uu, Vv, Ww, Xx, Yy, Zz, Þþ
  Unicode default greek alphabet: Αα, Ββ, Γγ, Δδ, Εε, Ζζ, Ηη, Θθ, Ιι, Κκ, Λλ, Μμ, Νν, Ξξ, Οο, Ππ, Ρρ, Σσς, Ττ, Υυ, Φφ, Χχ, Ψψ, Ωω
  Unicode default cyrillic alphabet: Аа, Әә, Бб, Вв, Гг, Ғғ, Дд, Ђђ, Ѓѓ, Ее, Єє, Жж, Җҗ, Зз, Ѕѕ, Ии, Іі, Її, Йй, Јј, Кк, Ққ, Ҝҝ, Лл, Љљ, Мм, Нн, Ңң, Њњ, Оо, Өө, Пп, Рр, Сс, Тт, Ћћ, Ќќ, Уу, Ўў, Үү, Ұұ, Фф, Хх, Ҳҳ, Һһ, Цц, Чч, Ҹҹ, Џџ, Шш, Щщ, Ъъ, Ыы, Ьь, Ээ, Юю, Яя

Linguistic Sorting: Internals
OpenEdge Database meta-schema
Table _DB-collate
  – Already used for single-byte sort weights
  – New functionality used for summary information
Table _Collation
  – Added in 10.0A in preparation
  – Can hold any amount of collation data

Linguistic Sorting: ABL Usage
Reference collation by name
  – For example “ICU-fr” for French
Specify using -cpcoll
  – Identifies collation table to use with code page in memory at session startup
  – The argument is the name of a collation table in convmap.cp or the name of an ICU collation
ABL Statements
  – COMPARE
  – COLLATE

Linguistic Sorting: COMPARE and COLLATE new strengths supported
10.0A strengths: CASE-INSENSITIVE, CASE-SENSITIVE, CAPS and RAW
Added strengths
  – PRIMARY
  – SECONDARY = CASE-INSENSITIVE
  – TERTIARY = CASE-SENSITIVE
  – QUATERNARY

Linguistic Sorting: Sort order depends on selected collation

    /* French collation */
    DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")).

    /* Spanish collation */
    DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")).

Output of above statements:

    ICU-fr = yes
    ICU-es = no

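One way to see why the two collations disagree is the "accentorder backward" rule listed for French in the collation examples: when base letters tie, French compares accents starting from the end of the word. The toy model below, written in Python, is only a sketch of that idea and is not the ICU algorithm. The function name `toy_key` and the two-level key layout are assumptions made for illustration.

```python
# Toy two-level comparison; NOT the ICU collation algorithm.
import unicodedata


def toy_key(word, backward_accents=False):
    """Build a key of (base letters, accents per letter).

    With backward_accents=True the accent level is compared from the
    end of the word, which imitates the French accent-order rule.
    """
    base = []
    accents = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] += ch  # attach the mark to the preceding letter
        else:
            base.append(ch)
            accents.append("")
    if backward_accents:
        accents.reverse()
    return ("".join(base), tuple(accents))


french = toy_key("côte", backward_accents=True) < toy_key("coté", backward_accents=True)
forward = toy_key("côte") < toy_key("coté")
print("backward accents (French-like):", french)
print("forward accents (Spanish-like):", forward)
```

Both words share the base letters "cote", so the decision falls to the accent level. Scanning forward, the circumflex on "ô" is the first difference; scanning backward, the acute on "é" is.
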
Linguistic Sorting
OpenEdge uses collations for
  – The -cpcoll startup parameter
  – The database collation
  – The collation of a database CLOB column
  – An argument to the COMPARE function or COLLATE option of the BY phrase

Linguistic Sorting: Rules
Once a collation is specified for the database in the _Collation table, it cannot be modified
Once the collation is written to the _Collation table, it is the only collation with that name that can be used by that database
It is strongly recommended that databases be backed up before using an ICU collation

Linguistic Sorting: Example 1 of 4
The following examples assume a UTF-8 database with “basic” collation
Names: beet, carrot, çedilla, entry, école, trust, zoom

    FOR EACH words WHERE name < "t":
        DISPLAY name.
    END.

Output result:

    beet
    carrot
    entry

Linguistic Sorting: Example 2 of 4

    FOR EACH words WHERE name >= "t":
        DISPLAY name.
    END.

Output result:

    trust
    zoom
    école
    çedilla

Linguistic Sorting: Example 3 of 4

    FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
        DISPLAY name.
    END.

Output result:

    beet
    carrot
    entry
    école
    çedilla

Before, without COMPARE:

    beet
    carrot
    entry

Linguistic Sorting: Example 4 of 4

    FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
        BY COLLATE(name, "case-insensitive", "ICU-en"):
        DISPLAY name.
    END.

Output result:

    beet
    carrot
    çedilla
    école
    entry

Before, without BY COLLATE:

    beet
    carrot
    entry
    école
    çedilla

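The effect shown in the four examples can be reproduced in miniature outside the database. The sketch below is Python, not ABL, and it stands in for the ICU collation with a deliberately crude key that just strips accents. The helper name `primary_key` is an assumption for this illustration.

```python
# Illustrative sketch only; a crude stand-in for a linguistic collation.
import unicodedata

names = ["beet", "carrot", "çedilla", "entry", "école", "trust", "zoom"]


def primary_key(word):
    """Reduce a word to its base letters: decompose, then drop accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Binary comparison: "ç" and "é" have code points above "z", so words
# starting with them are never less than "t".
binary_match = [n for n in names if n < "t"]

# Accent-aware comparison and ordering, in the spirit of COMPARE and COLLATE.
linguistic_match = sorted(
    (n for n in names if primary_key(n) < "t"), key=primary_key
)

print("binary:    ", binary_match)
print("linguistic:", linguistic_match)
```

The binary filter selects the same three names as Example 1, and the accent-aware filter and sort select the same five names, in the same order, as Example 4.
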
Linguistic Sorting: Supported Collations
OpenEdge supports ICU collations in the icui18n library for supported OpenEdge languages
One additional collation is supported: Japanese Hiragana Quaternary as case-sensitive
  – ICU-ja__HQ = Japanese Hiragana Quaternary
  – Uses the QUATERNARY strength as the CASE-SENSITIVE strength

Linguistic Sorting: ICU Collations Available, 1 of 3

    ICU-UCA              UCA (default Unicode Collation Algorithm)
    ICU-ar               Arabic
    ICU-be               Belarusian
    ICU-bg               Bulgarian
    ICU-ca               Catalan
    ICU-cs               Czech
    ICU-da               Danish
    ICU-de__PHONEBOOK    German phonebook
    ICU-el               Greek
    ICU-en_BE            English Belgium
    ICU-eo               Esperanto
    ICU-es               Spanish
    ICU-es__TRADITIONAL  Spanish traditional
    ICU-et               Estonian
    ICU-fa               Persian
    ICU-fi               Finnish
    ICU-fr               French
    ICU-gu               Gujarati

Linguistic Sorting: ICU Collations Available, 2 of 3

    ICU-he               Hebrew
    ICU-hi               Hindi
    ICU-hi__DIRECT       Hindi direct
    ICU-hr               Croatian
    ICU-hu               Hungarian
    ICU-is               Icelandic
    ICU-ja               Japanese
    ICU-ko               Korean
    ICU-kn               Kannada
    ICU-lt               Lithuanian
    ICU-lv               Latvian
    ICU-mk               Macedonian
    ICU-mr               Marathi
    ICU-mt               Maltese
    ICU-nb               Norwegian Bokmål
    ICU-nn               Norwegian Nynorsk
    ICU-pl               Polish
    ICU-ro               Romanian

Linguistic Sorting: ICU Collations Available, 3 of 3

    ICU-ru               Russian
    ICU-sh               Serbo-Croatian
    ICU-sk               Slovak
    ICU-sl               Slovenian
    ICU-sq               Albanian
    ICU-sr               Serbian
    ICU-sv               Swedish
    ICU-ta               Tamil
    ICU-te               Telugu
    ICU-th               Thai
    ICU-tr               Turkish
    ICU-uk               Ukrainian
    ICU-vi               Vietnamese
    ICU-zh               Chinese
    ICU-zh__PINYIN       Chinese Pinyin
    ICU-zh_HK            Chinese Hong Kong
    ICU-zh_MO            Chinese Macau
    ICU-zh_TW            Chinese Taiwan

Collations Gotchas
If Database, Clients and Servers use different collations (-cpcoll), indexed and non-indexed queries may return different results
If a client needs a different collation than the database, you can use COMPARE, COLLATE on the client
  – Performance impact with large result sets

Configuration Gotchas: Typical character client configuration, 1/2
Database code-page is 1252 on Windows server
OpenEdge install startup.pf setting is:
  – -cpinternal 1252 -cpstream 1252
French Windows Client with a default Windows code page of 1252, and a DOS system code page of ibm850
DOS Character Client starts without specifying -cpinternal and -cpstream, so it uses 1252 from startup.pf

Configuration Gotchas: Typical character client configuration, 2/2
User enters “è” (Hex 8A in ibm850)
Since the session is started with -cpinternal 1252, OpenEdge doesn’t convert when writing to the database. The entered value is written to the database as 8A, when it should be E8 (1252)
Solution: Start Character Client with -cpinternal and -cpstream set to ibm850

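The byte values in this scenario can be checked with any pair of ibm850 and 1252 codecs. The following Python sketch is an illustration only and does not involve OpenEdge. The codec names `cp850` and `cp1252` are Python's names for these code pages, and the variable names are assumptions.

```python
# Illustrative sketch only; shows the byte-level mismatch, not OpenEdge behaviour.
typed = "è"

console_byte = typed.encode("cp850")    # what an ibm850 console produces
expected_byte = typed.encode("cp1252")  # what a 1252 database should store

# If the session believes the input is already 1252, no conversion happens
# and the ibm850 byte is stored unchanged. Read back as 1252, it becomes a
# different character.
misread = console_byte.decode("cp1252")

# Declaring the real source code page lets the conversion take place.
converted = console_byte.decode("cp850").encode("cp1252")

print("typed as ibm850:", console_byte.hex())
print("wanted in 1252: ", expected_byte.hex())
print("read back without conversion:", misread)
print("converted correctly:", converted == expected_byte)
```

This is the same reasoning as the fix on the slide: once the client declares ibm850 as its code page, the value is converted on its way to the 1252 database instead of being stored as-is.
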
Unicode Normalization: What is Normalization?
Unicode has different ways of expressing the same characters
Decomposed
  – Á = (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´)
Composed
  – Á = (U+00C1, Latin Capital Letter A with Acute)

Unicode Normalization: Why Normalization?
XML (and other W3C entities) expects data in “NFC” form
  – NFC = Canonical Decomposition, followed by Canonical Composition
Best way to convert from Unicode to other code pages
Useful when doing tasks such as making comparisons

Unicode Normalization: NORMALIZE Language Function

    result-string = NORMALIZE(source-string, normalization-mode)

Returns either CHAR or LONGCHAR
  – Matches the source string
CHAR variable must be UTF-8
LONGCHAR variable can be any form of Unicode
  – UTF-8, UTF-16, UTF-32

Normalization Modes Supported (normalization modes from ICU library)
NFD: Canonical Decomposition
NFC: Canonical Decomposition, followed by Canonical Composition (default)
NFKD: Compatibility Decomposition
NFKC: Compatibility Decomposition, followed by Canonical Composition
None: No change to source string. Turns off normalization when normalization-mode is a variable

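The four Unicode normalization forms behave the same way in any conforming implementation, so their effect can be previewed outside ABL. The sketch below uses Python's standard `unicodedata` module rather than the NORMALIZE function. The sample strings and variable names are assumptions chosen for illustration.

```python
# Illustrative sketch only; uses Python's unicodedata, not ABL NORMALIZE.
import unicodedata

composed = "\u00c1"     # Á as a single code point
decomposed = "A\u0301"  # A followed by a combining acute accent
ligature = "\ufb01"     # the "fi" ligature, a compatibility character

# The two spellings of Á look identical but compare as different strings.
print("equal before normalizing:", composed == decomposed)

nfc = unicodedata.normalize("NFC", decomposed)  # recomposes to one code point
nfd = unicodedata.normalize("NFD", composed)    # splits into letter + accent
print("equal after NFC:", nfc == composed)
print("equal after NFD:", nfd == decomposed)

# Canonical forms leave the ligature alone; compatibility forms expand it.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
nfkd_ligature = unicodedata.normalize("NFKD", ligature)
print("NFC keeps ligature:", nfc_ligature == ligature)
print("NFKC expands to:", nfkc_ligature)
```

Normalizing both sides to the same form before comparing is what makes the two spellings of Á match, which is the comparison use case named on the slide.
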
Unicode Normalization: Additional information
Unicode Normalization Forms
  – Recommended for understanding normalization forms used with NORMALIZE function
International Components for Unicode (ICU)
  – Libraries & globalization, in-depth information

Default Word-Break Tables: Prior to 10.1A
User had to configure word-break tables for use with double-byte and UTF-8 databases

Default Word-Break Tables: 10.1A simplifies implementing double-byte databases
Default Word-Break Tables added for:
  – Double-byte
  – UTF-8 Databases
These are available ‘out of the box’
  – Either in product or for download
Simplifies accessing non-single-byte databases

Default Word-Break Tables: 10.1A simplifies implementing double-byte databases
10.1A provides 10 compiled files
  – See list on next slide
  – Ranging from proword.245 to proword.254
Located in subdirectory with corresponding empty databases
  – Subdirectory prolang/

Default Word-Break Tables: 10.1A simplifies implementing double-byte databases

    Language               Code page     File
    Japanese               SHIFT-JIS     proword.253
    Japanese               EUCJIS        proword.250
    Korean                 CP949         proword.248
    Korean                 KSC5601       proword.252
    Chinese (simplified)   CP936         proword.247
    Chinese (simplified)   GB2312        proword.251
    Chinese (traditional)  CP950         proword.249
    Chinese (traditional)  BIG-5         proword.246
    Chinese (traditional)  CP950-HKSCS   proword.245
    (any)                  UTF-8         proword.254

Availability categories shown on the slide:
  – Compiled, available out of the box
  – Available as part of the Supplemental PROMSGS package
  – Available for download

Default Word-Break Tables
What if you are using a proword file in the range of 245–254?
Copy the file to a proword file with a different number
  – Where the number is less than 240
Apply word rule to the database
  – No index-build is required for this change
Remember, apply the change in all tiers (Client, Server, Database) to prevent corruption!

For More Information, go to…
Expand to New Countries Business Empowerment Program
  – Contact your Account Manager
Product documentation
  – OpenEdge Development: Internationalizing Applications
  – OpenEdge Development: Visual Translator
  – OpenEdge Development: Translation Manager
Visit PSDN for white papers and presentations, for example:
  – “Understanding Internationalization” web seminar
Training and Professional Services

In Summary
Use UTF-8
GB18030
Linguistic Sorting and Collations
  – Use ICU-*
Unicode Normalization
Default word-break tables and double-byte
Expand to New Countries Business Empowerment Program

Questions?

Thank you for your time