1 DEV-10: Supporting Multiple Languages In Your Application
  Salvador Viñals, Consultant Product Manager

2 Agenda
  © 2006 Progress Software Corporation. DEV-10: Supporting Multiple Languages In Your Application
  - International support with OpenEdge® 10
  - OpenEdge internationalization update
      - GB18030
      - Sorting and Collations
      - Unicode Normalization
      - Default word-break tables and double-byte
  - For more information, go to…
  - Summary
  Note: This presentation includes annotations with additional, complementary information.

3 Code-Pages and Unicode
  - Code-pages
      - Many code-pages
      - Max 255 characters each
      - Each with a regionally limited repertoire of characters
  - Unicode
      - "Uni" = one, "Uni" = universal
      - Virtually all the world's characters
      - Distinguishes characters by script, but not by language
  - UTF-8, UTF-16, UTF-32
      - Unicode binary representations (code units of 8, 16, and 32 bits)
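The difference between a single-byte code page and the Unicode encodings can be sketched in a few lines. This is an illustrative Python snippet, not ABL and not part of the original deck; the helper names `encoded_sizes` and `can_encode` are hypothetical, and Python's `cp1252` codec stands in for code page 1252.

```python
def encoded_sizes(ch):
    """Return how many bytes `ch` occupies in each encoding."""
    encodings = ["cp1252", "utf-8", "utf-16-le", "utf-32-le"]
    return {name: len(ch.encode(name)) for name in encodings}


def can_encode(ch, encoding):
    """True if `ch` belongs to the repertoire of `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sizes = encoded_sizes("\u00e9")          # é, present in code page 1252
print(sizes)
print(can_encode("\u03a9", "cp1252"))    # Ω (Greek) is outside code page 1252
print(can_encode("\u03a9", "utf-8"))     # every Unicode character fits in UTF-8
```

The same character takes one byte in the code page but two or four bytes in the Unicode encodings, and a character outside the code page's regional repertoire cannot be represented there at all.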

4 OpenEdge Products: International readiness
  - OpenEdge 10 products support UTF-8 (Unicode)
      - Database (Personal, Workgroup, Enterprise)
      - Application Servers [AppServer, WebSpeed] (Basic, Enterprise)
      - GUI Clients (Client Networking, WebClient) and Batch Client
  - Exceptions
      - Character Client and DataServers: use code-pages instead
  - Code-pages and Unicode can interoperate

5 Configurations
  [Diagram: OpenEdge components labeled with the encodings they support]
  - OpenEdge Application Servers (AppServer™, WebSpeed®): UTF-8 or code-pages
  - OE Batch Client: UTF-8 or code-pages
  - OpenEdge RDBMS: UTF-8 or code-pages
  - OpenEdge DataServers (Oracle, MS SQL, ODBC): code-pages
  - GUI client: UTF-8 or code-pages
  - Character client: code-pages
  - SQL clients: UTF-8
  - Web Service Client

6 Translation Products
  - Translation Manager (TranMan)
  - Visual Translator (VisTran)
  - Product life cycle
      - Progress V9: Functionally Stable
      - OpenEdge 10: Active
  Note: TranMan and VisTran run on Windows only; however, they can be used to manage translations of ChUI or GUI applications.


8 Support for GB18030 Code Page
  - Chinese code page
  - Required for all new software sold in mainland China

9 Support for GB18030 Code Page
  - Why is this code page unique?
      - Does not fit into the lead-byte / trail-byte model
      - It has 1-, 2-, and 4-byte characters
      - Cannot tell from the lead byte whether the character has 2 or 4 bytes
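The variable-length behaviour described above can be observed directly. The following Python sketch is an illustration only (not ABL, not from the deck) and relies on Python's built-in `gb18030` codec; the function names are hypothetical. It shows that a 2-byte and a 4-byte character can only be told apart by looking at the second byte, which is an ASCII digit (0x30 to 0x39) in a 4-byte sequence.

```python
def gb18030_length(ch):
    """Number of bytes one character occupies in GB18030."""
    return len(ch.encode("gb18030"))


def is_four_byte_sequence(data):
    """Decide from the SECOND byte whether a GB18030 sequence is 4 bytes long.

    The lead byte alone is not enough: 2-byte and 4-byte characters share
    the same lead-byte range (0x81 and above). The second byte of a 4-byte
    character is 0x30-0x39, while a 2-byte character has 0x40 or above.
    """
    return len(data) >= 2 and data[0] >= 0x81 and 0x30 <= data[1] <= 0x39


samples = ["A", "\u4e2d", "\U0001F600"]   # ASCII letter, a Han character, an emoji
lengths = [gb18030_length(ch) for ch in samples]
four_byte_flags = [is_four_byte_sequence(ch.encode("gb18030")) for ch in samples]

for ch, n, flag in zip(samples, lengths, four_byte_flags):
    print(repr(ch), ch.encode("gb18030").hex(), n, flag)
```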

10 Support for GB18030 Code Page
  - Supported by converting the GB18030 code page to and from UTF-8
      - Requires cpinternal to be UTF-8
          - There is no cpinternal for GB18030
      - Reading and writing a file in GB18030
          - Converts to/from UTF-8
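The convert-on-read, convert-on-write approach can be pictured as a round trip through UTF-8. The sketch below is Python rather than ABL and does not show how OpenEdge implements the conversion; `transcode` is a hypothetical helper, and the byte string stands in for the contents of a GB18030 file.

```python
def transcode(data, source, target):
    """Re-encode bytes from one code page to another via Unicode text."""
    return data.decode(source).encode(target)


text = "\u4e2d\u6587 text"                      # sample content, assumed for illustration
gb_bytes = text.encode("gb18030")               # stand-in for bytes read from a GB18030 file
utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")     # form held in a UTF-8 session
round_trip = transcode(utf8_bytes, "utf-8", "gb18030")   # form written back out

print(utf8_bytes.decode("utf-8"), round_trip == gb_bytes)
```

Because GB18030 covers the whole Unicode repertoire, the round trip is lossless in this sketch.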

11 Linguistic Sorting: The goal …
  - Unicode sorting for UTF-8
  - Language-sensitive collations
  - Tailor the application to the expectations of the locale
      - Language
      - Location (country, region, etc.)
  - Easy to use
      - Functions just like any other collation for ABL, OpenEdge Database, or SQL users
  Note: Prior to 10.0B, UTF-8 collation was a binary sort.

12 Some collation examples: Latin alphabet

  Catalan, català (ca, cat)
    -- Catalan alphabet:
    --
    -- Aa (Àà), Bb, Cc (Çç), Dd,
    -- Ee (Éé, Èè), Ff, Gg, Hh,
    -- Ii (Íí, Ïï), Jj, [Kk], Ll, Mm, Nn,
    -- Oo (Óó, Òò), Pp, Qq, Rr, Ss, Tt,
    -- Uu (Úú, Üü), Vv, [Ww], Xx, [Yy], Zz
    --
    -- L·L is ordered as L+L.
    --
    & LL << l·l <<< L·l <<< L·L

  Finnish, suomi (fi, fin)
    -- Finnish alphabet:
    --
    -- Aa, Bb, [Cc], Dd, Ee, Ff, Gg, Hh,
    -- Ii, Jj, Kk, Ll, Mm, Nn, Oo, Pp,
    -- [Qq], Rr, Ss (Šš), Tt, Uu, Vv [Ww],
    -- [Xx], Yy [Üü], Zz (Žž), [Åå], Ää
    -- [Ææ], Öö [Øø]
    --
    & V << w <<< W
    & Y << ü <<< Ü
    & Z < å <<< Å < ä <<< Ä << æ <<< Æ < ö <<< Ö << ø <<< Ø

  French, français (fr, fra)
    -- French alphabet:
    --
    -- Aa (Àà, Ââ), (Ææ), Bb, Cc (Çç), Dd,
    -- Ee (Éé, Èè, Êê, Ëë), Ff, Gg, Hh,
    -- Ii (Îî, Ïï), Jj, [Kk], Ll, Mm,
    -- Nn (Ññ), Oo (Ôô), (Œœ), Pp, Qq, Rr,
    -- Ss, Tt, Uu (Ùù, Ûû), Vv, [Ww], Xx,
    -- Yy (Ÿÿ), Zz
    --
    -- The ligatures Æ and Œ are ordered
    -- as A+E and O+E respectively.
    --
    [accentorder backward]

  Unicode 4.1 Default Collation Order (ISO/IEC 14651)
    -- Unicode default latin alphabet:
    --
    -- Aa, Bb, Cc, Dd, Ee, Əə, Ff, Gg, Hh,
    -- Ii, ı, Jj, Kk, Ll, Mm, Nn, Ŋŋ, Oo,
    -- Pp, Qq, ĸ, Rr, Ss, Tt, Ŧŧ, Uu, Vv,
    -- Ww, Xx, Yy, Zz, Þþ
    --
    -- Unicode default greek alphabet:
    --
    -- Αα, Ββ, Γγ, Δδ, Εε, Ζζ, Ηη, Θθ, Ιι,
    -- Κκ, Λλ, Μμ, Νν, Ξξ, Οο, Ππ, Ρρ, Σσς,
    -- Ττ, Υυ, Φφ, Χχ, Ψψ, Ωω
    --
    -- Unicode default cyrillic alphabet:
    --
    -- Аа, Әә, Бб, Вв, Гг, Ғғ, Дд, Ђђ, Ѓѓ,
    -- Ее, Єє, Жж, Җҗ, Зз, Ѕѕ, Ии, Іі, Її,
    -- Йй, Јј, Кк, Ққ, Ҝҝ, Лл, Љљ, Мм, Нн,
    -- Ңң, Њњ, Оо, Өө, Пп, Рр, Сс, Тт, Ћћ,
    -- Ќќ, Уу, Ўў, Үү, Ұұ, Фф, Хх, Ҳҳ, Һһ,
    -- Цц, Чч, Ҹҹ, Џџ, Шш, Щщ, Ъъ, Ыы, Ьь,
    -- Ээ, Юю, Яя
    --

13 Linguistic Sorting: Internals
  - OpenEdge Database meta-schema
      - Table _DB-collate
          - Already used for single-byte sort weights
          - New functionality used for summary information
      - Table _Collation
          - Added in 10.0A in preparation
          - Can hold any amount of collation data

14 Linguistic Sorting
  - ABL usage
      - Reference the collation by name
          - For example, "ICU-fr" for French
  - Specify using -cpcoll
      - Identifies the collation table to use with the code page in memory at session startup
      - The value is either a collation table in convmap.cp or the name of an ICU collation
  - ABL statements
      - COMPARE
      - COLLATE

15 Linguistic Sorting
  - New strengths supported by COMPARE and COLLATE
      - 10.0A strengths: CASE-INSENSITIVE, CASE-SENSITIVE, CAPS, and RAW
  - Added strengths
      - PRIMARY
      - SECONDARY = CASE-INSENSITIVE
      - TERTIARY = CASE-SENSITIVE
      - QUATERNARY

16 Linguistic Sorting: Sort order depends on selected collation

    /* French collation */
    DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")).

    /* Spanish collation */
    DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")).

  - Output of the above statements
      ICU-fr = yes
      ICU-es = no

17 Linguistic Sorting
  - OpenEdge uses collations for
      - The -cpcoll startup parameter
      - The database collation
      - The collation of a database CLOB column
      - An argument to the COMPARE function or the COLLATE option of the BY phrase

18 Linguistic Sorting: Rules
  - Once a collation is specified for the database in the _Collation table, it cannot be modified
  - Once the collation is written to the _Collation table, it is the only collation with that name that the database can use
  - Backing up the database before using an ICU collation is strongly recommended

19 Linguistic Sorting: Example 1 of 4
  - The following examples assume a UTF-8 database with the "basic" collation
      - Names: beet, carrot, çedilla, entry, école, trust, zoom

    FOR EACH words WHERE name < "t":
        DISPLAY name.
    END.

  - Output result
      beet
      carrot
      entry

20 Linguistic Sorting: Example 2 of 4

    FOR EACH words WHERE name >= "t":
        DISPLAY name.
    END.

  - Output result
      trust
      zoom
      école
      çedilla

21 Linguistic Sorting: Example 3 of 4

    FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
        DISPLAY name.
    END.

  - Output result
      beet
      carrot
      entry
      école
      çedilla
  - Before, without COMPARE
      beet
      carrot
      entry

22 Linguistic Sorting: Example 4 of 4

    FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
        BY COLLATE(name, "case-insensitive", "ICU-en"):
        DISPLAY name.
    END.

  - Output result
      beet
      carrot
      çedilla
      école
      entry
  - Before, without BY COLLATE
      beet
      carrot
      entry
      école
      çedilla
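The gap between binary ordering and linguistic ordering in the four examples can be reproduced outside OpenEdge. The sketch below is Python, not ABL, and it does not use ICU: the hypothetical `accent_folded_key` function only approximates a linguistic sort key by lowercasing and dropping accents, which is enough for this word list but is not a general replacement for an ICU collation.

```python
import unicodedata


def accent_folded_key(word):
    """Rough stand-in for a linguistic sort key: lowercase, accents removed."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# The word list from the examples; \u00e7 is ç and \u00e9 is é.
names = ["beet", "carrot", "\u00e7edilla", "entry", "\u00e9cole", "trust", "zoom"]

binary_order = sorted(names)                              # plain code point order
linguistic_order = sorted(names, key=accent_folded_key)   # accents ignored
before_t = [n for n in linguistic_order if accent_folded_key(n) < "t"]

print(binary_order)
print(linguistic_order)
print(before_t)
```

In code point order the accented words fall after "zoom", which is why a plain `name < "t"` test misses them; with the folded key they sort next to their unaccented neighbours, matching the order shown in example 4.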

23 Linguistic Sorting: Supported Collations
  - OpenEdge supports the ICU collations in the icui18n library for supported OpenEdge languages
  - One additional collation is supported: Japanese Hiragana Quaternary as case-sensitive
      - ICU-ja__HQ = Japanese Hiragana Quaternary
      - Uses the QUATERNARY strength as the CASE-SENSITIVE strength

24 Linguistic Sorting: ICU Collations Available, 1 of 3

    Collation              Language
    ICU-UCA                UCA (default Unicode Collation Algorithm)
    ICU-ar                 Arabic
    ICU-be                 Belarusian
    ICU-bg                 Bulgarian
    ICU-ca                 Catalan
    ICU-cs                 Czech
    ICU-da                 Danish
    ICU-de__PHONEBOOK      German phonebook
    ICU-el                 Greek
    ICU-en_BE              English Belgium
    ICU-eo                 Esperanto
    ICU-es                 Spanish
    ICU-es__TRADITIONAL    Spanish traditional
    ICU-et                 Estonian
    ICU-fa                 Persian
    ICU-fi                 Finnish
    ICU-fr                 French
    ICU-gu                 Gujarati

25 Linguistic Sorting: ICU Collations Available, 2 of 3

    Collation              Language
    ICU-he                 Hebrew
    ICU-hi                 Hindi
    ICU-hi__DIRECT         Hindi direct
    ICU-hr                 Croatian
    ICU-hu                 Hungarian
    ICU-is                 Icelandic
    ICU-ja                 Japanese
    ICU-ko                 Korean
    ICU-kn                 Kannada
    ICU-lt                 Lithuanian
    ICU-lv                 Latvian
    ICU-mk                 Macedonian
    ICU-mr                 Marathi
    ICU-mt                 Maltese
    ICU-nb                 Norwegian Bokmål
    ICU-nn                 Norwegian Nynorsk
    ICU-pl                 Polish
    ICU-ro                 Romanian

26 Linguistic Sorting: ICU Collations Available, 3 of 3

    Collation              Language
    ICU-ru                 Russian
    ICU-sh                 Serbo-Croatian
    ICU-sk                 Slovak
    ICU-sl                 Slovenian
    ICU-sq                 Albanian
    ICU-sr                 Serbian
    ICU-sv                 Swedish
    ICU-ta                 Tamil
    ICU-te                 Telugu
    ICU-th                 Thai
    ICU-tr                 Turkish
    ICU-uk                 Ukrainian
    ICU-vi                 Vietnamese
    ICU-zh                 Chinese
    ICU-zh__PINYIN         Chinese Pinyin
    ICU-zh_HK              Chinese Hong Kong
    ICU-zh_MO              Chinese Macau
    ICU-zh_TW              Chinese Taiwan

27 Collations Gotchas
  - If the database, clients, and servers use different collations (-cpcoll), indexed and non-indexed queries may return different results
  - If a client needs a different collation than the database, you can use COMPARE and COLLATE on the client
      - Performance impact with large result sets

28 Configuration Gotchas: Typical character client configuration, 1/2
  - Database code-page is 1252 on a Windows server
  - The OpenEdge install startup.pf setting is:
      -cpinternal 1252
      -cpstream 1252
  - French Windows client with a default Windows code page of 1252 and a DOS system code page of ibm850
  - The DOS Character Client starts without specifying -cpinternal and -cpstream, so it uses 1252 from startup.pf

29 Configuration Gotchas: Typical character client configuration, 2/2
  - The user enters "è" (hex 8A in ibm850)
  - Because the session is started with -cpinternal 1252, OpenEdge does not convert when writing to the database. The entered value is written to the database as 8A, when it should be E8 (1252)
  - Fix: start the Character Client with -cpinternal and -cpstream set to ibm850
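The byte values in this scenario can be checked with any tool that knows both code pages. The following Python sketch (an illustration, not ABL; Python's `cp850` and `cp1252` codecs stand in for ibm850 and 1252, and the variable names are hypothetical) reproduces the mismatch and the correct conversion.

```python
typed = "\u00e8"                                  # è, as entered by the user

keyboard_bytes = typed.encode("cp850")            # what an ibm850 console delivers
stored_wrong = keyboard_bytes                     # written unconverted, as if it were already 1252
stored_right = keyboard_bytes.decode("cp850").encode("cp1252")  # converted on the way in

seen_by_1252_reader = stored_wrong.decode("cp1252")

print(keyboard_bytes.hex(), stored_right.hex())   # 8a e8
print(seen_by_1252_reader)                        # the character a 1252 client would display
```

Byte 8A is a valid but different character in code page 1252, so the bad write raises no error; the data is silently wrong until another client reads it.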

30 Unicode Normalization: What is Normalization?
  - Unicode has different ways of expressing the same character
  - Decomposed
      - Á = (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´)
  - Composed
      - Á = (U+00C1, Latin Capital Letter A with Acute)
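The two spellings of Á can be compared directly. This is a Python illustration using the standard `unicodedata` module, not the ABL NORMALIZE function; the variable names are chosen for the example.

```python
import unicodedata

decomposed = "\u0041\u0301"   # A followed by a combining acute accent
composed = "\u00C1"           # precomposed Á

equal_raw = decomposed == composed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed
equal_nfd = unicodedata.normalize("NFD", composed) == decomposed

print(len(decomposed), len(composed))   # 2 code points versus 1
print(equal_raw, equal_nfc, equal_nfd)
```

The strings render identically but compare unequal until both are brought to the same normalization form.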

31 Unicode Normalization: Why Normalization?
  - XML (and other W3C entities) expects data in "NFC" form
  - Best way to convert from Unicode to other code pages
  - Useful when doing tasks such as making comparisons
  Note: NFC = Canonical Decomposition, followed by Canonical Composition

32 Unicode Normalization: NORMALIZE Language Function

    result-string = NORMALIZE(source-string, normalization-mode)

  - NORMALIZE returns either CHAR or LONGCHAR
      - Matches the source string
  - A CHAR variable must be UTF-8
  - A LONGCHAR variable can be any form of Unicode
      - UTF-8, UTF-16, UTF-32

33 Normalization Modes Supported
  - NFD: Canonical Decomposition
  - NFC: Canonical Decomposition, followed by Canonical Composition (default)
  - NFKD: Compatibility Decomposition
  - NFKC: Compatibility Decomposition, followed by Canonical Composition
  - None: No change to the source string. Turns off normalization when normalization-mode is a variable
  Note: Normalization modes come from the ICU library.
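The effect of each mode is easiest to see on a string that contains both a compatibility character and an accented letter. The sketch below is Python with the standard `unicodedata` module, not ABL; the `normalize` wrapper and its handling of "NONE" are assumptions made to mirror the list above, not the NORMALIZE function itself.

```python
import unicodedata


def normalize(text, mode="NFC"):
    """Apply a normalization mode; "NONE" returns the text unchanged."""
    mode = mode.upper()
    if mode == "NONE":
        return text
    return unicodedata.normalize(mode, text)


# "fiancé" written with the fi ligature (U+FB01) and a precomposed é (U+00E9).
sample = "\uFB01anc\u00E9"

lengths = {}
for mode in ["NFD", "NFC", "NFKD", "NFKC", "NONE"]:
    result = normalize(sample, mode)
    lengths[mode] = len(result)
    print(mode, len(result), [hex(ord(c)) for c in result])
```

The canonical forms (NFD, NFC) leave the ligature alone and only split or join the accent, while the compatibility forms (NFKD, NFKC) also replace the ligature with the plain letters "f" and "i".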

34 Unicode Normalization: Additional information
  - Unicode Normalization Forms
      - Recommended for understanding the normalization forms used with the NORMALIZE function
      - http://www.unicode.org/unicode/reports/tr15/
  - International Components for Unicode (ICU)
      - Libraries and globalization, in-depth information
      - http://icu.sourceforge.net/userguide/intro.html

35 Default Word-Break Tables
  - Prior to 10.1A
      - Users had to configure word-break tables for use with double-byte and UTF-8 databases

36 Default Word-Break Tables: 10.1A simplifies implementing double-byte databases
  - Default word-break tables added for:
      - Double-byte databases
      - UTF-8 databases
  - These are available "out of the box"
      - Either in the product or for download
  - Simplifies accessing non-single-byte databases

37 Default Word-Break Tables: 10.1A simplifies implementing double-byte databases
  - 10.1A provides 10 compiled files
      - See the list on the next slide
      - Ranging from proword.245 to proword.254
  - Located in the subdirectory with the corresponding empty databases
      - Subdirectory prolang/

38 Default Word-Break Tables: Compiled, available out of the box
  - Available as part of the Supplemental PROMSGS package
  - Available for download

    Language                 Code page      File
    Japanese                 SHIFT-JIS      proword.253
    Japanese                 EUCJIS         proword.250
    Korean                   CP949          proword.248
    Korean                   KSC5601        proword.252
    Chinese (simplified)     CP936          proword.247
    Chinese (simplified)     GB2312         proword.251
    Chinese (traditional)    CP950          proword.249
    Chinese (traditional)    BIG-5          proword.246
    Chinese (traditional)    CP950-HKSCS    proword.245
    UTF-8                                   proword.254

39 Default Word-Break Tables
  - What if you are already using a proword file in the range 245 to 254?
      - Copy the file to a proword file whose numeric suffix is less than 240
      - Apply the word rule to the database
          - No index-build is required for this change
  - Remember to apply the change in all tiers (client, server, database) to prevent corruption!


41 For More Information, go to…
  - Expand to New Countries Business Empowerment Program
      - Contact your Account Manager
  - Product documentation
      - OpenEdge Development: Internationalizing Applications
      - OpenEdge Development: Visual Translator
      - OpenEdge Development: Translation Manager
  - Visit PSDN for white papers and presentations, for example:
      - "Understanding Internationalization" web seminar
  - Training and Professional Services: www.progress.com


43 In Summary
  - Use UTF-8
  - GB18030
  - Linguistic Sorting and Collations
      - Use ICU-*
  - Unicode Normalization
  - Default word-break tables and double-byte
  - Expand to New Countries Business Empowerment Program

44 Questions?

45 Thank you for your time
