DEV-23: Global Applications and Code Pages Jordi Sastre Application Architect PSC IT.

© 2007 Progress Software Corporation

Introduction
• Global applications need to deal with several languages, countries and time zones
• Do's and don'ts about globalization using OpenEdge® technology
• Based on real experience from an IT department
• Not a complete review of OpenEdge features

Agenda
• Code Pages Overview
• OpenEdge Settings
• Common Mistakes
• Hints & Tips
• Linguistic Sorting and Collation
• Time Zones
• Summary
• Questions

Code Pages Overview
• A code page is a table that maps characters to numbers (code points)
• ASCII was created in 1963 to encode 128 characters based on the English alphabet
• ASCII = “American Standard Code for Information Interchange”
• EBCDIC = “Extended Binary Coded Decimal Interchange Code”
• 8-bit code pages appeared for other languages, encoding up to 256 characters

Code Pages Overview
• All code pages include the ASCII encoding in the first 128 code points, except EBCDIC
• A single code page does not contain all characters for all languages, except Unicode
• A character may have different code points in different code pages
• Data may become corrupted when transferred between two different code pages

Data Corruption
[Diagram: the byte E8 is “è” in France (ISO8859-1) but “č” in the Czech Republic (1250)]
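The E8 case on this slide can be reproduced in a few lines. This is a minimal sketch in Python, used here only as a neutral illustration and not as OpenEdge code; it assumes Python's `iso8859-1` and `cp1250` codecs correspond to the ISO8859-1 and 1250 code pages named on the slide.

```python
# One byte, two readings: the meaning depends on the code page used to decode it.
raw = bytes([0xE8])

as_western = raw.decode("iso8859-1")  # Western European reading
as_central = raw.decode("cp1250")     # Central European (Windows) reading

print(f"E8 -> {as_western!r} in ISO8859-1, {as_central!r} in 1250")
```

The byte never changes; only the table used to interpret it does.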

Data Corruption
• English uses the 128 codes that are common to all code pages, including Unicode
• Problems may occur when:
  - Handling non-English data
  - Using platforms with non-English settings
  - Pasting MS Office text, even in English

8-bit Code Pages
• ISO8859-1 and ISO8859-2 were defined by ISO and are mainly used on Unix systems.
• 1250 and 1252 were defined by Microsoft and are used on MS Windows.
• IBM437, IBM850 and IBM852 were defined by IBM and are used on PC-DOS/MS-DOS.

8-bit Code Pages
• ISO8859-1, IBM850 and 1252 are used for Western European languages: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish, etc.
• ISO8859-2, IBM852 and 1250 are used for Central European languages: Czech, Hungarian, Polish, German, etc.
• IBM437 is mainly used for English, although it contains some extra characters

8-bit Code Pages
• Examples of character encoding (hexadecimal byte values):

     ISO8859-1  ISO8859-2  1250  1252  IBM437  IBM850  IBM852
  a  61         61         61    61    61      61      61
  á  E1         E1         E1    E1    A0      A0      A0
  È  C8         n/a        n/a   C8    n/a     D4      n/a
  Č  n/a        C8         C8    n/a   n/a     n/a     AC
  “  n/a        n/a        93    93    n/a     n/a     n/a

8-bit Code Pages
• Where to find code page tables:
  - 10.1B Internationalizing Applications manual (IBM850 and ISO8859-1)
  - ibm.com/servers/eserver/iseries/software/globalization/codepages.html

Unicode
What is Unicode?
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
• ISO/IEC
• It covers virtually ALL characters in the world!

Unicode and UTF
• Unicode stands for “Unique Code”
• UTF stands for “Unicode Transformation Format”
• UTF is not a code page, but an encoding format for the Unicode code page
• UTF encodes Unicode codes into 1 to 4 bytes
• UTF-8, UTF-16 and UTF-32 are the three basic encoding forms supported by Unicode
• All UTF formats handle all Unicode codes

UTF Encoding Examples

  Unicode   UTF-8         UTF-16        UTF-32
  U+004D    4D            00 4D         00 00 00 4D
  U+00A1    C2 A1         00 A1         00 00 00 A1
  U+00E1    C3 A1         00 E1         00 00 00 E1
  U+0470    D1 B0         04 70         00 00 04 70
  U+4E9C    E4 BA 9C      4E 9C         00 00 4E 9C
  U+10702   F0 90 9C 82   D8 01 DF 02   00 01 07 02
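Rows like these can be regenerated rather than copied by hand. The sketch below uses Python's built-in codecs; the helper name `hex_bytes` is made up for this example, and big-endian byte order is assumed for UTF-16 and UTF-32.

```python
def hex_bytes(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

for code_point in (0x004D, 0x00A1, 0x00E1, 0x4E9C):
    ch = chr(code_point)
    print(
        f"U+{code_point:04X}",
        hex_bytes(ch, "utf-8"),
        hex_bytes(ch, "utf-16-be"),
        hex_bytes(ch, "utf-32-be"),
        sep=" | ",
    )
```

UTF-8 grows from one to three bytes across these four code points, while UTF-32 always uses four.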

Unicode Conversion
• All code pages convert to Unicode
• Unicode may not convert to other code pages
[Diagram: IBM and ISO code pages (IBM437, IBM852, ...) all convert to Unicode; the conversion from Unicode back to each of them is marked “?”]

OpenEdge Settings
• Database settings
  - _db._db-xl-name: Database code page
  - _db._db-coll-name: Database collation
• Startup parameters
  - -cpinternal: Process code page
  - -cpstream: Input/Output code page
  - -cpcoll: Process collation
  - -d: Date format
  - -E: Numeric format

More OpenEdge Settings
• -cplog: Code page for log files (-cpstream)
• -cpterm: Code page for screen I/O (-cpstream)
• -cpprint: Code page for printing (-cpstream)
• -numsep: Separator for thousands (-E)
• -numdec: Separator for decimals (-E)
• -cprcodein/-cprcodeout: Code page for compiled code (-cpinternal)
• -lng: Translation Manager language

Even More OpenEdge Settings
• convmap.cp: Character Processing Tables
• progress.ini: Fonts
(More parameters in documentation)

OpenEdge Settings: _db-xl-name, -cpinternal and -cpstream
[Diagram: the OpenEdge process (-cpinternal) exchanges data with the database (_db-xl-name) and, through -cpstream, with OS files, the printer, and the keyboard and screen, which are drawn separately for GUI and CHUI. OpenEdge performs code page conversions at these boundaries.]

OpenEdge Settings
[Diagram: each OpenEdge process has its own -cpinternal and -cpstream: DB server (_mprosrv), GUI client (prowin32), CHUI client (_progres), WebSpeed™ (_progres -web) and AppServer™ (_proapsv). They exchange data with OS files, keyboard, screen, printer, web browser and the database (_db-xl-name).]

OpenEdge Settings
• Since OpenEdge 10 supports UTF-8 in most processes…
• … just configure all OE settings to UTF-8!
• Well, not really. We need to look at:
  - Operating System
  - Web Server
  - Printer drivers
  - Data from/to other systems
  - OCXs
  - Terminal Emulators, etc.

OpenEdge Settings: _db-xl-name (metaschema field)
• Database should use Unicode (UTF-8) to ensure support for all characters

OpenEdge Settings: -cpinternal (startup parameter)
• Processes should use Unicode to ensure support for all characters
• Best if -cpinternal matches database
• Batch Client (_progres -b) can use Unicode, but Character Client (_progres) cannot
• Interfaces with Windows controls

OpenEdge Settings: -cpstream (startup parameter)
• -cpstream is the main cause of data corruption when set incorrectly
• It specifies the code page of input/output data from/to files
• On Character Client it also specifies the code page of keyboard and screen
• Rule of thumb:
  - Set -cpstream to match the Operating System code page
  - Use ABL to override -cpstream when needed

OpenEdge Settings: -cpstream (startup parameter)
• Unix/Linux code page:
    % locale charmap
    ISO8859-1
• DOS code page:
    C:\>mode con cp
    Status for device CON:
    Code page: 437

OpenEdge Settings: convmap.cp (OpenEdge file)
• Contains the Character Processing Tables
• DLC/convmap.cp
• DLC/prolang/convmap/convmap.dat
• OpenEdge 10.1B out of the box contains:
  - 54 code pages
  - 595 code page conversion tables
  - 491 collation tables
• More tables can be added

OpenEdge Settings: progress.ini (OpenEdge file)
• Use appropriate fonts for code page and language:
  - Recommended to replace MS Sans Serif with Microsoft Sans Serif
  - MS Gothic or MS Mincho for Japanese
  - MS Song for Chinese
• Use script when needed:
    font0=Courier New, size=8, script=russian
    font0=Courier New, size=8, script=easteurope

Not an OpenEdge Setting: Windows Fonts
• Linked fonts
• Information about Windows fonts:

OpenEdge Settings: Summary
• Meet requirements for Input/Output:
  - -cpinternal for process and GUI I/O (UTF-8)
  - -cpstream for file I/O and CHUI I/O (OS)
• Decide the code page when exporting data
• Know the code page when importing data
[Diagram: OpenEdge process (-cpinternal) between the database (_db-xl-name) and OS files, printer, keyboard and screen (-cpstream; GUI and CHUI)]

Common Mistakes: Loading or importing data with the wrong code page
[Diagram: a file containing the bytes C4 8C 7A … (“Čzech”) loads as “ÄŚzech” when read with an ISO code page and as “Čzech” when read as UTF-8]
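The garbled text on this slide is easy to reproduce. In this Python sketch the "wrong" code page is assumed to be Windows-1250 (`cp1250`), chosen only because it yields exactly the characters shown; the slide's own label is truncated.

```python
# Bytes written by a UTF-8 exporter.
raw = "Čzech".encode("utf-8")
print(raw[:3].hex(" ").upper())   # first three bytes of the file

# The same bytes read back with the right and the wrong code page.
right = raw.decode("utf-8")
wrong = raw.decode("cp1250")      # assumption: 1250 stands in for the wrong code page

print(right, wrong)
```

No error is raised by the wrong decode, which is why this mistake usually goes unnoticed until someone reads the data.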

Byte Order Mark (BOM)
• Identifies the UTF encoding of a data file
• Unicode code point U+FEFF
• U+FEFF is encoded as:
  - UTF-8: EF BB BF
  - UTF-16BE: FE FF
  - UTF-16LE: FF FE
  - UTF-32BE: 00 00 FE FF
  - UTF-32LE: FF FE 00 00
• OpenEdge understands BOMs when reading
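As an illustration of how a reader can use these marks, here is a small BOM sniffer in Python. It is a sketch, not what OpenEdge does internally; the function name `sniff_bom` is hypothetical, while the `codecs.BOM_*` constants are part of the standard library.

```python
import codecs

# Check the 4-byte marks first: the UTF-32LE BOM begins with the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_bom(data):
    """Return (encoding, payload without BOM), or (None, data) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data

encoding, payload = sniff_bom(b"\xef\xbb\xbf\xc4\x8cz")
print(encoding, payload.decode(encoding))
```

A file without a BOM returns `None`, in which case the code page has to come from somewhere else, such as a setting or a stamp in the file.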

Byte Order Mark (BOM)
[Diagram: the same file starting with the UTF-8 BOM (EF BB BF C4 8C 7A …) is shown as “Čzech”, with the ISO and UTF labels and a “Caution!” marker]

Common Mistakes: Loading or importing data with the wrong code page

    (…)
    "imuller" "Ian Muller" "Y" "C"
    "jdoe" "Jane Doe" "N" "U"
    "jsmith" "John Smith" "Y" "C"
    "jsanchez" "Juan Sánchez" "Y" "C"
    PSC
    filename=users
    records=
    ldbname=mydatabase
    timestamp=2007/03/28-20:55:03
    numformat=44,46
    dateformat=mdy-1950
    map=NO-MAP
    cpstream=ISO
    (…)

Common Mistakes: Updating data with the wrong code page
[Diagram: on an OS using 1252 the user types “à” (byte E0). The client (_progres) runs with -cpstream IBM850 and -cpinternal IBM850, so E0 is taken as the IBM850 character “Ó” and is sent to the server (_mprosrv, ISO -cpinternal, _db-xl-name ISO8859-1) as D3. The database stores “Ó” instead of “à”.]
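The chain of conversions in this scenario can be traced step by step. This Python sketch assumes the `cp1252`, `cp850` and `iso8859-1` codecs match the 1252, IBM850 and ISO8859-1 code pages on the slide; it imitates the data flow only, not OpenEdge itself.

```python
typed = "à"

# The OS (1252) delivers the keystroke as a single byte.
keyboard_bytes = typed.encode("cp1252")

# Wrong -cpstream: the client believes the byte is IBM850.
misread = keyboard_bytes.decode("cp850")
stored_wrong = misread.encode("iso8859-1")

# Correct -cpstream: the byte is read as 1252 before conversion.
stored_right = keyboard_bytes.decode("cp1252").encode("iso8859-1")

print(keyboard_bytes.hex(), misread, stored_wrong.hex(), stored_right.hex())
```

Every conversion succeeds in both paths, so the only sign of trouble is the wrong character in the database.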

Common Mistakes: Updating data with the CORRECT code page
[Diagram: on an OS using 1252 the user types “à” (byte E0). The client (_progres) runs with -cpstream 1252 and -cpinternal IBM850, so E0 is read as “à”, held internally as 85, and sent to the server (_mprosrv, ISO -cpinternal, _db-xl-name ISO8859-1) as E0. The database stores “à”.]

Common Mistakes: Updating data with the wrong code page
[Diagram: WebSpeed agent (_progres -web) running with -cpstream UTF-8]

Common Mistakes: Incorrect tools to verify data
• Notepad sometimes guesses the code page based on the content
• Notepad understands BOM, Excel doesn't
• Startup parameters in Procedure Editor
• Fonts in progress.ini
• Terminal Emulator needs to be configured to support remote OS code page
• Use a hexadecimal editor
• Two wrongs may make it look right

Tips & Hints: Development and Integration
• When starting development, make sure all the components have the correct code page settings
• Each application may need different code page settings
• When integrating, review the code page settings of all applications and processes involved

Tips & Hints
How to display the code page settings:

    MESSAGE "Database = " DBCODEPAGE(1) SKIP
            "Collation = " DBCOLLATION(1) SKIP
            "-cpinternal = " SESSION:CPINTERNAL SKIP
            "-cpstream = " SESSION:CPSTREAM SKIP
            "-cpcoll = " SESSION:CPCOLL SKIP
        VIEW-AS ALERT-BOX.

Tips & Hints: Temp-tables using Word Indexes
• Temp-tables use their own word-break tables for word indexes
• Use the -ttwrdrul parameter
[Diagram: proutil -C wbreak-compiler builds the word-break table; proutil -C word-rules attaches it to the database; -ttwrdrul attaches it to the Progress clients (prowin32, _progres [-web])]

Tips & Hints: Input/Output
• When using OUTPUT TO, know the code page the output must be converted to, which depends on how the file will be used
• When using INPUT FROM, know the code page in which the imported data was encoded
• To override the -cpstream default:

    OUTPUT TO file CONVERT TARGET "UTF-8".
    INPUT FROM file CONVERT SOURCE "UTF-8".

• Stamp the code page, especially for integration
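The same discipline applies in any language: name the encoding on every read and write instead of relying on a default. A minimal Python sketch, using a temporary file whose name is arbitrary:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "export.txt")

# Writing: state the target code page explicitly.
with open(path, "w", encoding="utf-8") as out:
    out.write("Juan Sánchez\n")

# Reading: state the source code page explicitly.
with open(path, "r", encoding="utf-8") as src:
    text = src.read()

# The raw bytes show what actually reached the disk.
with open(path, "rb") as src:
    raw = src.read()

print(text.strip(), raw)
```

Opening the file in binary mode is the programmatic equivalent of checking it with a hexadecimal editor.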

Tips & Hints: UTF-8 can be multi-byte!
• Many UTF-8 characters are more than one byte. In a UTF-8 session the following returns 1 (characters) and 2 (bytes):

    DEFINE VARIABLE c AS CHARACTER INIT "á".
    MESSAGE LENGTH(c) SKIP LENGTH(c,"RAW")
        VIEW-AS ALERT-BOX.
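The difference between character length and byte length is the same outside ABL. A short Python sketch for comparison, where `len` on a string counts characters and `len` on encoded bytes counts bytes:

```python
c = "\u00e1"  # the letter á

chars = len(c)                             # characters
utf8_bytes = len(c.encode("utf-8"))        # bytes when stored as UTF-8
latin1_bytes = len(c.encode("iso8859-1"))  # bytes when stored as ISO8859-1

print(chars, utf8_bytes, latin1_bytes)
```

Any field width, buffer size or file offset computed in characters can therefore be too small once data moves to UTF-8.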

Tips & Hints: CHR() and ASC()
• Use CHR() and ASC() with code page parameters
• Do not hard-code encoding values
• See examples…

Tips & Hints: CHR() and ASC() – Example 1
• Detecting non-breaking blank spaces (NBSP):

    CASE SESSION:CPINTERNAL:
        WHEN "UTF-8" THEN
            IF c = CHR(49824) THEN
                MESSAGE "NBSP" VIEW-AS ALERT-BOX.
        WHEN "ISO8859-1" THEN
            IF c = CHR(160) THEN
                MESSAGE "NBSP" VIEW-AS ALERT-BOX.
    END CASE.

• Better code:

    IF c = CHR(49824,SESSION:CPINTERNAL,"UTF-8") THEN
        MESSAGE "NBSP" VIEW-AS ALERT-BOX.

Tips & Hints: CHR() and ASC() – Example 2
• OpenEdge silently ignores incorrect values passed to ASC() or CHR()

    /* When run with -cpinternal UTF-8 it returns YES because 160 is
       not a valid UTF-8 encoding.
       When run with -cpinternal 1252 it returns NO. */
    MESSAGE CHR(160) = "" VIEW-AS ALERT-BOX.

    /* Always returns NO */
    MESSAGE CHR(49824,SESSION:CPINTERNAL,"UTF-8") = ""
        VIEW-AS ALERT-BOX.

Tips & Hints: CHR() and ASC() – Example 3
• CHR() and ASC() work with encoding values, as opposed to code points
• For example, this code run on a session with -cpinternal UTF-8 returns 50081 (C3A1) and not 225 (00E1):

    DEFINE VARIABLE c AS CHARACTER NO-UNDO.
    c = "á".
    MESSAGE ASC(c) VIEW-AS ALERT-BOX.

  Unicode   UTF-8
  U+00E1    C3 A1
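The arithmetic behind these values can be checked independently. In this Python sketch the helper `encoded_value` is a made-up name that reads the encoded bytes as one big-endian integer, which is how the numbers on these slides relate to the byte sequences.

```python
def encoded_value(ch, encoding):
    """Interpret the encoded bytes of one character as a single integer."""
    return int.from_bytes(ch.encode(encoding), "big")

a_acute = "\u00e1"  # á
nbsp = "\u00a0"     # non-breaking space

print(ord(a_acute), encoded_value(a_acute, "utf-8"))               # code point vs UTF-8 value
print(encoded_value(nbsp, "iso8859-1"), encoded_value(nbsp, "utf-8"))
```

The code point stays the same in every session; the encoded value changes with the code page, which is why hard-coded numbers break.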

Tips & Hints: Unicode code points
• If needed, Unicode code points can be used:

    DEFINE VARIABLE c AS CHARACTER NO-UNDO.
    c = "á".
    MESSAGE c = "~u00E1" SKIP
            c = CHR(50081) SKIP
            c = CHR(225,"UTF-8","1252")
        VIEW-AS ALERT-BOX.

Tips & Hints: Un-corrupting data
[Diagram: on an OS using 1252 the user typed “à” (byte E0). The client (_progres) ran with -cpstream IBM850 and -cpinternal IBM850, so the server (_mprosrv, ISO -cpinternal, ISO database code page) stored D3 (“Ó”) instead of E0 (“à”).]

Tips & Hints: Un-corrupting data
• ISO database with data encoded in IBM850
• Run on a session with -cpinternal iso8859-1

    FOR EACH myTable EXCLUSIVE-LOCK:
        RUN FixChar(INPUT-OUTPUT myTable.myField).
    END.

    PROCEDURE FixChar:
        DEF INPUT-OUTPUT PARAM c AS CHAR NO-UNDO.
        c = CODEPAGE-CONVERT(c,"IBM850","ISO8859-1").
    END PROCEDURE.
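The repair works by undoing the wrong interpretation: the stored text is converted to IBM850 and the resulting bytes are then taken at face value as ISO8859-1. A Python sketch of the same idea, assuming the `cp850` and `iso8859-1` codecs stand in for the two code pages; `fix_char` is a hypothetical name.

```python
def fix_char(corrupted):
    """Re-encode as IBM850, then reinterpret those bytes as ISO8859-1."""
    return corrupted.encode("cp850").decode("iso8859-1")

print(fix_char("\u00d3"))  # the stored 'Ó' from the corruption scenario
```

Run a repair like this only once and only on rows known to be corrupted: applying it to correct data corrupts it in the opposite direction, and characters missing from IBM850 raise `UnicodeEncodeError`.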

Tips & Hints: BOM
• How to output a UTF-8 BOM to a file
• Intended for Notepad (.txt) or web browser (.html)

    OUTPUT TO text.txt CONVERT TARGET "UTF-8".
    PUT CONTROL "~357~273~277". /* BOM */
    PUT UNFORMATTED "UTF-8 text".
    OUTPUT CLOSE.
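For comparison, Python exposes the same behaviour through the `utf-8-sig` codec, which writes the three BOM bytes before the text. A minimal sketch using a temporary file:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text.txt")

# "utf-8-sig" emits EF BB BF once at the start of the file.
with open(path, "w", encoding="utf-8-sig") as out:
    out.write("UTF-8 text")

with open(path, "rb") as src:
    raw = src.read()

print(raw[:3].hex(" ").upper())
```

The octal escapes `~357~273~277` in the ABL version are the same three bytes, EF BB BF.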

Tips & Hints: Web browser needs to map WebSpeed's -cpstream
• Original outputHeader procedure:

    PROCEDURE outputHeader:
        output-content-type ("text/html").
    END PROCEDURE.

[Diagram: WebSpeed agent (_progres -web, -cpstream UTF-8) sends the page to the web browser, which does not know the encoding (“Encoding ???”)]

Tips & Hints: Web browser needs to map WebSpeed's -cpstream (1)
• Use OpenEdge's convcp.p procedure:

    PROCEDURE outputHeader:
        DEF VAR cMimeCP AS CHAR NO-UNDO.
        RUN adecomm/convcp.p(SESSION:CPSTREAM, "ToMime", OUTPUT cMimeCP).
        output-content-type ("text/html; charset=" + cMimeCP).
    END PROCEDURE.

Tips & Hints: Web browser needs to map WebSpeed's -cpstream (2)
• Use a User Defined Function
• GetMimeCP converts OpenEdge code page names to MIME names
• See example…

    PROCEDURE outputHeader:
        output-content-type ("text/html; charset=" + GetMimeCP(SESSION:CPSTREAM)).
    END PROCEDURE.

Tips & Hints: GetMimeCP example

    FUNCTION GetMimeCP RETURNS CHAR (INPUT progress-CodePage AS CHAR):
        DEF VAR pro-cplist AS CHAR INIT
            "1250,1251,1252,1253,1254,1255,1256,1257,1258, ,BIG-5,
             EUCJIS,GB2312,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,IBM437,
             IBM500,IBM850,IBM851,IBM852,IBM857,IBM858,IBM861,IBM862,IBM866,
             ISO8859-1,ISO ,ISO ,ISO8859-2,ISO8859-3,
             ISO8859-4,ISO8859-5,ISO8859-6,ISO8859-7,ISO8859-8,ISO8859-9,
             KOI8-R,KSC5601,ROMAN-8,SHIFT-JIS,UCS2,UTF-8".
        DEF VAR MIME-cplist AS CHAR INIT
            "Windows-1250,Windows-1251,Windows-1252,Windows-1253,Windows-1254,
             Windows-1255,Windows-1256,Windows-1257,Windows-1258,TIS-620,Big5,
             EUC-JP,GB_ ,IBM037,IBM273,IBM277,IBM278,IBM284,IBM297,
             IBM437,IBM500,IBM850,IBM851,IBM852,IBM857,IBM00858,IBM861,IBM862,
             IBM866,ISO ,ISO ,ISO ,ISO ,ISO ,
             ISO ,ISO ,ISO ,ISO ,ISO ,ISO ,
             KOI8-R,KS_C_ ,hp-roman8,Shift_JIS,UTF-16,UTF-8".
        DEF VAR i AS INT.
        i = LOOKUP(progress-CodePage,pro-cplist).
        RETURN IF i = 0 THEN "Unknown" ELSE ENTRY(i,MIME-cplist).
    END FUNCTION.
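The two parallel lists amount to a lookup table. As a sketch of the same idea in Python, a dictionary keeps each pair together so the lists cannot drift out of step; only a handful of pairs that are fully legible in the function above are included, and `get_mime_cp` is a hypothetical name.

```python
# Partial table: OpenEdge code page name -> MIME charset name.
OPENEDGE_TO_MIME = {
    "1250": "Windows-1250",
    "1252": "Windows-1252",
    "BIG-5": "Big5",
    "EUCJIS": "EUC-JP",
    "IBM850": "IBM850",
    "IBM858": "IBM00858",
    "KOI8-R": "KOI8-R",
    "ROMAN-8": "hp-roman8",
    "SHIFT-JIS": "Shift_JIS",
    "UCS2": "UTF-16",
    "UTF-8": "UTF-8",
}

def get_mime_cp(progress_code_page):
    """Case-insensitive lookup that falls back to 'Unknown'."""
    return OPENEDGE_TO_MIME.get(progress_code_page.upper(), "Unknown")

print("text/html; charset=" + get_mime_cp("utf-8"))
```

Sending "Unknown" as a charset is of little use to a browser, so a caller may prefer to omit the charset parameter in that case.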

Tips & Hints: Caution with numeric format
• Do not store decimal values in char fields
• prog2.p will fail if run with a different -E or -numdec than prog1.p
• Comma-delimited lists

    /* prog1.p */
    DEFINE VARIABLE d AS DECIMAL INIT ….
    CREATE table.
    table.char1 = STRING(d).

    /* prog2.p */
    FIND FIRST table.
    DISPLAY DECIMAL(table.char1).
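The failure mode is not specific to OpenEdge. In this Python sketch the helpers `to_text` and `from_text` are hypothetical stand-ins for the two programs, and the value 1234.56 is an arbitrary example.

```python
def to_text(value, decimal_point):
    """Format a number as text using the writer's decimal separator."""
    return f"{value:.2f}".replace(".", decimal_point)

def from_text(text, decimal_point):
    """Parse text using the reader's decimal separator."""
    return float(text.replace(decimal_point, "."))

stored = to_text(1234.56, ",")        # writer uses a European format
same_format = from_text(stored, ",")  # reader with the same setting

try:
    from_text(stored, ".")            # reader with a different setting
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(stored, same_format, mismatch_failed)
```

Storing the value in a numeric field avoids the problem, because the separator then only matters at display time.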

Tips & Hints
Date and Numeric formats can be changed at run time:

    DEFINE VARIABLE mynum AS DECIMAL NO-UNDO.

    SESSION:DATE-FORMAT = "mdy".
    DISPLAY SESSION:DATE-FORMAT TODAY SKIP.
    SESSION:DATE-FORMAT = "dmy".
    DISPLAY SESSION:DATE-FORMAT TODAY FORMAT " " SKIP.
    SESSION:DATE-FORMAT = "ymd".
    DISPLAY SESSION:DATE-FORMAT TODAY FORMAT " " SKIP.

    mynum = ….
    SESSION:NUMERIC-FORMAT = "American".
    DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP.
    SESSION:NUMERIC-FORMAT = "European".
    DISPLAY SESSION:NUMERIC-FORMAT STRING(mynum) SKIP
        WITH NO-LABELS.

Tips & Hints: Miscellaneous
• Never use the “undefined” code page
• If the source and target code pages are the same, no conversion happens
• If we always make the same mistake we'll not notice the data corruption
• r-code is encoded using -cpinternal
• Source files are encoded using -cpstream
• Recognize UTF-8 read as iso8859-1: ö becomes Ã¶
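The last point can be turned into a simple check. This Python heuristic is a sketch with a made-up function name: it tests whether a string, written back as ISO8859-1 bytes, forms valid multi-byte UTF-8. Short strings can match by coincidence, so treat a positive result as a hint rather than proof.

```python
def looks_like_utf8_read_as_latin1(text):
    """True if text appears to be UTF-8 bytes that were decoded as ISO8859-1."""
    try:
        repaired = text.encode("iso8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text

garbled = "ö".encode("utf-8").decode("iso8859-1")
print(garbled, looks_like_utf8_read_as_latin1(garbled))
```

Plain ASCII text and correctly decoded accented text both return False.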

Tips & Hints: DBA reminder
• How to create a UTF-8 word-break table:
    > proutil -C wbreak-compiler %DLC%\prolang\convmap\utf8-bas.wbt 1
    > copy proword.1 %DLC%
• How to create a UTF-8 database:
    > prodb %DLC%\prolang\utf\empty.db
    > proutil -C word-rules 1
• How to start a UTF-8 client:
    > _progres -b -cpinternal UTF-8 -ttwrdrul 1
    > prowin32 -cpinternal UTF-8 -ttwrdrul 1

Linguistic Sorting and Collation
• Collation: Set of rules for ordering and comparing character data
• OpenEdge supports 54 ICU (International Components for Unicode) collations with UTF-8
• Local databases vs global databases
• COMPARE and COLLATE

Linguistic Sorting and Collation: Sorting with Basic collation

    FOR EACH mytable BY myfield:
        DISPLAY myfield WITH FONT 8.
    END.

  Basic
  Aaa
  Ááá
  Äää
  Ççç
  Ĉĉĉ
  Bbb
  Ccc
  Zzz

Linguistic Sorting and Collation: Sorting with English collation

    FOR EACH mytable
        BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-UCA"):
        DISPLAY myfield WITH FONT 8.
    END.

  Basic   ICU-UCA
  Aaa     Aaa
  Ááá     Ááá
  Äää     Äää
  Ççç     Bbb
  Ĉĉĉ     Ccc
  Bbb     Ĉĉĉ
  Ccc     Ççç
  Zzz     Zzz

Linguistic Sorting and Collation: Sorting with Finnish collation

    FOR EACH mytable
        BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
        DISPLAY myfield WITH FONT 8.
    END.

  Basic   ICU-UCA   ICU-fi
  Aaa     Aaa       Aaa
  Ááá     Ááá       Ááá
  Äää     Äää       Bbb
  Ççç     Bbb       Ccc
  Ĉĉĉ     Ccc       Ĉĉĉ
  Bbb     Ĉĉĉ       Ççç
  Ccc     Ççç       Zzz
  Zzz     Zzz       Äää

Linguistic Sorting and Collation: Comparing with Basic collation

    FOR EACH mytable WHERE myfield >= "C" BY myfield:
        DISPLAY myfield WITH FONT 8.
    END.

  Basic
  Ccc
  Zzz

Linguistic Sorting and Collation: Comparing with English collation

    FOR EACH mytable
        WHERE COMPARE(myfield,">=","C","CASE-INSENSITIVE","ICU-UCA")
        BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-UCA"):
        DISPLAY myfield WITH FONT 8.
    END.

  Basic   ICU-UCA
  Ccc     Ccc
  Zzz     Ĉĉĉ
          Ççç
          Zzz

Linguistic Sorting and Collation: Comparing with Finnish collation

    FOR EACH mytable
        WHERE COMPARE(myfield,">=","C","CASE-INSENSITIVE","ICU-fi")
        BY COLLATE(myfield,"CASE-INSENSITIVE","ICU-fi"):
        DISPLAY myfield WITH FONT 8.
    END.

  Basic   ICU-UCA   ICU-fi
  Ccc     Ccc       Ccc
  Zzz     Ĉĉĉ       Ĉĉĉ
          Ççç       Ççç
          Zzz       Zzz
                    Äää
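Why a generic and a Finnish collation disagree about “Äää” can be shown with a toy sort key. The Python sketch below is not ICU and not what OpenEdge runs: it compares base letters first and accents second, and the `letters_after_z` parameter is a made-up way to imitate a language tailoring that treats an accented letter as a letter of its own.

```python
import unicodedata

def collation_key(text, letters_after_z=""):
    """Toy key: base letters decide first, accents only break ties."""
    primary, accents = [], []
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch in letters_after_z:
            # Tailoring: this accented letter sorts as its own letter, after 'z'.
            primary.append(ord("z") + 1 + letters_after_z.index(ch))
            continue
        base, *marks = unicodedata.normalize("NFD", ch)
        primary.append(ord(base))
        accents.extend(ord(mark) for mark in marks)
    return (primary, accents)

words = ["Zzz", "Äää", "Ccc", "Ççç", "Aaa", "Ĉĉĉ", "Bbb", "Ááá"]

generic = sorted(words, key=collation_key)
finnish = sorted(words, key=lambda w: collation_key(w, "ä"))

# Comparison must use the same key as sorting, otherwise filters and order disagree.
from_c_generic = [w for w in generic if collation_key(w) >= collation_key("C")]
from_c_finnish = [w for w in finnish
                  if collation_key(w, "ä") >= collation_key("C", "ä")]

print(generic, finnish, from_c_generic, from_c_finnish, sep="\n")
```

With this key, “Äää” sorts next to “Aaa” in the generic order and after “Zzz” in the tailored one, so it passes a `>= "C"` filter only in the tailored comparison.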

Linguistic Sorting and Collation: Global Setup
[Diagram: the database and the AppServer run with -cpcoll ICU-uca; the AppServer uses the client collation in COMPARE and COLLATE. Each user has local temp-tables and a language-specific collation: English user (-cpcoll ICU-en), French user (-cpcoll ICU-fr), Czech user (-cpcoll ICU-cs), Finnish user (-cpcoll ICU-fi).]

    RUN ASprg.p ON hAppServer
        (INPUT SESSION:CPCOLL, INPUT USERID, INPUT …,
         OUTPUT TABLE ttMytable).

Caution with performance!

Time Zones: Considerations
• Timestamps: client vs server vs GMT
• Display time: saved vs converted
• Database queries: saved vs converted

Time Zones: extra consideration
• Daylight Saving Time for time conversions
[Map legend: DST used, DST no longer used, DST never used]

Time Zones: OS support
• Operating systems have time zone tables
    Solaris: /usr/share/lib/zoneinfo
    HP-UX:   /usr/lib/tztab
    Red Hat: /usr/share/zoneinfo
    Windows: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones
• Java uses its own time zone tables
• OpenEdge relies on the platform

Time Zones: DATETIME and DATETIME-TZ data types
    DEFINE VARIABLE dt  AS DATETIME.
    DEFINE VARIABLE dtz AS DATETIME-TZ.
    dt  = NOW.
    dtz = NOW.
    MESSAGE dt SKIP dtz VIEW-AS ALERT-BOX.
Callout on the DATETIME-TZ output: this is an offset, not a time zone!
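
To make the "offset, not time zone" point concrete, here is a small sketch, assuming two hand-picked example values (the dates, the variable names, and the +120 / +60 offsets are illustrative only). A DATETIME-TZ carries only a number of minutes from GMT, which the TIMEZONE function returns; it does not know which region produced it or when that region switches to DST.

```abl
DEFINE VARIABLE dtzSummer AS DATETIME-TZ NO-UNDO.
DEFINE VARIABLE dtzWinter AS DATETIME-TZ NO-UNDO.

/* DATETIME-TZ(month, day, year, hours, minutes, seconds, ms, offset-in-minutes) */
dtzSummer = DATETIME-TZ(7, 1, 2007, 14, 30, 0, 0, 120).  /* GMT+2, e.g. a site on DST  */
dtzWinter = DATETIME-TZ(1, 15, 2007, 13, 30, 0, 0, 60).  /* GMT+1, same site in winter */

/* TIMEZONE(datetime-tz) returns the stored offset in minutes: 120 and 60. */
MESSAGE "Summer offset:" TIMEZONE(dtzSummer) SKIP
        "Winter offset:" TIMEZONE(dtzWinter)
    VIEW-AS ALERT-BOX.
```

The two offsets had to be supplied by the caller. Choosing the right one for a given date is exactly the DST bookkeeping that the data type does not do on its own.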

Time Zones: timestamping
• Database: all times are GMT
• AppServer: gets OS time in GMT
• User: GMT is converted to the user's time zone

Time Zones: displaying times
• Database: GMT times
• AppServer / User: GMT is converted to the user's time zone
Example for 12:30 GMT (hours of difference from the summer value in parentheses):
    City                  Summer   Winter
    Bedford, USA          08:30    07:30 (-1)
    Berlin, Germany       14:30    13:30 (-1)
    Brisbane, Australia   22:30    22:30 (0)
    Sydney, Australia     22:30    23:30 (+1)
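
The summer column above can be reproduced with plain offset arithmetic. This is a sketch only: the function name LocalTime and the hard-coded offsets in minutes (derived from the table, -240, +120, +600, +600) are assumptions for illustration, and a real application would look the offset up per user and date.

```abl
/* Hypothetical helper: apply an offset in minutes to a GMT time and */
/* return the local wall-clock time as HH:MM.                        */
FUNCTION LocalTime RETURNS CHARACTER
    (INPUT pdtGMT AS DATETIME, INPUT piOffset AS INTEGER):
    DEFINE VARIABLE dtLocal AS DATETIME NO-UNDO.
    dtLocal = ADD-INTERVAL(pdtGMT, piOffset, "minutes").
    /* MTIME returns milliseconds since midnight; STRING(seconds, "HH:MM") formats them. */
    RETURN STRING(INTEGER(MTIME(dtLocal) / 1000), "HH:MM").
END FUNCTION.

DEFINE VARIABLE dtGMT AS DATETIME NO-UNDO.

dtGMT = DATETIME(7, 1, 2007, 12, 30).   /* 1 July 2007, 12:30 GMT */

MESSAGE "Bedford: "  LocalTime(dtGMT, -240) SKIP   /* 08:30 */
        "Berlin:  "  LocalTime(dtGMT,  120) SKIP   /* 14:30 */
        "Brisbane:"  LocalTime(dtGMT,  600) SKIP   /* 22:30 */
        "Sydney:  "  LocalTime(dtGMT,  600)        /* 22:30 */
    VIEW-AS ALERT-BOX.
```

For the winter column only the offsets change (-300, +60, +600, +660), which is why the offset has to be chosen by date and not stored once per user.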

Time Zones: database tables (C = CHARACTER, D = DATE, I = INTEGER)
users
    Order  Field       Type  Format      Description
    10     user-id     C     X(8)        User ID
    20     tz-id       C     X(4)        Time zone ID
timezones
    Order  Field       Type  Format      Description
    10     tz-id       C     X(4)        Time zone ID
    20     tz-name     C     X(40)       Time zone name
tz-changes
    Order  Field       Type  Format      Description
    10     tz-id       C     X(4)        Time zone ID
    20     tz-date     D     99/99/9999  Date that the changes apply from
    30     min-1       I     ->>>9       Normal minutes of difference from GMT
    40     min-2       I     ->>>9       Minutes of difference from GMT during DST
    50     from-month  I     >9          Month when DST starts
    60     from-day    I     9           Code for day when DST starts
    70     from-time   C     99:99       Time when DST starts
    80     to-month    I     >9          Month when DST ends
    90     to-day      I     9           Code for day when DST ends
    100    to-time     C     99:99       Time when DST ends

Time Zones: ABL functions
• GetGMT() to get the current time in GMT
    FUNCTION GetGMT RETURNS DATETIME ():
        DEFINE VARIABLE dtGMT AS DATETIME NO-UNDO.
        /* TIMEZONE is the session offset from GMT in minutes */
        dtGMT = ADD-INTERVAL(NOW, - TIMEZONE, 'MINUTES').
        RETURN dtGMT.
    END FUNCTION.

Time Zones: ABL functions
• ConvertDT() to convert GMT to the user's time
    FUNCTION ConvertDT RETURNS DATETIME
        (INPUT pdtNow  AS DATETIME,
         INPUT pcTz-id AS CHARACTER):
        DEFINE VARIABLE dtOut AS DATETIME NO-UNDO.
        /* Latest set of time zone rules that applies on the given date */
        FIND LAST tz-changes NO-LOCK
            WHERE tz-changes.tz-id    = pcTz-id
              AND tz-changes.tz-date <= DATE(pdtNow) NO-ERROR.
        (...)
        RETURN dtOut.
    END FUNCTION.

Summary
• UTF-8 for database and -cpinternal as a start
• Know the code page of data getting into and out of OpenEdge (-cpstream / CONVERT)
• Two wrongs may make it look right
• It's not only about conversion, but checking results as well
    - Use hexadecimal tools
• Take a look at the 10.1B Internationalizing Applications manual
• Code Pages are tricky, but fun!

Questions?

Thank you for your time
