Localizing OpenClinica Hiroaki Honshuku: SQA

Presentation transcript:

Localizing OpenClinica
Hiroaki Honshuku: SQA

What is Character Encoding?
- Morse Code (1840) → Latin Alphabet
- ASCII (1963)
  - The American Standard Code for Information Interchange
  - Characters, Numerals, Symbols, Control Characters
  - 7-bit: 0~127
  - 0x41 = letter 'A', 0x61 = letter 'a'
- ISO-8859-n
  - 8-bit
  - iso-8859-1: Latin-1, covers most Western European languages
  - iso-8859-5: Cyrillic alphabet
  - No CJK (Chinese, Japanese, Korean) support
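The code values on this slide can be checked with a few lines of Python. This is a minimal sketch; the helper name `can_encode` and the sample kanji are illustrative choices, not part of the presentation.

```python
# ASCII code values quoted on the slide.
code_upper_a = ord("A")  # 0x41
code_lower_a = ord("a")  # 0x61

# Every ASCII byte fits in 7 bits (0 to 127).
ascii_bytes = "OpenClinica".encode("ascii")
fits_in_7_bits = all(b < 128 for b in ascii_bytes)


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# One Japanese character (U+8A66) tried against the single-byte encodings.
kanji = "\u8a66"
support = {name: can_encode(kanji, name)
           for name in ("ascii", "iso-8859-1", "iso-8859-5")}
print(hex(code_upper_a), hex(code_lower_a), fits_in_7_bits, support)
```

None of the three single-byte encodings can represent the kanji, which is the "No CJK support" point above.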

What is Character Encoding (cont.)
- iso-8859-1 versus iso-8859-5: look-alike letters have different code values

  iso-8859-1 (Latin)      iso-8859-5 (Cyrillic)
  A   65 (0x41)           А   176 (0xB0)
  B   66 (0x42)           В   178 (0xB2)

- CJK Encoding Mess
  - Chinese: Big5 (Traditional), GB18030 (Simplified)
  - Japanese: iso-2022-JP, EUC-JP, Shift-JIS
  - Korean: EUC-KR, KS C 5861
- Windows proprietary encodings
  - CP1252, CP932, etc.
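A short Python sketch of why this is a mess: the same byte values decode to different characters depending on the declared encoding, and one Japanese character has a different byte sequence in each legacy encoding. The sample bytes (176 and 178) and the sample kanji are assumptions chosen for illustration.

```python
# The same two bytes, interpreted under two ISO-8859 parts.
raw = bytes([176, 178])  # 0xB0, 0xB2

as_latin1 = raw.decode("iso-8859-1")    # degree sign, superscript two
as_cyrillic = raw.decode("iso-8859-5")  # Cyrillic capital A and Ve

# The ASCII range is shared, so Latin 'A' is 0x41 in both.
shared_ascii = "A".encode("iso-8859-1") == "A".encode("iso-8859-5") == b"\x41"

# One kanji (U+8A66) in three Japanese legacy encodings.
kanji = "\u8a66"
encoded = {name: kanji.encode(name)
           for name in ("shift_jis", "euc_jp", "iso2022_jp")}

for name, data in encoded.items():
    print(name, data.hex(" "))
```

A reader who guesses the wrong encoding for any of these byte sequences gets corrupted text, which is why files and exports need an explicit encoding declaration.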

Unicode
- 1987: Apple + Xerox
- 1991: Unicode Consortium
- UTF-8: 1,112,064 Code Points
  - Standard
  - ASCII Compatible
  - Unix, Linux, Mac OS
  - Byte-oriented: one fixed byte order, no BOM required
- UTF-16 (successor to UCS-2): 1,112,064 Code Points
  - Mainly Windows
  - Little Endian on Windows: a BOM (Byte Order Mark) identifies the byte order
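The differences between the two encoding forms can be seen directly in Python. This is a small sketch; the sample text is an arbitrary choice.

```python
import codecs

text = "A\u8a66"  # one ASCII letter plus one kanji

utf8 = text.encode("utf-8")          # ASCII stays a single byte
utf16_le = text.encode("utf-16-le")  # 2 bytes per unit, low byte first
utf16_be = text.encode("utf-16-be")  # 2 bytes per unit, high byte first

# A BOM in front of UTF-16 data tells the reader which byte order was used.
with_bom = codecs.BOM_UTF16_LE + utf16_le
round_trip = with_bom.decode("utf-16")

# 1,112,064 = all code points up to U+10FFFF minus the 2,048 surrogates.
total_code_points = 0x110000 - 0x800

# Characters beyond U+FFFF need a surrogate pair (4 bytes) in UTF-16;
# the older UCS-2 could not represent them at all.
supplementary_len = len("\U0001F600".encode("utf-16-le"))

print(utf8.hex(" "), utf16_le.hex(" "), utf16_be.hex(" "), total_code_points)
```

UTF-8 output is identical on every platform, while UTF-16 output depends on the byte order, hence the BOM.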

OpenClinica and i18n
- i18n Support since
- OpenClinica i18n Work in Progress
  - Data Mart
  - Response Option Text
  - CRF Name
  - Discrepancy Note data passing
  - Escaping control characters and MS proprietary characters
    - Should detect at CRF upload
  - Hard-coded strings
  - Missing encoding declaration in some Export formats
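As an illustration of what detection at upload could look like, here is a hypothetical Python check. It is not OpenClinica code; the function name `find_suspect_chars` and the rule it applies are assumptions. It flags Unicode control characters, including the U+0080 to U+009F range that typically appears when CP1252 bytes such as "smart quotes" are decoded as ISO-8859-1.

```python
import unicodedata

# Whitespace controls that are normally acceptable in text fields (assumption).
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_suspect_chars(text):
    """Return (index, code point) pairs for control characters in text."""
    suspects = []
    for index, ch in enumerate(text):
        # Category "Cc" covers U+0000-U+001F, U+007F and U+0080-U+009F.
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
            suspects.append((index, "U+{:04X}".format(ord(ch))))
    return suspects


# CP1252 curly quotes (bytes 0x93 and 0x94) decoded two ways.
office_bytes = b"\x93quoted\x94"
wrongly_decoded = find_suspect_chars(office_bytes.decode("iso-8859-1"))
correctly_decoded = find_suspect_chars(office_bytes.decode("cp1252"))
print(wrongly_decoded, correctly_decoded)
```

Decoded with the right code page, the same bytes become ordinary quotation marks and nothing is flagged.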

Microsoft Specific Issues
- Display issues on Windows
  - Pre-Win7, GUI was not fully UTF-8 compatible
  - Displayed character corruption after saving data
- Viewing extracted data
  - Use a UTF-8 compatible Text Editor
  - Never Copy/Paste from MS Office
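The kind of corruption described here can be reproduced in Python by reading UTF-8 bytes with a Windows code page. This is a sketch with an arbitrary sample word, not output from OpenClinica.

```python
original = "caf\u00e9"  # 'café'

# Data saved as UTF-8 ...
utf8_bytes = original.encode("utf-8")

# ... but displayed by a tool that assumes CP1252.
garbled = utf8_bytes.decode("cp1252")

# As long as nothing was lost, reversing the wrong step recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled, repaired)
```

When opening extracted data programmatically, passing the encoding explicitly (for example `open(path, encoding="utf-8")`) avoids relying on the platform default.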

Demonstration
- Search Subjects and Tables
- CRF and Data Entry
- Discrepancy Notes
- Rules
- Data Import
- Data Extract

How to Localize
- Documentation
  - documents/openclinica-and-internationalization
- UTF-8 Converter
  - i18n strings need to be hex values (\uXXXX escapes)
  - Calendar Widget can take UTF-8 strings
- Pseudo Translation
  - Insert one distinctive non-ASCII character
  - Duplicate English properties files first
  - Search " = " and replace with " = \u8a66"
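Both ideas on this slide can be sketched in Python. The function names `to_unicode_escapes` and `pseudo_translate` are hypothetical helpers, not OpenClinica tools, and the sample property line is invented. The sketch assumes each line has the form `key = value` and marks only the first " = " on a line.

```python
def to_unicode_escapes(text):
    """Replace every non-ASCII character with a Java-style \\uXXXX escape."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Work in UTF-16 code units so characters beyond U+FFFF
        # become a surrogate pair of two escapes, as Java expects.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            code_unit = int.from_bytes(units[i:i + 2], "big")
            out.append("\\u{:04x}".format(code_unit))
    return "".join(out)


def pseudo_translate(properties_text, marker="\\u8a66"):
    """Prefix each value with a distinctive non-ASCII marker."""
    lines = properties_text.splitlines(keepends=True)
    return "".join(line.replace(" = ", " = " + marker, 1) for line in lines)


escaped = to_unicode_escapes("A\u8a66")
pseudo = pseudo_translate("submit = Submit\ncancel = Cancel\n")
print(escaped)
print(pseudo)
```

After pseudo translation, any GUI string that appears without the marker character is a candidate hard-coded string.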

How to Localize (cont.)
1. Duplicate English properties files
   - Exclude licensing.properties
2. Rename duplicated files to your Locale
3. Date Format
   - Edit format.properties file
4. Translate per GUI page
   - Avoids possible legacy strings
   - Use a Text Editor that supports global search
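Steps 1 and 2 can be scripted. The sketch below assumes the English files are named `<base>.properties` and that the localized copy follows the usual Java resource bundle pattern `<base>_<locale>.properties`; the function name `duplicate_for_locale`, the demo file names, and the locale `ja` are illustrative assumptions.

```python
import shutil
import tempfile
from pathlib import Path


def duplicate_for_locale(source_dir, locale, exclude=("licensing.properties",)):
    """Copy each <base>.properties to <base>_<locale>.properties."""
    created = []
    # sorted() materializes the listing before any copy is written.
    for src in sorted(Path(source_dir).glob("*.properties")):
        if src.name in exclude:
            continue
        target = src.with_name("{}_{}.properties".format(src.stem, locale))
        if not target.exists():
            shutil.copyfile(src, target)
            created.append(target.name)
    return created


# Demo on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("words.properties", "format.properties", "licensing.properties"):
        Path(demo_dir, name).write_text("key = value\n", encoding="utf-8")
    created_names = duplicate_for_locale(demo_dir, "ja")
    licensing_copied = Path(demo_dir, "licensing_ja.properties").exists()

print(created_names)
```

Run it once per locale on a directory that holds only the English files, since the sketch does not distinguish existing localized copies from originals.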

Thank You!