AITI Tutorial: Internationalization Coding for the world MIT AITI July NNth, 2005.



What is Internationalization? Internationalization: Designing applications to easily support different languages and regions. Abbreviated as "I18N" (there are 18 letters between 'I' and 'N'). Localization: Adapting software to a specific region. Abbreviated as "L10N".

Why do we care? Translation: We don't want to search through many files for words to translate. Date Formats: Is "7/6/5" July 6th, 2005, June 7th, 2005, or June 5th, 2007? Currency Formats: Is nine thousand dollars 9 000,00, 9.000,00, or $9,000.00? Non-Latin Characters: Spanish: ¡Viva España! Chinese: 早晨好 Arabic: هـ - الموافق
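The date and currency ambiguity above can be seen directly by formatting one date and one amount for several locales. This is a minimal sketch using the standard `java.text` formatters; the class name `FormatDemo` and its helper methods are made up for illustration, and the exact strings printed depend on the locale data shipped with your JDK.

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class, not part of the original tutorial.
public class FormatDemo {
    // Format an amount as currency using the conventions of the given locale.
    static String currency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Format a date in the given locale's short style.
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        // July 6th, 2005 (months are zero-based, so use the Calendar constant).
        Date july6 = new GregorianCalendar(2005, Calendar.JULY, 6).getTime();
        Locale[] locales = { Locale.US, Locale.UK, Locale.GERMANY };
        for (Locale locale : locales) {
            System.out.println(locale + ": " + shortDate(july6, locale)
                    + "  " + currency(9000.00, locale));
        }
    }
}
```

The same `Date` and the same `double` are stored once in a region-independent form; only the formatting step is locale-specific.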

Properties of I18N. Same executable can be run worldwide with different local data. Text elements are not hard-coded. Should not have to recompile to add new languages. Dates and currencies stored in region independent format. Localizes easily.

Example Non-I18N Program

    public class NotI18N {
        static public void main(String[] args) {
            // Both messages are hard-coded in English.
            System.out.println("Hello");
            System.out.println("Thank you");
        }
    }

What if we want to ship this software to 70 different countries?

Locales Locales: Objects that identify a particular language and region. Locale(String language, String country); Static Locales: Locale.US, Locale.JAPAN, Locale.UK, Locale.PRC. Two-letter language and country codes (language first, then country).

    Locale swahiliKenya = new Locale("sw", "KE");
    Locale arabicIraq = new Locale("ar", "IQ");
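As a quick check of what a Locale object holds, the sketch below builds the Swahili/Kenya locale and reads back its parts. The class name `LocaleDemo` is made up for illustration. The two-argument constructor is the one used in these slides; newer JDKs mark it as deprecated and offer `Locale.forLanguageTag` as one alternative, so both are shown.

```java
import java.util.Locale;

// Hypothetical demo class, not part of the original tutorial.
public class LocaleDemo {
    // Language code first, then country code, as in the slides.
    @SuppressWarnings("deprecation")
    static Locale swahiliKenya() {
        return new Locale("sw", "KE");
    }

    public static void main(String[] args) {
        Locale locale = swahiliKenya();
        System.out.println(locale.getLanguage());   // the language code
        System.out.println(locale.getCountry());    // the country code
        System.out.println(locale);                 // combined form used in bundle file names
        System.out.println(locale.getDisplayName(Locale.US));

        // Alternative construction from an IETF language tag (hyphen, not underscore).
        Locale fromTag = Locale.forLanguageTag("sw-KE");
        System.out.println(locale.equals(fromTag));
    }
}
```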

Check for Supported Locales Not every Locale will be supported. Can check which Locales are available:

    import java.util.*;
    import java.text.DateFormat;

    public class Available {
        static public void main(String[] args) {
            // Ask DateFormat which locales it has data for.
            Locale list[];
            list = DateFormat.getAvailableLocales();
            for (int i = 0; i < list.length; i++)
                System.out.println(list[i].toString());
        }
    }

Resource Bundles We want to isolate Locale-specific data, like text strings. Resource Bundle: Look up Locale-specific objects with a key. ListResourceBundle: 2D key/value array. PropertyResourceBundle: Flat text file. We'll deal with plain text properties files: Look up with a string, get back a string. If you need to look up an object, you'd use a ListResourceBundle.
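For comparison with properties files, here is a minimal sketch of the ListResourceBundle alternative mentioned above. The names `LabelsListDemo`, `SwahiliLabels`, and the `maxItems` key are made up for illustration. To be found by `ResourceBundle.getBundle`, such a class would normally be a public top-level class named after the base name and locale (for example `Labels_sw`); here it is instantiated directly to keep the example self-contained.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class, not part of the original tutorial.
public class LabelsListDemo {
    // A bundle backed by a 2D key/value array; values may be any object.
    static class SwahiliLabels extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "hello", "Jambo" },
                { "thanks", "Asante" },
                { "maxItems", Integer.valueOf(3) },  // a non-string value
            };
        }
    }

    static ResourceBundle labels() {
        return new SwahiliLabels();
    }

    public static void main(String[] args) {
        ResourceBundle labels = labels();
        System.out.println(labels.getString("hello"));
        System.out.println(labels.getObject("maxItems"));
    }
}
```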

Properties Files Example properties files:

    # Labels.properties
    hello = Hello
    thanks = Thank You

    # Labels_sw.properties
    hello = Jambo
    thanks = Asante

    # Labels_es.properties
    hello = Hola
    thanks = Gracias
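The key/value format can be tried without creating any files by parsing the text from a string. This is a sketch only: the class name `PropertiesDemo` and the `parse` helper are made up for illustration, and a real application would let `ResourceBundle.getBundle` locate the files instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo class, not part of the original tutorial.
public class PropertiesDemo {
    // Parse properties-file text into a bundle (normally this comes from a file).
    static ResourceBundle parse(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String swahili = "# Labels_sw.properties\n"
                + "hello = Jambo\n"
                + "thanks = Asante\n";
        ResourceBundle labels = parse(swahili);
        // Whitespace around '=' is ignored and '#' lines are comments.
        System.out.println(labels.getString("hello"));
        System.out.println(labels.getString("thanks"));
    }
}
```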

Creating Resource Bundles ResourceBundles are created by giving a base name and optionally a locale.

    ResourceBundle labels = ResourceBundle.getBundle("Labels", currentLocale);

If currentLocale is "sw_KE" and the default locale is "en_US", it will search files in this order:
1. Labels_sw_KE.properties
2. Labels_sw.properties
3. Labels_en_US.properties
4. Labels_en.properties
5. Labels.properties
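The search order can be approximated in a few lines. The sketch below is a simplification for illustration, not the JDK's actual lookup code: it asks `ResourceBundle.Control` for the candidate locales of the requested locale, then of the fallback locale, and puts the base file last. The class name `LookupOrder` and the method `searchOrder` are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;
import java.util.Set;

// Hypothetical demo class, not part of the original tutorial.
public class LookupOrder {
    // List the properties files considered for a base name, most specific first.
    static List<String> searchOrder(String baseName, Locale target, Locale fallback) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        Set<String> files = new LinkedHashSet<>();  // keeps order, drops duplicates
        for (Locale locale : new Locale[] { target, fallback }) {
            for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
                if (!candidate.equals(Locale.ROOT)) {
                    files.add(control.toBundleName(baseName, candidate) + ".properties");
                }
            }
        }
        files.add(baseName + ".properties");        // the base bundle comes last
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        // Locale.forLanguageTag("sw-KE") is the same locale as new Locale("sw", "KE").
        Locale swahiliKenya = Locale.forLanguageTag("sw-KE");
        for (String file : searchOrder("Labels", swahiliKenya, Locale.US)) {
            System.out.println(file);
        }
    }
}
```

For the sw_KE and en_US pair, this should print the five file names in the order listed on the slide.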

Using Resource Bundles

    static void printMessages(Locale currentLocale) {
        ResourceBundle labels = ResourceBundle.getBundle("Labels", currentLocale);
        System.out.println("Current Locale is " + currentLocale.getDisplayName());
        System.out.println(labels.getString("hello"));
        System.out.println(labels.getString("thanks"));
    }

What about China? A few billion people do not use the Latin alphabet. But your keyboard is likely to use it. How do we type Chinese, Japanese, Arabic, Thai, Cyrillic, etc., characters in our properties files?

Character Representation Characters are often represented by fixed-width, 8-bit bytes, especially in C/C++. This allows for only 256 characters. Unicode: A character set with room for 1,114,112 different code points. Any Unicode character can be represented in 3 bytes. Java has built-in Unicode support.
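The numbers above can be checked with constants from `java.lang.Character`. This is a small sketch; the class name `CodePointDemo` and its helpers are made up for illustration.

```java
// Hypothetical demo class, not part of the original tutorial.
public class CodePointDemo {
    // Code points run from 0 to Character.MAX_CODE_POINT inclusive.
    static int codePointCount() {
        return Character.MAX_CODE_POINT + 1;
    }

    // Number of bits needed to write a non-negative value in binary.
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    // Number of 16-bit Java chars needed to hold one code point.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println("Code points: " + codePointCount());
        System.out.println("Bits for the largest code point: "
                + bitsNeeded(Character.MAX_CODE_POINT));
        System.out.println("chars for U+00A9: " + charsNeeded(0x00A9));
        System.out.println("chars for U+1F600: " + charsNeeded(0x1F600));
    }
}
```

The largest code point needs fewer than 24 bits, which is why 3 bytes are enough for any single character.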

Ethiopic Unicode Characters

Many Unicode Formats Most Unicode characters are rarely used. Programmers don't want to waste space with 3-byte representations. There are many different ways to represent Unicode characters. Official: UTF-8, UTF-16, UTF-32. Unofficial: UCS-2, UCS-4. Java uses 16-bit chars: originally UCS-2, and UTF-16 since J2SE 5.0 (the two agree for every character up to U+FFFF). (We can mostly ignore these details.)
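To see why the choice of format matters for space, the sketch below encodes a few characters in UTF-8 and UTF-16 and counts the bytes. Only charsets from `java.nio.charset.StandardCharsets` are used; the class name `EncodingSizes` and the `byteLength` helper are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original tutorial.
public class EncodingSizes {
    // Number of bytes the text occupies in the given encoding.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                     // ASCII letter
            "\u00A9",                                // copyright sign
            "\u65E9",                                // a CJK character
            new String(Character.toChars(0x1F600)),  // outside the 16-bit range
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + "  UTF-8: " + byteLength(s, StandardCharsets.UTF_8)
                    + "  UTF-16: " + byteLength(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

UTF-8 is compact for ASCII text, while UTF-16 uses two bytes for most characters and four for those beyond U+FFFF.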

Using Unicode in Java Unicode characters can be represented using regular plaintext. Characters are represented as '\uNNNN'. 4-digit character codes can be found at: Encoding the character '©': The Unicode value for '©' is 00A9 in hex (169 in decimal).

    String str = "\u00A9";
    char c = '\u00A9';

Need GUI or terminal that supports Unicode.
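One way to get non-Latin text into a properties file from an ASCII-only keyboard or editor is to convert it to escape sequences programmatically. The helper below is a sketch written for this tutorial, not a JDK API; the class name `EscapeDemo` and the method `toEscapes` are made up. JDKs of this era also shipped a command-line tool, native2ascii, for the same job.

```java
// Hypothetical demo class, not part of the original tutorial.
public class EscapeDemo {
    // Replace every non-ASCII char with a backslash-u escape of four hex digits.
    static String toEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);                                  // plain ASCII stays as-is
            } else {
                out.append(String.format("\\u%04X", (int) c));  // escape everything else
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char copyright = '\u00A9';
        System.out.println((int) copyright);  // numeric value of the character

        // Non-ASCII text is written with escapes so this file stays pure ASCII.
        String line = "hello = \u65E9\u6668\u597D";
        System.out.println(toEscapes(line));
    }
}
```

The escaped output can be pasted into a `.properties` file and is read back as the original characters when the bundle is loaded.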

Unicode Demo in Swing

    import javax.swing.*;

    public class UnicodeDemo extends JFrame {
        public static void main(String[] args) {
            UnicodeDemo app = new UnicodeDemo();
            app.setSize(100, 100);
            // The label text contains the copyright sign as a Unicode escape.
            JLabel label = new JLabel("Copyright \u00A9 2005", JLabel.CENTER);
            app.getContentPane().add(label);
            app.setTitle("Unicode Demo");
            app.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            app.setVisible(true);
        }
    }

Demo Output

Pop Quiz: Review Terms Internationalization (I18N) Localization (L10N) Locales ResourceBundles Properties Files Unicode UCS-2

For More Information Online tutorial with example code: