Software Usability Course notes for CSI 5122 - University of Ottawa Section 5: Internationalization Timothy C. Lethbridge



Basic terminology
- Locale: the set of features that can vary depending on the language and culture of the user or the data
- Internationalization (I18N): the process of designing software so that it can be easily adapted to different locales
- Localization (L10N): the process of adapting software to a locale

Different aspects of locale
The following can be treated somewhat separately:
- The user's preferred locale
  - E.g. formats for dates, times, etc.
- The language of the UI
  - The system might not have a language corresponding to the user's preferred locale
- The locale of the data
  - E.g. currencies and the formats embedded in it

Names and titles
- Some countries require you to specify a title (Mr, Dr, Eng., etc.)
  - These titles do not necessarily translate
- The family name is not always last
- You do not always sort based on the family name
  - In Iceland you sort based on 'first' name
- Salutations in letters (e.g. Dear) differ between locales

Calendars
- The Gregorian calendar should not always be assumed
- Proper localization of some software requires the use (at least as an option) of calendars distinct to a culture
  - E.g. the emperor-era calendar in Japan
  - Calendars of various religions where year 0 was not 2001 years ago
- Fiscal-year based calendars vary widely
  - Some have 13 months (364 days / 28) or 53 weeks

Humour
- Generally does not translate
- Puns are language-specific
- People are sensitive to different things in different cultures
  - Jokes and cartoons can be offensive

Icons
- Icons that are a play on words do not translate, e.g.
  - A tray for a server application
  - A rocket for launching an application
  - A running man for running an application
  - 'B', 'I', 'U' (the English initials of bold, italic, underline)

Icons, continued
- '$' does not mean 'money', but 'dollar'
  - It implicitly means 'American dollar'
- Some concepts have been found extremely hard to represent as an icon
  - E.g. sorting ('A->Z' is not universal)
- Images of people or body parts such as hands
  - Considered inappropriate in some cultures
  - What skin colour do you use?

Language selection
- Avoid using national flags from which people pick their preferred language
  - Multiple countries use the same language
- In what order do you display languages?
- In what language do you display the language names?
  - In the language itself
  - With a translation in the language of the operating system

Oral pronunciation
- Important for voice I/O systems
  - Don't forget to take it into account
- Higher recognition accuracy can be obtained by tailoring voice input to regional dialects
- Voice output in the wrong dialect can make an application sound 'foreign'

Capitalization
- Some lowercase characters have different uppercase equivalents in different locales
  - E.g. in Turkish the dotless 'ı' becomes 'I', whereas 'i' is capitalized with a dot on top ('İ')
- There is no such thing as UPPERCASE for many languages
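
The Turkish case can be sketched in a few lines. This is only an illustration: `upper_for_locale` and the `"tr"` language tag are hypothetical names, and Python's built-in `str.upper()` is assumed to be locale-independent, which is why the Turkish rule is applied by hand first.

```python
def upper_for_locale(text: str, language: str) -> str:
    """Uppercase text, applying the Turkish dotted/dotless i rule when asked."""
    if language == "tr":
        # Handle the two Turkish letters before the generic conversion:
        # dotted i -> dotted capital I (U+0130), dotless i (U+0131) -> plain I.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


print(upper_for_locale("istanbul", "en"))   # ISTANBUL
print(upper_for_locale("istanbul", "tr"))   # İSTANBUL
```

The point is that the uppercasing rule has to be chosen by the locale of the text, not hard-wired into the application.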

Punctuation
- '!', '?' and '#' are not consistently used among languages
  - In Spanish: ¿ … ?
  - '#' does not mean 'number'
  - In French, a space precedes a '?'
- Use of '/' can be confusing
  - Swap rows/columns/filters
  - Show/hide display cues
  - Page 1/2 vs. 1/2 page

Cultural references
Common problems:
- Normal business hours
- Ways payments are made
  - Some countries require use of a PIN on a credit card
- Different styles of addresses

Language ≠ Culture
- English products are sold in more countries than translated products
- Many countries (e.g. in Africa, India) have too many different languages and accept English software

Language ≠ Culture (continued)
A Norwegian user:
- May not find a product with a UI in his/her language, so will accept an English or Swedish one
- But will want the software to work with Norwegian data:
  - Currency
  - Language

Date formats
- Date separators depend on locale
  - '/', '-', '.'
- Variables in document templates
- 'am' and 'pm'
  - Not universally used (many cultures use the 24-hour clock)

Date formats, continued
- ISO standard dates are unambiguous
  - yyyy-mm-dd hh:mm:ss
- A non-ISO date means different things in different locales
  - If not using ISO, display dates in the locale of the user
  - Preferably use a 'long' form with the month spelled out (in the correct language)
  - However, the UI might not have been translated into the local language
    - Use the spelled-out date in the local language anyway
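
A minimal sketch of the two recommended forms follows. The `MONTH_NAMES` table and both function names are made up for illustration, and the French form ignores details such as '1er' for the first of the month; a real product would take names and patterns from locale data.

```python
from datetime import datetime

# Illustrative month names only; a real product would take these from locale data.
MONTH_NAMES = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
           "août", "septembre", "octobre", "novembre", "décembre"],
}


def iso_timestamp(moment: datetime) -> str:
    """Unambiguous ISO-style form: yyyy-mm-dd hh:mm:ss."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def long_date(moment: datetime, language: str) -> str:
    """Long form with the month spelled out in the requested language."""
    month = MONTH_NAMES[language][moment.month - 1]
    if language == "fr":
        return f"{moment.day} {month} {moment.year}"   # day month year
    return f"{month} {moment.day}, {moment.year}"      # month day, year


when = datetime(2004, 3, 4, 15, 30, 0)
print(iso_timestamp(when))      # 2004-03-04 15:30:00
print(long_date(when, "en"))    # March 4, 2004
print(long_date(when, "fr"))    # 4 mars 2004
```

Written as 03/04/2004, the same date would be read as 4 March in some locales and 3 April in others; neither form above has that problem.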

Numeric formats
- Depend on locale, not on the language of the application
- Grouping
  - Number of digits in a group
    - In English and ISO it is 3
  - Group separator
    - In English ',', but ISO uses a space, and some locales use '.' or none
  - Do you use the group separator for 1000?

Numeric formats (continued)
- Decimal separator
  - '.', ','
- Negative symbol
  - '-', '~', '(…)'
  - Can be positioned before or after the number
  - May require a space between the symbol and the number
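
One way to keep these choices out of the application code is a small table of separators per locale. The sketch below is illustrative only: the `SEPARATORS` entries are simplified assumptions rather than authoritative locale data, and `format_number` is a hypothetical helper that always groups by three digits and puts a '-' in front of negative values.

```python
# Assumed conventions: (group separator, decimal separator) per locale.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
    "fr-FR": ("\u00a0", ","),   # no-break space as the group separator
}


def format_number(value: float, locale_code: str, decimals: int = 2) -> str:
    """Format value with the separators of the given locale."""
    group_sep, decimal_sep = SEPARATORS[locale_code]
    text = f"{abs(value):,.{decimals}f}"          # e.g. "1,234,567.89"
    whole, _, fraction = text.partition(".")
    whole = whole.replace(",", group_sep)
    result = whole + (decimal_sep + fraction if fraction else "")
    return "-" + result if value < 0 else result


print(format_number(1234567.891, "en-US"))   # 1,234,567.89
print(format_number(1234567.891, "de-DE"))   # 1.234.567,89
```

Group size and the position of the negative symbol would need their own table entries in a fuller version.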

Currency
- Use the currency symbol of the data!
  - I.e. $ doesn't automatically translate to £ or € when the locale changes
- Format depends on the user's locale, not the currency
- Differences in formats:
  - Symbol
  - Position (before or after the amount)
  - Blanks separating the symbol from the amount

Currency, continued
- Different ways of expressing US$1000
  - $1000 (in the US, or in Canada and the UK if the application doesn't mix currencies)
  - US$1000 (in English Canada, if the application mixes currencies)
  - 1000 $ (in most French locales)
  - 1000 USD when mixing large numbers of currencies
- Strong currencies need decimal precision (e.g. 2 digits after the decimal point for cents)
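
The split between 'symbol comes from the data' and 'layout comes from the user's locale' can be sketched as follows. `CURRENCY_STYLE` and `format_money` are hypothetical, and the two style entries are assumptions chosen to reproduce the examples above.

```python
# Assumed layout rules, keyed by the user's locale (not by the currency).
CURRENCY_STYLE = {
    "en-US": {"symbol_first": True, "space": ""},
    "fr-FR": {"symbol_first": False, "space": " "},
}


def format_money(amount: int, symbol: str, user_locale: str) -> str:
    """Place the data's currency symbol according to the user's locale."""
    style = CURRENCY_STYLE[user_locale]
    if style["symbol_first"]:
        return f"{symbol}{style['space']}{amount}"
    return f"{amount}{style['space']}{symbol}"


print(format_money(1000, "$", "en-US"))     # $1000
print(format_money(1000, "US$", "en-US"))   # US$1000
print(format_money(1000, "$", "fr-FR"))     # 1000 $
```

Note that changing `user_locale` never changes `symbol`: the amount is still in dollars.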

Currency, continued
- You may have to display all data in two currencies in some locales
- Summing payments made over a period of time
  - Beware that different exchange rates will have been in effect
  - The rules for doing this are complex and highly variable
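
As a small illustration of why the rate in effect on each payment date matters, the sketch below converts each payment at its own rate before summing. The amounts and rates are invented, and `total_converted` is a hypothetical helper; real accounting rules are far more involved.

```python
from decimal import Decimal


def total_converted(payments):
    """Sum (amount, rate_on_payment_date) pairs, each converted at its own rate."""
    return sum((amount * rate for amount, rate in payments), Decimal("0"))


# Two payments of 100 in a foreign currency, made when different rates applied.
payments = [
    (Decimal("100"), Decimal("1.30")),
    (Decimal("100"), Decimal("1.40")),
]
print(total_converted(payments))   # 270.00
```

Converting the 200 total at the latest rate alone would give 280.00, a different answer.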

Paper size
- 'Letter' in most of the Americas; 'A4' everywhere else
  - Does not depend on language
- Poses distinct problems for generating printouts and PDF files
  - Make sure your output can fit on both paper sizes

Measurements
- Be aware of the need to use imperial or metric units
  - Consider user preferences
  - But also understand industrial norms
    - Even in the US, many industries are now metric
- Beware of odd measurements in data
  - You may not want people working with multiples of 2.54 cm or of an inch
- Watch out for precision loss due to repeated conversion
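
The precision-loss warning is easy to reproduce. In this sketch the values are rounded to one decimal place at each step, which is an assumption standing in for whatever precision the UI shows; the function names are hypothetical.

```python
CM_PER_INCH = 2.54


def to_inches(cm: float, places: int = 1) -> float:
    return round(cm / CM_PER_INCH, places)


def to_cm(inches: float, places: int = 1) -> float:
    return round(inches * CM_PER_INCH, places)


def round_trip(cm: float, times: int) -> float:
    """Convert cm -> inches -> cm repeatedly, rounding at every step."""
    for _ in range(times):
        cm = to_cm(to_inches(cm))
    return cm


print(round_trip(10.0, 1))   # 9.9
```

A common way around this is to store the value once in a single canonical unit and convert only for display.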

Addresses
- Don't rely on a fixed number of lines
- Don't rely on a particular order of elements
  - E.g. street, city, province, postal code is not universal
  - E.g. the postal code in Canada comes after the province, but in many European countries it comes before the city

Addresses, continued
- In what language should an address be written when sending mail?
  - The language of the destination
  - Except that the country should be written in the language of origin

Phone numbers
- Dependent on the region of the number, not on the user's locale
  - Except for the need to add an international dialing code
- Numbers and number formats change over time

Phone numbers, continued
- Allow for free-format numbers
  - Keep them in the way the user entered them
  - Allow the user to enter them free-form, including non-digit characters
  - Allow for extensions in numbers
  - Edit numbers automatically to meet the needed local format
- Free (1-800) numbers are not international
  - Although there are also some new international free numbers
    - From Canada dial ...

Sorting

Translatability
- If a string can be viewed by a user, it must be translatable!
- Avoid building strings by concatenation
  - Gender and number agreement, as well as word order in a sentence, differ between languages
  - E.g. Page number -> Numéro de page
  - E.g. Number of pages -> Nombre de pages
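
A minimal sketch of the alternative to concatenation: each message is stored as a whole phrase with named placeholders, so the translator controls word order. The `MESSAGES` catalogue, its keys and the `message` helper are hypothetical.

```python
# Hypothetical catalogue: whole phrases per language, never fragments to be glued.
MESSAGES = {
    "en": {
        "page_number": "Page number",
        "page_of": "Page {current} of {total}",
    },
    "fr": {
        "page_number": "Numéro de page",
        "page_of": "Page {current} sur {total}",
    },
}


def message(language: str, key: str, **values) -> str:
    """Look up a whole phrase and fill in its named placeholders."""
    return MESSAGES[language][key].format(**values)


print(message("en", "page_number"))                      # Page number
print(message("fr", "page_number"))                      # Numéro de page
print(message("fr", "page_of", current=1, total=2))      # Page 1 sur 2
```

Gluing "Page" and "number" together in code would fix the English word order and could never yield "Numéro de page".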

Translatability: expansion of text
- Many other languages can take at least 30% more space
  - Allow for this, or else the UI may have to be redesigned
- Narrow columns often cannot accommodate long German words

Translatability: compact writing
- The more compact the English writing, the longer the translation
  - 'Telegraphic' style does not translate well
- Abbreviations may have to be expanded when translated
  - E.g. 'QTD' is common in financial applications (Quarter to date)
  - In Italian: 'Trimestre corrente fino ad oggi'

Translatability: ambiguous phrases
- How would a translator translate the following menu items?
  - 'Display options'
    - Options of the display
    - Show the options (all of them)
  - 'Update version'
    - Change to the new version
    - Show the current version
- Expert English users will often understand these in context

Translatability: information for translators
- When you give text to translators, make sure they know for each piece of text:
  - Where it appears (e.g. a menu label, menu item, group box, etc.)
  - The purpose
  - The part of speech (noun, verb, etc.)
    - All items in a menu or set of check boxes should have the same grammatical structure

Design of internationalized software
- Create a resource file for each locale and language
  - All strings to be displayed (except data) are taken from this file
  - English is just one language
- Decisions about languages and character sets need to be made early in design
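
The resource-file idea can be sketched with one table of strings per language. Here the tables are inline dictionaries standing in for separate files, and the keys, the `ui_string` helper and the fall-back to English are all assumptions made for illustration.

```python
# Each dictionary stands in for one resource file; keys are hypothetical.
RESOURCES = {
    "en": {"menu.open": "Open", "menu.quit": "Quit"},
    "fr": {"menu.open": "Ouvrir"},          # "menu.quit" not yet translated
}


def ui_string(key: str, language: str, fallback: str = "en") -> str:
    """Return the UI string for a key, falling back when no translation exists."""
    table = RESOURCES.get(language, {})
    if key in table:
        return table[key]
    return RESOURCES[fallback][key]


print(ui_string("menu.open", "fr"))   # Ouvrir
print(ui_string("menu.quit", "fr"))   # Quit
```

English is treated as just another table here; only the choice of fallback singles it out.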

Design of internationalized software, continued
- Special care must be taken when integrating 3rd party software
  - It may not follow the same internationalization standards as you want
- Memory required by the application may vary according to the language used

Scripts, fonts and character sets
Definitions:
- Code point: number representing a character
- Glyph: visual appearance of a character
- Extended character: anything with a code point > 127 (i.e. outside 7-bit ASCII)

Definitions, continued
- Special character: a term considered a bit derogatory
- Accented character: character whose glyph incorporates an accent
  - As opposed to having an accent added when displayed
- Diacritic: a symbol used to modify the appearance of characters
  - E.g. the cedilla (ç) is a diacritic, not an accent

Complex scripts
- Scripts with many diacritics and character shapes
  - E.g. in Arabic, characters look different depending on their position relative to others
  - E.g. in Thai, diacritics can be stacked on top of each other several levels deep
  - Also in Thai, spaces separate syllables, not words
    - 'ABCD', 'AB CD' and 'A BCD' mean different things, causing problems at line breaks

Scripts that do not run left-right
- E.g. Arabic
- Mirror the UI: everything on the left moves to the right, etc.
  - But watch out for images etc.
  - Problem if the text says 'the diagram in the top-right corner'
- Text is entered right-left
  - But numbers may still be entered left-right
- Some languages run top-bottom

Large ideographic scripts
- E.g. Japanese, Chinese
- Many standards and vendor-specific implementations
- Use multiple bytes for each character
  - Standard C functions (e.g. strncpy) do not work properly and can chop off parts of characters
- Inter-line spacing must be larger than for Latin fonts since the characters are 'taller'
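
The 'chopped character' problem can be reproduced without C. The sketch below uses UTF-8 as the multi-byte encoding, which is an assumption made for convenience (the older vendor encodings behave similarly), and both function names are hypothetical.

```python
def cut_naive(text: str, max_bytes: int) -> bytes:
    """Cut the encoded bytes at a fixed length, as a byte-oriented copy would."""
    return text.encode("utf-8")[:max_bytes]


def cut_safe(text: str, max_bytes: int) -> str:
    """Cut at the same length, then discard any incomplete trailing character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


word = "日本語"                      # three characters, three bytes each in UTF-8
broken = cut_naive(word, 4)         # ends in the middle of the second character
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("naive cut left a partial character")
print(cut_safe(word, 4))            # 日
```

Any routine that counts bytes rather than characters needs this kind of boundary check.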

Miscellaneous problems with multilingual software
- Inability to enter needed text at a keyboard!
- Upper-casing is absent or different in different languages
  - Some uppercasing algorithms will translate text into garbage
- Open French text on a Chinese operating system:
  - Extended characters are displayed as Chinese characters and subsequent characters disappear!

Unicode
- Intended to display all characters in all languages
  - Including technical symbols
- Allows exchange of data without people having to worry about what character set must go with it
- A single code point (number) for each character
- Mostly complete for Western languages

Unicode, continued
- Incorporates basic ASCII
- Follows an international ISO standard
- Has about ... characters now
  - Mostly two-byte
- It is independent of language
  - Languages using the same symbols use the same code points in Unicode

Unicode issues
- Evolving as languages evolve
- Does not address sorting, font and layout
- Contains some 'private use areas'
- Has some idiosyncrasies:
  - E.g. identical glyphs with multiple code points
  - Some characters can be encoded as a single character or as two
    - E.g. Ä or A + ¨
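
The last idiosyncrasy matters whenever strings are compared. A minimal sketch using Python's standard `unicodedata` module; the helper name `same_text` is hypothetical, and NFC is just one of the standard normalization forms that could be chosen.

```python
import unicodedata

composed = "\u00c4"       # Ä as a single code point
decomposed = "A\u0308"    # A followed by a combining diaeresis


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # False
print(same_text(composed, decomposed))   # True
```

Both strings render as the same glyph, yet a plain comparison treats them as different.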

Unicode code set vs. format
- Each character has a number, but there is more than one way to encode the numbers in data!
- Fixed-width UCS-4: all characters take 4 bytes
  - Unused bytes are set to zero
    - E.g. for US-ASCII (code points below 128)
  - Causes considerable expansion of English text

Unicode code set vs. format, continued
- Variable-width
  - Uses from 1 to 6 bytes in the original design (current UTF-8 is limited to 4 bytes by RFC 3629)
  - US-ASCII encoded on 1 byte
  - Other single-byte characters on 2 bytes
  - Most Asian characters on 3 bytes
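
The difference between the two formats can be checked directly. This sketch assumes the variable-width format described is UTF-8 and uses UTF-32 (big-endian, no byte-order mark) as the fixed-width four-byte form; `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (bytes in UTF-8, bytes in UTF-32) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))


for ch in ("A", "é", "日"):
    print(ch, encoded_sizes(ch))
# A (1, 4)
# é (2, 4)
# 日 (3, 4)
```

For plain English text the fixed-width form is four times larger, which is the expansion noted for UCS-4.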

Unicode fonts
- Full Unicode fonts are large (measured in megabytes)!
- Some fonts build in 'intelligence'
  - E.g. how to render text

Some web resources on Internationalization and Localization
- W3C
- Software Globalization
- Language Automation
- Do a web search and you will find tons of other resources