CIT3611 Software i18n Wk 4: Code sets, Online Help, Prototyping David Tuffley School of Computing & IT Griffith University.

Presentation transcript:


CIT3611 Week 5: Code sets

Slide 2: Internationalisation - Basic Rules
- Never hard-code translatable text
- Do not reuse the same string in different contexts
- 1 byte ≠ 1 character ≠ 1 glyph
- Watch for strings with several parameters
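A minimal sketch of two of these rules, assuming Python 3. The `MESSAGES` catalogue and the `format_message` helper are hypothetical names used only for illustration, not part of any real localisation library.

```python
# Rule: 1 byte != 1 character != 1 glyph.
text = "e\u0301"                         # 'e' + COMBINING ACUTE ACCENT, drawn as one glyph
char_count = len(text)                   # number of code points
byte_count = len(text.encode("utf-8"))   # number of bytes once encoded as UTF-8

# Rule: watch for strings with several parameters.
# Named placeholders let a translator reorder the parameters freely.
# MESSAGES is a hypothetical in-memory catalogue keyed by language.
MESSAGES = {
    "en": "Copied {count} files to {folder}",
    "de": "In den Ordner {folder} wurden {count} Dateien kopiert",
}


def format_message(language, **params):
    """Look up the translated template and fill in the named parameters."""
    return MESSAGES[language].format(**params)


english = format_message("en", count=3, folder="Backup")
german = format_message("de", count=3, folder="Backup")
```

Because the placeholders are named rather than positional, the German template can put the folder before the count without any change to the calling code.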

Slide 3: Internationalisation - Goals
Making sure:
- Your application is able to process text from any locale
- The interface can be localised without changes in the source code
- The documents or data created by your application are easy to localise

Slide 4: Internationalisation - Code Sets
- A character set is like a "bag" of characters. Example: A, B, d, ñ
- A code set (also called a coded character set or code page) is the same as the character set, except that a specific value, the code (or code point), is assigned to each character. Example: A=65, B=66, d=100, ñ=241
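The example codes on this slide match ISO 8859-1 (Latin-1). A short Python sketch, offered as an illustration rather than as part of the original lecture, shows the character-to-code mapping and how the same code means something else in another code set:

```python
# Look up the code assigned to each example character in Latin-1.
codes = {ch: ch.encode("latin-1")[0] for ch in "ABdñ"}

# The same code value belongs to a different character in another code set:
# 241 is 'ñ' in Latin-1 but CYRILLIC SMALL LETTER IO in ISO 8859-5.
cyrillic = bytes([241]).decode("iso8859_5")
```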

Slide 5: Code Sets - Get Your Facts Straight
- The vocabulary pertaining to code sets is often used incorrectly.
- The terms code set and code page are interchangeable.
- Microsoft documentation is confusing regarding code sets.
- Nadine Kano's book helps

Slide 6: Windows "ANSI" is not the real ANSI
- The first version of Windows used ISO 8859-1 (Latin-1) as its code set. Microsoft then introduced 24 extra characters (codes from 0x80 to 0x9F) that are not part of Latin-1.
- This is noticeable in some of the fonts still shipped with Windows: MS Sans Serif has no glyph defined for these code points. The code set for US Windows should be called Windows Latin-1, or code page 1252.
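The difference in the 0x80 to 0x9F range can be seen by decoding the same bytes both ways. This is a small Python sketch using the standard `cp1252` and `latin-1` codecs; the sample bytes are an assumption chosen for illustration.

```python
# Curly quotes around "Hi", as stored by a Windows Latin-1 application.
raw = bytes([0x93, 0x48, 0x69, 0x94])

as_cp1252 = raw.decode("cp1252")    # 0x93 and 0x94 are curly double quotes
as_latin1 = raw.decode("latin-1")   # the same bytes are C1 control characters
```

Text that looks correct under code page 1252 therefore picks up invisible control characters when it is read as true ISO 8859-1.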

Slide 7: "ANSI" not "Windows code set"
- Some documents call the Windows code set "ANSI" even though, on a different localised version of Windows, it is actually the Windows Cyrillic, Windows Greek or Windows Turkish code set.
- Just as documentation uses "OEM" as a generic term for the DOS code page, it should use a generic term for the Windows code set rather than "ANSI."

Slide 8: Don't use 'character sets' or 'charsets' when you mean code sets
- A code set is an implementation of a character set
- Several code sets can implement the same character set. In this case, the list of characters supported is the same, but the codes are different. E.g. UCS-2 and UTF-8 are two different code sets, but they both implement the Unicode character set.
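As a rough illustration in Python: the standard library has no codec named UCS-2, so this sketch assumes `utf-16-be`, which produces the same two bytes as big-endian UCS-2 for any character in the Basic Multilingual Plane.

```python
ch = "ñ"                              # U+00F1 in the Unicode character set

utf8_bytes = ch.encode("utf-8")       # two bytes: 0xC3 0xB1
ucs2_bytes = ch.encode("utf-16-be")   # two bytes: 0x00 0xF1 (stand-in for UCS-2)

# Both byte sequences decode back to the same character.
round_trip_ok = (
    utf8_bytes.decode("utf-8") == ucs2_bytes.decode("utf-16-be") == ch
)
```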

Slide 9: Don't mix up file format and file code set
- People mix up the content and the container: the format of the file and its code set. They will say: "I saved this file in ASCII" when they really mean "I saved this file in Plain text." A plain text file could be in ASCII, but can also contain extended characters.
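A quick way to check whether a piece of plain text really is ASCII is to try encoding it. The helper name `fits_code_set` is hypothetical; the sketch relies only on Python's built-in codecs.

```python
def fits_code_set(text, code_set):
    """Return True if every character of text exists in the given code set."""
    try:
        text.encode(code_set)
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = fits_code_set("plain text", "ascii")    # only ASCII characters
cafe_ascii = fits_code_set("café", "ascii")        # 'é' is not in ASCII
cafe_latin1 = fits_code_set("café", "latin-1")     # 'é' is in Latin-1
```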

Slide 10: Code Set - Families
- DOS
- ISO
- Macintosh
- Windows
- IBM mainframe

Slide 11: Code Sets - Unicode
- Unicode is an international character set
- It covers the principal scripts of the world
- The Unicode standard is the foundation for the internationalisation and localisation of software
- There are three levels of support for Unicode:
  1: Combining characters not allowed
  2: Avoid duplicate coded representations
  3: All combining characters are allowed

Slide 12: Han unification
- To fit the tens of thousands of Chinese, Japanese and Korean ideograms into a 16-bit space (64K code points), Unicode uses Han unification: Japanese and Korean characters that are derived from Chinese characters share a single code with them.
- In many cases the same symbol will mean the same thing.

Slide 13: Character Composition
- To support complex characters with diacritics, Unicode defines a generic way to encode them. Instead of coding the character in whole form, you can encode any character with diacritics as a base character followed by non-spacing marks.
- Character composition is used, for example, to encode the Vietnamese characters.
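Python's standard `unicodedata` module can show composition at work. This sketch takes one Vietnamese letter as an example (the choice of letter is an assumption made for illustration) and splits it into a base character plus non-spacing marks.

```python
import unicodedata

# LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW, coded in whole form.
precomposed = "\u1ec7"

# NFD normalisation decomposes it into a base letter plus combining marks.
decomposed = unicodedata.normalize("NFD", precomposed)
parts = [unicodedata.name(ch) for ch in decomposed]

# NFC normalisation composes the sequence back into the single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
```

Both forms display as the same glyph, so string comparison should normalise the text first.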

Slide 14: Surrogates
- Hopefully you will not have to deal with surrogates. They are the mechanism put in place in Unicode to access the additional planes of ISO 10646. You can see them as "double-bytes," except they are double-wide-chars.
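The surrogate mechanism is simple arithmetic on the code point. The function name `to_surrogates` is hypothetical; the sketch follows the UTF-16 rules and checks the result against Python's own `utf-16-be` codec.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the first plane need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low


cp = 0x20000                           # first code point of plane 2
high, low = to_surrogates(cp)

# Python's UTF-16 codec produces the same pair as two 16-bit units.
encoded = chr(cp).encode("utf-16-be")
```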

Slide 15: Code Sets - Conversion
- Converting from one code set to another is easy when you are only dealing with single-byte code sets.
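For single-byte code sets, a conversion is a decode followed by an encode. The helper name `convert` and the DOS-to-Windows example are assumptions for illustration; `cp437` and `cp1252` are standard Python codecs.

```python
def convert(data, source, target):
    """Re-encode bytes from one code set to another via Unicode."""
    return data.decode(source).encode(target)


dos_bytes = bytes([0x82])                           # 'é' in DOS code page 437
win_bytes = convert(dos_bytes, "cp437", "cp1252")   # 'é' in Windows code page 1252

# A character missing from the target code set cannot be converted:
# 0xB0 in cp437 is a shading block that cp1252 does not contain.
try:
    convert(bytes([0xB0]), "cp437", "cp1252")
    lossless = True
except UnicodeEncodeError:
    lossless = False
```

The easy case only holds when the target code set contains every character in the data, so a real converter needs a policy for the characters it cannot map.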

Slide 16: Screen-based help
- Plain text "Read Me" files
- Tutorial files
- Custom integrated help
- Sample files
- Stand-alone hypertext help

Slide 17: General Guidelines
- Text Expansion
- Jargon, Humor, Use of Gender- or Culture-Related Roles, Characteristics, or Issues
- Consistency with Software, Hardware, and Documentation
- Hypertext Links
- Text Styles and Formatting

Slide 18: General Guidelines cont.
- On-Screen Controls
- File Format

Slide 19: Windows Online Help
- "Title" Footnote Text
- "Keyword List" Footnote Text
- Definitions (Pop-up Topics)

Slide 20: Prototyping: the key to success
- "Effective prototyping may be the most valuable core competence an innovative organisation can hope to have" (Michael Schrage)
- 'Spec-driven' organisations put much effort into developing a specification before proceeding with production
- 'Prototype-driven' organisations begin with an early prototype, then proceed with many iterations

Slide 21: Prototyping: the essential medium of
- Information transmission
- Interaction
- Integration
- Collaboration

Slide 22: Work as play, play as work
- You can 'play your way' to successful, innovative product development
- At odds with traditional management models that champion predictability and control

Slide 23: Supported by research
- Research by Tabrizi & Eisenhardt (Stanford) looked at 72 product development projects in 36 companies in Asia, North America and Europe
- The most effective were those that iterated constantly
- The least effective were the hyper-organised "plan, plan, plan" planners
- Strong prototyping cultures therefore produce strong products