Creating Interfaces: Localization. Language & other issues. Character codes. Homework: preparation for future topics.

Finish presentations: Everyone post constructive comments on at least 2 other projects. (Note: catch up on other postings.)

Many, interconnected issues: Create a web site for use in several specific 'local' places. Create multiple web sites, each for use in a specific place –in an efficient, effective manner, so that any underlying common content does not need to be duplicated (and commonality diluted). Develop tools (networking s/w, standards, etc.) that promote the Web as a "global, interoperable tool of communication".

Localization: not just language, and language is not just character code. –UCS (Universal Character Set) and Unicode, plus many, many related standards, address encoding issues. –dates: the local date and also a way to express the 'western' date –time –money –position on and flow across the page –acceptable images, photography, icons –?
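As a rough illustration of why dates are a localization issue, the sketch below formats one date under three conventions. The pattern table is hand-written for this example (an assumption, not real locale data), and `DATE_PATTERNS` and `format_date` are hypothetical names.

```python
from datetime import date

# Illustrative, hand-written patterns (assumptions, not real locale data).
DATE_PATTERNS = {
    "en-US": "{m}/{d}/{y}",          # month first
    "de-DE": "{d:02d}.{m:02d}.{y}",  # day first, dots
    "ja-JP": "{y}年{m}月{d}日",       # year first
}

def format_date(day: date, locale_tag: str) -> str:
    """Format a date using the hand-written pattern for locale_tag."""
    pattern = DATE_PATTERNS[locale_tag]
    return pattern.format(y=day.year, m=day.month, d=day.day)

sample = date(2005, 3, 4)
us = format_date(sample, "en-US")
de = format_date(sample, "de-DE")
ja = format_date(sample, "ja-JP")
print(us, de, ja)
```

The same day reads as 3/4 in one convention and 04.03 in another, which is why an unlabeled numeric date is ambiguous to an international audience.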

Character code: Note: European languages plus several other 'small' alphabets are easily handled. We/I (typical monolingual American) can hardly appreciate the challenge: –two Chinese character sets (hanzi; 'kanji' is the Japanese term for these characters): modern, i.e. simplified (China), and traditional (Taiwan + most of the Chinese diaspora) –'ruby': small annotation symbols 'over' ideographs

character repertoire: A set of distinct characters. character code: A mapping, often presented in tabular form, which defines a one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. character encoding: A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. In the simplest case, each character is mapped to an integer in the range 0 to 255 according to a character code, and these integers are used as such as octets. Naturally, this only works for character repertoires with at most 256 characters. For larger sets, more complicated encodings are needed. Encodings have names, which can be registered.
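The three terms can be told apart with a small sketch. It assumes Unicode as the character code (Python's `ord`) and uses a tiny made-up repertoire; `repertoire`, `code`, and `octets` are names chosen for illustration.

```python
# Character repertoire: a set of distinct characters (a tiny illustrative one).
repertoire = {"A", "é", "Ж", "中"}

# Character code: each character maps to exactly one nonnegative integer.
# Here the code is Unicode, so ord() gives the code number.
code = {ch: ord(ch) for ch in sorted(repertoire)}

# Character encoding: code numbers are mapped to sequences of octets.
def octets(ch: str, encoding: str) -> list:
    """Return the octets that represent ch in the named encoding."""
    return list(ch.encode(encoding))

latin1_octets = octets("é", "latin-1")  # simplest case: one octet, equal to the code number
utf8_octets = octets("é", "utf-8")      # a more complicated encoding: two octets
utf8_cjk = octets("中", "utf-8")         # code number above 255: three octets in UTF-8

print(code)
print(latin1_octets, utf8_octets, utf8_cjk)
```

The character é has a single code number, but its octets depend on which encoding is chosen.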

charset: Using the terms just defined, the charset attribute in an HTML meta tag names a character encoding, not a character set.
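A minimal sketch of why the declared charset matters: the same octets decode to different text depending on which encoding name is applied. The helper `decode_with_declared_charset` is hypothetical and stands in for what a browser does with the charset value.

```python
# Octets as a server might send them: the text "café" encoded as UTF-8.
payload = "café".encode("utf-8")

def decode_with_declared_charset(octets: bytes, charset: str) -> str:
    """Decode octets using the encoding name that a page declares."""
    return octets.decode(charset)

# Declared charset matches the actual encoding.
correct = decode_with_declared_charset(payload, "utf-8")

# Declared charset does not match: the two UTF-8 octets of é
# are read as two separate ISO 8859-1 characters.
garbled = decode_with_declared_charset(payload, "iso-8859-1")

print(correct, garbled)
```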

Language: The lang attribute of the html tag MAY be used by browsers (spell-check, hyphenation, speech synthesizers), search engines, and other tools. See the two-letter language codes (ISO 639-1).
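A tool that wants to use the language attribute first has to read it. The sketch below does this with Python's standard `html.parser`; the class `LangFinder`, the function `page_language`, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser

class LangFinder(HTMLParser):
    """Remember the lang attribute of the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")

def page_language(html_text: str):
    """Return the declared language code, or None if none is declared."""
    finder = LangFinder()
    finder.feed(html_text)
    return finder.lang

page = '<html lang="fr"><head><title>Bonjour</title></head><body></body></html>'
print(page_language(page))
```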

… more: A glyph is a presentation of a particular shape which a character may have when rendered or displayed. –speak of same glyph in italic, bold, etc. A repertoire of glyphs makes up a font. In a more technical sense, as the implementation of a font, a font is a numbered set of glyphs. The numbers correspond to code positions of the characters (presented by the glyphs). Thus, a font in that sense is character code dependent. An expression like "Unicode font" refers to such issues and does not imply that the font contains glyphs for all Unicode characters.

Examples: ASCII is a character repertoire, code and encoding. Note: confusion about 7 vs 8 bit ASCII. ISO Latin 1, alias ISO 8859-1, is a standard that defines a repertoire, code and encoding of which ASCII is a subset. ISO 8859 is a family of many encodings, distinguished by the -n suffix. ISO 8859-5 handles Cyrillic.
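These relationships can be checked with Python's built-in codecs, as a small sketch. It assumes the codec names `ascii`, `iso-8859-1`, and `iso-8859-5`, which are standard aliases in Python.

```python
# ASCII text produces identical octets under ISO 8859-1, because ASCII is a subset.
ascii_text = "Hello"
same_octets = ascii_text.encode("ascii") == ascii_text.encode("iso-8859-1")

# é is in the Latin 1 repertoire but not in ASCII.
try:
    "é".encode("ascii")
    e_acute_in_ascii = True
except UnicodeEncodeError:
    e_acute_in_ascii = False

# The same octet means different characters in different parts of ISO 8859.
octet = bytes([0xE9])
as_latin1 = octet.decode("iso-8859-1")    # é
as_cyrillic = octet.decode("iso-8859-5")  # щ

print(same_octets, e_acute_in_ascii, as_latin1, as_cyrillic)
```

An octet above 127 carries no meaning until the specific 8859 part is known.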

Unicode: … provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. This is the goal. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
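The "unique number for every character" can be inspected directly, as in this sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for the example.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point in U+XXXX notation followed by the Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin, accented Latin, CJK, and Hebrew (written as an escape to avoid
# right-to-left display confusion in the source).
for ch in ["A", "é", "中", "\u05d0"]:
    print(describe(ch))
```

The number is the same on every platform; how it is stored as octets is a separate question of encoding.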

Note: Unicode's goal is universal coverage… Unicode is the product of a consortium of 'mostly US companies'. There is some controversy over its treatment of certain things –for example, combining certain kanji characters (Han unification)

Unicode consortium: Go to Unicode.html. Examine the Translations on the left. See which languages' characters do not appear on your computer. –Select one and –Go to Display Problems and see if you can fix it.

XML progress: XML 1.0 to XML 1.1. Issue: complaint that the new standard had features to suit IBM. The IBM-specific problem that XML 1.1 aims to fix has to do with a special character that designates to IBM mainframe systems the end of a line of text. XML 1.0 chokes on that character, but version 1.1 would recognize it. –ZDNet News
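The slide does not name the character; the sketch below assumes it is NEL (NEXT LINE, U+0085). It is only an analogy using Python string handling, not an XML parser: one routine treats NEL as a line end and another does not.

```python
# Two lines separated by NEL (U+0085), assumed here to be the character in question.
text = "first line\x85second line"

# str.splitlines() recognizes U+0085 as a line boundary.
aware = text.splitlines()

# bytes.splitlines() only knows \n, \r and \r\n, so the octet 0x85 is not a line end.
unaware = text.encode("latin-1").splitlines()

print(len(aware), len(unaware))
```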

Techniques: One web site / screen provides options to go to different pages. –use symbols/icons that are meaningful to the audience: tricky. Flags may not be appropriate. –use images containing text in the specific language –risky choice: hope that the computer/platform/browser has the character encoding and font to display the language –poor choice: use the English word for the other language. Example of company/site supporting 'global reach'.

Quiz: What is the word in that language for –Spanish –Chinese (Mandarin? Hainese?) –Korean –Japanese –Hebrew –Russian –French –Finnish –Arabic (Classical?, ?) –Hindi (Urdu?, ?) What is the direction of text? What is the format for dates? Time? Money? Relevant cultural issues?

Homework: Next: Accessibility discussion, exercises. Prepare: –download Instant Saxon: standalone translator for XML and XSLT. –download Nokia Mobile Internet Toolkit. Need to register (no cost). –register with studio.tellme.com