Published by Malcolm Jenkins. Modified over 9 years ago.
1
LRC-XI-11th Annual Internationalisation and Localisation Conference
A Paper On
Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach
Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote
Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India
Organised By: Localisation Research Centre (LRC), Department of Computer Science and Information Systems (CSIS), University of Limerick, Limerick, Ireland
2
Agenda
Introduction
  Why Web Page Localisation?
  Borderless Integration
  Why Multilingual Web Sites?
  What is Locale and Multi-Locale Operation?
  Internationalisation and Key Challenges
  I18n Standard: Important Issues and Business Context
  Variance: Regional and Cultural Issues
System Design
  Web Localisation and Rural India
  Localisation Approaches
  Architecture of Servers
System Implementation and Test Results
  Configuration of Server
  Localisation Test Results
  Alternative Approach
Conclusion
References
3
Why Web Page Localisation?
[Diagram: International Market and Customers; Service Sector; Web Localisation; Online Business; Internet; Increased Sales Leads; Advantage of Global Growth; Reduce Marketing Costs; Information Repository; Banking Sector; Open Linguistic Barriers; Closed Linguistic Barriers; Objective; Information; Convenience]
4
Borderless Integration
[Diagram: Model Business Process; Local Business Entities; Customer Integration Logic; Resource Mapping; Global Integration; Deployment; Business Logic; Market Research; Analyse; Optimize Process; Internet Framework]
5
Why Multilingual Websites?
Over 100 million people access the Internet in a language other than English.
Over 50% of web users speak a native language other than English.
According to Forrester Research, 50% of all online sales are expected to occur outside the USA.
Web users are four times more likely to purchase from a site that communicates in the customer’s native language.
“Your website is your window to the world…”
6
Basic Terminology
Locale: The set of features that can be varied depending on the language and culture of the user or the data
Internationalisation: The process of designing software so that it can be easily adapted to different locales
Localisation: The process of adapting software to a locale
7
What is Locale?
A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences.
Java locales are associated with upwards of 300 pieces of data, for example:
  Time zone names
  Collation sequences
  The infinity symbol
  Number formats
  Days of the week
Locales generally do not contain this data themselves. They represent a way of obtaining “localized behavior” in the system.
Locales are generally part of the programming context or environment.
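As a minimal sketch of the point that a locale is only a key for obtaining localised behaviour, the Java snippet below asks the standard library for two of the data items listed on this slide (day names and the infinity symbol). The class name `LocaleDemo` and its helper methods are illustrative and not part of the presented system.

```java
import java.text.DateFormatSymbols;
import java.text.DecimalFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

class LocaleDemo {
    // Looks up the localised name of Monday; the Locale object only selects the data.
    static String mondayIn(Locale locale) {
        return DateFormatSymbols.getInstance(locale).getWeekdays()[Calendar.MONDAY];
    }

    // Looks up the infinity symbol used by number formats for the given locale.
    static String infinityIn(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getInfinity();
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.forLanguageTag("mr-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag()
                    + " | " + locale.getDisplayName(Locale.ENGLISH)
                    + " | Monday = " + mondayIn(locale)
                    + " | infinity = " + infinityIn(locale));
        }
    }
}
```

The exact strings printed depend on the locale data shipped with the Java runtime in use.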
8
Multi-Locale Operation
[Diagram: Server Processes; Message Passing; Logic Execution; Client Locale; System Context; Context Separation]
Design Policy: APIs provide late binding localisation
9
Internationalisation
"I18n" is an abbreviation for the word "Internationalisation".
The term "i18n" is derived from its spelling: the letter "i", plus 18 letters, plus the letter "n".
i + nternationalisatio (18 letters) + n
The extension of this naming convention to the terms Localisation (l10n), Europeanisation (e13n), Japanisation (j10n) and Globalisation (g11n) seems to have come somewhat after the invention of "i18n".
An internationalised system can potentially handle the world's many languages and customs:
  Displaying and inputting characters for the users' native languages
  Handling popular encodings for the users' native languages
  Native characters for file names and other items
  Character classification and sorting
  Typesetting and hyphenation rules
10
Key Challenges
Unicode support and implementation
Use of language specific encoding
Configuring encoding
Encoding and Character Set
Availability, Performance
Continuity of i18n features
Translation
Locale and Parameterisation
UI design
Handling collation
Migration of existing data
Presentation, Processing
Standards
Data Correspondence
Reference Information
11
Important Issues in I18n
Currency
Language rules
UI preferences
Localization
Culture context
Date/Time
Character encodings
Business impact
Content management
12
Business Context of I18n
[Diagram: Internationalisation; Old Application; New Product; Existing Service]
To improve the effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.
To reach out to a global customer base by providing language/culture specific interfaces and allowing for international preferences.
Mergers / Acquisitions.
To consolidate same-functionality applications/services developed and maintained separately for separate languages/regions.
To support region specific functionality (due to legal aspects, financial practice etc.).
To provide region specific value added services (like UI, look and feel, Sorting/Searching).
13
Regional and Cultural Differences
Software solutions should be designed to fit into the cultural context of the user.
Examples:
  Naming of the product
  Differences in the meaning of jargon
  Confusing graphical symbols
  National rules and conventions
  Religious beliefs and assumptions
  Basic cultural values and customs
  No appropriate translations available for phrases and slogans
  Favourite sports and slang
  Cultural anachronisms
  Reading left-to-right, top-to-bottom, etc.
Cultural context covers far more than just language and notational conventions: it ranges from basic language to slang, national conventions, and so on.
Most companies outsource localization of their products to the local market to fit into the cultural context of the user.
14
Language and Character Encoding
Language peculiarities: Hyphenation, Collation, Spelling, Transliteration
  English: ABC...RSTUVWXYZ
  German: AÄB...NOÖ...SßTUÜV…YZ
  Swedish/Finnish: AB...STUVWXYZÅÄÖ
  Norwegian: AB…VWXYÜZÆØÅ
There are various “standards”, and they vary between languages:
  ISO standards: ISO ,2,3,4,5,6,7, Windows-1252
  Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312
  Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR
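The alphabet orders on this slide can be observed with `java.text.Collator`. The sketch below sorts the same three words under German and Swedish rules; the class name `CollationDemo` and the sample words are illustrative assumptions, and the resulting order depends on the collation rules shipped with the Java runtime.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationDemo {
    // Returns a copy of the words sorted with the collation rules of the given locale.
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00c4rger" is "Ärger", written with an escape so the source file stays ASCII.
        String[] words = { "Zebra", "\u00c4rger", "Apfel" };
        System.out.println("German:  " + Arrays.toString(sortFor(Locale.GERMAN, words)));
        System.out.println("Swedish: " + Arrays.toString(sortFor(Locale.forLanguageTag("sv"), words)));
    }
}
```

German rules are expected to place Ä next to A, while Swedish rules place it after Z, which is why one sort order cannot serve both audiences.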
15
Unicode Character Standard
Developed by the Unicode Consortium
Covers all major living scripts
Version 4.0 has 96,000+ characters
Capacity for 1 million+ characters
Unicode Character Set = ISO 10646
Unicode adds character properties and algorithms
ISO and Unicode work together to synchronize
ISO support enhances international acceptance
16
Date / Time Formats Variance
Locale  Example  Format
U.S.A.  2/16/05  mdy, /
France  dmy, .
dmy, -
CJKT  2005/2/16  ymd, /
Japan  17/2/16  ¥md, /
Hour and minute separators, AM/PM, time zone:
  India: 4:00 P.M.
  U.S.A.: 4:00 p.m.
  France: 16.00
  Japan: 1600
  Japan: 4:00
Dates and times use different formats around the world.
In Japan, if AM (Gozen) or PM (Gogo) are used, they are positioned before the time, not after it.
There is more information on Japanese Emperor dates at
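A short sketch of how the same date can be rendered per locale with `java.text.DateFormat`. The class name `DateFormatDemo` and the choice of locales are illustrative; the exact patterns produced vary with the locale data of the Java runtime, so no output is listed here.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateFormatDemo {
    // 16 February 2005, the sample date used on the slide.
    static Date sampleDate() {
        return new GregorianCalendar(2005, Calendar.FEBRUARY, 16).getTime();
    }

    // Formats a date in the short style of the given locale.
    static String shortDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    // Formats a time of day in the short style of the given locale.
    static String shortTime(Date date, Locale locale) {
        return DateFormat.getTimeInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.JAPAN };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + shortDate(sampleDate(), locale) + "  " + shortTime(sampleDate(), locale));
        }
    }
}
```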
17
Numbers / Currency Variance
Varieties in group and fractional separators:
  India: 12,34,567.89
  England: 12,345.67
  Germany: ,67
  Switzerland: 12’345,67
  Swiss money: 12’345.67
  France: ,67
Varieties in symbol placement, symbol length, precision, number width, rounding rules:
  India: Rs. 12,34, ; Re. 1
  U.S.A: US $1,234,567.89
  France: ,67 €
  Portuguese: $34ESC
  Portuguese: $34€
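The separator and symbol differences above are what `java.text.NumberFormat` abstracts away. The following is a minimal sketch; the class name `NumberFormatDemo` and the sample values are illustrative, and results for some locales differ between Java runtimes.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatDemo {
    // Formats a plain number with the grouping and decimal separators of the locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats an amount with the currency symbol, placement and precision of the locale.
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.UK, Locale.GERMANY, Locale.FRANCE,
            Locale.forLanguageTag("de-CH"), Locale.forLanguageTag("en-IN")
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + number(1234567.89, locale) + "   " + currency(1234567.89, locale));
        }
    }
}
```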
18
System Design
19
Indian Languages Profile
20
Percentage Languages Usage Index
Data Source: 2001 Census of India
Language           Number         Percentage
Hindi              337,272,114    40.22%
Bengali             69,595,738     8.30%
Telugu              66,017,615     7.87%
Marathi             62,481,681     7.45%
Tamil               53,006,368     6.32%
Urdu                43,406,932     5.18%
Gujarati            40,673,814     4.85%
Kannada             32,753,676     3.91%
Malayalam           30,377,176     3.62%
Oriya               28,061,313     3.35%
Punjabi             23,378,744     2.79%
Assamese            13,079,696     1.56%
Sindhi               2,122,848     0.25%
Nepali               2,076,645     0.25%
Konkani              1,760,607     0.21%
Manipuri             1,270,216     0.15%
Kashmiri                56,693     0.01%
Sanskrit                49,736     0.01%
Other Languages     31,142,376     3.71%
Total              838,583,988   100.00%
21
Indian Currency Example
Indian Currency (Value Rs. 10)
Population residing in villages of India: 70%
Total number of languages in India: 40
Official languages: 22
Language Panel
Overall Literacy Rate: %
English Language Literacy: %
15 major Indian Languages
22
Information Channelisation
Internationalisation: Prepare material for localisation (account for text expansion, avoid embedded text, etc.)
Text Extraction: Extract text from source files (graphics, PDFs, etc.)
Translation: Translate content from the extracted materials
Localisation: Replace graphics, change colors, redesign layout to accommodate the target culture
23
Localisation Process
Web page is “dynamically” converted into the target language
Language selection
Static web page is selected and displayed
Translation
Localisation
Site Acceptance Factors: Color, Image Representation, Translation Errors
Text Placement in Separate File
Late Binding
Mapping Techniques
24
Server Architecture
[Diagram: Client Browser_1, Client Browser_2, Client Browser_3, ... Client Browser_n; Socket API; Parse Request Module; HTML Server; Property File; Default / Alternative Language Response; Localised Content]
25
Implementation: Parse Request Module
Definition: To parse the request header
Responsibilities: To analyze and forward the request; to provide a log to the administrator
Compositions: Main server loop, threads
Interfaces/Ports: Socket APIs
26
Parse Request Module Architecture
Main Server Loop Thread 1 Thread 2 Thread 3 Thread 4 Thread 5 Thread n
27
HTML Server
Definition: Default implementation of the HTTP protocol; processes static HTML requests
Responsibilities: Process static HTML requests; process dynamic internationalisation requests
Compositions: Server processes
Interfaces/Ports: Socket APIs
28
HTML Server Architecture
[Diagram: Parse Protocol (GET/POST); GET Request Processor; POST Request Processor; .properties; Default Language / Alternative Language; Static Response]
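The slides do not say how the server decides between the default and an alternative language. One possible approach, shown here purely as an assumption, is to read the HTTP `Accept-Language` request header and match it against the languages the server has property files for. The class `LanguageNegotiation`, its `SUPPORTED` list (taken from the languages in the test screenshots) and the English default are hypothetical; `Locale.LanguageRange.parse` and `Locale.lookup` are standard Java 8 APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LanguageNegotiation {
    // Assumed default language and assumed set of languages with a property file.
    static final Locale DEFAULT = Locale.ENGLISH;
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"), Locale.forLanguageTag("es"),
            Locale.forLanguageTag("nl"), Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("it"), Locale.forLanguageTag("pt"),
            Locale.forLanguageTag("de"), Locale.forLanguageTag("mr"));

    // Picks the best supported locale for an Accept-Language header value,
    // falling back to the default when the header is missing, invalid or unmatched.
    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT;
        }
        try {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : DEFAULT;
        } catch (IllegalArgumentException e) {
            return DEFAULT; // malformed header: serve the default language
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("fr-FR,fr;q=0.9,en;q=0.8"));
        System.out.println(choose("mr-IN,en;q=0.5"));
        System.out.println(choose("ja"));
    }
}
```

A server that lets the user pick the language explicitly could skip this step and pass the selected language tag straight to the property file lookup.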
29
System Implementation and Test Results
30
Java Support for Internationalisation
The Locale class lets applications identify locales, allowing for truly multilingual applications.
The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container.
The Date, Calendar, and TimeZone classes provide the basis for time handling around the globe.
The String and Character classes, as well as the java.text package, contain rich functionality for text processing, formatting, and parsing.
Text stream input and output classes support converting text between Unicode and other character encodings.
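A minimal sketch of text kept in property files and bound late to a page. To stay self-contained it builds `PropertyResourceBundle` objects from in-memory strings; in a deployed server, `ResourceBundle.getBundle("Messages", locale)` would instead load files such as `Messages_fr.properties` from the classpath. The class `BundleDemo`, the keys, the translations and the HTML fragment are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleDemo {
    // In-memory stand-ins for Messages.properties, Messages_fr.properties, Messages_es.properties.
    private static final Map<String, String> FILES = new HashMap<>();
    static {
        FILES.put("", "title=Welcome\nsubmit=Submit\n");
        FILES.put("fr", "title=Bienvenue\nsubmit=Envoyer\n");
        FILES.put("es", "title=Bienvenido\nsubmit=Enviar\n");
    }

    // Returns the bundle for the locale's language, or the default bundle if none exists.
    static ResourceBundle bundleFor(Locale locale) {
        String text = FILES.getOrDefault(locale.getLanguage(), FILES.get(""));
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fills a tiny HTML fragment with the localised strings at request time.
    static String render(Locale locale) {
        ResourceBundle bundle = bundleFor(locale);
        return "<h1>" + bundle.getString("title") + "</h1>"
                + "<button>" + bundle.getString("submit") + "</button>";
    }

    public static void main(String[] args) {
        System.out.println(render(Locale.ENGLISH));
        System.out.println(render(Locale.FRANCE));
        System.out.println(render(Locale.JAPAN)); // no Japanese bundle, so the default is used
    }
}
```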
31
Conversion Process
Character conversion is a straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input consists only of characters or bytes that have mappings.
In reality:
  A single character in a non-Unicode encoding may have multiple equivalent representations (say, a precomposed character and a sequence of base character and combining mark).
  A character in one encoding may not have an equivalent in the other encoding.
  An invalid sequence of bytes or characters may show up in the input.
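The three complications above can each be reproduced with standard Java classes. The sketch below is illustrative only: the class `ConversionDemo`, its helper names and the sample characters are assumptions, while `Normalizer`, `CharsetEncoder.canEncode` and `CodingErrorAction.REPORT` are standard APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class ConversionDemo {
    // Compares two strings after canonical composition, so "e" + combining acute equals "é".
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // True if every character of the text has a mapping in the target charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Decodes strictly: returns null instead of silently substituting bad input.
    static String decodeStrict(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9";   // é as one code point
        String decomposed = "e\u0301";   // e followed by a combining acute accent
        System.out.println("raw equals: " + precomposed.equals(decomposed));
        System.out.println("normalised equals: " + sameText(precomposed, decomposed));

        // Devanagari letter MA has no equivalent in ISO-8859-1.
        System.out.println("encodable: " + canEncode("\u092e", StandardCharsets.ISO_8859_1));

        // 0xC3 must be followed by a continuation byte in UTF-8; 0x28 is not one.
        byte[] invalid = { (byte) 0xC3, (byte) 0x28 };
        System.out.println("strict decode: " + decodeStrict(invalid, StandardCharsets.UTF_8));
    }
}
```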
32
Process: Configure Server
33
Process: Register
34
Process: Log
35
Process: Localise Servlet
36
Web Page in English with IE
37
Web Page in Spanish with IE
38
Web Page in Dutch with IE
39
Web Page in French with IE
40
Web Page in Italian with IE
41
Web Page in Portuguese with IE
42
Web Page in German with IE
43
Web Page in English with IE
44
Web Page in Marathi with IE
45
Conclusion
The Java localisation APIs come in handy for dynamically localising a web page into alternative languages.
The rich set of Java class libraries, such as java.util.ResourceBundle and java.util.Locale, provides an efficient approach to working with locale specific information.
More manageable workspace for users in their native language.
Regional settings, colour, and image representation are not disturbed.
Improves the effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.
Supports region specific functionality (due to legal aspects, financial practice etc.).
Provides region specific value added services (like UI, look and feel, Sorting/Searching).
Consolidates same-functionality applications/services developed and maintained separately for separate languages/regions.
46
References
[1] Fernandez, N. C. (2000), Web Site Localisation and Internationalisation: A Case Study, published, City University
[2] Khachane, J. (2005), Web Page Localisation, published, Pune University
[3] DePalma, D. A. (1999), Strategies for Global Sites, Forrester Research Inc, May 1998 and The eBusiness Report. In: eMarketer
[4] Roche, M. (2000), Managing Multilingual Web Applications, 16th International Unicode Conference, Amsterdam
[5] Nielsen, J. (1999), Designing Web Usability, Indianapolis: New Riders Publishing
[6] Deitsch, Loukides, M, Java Internationalisation