By: Swaran Lata, Country Manager, W3C India Office, 6 CGO Complex, Electronics Niketan, New Delhi

Presentation transcript:

Localization: Taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.

Internationalization: The process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign.

• Designing and developing in a way that removes barriers to localization or international deployment.
• Providing support for features that may not be used until localization occurs.
• Enabling code to support local, regional, language, or culturally related preferences.
• Separating localizable elements from source code or content, such that localized alternatives can be loaded or selected based on the user's international preferences as needed.
• It can be localized quickly.
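The separation of localizable elements from source code can be sketched as follows. This is a minimal illustration, not a method taken from the presentation: the `CATALOGS` table, the `translate` helper, and the fallback order (full locale, then language, then a default) are all assumptions made for the example.

```python
# Hypothetical message catalogs. In a real product these would live in
# separate resource files, one per locale, outside the source code.
CATALOGS = {
    "en": {"greeting": "Welcome", "farewell": "Goodbye"},
    "hi": {"greeting": "स्वागत है"},
}

DEFAULT_LOCALE = "en"  # assumed fallback locale


def translate(key: str, locale: str) -> str:
    """Return the message for `key`, trying the full locale, then its
    language part, then the default locale. Falls back to the key itself."""
    language = locale.split("-")[0]
    for candidate in (locale, language, DEFAULT_LOCALE):
        text = CATALOGS.get(candidate, {}).get(key)
        if text is not None:
            return text
    return key


print(translate("greeting", "hi-IN"))   # Hindi text from the "hi" catalog
print(translate("farewell", "hi-IN"))   # not yet localized, so English is used
```

Because the code only ever asks for a key, a new language can be added by supplying another catalog, without touching the program logic.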

• Gather information globally
• Enabling: the same code supports multiple regions or cultures
• Externalize: makes localization for specific languages, regions
• Customize: add culturally specific functionality, presentation, or content to an application
• Test and support globally
• Localize

Internationalized application to be localized → Localization activities → Localized application

• India is a multilingual, multi-script country with 22 languages and 11 scripts and a population of over 1 billion.
• Less than 5 percent of people can read and write English. Over 95 percent of the population is deprived of the benefits of English-based information technology.

Issues regarding Indian languages
• Orthography: spelling issues
• Pronunciation: may be directly mapped, but not always
• One script, many languages
• One language, many scripts

The Tree of Localization Complexities
• Presentation of dates, times, numbers, lists, and other values
• Collation and sorting
• Alternate calendars, which may include holidays, work rules, weekday/weekend
• Currency
• Tax or regulatory regime
• Machine translation
• Optical character recognition
• Speech technologies
• Cross-lingual information retrieval
• Project management
• Translation memory
• Translation tools
• Natural language for text processing: parsing, spell checking, grammar checking, etc.
• Automatic testing tools
• Encoding standards
• Multimodal input device standards
• Fonts and rendering engines
• Transliteration and translation
• Guidelines
• Best practices
• Case studies
• Consultancy
• Showcasing of tools and technologies
• Parallel corpora
• Speech corpora
• Lexical resources
• Ontologies
• Dictionaries
• Thesaurus
• Reference terminologies
• Certified localization professionals
• PG specialization in localization
• PhD programmes
• Minimizing time lag
• Benchmarking w.r.t. the English version
• Political sensitivity
• Pricing issues
• Testing methodologies
• Metrics for linguistic testing
• Certification by Government for linguistic compliance
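As one small illustration of locale-specific number presentation, the sketch below groups digits in the Indian style (the last three digits, then groups of two) and optionally renders them with Devanagari digits. The function names are hypothetical and the example is not from the presentation. Production code would normally take these rules from locale data such as CLDR rather than hard-coding them.

```python
def group_indian(n: int) -> str:
    """Format an integer with Indian digit grouping, e.g. 1234567 -> 12,34,567."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Everything before the last three digits is split into pairs from the right.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F.
DEVANAGARI_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0966 + i) for i in range(10))
)


def to_devanagari_digits(text: str) -> str:
    """Replace ASCII digits with Devanagari digits, leaving other characters alone."""
    return text.translate(DEVANAGARI_DIGITS)


print(group_indian(1234567))                        # 12,34,567
print(to_devanagari_digits(group_indian(1234567)))  # same grouping, Devanagari digits
```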

The W3C Internationalization (I18n) Activity works with W3C working groups to make it possible to use Web technologies with different languages, scripts, and cultures. Its aim is to ensure that W3C's formats and protocols are usable worldwide in all languages and in all writing systems. The Internationalization (I18n) Activity statement explains concepts relating to internationalization, as well as the current situation and the role of the Internationalization Activity within the W3C.

• Internationalization of Web Design & Applications
• Character Model for the World Wide Web
• Authoring Techniques for XHTML & HTML
• Authoring CSS
• Unicode in XML
• Internationalization of Web Architecture
• Language Tags and Locale Identifiers
• Internationalization Tag Set
• Internationalization of XML
• Best Practices for XML Internationalization
• Internationalization of Web Services
• Language Tags and Locale Identifiers for the World Wide Web

• Internationalization
• Internationalization Tag Set
• Web Design and Applications
• Styling
• HTML
• XHTML
• WAI
• Web Architecture
• XML
• Semantic Web
• OWL and RDF
• XML Technology
• XML associated standards
• Web of Services
• SOA
• Web of Devices
• Mobile Web Initiative
• Voice
• E-Government
• Use cases

Challenges:
• Adopting the right encoding scheme
• Availability on handsets
• Usability of the mobile Web browser
• Web support for all Indian languages
• Study of the specific requirements of Indian languages for W3C Mobile Web Best Practices
• Must support standards and specifications
• Access to all handset features

Issues on Mobile Web
• Character encoding
• Bandwidth and cost
• Presentation issues
• Input
• Device limitations
• Lack of standardization
• Fonts
• Backward compatibility with legacy devices
• Rendering issues

Messaging Issues
• Lack of availability for all characters.
• There is no guarantee that an encoded message will be displayed properly at the receiving terminal.
• The issue of multiple scripts for one language is not addressed.
• Standardization of glyph support and syllable composition logic is also important, and it depends on the handset manufacturer's implementation.
• Legacy systems
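The link between character encoding and bandwidth can be made concrete with a short sketch. The sample word and the helper names are assumptions chosen for illustration. The point is only that the same Hindi text occupies a different number of bytes under different Unicode encodings, and that a legacy ASCII-only channel cannot carry it at all.

```python
SAMPLE = "नमस्ते"  # illustrative Hindi word, six Unicode code points


def encoded_sizes(text: str) -> dict:
    """Return the number of code points and the byte length in two encodings."""
    return {
        "code_points": len(text),
        "utf-8": len(text.encode("utf-8")),       # 3 bytes per Devanagari code point
        "utf-16": len(text.encode("utf-16-be")),  # 2 bytes per Devanagari code point
    }


def can_encode(text: str, encoding: str) -> bool:
    """Report whether `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(encoded_sizes(SAMPLE))
print(can_encode(SAMPLE, "ascii"))  # False: a legacy ASCII-only path loses the text
```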

Issues in Mobile Keypads
• Multi-tap issues
• Too many taps per key for each character
• No way to know which character is on which key
• Dictionary-based issues
• Difficult to learn and operate for the target segments
• Different spellings exist for the same word, for example मुर्ती, मुरती, मूर्ति, and even मुरथी, with many permutations. Which one should be mapped?
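The spelling-variant problem for dictionary-based input can be illustrated with the four variants listed above. The sketch below is an illustration only. It applies Unicode NFC normalization and shows that the variants remain four distinct code-point sequences, so normalization alone cannot decide which spelling a dictionary should map to.

```python
import unicodedata

# The four spellings quoted in the slide.
VARIANTS = ["मुर्ती", "मुरती", "मूर्ति", "मुरथी"]


def code_points(word: str) -> list:
    """List the code points of `word` after NFC normalization."""
    normalized = unicodedata.normalize("NFC", word)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# Normalization removes encoding-level differences, not spelling differences.
distinct_spellings = {unicodedata.normalize("NFC", w) for w in VARIANTS}

for word in VARIANTS:
    print(word, code_points(word))
print(len(distinct_spellings), "distinct spellings after normalization")
```

A dictionary-based input method would therefore need an explicit, linguistically vetted variant table, which is outside the scope of this sketch.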

Roadmap:
• Character encoding as per the Unicode Standard
• Enable the mobile Web in Indian languages
• Initiation of study for Mobile Web Best Practices 1.0 with respect to requirements for 4 Indian languages: Hindi, Bangla, Marathi, Tamil

Suggestions
• In terms of internationalization, operators must support an appropriate character encoding on the signaling channel, one that allows all characters of the world to be represented.
• The major issues in enabling the mobile Web in Indian languages need to be investigated and studied.
• Standardization of mobile media also needs to be addressed, taking into consideration the specific requirements of each Indic language.

Roadmap: Initiation of study for PLS (Pronunciation Lexicon Specification) 1.0 with respect to requirements for 3 Indian languages: Hindi, Bangla, and Tamil
Issues:

Drop Letter in Indian languages
• Issues for Indian languages with respect to the first character, as used in Hindi, Malayalam, Bengali, Telugu, Gujarati, etc.
Issues:

• Underlining of characters
• In some Indian languages, matras are not readable when characters are underlined.

• Vertical arrangements
• Formatting issues: horizontal justification
• Bullets and numbering

• Indentation of characters

Challenges:
• Implementation of the CSS standards developed by W3C with regard to Indian languages
• Standards, however, need to be provided to those developing CSS so that users have, by default, the facility to use bulleting in their own Indic language.

Roadmap: Initiation of study for CSS 2.0 specifications with respect to requirements for 5 Indian languages: Hindi, Bangla, Punjabi, Kannada, Tamil

Issues:
• Some applications are completely in English.
• Some applications have static content in the local language but forms in English.
• Some applications are multilingual but only in limited languages (e.g., English and only one local language).

Use cases: Use Case for Land Records
• All types of data related to land are available in these land records.
• They are used for various planning processes, although manual maintenance of the records hinders effective collation and analysis of the data they contain.

Roadmap :  Major E-Gov Application in Indian languages need to be studied for improving access to Government through better use of the web. Target languages : Hindi, Bangla, Marathi,Telugu, Tamil Challenges : • To enable e-governance applications in Indian languages. • Compliance with W3C standards.

Screenshots of CLDR updates: CLDR Hindi

Screenshots of CLDR updates: CLDR Bengali

Screenshots of CLDR updates: CLDR Malayalam, CLDR Assamese

Draft locale data for Hindi :

Draft locale data for Bengali :

Draft locale data for Assamese :

Draft locale data for Malayalam :

Challenges:
• Make Web content accessible to people with disabilities w.r.t. Indian languages
• WCAG 2.0 guidelines for success criteria vis-a-vis selected recommendations relevant to the Indian context

Initiative in India:
• "Guidelines for Indian Government Websites" by NIC, Govt. of India
• STQC implementing WCAG 2.0 accessibility through Website Quality Certification
• Centre for Internet and Society developing an authorized translation of the WCAG 2.0 guidelines

Roadmap:
• Meet WCAG 2.0 guidelines and techniques w.r.t. Indian languages
• Initiation with Hindi, Bangla, Marathi, Telugu, Tamil

Issues
• Spoofing issues: homographs
• Characters look similar in the address bar, e.g. क and फ (कमल vs. फमल)
• No two scripts should be mixed
• Generic rules are required, with added restrictions as each language demands
• Spelling variants, e.g.
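The rule that no two scripts should be mixed in a label can be sketched as a simple check. This is a rough illustration, not a policy implementation. Python's standard library does not expose the Unicode Script property, so the sketch approximates a character's script by the first word of its Unicode name, which is an assumption that holds for common letters but is not guaranteed in general. The function names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in `label`.

    Heuristic: the first word of a letter's Unicode name (for example
    DEVANAGARI or LATIN) stands in for its script.
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """Return True when all letters in `label` appear to share one script."""
    return len(scripts_in(label)) <= 1


print(scripts_in("कमल"))         # one script
print(is_single_script("कmल"))   # False: Devanagari mixed with Latin
```

A check like this catches cross-script spoofing only. Lookalikes within one script, such as क and फ in the example above, need separate variant rules.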

• Browser-related issues
• No backward compatibility
• Conversion from Unicode to Punycode is available in IE7 onwards
• Firefox directly converts Unicode to Punycode
• Some rendering issues in different browsers for different Indian languages
• Vertical conjuncts

IDN draft policy for Indian languages
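The Unicode-to-Punycode conversion that browsers perform can be approximated with Python's built-in `punycode` codec. This is a simplified sketch: the helper names are hypothetical, and it deliberately skips the normalization and validation steps that full IDNA processing requires, so it should not be treated as a complete IDN implementation.

```python
ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded domain label


def to_ace(label: str) -> str:
    """Convert one domain label to its ASCII-compatible form.

    ASCII labels pass through unchanged. Other labels are Punycode-encoded
    and prefixed. No IDNA normalization or validation is applied here.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Convert an ASCII-compatible label back to Unicode."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


encoded = to_ace("कमल")
print(encoded)            # an ASCII string starting with xn--
print(from_ace(encoded))  # round-trips to the original label
```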

Language experts from state government examined the Script Grammar of Bangla, Marathi, Gujarati, Konkani and Maithili. Some of the screen shots are shown below: Script Grammar – Bangla

Script Grammar – Gujarati

Script Grammar – Marathi

• Multiplicity of languages
• Evolution of orthography
• Lack of standardization
• Enabling the mobile Web in Indian languages
• Initiation of study of W3C Recommendations with respect to requirements for Indian languages
• Adoption of W3C standards in terms of internationalization
• Indian websites should be fully W3C compliant
• E-government applications in Indian languages need to be studied for better use of the Web.

Language Tag
• Initiative in vetting, modifying, and developing language tags for all 22 official Indian languages, as well as for regional dialectal variations of Indian languages.

CLDR
• Six languages are finalized in CLDR: Hindi, Nepali, Bengali, Assamese, Malayalam, and Gujarati. Other languages are in process.

Revised Inscript Keyboard Layout
• Enhanced to incorporate additional characters as per Unicode 5.1. C-DAC, IBM, Microsoft, and Red Hat are involved in this initiative.
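To show what a language tag carries, the sketch below splits a tag into language, script, and region subtags. It is a deliberately simplified reading of the BCP 47 syntax: variants, extensions, and private-use subtags are ignored, and subtags are not checked against the registry. The pattern and the function name are assumptions made for illustration.

```python
import re

# Simplified pattern: language, optional script, optional region.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_language_tag(tag: str):
    """Return (language, script, region) in conventional casing, or None
    when the tag does not fit the simplified pattern."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    language = match["language"].lower()
    script = match["script"].title() if match["script"] else None
    region = match["region"].upper() if match["region"] else None
    return (language, script, region)


print(parse_language_tag("hi-IN"))       # ('hi', None, 'IN')
print(parse_language_tag("bn-Beng-IN"))  # ('bn', 'Beng', 'IN')
```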

Thank You