Swaran Lata, Director and HoD Technology Development for Indian Languages Programme (TDIL) Dept of Information Technology, Govt. of India.

1 Swaran Lata, Director and HoD slata@mit.gov.in Technology Development for Indian Languages Programme (TDIL) Dept of Information Technology, Govt. of India

2 Organization of Presentation
  India – cultural diversity
  Linguistic Diversity in India
  Present Knowledge Society and Indian Scenario
  ICT scenario in India
  Internet penetration – Haves and Have-Nots
  Mind-set – still an inhibition
  Bridging the gap – service delivery – reaching the citizens' doorsteps
  Localization – key enabler
  Challenges and Issues
  TDIL's efforts
  National Roll Out Plan – a big step forward
  Localization of Applications
  Putting Standards in place
  Collaboration and Hand-holding

3 India – A civilization more than 5000 years old
  Vast ancient knowledge base
  Diverse culture and heritage – probably one of the most spectacular in the world
  One of the largest economies in the present world
  Rapid strides in information and communications technology
  Yet: a widening knowledge divide amongst various strata of citizens

4 Linguistic Diversity in India
  According to Census 2001, India has 122 major languages and 2371 dialects.
  Of the 122 languages, 22 are constitutionally recognized.
  Linguistic diversity in India is very rich and wide:
  One language – many scripts
  Many languages – one script
  Cultural differences by region, even where different languages use the same script
  Wide differences even for the same language across different countries

5 Same script (Devanagari), but content-wise variation between Hindi and Marathi, depicting cultural and linguistic differences
  [Figure: sample content in Marathi and Hindi]

6 Present ICT scenario in India
  Despite a reputation as an emerging technology powerhouse, India’s scores on the 2009 Connectivity Scorecard are poor in the vital consumer and business segments. These poor scores should not be surprising, since many of the individual metrics that we utilise are effectively measuring “penetration rates.” This means that India is judged as a whole, and not by the pockets of ICT excellence that it undoubtedly possesses. India scores especially low on broadband and Internet penetration rates. Broadband penetration in India is below 2 percent of households compared to 20 percent of households or more in Turkey, Chile, and Mexico. On the consumer usage front, India is not a strong performer in terms of Internet usage, with below 10 percent of the population regularly using the Internet. The country is hampered by a relatively low literacy rate.

7 India still in the low broadband penetration region
  [Figure: Global broadband divide. Source: http://www.itu.int]

8 Low rural tele-density compared to urban tele-density

9 Mind-set: Still favouring English as the medium of excellence
  English and Hindi serve as link languages.
  Learning English is viewed as a passport to better economic and social prospects. Even people from low-income strata now hold this view.
  Owing to the recent surge in ICT and ICT-enabled services, English has become the 2nd most common medium of instruction from school level.
  Study by the National University of Educational Planning and Administration (NUEPA): under Sarva Shiksha Abhiyan, the number of students opting for English grew by 150% between 2003 and 2008, while the corresponding figure for Hindi is only 32%.
  Example: Uttar Pradesh, West Bengal and .. are now using English as the medium of instruction in schools and colleges.

  Primary school students in English-medium schools (in lakhs)
  State     2005-06   2007-08   Growth (%)
  Haryana      0.19      1.56      721
  WB           0.29      2.31      704
  Punjab       0.93      2.78      197
  UP           0.12      0.37      193
  India       52.00    153.70      196

10 Result:
  Although Hindi (ranked 3rd) and Bengali (ranked 8th) are among the top 10 languages spoken across the world, no Indian language is among the top 10 languages used on the Internet.
  Minuscule Internet usage in Indian languages
  Confinement of knowledge
  Low usage of knowledge sources and applications

11 UNESCO’s VISION for Multilingualism in Cyberspace
  Language constitutes the foundation of communication and is fundamental to cultural and historical heritage.
  Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development.
  Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one’s language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society.
  Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues.
  Huge sections of the world’s population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development.

12 An uneven growth
  The Indian software export industry is growing its global presence at a very fast pace.
  However, the root is not expanding its base within the country.
  Fallout: domestic requirements in Indian languages are not being looked into within the country.
  Result: information and knowledge are unavailable to a vast section of citizens.
  [Figure labels: Expanding software export; Low penetration in Indian market]

13 Requirements:
  Reaching out to the doorsteps of citizens, offering better services for wider dissemination of knowledge.
  Localization of software solutions, content and services as per local requirements.

14 Common Services Centre – Its objectives
  CSC is a strategic cornerstone of the National e-Governance Plan (NeGP): the front-end service interface for major G2C services.
  CSC is one of the three infrastructure pillars of e-governance which the government is committed to building, to ensure “anytime anywhere” web-enabled delivery of government services.
  To provide e-governance services
  100,000 CSCs for 600,000 village clusters
  To cater to the service needs of major rural areas
  Being implemented in PPP model

15 Local Language Interface – Not merely desirable but an essential component
  The success of CSC hinges upon effective delivery of the G2C applications to the rural masses.
  Since most citizens communicate in their local languages, a local language interface to G2C solutions at CSC is essential.
  Hosting content in local languages helps citizens to interact better in today’s knowledge society.
  Thus, a local language interface is “not merely desirable but an essential component”.

16 NeGP – Mission Mode Projects
  [Diagram labels: Land Records, Road Transport, Police, Land Regn, Treasuries, Comrl Taxes, Agriculture, Gram Pts, Municipalities, Employment Exchanges, Civil Supplies, Education, Income Tax, Passport, Visa, MCA21, Insurance, Banking, National ID, Central Excise, Pensions, GIS, e-Posts, Common Service Centres, Gateway, e-Procure, e-Office, eBiz, EDI, e-Courts, India Portal, Core Policies]
  Initiatives already taken to enable G2C applications such as Land Records, Civil Supplies and Municipal applications with Indian Language Interface

17 Service Delivery Model of CSC Requires Language Interface

18 Localization Requirements for Service Delivery Applications
  To ensure seamless access to services, a language component (localization and interface) is required at:
  Storage level – server end
  Data exchange – traffic (language tags need to be properly embedded)
  Display and rendering
  Language interface for differently-abled citizens, for more inclusive societal benefits
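
As a rough illustration of the storage and data-exchange points above, the following Python sketch stores text as UTF-8 and carries a language tag alongside it. The record layout and the function names `pack_record` and `unpack_record` are assumptions made for this example, not part of any CSC or NeGP specification.

```python
import json

def pack_record(text: str, lang_tag: str) -> bytes:
    """Serialize text together with its language tag as UTF-8 encoded JSON."""
    record = {"lang": lang_tag, "text": text}
    # ensure_ascii=False keeps the Devanagari characters themselves in the
    # payload instead of \uXXXX escape sequences.
    return json.dumps(record, ensure_ascii=False).encode("utf-8")

def unpack_record(payload: bytes) -> dict:
    """Decode a UTF-8 JSON payload back into a dictionary."""
    return json.loads(payload.decode("utf-8"))

# Hypothetical G2C field label ("land records" in Hindi), tagged as Hindi.
payload = pack_record("भूमि अभिलेख", "hi")
record = unpack_record(payload)
print(record["lang"], record["text"])
```

Because the tag travels with the text, a receiving application can pick fonts, rendering rules or a speech interface for the right language without guessing from the characters alone.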

19 Globalization of IT

20 Globalization & Localization
  I18N (internationalization): Process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for re-design.
  L10N (localization): Taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.
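
The split between the two activities can be sketched in a few lines of Python: the lookup function is written once (the internationalization step), while each language only contributes a catalog of strings (the localization step). The catalog contents, the key name `greeting` and the function `translate` are hypothetical, and the Hindi string is an illustrative translation only.

```python
# Hypothetical message catalogs, one per language code.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}!"},
    "hi": {"greeting": "स्वागत है, {name}!"},
}

def translate(key: str, locale: str, default_locale: str = "en", **params) -> str:
    """Look up a message for a locale, falling back to the default locale."""
    catalog = CATALOGS.get(locale, CATALOGS[default_locale])
    # A key missing from the chosen catalog also falls back to the default.
    template = catalog.get(key, CATALOGS[default_locale][key])
    return template.format(**params)

print(translate("greeting", "hi", name="Asha"))
print(translate("greeting", "ta", name="Asha"))  # no Tamil catalog: English fallback
```

Adding a new language in this arrangement means adding a catalog entry, with no change to the program logic.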

21 Localization – Key Enablers
  Locale Data Repository
  Linguistic Resources
  Standards
  Certification
  Localization Tools
  Training
  Awareness
  Technologies

22 The Tree of Localization Complexities
  Presentation of dates, times, numbers, lists, and other values; collation and sorting; alternate calendars, which may include holidays, work rules, weekday/weekend; currency; tax or regulatory regime
  Machine Translation; Optical Character Recognition; Speech Technologies; Cross Lingual Information Retrieval
  Project Management; Translation Memory; Translation Tools; natural language for text processing (parsing, spell checking, grammar checking, etc.); Automatic Testing Tools
  Encoding Standards; Multimodal input device standards; Fonts & Rendering Engines; Transliteration & Translation Guidelines
  Best Practices; Case Studies; Consultancy; Showcasing of Tools & Technologies
  Parallel Corpora; Speech Corpora; Lexical resources; Ontologies; Dictionaries; Thesaurus; Reference Terminologies
  Certified Localization professionals; PG Specialization in Localization; PhD Programmes
  Minimizing time lag; benchmarking w.r.t. English version; political sensitivity; pricing issues
  Testing methodologies; Metrics for Linguistic Testing; Certification by Government for linguistic compliance

23 Globalization and Localization Issues: Language Issues
  Language issues result from differences among the world's languages in display, alphabets, grammar, and syntactical rules.
  Bidirectional scripts
  Capitalization, uppercasing and lowercasing
  Code pages
  Complex script awareness
  Fonts
  Input Method Editors
  Keyboards
  Line and word breaks
  Mirroring awareness
  Unicode
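
Two of the issues listed above (capitalization and complex script awareness) can be seen with Python's standard `unicodedata` module. This is a minimal sketch: the helper `count_base_characters` is a hypothetical name, and counting non-mark code points is only a rough approximation of what a reader perceives as characters, since conjuncts are not handled.

```python
import unicodedata

# The word "Hindi" in Devanagari, written with escapes so the exact
# code points are unambiguous: HA, vowel sign I, NA, virama, DA, vowel sign II.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

for ch in word:
    # Print code point, general category and official character name.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

def count_base_characters(text: str) -> int:
    """Count code points that are not combining marks (categories Mn, Mc, Me)."""
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))

print(len(word))                    # 6 code points
print(count_base_characters(word))  # 3 base letters
print(word.upper() == word)         # True: Devanagari has no letter case
```

Code that assumes one code point per visible character, or that relies on upper/lower case for matching, therefore needs review before it is used with Indian scripts.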

24 Formatting Issues
  From the user's perspective, formatting issues are the primary source of discrepancies when working with applications originally written for another language or culture/locale.
  Developers should use the National Language Support (NLS) APIs in Windows or the System.Globalization namespace to handle most of these issues automatically.
  Addresses
  Currency
  Dates
  Numerals
  Paper sizes
  Telephone numbers
  Time
  Units of measure
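
One concrete numeral-formatting difference is the Indian digit grouping (thousand, lakh, crore), in which only the last three digits form a group of three. The sketch below is a hand-written illustration in Python, not the NLS or System.Globalization APIs named above; `group_indian` and `to_devanagari_digits` are hypothetical helper names.

```python
def group_indian(n: int) -> str:
    """Format an integer with Indian grouping: last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right-hand end of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])

# Map ASCII digits 0-9 to Devanagari digits U+0966..U+096F.
DEVANAGARI_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0966 + i) for i in range(10))
)

def to_devanagari_digits(text: str) -> str:
    """Replace ASCII digits with Devanagari digits, leaving other characters."""
    return text.translate(DEVANAGARI_DIGITS)

print(group_indian(12345678))   # 1,23,45,678
print(group_indian(4100000))    # 41,00,000 (41 lakh)
print(to_devanagari_digits(group_indian(12345678)))
```

In production code a locale-aware library is preferable to hand-rolled grouping, since the same locale data also covers currency, dates and the other items on the slide.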

25 Localization – Tool for increasing Financial Sustainability
  Training of local youth in localized content creation
  Working with Self Help Groups to uplift their business
  Identifying dynamically changing local content which helps in their local professions
  E-Tutor
  Entertainment during non-official hours

26 TDIL’s Efforts
  A sustained, major national initiative of more than a decade
  Leading to development and consolidation of various language tools, resources and components
  Continuous and untiring representation in various international and national standards bodies: ISO, UNICODE, W3C, IETF, ELRA and BIS
  Represented and included 22 Indian languages in UNICODE
  First in India to launch consortium-mode projects in the technology-intensive areas of Machine Translation, Cross-lingual Information Access, Text to Speech etc., to develop state-of-the-art technologies in Indian languages
  Promotes futuristic research in Language Technology

27 National Roll-Out Plan – A Big Step Forward
  CDs containing software tools and fonts for all 22 officially recognized languages released in the public domain for free use
  Contain fonts, localized Open Office, keyboard drivers, e-mail clients and Firefox browsers in Indian languages
  Freely downloadable from the Indian Language Data Centre – http://www.ildc.gov.in
  Already crossed ~41 lakh downloads and 7.0 lakh shipments
  NASSCOM may take an active role in proliferating the benefits of these language CDs
  These free CDs would also benefit NGOs and CSC operators in developing and promoting local language content.

28 CDs containing Indian Language Software Tools

29 Putting Standards in place: UNICODE
  UNICODE – default text encoding standard, compatible with ISO 10646
  Seamless data storage and search if data is stored in UNICODE
  All 22 officially recognized Indian languages, including Vedic Sanskrit, represented in UNICODE
  Declared as the text encoding standard for all e-governance applications
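
Storing data in Unicode makes search seamless only if equivalent spellings are compared in one normalized form. The Python sketch below uses the standard `unicodedata.normalize` function; the helper name `search_key` is an assumption for illustration.

```python
import unicodedata

# Two encodings of the same Devanagari letter QA:
precomposed = "\u0958"       # single code point
decomposed = "\u0915\u093c"  # letter KA followed by the NUKTA sign

def search_key(text: str) -> str:
    """Return the NFC-normalized form of text for storage and comparison."""
    return unicodedata.normalize("NFC", text)

print(precomposed == decomposed)                          # False: raw code points differ
print(search_key(precomposed) == search_key(decomposed))  # True after normalization

# Each Devanagari code point takes three bytes in UTF-8.
print(len("\u0915".encode("utf-8")))  # 3
```

Normalizing once at the storage layer, and again on each search query, avoids missed matches caused by different keyboards or input methods producing different code point sequences.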

30 Extracting Knowledge from our vast ancient knowledge base
  UNICODE encoding for Vedic Sanskrit and Grantha scripts: key to computerization of the knowledge base

31 Capturing Region-Specific Requirements: Common Locale Data Repository (CLDR)
  The Unicode CLDR provides key building blocks for software to support the world's languages.
  CLDR is by far the largest and most extensive standard repository of locale data.
  This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; etc.
  Locale data for Indian languages is in the process of modification.
  CLDR data for six languages (Hindi, Nepali, Bengali, Assamese, Malayalam and Gujarati) has been finalized; other languages are in process.
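
CLDR organizes locale data so that a regional locale inherits from its language, which in turn inherits from a root locale. The Python sketch below imitates that lookup with a tiny hand-made table; the table contents and the names `LOCALE_DATA`, `fallback_chain` and `lookup` are illustrative assumptions and are not copied from the actual CLDR files.

```python
# Tiny illustrative locale table; real CLDR data is far larger.
LOCALE_DATA = {
    "root": {"month_1": "M01", "decimal": "."},
    "hi": {"month_1": "जनवरी"},
    "hi_IN": {},  # nothing region-specific in this toy table
}

def fallback_chain(locale: str) -> list:
    """Return lookup order, e.g. 'hi_IN' -> ['hi_IN', 'hi', 'root']."""
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain + ["root"]

def lookup(locale: str, key: str) -> str:
    """Find a value for key, walking from the most specific locale to root."""
    for candidate in fallback_chain(locale):
        data = LOCALE_DATA.get(candidate, {})
        if key in data:
            return data[key]
    raise KeyError(key)

print(fallback_chain("hi_IN"))
print(lookup("hi_IN", "month_1"))  # found at the 'hi' level
print(lookup("hi_IN", "decimal"))  # inherited from 'root'
```

With this arrangement, finalizing data for a language mostly means filling in the language-level entries; regional variants only record what differs.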

32 Example of CLDR: Hindi
  All region-specific requirements have been captured and put in the Hindi locale repository

33 Putting Standards in place… Contd.: W3C
  The World Wide Web Consortium (W3C) develops web standards for interoperable web solutions across platforms, devices and access methods.
  Ensures interoperability across major browsers: IE, Firefox, Opera etc.
  Work has already started to represent all Indian languages in W3C standards.
  Desirable: pro-active participation by industry and industry bodies like NASSCOM

34 Putting Standards in place… Contd.
  Keyboard Layouts
  Open Type Fonts: Sakal Bharti Fonts
  Locale Data
  Language Tag (for language negotiation on the Internet)
  Domain Names in Indian Languages
  IT Terminology
  … and Standards for major Linguistic Resources and Tools
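
Language negotiation with language tags can be sketched as follows: the browser sends an `Accept-Language` header listing tags with optional quality values, and the server picks the best language it actually supports. This simplified Python parser is an assumption-laden illustration (it ignores the `*` wildcard and extended matching rules), and `parse_accept_language` and `negotiate` are hypothetical names.

```python
def parse_accept_language(header: str) -> list:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default quality when no q= parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal qualities keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)

def negotiate(header: str, supported: list, default: str = "en") -> str:
    """Pick the first supported language, matching full tag or primary subtag."""
    by_lower = {tag.lower(): tag for tag in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return default

# A Marathi-preferring browser, served by a site offering Hindi and English.
print(negotiate("mr-IN,mr;q=0.9,hi;q=0.8,en;q=0.5", ["hi", "en"]))  # hi
```

The example shows why standardized tags matter: a site without Marathi content can still serve Hindi rather than English when the user has listed Hindi as an acceptable second choice.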

35 Collaboration and Hand-Holding
  Collaborative efforts are required for wider proliferation and sustained initiatives.
  Government, industry bodies and academia need to join hands to address the challenges of local language computing and to bring services closer to the doorsteps of millions of citizens in their own languages.

36 धन्यवाद Thank You Swaran Lata, Director and HoD slata@mit.gov.in Contact:011-24364365

