W3C WORKSHOP II Internationalizing Speech Synthesis Markup Language W3C Office in Heraklion, Crete, Greece, 30-31 May 2006.

Presentation transcript:

The Phonemic Model from India for Bi-modal Applications
Prof. R. K. Joshi
Visiting Design Specialist, Centre for Development of Advanced Computing, Mumbai (formerly NCST), India
Ex HOD, Professor, IDC, IIT Mumbai
Ex Chief Art Director, FCB-Ulka Advertising
Session 4: IPA/Phonetic Alphabets
W3C WORKSHOP II, Heraklion, Crete, Greece, 30th May 2006

Multilingual communication (textual/verbal)
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Santhali, Sindhi, Tamil, Telugu, Urdu.

Deshanagari – a common script for all Indian languages
Multilingual Happenings
Best wishes for the new year

Sound Poems

Collaborative research on notation system for Indian music with Dr. Ashok Ranade

Concrete text

Indian oral tradition

Categories of Vedic literature: Shakha (Branches), Sanhita (Text), Pratishakhya (Phonetics), Shiksha (Grammar), Lakshangrantha, Brahman, Aranyanka, Upanishad, Shroutasutra
Rigveda: Kaushitaki, Ashvalayan, Shankhayan, Shakal, Bashkal
Yajurveda: Shukla (Madhyandina, Kanva), Krishna (Taitariya, Maitrayani)
Samaveda: Jaiminiya, Ranayaniya, Kauthum
Atharvaveda: Shaunakiya, Paipaladi
Pratishakhya: Rugveda Pratishakhya, Yajurveda Pratishakhya, Taitariya Pratishakhya, Shaunakiya Chaturadhyayik, Atharva Pratishakhya, Samaveda Pratishakhya, Pushpasutra
Shiksha: Paniniya, Apishali, Audavraji, Manduki, Shodashashloki, Yadnyavalka, Naradiya etc.

Varna definitions
Varnamala/Matruka:
Brahmamatruka = 65 Varnas
Akshamatruka = 51 Varnas
Rudramatruka = 43 Varnas
Kulakramasiddha = 42 Varnas
Varnas (Phonemes): Swara Varna (Vowel phonemes), Vyanjana Varna (Consonant phonemes)

Taitariya Pratishakhya (VedaLakshanam)
Sadharanavidhi, Samhitadhikarana, Uccaranakalpa
Formation of articulate sounds and mode of their production.
Varnakrama: Shuddha, Swara, Matra, Anga
Varnasarabhuta Varnasara Bhutavarna Kramah: 1. Dhvani 2. Sthanam 3. Karnam 4. Prayatnah 5. Kalatah 6. Swarah 7. Devata 8. Jati
Saraswati: Sakaara, Akaara, Reph, Akaara, Sakaaradvitta, Vakaara, Akaara, Takaara, Iikaara

Speech related issues

Spoken Utterances and written marks

Brahmi script

Consonant sound + Vowel sound renderings in different Indian scripts
Consonant sound + consonant sound + vowel sound renderings in Devanagari script
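The slide images are not available, but the idea of one consonant + vowel combination rendered in several scripts can be sketched with Unicode. The sketch below assumes the parallel layout of the ISCII-derived Unicode blocks, in which the letter KA sits at offset 0x15 and the vowel sign I at offset 0x3F in each of the listed script blocks. The names `SCRIPT_BASES`, `render_ki` and `describe` are illustrative and are not part of InPho.

```python
import unicodedata

# Start of each script's Unicode block (ISCII-derived layout).
SCRIPT_BASES = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}

KA_OFFSET = 0x15            # position of the letter KA inside each block
VOWEL_SIGN_I_OFFSET = 0x3F  # position of the dependent vowel sign I


def render_ki(script: str) -> str:
    """Return consonant K + vowel I as written in the given script."""
    base = SCRIPT_BASES[script]
    return chr(base + KA_OFFSET) + chr(base + VOWEL_SIGN_I_OFFSET)


def describe(text: str) -> list[str]:
    """Return the Unicode character names of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    for script in SCRIPT_BASES:
        syllable = render_ki(script)
        print(f"{script:<11} {syllable}  {describe(syllable)}")
```

The same two phonemes yield nine different written forms, which is the one-to-many relation between a spoken syllable and its script renderings that the slide illustrates.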

Consonant sound + Vowel sound + accent marks renderings in Vedic Sanskrit
Consonant sound + Vowel sound + Samavedic intonation marks renderings in Vedic Sanskrit

The concept of InPho: Phonemic Model from India for Bi-modal applications

InPho (Phonemic model from India for Bi-modal applications - Bharatiya Varna Samamnaya)
InPho is an attempt to identify and realize the strong Indian oral tradition with the phonemic root base of Indian languages, and to explore the correlation between verbal and written expressions.
To identify the total range of InPho: vowel phonemes, consonant phonemes, modifiers, tonal marks and other signs with phonetic and orthographic definitions.
To evolve standardized codes for InPho.
To document bi-modal models: written and spoken.
Bases: 1. Vedic Sanskrit 2. Manak Hindi 3. International Phonetic Alphabet

Proposal for phonemic codes

How codes work
Testing of Vedic Sanskrit: a part from "Naradiya Siksa", Bhandarkar Oriental Research Institute, Poona, INDIA
Syllables of consonant sound K: A Glyph FullKa Glyph Ku AE Glyph KOUdatta
Samavedic syllabic text output from phonemic input
Syllable patterns: V, VV, VC, CV, CCC….V, Vm, Vm C, Cm V, Cm Cm Cm...Vm
m - modifier, V - Vowel, C - Consonant
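The syllable patterns listed on this slide can be read as a small grammar. The following is a minimal sketch, assuming that every V or C may carry one optional modifier m, that a vowel-initial syllable has at most one further unit, and that a consonant-initial syllable is any cluster closed by a single vowel. This generalisation is an assumption drawn from the listed examples, not a rule stated by the author, and the name `is_valid_syllable` is illustrative.

```python
import re

# Vowel-initial: V, Vm, VV, VC, Vm C ...   Consonant-initial: CV, Cm V, CCC...V, Cm Cm Cm...Vm
SYLLABLE = re.compile(r"(?:Vm?(?:Vm?|Cm?)?|(?:Cm?)+Vm?)")


def is_valid_syllable(pattern: str) -> bool:
    """Check a pattern such as 'Cm Cm V' against the assumed syllable grammar."""
    compact = pattern.replace(" ", "")  # spaces are only visual separators
    return SYLLABLE.fullmatch(compact) is not None


if __name__ == "__main__":
    for candidate in ["V", "VV", "VC", "CV", "CCCV", "Vm", "Vm C", "Cm V", "Cm Cm Cm Vm", "mV"]:
        print(f"{candidate!r}: {is_valid_syllable(candidate)}")
```

A modifier with nothing to attach to, as in `mV`, is rejected, as is a bare consonant, because neither appears in the list above.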

The Oral Tradition (K model)
Phonetic classification of linguistic sounds based on the ‘Varna’ concept.
Phonetic nuances in terms of pitch, time and stress in order to achieve intonations.
Oral modes: recitation, repetition, memorization and oral reproduction through correct pronunciation of syllabic structures and attentive listening (shravana – shruti).

The Writing Tradition (K model)
The identification of orthographic marks for graphic structures of syllabic utterances.
A simple addition representing Consonant sound + Vowel sound.
A simple model of V, VV, C, CV, CC….V to create syllabic writing structure.
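The "simple addition" of consonant sound + vowel sound can be illustrated with Devanagari in Unicode. This is a sketch only: the ASCII labels (`K`, `A`, `II` and so on) are hypothetical stand-ins and are not the InPho codes, which the slides do not reproduce. The tables cover just a handful of phonemes. Consonant clusters are joined with the virama (U+094D), and the vowel A is written as the bare consonant letter because it is inherent in the script.

```python
VIRAMA = "\u094D"  # suppresses the inherent vowel of a consonant letter

# Hypothetical phoneme labels mapped to Devanagari letters (small subset).
CONSONANTS = {"K": "\u0915", "T": "\u0924", "R": "\u0930", "V": "\u0935", "S": "\u0938"}
INDEPENDENT_VOWELS = {"A": "\u0905", "AA": "\u0906", "I": "\u0907", "II": "\u0908", "U": "\u0909"}
VOWEL_SIGNS = {"A": "", "AA": "\u093E", "I": "\u093F", "II": "\u0940", "U": "\u0941"}


def compose_syllable(consonants, vowel=None):
    """Build the written form of a V, C, CV or CC...V syllable."""
    if not consonants:
        if vowel is None:
            raise ValueError("a syllable needs at least one phoneme")
        return INDEPENDENT_VOWELS[vowel]           # V
    cluster = VIRAMA.join(CONSONANTS[c] for c in consonants)
    if vowel is None:
        return cluster + VIRAMA                    # bare C
    return cluster + VOWEL_SIGNS[vowel]            # CV, CC...V


if __name__ == "__main__":
    # Sa + Ra + SVa + Tii
    word = "".join([
        compose_syllable(["S"], "A"),
        compose_syllable(["R"], "A"),
        compose_syllable(["S", "V"], "A"),
        compose_syllable(["T"], "II"),
    ])
    print(word)
```

The usage example spells the word Saraswati, following the breakdown into Sakaara, Akaara, Reph and so on shown on the Taitariya Pratishakhya slide.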

InPho and IPA

Range of InPho
Vyanjana Varna - InPho
Swara Varna - InPho

Range of IPA

InPho issues
1. Based on extended (Manak Hindi) and phonemic structure of Sanskrit, InPho, with its combinations of 52 consonant phonemes and 22 vowel phonemes (CV), provides a basic bi-modal common K model for writing as well as speech in Indian languages. The model can be extended through the provision of tonal marks.
2. Vedic tonal marks and other intonation signs can have multipurpose usage for the present linguistic and dialectal scene of India.

IPA issues
1. Based on Latin script, IPA has 516 code points in Unicode. The majority of the code descriptions are of a graphic nature. No equivalences are marked for Asian/Indian languages.
2. Variations of lower case ‘a’ and 16 variations of capital ‘A’ are provided without phonetic description.
3. IPA has its own coding scheme and is used at a higher scholarly level for transliteration purposes.

The proposal
In realization of the further potential of exploratory applications, it is proposed that:
Varna should be treated as the smallest basic unit of the input, encoding, output and transmission environment (the 'K' model). At present the smallest basic unit for encoding is treated as KA (consonant varna K + vowel varna A).
The K model should be used to draw parallels between the International Phonetic Alphabet (IPA) and the Indian Phonemic Alphabet (InPho).
The authentic phonetic breakup of words in Indian language dictionaries can be considered using the ‘K’ model.
The ‘K' model should be adopted for Indian languages for bi-modal applications, to realize element as part of SSML.
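The difference between the KA unit used in current encodings and the proposed varna-level unit can be sketched as a decomposition of Devanagari text into phonemes. This is an illustrative sketch under stated assumptions: the ASCII labels are hypothetical and are not the InPho codes, the tables cover only a few letters, and every consonant without a virama or vowel sign is read with the inherent vowel A (Sanskrit-style reading, with no handling of Hindi schwa deletion).

```python
VIRAMA = "\u094D"

# Devanagari letters mapped to hypothetical phoneme labels (small subset).
CONSONANTS = {"\u0915": "K", "\u0924": "T", "\u0930": "R", "\u0935": "V", "\u0938": "S"}
VOWEL_SIGNS = {"\u093E": "AA", "\u093F": "I", "\u0940": "II", "\u0941": "U"}
INDEPENDENT_VOWELS = {"\u0905": "A", "\u0906": "AA", "\u0907": "I", "\u0908": "II", "\u0909": "U"}


def decompose(text: str) -> list[str]:
    """Split Devanagari text into a flat list of consonant and vowel varnas."""
    varnas = []
    pending_inherent = False  # True while a consonant still owes its inherent A
    for ch in text:
        if ch in CONSONANTS:
            if pending_inherent:
                varnas.append("A")
            varnas.append(CONSONANTS[ch])
            pending_inherent = True
        elif ch == VIRAMA:
            pending_inherent = False          # consonant stands without a vowel
        elif ch in VOWEL_SIGNS:
            varnas.append(VOWEL_SIGNS[ch])    # explicit vowel replaces inherent A
            pending_inherent = False
        elif ch in INDEPENDENT_VOWELS:
            if pending_inherent:
                varnas.append("A")
            varnas.append(INDEPENDENT_VOWELS[ch])
            pending_inherent = False
        else:
            raise ValueError(f"character not covered by this sketch: {ch!r}")
    if pending_inherent:
        varnas.append("A")
    return varnas


if __name__ == "__main__":
    print(decompose("\u0915"))                                      # the letter KA
    print(decompose("\u0938\u0930\u0938\u094D\u0935\u0924\u0940"))  # Saraswati
```

A single encoded letter KA comes out as two varnas, K and A, which is the finer unit the proposal asks for.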

Position Statement – Prof R. K. Joshi
Writing was a means of recording speech. Writing as an art was practiced by calligraphers under calligraphy.
Text printing was a means of spreading writing. Text planning as an art was practiced by typographers under typography.
Computing is a means of processing digital data of multi-modal linguistic/graphic expressions (writing, speech and graphics). Computing is an art practiced by compugraphers under compugraphy.
Compugraphy should aim at:
a new idiom of writing - writing with graphic elemental changes.
a new idiom of printing - dynamically changing printed text in time.
a new idiom of processing - multi-modal conversions through natural language processing for cross cultural exchange.

From pen to print (Analog era): Processing of spoken utterances and/or written marks was carried out internally by human brain/mind/intelligence.
From print to processing (Digital era): Processing of spoken utterances and/or written marks is carried out externally by a standardized processing mechanism with artificial intelligence.

Simple text, Formatted text, Concrete text
Monotone speech, Synthetic speech, Stylistic speech

Concrete text

Conclusion
The multitude of Indian languages and dialects are written using different scripts. These scripts have been allotted distinct code pages in the Unicode scheme.
Bi-modal rendering and processing of Indian languages is complex and requires techniques distinctly different from those used for Latin script.
The InPho scheme provides standards for both writing and speech and is based on the Indian oral and written tradition.
The InPho scheme is well suited to other processing tasks such as transliteration, sorting, searching, speech synthesis and speech recognition.
The InPho scheme with its K model is to be propagated as a ‘new media’ standard for Indian language Information Interchange.

References
Kelkar A.R, Transliteration of South Asian languages, a brief review and a proposal for a standard. Centre of advanced study in linguistics, Deccan College, Pune.
Handbook of the International Phonetic Association, Cambridge University Press.
Naravane V.D, Bharatiya Vyavahar Kosh. Triveni Sangam.
Peter Ladefoged, 2001, Vowels and Consonants. Blackwell publishers, UK.
Prabodh Primer, Department of official languages, Ministry of home affairs, Govt. of India.
R. K. Joshi, Prague 2003, A unified phonemic code based scheme for effective processing of Indian languages. 23rd Internationalization and Unicode.
R. K. Joshi, October 2002, Vedic Code, a draft, Vishwabharat No. 7.
R. Shama Sastri, K. Rangacharya, 1985 reprint, The Taittiriya Pratishakhya, Motilal Banarasidas, Delhi.
Shrinath Shanbaug, Durgesh Rau, R. K. Joshi, 2002, An intelligent multi-layered input scheme for phonetic scripts. ACM International conference proceeding series, Hawthorne, New York.
Unicode, The Unicode Consortium. The Unicode standard is the universal character-encoding scheme for written characters and text. It defines a consistent way of encoding multilingual text through software.
Vinod Kumar, 2005, IndiX information leaflet, C-DAC Mumbai.
W.S. Allen, 1961, Phonetics in Ancient India, London, Oxford University Press.
Gajendragadakar S. N., 1972, Bhasha Va Bhashashastra, Venus Prakashan, Pune, November

Acknowledgements
Executive Director, Director Admin, C-DAC Mumbai.
IndiX technology team.
IndiX design team: Jui Mhatre, Supriya Kharkar.
Vaidik Samshodhan Mandal, Pune.
Bharati Sanskrut Niketanam, Mumbai/Lonavala.

designer, calligrapher, type designer, revivalist, poet, academician, typographer, compugrapher