SSML Extensions for TTS in Indian Languages. II Workshop on Internationalizing SSML, 30-31 May 2006, Greece. Nixon Patel and Kishore Prahallad, Bhrigus.

Presentation transcript:

1 SSML Extensions for TTS in Indian Languages
II Workshop on Internationalizing SSML, 30-31 May 2006, Greece
Nixon Patel, Bhrigus Inc., Hyderabad, India
Kishore Prahallad, IIIT Hyderabad, India

© Copyright 2006, Bhrigus Software Private Limited.
2 Topics
- About Bhrigus
- Collaborative efforts between Bhrigus and IIIT Hyderabad
- Nature of Indian language scripts: convergence and divergence
- Issues across TTS rendering in all these languages
- Proposed solutions/tags: Syllable Element, Alien Element, Dialect Element

3 Bhrigus voice & data solutions

4 About Bhrigus
- Established: 2002
- Business: providing IVR, speech and enterprise solutions to BFSI, telcos, contact centers and manufacturing companies
- Key customers: Hewitt Associates, AT&T, Pfizer, Merrill Lynch, Union Pacific Railroad, CDIA, Southwestern Energy, Orange County, Stryker
- SEI CMM Level 4 process implementation under way; ISO 9001:2000, KPMG certified

5 Speech and Language Technology (Bhrigus)
- Playing a leadership role in the development of ASR and TTS for all official Indian languages, to provide voice solutions for the Indian market
- Collaborations: IIIT Hyderabad and Carnegie Mellon University
- 10-member team plus board of advisors; 3 PhDs and 4 Masters
- Synthesis team, recognition team, linguist team and language resources team
- Initiating SSML and VXML chapters in India

6 Collaborative Efforts
- Bhrigus Inc., Hyderabad: voice-based solution provider
- IIIT Hyderabad: one of the leading universities in India doing speech research
- Telugu TTS: a collaborative effort between Bhrigus Inc. and IIIT
- Goal: develop ASR and TTS for all official Indian languages

7 Nature of Indian Language (IL) Scripts
- The basic units of the writing system are Aksharas
- An Akshara is an orthographic representation of a speech sound
- An Akshara is syllabic in nature; typical forms are V, CV, CCV and CCCV (C: consonant, V: vowel)
- An Akshara always ends with a vowel (or nasalized vowel) in written form
- ~1652 dialects/native languages
- 22 languages officially recognized

8 Convergence of IL Scripts
- Aksharas are syllabic in nature
- Common phonetic base: a common set of speech sounds is shared across all languages
- Fairly good (though not exact) correspondence between a sequence of Aksharas and the corresponding sequence of sounds, often referred to as letter-to-sound rules
- Written from left to right, as in European languages
- Words are separated by spaces, as in European languages

9 Divergence of IL Scripts
- Each IL has its own script
- All ILs share a common phonetic base; however, the phonotactics of each IL differ from those of the others
- ILs are non-tonal languages, unlike eastern languages such as Chinese

10 How to represent Indian language scripts
- Unicode
  - Useful for *rendering* Indian language scripts
  - Not suitable for keying in through a QWERTY keyboard
  - Not suitable for building modules such as text normalization (the Unicode characters cannot be seen in many editors)
- Itrans-3 / OM: a transliteration scheme by IISc Bangalore, India and Carnegie Mellon University
  - Useful for *keying in and storing* Indian language scripts using QWERTY keyboards
  - Useful for processing and for writing modules/rules for letter-to-sound conversion, text normalization, etc.

11 Itrans-3 / OM Notation

12 Why Itrans-3/OM?
- Developed with user readability in mind: easier to read and type
- It is case-insensitive
- The scheme is phonetic in nature: each character corresponds to the actual sound being spoken. A single transliteration scheme can therefore be used for all the Indian languages, as they share the same set of sounds
- Each character (corresponding to a phone/sound) is no more than three letters long
- Adopted across universities in India and abroad, and by some industrial labs such as Bhrigus Inc.
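The two properties stated here (case-insensitive, no symbol longer than three letters) are enough to split a romanized word into symbols with a greedy longest-match scan. The following is only a minimal sketch in Python: the `SYMBOLS` inventory is a small hypothetical subset chosen for illustration, not the actual Itrans-3/OM table, and `tokenize` is an illustrative name.

```python
# Hypothetical, deliberately small symbol inventory for illustration only;
# the real Itrans-3/OM notation table is not reproduced in this transcript.
SYMBOLS = {
    "a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo",
    "k", "kh", "g", "gh", "t", "th", "d", "dh", "n", "p", "b", "m",
    "y", "r", "l", "v", "s", "sh", "h",
}
MAX_SYMBOL_LEN = 3  # from the slide: no symbol is longer than three letters


def tokenize(text):
    """Split a romanized word into symbols by greedy longest match."""
    text = text.lower()  # the scheme is described as case-insensitive
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest possible symbol first, then shorter ones.
        for length in range(MAX_SYMBOL_LEN, 0, -1):
            candidate = text[pos:pos + length]
            if candidate in SYMBOLS:
                tokens.append(candidate)
                pos += len(candidate)
                break
        else:
            raise ValueError(f"no symbol matches at position {pos}: {text[pos:]!r}")
    return tokens


print(tokenize("naatoo"))
```

Longest match matters because a short symbol such as "a" is a prefix of a longer one such as "aa"; scanning from three letters down keeps the longer reading.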

13 Issues in TTS rendering in IL
- TTS should be able to pronounce words Akshara (syllable) by Akshara (syllable)
- The languages are heavily influenced by English (alien) words
- Alien words occur in the middle of sentences
- Each language has its own dialects

14 SSML Tag: Phoneme Element
- Example: naatoo
- The ph attribute specifies the phoneme/phone string
- Rendering “n” “aa” “t” “oo” individually does not make sense to native speakers of Indian languages
- Sounds need to be rendered in terms of syllables

15 Syllable Element
- Example: naatoo
- Render “naa” and “too”, which are Aksharas (syllables)
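The step from a phone string to Aksharas can be sketched as grouping each run of consonants with the vowel that follows it, matching the V, CV, CCV and CCCV forms listed earlier in the talk. This Python sketch makes two assumptions that are not in the slides: the `VOWELS` set is a small illustrative subset, and trailing consonants with no following vowel are attached to the last Akshara.

```python
# Assumed vowel subset for illustration; not the full Itrans-3/OM vowel list.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}


def group_into_aksharas(phones):
    """Group a phone sequence into C*V units (Aksharas)."""
    aksharas = []
    current = []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            # A vowel closes the current Akshara.
            aksharas.append("".join(current))
            current = []
    if current:
        # Assumption: leftover consonants join the previous Akshara,
        # or stand alone if the word contains no vowel at all.
        if aksharas:
            aksharas[-1] += "".join(current)
        else:
            aksharas.append("".join(current))
    return aksharas


print(group_into_aksharas(["n", "aa", "t", "oo"]))  # ['naa', 'too']
```

A renderer working on these units would speak "naa" and "too" rather than four isolated phones.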

16 Motivation for Loan Word
- Informal experiments suggested that 33% of the errors of TTS for IL occur while rendering alien (non-native) words
- Such alien words could be detected automatically, owing to the syllabic properties of the Indian languages
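The slides do not say which syllabic properties would be used for detection. One possible heuristic, shown here purely as an assumption, is to flag a word when its phone sequence cannot be read as a series of Aksharas of the form V, CV, CCV or CCCV, or when it contains a phone outside the native inventory. The inventories and the name `looks_alien` are hypothetical.

```python
# Hypothetical native inventories, kept small for illustration.
NATIVE_VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}
NATIVE_CONSONANTS = {
    "k", "kh", "g", "gh", "t", "th", "d", "dh", "n", "p", "b", "m",
    "y", "r", "l", "v", "s", "sh", "h",
}
MAX_ONSET = 3  # CCCV is the longest Akshara form listed in the talk


def looks_alien(phones):
    """Heuristically flag a phone sequence as a non-native (alien) word."""
    pending_consonants = 0
    for phone in phones:
        if phone in NATIVE_VOWELS:
            pending_consonants = 0  # a vowel closes the current Akshara
        elif phone in NATIVE_CONSONANTS:
            pending_consonants += 1
            if pending_consonants > MAX_ONSET:
                return True  # consonant run too long for any Akshara
        else:
            return True  # phone is outside the native inventory
    # A word that ends on consonants does not close its last Akshara.
    return pending_consonants > 0


print(looks_alien(["n", "aa", "t", "oo"]))  # False
print(looks_alien(["b", "aa", "n", "k"]))   # True
```

A rule this simple would misclassify native words that are written with a final consonant, so it is better read as a first filter than as a detector.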

17 Example of loan word
- BANK has to be pronounced as /B/ /AE/ /N/ /K/
- The /AE/ phoneme does not exist in the Indian language phone set
- Example: baank
- Alien (non-native) words could be rendered using different pronunciation dictionaries or letter-to-sound rules

18 Dialect Element
- Each language has its own dialects
- TTS should be able to handle dialects without unloading the language resources
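Handling a dialect without unloading the language resources can be pictured as a small set of overrides layered on top of a shared base lexicon. The Python sketch below is one way to arrange that, not the design described in the talk: the `Lexicon` class, its methods, and the placeholder entries are all hypothetical.

```python
from collections import ChainMap


class Lexicon:
    """Base pronunciation lexicon with optional per-dialect overrides."""

    def __init__(self, base_entries):
        self._base = dict(base_entries)  # loaded once, never unloaded
        self._dialects = {}              # dialect name -> override entries

    def add_dialect(self, name, overrides):
        """Register only the entries that differ from the base lexicon."""
        self._dialects[name] = dict(overrides)

    def lookup(self, word, dialect=None):
        """Return the pronunciation, preferring the dialect's override."""
        if dialect is None:
            view = self._base
        else:
            # ChainMap searches the overrides first, then the base entries.
            view = ChainMap(self._dialects[dialect], self._base)
        return view.get(word)


# Placeholder entries; these are not real words or pronunciations.
lexicon = Lexicon({"word_a": "base-pron-a", "word_b": "base-pron-b"})
lexicon.add_dialect("dialect_x", {"word_a": "dialect-pron-a"})
print(lexicon.lookup("word_a", dialect="dialect_x"))  # dialect-pron-a
print(lexicon.lookup("word_b", dialect="dialect_x"))  # base-pron-b
```

Switching dialect is then a per-lookup choice, and the base entries stay in memory throughout.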

19 Dialect Element
- Example: yekkad’iki vel’laali

20 Conclusions
- Bhrigus Inc., Hyderabad, is taking a lead position in developing ASR and TTS for Indian languages
- Proposed elements for SSML extensions

21 References
1. Prahallad Lavanya, Prahallad Kishore and GanapathiRaju Madhavi, A Simple Approach for Building Transliteration Editors for Indian Languages, Journal of Zhejiang University Science, vol. 6A, no. 11, pp , Oct 2005.