MULTILINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES
NIXON PATEL, Bhrigus Inc

Presentation transcript:

MULTILINGUAL ISSUES IN SPEECH SYNTHESIS AND RECOGNITION IN INDIAN LANGUAGES
NIXON PATEL, Bhrigus Inc
Multilingual & International Speech Applications, SpeechTek West 2007, Hilton San Francisco

ABSTRACT
• This paper describes our work in developing multilingual speech recognition and speech synthesis systems for Indian languages.
• Existing TTS and ASR technologies cover US English, Indian English and Hindi; no such systems exist for any other Indian language.

Introduction
• Voice-enabled services are a rapidly growing, high-margin opportunity, particularly in a multilingual country such as India.
• It is very difficult to build a separate speech synthesizer for each language.
• The focus is also on developing common multilingual corpora that support multiple Indian languages, and on building appropriate language-specific linguistic analysis modules for text-to-speech synthesis.

Important issues involved
• Enumerating a phone set to represent Indian languages.
• Selecting the basic unit for synthesis: half-phones, diphones or syllables.
• Creating a generic acoustic database that covers language variations.
• Modeling language-specific prosody.

Our approaches
• A common notation for graphemes, developed using IT-3 transliteration.
• Diphone-based speech synthesis.
• Data-driven prosody modeling using Classification and Regression Trees (CART).
• Concatenative synthesis using cluster unit selection techniques with syllable-like units.
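The unit-selection idea named above can be illustrated with a generic search over candidate units. This is a minimal sketch, not the Bhrigus engine: the function `select_units`, the cost functions, and the toy `(label, pitch)` units are all assumptions made for illustration. It picks one database unit per target position so that the sum of target costs and join costs is minimal, using dynamic programming.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position, minimizing target cost plus join cost.

    candidates[i] is a non-empty list of database units that could realize
    target position i. Both cost functions are hypothetical placeholders.
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [{} for _ in candidates]
    for j, unit in enumerate(candidates[0]):
        best[0][j] = (target_cost(0, unit), None)
    for i in range(1, len(candidates)):
        for j, unit in enumerate(candidates[i]):
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(prev_unit, unit), k)
                for k, prev_unit in enumerate(candidates[i - 1])
            )
            best[i][j] = (cost + target_cost(i, unit), prev)
    # Trace back from the cheapest final unit.
    j = min(best[-1], key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy example: units are (label, pitch in Hz); joins prefer similar pitch.
candidates = [
    [("ka", 100), ("ka", 140)],
    [("ma", 105), ("ma", 180)],
]
path, total = select_units(
    candidates,
    target_cost=lambda i, unit: 0.0,
    join_cost=lambda a, b: abs(a[1] - b[1]),
)
print(path, total)
```

In a real system the target cost would typically compare predicted prosody with each candidate, and the join cost would measure spectral and pitch mismatch at the concatenation point; here both are reduced to toy numbers.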

Our current research work
• Text-to-speech synthesis: TTS is a multilingual text-to-speech engine that enables speech applications to be built in local Indian languages, using a unit selection algorithm and a large corpus. A Telugu TTS system has been built, and a voice portal that reads out local-language news in Telugu has been developed.
• Speech recognition: ASR is a multilingual automatic speech recognition system that, in conjunction with our TTS, will enable full-fledged speech solutions. The advanced features of this engine would allow customization to a vertical within a few hours.

• Search engine: This is a cross-lingual search engine capable of searching through content in all Indian languages. It makes use of several novel features of Indian language scripts, including their phonetic nature, common phonetic base and syllabic structure. Its other novelty is that it uses phonetic-level units for indexing, which enables seamless cross-lingual search across the languages.
• Phonetic typing tool: This tool makes use of an intuitive, readable transliteration scheme and phonetic properties to key in scripts in Indian languages. The Bhrigus phonetic typing tool comes with a friendly user interface as well as APIs for integration into applications such as blogging frameworks, etc.
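The idea of indexing on a common phonetic base can be illustrated with a rough sketch. It is not the Bhrigus indexing scheme. It relies on one known property of Unicode: the major Indic script blocks follow a parallel layout inherited from ISCII, so the same offset within each 128-code-point block usually denotes the corresponding letter. The function name `script_neutral_key` is an assumption, and the mapping is only approximate, since not every offset is assigned or pronounced identically in every script.

```python
# Start of each Unicode block; every block below spans 128 code points.
INDIC_BLOCK_BASES = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def script_neutral_key(text):
    """Map each Indic character to its offset inside its Unicode block.

    Characters outside the listed blocks are kept unchanged, so the key can
    still be used for mixed-script text.
    """
    key = []
    for ch in text:
        cp = ord(ch)
        for base in INDIC_BLOCK_BASES.values():
            if base <= cp < base + 0x80:
                key.append(cp - base)
                break
        else:
            key.append(ch)
    return tuple(key)


# The same word (ka-ma-la) written in Devanagari and in Telugu.
hindi_word = "\u0915\u092e\u0932"
telugu_word = "\u0c15\u0c2e\u0c32"

# A tiny index keyed on the script-neutral form.
index = {}
for doc_id, word in [("doc-hi", hindi_word), ("doc-te", telugu_word)]:
    index.setdefault(script_neutral_key(word), []).append(doc_id)

print(index[script_neutral_key(hindi_word)])
```

A query typed in one script then retrieves documents written in another, because both reduce to the same key. A production system would need a proper phonetic mapping rather than raw block offsets.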

• Font converters: Indian-language text in electronic form is in a chaotic state. One can neither exchange notes in Indian languages as conveniently as in English, nor search Indian-language texts available on the web. This is because the texts are stored in font-dependent glyph codes, and the glyph coding scheme typically differs from font to font. To view the content of such sites, one needs the corresponding fonts on the local machine. We are building font converters for almost all Indian languages.
• Multilingual dictionary: We are developing a multilingual dictionary with English as the source language and Indian languages such as Telugu, Tamil, Gujarati and Hindi as the target languages.
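A font converter of the kind described above essentially maps font-specific glyph codes to standard Unicode characters. The sketch below is illustrative only: the glyph table is invented and does not correspond to any real font, and the names `HYPOTHETICAL_GLYPH_TABLE` and `glyphs_to_unicode` are assumptions. It also shows one reordering step that such converters commonly need: a vowel sign drawn to the left of a consonant is stored before it in visual glyph order, whereas Unicode stores it after the consonant.

```python
# Invented glyph codes for an imaginary legacy Devanagari font.
HYPOTHETICAL_GLYPH_TABLE = {
    0x64: "\u0915",  # letter KA
    0x65: "\u092e",  # letter MA
    0xC0: "\u093f",  # vowel sign I, drawn to the left of its consonant
}

# Signs that appear before the consonant visually but follow it in Unicode.
PRE_BASE_SIGNS = {"\u093f"}


def glyphs_to_unicode(glyph_codes):
    """Convert a sequence of font-dependent glyph codes to a Unicode string."""
    out = []
    pending = None  # pre-base sign waiting for its consonant
    for code in glyph_codes:
        ch = HYPOTHETICAL_GLYPH_TABLE.get(code)
        if ch is None:
            raise KeyError(f"no mapping for glyph code {code:#x}")
        if ch in PRE_BASE_SIGNS:
            pending = ch
            continue
        out.append(ch)
        if pending is not None:
            out.append(pending)  # move the sign after the consonant
            pending = None
    if pending is not None:
        out.append(pending)  # stray sign with no consonant following
    return "".join(out)


# Visual order: sign I, then KA, then MA. Logical order: KA, sign I, MA.
converted = glyphs_to_unicode([0xC0, 0x64, 0x65])
print(converted)
```

Real converters also have to handle conjuncts, half forms and glyphs that map to several code points, which this sketch leaves out.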

Bhrigus ASR and TTS Process Framework
• The components of a TTS system can be divided into a language-independent component (LIC) and a language-dependent component (LDC).
• The LIC consists of the speech synthesis engine, which handles the unit selection algorithm and signal processing.
• The LDC covers the language-specific resources, such as the pronunciation dictionary and the unit selection database, needed to build a synthetic voice.

Language-Dependent (LDC) and Language-Independent (LIC) components of a TTS system
• LDC, linguistic resources: text data collection; text normalization; pronunciation dictionary; letter-to-sound rules; syllabification and stress; prosodic pause prediction.
• LDC, speech resources: 1. unit-selection database; 2. prosodic modeling.
• LIC (Bhrigus TTS): unit-selection synthesis engine.

Language-Dependent (LDC) and Language-Independent (LIC) components of an ASR system
• LDC, linguistic resources: 1. text data collection; 2. pronunciation dictionary; 3. letter-to-sound rules; 4. language model.
• LDC, speech resources: 1. acoustic models.
• LIC (Bhrigus ASR): speech recognition engine.

• The development time for building a TTS and an ASR system consists of the time to develop the LIC components and the LDC components.
• The LIC component of the ASR system is the Bhrigus ASR speech recognition engine, while the LIC component of the TTS system is the Bhrigus TTS unit-selection engine.
• It is suggested that the LDC components for ASR and TTS be built together, as this decreases development time, primarily because language-dependent resources are shared across the TTS and ASR systems.

• The LDC resources that can be shared across TTS and ASR systems are the text data, the pronunciation dictionary and the letter-to-sound rules.
• The collected text is used to build language models for ASR and, at the same time, to extract an optimal set of sentences to be recorded for the TTS system.
• Similarly, the pronunciation dictionary and letter-to-sound rules can be shared across the TTS and ASR systems.
• It should also be noted that several modules inside the TTS and ASR engines can be shared as well.
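The double use of the collected text can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by Bhrigus: `bigram_counts` gathers the word-pair counts that an n-gram language model would be estimated from, and `select_sentences` applies a standard greedy coverage heuristic to choose a small recording script. The helper `char_bigrams` is a hypothetical stand-in for a real unit extractor that would return diphones or syllables.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count word bigrams, the raw statistics behind a bigram language model."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def select_sentences(sentences, units_of, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered units."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(set(units_of(s)) - covered))
        gain = set(units_of(best)) - covered
        if not gain:
            break  # nothing left that adds coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


def char_bigrams(sentence):
    """Hypothetical unit extractor: adjacent character pairs, spaces ignored."""
    text = sentence.replace(" ", "")
    return [text[i:i + 2] for i in range(len(text) - 1)]


corpus = ["ab cd", "ab", "cd ef"]

lm_counts = bigram_counts(corpus)  # ASR side
script, covered = select_sentences(corpus, char_bigrams, max_sentences=5)  # TTS side
print(lm_counts[("<s>", "ab")], script, sorted(covered))
```

Both outputs come from the same corpus, which is the saving the slide points to: the text is collected once and serves both systems.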

Demos
• Demos are at

Conclusion
Our four basic principles are to:
• create and sustain the leading market solution for professional services in text-to-speech, speech-to-text, search, machine translation and natural dialogue management for Indian languages, including Indian English;
• interface that solution into the vast majority of technical environments relevant to these types of applications;
• provide skilled services; and
• provide services at differentiated low rates.