EXPERIMENTS WITH UNIT SELECTION SPEECH DATABASES FOR INDIAN LANGUAGES


S P Kishore*+, Alan W Black#, Rohit Kumar*, Rajeev Sangal*
* Language Technologies Research Center, International Institute of Information Technology, Hyderabad
# Language Technologies Institute, Carnegie Mellon University
+ Institute of Software Research International, Carnegie Mellon University
11/19/2018

ORGANIZATION OF THE TALK
- Role of Language Technologies
- Text to Speech Systems
  - Text Processing Front End
  - Speech Generation Component
- Unit Selection Approach
- Experiments
  - Choice of Unit Size
  - Generation of Databases: Content and Size of Database
- Evaluation of Hindi Speech Synthesis System
- Applications
- Conclusion

ROLE OF LANGUAGE TECHNOLOGIES
- Natural interfaces for information access
- Crucial role in multilingual societies
- Integration of speech recognition, machine translation and speech synthesis enables interaction between two people who speak different languages

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS
A text to speech system converts arbitrary input text into the corresponding spoken waveform.
- Why text to speech synthesis?
- Basic blocks of a text to speech system:
  Text → Text Processing Front End → basic unit sequence + prosody information → Speech Generation Component → Speech
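
The two-block structure above can be sketched as a pair of functions that pass a unit sequence with prosody information from the front end to the generation component. This is a minimal illustration only: the names `UnitSpec`, `front_end`, `generate_speech` and `tts` are hypothetical, the toy front end treats each whitespace-separated token as one unit with flat, made-up prosody targets, and the generator emits silence of the target duration in place of a real waveform.

```python
from dataclasses import dataclass

@dataclass
class UnitSpec:
    """One entry of the basic unit sequence plus its prosody information."""
    unit: str          # e.g. a syllable label (hypothetical labels)
    duration_s: float  # target duration in seconds
    f0_hz: float       # target pitch in Hz

def front_end(text: str) -> list[UnitSpec]:
    """Toy text processing front end: one unit per whitespace token with
    flat prosody targets (an assumption, not the authors' front end)."""
    return [UnitSpec(tok, 0.2, 120.0) for tok in text.split()]

def generate_speech(units: list[UnitSpec], sample_rate: int = 16000) -> list[float]:
    """Toy speech generation component: emits silence of each unit's target
    duration, standing in for real waveform generation."""
    samples: list[float] = []
    for u in units:
        samples.extend([0.0] * int(u.duration_s * sample_rate))
    return samples

def tts(text: str) -> list[float]:
    """Text -> front end -> units and prosody -> generation -> samples."""
    return generate_speech(front_end(text))
```

Keeping the interface between the two blocks as an explicit data structure is what lets either side be swapped out, for example a different generation method behind the same front end.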

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Nature of Indian scripts
- The basic units of the Indian writing systems are aksharas; an akshara is typically of the form V, CV or CCV.
- The scripts share a common phonetic base of about 35 consonants and 18 vowels.
- The languages are phonetic in nature: what is written is what is spoken.
- Exception: schwa deletion (inherent vowel suppression).

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Format of input text
- ISCII, Unicode and various fonts; these can be handled by appropriate conversion module(s).
Mapping non-standard words (NSW) to standard words
- NSW include symbols, digits, initials, abbreviations, punctuation, non-native words, etc.
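
The NSW mapping step can be sketched as "tokenize, classify each token, expand by class". The token patterns, the category names and the `SYMBOL_NAMES` table below are hypothetical, and the expansions are English placeholders read digit by digit purely for readability; a Hindi front end would emit Hindi number words and read multi-digit numbers as numbers.

```python
import re

# Illustrative tokenizer: dotted abbreviations ("U.S."), digit runs, words,
# or single non-space symbols. Patterns are assumptions, not the authors' rules.
TOKEN_RE = re.compile(r"(?:[A-Za-z]\.)+|\d+|\w+|[^\w\s]")

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
SYMBOL_NAMES = {"%": "percent", "&": "and", "+": "plus"}  # hypothetical table
PUNCTUATION = set(".,;:!?")

def classify(token: str) -> str:
    """Coarse NSW category of a single token."""
    if re.fullmatch(r"\d+", token):
        return "digits"
    if re.fullmatch(r"(?:[A-Za-z]\.)+", token):
        return "abbreviation"
    if token in PUNCTUATION:
        return "punctuation"
    if re.fullmatch(r"[^\w\s]", token):
        return "symbol"
    return "standard"

def normalize(text: str) -> list[str]:
    """Rewrite NSW as standard words (English placeholders for readability)."""
    words: list[str] = []
    for token in TOKEN_RE.findall(text):
        kind = classify(token)
        if kind == "digits":
            # Digit-by-digit reading; a real front end would read numbers as numbers.
            words.extend(DIGIT_NAMES[int(d)] for d in token)
        elif kind == "abbreviation":
            # Spell the initials letter by letter.
            words.extend(ch.lower() for ch in token if ch != ".")
        elif kind == "symbol":
            if token in SYMBOL_NAMES:  # unknown symbols are skipped
                words.append(SYMBOL_NAMES[token])
        elif kind == "standard":
            words.append(token.lower())
        # Punctuation is dropped here, though it could also drive phrasing.
    return words
```

For example, `normalize("Call 911 in the U.S. now!")` yields `call nine one one in the u s now`.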

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Standard words to phoneme sequence
- For English, this involves lexicon lookup and letter-to-sound rules.
- Due to the phonetic nature of Indian scripts, simple letter-to-sound rules can be used.
- Some languages pose problems through inherent vowel suppression (schwa deletion), e.g. ratana (rtana) is spoken as ratan.
- Presently, a set of heuristic rules is used.
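
The letter-to-sound step with schwa deletion can be sketched as follows, assuming (hypothetically) that each word arrives as a list of romanized aksharas, each a `(consonants, vowel)` pair where `vowel=None` means no vowel sign is written and the inherent schwa applies. The slide does not list the authors' heuristic rules; this sketch applies a single illustrative rule, deletion of the word-final inherent schwa, which is enough to turn ratana into ratan. Medial schwa deletion needs further rules not shown here.

```python
from typing import Optional

INHERENT_VOWEL = "a"  # the schwa, romanized as "a" here

def letter_to_sound(aksharas: list[tuple[str, Optional[str]]],
                    delete_final_schwa: bool = True) -> str:
    """Map a word, given as aksharas, to a romanized phone string.

    Each akshara is (consonants, vowel): vowel=None means no vowel sign is
    written, so the inherent schwa is pronounced. One heuristic rule is
    applied (an illustrative assumption): a word-final inherent schwa is deleted.
    """
    phones = []
    last = len(aksharas) - 1
    for i, (consonants, vowel) in enumerate(aksharas):
        if vowel is not None:
            phones.append(consonants + vowel)        # explicit vowel sign
        elif i == last and delete_final_schwa and consonants:
            phones.append(consonants)                # schwa suppressed
        else:
            phones.append(consonants + INHERENT_VOWEL)
    return "".join(phones)
```

Without the rule the same input reads as ratana, which is the letter-by-letter pronunciation the heuristics are meant to avoid.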

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
Articulatory model based synthesis
- Involves simplified modeling of the human speech production mechanism.
- It is difficult to accurately model the motion of the articulators.
Parameter based synthesis
- Speech segments are parameterized in terms of formant frequencies or linear prediction coefficients.
- It is difficult to come up with the large number of rules needed to accurately capture co-articulation and prosody.
Concatenation based synthesis
- An inventory of recorded speech segments (units) is used.
- Prosodic variation: intonation and duration can be learned and incorporated in the form of rules, or multiple realizations of units with differing prosody can be stored.
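
The core operation of concatenation based synthesis, joining recorded units end to end, can be sketched as below. The slide does not say how joins are smoothed; the linear crossfade over a few samples used here is an assumed, commonly used choice, and `concatenate` and its `overlap` parameter are hypothetical names. Waveforms are plain lists of float samples to keep the sketch dependency-free.

```python
def concatenate(units: list[list[float]], overlap: int = 4) -> list[float]:
    """Join unit waveforms end to end, blending `overlap` samples at every
    join with a linear crossfade (an assumed smoothing choice)."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming unit rises towards 1
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out
```

Each join shortens the output by `overlap` samples, and a larger overlap smooths the join more but blurs the boundary between units; the fewer joins an utterance needs, the fewer places such smoothing is required at all.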

INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
Unit selection (data driven) approach
- Multiple realizations of the basic units with varying prosodic features are stored in the speech database.
- Storing and retrieving a large number of recorded units in real time is feasible thanks to cheap memory and computation power.

UNIT SELECTION APPROACH
Building speech databases
- Collection of optimal text corpora
- Recording of the text corpora
- Automatic labeling followed by manual correction of the labels
- Extraction of unit features
- Clustering of units to facilitate selection

UNIT SELECTION APPROACH: ISSUES INVOLVED
Choice of unit size
- Sub-word units: half phone, phone, diphone, syllable.
- The larger the unit, the fewer the joins and hence the fewer the discontinuities.
- Wide coverage of units in various contexts is also desirable.
Generation of speech databases
- Approach for optimal selection of utterances.
Criteria for unit selection
- The most suitable units are selected from the database by minimizing target and concatenation costs.
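
Minimizing summed target and concatenation costs over all candidate units is commonly done with a Viterbi (dynamic programming) search over the candidate lattice. The sketch below shows that search; the features (mean pitch and duration), the cost weights and the rule that units adjacent in the recordings join for free are all illustrative assumptions, not the cost functions used by the authors.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical label of a recorded unit instance
    position: int    # index of the instance in the recorded database
    f0: float        # mean pitch in Hz
    duration: float  # seconds

@dataclass(frozen=True)
class Target:
    f0: float        # desired pitch in Hz
    duration: float  # desired duration in seconds

def target_cost(t: Target, c: Candidate) -> float:
    # Assumed weighting: pitch error in units of 10 Hz plus duration error in units of 10 ms.
    return abs(t.f0 - c.f0) / 10.0 + abs(t.duration - c.duration) * 100.0

def concat_cost(prev: Candidate, cur: Candidate) -> float:
    # Units that were adjacent in the recordings join for free; any other join
    # pays a fixed penalty plus the pitch jump across the boundary.
    if cur.position == prev.position + 1:
        return 0.0
    return 1.0 + abs(prev.f0 - cur.f0) / 10.0

def select_units(targets: list[Target], candidates: list[list[Candidate]]):
    """Viterbi search over candidates[i] (non-empty options for targets[i]),
    minimizing total target cost plus concatenation cost."""
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), -1) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            join, back = min(
                (best[i - 1][k][0] + concat_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((join + target_cost(targets[i], c), back))
        best.append(row)
    # Trace back from the cheapest candidate for the last target.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total
```

With two targets and two candidates each, the search can prefer a pair of units that were recorded consecutively even when one of them misses its pitch target, because the free join outweighs the small target cost. This is the same trade-off that favors larger units: fewer joins to pay for.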

EXPERIMENTS: CHOICE OF UNIT SIZE
- Hindi synthesizers were built using different unit sizes: syllable, diphone, phone and half phone.
- 24 sentences from a Hindi news bulletin were synthesized.
- A perceptual test was conducted with native Hindi-speaking subjects.
AB test results
- Syllables performed better than diphones, phones and half phones.
- Half phones performed better than diphones and phones.
Ref.: S. P. Kishore, Alan W. Black, "Unit Size in Unit Selection Speech Synthesis", Eurospeech 2003, Geneva.
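
An AB comparison between two unit sizes can be summarized as preference percentages. The slide does not say how responses were encoded or whether significance was tested; the sketch below assumes each response is `'A'`, `'B'` or `'='` (no preference) and adds a two-sided sign test on the non-tied responses as one plain way to check whether a preference could be chance. The response list in the usage note is hypothetical, not data from the study.

```python
from collections import Counter
from math import comb

def ab_summary(responses: list[str]) -> dict[str, float]:
    """Percentages for one AB comparison. Each response is 'A', 'B' or '='
    (no preference); this encoding is an assumption for illustration."""
    counts = Counter(responses)
    return {k: 100.0 * counts[k] / len(responses) for k in ("A", "B", "=")}

def sign_test_p(responses: list[str]) -> float:
    """Two-sided sign test on the A/B preferences, ignoring ties."""
    a, b = responses.count("A"), responses.count("B")
    n = a + b
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(a, b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For the hypothetical responses `["A", "A", "B", "=", "A"]`, A is preferred 60% of the time, but the sign test gives p = 0.625, so five responses are far too few to claim a difference.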

EXPERIMENTS: CHOICE OF UNIT
Example utterances (audio): half phones, phones, diphones, syllables.

GENERATION OF SPEECH DATABASES
- Utterances are selected for wide phonetic and prosodic coverage.
- High frequency syllables: syllables that occur relatively often in a corpus.
- A sentence is selected if it contains at least one high frequency syllable not present in the previously selected sentences.
- The selected utterances are recorded and labeled.
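
The selection criterion above can be sketched as a single pass over a syllabified corpus. The frequency threshold `min_count` that defines "high frequency" is a hypothetical parameter (the slide gives no cut-off), and sentences are assumed to arrive already split into syllables.

```python
from collections import Counter

def high_frequency_syllables(corpus: list[list[str]], min_count: int) -> set[str]:
    """Syllables whose corpus frequency is at least min_count (threshold assumed)."""
    counts = Counter(s for sentence in corpus for s in sentence)
    return {s for s, c in counts.items() if c >= min_count}

def select_sentences(corpus: list[list[str]], min_count: int = 2) -> list[int]:
    """Keep a sentence if it contains at least one high frequency syllable
    absent from all previously selected sentences; returns sentence indices."""
    hf = high_frequency_syllables(corpus, min_count)
    covered: set[str] = set()   # syllables seen in the selected sentences
    selected: list[int] = []
    for idx, sentence in enumerate(corpus):
        if (set(sentence) & hf) - covered:
            selected.append(idx)
            covered |= set(sentence)
    return selected
```

Because the pass is in corpus order, the result depends on how the corpus is ordered; ranking candidate sentences by how many new syllables they add would be one alternative, not something the slide specifies.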

GENERATION OF SPEECH DATABASES: SYLLABLE COVERAGE AND DURATION OF SPEECH DATABASES

Language | No. of Utterances | No. of Unique Syllables | Total No. of Syllables | Total Duration of Speech
Hindi    | 620               | 2324                    | 22960                  | 90 min
Telugu   | 1100              | 3394                    | 32295                  | 125 min

To study: dependency of quality on coverage.

EXPERIMENTS: GENERATION OF SPEECH DATABASES
Six databases with varying syllable coverage were built.

Database | Duration (min) | No. of Utterances | Unique Syllables | Total Syllables
D1       | 10             | 100               | 725              | 2681
D2       | 30             | 300               | 1548             | 8032
D3       | 52             | 500               | 2187             | 13738
D4       | 76             | 700               | 2622             | 19665
D5       | 99             | 900               | 3019             | 25450
D6       | 125            | 1100              | 3394             | 32295
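
A little arithmetic on the six databases makes the coverage trend explicit: each database's unique syllables as a share of the largest one, and the number of new unique syllables gained per 100 added utterances between consecutive databases. The figures are copied from the table; the function names are hypothetical.

```python
# (duration_min, utterances, unique_syllables, total_syllables) from the table
DATABASES = {
    "D1": (10, 100, 725, 2681),
    "D2": (30, 300, 1548, 8032),
    "D3": (52, 500, 2187, 13738),
    "D4": (76, 700, 2622, 19665),
    "D5": (99, 900, 3019, 25450),
    "D6": (125, 1100, 3394, 32295),
}

def relative_coverage(dbs: dict[str, tuple[int, int, int, int]]) -> dict[str, float]:
    """Unique syllables of each database as a percentage of the largest one."""
    largest = max(unique for _, _, unique, _ in dbs.values())
    return {name: 100.0 * unique / largest for name, (_, _, unique, _) in dbs.items()}

def marginal_gain(dbs: dict[str, tuple[int, int, int, int]]) -> list[float]:
    """New unique syllables per 100 added utterances between consecutive databases."""
    rows = sorted(dbs.values(), key=lambda r: r[1])
    return [100.0 * (u1 - u0) / (n1 - n0)
            for (_, n0, u0, _), (_, n1, u1, _) in zip(rows, rows[1:])]
```

D1 holds about 21% of D6's unique syllables, and the gain per 100 added utterances falls from 411.5 (D1 to D2) to 187.5 (D5 to D6), so each further block of recordings adds fewer new syllables.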

EXPERIMENTS: GENERATION OF SPEECH DATABASES: PERCEPTUAL TESTS
5 subjects were asked to listen to 5 sentences and score them on a scale of 0 (worst) to 5 (best).

System | Mean Score | Variance
S1     | 1.80       | 0.80
S2     | 2.64       | 0.32
S3     | 2.92       | 0.21
S4     | 3.36       | 0.06
S5     | 3.52       | 0.05
S6     | 3.56       | 0.08
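
The mean score and variance per system can be computed as sketched below. The slide does not say over which values the variance was taken; this sketch assumes it is the population variance of the subjects' mean scores. The scores in `EXAMPLE` are hypothetical, for illustration only, and are not the study's data.

```python
from statistics import fmean, pvariance

def summarize(scores_by_subject: list[list[int]]) -> tuple[float, float]:
    """Mean and population variance of per-subject mean scores (0 = worst,
    5 = best). Taking the variance over subject means is an assumption."""
    subject_means = [fmean(scores) for scores in scores_by_subject]
    return fmean(subject_means), pvariance(subject_means)

# Hypothetical scores: 5 subjects x 5 sentences for one system (not the study's data).
EXAMPLE = [
    [4, 4, 3, 3, 4],
    [3, 3, 3, 3, 4],
    [4, 3, 4, 3, 4],
    [3, 3, 3, 4, 3],
    [3, 3, 3, 3, 3],
]
```

For `EXAMPLE` the subject means are 3.6, 3.2, 3.6, 3.2 and 3.0, giving a mean of 3.32 and a variance of 0.0576. A low variance means the subjects agree closely, which is how the falling variances from S1 to S6 can be read as more consistent judgments.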


EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
- A text processing front end was developed:
  - Support for Hindi text in Unicode
  - Handling of non-standard words such as dates, currency, digits, addresses, abbreviations, etc.
  - Schwa deletion using heuristic rules
- 200 sentences were synthesized.
- 9 native Hindi-speaking subjects evaluated the perceptual quality of the synthesizer; each subject evaluated nearly 40 of the 200 sentences.
- Scoring was on a scale of 0 (worst) to 5 (best).
- Words "Not Sounding Natural" (NSN) were marked.

EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM

Subject | No. of Sentences | Mean Score | NSN Words / Total Words | Error (%)
H1      | 42               | 3.14       | 119/1397                | 8.52
H2      | –                | 3.26       | 115/1420                | 8.10
H3      | 45               | 3.18       | 120/1336                | 8.98
H4      | 40               | 2.75       | 193/1156                | 16.70
H5      | 39               | 3.33       | 90/1126                 | 7.99
H6      | –                | 3.95       | 87/1315                 | 6.62
H7      | –                | 3.30       | 113/1349                | 8.38
H8      | 38               | 2.61       | 162/1126                | 14.39
H9      | 41               | 3.39       | 113/1188                | 9.51
Overall | –                | 3.21       | 1112/11413              | 9.74
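
The error column and the overall row can be recomputed from the word counts and mean scores in the table. The sketch shows that the overall error of 9.74% is obtained by pooling the word counts of all subjects, and the overall score of 3.21 is the unweighted mean of the nine subject means; for H8, 162/1126 works out to 14.39%. The names below are hypothetical.

```python
# (subject, NSN words, total words, mean score), copied from the table
ROWS = [
    ("H1", 119, 1397, 3.14), ("H2", 115, 1420, 3.26), ("H3", 120, 1336, 3.18),
    ("H4", 193, 1156, 2.75), ("H5", 90, 1126, 3.33), ("H6", 87, 1315, 3.95),
    ("H7", 113, 1349, 3.30), ("H8", 162, 1126, 2.61), ("H9", 113, 1188, 3.39),
]

def error_percent(nsn: int, total: int) -> float:
    """Share of words marked 'Not Sounding Natural', in percent."""
    return 100.0 * nsn / total

per_subject = {s: round(error_percent(n, t), 2) for s, n, t, _ in ROWS}
# Overall error pools the word counts of all subjects (micro-average).
overall = round(error_percent(sum(r[1] for r in ROWS), sum(r[2] for r in ROWS)), 2)
# Overall mean score as the unweighted mean of the subjects' mean scores.
overall_score = round(sum(r[3] for r in ROWS) / len(ROWS), 2)
```

Averaging the nine per-subject percentages instead would give about 9.9%, slightly higher than the pooled figure, because the two subjects with the highest error rates (H4, H8) marked fewer total words.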

EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
Observations
- 30% of the "Not Sounding Natural" words were loan words from English.
- Proper nouns were not pronounced correctly.
- The schwa deletion rules failed to delete the schwa in some places.
- Some punctuation characters were not handled properly.
Lessons
- Additional phonetic coverage is required for proper nouns and loan words.
- A good text processing component is needed for high quality speech synthesis.

APPLICATIONS
- Talking Tourists Aid: limited domain synthesis that allows a person to communicate queries about the city, travel, accommodation, etc.
- News Reader: reads news from a Hindi news portal.
- Screen Reader for the visually impaired.

CONCLUSION
- Syllables are better units for Indian language speech synthesis: Syllable > Half Phone > Diphone > Phone.
- Higher unit coverage produces higher quality speech, and the lower variance of the scores indicates more consistent results.
- The effects of loan words should be considered in the design of the speech corpus.
- A good text processing front end is needed for high quality synthesis.

QUESTIONS