Published by Shana Turner. Modified over 6 years ago.
1
EXPERIMENTS WITH UNIT SELECTION SPEECH DATABASES FOR INDIAN LANGUAGES
S P Kishore*+, Alan W Black#, Rohit Kumar*, Rajeev Sangal*
* Language Technologies Research Center, International Institute of Information Technology, Hyderabad
# Language Technologies Institute, Carnegie Mellon University
+ Institute of Software Research International, Carnegie Mellon University
11/19/2018
2
ORGANIZATION OF THE TALK
Role of Language Technologies
Text to Speech Systems
  Text Processing Front End
  Speech Generation Component
Unit Selection Approach
Experiments
  Choice of Unit Size
  Generation of Databases: Content & Size of Database
Evaluation of Hindi Speech Synthesis System
Applications
Conclusion
3
ROLE OF LANGUAGE TECHNOLOGIES
Natural Interfaces for Information Access
Crucial Role for Multilingual Societies
Integration of Speech Recognition, Machine Translation and Speech Synthesis
  For interaction between two people speaking different languages
4
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS
A Text to Speech System converts an arbitrary given text into a corresponding spoken waveform.
Why Text to Speech Synthesis?
Basic Blocks of a Text to Speech System
  Text -> Text Processing Front End -> Basic Units Sequence + Prosody Information -> Speech Generation Component -> Speech
5
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Nature of Indian Scripts
  Basic units of the Indian writing systems are Aksharas
  An Akshara is typically of the form V, CV, CCV
  Common phonetic base: about 35 consonants and 18 vowels
  Phonetic nature of the languages: what is written is what is spoken
  Exception: Schwa Deletion (Inherent Vowel Suppression)
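The V, CV, CCV pattern above can be illustrated with a small sketch that groups a romanized phone sequence into akshara-like units (zero or more consonants closed by a vowel). The vowel inventory and the romanization are assumptions made for illustration only; they are not taken from the slides.

```python
# Assumed romanized vowel symbols (illustrative; not the inventory used in the talk).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}


def split_aksharas(phones):
    """Group a list of phone symbols into units of the form C*V."""
    aksharas, current = [], []
    for p in phones:
        current.append(p)
        if p in VOWELS:
            # A vowel closes the akshara that is currently being built.
            aksharas.append(current)
            current = []
    if current:
        # Trailing consonants without a vowel are kept as a final unit.
        aksharas.append(current)
    return aksharas


units = split_aksharas(["r", "a", "t", "a", "n", "a"])
```

Here `units` holds three CV aksharas, one per written symbol of the word "ratana".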
6
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Format of Input Text
  ISCII, Unicode, various fonts
  Can be handled by use of appropriate conversion module(s)
Mapping Non Standard Words (NSW) to Standard Words
  NSW: symbols, digits, initials, abbreviations, punctuation, non-native words, etc.
7
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: TEXT PROCESSING FRONT END
Standard Words to Phoneme Sequence
  For English, this involves lexicon lookup and use of letter to sound rules
  Due to the phonetic nature of Indian scripts, simple letter to sound rules can be used
  Problems with some languages
    Inherent Vowel Suppression (schwa deletion), e.g. ratana (rtana) is spoken as ratan
  Presently we are using a set of heuristic rules
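The slides do not list the heuristic rules that were used. As a rough illustration of what such a rule can look like, the sketch below applies one commonly cited heuristic, dropping the inherent vowel of a word-final akshara. The rule, the romanization, and the function name are assumptions, not the authors' rule set.

```python
def delete_final_schwa(aksharas):
    """Drop the inherent vowel 'a' of the last akshara of a word.

    aksharas: list of romanized aksharas, e.g. ["ra", "ta", "na"].
    This is a single illustrative heuristic, not a complete rule set.
    """
    if len(aksharas) < 2:
        # Single-akshara words keep their vowel.
        return list(aksharas)
    last = aksharas[-1]
    # Only the short inherent vowel "a" after a consonant is dropped;
    # a long "aa" or a bare vowel akshara is left untouched.
    if len(last) > 1 and last.endswith("a") and not last.endswith("aa"):
        return aksharas[:-1] + [last[:-1]]
    return list(aksharas)


spoken = "".join(delete_final_schwa(["ra", "ta", "na"]))
```

With the example from the slide, the written form ra-ta-na yields the spoken form "ratan". Word-medial schwa deletion needs further context-dependent rules and is not covered by this sketch.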
8
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
ARTICULATORY MODEL BASED SYNTHESIS:
  Involves simplistic modeling of the human speech production mechanism
  Difficult to accurately model the motion of the articulators
PARAMETER BASED SYNTHESIS:
  Speech segments are parameterized in terms of formant frequencies or linear prediction coefficients
  Difficult to come up with the large number of rules needed to accurately manifest co-articulation and prosody
CONCATENATION BASED SYNTHESIS:
  Inventory of recorded speech segments (units) used
  Prosodic variations:
    Intonation and duration could be acquired and incorporated in the form of rules
    Store multiple realizations of units with differing prosody
9
INDIAN LANGUAGE TEXT TO SPEECH (TTS) SYSTEMS: SPEECH GENERATION COMPONENT
Unit Selection (Data Driven) Approach
  Multiple realizations of basic units with varying prosodic features are stored in the speech database
  Storage and retrieval of a large number of recorded units is feasible in real time due to the availability of cheap memory and computation power
10
UNIT SELECTION APPROACH
Building up of Speech Databases
  Collection of optimal text corpora
  Recording the text corpora
  Automatic labeling followed by manual correction of labels
  Extraction of unit features
  Clustering units to facilitate selection
11
UNIT SELECTION APPROACH
ISSUES INVOLVED
Choice of Unit Size
  Sub-word units: half phone, phone, diphone, syllable
  The larger the unit size, the fewer the joins and the fewer the discontinuities
  Wide coverage of units in various contexts is also desirable
Generation of Speech Databases
  Approach for optimal selection of utterances
Criteria for Unit Selection
  The most suitable units are selected from the database by minimizing the sum of target and concatenation costs
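The selection criterion can be sketched as a dynamic-programming (Viterbi-style) search over candidate units, minimizing the sum of target costs and concatenation costs. The cost functions are passed in by the caller; the pitch-based costs in the toy example are assumptions chosen only to make the sketch runnable, not the features used in the experiments.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position so that total cost is minimal.

    candidates: non-empty list; candidates[i] is a non-empty list of units
                available for target position i.
    target_cost(i, unit): mismatch between a unit and target position i.
    join_cost(prev, unit): cost of concatenating two consecutive units.
    Returns (best unit sequence, total cost).
    """
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy example: units are (label, pitch) pairs; both costs are pitch differences.
target_pitch = [100, 110]
toy_candidates = [[("ra", 100), ("ra", 130)], [("tan", 105), ("tan", 135)]]
toy_path, toy_total = select_units(
    toy_candidates,
    target_cost=lambda i, u: abs(u[1] - target_pitch[i]),
    join_cost=lambda a, b: abs(a[1] - b[1]),
)
```

In the toy example the search picks the two units whose pitch is closest to the targets and to each other. A real system would use richer features in both cost functions.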
12
EXPERIMENTS: CHOICE OF UNIT SIZE
Hindi synthesizers built using different choices of unit size
  Syllable, diphone, phone, half phone
24 sentences from a Hindi news bulletin synthesized
Perceptual test on native Hindi speaking subjects conducted
  AB test
Results:
  Syllables performed better than diphones, phones and half phones
  Half phones performed better than diphones and phones
Ref.: S. P. Kishore, Alan W. Black, “Unit Size in Unit Selection Speech Synthesis”, Eurospeech 2003, Geneva
13
EXPERIMENTS: CHOICE OF UNIT
Example Utterances
  Half Phones  ★★★★
  Phones       ★★
  Diphones     ★★★
  Syllables    ★★★★★
14
GENERATION OF SPEECH DATABASES
Selection of utterances with wide phonetic and prosodic coverage
High Frequency Syllables: syllables with a relatively high occurrence in a corpus
A sentence is selected if it has at least one high frequency syllable not present in the previously selected sentences
Utterances recorded and labeled
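The selection rule above can be sketched as a single greedy pass over a syllabified corpus. The frequency threshold `min_count` and the choice to scan sentences in corpus order are assumptions; the slides only say "relatively high occurrence".

```python
from collections import Counter


def select_sentences(sentences, min_count=2):
    """Greedy utterance selection for syllable coverage.

    sentences: list of sentences, each given as a list of syllables.
    min_count: assumed threshold for calling a syllable "high frequency".
    Returns the indices of the selected sentences.
    """
    counts = Counter(syl for sent in sentences for syl in sent)
    high_freq = {syl for syl, c in counts.items() if c >= min_count}

    covered, selected = set(), []
    for idx, sent in enumerate(sentences):
        # High frequency syllables in this sentence that are not yet covered.
        new = (set(sent) & high_freq) - covered
        if new:
            selected.append(idx)
            covered |= new
    return selected


corpus = [["ra", "ta"], ["ra", "ta"], ["na", "ra"], ["ki", "na"], ["zz"]]
chosen = select_sentences(corpus)
```

In this toy corpus the second sentence is skipped because it adds no new high frequency syllable, and the last two are skipped because their only new syllables are rare.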
15
GENERATION OF SPEECH DATABASES
SYLLABLE COVERAGE AND DURATION OF SPEECH DATABASES
Language   No. of Utterances   No. of Unique Syllables   Total Syllables   Duration of Speech
Hindi      620                 2324                      22960             90 min
Telugu     1100                3394                      32295             125 min
To Study: Dependency of Quality on Coverage
16
EXPERIMENTS: GENERATION OF SPEECH DATABASES
6 databases with varying syllable coverage built
Database   Duration (min)   No. of Utterances   Unique Syllables   Total Syllables
D1         10               100                 725                2681
D2         30               300                 1548               8032
D3         52               500                 2187               13738
D4         76               700                 2622               19665
D5         99               900                 3019               25450
D6         125              1100                3394               32295
17
EXPERIMENTS: GENERATION OF SPEECH DATABASES
PERCEPTUAL TESTS
5 subjects were asked to listen to 5 sentences and score them on a scale of 0 (worst) to 5 (best).
System   Mean Score   Variance
S1       1.80         0.80
S2       2.64         0.32
S3       2.92         0.21
S4       3.36         0.06
S5       3.52         0.05
S6       3.56         0.08
18
EXPERIMENTS: GENERATION OF SPEECH DATABASES
19
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
Text Processing Front End developed
  Support for Hindi text in Unicode
  Handles non-standard words like date, currency, digits, address abbreviations, etc.
  Schwa deletion using heuristic rules
200 sentences synthesized
9 native Hindi speaking subjects evaluated the perceptual quality of the synthesizer
  Each subject evaluated nearly 40 sentences out of the 200
  Scoring on a scale of 0 (worst) to 5 (best)
  Words “Not Sounding Natural” (NSN) were marked
20
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
Subject   No. of Sentences   Mean Score   NSN Words / Total Words   Error in %
H1        42                 3.14         119/1397                  8.52 %
H2                           3.26         115/1420                  8.10 %
H3        45                 3.18         120/1336                  8.98 %
H4        40                 2.75         193/1156                  16.7 %
H5        39                 3.33         90/1126                   7.99 %
H6                           3.95         87/1315                   6.62 %
H7                           3.30         113/1349                  8.38 %
H8        38                 2.61         162/1126                  14.3 %
H9        41                 3.39         113/1188                  9.51 %
Overall                      3.21         1112/11413                9.74 %
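The error column is the share of words marked “Not Sounding Natural” among all words heard by a subject. The short sketch below recomputes the overall row from the per-subject counts in the table; the variable and function names are illustrative only.

```python
# (NSN words, total words) per subject, copied from the table above.
nsn_counts = {
    "H1": (119, 1397), "H2": (115, 1420), "H3": (120, 1336),
    "H4": (193, 1156), "H5": (90, 1126), "H6": (87, 1315),
    "H7": (113, 1349), "H8": (162, 1126), "H9": (113, 1188),
}


def error_percent(nsn, total):
    """Percentage of words marked as not sounding natural."""
    return 100.0 * nsn / total


total_nsn = sum(nsn for nsn, _ in nsn_counts.values())
total_words = sum(total for _, total in nsn_counts.values())
overall = error_percent(total_nsn, total_words)
```

The totals reproduce the overall row of the table (1112 of 11413 words). Note that the overall error is pooled over all words, not an average of the per-subject percentages.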
21
EVALUATION OF HINDI SPEECH SYNTHESIS SYSTEM
OBSERVATIONS
  30% of “Not Sounding Natural” words were loan words from English
  Proper nouns not being pronounced correctly
  Schwa deletion rules not successfully deleting schwa in some places
  Some punctuation characters not getting handled properly
LESSONS
  Additional phonetic coverage for proper nouns and loan words required
  Good text processing component needed for high quality speech synthesis
22
APPLICATIONS
Talking Tourists Aid
  Limited domain synthesis
  Allows a person to communicate queries about city, travel, accommodation, etc.
News Reader
  Reading news from a Hindi news portal
Screen Reader for Visually Impaired
23
CONCLUSION
Syllables are better units for Indian language speech synthesis
  Syllable > Half Phone > Diphone > Phone
High coverage of units produces high quality speech
  The variance of the scores is also lower, indicating more consistent results
Effects of loan words should be considered in the design of the speech corpus
Good text processing front end needed for high quality synthesis
24
QUESTIONS