1
A Text-to-Speech Synthesis System
Presented by: Michael Beddaoui, Abdel-Aziz El-Solh
2
Presentation Outline
- Introduction
- Background
- 3 Components of TTS System
  - Text Pre-processing (Aziz)
  - Prosody (Mike)
  - Concatenation (Mike)
- Summary
- What has been done / Future Work
- Conclusion
- Questions
3
What is a TTS System?
Definition:
- A system that takes a sequence of words as input and converts it to speech
Applications:
- Services for the hearing impaired
- Reading aloud
Commercial TTS Systems:
- Festival
- Bell Labs TTS
4
Different TTS Systems
Phoneme-Based TTS System
- Phonemes are:
  - The minimal distinctive phonetic units
  - Relatively small in number (39 phonemes in English)
- Disadvantage:
  - Phonemes ignore transitional sounds!
5
Different TTS Systems (cont’d)
Diphone-Based TTS System
- Diphones are:
  - Made up of 2 phonemes
  - Incorporate transitional sound
  - Make for better sounding speech
- Disadvantage:
  - Over 1500 diphones in the English language!
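Because a diphone is made up of 2 phonemes, one way to picture the unit is to pair each phoneme with its neighbor. The sketch below is only an illustration: the function name, the ARPAbet-style phoneme symbols, and the underscore naming (borrowed from the ‘C_A’ label used on the pitch slide) are assumptions, not the presenters' code.

```python
def phonemes_to_diphones(phonemes):
    """Pair each phoneme with its successor, e.g. K AE T -> K_AE, AE_T."""
    # zip() stops at the shorter sequence, so a single phoneme yields no pairs.
    return [f"{left}_{right}" for left, right in zip(phonemes, phonemes[1:])]


# Hypothetical phoneme sequence for the word "cat".
print(phonemes_to_diphones(["K", "AE", "T"]))
```

Each diphone spans the transition between two sounds, which is the information a phoneme-only inventory leaves out.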
6
Fundamental Components
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation
7
Text Pre-Processing
Input
- String of characters (sentence)
Output
- String of diphone symbols
Objective
- Perform sentence-level analysis
  - Punctuation marks
  - Pauses between words
- Convert all input to corresponding diphones
8
Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
Resource: Diphone Dictionary
Highlighted stage: Number Converter
9
Number Converter
- Replace numerals with their textual versions
  - e.g. one hundred
- Handle fractional and decimal numbers
  - e.g. point two five
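The number-conversion step can be sketched as below. This is a minimal illustration, assuming whole numbers below 1000 and decimals read digit by digit after "point"; the function names and the regular expression are hypothetical and do not come from the presentation.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def int_to_words(n):
    """Spell out 0..999 (an assumed limit for this sketch)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + int_to_words(rest)
    raise ValueError("this sketch only handles numbers below 1000")


def number_to_words(token):
    """Convert a numeral such as '100', '3.5' or '.25' to words."""
    whole, _, fraction = token.partition(".")
    parts = []
    if whole:
        parts.append(int_to_words(int(whole)))
    if fraction:
        parts.append("point")
        parts.extend(ONES[int(digit)] for digit in fraction)
    return " ".join(parts)


def convert_numbers(sentence):
    """Replace every numeral in the sentence with its textual version."""
    return re.sub(r"\d*\.\d+|\d+", lambda m: number_to_words(m.group()), sentence)
```

For example, `convert_numbers("Pay 100 now")` replaces the numeral with "one hundred", and `number_to_words(".25")` gives "point two five".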
10
Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
Resource: Diphone Dictionary
Highlighted stage: Acronym Converter
11
Acronym Converter
- Replace acronyms with single-letter components
  - A.B.C -> A B C
- Change abbreviations to full textual format
  - Mr -> Mister
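A rough sketch of the two replacements follows. The abbreviation table holds only the "Mr" example from the slide; the pattern for spotting acronyms (capital letters separated by periods) and all names are assumptions made for illustration.

```python
import re

# Only the example from the slide; a real table would be much larger.
ABBREVIATIONS = {"Mr": "Mister"}

# Assumed acronym shape: capital letters separated by periods, e.g. A.B.C
ACRONYM = re.compile(r"\b(?:[A-Z]\.)+[A-Z]\b")
ABBREVIATION = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b\.?"
)


def expand_acronyms(sentence):
    """Spell acronyms letter by letter, then expand known abbreviations."""
    spelled = ACRONYM.sub(lambda m: m.group().replace(".", " "), sentence)
    return ABBREVIATION.sub(lambda m: ABBREVIATIONS[m.group(1)], spelled)
```

Running `expand_acronyms("Mr Smith joined A.B.C")` turns "Mr" into "Mister" and "A.B.C" into "A B C".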
12
Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
Resource: Diphone Dictionary
Highlighted stage: Word Segmenter
13
Word Segmenter
- Divide sentence into word segments
  - Special delimiter to separate segments (i.e. ‘||’)
- Segments can be:
  - A single word
  - An acronym
  - A numeral
- Identify punctuation marks
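The segmentation step might look like the sketch below. Only the ‘||’ delimiter comes from the slide; the token pattern, the set of punctuation marks, and the function names are assumptions.

```python
import re

DELIMITER = "||"
PUNCTUATION = set(".,;:!?")

# Alternatives are tried in order: acronym, numeral, word, punctuation mark.
SEGMENT = re.compile(r"(?:[A-Z]\.)+[A-Z]|\d+(?:\.\d+)?|[A-Za-z']+|[.,;:!?]")


def segment(sentence):
    """Split a sentence into word, acronym, numeral and punctuation segments."""
    return SEGMENT.findall(sentence)


def punctuation_marks(segments):
    """Return the segments that are punctuation marks."""
    return [s for s in segments if s in PUNCTUATION]


segments = segment("Mr Smith paid 100 dollars.")
print(DELIMITER.join(segments))
```

Keeping punctuation as its own segment lets later stages decide where pauses belong.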
14
Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
Resource: Diphone Dictionary
Highlighted stage: Word to Diphone Translator (Phonetization)
15
Word To Diphone Converter (Phonetization)
Purpose
- Translate words to their diphone representations
Resource
- Dictionary of words and their diphones (derived from CMU phoneme database)
- Over 175,000 words supported
16
W-to-D Converter Cont’d
Implementation
- Binary search algorithm in C
  - Start with whole dictionary as search range
    - start index, end index, middle index
  - If target word is alphabetically less than middle word, ignore second half (i.e. end index = middle index); else ignore first half (i.e. start index = middle index)
  - Repeat until word found or range contains zero words
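The search described above can be sketched as follows. The presenters implemented it in C; this Python version is only an illustration, and the tiny dictionary, its diphone symbols, and the function name are hypothetical. The sketch uses a half-open range and moves the start index to middle + 1, a small deviation from the slide's wording, so that the range shrinks on every iteration and can actually reach zero words.

```python
# Hypothetical, alphabetically sorted dictionary of (word, diphones) entries.
DICTIONARY = [
    ("bat", ["B_AE", "AE_T"]),
    ("cat", ["K_AE", "AE_T"]),
    ("dog", ["D_AO", "AO_G"]),
]


def find_diphones(entries, target):
    """Binary search over sorted entries; returns None if the word is absent."""
    start, end = 0, len(entries)          # search range is [start, end)
    while start < end:                    # stop when the range holds zero words
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle                  # ignore the second half
        else:
            start = middle + 1            # ignore the first half
    return None


print(find_diphones(DICTIONARY, "cat"))
```

Since the range halves each time, a dictionary of about 175,000 words needs roughly 18 comparisons at most.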
17
W-to-D Converter Cont’d
Advantages
- Fast search times
  - Search range halves with each iteration (max of 1 sec currently)
- Less complicated to implement
  - Compared to indexing the dictionary, or
  - Importing the dictionary into an internal structure
18
Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
Resource: Diphone Dictionary
Highlighted stage: MLDS
19
The Multi-Level Data Structure
- Contains all necessary data for the next sub-system:
  - Word
  - Diphone representation
  - Prosodic parameters for each diphone
    - This reflects both word-level and sentence-level prosody
- Allows for modularization
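One possible shape for such a structure is sketched below. The slides do not give the actual layout, so every class and field name here is hypothetical; pitch, duration and amplitude are chosen as the prosodic parameters because those are the three quantities the prosody slides alter, and treating them as scale factors is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    symbol: str                # e.g. "K_AE" (hypothetical naming)
    pitch: float = 1.0         # assumed scale factors, 1.0 = unchanged
    duration: float = 1.0
    amplitude: float = 1.0


@dataclass
class WordEntry:
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class MLDS:
    """Sentence level -> word level -> diphone level."""
    words: List[WordEntry] = field(default_factory=list)


sentence = MLDS(words=[
    WordEntry("cat", [DiphoneEntry("K_AE", pitch=1.1), DiphoneEntry("AE_T")]),
])
```

Because the next sub-system only reads this structure, text pre-processing and prosody can be developed separately.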
20
Prosody
[Flowchart: MLDS, Diphone Retrieval (reads from Diphone Database), Acoustic Manipulation, decision "done?" (yes / no), Concatenation]
21
Diphone Retrieval
- Database of recorded diphones
- Every diphone matched with a .txt file
  - Distinguished by type (CC, CV, VC, VV)
  - References to specific components within waveform
- Store diphone waveform and prosodic parameters in variables
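The four types can be illustrated with a small classifier, reading C as consonant and V as vowel. The vowel set below is the ARPAbet vowel inventory used by the CMU dictionary; the underscore-separated diphone names and the function name are assumptions for this sketch.

```python
# ARPAbet vowel symbols (stress markers omitted).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(symbol):
    """Classify a diphone such as 'K_AE' as CC, CV, VC or VV."""
    left, right = symbol.split("_")
    return "".join("V" if phoneme in VOWELS else "C" for phoneme in (left, right))


print(diphone_type("K_AE"), diphone_type("AE_T"))
```

Knowing the type tells the later stages which half of a diphone is voiced, which matters because pitch is altered on vowels only.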
22
Properties of Speech Signals
e.g. cat.wav: c (non-periodic), a (periodic), t (non-periodic)
23
Acoustic Manipulation - MATLAB
- Recognizes wave files (.WAV)
  - load, play, write
- Vast array of signal processing tools
  - Built-in functions
- Ease of debugging
- GUI-capable
24
Pitch/Duration/Amplitude Alteration
Pitch (vowels only)
- As pitch increases, pitch period shrinks
- As pitch decreases, pitch period expands
- Need to alter length between pitch marks in order to alter pitch of speech signal
25
Altering Pitch
[Figure: original diphone ‘C_A’; extracted pitch period x Hanning window = Hanned pitch period]
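The windowing step shown in the figure, multiplying an extracted pitch period by a Hanning window, can be sketched as below. The presenters worked in MATLAB; this pure-Python version with hypothetical function names and made-up sample values only illustrates the arithmetic.

```python
import math


def hann_window(length):
    """Symmetric Hann (Hanning) window: 0 at both ends, 1 in the middle."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(samples, window):
    """Multiply a pitch period by the window, sample by sample."""
    return [s * w for s, w in zip(samples, window)]


# Hypothetical pitch period of five samples.
period = [0.2, 0.8, 1.0, 0.8, 0.2]
hanned = apply_window(period, hann_window(len(period)))
```

Tapering both ends toward zero is what lets neighboring periods be overlapped and added without clicks at the boundaries.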
26
Altering Pitch Cont’d
PSOLA: Pitch Synchronous Overlap and Add
- 50% overlap + add
- Pitch up: > 50% overlap
- Pitch down: < 50% overlap
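The overlap-and-add idea can be sketched as follows: copies of a windowed pitch period are placed a fixed hop apart and summed. This is a simplified illustration with hypothetical names, not the presenters' MATLAB code; the hop is assumed to be the period length times (1 - overlap), so more overlap means a shorter spacing between pitch marks and therefore a higher pitch.

```python
def overlap_add(segment, repeats, overlap=0.5):
    """Sum `repeats` copies of `segment`, each shifted by one hop."""
    length = len(segment)
    # Spacing between successive copies; larger overlap -> smaller hop.
    hop = max(1, round(length * (1.0 - overlap)))
    out = [0.0] * (hop * (repeats - 1) + length)
    for k in range(repeats):
        offset = k * hop
        for i, sample in enumerate(segment):
            out[offset + i] += sample
    return out


# Hypothetical windowed pitch period of eight samples.
segment = [0.0, 0.2, 0.6, 1.0, 1.0, 0.6, 0.2, 0.0]
same_pitch = overlap_add(segment, repeats=3, overlap=0.5)
pitch_up = overlap_add(segment, repeats=3, overlap=0.75)
pitch_down = overlap_add(segment, repeats=3, overlap=0.25)
```

With the same number of repeats, the pitch-up signal comes out shorter and the pitch-down signal longer, which is why duration has to be corrected separately.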
27
Altering Pitch Cont’d
[Figure: pitch period repeated x 12, multiplied by a Kaiser window]
- Naturally spoken vowels contain 12-18 pitch marks
28
Altering Duration
- Increase number of PSOLA iterations (overlaps) to increase duration
- Decrease number of PSOLA iterations (overlaps) to decrease duration
Altering Amplitude
- Multiply the signal by a constant
  - If constant > 1, amplitude increases
  - If constant < 1, amplitude decreases
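Both adjustments reduce to simple arithmetic, sketched below with hypothetical function names. The duration formula assumes copies of one pitch period spaced a fixed hop apart, with the hop equal to the period length times (1 - overlap); the slides do not state the exact formula.

```python
def scale_amplitude(samples, gain):
    """Multiply every sample by a constant gain."""
    return [gain * s for s in samples]


def output_length(period_length, iterations, overlap=0.5):
    """Number of samples produced by `iterations` overlapped pitch periods."""
    hop = max(1, round(period_length * (1.0 - overlap)))
    return hop * (iterations - 1) + period_length


# More PSOLA iterations give a longer vowel (hypothetical 100-sample period).
print(output_length(100, 12), output_length(100, 18))
```

With a 100-sample period at 50% overlap, going from 12 to 18 iterations lengthens the vowel from 650 to 950 samples.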
29
Concatenation
- Diphones -> Words
  - Using PSOLA at the joining ends
  - Ensures smooth transition
- Words -> Sentence
  - Straight joining at the end points due to presence of pauses
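The two joining strategies can be contrasted in a short sketch. Note the substitution: a plain linear crossfade stands in here for PSOLA at the diphone joins, purely to show the idea of blending the joining ends; the function names and the overlap length are hypothetical.

```python
def crossfade_join(a, b, overlap):
    """Join two diphone waveforms, blending `overlap` samples at the seam."""
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return a + b
    seam_start = len(a) - overlap
    mixed = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)      # ramps from a toward b
        mixed.append(a[seam_start + i] * (1.0 - weight) + b[i] * weight)
    return a[:seam_start] + mixed + b[overlap:]


def join_words(words):
    """Straight joining of word waveforms at their end points."""
    sentence = []
    for word in words:
        sentence.extend(word)
    return sentence
```

Words begin and end near silence, so joining them end to end introduces no audible discontinuity, whereas diphones meet in the middle of a sound and need the blended seam.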
30
Summary
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation
- System modularized
31
Progress
Work Completed / Current Status
- Text pre-processing and prosodic manipulation for a multi-syllable word
- Diphone concatenation
- 200+ diphones in database
- Fully functional GUI implemented
Work To Be Done
- Sentence-level synthesis
- Expand diphone database
- Fine-tuning and enhancing
- Prepare for Poster Fair
- Write final report
32
Questions?
Contact Information
- Michael Beddaoui
- Abdel-Aziz El-Solh