Published by Griselda Alberta Banks. Modified over 9 years ago.
1
HMM Based Speech Synthesis
Presented by Ossama Abdel-Hamid Mohamed
Cairo University, Faculty of Computers and Information
December 2006
2
Agenda
- Speech Synthesis
- HMM Based Speech Synthesis
- Proposed System
- Challenges
3
Speech Synthesis
What is speech synthesis?
- Generating human-like speech using computers.
Applications
- Text to speech.
- Conversation systems.
- Speech-to-speech translation.
- Concept to speech.
Systems have been built since the late 1970s.
- MITalk, 1979
- Klattalk, 1980
4
Speech Synthesis, Cont.
Challenges:
- Intelligibility.
- Naturalness.
- Pleasantness.
- Emotions.
5
Speech Synthesis, Techniques
Formant based
- Rule based
- Difficult to make
- Machine-like
Concatenative
- Instance based
- Based on a corpus
- Better quality
- Not flexible
HMM based
- Statistical
- Based on a corpus
- Newest technique
- More flexible
7
HMM Based Speech Synthesis Overview
HMMs have been used successfully in speech recognition.
In recognition: [figure not preserved]
In speech synthesis: [figure not preserved]
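The contrast between the two uses is often written as below. This is the usual textbook formulation, not something recovered from the slide's missing figures, and the symbols are assumptions of this sketch: O is the acoustic observation sequence, w the word or label sequence, and \lambda the trained model parameters.

```latex
\begin{align}
\text{Recognition:} \quad \hat{w} &= \arg\max_{w} \; P(O \mid w, \lambda)\, P(w) \\
\text{Synthesis:}   \quad \hat{O} &= \arg\max_{O} \; P(O \mid w, \lambda)
\end{align}
```

In recognition the observations are given and the labels are searched for; in synthesis the labels are given and the observations are generated.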
8
HMM Based Speech Synthesis Overview, Cont.
Include delta and acceleration (delta-delta) features so that the generated output is smooth.
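A minimal sketch of how delta and acceleration features can be appended to static features. The window coefficients (-0.5, 0, 0.5) and (1, -2, 1) are a common choice but are an assumption here, as are the function names; the slides do not state which windows were used.

```python
def delta(frames, window=(-0.5, 0.0, 0.5)):
    """Apply a 3-tap window along time to a list of feature vectors.

    Edge frames are handled by repeating the first/last frame.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        cur = frames[t]
        nxt = frames[min(t + 1, n - 1)]
        out.append([window[0] * p + window[1] * c + window[2] * x
                    for p, c, x in zip(prev, cur, nxt)])
    return out


def add_dynamic_features(static):
    """Return observation vectors [static, delta, acceleration] per frame."""
    d = delta(static)                            # first-order difference
    a = delta(static, window=(1.0, -2.0, 1.0))   # second-order difference
    return [s + dv + av for s, dv, av in zip(static, d, a)]


if __name__ == "__main__":
    # A one-dimensional ramp: constant slope, zero curvature in the middle.
    for row in add_dynamic_features([[0.0], [1.0], [2.0], [3.0]]):
        print(row)
```

Because the models are trained on these augmented vectors, parameter generation can take the delta and acceleration statistics into account rather than emitting a piecewise-constant sequence of state means.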
9
The Overall System [block diagram]
Training part: Speech Database -> F0 Extraction and Mel-Cepstral Analysis -> f0 and mel-cepstrum -> HMM Training -> Models. Text -> Text Analysis -> labels and context features -> HMM Training.
Synthesis part: Text -> Text Analysis -> labels and context features -> Parameters Generation (using the Models) -> f0 and mel-cepstrum. f0 -> Pulse or Noise Excitation -> excitation. Excitation and mel-cepstrum -> MLSA filter -> Speech.
10
The Overall System [same block diagram as slide 9, with annotations]
- F0 is modeled using MSD-HMM.
- 25 mel-cepstral coefficients.
11
The Overall System [same block diagram as slide 9, with annotations]
- Context-dependent models.
- Each model has 5 states.
13
The Overall System [same block diagram as slide 9, with annotation]
- Each frame is either voiced or unvoiced.
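A rough sketch of the "pulse or noise" excitation block under a hard voiced/unvoiced decision per frame. The function name, the convention that f0 == 0 marks an unvoiced frame, and the sample rate and frame shift are all assumptions made for illustration.

```python
import random


def make_excitation(f0_per_frame, sample_rate=16000, frame_shift=80, seed=0):
    """Build an excitation signal from one F0 value per frame.

    Assumption of this sketch: f0 == 0 marks an unvoiced frame.
    Voiced frames get a pulse train, unvoiced frames get white noise.
    """
    rng = random.Random(seed)
    excitation = []
    next_pulse = 0.0  # sample index at which the next pitch pulse is due
    for i, f0 in enumerate(f0_per_frame):
        start = i * frame_shift
        for n in range(start, start + frame_shift):
            if f0 > 0:
                period = sample_rate / f0
                if n >= next_pulse:
                    # Scale the pulse so average power is close to the
                    # unit-variance noise used for unvoiced frames.
                    excitation.append(period ** 0.5)
                    next_pulse = n + period
                else:
                    excitation.append(0.0)
            else:
                excitation.append(rng.gauss(0.0, 1.0))
                next_pulse = n + 1  # restart pulses when voicing resumes
    return excitation


if __name__ == "__main__":
    signal = make_excitation([100.0, 100.0, 0.0])
    print(len(signal), signal[0])
```

Every frame here is entirely pulses or entirely noise, which is exactly the binary decision that the proposed system later replaces with per-band mixing.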
15
Advantages
1. Its voice characteristics can be easily modified.
2. It can be applied to various languages with little modification.
3. A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.
4. Techniques developed in ASR can be easily applied.
5. Its footprint is relatively small.
An HMM-based TTS system produced the best results in the Blizzard Challenge.
17
Problems we tried to solve
1. Marking each frame as either voiced or unvoiced degrades quality, because most voiced speech contains some unvoiced components, and some phonemes have mixed excitation.
2. The speech analysis/synthesis techniques and parameters used degrade quality.
18
Multi-Band Excitation
In MBE (Multi-Band Excitation), speech is divided into a number of frequency bands, and voicing is estimated in each band (17 bands were used).
19
Mixed Excitation
In synthesis, periodic and noise excitations are mixed according to the voicing parameters.
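A minimal sketch of per-band mixing for one frame, working on magnitude spectra. It assumes equal-width bands and a voicing value between 0 and 1 per band; the slides do not say how the 17 bands were laid out or how the two components were weighted, so treat `mix_bands` and its arguments as hypothetical.

```python
def mix_bands(periodic_spec, noise_spec, band_voicing):
    """Mix two magnitude spectra bin by bin.

    Each frequency bin takes the voicing value of the band it falls in
    (equal-width bands are an assumption of this sketch).
    voicing = 1.0 keeps only the periodic part, 0.0 only the noise part.
    """
    n_bins = len(periodic_spec)
    n_bands = len(band_voicing)
    mixed = []
    for k in range(n_bins):
        band = min(k * n_bands // n_bins, n_bands - 1)
        v = band_voicing[band]
        mixed.append(v * periodic_spec[k] + (1.0 - v) * noise_spec[k])
    return mixed


if __name__ == "__main__":
    periodic = [1.0] * 8
    noise = [2.0] * 8
    # Lower half of the spectrum voiced, upper half unvoiced.
    print(mix_bands(periodic, noise, [1.0, 0.0]))
```

With 17 voicing values per frame instead of two, the same function gives the finer band resolution described on the Multi-Band Excitation slide.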
20
Spectral Envelope Estimation
- Find the envelope values at a fixed number of samples.
- Use a sinusoidal model for synthesis.
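A sketch of sinusoidal synthesis from a fixed number of spectral envelope samples: each harmonic of f0 is a cosine whose amplitude is read off the envelope. Linear interpolation between samples, zero phases, and uniformly spaced samples from 0 Hz to the Nyquist frequency are simplifying assumptions; the function names are hypothetical.

```python
import math


def envelope_at(env_samples, freq, nyquist):
    """Linearly interpolate envelope samples spaced evenly over 0..nyquist.

    env_samples must contain at least two values.
    """
    pos = freq / nyquist * (len(env_samples) - 1)
    i = min(int(pos), len(env_samples) - 2)
    frac = pos - i
    return (1.0 - frac) * env_samples[i] + frac * env_samples[i + 1]


def synth_frame(env_samples, f0, sample_rate=16000, n_samples=160):
    """Sum cosines at the harmonics of f0, weighted by the envelope."""
    nyquist = sample_rate / 2
    n_harm = int(nyquist / f0)  # harmonics up to the Nyquist frequency
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        frame.append(sum(
            envelope_at(env_samples, h * f0, nyquist)
            * math.cos(2.0 * math.pi * h * f0 * t)
            for h in range(1, n_harm + 1)))
    return frame


if __name__ == "__main__":
    frame = synth_frame([1.0, 0.5, 0.1], f0=200.0)
    print(len(frame), frame[0])
```

Keeping the number of envelope samples fixed gives every frame a feature vector of the same length regardless of f0, which is what HMM training needs.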
21
Modified System [block diagram]
Training part: Speech Database -> F0 Extraction, Spectral Envelope Analysis and Bands Voicing Detection -> f0, spectral envelope samples and bands voicing -> HMM Training -> Models. Text -> Text Analysis -> labels and context features -> HMM Training.
Synthesis part: Text -> Text Analysis -> labels and context features -> Parameters Generation -> spectral envelope samples + f0, and bands voicing. Spectral envelope samples + f0 -> Harmonics Synthesis -> voiced speech. Noise + STFT filter -> unvoiced speech. Voiced speech, unvoiced speech and bands voicing -> Bands Mixing -> Speech.
22
Result
MOS scores [figure not preserved]
24
Other Challenges
Speech is overly smoothed.
- Use global variance.
Modeling accuracy: the system uses the same modeling as recognition.
- Hidden semi-Markov models (duration).
- Trajectory HMMs.
- Minimum generation error training.
- More state clusters and use of acoustic context (under research).
25
More State Clusters
- Instead of computing one Gaussian per state, we store all occurrences and record the context of each occurrence.
- At synthesis, we get the best sequence using dynamic programming.
[Diagram labels: Previous, Current, Next, …]
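The dynamic-programming search can be sketched as a Viterbi-style pass over the stored occurrences. The slides do not define the costs, so `target_cost` (how well an occurrence fits position t) and `join_cost` (how well two consecutive occurrences fit together) are hypothetical placeholders supplied by the caller.

```python
def best_sequence(candidates, target_cost, join_cost):
    """Pick one occurrence per position with minimum total cost.

    candidates[t] is the list of stored occurrences for position t.
    target_cost(t, c) and join_cost(prev, c) are caller-supplied
    (hypothetical) cost functions; lower is better.
    """
    # cost[j]: best total cost of any path ending in candidates[t][j]
    cost = [target_cost(0, c) for c in candidates[0]]
    back = []  # back[t-1][j]: best predecessor index for candidates[t][j]
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, pointers = [], []
        for c in candidates[t]:
            totals = [cost[i] + join_cost(prev[i], c) for i in range(len(prev))]
            best_i = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[best_i] + target_cost(t, c))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


if __name__ == "__main__":
    # Toy example: occurrences are single numbers, and joins are cheap
    # when consecutive values are close together.
    stored = [[0.0, 5.0], [1.0, 6.0], [2.0, 9.0]]
    print(best_sequence(stored,
                        target_cost=lambda t, c: 0.0,
                        join_cost=lambda a, b: abs(a - b)))
```

In a real system the occurrences would be parameter vectors with their recorded contexts, and the costs would compare contexts and acoustic continuity; the search itself stays the same.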
26
Thank You