1 HMM Based Speech Synthesis
Presented by Ossama Abdel-Hamid Mohamed
Cairo University, Faculty of Computers and Information
December 2006

2 Agenda   Speech Synthesis   HMM Based Speech Synthesis   Proposed System   Challenges

3 Speech Synthesis
 What is speech synthesis? Generating human-like speech with computers.
 Applications: text-to-speech, conversational systems, speech-to-speech translation, concept-to-speech.
 Systems have been built since the late 1970s, e.g. MITalk (1979) and Klattalk (1980).

4 Speech Synthesis, Cont.
 Challenges: intelligibility, naturalness, pleasantness, emotions.

5 Speech Synthesis Techniques

Technique       | Approach         | Characteristics
Formant based   | Rule based       | Difficult to build; sounds machine-like
Concatenative   | Instance based   | Based on a corpus; better quality; not flexible
HMM based       | Statistical      | Based on a corpus; newest technique; more flexible

6 Agenda   Speech Synthesis   HMM Based Speech Synthesis   Proposed System   Challenges

7 HMM Based Speech Synthesis Overview
 HMMs have been used successfully in speech recognition.
 In recognition, the models are used to find the word sequence that best explains the observed speech parameters.
 In speech synthesis, the direction is reversed: the models are used to generate the speech-parameter sequence that best fits the given text (label sequence).
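As a sketch of that contrast (standard formulation, with $o$ the speech-parameter sequence, $w$ the word/label sequence, and $\lambda$ the trained models):

$$
\text{Recognition: } \hat{w} = \arg\max_{w} P(w \mid o),
\qquad
\text{Synthesis: } \hat{o} = \arg\max_{o} P(o \mid w, \lambda).
$$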

8 HMM Based Speech Synthesis Overview, Cont.
 Delta and acceleration (dynamic) features are included in the observation vectors so that the generated parameter trajectories are smooth (see the sketch below).
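The reason the dynamic features give smooth output is the standard parameter-generation criterion (sketched here): each observation stacks static and dynamic features, $o_t = [c_t^\top, \Delta c_t^\top, \Delta^2 c_t^\top]^\top$, so the whole observation sequence can be written as $o = Wc$ for a fixed window matrix $W$. Maximizing the output probability of a given state sequence $q$ with respect to the static trajectory $c$ (rather than $o$) couples neighbouring frames through $W$ and yields the closed-form solution

$$
\hat{c} = \arg\max_{c}\ \mathcal{N}(Wc;\ \mu_q, \Sigma_q)
\quad\Longrightarrow\quad
W^\top \Sigma_q^{-1} W\,\hat{c} = W^\top \Sigma_q^{-1}\mu_q .
$$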

9–14 The Overall System (block diagram)
Training part: speech database → F0 extraction and mel-cepstral analysis → HMM training on the extracted parameters together with labels and context features → trained models.
– F0 is modeled with MSD-HMMs; the spectrum is represented by 25 mel-cepstral coefficients.
– The models are context dependent, with 5 states each.
– Each frame is marked as either voiced or unvoiced.
Synthesis part: text → text analysis → labels and context features → parameter generation from the trained models → mel-cepstrum and F0 → pulse-or-noise excitation driving an MLSA filter → speech.
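As an illustration of the pulse-or-noise excitation block, here is a minimal sketch (the frame length, sampling rate, pulse scaling, and function name are assumptions, and the MLSA filtering step is only indicated in a comment):

```python
import numpy as np

def pulse_or_noise_excitation(f0, frame_len=80, fs=16000):
    """Build a frame-by-frame excitation signal.

    f0 : per-frame F0 values in Hz; 0 marks an unvoiced frame.
    Voiced frames get a periodic pulse train at F0, unvoiced frames
    get white noise.
    """
    excitation = np.zeros(len(f0) * frame_len)
    next_pulse = 0.0                      # running pulse position in samples
    for i, f in enumerate(f0):
        start = i * frame_len
        if f > 0:                         # voiced: pulses every fs/f0 samples
            period = fs / f
            while next_pulse < start + frame_len:
                if next_pulse >= start:
                    excitation[int(next_pulse)] = np.sqrt(period)
                next_pulse += period
        else:                             # unvoiced: white noise
            excitation[start:start + frame_len] = np.random.randn(frame_len)
            next_pulse = start + frame_len
    return excitation

# The excitation would then be passed through the MLSA filter built from the
# generated mel-cepstra (not shown here) to produce the speech waveform.
```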

15 Advantages
1. Voice characteristics can be easily modified.
2. It can be applied to various languages with little modification.
3. A variety of speaking styles or emotional speech can be synthesized from a small amount of speech data.
4. Techniques developed for ASR can be reused directly.
5. Its footprint is relatively small.
 An HMM-based TTS system produced the best results in the Blizzard Challenge.

16 Agenda   Speech Synthesis   HMM Based Speech Synthesis   Proposed System   Challenges

17 Problems We Tried to Solve
1. Marking each frame as either voiced or unvoiced degrades quality: most voiced speech segments also contain unvoiced components, and some phonemes have genuinely mixed excitation.
2. The speech analysis/synthesis techniques and parameters used also degrade quality.

18 Multi-Band Excitation
 In MBE (Multi-Band Excitation) analysis, the speech spectrum is divided into a number of frequency bands and a voicing degree is estimated in each band (17 bands are used here).
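A crude illustration of the idea (not the actual MBE voicing decision used in the system; the band layout, threshold, and function name are assumptions): each band is considered more voiced the more of its energy lies near the harmonics of F0.

```python
import numpy as np

def band_voicing(frame, f0, fs=16000, n_bands=17):
    """Rough per-band voicing estimate in the spirit of multi-band excitation.

    Returns n_bands values in [0, 1]: close to 1 where the band's energy sits
    near harmonics of f0 (voiced-looking), close to 0 where it is noise-like.
    Assumes f0 > 0 (a voiced or partly voiced frame).
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    harmonics = np.arange(f0, fs / 2, f0)
    # distance (Hz) from each bin to its nearest harmonic
    dist = np.min(np.abs(freqs[:, None] - harmonics[None, :]), axis=1)
    near_harmonic = dist < 0.2 * f0
    edges = np.linspace(0, fs / 2, n_bands + 1)
    voicing = np.zeros(n_bands)
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        total = np.sum(spec[in_band] ** 2) + 1e-12
        harm = np.sum(spec[in_band & near_harmonic] ** 2)
        voicing[b] = harm / total
    return voicing
```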

19 Mixed Excitation
 In synthesis, periodic and noise excitations are mixed according to the per-band voicing parameters (sketched below).
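A minimal sketch of per-band mixing in the frequency domain (the function name, frame handling, and equal-width band layout are assumptions):

```python
import numpy as np

def mixed_excitation(pulse_exc, noise_exc, voicing, fs=16000):
    """Mix periodic and noise excitation per frequency band for one frame.

    pulse_exc, noise_exc : equal-length pulse-train and noise excitation frames.
    voicing              : per-band voicing weights in [0, 1] (e.g. 17 values).
    Voiced bands take their spectrum from the pulse train, unvoiced bands
    from the noise, with a soft blend in between.
    """
    n_bands = len(voicing)
    pulse_spec = np.fft.rfft(pulse_exc)
    noise_spec = np.fft.rfft(noise_exc)
    freqs = np.fft.rfftfreq(len(pulse_exc), 1.0 / fs)
    # map each frequency bin to its band index
    band = np.minimum((freqs / (fs / 2) * n_bands).astype(int), n_bands - 1)
    w = np.asarray(voicing)[band]          # per-bin voicing weight
    mixed_spec = w * pulse_spec + (1.0 - w) * noise_spec
    return np.fft.irfft(mixed_spec, n=len(pulse_exc))
```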

20 Spectral Envelope Estimation
 The spectral envelope is represented by its values at a fixed number of frequency samples.
 A sinusoidal model is used for synthesis (see the sketch below).
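A minimal sketch of how a sinusoidal (harmonic) model can reconstruct a voiced frame from sampled envelope values (function name, zero phases, and argument layout are assumptions; a real system would track phases across frames):

```python
import numpy as np

def sinusoidal_frame(envelope, env_freqs, f0, frame_len=80, fs=16000):
    """Synthesize one voiced frame as a sum of harmonics.

    envelope, env_freqs : spectral-envelope amplitude samples and the
                          frequencies (Hz) at which they were taken.
    f0                  : fundamental frequency of the frame in Hz.
    """
    t = np.arange(frame_len) / fs
    frame = np.zeros(frame_len)
    k = 1
    while k * f0 < fs / 2:
        amp = np.interp(k * f0, env_freqs, envelope)   # amplitude of harmonic k
        frame += amp * np.cos(2 * np.pi * k * f0 * t)
        k += 1
    return frame
```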

21 Modified System (block diagram)
Training part: speech database → F0 extraction, spectral envelope analysis, and band voicing detection → HMM training on the extracted parameters together with labels and context features → trained models.
Synthesis part: text → text analysis → labels and context features → parameter generation (spectral envelope samples, F0, band voicings) → the voiced part is produced by harmonic synthesis and the unvoiced part by noise shaped with an STFT filter → the two are mixed per band according to the band voicings → speech.

22 Results
 MOS (mean opinion score) listening-test scores.

23 Agenda   Speech Synthesis   HMM Based Speech Synthesis   Proposed System   Challenges

24 Other Challenges
 Generated speech is overly smoothed – use a global variance term (sketched below).
 Modeling accuracy: the system uses the same modeling as recognition. Possible remedies:
–Hidden semi-Markov models (explicit duration modeling).
–Trajectory HMMs.
–Minimum generation error training.
–More state clusters using acoustic context (under research).
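For the over-smoothing point, the global-variance criterion of Toda and Tokuda adds a penalty on the utterance-level variance of the generated trajectory; a sketch of the objective, with $\omega$ a weighting constant and $v(c)$ the vector of per-dimension variances of the static trajectory $c$ over the utterance:

$$
\hat{c} = \arg\max_{c}\Big[\log \mathcal{N}(Wc;\ \mu_q, \Sigma_q)
\;+\; \omega \log \mathcal{N}\big(v(c);\ \mu_v, \Sigma_v\big)\Big].
$$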

25 More State Clusters
 Instead of computing one Gaussian per state, all training occurrences are stored, together with the context of each occurrence.
 At synthesis time, the best occurrence sequence is selected by dynamic programming over the previous, current, and next candidates (sketched below).
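The slide only outlines the idea; a minimal Viterbi-style search of this kind could look as follows (candidates, target_cost, and join_cost are illustrative names, not from the slides):

```python
def best_occurrence_sequence(candidates, target_cost, join_cost):
    """Dynamic-programming selection over stored state occurrences.

    candidates  : list over steps; candidates[t] is the list of stored
                  occurrences (e.g. mean vectors plus their recorded context).
    target_cost : target_cost(t, occ) -> mismatch between occ's recorded
                  context and the target context at step t.
    join_cost   : join_cost(prev_occ, occ) -> how smoothly two occurrences join.
    Returns the index sequence with minimum total cost.
    """
    T = len(candidates)
    # best[t][j] = (cost of best path ending in candidate j at step t, backpointer)
    best = [[(target_cost(0, occ), -1) for occ in candidates[0]]]
    for t in range(1, T):
        row = []
        for occ in candidates[t]:
            cost, back = min(
                (best[t - 1][i][0] + join_cost(prev, occ), i)
                for i, prev in enumerate(candidates[t - 1])
            )
            row.append((cost + target_cost(t, occ), back))
        best.append(row)
    # backtrack from the cheapest final candidate
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for t in range(T - 1, 0, -1):
        j = best[t][j][1]
        path.append(j)
    return path[::-1]
```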

26 Thank You

