December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.

Slide 2: Agenda
- Speech Synthesis
- HMM Based Speech Synthesis
- Proposed System
- Challenges

Slide 3: Speech Synthesis
- What is speech synthesis? Generating human-like speech using computers.
- Applications: text-to-speech, conversational systems, speech-to-speech translation, concept-to-speech.
- Systems have been built since the late 1970s: MITalk (1979), Klattalk (1980).

Slide 4: Speech Synthesis, Cont.
- Challenges: intelligibility, naturalness, pleasantness, emotions.

Slide 5: Speech Synthesis Techniques
- Formant-based: rule-based; the rules are difficult to write; output sounds machine-like.
- Concatenative: instance-based; built on a speech corpus; better quality but not flexible.
- HMM-based: statistical; built on a speech corpus; the newest technique; more flexible.


Slide 7: HMM Based Speech Synthesis Overview
- HMMs have been used successfully in speech recognition.
- In recognition, HMMs score an observed utterance; in synthesis the direction is reversed: the trained HMMs generate the speech parameters. (The original slide illustrates both directions with diagrams.)

Slide 8: HMM Based Speech Synthesis Overview, Cont.
- Delta and acceleration (dynamic) features are included in the models so that the generated parameter trajectories are smooth.
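As a rough illustration, dynamic features can be appended with simple first and second differences (window definitions vary between systems; the function name and the frame counts below are just for illustration):

```python
import numpy as np

def add_dynamic_features(static: np.ndarray) -> np.ndarray:
    """Append delta and acceleration (delta-delta) features to a
    (T, D) matrix of static parameters, e.g. mel-cepstral frames."""
    padded = np.pad(static, ((1, 1), (0, 0)), mode="edge")
    delta = (padded[2:] - padded[:-2]) / 2.0                # central first difference
    accel = padded[2:] - 2.0 * padded[1:-1] + padded[:-2]   # second difference
    return np.hstack([static, delta, accel])

frames = np.random.randn(100, 25)   # 100 frames of 25 mel-cepstral coefficients
obs = add_dynamic_features(frames)
print(obs.shape)  # (100, 75)
```

Training on these 3D-sized observation vectors is what later lets the parameter generation algorithm produce trajectories whose deltas are consistent with the model, rather than a stepwise sequence of state means.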

Slides 9-14: The Overall System
Training part:
- Each utterance in the speech database is analysed frame by frame: F0 extraction (F0 is modeled with MSD-HMMs, since it is undefined in unvoiced frames) and mel-cepstral analysis (25 mel-cepstral coefficients).
- HMMs are trained on these parameters together with labels and context features; the models are context dependent, each with 5 states.
Synthesis part:
- Input text passes through text analysis to produce labels and context features.
- Parameter generation from the trained models yields f0 and mel-cepstrum trajectories.
- The excitation is a pulse train or noise (each frame is either voiced or unvoiced); it is passed through the MLSA filter, controlled by the mel-cepstrum, to produce speech.
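The pulse-or-noise excitation step above can be sketched as follows. The function name, frame shift, and sample rate are illustrative assumptions; the real system would pass this signal through the MLSA filter:

```python
import numpy as np

def make_excitation(f0: np.ndarray, frame_shift: int, sr: int) -> np.ndarray:
    """Classic pulse/noise excitation: a pulse train at f0 for voiced
    frames, white noise for unvoiced frames (marked by f0 == 0).
    Illustrative sketch; amplitudes are normalized per pulse period."""
    excitation = np.zeros(len(f0) * frame_shift)
    phase = 0.0
    for t, f in enumerate(f0):
        start = t * frame_shift
        if f > 0:                                # voiced: pulses every sr/f samples
            period = sr / f
            while phase < frame_shift:
                excitation[start + int(phase)] = np.sqrt(period)
                phase += period
            phase -= frame_shift                 # carry pulse phase into next frame
        else:                                    # unvoiced: white noise
            excitation[start:start + frame_shift] = np.random.randn(frame_shift)
            phase = 0.0
    return excitation

f0_track = np.array([0.0, 0.0, 120.0, 120.0, 125.0, 0.0])  # Hz per frame
e = make_excitation(f0_track, frame_shift=80, sr=16000)
print(e.shape)  # (480,)
```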

Slide 15: Advantages
1. Voice characteristics can be easily modified.
2. It can be applied to various languages with little modification.
3. A variety of speaking styles or emotional speech can be synthesized from a small amount of speech data.
4. Techniques developed in ASR can be easily applied.
5. Its footprint is relatively small.
- An HMM-based TTS system produced the best results in the Blizzard Challenge.


Slide 17: Problems We Tried to Solve
1. Marking each frame as either voiced or unvoiced degrades quality: most voiced speech contains some unvoiced components, and some phonemes have mixed excitation.
2. The speech analysis/synthesis techniques and parameterization used in the baseline degrade quality.

Slide 18: Multi-Band Excitation
- In MBE (Multi-Band Excitation), the speech spectrum is divided into a number of frequency bands and voicing is estimated separately in each band (we used 17 bands).
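A rough sketch of per-band voicing estimation in the spirit of MBE: within each band, compare the energy near the harmonics of f0 against the total band energy. This is not the exact MBE least-squares criterion, and the function and parameter names are assumptions:

```python
import numpy as np

def band_voicing(frame: np.ndarray, f0: float, sr: int, n_bands: int = 17) -> np.ndarray:
    """Per-band voicing strength in [0, 1]: fraction of each band's
    energy that lies within a few bins of a harmonic of f0."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    n_bins = len(spec)
    bin_hz = (sr / 2) / (n_bins - 1)
    harmonic_bins = set()
    h = f0
    while h < sr / 2:                       # mark bins near every harmonic
        b = int(round(h / bin_hz))
        harmonic_bins.update(range(max(b - 2, 0), min(b + 3, n_bins)))
        h += f0
    voicing = np.zeros(n_bands)
    edges = np.linspace(0, n_bins, n_bands + 1, dtype=int)
    for k in range(n_bands):
        band = range(edges[k], edges[k + 1])
        total = sum(spec[i] for i in band) + 1e-12
        harm = sum(spec[i] for i in band if i in harmonic_bins)
        voicing[k] = harm / total
    return voicing
```

For a clean 200 Hz sinusoid, the lowest band comes out strongly voiced while no energy constrains the higher bands.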

Slide 19: Mixed Excitation
- At synthesis time, periodic and noise excitation are mixed according to the per-band voicing parameters.
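The band-wise mixing can be sketched in the frequency domain (real systems typically use time-domain filterbanks; the names here are illustrative):

```python
import numpy as np

def mixed_excitation(pulse: np.ndarray, noise: np.ndarray,
                     voicing: np.ndarray) -> np.ndarray:
    """Mix periodic and noise excitation band by band: each frequency
    band takes the pulse spectrum weighted by its voicing strength and
    the noise spectrum weighted by the remainder."""
    P, N = np.fft.rfft(pulse), np.fft.rfft(noise)
    n_bins, n_bands = len(P), len(voicing)
    edges = np.linspace(0, n_bins, n_bands + 1, dtype=int)
    w = np.zeros(n_bins)
    for k in range(n_bands):                 # piecewise-constant per-bin weights
        w[edges[k]:edges[k + 1]] = voicing[k]
    mixed = w * P + (1.0 - w) * N
    return np.fft.irfft(mixed, n=len(pulse))
```

With all bands fully voiced this reduces to the pure pulse train, and with all bands unvoiced to pure noise, so the binary scheme of the baseline is a special case.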

Slide 20: Spectral Envelope Estimation
- The spectral envelope is represented by its values at a fixed number of sample points.
- A sinusoidal model is used for synthesis.
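A minimal sketch of sinusoidal synthesis from envelope samples, assuming the amplitude of each harmonic of f0 is read off the envelope by linear interpolation and phases start at zero (both simplifying assumptions):

```python
import numpy as np

def sinusoidal_synthesis(env_freqs: np.ndarray, env_amps: np.ndarray,
                         f0: float, sr: int, n_samples: int) -> np.ndarray:
    """Sum one sinusoid per harmonic of f0, with amplitude taken from
    the spectral envelope samples (env_freqs in Hz, env_amps linear)."""
    t = np.arange(n_samples) / sr
    speech = np.zeros(n_samples)
    h = f0
    while h < sr / 2:
        amp = np.interp(h, env_freqs, env_amps)  # envelope at this harmonic
        speech += amp * np.sin(2 * np.pi * h * t)
        h += f0
    return speech
```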

Slide 21: Modified System
Training part:
- The speech database is analysed with F0 extraction, spectral envelope analysis, and per-band voicing detection.
- HMMs are trained on these parameters with labels and context features.
Synthesis part:
- Text analysis produces labels and context features; parameter generation yields spectral envelope samples, f0, and band voicing values.
- Voiced speech is produced by harmonic synthesis from the envelope samples and f0; unvoiced speech by passing noise through an STFT filter; the two are mixed per band according to the band voicing and summed into the output speech.

Slide 22: Results
- MOS scores (the comparison chart appears on the original slide).


Slide 24: Other Challenges
- Generated speech is overly smoothed: use global variance.
- Modeling accuracy: the system uses the same modeling as recognition. Remedies include:
  - Hidden semi-Markov models (explicit duration modeling).
  - Trajectory HMMs.
  - Minimum generation error training.
  - More state clusters with acoustic context (under research).
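The global-variance idea can be approximated with a closed-form rescaling. The published GV method maximises a likelihood that includes a GV term; the stand-in below only forces the generated trajectory's variance to match the global variance observed in natural speech:

```python
import numpy as np

def match_global_variance(traj: np.ndarray, target_gv: np.ndarray) -> np.ndarray:
    """Crude anti-over-smoothing post-filter: rescale each parameter
    dimension of the generated (T, D) trajectory about its mean so that
    its per-utterance variance equals target_gv."""
    mean = traj.mean(axis=0)
    gv = traj.var(axis=0) + 1e-12          # avoid division by zero
    scale = np.sqrt(target_gv / gv)
    return mean + (traj - mean) * scale
```

The rescaling restores the dynamic range that maximum-likelihood generation tends to shrink, which is what makes the output sound muffled.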

Slide 25: More State Clusters
- Instead of estimating one Gaussian per state, we store all training occurrences, recording the context (previous, current, next) of each occurrence.
- At synthesis time, the best sequence of occurrences is found using dynamic programming.
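The occurrence-selection step can be sketched as a standard dynamic program over per-state candidate lists, much like unit selection. The target cost (context mismatch) and join cost (continuity between consecutive occurrences) below are assumed, illustrative stand-ins:

```python
import numpy as np

def best_candidate_sequence(candidates, target_cost, join_cost):
    """candidates[t] is the list of stored occurrences for state t.
    Returns the index into each candidate list of the sequence that
    minimises total target cost plus join cost (Viterbi-style DP)."""
    n = len(candidates)
    cost = [[target_cost(0, c) for c in candidates[0]]]
    back = []
    for t in range(1, n):
        row, brow = [], []
        for c in candidates[t]:
            # best predecessor for this candidate
            best = min(range(len(candidates[t - 1])),
                       key=lambda j: cost[t - 1][j] + join_cost(candidates[t - 1][j], c))
            row.append(cost[t - 1][best]
                       + join_cost(candidates[t - 1][best], c)
                       + target_cost(t, c))
            brow.append(best)
        cost.append(row)
        back.append(brow)
    j = int(np.argmin(cost[-1]))           # backtrack from the cheapest end state
    path = [j]
    for brow in reversed(back):
        j = brow[j]
        path.append(j)
    path.reverse()
    return path
```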

Slide 26: Thank You