HMM-Based Speech Synthesis Erica Cooper CS4706 Spring 2011

Concatenative Synthesis

HMM Synthesis
- A parametric model
- Can train on mixed data from many speakers
- Model takes up a very small amount of space
- Speaker adaptation

HMMs
Some hidden process has generated some visible observation.

HMMs
Hidden states have transition probabilities and emission probabilities.
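To make the hidden-process picture concrete, here is a minimal sketch (not from the original slides) of a toy discrete HMM: made-up transition and emission probabilities, and a loop that samples a visible observation from each hidden state before transitioning. All numbers are purely illustrative.

```python
import numpy as np

# Toy HMM: 3 hidden states, each emitting one of 2 discrete symbols.
# All probabilities are made-up illustration values.
transitions = np.array([[0.7, 0.3, 0.0],   # P(next state | state 0)
                        [0.0, 0.6, 0.4],   # P(next state | state 1)
                        [0.0, 0.0, 1.0]])  # state 2 loops on itself
emissions = np.array([[0.9, 0.1],          # P(symbol | state 0)
                      [0.2, 0.8],          # P(symbol | state 1)
                      [0.5, 0.5]])         # P(symbol | state 2)

rng = np.random.default_rng(0)
state, observations = 0, []
for _ in range(10):
    # The hidden state emits a visible observation...
    observations.append(int(rng.choice(2, p=emissions[state])))
    # ...then moves on according to its transition probabilities.
    state = int(rng.choice(3, p=transitions[state]))
print(observations)  # only the observations are visible; the states stay hidden
```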

HMM Synthesis
- Every phoneme+context is represented by an HMM.
- Example training sentences: "The cat is on the mat." / "The cat is near the door."
- Acoustic features extracted: f0, spectrum, duration
- Train the HMMs on these examples.
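As a rough illustration of "phoneme plus context" (my own toy example, not the label format used in the actual system), the sketch below attaches the neighbouring phones to each phone; full HTS-style labels also encode syllable, word, and phrase context.

```python
def context_labels(phones):
    """Attach left/right phone context to each phone (toy version)."""
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "sil"
        right = phones[i + 1] if i < len(phones) - 1 else "sil"
        labels.append(f"{left}-{p}+{right}")
    return labels

# "the cat" as a rough phone sequence
print(context_labels(["dh", "ah", "k", "ae", "t"]))
# ['sil-dh+ah', 'dh-ah+k', 'ah-k+ae', 'k-ae+t', 'ae-t+sil']
```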

HMM Synthesis
Each state outputs acoustic features (a spectrum, an f0, and a duration).
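A deliberately simplified picture of "each state outputs acoustic features": hold each state's mean parameter value for its predicted duration to get a frame-level track. Real parameter generation (e.g. Tokuda et al.'s algorithm) uses dynamic features to smooth the trajectory rather than producing a stairstep; the numbers below are invented.

```python
import numpy as np

# Hypothetical per-state output means (say, log-f0) and durations in frames.
state_means = [5.0, 5.2, 4.9]     # one mean per HMM state (illustrative)
state_durations = [7, 12, 9]      # frames spent in each state (illustrative)

# Naive generation: repeat each state's mean for its duration.
trajectory = np.concatenate(
    [np.full(dur, mean) for mean, dur in zip(state_means, state_durations)]
)
print(trajectory.shape)  # (28,) -- one generated parameter per frame
```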

HMM Synthesis
- Many contextual features = data sparsity
- Cluster similar-sounding phones, e.g. 'bog' and 'dog': the /aa/ in both has similar acoustic features, even though its context differs a bit.
- Make one HMM that produces both, trained on examples of both.
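Real systems cluster context-dependent models with phonetic decision trees; the fragment below is only a stand-in for that idea, pooling training examples whose contexts answer one phonetic question the same way so that, e.g., the /aa/ of 'bog' and 'dog' shares a single model. The question, features, and data are all invented for illustration.

```python
from collections import defaultdict

# Toy training examples: (phone, left context, acoustic feature vector).
examples = [
    ("aa", "b", [1.0, 2.1]),   # /aa/ in "bog"  (invented features)
    ("aa", "d", [1.1, 2.0]),   # /aa/ in "dog"
    ("aa", "m", [0.4, 3.5]),   # /aa/ after a nasal
]

def left_is_voiced_stop(left):
    return left in {"b", "d", "g"}

# Contexts that answer the question the same way share a cluster,
# so 'bog' and 'dog' end up training the same HMM.
clusters = defaultdict(list)
for phone, left, feats in examples:
    clusters[(phone, left_is_voiced_stop(left))].append(feats)

print({key: len(v) for key, v in clusters.items()})
# {('aa', True): 2, ('aa', False): 1}
```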

Experiments: Google, Summer 2010
- Can we train on lots of mixed data? (~1 utterance per speaker)
- More data vs. better data
- 15k utterances from Google Voice Search as training data
- Example queries: "ace hardware", "rural supply"

More Data vs. Better Data
Voice Search utterances filtered by speech recognition confidence score:
- 50% confidence: 6849 utterances
- 75% confidence: 4887 utterances
- 90% confidence: 3100 utterances
- 95% confidence: 2010 utterances
- 99% confidence: 200 utterances
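The filtering step can be pictured as keeping only utterances whose recognition confidence clears a threshold. This is a hypothetical sketch with invented utterance IDs and scores, not the actual Voice Search pipeline.

```python
# Invented (utterance_id, ASR confidence) pairs standing in for the corpus.
utterances = [("utt1", 0.97), ("utt2", 0.62), ("utt3", 0.91), ("utt4", 0.40)]

def filter_by_confidence(utts, threshold):
    """Keep only utterances the recognizer was confident about."""
    return [utt_id for utt_id, conf in utts if conf >= threshold]

for threshold in (0.50, 0.75, 0.90, 0.95, 0.99):
    kept = filter_by_confidence(utterances, threshold)
    print(f"confidence >= {threshold:.2f}: {len(kept)} utterances kept")
```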

Future Work
- Speaker adaptation
- Phonetically-balanced training data
- Listening experiments
- Parallelization
- Other sources of data
- Voices for more languages
