
Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions
Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center), Koji Iwano (currently with Tokyo Institute of Technology, Japan), Keikichi Hirose (Dept. of Frontier Engineering, The University of Tokyo, Japan)

Introduction to Corpus-Based Intonation Modeling
- Traditional approach: rules derived from linguistic expertise. Human-dependent; too complicated and not fully satisfactory, because the phenomena involved are not completely understood.
- Corpus-based approach: models derived from statistical analysis of speech corpora. Automatic, with the potential to improve as better speech corpora become available.

Background
- HMMs are widely used in speech recognition, and fast training algorithms exist.
- Macroscopic discrete HMMs associated with accentual phrases can store information such as accent type and prosodic structure.
- Morae are essential for describing Japanese intonation: sequences of high and low morae characterize accent types.

Overview of the Method
Definition of HMM and alphabet:
- Accent types modeled by discrete HMMs
- A 2-code mora F0 contour alphabet used as output symbols
- State transitions synchronized with mora transitions
Classification of HMMs and training:
- HMMs classified according to linguistic attributes
- Training by the usual forward-backward (Baum-Welch) algorithm (a minimal sketch follows this list)
Generation of F0 contours:
- Best sequence of symbols generated by a modified Viterbi algorithm
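The slides do not show the training step itself, so as a concrete reference, here is a minimal single-sequence Baum-Welch re-estimation step for a discrete HMM. This is a generic textbook sketch, not the authors' implementation; the fixed initial state and the absence of probability scaling are simplifying assumptions, so it is only suitable for short sequences.

```python
import numpy as np

def baum_welch_step(a, b, obs):
    """One Baum-Welch re-estimation step for a discrete HMM (sketch).

    a:   (S, S) transition matrix, a[i, j] = P(j | i)
    b:   (S, K) output matrix, b[j, k] = P(symbol k | state j)
    obs: observed symbol indices (one training sequence)
    """
    S, K = b.shape
    T = len(obs)
    obs = np.asarray(obs)
    # Forward pass (no scaling, so keep T small in this sketch).
    alpha = np.zeros((T, S))
    alpha[0, 0] = b[0, obs[0]]          # assumption: the model starts in state 0
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
    # Backward pass.
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    # State posteriors gamma(t, i) and transition posteriors xi(t, i, j).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = alpha[:-1, :, None] * a[None] * (b[:, obs[1:]].T * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    # Re-estimated parameters.
    a_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    b_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(K)], axis=1)
    b_new /= gamma.sum(axis=0)[:, None]
    return a_new, b_new
```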

The Mora-F0 Alphabet
- Two codes: stylized mora F0 contours and mora-to-mora ΔF0, 34 symbols each
- Obtained by LBG clustering from a 500-sentence database (ATR continuous speech database, speaker MHT)
- The entire database is labeled using the 2-code symbols.
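As an illustration of how such a symbol alphabet can be built, here is a minimal LBG (Linde-Buzo-Gray) codebook-training sketch. It assumes each stylized mora F0 contour has already been resampled to a fixed-length vector; the function names, the split factor, and the trimming step are illustrative choices, not taken from the paper.

```python
import numpy as np

def lbg_codebook(vectors, target_size=34, eps=0.01, n_iter=20):
    """Grow a codebook by repeated splitting plus k-means refinement (LBG).

    vectors:     (N, D) array of fixed-length stylized mora F0 contours
    target_size: desired number of codewords (34 symbols per code here)
    """
    codebook = vectors.mean(axis=0, keepdims=True)   # start from the global centroid
    while len(codebook) < target_size:
        # Split every codeword into a slightly perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(n_iter):                      # Lloyd (k-means) iterations
            d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
            assign = d.argmin(axis=1)                # nearest codeword per vector
            for k in range(len(codebook)):
                members = vectors[assign == k]
                if len(members):                     # leave empty cells unchanged
                    codebook[k] = members.mean(axis=0)
    return codebook[:target_size]                    # trim (34 is not a power of 2)

def quantize(vectors, codebook):
    """Label each contour with the index of its nearest codeword."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)
```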

The Accentual Phrase HMM
[Figure: an accentual-phrase HMM; state transitions are synchronized with mora transitions within the accentual phrase]
Accentual phrases are classified according to:
- Accent type
- Position of the accentual phrase in the sentence
- (Optional: number of morae, part-of-speech, syntactic structure)

Example: 'Karewa Tookyookara kuru.' (He comes from Tokyo)
[Figure: the sentence is divided into three accentual phrases M1, M2, M3; each carries an accent type, a position label, and a label sequence of (shape, ΔF0) symbol pairs, one pair per mora: shape1 ΔF01, shape2 ΔF02, ...]

HMM Topologies
[Figure: (a) topology for accent types 0 and 1; (b) topology for other accent types]

Training Database
- ATR Continuous Speech Database (500 sentences, speaker MHT)
- Segmented into morae and accentual phrases
- Mora labels using the mora-F0 alphabet: shape (stylized F0 contour) and mora ΔF0
- Accentual-phrase labels: number of morae, position in the sentence

Output Code Generation
How can the HMM be used for synthesis?
[Figure: (A) Recognition: one output sequence is given; the search returns its likelihood and the best path. (B) Synthesis: the search returns the best output sequence together with the best path.]

Intonation Modeling Using HMM
Viterbi search for the recognition problem: for $t = 2, 3, \ldots, T$ and $i_t = 1, 2, \ldots, S$,

$$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \left[-\log b(y(t) \mid i_t)\right] \right\}$$

$$\psi(t, i_t) = \operatorname*{arg\,min}_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \left[-\log b(y(t) \mid i_t)\right] \right\}$$
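The recursion above translates directly into code. Below is a minimal NumPy sketch of this recognition-mode search; the uniform treatment of the initial state is an assumption, since the slide only shows the recursion for t ≥ 2.

```python
import numpy as np

def viterbi(log_a, log_b, obs):
    """Viterbi search for the recognition problem (minimal sketch).

    log_a: (S, S) with log_a[i, j] = log a(j | i)
    log_b: (S, K) with log_b[j, k] = log b(k | j)
    obs:   observed symbol indices y(1), ..., y(T)
    Returns the minimum accumulated cost and the best state path.
    """
    S = log_a.shape[0]
    T = len(obs)
    D = np.full((T, S), np.inf)          # D_min(t, i_t)
    psi = np.zeros((T, S), dtype=int)    # back-pointers
    D[0] = -log_b[:, obs[0]]             # assumption: any initial state allowed
    for t in range(1, T):
        for j in range(S):
            cand = D[t - 1] - log_a[:, j]            # + [-log a(i_t | i_{t-1})]
            psi[t, j] = cand.argmin()
            D[t, j] = cand.min() - log_b[j, obs[t]]  # + [-log b(y(t) | i_t)]
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):        # trace the back-pointers
        path.append(int(psi[t, path[-1]]))
    return float(D[-1].min()), path[::-1]
```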

Intonation Modeling Using HMM
Modified Viterbi search for the synthesis problem: for $t = 2, 3, \ldots, T$ and $i_t = 1, 2, \ldots, S$,

$$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \left[-\log b(y_{\max}(t) \mid i_t)\right] \right\}$$

$$\psi(t, i_t) = \operatorname*{arg\,min}_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \left[-\log b(y_{\max}(t) \mid i_t)\right] \right\}$$

where $y_{\max}(t)$ is the output symbol that maximizes $b(\cdot \mid i_t)$ in the current state.
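The only change from the recognition search is that the observation y(t) is replaced by y_max(t), and the search returns that symbol sequence along the best path. A sketch, reusing the conventions of the previous block:

```python
import numpy as np

def viterbi_synthesis(log_a, log_b, T):
    """Modified Viterbi search for the synthesis problem (sketch).

    No observations are given: each state contributes its most probable
    symbol y_max, and the best symbol sequence along the best path is
    returned. One time step corresponds to one mora.
    """
    S = log_a.shape[0]
    y_max = log_b.argmax(axis=1)         # most probable symbol per state
    cost = -log_b.max(axis=1)            # -log b(y_max(t) | i_t)
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    D[0] = cost
    for t in range(1, T):
        for j in range(S):
            cand = D[t - 1] - log_a[:, j]
            psi[t, j] = cand.argmin()
            D[t, j] = cand.min() + cost[j]
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path = path[::-1]
    return [int(y_max[j]) for j in path]  # best output symbol sequence
```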

Use of Bigram Probabilities
For $t = 2, 3, \ldots, T$ and $i_t = 1, 2, \ldots, S$, with $k = 1, \ldots, K$ indexing the output symbols:

$$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \min_k \left[-\log b(y_k(t) \mid y(t-1), i_t)\right] \right\}$$

$$\psi(t, i_t) = \operatorname*{arg\,min}_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \min_k \left[-\log b(y_k(t) \mid y(t-1), i_t)\right] \right\}$$
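Reading the recursion as "choose, at each mora, the symbol most probable given the symbol already chosen at the best predecessor state", a greedy sketch follows. The slide does not show how the first symbol is picked, so the unigram table log_b1 for t = 1 is an assumption, as are all names; this per-step greedy choice is also only one way to realize the recursion.

```python
import numpy as np

def viterbi_synthesis_bigram(log_a, log_b1, log_b2, T):
    """Bigram-conditioned synthesis search (illustrative sketch).

    log_a:  (S, S) with log_a[i, j] = log a(j | i)
    log_b1: (S, K) first-mora output probs (an assumption, see lead-in)
    log_b2: (S, K, K) with log_b2[j, p, k] = log b(k | previous symbol p, state j)
    """
    S, K = log_b1.shape
    D = -log_b1.max(axis=1)            # D_min(1, j) with the best first symbol
    Y = np.zeros((T, S), dtype=int)    # symbol chosen at (t, state)
    psi = np.zeros((T, S), dtype=int)  # best predecessor state
    Y[0] = log_b1.argmax(axis=1)
    for t in range(1, T):
        newD = np.empty(S)
        for j in range(S):
            cand = D - log_a[:, j]
            i = int(cand.argmin())             # best predecessor state
            p = Y[t - 1, i]                    # symbol it emitted
            k = int(log_b2[j, p].argmax())     # most probable next symbol
            psi[t, j], Y[t, j] = i, k
            newD[j] = cand[i] - log_b2[j, p, k]
        D = newD
    # Trace back the chosen symbol sequence along the best path.
    j = int(D.argmin())
    seq = [int(Y[T - 1, j])]
    for t in range(T - 1, 0, -1):
        j = int(psi[t, j])
        seq.append(int(Y[t - 1, j]))
    return seq[::-1]
```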

Accent Type Modeling Using HMM

Phrase Boundary Level Modeling Using HMM
[Figure: for an example utterance, J-ToBI break indices, pause presence (Y N N Y N N), and the corresponding boundary levels]

The Effect of Bigrams
[Figure: generated F0 contours for accentual phrases PH1_0, PH1_1, and PH1_2, each comparing the original model with the bigram model]

Comments
- We presented a novel approach to intonation modeling for TTS synthesis based on discrete mora-synchronous HMMs.
- From now on, more features should be included in the HMM modeling (phonetic context, part-of-speech, etc.), and the approach should be compared with rule-based methods.
- Training-data scarcity is a major problem to overcome (by feature clustering, an F0 contour generation model, etc.).

Hidden Markov Models (HMM)
A Hidden Markov Model (HMM) is a finite state automaton in which both state transitions and outputs are stochastic. At each time period it moves to a new state and generates a new output symbol according to the output distribution of that state. Symbols: 1, 2, ..., K.
[Figure: a four-state left-to-right HMM with self-loops a11, a22, a33, a44, forward transitions a12, a23, a34, and output distributions b(1|j) ... b(K|j) at each state j]
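To make the definition concrete, a toy generator for such a discrete HMM might look as follows; the stopping rule (a fixed number of steps) and the starting state are illustrative choices, not part of the definition.

```python
import numpy as np

def sample_hmm(a, b, n_steps=10, start_state=0, seed=0):
    """Generate a state and symbol sequence from a discrete HMM (toy sketch).

    a: (S, S) transition matrix, a[i, j] = P(next state j | state i)
    b: (S, K) output matrix,     b[j, k] = P(symbol k | state j)
    """
    rng = np.random.default_rng(seed)
    states, symbols = [start_state], []
    for _ in range(n_steps):
        j = states[-1]
        symbols.append(int(rng.choice(b.shape[1], p=b[j])))  # emit via b(. | j)
        states.append(int(rng.choice(a.shape[0], p=a[j])))   # move via a(. | i)
    return states[:-1], symbols
```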

Step 1: Database Construction
- Used the ATR continuous speech database (500 sentences, speaker MHT)
- Segmented it into mora units
- Assigned mora labels
- Extracted F0 patterns
- Clustered them by the LBG algorithm
- Assigned the resulting cluster classes to the entire database

Introduction of Bigrams
For $t = 2, 3, \ldots, T$ and $i_t = 1, 2, \ldots, S$, with $k = 1, \ldots, K$ indexing the output symbols:

$$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \min_k \left[-\log b(y_k(t) \mid y(t-1), i_t)\right] \right\}$$

$$\psi(t, i_t) = \operatorname*{arg\,min}_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + \left[-\log a(i_t \mid i_{t-1})\right] + \min_k \left[-\log b(y_k(t) \mid y(t-1), i_t)\right] \right\}$$

Discussion and Future Prospects
- The training data is scarce
- Further work is needed before the model can be embedded in a TTS system
- Take other linguistic information into account (phonemes, number of morae, part-of-speech, etc.)
- Devise ways to overcome the data shortage (clustering, etc.)
- Study how to concatenate the models
