A Bayesian Approach to HMM-Based Speech Synthesis

Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Takashi Masuko, and Keiichi Tokuda
Nagoya Institute of Technology / Tokyo Institute of Technology
Background
- HMM-based speech synthesis system
  - Spectrum, excitation, and duration are modeled by HMMs
  - Speech parameter sequences are generated from the trained HMMs
- Maximum likelihood (ML) criterion
  - Used both to train the HMMs and to generate the speech parameters
  - Point estimate ⇒ suffers from the over-fitting problem
- Bayesian approach
  - Estimates the posterior distribution of the model parameters
  - Prior information can be used ⇒ alleviates the over-fitting problem
Outline
- Bayesian speech synthesis
  - Variational Bayesian method
  - Speech parameter generation
- Bayesian context clustering
  - Prior distribution using cross validation
- Experiments
- Conclusion & future work
Bayesian speech synthesis (1/2)
Model training and speech synthesis, with λ : model parameters, O : training data seq., o : synthesis data seq., l : label seq. for training, l̃ : label seq. for synthesis:
- ML: train a point estimate, then generate from it
    λ̂ = argmax_λ p(O | λ, l)
    ô = argmax_o p(o | λ̂, l̃)
- Bayes: integrate over the model parameters
    ô = argmax_o p(o | O, l, l̃)
Bayesian speech synthesis (2/2)
Predictive distribution (marginal likelihood), with q and q̃ the HMM state sequences for the training and synthesis data, p(λ) the prior distribution for the model parameters, and p(O, q | λ) and p(o, q̃ | λ) the likelihoods of the training and synthesis data:

    p(o | O) = Σ_q Σ_q̃ ∫ p(o, q̃ | λ) p(O, q | λ) p(λ) dλ / p(O)

The sums and the integral are intractable ⇒ variational Bayesian method [Attias; '99]
Variational Bayesian method (1/2)
Estimate an approximate posterior distribution Q by maximizing a lower bound F on the log marginal likelihood, obtained from Jensen's inequality:

    log p(o | O) ≥ ⟨ log [ p(o, q̃ | λ) p(O, q | λ) p(λ) / (Q(q, q̃, λ) p(O)) ] ⟩_Q ≡ F

where ⟨·⟩_Q denotes expectation w.r.t. Q(q, q̃, λ), the approximate distribution of the true posterior distribution.
Variational Bayesian method (2/2)
Assume the random variables are statistically independent:

    Q(q, q̃, λ) = Q(q) Q(q̃) Q(λ)

Maximizing F under this factorization gives the optimal posterior distributions (C_q, C_q̃, C_λ : normalization terms):

    Q(q) = (1/C_q) exp ⟨log p(O, q | λ)⟩_{Q(λ)}
    Q(q̃) = (1/C_q̃) exp ⟨log p(o, q̃ | λ)⟩_{Q(λ)}
    Q(λ) = (1/C_λ) p(λ) exp( ⟨log p(O, q | λ)⟩_{Q(q)} + ⟨log p(o, q̃ | λ)⟩_{Q(q̃)} )

These distributions depend on each other ⇒ iterative updates, as in the EM algorithm.
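The alternating updates can be illustrated on a toy model. The sketch below is our own illustration, not the system's implementation: variational Bayes for a two-component 1-D Gaussian mixture with known unit variances and known equal weights, where the component posteriors Q(z) and the mean posteriors Q(μ) are updated in turn, mirroring the E- and M-steps of EM.

```python
import math

# Toy VB-EM: two-component 1-D Gaussian mixture, unit variances,
# equal mixing weights. Unknowns: the component means mu_k, each
# with a N(0, tau2) prior. Q(z) and Q(mu) are updated in turn,
# mirroring the E-step / M-step structure of the EM algorithm.
def vb_em(data, tau2=100.0, iters=50):
    m = [-1.0, 1.0]            # posterior means of Q(mu_k)
    s2 = [tau2, tau2]          # posterior variances of Q(mu_k)
    for _ in range(iters):
        # "E-step": Q(z_n) from expected log-likelihoods,
        # E[log N(x | mu_k, 1)] = -0.5*((x - m_k)^2 + s2_k) + const
        resp = []
        for x in data:
            logs = [-0.5 * ((x - m[k]) ** 2 + s2[k]) for k in range(2)]
            mx = max(logs)
            w = [math.exp(l - mx) for l in logs]
            z = sum(w)
            resp.append([wi / z for wi in w])
        # "M-step": by conjugacy Q(mu_k) is Gaussian, with
        # precision 1/tau2 + N_k and mean (sum_n r_nk x_n) / precision
        for k in range(2):
            nk = sum(r[k] for r in resp)
            sk = sum(r[k] * x for r, x in zip(resp, data))
            prec = 1.0 / tau2 + nk
            m[k] = sk / prec
            s2[k] = 1.0 / prec
    return m, s2

data = [-5.2, -4.9, -5.1, -4.8, 4.8, 5.1, 4.9, 5.2]
means, variances = vb_em(data)
```

With data clustered around ±5, the posterior means converge near −5 and +5, and the posterior variances shrink as each component accumulates effective counts.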
Approximation for speech synthesis
- The optimal Q(λ) depends on the synthesis data ⇒ it would have to be re-estimated for every utterance to be synthesized, a huge computational cost in the synthesis part
- Therefore, ignore the dependency on the synthesis data ⇒ Q(λ) is estimated from the training data only
Prior distribution
- Conjugate prior distribution ⇒ the posterior becomes a distribution of the same family as the prior
- For Gaussian state-output distributions (the likelihood functions), the conjugate prior is a Gauss-Wishart distribution
- Its hyperparameters are determined from statistics of prior data: the number of prior data points, their mean, their covariance, and the dimension of the feature vector
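A univariate analogue may make the determination step concrete. The sketch below, an illustration of ours rather than the system's actual prior, uses a Normal-Gamma prior (the 1-D counterpart of the Gauss-Wishart) for a Gaussian with unknown mean and precision, sets its hyperparameters from prior-data statistics, and shows that the posterior stays in the same family; the hyperparameter names and the mapping from statistics are our illustrative choices.

```python
# Univariate analogue of setting a conjugate (Gauss-Wishart) prior from
# prior-data statistics: a Normal-Gamma prior NG(mu0, kappa0, alpha0, beta0)
# for a Gaussian with unknown mean and precision.
def prior_from_stats(n_prior, mean_prior, var_prior):
    mu0 = mean_prior                    # prior mean   <- mean of prior data
    kappa0 = n_prior                    # pseudo-count <- # of prior data
    alpha0 = n_prior / 2.0              # shape        <- # of prior data
    beta0 = n_prior * var_prior / 2.0   # rate         <- covariance of prior data
    return mu0, kappa0, alpha0, beta0

def posterior_update(prior, data):
    # Conjugacy: the posterior is again Normal-Gamma (same family as prior).
    mu0, kappa0, alpha0, beta0 = prior
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

prior = prior_from_stats(n_prior=10, mean_prior=0.0, var_prior=1.0)
post = posterior_update(prior, [2.0, 2.2, 1.8, 2.1, 1.9])
```

The posterior mean lands between the prior mean (0.0) and the data mean (2.0), weighted by the pseudo-count, which is exactly how prior information tempers the over-fitting of a point estimate.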
Speech parameter generation
- A speech parameter vector consists of static and dynamic features, but only the static feature sequence is generated
- Speech parameter generation based on the Bayesian approach ⇒ generate the static feature sequence that maximizes the lower bound F
Relation between Bayes and ML
- Comparing with the ML criterion, the two output distributions differ only in the model parameters used:
  - ML ⇒ the point estimates of the model parameters
  - Bayes ⇒ the expectations of the model parameters under the posterior Q(λ)
- The Bayesian generation problem can therefore be solved in the same fashion as the ML one
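In the ML case, generating the static sequence c that maximizes the output distribution of the joint static-plus-delta observation o = Wc reduces to the linear system (WᵀΣ⁻¹W)c = WᵀΣ⁻¹μ; the Bayesian variant solves the same system with expectations of the model parameters plugged in. A minimal numpy sketch, assuming a simple first-order delta window of our own choosing (not necessarily the window used in the experiments):

```python
import numpy as np

def build_window_matrix(T):
    # W stacks, for each frame t, a static row (o_t = c_t) and a delta
    # row (delta c_t = 0.5*(c_{t+1} - c_{t-1}), with forward/backward
    # differences at the sequence boundaries).
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                          # static
        d = 2 * t + 1
        if t == 0:
            W[d, 0], W[d, 1] = -1.0, 1.0           # forward difference
        elif t == T - 1:
            W[d, T - 2], W[d, T - 1] = -1.0, 1.0   # backward difference
        else:
            W[d, t - 1], W[d, t + 1] = -0.5, 0.5   # central difference
    return W

def generate(mean, var):
    # Solve (W' P W) c = W' P mean with P = diag(1/var); in the Bayesian
    # case, `mean` and `var` are posterior expectations of the parameters.
    T = mean.shape[0] // 2
    W = build_window_matrix(T)
    P = np.diag(1.0 / var)
    A = W.T @ P @ W
    b = W.T @ P @ mean
    return np.linalg.solve(A, b)

mean = np.array([3.0, 0.0] * 5)   # per-frame [static, delta] targets
c = generate(mean, np.ones(10))
```

With constant static targets and zero delta targets, the generated trajectory is exactly constant, since that choice satisfies both windows simultaneously.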
Bayesian context clustering
- Decision-tree context clustering based on maximizing the lower bound F
- At each node, select the question (e.g., "Is this phoneme a vowel?") whose yes/no split yields the largest gain of F
- Stopping condition ⇒ a node is split only while the gain of F is positive
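The split criterion can be sketched on 1-D data. As a simplified stand-in of our own for the lower bound F, the code below scores each node by the Normal-Gamma marginal likelihood of its data, tries a set of threshold "questions", and splits only when the gain (child scores minus parent score) is positive; the hyperparameter values are illustrative.

```python
import math

# Sketch of selecting a decision-tree split by marginal-likelihood gain,
# a simplified 1-D stand-in for clustering based on the lower bound F.
def log_marginal(xs, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    # Normal-Gamma marginal likelihood of a 1-D data set.
    n = len(xs)
    if n == 0:
        return 0.0
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * ss
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - (n / 2.0) * math.log(2.0 * math.pi))

def best_split(xs, thresholds):
    # "Question" = is x < threshold?
    # Gain = score(yes) + score(no) - score(node); split only if gain > 0.
    base = log_marginal(xs)
    best = (None, 0.0)
    for th in thresholds:
        yes = [x for x in xs if x < th]
        no = [x for x in xs if x >= th]
        gain = log_marginal(yes) + log_marginal(no) - base
        if gain > best[1]:
            best = (th, gain)
    return best

data = [-5.2, -4.9, -5.1, -4.8, 4.8, 5.1, 4.9, 5.2]
th, gain = best_split(data, thresholds=[-6.0, -2.0, 0.0, 2.0, 6.0])
```

For two well-separated clusters, any threshold between them gives a large positive gain, while degenerate thresholds (all data on one side) give zero gain and are rejected by the stopping condition.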
Impact of prior distribution
- The prior hyperparameters affect model selection as tuning parameters ⇒ a technique for determining the prior distribution is required
- Conventional approach: maximize the marginal likelihood w.r.t. the hyperparameters
  - Leads to the over-fitting problem, as ML does
  - Tuning parameters are still required
- Proposed: determination technique of the prior distribution using cross validation [Hashimoto; '08]
Bayesian approach using CV
Prior distribution based on cross validation:
- The training data is randomly divided into K groups
- For the k-th group, the posterior distribution estimated from the remaining K−1 groups (e.g., groups {2,3}, {1,3}, {1,2} when K = 3) is used as the cross-valid prior distribution
- The likelihood of the k-th group is then calculated under this cross-valid prior
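The cross-valid prior can be sketched with the same univariate Normal-Gamma model as above. This is our own toy illustration, with illustrative hyperparameter values: for each of K groups, the posterior computed from the other K−1 groups serves as the prior, and the held-out group is scored under the resulting Student-t posterior predictive.

```python
import math

# Cross-valid prior sketch: training data divided into K groups; when
# scoring group k, the posterior estimated from the other K-1 groups
# serves as the prior. Univariate Normal-Gamma model.
def ng_update(mu0, kappa0, alpha0, beta0, xs):
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * ss
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, alpha_n, beta_n

def log_predictive(x, mu, kappa, alpha, beta):
    # Posterior predictive of the Normal-Gamma model: Student-t with
    # df = 2*alpha, location mu, scale^2 = beta*(kappa+1)/(alpha*kappa).
    df = 2.0 * alpha
    scale2 = beta * (kappa + 1.0) / (alpha * kappa)
    z2 = (x - mu) ** 2 / scale2
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi * scale2)
            - (df + 1.0) / 2.0 * math.log1p(z2 / df))

def cv_score(groups, base_prior):
    total = 0.0
    for k in range(len(groups)):
        held_out = groups[k]
        rest = [x for j, g in enumerate(groups) if j != k for x in g]
        cv_prior = ng_update(*base_prior, rest)  # posterior from other groups
        total += sum(log_predictive(x, *cv_prior) for x in held_out)
    return total

groups = [[0.1, -0.2, 0.3], [0.0, 0.2, -0.1], [-0.3, 0.1, 0.0]]
matched = cv_score(groups, base_prior=(0.0, 1.0, 1.0, 1.0))
mismatched = cv_score(groups, base_prior=(100.0, 1.0, 1.0, 1.0))
```

A base prior matched to the data scores higher than a badly mismatched one, which is the sense in which the CV criterion selects priors with good generalization rather than over-fit ones.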
Experimental conditions (1/2)

  Database           : ATR Japanese speech database B-set
  Speaker            : MHT
  Training data      : 450 utterances
  Test data          : 53 utterances
  Sampling rate      : 16 kHz
  Window             : Blackman window
  Frame size / shift : 25 ms / 5 ms
  Feature vector     : 24 mel-cepstrum + Δ + ΔΔ and log F0 + Δ + ΔΔ (78 dimensions)
  HMM                : 5-state left-to-right HMM without skip transitions
Experimental conditions (2/2)
Compared approaches:

  System       Training  Context clustering                    # of states
  ML-MDL       ML        MDL                                         2,491
  Bayes-Bayes  Bayes     Bayes using CV                             25,911
  Bayes-MDL    Bayes     Bayes using CV (adjusted threshold)         2,553
  ML-Bayes     ML        MDL (adjusted threshold)                   27,106

Mean Opinion Score (MOS) test:
- Subjects were 10 Japanese students
- 20 sentences were chosen at random
Subjective listening test
[Figure: mean opinion scores of the four systems; # of states: ML-MDL 2,491, Bayes-Bayes 25,911, ML-Bayes 27,106, Bayes-MDL 2,553]
Conclusions and future work
- Conclusions
  - A new framework for speech synthesis based on the Bayesian approach
  - All processes are derived from a single predictive distribution
  - Improved naturalness of the synthesized speech
- Future work
  - Introduce HSMMs instead of HMMs
  - Investigate the relation between speech quality and model structure
Marginal likelihood using cross validation
- The cross-valid prior distribution alleviates the over-fitting problem
Experimental conditions (2/2): number of states per system

  System       Training  Context clustering      Spectrum      F0  Duration      Sum
  ML-MDL       ML        MDL                          956   1,151       280    2,491
  Bayes-Bayes  Bayes     Bayes using CV             9,070  12,836     4,005   25,911
  Bayes-MDL    Bayes     Bayes using threshold      1,941     565        47    2,553
  ML-Bayes     ML        MDL using threshold       15,077   8,844     3,185   27,106
Bayesian context clustering using CV
- Select the model structure that maximizes F
- Compute F at each node and perform the yes/no split (e.g., "Is the preceding phoneme a vowel?") that gives the largest gain of F before vs. after splitting
- Stopping condition based on the gain ⇒ a model structure with high generalization ability is selected
Bayesian criterion
- Model parameters are represented by probability distributions
  - Prior distribution, posterior distribution, and predictive distribution, over the training and synthesis data
- All values of the model parameters are taken into account ⇒ high generalization ability