Hyperparameter Estimation for Speech Recognition Based on Variational Bayesian Approach
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee and Keiichi Tokuda (Nagoya Institute of Technology)

1. Introduction

Criteria used in recent speech recognition systems:
- ML (Maximum Likelihood) criterion ⇒ estimation accuracy is reduced when the training data is limited
- MDL (Minimum Description Length) criterion ⇒ based on an asymptotic approximation, which is unreliable for small data sets

The variational Bayesian (VB) approach offers higher generalization ability and can select appropriate model structures, but its performance depends on the hyperparameters of the prior distributions.

Objective: estimating appropriate hyperparameters.

2. Bayesian Framework

Model parameters are regarded as random variables: a prior distribution over the model parameters is combined with the observation vectors to give a posterior distribution, and the predictive distribution for new input vectors is computed based on the posterior distribution. Model structures are compared by the evidence (marginal likelihood) rather than the likelihood.

Advantages:
- prior knowledge can be integrated
- model structure can be selected
- robust classification

Disadvantage: the framework includes integral and expectation calculations that are intractable ⇒ an effective approximation technique is required.

3. Variational Bayesian Approach

Variational Bayes [Attias, 1999]:
- approximate the posterior distribution over hidden variables and model parameters by variational posteriors
- define a lower bound F on the log marginal likelihood ⇒ maximize F w.r.t. the variational posteriors

Conjugate prior distributions:
- output probability distribution ⇒ Gaussian distribution
- conjugate prior distribution ⇒ Gauss-Wishart distribution
- likelihood function ⇒ proportional to a Gauss-Wishart distribution, so the posterior is again Gauss-Wishart

A new hyperparameter T, representing the amount of prior data, is defined. Each Gaussian is parameterized by a mean vector and an inverse covariance of the observation vectors; the Gauss-Wishart prior over these is specified by T together with the root-node mean vector and covariance matrix, and the posterior parameters of each leaf node are computed from its occupation count and mean vector (D denotes the number of dimensions of the feature vector).

Context clustering based on VB [Watanabe et al., 2002]: phonetic decision-tree clustering driven by F. For a phonetic question Q that splits a parent node P into yes/no child nodes Y and N, the gain is

    ΔF = F_Y^Q + F_N^Q − F_P^Q

and splitting of a node stops when no question gives ΔF > 0.

4. Hyperparameter Estimation

Tying structure of prior distributions: four kinds of tying structure are considered,
- all: a single prior shared by all distributions
- phone: one prior per phone (e.g. /a/, /N/)
- state: one prior per phone state (e.g. /a/.state[2], /a/.state[4])
- leaf: one prior per leaf node

If the prior distributions have a tying structure, F is good for model selection; otherwise F increases monotonically as T increases. The appropriate tying structure of the prior distributions depends on the amount of training data.

Hyperparameters are estimated by maximizing the marginal likelihood, i.e. maximizing F w.r.t. the hyperparameters:
- Conventional: using monophone HMM state statistics ⇒ maximizing F at the root node
- Proposed: using the statistics of all leaf nodes ⇒ maximizing F of the tree structure

Minimal sketches of these computations are given below.
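The poster's equations were embedded as images and did not survive scraping. For reference, the lower bound F of Attias (1999) has the standard form below, where O are the observation vectors, Z the hidden variables, Λ the model parameters, and q(Z)q(Λ) the factorized variational posterior:

    F = \int q(Z)\, q(\Lambda) \log \frac{p(O, Z \mid \Lambda)\, p(\Lambda)}{q(Z)\, q(\Lambda)} \, dZ\, d\Lambda \;\le\; \log p(O)

Maximizing F with respect to q(Z) and q(Λ) tightens the bound; the same F then serves as the score for context clustering and hyperparameter estimation.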
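Next, a minimal numpy sketch of the conjugate Gauss-Wishart update for one Gaussian. The parameterization is an assumption, not spelled out in the scraped text: T acts as the pseudo-count of prior data, the root-node mean vector and covariance matrix supply the prior statistics, and the leaf-node occupation count, mean vector, and scatter matrix are the sufficient statistics (under VB these would be expectations w.r.t. q(Z)).

    import numpy as np

    def gauss_wishart_update(T, mu0, Sigma0, N, xbar, S):
        """Posterior hyperparameters of a Gauss-Wishart prior (assumed
        parameterization: prior pseudo-count T, prior mean mu0, prior
        scale matrix T * Sigma0, prior degrees of freedom T + D).

        N    : leaf-node occupation count
        xbar : leaf-node mean vector
        S    : leaf-node scatter matrix, sum_t (o_t - xbar)(o_t - xbar)^T
        """
        D = mu0.shape[0]
        beta = T + N                            # posterior pseudo-count
        m = (T * mu0 + N * xbar) / beta         # posterior mean vector
        nu = T + D + N                          # posterior degrees of freedom
        d = xbar - mu0
        # posterior inverse scale matrix of the Wishart factor
        W_inv = T * Sigma0 + S + (T * N / beta) * np.outer(d, d)
        return beta, m, nu, W_inv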
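The split test of the VB context clustering reduces to comparing lower bounds. A small sketch, assuming the values F_P^Q, F_Y^Q, and F_N^Q have already been computed for each candidate phonetic question Q:

    def choose_question(F_parent, F_children):
        """Return the phonetic question with the largest positive gain
        Delta F = F_Y^Q + F_N^Q - F_P^Q, or None to stop splitting.

        F_children maps each question Q to (F_Y^Q, F_N^Q).
        """
        best_q, best_gain = None, 0.0
        for q, (f_yes, f_no) in F_children.items():
            gain = f_yes + f_no - F_parent      # Delta F for question q
            if gain > best_gain:
                best_q, best_gain = q, gain
        return best_q                           # None => stop node split

    # e.g. choose_question(-120.4, {"Is-Vowel?": (-58.1, -60.9),
    #                               "Is-Nasal?": (-61.0, -62.3)})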
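Finally, the hyperparameter estimation itself can be sketched as a one-dimensional search over T, matching the 0-10 range scanned in the plots of Section 6. The callable passed in is hypothetical: for the proposed method it would evaluate F of the whole tree structure from the statistics of all leaf nodes, and for the conventional method F at the root node from monophone HMM state statistics.

    def estimate_T(lower_bound, candidates):
        """Pick the hyperparameter T that maximizes the lower bound F."""
        return max(candidates, key=lower_bound)

    # e.g. T_hat = estimate_T(tree_structure_F, [0.1 * i for i in range(1, 101)])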
5. Experimental Conditions

Database:           JNAS (Japanese Newspaper Article Sentences)
Training data:      JNAS 20,000 / 2,500 / 200 sentences
Test data:          JNAS 100 sentences
Sampling rate:      16 kHz
Window:             Hamming window
Frame size / shift: 25 ms / 10 ms
Feature vector:     12th-order MFCC + ΔMFCC + ΔEnergy (25 dimensions)

6. Experimental Results

Relationship between F and recognition accuracy:
[Figure: Phoneme Acc. (%) vs. Tree Size (%) for 200, 2,500, and 20,000 training sentences]
F and the recognition accuracy behaved similarly, so selecting the model structure that maximizes F is reasonable.

Relationship between T and performance:
[Figure: the value of F and Phoneme Acc. (%) as functions of T over 0-10, 20,000 sentences]

Relationship between tying structure and the amount of training data:
[Table: phoneme accuracy of ML, the conventional method, and the proposed method for the four tying structures (all, phone, state, leaf)]
- large training data set ⇒ tying few prior distributions is appropriate
- small training data set ⇒ tying many prior distributions is appropriate

VB clustering with an appropriate prior distribution improves the recognition performance, and the proposed technique gives a consistent improvement at the hyperparameter value selected by F.