Published by Adriana Chinnock. Modified over 10 years ago.
1
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
2
Acoustic Modeling
3
Motivation
Standard acoustic models impose many structural constraints.
We propose an automatic approach.
Use: TIMIT dataset, MFCC features, full covariance Gaussians.
(Young and Woodland, 1994)
4
Phone Classification ??????????
5
æ
6
HMMs for Phone Classification
7
Temporal Structure
8
Standard subphone/mixture HMM
Temporal Structure, Gaussian Mixtures
Model: HMM Baseline. Error rate: 25.1%
9
Our Model vs. Standard Model
Our model: single Gaussians, fully connected
10
Hierarchical Baum-Welch Training
[Figure: error rates shown on the slide: 32.1%, 28.7%, 25.6%, 23.9%]
HMM Baseline: 25.1%
5 Split Rounds: 21.4%
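The slide does not say how states are split between training rounds. As a rough sketch only, the following assumes a common scheme: each state is duplicated, the two copies get slightly perturbed means to break symmetry, and transition mass is shared equally between the copies. The function name `split_states`, the 1-D means, and the `eps` parameter are illustrative assumptions, not taken from the presentation. Baum-Welch re-estimation, which would follow every split, is not shown.

```python
import random

def split_states(means, trans, eps=0.01, seed=0):
    """Split every HMM state into two substates (hypothetical sketch).

    means: list of per-state scalar means (1-D Gaussians for simplicity).
    trans: row-stochastic transition matrix as a list of lists.
    Returns (new_means, new_trans) with twice as many states.
    """
    rng = random.Random(seed)
    n = len(means)

    new_means = []
    for m in means:
        # Perturb the two copies in opposite directions to break symmetry.
        delta = eps * (1.0 + abs(m)) * rng.uniform(0.5, 1.0)
        new_means.extend([m - delta, m + delta])

    new_trans = []
    for i in range(n):
        # Mass going to each old target state is shared equally
        # between that target's two children.
        row = []
        for j in range(n):
            row.extend([trans[i][j] / 2.0, trans[i][j] / 2.0])
        # Both children of state i inherit the same outgoing row.
        new_trans.append(list(row))
        new_trans.append(list(row))
    return new_means, new_trans

means = [0.0, 1.0]
trans = [[0.9, 0.1], [0.2, 0.8]]
new_means, new_trans = split_states(means, trans)
```

Starting from one state per phone and applying such a split five times would give up to 32 substates per phone, which EM is then free to use or ignore.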
11
Phone Classification Results
Method: Error Rate
GMM Baseline (Sha and Saul, 2006): 26.0%
HMM Baseline (Gunawardana et al., 2005): 25.1%
SVM (Clarkson and Moreno, 1999): 22.4%
Hidden CRF (Gunawardana et al., 2005): 21.7%
Our Work: 21.4%
Large Margin GMM (Sha and Saul, 2006): 21.1%
12
Phone Recognition ?????????
13
Standard State-Tied Acoustic Models
14
No more State-Tying
15
No more Gaussian Mixtures
16
Fully connected internal structure
17
Fully connected external structure
18
Refinement of the /ih/-phone
22
Refinement of the /l/-phone
23
Hierarchical Refinement Results
HMM Baseline: 41.7%
5 Split Rounds: 28.4%
24
Merging
Not all phones are equally complex.
Compute log likelihood loss from merging.
[Figure: split model vs. model merged at one node, over time steps t-1, t, t+1]
25
Merging Criterion
[Figure: two diagrams over time steps t-1, t, t+1; equations not recoverable]
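The criterion's formula did not survive in this transcript, so the following is only a simplified stand-in for the idea of a log likelihood loss from merging. It assumes two 1-D Gaussian substates with weights `w1` and `w2`, merges them by moment matching, and sums the per-frame drop in log likelihood. All names here are hypothetical, and the actual criterion in the talk may be computed differently (for example, from forward-backward quantities at a single node).

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def merge_gaussians(w1, m1, v1, w2, m2, v2):
    """Moment-match two weighted 1-D Gaussians with a single Gaussian."""
    total = w1 + w2
    p1, p2 = w1 / total, w2 / total
    mean = p1 * m1 + p2 * m2
    second_moment = p1 * (v1 + m1 * m1) + p2 * (v2 + m2 * m2)
    return mean, second_moment - mean * mean

def merge_loss(frames, w1, m1, v1, w2, m2, v2):
    """Log likelihood of the split pair minus that of the merged state."""
    total = w1 + w2
    p1, p2 = w1 / total, w2 / total
    mean, var = merge_gaussians(w1, m1, v1, w2, m2, v2)
    loss = 0.0
    for x in frames:
        split_ll = math.log(p1 * gauss_pdf(x, m1, v1) + p2 * gauss_pdf(x, m2, v2))
        merged_ll = math.log(gauss_pdf(x, mean, var))
        loss += split_ll - merged_ll
    return loss

# Identical substates: merging costs nothing.
loss_same = merge_loss([0.1, -0.3, 0.5], 1.0, 0.0, 1.0, 1.0, 0.0, 1.0)
# Well-separated substates: merging loses likelihood.
loss_far = merge_loss([-3.0, 3.0], 1.0, -3.0, 0.1, 1.0, 3.0, 0.1)
```

Under this sketch, splits with a loss near zero are candidates for merging back, so simple phones keep few substates while complex ones keep many.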
26
Split and Merge Results
Split Only: 28.4%
Split & Merge: 27.3%
27
HMM states per phone
30
Alignment Results
Hand Aligned: 27.3%
Auto Aligned: 26.3%
31
Alignment State Distribution
32
Inference
State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
Transcription: d-ae-d
Viterbi, Variational, ???
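The mapping from state sequence to transcription on this slide can be sketched in a few lines. The representation of a state as a `(phone, substate)` pair and the function names are assumptions made for illustration. Collapsing runs of identical phones is also a simplification: it would hide a phone that genuinely occurs twice in a row.

```python
from itertools import groupby

def states_to_phones(states):
    """Drop the substate index, keeping one phone label per frame."""
    return [phone for phone, _substate in states]

def collapse_repeats(phones):
    """Collapse each run of identical phone labels into a single label."""
    return [phone for phone, _run in groupby(phones)]

# The state sequence from the slide, as (phone, substate) pairs.
states = [
    ("d", 1), ("d", 6), ("d", 6), ("d", 4),
    ("ae", 5), ("ae", 2), ("ae", 3), ("ae", 0),
    ("d", 2), ("d", 2), ("d", 3), ("d", 7), ("d", 5),
]

phones = states_to_phones(states)
transcription = collapse_repeats(phones)
print("-".join(transcription))  # prints "d-ae-d"
```

Many distinct state sequences map to the same transcription, which is why the single best state path is not necessarily the best transcription.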
33
Variational Inference
Variational Approximation: [equation not recoverable]
[symbol not recoverable]: Posterior edge marginals
Solution: [equation not recoverable]
Viterbi: 26.3%
Variational: 25.1%
34
Phone Recognition Results
Method: Error Rate
State-Tied Triphone HMM (HTK) (Young and Woodland, 1994): 27.7%
Gender Dependent Triphone HMM (Lamel and Gauvain, 1993): 27.1%
Our Work: 26.1%
Bayesian Triphone HMM (Ming and Smith, 1998): 25.6%
Heterogeneous classifiers (Halberstadt and Glass, 1998): 24.4%
35
Conclusions
Minimalist, Automatic Approach: unconstrained, accurate
Phone Classification: competitive with state-of-the-art discriminative methods despite being generative
Phone Recognition: better than standard state-tied triphone models
36
Thank you! http://nlp.cs.berkeley.edu