1
Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations
Philip Jackson and Martin Russell
Electronic, Electrical and Computer Engineering
http://web.bham.ac.uk/p.jackson/balthasar/
2
Speech dynamics into ASR INTRODUCTION
3
Conventional model INTRODUCTION
[Figure: conventional HMM; labels: HMM, acoustic PDF, acoustic observations]
4
Linear-trajectory model INTRODUCTION
[Figure: linear-trajectory model; labels: segmental HMM, intermediate layer, articulatory-to-acoustic mapping W, acoustic PDF, acoustic observations]
5
Multi-level Segmental HMM INTRODUCTION
segmental finite-state process
intermediate “articulatory” layer
– linear trajectories
mapping required
– linear transformation
– radial basis function network
6
Estimation of linear mapping THEORY
Matched sequences [symbol missing] and [symbol missing]
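One way to read "estimation of linear mapping" from matched sequences is as an ordinary least-squares fit between time-aligned articulatory and acoustic frames. The sketch below is only an illustration under that assumption: the function name `estimate_linear_mapping`, the variables `X` and `Y`, the bias row, and the synthetic data are all hypothetical and are not taken from the slides.

```python
import numpy as np

def estimate_linear_mapping(X, Y):
    """Least-squares W such that Y is approximated by [X, 1] @ W.

    X: (frames, articulatory_dim) matched intermediate-layer frames (hypothetical name).
    Y: (frames, acoustic_dim) matched acoustic frames (hypothetical name).
    Returns W with shape (articulatory_dim + 1, acoustic_dim); the last row is a bias.
    The bias term is an assumption of this sketch.
    """
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return W

# Synthetic, noise-free check: the fit should recover the generating mapping.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
W_true = rng.normal(size=(4, 2))
Y = np.hstack([X, np.ones((200, 1))]) @ W_true
W_hat = estimate_linear_mapping(X, Y)
```

With real data the fit would not be exact, and the residual would be absorbed by the acoustic PDF.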
7
Linear-trajectory equations THEORY
Defined as: [equations missing]
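As a hedged illustration of a linear trajectory, the sketch below assumes the common midpoint-and-slope parameterisation, in which the value at frame t of a segment is the midpoint plus the slope times the time offset from the segment centre. The function name, the zero-based frame indexing, and the example values are assumptions of this sketch, not the authors' notation.

```python
import numpy as np

def linear_trajectory(midpoint, slope, num_frames):
    """Return a (num_frames, dim) linear trajectory centred on `midpoint`.

    Frame times are centred so that the trajectory passes through `midpoint`
    at the middle of the segment (an assumed convention).
    """
    midpoint = np.asarray(midpoint, dtype=float)
    slope = np.asarray(slope, dtype=float)
    t_centred = np.arange(num_frames, dtype=float) - (num_frames - 1) / 2.0
    return midpoint + np.outer(t_centred, slope)

# Two-dimensional example: the first dimension rises, the second stays constant.
traj = linear_trajectory(midpoint=[1.0, -2.0], slope=[0.5, 0.0], num_frames=5)
```

Setting the slope to zero gives the constant-trajectory case as a special instance.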
8
Training the model parameters THEORY
For optimal least-squares estimates (acoustic domain):
[Equations: midpoint, slope]
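For a single segment with centred frame times, the textbook least-squares line fit gives the midpoint as the mean of the frames and the slope as the time-weighted sum of the frames divided by the sum of squared centred times. The sketch below shows that per-segment case only; how the estimates are pooled over segments and states in the actual system is not shown here, and the function name and example data are hypothetical.

```python
import numpy as np

def fit_linear_trajectory(frames):
    """Least-squares midpoint and slope for one segment of shape (num_frames, dim)."""
    frames = np.asarray(frames, dtype=float)
    num_frames = frames.shape[0]
    t_centred = np.arange(num_frames) - (num_frames - 1) / 2.0
    midpoint = frames.mean(axis=0)
    denom = np.sum(t_centred ** 2)
    if denom == 0.0:
        # A one-frame segment carries no slope information.
        slope = np.zeros(frames.shape[1])
    else:
        slope = t_centred @ frames / denom
    return midpoint, slope

# First dimension rises by 1 per frame; second dimension is constant.
segment = np.array([[0.0, 3.0], [1.0, 3.0], [2.0, 3.0], [3.0, 3.0]])
midpoint, slope = fit_linear_trajectory(segment)
```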
9
Training the model parameters THEORY
For optimal least-squares estimates (articulatory domain):
[Equations: midpoint, slope]
10
Training the model parameters THEORY
For optimal maximum-likelihood estimates (articulatory domain):
[Equations: midpoint, slope]
11
Tests on MOCHA METHOD
S. British English, at 16 kHz (Wrench, 2000)
– MFCC13 acoustic features, incl. zeroth
– articulatory x- & y-coords from 7 EMA coils
– PCA9+Lx: first nine articulatory modes plus the laryngograph log energy
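A feature set like PCA9+Lx could be assembled roughly as follows: project the 14 EMA coordinates (7 coils, x and y) onto their first nine principal components and append the laryngograph log energy as a tenth dimension. This is a sketch under assumptions: the PCA is done by SVD of mean-centred data, the arrays are synthetic stand-ins rather than MOCHA recordings, and the names `pca_project`, `ema`, and `lx_log_energy` are hypothetical.

```python
import numpy as np

def pca_project(data, num_modes):
    """Project (frames, dim) data onto its first `num_modes` principal components."""
    mean = data.mean(axis=0)
    centred = data - mean
    # Rows of vt are orthonormal principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    modes = vt[:num_modes]
    return centred @ modes.T, modes, mean

rng = np.random.default_rng(1)
ema = rng.normal(size=(500, 14))           # synthetic stand-in for 7 coils x (x, y)
lx_log_energy = rng.normal(size=(500, 1))  # synthetic stand-in for Lx log energy
scores, modes, mean = pca_project(ema, num_modes=9)
features = np.hstack([scores, lx_log_energy])
```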
12
MOCHA baseline performance RESULTS
Constant-trajectory SHMM (ID_0)
Linear-trajectory SHMM (ID_1)
13
Performance across mappings RESULTS
14
Phone categorisation METHOD
Grouping  No.  Description
A         1    all data
B         2    silence; speech
C         6    linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D         10   as (Deng and Ma, 2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E         10   discrete articulatory regions
F         49   silence; individual phones
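One way such a categorisation can yield a piecewise linear mapping is to keep a separate linear mapping per category and select it by the phone's category. The sketch below uses the two-category case (silence; speech) purely as an illustration; the phone labels, the dictionary, the mapping values, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical two-category scheme: "sil" is silence, everything else is speech.
PHONE_TO_CATEGORY = {"sil": "silence"}

def category_of(phone):
    return PHONE_TO_CATEGORY.get(phone, "speech")

def map_to_acoustic(trajectory, phone, mappings):
    """Apply the category-specific linear mapping (W, bias) to a trajectory."""
    W, bias = mappings[category_of(phone)]
    return trajectory @ W + bias

# Illustrative mappings from a 2-D intermediate space to a 3-D acoustic space.
mappings = {
    "silence": (np.zeros((2, 3)), np.array([1.0, 1.0, 1.0])),
    "speech": (np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]), np.zeros(3)),
}
traj = np.array([[1.0, 2.0], [3.0, 4.0]])
speech_out = map_to_acoustic(traj, "aa", mappings)
silence_out = map_to_acoustic(traj, "sil", mappings)
```

Finer groupings would simply add more entries to the category table and the mapping dictionary.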
15
Tests on TIMIT METHOD
N. American English, at 8 kHz
– MFCC13 acoustic features, incl. zeroth
a) F1-3: formants F1, F2 and F3, estimated by Holmes formant tracker
b) F1-3+BE5: five band energies added
c) PFS12: synthesiser control parameters
16
TIMIT baseline performance RESULTS
Constant-trajectory SHMM (ID_0)
Linear-trajectory SHMM (ID_1)
17
Performance across feature sets RESULTS
18
Performance across groupings RESULTS
19
Results across groupings RESULTS
20
Model visualisation DISCUSSION
[Figure panels: Original acoustic data; Constant-trajectory model; Linear-trajectory model (c,F)]
21
Conclusions SUMMARY
Developed framework for speech dynamics in an intermediate space
Linear trajectory + piecewise linear mapping bounded by performance of linear trajectory in acoustic space
Near optimal performance achieved
– For more than 3 formant parameters
– For 6 or more linear mappings
Formants and articulatory parameters gave qualitatively similar results
What next?
22
Further work SUMMARY
Complete experiments with language model
Include segment duration models
Derive pseudo-articulatory representations by unsupervised (embedded) training
Implement non-linear mapping (i.e., RBF)
Further information:
– here and now
– p.jackson@bham.ac.uk
– web.bham.ac.uk/p.jackson/balthasar