Philip Jackson and Martin Russell Electronic Electrical and Computer Engineering Models of speech dynamics in a segmental-HMM recognizer using intermediate.


1 Philip Jackson and Martin Russell, Electronic, Electrical and Computer Engineering: Models of speech dynamics in a segmental-HMM recognizer using intermediate linear representations. http://web.bham.ac.uk/p.jackson/balthasar/

2 Speech dynamics into ASR INTRODUCTION

3 Conventional model INTRODUCTION [Figure: HMM states, acoustic PDF, acoustic observations]

4 Linear-trajectory model INTRODUCTION [Figure: segmental HMM, intermediate layer, articulatory-to-acoustic mapping W, acoustic PDF, acoustic observations]

5 Multi-level Segmental HMM INTRODUCTION
   segmental finite-state process
   intermediate “articulatory” layer
   – linear trajectories
   mapping required
   – linear transformation
   – radial basis function network

6 Estimation of linear mapping Matched sequences and THEORY
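The equations for this slide did not survive in the transcript. As a minimal sketch only, assuming the linear mapping W is fitted by ordinary least squares from matched sequences of intermediate-layer frames and acoustic frames, the estimate could be computed as below. The function names, the absence of a bias term, and the pure-Python solver are illustrative assumptions, not the authors' implementation.

```python
def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: training frames are not varied enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_mapping(x_frames, y_frames):
    """Return W (dy x dx) minimising sum_t ||y_t - W x_t||^2 over matched frames."""
    dx, dy = len(x_frames[0]), len(y_frames[0])
    # Normal equations: (sum_t x_t x_t^T) W^T = sum_t x_t y_t^T
    xx = [[sum(x[i] * x[j] for x in x_frames) for j in range(dx)] for i in range(dx)]
    xy = [[sum(x[i] * y[k] for x, y in zip(x_frames, y_frames)) for k in range(dy)]
          for i in range(dx)]
    w_t = solve(xx, xy)  # dx x dy, i.e. W transposed
    return [[w_t[i][k] for i in range(dx)] for k in range(dy)]


def apply_mapping(w, x):
    """Map one intermediate-layer frame x into the acoustic space."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
```

With noise-free matched frames the fit recovers the generating matrix; with real data it gives the least-squares compromise, and a constant offset could be handled by appending a 1 to every intermediate-layer frame.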

7 Linear-trajectory equations Defined as: THEORY
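The defining equations were not captured in the transcript. For orientation only, a form commonly used for linear-trajectory segmental models is given here; the notation (segment duration T, midpoint vector c, slope vector m, mapping W, covariance R) is assumed and may differ from the slide.

```latex
% Assumed notation, not recovered from the slide:
% frames t = 1, ..., T within a segment; c = midpoint, m = slope.
\begin{align}
  f(t) &= c + m\,(t - \bar{t}), \qquad \bar{t} = \frac{T + 1}{2} \\
  p(y_1, \dots, y_T \mid c, m) &= \prod_{t=1}^{T} \mathcal{N}\bigl(y_t;\; W f(t),\; R\bigr)
\end{align}
```

Under this form the trajectory lives in the intermediate space and W carries it into the acoustic space, where each frame is scored by a Gaussian centred on the mapped trajectory.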

8 Training the model parameters THEORY For optimal least-squares estimates (acoustic domain): midpoint, slope
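The slide's formulas are missing from the transcript. A minimal sketch of the acoustic-domain case, assuming a trajectory of the form c + m (t - t_bar) with t_bar the centre of the segment, is the standard least-squares line fit below. The function name and the zero-slope convention for one-frame segments are assumptions made for illustration.

```python
def fit_linear_trajectory(frames):
    """Least-squares midpoint c and slope m for one segment of feature vectors.

    Minimises sum_t ||y_t - (c + m * (t - t_bar))||^2 with t = 1..T and
    t_bar = (T + 1) / 2. Because the time offsets sum to zero, the midpoint
    is simply the frame mean and the slope is estimated independently.
    """
    T = len(frames)
    dim = len(frames[0])
    t_bar = (T + 1) / 2.0
    offsets = [t - t_bar for t in range(1, T + 1)]
    denom = sum(o * o for o in offsets)
    midpoint = [sum(y[d] for y in frames) / T for d in range(dim)]
    if denom == 0.0:
        # A single-frame segment cannot determine a slope; use zero (assumption).
        slope = [0.0] * dim
    else:
        slope = [sum(o * y[d] for o, y in zip(offsets, frames)) / denom
                 for d in range(dim)]
    return midpoint, slope
```

Setting the slope to zero for every segment reduces this to the constant-trajectory case, where only the midpoint is estimated.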

9 Training the model parameters THEORY For optimal least-squares estimates (articulatory domain): midpoint, slope

10 Training the model parameters THEORY For optimal maximum-likelihood estimates (articulatory domain): midpoint, slope

11 Tests on MOCHA METHOD
   S. British English, at 16 kHz (Wrench, 2000)
   – MFCC13 acoustic features, incl. zeroth
   – articulatory x- and y-coordinates from 7 EMA coils
   – PCA9+Lx: first nine articulatory modes plus the laryngograph log energy

12 MOCHA baseline performance RESULTS Constant-trajectory SHMM (ID_0) Linear-trajectory SHMM (ID_1)

13 Performance across mappings RESULTS

14 Phone categorisation METHOD
   Grouping  No.  Description
   A           1  all data
   B           2  silence; speech
   C           6  linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
   D          10  as (Deng and Ma, 2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
   E          10  discrete articulatory regions
   F          49  silence; individual phones
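One way to picture how a phone grouping selects among several linear mappings is a lookup from phone label to category, with one mapping per category. The sketch below uses the six categories of grouping C; the phone labels are a small hypothetical sample in ARPAbet-style notation, not the inventory used in the experiments, and the function name is illustrative.

```python
# Hypothetical sample of phones assigned to the six categories of grouping C.
CATEGORY_C = {
    "sil": "silence/stop", "p": "silence/stop", "t": "silence/stop",
    "aa": "vowel", "iy": "vowel",
    "l": "liquid", "r": "liquid",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "z": "fricative",
    "ch": "affricate", "jh": "affricate",
}

# Fixed ordering of categories so each one gets a stable mapping index.
CATEGORIES = sorted(set(CATEGORY_C.values()))


def mapping_index(phone):
    """Return the index of the linear mapping shared by this phone's category."""
    return CATEGORIES.index(CATEGORY_C[phone])
```

Grouping A would correspond to a table with a single category (one mapping for all data), and grouping F to one category per phone.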

15 Tests on TIMIT METHOD
   N. American English, at 8 kHz
   – MFCC13 acoustic features, incl. zeroth
   a) F1-3: formants F1, F2 and F3, estimated by Holmes formant tracker
   b) F1-3+BE5: five band energies added
   c) PFS12: synthesiser control parameters

16 TIMIT baseline performance Constant-trajectory SHMM (ID_0) Linear-trajectory SHMM (ID_1) RESULTS

17 Performance across feature sets RESULTS

18 Performance across groupings RESULTS

19 Results across groupings RESULTS

20 Model visualisation DISCUSSION [Figure panels: original acoustic data; constant-trajectory model; linear-trajectory model (c,F)]

21 Conclusions SUMMARY
   Developed framework for speech dynamics in an intermediate space
   Linear trajectory + piecewise-linear mapping bounded by performance of linear trajectory in acoustic space
   Near-optimal performance achieved
   – for more than 3 formant parameters
   – for 6 or more linear mappings
   Formants and articulatory parameters gave qualitatively similar results
   What next?

22 Further work SUMMARY
   Complete experiments with language model
   Include segment duration models
   Derive pseudo-articulatory representations by unsupervised (embedded) training
   Implement non-linear mapping (i.e., RBF)
   Further information:
   – here and now
   – p.jackson@bham.ac.uk
   – web.bham.ac.uk/p.jackson/balthasar

