
1 Models of speech dynamics for ASR, using intermediate linear representations
Philip Jackson, Boon-Hooi Lo and Martin Russell
Electronic, Electrical and Computer Engineering
http://web.bham.ac.uk/p.jackson/balthasar/

2 Abstract INTRODUCTION

3 Speech dynamics into ASR
Use the dynamics of speech production to constrain the recognizer, targeting:
– noisy environments
– conversational speech
– speaker adaptation
Efficient, complete and trainable models:
– for recognition
– for analysis
– for synthesis
INTRODUCTION

4 Articulatory trajectories from West (2000) INTRODUCTION

5 Articulatory-trajectory model INTRODUCTION

6 Articulatory-trajectory model
[Figure: finite-state, intermediate and surface levels; annotation: source-dependent]
INTRODUCTION

7 Multi-level Segmental HMM
– segmental finite-state process
– intermediate "articulatory" layer
   – linear trajectories
– mapping required
   – linear transformation
   – radial basis function network
INTRODUCTION
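The three levels can be illustrated with a minimal generative sketch. It assumes each segment's intermediate trajectory is a straight line, parameterised by its value `c` at the segment midpoint and its slope `m`, and that the articulatory-to-acoustic mapping is affine (`y = W f + w`) with isotropic Gaussian noise. The function name and this parameterisation are illustrative assumptions, not the authors' code; segment durations are supplied directly rather than generated by the finite-state process.

```python
import random


def generate_utterance(segments, W, w, noise_sd, seed=0):
    """Sample acoustic frames from a hypothetical linear/linear multi-level SHMM.

    segments: list of (duration, c, m), one per segment, where c is the
        intermediate-layer value at the segment midpoint and m the slope
        per frame (both lists of floats).
    W, w: affine articulatory-to-acoustic mapping, y = W f + w (assumed form).
    noise_sd: standard deviation of the isotropic acoustic noise.
    """
    rng = random.Random(seed)
    frames = []
    for duration, c, m in segments:
        t_mid = (duration - 1) / 2.0  # midpoint of the segment, in frames
        for t in range(duration):
            # Intermediate ("articulatory") layer: a linear trajectory.
            f = [ci + mi * (t - t_mid) for ci, mi in zip(c, m)]
            # Acoustic layer: affine mapping plus Gaussian noise.
            y = [sum(a * b for a, b in zip(row, f)) + wi + rng.gauss(0.0, noise_sd)
                 for row, wi in zip(W, w)]
            frames.append(y)
    return frames
```

With `noise_sd = 0`, a single 3-frame segment with `c = [1.0]`, `m = [0.5]`, `W = [[2.0], [1.0]]` and `w = [0.0, 1.0]` yields the frames `[[1.0, 1.5], [2.0, 2.0], [3.0, 2.5]]`.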

8 Linear-trajectory model
[Figure layers: segmental HMM; intermediate layer; articulatory-to-acoustic mapping; acoustic layer]
INTRODUCTION

9 Linear-trajectory equations
Defined as … where …
Segment probability: …
THEORY
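As a hedged illustration of a segment probability, the sketch below scores a segment's acoustic frames under an assumed form: a midpoint-centred linear trajectory `f_t = c + m (t - t_mid)`, an affine mapping `W f_t + w`, and Gaussian noise with diagonal variance `var`. This may differ from the exact equations on the slide; the duration term is left out, and `segment_log_prob` is a hypothetical name.

```python
import math


def segment_log_prob(y, c, m, W, w, var):
    """Log-probability of acoustic frames y under one linear-trajectory segment.

    Assumed model (a sketch):
        f_t = c + m (t - t_mid),   y_t ~ N(W f_t + w, diag(var)).
    y: list of T acoustic frames; c, m: intermediate midpoint value and slope;
    W, w: affine mapping; var: per-dimension acoustic variances.
    """
    T = len(y)
    t_mid = (T - 1) / 2.0
    total = 0.0
    for t, y_t in enumerate(y):
        f = [ci + mi * (t - t_mid) for ci, mi in zip(c, m)]
        for row, wi, vi, yi in zip(W, w, var, y_t):
            mean = sum(a * b for a, b in zip(row, f)) + wi
            # Diagonal Gaussian: add each dimension's log-density.
            total += -0.5 * (math.log(2.0 * math.pi * vi) + (yi - mean) ** 2 / vi)
    return total
```

When every frame lies exactly on the predicted trajectory with unit variance, each frame contributes `-0.5 log(2π)`; each unit residual costs a further 0.5.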

10 Linear mapping
Objective function …, with matched sequences … and …
THEORY
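A linear mapping can be estimated from matched sequences of intermediate frames `X` and acoustic frames `Y`. This minimal sketch assumes the objective is the summed squared error of an affine map, `sum_t ||y_t - (W x_t + w)||^2`, and solves the normal equations in pure Python by Gauss-Jordan elimination; `fit_linear_mapping` is a hypothetical name.

```python
def fit_linear_mapping(X, Y):
    """Least-squares affine map y ~ W x + w from matched sequences X, Y.

    Minimises sum_t ||y_t - (W x_t + w)||^2 via the normal equations.
    Returns (W, w). Assumes the augmented Gram matrix is non-singular.
    """
    Xa = [list(x) + [1.0] for x in X]  # append 1 so the offset w is learned too
    p, q = len(Xa[0]), len(Y[0])
    # Normal equations: (Xa^T Xa) A = Xa^T Y, with A of shape p x q.
    G = [[sum(r[i] * r[j] for r in Xa) for j in range(p)] for i in range(p)]
    B = [[sum(r[i] * y[k] for r, y in zip(Xa, Y)) for k in range(q)] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on [G | B].
    M = [G[i] + B[i] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    A = [row[p:] for row in M]  # p x q solution
    W = [[A[i][k] for i in range(p - 1)] for k in range(q)]  # q x (p - 1)
    w = [A[p - 1][k] for k in range(q)]
    return W, w
```

For example, frames generated by `y1 = 2x + 1` and `y2 = -x` recover `W = [[2.0], [-1.0]]` and `w = [1.0, 0.0]` up to rounding.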

11 Trajectory parameters
Utterance probability …, and, for the optimal (ML) state sequence, …
THEORY
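Given a segmentation (for example, the ML state sequence), the trajectory parameters of each segment have a closed form in the simplest case. This sketch assumes an identity mapping and equal-variance Gaussian noise, so ML estimation reduces to a least-squares line fit with time measured from the segment midpoint; with a general mapping the estimate would also depend on `W`. A constant-trajectory model such as ID_0 corresponds to fixing `m = 0`. The function name is hypothetical.

```python
def fit_segment_trajectory(x):
    """ML linear-trajectory parameters (c, m) for one segment of frames x.

    Sketch assumptions: identity mapping, equal-variance Gaussian noise, so
    ML is ordinary least squares with midpoint-centred time.
    x: list of frames (lists of floats). Returns (c, m), one value per dimension.
    """
    T = len(x)
    t_mid = (T - 1) / 2.0
    taus = [t - t_mid for t in range(T)]
    s_tt = sum(tau * tau for tau in taus)  # zero for a single-frame segment
    dims = len(x[0])
    # Midpoint value: the segment mean.
    c = [sum(frame[d] for frame in x) / T for d in range(dims)]
    # Because sum(taus) == 0, the slope estimate decouples from c.
    m = [sum(tau * (frame[d] - c[d]) for tau, frame in zip(taus, x)) / s_tt if s_tt > 0 else 0.0
         for d in range(dims)]
    return c, m
```

Centring time at the midpoint is what makes the two estimates independent: the cross term between `c` and `m` vanishes because the centred times sum to zero.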

12 Non-linear (RBF) mapping
[Figure: formant trajectories mapped to the acoustic layer]
THEORY
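A Gaussian radial basis function network is one common form for such a non-linear mapping. The sketch below, with hypothetical parameter names, computes one forward pass from an intermediate vector (for example, formant values) to an acoustic vector. It assumes Gaussian basis functions with one width per centre and linear output weights plus a bias.

```python
import math


def rbf_map(f, centres, widths, weights, bias):
    """Map intermediate vector f to an acoustic vector with a Gaussian RBF network.

    centres: J centre vectors; widths: J scalar widths (sigma);
    weights: q x J output weights; bias: q output offsets.
    """
    # Basis activations: exp(-||f - mu||^2 / (2 sigma^2)).
    phi = [math.exp(-sum((fi - ci) ** 2 for fi, ci in zip(f, mu)) / (2.0 * s * s))
           for mu, s in zip(centres, widths)]
    # Linear output layer.
    return [sum(a * p for a, p in zip(row, phi)) + b for row, b in zip(weights, bias)]
```

Far from every centre all activations decay towards zero and the output falls back to the bias, which is one reason centres are usually placed to cover the range of the intermediate data.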

13 Trajectory parameters
With the RBF, the least-squares solution is sought by gradient descent: …
THEORY
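The gradient-descent step can be sketched in one dimension. Assuming a scalar Gaussian RBF network `h` and a midpoint-centred trajectory `f_t = c + m (t - t_mid)`, the code descends the squared error `sum_t (y_t - h(f_t))^2` with respect to `c` and `m`, using the analytic derivative of `h`. The learning rate, step count and function names are illustrative assumptions, not values from the presentation.

```python
import math


def rbf_scalar(f, centres, widths, weights, bias):
    """Scalar Gaussian RBF network: returns h(f) and dh/df."""
    h, dh = bias, 0.0
    for mu, s, a in zip(centres, widths, weights):
        phi = math.exp(-((f - mu) ** 2) / (2.0 * s * s))
        h += a * phi
        dh += a * phi * (-(f - mu) / (s * s))
    return h, dh


def fit_trajectory_rbf(y, centres, widths, weights, bias,
                       c=0.0, m=0.0, lr=0.1, steps=2000):
    """Fit f_t = c + m (t - t_mid) so that h(f_t) approximates y_t.

    Plain gradient descent on E(c, m) = sum_t (y_t - h(f_t))^2.
    Returns (c, m, E) after the final step.
    """
    T = len(y)
    taus = [t - (T - 1) / 2.0 for t in range(T)]

    def error_and_grad(c, m):
        err, g_c, g_m = 0.0, 0.0, 0.0
        for tau, y_t in zip(taus, y):
            h, dh = rbf_scalar(c + m * tau, centres, widths, weights, bias)
            e = y_t - h
            err += e * e
            g_c += -2.0 * e * dh        # dE/dc
            g_m += -2.0 * e * dh * tau  # dE/dm
        return err, g_c, g_m

    for _ in range(steps):
        _, g_c, g_m = error_and_grad(c, m)
        c, m = c - lr * g_c, m - lr * g_m
    return c, m, error_and_grad(c, m)[0]
```

Because `h` need not be monotonic, the result can depend on the starting values of `c` and `m`; where `dh/df` is zero at every frame the gradient vanishes and descent stalls.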

14 Tests on TIMIT
– North American English, at 8 kHz
– MFCC13 acoustic features (including the zeroth coefficient)
– intermediate representations:
   a) F1-3: formants F1, F2 and F3, estimated by the Holmes formant tracker
   b) F1-3+BE5: five band energies added
   c) PFS12: synthesiser control parameters
METHOD

15 TIMIT baseline performance
[Figure legend: constant-trajectory SHMM (ID_0); linear-trajectory SHMM (ID_1)]
RESULTS

16 Performance across feature sets RESULTS

17 Phone categorisation
Grouping | No. of classes | Description
A | 1 | all data
B | 2 | silence; speech
C | 6 | linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D | 10 | as Deng and Ma (2000): silence; vowel; liquid; nasal; unvoiced fricative; /s, ch/; voiced fricative; /z, jh/; unvoiced stop; voiced stop
E | 10 | discrete articulatory regions
F | 49 | silence; individual phones
METHOD
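Each grouping shares model parameters across the phones in a class, so training data has to be pooled per class before, for example, one mapping is estimated per class. This is a minimal sketch assuming matched intermediate/acoustic frames are available per phone segment; `pool_by_class` and the phone-to-class dictionary are hypothetical inputs (for grouping B, silence labels map to one class and all other phones to another; for grouping A, everything maps to a single class).

```python
from collections import defaultdict


def pool_by_class(segments, phone_to_class):
    """Pool matched (x, y) frames by phone class.

    segments: list of (phone, xs, ys), where xs and ys are matched lists of
        intermediate and acoustic frames for one phone segment.
    phone_to_class: dict from phone label to class name (hypothetical).
    Returns {class: (X, Y)} with all frames of that class concatenated.
    """
    pooled = defaultdict(lambda: ([], []))
    for phone, xs, ys in segments:
        X, Y = pooled[phone_to_class[phone]]
        X.extend(xs)
        Y.extend(ys)
    return dict(pooled)
```

Coarser groupings pool more data per class, giving better-trained but less specific parameters; finer groupings trade the other way.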

18 Discrete articulatory regions
Region | Features | Description
0 | -voice | Silence, non-speech
1 | +voice, VT open | Vowel, glide
2 | +voice, VT part. | Liquid, approximant
3 | +voice, VT closed, +velum | Nasal
4 | +voice, VT closed | Voiced plosive (closure)
5 | -voice, VT closed | Voiceless plosive (closure)
6 | +voice, VT open, +plosion | Voiced plosive (release)
7 | -voice, VT open, +plosion | Voiceless plosive (release)
8 | +voice, VT part., +fric/asp | Voiced fricative
9 | -voice, VT part., +fric/asp | Voiceless fricative
(VT: vocal tract)
METHOD
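The ten regions can be written as feature bundles, which makes the lookup from a set of features to a region explicit. This encoding (the table's feature strings stored as frozensets, with hypothetical names `REGIONS` and `region_of`) is an illustrative choice, not taken from the presentation.

```python
# Feature bundles for the ten discrete articulatory regions, copied from the table.
REGIONS = {
    0: frozenset({"-voice"}),
    1: frozenset({"+voice", "VT open"}),
    2: frozenset({"+voice", "VT part."}),
    3: frozenset({"+voice", "VT closed", "+velum"}),
    4: frozenset({"+voice", "VT closed"}),
    5: frozenset({"-voice", "VT closed"}),
    6: frozenset({"+voice", "VT open", "+plosion"}),
    7: frozenset({"-voice", "VT open", "+plosion"}),
    8: frozenset({"+voice", "VT part.", "+fric/asp"}),
    9: frozenset({"-voice", "VT part.", "+fric/asp"}),
}

# Inverse index: each bundle is distinct, so the lookup is unambiguous.
_REGION_BY_FEATURES = {feats: region for region, feats in REGIONS.items()}


def region_of(features):
    """Return the region number for an exact feature bundle, or None."""
    return _REGION_BY_FEATURES.get(frozenset(features))
```

Exact matching means, for example, that a voiced closure with a lowered velum (region 3) is kept apart from a voiced plosive closure (region 4).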

19 Performance across groupings RESULTS

20 Results across groupings RESULTS

21 Tests on MOCHA
– Southern British English, at 16 kHz
– MFCC13 acoustic features (including the zeroth coefficient)
– articulatory x- and y-coordinates from 7 EMA coils
– PCA9+Lx: first nine articulatory modes plus the laryngograph log energy
METHOD
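The PCA9+Lx representation can be sketched as follows, assuming each EMA frame holds the 14 coil coordinates (x and y for 7 coils) and that a positive per-frame laryngograph energy is already available. Principal components are found by power iteration with deflation on the covariance matrix, which keeps the example dependency-free; `pca_modes` and `pca9_lx` are hypothetical names.

```python
import math
import random


def pca_modes(frames, k, iters=200, seed=0):
    """Mean and top-k principal directions of frames (power iteration + deflation)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    centred = [[f[j] - mean[j] for j in range(d)] for f in frames]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    modes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            u = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in u)) or 1.0
            v = [x / norm for x in u]
        # Rayleigh quotient gives the eigenvalue for deflation.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        modes.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, modes


def pca9_lx(ema_frames, lx_energy, k=9):
    """Project EMA frames onto the first k modes and append log Lx energy."""
    mean, modes = pca_modes(ema_frames, k)
    feats = []
    for f, e in zip(ema_frames, lx_energy):
        proj = [sum((f[j] - mean[j]) * v[j] for j in range(len(f))) for v in modes]
        feats.append(proj + [math.log(e)])  # assumes e > 0
    return feats
```

Each output frame therefore has ten values: nine articulatory mode coefficients followed by the log energy.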

22 MOCHA baseline performance RESULTS

23 Performance across mappings RESULTS

24 Model visualisation
[Figure panels: original acoustic data; constant-trajectory model; linear-trajectory model, grouping (F), feature set PFS12 (c)]
DISCUSSION

25 Conclusions
– Theory of Multi-level Segmental HMMs
– Benefits of linear trajectories
– Results show near-optimal performance with linear mappings
– Progress towards unified models of the speech production process
What next?
– unsupervised (embedded) training, to derive pseudo-articulatory representations
– implement a non-linear mapping (i.e., RBF)
– include a biphone language model and segment duration models
SUMMARY
