Philip Jackson, Boon-Hooi Lo and Martin Russell
Electronic, Electrical and Computer Engineering
Models of speech dynamics for ASR, using intermediate linear representations
http://web.bham.ac.uk/p.jackson/balthasar/

Abstract INTRODUCTION

Speech dynamics into ASR INTRODUCTION
dynamics of speech production to constrain recognizer
– noisy environments
– conversational speech
– speaker adaptation
efficient, complete and trainable models
– for recognition
– for analysis
– for synthesis

Articulatory trajectories from West (2000) INTRODUCTION

Articulatory-trajectory model INTRODUCTION

Articulatory-trajectory model INTRODUCTION
[Figure labels: Level (finite-state, intermediate, surface); source dependent]

Multi-level Segmental HMM INTRODUCTION
segmental finite-state process
intermediate “articulatory” layer
– linear trajectories
mapping required
– linear transformation
– radial basis function network

Linear-trajectory model INTRODUCTION
[Figure: segmental HMM (states numbered 1 to 5); intermediate layer; articulatory-to-acoustic mapping; acoustic layer]

Linear-trajectory equations THEORY
Defined as [equation not preserved] where [definitions not preserved]
Segment probability: [equation not preserved]
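As a rough illustration of what a linear-trajectory segment model computes, the sketch below scores a segment of observations against a Gaussian whose mean follows a straight line, f(t) = c + m (t - t_mid). The parameterisation (midpoint c, slope m, a fixed diagonal variance) and the function names are assumptions made for this example; they are not taken from the slide, whose equations did not survive in this transcript.

```python
import math

def linear_trajectory(c, m, length):
    """Mean trajectory f(t) = c + m * (t - t_mid), evaluated for t = 0..length-1."""
    t_mid = (length - 1) / 2.0
    return [[ci + mi * (t - t_mid) for ci, mi in zip(c, m)] for t in range(length)]

def segment_log_prob(obs, c, m, var):
    """Log-probability of a segment under a diagonal Gaussian centred on the trajectory.

    obs: list of frames, each a list of feature values
    c, m: assumed midpoint and slope vectors (one entry per feature)
    var: assumed per-feature variance, constant over the segment
    """
    means = linear_trajectory(c, m, len(obs))
    total = 0.0
    for frame, mean in zip(obs, means):
        for y, mu, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (y - mu) ** 2 / v)
    return total

# A rising one-dimensional segment, scored with and without a slope term.
obs = [[0.0], [1.0], [2.0]]
lp_linear = segment_log_prob(obs, c=[1.0], m=[1.0], var=[1.0])
lp_const = segment_log_prob(obs, c=[1.0], m=[0.0], var=[1.0])
```

Setting m to zero recovers a constant-trajectory model, so `lp_const` is the score of the same segment with the slope removed; for this rising segment the linear trajectory scores higher.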

Linear mapping THEORY
Objective function [equation not preserved]
with matched sequences [symbol not preserved] and [symbol not preserved]
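One plausible reading of the objective is an ordinary least-squares fit between matched intermediate and acoustic frames. The sketch below assumes the objective is the sum over frames of ||y_t - W x_t||^2 with no bias term, and solves the normal equations directly; the function names and the absence of a bias are assumptions for illustration, not the presentation's stated formulation.

```python
def fit_linear_map(xs, ys):
    """Least-squares W minimising sum_t ||y_t - W x_t||^2 over matched frames."""
    n = len(xs[0])  # intermediate (articulatory) dimension
    k = len(ys[0])  # acoustic dimension
    # Normal equations: (sum_t x x^T) W^T = sum_t x y^T
    a = [[sum(x[i] * x[j] for x in xs) for j in range(n)] for i in range(n)]
    b = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(k)] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on the system [a | b].
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations; frames do not span the input space")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [av - f * pv for av, pv in zip(a[r], a[col])]
                b[r] = [bv - f * pv for bv, pv in zip(b[r], b[col])]
    w_t = [[b[i][j] / a[i][i] for j in range(k)] for i in range(n)]  # W^T, n x k
    return [[w_t[i][j] for i in range(n)] for j in range(k)]         # W, k x n

def apply_map(w, x):
    """Map one intermediate frame x to the acoustic space."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Synthetic check: generate acoustic frames from a known map, then recover it.
true_w = [[2.0, -1.0], [0.5, 3.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [apply_map(true_w, x) for x in xs]
w_hat = fit_linear_map(xs, ys)
```

With noise-free synthetic frames the recovered `w_hat` matches `true_w` up to rounding. Fitting one such map per phone category would correspond to the groupings compared later in the deck.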

Trajectory parameters THEORY
Utterance probability: [equation not preserved]
and, for the optimal (ML) state sequence: [equation not preserved]

Non-linear (RBF) mapping THEORY
[Figure: formant trajectories mapped to the acoustic layer]

Trajectory parameters THEORY
With the RBF, the least-squares solution is sought by gradient descent: [equation not preserved]
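To make the gradient-descent idea concrete, the sketch below fits the midpoint and slope of a one-dimensional linear trajectory so that its image under a fixed Gaussian RBF network matches a sequence of acoustic values in the least-squares sense. The network (centres, weights, width), the step size and the iteration count are all hypothetical values chosen for the example, and the one-dimensional setting is a simplification rather than the presentation's actual configuration.

```python
import math

CENTRES = [-1.0, 0.0, 1.0]  # hypothetical RBF centres
WEIGHTS = [-1.0, 0.0, 1.0]  # hypothetical output weights
WIDTH = 1.0                 # hypothetical shared kernel width

def rbf(x):
    """Fixed RBF mapping g(x) from the intermediate value to the acoustic value."""
    return sum(w * math.exp(-(x - mu) ** 2 / (2 * WIDTH ** 2))
               for w, mu in zip(WEIGHTS, CENTRES))

def rbf_grad(x):
    """Derivative dg/dx, needed to backpropagate the error to the trajectory."""
    return sum(-w * (x - mu) / WIDTH ** 2 * math.exp(-(x - mu) ** 2 / (2 * WIDTH ** 2))
               for w, mu in zip(WEIGHTS, CENTRES))

def squared_error(ys, c, m):
    """Sum of squared differences between targets and the mapped trajectory."""
    t_mid = (len(ys) - 1) / 2.0
    return sum((y - rbf(c + m * (t - t_mid))) ** 2 for t, y in enumerate(ys))

def fit_trajectory(ys, steps=500, rate=0.02):
    """Gradient descent on midpoint c and slope m of x(t) = c + m * (t - t_mid)."""
    c, m = 0.0, 0.0
    t_mid = (len(ys) - 1) / 2.0
    for _ in range(steps):
        grad_c = grad_m = 0.0
        for t, y in enumerate(ys):
            x = c + m * (t - t_mid)
            common = -2.0 * (y - rbf(x)) * rbf_grad(x)  # dE/dx at frame t
            grad_c += common
            grad_m += common * (t - t_mid)
        c -= rate * grad_c
        m -= rate * grad_m
    return c, m

# Targets generated from a known trajectory (c = 0.2, m = 0.1) over five frames.
ys = [rbf(0.2 + 0.1 * (t - 2.0)) for t in range(5)]
c_hat, m_hat = fit_trajectory(ys)
```

Unlike the linear mapping, this objective has no closed-form minimiser in c and m, which is why an iterative search is used; a fixed step size is the simplest choice and would need tuning on real data.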

Tests on TIMIT METHOD
N. American English, at 8 kHz
– MFCC13 acoustic features (incl. zeroth)
a) F1-3: formants F1, F2 and F3, estimated by Holmes formant tracker
b) F1-3+BE5: five band energies added
c) PFS12: synthesiser control parameters

TIMIT baseline performance RESULTS
Constant-trajectory SHMM (ID_0)
Linear-trajectory SHMM (ID_1)

Performance across feature sets RESULTS

Phone categorisation METHOD
Grouping  No. of categories  Description
A         1                  all data
B         2                  silence; speech
C         6                  linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D         10                 as Deng and Ma (2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E         10                 discrete articulatory regions
F         49                 silence; individual phones

Discrete articulatory regions METHOD
Region  Features                      Description
0       -voice                        Silence, non-speech
1       +voice, VT open               Vowel, glide
2       +voice, VT part.              Liquid, approximant
3       +voice, VT closed, +velum     Nasal
4       +voice, VT closed             Voiced plosive (closure)
5       -voice, VT closed             Voiceless plosive (closure)
6       +voice, VT open, +plosion     Voiced plosive (release)
7       -voice, VT open, +plosion     Voiceless plosive (release)
8       +voice, VT part., +fric/asp   Voiced fricative
9       -voice, VT part., +fric/asp   Voiceless fricative
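The ten regions can be held as a small lookup table, which makes it easy to select the regions that share a voicing and vocal-tract (VT) configuration. The sketch below transcribes the feature table as given; the `Region` type, its field names and the `regions_with` helper are hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Optional

class Region(NamedTuple):
    voiced: bool
    vocal_tract: Optional[str]  # "open", "partial", "closed", or None if not specified
    extra: Optional[str]        # additional feature such as "+velum", if any
    description: str

REGIONS = {
    0: Region(False, None, None, "Silence, non-speech"),
    1: Region(True, "open", None, "Vowel, glide"),
    2: Region(True, "partial", None, "Liquid, approximant"),
    3: Region(True, "closed", "+velum", "Nasal"),
    4: Region(True, "closed", None, "Voiced plosive (closure)"),
    5: Region(False, "closed", None, "Voiceless plosive (closure)"),
    6: Region(True, "open", "+plosion", "Voiced plosive (release)"),
    7: Region(False, "open", "+plosion", "Voiceless plosive (release)"),
    8: Region(True, "partial", "+fric/asp", "Voiced fricative"),
    9: Region(False, "partial", "+fric/asp", "Voiceless fricative"),
}

def regions_with(voiced, vocal_tract):
    """Region numbers matching a voicing value and a vocal-tract configuration."""
    return sorted(k for k, r in REGIONS.items()
                  if r.voiced == voiced and r.vocal_tract == vocal_tract)

voiced_closed = regions_with(True, "closed")
```

Here `voiced_closed` picks out the nasal and voiced-plosive closure regions, which differ only in the velum feature.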

Performance across groupings RESULTS

Results across groupings RESULTS

Tests on MOCHA METHOD
S. British English, at 16 kHz
– MFCC13 acoustic features (incl. zeroth)
– articulatory x- & y-coords from 7 EMA coils
– PCA9+Lx: first nine articulatory modes plus the laryngograph log energy

MOCHA baseline performance RESULTS

Performance across mappings RESULTS

Model visualisation DISCUSSION
[Figure panels: Original acoustic data; Constant-trajectory model; Linear-trajectory model, (F) PFS12 (c)]

Conclusions SUMMARY
Theory of Multi-level Segmental HMMs
Benefits of linear trajectories
Results show near-optimal performance with linear mappings
Progress towards unified models of the speech production process
What next?
– unsupervised (embedded) training, to derive pseudo-articulatory representations
– implement non-linear mapping (i.e., RBF)
– include biphone language model, and segment duration models
