Improving Speech Modelling Viktoria Maier Supervised by Prof. Hynek Hermansky.

Outline
1. State-of-the-art
2. Modelling phoneme duration
3. Suggestions of human perception results for speech modelling
4. Conclusion

1. Current HMM state-of-the-art Technology

State-of-the-art: Overview Speech Modelling

1. Feature Extraction
- For speech recognition: extract features that enable us to discriminate between different classes (phonemes).
- The more discriminative the features, the easier classification becomes.
- Usually, features describing the frequency content of each frame are extracted (MFCCs).
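The MFCC step mentioned above can be sketched as follows. This is a minimal illustration, not the front end used in the talk: the Hamming window, the 20 mel filters, the 12 cepstral coefficients, the 16 kHz sampling rate and the 25 ms frame are all assumptions chosen for the example, and the DFT is written naively for readability.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame, then take a naive DFT (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bins
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_ceps=12):
    """Log mel-filterbank energies followed by a DCT-II (c0 is dropped)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, n_ceps + 1)]


# Example: one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2.0 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(400)]
features = mfcc(frame, SAMPLE_RATE)
```

Each frame of the signal is reduced to a short vector (`features`), which is what the acoustic model then has to classify.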

2. Speech Modelling
- Usually uses Hidden Markov Models (HMMs)
- Characteristics:
  - Number of states
  - Transition probabilities
  - Model to estimate emission likelihoods (GMMs) or posterior probabilities (ANNs)
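To make the first two characteristics concrete, here is a small sketch of a left-to-right phoneme HMM without skips. The function names and the self-loop probability of 0.5 are illustrative assumptions, not values from the talk.

```python
def left_to_right_transitions(n_states, self_loop=0.5):
    """Transition matrix of a left-to-right HMM without skips.

    Rows are the emitting states; the extra last column is the exit.
    Each state either repeats (self_loop) or moves on to the next state.
    """
    matrix = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for i in range(n_states):
        matrix[i][i] = self_loop
        matrix[i][i + 1] = 1.0 - self_loop
    return matrix


def expected_duration_frames(n_states, self_loop=0.5):
    """Each state lasts 1 / (1 - self_loop) frames on average."""
    return n_states / (1.0 - self_loop)


transitions = left_to_right_transitions(6)
```

With 6 states and a self-loop probability of 0.5, the model spends 12 frames on a phoneme on average, and never fewer than 6.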

2. Modelling Phoneme duration

Problem (1)
- In reality, phonemes have different durations.
- If the minimum duration of the model is longer than the phoneme, some states have to model context.

Problem (2)
Generally, fewer states give worse performance.
System                     %WER
Baseline TIMIT (S6G32p4)   39.88
TIMIT S4G32p-3             43.21
TIMIT S4G64p-3             41.76
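For readers unfamiliar with the %WER column, the usual definition counts substitutions, deletions and insertions against the reference via an edit-distance alignment. The sketch below follows that standard definition; the slides do not say which scoring tool was used, and the phoneme strings in the example are made up.

```python
def word_error_rate(reference, hypothesis):
    """%WER = 100 * (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must not be empty")
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))  # distances for the empty reference prefix
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[m] / n


# Hypothetical example: one substitution and one insertion in five symbols.
ref = "sil k ae t sil".split()
hyp = "sil k ah t t sil".split()
example_wer = word_error_rate(ref, hyp)
```

Because insertions count as errors, the rate depends on how strongly the decoder is discouraged from inserting extra symbols.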

Possible solutions
Hypothesis: choose HMMs with a shorter minimum duration for shorter phonemes (prior knowledge)
1. Other topology (jump states)
2. Fewer states
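Both options shorten the minimum duration of a model, which can be checked with a short calculation. The sketch assumes that every visited state consumes one frame and that a "jump" transition may skip ahead by up to `max_jump` states; the exact jump topology used in the talk is not given, so this is only one plausible reading.

```python
def min_duration_frames(n_states, max_jump=1):
    """Fewest frames needed to pass from entry to exit.

    Position -1 is the non-emitting entry and position n_states is the
    non-emitting exit. A transition may advance by 1..max_jump positions
    (max_jump=1 is the plain left-to-right model).
    """
    cost = {-1: 0}
    for j in range(n_states + 1):
        best_prev = min(cost[i] for i in range(max(-1, j - max_jump), j))
        emits = 1 if j < n_states else 0  # the exit consumes no frame
        cost[j] = best_prev + emits
    return cost[n_states]


six_state = min_duration_frames(6)              # plain 6-state model
four_state = min_duration_frames(4)             # option 2: fewer states
six_state_jump = min_duration_frames(6, 2)      # option 1: jump states
```

Under these assumptions the plain 6-state model needs at least 6 frames, a 4-state model needs 4, and a 6-state model that may skip one state at a time needs only 3.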

Test setup – TIMIT
- TIMIT database: 8 dialect regions, 630 speakers
- Without the "sa" dialect utterances
- 3693 sentences for training
- 1344 sentences for testing
- Number of model parameters is constant (fewer states => more Gaussians per state)
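The trade-off in the last point can be written out as simple arithmetic. The sketch reads a label such as S6G32 as "6 states with 32 Gaussians each", which is an interpretation of the labels rather than something the slides state, and it assumes diagonal-covariance Gaussians over a 39-dimensional feature vector purely for illustration.

```python
def gaussians_per_phoneme(n_states, n_mixtures):
    return n_states * n_mixtures


def gmm_parameters(n_states, n_mixtures, feature_dim=39):
    """Diagonal covariance: one mean and one variance per dimension, plus a weight."""
    return n_states * n_mixtures * (2 * feature_dim + 1)


def matching_mixtures(ref_states, ref_mixtures, new_states):
    """Gaussians per state that keep the total equal to the reference model."""
    return ref_states * ref_mixtures / new_states


reference_total = gaussians_per_phoneme(6, 32)
needed_for_four_states = matching_mixtures(6, 32, 4)
```

Under that reading, a 4-state model needs 48 Gaussians per state to match a 6-state, 32-Gaussian model exactly; the 4-state configurations reported in the results use 32 and 64.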

Modelling phoneme duration: Results (1)
System                                               %WER
Baseline TIMIT (S6G32p4)                             39.88
TIMIT var. no. of states: S6G32-S4G32p1              40.82
TIMIT var. no. of states: S6G32-S4G64 NO P           39.50
TIMIT jump model for all phonemes (S6G32p2)          41.61
TIMIT var. topology: jump model (S6G32p1)            39.90
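To compare the systems against the baseline, the differences can be expressed as relative changes in %WER. The numbers below are copied from the results; the short dictionary keys are shorthand introduced here.

```python
BASELINE_WER = 39.88

RESULTS = {
    "var. states S6G32-S4G32p1": 40.82,
    "var. states S6G32-S4G64 NO P": 39.50,
    "jump model, all phonemes": 41.61,
    "var. topology, jump model": 39.90,
}


def relative_change(baseline, system):
    """Relative change in percent; negative values are improvements."""
    return 100.0 * (system - baseline) / baseline


changes = {name: relative_change(BASELINE_WER, wer) for name, wer in RESULTS.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f}% relative to baseline")
```

Only the variable-state system with more Gaussians in the shorter models comes out below the baseline, by a little under 1% relative.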

Modelling phoneme duration: Results analysis
- If the number of states is decreased, more Gaussians per state are necessary to keep model complexity comparable.
- The insertion penalty is less important.
- Decreasing the model's minimum duration for short phonemes helps correct recognition.
- Better results with a variable number of states.

3. Suggestions of human perception results for speech modelling

Human perception tests: Motivation
- Speech is created to be perceived by humans.
- We know that human performance is very good and robust.
- Simulating human perception may lead to improvements.
- Testing on nonsense phoneme sequences (no language model) isolates the "acoustic model".

Human stop-consonant perception (1)
- Tested stop-consonant perception: identical noise burst in variable context
- Liberman, Cooper, Delattre in 1952
- Valid test results?
- Implications for state-of-the-art technology

Human stop-consonant perception: Test setup
- Synthetic sounds (Matlab generated)
- 40 test persons, 2 tests each: 17 English, 11 French, 12 others
- Tests on different days
- Headphones: Technics RP-F880
- Quiet room

Human stop-consonant perception: Test setup (2)
- 12 different noise burst frequencies
- 7 different two-formant vowels
- No transitions

Human stop-consonant perception: Selected results
[Figure panels: in front of /a/; in front of /o/]

Suggestions of human perception results for speech modelling
- Suggests that speech data has to be analyzed in some context => consistent with the well-known result that context-dependent phoneme models improve performance
- Suggests the necessity of using multiple Gaussians per state

4. Conclusions

Conclusions (1)
- Performance can be improved by introducing variable-state HMMs.
- A context-independent phoneme model is inadequate with short-term spectral features.

Conclusions (2)
- New features (such as TRAPS) to better capture relative dependencies?
- Preference for context-dependent phoneme models with multiple Gaussians

Thank you!

Human stop-consonant perception: Results 2004, EN vs. FR
