Improving Speech Modelling Viktoria Maier Supervised by Prof. Hynek Hermansky.


1 Improving Speech Modelling Viktoria Maier Supervised by Prof. Hynek Hermansky

2 Outline
1. State-of-the-art
2. Modelling phoneme duration
3. Suggestions of human perception results for speech modelling
4. Conclusion
I of IV

3 1. Current HMM state-of-the-art Technology

4 State-of-the-art: Overview Speech Modelling

5 1. Feature Extraction
- For speech recognition: extract features that enable us to discriminate between different classes (phonemes)
- The more discriminative the features, the easier the classification
- Usually the frequency content of each frame is extracted (MFCC, mel-frequency cepstral coefficients)
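As a rough illustration of the MFCC front end named on this slide, the sketch below shows only the mel-scale warping that places the filterbank centres. It assumes the common 2595*log10(1 + f/700) variant of the mel formula; the function names, the 8 kHz upper edge and the filter count are illustrative choices, not values from the presentation.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel (common 2595*log10 variant; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low=0.0, f_high=8000.0):
    """Centre frequencies (Hz) of filters equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical filterbank with 10 filters up to 8 kHz
centers = mel_center_frequencies(n_filters=10)
```

The centres are densely packed at low frequencies and spread out at high frequencies, which is the perceptual warping that MFCCs build on.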

6 2. Speech Modelling
- Usually uses Hidden Markov Models (HMMs)
- Characteristics:
  - Number of states
  - Transition probabilities
  - Model to estimate emission likelihoods (GMMs) or posterior probabilities (ANNs)
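To make the three characteristics concrete, here is a minimal sketch of the forward algorithm for a small left-to-right HMM. The per-frame emission likelihoods are given directly as numbers; in a real system they would come from GMMs or from ANN outputs. All values and names are hypothetical.

```python
def forward_likelihood(initial, transitions, emissions):
    """Total likelihood of an observation sequence under an HMM.

    initial[i]        : probability of starting in state i
    transitions[i][j] : probability of moving from state i to state j
    emissions[t][i]   : likelihood of frame t under state i
    """
    n_states = len(initial)
    alpha = [initial[i] * emissions[0][i] for i in range(n_states)]
    for frame in emissions[1:]:
        alpha = [
            sum(alpha[i] * transitions[i][j] for i in range(n_states)) * frame[j]
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical 3-state left-to-right model (self-loop or step to the next state)
initial = [1.0, 0.0, 0.0]
transitions = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# Hypothetical emission likelihoods for three frames
emissions = [
    [0.9, 0.1, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.3, 0.7],
]
likelihood = forward_likelihood(initial, transitions, emissions)
```

This sketch sums over all end states; a decoder would typically require the path to finish in the last state and would work in the log domain to avoid underflow.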

7 2. Modelling Phoneme duration

8 Problem (1)
- In reality, phonemes have different durations
- If the minimum duration of the model is longer than the phoneme, some states have to model context
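The problem can be put in numbers with a small sketch. It assumes a strictly left-to-right HMM without skip transitions, so every state consumes at least one frame, and a 10 ms frame shift; neither the frame shift nor the 30 ms example duration comes from the slides.

```python
FRAME_SHIFT_MS = 10.0  # assumed frame shift; not stated on the slides

def minimum_duration_ms(n_states, frame_shift_ms=FRAME_SHIFT_MS):
    """Shortest segment a left-to-right HMM without skips can account for."""
    return n_states * frame_shift_ms

def states_modelling_context(phoneme_ms, n_states, frame_shift_ms=FRAME_SHIFT_MS):
    """Number of states forced to absorb frames from neighbouring phonemes."""
    phoneme_frames = int(phoneme_ms // frame_shift_ms)
    return max(0, n_states - phoneme_frames)

# Hypothetical 30 ms phoneme modelled with 6 versus 4 states
spill_6 = states_modelling_context(30.0, n_states=6)
spill_4 = states_modelling_context(30.0, n_states=4)
```

Under these assumptions, the 6-state model needs at least 60 ms, so half of its states would have to be aligned to surrounding context for a 30 ms phoneme.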

9 Problems (2)
- Generally, the fewer the states, the worse the performance
System                      %WER
Baseline TIMIT (S6G32p4)    39.88
TIMIT S4G32p-3              43.21
TIMIT S4G64p-3              41.76
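For reference, an error rate of the kind reported as %WER is usually computed from the edit distance between the reference and the recognised token sequence. The sketch below is a generic implementation, not the scoring tool used for these results; the token sequences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Percent error rate: (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    # previous[j] = edit distance between the processed reference prefix and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        current = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)

# Hypothetical phoneme strings: one substitution (b -> p) and one deletion (er)
reference = "sil b ah t er sil".split()
hypothesis = "sil p ah t sil".split()
wer = word_error_rate(reference, hypothesis)
```

The same computation applies whether the tokens are words or phonemes.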

10 Possible solutions
- Hypothesis: choose HMMs with a shorter minimum duration for shorter phonemes (prior knowledge)
1. Other topology (jump states)
2. Fewer states
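Both options shorten the minimum duration, which a short sketch can verify. It assumes a left-to-right topology in which a "jump" arc skips exactly one state; the exact jump topology used in the experiments is not given on the slides, so this structure is an assumption.

```python
from collections import deque

def left_to_right_arcs(n_states, allow_skip=False):
    """Allowed transitions of a left-to-right HMM; index n_states is the exit."""
    arcs = {}
    for state in range(n_states):
        targets = [state, state + 1]          # self-loop and next state
        if allow_skip and state + 2 <= n_states:
            targets.append(state + 2)         # jump over one state
        arcs[state] = targets
    return arcs

def minimum_frames(arcs, n_states):
    """Fewest emitting states visited on any path from state 0 to the exit."""
    queue = deque([(0, 1)])                   # (state, frames consumed so far)
    seen = {0}
    while queue:
        state, frames = queue.popleft()
        for target in arcs[state]:
            if target == n_states:
                return frames
            if target not in seen:
                seen.add(target)
                queue.append((target, frames + 1))
    raise ValueError("exit state is unreachable")

linear_6 = minimum_frames(left_to_right_arcs(6), 6)
jump_6 = minimum_frames(left_to_right_arcs(6, allow_skip=True), 6)
linear_4 = minimum_frames(left_to_right_arcs(4), 4)
```

With these assumptions, jump arcs halve the minimum number of frames of a 6-state model, while removing two states reduces it from 6 to 4.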

11 Test setup – TIMIT
- TIMIT database
- 8 dialect regions, 630 speakers
- Without the dialect ("sa") utterances
- 3693 sentences for training
- 1344 sentences for testing
- Number of model parameters is constant (fewer states => more Gaussians per state)
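The trade-off in the last point can be illustrated with a parameter count. The sketch assumes diagonal-covariance GMMs and a 39-dimensional feature vector, neither of which is stated on the slides, and it is not an attempt to decode the configuration labels used in the result tables.

```python
def gmm_hmm_parameters(n_states, n_gaussians, feature_dim):
    """Emission parameters of an HMM with diagonal-covariance GMM states.

    Each Gaussian has a mean and a variance per dimension plus one mixture weight.
    """
    per_gaussian = 2 * feature_dim + 1
    return n_states * n_gaussians * per_gaussian

def gaussians_for_budget(budget, n_states, feature_dim):
    """Largest number of Gaussians per state that stays within the budget."""
    return budget // (n_states * (2 * feature_dim + 1))

FEATURE_DIM = 39  # assumed, e.g. 13 MFCCs with deltas and delta-deltas

# Hypothetical reference model: 6 states with 32 Gaussians each
budget = gmm_hmm_parameters(n_states=6, n_gaussians=32, feature_dim=FEATURE_DIM)
gaussians_4_states = gaussians_for_budget(budget, n_states=4, feature_dim=FEATURE_DIM)
```

Transition probabilities are ignored here because they contribute only a handful of parameters per model.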

12 Modelling phoneme duration: Results (1)
System                                          %WER
Baseline TIMIT (S6G32p4)                        39.88
TIMIT var. no. of states: S6G32-S4G32p1         40.82
TIMIT var. no. of states: S6G32-S4G64 NO P      39.50
TIMIT jump model for all phonemes (S6G32p2)     41.61
TIMIT var. topology: jump model (S6G32p1)       39.90

13 Modelling phoneme duration: Results analysis
- If the number of states is decreased, an increase in Gaussians per state is necessary to ensure comparable model complexity
- Insertion penalty is less important
- Decreasing the model's minimum duration for short phonemes helps correct recognition
- Better results with a variable number of states

14 3. Suggestions of human perception results for speech modelling

15 Human perception tests: Motivation
- Speech is created to be perceived by humans
- We know that human performance is very good and robust
- Simulating human perception may lead to improvements
- Testing on nonsense phoneme sequences (no language model) to isolate the "acoustic model"

16 Human stop-consonant perception (1)
- Tested stop-consonant perception: identical noise burst in variable context
- Liberman, Cooper, Delattre in 1952
- Valid test results?
- Implications for state-of-the-art technology

17 Human stop-consonant perception: Test setup
- Synthetic sounds (Matlab generated)
- 40 test persons, 2 tests each
  - 17 English
  - 11 French
  - 12 others
- Tests on different days
- Headphones: Technics RP-F880
- Quiet room

18 Human stop-consonant perception: Test setup (2)
- 12 different noise burst frequencies
- 7 different two-formant vowels
- No transitions
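The stimulus design amounts to a 12 x 7 grid of burst-plus-vowel sounds. The sketch below, in Python rather than the Matlab used for the actual stimuli, shows one crude way such a grid could be generated. The slides give only the counts, so the burst frequencies, formant values, durations, sample rate and vowel labels are all placeholders.

```python
import math
import random
from itertools import product

SAMPLE_RATE = 16000  # Hz, assumed

# Placeholder values: the slides state only 12 bursts and 7 vowels.
BURST_FREQS_HZ = [360.0 * k for k in range(1, 13)]
VOWEL_FORMANTS_HZ = {                 # hypothetical (F1, F2) pairs
    "i": (270.0, 2300.0), "e": (400.0, 2000.0), "E": (550.0, 1800.0),
    "a": (750.0, 1300.0), "O": (550.0, 900.0), "o": (400.0, 750.0),
    "u": (270.0, 700.0),
}

def noise_burst(center_hz, rng, bandwidth_hz=300.0, n_components=20, duration_s=0.015):
    """Band-limited burst: random-phase sinusoids spread around center_hz."""
    n = round(duration_s * SAMPLE_RATE)
    freqs = [center_hz + bandwidth_hz * (rng.random() - 0.5) for _ in range(n_components)]
    phases = [2 * math.pi * rng.random() for _ in range(n_components)]
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE + p)
                for f, p in zip(freqs, phases)) / n_components for t in range(n)]

def two_formant_vowel(f1, f2, duration_s=0.3):
    """Steady-state vowel approximated by two sinusoids (no formant transitions)."""
    n = round(duration_s * SAMPLE_RATE)
    return [0.5 * math.sin(2 * math.pi * f1 * t / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * f2 * t / SAMPLE_RATE) for t in range(n)]

def make_stimulus(burst_hz, vowel, rng):
    """Burst followed directly by the vowel, with no transition in between."""
    f1, f2 = VOWEL_FORMANTS_HZ[vowel]
    return noise_burst(burst_hz, rng) + two_formant_vowel(f1, f2)

stimuli = list(product(BURST_FREQS_HZ, VOWEL_FORMANTS_HZ))  # (burst, vowel) pairs
example = make_stimulus(1440.0, "a", random.Random(0))
```

A real two-formant synthesiser would shape resonances rather than sum pure tones; the point here is only the structure of the test set, in which the same burst is paired with every vowel.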

19 Human stop-consonant perception: Selected results
[Figure panels: in front of /a/; in front of /o/]

20 Suggestions of human perception results for speech modelling
- Suggests that speech data has to be analyzed in some context => consistent with the well-known result that context-dependent phoneme models improve performance
- Suggests the necessity of using multiple Gaussians per state
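The second point can be illustrated with a toy example: if the acoustic realisation of a phoneme depends on its context, the feature distribution of a context-independent state becomes multimodal, and a single Gaussian fits it poorly. The one-dimensional data and mixture parameters below are invented for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmm_log_likelihood(samples, weights, means, variances):
    """Total log-likelihood of 1-D samples under a Gaussian mixture."""
    total = 0.0
    for x in samples:
        total += math.log(sum(w * gaussian_pdf(x, m, v)
                              for w, m, v in zip(weights, means, variances)))
    return total

# Hypothetical 1-D feature of one phoneme observed in two different contexts
samples = [-2.1, -1.9, -2.0, -1.8, -2.2, 1.9, 2.0, 2.1, 1.8, 2.2]

# Single Gaussian fitted by maximum likelihood
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
single = gmm_log_likelihood(samples, [1.0], [mean], [var])

# Two-component mixture with one component per context (hand-set parameters)
mixture = gmm_log_likelihood(samples, [0.5, 0.5], [-2.0, 2.0], [0.02, 0.02])
```

On this data the mixture reaches a much higher log-likelihood than the single Gaussian, which places most of its mass between the two clusters where no data lies.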

21 4. Conclusions

22 Conclusions (1)
- Performance can be improved by introducing variable-state HMMs
- A context-independent phoneme model is inadequate with short-term spectral features

23 Conclusions (2)
- New features (such as TRAPS) to better capture relative dependencies?
- Preference for context-dependent phoneme models with multiple Gaussians
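As a loose sketch of the idea behind TRAPS-style features, the code below extracts a long temporal trajectory of one frequency band around a centre frame instead of a short-term spectral slice. The 101-frame window (about one second at an assumed 10 ms frame shift), the mean normalisation and the input layout are assumptions for illustration, not details from the presentation.

```python
def temporal_pattern(band_energies, band, center, half_width=50):
    """Mean-normalised trajectory of one band around a centre frame.

    band_energies[t][b] is the (log) energy of band b in frame t.
    Frames beyond the signal edges are filled by repeating the edge frame.
    """
    last = len(band_energies) - 1
    trajectory = [
        band_energies[min(max(t, 0), last)][band]
        for t in range(center - half_width, center + half_width + 1)
    ]
    mean = sum(trajectory) / len(trajectory)
    return [value - mean for value in trajectory]

# Hypothetical input: 200 frames with 3 bands each
band_energies = [[float(t % 7), float(t % 5), 1.0] for t in range(200)]
trap = temporal_pattern(band_energies, band=0, center=100)
```

Each band would get its own trajectory and, in a TRAPS-like system, its own classifier whose outputs are merged afterwards.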

24 Thank you!

25 Human stop-consonant perception: Results 2004 EN vs. FR

