1 Acoustic / Lexical Model Derk Geene

2 Speech recognition
- P(words|signal) = P(signal|words) P(words) / P(signal)
- P(signal|words): acoustic model
- P(words): language model
- Idea: maximize P(signal|words) P(words)
- Today: acoustic model
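Because P(signal) is the same for every candidate word sequence, it can be dropped when searching for the best one. The following is a minimal sketch of that decision rule in log space; the candidate sentences and all probability values are hypothetical and chosen only for illustration.

```python
import math

def best_hypothesis(acoustic_logp, language_logp):
    """Return the word sequence maximizing log P(signal|words) + log P(words)."""
    return max(acoustic_logp, key=lambda w: acoustic_logp[w] + language_logp[w])

# Hypothetical scores for one utterance (illustrative values, not real model output).
acoustic_logp = {
    "recognize speech": math.log(0.0020),    # P(signal|words)
    "wreck a nice beach": math.log(0.0025),
}
language_logp = {
    "recognize speech": math.log(0.010),     # P(words)
    "wreck a nice beach": math.log(0.001),
}

best = best_hypothesis(acoustic_logp, language_logp)
print(best)
```

In this made-up case the acoustic model slightly prefers the second candidate, but the language model outweighs it, so the first candidate wins.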

3 Variability
- Variation
  - Speaker
  - Pronunciation
  - Environmental
  - Context
- A static acoustic model will not work in real applications.
- Dynamically adapt P(signal|words) while using the system.

4 Measuring errors (1)
- 500 sentences of 6 to 10 words each from 5 to 10 different speakers.
- 10% relative error reduction
- Training set / development set
- First decide the optimal parameter settings.
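The 10% figure is a relative reduction, not an absolute one. A small sketch of the arithmetic, with hypothetical baseline and new error rates:

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error that the new system removes."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical numbers: a baseline at 20% error and a new system at 18% error.
reduction = relative_error_reduction(0.20, 0.18)
print(f"{100 * reduction:.1f}% relative reduction")
```

Here the absolute difference is 2 percentage points, while the relative reduction is 10%.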

5 Measuring errors (2)
- Word recognition errors:
  - Substitution
  - Deletion
  - Insertion
Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one?
Recognized: Did mob mission area ** the copy land ever go to m4 in nineteen east one?

6 Measuring errors (3)
Correct: The effect is clear
Recognized: Effect is not clear
- Error rate, comparing one by one: 75%
- Word error rate:
  Word error rate = 100% x (Subs + Dels + Ins) / (#words in correct sentence)
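The difference between a one-by-one comparison and the word error rate can be sketched in code. The version below assumes the usual minimum-edit-distance (Levenshtein) alignment to count substitutions, deletions, and insertions; the function names are illustrative, and case is ignored when comparing words.

```python
def word_errors(reference, hypothesis):
    """Return (minimum Subs + Dels + Ins, number of reference words)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dist[i][j]: fewest edits that turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1], len(ref)

def word_error_rate(reference, hypothesis):
    """100% x (Subs + Dels + Ins) / (#words in correct sentence)."""
    errors, n_ref = word_errors(reference, hypothesis)
    return 100.0 * errors / n_ref

def one_by_one_error_rate(reference, hypothesis):
    """Compare words position by position, without any alignment."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    mismatches = sum(r != h for r, h in zip(ref, hyp)) + abs(len(ref) - len(hyp))
    return 100.0 * mismatches / len(ref)

correct = "The effect is clear"
recognized = "Effect is not clear"
print(one_by_one_error_rate(correct, recognized))  # 75.0
print(word_error_rate(correct, recognized))        # 50.0
```

With alignment, the example needs only one deletion ("The") and one insertion ("not"), so two errors over four reference words give 50% instead of the 75% from the position-by-position comparison.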

7 Units of speech (1)
- Modeling is language dependent.
- Modeling unit
  - Accurate
  - Trainable
  - Generalizable

8 Units of speech (2)
- Whole-word models
  - Only suitable for small-vocabulary recognition
- Phone models
  - Suitable for large-vocabulary recognition
  - Problem: over-generalize -> less accurate
- Syllable models

9 Context dependency (1)
- Recognition accuracy can be improved by using context-dependent parameters.
- Important in fast / spontaneous speech.
- Example: the phoneme /ee/

10
- Peat
- Wheel

11 Context dependency (2)
- Triphone model: a phonetic model that takes into consideration both the left and the right neighbouring phones.
- If two phones have the same identity but different left or right contexts, they are considered different triphones.
- Interword context-dependent phones.
- Place in the word:
  - Beginning
  - Middle
  - End
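A minimal sketch of expanding a phone sequence into triphones. The `left-phone+right` label format, the `sil` boundary symbol, and the phone spellings for "peat" and "wheel" are assumptions made for illustration, not something the slides specify.

```python
def to_triphones(phones, boundary="sil"):
    """Label every phone with its left and right neighbour as left-phone+right."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

# Assumed phone spellings; "iy" stands for the /ee/ vowel.
peat = to_triphones(["p", "iy", "t"])
wheel = to_triphones(["w", "iy", "l"])
print(peat)   # ['sil-p+iy', 'p-iy+t', 'iy-t+sil']
print(wheel)  # ['sil-w+iy', 'w-iy+l', 'iy-l+sil']
```

The same vowel appears as `p-iy+t` in one word and `w-iy+l` in the other, so a triphone model treats the two as different units.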

12 Context dependency (3)
- Stress
  - Longer duration
  - Higher pitch
  - More intensity
- Word-level stress
  - Import – Import
  - Italy – Italian
- Sentence-level stress
  - I did have dinner.

13
- Radio

14 Context dependency (4)
- Very many triphones: 50^3 = 125,000
- Many phonemes have the same effects
  - /b/ & /p/: labial (pronounced using the lips)
  - /r/ & /w/: liquids
- Clustered acoustic-phonetic units
  - Is the left-context phone a fricative?
  - Is the right-context phone a front vowel?
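The effect of clustering can be sketched by replacing each context phone with a broad phonetic class, so that contexts with the same effect share one unit. This is a simplification: the class table below is a small hypothetical excerpt, and practical systems typically choose such groupings with data-driven decision trees built from questions like the ones on the slide.

```python
N_PHONES = 50
print(N_PHONES ** 3)  # 125000 possible triphones

# Hypothetical excerpt of a phone-to-class table (illustration only).
PHONE_CLASS = {
    "b": "labial", "p": "labial",
    "r": "liquid", "w": "liquid",
    "s": "fricative", "f": "fricative",
}

def clustered_unit(left, phone, right):
    """Map the left and right context to a broad class; keep the centre phone."""
    return (PHONE_CLASS.get(left, left), phone, PHONE_CLASS.get(right, right))

print(clustered_unit("b", "iy", "t"))  # ('labial', 'iy', 't')
print(clustered_unit("p", "iy", "t"))  # ('labial', 'iy', 't')
```

Both triphones collapse into one clustered unit, so they can share parameters and training data.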

15 Acoustic model
- After feature extraction, we have a sequence of feature vectors, such as the MFCC vector, as input data.
- Feature stream -> Phonemes / units: segmentation and labeling
- Phonemes / units -> Words: lexical access problem

16 Acoustic model
- Signal -> Phonemes
- Problem: phonemes can be pronounced differently
  - Speaker differences
  - Speaking rate
  - Microphone

17 Acoustic model
- Phonemes -> Words
- The three major ways to do this:
  - Vector Quantization
  - Hidden Markov Models
  - Neural Networks

18 Acoustic model
- Problem: multiple pronunciations
  - Dialect variation
  - Coarticulation
[Figure: pronunciation network over the phones t, ow / ax, m, ey / aa, t, ow, with branch probabilities 0.5 / 0.5 and 0.8 / 0.2]
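A pronunciation network of this kind can be sketched as a sequence of positions, each offering one or more phones with a probability. The phone labels and the probability values come from the slide, but which probability belongs to which branch is an assumption here, as is the reading that the network spells a single word with two variable vowels.

```python
from itertools import product

# One list per position: (phone, probability). Branch assignment is assumed.
network = [
    [("t", 1.0)],
    [("ow", 0.2), ("ax", 0.8)],
    [("m", 1.0)],
    [("ey", 0.5), ("aa", 0.5)],
    [("t", 1.0)],
    [("ow", 1.0)],
]

def pronunciations(network):
    """Enumerate every path through the network with its probability."""
    result = {}
    for path in product(*network):
        phones = " ".join(phone for phone, _ in path)
        prob = 1.0
        for _, p in path:
            prob *= p
        result[phones] = prob
    return result

prons = pronunciations(network)
for phones, prob in sorted(prons.items(), key=lambda item: -item[1]):
    print(f"{prob:.2f}  {phones}")
```

Two binary choices give four pronunciations whose probabilities sum to one; the lexicon can then score each variant against the recognized phone sequence.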

19

20 The End

