Acoustic / Lexical Model. Derk Geene.

Speech recognition
- P(words | signal) = P(signal | words) P(words) / P(signal)
- P(signal | words): acoustic model
- P(words): language model
- Idea: maximize P(signal | words) P(words). P(signal) does not depend on the words, so it can be ignored when comparing hypotheses.
- Today: acoustic model
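The maximization on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the candidate transcriptions and their probabilities are invented for the example, and the function name `best_hypothesis` is hypothetical. Scores are summed in the log domain, which is equivalent to multiplying the probabilities.

```python
import math


def best_hypothesis(acoustic_logp, language_logp):
    """Pick the word sequence maximizing log P(signal|words) + log P(words)."""
    scores = {
        words: acoustic_logp[words] + language_logp[words]
        for words in acoustic_logp
    }
    return max(scores, key=scores.get), scores


# Hypothetical values for two competing transcriptions of the same signal.
acoustic_logp = {
    "recognize speech": math.log(0.0010),    # assumed P(signal | words)
    "wreck a nice beach": math.log(0.0012),
}
language_logp = {
    "recognize speech": math.log(0.0200),    # assumed P(words)
    "wreck a nice beach": math.log(0.0001),
}

best, scores = best_hypothesis(acoustic_logp, language_logp)
print(best)
```

With these assumed numbers the acoustic model slightly prefers the second candidate, but the language model outweighs it, so the combined score selects the first.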

Variability
- Sources of variation:
  - Speaker
  - Pronunciation
  - Environment
  - Context
- A static acoustic model will not work in real applications.
- Dynamically adapt P(signal | words) while the system is in use.

Measuring errors (1)
- 500 sentences of 6 to 10 words each, from 5 to 10 different speakers.
- 10% relative error reduction
- Training set / development set
- First decide the optimal parameter settings.
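A relative error reduction compares the new error rate against the baseline error rate rather than against 100%. The sketch below shows the arithmetic; the function name and the example error rates are assumptions made for illustration only.

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative reduction in percent: how much of the baseline error was removed."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers: going from a 20% to an 18% error rate is a
# 2-point absolute reduction but a 10% relative reduction.
reduction = relative_error_reduction(20.0, 18.0)
print(reduction)
```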

Measuring errors (2)
- Word recognition errors:
  - Substitution
  - Deletion
  - Insertion
- Correct: Did mob mission area of the Copeland ever go to m4 in nineteen eighty one?
- Recognized: Did mob mission area ** the copy land ever go to m4 in nineteen east one?

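Measuring errors (3)
- Correct: The effect is clear
- Recognized: Effect is not clear
- Error rate when comparing one by one (position by position): 75%
- Word error rate = 100% x (Subs + Dels + Ins) / (# words in correct sentence)
- Word error rate after alignment: 100% x (0 + 1 + 1) / 4 = 50%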
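The word error rate formula needs the smallest number of substitutions, deletions, and insertions that turn the correct sentence into the recognized one. A common way to find it is a word-level edit distance; the sketch below assumes that approach, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (minimum number of word errors, word error rate in percent)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # dist[i][j] = fewest edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)

    errors = dist[len(ref)][len(hyp)]
    return errors, 100.0 * errors / len(ref)


errors, wer = word_error_rate("The effect is clear", "Effect is not clear")
print(errors, wer)
```

For the slide's example the alignment deletes "The" and inserts "not", so two errors over four reference words are counted, whereas the position-by-position comparison counts three mismatches.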

Units of speech (1)
- Modeling is language dependent.
- A modeling unit should be:
  - Accurate
  - Trainable
  - Generalizable

Units of speech (2)
- Whole-word models
  - Only suitable for small-vocabulary recognition
- Phone models
  - Suitable for large-vocabulary recognition
  - Problem: they over-generalize and are therefore less accurate
- Syllable models

Context dependency (1)
- Recognition accuracy can be improved by using context-dependent parameters.
- Important in fast / spontaneous speech.
- Example: the phoneme /ee/
  - Peat
  - Wheel

Context dependency (2)
- Triphone model: a phonetic model that takes into account both the left and the right neighbouring phones.
- If two phones have the same identity but different left or right contexts, they are considered different triphones.
- Interword context-dependent phones.
- Place in the word:
  - Beginning
  - Middle
  - End
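The triphone definition can be illustrated by expanding a phone sequence into context-dependent labels. This is only a sketch: the `left-phone+right` label notation, the `sil` boundary symbol, the phone symbols chosen for "peat" and "wheel", and the function name `to_triphones` are all assumptions for the example, not something the slides specify.

```python
def to_triphones(phones, boundary="sil"):
    """Label each phone with its left and right neighbour as 'left-phone+right'."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Assumed phone transcriptions of the two example words.
peat = to_triphones(["p", "iy", "t"])
wheel = to_triphones(["w", "iy", "l"])
print(peat)
print(wheel)
```

The vowel has the same identity in both words, but its labels differ (`p-iy+t` versus `w-iy+l`), so the two occurrences are treated as different triphones.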

Context dependency (3)
- Stress
  - Longer duration
  - Higher pitch
  - More intensity
- Word-level stress
  - IMport (noun) vs. imPORT (verb)
  - Italy vs. Italian
- Sentence-level stress
  - I did have dinner.
- Radio

Context dependency (4)
- Very many triphones: 50^3 = 125,000
- Many phonemes have the same effects
  - /b/ & /p/: labial (pronounced using the lips)
  - /r/ & /w/: liquids
- Clustered acoustic-phonetic units
  - Is the left-context phone a fricative?
  - Is the right-context phone a front vowel?
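The count of possible triphones and the idea of clustering contexts by phonetic questions can be sketched as follows. The phone classes, the example contexts, and the function name `cluster_contexts` are assumptions for illustration; practical systems typically choose the questions from training data (for example with decision trees), which this sketch does not attempt.

```python
from collections import defaultdict

n_phones = 50
n_triphones = n_phones ** 3   # every (left, centre, right) combination
print(n_triphones)

# Hypothetical phone classes used to answer the two questions on the slide.
FRICATIVES = {"f", "v", "s", "z", "sh", "th"}
FRONT_VOWELS = {"iy", "ih", "eh", "ae"}


def cluster_contexts(contexts):
    """Group (left, right) contexts of one centre phone by the question answers."""
    clusters = defaultdict(list)
    for left, right in contexts:
        key = (left in FRICATIVES, right in FRONT_VOWELS)
        clusters[key].append((left, right))
    return dict(clusters)


# Four assumed contexts in which some centre phone occurs.
clusters = cluster_contexts([("s", "iy"), ("f", "ih"), ("m", "iy"), ("s", "aa")])
print(clusters)
```

Contexts that give the same answers to both questions share one cluster, so four contexts collapse into three units that can share parameters.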

Acoustic model
- After feature extraction, we have a sequence of feature vectors, such as MFCC vectors, as input data.
- Feature stream → phonemes / units → words
  - Feature stream → phonemes / units: segmentation and labeling
  - Phonemes / units → words: lexical access problem

Acoustic model
- Signal → phonemes
- Problem: phonemes can be pronounced differently
  - Speaker differences
  - Speaking rate
  - Microphone

Acoustic model
- Phonemes → words
- The three major ways to do this:
  - Vector quantization
  - Hidden Markov models
  - Neural networks

Acoustic model
- Problem: multiple pronunciations
  - Dialect variation
  - Coarticulation
- [Figure: pronunciation networks over the phones t, ow, ax, m, aa, ey, t, ow, with branch probabilities 0.5 / 0.5 and 0.8 / 0.2]
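A pronunciation network with alternative phones can be expanded into a list of pronunciations, each with the product of its branch probabilities. The figure on this slide did not survive extraction, so the network below is a guess: the phone labels and the probabilities 0.5, 0.8, and 0.2 come from the slide text, but which branch carries which probability is an assumption, as is the function name `pronunciations`.

```python
from itertools import product

# Assumed network: one dict per position, mapping alternative phones to
# branch probabilities. The assignment of probabilities is hypothetical.
network = [
    {"t": 1.0},
    {"ow": 0.2, "ax": 0.8},   # assumed coarticulation branch
    {"m": 1.0},
    {"ey": 0.5, "aa": 0.5},   # assumed dialect-variation branch
    {"t": 1.0},
    {"ow": 1.0},
]


def pronunciations(network):
    """Enumerate every path through the network with its path probability."""
    result = {}
    for path in product(*(slot.items() for slot in network)):
        phones = " ".join(phone for phone, _ in path)
        probability = 1.0
        for _, branch_probability in path:
            probability *= branch_probability
        result[phones] = probability
    return result


paths = pronunciations(network)
for phones, probability in paths.items():
    print(phones, probability)
```

Because the branch probabilities at each position sum to one, the probabilities of all enumerated pronunciations also sum to one.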

The End
