Download presentation
Presentation is loading. Please wait.
Published byClare George Modified over 9 years ago
1
Speech Recognition MIT 6.893 SMA 5508 Spring 2004 Larry Rudolph (MIT)
2
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly A long term goal Since 1950, AI researchers claimed Crucial problem Will be solved within the decade Finally, it appears true Failure rates still too high 90% hit rate is 10% error rate want 98% or 99% success rate
3
6.893 Spring 2004: User Interface Larry Rudolph Spectrum of choices Constrained Domain Unconstrained Domain Speaker Dependent Voice tags (e.g. phone) Trained Dictation (Viavoice) Speaker Independent Galaxy (we are here) What everyone wants
4
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Waveform to Phonemes Waveform is very fuzzy We think there is a large break between words and sentences hard to see from waveform Mapping waveform segments to phonemes is not accurate
5
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Phonemes to words Group phonemes into words not always 1-1 mapping missing phonemes false phonemes (extra ones) accents many possible choices Word should be known to system domain or dictionary
6
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly words to sentences People do not always speak grammatically correct some invariant rules (for speech) extra or missing words phrases not always sentences Easier when sentence is in domain domain specified by grammar
7
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly sentences into meaning Dictation system: want sentences Other system: want to understand Integrate high-level processing Most applications need it anyway Helps with recognition useful to disambiguate input
8
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly meaning into action What happens after meaning? Respond to user (even a beep) Usually generate more substantial response Action should be valid in context
9
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly Disambiguation Each transformation is rarely highly accurate Lots of choices Subsequent steps can rule out choices from previous steps
10
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly disambiguation strategy Select “n-best” choices and pass on Each step restricts possible meaning Make heavy use of probability Viterby search state transitions along with probabilities. push through n choices at once
11
6.893 5508 Spring 2004: Speech Recognition Larry Rudolph DRAFT -- To Be Revised Shortly after domain dependent Handling out-of-vocabulary words Multimodal input improve recognition rates e.g. lip reading sometimes easier to point than say
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.