1
Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech
Keith Vertanen
July 30th, 2004
2
The problem
- Speech recognizers make mistakes
- Correcting mistakes is inefficient:
  - 140 WPM: uncorrected dictation
  - 14 WPM: corrected dictation, mouse/keyboard
  - 32 WPM: corrected typing, mouse/keyboard
- Voice-only correction is even slower and more frustrating
3
Research overview
- Make correction of dictation:
  - More efficient
  - More fun
  - More accessible
- Approach:
  - Build a word lattice from a recognizer’s n-best list
  - Expand the lattice to cover likely recognition errors
  - Make a language model from the expanded lattice
  - Use the model in a continuous gesture interface to perform confirmation and correction
4
Building lattice
Example n-best list:
1: jack studied very hard
2: jack studied hard
3: jill studied hard
4: jill studied very hard
5: jill studied little
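The slides do not say exactly how the n-best entries are merged. One simple way, sketched here as an assumption rather than the author's method, is to let identical words share a node and join consecutive words with arcs counted by how many hypotheses use them. The names build_word_graph and count_paths are hypothetical.

```python
from collections import defaultdict

START, END = "<s>", "</s>"

def build_word_graph(nbest):
    """Merge n-best hypotheses into a word graph keyed on word identity.

    Hypothetical simplification: identical words in different hypotheses
    share a node. A word repeated within one sentence would create a
    cycle; this sketch does not handle that case.
    """
    edges = defaultdict(int)  # (prev_word, word) -> number of hypotheses using the arc
    for hyp in nbest:
        words = [START] + hyp.split() + [END]
        for prev, word in zip(words, words[1:]):
            edges[(prev, word)] += 1
    return dict(edges)

def count_paths(edges, node=START):
    """Count distinct START-to-END word sequences (assumes an acyclic graph)."""
    if node == END:
        return 1
    return sum(count_paths(edges, nxt) for (prev, nxt) in edges if prev == node)

nbest = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
graph = build_word_graph(nbest)
print(count_paths(graph))  # 6
```

Merging this way already generalizes beyond the list: the graph admits six sentences, including "jack studied little", which no single n-best entry contains. A fuller implementation would align hypotheses before merging rather than rely on word identity alone.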
5
Insertion errors
6
Acoustic confusions
- Given a word, find words that sound similar
- Look the pronunciation up in a dictionary:
  studied → s t ah d iy d
- Use observed phone confusions to generate alternative pronunciations:
  s t ah d iy d, s ao d iy, s t ah d iy, …
- Map the pronunciations back to words:
  s t ah d iy d → studied
  s ao d iy → saudi
  s t ah d iy → study
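The three steps can be sketched as a bounded search over phone edits. This is a minimal sketch: the pronunciations of studied, saudi, and study come from the slide, while the extra "hard" entry, the confusion table SUBS, the rule that any phone may be deleted, and the max_edits bound are hypothetical stand-ins for the observed confusion statistics.

```python
# Toy pronunciation dictionary; "hard" is a hypothetical extra entry.
PRON = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "saudi": ("s", "ao", "d", "iy"),
    "study": ("s", "t", "ah", "d", "iy"),
    "hard": ("hh", "aa", "r", "d"),
}
# Hypothetical phone substitutions; any phone may also be deleted.
SUBS = {"ah": ("ao",), "ao": ("ah",)}

def one_edit(pron):
    """All pronunciations one phone deletion or one listed substitution away."""
    out = set()
    for i, ph in enumerate(pron):
        out.add(pron[:i] + pron[i + 1:])  # delete phone i
        for alt in SUBS.get(ph, ()):
            out.add(pron[:i] + (alt,) + pron[i + 1:])  # substitute phone i
    return out

def confusable_words(word, max_edits=3):
    """Words whose pronunciation is within max_edits edits of `word`."""
    frontier = {PRON[word]}
    seen = set(frontier)
    for _ in range(max_edits):
        frontier = {p for q in frontier for p in one_edit(q)} - seen
        seen |= frontier
    # Map generated pronunciations back to dictionary words.
    by_pron = {p: w for w, p in PRON.items()}
    return sorted(by_pron[p] for p in seen if p in by_pron and by_pron[p] != word)

print(confusable_words("studied"))  # ['saudi', 'study']
```

Here "study" is one deletion away, while "saudi" needs three edits (delete t, ah → ao, delete the final d), so the edit bound controls how aggressively the lattice is expanded.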
7
Acoustic confusions: “Jack studied hard”
8
Language model confusions: “Jack studied hard”
- Look at the words before or after a node and add likely alternate words based on an n-gram LM
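A rough sketch of this expansion step, assuming plain bigram counts as a stand-in for the n-gram LM: candidates for a node are the most frequent successors of the word on its left plus the most frequent predecessors of the word on its right. The toy corpus, the top-k cutoff, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_counts(corpus):
    """Count bigrams in both directions (a toy stand-in for an n-gram LM)."""
    follows = defaultdict(Counter)   # follows[a][b] = count of "a b"
    precedes = defaultdict(Counter)  # precedes[b][a] = count of "a b"
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
            precedes[b][a] += 1
    return follows, precedes

def lm_alternatives(prev_word, word, next_word, follows, precedes, k=3):
    """Alternates for `word`: top-k successors of the left neighbour plus
    top-k predecessors of the right neighbour, minus the word itself."""
    left = [w for w, _ in follows[prev_word].most_common(k)]
    right = [w for w, _ in precedes[next_word].most_common(k)]
    return sorted((set(left) | set(right)) - {word, "<s>", "</s>"})

corpus = [
    "jack studied hard",
    "jack worked hard",
    "jill worked hard",
    "she studied math",
    "he tried hard",
]
follows, precedes = train_bigram_counts(corpus)
print(lm_alternatives("jack", "studied", "hard", follows, precedes))  # ['tried', 'worked']
```

A real system would score candidates with smoothed probabilities and a threshold rather than raw counts and a fixed k.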
9
Expansion results (on WSJ1)
10
Probability model
- Our confirmation and correction interface requires the probability of a letter given the prior letters: $P(\ell_n \mid \ell_1, \ldots, \ell_{n-1})$
11
Probability model
- Keep track of the possible paths in the lattice
- Predict based on the next letter on those paths
- Interpolate with a default language model
- Example: the user has entered “the_cat”
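The prediction step can be sketched as follows, under stated assumptions: lattice paths are given as whole sentences with probabilities (a hypothetical format), "_" stands for a space as in the slide's example, the interpolation weight lam is a made-up value, and the default language model is replaced by a uniform distribution.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_lowercase + "_"  # "_" stands for a space, as in "the_cat"

def next_letter_probs(paths, typed, lam=0.9):
    """P(next letter | letters typed so far).

    paths: {sentence: probability} for lattice paths (hypothetical format).
    lam:   hypothetical weight on the lattice prediction; the remaining
           1 - lam goes to a default model, here simply uniform.
    """
    # Sum the probability of every path consistent with what was typed,
    # grouped by the letter that path predicts next.
    lattice = defaultdict(float)
    for sentence, p in paths.items():
        if sentence.startswith(typed) and len(sentence) > len(typed):
            lattice[sentence[len(typed)]] += p
    total = sum(lattice.values())
    uniform = 1.0 / len(ALPHABET)
    probs = {}
    for ch in ALPHABET:
        # With no consistent paths, fall back entirely on the default model.
        from_lattice = lattice[ch] / total if total > 0 else uniform
        probs[ch] = lam * from_lattice + (1 - lam) * uniform
    return probs

paths = {"the_cat_sat": 0.6, "the_cattle_sat": 0.3, "the_hat_sat": 0.1}
probs = next_letter_probs(paths, "the_cat")
print(round(probs["_"], 3), round(probs["t"], 3))  # 0.604 0.304
```

After "the_cat", the path "the_hat_sat" drops out, and the remaining paths split their mass between "_" (end of "cat") and "t" (continuing to "cattle"); every other letter keeps only the small default-model share.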
12
Handling word errors
- Use the default language model while the user enters an erroneous word
- Rebuild paths allowing for an additional deletion or substitution error
- Example: the user has entered “the_cattle_”
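The path-rebuilding step can be sketched as a word-level alignment that tolerates a bounded number of errors. This is an illustrative sketch, not the author's implementation: the function name, the path format, and the exact meaning given to "deletion" (the path lacks a word the user typed) are assumptions.

```python
def surviving_paths(paths, typed, max_errors=1):
    """Lattice paths that still explain the words typed so far.

    paths: sentences with "_" between words (hypothetical format).
    typed: completed words the user has entered, e.g. "the_cattle_".
    Returns {(path, pos): errors}, where pos indexes the next path word
    to predict from, keeping only alignments with <= max_errors errors.
    Assumed error types: substitution (typed word differs from the path
    word) and deletion (the path is missing a word the user typed).
    """
    typed_words = [w for w in typed.split("_") if w]
    results = {}
    for path in paths:
        words = path.split("_")
        states = {0: 0}  # path position -> fewest errors so far
        for w in typed_words:
            nxt = {}
            for pos, err in states.items():
                moves = [(pos, err + 1)]  # deletion: skip the typed word
                if pos < len(words):
                    # match (no cost) or substitution (one error)
                    moves.append((pos + 1, err + (words[pos] != w)))
                for p, e in moves:
                    if e <= max_errors and e < nxt.get(p, max_errors + 1):
                        nxt[p] = e
            states = nxt
        for pos, err in states.items():
            results[(path, pos)] = err
    return results

paths = ["the_cat_sat", "a_cat_sat"]
print(surviving_paths(paths, "the_cattle_", max_errors=0))  # {}
print(surviving_paths(paths, "the_cattle_", max_errors=1))
```

With no error allowance, "cattle" leaves no path alive; allowing one more error keeps "the_cat_sat", either with "cattle" substituted for "cat" (continue from "sat") or treated as a word the path lacks (continue from "cat").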
13
Evaluating expansion
- Assume a good model requires as little information from the user as possible
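One standard way to measure "information from the user" is the average number of bits per letter, the mean of -log2 P(letter | prior letters) over a text, which is the unit reported on the results slide. The sketch assumes a hypothetical predict(prefix) interface returning a letter distribution; the uniform model is only a baseline for the example.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz_"

def bits_per_letter(predict, text):
    """Average -log2 P(letter | prior letters) over `text`.

    predict(prefix) -> {letter: probability}; any model with this
    (hypothetical) interface works, e.g. a lattice-based predictor.
    """
    total = 0.0
    for i, ch in enumerate(text):
        total -= math.log2(predict(text[:i])[ch])
    return total / len(text)

def uniform(prefix):
    """Baseline that ignores the prefix and spreads mass evenly."""
    return {c: 1 / len(ALPHABET) for c in ALPHABET}

print(round(bits_per_letter(uniform, "the_cat"), 2))  # 4.75
```

A model that puts more probability on the letters the user actually intends lowers this average, so fewer bits of gesture input are needed per letter.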
14
Results on test set
- Model evaluated on a held-out test set (Hub1)
- Default language model: 2.4 bits/letter (user decides between 5.3 letters)
- Best speech-based model: 0.61 bits/letter (user decides between 1.5 letters)
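The "user decides between" figures are consistent with reading them as perplexity, i.e. 2 raised to the entropy H in bits per letter. This reading is an inference, but it reproduces both reported numbers:

```latex
\text{letters to choose between} = 2^{H}, \qquad 2^{2.4} \approx 5.3, \qquad 2^{0.61} \approx 1.5
```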
15
“To the mouse snow means freedom from want and fear”
16
Questions?