Published by Robyn Porter. Modified over 9 years ago.
1
Speech Interfaces
User Interfaces, Spring 1998
Drew Roselli
2
Motivation: Mechanical
Smaller devices => difficult I/O
Speed, > 90 wpm (?)
“Virtually unlimited” set of commands
Freedom for other body parts
3
Motivation: User
Natural
Easy to remember
Evolutionarily selected for
  – reading and writing are not
  – neither is typing
4
Speech Background
Speech is faster than vocal apparatus
  » nasals spread
Phonetic rules provide redundancy
  » taboo combinations, SR in Srini
  » contextual pronunciation: /t/ -> aspirated, flap, unreleased
5
Speech Recognition
Often misunderstood by people
  » continuous feedback
Longer words are easier
Maximally different vowels: a, i, u
Individual training
  » gender-based
  » “meaningless” conversation openers
6
Speech Production
Three formants visible on oscilloscope
Harmonics from larynx, throat, mouth
Two needed for recognition but “tinny”
1989 demo
  – http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
7
More Gratuitous Opinions
(I’m really talking out of my butt here.)
Recently a visual culture
TV generation requires pictured textbooks
Notes mean “I’ll learn it later”
Oral tradition has strong history
  – http://www.missouri.edu/~csottime/index.html
Could we go verbal?
8
Recognition Problems
Poor recognition
  – humans < 1% error rate on dictation
  – Janus 7% error rate (how much context?)
  – Janus 20% in real time
Background noise
Slow
  – (simple matter of hardware)
Homonym-rich languages (Cantonese)
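The slide does not say how these error rates were measured. A minimal sketch, assuming the common word error rate definition (word-level edit distance divided by the number of reference words); the function name and the sample sentences are illustrative and not taken from the talk:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed metric)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("check my mail now", "check my nail now"))  # 0.25
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.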
9
More Recognition Problems
Isolated, short words difficult
  – common words become short
Segmentation
  – silly versus sill lea
No semantic help
Spelling
  – interface with printer, mail
10
UI Problems: Navigation
Aural no-nos
  – modes
  – deep hierarchies
Speech analog
Grammar = how to re-structure linear sequence of words
Is there a UI equivalent?
11
UI Problems: Feedback
Verbose feedback wastes time/patience
  – only confirm consequential things
  – use meaningful, short cues
Interruption
  – half-duplex communication
  – real-time scheduling
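One way to read "only confirm consequential things" is as a small feedback policy. A minimal sketch, assuming a hypothetical set of action names and prompt wordings; none of these come from the talk:

```python
# Hypothetical action names: the ones whose effects are hard to undo.
CONSEQUENTIAL_ACTIONS = {"delete_message", "send_mail"}


def feedback_for(action: str) -> str:
    """Return a spoken prompt: an explicit confirmation only when it matters."""
    if action in CONSEQUENTIAL_ACTIONS:
        # Consequential: ask before acting.
        return f"Confirm {action.replace('_', ' ')}?"
    # Everything else gets a short cue instead of a verbose echo.
    return "OK"


if __name__ == "__main__":
    for action in ("read_next", "delete_message"):
        print(action, "->", feedback_for(action))
```

Which actions count as consequential is an application decision; the set above is only a placeholder.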
12
UI Problems: Meaning
“Do what I mean not what I say”
Silence means “Do the right thing”
13
VoiceNotes
Voice-based file system
Replacement for tapes
“Hierarchical” access to voice data
Thorough documentation of problems
14
SpeechActs
Speech interface to computer tools
  – email, calendar, weather, stock quotes
Conversions to canonical form
  – keyword based? confused by negations?
Inconsistent recognition
  – misunderstand system
  – progressive assistance
  – implicit confirmation
15
Multimodal Error Correction
Dictation error correction study
Results very unclear
Recognizer got it wrong the first time => will get it wrong the second time
Hyperarticulating aggravates
Correct dictation errors with: vocal spelling, writing, typing, etc.