Speech Interfaces User Interfaces Spring 1998 Drew Roselli.


Motivation: Mechanical
  Smaller devices => difficult I/O
  Speed, > 90 wpm (?)
  “Virtually unlimited” set of commands
  Freedom for other body parts

Motivation: User
  Natural
  Easy to remember
  Evolutionarily selected for
    –reading and writing are not
    –neither is typing

Speech Background
  Speech is faster than vocal apparatus
    »nasals spread
  Phonetic rules provide redundancy
    »taboo combinations, SR in Srini
    »contextual pronunciation: /t/ -> aspirated, flap, unreleased

Speech Recognition
  Often misunderstood by people
    »continuous feedback
  Longer words are easier
  Maximally different vowels: a, i, u
  Individual training
    »gender-based
    »“meaningless” conversation openers

Speech Production
  Three formants visible on oscilloscope
  Harmonics from larynx, throat, mouth
  Two needed for recognition but “tinny”
  1989 demo
    –http://cahn.www.media.mit.edu/people/cahn/emot-speech.html

More Gratuitous Opinions
  (I’m really talking out of my butt here.)
  Recently a visual culture
  TV generation requires pictured textbooks
  Notes mean “I’ll learn it later”
  Oral tradition has strong history
    –http://www.missouri.edu/~csottime/index.html
  Could we go verbal?

Recognition Problems
  Poor recognition
    –humans < 1% error rate on dictation
    –Janus 7% error rate (how much context?)
    –Janus 20% in real time
  Background noise
  Slow
    –(simple matter of hardware)
  Homonym-rich languages (Cantonese)
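The error rates quoted on this slide are presumably word-level figures, although the slide does not say which metric was used. As a minimal sketch, assuming the common word error rate definition (substitutions, deletions, and insertions divided by the number of reference words), such a percentage could be computed as follows. The function name and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word was dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match if cost == 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{rate:.1%}")
```

Under this definition a segmentation mistake can cost more than one error: a single reference word recognized as two words counts as one substitution plus one insertion.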

More Recognition Problems
  Isolated, short words difficult
    –common words become short
  Segmentation
    –silly versus sill lea
  No semantic help
  Spelling
    –interface with printer, mail

UI Problems: Navigation
  Aural no-nos
    –modes
    –deep hierarchies
  Speech analog
  Grammar = how to re-structure linear sequence of words
  Is there a UI equivalent?

UI Problems: Feedback
  Verbose feedback wastes time/patience
    –only confirm consequential things
    –use meaningful, short cues
  Interruption
    –half-duplex communication
    –real-time scheduling
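The guideline to confirm only consequential actions and otherwise use short cues could be sketched roughly as below. This is an illustration, not something taken from the slides: the `Command` type, the `consequential` flag, and the prompt wording are all hypothetical, and it assumes the designer marks in advance which commands are destructive or hard to undo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    # Assumption: the designer flags destructive or irreversible actions.
    consequential: bool


def feedback_for(command: Command) -> str:
    """Return the spoken feedback for a recognized command."""
    if command.consequential:
        # Explicit confirmation only where a mistake would be costly.
        return f"Really {command.name}? Say yes or no."
    # Everything else gets a short cue so the user is not kept waiting.
    return "OK"


print(feedback_for(Command("delete all notes", consequential=True)))
print(feedback_for(Command("play next note", consequential=False)))
```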

UI Problems: Meaning
  “Do what I mean not what I say”
  Silence means “Do the right thing”

VoiceNotes
  Voice-based file system
  Replacement for tapes
  “Hierarchical” access to voice data
  Thorough documentation of problems

SpeechActs
  Speech interface to computer tools
    –email, calendar, weather, stock quotes
  Conversions to canonical form
    –keyword based? confused by negations?
  Inconsistent recognition
    –misunderstand system
    –progressive assistance
    –implicit confirmation
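The slide asks whether conversion to canonical form is keyword based and whether it would be confused by negations. The sketch below is not SpeechActs' actual method, which the slides do not describe. It is a deliberately naive keyword matcher, with a hypothetical keyword table and command names, meant only to show why negation would be a problem for such an approach.

```python
from typing import Optional

# Hypothetical keyword table; the canonical command names are made up.
KEYWORDS = {
    "mail": "READ_MAIL",
    "calendar": "SHOW_CALENDAR",
    "weather": "GET_WEATHER",
    "stock": "GET_QUOTE",
}


def canonical_form(utterance: str) -> Optional[str]:
    """Map an utterance to a canonical command via the first keyword found."""
    for word in utterance.lower().split():
        word = word.strip(".,?!")
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None  # no keyword recognized


print(canonical_form("Read my mail"))
print(canonical_form("Don't read my mail"))
```

Both utterances map to the same command because the matcher ignores every word except the keyword, which is the kind of negation confusion the slide raises.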

Multimodal Error Correction
  Dictation error correction study
  Results very unclear
  Recognizer got it wrong the first time => will get it wrong the second time
  Hyperarticulating aggravates
  Correct dictation errors with: vocal spelling, writing, typing, etc.
