Multimedia
Auditory User Interfaces
T.Sharon - A.Frank
Auditory User Interfaces
An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio, including speech and other sounds, for interaction (Weinschenk & Barker 2000).
Examples:
Natural language and speech user interfaces
Hands-free automobile navigation systems
Interactive voice response (IVR) systems, such as an automated payment center
Products for the visually impaired
Why Audio I/O?
The user's hands are busy.
The user's eyes are engaged elsewhere.
The user has a disability.
Potential Applications
Auditory interfaces can be used in many areas of everyday life:
Dictation systems
Navigation systems
Transaction systems
Operator services
Recording meetings and indexing them later
Why Has Audio I/O Been Underused Until Now?
It needs multiple I/O channels.
Cost problems
Technical problems
Algorithmic problems
Main Audio I/O Technologies
Speech synthesis
Speech recognition
Speaker recognition
Non-speech audio
Speech Synthesis
Text-to-speech
Phoneme-to-speech
Stored messages
Basic Workflow of Text-to-Speech
[Figure: text-to-speech workflow diagram]
Phoneme-to-Speech
Phonemes are pre-recorded and stored.
The stored phonemes are parameterized (e.g., male or female voice, old or young speaker).
Sequences of phonemes are combined to generate words and sentences.
[Figure: stored phonemes and voice parameters are fed into a synthesizer chip]
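The phoneme-to-speech steps above (store, parameterize, combine) can be sketched as follows. This is a minimal illustration under stated assumptions. Sine tones stand in for recorded phonemes. The inventory `PHONEMES`, the sample rate, and the `pitch_factor` and `gain` parameters are all hypothetical. The only voice parameter is pitch, changed by crude resampling, which also changes each phoneme's duration.

```python
import math

RATE = 8000  # samples per second (assumed for this sketch)

def make_tone(freq_hz, n_samples, rate=RATE):
    """Stand-in for a prerecorded phoneme: n_samples of a sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]

# Hypothetical stored phoneme inventory; a real system stores recorded speech.
PHONEMES = {
    "h": make_tone(300, 400),
    "e": make_tone(500, 960),
    "l": make_tone(400, 640),
    "o": make_tone(450, 1200),
}

def parameterize(samples, pitch_factor=1.0, gain=1.0):
    """Crude voice parameterization by nearest-sample resampling.
    pitch_factor > 1 raises the pitch (and shortens the phoneme);
    pitch_factor < 1 lowers it (and lengthens it); gain scales loudness."""
    n_out = int(len(samples) / pitch_factor)
    return [gain * samples[min(int(j * pitch_factor), len(samples) - 1)]
            for j in range(n_out)]

def synthesize(phonemes, pitch_factor=1.0, gain=1.0):
    """Concatenate the parameterized stored phonemes into one waveform."""
    wave = []
    for p in phonemes:
        wave.extend(parameterize(PHONEMES[p], pitch_factor, gain))
    return wave

normal = synthesize(["h", "e", "l", "o"])
higher = synthesize(["h", "e", "l", "o"], pitch_factor=1.5)  # e.g. a higher voice
print(len(normal), len(higher))
```

Plain concatenation leaves audible discontinuities where one phoneme ends and the next begins. That joining problem is the same one the stored-messages approach faces when it splices prerecorded parts.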
Stored Messages
Prerecorded parts
Message splicing
How to smooth speech?
Voice playback
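One common way to smooth the joins in message splicing is a short crossfade between consecutive prerecorded parts. The slide does not name a specific smoothing method, so treat this as one assumed technique. The sketch below uses a hypothetical `splice` function. At each junction, it overlaps up to `fade_len` samples of the end of one part with the start of the next and blends them linearly.

```python
def splice(segments, fade_len):
    """Join prerecorded segments, overlapping each junction by up to
    fade_len samples with a linear crossfade so the joins do not click."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(fade_len, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
            k = len(out) - n + i
            out[k] = (1 - w) * out[k] + w * seg[i]
        out.extend(seg[n:])
    return out

# Stand-ins for prerecorded parts, e.g. "Your balance is" + "<amount>" + "dollars".
a = [1.0] * 10
b = [0.0] * 10
joined = splice([a, b], fade_len=4)
print(len(joined))  # 16
```

The overlap makes the spliced message shorter by `fade_len` samples per junction. In practice the fade would span a few milliseconds of audio; the value of 4 samples here is arbitrary.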
Speech Synthesis Timeline
[Figure: timeline of speech synthesis]
Speech Recognition
Capture acoustic patterns by sampling the audio signal.
Match the patterns to templates (map the acoustic patterns to known templates).
Identify the tokens (e.g., words) that the best-matching templates represent.
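The "match to templates" step can be sketched with dynamic time warping (DTW), a classic technique for template-based isolated-word recognition. The slide does not name DTW; it is swapped in here as an illustration. DTW lets an utterance spoken faster or slower than the stored template still line up with it. The one-dimensional feature sequences and the `TEMPLATES` vocabulary below are hypothetical. Real recognizers compare multi-dimensional acoustic feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def recognize(pattern, templates):
    """Return the label (token) of the stored template closest to the pattern."""
    return min(templates, key=lambda label: dtw_distance(pattern, templates[label]))

# Hypothetical one-dimensional acoustic patterns (e.g. per-frame energies).
TEMPLATES = {
    "yes": [1, 3, 5, 3, 1],
    "no": [5, 4, 3, 2, 1],
}
spoken = [1, 1, 3, 5, 5, 3, 1]  # "yes", spoken more slowly than its template
print(recognize(spoken, TEMPLATES))  # yes
```

The three steps map onto the slide. `spoken` is the sampled acoustic pattern. `dtw_distance` does the matching. `recognize` identifies the token.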
Speech Recognition Problems
Fast talkers
Swallowed (slurred) words
Speech impediments
Slang words (culture-specific)
Similar-sounding words
Environmental noise
Speech Recognition Factors
Speaker (in)dependence: training on a single voice, or pre-training to generalize across speakers
Vocabulary size: affects training cost and database complexity
Pace of speech: isolated words, continuous speech, connected speech
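Isolated-word recognition can rely on pauses between words to find where each word starts and ends. Below is a minimal sketch of one common way to find those boundaries, energy-based endpoint detection. The slide does not name this technique. The frame length and threshold are arbitrary assumptions, and the function names are hypothetical. Frames whose mean energy rises above the threshold are treated as speech, and each run of such frames as one word.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_words(samples, frame_len, threshold):
    """Return (start_frame, end_frame) pairs, end exclusive, for each run
    of frames whose energy exceeds threshold."""
    energies = frame_energies(samples, frame_len)
    words, start = [], None
    for k, e in enumerate(energies):
        if e > threshold and start is None:
            start = k  # energy rose above the threshold: a word begins
        elif e <= threshold and start is not None:
            words.append((start, k))  # energy fell back: the word ends
            start = None
    if start is not None:
        words.append((start, len(energies)))  # word runs to the end of the signal
    return words

silence, speech = [0.0] * 20, [0.5] * 20
signal = silence + speech + silence + speech + silence  # two isolated "words"
print(find_words(signal, frame_len=10, threshold=0.01))  # [(2, 4), (6, 8)]
```

In continuous speech there are no such pauses. The same detector then returns a single long segment, which is one reason continuous speech is harder to recognize than isolated words.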
Factors Affecting the Error Rate of Speech Recognition
Vocabulary size
Background noise
Spontaneity of speech
Sampling rate
Amount of training data available
Word Error Rate of Speech Recognition
[Figure: word error rate (0% to 40%) plotted against level of difficulty, for recognition tasks including digits, continuous speech, command and control, letters and numbers, read speech, broadcast news, and conversational speech]
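Word error rate (WER) is conventionally computed as WER = (S + D + I) / N. S, D, and I are the numbers of substituted, deleted, and inserted words needed to turn the reference transcript into the recognizer's output. N is the number of words in the reference. The sketch below computes this with a word-level edit (Levenshtein) distance. The function name and the example sentences are hypothetical, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("call mom at home", "call tom at home"))  # 0.25
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many spurious words.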
Basic Workflow of Speech-to-Text
[Figure: speech-to-text workflow diagram]
Siri as an Example
Siri is an intelligent personal assistant that helps you get things done just by asking. It allows you to use your voice to send messages, schedule meetings, place phone calls, search the web, and more. Siri understands your natural speech, and it asks you questions if it needs more information to complete a task.