1
Auditory User Interfaces
Multimedia Auditory User Interfaces T.Sharon - A.Frank
2
Auditory User Interfaces
An auditory user interface (AUI) is an interface that relies primarily or exclusively on audio for interaction, including speech and sound (Weinschenk & Barker 2000).
Examples:
Natural language/speech user interfaces.
Hands-free automobile navigation systems.
Interactive voice response (IVR) systems, such as an automated payment center.
Products for the visually impaired.
3
Why Audio I/O?
Hands busy
Eyes engaged
Disabilities
4
Potential Applications
Auditory interfaces can be used in many aspects of our lives:
Dictation systems
Navigation systems
Transaction systems
Operator services
Recording meetings and indexing them later
5
Why has Audio I/O been underused until now?
Needs multiple I/O channels
Cost problems
Technical problems
Algorithmic problems
6
Audio I/O Main Technologies
Speech synthesis
Speech recognition
Speaker recognition
Non-speech audio
7
Speech Synthesis
Text-to-Speech
Phoneme-to-Speech
Stored Messages
8
Basic workflow of Text-to-Speech
9
Phoneme-to-Speech
Stored phonemes: pre-recorded.
Parameterization (male/female, old/young).
Phonemes are combined in sequence to generate words/sentences.
Synthesizer chip
[Figure labels: Parameters, Stored Phonemes, Synthesizer Chip]
10
Stored Messages
Prerecorded parts
Message splicing
How to smooth speech?
Voice playback
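One common answer to the smoothing question is to crossfade neighbouring recordings where they meet, so that the splice has no abrupt jump. The slides do not say which method the authors had in mind; the sketch below is only an illustration, assuming each prerecorded part is a plain list of float samples and using a hypothetical helper name `crossfade_splice`.

```python
def crossfade_splice(a, b, overlap):
    """Join two sample lists, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using a linear ramp.

    Hypothetical helper for illustration; not taken from the slides.
    """
    if overlap < 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must fit inside both recordings")
    if overlap == 0:
        # Plain splicing: simply play one part after the other.
        return list(a) + list(b)

    head = list(a[:len(a) - overlap])   # untouched start of the first part
    tail = list(b[overlap:])            # untouched end of the second part
    blended = []
    for i in range(overlap):
        # Weight of the second part rises linearly across the overlap.
        w = (i + 1) / (overlap + 1)
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + tail


if __name__ == "__main__":
    part_one = [1.0, 1.0, 1.0, 1.0]
    part_two = [0.0, 0.0, 0.0, 0.0]
    print(crossfade_splice(part_one, part_two, 3))
```

With an overlap of zero the function degenerates to plain splicing, which makes the audible difference easy to compare.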
11
Speech Synthesis Timeline
12
Speech Recognition
Get acoustic patterns (sampling).
Match to templates (map acoustic patterns to known templates).
Identify tokens.
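The three steps above can be sketched as nearest-template matching. The slides do not name a matching algorithm; this sketch assumes dynamic time warping (DTW), a classic choice for template-based recognizers because it tolerates differences in speaking pace. The feature sequences are made-up one-dimensional numbers, and `dtw_distance` and `recognize` are hypothetical names.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch y
                                 cost[i][j - 1],      # stretch x
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(pattern, templates):
    """Return the token whose stored template is closest to the pattern."""
    return min(templates, key=lambda word: dtw_distance(pattern, templates[word]))


if __name__ == "__main__":
    # Illustrative templates only; real systems use multi-dimensional features.
    templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 2, 2]}
    sampled = [1, 3, 3, 4, 3, 1]   # "yes" spoken slightly more slowly
    print(recognize(sampled, templates))
```

Because DTW lets one sample align with several template samples, the slower utterance still matches its template exactly.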
13
Speech Recognition Problems
Fast talkers
Swallowed words
Speech problems
Slang words (culture oriented)
Similar-sounding words
Environmental noise
14
Speech Recognition Factors
Speaker (in)dependent
  Single voice training
  Pre-train/generalize
Vocabulary size
  Training cost
  Database complexity
Pace of speech
  Isolated words
  Continuous speech
  Connected speech
15
Factors affecting error rate of speech recognition
Vocabulary size
Background noise
Speech spontaneity
Sampling rate
Amount of training data available
16
Word error rate of speech recognition
[Chart: Word Error Rate (0% to 40%) against Level of Difficulty. Task labels: Digits, Continuous, Command and Control, Letters and Numbers, Broadcast News, Read Speech, Conversational Speech.]
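Word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using the minimum-edit alignment between reference and recognizer output. A minimal sketch follows, assuming whitespace-separated words and exact string comparison; the function name `word_error_rate` and the sample sentences are illustrative and not from the slides.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word ("home") was dropped out of four reference words.
    print(word_error_rate("call my home phone", "call my phone"))
```

Note that the rate can exceed 100% when the recognizer inserts many extra words, since the denominator counts only reference words.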
17
Basic workflow of Speech-to-Text
18
Siri as an Example
Siri is an intelligent personal assistant that helps you get things done just by asking. It allows you to use your voice to send messages, schedule meetings, place phone calls, search the web, and more. Siri understands your natural speech, and it asks you questions if it needs more information to complete a task.