Speech User Interface 10/26/2010

Pervasive Information Access Information & Services I-Land vision by Streitz, et al.

Motivations
- Smaller devices -> difficult I/O
- People can talk at 90 words/minute
- “Virtually unlimited” set of commands
- Freedom for other body parts
- People drive and talk on the phone all the time
- Natural: evolutionarily selected for

Why Are Speech UIs Hard to Get Right?
- Speech recognition is far from perfect: imagine inputting commands with the mouse and getting the wrong result 5-20% of the time
- Speech UIs have no visible state: you can't see what you have done before or what effect your commands have had
- Speech UIs are hard to learn: how do you explore the interface? How do you find out what you can say?

Key Components
- Speech recognition: the computer understanding what the customer is saying
- Speech production (or synthesis): the computer talking to the customer

Speech Recognition
- Continuous vs. non-continuous
- Speaker independent vs. dependent
- Speech is often misunderstood by people
  - feedback via speech, facial expressions, & gesture
- Recognizers trained with real samples
  - often get gender-based problems
- Based on probabilities (HMMs - Bayes)
  - trigrams of sounds or words
- Several popular recognizers
  - Nuance, Dragon NaturallySpeaking, IBM ViaVoice
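
The "trigrams of sounds or words" point can be illustrated with a small word-level sketch. This is only a toy, not how any of the recognizers named above is implemented: the corpus, the `<s>`/`</s>` padding symbols, and the function names are assumptions made for illustration, and the estimate is a plain maximum-likelihood count ratio with no smoothing.

```python
from collections import Counter


def train_trigrams(sentences):
    """Count word trigrams and the two-word contexts that precede them."""
    trigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        # Pad so the first real word also has a two-word context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(len(words) - 2):
            w1, w2, w3 = words[i:i + 3]
            trigrams[(w1, w2, w3)] += 1
            contexts[(w1, w2)] += 1
    return trigrams, contexts


def trigram_prob(trigrams, contexts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 if the context is unseen."""
    seen = contexts[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0


# Hypothetical training sentences, invented for this example.
corpus = [
    "check my mail",
    "check my mail please",
    "read my mail",
    "check my calendar",
]
trigrams, contexts = train_trigrams(corpus)

# "check my" occurs 3 times and is followed by "mail" twice, so P = 2/3.
p_mail = trigram_prob(trigrams, contexts, "check", "my", "mail")
# "male" never follows "check my" in this corpus, so P = 0.0.
p_male = trigram_prob(trigrams, contexts, "check", "my", "male")
```

A recognizer would combine scores like these with acoustic scores; here they only show why word context favors one candidate over another.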

Speech Production
- Also known as text-to-speech (TTS)
- TTS Demo (Mandarin)
  - NTHU MIR Lab
  - NTU CSIE GUTTS
  - Bell Lab Demo
  - ITRI Information and Communications Research Laboratories (工研院資通所)
  - iFLYTEK (科大訊飛)

TTS Demo (English)
- AT&T Natural Voices

Good evening, class. Today we are going to discuss an important type of human-computer interface: speech UI, also known as voice UI. We will demonstrate a TTS engine developed by AT&T, which, in my opinion, is the best TTS so far.

Recognition Problems
- Poor recognition
  - humans: < 1% error rate on dictation
  - top recognition systems get 5-10% error rates
  - computers don't use much context
- Background noise
  - even worse recognition rates (20-40% error)
- Slow
  - simple matter of hardware getting faster
  - in 10 years, gone from 5 high-end workstations required to some speech systems running on laptops or even PDAs
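
Error rates like those quoted above are commonly reported as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer output into the reference transcript, divided by the reference length. The slides do not say how their figures were measured, so the following is only a minimal sketch of that standard metric; the function name and the example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Two substituted words out of eight reference words gives a WER of 0.25.
wer = word_error_rate(
    "please send the mail to my male cousin",
    "please send the male to my mail cousin",
)
```

At a 5-10% WER, roughly one word in every ten to twenty is wrong, which is why even short commands fail noticeably often.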

More Recognition Problems
- Isolated, short words are difficult
  - common words become short
- Segmentation
  - "silly" versus "sill lea"
- Spelling
  - "mail" vs. "male" -> need to understand language
- What about Mandarin?
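
The segmentation and spelling problems can be made concrete with a toy sketch that lists every word sequence matching one stream of sounds. Everything here is an assumption for illustration: the phone symbols, the four-entry lexicon, and the rule that a phone shared across a word boundary (the two /l/ sounds in "sill lea") is heard only once. Real recognizers score such alternatives probabilistically rather than enumerating them.

```python
def segmentations(sounds, lexicon, prev_last=None):
    """Return every word sequence whose pronunciation matches `sounds`.

    `prev_last` is the final phone of the previous word. A word may begin
    with that same phone without it being heard a second time (toy rule).
    """
    if not sounds:
        return [[]]
    results = []
    for end in range(1, len(sounds) + 1):
        heard = tuple(sounds[:end])
        candidates = [heard]
        if prev_last is not None:
            # Allow the boundary phone to be shared with the previous word.
            candidates.append((prev_last,) + heard)
        for phones in candidates:
            for word in lexicon.get(phones, []):
                for rest in segmentations(sounds[end:], lexicon, phones[-1]):
                    results.append([word] + rest)
    return results


# Hypothetical pronunciations written with made-up phone symbols.
LEXICON = {
    ("s", "ih", "l", "iy"): ["silly"],
    ("s", "ih", "l"): ["sill"],
    ("l", "iy"): ["lea"],
    ("m", "ey", "l"): ["mail", "male"],  # homophones: same sounds, two spellings
}

silly_options = segmentations(["s", "ih", "l", "iy"], LEXICON)
mail_options = segmentations(["m", "ey", "l"], LEXICON)
```

With this lexicon, `silly_options` contains both the "sill lea" and the "silly" readings, and `mail_options` contains both spellings. Nothing in the sounds alone separates them, which is the slide's point about needing to understand language.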

Speech UI Problems
- Major problems: modes (no feedback)
  - certain commands only work when in specific states
  - deep hierarchies (also known as voice mail hell)
- Verbose feedback wastes time/patience
  - only confirm consequential things
  - use meaningful, short cues
- Interruption
  - half-duplex communication (i.e., no barge-in support)
- Too much speech on the part of the customer is tiring
- Speech takes up space in working memory
  - can cause problems when problem solving

Developing VUI
- VoiceXML
  - VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer.
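
To give a feel for what a VoiceXML dialogue looks like, the sketch below uses Python's standard `xml.etree.ElementTree` to generate a minimal one-field document. The mailbox scenario, the field name `command`, and the prompt wording are assumptions invented for this example; the element names (`vxml`, `form`, `field`, `prompt`, `option`, `noinput`, `nomatch`, `reprompt`, `filled`, `value`) come from the VoiceXML 2.0 specification. It has not been run against a voice platform, so treat it as an illustration of the format only.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_dialog_xml():
    """Build a one-field VoiceXML dialogue and return it as a string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "mailbox"})  # hypothetical form id
    field = ET.SubElement(form, "field", {"name": "command"})

    # Short prompt that tells the caller what can be said.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Say read, delete, or next."

    # Each <option> lists one phrase the field accepts.
    for phrase in ("read", "delete", "next"):
        ET.SubElement(field, "option").text = phrase

    # Brief recovery cues instead of verbose feedback.
    noinput = ET.SubElement(field, "noinput")
    noinput.text = "Sorry, I did not hear you."
    ET.SubElement(noinput, "reprompt")
    nomatch = ET.SubElement(field, "nomatch")
    nomatch.text = "Sorry, I did not understand."
    ET.SubElement(nomatch, "reprompt")

    # Once the field is filled, echo the recognized command back.
    filled = ET.SubElement(field, "filled")
    echo = ET.SubElement(filled, "prompt")
    echo.text = "You said "
    value = ET.SubElement(echo, "value", {"expr": "command"})
    value.tail = "."

    return ET.tostring(vxml, encoding="unicode")


document = build_dialog_xml()
```

The prompt doubles as the answer to "how do you find out what you can say?", and the short `noinput` and `nomatch` cues follow the earlier advice about keeping feedback brief.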