
Presentation transcript:

1 Speech User Interfaces

2 Outline
- Review
- Motivation for speech UIs
- Speech recognition
- UI problems with speech UIs
- SpeechActs: Guidelines for speech UIs
- Speech UI design tools
- Multimodal UIs

3 Review
- Why do we prototype?
  - get feedback on our design from customers (faster & cheaper)
- Why use low-fi prototypes?
  - traditional methods take too long & focus designers & customers on the wrong (visual) issues
- What is the Wizard of Oz technique?
  - faking the interaction
- What is the advantage of using informal tools like SILK, DENIM, & SUEDE?
  - advantages of electronic medium (editing, reuse, distribution, etc.)
  - faster than traditional UI tools
  - do not focus designers/customers on the wrong issues
  - ability to support testing & analysis of resulting data

4 Motivation for Speech UIs: Pervasive Information Access
[Figure: Information & Services. I-Land vision by Streitz et al.]

5 UIs in the Pervasive Computing Era
- Future computing devices won’t have the same UI as current PCs
  - wide range of devices
    - small or embedded in environment
    - often w/ “alternative” I/O & w/o screens
    - information appliances
[Figure: I-Land vision by Streitz et al.]

6 Information Access via Speech Read my important

7 Industry Leaders
- Nuance Corporation
- Applications: TellMe, …
- Users: Government, Computers- Microsoft, IBM,

8 Speech UI Motivation
- Smaller devices -> difficult I/O
  - people can talk at ~90 wpm -> high speed
- “Virtually unlimited” set of commands
- Freedom for other body parts
  - imagine you are working on your car & need to know something from the manual
- Natural
  - evolutionarily selected for
    - reading, writing, & typing are not (too new)

9 Why are Speech UIs Hard to Get Right?
- Speech recognition far from perfect
  - imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time
- Speech UIs have no visible state
  - can’t see what you have done before or what effect your commands have had
- Speech UIs are hard to learn
  - how do you explore the interface? how do you find out what you can say?

10 Speech UIs Require
- Speech recognition
  - the computer understanding what the customer is saying
- Speech production (or synthesis)
  - the computer talking to the customer

11 Speech Recognition
- Continuous vs. non-continuous
- Speaker independent vs. dependent
- Speech often misunderstood by people
  - feedback via speech, facial expressions, & gesture
- Recognizers trained with real samples
  - often get gender-based problems
- Based on probabilities (HMMs - Bayes)
  - trigrams of sounds or words
- Several popular recognizers
  - Nuance, SpeechWorks, IBM ViaVoice
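The "trigrams of sounds or words" bullet can be illustrated with a small word-level example. This is only a sketch under stated assumptions: the four-sentence corpus is invented, the estimate is a plain maximum-likelihood count with no smoothing, and the function names `train_trigrams` and `trigram_prob` are hypothetical. A real recognizer would combine a model like this with acoustic (HMM) scores, which are not shown here.

```python
from collections import Counter, defaultdict

def train_trigrams(sentences):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start/end markers so the first words also have a context.
        words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            counts[(w1, w2)][w3] += 1
    return counts

def trigram_prob(counts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    following = counts.get((w1, w2))
    if not following:
        return 0.0
    return following[w3] / sum(following.values())

# Invented toy corpus (an assumption, not data from the lecture).
corpus = [
    "read my mail",
    "read my mail please",
    "read my calendar",
    "the male voice",
]
model = train_trigrams(corpus)

# Two words that sound alike get very different scores after "read my".
p_mail = trigram_prob(model, "read", "my", "mail")
p_male = trigram_prob(model, "read", "my", "male")
```

In this toy corpus "mail" follows "read my" in two of three cases and "male" never does, so the language model alone would prefer "mail" even when the two words are acoustically indistinguishable.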

12 Speech Production
- Three frequency regions of great intensity visible on oscilloscope
  - come from larynx, throat, mouth
- Two needed for recognition but “tinny”
- Can generate emotional affect in speech
  - Demo
    - anger, disgust, gladness, sadness, fear, & surprise
    - n/emot-speech.html

13 Recognition Problems
- Good recognition
  - humans < 1% error rate on dictation
  - top recognition systems get <1-X% error rates
    - computers don’t use much context
- Key is to be application specific for lower error rates
- Background noise
  - even worse recognition rates (20-40% error)
- Speed
  - better as hardware gets faster
    - in 10 years gone from 5 high-end workstations required to some speech systems running on laptops or even PDAs
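The slide quotes error rates without saying how they are measured. One common metric for dictation is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below assumes that metric; the function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of three reference words.
wer_homophone = word_error_rate("read my mail", "read my male")
# A segmentation error counts as one substitution plus one insertion.
wer_segmentation = word_error_rate("that is silly", "that is sill lea")
```

Because insertions are counted, WER can exceed 100% when the recognizer outputs many more words than were spoken.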

14 More Recognition Problems
- Isolated, short words difficult
  - common words become short
- Segmentation
  - silly versus sill lea
- Spelling
  - mail vs. male -> need to understand language

15 Speech UI Problems
- Speech UI no-nos
  - modes (no feedback)
    - certain commands only work when in specific states
  - deep hierarchies (aka voice mail hell)
- Verbose feedback wastes time/patience
  - only confirm consequential things
  - use meaningful, short cues
- Interruption
  - half-duplex communication (i.e., no barge-in support)
- Too much speech on the part of customer is tiring
- Speech takes up space in working memory
  - can cause problems when problem solving

16 SpeechActs: Guidelines for Speech UIs
- Speech interface to computer tools
  - […], calendar, weather, stock quotes
- Establish common ground & shared context
  - make sure people know where they are in the conversation
- Pacing
  - recog. delays are unnatural, make it clear when this occurs
  - barge-in lets user interrupt like in real conversations
  - tapering of prompts
  - progressive assistance: short error messages at first, longer when user needs more help
  - implicit confirmation: include confirm in next command
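Progressive assistance and implicit confirmation can be sketched in a few lines. This is a minimal illustration, not the SpeechActs implementation: the prompt wording, the three help levels, and the names `HELP_LEVELS`, `error_prompt`, and `implicit_confirmation` are all assumptions made for the example.

```python
# Hypothetical help messages, shortest first (wording is invented).
HELP_LEVELS = [
    "Sorry?",
    "Sorry, what did you say?",
    "Sorry, I didn't understand. You can say 'read', 'skip', or 'delete'.",
]

def error_prompt(consecutive_errors):
    """Return a longer help message the more times in a row recognition fails."""
    if consecutive_errors < 1:
        raise ValueError("call only after at least one recognition error")
    # Clamp to the last (most detailed) level once the list is exhausted.
    level = min(consecutive_errors, len(HELP_LEVELS)) - 1
    return HELP_LEVELS[level]

def implicit_confirmation(understood, next_prompt):
    """Fold the confirmation of the last command into the next prompt."""
    return f"{understood}. {next_prompt}"

first_error = error_prompt(1)
third_error = error_prompt(3)
confirmed = implicit_confirmation("Skipping message 3", "Message 4 is from Kim")
```

The user hears what the system understood as part of the next turn, so a misrecognition is noticed without a separate "Did you say ...?" question for every command.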

17 SpeechActs Video

18 Announcements
- Task analysis / Contextual inquiry HW
  - average = 79/100, stdev. 8.4
- Low-fi user test due Monday
  - questions
- If you haven’t gotten a laptop yet, check with Wai-ling after class

19 SUEDE: Low-fi Prototyping for Speech-based UIs
- Supports design practice
  - example scripts
  - Wizard of Oz
  - error simulation
  - iterative design (design-test-analysis)
- Informal user interface
  - no speech recognition/synthesis
  - need not be programming expert
  - fast & fluid design
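The slide does not say how SUEDE simulates errors. One plausible reading, sketched here purely as an assumption, is that during a Wizard of Oz test the wizard's chosen response is occasionally replaced by a fake recognition failure at a configurable rate. The names `simulate_recognition` and `SIMULATED_ERROR` are hypothetical.

```python
import random

SIMULATED_ERROR = "<recognition error>"  # placeholder token, not a SUEDE constant

def simulate_recognition(wizard_choice, error_rate, rng=random):
    """Return the wizard's choice, or a simulated error with probability error_rate."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    # rng.random() is in [0, 1), so 0.0 never triggers and 1.0 always does.
    if rng.random() < error_rate:
        return SIMULATED_ERROR
    return wizard_choice

# A seeded generator makes a test session repeatable.
session_rng = random.Random(42)
outcomes = [simulate_recognition("yes", 0.2, session_rng) for _ in range(10)]
```

Testing a design at a realistic error rate shows whether its error prompts and recovery paths hold up before any recognizer is integrated.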

20 [Figure labels: machine prompt, user response]

23 SUEDE Summary
- SUEDE supports speech-based UI design
  - moving from concrete examples to abstractions
  - allows designer to accept responses that aren’t exactly what they originally had in mind
  - embeds iterative design w/ design-test-analyze
- Designers using SUEDE need not be experts in speech recognition technology

24 One Vision of Future User Interfaces
- Star Trek style UI
  - verbally ask the computer for information
  - may be common in mobile/hands-busy situations
  - problem: hard to design, build, & use!
    - requires perfect speech recognition & language understanding

25 Our Vision of Future User Interfaces
- Multimodal, Context-aware UIs
  - multimodal
    - uses multiple input modalities (speech & gesture) to disambiguate
    - user says “move it to this screen” while pointing
  - context-aware
    - apps can be aware of location, user, what they are doing, …
    - people are talking -> don’t rely on speech I/O
- Problem: how to prototype & test new ideas?
  - Informal UI Design Tools!
    - combine Wizard of Oz & informal storyboarding
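The “move it to this screen” example can be made concrete with a simple time-based fusion rule: each deictic word ("it", "this") is bound to the pointing gesture closest to it in time. This is a sketch under assumptions, not a method described in the lecture; the `PointingEvent` type, the word timestamps, the one-second window, and the object names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PointingEvent:
    time: float   # seconds since the start of the session
    target: str   # name of the object or screen being pointed at

DEICTIC_WORDS = {"it", "this", "that", "here", "there"}

def resolve_deictics(timed_words, pointing_events, max_gap=1.0):
    """Attach to each deictic word the pointing target closest in time.

    timed_words is a list of (word, time) pairs from the recognizer.
    Returns a list of (word, target) pairs; target is None when the word is
    not deictic or no gesture falls within max_gap seconds of it.
    """
    resolved = []
    for word, t in timed_words:
        target = None
        if word.lower() in DEICTIC_WORDS and pointing_events:
            nearest = min(pointing_events, key=lambda e: abs(e.time - t))
            if abs(nearest.time - t) <= max_gap:
                target = nearest.target
        resolved.append((word, target))
    return resolved

# Hypothetical recognizer output and gesture log.
utterance = [("move", 0.0), ("it", 0.3), ("to", 0.6), ("this", 0.9), ("screen", 1.2)]
gestures = [PointingEvent(0.35, "photo.jpg"), PointingEvent(1.0, "wall display")]
result = resolve_deictics(utterance, gestures)
```

Neither input is sufficient alone: the speech gives the action but not its arguments, and the gestures give two targets but no action. Combined, they yield "move photo.jpg to wall display".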

26 Multimodal Error Correction
- Dictation error correction study
  - found users are better at correcting recognition errors with a different input modality
  - recognizer got it wrong the first time -> it will get it wrong the second time
    - hyperarticulating aggravates
- Correct dictation errors with
  - vocal spelling, writing, typing, etc.

27 Summary
- Speech UIs
  - may permit more natural computer access
  - allow us to use computers in more situations
  - are hard to get to work well
    - lack of visible state, tax working memory, recognition problems, etc.
- UI tools are needed for speech UI design
- Multimodal UIs address some of the problems with pure speech UIs
  - help disambiguate
  - help w/ correction