Designing Robust Multimodal Systems for Diverse Users and Mobile Environments
Sharon Oviatt
Center for Human Computer Communication, Department of Computer Science, OGI

Introduction to Perceptive Multimodal Interfaces
– Multimodal interfaces recognize combined natural human input modes (e.g., speech & pen, speech & lip movements)
– A radical departure from GUIs in basic features, interface design & architectural underpinnings
– Rapid development of bimodal systems in the 1990s
– New fusion & language processing techniques
– Diversification of mode combinations & applications
– More general & robust hybrid architectures

Advantages of Multimodal Interfaces
– Flexibility & expressive power
– Support for users’ preferred interaction style
– Accommodate more users, tasks & environments
– Improved error handling & robustness
– Support for new forms of computing, including mobile & pervasive interfaces
– Permit multifunctional & tailored mobile interfaces, adapted to user, task & environment

The Challenge of Robustness: Unimodal Speech Technology’s Achilles’ Heel
– Recognition errors currently limit commercialization of speech technology, especially for:
  – Spontaneous interactive speech
  – Diverse speakers & speaking styles (e.g., accented speech)
  – Speech in natural field environments (e.g., mobile use)
– A 20-50% drop in recognition accuracy is typical under real-world usage conditions

Improved Error Handling in Flexible Multimodal Interfaces
– Users can avoid errors through mode selection
– Users’ multimodal language is simplified, which reduces the complexity of natural language processing & avoids errors
– Users switch modes after system errors, which undercuts error spirals & facilitates recovery
– Multimodal architectures potentially can support “mutual disambiguation” of input signals (see the sketch below)

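As a rough sketch of how mutual disambiguation can work (the labels, semantic types, and probabilities below are hypothetical; QuickSet's actual fusion unifies typed feature structures, as noted on the Processing & Architecture slide), the two recognizers' n-best lists are jointly reranked and semantically incompatible pairings are filtered out, so evidence from one mode can pull a lower-ranked hypothesis in the other mode up to the top:

```python
# Toy sketch of mutual disambiguation between speech and gesture
# n-best lists. Labels, types, and probabilities are hypothetical.

from itertools import product

# Each hypothesis: (interpretation, semantic command type, probability)
speech_nbest = [
    ("zoom to Main St", "zoom", 0.45),   # speech recognizer's top guess
    ("pan to Main St", "pan", 0.35),     # the command the user actually gave
    ("pan to Maine St", "pan", 0.20),
]
gesture_nbest = [
    ("arrow", "pan", 0.55),              # an arrow gesture implies a pan
    ("line", "barrier", 0.45),
]

def fuse(speech, gestures):
    """Jointly rerank all semantically compatible speech-gesture pairs."""
    joint = [
        (s_label, g_label, s_p * g_p)
        for (s_label, s_type, s_p), (g_label, g_type, g_p)
        in product(speech, gestures)
        if s_type == g_type              # incompatible pairs are discarded
    ]
    return sorted(joint, key=lambda h: h[2], reverse=True)

print(fuse(speech_nbest, gesture_nbest)[0])
# ('pan to Main St', 'arrow', 0.1925): the correct 'pan' reading, ranked
# second by the speech recognizer alone, is pulled up to first because
# no gesture hypothesis is compatible with the misrecognized 'zoom'.
```

This is the sense in which each mode disambiguates the other: neither signal alone is reliable, but the joint interpretation can still be correct.
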
Example of Mutual Disambiguation: QuickSet Interface during Multimodal “PAN” Command
Processing & Architecture
– Speech & gestures processed in parallel
– Statistically ranked unification of semantic interpretations
– Multi-agent architecture coordinates signal recognition, language processing & multimodal integration

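A minimal sketch of the unification step (a toy dict-based version; the actual system unifies typed feature structures over a multi-agent architecture and ranks the surviving interpretations by combined recognizer scores): partial semantic frames from each mode merge when none of their features conflict.

```python
# Toy sketch of semantic unification: merge partial frames from the
# speech and gesture interpreters, failing on conflicting features.
# Real typed feature structures also consult a type hierarchy.

def unify(fs1, fs2):
    """Return the merged frame, or None if any feature conflicts."""
    merged = dict(fs1)
    for key, value in fs2.items():
        if key in merged and merged[key] != value:
            return None                  # conflict: unification fails
        merged[key] = value
    return merged

# Speech supplies the command type; the pen gesture supplies the location.
speech_frame  = {"command": "pan", "object": "map"}
gesture_frame = {"location": (45.52, -122.68)}   # hypothetical map point

print(unify(speech_frame, gesture_frame))
# {'command': 'pan', 'object': 'map', 'location': (45.52, -122.68)}
```

Statistically ranked unification then orders all unifiable speech-gesture pairs by their combined probabilities, as in the reranking sketch above.
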
General Research Questions
– To what extent can a multimodal system support mutual disambiguation of input signals?
– How much is robustness improved in a multimodal system, compared with a unimodal one?
– In what usage contexts, and for which user groups, is robustness most enhanced by a multimodal system?
– What are the asymmetries between modes in disambiguation likelihoods?

Study 1: Research Method
– QuickSet testing with map-based tasks (community fire & flood management)
– 16 users: 8 native speakers & 8 accented speakers (varied Asian, European & African accents)
– Research design: completely crossed factorial with between-subjects factors:
  (1) Speaker status (accented vs native)
  (2) Gender
– Corpus of 2,000 multimodal commands processed by QuickSet

Videotape: Multimodal system processing for accented and mobile users

Study 1: Results
– 1 in 8 multimodal commands succeeded due to mutual disambiguation (MD) of input signals
– MD levels were significantly higher for accented speakers than for native ones: 15% vs 8.5% of utterances
– The ratio of speech pull-ups to total signal pull-ups differed between groups: .65 for accented vs .35 for native speakers
– Results replicated across signal-level & parse-level MD

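To make the metric concrete (my paraphrase; the studies' exact scoring definition may differ in detail), a command is scored as an MD case when the fused interpretation is correct even though at least one component recognizer was wrong at the top of its own n-best list:

\[
\text{MD rate} \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\,\text{fused interpretation}_i \text{ correct} \;\wedge\; \text{speech}_i \text{ or gesture}_i \text{ wrong at rank 1}\,\right]
\]

On this reading, "1 in 8 commands" corresponds to an overall MD rate of about 12.5%, which sits between the 15% accented and 8.5% native group rates.
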
Table 1: Mutual Disambiguation Rates for Native versus Accented Speakers

Table 2: Recognition Rate Differentials between Native and Accented Speakers for Speech, Gesture and Multimodal Commands

Study 1: Results (cont.)
– Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded a 41.3% reduction in the total speech error rate
– No gender or practice effects were found in MD rates

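As a worked illustration with hypothetical baseline numbers (the slide does not give the absolute error rates), a 41.3% relative reduction means the multimodal architecture removes 41.3% of the speech-only errors:

\[
\text{relative reduction} \;=\; \frac{E_{\text{speech-only}} - E_{\text{multimodal}}}{E_{\text{speech-only}}}, \qquad \text{e.g.}\;\; \frac{0.300 - 0.176}{0.300} \approx 41.3\%.
\]
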
Study 2: Research Method
– QuickSet testing with the same 100 map-based tasks
– Main study:
  – 16 users with a high-end microphone (close-talking, noise-canceling)
  – Research design: completely crossed factorial:
    (1) Usage context: stationary vs mobile (within subjects)
    (2) Gender
– Replication:
  – 6 users with a low-end microphone (built-in, no noise cancellation)
  – Stationary vs mobile use compared

Study 2: Research Analyses
– Corpus of 2,600 multimodal commands
– Signal amplitude, background noise level & signal-to-noise ratio (SNR) estimated for each command (see the sketch below)
– Mutual disambiguation & multimodal system recognition rates analyzed in relation to the dynamic signal data

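As a rough sketch of how per-command SNR can be estimated (a generic RMS-based approach; the slide does not describe the study's actual signal-processing pipeline), given segmented command audio and a nearby stretch of background noise:

```python
# Minimal sketch of per-command SNR estimation from audio samples,
# assuming each command and a nearby background-noise segment can be
# isolated. A generic RMS-based estimate, not the study's pipeline.

import numpy as np

def rms(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of an audio segment."""
    return float(np.sqrt(np.mean(np.square(samples, dtype=np.float64))))

def snr_db(command: np.ndarray, noise: np.ndarray) -> float:
    """Estimate SNR in decibels: 20 * log10(signal RMS / noise RMS)."""
    return 20.0 * np.log10(rms(command) / rms(noise))

# Hypothetical example: a 1-second command at 16 kHz plus background noise.
rng = np.random.default_rng(0)
noise_floor = 0.01 * rng.standard_normal(16_000)
speech = 0.1 * np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16_000))
command_audio = speech + noise_floor

print(f"SNR ~ {snr_db(command_audio, noise_floor):.1f} dB")
```
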
Mobile user with hand-held system & close-talking headset in a moderately noisy environment (40-60 dB noise)

Mobile research infrastructure, with user instrumentation and researcher field station

Study 2: Results
– 1 in 7 multimodal commands succeeded due to mutual disambiguation of input signals
– MD levels were significantly higher during mobile than stationary system use: 16% vs 9.5% of utterances
– Results replicated across signal-level and parse-level MD

Table 3: Mutual Disambiguation Rates during Stationary and Mobile System Use

Table 4: Recognition Rate Differentials during Stationary and Mobile System Use for Speech, Gesture and Multimodal Commands

Study 2: Results (cont.)
– Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded a 19-35% reduction in the total speech error rate (for the noise-canceling & built-in microphones, respectively)
– No gender effects were found in MD rates

Conclusions
– Multimodal architectures can support mutual disambiguation & improved robustness over unimodal processing
– Error rate reduction can be substantial: 20-40%
– Multimodal systems can reduce or close the recognition rate gap for challenging users (accented speakers) & usage contexts (mobile)
– Error-prone recognition technologies can be stabilized within a multimodal architecture, so that they function more reliably in real-world contexts

Future Directions & Challenges
– Intelligently adaptive processing, tailored for mobile usage patterns & diverse users
– Improved language & dialogue processing techniques, and hybrid multimodal architectures
– Novel mobile & pervasive multimodal concepts
– Break the robustness barrier: reduce error rates
(For more information—