Intelligent Multimodal Interaction: Challenges and Promise Mark T. Maybury Schloss Dagstuhl, Germany 29 October 2001 MITRE

Intelligent Multimodal Interaction: Challenges and Promise Mark T. Maybury Schloss Dagstuhl, Germany 29 October 2001 MITRE This data is the copyright and proprietary data of the MITRE Corporation. It is made available subject to Limited Rights, as defined in paragraph (a) (15) of the clause at DFAR The restrictions governing the use and disclosure of these materials are set forth in the aforesaid clause.

MITRE What are we talking about? [Diagram: information is perceived and processed through perception, cognition, and emotion, across modalities including visualization, speech, haptics/gesture, facial expression, sight, and smell. Image source: Dr. Nahum Gershon and Ellaine Mullen, copyright The MITRE Corporation.]

MITRE Why Multimedia? [Figure omitted; adapted from Cohen, P. 1992. The role of natural language in a multimodal interface. In Proceedings of the ACM SIGGRAPH Symposium on User Interface Software and Technology (UIST), Monterey, CA.]

MITRE Why Multimedia?
- Evidence that users prefer both:
  - Flexibility (user, task, situation), e.g., speech for text, pen for numbers
  - Efficiency and expressive power

Our Challenges
- Empirical studies of the optimal combination of text, audio, video, and gesture for device, cognitive load and style, task, etc., for both input and output: perceptual, cognitive, and emotional effect
- Multi* Input (see the sketch below)
  - Integration of imprecise, ambiguous, and incomplete input
  - Interpretation of uncertain multi* inputs
- Multi* Output
  - Select, design, allocate, and realize coherent, cohesive, and coordinated output
- Interaction Management
  - Natural, joyous, agent-based (?) mixed-initiative interaction
- Integrative Architectures
  - Common components, well-defined interfaces, levels of representation
- Methodology and Evaluation
  - Ethnographic studies, community tasks, corpus-based
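One way to picture the input-side challenge of interpreting uncertain multi* inputs is late fusion over scored recognition hypotheses. The sketch below is a minimal illustration in Python, assuming hypothetical partial semantic frames and confidence scores from a speech and a gesture recognizer; it is not a system from the talk.

# Illustrative late-fusion sketch (hypothetical data structures, not from the talk):
# combine scored speech and gesture hypotheses into ranked joint interpretations.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    content: dict    # partial semantic frame, e.g. {"act": "move", "object": None}
    score: float     # recognizer confidence in [0, 1]

def unify(a: dict, b: dict):
    """Merge two partial frames; return None if any filled slot conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] not in (None, value):
            return None
        if merged.get(key) is None:
            merged[key] = value
    return merged

def fuse(speech, gesture):
    """Score every compatible speech/gesture pairing and rank the joint readings."""
    joint = []
    for s in speech:
        for g in gesture:
            frame = unify(s.content, g.content)
            if frame is not None:
                joint.append(Hypothesis(frame, s.score * g.score))
    return sorted(joint, key=lambda h: h.score, reverse=True)

if __name__ == "__main__":
    speech = [Hypothesis({"act": "move", "object": None}, 0.8),
              Hypothesis({"act": "remove", "object": None}, 0.2)]
    gesture = [Hypothesis({"object": "unit-7", "location": "grid-D4"}, 0.9)]
    for h in fuse(speech, gesture):
        print(h.score, h.content)

Conflicting readings simply drop out of the ranked list, which is one simple way to handle imprecise and ambiguous inputs without committing to a single recognizer's output.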


Multimedia Presentation Generation: "No Presentation without Representation"
[Slide shows the same underlying data about Socrates, Plato, and Aristotle realized in multiple media: maps (Athens), video, natural language ("Socrates, Plato, and Aristotle were Greek philosophers..."), tables, graphs (lifespans, BC), and animated agents.]

  Philosopher   Born     Died     Works      Emphasis
  Aristotle     384 BC   322 BC   Poetics    Virtue
  Plato         428 BC   348 BC   Republic   Science
  Socrates      470 BC   399 BC   None       Conduct
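The slogan "No Presentation without Representation" can be made concrete with a small sketch: a single underlying record set is realized either as a table or as a natural-language sentence. The data structures and renderers below are hypothetical illustrations, not the generation components shown on the slide.

# Sketch: one underlying representation, two surface renderings (table vs. text).
# The records and renderers are invented for illustration.

philosophers = [
    {"name": "Socrates", "born": "470 BC", "died": "399 BC"},
    {"name": "Plato", "born": "428 BC", "died": "348 BC"},
    {"name": "Aristotle", "born": "384 BC", "died": "322 BC"},
]

def render_table(records):
    """Realize the representation as an aligned plain-text table."""
    header = f"{'Philosopher':<12}{'Born':<10}{'Died':<10}"
    rows = [f"{r['name']:<12}{r['born']:<10}{r['died']:<10}" for r in records]
    return "\n".join([header] + rows)

def render_text(records):
    """Realize the same representation as a natural-language summary."""
    names = ", ".join(r["name"] for r in records[:-1]) + f", and {records[-1]['name']}"
    return f"{names} were Greek philosophers."

print(render_table(philosophers))
print(render_text(philosophers))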

MITRE Common Presentation Design Tasks
- Co-constraining
- Cascaded processes (a sketch follows): Communication Management -> Content Selection -> Presentation Design -> Media Allocation -> Media Realization -> Media Coordination -> Media Layout
- Annotations from the diagram:
  - Content selection depends on information, task, user, ...
  - Length affects layout in space or time (e.g., EYP, audio)
  - Media allocation depends on the expressivity of different languages, e.g., the "ven aca" ("come here") gesture
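A minimal sketch of such a cascade, with a simple co-constraining feedback loop from layout back to content selection, follows. The stage implementations and data are invented for illustration and stand in for the far richer processes named above.

# Sketch of a cascaded presentation-design pipeline. Stage names follow the
# slide; the implementations are placeholders for illustration only.

def select_content(goal, constraints):
    """Pick what to communicate, given the goal and any downstream constraints."""
    items = goal["facts"]
    limit = constraints.get("max_items", len(items))
    return items[:limit]

def allocate_media(items):
    """Decide which medium carries each item (e.g., numeric values to a table)."""
    return [(item, "table" if isinstance(item.get("value"), (int, float)) else "text")
            for item in items]

def realize_and_layout(allocated, constraints):
    """Realize each item in its medium and check the layout budget."""
    lines = [f"[{medium}] {item}" for item, medium in allocated]
    fits = len(lines) <= constraints.get("max_lines", 10)
    return lines, fits

def design_presentation(goal, constraints):
    """Run the cascade; if layout fails, tighten content selection and retry."""
    while True:
        items = select_content(goal, constraints)
        lines, fits = realize_and_layout(allocate_media(items), constraints)
        if fits or len(items) <= 1:
            return lines
        constraints["max_items"] = len(items) - 1   # co-constraining feedback

goal = {"facts": [{"name": "population", "value": 3.8}, {"name": "capital", "value": "Athens"}]}
print(design_presentation(goal, {"max_lines": 1}))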

MITRE Common Representations: Communicative Acts [Maybury 1993; Wahlster, André, and Rist 1993]

MITRE Traditional Architecture
[Diagram: user(s) and people interact with information and applications through Presentation, Dialog Control, and Application Interface layers.]

Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
[Diagram: user(s) and people interact with information and applications through a layered architecture: Media Input Processing with media/mode analysis (language, graphics, gesture, biometrics) and media fusion; Interaction Management (intention recognition, discourse modeling, user modeling, presentation design); Representation and Inference over user, discourse, domain, task, and media models; Media/Mode Design (language, graphics, gesture, animated presentation agent) with media output rendering; plus Presentation, Dialog Control, and an Application Interface handling request initiation, response, and integration.]

MITRE DARPA Galaxy Communicator
[Diagram: a central Hub connected to servers for Language Generation, Text-to-Speech Conversion, Audio Server, Dialogue Management, Application Backend, Context Tracking, Frame Construction, and Speech Recognition.]
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems. Open source and documentation available at fofoca.mitre.org and sourceforge.net/projects/communicator.
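The hub-and-spoke, message-based idea can be illustrated with a toy dispatcher. This is an assumption-laden sketch, not the actual GCSI API; the real infrastructure and its documentation are at the URLs above, and the server behaviors below are invented placeholders.

# Toy hub-and-spoke dispatcher: servers register by name and the hub routes a
# frame through a scripted pipeline. Illustrative only; not the GCSI API.

class Hub:
    def __init__(self):
        self.servers = {}          # name -> callable(frame) -> frame

    def register(self, name, handler):
        self.servers[name] = handler

    def route(self, frame, pipeline):
        """Send the frame to each named server in turn, accumulating results."""
        for name in pipeline:
            frame = self.servers[name](frame)
        return frame

# Hypothetical servers standing in for recognition, parsing, and dialogue management.
def speech_recognition(frame):
    frame["text"] = "when does flight 101 arrive"
    return frame

def frame_construction(frame):
    frame["request"] = {"type": "arrival_time", "flight": "101"}
    return frame

def dialogue_management(frame):
    frame["reply"] = f"Flight {frame['request']['flight']} arrives at 6:40 PM."
    return frame

hub = Hub()
hub.register("recognizer", speech_recognition)
hub.register("parser", frame_construction)
hub.register("dialogue", dialogue_management)
result = hub.route({"audio": "<captured utterance>"}, ["recognizer", "parser", "dialogue"])
print(result["reply"])

Because every server only sees and annotates the shared frame, components can be swapped or added by re-registering handlers and changing the pipeline order, which is the appeal of the hub-and-spoke design.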

MITRE An Example: Communicator-Compliant Emergency Management Interface
[Diagram: a Hub connects the following components:]
- MITRE I/O podium displays input and output text
- MIT phone connectivity connects audio to a telephone line
- Speech recognition converts speech to text (MIT SUMMIT engine and wrapper)
- Frame construction extracts information from input text (Colorado Phoenix engine, MITRE wrapper)
- MITRE dialogue management tracks information, decides what to do, and formulates answers
- MITRE SQL generation converts abstract requests to SQL against the database (open source PostGres engine, MITRE wrapper); sketched below
- Text-to-speech converts output text to audio (CMU Festival engine, Colorado wrapper)
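The "SQL generation converts abstract requests to SQL" step can be pictured as mapping frame slots to query conditions. The frame slots, table, and column names below are hypothetical, not those of the MITRE component.

# Sketch: turn an abstract request frame into a parameterized SQL query.
# Frame slots and schema names are invented for illustration.

def frame_to_sql(frame):
    """Build a SELECT over a hypothetical 'incidents' table from a request frame."""
    conditions, params = [], []
    for slot, column in [("region", "region"), ("incident_type", "incident_type")]:
        if frame.get(slot) is not None:
            conditions.append(f"{column} = %s")
            params.append(frame[slot])
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT id, location, status FROM incidents{where}", params

query, params = frame_to_sql({"region": "north sector", "incident_type": "flood"})
print(query)   # SELECT id, location, status FROM incidents WHERE region = %s AND incident_type = %s
print(params)  # ['north sector', 'flood']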

[Slide: brain-imaging figure (source: Science or Nature; Univ. Washington) comparing auditory and visual word processing across four subtractions: SENSORY (passive words - fixation), OUTPUT (repeat words - passive words), ASSOCIATION (generate a use - repeat words, e.g., cake -> eat), and SEMANTIC (monitor semantic category - passive words). Labeled regions include primary auditory cortex, temporoparietal and bilateral superior posterior temporal areas, occipital cortex, Rolandic (anterior superior motor) cortex, inferior anterior frontal cortex (Brodmann area 47), left inferior frontal cortex (semantic association), and the anterior cingulate gyrus (attentional system for action selection, e.g., picking dangerous animals).]

MITRE Evaluation Techniques
- IUI evaluation is harder than HCI evaluation
  - User influences interface behavior (i.e., user model)
  - Interface influences user behavior (e.g., critiquing, cooperating, challenging)
  - Varying task complexity, environment
  - Requires more careful evaluation
- Many techniques
  - "Heuristic evaluation", e.g., cognitive walk-through
  - Analytic/formal/theoretic (e.g., GOMS, CCT, ICS): model resources required, task complexity, and time to complete in order to predict performance and critique the interface (see the GOMS sketch below)
  - Ablation studies
  - Wizard-of-Oz studies, simulations
  - Instrumentation of live environments
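Among the analytic techniques listed, GOMS at the keystroke level predicts task time by summing standard operator times. The sketch below uses commonly cited keystroke-level operator estimates and a hypothetical operator sequence; treat the numbers and the task as illustrative.

# Keystroke-Level Model (KLM) sketch: predict task time by summing operator times.
# Operator estimates are commonly cited KLM values (seconds); the task sequence
# below is a hypothetical "open a menu and type a short name" example.

OPERATOR_TIMES = {
    "K": 0.2,    # press a key (skilled typist)
    "P": 1.1,    # point with mouse to a target
    "B": 0.1,    # press or release a mouse button
    "H": 0.4,    # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation
}

def predict_time(sequence):
    """Sum operator times for a KLM operator string such as 'H M P B M K K K K'."""
    return sum(OPERATOR_TIMES[op] for op in sequence.split())

task = "H M P B M K K K K"
print(f"Predicted time: {predict_time(task):.2f} s")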

MITRE Instrumented Evaluation Process
[Diagram: an instrumented interactive application feeds interaction logging, producing an indexed, enriched log; the log supports replay, data visualization, and annotation as well as analysis and evaluation, which in turn drive corpus-based adaptation. Source: DARPA IC&V]
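The loop above rests on timestamped, structured interaction logs that can later be indexed, annotated, replayed, and mined. A minimal logging sketch follows; the event names and fields are hypothetical rather than any particular instrumentation API.

# Minimal interaction-logging sketch: append timestamped, structured events to a
# JSON-lines log that can later be indexed, annotated, and replayed.
# Event names and fields are illustrative, not a specific instrumentation API.

import json
import time

class InteractionLogger:
    def __init__(self, path):
        self.path = path

    def log(self, event_type, **fields):
        record = {"time": time.time(), "event": event_type, **fields}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def replay(self):
        """Yield logged events in order for later analysis or visualization."""
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)

logger = InteractionLogger("session.log")
logger.log("speech_input", text="show me the map of region four", confidence=0.82)
logger.log("gesture_input", kind="point", target="region-4")
logger.log("system_output", media=["map", "speech"], duration_ms=1800)
for event in logger.replay():
    print(event)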

MITRE WOSIT and Collagen
[Diagram: an end-user application (TARGETS) is instrumented with JOSIT; the Collagen tutoring agent (MERL) observes and interprets user actions, simulates and performs actions, and communicates and interacts with the user. JOSIT and WOSIT are instrumentation software.]

Summary: Our Challenges
- Empirical studies of the optimal combination of text, audio, video, and gesture for device, cognitive load and style, task, etc., for both input and output: perceptual, cognitive, and emotional effect
- Multi* Input
  - Integration of imprecise, ambiguous, and incomplete input
  - Interpretation of uncertain multi* inputs
- Multi* Output
  - Select, design, allocate, and realize coherent, cohesive, and coordinated output
- Interaction Management
  - Natural, joyous, agent-based (?) mixed-initiative interaction
- Integrative Architectures
  - Common components, well-defined interfaces, levels of representation
- Methodology and Evaluation
  - Ethnographic studies, community tasks, corpus-based

MITRE Conclusion
- Emerging techniques for parsing simultaneous multimedia input, generating coordinated multimedia output, and tailoring interaction to the user, task, and situation
- Laboratory prototypes that integrate these to support multimedia dialogue and agent-based interaction
- Personalization is increasing; privacy is a concern
- Range of application areas: decision support, information retrieval, education and training, entertainment
- Potential benefits
  - Increase the raw bit rate of information flow (the right media/modality mix for the job)
  - Increase the relevance of information (e.g., information selection, tailored presentation)
  - Simplify and speed task performance via interface agents (e.g., speech inflections, facial expressions, hand gestures, task delegation)