Intelligent Multimodal Interaction: Challenges and Promise Mark T. Maybury Schloss Dagstuhl, Germany 29 October 2001 MITRE


1 Intelligent Multimodal Interaction: Challenges and Promise
Mark T. Maybury, maybury@mitre.org
Schloss Dagstuhl, Germany, 29 October 2001
MITRE, www.mitre.org/resources/centers/it/maybury/mark.html
This data is the copyright and proprietary data of the MITRE Corporation. It is made available subject to Limited Rights, as defined in paragraph (a)(15) of the clause at DFAR 252.227-7013. The restrictions governing the use and disclosure of these materials are set forth in the aforesaid clause.

2 What are we talking about?
[Figure labels: Information; Perception, Cognition, Emotion; Visualization; modalities: See, Speech, Haptics/Gesture, Facial, Smell. Image source: Dr. Nahum Gershon and Ellaine Mullen, Copyright The MITRE Corporation]

3 Why Multimedia?
Above modified from Cohen, P. 1992. The role of natural language in a multimodal interface. In Proceedings of the ACM SIGGRAPH Symposium on User Interface Software and Technology (UIST), Monterey, CA, pp. 143-149.

4 Why Multimedia?
• Evidence that users prefer both:
  - Flexibility (user, task, situation), e.g., speech for text, pen for numbers
  - Efficiency and expressive power

5 Our Challenges
• Empirical studies of the optimal combination of text, audio, video, and gesture for device, cognitive load and style, task, etc., for both input and output: perceptual, cognitive, and emotional effects
• Multi* Input
  - Integration of imprecise, ambiguous, and incomplete input
  - Interpretation of uncertain multi* inputs
• Multi* Output
  - Select, design, allocate, and realize coherent, cohesive, and coordinated output
• Interaction Management
  - Natural, joyous, agent-based (?) mixed-initiative interaction
• Integrative Architectures
  - Common components, well-defined interfaces, levels of representation
• Methodology and Evaluation
  - Ethnographic studies, community tasks, corpus-based
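The multi* input challenges above (integrating imprecise, ambiguous, and incomplete input) are often approached with late fusion: each mode's recognizer returns ranked partial interpretations, and an integrator keeps only combinations that are mutually compatible and complete. The Python sketch below illustrates that idea under stated assumptions: interpretations are plain dicts of slots, confidences are treated as independent probabilities, and all names are hypothetical rather than taken from any system in this talk.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, Optional[str]]             # partial interpretation; None = unfilled slot
Hypotheses = List[Tuple[Frame, float]]       # n-best list of (frame, confidence)

def unify(a: Frame, b: Frame) -> Optional[Frame]:
    """Merge two partial frames; return None if they disagree on a filled slot."""
    merged = dict(a)
    for slot, value in b.items():
        current = merged.get(slot)
        if current is not None and value is not None and current != value:
            return None  # incompatible interpretations
        if current is None:
            merged[slot] = value
    return merged

def fuse(speech: Hypotheses, gesture: Hypotheses) -> Hypotheses:
    """Rank complete, compatible speech+gesture combinations.

    Scores multiply the two confidences, i.e. the modes are treated as
    independent evidence (a simplifying assumption).
    """
    joint = []
    for (s, ps), (g, pg) in product(speech, gesture):
        frame = unify(s, g)
        if frame is not None and all(v is not None for v in frame.values()):
            joint.append((frame, ps * pg))
    return sorted(joint, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical input: "zoom in on that" plus a pointing gesture near two map labels.
    speech = [({"action": "zoom", "object": None}, 0.7),
              ({"action": "move", "object": None}, 0.3)]
    gesture = [({"object": "Athens"}, 0.6), ({"object": "Sparta"}, 0.4)]
    best_frame, score = fuse(speech, gesture)[0]
    print(best_frame, round(score, 2))  # {'action': 'zoom', 'object': 'Athens'} 0.42
```

Neither mode alone yields a complete command here; the deictic "that" is only resolved by combining the spoken action with the gesture's referent.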


7 Multimedia Presentation Generation: "No Presentation without Representation"
Information about Socrates, Plato, and Aristotle presented in several media:
• Maps: Athens
• Video: Plato, Aristotle
• Natural language: "Socrates, Plato, and Aristotle were Greek philosophers..."
• Data and tables:
    Philosopher   Born     Died
    Socrates      470 BC   399 BC
    Plato         428 BC   348 BC
    Aristotle     384 BC   322 BC
  A second, transposed table adds works (Aristotle: Poetics; Plato: Republic; Socrates: none) and emphasis (virtue, science, conduct).
• Graphs: lifespans of the three philosophers on a 500-300 BC timeline
• Animated agents

8 Common Presentation Design Tasks
• Co-constraining
• Cascaded processes: Communication Management, Content Selection, Presentation Design, Media Allocation, Media Realization, Media Coordination, Media Layout
• Constraints: length affects layout in space or time (e.g., EYP, audio); information, task, user …; expressivity of different languages, e.g., the "ven acá" ("come here") gesture
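Slide 8 describes the design tasks as cascaded but co-constraining: the length of a realized item affects layout in space or time, which can in turn force a different media allocation. A minimal sketch of that feedback loop follows. It models only four stages (content selection, media allocation, media realization, media layout), and the fact fields (`topic`, `spatial`, `text`), the media preferences, and the line-count heuristic are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Presentation:
    visual: List[str] = field(default_factory=list)  # items laid out in space (screen)
    audio: List[str] = field(default_factory=list)   # items laid out in time (spoken)

def select_content(facts: List[dict], interests: set) -> List[dict]:
    """Content selection: keep facts on topics the user cares about."""
    return [f for f in facts if f["topic"] in interests]

def media_preferences(fact: dict) -> List[str]:
    """Media allocation: ordered candidates; spatial facts prefer graphics."""
    return ["graphics", "text", "speech"] if fact["spatial"] else ["text", "speech"]

def realize(fact: dict, medium: str) -> Tuple[str, int]:
    """Media realization: return the rendered item and the screen lines it needs."""
    if medium == "graphics":
        return f"[map: {fact['text']}]", 3
    if medium == "text":
        return fact["text"], 1 + len(fact["text"]) // 40
    return fact["text"], 0  # speech consumes time rather than screen space

def design(facts: List[dict], interests: set, screen_lines: int) -> Presentation:
    """Run the cascade; a failed layout pushes the fact to its next medium."""
    pres, used = Presentation(), 0
    for fact in select_content(facts, interests):
        for medium in media_preferences(fact):
            item, lines = realize(fact, medium)
            if medium == "speech":
                pres.audio.append(item)
                break
            if used + lines <= screen_lines:  # layout constrains allocation
                pres.visual.append(item)
                used += lines
                break
    return pres
```

On a small screen, a map of Athens may fill the available space, so a long sentence that would otherwise be shown as text is reallocated to speech; on a larger screen, both stay visual.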

9 Common Representations: Communicative Acts [Maybury, 1993; Wahlster, André, Rist 1993]
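The slide cites communicative acts as a shared representation for multimedia presentation. As a rough, hypothetical sketch (the act names and fields below are illustrative and are not taken from the cited papers), a presentation plan can be held as a tree whose upper nodes are media-independent acts and whose leaves are media-specific acts:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Act:
    """One node of a communicative-act plan (names and fields are illustrative)."""
    name: str                   # e.g. "describe", "inform", "point"
    medium: str = "abstract"    # media-independent acts stay "abstract"
    content: str = ""
    subacts: List["Act"] = field(default_factory=list)

    def leaves(self) -> List["Act"]:
        """Media-specific primitive acts, in plan order."""
        if not self.subacts:
            return [self]
        return [leaf for sub in self.subacts for leaf in sub.leaves()]

# A hypothetical plan: describe Athens in speech, then locate it with a map and a gesture.
plan = Act("describe", content="Athens", subacts=[
    Act("inform", subacts=[Act("say", "speech", "Athens is the capital of Greece.")]),
    Act("locate", subacts=[Act("show-map", "graphics", "Athens"),
                           Act("point", "gesture", "Athens")]),
])

if __name__ == "__main__":
    for act in plan.leaves():
        print(act.medium, act.name, act.content)
```

Keeping the upper levels media-independent is what lets the same plan be realized differently when, for example, no display is available.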

10 Traditional Architecture
[Diagram layers: User(s); Presentation; Dialog Control; Application Interface; Information, Applications, People]

11 Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
[Diagram components, between the User(s) and Information, Applications, People:]
• Media Input Processing: Media/Mode Analysis (Language, Graphics, Gesture, Biometrics)
• Interaction Management: Media Fusion, Intention Recognition, Discourse Modeling, User Modeling, Presentation Design
• Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models
• Media Output Rendering: Media/Mode Design (Language, Graphics, Gesture, Animated Presentation Agent)
• Application Interface
• Traditional layers overlaid for comparison: Presentation, Dialog Control, Application Interface; flow labels: Integration, Request, Initiation, Response
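The Discourse Modeling component and Discourse Model in this architecture are where references such as "it" or a pointing gesture get tied to earlier content. As a deliberately simple, hypothetical sketch (recency-based resolution over a shared history of entities, far simpler than what a system like SmartKom would need), such a model might look like this:

```python
from typing import List, Optional

class DiscourseModel:
    """History of entities introduced by any mode (speech, gesture, display)."""

    def __init__(self) -> None:
        self.history: List[dict] = []

    def add(self, entity: dict, mode: str) -> None:
        # Record which mode introduced the entity, e.g. a map click vs. a spoken name.
        self.history.append({**entity, "mode": mode})

    def resolve(self, expected_type: str) -> Optional[dict]:
        """Resolve an anaphor ('it', 'there') to the most recent entity of the right type."""
        for entity in reversed(self.history):
            if entity["type"] == expected_type:
                return entity
        return None

if __name__ == "__main__":
    dm = DiscourseModel()
    dm.add({"name": "Athens", "type": "city"}, mode="gesture")
    dm.add({"name": "Plato", "type": "person"}, mode="speech")
    print(dm.resolve("city"))  # the city introduced by the earlier map gesture
```

Because analysis and design components share one model, an entity introduced by a gesture can later be referred to in speech, and vice versa.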

12 DARPA Galaxy Communicator
[Diagram: a central Hub connected to Speech Recognition, Frame Construction, Context Tracking, Dialogue Management, Application Backend, Language Generation, Text-to-Speech Conversion, and Audio Server]
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems. Open source and documentation are available at fofoca.mitre.org and sourceforge.net/projects/communicator.
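To make the hub-and-spoke idea concrete, here is a toy, in-process sketch: servers never call each other directly; each returns a new frame to the hub, which consults its routing rules to decide where the frame goes next. The class, frame fields, and server names are hypothetical and do not reproduce GCSI's actual interfaces, which connect separately running servers over the network.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, object]

class Hub:
    """Toy hub: servers only talk to the hub, which routes frames by name."""

    def __init__(self) -> None:
        self.servers: Dict[str, Callable[[Frame], Frame]] = {}
        self.rules: List[Tuple[str, str]] = []  # (frame name, target server)

    def register(self, name: str, handler: Callable[[Frame], Frame]) -> None:
        self.servers[name] = handler

    def route(self, frame_name: str, server: str) -> None:
        self.rules.append((frame_name, server))

    def dispatch(self, frame: Frame, max_hops: int = 20) -> Tuple[Frame, List[str]]:
        """Forward the frame until no rule matches; return it plus the servers visited."""
        visited: List[str] = []
        for _ in range(max_hops):
            target = next((s for n, s in self.rules if n == frame["name"]), None)
            if target is None:
                return frame, visited
            visited.append(target)
            frame = self.servers[target](frame)
        raise RuntimeError("routing loop: too many hops")

if __name__ == "__main__":
    hub = Hub()
    # Stub servers standing in for recognition, parsing, dialogue, and synthesis.
    hub.register("recognizer", lambda f: {"name": "text", "text": "list open shelters"})
    hub.register("parser", lambda f: {"name": "parse", "request": {"kind": "shelter"}})
    hub.register("dialogue", lambda f: {"name": "reply", "text": "There is one open shelter."})
    hub.register("synthesizer", lambda f: {"name": "audio_out", "text": f["text"]})
    for frame_name, server in [("audio_in", "recognizer"), ("text", "parser"),
                               ("parse", "dialogue"), ("reply", "synthesizer")]:
        hub.route(frame_name, server)
    final, path = hub.dispatch({"name": "audio_in"})
    print(path)  # ['recognizer', 'parser', 'dialogue', 'synthesizer']
```

Centralizing the routing means a component can be swapped (another recognizer, say) by re-registering one server, without touching the others.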

13 An Example: Communicator-Compliant Emergency Management Interface
All components communicate through the Hub:
• Speech recognition converts speech to text (MIT SUMMIT engine and wrapper)
• Frame construction extracts information from input text (Colorado Phoenix engine, MITRE wrapper)
• MITRE dialogue management tracks information, decides what to do, and formulates answers
• MITRE SQL generation converts abstract requests to SQL for the database (open source Postgres engine, MITRE wrapper)
• Text-to-speech converts output text to audio (CMU Festival engine, Colorado wrapper)
• MIT phone connectivity connects audio to a telephone line
• MITRE I/O podium displays input and output text
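The MITRE SQL generation step above converts abstract requests into SQL. A minimal sketch of that conversion follows, under hypothetical assumptions: the request arrives as a dict of slot constraints, and the database has a `resources` table with `name`, `kind`, `city`, and `status` columns. It uses Python's built-in `sqlite3` so it runs standalone instead of against Postgres; with a Postgres driver such as psycopg2 the placeholder would be `%s` rather than `?`.

```python
import sqlite3
from typing import Dict, List, Tuple

# Whitelist of frame slots and the (hypothetical) columns they map to. Column names
# cannot be bound as SQL parameters, so only known slots are allowed through.
SLOT_TO_COLUMN = {"kind": "kind", "city": "city", "status": "status"}

def frame_to_sql(constraints: Dict[str, str]) -> Tuple[str, List[str]]:
    """Convert an abstract request's constraints into a parameterized SELECT."""
    clauses, params = [], []
    for slot, value in constraints.items():
        if slot not in SLOT_TO_COLUMN:
            raise ValueError(f"unsupported slot: {slot}")
        clauses.append(f"{SLOT_TO_COLUMN[slot]} = ?")
        params.append(value)
    sql = "SELECT name FROM resources"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE resources (name TEXT, kind TEXT, city TEXT, status TEXT)")
    db.executemany("INSERT INTO resources VALUES (?, ?, ?, ?)", [
        ("Lincoln School", "shelter", "Boston", "open"),
        ("Central Gym", "shelter", "Boston", "full"),
        ("City Hospital", "hospital", "Boston", "open"),
    ])
    sql, params = frame_to_sql({"kind": "shelter", "status": "open"})
    print(db.execute(sql, params).fetchall())  # [('Lincoln School',)]
```

Only the values are bound as parameters, which keeps recognized user input out of the SQL text itself.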

14 [Figure: brain imaging of auditory and visual word processing, panels a-f]
Conditions (subtractions):
• Sensory: passive words - fixation
• Output: repeat words - passive words
• Association: generate use - repeat words (e.g., cake -> eat)
• Semantic: monitor semantic category - passive words
Regions labeled in the panels include temporoparietal, bilateral superior posterior temporal, and inferior anterior cingulate areas (1.6 cm above the AC-PC line); primary auditory cortex (2); occipital cortex (4 cm above the AC-PC line); Rolandic cortex (anterior superior motor cortex); and inferior anterior frontal cortex (Brodmann area 47). Annotations: left inferior frontal (semantic association); anterior cingulate gyrus (attentional system for action selection, e.g., pick dangerous animals).
Source: Science or Nature, Univ Washington

15 Evaluation Techniques
• IUI evaluation is harder than HCI evaluation
  - User influences interface behavior (i.e., user model)
  - Interface influences user behavior (e.g., critiquing, cooperating, challenging)
  - Varying task complexity, environment
  - Requires more careful evaluation
• Many techniques
  - "Heuristic" evaluation (e.g., cognitive walkthrough)
  - Analytic/formal/theoretic (e.g., GOMS, CCT, ICS): model the resources required, task complexity, and time to complete in order to predict performance and critique the interface
  - Ablation studies
  - Wizard-of-Oz, simulations
  - Instrumentation of live environments
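As an example of the analytic techniques listed above, the Keystroke-Level Model (KLM), the simplest member of the GOMS family, predicts expert execution time by summing standard operator times. The sketch below uses the commonly cited values from Card, Moran, and Newell (K for an average skilled typist); the two command sequences it compares are hypothetical.

```python
# Keystroke-Level Model operator times in seconds, as commonly cited from
# Card, Moran & Newell (K is the value for an average skilled typist).
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke or mouse-button press
    "P": 1.10,  # point to a target with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}

def predict_seconds(operators: str) -> float:
    """Predicted expert execution time for an operator sequence such as 'MHPK'."""
    unknown = set(operators) - set(OPERATOR_SECONDS)
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    return sum(OPERATOR_SECONDS[op] for op in operators)

if __name__ == "__main__":
    # Two hypothetical ways to issue the same command.
    menu = "MHPK"      # think, reach for the mouse, point at the menu item, click
    shortcut = "MKK"   # think, press a two-key shortcut
    print(f"menu: {predict_seconds(menu):.2f} s, shortcut: {predict_seconds(shortcut):.2f} s")
```

A model like this predicts performance before any user study is run, which is what makes it useful for critiquing an interface design early.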

16 Instrumented Evaluation Process (Source: DARPA IC&V)
• Instrumented Interactive Application
• Interaction Logging
• Indexed, Enriched Log
• Replay, Data Visualization, & Annotation
• Analysis and Evaluation
• Corpus-based Adaptation
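The first stages of this process (interaction logging into a log that can be replayed and analyzed) can be sketched with JSON Lines files. The event schema (`t`, `source`, `event`, plus free-form details) and all function names are hypothetical; a real instrumentation layer would also capture application state and support annotation.

```python
import json
import time
from collections import Counter
from typing import Dict, Iterator

class InteractionLogger:
    """Appends timestamped interaction events to a JSON Lines file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def log(self, source: str, event: str, **details) -> None:
        record = {"t": time.time(), "source": source, "event": event, **details}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

def replay(path: str) -> Iterator[dict]:
    """Yield events in logged order, with times relative to the first event."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    if not records:
        return
    start = records[0]["t"]
    for record in records:
        yield {**record, "t": record["t"] - start}

def event_counts(path: str) -> Dict[str, int]:
    """A first analysis pass: how often each source:event pair occurred."""
    return dict(Counter(f"{r['source']}:{r['event']}" for r in replay(path)))
```

For example, a session might log `logger.log("user", "speech", text="show open shelters")` followed by `logger.log("system", "display", item="map")`; `replay` then returns both events with relative timestamps, and `event_counts` gives the tallies a later analysis or adaptation step could use.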

17 WOSIT and Collagen
• Tutoring Agent: Collagen (MERL)
• Instrumentation Software: JOSIT
• End User Application: TARGETS
• Interactions: observe, simulate, interpret, perform, communicate, interact
JOSIT: http://www.mitre.org/tech_transfer/josit/
WOSIT: http://www.mitre.org/technology/wosit/

18 Summary: Our Challenges
• Empirical studies of the optimal combination of text, audio, video, and gesture for device, cognitive load and style, task, etc., for both input and output: perceptual, cognitive, and emotional effects
• Multi* Input
  - Integration of imprecise, ambiguous, and incomplete input
  - Interpretation of uncertain multi* inputs
• Multi* Output
  - Select, design, allocate, and realize coherent, cohesive, and coordinated output
• Interaction Management
  - Natural, joyous, agent-based (?) mixed-initiative interaction
• Integrative Architectures
  - Common components, well-defined interfaces, levels of representation
• Methodology and Evaluation
  - Ethnographic studies, community tasks, corpus-based

19 Conclusion
• Emerging techniques for parsing simultaneous multimedia input, generating coordinated multimedia output, and tailoring interaction to the user, task, and situation
• Laboratory prototypes that integrate these techniques to support multimedia dialogue and agent-based interaction
• Personalization is increasing; privacy is a concern
• Range of application areas: decision support, information retrieval, education and training, entertainment
• Potential benefits
  - Increase the raw bit rate of information flow (the right media/modality mix for the job)
  - Increase the relevance of information (e.g., information selection, tailored presentation)
  - Simplify and speed task performance via interface agents (e.g., speech inflections, facial expressions, hand gestures, task delegation)

