The Architecture Dream Team, Schloss Dagstuhl, Germany, October 2001.

Page 2 Would you build your dream house without a blueprint?

Page 3 What you hope to get

Page 4 … what you might get

Page 5: Today's Conventional Architecture (diagram): user(s) interact through a presentation layer, which sits above dialog control and an application interface connecting to information, applications, and people.

Page 6: CHAMELEON Platform (IntelliMedia WorkBench), Paul Mc Kevitt (diagram): components include a speech synthesizer, speech recognizer, laser pointer, blackboard, NL parser, microphone array, domain model, gesture recognizer, dialogue manager, frame semantics, and Topsy.

Page 7: MiPad Architecture, Derek Jacoby, Microsoft (diagram): a typical Dr. Who application.

Page 8: Harry Bunt (diagram): input interpretation and output synthesis are linked through context management and dialogue management to the application via an API; the context comprises linguistic, semantic, physical, perceptual, cognitive, and social dimensions, along with a pending context.

Page 9: Art Exploration, Oliviero Stock (diagram): an input analyzer handles both explicit input (e.g., pointing) and implicit input (e.g., movement); a composer engine draws on a physical-space model, hypermedia information, visitor models, and the interaction history to build the presentation: an audio message to the headphone, plus links and an image to the UI.

Page 10 COLLAGEN Sidner et al.

Page 11: IBM's Responsive Information Architect (RIA), Michelle Zhou (diagram): user speech and gesture feed a multimodal interpreter; a conversational facilitator and a presentation broker drive a media producer, visual designer, and language designer, all consulting models of the design, domain, user, conversation, and environment, with an IRIS info server behind them.

Page 12: Interact, Kristiina Jokinen (diagram): an input manager (ASR, language understanding, topic recognition), a dialogue manager with dialogue agents/acts (e.g., question, answer, state) and task agents/acts, and a presentation manager (TTS, generator agents), all sharing an information storage and database.

Page 13: EMBASSI Conceptual Architecture
- Z-axis:
  - underlying hardware infrastructure
  - software infrastructure (agent / distributed-computing middleware)
  - functional building blocks of the conceptual architecture (Multimodal Assistant Componentware, MAC)
  - application-level assistants (not shown)
- XY-plane of the MAC:
  - dialogic assistance
  - effectual assistance
  - situational assistance
  - explicit and implied generic (i.e., application-independent) ontologies defining the component interfaces

Page 14: EMBASSI Architecture (diagram; original labels in German): unimodal I/O devices at the "lexical" level (GUI input, speech recognition, gesture recognition, gaze-direction recognition; audio output, display, avatar renderer); multimodal data preparation at the "syntactic" level (GUI analysis, speech analysis, media fusion (PMI), presentation (PMO), GUI renderer, text generation, avatar controller); dialogue management and assistance methods at the "semantic" level, with a context manager and environment/situation, user, application, and resource databases; a strategy level of assistants; and a device-control level (EMBASSI, tuner, EPG, VCR, and display controls) over execution components and the device infrastructure (VCR, set-top box, display) plus biometric and environment sensors. Example utterance: "Ich will das auf dem da aufnehmen!" ("I want to record that on that one!").

Page 15 SMARTKOM Wolfgang Wahlster

Page 16

Page 17: DARPA Galaxy Communicator (diagram): a central hub connects speech recognition, frame construction, context tracking, dialogue management, language generation, text-to-speech conversion, an audio server, and the application backend. The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems. Open source and documentation are available at fofoca.mitre.org and sourceforge.net/projects/communicator.
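The hub-and-spoke idea on this slide can be caricatured in a few lines of Python. This is an illustrative toy, not the actual GCSI API: the server names, the message dictionaries, and the "route by `to` field" rule are all assumptions for the sketch.

```python
# Toy hub-and-spoke router in the spirit of Galaxy Communicator:
# servers register handlers with a central hub, and the hub routes
# each message to the server named in its "to" field.
# (Illustration only -- not the real GCSI API.)

class Hub:
    def __init__(self):
        self.servers = {}  # server name -> handler function

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, message):
        """Route a message dict to its destination server and
        return the reply (itself routable through the hub)."""
        handler = self.servers[message["to"]]
        return handler(message)

hub = Hub()
hub.register("recognizer",
             lambda m: {"to": "parser", "text": "flights to boston"})
hub.register("parser",
             lambda m: {"to": "dialogue",
                        "frame": {"domain": "travel", "dest": "boston"}})

reply = hub.send({"to": "recognizer", "audio": b"..."})
reply = hub.send(reply)  # hub forwards recognizer output to the parser
print(reply["frame"]["domain"])  # -> travel
```

The point of the pattern is that servers never address each other directly; only the hub knows the routing, which is what lets components be swapped out (as the next slide's mix of MIT, CMU, Colorado, and MITRE servers illustrates).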

Page 18: An Example: a Communicator-Compliant Emergency Management Interface (diagram): around the hub sit speech recognition (MIT SUMMIT engine and wrapper) converting speech to text; frame construction (Colorado Phoenix engine, MITRE wrapper) extracting information from input text; MITRE dialogue management tracking information, deciding what to do, and formulating answers; MITRE SQL generation converting abstract requests to SQL against the database (open-source PostgreSQL engine, MITRE wrapper); text-to-speech (CMU Festival engine, Colorado wrapper) converting output text to audio; MIT phone connectivity connecting audio to a telephone line; and a MITRE I/O podium displaying input and output text.

Page 19: Communicator Protocol
- All communication is in terms of objects, which bear a message type and an object type.
- Messages are encoded in XDR (a public-domain data representation).
- Each message on a broker connection carries a message type, an object type, the message size, and the object data.
- Message types: new message, message reply, error reply, destroy reply, postponement, disconnect, broker start, broker end, broker object.
- Object types: string, integer, float, frame, list, integer array (8, 16, 32, or 64 bits), float array (32 or 64 bits).
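The envelope described above (message type, object type, size, then data) can be mimicked with big-endian packing, which is essentially what XDR does for integers. The numeric type codes below are invented for illustration; the real GCSI/XDR encoding differs in detail.

```python
import struct

# Invented type codes for illustration; the real protocol's codes differ.
MSG_NEW, MSG_REPLY, MSG_ERROR = 1, 2, 3
OBJ_STRING, OBJ_INTEGER, OBJ_FRAME = 1, 2, 3

def pack_message(msg_type: int, obj_type: int, payload: bytes) -> bytes:
    """Prefix the payload with a big-endian (XDR-style) header:
    message type, object type, and payload size, 4 bytes each."""
    return struct.pack(">III", msg_type, obj_type, len(payload)) + payload

def unpack_message(data: bytes):
    """Split a wire message back into (msg_type, obj_type, payload)."""
    msg_type, obj_type, size = struct.unpack(">III", data[:12])
    return msg_type, obj_type, data[12:12 + size]

wire = pack_message(MSG_NEW, OBJ_STRING, b"travel")
assert unpack_message(wire) == (MSG_NEW, OBJ_STRING, b"travel")
```

A fixed self-describing header like this is what lets the hub dispatch on message type without understanding the payload, and the explicit size field is what makes the stream parseable over a plain socket.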

Page 20: Frames and Messages
- A frame is an attribute-value structure consisting of a name, a frame type (always a clause), and a collection of pairs of keys and associated typed values (string, integer, float, frame, list, etc.).
- Frames can be constructed using API calls or parsed from a string representation, e.g.:
    {c main :utterance_id 0 :domain "travel" }
  where "main" is the name, "c" (clause) is the frame type, and the keys :utterance_id and :domain carry an integer value and a string value.
- A message is a frame passed between the hub and a server.
  - A message can be new (initiating an action) or a reply.
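The frame notation on this slide is simple enough to model directly. Below is a minimal sketch (construction and printing only) that reproduces the slide's example string; it is not the real Communicator frame API.

```python
class Frame:
    """Attribute-value structure after the slide: a name, a frame
    type ('c' for clause), and keys with typed values.
    (Illustrative sketch, not the real Communicator frame API.)"""

    def __init__(self, name, frame_type="c", **keys):
        self.name = name
        self.frame_type = frame_type
        self.keys = dict(keys)  # insertion order preserved (Python 3.7+)

    @staticmethod
    def _fmt(value):
        # Strings are quoted; other values use their plain representation.
        return f'"{value}"' if isinstance(value, str) else str(value)

    def __str__(self):
        pairs = " ".join(f":{k} {self._fmt(v)}"
                         for k, v in self.keys.items())
        return f"{{{self.frame_type} {self.name} {pairs} }}"

f = Frame("main", utterance_id=0, domain="travel")
print(f)  # {c main :utterance_id 0 :domain "travel" }
```

Because values may themselves be frames or lists, the real structure is recursive; a full implementation would add a parser for the string form and per-type accessors.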

Page 21: Definitions
- Abstract architecture:
  - components, connections (protocols), and constraints (the IEEE definition)
  - data/knowledge structures, data flow and protocols, control flow
  - consider use cases, e.g.:
    - an in-car navigation system
    - desktop, kiosk, and mobile-device interaction
    - media conversion

Page 22: Requirements
- Functional:
  - modality integration (input and output)
  - situation-appropriate (user, task, application) real-time sensing and response (e.g., supporting barge-in and perceptual sensing/feedback)
  - representation at the appropriate level of granularity (modules and data structures)
  - management of feedback, local and global (when and where?)
  - support for incremental processing
  - support for incremental development (and scalability)
- System/technical:
  - support for processing and fusing multimodal input (e.g., parallel processing)
  - modular, composable (possibly distributed) processing
  - efficient implementation
  - appropriate time scale and temporal/spatial resolution
  - accessible (even partial) data structures
  - open and extensible protocols

Page 23: Components
- Media/mode analysis:
  - multimodal fusion
  - mutual disambiguation and reference resolution
- Media/mode design:
  - content selection, media design, allocation, coordination, layout
- Discourse management:
  - attention management
  - selection of the dialogue act/interpretation
  - error handling
- Context management:
  - physical/spatial and temporal state
- User modeling:
  - capabilities, beliefs, intentions
  - user ID
- Knowledge sources, states, and histories available to all processes

Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998) (diagram): media input processing feeds media/mode analysis (language, graphics, gesture, biometrics) and media fusion; interaction management spans intention recognition, discourse modeling, and user modeling; presentation design drives media/mode design (language, graphics, gesture, an animated presentation agent) and media output rendering; representation and inference maintains user, discourse, domain, task, and media models; and an application interface connects user(s) to information, applications, and people. The diagram overlays the conventional presentation / dialog control / application interface layering.

Architecture (diagram): user input passes through media input processing and media/mode analysis (language, graphics, gesture, sound) into multimodal fusion and multimodal reference resolution; interaction management encompasses user modeling, discourse management (reference resolution, action planning), intention recognition, context management, lexicon management, and user ID (biometrics); mode coordination and presentation design (select content, design, allocate, coordinate, layout) drive media/mode design (language, graphics, gesture, sound, an animated presentation agent) and media output rendering; representation and inference maintains states and histories for user, discourse, domain, task, media, application, and context models; and the application interface (initiate, request, respond, integrate, terminate) connects to information, applications, and people.
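The input-to-output flow shared by these architecture slides (per-mode analysis, fusion, dialogue-level management, then mode design) can be caricatured as a simple pipeline. Every stage body below is an assumption made up for illustration; only the stage names come from the slides.

```python
# Caricature of the analysis -> fusion -> management -> design flow
# from the architecture diagrams above (illustration only).

def analyze(events):
    """Per-mode analysis: tag each raw input event with its modality."""
    return [{"mode": mode, "content": content} for mode, content in events]

def fuse(interpretations):
    """Multimodal fusion: merge per-mode interpretations into one
    combined interpretation (here, naively keyed by modality)."""
    return {i["mode"]: i["content"] for i in interpretations}

def manage(fused):
    """Interaction management: choose a dialogue act for the fused input."""
    return {"act": "answer", "about": fused}

def design(act):
    """Media/mode design: allocate output modes for the response."""
    return [("speech", f"responding to {sorted(act['about'])}")]

events = [("speech", "show flights there"),
          ("gesture", "points at Boston")]
output = design(manage(fuse(analyze(events))))
print(output[0][0])  # -> speech
```

The sketch makes the slides' main claim concrete: fusion happens once, on mode-tagged interpretations, so the dialogue-level components never need to know which modality a piece of content arrived in.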


Page 27: Media Fusion (diagram): media/mode analysis of spoken language (S), lip reading (V), and gesture (V) feeds media fusion.

Page 28: COLLAGEN, Sidner et al. (diagram): user speech is recognized by ViaVoice and passed to speech interpretation; together with window events and a student model, it feeds the planning and discourse component, which mediates between the agent (Mel) and the application.