Media Coordination in SmartKom. Norbert Reithinger. Dagstuhl Seminar "Coordination and Fusion in Multimodal Interaction". Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH).

Media Coordination in SmartKom. Norbert Reithinger. Dagstuhl Seminar "Coordination and Fusion in Multimodal Interaction". Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Stuhlsatzenhausweg 3, Saarbrücken.

© NR Overview
– Situated Delegation-oriented Dialog Paradigm
– More About the System Software
– Media Coordination Issues
– Media Processing: The Data Flow
– Processing the User's State
– Media Fusion
– Media Design
– Conclusion

The SmartKom Consortium. Main contractor: DFKI Saarbrücken. Partners include the European Media Lab (Heidelberg), Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen, MediaInterface, and sites in Saarbrücken, Aachen, Dresden, Ulm, and Berkeley. Project budget: €25.5 million. Project duration: 4 years (September 1999 – September 2003).

Situated Delegation-oriented Dialog Paradigm. (Diagram.) The user specifies a goal and delegates the task to the personalized interaction agent Smartakus; user and agent cooperate on problems; the agent asks questions and presents results, drawing on IT services (Service 1, Service 2, Service 3).

More About the System

More About the System
– Modules realized as independent processes
– Not all must be present (critical path: speech or graphic input to speech or graphic output)
– (Mostly) independent of display size
– Pool Communication Architecture (PCA) based on PVM, for Linux and NT
– Modules know about their I/O pools
– Data exchanged using M3L documents
– Literature: Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer.
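The pool-based exchange of M3L documents can be sketched as a tiny publish/subscribe hub. This is an illustrative stand-in only: SmartKom's actual PCA is built on PVM, and the pool name used below is invented.

```python
class PoolHub:
    """Minimal stand-in for the Pool Communication Architecture:
    modules publish M3L documents to named pools and subscribe to
    the pools they consume."""

    def __init__(self):
        self._subscribers = {}  # pool name -> list of callbacks

    def subscribe(self, pool, callback):
        self._subscribers.setdefault(pool, []).append(callback)

    def publish(self, pool, m3l_document):
        # Every document carries a time stamp; media fusion relies on it.
        for callback in self._subscribers.get(pool, []):
            callback(m3l_document)

# Usage: gesture analysis feeds the fusion module via a shared pool.
hub = PoolHub()
received = []
hub.subscribe("gesture.lattice", received.append)
hub.publish("gesture.lattice", {"timestamp": 12.3, "type": "pointing"})
```

Because modules only know pool names, not each other, any module on the critical path can be replaced or omitted without rewiring the rest, which is what makes the reduced configurations mentioned above possible.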


Media Coordination Issues
Input:
– Speech: words; prosody (boundaries, stress, emotion)
– Mimics: neutral, anger
– Gesture: touch-free (public scenario), touch-sensitive screen
Output:
– Display objects
– Speech
– Agent: posture, gesture, lip movement

Media Processing: The Data Flow. (Architecture diagram.) Prosody (emotion), gesture, and mimics (neutral or anger) flow into media fusion and interaction modeling in the dialog core; the dialog core maintains user state, domain information, and system state; presentation (media design) produces display objects with reference IDs and locations, speech, and the agent's posture and behaviour.

The Input/Output Modules

Processing the User's State

Processing the User's State
– User states: neutral and anger
– Recognized using mimics and prosody
– In case of anger, the dynamic help in the Dialog Core Engine is activated
– Elmar Nöth will hopefully tell you more about this in his talk "Modeling the User State – The Role of Emotions"
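The reaction described above can be sketched as a simple decision rule. The fusion rule and function names here are invented for illustration; the actual recognizers are statistical classifiers over the mimics and prosody channels.

```python
def classify_user_state(mimic_label, prosody_emotion):
    """Fuse the two emotion channels into one user state:
    'anger' wins if either channel reports it (illustrative rule only)."""
    if "anger" in (mimic_label, prosody_emotion):
        return "anger"
    return "neutral"

def dialog_core_reaction(user_state):
    # Anger activates the dynamic help in the Dialog Core Engine.
    return "activate_dynamic_help" if user_state == "anger" else "proceed"
```

For example, an angry prosody reading combined with a neutral face still triggers the dynamic help.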

Media Fusion

Gesture Processing
– Objects on the screen are tagged with IDs
– Gesture input: natural gestures recognized by SIVIT; touch-sensitive screen
– Gesture recognition: location; type of gesture (pointing, tarrying, encircling)
– Gesture analysis: reference objects in the display described as XML domain model (sub-)objects (M3L schemata); bounding boxes; output is a gesture lattice with hypotheses
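Scoring a pointing gesture against the bounding boxes of tagged display objects can be sketched as follows. The scoring formula and the data layout are invented for illustration; the real analyzer works on M3L domain models rather than raw tuples.

```python
def gesture_hypotheses(point, objects):
    """Score display objects against a pointing-gesture location.
    `objects` maps object ID -> bounding box (x0, y0, x1, y1).
    A hit scores 1.0; otherwise score decays with distance to the
    box centre. Returns the gesture hypotheses, best first."""
    x, y = point
    hypotheses = []
    for obj_id, (x0, y0, x1, y1) in objects.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            score = 1.0
        else:
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            score = 1.0 / (1.0 + ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5)
        hypotheses.append({"object": obj_id, "score": score})
    # The gesture lattice keeps ALL hypotheses, not just the winner.
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)
```

Keeping every scored hypothesis, instead of committing to one object, is what lets media fusion later restrict the interpretation using the speech channel.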

Speech Processing
– The speech recognizer produces a word lattice
– Prosody inserts boundary and stress information
– Speech analysis creates intention hypotheses with markers for deictic expressions

Media Fusion
– Integrates gesture hypotheses into the intention hypotheses of speech analysis
– Information restriction is possible from both media
– Correspondence of gestures and placeholders (deictic expressions/anaphora) in the intention hypothesis is possible but not necessary
– Necessary: time coordination of gesture and speech information – time stamps in ALL M3L documents!
– Output: a sequence of intention hypotheses
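The time coordination the slide insists on can be sketched as a window-based binding step. The dictionary structures and the one-second window are invented stand-ins for the time-stamped M3L documents; only the role of the time stamps matters here.

```python
def fuse(intention, gesture_lattice, window=1.0):
    """Bind each deictic placeholder in an intention hypothesis to the
    best-scored gesture hypothesis whose time stamp falls within
    `window` seconds of the spoken deictic expression."""
    fused = dict(intention)
    for slot in fused.get("deictic_slots", []):
        candidates = [g for g in gesture_lattice
                      if abs(g["timestamp"] - slot["timestamp"]) <= window]
        if candidates:  # correspondence is possible but not necessary
            best = max(candidates, key=lambda g: g["score"])
            slot["referent"] = best["object"]
    return fused
```

A slot with no temporally compatible gesture simply stays unresolved, reflecting that a gesture-placeholder correspondence is possible but not required.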

Media Design (Media Fission)

Media Design
– Starts with action planning and the definition of an abstract presentation goal
– Presentation planner: selects presentation, style, media, and the agent's general behaviour; activates the natural language generator, which activates speech synthesis, which returns audio data and a time-stamped phoneme/viseme sequence
– Character animation realizes the agent's behaviour
– Synchronized presentation of audio and visual information

Lip Synchronization with Visemes
– Goal: present a speech prompt as naturally as possible
– Visemes: elementary lip positions
– Correspondence of visemes and phonemes
– Examples: (slide shows example viseme images)
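The phoneme-viseme correspondence amounts to a many-to-one table lookup over the time-stamped phoneme sequence returned by synthesis. The grouping below is a common illustrative one, not SmartKom's actual viseme inventory.

```python
# Hypothetical phoneme-to-viseme table: several phonemes share one
# elementary lip position (e.g. p/b/m all close the lips).
PHONEME_TO_VISEME = {
    "p": "bilabial_closed", "b": "bilabial_closed", "m": "bilabial_closed",
    "f": "labiodental", "v": "labiodental",
    "o:": "rounded", "u:": "rounded",
    "a:": "open", "e:": "spread", "i:": "spread",
}

def viseme_track(phoneme_sequence):
    """Turn a time-stamped phoneme sequence from speech synthesis into
    the time-stamped viseme sequence that character animation plays
    back in sync with the audio."""
    return [(t, PHONEME_TO_VISEME.get(ph, "neutral"))
            for t, ph in phoneme_sequence]
```

Because the time stamps are carried through unchanged, the animation stays aligned with the audio stream without any extra synchronization step.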

Behavioural Schemata
– Goal: Smartakus is always active, to signal the state of the system
– Four main states: wait for user's input; user's input; processing; system presentation
– Current body movements: 9 vital, 2 processing, 9 presentation (5 pointing, 2 movements, 2 face/mouth)
– About 60 basic movements
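The four main states form a small cycle, which can be sketched as a transition table. The event names below are invented for illustration; the slide only specifies the states themselves.

```python
# Minimal sketch of the agent's four main states and the (hypothetical)
# events that move Smartakus between them.
TRANSITIONS = {
    ("waiting", "user_starts_input"): "user_input",
    ("user_input", "input_complete"): "processing",
    ("processing", "presentation_ready"): "presenting",
    ("presenting", "presentation_done"): "waiting",
}

def next_state(state, event):
    # Unknown events leave the agent in its current state, where it
    # keeps playing an idle ("vital") movement, so it is always active.
    return TRANSITIONS.get((state, event), state)
```

Within each state the animation layer then picks from the corresponding pool of basic movements (vital, processing, or presentation).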

Conclusion
– Three implemented systems (Public, Home, Mobile)
– Media coordination implemented
– The "backbone" uses declarative knowledge sources and is rather flexible
– A lot remains to be done: robustness; complex speech expressions; complex gestures (shape and timing); implementation of all user states; ...
– Reuse of modules in other contexts, e.g. in MIAMM