German Research Center for Artificial Intelligence DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany, phone: (+49 681) 302-5252/4162.
Wolfgang Wahlster
SmartKom: Dialog-based Human Computer Interaction by Coordinated Analysis and Generation of Multiple Modalities
Symmetric Multimodality in an Adaptive and Reusable Dialogue Shell
BMBF Status Conference "Human Computer Interaction", Berlin, June 3, 2003

© W. Wahlster
SmartKom: Merging Various User Interface Paradigms
Spoken dialogue, graphical user interfaces, gestural interaction, facial expressions, and biometrics are merged into multimodal interaction.

© W. Wahlster
The SmartKom Consortium
Main contractor: DFKI Saarbruecken. Partners include MediaInterface (Dresden), European Media Lab (Heidelberg), Univ. of Munich, Univ. of Stuttgart, and Univ. of Erlangen, with sites in Saarbruecken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, and Ulm.
Project duration: September 1999 – September 2003. Final presentation focusing on the mobile version: 5 September 2003, Stuttgart.

© W. Wahlster
SmartKom's Major Scientific Goals
- Explore and design new symbolic and statistical methods for the seamless fusion and mutual disambiguation of multimodal input on semantic and pragmatic levels.
- Generalize advanced discourse models for spoken dialogue systems so that they can capture a broad spectrum of multimodal discourse phenomena.
- Explore and design new constraint-based and plan-based methods for multimodal fission and adaptive presentation layout.
- Integrate all these multimodal capabilities in a reusable, efficient and robust dialogue shell that guarantees flexible configuration, domain independence and plug-and-play functionality.

© W. Wahlster
Outline of the Talk
1. Towards Symmetric Multimodality
2. SmartKom: A Flexible and Adaptive Multimodal Dialogue Shell
3. Perception and Action under Multimodal Conditions
4. Multimodal Fusion and Fission in SmartKom
5. Ontological Inferences and the Three-Tiered Discourse Model of SmartKom
6. The Economic and Scientific Impact of SmartKom
7. Conclusions

© W. Wahlster
SmartKom Provides Full Symmetric Multimodality
Symmetric multimodality means that all input modes (speech, gesture, facial expression) are also available for output, and vice versa: the user's input in speech, gestures, and facial expressions passes through multimodal fusion, and the system's output in speech, gestures, and facial expressions is produced by multimodal fission.
Challenge: a dialogue system with symmetric multimodality must not only understand and represent the user's multimodal input, but also its own multimodal output.
The modality fission component provides the inverse functionality of the modality fusion component.

© W. Wahlster
SmartKom Covers the Full Spectrum of Multimodal Discourse Phenomena
- mutual disambiguation of modalities
- multimodal deixis resolution and generation
- crossmodal reference resolution and generation
- multimodal turn-taking and backchannelling
- multimodal ellipsis resolution and generation
- multimodal anaphora resolution and generation
Symmetric multimodality is a prerequisite for a principled study of these discourse phenomena.

© W. Wahlster
SmartKom's Multimodal Input and Output Devices
- infrared camera for gestural input
- tilting CCD camera for scanning
- video projector and projection surface
- microphone and speakers for speech output
- camera for facial analysis
- multimodal control of TV set and VCR/DVD player
Hardware: 3 dual Xeon 2.8 GHz processors with 1.5 GB main memory.

© W. Wahlster SmartKom's Control Panel

© W. Wahlster
SmartKom: A Flexible and Adaptive Shell for Multimodal Dialogues
A shared multimodal dialogue back-bone supports three application layers:
- SmartKom-Home/Office: an infotainment companion that helps select media content (consumer electronics, EPG).
- SmartKom-Public: a communication companion that helps with phone, fax, e-mail, and authentication (cinema, phone, fax, mail, biometrics).
- SmartKom-Mobile: a mobile travel companion that helps with car and pedestrian navigation.

© W. Wahlster
SmartKom's SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm: the anthropomorphic interface acts as a dialogue partner. The user specifies a goal, delegates the task, cooperates on problems, asks questions, and is presented with results; the personalized interaction agent carries out the task using web services (Service 1, Service 2, Service 3).
See: Wahlster et al. 2001, Eurospeech.

© W. Wahlster
SmartKom's Language Model and Lexicon is Augmented on the Fly with Named Entities
Starting from SmartKom's basic vocabulary of 5500 words, each application adds domain-specific names:
- Cinema info (movie titles, actor names): e.g. all cinemas in one city contribute > 200 new words.
- TV info (names of TV features, actor names): e.g. the TV program of one day contributes > 200 new words.
- Geographic info (street names, names of points of interest): e.g. one city contributes more than 500 new names.
After a short dialogue sequence the lexicon includes more than … words.
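This on-the-fly augmentation can be pictured as a small dynamic-lexicon component. The following Python sketch is purely illustrative; the class and method names are assumptions, not SmartKom's actual implementation.

```python
# Illustrative sketch of on-the-fly lexicon augmentation; all names here
# are hypothetical, not taken from SmartKom's code base.
class DynamicLexicon:
    def __init__(self, base_vocabulary):
        # SmartKom's basic vocabulary (about 5500 words)
        self.entries = set(base_vocabulary)

    def add_named_entities(self, domain, names):
        """Add named entities (movie titles, actor names, street names)
        when the user enters a new discourse domain."""
        new_words = set(names) - self.entries
        self.entries |= new_words
        return len(new_words)

lexicon = DynamicLexicon(["movie", "tonight", "channel"])
added = lexicon.add_named_entities("cinema_info", ["Enemy of the State"])
print(f"cinema_info contributed {added} new lexicon entries")
```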

© W. Wahlster
The German Federal President E-Mailing a Scanned Image with SmartKom's Help
"Now you can remove the document."

© W. Wahlster
Interactive Biometric Authentication by Hand Contour Recognition
"Please place your hand with spread fingers on the marked area."

© W. Wahlster
Scanning a Document and Sending the Captured Image as an Attachment
SmartKom bridges the full loop from multimodal perception to physical action:
"My name is Norbert Reithinger."
"I require authentication from you. I have found the record of Norbert Reithinger. I require a signature authentication for Norbert Reithinger. Please sign in the write-in field. The authentication was successful."
"I'd like to send a document to Wolfgang Wahlster."
"I have found the record for Wolfgang Wahlster. Please place the document on the marked area. Please remove it now. The document was successfully scanned. The document has now been sent."

© W. Wahlster Adaptive Perceptual Feedback on the System State

© W. Wahlster
Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom
Inputs:
- word hypothesis graph with acoustic scores
- clause and sentence boundaries with prosodic scores
- scored hypotheses about the user's emotional state
- gesture hypothesis graph with scores of potential reference objects
Modality fusion performs mutual disambiguation and reduction of uncertainty, producing an intention hypothesis graph from which the intention recognizer selects the most likely interpretation.

© W. Wahlster
M3L Representation of an Intention Lattice Fragment
For the user utterance "I would like to know more about this" (accompanied by a pointing gesture), the M3L fragment records the confidence in the speech recognition result, the confidence in the gesture recognition result, the planning act (set, epg_info), the object reference (featureFilm "Enemy of the State"), and the confidence in the speech understanding result.
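Since the slide's XML listing did not survive transcription, here is a rough Python rendering of what one scored hypothesis in such an intention lattice fragment might contain; the field names are assumptions based on the labels on the slide, not the literal M3L schema.

```python
# Hypothetical structure of one intention-lattice hypothesis; field names
# mirror the slide's labels, not the actual M3L element names.
hypothesis = {
    "confidence": {
        "acoustic": 0.64,       # speech recognition score
        "gesture": 0.81,        # gesture recognition score (pointing at a film)
        "understanding": 0.58,  # speech understanding score
    },
    "planning_act": "set",      # labeled "Planning Act" on the slide
    "goal": "epg_info",         # electronic program guide lookup
    "object_reference": {       # referent of "this", resolved by the gesture
        "type": "featureFilm",
        "title": "Enemy of the State",
    },
}
```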

© W. Wahlster
Fusing Symbolic and Statistical Information in SmartKom
Early fusion takes place on the signal processing level, with multiple recognizers for a single modality producing time-stamped and scored hypotheses:
- face camera: facial expressions, affective user state
- microphone / speech signal: speech recognition, boundary prosody, emotional prosody (e.g. anger, joy)

© W. Wahlster
SmartKom's Computational Mechanisms for Modality Fusion and Fission
Modality fusion: ontological inferences, unification, and overlay operations. Modality fission: planning and constraint propagation. Both operate on M3L, a modality-free semantic representation.

© W. Wahlster
The Markup Language Layer Model of SmartKom
- M3L: MultiModal Markup Language
- OIL: Ontology Inference Layer
- XMLS: XML Schema
- RDFS: Resource Description Framework Schema
- XML: eXtensible Markup Language
- RDF: Resource Description Framework
- HTML: Hypertext Markup Language

© W. Wahlster
Personalization: Mapping Digital Content Onto a Variety of Structures and Layouts
From the "one-size-fits-all" approach of static presentations to the "perfect personal fit" approach of adaptive multimodal presentations: a single M3L content representation is mapped to alternative XML structures (XML 1 ... XML n), each of which can be rendered in several HTML layouts (HTML 11 ... HTML 1m, HTML 21 ... HTML 2o, HTML 31 ... HTML 3p).

© W. Wahlster
The Role of the Semantic Web Language M3L
- M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom.
- M3L is partitioned into 40 XML schema definitions covering SmartKom's discourse domains.
- The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom's multimodal dialogs.

© W. Wahlster OIL2XSD: Using XSLT Stylesheets to Convert an OIL Ontology to an XML Schema
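As a rough sketch of the OIL2XSD pipeline in Python using lxml (the stylesheet and file names below are placeholders, not the actual SmartKom resources):

```python
# Minimal sketch of applying an XSLT stylesheet to convert an OIL ontology
# into an XML Schema; file names are placeholders.
from lxml import etree

oil_ontology = etree.parse("smartkom_ontology.oil.xml")  # OIL ontology (XML serialization)
oil2xsd = etree.XSLT(etree.parse("oil2xsd.xslt"))        # conversion stylesheet
xml_schema = oil2xsd(oil_ontology)                       # resulting XML Schema document

with open("event.xsd", "wb") as f:
    f.write(etree.tostring(xml_schema, pretty_print=True))
```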

© W. Wahlster
Using Ontologies to Extract Information from the Web
Mapping of metadata: the internal ontology classes MyOnto-Movie (:title, :description, :actors, :director) and MyOnto-Person (:name, :birthday) are mapped onto web source schemas such as Film.de-Movie (:title, :description) and Kinopolis.de-Movie (:name, :critics, :o-title, :main actor).

© W. Wahlster
M3L as a Meaning Representation Language for the User's Input
Example utterance: "I would like to send an e-mail to Dr. Reuse."

© W. Wahlster
Exploiting Ontological Knowledge to Understand and Answer the User's Queries
Example query: "Which movies with Schwarzenegger are shown on the Pro7 channel?" The M3L representation contains the time stamp (…T10:25:46), the actor name (Schwarzenegger), and the channel (Pro7).

© W. Wahlster
SmartKom's Multimodal Dialogue Back-Bone
Analyzers for speech, gestures, and facial expressions feed modality fusion; the dialogue manager (discourse modeling, action planning, access to external services) drives modality fission, which feeds generators for speech, graphics, and gestures. The modules communicate via blackboards, with explicit data flow and context dependencies.

© W. Wahlster
A Fragment of a Presentation Goal, as Specified in M3L
The goal instructs the system to present an EPG browse list ("list", "epg_browse", "now") covering the time span from 19:42 to 22:00, including broadcasts such as "Today's Stock News" on ARD from 19:50 to 19:55.

© W. Wahlster
A Dynamically Generated Multimodal Presentation Based on a Presentation Goal
System: "Here is a listing of tonight's TV broadcasts." The generated display lists: Today's Stock News, Everybody Loves Raymond, The King of Queens, Evening News, Still Standing, Yes, Dear, Crossing Jordan, Bonanza, Passions, Mr. Personality, Down to Earth, Weather Forecast Today.

© W. Wahlster
An Excerpt from SmartKom's Three-Tiered Multimodal Discourse Model
- Domain layer: OO1 (TV broadcasts on 20/3/2003), OO2 (broadcast of "The King of Queens" on 20/3/2003).
- Discourse layer: discourse objects DO1 (with dependents DO11, DO12, DO13) and DO2, DO3, DO4, DO5.
- Modality layer: linguistic objects LO1 "listing", LO2 "tonight", LO3 "TV broadcast", LO4 "tape", LO5 "third one"; gesture object GO1 "here (pointing)"; visual object VO1.
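The three tiers can be pictured with the following minimal data-structure sketch; class and field names are assumptions for illustration, not SmartKom's implementation.

```python
# Illustrative data structures for the three-tiered discourse model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModalityObject:            # modality layer: one surface realization
    kind: str                    # "LO" linguistic, "GO" gestural, "VO" visual
    surface: str                 # e.g. "tonight", "here (pointing)"

@dataclass
class DiscourseObject:           # discourse layer: one discourse referent
    ident: str                   # e.g. "DO3"
    realizations: List[ModalityObject] = field(default_factory=list)

@dataclass
class DomainObject:              # domain layer: the application-level entity
    description: str
    referents: List[DiscourseObject] = field(default_factory=list)

oo2 = DomainObject(
    'Broadcast of "The King of Queens" on 20/3/2003',
    [DiscourseObject("DO3", [ModalityObject("LO", "third one")])],
)
```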

© W. Wahlster
Overlay Operations Using the Discourse Model
Augmentation and validation: each hypothesis from the intention hypothesis lattice is compared with a number of previous discourse states, consistent information is filled in, and a score is computed for each hypothesis-background pair via Overlay(covering, background). The result is the selected augmented hypothesis sequence.

© W. Wahlster
The Overlay Operation Versus the Unification Operation
Overlay is a nonmonotonic and noncommutative unification-like operation that inherits (non-conflicting) background information. There are two sources of conflicts:
- conflicting atomic values: overwrite background (old) with covering (new)
- type clash: assimilate background to the type of covering, then recurse
cf. J. Alexandersson, T. Becker 2001

© W. Wahlster
Example for Overlay
User: "What films are on TV tonight?"
System: [presents list of films]
User: "That's a boring program, I'd rather go to the movies."
How do we inherit "tonight"?

© W. Wahlster
Overlay Simulation
Covering: "Go to the movies". Background: "Films on TV tonight". The background is assimilated to the covering, so the time constraint "tonight" is inherited.
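A minimal Python sketch of overlay over nested dictionaries, following the description above (SmartKom's actual overlay works on typed feature structures, so this is only an approximation under that simplifying assumption):

```python
# Overlay sketch on nested dicts: nonmonotonic and noncommutative.
# Covering (new) wins on conflicts; non-conflicting background (old)
# information is inherited.
def overlay(covering, background):
    if isinstance(covering, dict) and isinstance(background, dict):
        result = dict(background)            # inherit old information
        for key, value in covering.items():  # recurse; new values win
            result[key] = overlay(value, background.get(key))
        return result
    if covering is None:
        return background                    # nothing new: keep old value
    return covering                          # conflict or type clash: new wins

background = {"act": "watch_tv", "time": "tonight"}   # "Films on TV tonight"
covering = {"act": "go_to_movies"}                    # "I'd rather go to the movies"
print(overlay(covering, background))
# {'act': 'go_to_movies', 'time': 'tonight'}  -> "tonight" is inherited
```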

© W. Wahlster
Overlay - Scoring
Four fundamental scoring parameters:
- number of features from covering (co)
- number of features from background (bg)
- number of type clashes (tc)
- number of conflicting atomic values (cv)
The codomain is [-1, 1]; a higher score indicates a better fit, with the maximum score of 1 reached when overlay(c, b) coincides with unify(c, b).
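The slide names the parameters and the codomain but not the formula itself, so the function below uses an assumed instantiation with the stated properties: scores fall in [-1, 1], and the maximum 1 is reached exactly when there are no type clashes and no conflicting values, i.e. when overlay(c, b) coincides with unify(c, b).

```python
# Assumed scoring formula (not the published SmartKom formula): conflicts
# (tc, cv) pull the score down, inherited features (co, bg) pull it up.
def overlay_score(co, bg, tc, cv):
    total = co + bg + tc + cv
    if total == 0:
        return 1.0          # empty structures: trivially compatible
    return (co + bg - (tc + cv)) / total

print(overlay_score(co=4, bg=3, tc=0, cv=0))  # 1.0  -> overlay equals unification
print(overlay_score(co=0, bg=0, tc=2, cv=1))  # -1.0 -> nothing but conflicts
```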

© W. Wahlster
SmartKom's Presentation Planner
The presentation planner generates a presentation plan by applying a set of presentation strategies to the presentation goal: GlobalPresent expands into PresentAddSmartakus and DoLayout; via EvaluatePersonaNode, Inform, and TryToPresentTVOverview, the plan reaches ShowTVOverview and SetLayoutData, which drive the generation of the layout and of Smartakus actions (PersonaAction, SendScreenCommand, GenerateText, Speak).
cf. J. Müller, P. Poller, V. Tschernomas 2002

© W. Wahlster Adaptive Layout and Plan-Based Animation in SmartKom's Multimodal Presentation Generator

© W. Wahlster
Salient Characteristics of SmartKom
- Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
- Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
- Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
- Adaptive generation of coordinated, cohesive and coherent multimodal presentations
- Semi- or fully automatic completion of user-delegated tasks through the integration of information services
- Intuitive personification of the system through a presentation agent

© W. Wahlster
The Economic and Scientific Impact of SmartKom
Economic impact: 51 patents and 29 spin-off products (13 speech recognition, 10 dialogue management, 6 biometrics, 3 video-based interaction, 2 multimodal interfaces, 2 emotion recognition).
Scientific impact: 246 publications, 117 keynotes / invited talks, 66 masters and doctoral theses, 27 new projects using the results, 5 tenured professors, 10 TV features, 81 press articles.

© W. Wahlster
An Example of Technology Transfer: The Virtual Mouse
The virtual mouse has been installed in a cell phone with a camera. When the user holds a normal pen about 30 cm in front of the camera, the system recognizes the tip of the pen as a mouse pointer, and a red point appears at the tip on the display.

© W. Wahlster
Former Employees of DFKI and Researchers from the SmartKom Consortium have Founded Five Start-up Companies
Eyeled, CoolMuseum GmbH, Mineway GmbH, Sonicson GmbH, and Quadox AG, working on areas such as location-aware mobile information systems, multimodal systems for music retrieval, and agent-based middleware.

© W. Wahlster
SmartKom's Impact on International Standardization
SmartKom's Multimodal Markup Language M3L has influenced the ISO TC37/SC4 Standard for a Multimodal Content Representation Scheme and the W3C work on a Natural Language Semantics Markup Language (w3.org/TR/nl-spec).

© W. Wahlster
SmartKom's Impact on Software Tools and Resources for Research on Multimodality
The MULTIPLATFORM software framework is used at sites all over Europe, e.g. in COMIC (EU FP5, Conversational Multimodal Interaction with Computers). The SmartKom data collection comprises 1.6 terabytes from 448 Wizard-of-Oz sessions, with audio transcripts and gesture and emotion labeling, distributed via BAS (Germany), ELRA (Europe), and LDC (world).

© W. Wahlster
Conclusions
Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom's modality fusion and fission components. The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results, based on a three-tiered representation of multimodal discourse. We have shown that a multimodal dialogue system must not only understand and represent the user's input, but also its own multimodal output.