SmartKom: Multimodal Communication with a Life-like Character
  Wolfgang Wahlster, Anselm Blocher, Norbert Reithinger
  German Research Center for Artificial Intelligence (DFKI GmbH), Saarbrücken, Germany
  Eurospeech Scandinavia, Dialog Systems - Project Descriptions II, Aalborg, 6 September 2001
From Spoken Dialogue to Multimodal Dialogue
  Verbmobil: today's cell phone; speech only
  SmartKom: third-generation UMTS phone; speech, graphics and gesture
  © W. Wahlster
Merging Various User Interface Paradigms
  Spoken Dialogue
  Graphical User Interfaces
  Gestural Interaction
  Result: Multimodal Interaction
Multimodal Interaction with a Life-like Character
  User (input: speech and gesture): "I'd like to reserve tickets for this movie."
  Smartakus (output: speech, gesture and facial expressions): "Where would you like to sit?"
  User (input: speech and gesture): "I'd like these two seats."
SmartKom: Multimodal Dialogs with a Life-like Character
SmartKom: Intuitive Multimodal Interaction
  The SmartKom Consortium
  Main contractor: DFKI Saarbrücken
  Partners named on the slide: MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen
  Sites: Saarbrücken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, Ulm
  Project budget: € 25.5 million
  Project duration: 4 years (September 1999 – September 2003)
Salient Characteristics of SmartKom
  Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
  Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
  Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
  Adaptive generation of coordinated, cohesive and coherent multimodal presentations
  Semi- or fully automatic completion of user-delegated tasks through the integration of information services
  Intuitive personification of the system through a presentation agent
SmartKom: A Transportable Interface Agent
  SmartKom-Home/Office: Multimodal Portal to Information Services
  SmartKom-Public: A Multimodal Communication Kiosk
  SmartKom-Mobile: A Handheld Communication Assistant
  Kernel of the SmartKom Interface Agent: Media Analysis, Interaction Management, Application Management, Media Design
SmartKom-Home on a Portable Webpad
  Hardware: Fujitsu Stylistic™ 3500X; 500 MHz Intel® Celeron™; 10.4" XGA TFT (1024x768 pixels); 256 MB SDRAM; 15 GB shock-mounted
  Provides electronic program guides (EPG) for TV, controls consumer electronics like VCRs, and accesses standard applications like phone and [...]
  Lean-forward mode: coordinated speech and gesture input
  Lean-backward mode: voice input alone
SmartKom-Mobile
  Can be added to a car navigation system or carried by a pedestrian
  Additional services like route planning and interactive navigation through a city can be accessed via GPS and GSM/UMTS connectivity
SmartKom's SDDP Interaction Metaphor
  SDDP = Situated Delegation-oriented Dialogue Paradigm
  User: specifies goal, delegates task
  User and agent: cooperate on problems
  Personalized Interaction Agent: asks questions, presents results
  IT Services: Service 1, Service 2, Service 3
Visual Support for SDDP
  Adaptation to the user's viewing angle
  Reduction of the association "screen computer" (no background)
  Spotlights guide and control the user's attention
The Perspective of the User
  Classic fixed isometric perspective
  Completely variable perspective
  User-adaptive perspective with limited variability
Decomposition of Behavioural Schemata: Phases of Gestures
  Preparation, Stroke, Retraction
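The three-phase decomposition can be written down as a small data type. The following is a minimal sketch, not SmartKom's actual gesture model: the names GesturePhase and is_complete_gesture are hypothetical, and the rule that a complete gesture consists of exactly one pass through the three phases in order is an assumption made for illustration.

```python
from enum import Enum


class GesturePhase(Enum):
    """The three gesture phases named on the slide (enum name is hypothetical)."""
    PREPARATION = 1
    STROKE = 2
    RETRACTION = 3


def is_complete_gesture(phases):
    """Return True if phases is exactly preparation, stroke, retraction, in order.

    Assumption for this sketch: a complete gesture passes through each
    phase once; the slide itself does not state this rule.
    """
    return list(phases) == list(GesturePhase)


# A full pointing gesture versus one cut off before retraction.
full = [GesturePhase.PREPARATION, GesturePhase.STROKE, GesturePhase.RETRACTION]
partial = [GesturePhase.PREPARATION, GesturePhase.STROKE]
print(is_complete_gesture(full), is_complete_gesture(partial))
```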
Some Complex Behavioural Patterns of the Interaction Agent Smartakus
  Examples of complex motion patterns, pointing gestures and co-speech gestures:
  Enumerate five points
  Go in a circle
  Jumping on the spot
  The "i" shape of Smartakus recalls the "i" used at information kiosks.
Multimodal Input and Output in SmartKom
  Input by the user: Speech, Gesture, Facial Expressions
  Output by the presentation agent: Speech, Gesture, Facial Expressions
Modality-Specific Representation Languages as an Intermediate Representation before Media Fusion
  Speech input: Parsing, Semantic Representation Language
  Gestures: Gesture Analysis, Gesture Description Language, Gesture Generation
  Facial expressions: Face Analysis, Face Description Language, Facial Expression Generation
  Ontologies; Knowledge Representation Language and Inference Component; DBMS/KBMS/WWW
  M3L based on XML
SmartKom's Data Collection of Multimodal Dialogs
  Labeled components of the recording setup: User, Side-view Camera, Face-tracking Camera with Microphone, Environmental Noise, Microphone Array, Screen, Projected Webpage, Loudspeaker, Bird's-eye Camera, LCD Beamer, SIVIT Camera
  See: Talk by U. Türk on the SmartKom Data Collection at [...] am in Session D11, ESE 11, Next Generation Speech Resources
Mobile Presentation Unit for SmartKom-Public
  2 Sony DSR-PD100AP Video Cameras
  LCD-Beamer ASK C5
  SIVIT Gesture Recognition Unit with Infrared Camera
  Microphones (Microphone Array)
  Speakers
  3 Dual Pentium III, 500
  Visit the SmartKom Demo Booth. Next demos: 10 am, 12, 2 pm and 4 pm
The SmartKom Control GUI
Three Levels of Mark-up Languages for the Web
  Content : Structure : Form = 1 : n : m
  A WWW document has three levels: Content (M3L), Structure (XML), Form (HTML)
M3L Representation of the Multimodal Discourse Context
  Blackboard with Presentation Context of the Previous Dialogue Turn
  M3L excerpt (element values): [...] cinema_17a Europa [...] pid1234 [...]
M3L Representation of the Word Lattice Produced by the Speech Recognizer for "There [ ] I would like to get a reservation."
  M3L excerpt (element values): T13:44:37.900Z shortPause [...] 5 7 gern PT0.57S PT0.84S 5 7 gerne PT0.57S PT0.84S [...]
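The values PT0.57S and PT0.84S in the lattice excerpt have the lexical form of ISO 8601 durations. The following is a minimal sketch of how two competing word hypotheses over the same lattice span might be held in memory. It rests on assumptions that are not confirmed by the slide: that 5 and 7 are start and end node indices, that the two durations are begin and end offsets, and that only the seconds-only form PT<number>S needs to be parsed. The names WordEdge, parse_seconds and alternatives are hypothetical.

```python
import re
from dataclasses import dataclass

# Seconds-only subset of the ISO 8601 duration syntax, e.g. "PT0.57S".
_DURATION = re.compile(r"^PT(\d+(?:\.\d+)?)S$")


def parse_seconds(text: str) -> float:
    """Convert a seconds-only ISO 8601 duration such as 'PT0.57S' to a float."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return float(match.group(1))


@dataclass
class WordEdge:
    """One word hypothesis in the lattice (field meanings are assumptions)."""
    start_node: int
    end_node: int
    word: str
    begin: float  # assumed: offset in seconds from the lattice start
    end: float


# The two hypotheses visible in the excerpt.
edges = [
    WordEdge(5, 7, "gern", parse_seconds("PT0.57S"), parse_seconds("PT0.84S")),
    WordEdge(5, 7, "gerne", parse_seconds("PT0.57S"), parse_seconds("PT0.84S")),
]


def alternatives(edges, start_node, end_node):
    """Return all word hypotheses that span the same pair of lattice nodes."""
    return [e.word for e in edges
            if e.start_node == start_node and e.end_node == end_node]


print(alternatives(edges, 5, 7))
```

Keeping both hypotheses on the same node pair leaves the choice between them to later processing stages instead of forcing an early decision in the recognizer.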
Gesture Recognition and Gesture Analysis
  "There [ ] I would like to get a reservation."
  Gesture lattice as result of gesture recognition: T14:45: PT0.040S T14:45: PT0.040S tarrying dynamic
  Result of gesture analysis: [...] tarrying dynStructId30 1 dynStructId28 2 [...] cinema_17a Europa [...]
Language Analysis and Media Fusion
  Turn 8: "There [ ] I would like to get a reservation."
  M3L excerpt (element values): [...] acoustic understanding reserve cinema_17a Europa [...]
  Annotated elements: Confidence in the Speech Recognition Result, Confidence in the Speech Understanding Result, Planning Act, Object Reference
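The fusion step for this turn can be illustrated with a small sketch: the spoken "there" leaves the object of the reserve act open, and the pointing gesture supplies it. This is not SmartKom's fusion algorithm. It assumes that gesture analysis delivers ranked candidates (rank 1 is best), that a lookup table maps display structure ids to domain objects, and that dynStructId30 refers to cinema_17a; the names GestureCandidate, Intention and fuse, and the placeholder object for dynStructId28, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GestureCandidate:
    """A pointed-at display structure with its rank (1 = most likely)."""
    struct_id: str
    rank: int


@dataclass
class Intention:
    """Result of language analysis: a planning act and an optional object."""
    act: str
    object_ref: Optional[str]


def fuse(intention, candidates, objects):
    """Fill a missing object reference from the best-ranked known gesture target.

    If speech already named the object, or no candidate maps to a known
    object, the intention is returned unchanged.
    """
    if intention.object_ref is not None:
        return intention
    for candidate in sorted(candidates, key=lambda c: c.rank):
        if candidate.struct_id in objects:
            return Intention(intention.act, objects[candidate.struct_id])
    return intention


# Assumed mapping from display structures to domain objects.
objects = {
    "dynStructId30": "cinema_17a",
    "dynStructId28": "hypothetical_object_b",  # placeholder, not from the slide
}
candidates = [GestureCandidate("dynStructId28", 2), GestureCandidate("dynStructId30", 1)]
print(fuse(Intention("reserve", None), candidates, objects))
```

A fuller treatment would also weigh the speech recognition and understanding confidences shown on the slide; this sketch uses gesture rank alone.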
Result of the Action Planner: Presentation Tasks and Presentation Results
  M3L excerpt (element values): list add [...] 20:00 [...]
Output Synchronization: Speech, Gesture, Graphics, Animation
  M3L excerpt (element values): 11 declarative [...] eine Übersicht ("an overview") [...]
SmartKom uses a Combination of Concept-to-Speech and Text-to-Speech Technologies
  Tone labels shown on the slide: L*H H%L*HH*LH*L%
Classification of Facial Expressions (U. Erlangen)
  Localization
  Classification (SVM, Eigenfaces)
  Result: Annoyance or No Annoyance
The GUI of the Second SmartKom Prototype