
German Research Center for Artificial Intelligence DFKI GmbH 66123 Saarbruecken, Germany WWW:http://www.dfki.de/~wahlster Eurospeech.



2 German Research Center for Artificial Intelligence DFKI GmbH 66123 Saarbruecken, Germany e-mail: wahlster@dfki.de WWW: http://www.dfki.de/~wahlster Eurospeech 2001 - Scandinavia Dialog Systems - Project Descriptions II Aalborg, 6 September 2001 Wolfgang Wahlster Anselm Blocher, Norbert Reithinger SmartKom: Multimodal Communication with a Life-like Character

3 © W. Wahlster Verbmobil SmartKom Today's Cell Phone Third Generation UMTS Phone Speech only Speech, Graphics and Gesture From Spoken Dialogue to Multimodal Dialogue

4 © W. Wahlster Spoken Dialogue Graphical User Interfaces Gestural Interaction Multimodal Interaction Merging Various User Interface Paradigms

5 © W. Wahlster I'd like to reserve tickets for this movie. Where would you like to sit? I'd like these two seats. Multimodal Interaction with a Life-like Character User Input: Speech and Gesture Smartakus Output: Speech, Gesture and Facial Expressions User Input: Speech and Gesture

6 © W. Wahlster SmartKom: Multimodal Dialogs with a Life-like Character

7 © W. Wahlster SmartKom: Intuitive Multimodal Interaction MediaInterface European Media Lab Univ. of Munich Univ. of Stuttgart Saarbrücken Aachen Dresden Berkeley Stuttgart Munich Univ. of Erlangen Heidelberg Main Contractor DFKI Saarbrücken The SmartKom Consortium: Project Budget: € 25.5 million Project Duration: 4 years (September 1999 – September 2003) Ulm

8 © W. Wahlster Salient Characteristics of SmartKom Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models Adaptive generation of coordinated, cohesive and coherent multimodal presentations Semi- or fully automatic completion of user-delegated tasks through the integration of information services Intuitive personification of the system through a presentation agent

9 © W. Wahlster SmartKom-Home/Office: Multimodal Portal to Information Services SmartKom-Public: A Multimodal Communication Kiosk SmartKom-Mobile: A Handheld Communication Assistant Media Analysis Kernel of SmartKom Interface Agent Interaction Management Application Management Media Design SmartKom: A Transportable Interface Agent

10 © W. Wahlster Fujitsu Stylistic™ 3500X 500 MHz Intel ® Celeron ™ 10.4" XGA TFT (1024x768 Pixels) 256 MB SDRAM 15 GB shock-mounted SmartKom-Home on a Portable Webpad Provides electronic program guides (EPG) for TV, controls consumer electronics like VCRs, and accesses standard applications like phone and e-mail Lean-forward mode: coordinated speech and gesture input Lean-backward mode: voice input alone

11 © W. Wahlster SmartKom-Mobile can be added to a car navigation system or carried by a pedestrian Additional services like route planning and interactive navigation through a city can be accessed via GPS and GSM/UMTS connectivity

12 © W. Wahlster SmartKom's SDDP Interaction Metaphor SDDP = Situated Delegation-oriented Dialogue Paradigm User specifies goal delegates task cooperate on problems asks questions presents results Service 1 Service 2 Service 3 IT Services Personalized Interaction Agent

13 © W. Wahlster Visual Support for SDDP adaptation to the user’s viewing angle reduction of the association “screen  computer” (no background) spotlights guide and control the user’s attention

14 © W. Wahlster classic fixed isometric perspective completely variable user-adaptive perspective with limited variability The Perspective of the User

15 © W. Wahlster Decomposition of Behavioural Schemata: Phases of Gestures Preparation Stroke Retraction

16 © W. Wahlster Some Complex Behavioural Patterns of the Interaction Agent Smartakus Examples of complex motion patterns, pointing gestures and co-speech gestures Enumerate five points Go in a circle Jumping on the spot The "i" shape of Smartakus is reminiscent of the "i" sign at information kiosks.

17 © W. Wahlster Multimodal Input and Output in SmartKom Speech, Gesture and Facial Expressions are each supported (+) both as Input by the User and as Output by the Presentation Agent

18 © W. Wahlster Semantic Representation Language Face Description Language Gesture Description Language Ontologies Knowledge Representation Language Inference Component DBMS/KBMS/WWW Face Analysis Facial Expression Generation Gesture Analysis Gesture Generation Parsing Facial Expressions Gestures Modality-Specific Representation Languages as an Intermediate Representation before Media Fusion Speech Input M3L based on XML

19 © W. Wahlster SmartKom‘s Data Collection of Multimodal Dialogs User Side-view Camera Face-tracking Camera with Microphone Environmental Noise Microphone Array Screen Projected Webpage Face-tracking Camera Loudspeaker Microphone Array User Bird’s-eye Camera LCD Beamer SIVIT- Camera See: Talk by U. Türk on the SmartKom Data Collection at 10.20 am in Session D11, ESE 11, Next Generation Speech Resources

20 © W. Wahlster Mobile Presentation Unit for SmartKom-Public 2 Sony DSR-PD100AP Video Cameras LCD-Beamer ASK C5 SIVIT Gesture Recognition Unit with Infrared Camera Microphones (Microphone Array) Speakers 3 Dual Pentiums III, 500 Visit the SmartKom Demo Booth, Next Demos: 10 am,12, 2pm and 4pm

21-30 © W. Wahlster The SmartKom Control GUI

31 © W. Wahlster Three Levels of Mark-up Languages for the Web Content : Structure : Form = 1 : n : m WWW Document Content Structure Form M3L XML HTML

32 © W. Wahlster [...] cinema_17a Europa 225 230 [...] 0.5542 0.1950 0.9892 0.7068 pid1234 [...] M3L Representation of the Multimodal Discourse Context Blackboard with Presentation Context of the Previous Dialogue Turn

33 © W. Wahlster M3L Representation of the Word Lattice Produced by the Speech Recognizer for "There [pointing gesture] I would like to get a reservation." 2000-12-07T13:44:37.900Z shortPause [...] 5 7 gern 6.51343 PT0.57S PT0.84S 5 7 gerne 6.19579 PT0.57S PT0.84S [...]
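The XML tags of the lattice excerpt above were lost in the transcript, so the following Python sketch only illustrates one plausible reading of the surviving values. It assumes, without confirmation from the slides, that each entry lists a start node, an end node, a word, a recognizer score, and begin/end offsets as seconds-only ISO 8601 durations. The names `Edge`, `parse_seconds`, `parse_edge` and `competing_hypotheses` are hypothetical and not part of M3L. The sketch does not rank the alternatives, because the transcript does not say whether a higher or lower score is better.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One word hypothesis between two lattice nodes (field meaning is assumed)."""
    start_node: int
    end_node: int
    word: str
    score: float
    begin: float  # seconds from utterance start
    end: float    # seconds from utterance start


def parse_seconds(duration: str) -> float:
    """Parse a seconds-only ISO 8601 duration such as 'PT0.57S'."""
    if not (duration.startswith("PT") and duration.endswith("S")):
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(duration[2:-1])


def parse_edge(fields: str) -> Edge:
    """Build an Edge from a whitespace-separated line (assumed field order)."""
    start, end, word, score, begin, finish = fields.split()
    return Edge(int(start), int(end), word, float(score),
                parse_seconds(begin), parse_seconds(finish))


def competing_hypotheses(edges):
    """Group words that span the same pair of nodes, i.e. rival readings."""
    spans = defaultdict(list)
    for edge in edges:
        spans[(edge.start_node, edge.end_node)].append(edge.word)
    return dict(spans)


edges = [
    parse_edge("5 7 gern 6.51343 PT0.57S PT0.84S"),
    parse_edge("5 7 gerne 6.19579 PT0.57S PT0.84S"),
]
alternatives = competing_hypotheses(edges)
```

Under these assumptions, "gern" and "gerne" are two rival hypotheses for the same stretch of audio, which is the kind of ambiguity a lattice keeps open for later processing stages.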

34 © W. Wahlster Gesture Recognition and Gesture Analysis "There [pointing gesture] I would like to get a reservation." Gesture Lattice as Result of Gesture Recognition: 2000-12-07T14:45:03.125 PT0.040S 0.872641 0.477261 tarrying dynamic Result of Gesture Analysis: [...] tarrying dynStructId30 1 dynStructId28 2 [...] cinema_17a Europa 225 230 [...]
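One way to picture the step from gesture recognition to gesture analysis is a simple hit test. The sketch below is an assumption-laden illustration, not SmartKom's actual algorithm. It treats the pair 0.872641 0.477261 as normalized screen coordinates of the pointing gesture, and the four numbers shown with cinema_17a on slide 32 (0.5542 0.1950 0.9892 0.7068) as a left/top/right/bottom bounding box. Neither reading is confirmed by the transcript. `ScreenObject` and `resolve_pointing` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScreenObject:
    """A displayed object with an assumed normalized bounding box."""
    object_id: str
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def resolve_pointing(x: float, y: float,
                     objects: Sequence[ScreenObject]) -> Optional[ScreenObject]:
    """Return the first displayed object whose box contains the point."""
    for obj in objects:
        if obj.contains(x, y):
            return obj
    return None


# Values copied from the slides; their interpretation is an assumption.
europa = ScreenObject("cinema_17a", "Europa", 0.5542, 0.1950, 0.9892, 0.7068)
hit = resolve_pointing(0.872641, 0.477261, [europa])
miss = resolve_pointing(0.1, 0.1, [europa])
```

With this reading the pointing position falls inside the box of cinema_17a, which matches the object that appears in the gesture analysis result.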

35 © W. Wahlster Language Analysis and Media Fusion: Turn 8: "There [pointing gesture] I would like to get a reservation." [...] acoustic 60.95448 understanding 0.928571 reserve cinema_17a Europa [...] Confidence in the Speech Recognition Result Confidence in the Speech Understanding Result Planning Act Object Reference
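The fusion result on this slide combines a planning act from speech ("reserve") with an object reference that the spoken word "there" alone cannot supply. A minimal Python sketch of that idea follows. It assumes the speech side leaves the object slot empty for a deictic expression and the gesture side delivers a referent. The classes `SpeechHypothesis` and `GestureReferent` and the function `fuse` are hypothetical names, and the real M3L structures are not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SpeechHypothesis:
    """Outcome of language analysis (structure assumed for illustration)."""
    act: str                           # planning act, e.g. "reserve"
    object_ref: Optional[str]          # None while "there" is unresolved
    acoustic_score: float              # confidence in speech recognition
    understanding_confidence: float    # confidence in speech understanding


@dataclass(frozen=True)
class GestureReferent:
    """Object selected by gesture analysis."""
    object_id: str
    name: str


def fuse(speech: SpeechHypothesis,
         gesture: Optional[GestureReferent]) -> SpeechHypothesis:
    """Fill an open object reference from the gesture, if one is available."""
    if speech.object_ref is not None or gesture is None:
        return speech
    return replace(speech, object_ref=gesture.object_id)


# Scores copied from the slide; the empty object slot is an assumption.
speech = SpeechHypothesis("reserve", None, 60.95448, 0.928571)
fused = fuse(speech, GestureReferent("cinema_17a", "Europa"))
```

The fused hypothesis keeps both confidence values, so a later component could still weigh the interpretation against alternatives.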

36 © W. Wahlster Result of the Action Planner: Presentation Tasks and Presentation Results list add [...] 20:00 [...]

37 © W. Wahlster Output Synchronization: Speech, Gesture, Graphics, Animation 11 declarative [...] eine 2.1539 2.2829 Übersicht 2.2829 3.2997 [...]
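The number pairs after each word look like start and end times of the synthesized words, which is the information needed to align gestures and animation with speech. The sketch below assumes that reading (times in seconds, half-open intervals). The transcript does not confirm either assumption. `TimedWord` and `word_at` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class TimedWord(NamedTuple):
    """A synthesized word with assumed start/end times in seconds."""
    word: str
    start: float
    end: float


def word_at(t: float, words: Sequence[TimedWord]) -> Optional[TimedWord]:
    """Return the word being spoken at time t, or None during silence."""
    for w in words:
        if w.start <= t < w.end:
            return w
    return None


# Timings copied from the slide.
words = [
    TimedWord("eine", 2.1539, 2.2829),
    TimedWord("Übersicht", 2.2829, 3.2997),
]
current = word_at(2.5, words)
```

A presentation component could use such a lookup to start a pointing gesture or a mouth animation at the moment a given word begins.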

38 © W. Wahlster SmartKom uses a Combination of Concept-to-Speech and Text-to-Speech Technologies L*H H%L*HH*LH*L%

39 © W. Wahlster Classification of Facial Expressions (U. Erlangen) Localization Classification (SVM, Eigenfaces) Annoyance Annoyance No Annoyance

40 © W. Wahlster The GUI of the Second SmartKom Prototype

41 http://www.smartkom.org URL of this Presentation: http://www.dfki.de/~wahlster/eurospeech2001

