eNTERFACE 08 Project #1 “Multiparty Communication with a Tour Guide ECA” Final presentation, August 29th, 2008
Outline
Project Overview: Objectives, Issues & Work Done
System Overview: Configuration and Design
Conclusion
Project Objectives
Main objective: develop an ECA Tour Guide system which can interact with one or two users
Research features:
- multiparty dialogue model and scenario between two humans and an ECA
- handling and combining input data: users' presence and behaviors (speech, tracking)
- gaze behavior control and nonverbal model of the ECA
Work Done: Component Functionality Overview
We implemented components which support a scenario based on narration and interruptions:
- the ECA is the narrator; users can ask context-related questions (“where”, “how”, “when”)
- speaker, addressee and listener identification, ECA gaze model
- the ECA can ask users simple “yes/no” questions to keep their attention
- the system can detect users' appearance and dynamically initiate/end a session
- the system can detect and handle the situation when users are paying less attention
- the system can recover from failures (e.g. SR does not recognize the user's speech)
A minimal sketch of this narration/interruption flow is given below.
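To illustrate the narration-with-interruption scenario, the following Python sketch shows one possible control loop: the ECA narrates segment by segment, pauses to answer a recognized context question, and asks a simple “yes/no” question when attention drops. The event types, the eca wrapper, and its methods are assumptions for illustration, not the project's actual code.

CONTEXT_KEYWORDS = {"where", "how", "when"}

class TourGuideScenario:
    def __init__(self, narration_segments, eca):
        self.segments = narration_segments   # list of narration utterances
        self.eca = eca                       # assumed wrapper around speech + animation output
        self.index = 0

    def run(self):
        while self.index < len(self.segments):
            self.eca.say(self.segments[self.index])
            event = self.eca.wait_for_event()            # speech or vision event (hypothetical)
            if event.type == "speech" and event.keyword in CONTEXT_KEYWORDS:
                # Interruption: answer the context-related question, then resume narration.
                self.eca.answer(event.keyword, speaker=event.user)
            elif event.type == "low_attention":
                # Keep attention with a simple yes/no question.
                self.eca.ask_yes_no("Shall I continue?")
            self.index += 1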
Work Done... About To Be Done...
- Components are implemented
- System is being integrated; debugging and full testing are needed
Not supported:
- detection of the situation when users start a conversation between themselves
- detection of speech collisions between users
- smart scheduling and control of the ECA's behaviors
System Configuration
Speech Recognition
Functionality:
- detects users' requests (“Where”, “How”, “When”, “Who”)
- detects users' willingness to leave the system
- detects answers to simple “yes/no” questions
- detects unknown words
Implementation: keyword detection with a confidence score and speech duration, implemented using the Loquendo API (a post-processing sketch is shown below)
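The Loquendo calls themselves are proprietary and not reproduced here; the sketch below only shows how a keyword-spotting result with a confidence score could be mapped to the event classes listed above. The result record, keyword lists, and threshold value are assumptions.

from collections import namedtuple

SpeechResult = namedtuple("SpeechResult", ["keyword", "confidence", "duration"])

REQUEST_KEYWORDS = {"where", "how", "when", "who"}
LEAVE_KEYWORDS = {"bye", "goodbye"}          # assumed wording
CONFIDENCE_THRESHOLD = 0.5                   # assumed tuning value

def classify(result: SpeechResult) -> str:
    """Map a keyword-spotting result to an input event class."""
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "unknown"                     # treated as an unrecognized word
    if result.keyword in REQUEST_KEYWORDS:
        return "request"
    if result.keyword in LEAVE_KEYWORDS:
        return "leave"
    if result.keyword in {"yes", "no"}:
        return "answer"
    return "unknown"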
Nonverbal Inputs and Understanding
Nonverbal inputs: users' appearance and face orientation
Functionality of components:
- detect motion and users' appearance/disappearance
- detect the number of users present
- detect users' face orientation and increased/decreased attention for the left and right user
Implementation: OpenCV (motion) & Okao Vision (face orientation, gazing); see the motion-detection sketch below
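As a rough illustration of the OpenCV part, the sketch below detects motion by frame differencing on a webcam stream; the pixel threshold is an assumed tuning value, and the Okao Vision face-orientation API is not reproduced here.

import cv2

MOTION_PIXEL_THRESHOLD = 5000    # assumed tuning value

def detect_motion(prev_frame, frame):
    """Return True if enough pixels changed between two consecutive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(mask) > MOTION_PIXEL_THRESHOLD

cap = cv2.VideoCapture(0)
ok, prev = cap.read()
while ok:
    ok, frame = cap.read()
    if not ok:
        break
    if detect_motion(prev, frame):
        print("motion detected: a user appeared or is moving")
    prev = frame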
Decision Making Component
Decision Making Component - Functionalities
Makes decisions on “when and what to do to whom”:
- handles multimodal input events (number of users, attention, speech channels)
- handles user interruptions while the ECA is speaking
- handles failures from the SR component
- generates multimodal output and controls the ECA's gazing
Simple rule: “first one will be served”; the “yes”/“no” questionnaire is the exception (see the sketch after this list)
No domain knowledge and no behavior scheduling
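The “first one will be served” rule can be pictured as the arbitration step below: the earliest pending input event wins, except while a “yes/no” questionnaire is open, where answers from both users are kept. The event objects and their fields are hypothetical.

def select_event(pending_events, awaiting_yes_no=False):
    """Pick which input event the ECA reacts to next."""
    if not pending_events:
        return None
    if awaiting_yes_no:
        # Exception: keep every yes/no answer so both users can be counted.
        return [e for e in pending_events if e.keyword in ("yes", "no")]
    # Default rule: serve whoever produced an event first.
    return min(pending_events, key=lambda e: e.timestamp)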
Decision Making Component - Implementation
The Decision Making Component uses ideas from information state theory [Larsson'00] and AIML:
- the progress of the dialogue is represented by a set of variables
- the most appropriate plans are selected and scheduled by simple inference
- time control is used to obtain messages from both speech channels in the case of “yes/no” questions
The component is being developed using the MIDIKI toolkit as a reference; a minimal information-state sketch follows.
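A minimal sketch in the spirit of information-state dialogue management: the state is a set of variables updated from input events, and a simple rule chain selects the next plan. Variable names, event fields, and the rules themselves are illustrative assumptions, not the component's actual ones.

state = {
    "num_users": 0,           # from the vision component
    "attention": "high",      # "high" / "low", from face orientation
    "current_topic": None,    # set while narrating
    "awaiting_answer": None,  # e.g. "yes_no" with a collection deadline
}

def select_plan(state, event):
    """Pick the next system move from the current state and the latest event."""
    if event["type"] == "speech" and event["keyword"] in ("where", "how", "when"):
        return ("answer_question", event["keyword"], state["current_topic"])
    if event["type"] == "sr_failure":
        return ("ask_repeat",)
    if state["attention"] == "low":
        return ("ask_yes_no", "Shall I continue?")
    return ("continue_narration",)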
Animation Player
Functionality:
- the animation player uses scripted behaviors (GSML language) to generate speech and animation
- a model of gaze in multiparty communication is supported:
  - gaze is controlled at the utterance level
  - gaze patterns follow conversational rules (who is the addressee, who is the listener); an illustrative sketch is given below
Implementation: Visage SDK (based on the MPEG-4 standard), 3ds Max
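The GSML scripts and Visage SDK calls are not reproduced here; the sketch below only illustrates how an utterance-level gaze schedule could be derived from the conversational roles: mostly looking at the addressee, with a short glance at the side listener on longer utterances. The function, its parameters, and the timing values are hypothetical.

def gaze_targets(addressee, listener, utterance_len):
    """Return a simple gaze schedule as (action, target, relative_time) tuples."""
    schedule = [("look_at", addressee, 0.0)]
    if listener is not None and utterance_len > 5:
        schedule.append(("glance_at", listener, 0.5))   # halfway through the utterance
        schedule.append(("look_at", addressee, 0.7))
    return schedule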
Conclusion
- components to support context-based two-party human-ECA communication are implemented
- the system is being integrated, but not fully tested
Component issues:
- missing face tracking and domain knowledge about users' behaviors
- simple dialogue management and control (no smart scheduling and no smart gaze control)
Future directions: system debugging and testing, implement tracking, improve gaze control, study users' behaviors and gazing, system evaluation