
RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA
Paul Piwek, ITRI, Brighton; Brigitte Krenn, OFAI, Vienna; Marc Schröder, DFKI, Saarbrücken; Martine Grice, IPUS, Saarbrücken; Stefan Baumann, IPUS, Saarbrücken; Hannes Pirker, OFAI, Vienna

NECA
Duration: 2.5 years. Start: October 2001.
A new generation of mixed multi-user / multi-agent virtual spaces for the internet, populated by affective conversational agents.

Affective Conversational Agents
Express themselves through:
– emotional speech and
– synchronised non-verbal expression

Application Scenarios
The NECA Platform will be evaluated in two concrete application scenarios:
– Socialite: a multi-user web application in the social domain
– eShowRoom: a novel approach to the presentation of products in e-Commerce applications

Socialite

NECA’s Architecture
(The pipeline diagram is built up incrementally over several slides; the complete picture is summarised here.)
– User Input goes to the Scene Generator, which, together with the Affective Reasoner (AR), produces a Scene Description.
– The Multi-modal Natural Language Generator (M-NLG) turns the Scene Description into conceptually rich input for Text/Concept-to-Speech Synthesis (CTS) and into Animation Directives for the Gesture Assignment Module (GA).
– CTS produces Emotional Speech and passes Phonetic+Prosodic Information to GA.
– GA produces an Animation Control Sequence, from which Player-Specific Rendering generates the Multi-modal Output.
– RRL is the representation language used at all of these interfaces.

Requirements for RRL
Application Domain:
– Represent combinations of different types of information
– Expressivity
Processing Modules:
– Ease of manipulation/search (incremental/fast)
Developers (Maintainability):
– Predictability
– Locality
– Conciseness
– Intelligibility

Scene Description [SG | M-NLG | GA | TTS/CTS]
What is a Scene? “I. Theatr. 1. A subdivision of (an act of) a play, in which the time is continuous and the setting fixed, …; the action and dialogue comprised in any one of these subdivisions.” (New Shorter Oxford English Dictionary, 1996)

Scene Descriptions in a Nutshell
Network representations:
– Flat, uniform
– Use the Description Logic T-box/A-box distinction; the T-box defines types, subtypes, attributes and constants
– Can emulate CFGs, so we can include, e.g., semantic representation languages: Discourse Representation Theory (Kamp & Reyle, 1994)
– Reification of expressions in the network provides useful handles for interleaving different types of information
– Lends itself well to graphical representation

Scene Descriptions in a Nutshell
Further features of (RRL) Scene Descriptions:
– For communication between modules: XML syntax (a sketch follows below)
– Temporal relations are explicitly represented
– Meta-conditions are used in DRT for WH-questions, topics and bridging anaphora
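To make the XML syntax concrete, here is a minimal sketch of what a scene description along these lines could look like. All element and attribute names (sceneDescription, dialogueActs, etc.) and the agent names are invented for illustration; this is not the actual RRL schema.

<!-- Illustrative sketch only: names do not follow the real RRL schema. -->
<sceneDescription>
  <!-- Cast of the scene -->
  <agents>
    <agent id="a1" name="Tina" role="seller"/>
    <agent id="a2" name="Ritchie" role="buyer"/>
  </agents>
  <!-- Flat network of typed nodes (A-box); the types themselves are
       defined separately (T-box). -->
  <semantics>
    <drs id="d1">
      <referent id="x1" type="car"/>
      <condition pred="has_leather_seats" args="x1"/>
    </drs>
  </semantics>
  <!-- Reified dialogue acts give other layers handles to attach to -->
  <dialogueActs>
    <act id="act1" speaker="a1" addressee="a2" type="inform"
         content="d1" emotion="joy" intensity="0.6"/>
    <act id="act2" speaker="a2" addressee="a1" type="request"/>
  </dialogueActs>
  <!-- Temporal relations between acts are represented explicitly -->
  <temporal>
    <before first="act1" second="act2"/>
  </temporal>
</sceneDescription>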

eShowRoom Example

Multimodal Output [SG | M-NLG | GA | TTS/CTS]
Multimodal Natural Language Generation (M-NLG) supplies:
– Information on emotional state
– Conceptually rich input for Speech Synthesis
– Initial specification of gestures and facial expressions for later use in Gesture Assignment

NECA’s Speech Synthesis: Emotions [SG | M-NLG | GA | TTS/CTS]
Not restricted to prosody (pitch, duration):
– Several voice databases: diphone inventories for different voice qualities (modal, loud, soft)
– Emotive interjections
– Gradual emotional states: shades of emotion, changing over time (see the sketch below)
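As a rough sketch, a gradual emotional state with a chosen voice quality and an emotive interjection might be marked up as follows; the markup names are invented for illustration and are not the actual NECA format.

<!-- Illustrative sketch only: not the actual NECA/RRL emotion markup. -->
<utterance speaker="a1">
  <!-- A gradual emotional state: category plus intensity, with a soft
       voice quality drawn from a dedicated diphone inventory -->
  <emotion category="joy" intensity="0.4" voiceQuality="soft">
    <interjection type="admiration">wow</interjection>,
    this is a really nice car!
  </emotion>
</utterance>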

NECA’s Speech Synthesis: Concept-to-Speech [SG | M-NLG | GA | TTS/CTS]
Concept-to-Speech instead of a Text-to-Speech approach, with input enriched by:
– Part-of-speech tags
– Syntactic structure
– Information status (given/new)
– Information structure (theme/rheme)

CTS-Specific Information [SG | M-NLG | GA | TTS/CTS]
Running example, annotated incrementally over five slides: “This car has leather seats.” (A sketch of such an annotation follows below.)
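A hedged sketch of how the CTS-specific annotations listed above might wrap this sentence; the tag names are invented, and the ToBI-style accent and boundary labels are just one possible labelling scheme.

<!-- Illustrative sketch only: tag names invented for this example. -->
<sentence type="declarative">
  <theme>
    <np infoStatus="given">
      <t pos="DT">This</t>
      <t pos="NN" accent="L+H*">car</t>
    </np>
  </theme>
  <rheme>
    <t pos="VBZ">has</t>
    <np infoStatus="new">
      <t pos="NN" accent="H*">leather</t>
      <t pos="NNS">seats</t>
    </np>
  </rheme>
  <boundary tone="L-L%"/>
</sentence>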

Prosodic/Phonetic Information for GA [SG | M-NLG | GA | TTS/CTS]
Phonetics:
– exact timing of speech sounds, pauses and interjections
Prosody:
– boundary locations for syllables, words and prosodic phrases

Prosodic/Phonetic Information for GA [SG | M-NLG | GA | TTS/CTS]
Also information on:
– syllables bearing word stress
– position and type of sentence accents
– position and type of prosodic boundaries
(A sketch of this timing record follows below.)
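A sketch of the kind of timed phonetic/prosodic record the synthesiser could hand to Gesture Assignment; the element names, SAMPA symbols and times are all invented for illustration.

<!-- Illustrative sketch only: invented element names; times in seconds. -->
<timing utterance="u1">
  <word orth="car" start="0.21" end="0.54">
    <syllable stress="primary" start="0.21" end="0.54">
      <phoneme sampa="k"  start="0.21" end="0.29"/>
      <phoneme sampa="A:" start="0.29" end="0.47"/>
      <phoneme sampa="r"  start="0.47" end="0.54"/>
    </syllable>
  </word>
  <accent type="L+H*" time="0.33" word="car"/>
  <boundary tone="L-L%" time="1.82"/>
  <pause start="1.82" end="2.10"/>
</timing>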

Animation Directives [SG | M-NLG | GA | TTS/CTS]
Phonetic information (phonemes) is used for specifying:
– visemes
– breathing

Animation Directives [SG | M-NLG | GA | TTS/CTS]
Prosodic information (stress, accents, phrasing) is used for specifying:
– synchronisation of gestures with speech
– eye-blinking
– gaze
(A sketch of such directives follows below.)
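Putting the pieces together, Gesture Assignment could emit directives along the following lines, synchronised against the timing record sketched earlier; again, the format is purely illustrative, not the actual GA output.

<!-- Illustrative sketch only: not the actual GA output format. -->
<animation agent="a1">
  <!-- Visemes derived from phoneme identities and timings -->
  <viseme shape="open_back" start="0.29" end="0.47"/>
  <!-- Gesture stroke aligned with the pitch accent on "car" -->
  <gesture type="beat" hand="right" start="0.15" stroke="0.33" end="0.60"/>
  <!-- Blinks and gaze shifts placed at prosodic boundaries -->
  <blink time="1.85"/>
  <gaze target="user" start="0.00" end="1.82"/>
</animation>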

Conclusions
– RRL is a representation language for the wide range of expert knowledge required at the interfaces of the NECA modules.
– Scene Descriptions: uniform representation and integration of different types of information (illustrated with the integration of DRT), using handles; …
– Speech Synthesis: conceptually rich input, as opposed to plain text.
– Gesture Assignment: access to the exact timing of speech.