Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen

Presentation transcript:

Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen
University College Dublin

Introduction
Expressive speech synthesis in human interaction.
Speech-to-speech translation with audiovisual input: the affective state does not need to be predicted from text.

Introduction
Goal: transferring paralinguistic information from the source to the target language by means of an intermediate, symbolic representation: facial expression as an input annotation modality.
FEAST: Facial Expression-based Affective Speech Translation.

System architecture of FEAST

Face detection and analysis
SHORE library for real-time face detection and analysis.
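SHORE is a proprietary Fraunhofer library, so its API is not reproduced here; as a stand-in sketch of the same per-frame processing step, the following uses OpenCV's bundled Haar cascade for face detection. The input file name is hypothetical, and SHORE's per-frame expression scores are not replicated.

```python
# Stand-in sketch for the per-frame face-analysis step (OpenCV, not SHORE).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("operator_clip.avi")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # One bounding box per detected face; SHORE would additionally return
    # per-frame expression scores (e.g. happy/sad/angry/surprised) here.
cap.release()
```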

Emotion classification and style selection
Aim of the facial expression analysis in the FEAST system: a single decision regarding the emotional state of the speaker over each utterance.
A visual emotion classifier, trained on segments of the SEMAINE database, uses input features from SHORE; its decision selects the synthesis voice style, as sketched below.
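A minimal sketch of this style-selection step, assuming a direct mapping from the classifier's utterance-level emotion label to the four pavoque voice styles listed on the next slide; the mapping itself is an assumption based on the SEMAINE character descriptions.

```python
# Hypothetical mapping from the visual classifier's emotion label
# to a MARY TTS voice style.
EMOTION_TO_STYLE = {
    "happy": "cheerful",
    "sad": "depressed",
    "angry": "aggressive",
    "neutral": "neutral",
}

def select_style(emotion: str) -> str:
    """Fall back to the neutral style for any unexpected label."""
    return EMOTION_TO_STYLE.get(emotion, "neutral")

print(select_style("happy"))  # -> "cheerful"
```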

Expressive speech synthesis
Expressive unit-selection synthesis using the open-source synthesis platform MARY TTS.
German male voice dfki-pavoque-styles with four styles: cheerful, depressed, aggressive, neutral.
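A minimal sketch of requesting one utterance from a running MARY TTS server, assuming the standard MARY 5 HTTP interface on its default port 59125; the STYLE parameter and the example sentence are assumptions, not taken from the paper.

```python
# Sketch of an expressive synthesis request to a local MARY TTS server.
import requests

def synthesize(text: str, style: str = "neutral") -> bytes:
    response = requests.get(
        "http://localhost:59125/process",
        params={
            "INPUT_TEXT": text,
            "INPUT_TYPE": "TEXT",
            "OUTPUT_TYPE": "AUDIO",
            "AUDIO": "WAVE_FILE",
            "LOCALE": "de",
            "VOICE": "dfki-pavoque-styles",
            "STYLE": style,  # cheerful | depressed | aggressive | neutral (assumed)
        },
    )
    response.raise_for_status()
    return response.content  # WAV audio bytes

with open("output.wav", "wb") as f:
    f.write(synthesize("Das ist wunderbar!", style="cheerful"))
```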

The SEMAINE database (semaine-db.eu)
An audiovisual database collected to study natural social signals occurring in English conversations.
Conversations with four emotionally stereotyped characters:
Poppy (happy, outgoing)
Obadiah (sad, depressive)
Spike (angry, confrontational)
Prudence (even-tempered, sensible)

Evaluation experiments
1. Does the system accurately classify emotion on the utterance level, based on the facial expression in the video input?
2. Do the synthetic voice styles succeed in conveying the target emotion category?
3. Do listeners agree with the cross-lingual transfer of paralinguistic information from the multimodal stimuli to the expressive synthetic output?

Experiment 1: Classification of facial expressions
Support Vector Machine (SVM) classifier trained on utterances of the male operators from the SEMAINE database.
535 utterances used for training, 107 for testing.
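A minimal sketch (not the authors' code) of this classifier set-up with scikit-learn, using random placeholder features in place of the pooled SHORE measurements and reproducing only the 535/107 train/test split from the slide.

```python
# Sketch of the utterance-level SVM emotion classifier; features are placeholders.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
labels = ["happy", "sad", "angry", "neutral"]

# One feature vector per utterance (dimensionality here is illustrative).
X_train, y_train = rng.random((535, 8)), rng.choice(labels, 535)
X_test, y_test = rng.random((107, 8)), rng.choice(labels, 107)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```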

Experiment 2: Perception of expressive synthesis
Perception experiment with 20 subjects.
Subjects listened to natural and synthesised stimuli and chose which voice style describes the utterance best: cheerful, depressed, aggressive, or neutral.

Experiment 2: Results

Experiment 3: Adequacy for S2S translation
Perceptual experiment with 14 bilingual participants.
24 utterances from SEMAINE operator data and their corresponding translation in each voice style.
Listeners were asked to choose which German translation matches the original video best.

Examples - Poppy (happy)

Examples - Prudence (neutral)

Examples - Spike (angry)

Examples - Obadiah (sad)

Experiment 3: Results

Conclusion
Preserving the paralinguistic content of a message across languages is possible with significantly greater than chance accuracy.
The visual emotion classifier performed with an overall accuracy of 63.5%.
Cheerful/happy is often mistaken for neutral (conditioned by the voice).

Future Work
Extending the classifier to predict the affective state of the user based on acoustic and prosodic analysis as well as facial expressions.
Demonstration of the prototype system taking live input through a webcam and microphone.
Integration of a speech recogniser and a machine translation component.

Questions?