Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen
University College Dublin
Introduction
- Expressive speech synthesis in human interaction
- Speech-to-speech translation: with audiovisual input, the affective state does not need to be predicted from text
Introduction
- Goal: transfer paralinguistic information from the source to the target language by means of an intermediate, symbolic representation, namely facial expression as an input annotation modality
- FEAST: Facial Expression-based Affective Speech Translation
System Architecture of FEAST
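To make the data flow concrete, here is a minimal runnable sketch of the pipeline in Python. Every function and value below is an illustrative stand-in, not the actual implementation: the real system uses SHORE for facial analysis, an SVM for classification, and MARY TTS for synthesis, as detailed on the following slides, and the score values and style mapping are assumptions.

```python
def analyze_frame(frame):
    """Stand-in for SHORE: per-frame facial expression scores (assumed values)."""
    return {"happy": 80.0, "sad": 5.0, "angry": 5.0, "neutral": 10.0}

def classify_utterance(frame_scores):
    """One decision per utterance: average the frame scores, take the maximum."""
    emotions = frame_scores[0].keys()
    mean = {e: sum(f[e] for f in frame_scores) / len(frame_scores)
            for e in emotions}
    return max(mean, key=mean.get)

# Assumed mapping from the classified emotion to a synthesis voice style
STYLE_MAP = {"happy": "cheerful", "sad": "depressed",
             "angry": "aggressive", "neutral": "neutral"}

def feast(frames, translated_text):
    emotion = classify_utterance([analyze_frame(f) for f in frames])
    # Hand the selected style to the expressive synthesiser (see the MARY TTS slide)
    return translated_text, STYLE_MAP[emotion]

print(feast(range(10), "Das ist großartig!"))
```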
Face detection and analysis
- SHORE library for real-time face detection and analysis
- http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/
Emotion classification and style selection
- Aim of the facial expression analysis in the FEAST system: a single decision regarding the emotional state of the speaker over each utterance
- Visual emotion classifier, trained on segments of the SEMAINE database, with input features from SHORE
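One plausible way to turn per-frame SHORE scores into a single utterance-level input for the classifier is to pool simple statistics over all frames. The slide only states that one decision is made per utterance; the specific statistics (mean and standard deviation) and the score dimensionality below are assumptions for illustration.

```python
import numpy as np

def utterance_features(frame_scores: np.ndarray) -> np.ndarray:
    """frame_scores: array of shape (n_frames, n_scores) from the face
    analyser; returns one fixed-length feature vector per utterance."""
    return np.concatenate([frame_scores.mean(axis=0),
                           frame_scores.std(axis=0)])

# Example: 120 frames with 4 expression scores each -> an 8-dimensional vector
vec = utterance_features(np.random.rand(120, 4))
print(vec.shape)   # (8,)
```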
Expressive speech synthesis
- Expressive unit-selection synthesis using the open-source synthesis platform MARY TTS
- German male voice dfki-pavoque-styles, with four styles: Cheerful, Depressed, Aggressive, Neutral
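MARY TTS runs as a local HTTP server (by default on port 59125), so a synthesis request can be sketched as below. The parameter names follow MARY's standard /process interface; that this STYLE parameter drives style selection for dfki-pavoque-styles is an assumption based on the MARY web client, and the example sentence is invented.

```python
import requests

params = {
    "INPUT_TYPE": "TEXT",
    "OUTPUT_TYPE": "AUDIO",
    "AUDIO": "WAVE_FILE",
    "LOCALE": "de",
    "VOICE": "dfki-pavoque-styles",
    "STYLE": "cheerful",   # one of: cheerful, depressed, aggressive, neutral
    "INPUT_TEXT": "Schön, dich zu sehen!",
}
resp = requests.get("http://localhost:59125/process", params=params)
resp.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(resp.content)
```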
The SEMAINE database (semaine-db.eu)
- Audiovisual database collected to study natural social signals occurring in English conversations
- Conversations with four emotionally stereotyped characters:
  - Poppy (happy, outgoing)
  - Obadiah (sad, depressive)
  - Spike (angry, confrontational)
  - Prudence (even-tempered, sensible)
Evaluation experiments
1. Does the system accurately classify emotion on the utterance level, based on the facial expression in the video input?
2. Do the synthetic voice styles succeed in conveying the target emotion category?
3. Do listeners agree with the cross-lingual transfer of paralinguistic information from the multimodal stimuli to the expressive synthetic output?
Experiment 1: Classification of facial expressions
- Support Vector Machine (SVM) classifier trained on utterances of the male operators from the SEMAINE database
- 535 utterances used for training, 107 for testing
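The classifier setup can be approximated with scikit-learn as a sketch. The actual SHORE feature vectors and SEMAINE labels are not reproduced here, so random placeholder data of the stated sizes (535 training and 107 test utterances) stands in; the kernel and feature dimensionality are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_train, n_test, n_feat = 535, 107, 8        # feature count is an assumption
X = rng.normal(size=(n_train + n_test, n_feat))
y = rng.choice(["happy", "sad", "angry", "neutral"], size=n_train + n_test)

clf = SVC(kernel="rbf")                      # kernel choice is an assumption
clf.fit(X[:n_train], y[:n_train])            # placeholder data, so accuracy
print(accuracy_score(y[n_train:], clf.predict(X[n_train:])))  # stays near chance
```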
Experiment 2: Perception of expressive synthesis
- Perception experiment with 20 subjects
- Listen to natural and synthesised stimuli and choose which voice style describes the utterance best: Cheerful, Depressed, Aggressive, Neutral
Experiment 2: Results
Experiment 3: Adequacy for S2S translation
- Perceptual experiment with 14 bilingual participants
- 24 utterances from the SEMAINE operator data and their corresponding translations in each voice style
- Listeners were asked to choose which German translation matches the original video best
Examples - Poppy (happy)
Examples - Prudence (neutral)
Examples - Spike (angry)
Examples - Obadiah (sad)
Experiment 3: Results
Conclusion
- Preserving the paralinguistic content of a message across languages is possible with significantly greater than chance accuracy
- The visual emotion classifier performed with an overall accuracy of 63.5%
- Cheerful/happy is often mistaken for neutral (conditioned by the voice)
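As a quick sanity check on the "significantly greater than chance" claim: with four emotion categories, chance is 25%, and 63.5% of the 107 test utterances corresponds to roughly 68 correct. The exact correct count is not given on the slide, so 68 is an assumption.

```python
from scipy.stats import binomtest

# 68/107 ≈ 63.5% correct vs. a 25% four-way chance level
result = binomtest(k=68, n=107, p=0.25, alternative="greater")
print(result.pvalue)   # well below 0.05
```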
Future Work
- Extending the classifier to predict the affective state of the user from acoustic and prosodic analysis as well as facial expressions
- Demonstration of the prototype system taking live input through a webcam and microphone
- Integration of a speech recogniser and a machine translation component
Questions?