2 Facial expression as an input annotation modality for affective speech-to-speech translation
Éva Székely, Zeeshan Ahmed, Ingmar Steiner, Julie Carson-Berndsen
University College Dublin

3 Introduction
- Expressive speech synthesis in human interaction
- Speech-to-speech translation with audiovisual input: the affective state does not need to be predicted from text

4 Introduction
- Goal: transferring paralinguistic information from the source to the target language by means of an intermediate, symbolic representation: facial expression as an input annotation modality
- FEAST: Facial Expression-based Affective Speech Translation

5 System Architecture of FEAST
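The architecture diagram on this slide is not reproduced in the transcript. As an orientation, the processing flow described on the following slides can be summarised in a short sketch. This is an illustrative outline, not the authors' implementation: the three components are passed in as callables standing for the SHORE analysis, the utterance-level emotion classifier, and the MARY TTS synthesis, and all names are placeholders.

```python
# Illustrative outline of the FEAST pipeline. The callables stand for the
# components on the following slides; names and signatures are assumptions.

EMOTION_TO_STYLE = {
    "happy": "cheerful",
    "sad": "depressed",
    "angry": "aggressive",
    "neutral": "neutral",
}

def feast_pipeline(video_path, translated_text,
                   analyze_faces, classify_emotion, synthesize):
    """Carry the affect observed in the source video into the
    synthesised translation."""
    frame_scores = analyze_faces(video_path)         # per-frame expression ratings
    emotion = classify_emotion(frame_scores)         # one decision per utterance
    style = EMOTION_TO_STYLE[emotion]                # emotion category -> voice style
    return synthesize(translated_text, style=style)  # expressive synthesis in German
```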

6 Face detection and analysis
- SHORE library for real-time face detection and analysis: http://www.iis.fraunhofer.de/en/bf/bsy/produkte/shore/
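SHORE itself is a C++ library, so the sketch below only illustrates consuming its per-frame output from Python. It assumes the expression ratings for each video frame have been exported to a CSV file; the column names and the 0-100 rating scale are assumptions for illustration, not SHORE's actual field names.

```python
import csv

# Assumed export format: one CSV row per video frame, with expression
# ratings in columns named after the expressions. These names and the
# 0-100 scale are placeholders, not SHORE's real output format.
EXPRESSIONS = ("happy", "sad", "angry", "surprised")

def load_frame_scores(csv_path):
    """Read per-frame expression ratings into a list of dicts."""
    with open(csv_path, newline="") as f:
        return [{e: float(row[e]) for e in EXPRESSIONS}
                for row in csv.DictReader(f)]
```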

7 Emotion classification and style selection
- Aim of the facial expression analysis in the FEAST system: a single decision regarding the emotional state of the speaker over each utterance
- Visual emotion classifier, trained on segments of the SEMAINE database, with input features from SHORE
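The slide does not specify how the per-frame features are reduced to a single decision per utterance. One plausible aggregation, shown purely for illustration, is to summarise each expression's frame-level ratings by their mean and spread and feed the resulting fixed-length vector to the classifier.

```python
from statistics import mean, stdev

def utterance_features(frame_scores):
    """Collapse per-frame expression ratings into one fixed-length
    feature vector per utterance (mean and standard deviation of each
    rating). frame_scores is a list of dicts, one per video frame,
    e.g. [{"happy": 80.0, "sad": 2.0, "angry": 1.0, "surprised": 5.0}, ...]
    """
    features = []
    for expr in ("happy", "sad", "angry", "surprised"):
        values = [frame[expr] for frame in frame_scores]
        features.append(mean(values))
        features.append(stdev(values) if len(values) > 1 else 0.0)
    return features
```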

8 Expressive speech synthesis
- Expressive unit-selection synthesis using the open-source synthesis platform MARY TTS
- German male voice dfki-pavoque-styles with four styles:
  - Cheerful
  - Depressed
  - Aggressive
  - Neutral
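MARY TTS runs as a local HTTP server (port 59125 by default), so the voice and style can be set through request parameters. A minimal client sketch follows; it assumes the dfki-pavoque-styles voice exposes its styles via the STYLE parameter, which should be checked against the documentation of the installed MARY TTS version.

```python
import urllib.parse
import urllib.request

# Minimal client for a locally running MARY TTS server. Whether the
# styles of dfki-pavoque-styles are selected via the STYLE parameter
# is an assumption to verify against your MARY TTS installation.
MARY_URL = "http://localhost:59125/process"

def synthesize(text, style="neutral", out_path="out.wav"):
    """Synthesise German text in the requested voice style to a WAV file."""
    params = urllib.parse.urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": "de",
        "VOICE": "dfki-pavoque-styles",
        "STYLE": style,  # cheerful | depressed | aggressive | neutral
    })
    with urllib.request.urlopen(f"{MARY_URL}?{params}") as resp, \
         open(out_path, "wb") as f:
        f.write(resp.read())

# synthesize("Guten Morgen!", style="cheerful", out_path="greeting.wav")
```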

9 The SEMAINE database (semaine-db.eu)
- Audiovisual database collected to study natural social signals occurring in English conversations
- Conversations with four emotionally stereotyped characters:
  - Poppy (happy, outgoing)
  - Obadiah (sad, depressive)
  - Spike (angry, confrontational)
  - Prudence (even-tempered, sensible)

10 Evaluation experiments
1. Does the system accurately classify emotion on the utterance level, based on the facial expression in the video input?
2. Do the synthetic voice styles succeed in conveying the target emotion category?
3. Do listeners agree with the cross-lingual transfer of paralinguistic information from the multimodal stimuli to the expressive synthetic output?

11 Experiment 1: Classification of facial expressions
- Support Vector Machine (SVM) classifier trained on utterances of the male operators from the SEMAINE database
- 535 utterances used for training, 107 for testing
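The slide gives the data split but not the SVM kernel or hyperparameters, so those details below are assumptions. A minimal scikit-learn sketch of such a setup:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

def train_classifier(X_train, y_train, X_test, y_test):
    """Train and evaluate an utterance-level emotion classifier.

    X_train: 535 utterance-level feature vectors; X_test: 107 held-out
    vectors (the split sizes from the slide). Kernel choice and feature
    scaling are illustrative assumptions.
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X_train, y_train)
    predictions = clf.predict(X_test)
    print(f"accuracy: {accuracy_score(y_test, predictions):.3f}")
    return clf
```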

12 Experiment 2: Perception of expressive synthesis
- Perception experiment with 20 subjects
- Task: listen to natural and synthesised stimuli and choose which voice style describes the utterance best:
  - Cheerful
  - Depressed
  - Aggressive
  - Neutral

13 Experiment 2: Results

14 Experiment 3: Adequacy for S2S translation
- Perceptual experiment with 14 bilingual participants
- 24 utterances from the SEMAINE operator data and their corresponding translation in each voice style
- Listeners were asked to choose which German translation matches the original video best

15 Examples - Poppy (happy)

16 Examples - Prudence (neutral)

17 Examples - Spike (angry)

18 Examples - Obadiah (sad)

19 Experiment 3: Results

20 Conclusion
- Preserving the paralinguistic content of a message across languages is possible with significantly greater than chance accuracy
- The visual emotion classifier performed with an overall accuracy of 63.5%
- Cheerful/happy is often mistaken for neutral (conditioned by the voice)

21 Future Work
- Extending the classifier to predict the affective state of the user from acoustic and prosodic analysis as well as facial expressions
- Demonstration of the prototype system, taking live input through a webcam and microphone
- Integration of a speech recogniser and a machine translation component

22 Questions?

