
M. Brendel 1, R. Zaccarelli 1, L. Devillers 1,2 1 LIMSI-CNRS, 2 Paris-South University French National Research Agency - Affective Avatar project (2007-2010)


1 M. Brendel 1, R. Zaccarelli 1, L. Devillers 1,2 1 LIMSI-CNRS, 2 Paris-South University French National Research Agency - Affective Avatar project (2007-2010)

2  Example application of the system: detecting emotions from speech to control an affective avatar in Skype (the speaker is represented by his/her avatar)
 The avatar should show expressive facial and gestural behavior corresponding to the emotion detected in the voice, with speech synchronized to the lip movements
 The mapping between the output of the emotion detection system and the expressive avatar was done with the ECA team at LIMSI and the other partners

3  This application raises two main challenges for emotion detection:
 speaker-independent emotion detection
 real-time emotion detection
 We focus on emotion detection for 4 macro-classes:
 Anger (Annoyance, Hot anger, Impatience)
 Sadness (Disappointment, Sadness)
 Positive (Amusement, Joy, Satisfaction)
 Neutral

4  The choice of appropriate corpora for training models is fundamental.
 Data must be as close as possible to the behaviors observed in the real application, but sometimes such an application does not yet exist
 The corpus must be large enough
 Large number of speakers
 Sufficient variability of emotional expressions, including complex, mixed and shaded emotions
 Available corpora in the community are mainly acted and small, with few speakers, little variation in the expression of emotions, and no application in sight
 LIMSI corpora were mainly collected in call centers (bank, emergency or stock exchange call centers), with many negative emotions

5  CINEMO (Rollet et al., 2009; Schuller et al., 2010) contains acted emotional expressions (mainly everyday situations) obtained through dubbing exercises (Cine Karaoké) played by 50 speakers - manually segmented - 2 coders (the annotation scheme allows annotating mixtures of emotions)  many shaded emotions and mixtures
 JEMO was obtained with an emotion detection game with 39 speakers - first prototype of a real-time detection system - automatically segmented - 2 coders  prototypical emotions: very few mixtures of emotions were annotated
 Question of this paper: can we mix different kinds of corpora recorded in the same conditions to train a more efficient classifier?

6  Sub-corpora of consensual segments were chosen to train models for the detection of the 4 classes - mixtures of emotions were not considered

Sub-corpus CINEMO (50 speakers), #segments: POS 313, SAD 364, ANG 344, NEU 510, TOTAL 1012
Sub-corpus JEMO (38 diff. speakers), #segments: POS 316, SAD 223, ANG 179, NEU 416, TOTAL 1062

7  We compared Corpus-Anger with Corpus-All on a set of acoustic features; for each feature, we plotted its values across the three corpora (Tahon & Devillers, Speech Prosody 2010).
1- rolloff05% 2- rolloff25% 3- rolloff50% 4- rolloff75% 5- rolloff95% 6- centroid 7- spectralslope 8- spectralorigin 9- bandenergy0-250 10- bandenergy250-650 11- bandenergy0-650 12- bandenergy650-1k 13- bandenergy1k-4k 14- barkband1 ... 37- barkband24 38- mfcc0 39- mfcc1 ... 50- mfcc12 51- zcr 52- meanloudness 53- rmsintensity*1000 54- rapportMaxMinF0 55- varF0 56- F2-F1 57- F3-F2 58- varF1 59- varF2 60- varF3 61- voicedratio 62- jitterlocal 63- shimmerlocal 64- HNR
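Several of the listed spectral features can be computed directly from a frame's magnitude spectrum. A minimal NumPy sketch (function and variable names are illustrative, not taken from the original system):

```python
import numpy as np

def frame_features(frame, sr):
    """Compute a few of the spectral features listed above for one frame.
    `frame` is a 1-D array of samples, `sr` the sample rate."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    energy = spectrum ** 2
    total = energy.sum() + 1e-12

    # Spectral rolloff: frequency below which pct of the energy lies.
    def rolloff(pct):
        cum = np.cumsum(energy) / total
        return freqs[np.searchsorted(cum, pct)]

    # Spectral centroid: energy-weighted mean frequency.
    centroid = (freqs * energy).sum() / total

    # Zero-crossing rate of the time-domain frame.
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2

    return {"rolloff50": rolloff(0.50), "rolloff95": rolloff(0.95),
            "centroid": centroid, "zcr": zcr}

# Example: a 440 Hz sine sampled at 16 kHz; the centroid and 50% rolloff
# should land near 440 Hz.
sr = 16000
t = np.arange(1024) / sr
feats = frame_features(np.sin(2 * np.pi * 440 * t), sr)
```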

8  Computed on voiced segments

Low-Level Descriptors (# computed with functionals): Energy (29), RMS Energy (22), F0 (23), Zero-Crossing Rate (18), MFCC 1-16 (366)

Functionals: moments (2), absolute mean, max, extremes (2), 2 x values, range, linear regression (2): MAE/MSE, slope, quartiles (2): quartile, tquartile
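The idea above, reducing a variable-length series of low-level descriptor values to a fixed-length vector by applying functionals, can be sketched as follows (the functional set here is an illustrative subset, not the original configuration):

```python
import numpy as np

def apply_functionals(lld):
    """Map a variable-length series of one LLD (e.g. an F0 contour over
    voiced frames) to a fixed-length feature vector."""
    x = np.asarray(lld, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)           # linear regression
    mse = np.mean((np.polyval([slope, intercept], t) - x) ** 2)
    q1, q3 = np.percentile(x, [25, 75])              # quartiles
    return np.array([x.mean(), x.std(),              # moments
                     x.min(), x.max(),               # extremes
                     slope, mse,                     # regression slope, MSE
                     q1, q3])                        # quartiles

# Example: a short rising F0 contour in Hz.
vec = apply_functionals([100, 110, 120, 115, 130])
```

Whatever the segment length, the output vector has a fixed size, which is what a standard classifier needs.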

9  RR/UAR         Test CINEMO   Test JEMO
Train CINEMO      0.50/0.48     0.51/0.48
Train JEMO        0.43/0.39     0.60/0.55

Training on CINEMO and testing on JEMO performs better than vice versa: it seems better to train on a wider set (CINEMO has more variability of emotional expressions, in different contexts) and test on a narrower one (JEMO contains more prototypical emotions) than the other way around. Surprisingly, training on CINEMO and testing on JEMO even gives slightly better performance than testing on CINEMO itself.
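The two figures in the table, RR (overall recognition rate, i.e. accuracy) and UAR (unweighted average recall, which weights all classes equally), can be computed as in this sketch; the toy labels are made up only to show why the two measures differ:

```python
import numpy as np

def rr_and_uar(y_true, y_pred):
    """RR = fraction of correctly classified segments; UAR = per-class
    recall averaged so that rare classes count as much as frequent ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rr = np.mean(y_true == y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(rr), float(np.mean(recalls))

# A majority-class predictor on a skewed test set: high RR, chance UAR.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
rr, uar = rr_and_uar(y_true, y_pred)
```

This is why UAR is the more honest figure when the four emotion classes are unbalanced, as on slide 6.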

10  RR/UAR       SpD CV (WEKA)   SpI CV
CINEMO           0.57/0.56       0.50/0.48
JEMO             0.63/0.59       0.60/0.55
CINEMO+JEMO      0.58/0.56       0.54/0.51

Be careful: WEKA uses SpD cross-validation!
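The SpD/SpI distinction can be made concrete: speaker-independent folds must keep all segments of a speaker on the same side of the split, while plain segment-level folds (as in WEKA's default cross-validation, per the slide above) let the same speaker appear in both train and test, which inflates scores. A minimal sketch (helper name and round-robin fold assignment are illustrative):

```python
import numpy as np

def speaker_independent_folds(speakers, n_folds=2):
    """Assign each segment to a fold by its speaker, so no speaker
    appears in both the training and the test side of any fold."""
    uniq = np.unique(speakers)
    fold_of_speaker = {s: i % n_folds for i, s in enumerate(uniq)}
    return np.array([fold_of_speaker[s] for s in speakers])

# Eight segments from four speakers, two segments each.
speakers = ["a", "a", "b", "b", "c", "c", "d", "d"]
folds = speaker_independent_folds(speakers, n_folds=2)
```

All of a speaker's segments share one fold index, which is exactly the property SpD cross-validation lacks.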

11  This means that unifying the corpora improved the results.
 We could not do better than JEMO tested on itself, but that good result is clearly due to JEMO being a small corpus with prototypical emotions only; it has poor generalization power: training on JEMO and testing on CINEMO yields 0.43/0.39.
 After balancing tests, we can also conclude that the performance improvement is mainly due to the large number of instances.
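A balancing test of the kind mentioned above can be approximated by subsampling the merged corpus back down to single-corpus size and retraining: if performance then drops back to the single-corpus level, the gain came mostly from the number of instances. A sketch of the subsampling step (hypothetical helper, not the authors' exact protocol):

```python
import numpy as np

def subsample_to(X, y, n, seed=0):
    """Randomly draw n instances without replacement from a (merged)
    corpus, so the reduced set matches a single corpus in size."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]

# Toy merged corpus of 10 segments with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
Xs, ys = subsample_to(X, y, 4)
```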

12

13  RR/UAR   SPI CV All features   SPI CV SFFS
Female       0.59/0.55             0.65 (31 features)
Male         0.52/0.49             0.55 (38 features)
All          0.54/0.51

#segments per class:
          Positive  Sadness  Anger  Neutral
Male      252       262      267    432
Female    377       325      256    494
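SFFS (sequential floating forward selection) extends plain greedy forward selection with a backtracking step. The greedy core can be sketched as follows; this version omits the floating/backtracking part, and the scoring function is a toy least-squares fit rather than the classifier used above:

```python
import numpy as np

def forward_select(X, y, score, k):
    """Greedy sequential forward selection: repeatedly add the feature
    that most improves score(X[:, subset], y)."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best, best_s = None, -np.inf
        for f in remaining:
            s = score(X[:, selected + [f]], y)
            if s > best_s:
                best, best_s = f, s
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: negative mean squared error of a least-squares fit.
def score(Xs, y):
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.mean((Xs @ coef - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 2] + 0.1 * rng.normal(size=100)   # only feature 2 is informative
chosen = forward_select(X, y, score, k=1)
```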

14  Unifying the two corpora (88 speakers) improves the results:
 The number of instances is approximately doubled
 The classes are more balanced
 The two corpora enrich each other
 Splitting the corpus by gender is also beneficial: the models trained on the gender sub-corpora are better
 Gender information was available in our affective avatar application
 Feature selection also seems beneficial (cross-corpora studies are needed)

15

16  Emotional databases are often small, sparse resources when collected in natural contexts (often less than 10% of utterances are emotional), and it is difficult to build generic models from one corpus
 Find measures for qualifying emotional databases
 Cross-corpora studies are very important
 Use multiple corpora collected in different contexts to train models

17  Thank you for your attention

