Annotation and Detection of Blended Emotions in Real Human-Human Dialogs Recorded in a Call Center
L. Vidrascu and L. Devillers, TLP-LIMSI/CNRS, France

1 Annotation and Detection of Blended Emotions in Real Human-Human Dialogs Recorded in a Call Center
L. Vidrascu and L. Devillers, TLP-LIMSI/CNRS, France
- IST AMITIES FP5 Project: Automated Multi-lingual Interaction with Information and Services
- HUMAINE FP6 NoE: Human-Machine Interaction on Emotion
- CHIL FP6 Project: Computer in the Human Interaction Loop

2 Introduction
Studying real-life emotions to improve the capacities of current speech technologies:
- detecting emotions can orient the evolution of human-computer interaction via dynamic modification of dialog strategies
Most previous work on emotion has been conducted on acted or induced data with archetypal emotions:
- results on artificial data transfer poorly to real data
- real emotional expression is complex: blended, shaded, masked
- it depends on contextual and social factors
- it is expressed at many different levels: prosodic, lexical, etc.
Challenges for detecting emotions in real-life data:
- representation of complex emotions
- a robust annotation validation protocol

3 Outline
- real-life corpus recorded in a call center: call centers are very interesting environments because recordings can be made unobtrusively
- emotion annotation
- emotion detection
- blended emotions
- perspectives

4 Corpus
Recorded at a Web-based Stock Exchange Customer Service Center:
- dialogs are real agent-client interactions in French, covering a range of investment topics, account management, and Web questions or problems
- 5229 speech turns making up 5012 in-task exchanges

# agents: 4                # clients: 100
# turns/dialog: average 50 (min 5, max 227)
# words/turn: average 9 (min 1, max 128)
# words total: 44.1k (3k distinct)

5 Outline
- real-life corpus description
- emotion annotation, a complex phase: definition of the emotion representation and the emotional unit, annotation, validation
- emotion detection
- blended emotions
- perspectives

6 Three types of emotion representation
- appraisal dimensions (Scherer, 1999): novelty, pleasantness, etc.
- abstract dimensions (Osgood, 1975): activation (active/passive), valence (negative/positive), control (relation to the stimulus)
- verbal categories: 8 primary universal emotions for Ekman (2002); primary vs. secondary/social emotions (Plutchik, 1994)

7 Emotion Definition and Annotation
We consider emotion in a broad sense, including attitudes as well as emotions.
Definition:
- a set of 5 task-dependent emotion labels: Anger, Fear, Excuse, Satisfaction, Neutral attitude
- emotional unit: the speaker turn
The dialog corpus was labeled by audio listening, with 2 independent annotators (ambiguities ~3%):

         Anger   Fear   Excuse   Satisf.   Neutral
Client   9.9%    6.7%   0.1%     2.6%      80.7%
Agent    0.7%    1.3%   1.8%     4.0%      92.1%

8 Annotation Validation
Inter-annotator agreement measure: Kappa = 0.8 (a kappa sketch follows).
Perceptual test to validate the presence of emotions in the corpus:
- test data: 40 speaker turns, 20 native French subjects
- 75% of negative emotions were correctly detected
Ref: Devillers, L., Vasilescu, I., Mathon, C. (2003), "Acoustic cues for perceptual emotion detection in task-oriented Human-Human corpus", 15th ICPhS, Barcelona.
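As an illustration of the agreement measure, here is a minimal sketch of Cohen's kappa over two annotators' label sequences; the labels below are hypothetical, not the corpus annotations:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical turn-level labels from two annotators.
a = ["Anger", "Neutral", "Fear", "Neutral", "Neutral", "Anger"]
b = ["Anger", "Neutral", "Fear", "Satisfaction", "Neutral", "Anger"]
print(cohens_kappa(a, b))  # 0.76 for this toy example
```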

9 Outline
- real-life corpus description
- emotion annotation
- emotion detection: prosodic, acoustic, and disfluency cues; Neutral/Negative and Fear/Anger classification
- blended emotions
- perspectives

10 Prosodic, acoustic, and disfluency cues
Crucial point: selecting a set of relevant features. This selection is not well established and appears to be data-dependent.
A large and redundant feature set (an F0 extraction sketch follows the list):
- F0 features: min, max, mean, standard deviation, range, slope, regression coefficient and its mean square error, cross-variation of F0 between two adjoining voiced segments
- energy features: min, max, mean, standard deviation, range
- duration features: speaking rate (the inverse of the average length of the voiced parts of speech)
- other acoustic features: first and second formants and their bandwidths
- speech disfluency cues: number and length of silent pauses (unvoiced parts between 200 and 800 ms) and filled pauses ("euh")
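The slides extract these cues with Praat; as a rough sketch of the F0 statistics only, assuming the contour has already been extracted into a numpy array with at least two voiced frames (the function name and voicing threshold are illustrative):

```python
import numpy as np

def f0_features(f0, fmin=50.0):
    """Global F0 statistics from a frame-level F0 contour (Hz).

    `f0` is assumed to hold 0 or NaN in unvoiced frames; frame times
    are assumed uniform, so the slope is expressed per frame.
    """
    voiced = f0[np.isfinite(f0) & (f0 > fmin)]
    t = np.arange(len(voiced))
    # Least-squares line through the voiced contour: slope plus residual error.
    slope, intercept = np.polyfit(t, voiced, deg=1)
    residuals = voiced - (slope * t + intercept)
    return {
        "f0_min": voiced.min(),
        "f0_max": voiced.max(),
        "f0_mean": voiced.mean(),
        "f0_std": voiced.std(),
        "f0_range": voiced.max() - voiced.min(),
        "f0_slope": slope,
        "f0_reg_mse": np.mean(residuals**2),
    }
```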

11 Speech Data Processing
F0, energy, and acoustic cue extraction with Praat. Example: F0 processing with z-score normalization (sketch below).
- since F0 detection is error-prone, segments with a duration of less than 30 ms are eliminated (1.4% of the segments, balanced across classes)
Automatic alignment for filled-pause and silent-pause extraction:
- LIMSI system (HMMs with Gaussian mixtures for acoustic modeling)
- word alignments were manually verified for speaker turns labeled with negative emotions
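A minimal sketch of the z-score step, assuming frame-level F0 values grouped into voiced segments; the normalization scope (per speaker vs. per dialog) is not stated on the slide:

```python
import numpy as np

def zscore_f0(segments, min_dur_ms=30.0):
    """Z-score normalize F0 over a list of voiced segments.

    Each segment is (duration_ms, f0_values). Segments shorter than
    30 ms are dropped first, since very short F0 stretches are often
    detection errors.
    """
    kept = [f0 for dur, f0 in segments if dur >= min_dur_ms]
    values = np.concatenate(kept)
    return (values - values.mean()) / values.std()
```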

12 Feature selection and detection systems
Weka toolkit (www.cs.waikato.ac.nz): a collection of machine learning algorithms for data mining (a scikit-learn analogue is sketched below).
- selection of subsets of the best attributes: predictive SVM ranking, entropy measure (information gain), Correlation-based Feature Selection (CFS)
- classifiers tested: a decision tree with pruning (C4.5), a Support Vector Machine (SVM), and voting algorithms (ADTree and AdaBoost) that combine the outputs of different models
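The experiments themselves use Weka (Java); purely as an illustration, a rough scikit-learn analogue of the pipeline, with mutual information standing in for Weka's InfoGain ranking and synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: ~40 prosodic/acoustic features, 2 classes (Neutral/Negative).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8)

classifiers = {
    "tree": DecisionTreeClassifier(),   # rough analogue of C4.5
    "svm": SVC(),
    "adaboost": AdaBoostClassifier(),
}
for k in (5, 10, 15, 20):  # attribute-subset sizes as on the next slide
    for name, clf in classifiers.items():
        # Mutual information plays the role of the InfoGain ranking.
        pipe = make_pipeline(SelectKBest(mutual_info_classif, k=k), clf)
        score = cross_val_score(pipe, X, y, cv=5).mean()
        print(f"{k} attributes, {name}: {score:.3f}")
```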

13 Neutral/Negative emotion detection
Using prosodic and acoustic cues, jackknifing procedure (30 runs); % correct detection (standard deviation):

           C4.5        AdaBoost    ADTree      SVM
5 att      72.8 (5.2)  71.2 (4.5)  72.3 (4.6)  67.2 (6.3)
10 att     73.0 (5.3)  71.5 (4.8)  73.0 (5.7)  69.5 (5.6)
15 att     71.7 (6.4)  71.1 (4.7)  71.6 (4.9)  70.8 (4.9)
20 att     71.8 (5.3)  71.3 (4.3)  71.8 (5.1)  71.0 (4.9)
all att    69.4 (5.6)  71.7 (4.3)  71.6 (4.8)  69.6 (3.5)

Very few attributes (5) already yield a high level of detection.
There is little difference between the techniques.

14 Anger/Fear emotion detection
Decision tree classifier:
- 56% correct detection with prosodic and acoustic cues
- 60% when disfluency cues (silent pauses and filled pauses "euh") are added
We hypothesize that this low performance is due to blended emotions.

15 Outline
- real-life corpus description
- emotion annotation
- emotion detection
- blended emotions: in certain states of mind it is possible to exhibit more than one emotion, e.g. when trying to mask a feeling, under conflicting emotions, when suffering, etc.
- perspectives

16 Blended emotions
In this financial task, Anger and Fear can be combined: "clients can be angry because they are afraid of losing money".
- confusion matrix (40% confusion): there are as many Anger-classified-as-Fear errors as Fear-classified-as-Anger errors
Negative emotions were re-annotated with a new scheme defined for other tasks (a medical call center, EmoTV), by 2 different annotators.

17 New emotion annotation scheme
The scheme allows choosing 2 labels per segment:
- Major emotion: the emotion perceived as dominant
- Minor emotion: another emotion perceived in the background (the most intense minor emotion)
7 coarse classes (defined for another task): Fear, Sadness, Anger, Hurt, Positive, Surprise, Neutral attitude

18 Soft emotion vector
Perception of emotion is very subjective: how should different annotations be mixed?
  Labeler 1: Major Anger, Minor Sadness
  Labeler 2: Major Fear, Minor Anger
We exploit the differences by combining the labels from multiple annotators into a soft emotion vector: each Major label contributes weight wM, each Minor label weight wm, and the per-emotion sums are normalized by the total weight W. For wM=2, wm=1, W=6, this example gives (3/6 Anger, 2/6 Fear, 1/6 Sadness); see the sketch below.
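A minimal sketch of this combination, reproducing the slide's example (the function name and data layout are illustrative):

```python
from collections import defaultdict

W_MAJOR, W_MINOR = 2, 1  # weights from the slide (wM=2, wm=1)

def soft_emotion_vector(annotations):
    """Combine (major, minor) labels from several annotators.

    `annotations` is a list of (major_label, minor_label_or_None);
    each Major contributes wM, each Minor wm, normalized by the total.
    """
    scores = defaultdict(float)
    for major, minor in annotations:
        scores[major] += W_MAJOR
        if minor is not None:
            scores[minor] += W_MINOR
    total = sum(scores.values())
    return {label: w / total for label, w in scores.items()}

# The example from the slide:
print(soft_emotion_vector([("Anger", "Sadness"), ("Fear", "Anger")]))
# -> {'Anger': 0.5, 'Sadness': 0.166..., 'Fear': 0.333...}
```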

19 Re-annotation results
Because we focus on the Anger and Fear emotions, 4 classes were deduced from the emotion vectors (mapping sketched below):
- Fear (Fear > 0; Anger = 0)
- Anger (Fear = 0; Anger > 0)
- Blended emotion (Fear > 0; Anger > 0)
- Other (Fear = 0; Anger = 0)
Consistency between the first and the second annotation for 78% of utterances:
- if (Anger >= Fear) and the previous annotation was Anger -> consistent
- same Major label for 64% of utterances
- no common labels between the two annotators: 13%
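The 4-class mapping can be written directly from the rules above; a small sketch:

```python
def deduce_class(vector):
    """Map a soft emotion vector to one of the 4 classes on the slide."""
    fear = vector.get("Fear", 0.0)
    anger = vector.get("Anger", 0.0)
    if fear > 0 and anger == 0:
        return "Fear"
    if anger > 0 and fear == 0:
        return "Anger"
    if fear > 0 and anger > 0:
        return "Blended"
    return "Other"

# The previous slide's example vector is Blended: Anger and Fear both > 0.
print(deduce_class({"Anger": 3/6, "Fear": 2/6, "Sadness": 1/6}))  # Blended
```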

20 Re-annotation results
- validation of the presence of mixtures of emotion in the Anger and Fear segments
- excerpt from a call, Client: "No, but I haven't handled it at all. I was on holidays, I got a letter, about 4... 400 euros were missing..."

21 Summary and perspectives
Detection performance:
- 73% correct detection between Neutral and Negative emotions, but only 60% between Fear and Anger
Validation of the presence of mixtures of Fear/Anger emotion.
Emotion representation: the soft emotion vector is also being applied to
- a medical call center corpus (20h annotated)
- a multimodal corpus of TV interviews (EmoTV, HUMAINE)
Perspectives:
- improve detection performance by training models on the non-complex part of the corpus
- analyze real-life blended emotions; perceptual tests on blended emotions

22 Thank you for your attention
Reference: L. Devillers, L. Vidrascu, L. Lamel, "Challenges in real-life emotion annotation and machine learning based detection", Neural Networks, special issue, to appear July 2005.

23 Combining lexical and paralinguistic cues
- lexical unigram model: 78% Neutral/Negative detection
- linear combination of the two scores (lexical and paralinguistic), evaluated on 10 test sets of 50 utterances each (sketch below)
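A sketch of the fusion step as described; the interpolation weight is not given on the slide, so the value below is only a placeholder:

```python
def fused_score(lexical, paralinguistic, alpha=0.5):
    """Linear combination of the lexical and paralinguistic scores;
    `alpha` is an assumed weight, to be tuned on held-out data."""
    return alpha * lexical + (1 - alpha) * paralinguistic
```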

24 Emotion Detection Model
The emotion detection model is based on unigram models.
- due to the sparseness of the on-emotion data, each emotion model is an interpolation of an emotion-specific model and a general task-specific model estimated on the entire training corpus
- the similarity between an utterance u and an emotion E is the normalized log-likelihood ratio between the emotion model and the general model (sketch below)
- standard preprocessing: compounding (negative forms, e.g. "pas_normal"), stemming, and stopping
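A minimal sketch of the scoring rule as described; the add-one smoothing and the interpolation weight are assumptions, since the slide does not specify them:

```python
import math
from collections import Counter

def unigram(counts, vocab_size, floor=1.0):
    """Add-one-smoothed unigram model from a Counter of word counts."""
    total = sum(counts.values()) + floor * vocab_size
    return lambda w: (counts[w] + floor) / total

def similarity(utterance, p_emotion_specific, p_general, lam=0.7):
    """Normalized log-likelihood ratio between the interpolated emotion
    model and the general model; `lam` is an assumed interpolation weight."""
    score = 0.0
    for w in utterance:
        # Interpolate the sparse emotion-specific model with the general one.
        p_e = lam * p_emotion_specific(w) + (1 - lam) * p_general(w)
        score += math.log(p_e / p_general(w))
    return score / len(utterance)  # normalization by utterance length
```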

25 Experiments on Anger/Fear detection
- prosodic and acoustic cues: 56% detection; around 60% when disfluencies are added
- lexical cues (ICME 2003): Anger and Fear often share the same lexical words (problem, abnormal, etc.); the difference is much more syntactic than lexical

26 Attribute selection (Weka toolkit)
- with a model (predictive SVM ranking)
- information gain, InfoGain(C, A) = H(C) - H(C|A), where A is an attribute and C the class
- CFS (Correlation-based Feature Selection)
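For reference, information gain on a discrete attribute can be computed as below (Weka discretizes continuous features first; the toy data are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a class-label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attribute_values, labels):
    """InfoGain(C, A) = H(C) - H(C|A) for a discrete attribute."""
    n = len(labels)
    by_value = {}
    for a, c in zip(attribute_values, labels):
        by_value.setdefault(a, []).append(c)
    h_c_given_a = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - h_c_given_a

# Toy example: a binary attribute that fully separates the classes.
attr = ["high", "high", "low", "low", "high", "low"]
cls  = ["Neg",  "Neg",  "Neu", "Neu", "Neg",  "Neu"]
print(info_gain(attr, cls))  # 1.0: the attribute determines the class
```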

