Annotation and Detection of Blended Emotions in Real Human-Human Dialogs Recorded in a Call Center
L. Vidrascu and L. Devillers, TLP-LIMSI/CNRS, France
IST AMITIES FP5 Project: Automated Multi-lingual Interaction with Information and Services
HUMAINE FP6 NoE: Human-Machine Interaction on Emotion
CHIL FP6 Project: Computer in the Human Interaction Loop
Introduction
Study of real-life emotions to improve the capabilities of current speech technologies:
- detecting emotions can help orient the evolution of human-computer interaction via dynamic modification of dialog strategies
Most previous work on emotion has been conducted on acted or induced data with archetypal emotions; results on such artificial data transfer poorly to real data:
- the expression of emotion is complex: blended, shaded, masked
- it depends on contextual and social factors
- it is expressed at many different levels: prosodic, lexical, etc.
Challenges for detecting emotions in real-life data:
- representation of complex emotions
- robust annotation
- validation protocol
Outline
- real-life corpus recorded in a call center: call centers are very interesting environments because recordings can be made unobtrusively
- emotion annotation
- emotion detection
- blended emotions
- perspectives
Corpus
Recorded at a Web-based Stock Exchange Customer Service Center. Dialogs are real agent-client interactions in French covering a range of investment topics, account management, and Web questions or problems: 5229 speech turns making up 5012 in-task exchanges.

# agents          4
# clients         100
# turns/dialog    Average: 50   Min: 5   Max: 227
# words/turn      Average: 9    Min: 1   Max: 128
# words total     44.1k
# distinct words  3k
Outline
- real-life corpus description
- emotion annotation: a complex phase requiring a definition of the emotion representation and of the emotional unit, plus annotation validation
- emotion detection
- blended emotions
- perspectives
Three types of emotion representation
- appraisal dimensions (Scherer, 1999): novelty, pleasantness, etc.
- abstract dimensions (Osgood, 1975): activation (active/passive), valence (negative/positive), control (relation to the stimulus)
- verbal categories: 8 primary universal emotions for Ekman (2002); primary vs. secondary/social emotions (Plutchik, 1994)
Emotion Definition and Annotation
We consider emotion in a broad sense, including both attitudes and emotions.
- Definition: a set of 5 task-dependent labels: the emotions Anger and Fear, and the attitudes Excuse, Satisfaction, and Neutral
- Emotional unit: the speaker turn
- Dialog corpus labeled with audio listening by 2 independent annotators; ambiguities ~3%

         Anger   Fear   Excuse  Satisfaction  Neutral
Client   9.9%    6.7%   0.1%    2.6%          80.7%
Agent    0.7%    1.3%   1.8%    4.0%          92.1%
Annotation Validation
- Inter-annotator agreement measure: Kappa = 0.8 (a toy kappa computation follows below)
- Perceptual test to validate the presence of emotions in the corpus
  - test data: 40 speaker turns and 20 native French subjects
  - 75% of negative emotions were correctly detected
Ref: Devillers, L., Vasilescu, I., Mathon, C. (2003), "Acoustic cues for perceptual emotion detection in task-oriented Human-Human corpus", 15th ICPhS, Barcelona.
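As a rough illustration of the agreement measure, here is a minimal sketch of Cohen's kappa for two annotators; the helper name and the toy label lists are assumptions, not data from the corpus.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same speaker turns."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] / n * freq_b[lab] / n for lab in freq_a)
    return (observed - expected) / (1.0 - expected)

# Toy example with the 5 task-dependent labels
a = ["Anger", "Neutral", "Fear", "Neutral", "Satisfaction"]
b = ["Anger", "Neutral", "Neutral", "Neutral", "Satisfaction"]
print(cohen_kappa(a, b))
```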
Outline
- real-life corpus description
- emotion annotation
- emotion detection: prosodic, acoustic and disfluency cues; Neutral/Negative and Fear/Anger classification
- blended emotions
- perspectives
Prosodic, acoustic and disfluency cues
Crucial point: the selection of a set of relevant features, which is not well established and appears to be data-dependent. A large and redundant set of features is used (a toy extraction sketch follows the list):
- F0 features: min, max, mean, standard deviation, range, slope, regression coefficient and its mean square error, cross-variation of F0 between two adjoining voiced segments
- Energy features: min, max, mean, standard deviation, range
- Duration features: speaking rate (inverse of the average length of the voiced parts of the speech)
- Other acoustic features: first and second formants and their bandwidths
- Speech disfluency cues: number and length of silent pauses (unvoiced parts between 200-800 ms) and filler pauses ("euh")
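A minimal sketch of how some of the F0 statistics listed above could be computed, assuming the F0 contour has already been extracted (for example with Praat, as described on the next slide); the function name and frame step are hypothetical.

```python
import numpy as np

def f0_statistics(f0, frame_step=0.01):
    """Summary statistics over the voiced frames of an F0 contour (Hz).

    `f0` is a 1-D array with 0 for unvoiced frames, as produced by a pitch
    tracker; the feature set on the slide is richer than this sketch.
    """
    voiced = f0[f0 > 0]
    if voiced.size == 0:
        return None
    # Slope of a linear fit of F0 against time over the voiced frames.
    slope = np.polyfit(np.arange(voiced.size) * frame_step, voiced, 1)[0]
    return {
        "f0_min": voiced.min(),
        "f0_max": voiced.max(),
        "f0_mean": voiced.mean(),
        "f0_std": voiced.std(),
        "f0_range": voiced.max() - voiced.min(),
        "f0_slope": slope,
    }
```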
Speech Data Processing
- F0, energy and acoustic cue extraction with Praat
- Example: F0 processing with z-score normalization; since F0 detection is error-prone, segments shorter than 30 ms are eliminated (1.4% of the segments, balanced across classes)
- Automatic alignment for filler and silent pause extraction with the LIMSI system (HMMs with Gaussian mixtures for acoustic modeling); word alignment was manually verified for speaker turns labeled with negative emotions
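A small sketch of the per-speaker z-score normalization and the 30 ms segment filtering described above; the data layout (a list of F0 arrays, one per voiced segment of a speaker) is an assumption.

```python
import numpy as np

MIN_SEGMENT_DUR = 0.030  # seconds; shorter segments are dropped (slide: 1.4% of segments)

def normalize_f0_segments(segments, frame_step=0.01):
    """Per-speaker z-score normalization of F0 over voiced segments."""
    kept = [s for s in segments if s.size * frame_step >= MIN_SEGMENT_DUR]
    if not kept:
        return []
    all_f0 = np.concatenate(kept)
    mean, std = all_f0.mean(), all_f0.std()
    return [(s - mean) / std for s in kept]
```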
Feature selection and detection systems
Weka toolkit (www.cs.waikato.ac.nz): a collection of machine learning algorithms for data mining.
- Selection of subsets of the best attributes: SVM-based predictive selection, entropy measure (InfoGain), Correlation-based Feature Selection (CFS)
- Classifiers tested:
  - decision tree with pruning (C4.5)
  - Support Vector Machine (SVM)
  - voting algorithms (ADTree and AdaBoost), which combine the outputs of different models
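The experiments were run with Weka; purely as an analogous sketch, the same pipeline (attribute selection followed by a decision tree or an SVM) could look like this in scikit-learn. The parameters and the mutual-information criterion here are illustrative, not the settings actually used.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(X, y, k=5):
    """Keep the k best attributes (information-gain-like criterion),
    then compare a pruned decision tree and an SVM, as on the slide."""
    for name, clf in [("tree", DecisionTreeClassifier(max_depth=5)),
                      ("svm", SVC(kernel="rbf"))]:
        pipe = make_pipeline(StandardScaler(),
                             SelectKBest(mutual_info_classif, k=k),
                             clf)
        scores = cross_val_score(pipe, X, y, cv=10)
        print(name, scores.mean(), scores.std())
```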
Neutral/Negative emotion detection
Using prosodic and acoustic cues, jackknifing procedure (30 runs); correct detection in % (standard deviation):

          C4.5         AdaBoost     ADTree       SVM
5 att     72.8 (5.2)   71.2 (4.5)   72.3 (4.6)   67.2 (6.3)
10 att    73.0 (5.3)   71.5 (4.8)   73.0 (5.7)   69.5 (5.6)
15 att    71.7 (6.4)   71.1 (4.7)   71.6 (4.9)   70.8 (4.9)
20 att    71.8 (5.3)   71.3 (4.3)   71.8 (5.1)   71.0 (4.9)
all att   69.4 (5.6)   71.7 (4.3)   71.6 (4.8)   69.6 (3.5)

Very few attributes (5) already yield a high detection rate; differences between the techniques are small.
Anger/Fear emotion detection
- Decision tree classifier: 56% correct detection with prosodic and acoustic cues
- 60% when disfluency cues (silent pauses and filler pauses "euh") are added
We hypothesize that this low performance is due to blended emotions.
Outline
- real-life corpus description
- emotion annotation
- emotion detection
- blended emotions: in certain states of mind it is possible to exhibit more than one emotion, e.g. when trying to mask a feeling, with conflicting emotions, suffering, etc.
- perspectives
Blended emotions
In this financial task, Anger and Fear can be combined: clients can be angry because they are afraid of losing money.
- Confusion matrix (40% confusion): there are as many Anger turns classified as Fear as Fear turns classified as Anger
- Re-annotation of the negative emotions with a new scheme defined for other tasks (medical call center, EmoTV), by 2 different annotators
New emotion annotation scheme
Annotators may choose 2 labels per segment:
- Major emotion: the emotion perceived as dominant
- Minor emotion: another emotion perceived in the background (the most intense minor emotion)
7 coarse classes (defined for another task): Fear, Sadness, Anger, Hurt, Positive, Surprise, Neutral attitude
Combining annotations: soft emotion vector
Perception of emotion is very subjective; how should different annotations be mixed?
- Labeler 1: Major Anger, Minor Sadness
- Labeler 2: Major Fear, Minor Anger
Exploit the differences by combining the labels from the annotators in a soft emotion vector: each Major label contributes weight wM, each Minor label weight wm, and the vector is normalized by the total weight W.
For wM = 2, wm = 1, W = 6, this example gives (3/6 Anger, 2/6 Fear, 1/6 Sadness).
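A minimal sketch of this soft emotion vector construction, assuming each annotator provides a (Major, Minor) pair and using the weights wM = 2, wm = 1 from the example; the function name and data layout are assumptions.

```python
from collections import defaultdict

W_MAJOR, W_MINOR = 2, 1  # weights used in the example on the slide

def soft_emotion_vector(annotations):
    """Combine (major, minor) label pairs from several annotators.

    `annotations` is a list of (major, minor_or_None) tuples; the result is
    a dict of label -> normalized weight.
    """
    weights = defaultdict(float)
    for major, minor in annotations:
        weights[major] += W_MAJOR
        if minor is not None:
            weights[minor] += W_MINOR
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# The example from the slide -> {'Anger': 3/6, 'Fear': 2/6, 'Sadness': 1/6}
print(soft_emotion_vector([("Anger", "Sadness"), ("Fear", "Anger")]))
```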
Re-annotation results
Since we focus on the Anger and Fear emotions, 4 classes were deduced from the emotion vectors (a small derivation sketch follows the list):
- Fear (Fear > 0; Anger = 0)
- Anger (Fear = 0; Anger > 0)
- Blended emotion (Fear > 0; Anger > 0)
- Other (Fear = 0; Anger = 0)
Consistency between the first and the second annotation for 78% of the utterances (e.g. if Anger >= Fear and the previous annotation was Anger, the annotations are consistent).
- Same Major label for 64% of the utterances
- No common labels between the two annotators: 13%
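A small sketch of how the 4 classes can be read off a soft emotion vector; labels are assumed to be dictionary keys as in the earlier sketch.

```python
def fear_anger_class(vector):
    """Map a soft emotion vector to the 4 classes used on the slide."""
    fear = vector.get("Fear", 0.0)
    anger = vector.get("Anger", 0.0)
    if fear > 0 and anger > 0:
        return "Blended"
    if fear > 0:
        return "Fear"
    if anger > 0:
        return "Anger"
    return "Other"
```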
Re-annotation results
Validation of the presence of mixtures of emotion in the Anger and Fear segments.
Excerpt taken from a call, Client: "No, but I haven't handled it at all. I was on holidays, I got a letter, about 4... 400 euros were missing..."
Summary and perspectives
Detection performance:
- 73% correct detection between Neutral and Negative emotion, but only 60% between Fear and Anger
- validation of the presence of mixtures of Fear/Anger emotion
Emotion representation: soft emotion vector, also used for
- a medical call center corpus (20 h annotated)
- a multimodal corpus of TV interviews (EmoTV, HUMAINE)
Perspectives:
- improve detection performance by using the non-complex part of the corpus for training the model
- analyse real-life blended emotions and run a perceptual test on blended emotions
Thank you for your attention
Reference: L. Devillers, L. Vidrascu, L. Lamel, "Challenges in real-life emotion annotation and machine learning based detection", special issue of Neural Networks, to appear July 2005.
Combining lexical and paralinguistic cues
- Lexical unigram model: 78% Neutral/Negative detection
- Linear combination of the two scores, evaluated on 10 test sets (50 utterances)
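The slide does not give the interpolation weight; a minimal sketch of the linear combination, with a hypothetical weight alpha, would be:

```python
def combine_scores(lexical_score, paralinguistic_score, alpha=0.5):
    """Linear combination of the lexical and paralinguistic scores.

    `alpha` is a hypothetical interpolation weight, not the value actually
    tuned on the 10 test sets.
    """
    return alpha * lexical_score + (1.0 - alpha) * paralinguistic_score
```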
Emotion Detection Model
The lexical emotion detection model is based on unigram models.
- Due to the sparseness of the on-emotion data, each emotion model is an interpolation of an emotion-specific model and a general task-specific model estimated on the entire training corpus
- The similarity between an utterance u and an emotion E is the normalized log-likelihood ratio between the emotion model and the general model
- Standard preprocessing: compounding (negative forms, e.g. "pas_normal"), stemming, and stopping
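A sketch of one possible reading of this scoring: smoothed unigram models, linear interpolation of the emotion-specific and general models, and a length-normalized log-likelihood ratio. The smoothing, the interpolation weight and the function names are assumptions, not the settings actually used.

```python
import math
from collections import Counter

def train_unigram(texts, vocab, smooth=1.0):
    """Smoothed unigram probabilities over a fixed vocabulary.

    `texts` is a list of tokenized utterances (lists of words)."""
    counts = Counter(w for t in texts for w in t)
    total = sum(counts[w] for w in vocab) + smooth * len(vocab)
    return {w: (counts[w] + smooth) / total for w in vocab}

def emotion_similarity(utterance, emotion_model, general_model, lam=0.7):
    """Normalized log-likelihood ratio between an interpolated emotion model
    and the general task model; `lam` is a hypothetical interpolation weight."""
    words = [w for w in utterance if w in general_model]
    if not words:
        return 0.0
    llr = 0.0
    for w in words:
        p_emotion = lam * emotion_model[w] + (1 - lam) * general_model[w]
        llr += math.log(p_emotion / general_model[w])
    return llr / len(words)  # normalize by utterance length
```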
Experiments on Anger/Fear detection
- Prosodic and acoustic cues: 56% correct detection, around 60% when disfluencies are added
- Lexical cues (ICME 2003): Anger and Fear often share the same lexical words (problem, abnormal, etc.); the difference is much more syntactic than lexical
Attribute selection (Weka toolkit)
- with a model (SVM)
- Information Gain: IG(C, A) = H(C) - H(C|A), where A is the attribute and C the class
- CFS (Correlation-based Feature Selection)
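For reference, a small information-gain computation for a discrete attribute; continuous acoustic attributes would first need discretization, which the slide does not detail.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(C) of a sequence of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(attribute_values, labels):
    """IG(C, A) = H(C) - H(C|A) for a discrete attribute, as used by the
    InfoGain criterion mentioned on the slide."""
    n = len(labels)
    h_c = entropy(labels)
    h_c_given_a = 0.0
    for value in set(attribute_values):
        subset = [lab for v, lab in zip(attribute_values, labels) if v == value]
        h_c_given_a += len(subset) / n * entropy(subset)
    return h_c - h_c_given_a
```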