Annotation and Detection of Blended Emotions in Real Human-Human Dialogs Recorded in a Call Center
L. Vidrascu and L. Devillers, TLP-LIMSI/CNRS, France
 IST AMITIES FP5 Project: Automated Multi-lingual Interaction with Information and Services
 HUMAINE FP6 NoE: Human-Machine Interaction on Emotion
 CHIL FP6 Project: Computer in the Human Interaction Loop

Introduction
Study of real-life emotions to improve the capabilities of current speech technologies
 Detecting emotions can help by orienting the evolution of human-computer interaction via dynamic modification of dialog strategies
 Most previous work on emotion has been conducted on acted or induced data with archetypal emotions
 Results on artificial data transfer poorly to real data
  the expression of emotion is complex: blended, shaded, masked
  it depends on contextual and social factors
  it is expressed at many different levels: prosodic, lexical, etc.
Challenges for detecting emotions in real-life data
 representation of complex emotions
 a robust annotation validation protocol

Outline
 Real-life corpus recorded in a call center
  Call centers are very interesting environments because recordings can be made unobtrusively
 Emotion annotation
 Emotion detection
 Blended emotions
 Perspectives

Corpus
Recorded at a Web-based Stock Exchange Customer Service Center
 Dialogs are real agent-client interactions in French covering a range of investment topics, account management, and Web questions or problems
 5229 speech turns making 5012 in-task exchanges

 # agents: 4          # clients: 100
 # turns/dialog: average 50 (min 5, max 227)
 # words/turn: average 9 (min 1, max 128)
 # words total: 44.1k (# distinct words: 3k)

Outline
 Real-life corpus description
 Emotion annotation: this phase is complex
  definition of the emotion representation and of the emotional unit
  annotation
  validation
 Emotion detection
 Blended emotions
 Perspectives

Three types of emotion representation
 Describing emotions via appraisal dimensions (Scherer, 1999)
  novelty, pleasantness, etc.
 Describing emotions via abstract dimensions (Osgood, 1975)
  activation: active/passive
  valence: negative/positive
  control: relation to the stimulus
 Verbal categories
  8 primary universal emotions for Ekman (2002)
  Primary vs. secondary/social emotions (Plutchik, 1994)

Emotion Definition and Annotation
We consider emotion in a broad sense, covering attitudes as well as emotions
Definition
 Set of 5 task-dependent emotion labels: Anger, Fear, Excuse, Satisfaction, Neutral attitude
 Emotional unit: the speaker turn
Dialog corpus labeled by listening to the audio
 2 independent annotators; ambiguities ~3%

Label distribution per speaker role:
          Anger   Fear   Exc.   Sat.   Neutr.
Client    9.9%    6.7%   0.1%   2.6%   80.7%
Agent     0.7%    1.3%   1.8%   4.0%   92.1%

Annotation Validation
Inter-annotator agreement measure
 Kappa = 0.8
Perceptual test to validate the presence of emotions in the corpus
 Test data: 40 speaker turns & 20 native French subjects
 75% of the negative emotions were correctly detected
Ref: Devillers, L., Vasilescu, I., Mathon, C. (2003), "Acoustic cues for perceptual emotion detection in task-oriented Human-Human corpus", 15th ICPhS, Barcelona.
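As an illustration of the agreement figure above (not part of the original slides), a Cohen's kappa between two annotators' label sequences can be computed with scikit-learn; the label arrays below are hypothetical.

```python
# Minimal sketch: inter-annotator agreement with Cohen's kappa.
# The two label sequences are hypothetical, only to illustrate the computation.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["Anger", "Neutral", "Fear", "Neutral", "Satisfaction", "Neutral"]
annotator_2 = ["Anger", "Neutral", "Fear", "Anger",   "Satisfaction", "Neutral"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa = {kappa:.2f}")  # agreement corrected for chance
```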

Outline
 Real-life corpus description
 Emotion annotation
 Emotion detection
  Prosodic, acoustic and some disfluency cues
  Neutral/Negative and Fear/Anger classification
 Blended emotions
 Perspectives

Prosodic, acoustic and disfluency cues
Crucial point: selection of a set of relevant features
 Not well established; appears to be data-dependent
Large and redundant set of features:
 F0 features: min, max, mean, standard deviation, range, slope, regression coefficient and its mean square error, cross-variation of F0 between two adjoining voiced segments
 Energy features: min, max, mean, standard deviation, range
 Duration features: speaking rate (inverse of the average length of the voiced parts of speech)
 Other acoustic features: formants (first and second) and their bandwidths
 Speech disfluency cues: number and length of silent pauses (unvoiced parts between … ms) and filler pauses ("euh")
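The slide lists the features only by name; the sketch below (not the authors' code) shows how such turn-level statistics could be computed from a pre-extracted F0 contour and energy curve with NumPy. The array names and the one-value-per-frame assumption are mine.

```python
# Sketch: turn-level prosodic statistics from pre-extracted frame-level contours.
# Assumes f0 (Hz, 0 for unvoiced frames) and energy are NumPy arrays for one speaker turn.
import numpy as np

def prosodic_features(f0, energy):
    voiced = f0[f0 > 0]                                  # keep voiced frames only
    # linear regression coefficient of F0 over the voiced frames (the "slope" feature)
    slope = np.polyfit(np.arange(len(voiced)), voiced, 1)[0] if len(voiced) > 1 else 0.0
    return {
        "f0_min": voiced.min(), "f0_max": voiced.max(),
        "f0_mean": voiced.mean(), "f0_std": voiced.std(),
        "f0_range": voiced.max() - voiced.min(), "f0_slope": slope,
        "en_min": energy.min(), "en_max": energy.max(),
        "en_mean": energy.mean(), "en_std": energy.std(),
        "en_range": energy.max() - energy.min(),
    }
```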

Speech Data Processing
F0, energy and acoustic cue extraction (Praat)
 Example: F0 processing with z-score normalization
 Since F0 detection is subject to error, segments with a duration of less than 30 ms are eliminated (1.4% of the segments, balanced across classes)
Automatic alignment for filler and silent pause extraction:
 LIMSI system (HMMs with Gaussian mixtures for acoustic modeling)
 Word alignments were manually verified for speaker turns labeled with negative emotions
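A minimal sketch (assumed, not the original implementation) of the two preprocessing steps named above: discarding voiced segments shorter than 30 ms and z-score normalizing F0 per speaker.

```python
# Sketch: remove very short voiced segments, then z-score normalize F0 per speaker.
# `segments` is assumed to be a list of (duration_ms, mean_f0) tuples for one speaker.
import numpy as np

def normalize_f0(segments, min_duration_ms=30):
    kept = [(d, f0) for d, f0 in segments if d >= min_duration_ms]   # drop unreliable short segments
    f0_values = np.array([f0 for _, f0 in kept])
    mu, sigma = f0_values.mean(), f0_values.std()
    return [(d, (f0 - mu) / sigma) for d, f0 in kept]                # per-speaker z-score
```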

Feature selection and detection systems
Weka toolkit (www.cs.waikato.ac.nz): a collection of machine learning algorithms for data mining
 Selection of subsets of the best attributes
  SVM-based selection, entropy measure (InfoGain), Correlation-based Feature Selection (CFS)
 Classifiers tested
  Decision tree with pruning (C4.5)
  Support Vector Machine (SVM)
  Voting algorithms (ADTree and AdaBoost): combine the outputs of different models
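The original experiments used Weka; as a rough modern analogue (an assumption, not the authors' setup), the same pipeline of top-k attribute selection by an entropy-based criterion followed by a classifier can be sketched with scikit-learn.

```python
# Sketch: feature selection + classification, loosely mirroring the Weka setup above.
# X (turns x prosodic/acoustic features) and y (Neutral/Negative labels) are assumed to be given.
from sklearn.feature_selection import SelectKBest, mutual_info_classif   # ~ InfoGain-style ranking
from sklearn.tree import DecisionTreeClassifier                          # stands in for C4.5
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def evaluate(classifier, X, y, n_attributes=5):
    model = make_pipeline(SelectKBest(mutual_info_classif, k=n_attributes), classifier)
    return cross_val_score(model, X, y, cv=10).mean()

# Hypothetical usage:
# for clf in (DecisionTreeClassifier(ccp_alpha=0.01), SVC(), AdaBoostClassifier()):
#     print(type(clf).__name__, evaluate(clf, X, y, n_attributes=5))
```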

Neutral/Negative emotion detection
Using prosodic and acoustic cues, jackknifing procedure (30 runs); standard deviations in parentheses

           C4.5         AdaBoost     ADTree       SVM
5 att      72.8 (5.2)   71.2 (4.5)   72.3 (4.6)   67.2 (6.3)
10 att     73.0 (5.3)   71.5 (4.8)   73.0 (5.7)   69.5 (5.6)
15 att     71.7 (6.4)   71.1 (4.7)   71.6 (4.9)   70.8 (4.9)
20 att     71.8 (5.3)   71.3 (4.3)   71.8 (5.1)   71.0 (4.9)
all att    69.4 (5.6)   71.7 (4.3)   71.6 (4.8)   69.6 (3.5)

 Very few attributes (5) already yield a high detection level
 Little difference between the different techniques

Anger/Fear emotion detection
 Decision tree classifier:
  56% correct detection with prosodic and acoustic cues
  60% when adding disfluency cues (silent pauses and filler pauses « euh »)
 We hypothesize that this low performance is due to blended emotions

Outline
 Real-life corpus description
 Emotion annotation
 Emotion detection
 Blended emotions
  In certain states of mind it is possible to exhibit more than one emotion: when trying to mask a feeling, conflicting emotions, suffering, etc.
 Perspectives

Blended emotions
In this financial task, Anger and Fear can be combined: « Clients can be angry because they are afraid of losing money »
 Confusion matrix (40% confusion): there are as many Anger turns classified as Fear as there are Fear turns classified as Anger
 Re-annotation procedure of the negative emotions with a new scheme defined for other tasks (medical call center, EmoTV), with 2 different annotators

New emotion annotation scheme
Allows 2 labels to be chosen per segment:
 Major emotion: the emotion perceived as dominant
 Minor emotion: another emotion perceived in the background (the most intense minor emotion)
7 coarse classes (defined for another task):
 Fear, Sadness, Anger, Hurt, Positive, Surprise, Neutral attitude

Perception of emotion is very subjective: how to mix different annotations?
 Labeler 1: Major Anger, Minor Sadness
 Labeler 2: Major Fear, Minor Anger
 Exploit the differences by combining the labels from multiple annotators into a soft emotion vector
  -> ((wM+wm)/W Anger, wM/W Fear, wm/W Sadness)
  For wM=2, wm=1, W=6, this example gives -> (3/6 Anger, 2/6 Fear, 1/6 Sadness)
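A minimal sketch of this combination (my reading of the slide's weighting: wM=2 for a Major label, wm=1 for a Minor label, normalized by the total weight W):

```python
# Sketch: build a soft emotion vector from several annotators' (Major, Minor) labels.
# Weights follow the slide's example: Major = 2, Minor = 1, normalized by the total weight.
from collections import Counter

def soft_emotion_vector(annotations, w_major=2, w_minor=1):
    """annotations: list of (major_label, minor_label_or_None) pairs, one per annotator."""
    weights = Counter()
    for major, minor in annotations:
        weights[major] += w_major
        if minor is not None:
            weights[minor] += w_minor
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Slide example: Labeler 1 = (Anger, Sadness), Labeler 2 = (Fear, Anger)
print(soft_emotion_vector([("Anger", "Sadness"), ("Fear", "Anger")]))
# -> {'Anger': 0.5, 'Sadness': 0.166..., 'Fear': 0.333...}  i.e. (3/6, 1/6, 2/6)
```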

Re-annotation result
Because we focus on the Anger and Fear emotions, 4 classes were deduced from the emotion vectors:
 Fear (Fear>0; Anger=0)
 Anger (Fear=0; Anger>0)
 Blended emotion (Fear>0; Anger>0)
 Other (Fear=0; Anger=0)
Consistency between the first and the second annotation for 78% of the utterances
 If (Anger >= Fear) and the previous annotation was Anger -> consistent
 Same Major label for 64% of the utterances
 No common labels between the two annotators: 13%
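The 4-class deduction above can be applied directly to a soft emotion vector (a small illustrative helper, not from the original slides):

```python
# Sketch: derive the 4 re-annotation classes from a soft emotion vector.
def fear_anger_class(vector):
    fear, anger = vector.get("Fear", 0.0), vector.get("Anger", 0.0)
    if fear > 0 and anger > 0:
        return "Blended"
    if fear > 0:
        return "Fear"
    if anger > 0:
        return "Anger"
    return "Other"

print(fear_anger_class({"Anger": 3/6, "Fear": 2/6, "Sadness": 1/6}))  # -> "Blended"
```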

Re-annotation results
 Validation of the presence of mixtures of emotion in the Anger and Fear segments
 Excerpt taken from a call, Client: "No, but I haven't handled it at all. I was on holidays, I got a letter, about 4… 400 euros were missing…"

Summary and perspectives
Detection performance
 73% correct detection between Neutral and Negative emotions, but only 60% between Fear and Anger
Validation of the presence of mixtures of Fear/Anger emotions
Emotion representation: soft emotion vector
 medical call center corpus (20 h annotated)
 multimodal corpus of TV interviews (EmoTV-HUMAINE)
Perspectives
 improve detection performance by using the non-complex part of the corpus to train the models
 analyse real-life blended emotions and run a perceptual test on blended emotions

Thank you for your attention
Reference: L. Devillers, L. Vidrascu, L. Lamel, "Challenges in real-life emotion annotation and machine learning based detection", special issue, Journal of Neural Networks, to appear in July 2005.

Combining lexical and paralinguistic cues
 Lexical unigram model: 78% neutral/negative detection
 Linear combination of the 2 scores on 10 test sets (50 utterances)
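A sketch of the score-level fusion mentioned above (the interpolation weight alpha is an assumption; the slide only states that the two scores are combined linearly):

```python
# Sketch: linear combination of a lexical score and a paralinguistic (acoustic) score.
def combined_score(lexical_score, acoustic_score, alpha=0.5):
    # alpha is a hypothetical interpolation weight; in practice it would be tuned on held-out data
    return alpha * lexical_score + (1.0 - alpha) * acoustic_score
```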

Emotion Detection Model
 The emotion detection model is based on unigram models
 Due to the sparseness of the on-emotion data, each emotion model is an interpolation of an emotion-specific model and a general task-specific model estimated on the entire training corpus
 The similarity between an utterance u and an emotion E is the normalized log-likelihood ratio between the emotion model and the general model
 Standard preprocessing procedures: compounding (negative forms, e.g. « pas_normal »), stemming, and stopping
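A minimal sketch of such a score (my reading of the slide; the interpolation weight and the probability tables are assumptions): each word's probability under the interpolated emotion model is compared with its probability under the general model, and the log ratios are averaged over the utterance.

```python
# Sketch: normalized log-likelihood ratio between an interpolated emotion model and a general model.
# p_emotion and p_general map a word to its unigram probability; lam is a hypothetical interpolation weight.
import math

def similarity(utterance_words, p_emotion, p_general, lam=0.7, floor=1e-9):
    score = 0.0
    for w in utterance_words:
        p_g = max(p_general.get(w, 0.0), floor)
        p_e = lam * p_emotion.get(w, 0.0) + (1 - lam) * p_g   # emotion model interpolated with the general model
        score += math.log(max(p_e, floor) / p_g)
    return score / len(utterance_words)                       # normalize by utterance length
```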

Experiments on Anger/Fear detection
 Prosodic and acoustic cues
  56% detection
  around 60% when disfluencies are added
 Lexical cues (ICME 2003)
  often the same lexical words: problem, abnormal, etc.
  the difference is much more syntactic than lexical

Attribute selection (Weka toolkit)
 With a model (SVM)
 Information Gain (A: attribute; C: class)
 CFS (Correlation-based Feature Selection)
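The slide names the two filter criteria without their formulas; for reference, the standard definitions (not reproduced from the slide) are:

```latex
% Information gain of attribute A with respect to class C
IG(C, A) = H(C) - H(C \mid A), \qquad H(C) = -\sum_{c} P(c)\log_2 P(c)

% CFS merit of a feature subset S with k features:
% high average feature-class correlation \bar{r}_{cf}, low average feature-feature correlation \bar{r}_{ff}
\mathrm{Merit}(S) = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}}
```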