Interpreting Ambiguous Emotional Expressions
Speech Analysis and Interpretation Laboratory
ACII 2009
Motivation
Emotion expression is challenging:
– Multi-scale dependencies: time, speaker, context, mood, personality, culture
– Intentional obfuscation: frustration may be suppressed
– Inherent multimodality: contentment is expressed using the face and voice
– Colored by mood, culture, personality, and dialog flow
Problem statement:
– How can arbitrary emotional expressions be evaluated?
– How can interaction-level information be used to inform classification?
Operating Emotion Definitions
Prototypical emotions:
– Expressions that are consistently recognized by a set of human evaluators (e.g., rage, glee)
Nonprototypical emotions:
– Expressions that are not consistently recognized by a set of human evaluators
– Potential causes: ambiguous class definitions [frustration, anger], emotional subtlety, multimodal expression [sarcasm], the natural emotional flow of a dialog
Emotion and its Complexities
Temporal variability:
– Emotion is manifested and perceived across varying time scales
Additional challenges:
– Individual variability: emotion perception varies at the individual level
– Multi-modality: emotion is expressed using speech, the face, body posture, etc.
– Representation: emotion reporting may be influenced by the representation and method of evaluation
Temporal Variability: Multi-scale Representation
Emotion is modulated across different time scales, and there is an inherent interdependency between the manifestations of emotion over the varying scales:
– Time units: phoneme, syllable, word, phrase, utterance, turn, subdialog, dialog, ...
– The style of emotion expression is non-constant over these time units
– Segments may be highly prototypical or nonprototypical
Temporal Variability: Emotional Profile
Create emotional profiles to:
– Estimate the prototypical ebb and flow of emotion
– Identify "relevance sections" within a dialog
An emotional profile describes the confidence of an emotional label assignment: a soft label representative of the classification output.
Benefits:
– Retention of emotional information that is lost in a single hard emotion assignment
– Locating changes in the emotional tenor of a dialog
– Emotional profiles can serve as features
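The soft-label idea can be sketched in a few lines (an illustrative reconstruction, not code from the slides; the label set, function name, and posterior values are assumed):

```python
import numpy as np

def emotional_profile(posteriors, labels=("angry", "happy", "sad", "neutral")):
    """Average per-frame classifier posteriors into an utterance-level
    soft label (the "emotional profile") instead of one hard tag."""
    profile = np.mean(np.asarray(posteriors, dtype=float), axis=0)
    return dict(zip(labels, profile / profile.sum()))

# Three hypothetical frames of posteriors over (angry, happy, sad, neutral).
frames = [[0.6, 0.1, 0.1, 0.2],
          [0.5, 0.2, 0.1, 0.2],
          [0.4, 0.1, 0.2, 0.3]]
profile = emotional_profile(frames)  # soft label retaining all four confidences
```

A hard assignment would keep only the argmax; the profile retains the relative confidence in every class, which is what allows later interaction-level reasoning.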
Temporal Variability: Interaction Modeling
Proposal: use emotional profiles to develop an emotional interaction framework.
High-level example: [figure: emotional profiles over angry, happy, sad, and neutral for the four utterances of a dialog whose ground truth is "angry"; utterance-level assignments: angry, ????, angry]
First-level classification: majority-vote assignment. There is no evidence to suggest that the emotional content of the dialog is not angry, so assign the emotional tag of the dialog to "angry."
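The first-level, majority-vote assignment described above can be sketched as follows (a minimal illustration; the function name and label strings are hypothetical):

```python
from collections import Counter

def dialog_majority_vote(utterance_labels):
    """First-level dialog classification: the dialog receives the label
    that is most frequent among its utterance-level hard labels."""
    return Counter(utterance_labels).most_common(1)[0][0]

# Four utterances; even with one ambiguous assignment, nothing
# contradicts "angry", so the dialog is tagged "angry".
assert dialog_majority_vote(["angry", "angry", "neutral", "angry"]) == "angry"
```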
Temporal Variability: Interaction Modeling
Dynamic dyadic interaction modeling at the dialog level:
– Captures influences existing between interlocutors: an individual's emotion state changes as a function of the interlocutor's state
– Captures individual-specific temporal characteristics of emotion: temporal smoothness, i.e., an individual's emotion flow remains relatively constant between two overlapping windows
– Captures individual evaluation styles
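The temporal-smoothness idea can be illustrated with a simple recursive blend of successive window profiles (a hedged sketch; the smoothing weight `alpha` and the dict-based profile format are assumptions, not details from the slides):

```python
def smooth_profiles(profiles, alpha=0.7):
    """Blend each window's emotion profile with its predecessor so the
    inferred emotion flow stays relatively constant across overlapping
    windows. Profiles are dicts mapping emotion label -> confidence."""
    smoothed = [dict(profiles[0])]
    for current in profiles[1:]:
        previous = smoothed[-1]
        smoothed.append({label: alpha * previous[label] + (1 - alpha) * p
                         for label, p in current.items()})
    return smoothed

# An abrupt angry -> neutral jump is softened into a gradual transition.
profiles = [{"angry": 1.0, "neutral": 0.0}, {"angry": 0.0, "neutral": 1.0}]
smoothed = smooth_profiles(profiles)
```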
Temporal Variability: An Example of Interaction Modeling
[figure: a first-order Markov chain for the temporal dynamics of emotion states, showing the influence of speaker A's state on speaker B within a turn and the mutual influence of emotion states across turns, from the emotion states during turn t-1 to those during turn t]
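One way to sketch such a first-order model is a row-stochastic matrix conditioning the current speaker's state on the interlocutor's state in the previous turn (the transition probabilities below are purely illustrative, not estimated from any data):

```python
import numpy as np

STATES = ["angry", "happy", "sad", "neutral"]

# Row = interlocutor's state during turn t-1; column = current
# speaker's state during turn t. Each row sums to 1.
TRANSITION = np.array([
    [0.60, 0.05, 0.15, 0.20],  # interlocutor was angry
    [0.05, 0.70, 0.05, 0.20],  # interlocutor was happy
    [0.10, 0.05, 0.60, 0.25],  # interlocutor was sad
    [0.15, 0.15, 0.10, 0.60],  # interlocutor was neutral
])

def next_state_distribution(prev_interlocutor_state):
    """P(current speaker's emotion state | interlocutor's state at t-1)."""
    return dict(zip(STATES, TRANSITION[STATES.index(prev_interlocutor_state)]))
```

The within-turn influence of speaker A on speaker B could be added the same way, with a second matrix conditioned on the concurrent turn.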
Additional Challenges: Individual Variability (User Perception)
Emotion perception is colored by:
– The emotional content of an utterance
– The semantic content of an utterance
– Context
– The mood of the evaluator
– The personality of the evaluator
– The fatigue of the evaluator
– The attention of the evaluator
Additional Challenges: Individual Variability (Explicit User Models)
Capture evaluation style by creating models that define:
– Perception as a function of mood
– Perception as a function of attention
– Perception as a function of alertness
These models can be used to:
– Estimate the state of the user
– Create "active-learning" environments
Additional Challenges: Multi-modality of Emotion Expression
Uni-modal processing has inherent limits: audio information alone does not fully capture the emotional content.
– "Prototypical" angry example
– Video examples: subtle anger, hot anger, sarcasm, contentment
Additional Challenges: The Effect of Representation
Reported emotion perception depends on the evaluation structure.
– Evaluation structure for our data: multi-modal (audio and video); clips are viewed in order
Reported emotion perception also depends on the evaluation methodology:
– Categorical
– Dimensional
Conclusions
Goal: develop techniques to interpret emotional expressions independent of their prototypical or nonprototypical nature.
Improve dialog-level classification:
– Consider the dynamics of the acoustic features and the dynamics of the underlying classification
– Classify the emotion within the context of a dialog based on emotionally clear data (vs. ambiguous content)
– This will result in enhanced emotional comprehension by machines
Open Questions
– How can prototypical emotions be used to understand and interpret nonprototypical emotions?
– Is it important to be able to successfully interpret all utterances of an individual? Should a user's emotion state ever be discarded?
– How can we best make use of limited data?
– How can ambiguous emotional content be interpreted and utilized during human-machine interaction?
Questions?
Prototypical & Nonprototypical
– Prototypical expressions
– Nonprototypical majority-vote expressions
– Nonprototypical non-majority-vote expressions
Data Overview: IEMOCAP Database
Modalities:
– Audio, video, motion capture
Collection style:
– Dyadic interaction (mixed-gender)
– Scripted and improvisational expressions
– "Natural" emotion elicitation
Size:
– Five pairs (five men, five women)
– 12 hours
Data Overview: IEMOCAP Database
Evaluation:
– Twelve evaluators (overlapping subsets)
– Sequential annotation
– Categorical ratings (3+ per utterance): angry, happy, excited, sad, neutral, frustrated, surprised, disgusted, fearful, other (~25%)
– Dimensional ratings (2 per utterance): valence, activation
Data Overview: IEMOCAP Database
Database-specific definitions:
– Prototypical: complete evaluator agreement
– Nonprototypical majority-vote (NP MV): majority-vote agreement
– Nonprototypical non-majority-vote (NP NMV): expressions without a majority consensus

Emotional Category      Prototypical   NP MV   NP NMV*
Anger                            497     604      802
Happiness/Excitement             441    1189     2095
Neutrality                       388    1296     1623
Sadness                          465     618      616
Frustration                      562    1280     1383

* At least one evaluator tagged the utterance with the given emotion; the sets are non-disjoint.
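The database-specific definitions above can be written as a small labeling rule (a sketch; the function name and return format are hypothetical):

```python
from collections import Counter

def agreement_category(evaluator_tags):
    """Apply the database-specific definitions: full agreement is
    prototypical; a strict majority without full agreement is
    nonprototypical majority-vote; anything else lacks a consensus."""
    top_label, top_count = Counter(evaluator_tags).most_common(1)[0]
    n = len(evaluator_tags)
    if top_count == n:
        return "prototypical", top_label
    if top_count > n / 2:
        return "nonprototypical majority-vote", top_label
    return "nonprototypical non-majority-vote", None

assert agreement_category(["angry", "angry", "angry"]) == ("prototypical", "angry")
assert agreement_category(["angry", "angry", "frustrated"])[0] == "nonprototypical majority-vote"
```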
Emotional profiling: Sadness
Emotional profiling: Anger
Emotional profiling: Frustration