Robust Recognition of Emotion from Speech
Mohammed E. Hoque, Mohammed Yeasin, Max M. Louwerse
{mhoque, myeasin, mlouwerse}@memphis.edu
Institute for Intelligent Systems, University of Memphis
Presentation Overview: Motivation, Methods, Database, Results, Conclusion.
Motivation: We want animated agents to recognize emotion in e-Learning environments. Such agents need to be sensitive and adaptive to learners' emotions.
Methods: Our method is partially motivated by the work of Lee and Narayanan [1], who first introduced the notion of salient words.
Shortcomings of Lee and Narayanan's work: Lee et al. argued that there is a one-to-one correspondence between a word and a positive or negative emotion. This does not hold in every case.
Examples. Figure 1: Pictorial depiction of the word "okay" uttered with different intonations (confusion, flow, delight, normal) to express different emotions.
More examples.. Scar!! Scar??
More examples… Two months!! Two months??
Our Hypothesis: Lexical information extracted from combined prosodic and acoustic features that correspond to the intonation patterns of "salient words" will yield robust recognition of emotion from speech. It also provides a framework for signal-level analysis of speech for emotion.
Creation of Database
Details on the Database: 15 utterances were selected for four emotion categories: confusion/uncertainty, delight, flow (confidence, encouragement), and frustration [2]. The utterances were stand-alone, context-dependent ambiguous expressions in conversations. Examples are "Great", "Yes", "Yeah", "No", "Ok", "Good", "Right", "Really", "What", and "God".
Details on the Database (continued): Three graduate students listened to the audio clips. They successfully distinguished between the positive and negative emotions 65% of the time. No specific instructions were given as to which intonation patterns to listen for.
High-Level Diagram. Figure 2: High-level description of the overall emotion recognition process: word-level utterances → feature extraction → data projection → classifiers → positive/negative.
Hierarchical Classifiers. Figure 3: Design of the hierarchical binary classifiers: a first stage separates positive from negative emotion; the negative branch is then split into confusion and frustration, and the positive branch into delight and flow.
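To make the two-stage design concrete, here is a minimal sketch of a hierarchical binary classifier in the spirit of Figure 3. It is illustrative only: the class name, label strings, and SVM stand-ins are assumptions, not the classifiers actually used (those are listed on a later slide).

```python
# Minimal sketch of the two-stage binary scheme in Figure 3 (illustrative only).
# Stage 1 separates positive from negative emotion; stage 2 refines each branch.
import numpy as np
from sklearn.svm import SVC

class HierarchicalEmotionClassifier:
    def __init__(self):
        self.stage1 = SVC()        # positive vs. negative
        self.pos_stage = SVC()     # delight vs. flow
        self.neg_stage = SVC()     # confusion vs. frustration

    def fit(self, X, y):
        # y holds the four labels: "delight", "flow", "confusion", "frustration"
        positive = np.isin(y, ["delight", "flow"])
        self.stage1.fit(X, positive)
        self.pos_stage.fit(X[positive], y[positive])
        self.neg_stage.fit(X[~positive], y[~positive])
        return self

    def predict(self, X):
        out = np.empty(len(X), dtype=object)
        positive = self.stage1.predict(X).astype(bool)
        if positive.any():
            out[positive] = self.pos_stage.predict(X[positive])
        if (~positive).any():
            out[~positive] = self.neg_stage.predict(X[~positive])
        return out
```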
Emotion Models using Lexical Information
Pitch: minimum, maximum, mean, standard deviation, absolute value, quantile, ratio between voiced and unvoiced frames.
Duration: ε_time, ε_height.
Intensity: minimum, maximum, mean, standard deviation, quantile.
Formants: first through fifth formant, second formant / first formant, third formant / first formant.
Rhythm: speaking rate.
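As a rough illustration of how such word-level statistics can be computed, the sketch below derives pitch and intensity statistics from a clip using librosa. The function name, pitch range, and the omission of formant and rhythm features are assumptions made for brevity, not the authors' actual extraction pipeline.

```python
# Sketch: word-level pitch and intensity statistics similar to those listed above.
# Formant features (typically extracted with Praat) are omitted for brevity.
import numpy as np
import librosa

def pitch_intensity_features(wav_path, fmin=75.0, fmax=500.0):
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    voiced = f0[voiced_flag]                    # F0 values in voiced frames only
    rms = librosa.feature.rms(y=y)[0]           # frame-wise energy as an intensity proxy
    return {
        "pitch_min": np.min(voiced), "pitch_max": np.max(voiced),
        "pitch_mean": np.mean(voiced), "pitch_std": np.std(voiced),
        "pitch_q25": np.quantile(voiced, 0.25), "pitch_q75": np.quantile(voiced, 0.75),
        "voiced_ratio": voiced_flag.mean(),     # fraction of frames that are voiced
        "int_min": rms.min(), "int_max": rms.max(),
        "int_mean": rms.mean(), "int_std": rms.std(),
        "duration_s": len(y) / sr,              # utterance length; a syllable count would be
                                                # needed for a true speaking-rate feature
    }
```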
Duration Features. Figure 4: Measures of F0 for computing the parameters (ε_time, ε_height), which correspond to the rising and falling of intonation. Including both height and time accounts for possible low or high pitch accents.
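The slide does not spell out exactly how ε_time and ε_height are computed, so the sketch below encodes one plausible reading: the duration and signed F0 change of the final rising or falling segment of the contour. Treat the definition, function name, and frame period as assumptions.

```python
# One plausible reading of the duration parameters in Figure 4 (an assumption,
# not the authors' exact definition): the time span and F0 change of the final
# rising or falling movement of the pitch contour.
import numpy as np

def rise_fall_params(f0, frame_period=0.01):
    """f0: voiced F0 contour in Hz; frame_period: seconds per frame."""
    f0 = np.asarray(f0, dtype=float)
    diffs = np.sign(np.diff(f0))
    # index where the final monotone (rising or falling) run begins
    change_points = np.where(diffs[:-1] != diffs[1:])[0]
    start = change_points[-1] + 1 if change_points.size else 0
    eps_time = (len(f0) - 1 - start) * frame_period   # duration of the final movement
    eps_height = f0[-1] - f0[start]                   # signed F0 change (rise > 0, fall < 0)
    return eps_time, eps_height
```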
Types of Classifiers, grouped by category:
Rules: Part, NNge, Ridor.
Trees: Random Forest, J48, Logistic Model Tree.
Meta: AdaBoostM1, Bagging, Classification via Regression, LogitBoost, Multi Class Classifier, Ordinal Class Classifier, Threshold Selector.
Functions: Logistic, Multi-layer Perceptron, RBF Network, Simple Logistics, SMO.
Bayes: Naïve Bayes, Naïve Bayes Simple, Naïve Bayes Updateable.
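These are WEKA-style learners. A hedged sketch of a comparable comparison using scikit-learn stand-ins is shown below; the mapping between toolkits is approximate, and the `compare` helper and cross-validation setup are assumptions, not the evaluation protocol reported here.

```python
# Sketch: comparing a few classifier families (scikit-learn counterparts of the
# WEKA-style learners listed above) by cross-validated accuracy.
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

candidates = {
    "J48-like tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "AdaBoostM1": AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
    "Logistic": LogisticRegression(max_iter=1000),
    "Multi-layer Perceptron": MLPClassifier(max_iter=2000),
    "Naive Bayes": GaussianNB(),
    "SMO-like SVM": SVC(),
}

def compare(X, y, cv=5):
    # X: feature matrix for the word-level utterances, y: emotion labels
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in candidates.items()}
```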
Shortcomings of Lee and Narayanan's work (2004).
Results: classification accuracy (%) of each classifier with different feature sets: base features (a), PCA with 15 features (b1), PCA with 20 features (b2), LDA (c), and PCA+LDA (d).
Rules: Part 50, 66.67, 47.61, 83.33; NNge 33.33, 38.09, 83.33; Ridor 66.67, 83.33, 100, 47.20, 66.67.
Trees: Random Forest 50, 66.67, 83.33; J48 50, 66.67, 47.61, 83.33; Logistic Model Tree 33.33, 47.61, 83.33, 66.67, 71.67.
Meta: AdaBoostM1 61.90, 71.42, 42.85, 61.90; Bagging 33.33, 66.67, 83.33, 42.85, 66.67; Classification via Regression 50, 66.67, 47.61, 83.33; LogitBoost 50, 61.90, 52.38, 83.33; Multi Class Classifier 50, 42.85, 52.38, 57.14, 83.33; Ordinal Class Classifier 50, 66.67, 47.62, 83.33; Threshold Selector 50, 66.67, 61.90, 100.
Functions: Logistic 50, 42.85, 57.38, 57.14, 83.33; Multi-layer Perceptron 50, 57.14, 52.38, 50, 83.33; RBF Network 33.33, 66.67, 52.38, 38.09, 83.33; Simple Logistics 33.33, 47.61, 83.33, 66.67; SMO 71.42, 57.14, 61.90, 52.38, 71.42.
Bayes: Naïve Bayes 66.67, 50, 33.33, 52.38, 66.67; Naïve Bayes Simple 66.67, 50, 33.33, 57.14, 66.67; Naïve Bayes Updateable 66.67, 50, 33.33, 52.38, 66.67.
Summary of Results (average accuracy across classifiers):
Base features: 50.79%
PCA (15): 57.1%
PCA (20): 61%
LDA: 52.01%
PCA (15) + LDA: 83.33%
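The best average accuracy comes from projecting the features with PCA (15 components) followed by LDA. A minimal sketch of that projection wired into a classification pipeline is shown below; the use of scikit-learn, the scaling step, and the SVM stand-in for the final classifier are assumptions, not the original setup.

```python
# Sketch: PCA (15 components) followed by LDA, the projection with the best
# average accuracy in the summary above, wired into a simple pipeline.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.svm import SVC

pca_lda_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=15)),   # keep 15 principal components
    ("lda", LDA()),                  # supervised projection onto class-discriminant axes
    ("clf", SVC()),                  # any of the classifiers from the comparison table
])
# Usage (illustrative):
#   pca_lda_pipeline.fit(X_train, y_train)
#   accuracy = pca_lda_pipeline.score(X_test, y_test)
```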
Results of the 21 classifiers on positive and negative emotions. Accuracy (%) on Delight + Flow / Confusion + Frustration:
Rules: Part 72.72 / 100; NNge 80 / 100; Ridor 66.67 / 100.
Trees: Random Forest 63.63 / 66.67; J48 72.72 / 100; Logistic Model Tree 72.72 / 100.
Meta: AdaBoostM1 54.44 / 100; Bagging 63.64 / 66.67; Classification via Regression 72.72 / 100; LogitBoost 63.64 / 100; Multi Class Classifier 72.72 / 100; Ordinal Class Classifier 72.72 / 100; Threshold Selector 83.33 / 100.
Functions: Logistic 72.72 / 100; Multi-layer Perceptron 66.67 / 100; RBF Network 66.67 / 100; Simple Logistics 72.72 / 100; SMO 72.72 / 100.
Bayes: Naïve Bayes 72.72 / 100; Naïve Bayes Simple 72.72 / 100; Naïve Bayes Updateable 72.72 / 100.
Limitations and Future Work.
Algorithm: feature selection; discourse information. Future efforts will include fusion of video and audio data in a signal-level framework.
Database: clipping arbitrary words from a conversation may be ineffective in various cases; we may need to look at words in sequence.
More examples..
M. E. Hoque, M. Yeasin, and M. M. Louwerse, "Robust Recognition of Emotion from Speech," 6th International Conference on Intelligent Virtual Agents, Marina Del Rey, CA, August 2006.
M. E. Hoque, M. Yeasin, and M. M. Louwerse, "Robust Recognition of Emotion in e-Learning Environment," 18th Annual Student Research Forum, Memphis, TN, April 2006. [2nd Best Poster Award]
Acknowledgments This research was partially supported by grant NSF-IIS-0416128 awarded to the third author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.
Questions?
Robust Recognition of Emotion from Speech
Mohammed E. Hoque, Mohammed Yeasin, Max M. Louwerse
{mhoque, myeasin, mlouwerse}@memphis.edu
Institute for Intelligent Systems, University of Memphis
References
1. C. Lee and S. Narayanan, "Toward detecting emotions in spoken dialogs," IEEE Transactions on Speech and Audio Processing, vol. 13, 2005.
2. B. Kort, R. Reilly, and R. W. Picard, "An Affective Model of Interplay Between Emotions and Learning: Reengineering Educational Pedagogy-Building a Learning Companion," in Proceedings of the International Conference on Advanced Learning Technologies (ICALT 2001), Madison, Wisconsin, August 2001.