Robust Recognition of Emotion from Speech in e-Learning Environment
Mohammed E. Hoque (1,2), Mohammed Yeasin (1), Max Louwerse (2)
1. Computer Vision, Pattern and Image Analysis Laboratory, Department of Electrical and Computer Engineering, University of Memphis, TN 38152
2. Multimodal Aspects of Discourse (MAD) Laboratory, Department of Psychology / Institute for Intelligent Systems, University of Memphis, TN 38152

Acknowledgements
This research was partially supported by grant NSF-IIS-0416128 awarded to the third author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.

Introduction
Emotion in speech in a learning environment is a strong indicator of how effective the learning process is [1,2]. This assertion has had a significant impact on the study of emotion. Our aim is to identify salient words and observe their prosodic features. The same words can be uttered with different intonational patterns and thereby convey entirely different meanings, as shown in Figure 1. We therefore argue that extracting lexical and prosodic features from salient words only will yield robust recognition of emotion from speech in a learning environment.

Figure 1: Pitch of the word "OK" in various emotional states: (a) uttered under confusion, (b) uttered under flow, (c) uttered under delight, (d) uttered normally.

Databases
Emotional utterances were clipped from three movies: (a) Fahrenheit 9/11, (b) Bowling for Columbine, (c) Before Sunset.

Categories of Emotion
Figure 2: Categories of emotion pertinent to e-Learning.
Positive: flow, delight
Negative: frustration, confusion

Novel Features of Speech
Pitch: minimum, maximum, mean, standard deviation, absolute value, quantile, ratio between voiced and unvoiced frames.
Formant: first, second, third, fourth, and fifth formants; second formant / first formant; third formant / first formant.
Intensity: minimum, maximum, mean, standard deviation, quantile.

Figure 3: Clustered speech features, after reducing their dimensions using both Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).
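The feature set above reduces each utterance (or each salient word) to a fixed-length vector of pitch, formant, and intensity statistics. As an illustration only, the following Python/NumPy sketch shows one way such a vector could be assembled; it is not the authors' implementation. The frame-level pitch, intensity, and formant tracks are assumed to come from an external tool such as Praat, the function and variable names are hypothetical, and the unspecified "quantile" and "absolute value" statistics are read here as the median and the mean absolute value.

```python
import numpy as np

def prosodic_feature_vector(f0, intensity, formants):
    """Summary statistics in the spirit of the poster's feature set (illustrative).

    f0        -- per-frame fundamental frequency in Hz, 0 for unvoiced frames
    intensity -- per-frame intensity in dB
    formants  -- array of shape (n_frames, 5) holding F1..F5 per frame
    All tracks are assumed to come from a pitch/formant tracker such as Praat.
    """
    voiced = f0[f0 > 0]                           # pitch statistics use voiced frames only
    voiced_ratio = voiced.size / max(f0.size, 1)  # share of voiced frames in the utterance

    pitch_feats = [voiced.min(), voiced.max(), voiced.mean(), voiced.std(),
                   np.abs(voiced).mean(),         # "absolute value" read as mean |f0|
                   np.quantile(voiced, 0.5),      # "quantile" read as the median
                   voiced_ratio]

    mean_f = formants.mean(axis=0)                # average F1..F5 over the utterance
    formant_feats = list(mean_f) + [mean_f[1] / mean_f[0],   # F2 / F1
                                    mean_f[2] / mean_f[0]]   # F3 / F1

    intensity_feats = [intensity.min(), intensity.max(), intensity.mean(),
                       intensity.std(), np.quantile(intensity, 0.5)]

    return np.array(pitch_feats + formant_feats + intensity_feats)
```

Under this reading, each utterance yields a 19-dimensional vector (7 pitch, 7 formant, and 5 intensity statistics) that feeds the projection and classification steps reported in the Results.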
Results
21 different classifiers were used to validate the robustness of our algorithm in distinguishing between positive and negative emotions, as shown in Table 1. Table 1 also compares performance with and without applying data projection/dimension reduction techniques to the features.

Table 1: Classification accuracy (%) for positive vs. negative emotion, with PCA + LDA projection and with the raw features.

Category   Classifier                      PCA + LDA   Features
Rules      Part                            83.33       50
Rules      NNge                            83.33       33.33
Rules      Ridor                           66.67       -
Trees      RandomForest                    83.33       50
Trees      J48                             83.33       50
Trees      LMT                             71.67       33.33
Meta       AdaBoostM1                      61.90       -
Meta       Bagging                         66.67       33.33
Meta       Classification via Regression   83.33       50
Meta       LogitBoost                      83.33       50
Meta       Multi Class Classifier          83.33       50
Meta       Ordinal Class Classifier        83.33       50
Meta       Threshold Selector              100         50
Functions  Logistic                        83.33       50
Functions  Multi-layer Perceptron          83.33       50
Functions  RBF Network                     83.33       33.33
Functions  Simple Logistics                66.67       33.33
Functions  SMO                             71.42       -
Bayes      Naïve Bayes                     66.67       -
Bayes      Naïve Bayes Simple              66.67       -
Bayes      Naïve Bayes Updateable          66.67       -

Table 2 compares how the classifiers performed in distinguishing delight from flow within the positive emotions and confusion from frustration within the negative emotions. The results show that the negative emotions are classified better than the positive ones.

Table 2: Classification accuracy (%) within positive (delight vs. flow) and negative (confusion vs. frustration) emotions.

Category   Classifier                      Delight + Flow   Confusion + Frustration
Rules      Part                            72.72 (66%)      100
Rules      NNge                            80 (50%)         100
Rules      Ridor                           66.67 (80%)      100
Trees      RandomForest                    63.63 (66%)      66.67
Trees      J48                             72.72 (66%)      100
Trees      LMT                             72.72 (66%)      100
Meta       AdaBoostM1                      54.44 (66%)      100
Meta       Bagging                         63.64 (66%)      66.67
Meta       Classification via Regression   72.72 (66%)      100
Meta       LogitBoost                      63.64 (66%)      100
Meta       Multi Class Classifier          72.72 (66%)      100
Meta       Ordinal Class Classifier        72.72 (66%)      100
Meta       Threshold Selector              83.33 (80%)      100
Functions  Logistic                        72.72 (66%)      100
Functions  Multi-layer Perceptron          66.67 (80%)      100
Functions  RBF Network                     66.67 (80%)      100
Functions  Simple Logistics                72.72 (66%)      100
Functions  SMO                             72.72 (66%)      100
Bayes      Naïve Bayes                     72.72 (66%)      100
Bayes      Naïve Bayes Simple              72.72 (66%)      100
Bayes      Naïve Bayes Updateable          72.72 (66%)      100

Conclusion
The hypothesis that prosodic features should be extracted from salient words has been successfully demonstrated. The results were validated using 21 different classifiers, each evaluated with 10-fold cross validation. They show that applying data projection and dimension reduction techniques, such as Principal Component Analysis and Linear Discriminant Analysis, yields better results. The classifiers performed at nearly 100% in distinguishing between frustration and confusion, but comparatively worse in distinguishing between positive patterns such as delight and flow. The next phase of the project will test the algorithm on map-task data collected from the i-MAP project of the Institute for Intelligent Systems (IIS). Future efforts will involve fusing multimodal channels such as facial expressions, speech, and gestures at both the decision and feature levels.

References
[1] Craig, S. D., & Gholson, B. (2002, July). Does an agent matter? The effects of animated pedagogical agents on multimedia environments. In P. Barker & S. Rebelsky (Eds.), Proceedings of ED-MEDIA 2002: World Conference on Educational Multimedia, Hypermedia and Telecommunications (pp. 357-362). Norfolk, VA: Association for the Advancement of Computing in Education.
[2] Craig, S. D., Gholson, B., & Driscoll, D. (2002). Animated pedagogical agents in multimedia educational environments: Effects of agent properties, picture features, and redundancy. Journal of Educational Psychology, 94, 428-434.
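To make the evaluation protocol above concrete (PCA and LDA projection followed by 10-fold cross validation over a bank of classifiers), here is a minimal scikit-learn sketch. The classifiers named in Tables 1 and 2 are WEKA implementations; this sketch substitutes roughly analogous scikit-learn models (a random forest, logistic regression, an SVM in place of SMO, a multi-layer perceptron, and naive Bayes) and uses randomly generated placeholder data, so it illustrates the protocol rather than reproducing the reported numbers.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

# X: (n_utterances, n_prosodic_features), y: emotion labels.
# Placeholder data stands in for the prosodic vectors of the salient words.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 19))
y = rng.integers(0, 2, size=60)   # 0 = negative, 1 = positive

classifiers = {
    "RandomForest":   RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic":       LogisticRegression(max_iter=1000),
    "SVM (SMO-like)": SVC(),
    "MLP":            MLPClassifier(max_iter=2000, random_state=0),
    "NaiveBayes":     GaussianNB(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers.items():
    # Project with PCA, then LDA, then classify: one plausible reading of
    # reducing the features "using both PCA and LDA" before classification.
    pipe = Pipeline([("pca", PCA(n_components=10)),
                     ("lda", LinearDiscriminantAnalysis()),
                     ("clf", clf)])
    scores = cross_val_score(pipe, X, y, cv=cv)
    print(f"{name:15s} mean 10-fold accuracy: {scores.mean():.2%}")
```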