Multimodal emotion recognition: recognition models, application dependency, and theoretical models of multimodal integration

Presentation transcript:

Multimodal emotion recognition: outline
– recognition models and application dependency: discrete / dimensional / appraisal-theory models
– theoretical models of multimodal integration: direct / separate / dominant / motor integration
– modality synchronization: visemes / EMGs & FAPs / SC-RSP & speech
– temporal evolution and modality sequentiality
– multimodal recognition techniques: classifiers + context + goals + cognition/attention + modality significance in interaction

Multimodal emotion recognition
Two approaches have been developed and used for audiovisual emotion recognition:
A. Separated Recognition
– An attention-feedback recurrent neural network applied to emotion recognition from speech.
– A neurofuzzy system including a-priori knowledge, used for emotion recognition from facial expressions.

Separated Recognition
The goal was to evaluate the recognition performance obtained by each modality separately. Visual feeltracing was required. Pause detection and tune-based analysis, with speech playing the main role, were the means to synchronise the two modalities.
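As a rough illustration of how pause detection could segment speech into tunes for synchronisation, the sketch below marks low-energy stretches as pauses. The slide names the technique but not its implementation, so the frame sizes, energy threshold, and minimum pause duration are assumptions:

```python
import numpy as np

def detect_pauses(signal, sr, frame_ms=25, hop_ms=10, energy_thresh=0.02, min_pause_ms=200):
    """Mark frames whose RMS energy stays below a threshold long enough to count as pauses."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    rms = np.array([np.sqrt(np.mean(signal[i:i + frame] ** 2))
                    for i in range(0, len(signal) - frame, hop)])
    silent = rms < energy_thresh * rms.max()

    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i                      # entering a silent stretch
        elif not s and start is not None:
            if (i - start) * hop_ms >= min_pause_ms:
                pauses.append((start * hop_ms / 1000, i * hop_ms / 1000))  # seconds
            start = None
    return pauses  # tune boundaries lie between consecutive pauses
```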

Emotion analysis & facial expressions
A rule-based system for emotion recognition was created, characterising a user's emotional state in terms of the six universal, or archetypal, expressions (joy, surprise, fear, anger, disgust, sadness). Rules have been created in terms of the MPEG-4 FAPs for each of these expressions.

Sample Profiles of Anger
A1: F4 ∈ [22, 124], F31 ∈ [-131, -25], F32 ∈ [-136, -34], F33 ∈ [-189, -109], F34 ∈ [-183, -105], F35 ∈ [-101, -31], F36 ∈ [-108, -32], F37 ∈ [29, 85], F38 ∈ [27, 89]
A2: F19 ∈ [-330, -200], F20 ∈ [-335, -205], F21 ∈ [200, 330], F22 ∈ [205, 335], F31 ∈ [-200, -80], F32 ∈ [-194, -74], F33 ∈ [-190, -70], F34 ∈ [-190, -70]
A3: F19 ∈ [-330, -200], F20 ∈ [-335, -205], F21 ∈ [200, 330], F22 ∈ [205, 335], F31 ∈ [-200, -80], F32 ∈ [-194, -74], F33 ∈ [70, 190], F34 ∈ [70, 190]
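A minimal sketch of how such a profile could be checked against an observed FAP vector. The FAP indices and intervals are copied from profile A1 above; the matching criterion (all listed FAPs inside their intervals) and the example values are assumptions for illustration:

```python
# Profile A1 for anger: FAP index -> admissible range (taken from the slide above)
ANGER_A1 = {4: (22, 124), 31: (-131, -25), 32: (-136, -34), 33: (-189, -109),
            34: (-183, -105), 35: (-101, -31), 36: (-108, -32),
            37: (29, 85), 38: (27, 89)}

def matches_profile(faps: dict, profile: dict) -> bool:
    """True if every FAP named in the profile falls inside its interval."""
    return all(lo <= faps.get(idx, float("nan")) <= hi
               for idx, (lo, hi) in profile.items())

# Hypothetical observed FAP values for one frame
observed = {4: 60, 31: -80, 32: -90, 33: -150, 34: -140, 35: -60, 36: -70, 37: 50, 38: 55}
print(matches_profile(observed, ANGER_A1))  # True
```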

Emotion analysis & facial expressions
f: values derived from the calculated distances
G: the value of the corresponding FAP

Expressions: rule most often activated (% of examined cases)
Anger (47%): [open_jaw_low, lower_top_midlip_medium, raise_bottom_midlip_high, raise_left_inner_eyebrow_low, raise_right_inner_eyebrow_low, raise_left_medium_eyebrow_low, raise_right_medium_eyebrow_low, squeeze_left_eyebrow_high, squeeze_right_eyebrow_high, wrinkles_between_eyebrows_high, raise_left_outer_cornerlip_medium, raise_right_outer_cornerlip_medium]
Joy (39%): [open_jaw_high, lower_top_midlip_low, raise_bottom_midlip_verylow, widening_mouth_high, close_left_eye_high, close_right_eye_high]
Disgust (33%): [open_jaw_low, lower_top_midlip_low, raise_bottom_midlip_high, widening_mouth_low, close_left_eye_high, close_right_eye_high, raise_left_inner_eyebrow_medium, raise_right_inner_eyebrow_medium, raise_left_medium_eyebrow_medium, raise_right_medium_eyebrow_medium, wrinkles_between_eyebrows_medium]

Expressions: rule most often activated (% of examined cases)
Surprise (71%): [open_jaw_high, raise_bottom_midlip_verylow, widening_mouth_low, close_left_eye_low, close_right_eye_low, raise_left_inner_eyebrow_high, raise_right_inner_eyebrow_high, raise_left_medium_eyebrow_high, raise_right_medium_eyebrow_high, raise_left_outer_eyebrow_high, raise_right_outer_eyebrow_high, squeeze_left_eyebrow_low, squeeze_right_eyebrow_low, wrinkles_between_eyebrows_low]
Neutral (70%): [open_jaw_low, lower_top_midlip_medium, raise_left_inner_eyebrow_medium, raise_right_inner_eyebrow_medium, raise_left_medium_eyebrow_medium, raise_right_medium_eyebrow_medium, raise_left_outer_eyebrow_medium, raise_right_outer_eyebrow_medium, squeeze_left_eyebrow_medium, squeeze_right_eyebrow_medium, wrinkles_between_eyebrows_medium, raise_left_outer_cornerlip_medium, raise_right_outer_cornerlip_medium]

Expression-based Emotion Analysis Results
These rules were extended to deal with the 2-D continuous (activation-evaluation), four-quadrant emotional space. They were applied to QUB SALAS-generated data to test performance on real-life emotionally expressive data sets.

Clustering/Neurofuzzy Analysis of Facial Features
The rule-based expression/emotion analysis system was extended to handle the specific characteristics of each user in continuous 2-D emotional analysis. Novel clustering and fuzzy reasoning techniques were developed and used to produce user-specific FAP ranges (around 10 clusters per user) and to provide rules that handle them. Results on the continuous 2-D emotional framework with SALAS data indicate that good performance (reaching 80%) was obtained by applying the adapted system to each specific user.
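A sketch of how user-specific FAP ranges could be derived by clustering. The slide mentions roughly 10 clusters per user but not the algorithm, so k-means and the range-extraction step below are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def user_fap_ranges(fap_samples: np.ndarray, n_clusters: int = 10):
    """Cluster one user's observed values of a single FAP and return a range per cluster.

    fap_samples: shape (n_frames,), values of one FAP over the user's recordings.
    Returns a sorted list of (low, high) intervals that user-adapted fuzzy rules
    could use in place of the generic archetypal ranges.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(fap_samples.reshape(-1, 1))
    ranges = []
    for c in range(n_clusters):
        vals = fap_samples[labels == c]
        ranges.append((float(vals.min()), float(vals.max())))
    return sorted(ranges)
```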

Direct Multimodal Recognition
The attention-feedback recurrent neural network architecture (ANNA) was applied to emotion recognition based on all input modalities. Features extracted from all input modalities (linguistic, paralinguistic speech, FAPs) were provided by processing and analysing common SALAS emotionally expressive data.

Emotion Recognition based on ANNA
– ANNA hidden layer = emotion state, plus feedback control for attention (= IMC)
– Learning laws for ANNA were developed
– ANNA fuses all modalities or only one

BASIC EMOTION RECOGNITION ARCHITECTURE
[Diagram: feature-vector inputs feed the emotion-state hidden layer; an attention control system (IMC) provides feedback; the output is the recognised emotional state.]
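The slides name the architecture (emotion state as hidden layer, IMC feedback acting as attention) but give no equations, so the recurrence below is only one plausible reading of that description, not the published ANNA model; all layer sizes and the gating scheme are assumptions:

```python
import numpy as np

class AttentionFeedbackRNN:
    """Toy recurrent net: an 'emotion state' hidden layer plus a feedback (IMC-like)
    layer that gates the input features on the next time step."""

    def __init__(self, n_in, n_emot=5, n_imc=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0, 0.1, (n_emot, n_in))      # input -> emotion state
        self.W_rec = rng.normal(0, 0.1, (n_emot, n_emot))    # emotion-state recurrence
        self.W_imc = rng.normal(0, 0.1, (n_imc, n_emot))     # emotion state -> IMC
        self.W_gate = rng.normal(0, 0.1, (n_in, n_imc))      # IMC -> input gain
        self.W_out = rng.normal(0, 0.1, (2, n_emot))         # emotion state -> (activation, evaluation)

    def forward(self, x_seq):
        h = np.zeros(self.W_rec.shape[0])
        gate = np.ones(self.W_gate.shape[0])
        for x in x_seq:                              # one feature vector per analysis frame
            x = gate * x                             # attention: IMC feedback scales the inputs
            h = np.tanh(self.W_in @ x + self.W_rec @ h)
            imc = np.tanh(self.W_imc @ h)
            gate = 1.0 + np.tanh(self.W_gate @ imc)  # gain kept positive
        return self.W_out @ h                        # final (activation, evaluation) estimate
```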

Text Post-Processing Module
– Prof. Whissell compiled the 'Dictionary of Affect in Language' (DAL)
– Mapping of ~9000 words onto the activation-evaluation space, based on students' assessments
– Words taken from the meaningful segments obtained by pause detection are mapped into the activation-evaluation space
– But humans use context to assign emotional content to words
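A sketch of that word-to-emotion-space mapping: look each word of a pause-delimited segment up in a DAL-style table and average the (activation, evaluation) coordinates. The tiny lexicon below is invented for illustration; the real dictionary has ~9000 entries:

```python
# Hypothetical mini-lexicon: word -> (activation, evaluation)
DAL = {"happy": (0.5, 0.8), "angry": (0.7, -0.6), "calm": (-0.4, 0.5), "sad": (-0.3, -0.7)}

def segment_emotion(words):
    """Average the DAL coordinates of the words that have an entry; None if no word is covered."""
    hits = [DAL[w.lower()] for w in words if w.lower() in DAL]
    if not hits:
        return None
    act = sum(a for a, _ in hits) / len(hits)
    ev = sum(e for _, e in hits) / len(hits)
    return act, ev

print(segment_emotion("I am so happy and calm today".split()))  # approximately (0.05, 0.65)
```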

ANNA on top correlated ASSESS features
Quadrant match using the top 10 activation features + top 10 evaluation features, with an activation-evaluation output space.
[Table: average quadrant match and standard deviation per FeelTracer (JD, CC, DR, EM); numeric values not preserved in the transcript.]

ANNA on top correlated ASSESS features
Half-plane match using the top 10 activation features, with an activation-only output space.
[Table: average half-plane match and standard deviation per FeelTracer (JD, CC, DR, EM); numeric values not preserved in the transcript.]
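How the quadrant match (2-D) and half-plane match (activation only) could be scored against the FeelTracer trace. Taking a "match" to mean agreement on the sign of each coordinate, i.e. falling in the same quadrant or half-plane of an origin-centred activation-evaluation space, is an assumption about how the metric was computed:

```python
import numpy as np

def quadrant_match(pred, target):
    """pred, target: arrays of shape (n, 2) holding (activation, evaluation) per case.
    Returns the fraction of cases where both coordinates agree in sign."""
    return float(np.mean(np.all(np.sign(pred) == np.sign(target), axis=1)))

def half_plane_match(pred_act, target_act):
    """1-D version: agreement on the sign of activation only."""
    return float(np.mean(np.sign(pred_act) == np.sign(target_act)))
```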

Multi-modal Results
– 500 training epochs, 3 runs per dataset, with final results averaged (and the associated standard deviation reported).
– 5 hidden-layer (EMOT) neurons and 5 feedback-layer (IMC) neurons; fixed learning rate.
– Of each dataset, 4 parts were used for training and 1 part for testing the net on 'unseen' inputs.
– A = Activation, E = Evaluation; FT denotes the FeelTracer used as supervisor; AVG denotes the average quadrant match (for the 2-D space) or the average half-plane match (for the 1-D space) over 3 runs.
– PCA was applied to the ASSESS features to reduce them from about 500 to around 7-10 components describing most of the volatility.
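The slide says PCA reduced ~500 ASSESS features to around 7-10 components; a sketch of that step is shown below. The 95% variance threshold and the cap of 10 components are assumptions, since the slide only says the retained components describe most of the volatility:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_assess_features(X: np.ndarray, var_kept: float = 0.95, max_components: int = 10):
    """X: (n_cases, ~500) ASSESS feature matrix. Keep the smallest number of principal
    components explaining var_kept of the variance, capped at max_components."""
    pca = PCA(n_components=max_components)
    Z = pca.fit_transform(X)
    cum = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum, var_kept) + 1) if cum[-1] >= var_kept else max_components
    return Z[:, :k], pca
```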

Multi-modal Results: Using the A Output Only
– Classification using the A (activation) output only was relatively high (in three cases up to 98%, with two more at 90% or above).
– Effectiveness of data from FeelTracer EM: average success rates of 86%, 88%, 98%, 89%, 95%, 98%, 98% for the 7 input combinations (ass / fap / dal / a+f / d+f / a+d / a+d+f).
– Success was also high with FeelTracer JD.
– Consistently lower values for FeelTracer DR (all in the 60-66% band) and for CC (64%, 54%, 75%, 63%, 73%, 73%, 73%).

Ontology Representation & Facial Expression/Emotion Analysis
– Use ontologies for real-life usage of facial expression/emotion analysis results.
– Extensibility: ontologies form an excellent basis for considering issues like constrained reasoning, personalisation and adaptation, which have been shown to be crucial for applying our results to real-life applications.
– Standardisation: OWL ontologies form a standard knowledge representation and reasoning framework for the Web.

Facial Emotion Analysis Ontology Development
– An ontology has been created to represent the geometry and the different variations of facial expressions, based on the MPEG-4 Facial Definition Parameters (FDPs) and Facial Animation Parameters (FAPs).
– The ontology was built using the ontology language OWL DL and the Protégé OWL ontology development plugin.
– The ontology will be the tool for extending the obtained results to real-life applications dealing with specific users' profiles & constraints.

Concept and Relation Examples
Concepts
– Face
– Face_Animation_Parameter
– Face_Definition_Parameter
– Facial_Expression
Relations
– is_Defined_By
– is_Animated_By
– has_Facial_Expression
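A minimal sketch of those concepts and relations expressed as an OWL graph with rdflib. The namespace URI and the property domains/ranges are assumptions; the actual ontology was authored in OWL DL with the Protégé plugin:

```python
from rdflib import Graph, Namespace, RDF, RDFS, OWL

FACE = Namespace("http://example.org/facial-emotion#")  # hypothetical namespace
g = Graph()
g.bind("face", FACE)

# Concepts from the slide, declared as OWL classes
for cls in ("Face", "Face_Animation_Parameter", "Face_Definition_Parameter", "Facial_Expression"):
    g.add((FACE[cls], RDF.type, OWL.Class))

# Relations from the slide, with assumed domains and ranges
g.add((FACE.is_Defined_By, RDF.type, OWL.ObjectProperty))
g.add((FACE.is_Defined_By, RDFS.domain, FACE.Face))
g.add((FACE.is_Defined_By, RDFS.range, FACE.Face_Definition_Parameter))

g.add((FACE.is_Animated_By, RDF.type, OWL.ObjectProperty))
g.add((FACE.is_Animated_By, RDFS.domain, FACE.Face))
g.add((FACE.is_Animated_By, RDFS.range, FACE.Face_Animation_Parameter))

g.add((FACE.has_Facial_Expression, RDF.type, OWL.ObjectProperty))
g.add((FACE.has_Facial_Expression, RDFS.domain, FACE.Face))
g.add((FACE.has_Facial_Expression, RDFS.range, FACE.Facial_Expression))

print(g.serialize(format="turtle"))
```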