Multimodal emotion recognition and expressivity analysis
ICME 2005 Special Session
Stefanos Kollias, Kostas Karpouzis
Image, Video and Multimedia Systems Lab, National Technical University of Athens

July 7, 2005 multimodal emotion recognition and expressivity analysis ICME 2005 Special Session

expressivity and emotion recognition
affective computing
–the capability of machines to recognize, express, model, communicate and respond to emotional information
computers need the ability to recognize human emotion
–everyday HCI is emotional: three-quarters of computer users admit to swearing at computers
–user input and system reaction are important to pinpoint problems or provide natural interfaces

the targeted interaction framework
Generating intelligent interfaces with affective, learning, reasoning and adaptive capabilities.
Multidisciplinary expertise is the basic means for novel interfaces, including perception and emotion recognition, semantic analysis, cognition, modelling and expression generation, and production of multimodal avatars capable of adapting to the goals and context of interaction.
Humans function through four primary modes of being, i.e., affect, motivation, cognition, and behavior; these are related to feeling, wanting, thinking, and acting.
Affect is particularly difficult, because it requires understanding and modelling the causes and consequences of emotions. Modelling the consequences, especially as realized in behavior, is a daunting task.

everyday emotional states
dramatic extremes (terror, misery, elation) are fascinating, but marginal for HCI
the target of an affect-aware system
–register everyday states with an emotional component: excitement, boredom, irritation, enthusiasm, stress, satisfaction, amusement
–achieve sensitivity to everyday emotional states
"I think you might be getting just a *wee* bit bored – maybe a coffee?"

affective computing applications
detect specific incidents/situations that need human intervention
–e.g. anger detection in a call center
naturalistic interfaces
–the keyboard/mouse/pointer paradigm can be difficult for the elderly, handicapped people or children
–speech and gesture interfaces can be useful

the EU perspective
Until 2002, related research was dominated by mid-scale projects
–ERMIS: multimodal emotion recognition (facial expressions, linguistic and prosody analysis)
–NECA: networked affective ECAs
–SAFIRA: affective input interfaces
–NICE: Natural Interactive Communication for Edutainment
–MEGA: Multisensory Expressive Gesture Applications
–INTERFACE: Multimodal Analysis/Synthesis System for Human Interaction to Virtual and Augmented Environments

the EU perspective
FP6 (2002-2006) issued two calls for multimodal interfaces
–Call 1 (April 2003) and Call 5 (September 2005) covering multimodal and multilingual areas
–Integrated Projects: AMI (Augmented Multi-party Interaction) and CHIL (Computers In the Human Interaction Loop)
–Networks of Excellence: HUMAINE and SIMILAR
–Other calls covered “Leisure and entertainment”, “e-Inclusion”, “Cognitive systems” and “Presence and Interaction”

the HUMAINE Network of Excellence
FP6 Call 1 Network of Excellence: Research on Emotions and Human-Machine Interaction
start: 1st January 2004, duration: 48 months
IST thematic priority: Multimodal Interfaces
–emotions in human-machine interaction
–creation of a new, interdisciplinary research community
–advancing the state of the art in a principled way

the HUMAINE Network of Excellence
33 partner groups from 14 countries
coordinated by Queen’s University of Belfast
goals of HUMAINE:
–integrate existing expertise in psychology, computer engineering, cognition, interaction and usability
–promote shared insight
http://emotion-research.net

moving forward
future EU orientations include (extracted from Call 1 evaluation, 2004):
–adaptability and re-configurable interfaces
–collaborative technologies and interfaces in the arts
–less explored modalities, e.g. haptics, bio-sensing
–affective computing, including character and facial expression recognition and animation
–more product innovation and industrial impact
FP7 direction: Simulation, Visualization, Interaction, Mixed Reality
–blending semantic/knowledge and interface technologies

the special session
segment-based approach to the recognition of emotions in speech
–M. Shami, M. Kamel, University of Waterloo
comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition
–T. Vogt, E. Andre, University of Augsburg
annotation and detection of blended emotions in real human-human dialogs recorded in a call center
–L. Vidrascu, L. Devillers, LIMSI-CNRS, France
a real-time lip sync system using a genetic algorithm for automatic neural network configuration
–G. Zoric, I. Pandzic, University of Zagreb
visual/acoustic emotion recognition
–Cheng-Yao Chen, Yue-Kai Huang, Perry Cook, Princeton University
an intelligent system for facial emotion recognition
–R. Cowie, E. Douglas-Cowie, Queen’s University of Belfast, J. Taylor, King's College, S. Ioannou, M. Wallace, IVML/NTUA

the big picture
feature extraction from multiple modalities
–prosody, words, face, gestures, biosignals…
unimodal recognition
multimodal recognition
using detected features to cater for affective interaction

audiovisual emotion recognition
the core system combines modules dealing with
–visual signs
–linguistic content of speech (what you say)
–paralinguistic content (how you say it)
and recognition based on all the signs

facial analysis module
face detection, i.e. finding a face without prior information about its location
using prior knowledge about where to look
–face tracking
–extraction of key regions and points in the face
–monitoring of movements over time (as features for user’s expressions/emotions)
provide confidence level for the validity of each detected feature

facial analysis module
face detection, obtained through SVM classification
facial feature extraction, by robust estimation of the primary facial features, i.e., eyes, mouth, eyebrows and nose
fusion of different extraction techniques, with confidence level estimation
MPEG-4 FP and FAP feature extraction to feed the expression and emotion recognition task
3-D modeling for improved accuracy in FP and FAP feature estimation, at an increased computational load, when the facial user model is known
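The fusion of several extraction techniques with a confidence level could look roughly like the sketch below. It assumes each technique returns a point estimate and a confidence in [0, 1]; the function name `fuse_estimates` and the confidence-weighted averaging rule are illustrative assumptions, not the method used in the presented system.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fuse_estimates(estimates: List[Tuple[Point, float]]) -> Tuple[Point, float]:
    """Fuse (point, confidence) pairs into one point and an overall confidence.

    Assumption: confidences lie in [0, 1]; a higher value means a more
    trustworthy estimate from that extraction technique.
    """
    total = sum(conf for _, conf in estimates)
    if total == 0:
        raise ValueError("no extractor reported a non-zero confidence")
    # Confidence-weighted mean of the candidate positions.
    x = sum(point[0] * conf for point, conf in estimates) / total
    y = sum(point[1] * conf for point, conf in estimates) / total
    # Overall confidence: confidence-weighted mean of the confidences themselves.
    overall = sum(conf * conf for _, conf in estimates) / total
    return (x, y), overall


# Two hypothetical techniques locating the same mouth corner (pixel coordinates).
fused_point, fused_conf = fuse_estimates([((10.0, 20.0), 0.5), ((14.0, 24.0), 0.5)])
```

A weighted mean is only one option; a system could equally keep the single most confident estimate or reject estimates below a threshold.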

facial analysis module
(figures) the extracted mask for the eyes; detected feature points in the masks

FAP estimation
Absence of a clear quantitative definition of FAPs
It is possible to model FAPs through the movement of FDP feature points, using distances s(x,y)
e.g. close_t_r_eyelid ($F_{20}$) - close_b_r_eyelid ($F_{22}$):
$D_{13} = s(3.2, 3.4)$
$f_{13} = D_{13} - D_{13\text{-NEUTRAL}}$
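The eyelid example can be sketched in code as follows. The sketch assumes that s(x,y) is the Euclidean distance between two feature points given as 2D image coordinates; the dictionary keys, the coordinate values and the function name `eyelid_feature` are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def s(p: Point, q: Point) -> float:
    """Distance between two feature points (assumed Euclidean here)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def eyelid_feature(current: Dict[str, Point], neutral: Dict[str, Point]) -> float:
    """f13 = D13 - D13-NEUTRAL, with D13 = s(3.2, 3.4)."""
    d13 = s(current["3.2"], current["3.4"])
    d13_neutral = s(neutral["3.2"], neutral["3.4"])
    return d13 - d13_neutral


# Hypothetical coordinates: the eyelids are 10 px apart in the neutral frame
# and 6 px apart in the current frame, so the feature is negative (eye closing).
neutral_frame = {"3.2": (0.0, 0.0), "3.4": (0.0, 10.0)}
current_frame = {"3.2": (0.0, 0.0), "3.4": (0.0, 6.0)}
f13 = eyelid_feature(current_frame, neutral_frame)
```

Measuring against the neutral frame makes the feature describe a change of expression rather than the geometry of a particular face.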

face detection
(diagram) stages: quick rejection (variance, skin color), preprocessing, subspace projection, SVM classifier; output: classify as face / no face
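The stages named in the diagram can be chained as in the sketch below. Everything concrete is an assumption made for illustration: the thresholds, the z-score preprocessing, the projection basis and the linear decision function (a trained SVM would normally supply its own support vectors and kernel).

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def is_face(window: Sequence[float], skin_fraction: float,
            basis: Sequence[Sequence[float]], weights: Sequence[float],
            bias: float, min_variance: float = 100.0,
            min_skin_fraction: float = 0.4) -> bool:
    """Classify one candidate window (flattened grey levels) as face / no face."""
    # Stage 1: quick rejection of flat or non-skin windows (hypothetical thresholds).
    variance = pvariance(window)
    if variance <= 0 or variance < min_variance or skin_fraction < min_skin_fraction:
        return False
    # Stage 2: preprocessing, here a simple zero-mean, unit-variance normalization.
    mean = fmean(window)
    std = math.sqrt(variance)
    normalized = [(value - mean) / std for value in window]
    # Stage 3: projection onto a low-dimensional subspace (basis assumed precomputed).
    projected = [dot(normalized, axis) for axis in basis]
    # Stage 4: linear SVM-style decision function, sign(w . z + b).
    return dot(projected, weights) + bias > 0.0


# Toy 4-pixel window with a single hypothetical basis vector.
accepted = is_face([0, 255, 0, 255], 0.9, [[0, 1, 0, 1]], [1.0], -1.0)
```

Placing the cheap variance and skin-color checks first means most candidate windows never reach the more expensive projection and classification stages.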

face detection
(figures) detected face; estimation of the active contour of the face; extraction of the facial area

facial feature extraction
extraction of eyes and mouth – key regions within the face
extraction of MPEG-4 Facial Points (FPs) – key points in the eye & mouth regions

other visual features
visemes, eye gaze, head pose
–movement patterns, temporal correlations
hand gestures, body movements
–deictic/conversational gestures
–“body language”
measurable parameters to render expressivity on affective ECAs
–spatial extent, repetitiveness, volume, etc.

video analysis using 3D
Step 1: Scan or approximate the 3D model (in this case estimated from video data only, using a face space approach)

video analysis using 3D
Step 2: Represent the 3D model using a predefined template geometry; the same template is used for expressions. This template has higher density around the eyes and mouth, and lower density around flatter areas such as the cheeks and forehead.

video analysis using 3D
Step 3: Construct a database of facial expressions by recording various actors. The statistics derived from these performances are stored in terms of a “Dynamic Face Space”
Step 4: Apply the expressions to the actor in the video data

video analysis using 3D
Step 5: Matching: rotate the head, apply various expressions and match the current state with the 2D video frame (global minimization process)

video analysis using 3D
the global matching/minimization process is complex
it is sensitive to
–illumination, which may vary across the sequence
–shading and shadowing effects on the face
–color changes or color differences
–variability in expressions: some expressions cannot be generated using the statistics of the a priori recorded sequences
it is time consuming (several minutes per frame)

video analysis using 3D
(figures) local template matching; pose estimation

video analysis using 3D
(figure) 3D models

video analysis using 3D
(figure) add expressions

auditory module
linguistic analysis aims to extract the words that the speaker produces
paralinguistic analysis aims to extract significant variations in the way words are produced, mainly in pitch, loudness, timing, and ‘voice quality’
both are designed to cope with the less than perfect signals that are likely to occur in real use

linguistic analysis
(diagram) (a) The Linguistic Analysis Subsystem; (b) The Speech Recognition Module
blocks: Enhanced Speech Signal, Parameter Extraction Module, Search Engine, Dictionary, Acoustic Modeling, Language Modeling, Text

paralinguistic analysis
ASSESS, developed by QUB, describes speech at multiple levels
–intensity & spectrum; edits, pauses, frication; raw pitch estimates & a smooth fitted curve; rises & falls in intensity & pitch
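As a rough illustration of two of these levels (a smoothed curve fitted to raw pitch estimates, then its rises and falls), consider the sketch below. It is not the ASSESS algorithm: the moving-average smoother, the function names and the pitch values are assumptions chosen for brevity.

```python
from typing import List, Optional, Tuple


def smooth(values: List[float], width: int = 3) -> List[float]:
    """Moving average, standing in for a smooth curve fitted to raw estimates."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def rises_and_falls(contour: List[float]) -> List[Tuple[str, int, int]]:
    """Split a contour into maximal runs: (label, start_index, end_index).

    Flat steps are absorbed into the run that surrounds them.
    """
    segments: List[Tuple[str, int, int]] = []
    direction: Optional[str] = None
    start = 0
    for i in range(1, len(contour)):
        delta = contour[i] - contour[i - 1]
        step = "rise" if delta > 0 else "fall" if delta < 0 else direction
        if direction is not None and step != direction:
            segments.append((direction, start, i - 1))
            start = i - 1  # the turning point belongs to both runs
        direction = step
    if direction is not None:
        segments.append((direction, start, len(contour) - 1))
    return segments


# Hypothetical pitch estimates in Hz, one per analysis frame.
pitch = [100.0, 110.0, 120.0, 115.0, 105.0, 108.0]
segments = rises_and_falls(pitch)
```

The same segmentation applies unchanged to an intensity contour, which is why the slide lists rises and falls for both intensity and pitch.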

integrating the evidence
level 1:
–facial emotion
–phonetic emotion
–linguistic emotion
level 2:
–“total” emotional state (result of the "level 1 emotions")
modeling technique: fuzzy set theory (research by Massaro suggests this models the way humans integrate signs)
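One simple way to picture the step from level 1 to level 2 is a multiplicative combination of fuzzy membership degrees followed by normalization, in the spirit of Massaro's work on how humans integrate cues. The sketch below is an assumption made for illustration; the slides do not state which combination rule the system uses, and the membership values are invented.

```python
from typing import Dict


def integrate(level1: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine per-modality memberships into one "total" state per emotion.

    level1 maps a modality name to {emotion: degree of support in [0, 1]}.
    """
    if not level1:
        raise ValueError("at least one modality is required")
    # Only emotions rated by every modality can be combined.
    emotions = set.intersection(*(set(m) for m in level1.values()))
    support = {}
    for emotion in emotions:
        product = 1.0
        for memberships in level1.values():
            product *= memberships[emotion]
        support[emotion] = product
    total = sum(support.values())
    if total == 0:
        raise ValueError("no emotion is supported by all modalities")
    # Normalize so that the level-2 degrees sum to 1.
    return {emotion: value / total for emotion, value in support.items()}


# Hypothetical level-1 outputs of the three modules.
level2 = integrate({
    "facial": {"anger": 0.8, "joy": 0.2},
    "phonetic": {"anger": 0.6, "joy": 0.4},
    "linguistic": {"anger": 0.5, "joy": 0.5},
})
```

With a product rule, an uninformative modality (0.5 for both emotions here) leaves the outcome unchanged, while agreement between modalities sharpens it.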

integrating the evidence
mechanisms linking attention and emotion in the brain form a useful model
(diagram) Goals (Inhibition) ACG; Salience NBM (ACh source); Valence (Amygdala); Goals (SFG); Thalamus / Superior Colliculus; IMC (heteromodal CX); Visual Input

biosignal analysis
different emotional expressions produce different changes in autonomic activity:
–anger: increased heart rate and skin temperature
–fear: increased heart rate, decreased skin temperature
–happiness: decreased heart rate, no change in skin temperature
easily integrated with external channels (face and speech)
presentation by J. Kim in the HUMAINE WP4 workshop, September 2004
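The three patterns above can be written as a small lookup, shown here purely as an illustration. The tolerances that define "no change", the units and the function names are hypothetical, and a rule table of this kind ignores the differences between individuals that make physiological recognition hard in practice.

```python
from typing import Dict, Tuple


def direction(change: float, tolerance: float) -> int:
    """Return +1 for an increase, -1 for a decrease, 0 for no change."""
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


# (heart rate direction, skin temperature direction) -> emotion, as listed above.
PATTERNS: Dict[Tuple[int, int], str] = {
    (1, 1): "anger",
    (1, -1): "fear",
    (-1, 0): "happiness",
}


def classify_autonomic(heart_rate_change: float, skin_temp_change: float,
                       hr_tolerance: float = 1.0,
                       temp_tolerance: float = 0.1) -> str:
    """Map changes relative to a baseline onto one of the listed patterns."""
    key = (direction(heart_rate_change, hr_tolerance),
           direction(skin_temp_change, temp_tolerance))
    return PATTERNS.get(key, "unknown")


# Hypothetical readings: heart rate up by 8 beats per minute, skin cooler by 0.5 degrees.
label = classify_autonomic(8.0, -0.5)
```

Any combination not listed falls through to "unknown" rather than being forced into one of the three classes.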

biosignal analysis
BVP – Blood volume pulse
EMG – Muscle tension
EKG – Heart rate
Respiration – Breathing rate
Temperature
GSR – Skin conductivity
Acoustics and noise
EEG – Brain waves

biosignal analysis
skin-sensing requires physical contact
need to improve accuracy, robustness to motion artifacts
–vulnerable to distortion
most research measures artificially elicited emotions in a lab environment and from a single subject
different individuals show emotion with different responses in autonomic channels (hard for multi-subjects)
physiological emotion recognition is rarely studied; the literature offers ideas rather than well-defined solutions

multimodal emotion recognition
recognition models: application dependency
–discrete / dimensional / appraisal theory models
theoretical models of multimodal integration
–direct / separate / dominant / motor integration
modality synchronization
–visemes / EMGs & FAPs / SC-RSP & speech
temporal evolution and modality sequentiality
–multimodal recognition techniques
classifiers + context + goals + cognition/attention + modality significance in interaction
