Facial Expression Analysis: Theoretical Results
– Low-level and mid-level segmentation
– High-level feature extraction for expression analysis (FACS – MPEG-4 FAPs)
Research Issues
– Which models/features (spatial/temporal)
– Which emotion representation
– Generalization over races/individuals
– Environment, context
– Multimodal analysis, synchronization (hand gestures, postures, visemes, pauses)
Emotion analysis system overview
– f: values derived from the calculated distances
– G: the value of the corresponding FAP
Multiple-cue facial feature boundary extraction: eyes & mouth, eyebrows, nose
– Edge-based mask
– Intensity-based mask
– NN-based mask (Y, Cr, Cb and DCT coefficients of the neighborhood)
– Each mask is validated independently
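Once each cue-specific mask has been validated independently, the cues must be combined into a final feature mask. A minimal sketch of one such fusion rule – pixel-wise majority voting over the three binary masks – is shown below; the function name and the voting rule are illustrative assumptions, not the exact validation scheme of the slides.

```python
import numpy as np

def fuse_masks(edge_mask, intensity_mask, nn_mask):
    """Combine three independently validated binary feature masks
    (edge-based, intensity-based, NN-based) by pixel-wise majority vote.

    A pixel belongs to the fused mask when at least two of the
    three cues agree on it.  All masks are boolean arrays of the
    same shape.
    """
    votes = (edge_mask.astype(int)
             + intensity_mask.astype(int)
             + nn_mask.astype(int))
    return votes >= 2
```

Majority voting is only one option; a real system could also weight the cues by their individual validation scores.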
Multiple cue feature extraction – an example
Final mask validation through anthropometry
– Facial distances measured by the US Army over a 30-year period, with male/female separation
– The measured distances are normalized by division with Distance 7, i.e. the distance between the inner corners of the left and right eyes – two points a human cannot move
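The normalization step above can be sketched as follows. The dictionary-based interface and the distance names are illustrative assumptions; only the rule itself – divide every measured distance by Distance 7 – comes from the slides.

```python
import math

def normalize_distances(distances, inner_eye_left, inner_eye_right):
    """Normalize measured facial distances by Distance 7, the distance
    between the inner corners of the left and right eyes.

    `distances` maps a distance name to its measured value in pixels;
    the two eye-corner arguments are (x, y) points.  Dividing by
    Distance 7 makes the values scale-invariant, so they can be
    checked against population anthropometric statistics.
    """
    d7 = math.dist(inner_eye_left, inner_eye_right)
    return {name: d / d7 for name, d in distances.items()}
```

A validated mask would then be rejected when a normalized distance falls outside the anthropometric range for that distance.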
Detected Feature Points (FPs)
FAPs estimation
– Absence of a clear quantitative definition of FAPs
– FAPs can be modeled through FDP feature-point movement using distances s(x, y)
– e.g. close_t_r_eyelid (F20) – close_b_r_eyelid (F22): D13 = s(3.2, 3.4), f13 = D13 – D13-NEUTRAL
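The eyelid example can be sketched directly from the definitions above: a FAP value is estimated as the change of a feature-point distance relative to the neutral face. The coordinates in the usage comment are made-up illustrations; a real system would additionally express the result in MPEG-4 FAP units.

```python
import math

def fap_from_distance(p_a, p_b, p_a_neutral, p_b_neutral):
    """Estimate a FAP value as f = D - D_NEUTRAL, where D = s(a, b) is
    the distance between two FDP feature points in the current frame
    and D_NEUTRAL is the same distance on the neutral face.

    For f13: p_a, p_b are FDP points 3.2 and 3.4 (top and bottom of
    the right eyelid), so f13 > 0 means the eye is more open than
    in the neutral state.
    """
    d = math.dist(p_a, p_b)
    d_neutral = math.dist(p_a_neutral, p_b_neutral)
    return d - d_neutral
```
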
Sample Profiles of Anger
A1: F4 ∈ [22, 124], F31 ∈ [-131, -25], F32 ∈ [-136, -34], F33 ∈ [-189, -109], F34 ∈ [-183, -105], F35 ∈ [-101, -31], F36 ∈ [-108, -32], F37 ∈ [29, 85], F38 ∈ [27, 89]
A2: F19 ∈ [-330, -200], F20 ∈ [-335, -205], F21 ∈ [200, 330], F22 ∈ [205, 335], F31 ∈ [-200, -80], F32 ∈ [-194, -74], F33 ∈ [-190, -70], F34 ∈ [-190, -70]
A3: F19 ∈ [-330, -200], F20 ∈ [-335, -205], F21 ∈ [200, 330], F22 ∈ [205, 335], F31 ∈ [-200, -80], F32 ∈ [-194, -74], F33 ∈ [70, 190], F34 ∈ [70, 190]
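A profile of this kind is simply a set of admissible intervals, one per FAP, so rule-based matching reduces to interval-membership tests. The sketch below encodes profile A3 from the slides; the dict representation and the all-FAPs-must-match rule are implementation assumptions.

```python
def matches_profile(fap_values, profile):
    """Return True when every FAP listed in the profile falls inside
    its admissible interval.

    `fap_values` maps FAP number -> estimated value;
    `profile` maps FAP number -> (low, high) interval.
    A FAP missing from `fap_values` fails the test (NaN compares False).
    """
    return all(
        low <= fap_values.get(f, float("nan")) <= high
        for f, (low, high) in profile.items()
    )

# Anger profile A3 from the slides (values in MPEG-4 FAP units).
ANGER_A3 = {
    19: (-330, -200), 20: (-335, -205), 21: (200, 330), 22: (205, 335),
    31: (-200, -80), 32: (-194, -74), 33: (70, 190), 34: (70, 190),
}
```

An expression would then be classified as anger when the estimated FAP vector matches any of the profiles A1, A2, A3.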
Problems
Low-level segmentation
– Environmental changes
– Illumination
– Pose
– Capturing-device characteristics
– Noise
Problems
Low-level to high-level feature (FAP) generation
– Accuracy of estimation
– Validation of results: anthropometric/psychological constraints, 3D information, analysis by synthesis
– Adaptation to context
Problems
Statistical / rule-based recognition of high-level features
– Definition of general rules
– Adaptation of rules to context/individuals
– Multimodal, dynamic recognition: speech/face/gesture/biosignal/temporal
  – Relation between modalities (significance, attention, adaptation)
  – Neurofuzzy approaches
– Portability of systems to avatars/applications (ontologies, languages)