Multimodal Caricatural Mirror
Olivier Martin, UCL (Belgium)
Project Goals
Create a multimodal caricatural mirror:
- Multimodal = facial + vocal
- Caricatural = amplify emotions
- Mirror = face your avatar!
Motivations
- Emotion recognition for intelligent systems
- Modelling emotions for emotion synthesis
- Interactions: a database of real emotions
- The multimodal gain
Technical challenges
- Multimodal face tracking
- Facial features’ extraction
- Vocal features’ extraction
- Multimodal emotion recognition
- Multimodal emotion synthesis
Multimodal Face Tracking
Automatic tracking of the face, based upon:
- Skin colour information
- Ellipsoid-shape properties (Hough transform, …)
- Luminance/chrominance gradient
- Pre-segmentation of the user’s body
- An array of microphones
- Inferring the face from facial features
- …
Skin detection
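As a rough illustration of the skin-colour step above, here is a minimal sketch of chrominance-based skin segmentation, assuming OpenCV and NumPy are available; the Cr/Cb thresholds are illustrative values, not the project’s calibrated ones.

```python
import cv2
import numpy as np

def skin_mask(bgr_frame):
    """Binary mask of skin-coloured pixels in a BGR frame."""
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    # Typical skin cluster in the Cr/Cb plane (illustrative bounds).
    lower = np.array([0, 133, 77], dtype=np.uint8)    # Y, Cr, Cb
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Remove isolated pixels before looking for a face-sized blob.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```

The largest connected component of this mask would then be a candidate face region to pass on to the shape-based checks.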
Trace Transform using luminance gradient
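The trace transform evaluates a functional along every line crossing the image; with the plain summation functional it reduces to the Radon transform, which the sketch below uses as a stand-in on the luminance-gradient magnitude (NumPy and scikit-image assumed).

```python
import numpy as np
from skimage.transform import radon

def gradient_trace(luminance, angles=None):
    """Sum the luminance-gradient magnitude along lines at many orientations."""
    if angles is None:
        angles = np.linspace(0.0, 180.0, 90, endpoint=False)  # degrees
    gy, gx = np.gradient(luminance.astype(float))
    magnitude = np.hypot(gx, gy)
    # Each column of the result is one projection angle; strong face
    # contours show up as pronounced ridges in this sinogram.
    return radon(magnitude, theta=angles, circle=False)
```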
Technical challenges: next up, facial features’ extraction
Facial features’ extraction
Detect and track facial features:
- Localization: learning and/or heuristics
- Extraction: exploiting a priori knowledge
- Shape/contour information
- ‘Crucial points’ information (MPEG-4, …)
- Temporal ripples
- …
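One way to exploit a priori knowledge for localization, sketched below with purely illustrative anthropometric ratios (NumPy assumed): seed search windows for a few MPEG-4-style feature points from the face bounding box, then refine each window with a simple darkness heuristic for pupils and the mouth line.

```python
import numpy as np

def seed_feature_windows(face_box):
    """Map a face box (x, y, w, h) to rough search windows for key feature points.

    The ratios are hypothetical placeholders, not the project's values.
    """
    x, y, w, h = face_box
    return {
        "left_eye":  (x + int(0.20 * w), y + int(0.30 * h), int(0.25 * w), int(0.15 * h)),
        "right_eye": (x + int(0.55 * w), y + int(0.30 * h), int(0.25 * w), int(0.15 * h)),
        "mouth":     (x + int(0.30 * w), y + int(0.65 * h), int(0.40 * w), int(0.20 * h)),
    }

def refine_by_darkness(luminance, window):
    """Snap a seed window to its darkest pixel: a crude pupil/mouth heuristic."""
    wx, wy, ww, wh = window
    patch = luminance[wy:wy + wh, wx:wx + ww]
    iy, ix = np.unravel_index(np.argmin(patch), patch.shape)
    return wx + ix, wy + iy
```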
Facial features’ extraction (example images)
‘Emotional Mask’
Technical challenges: next up, vocal features’ extraction
Vocal features’ extraction
- Pitch, energy, speaking rate, noise, MFCCs, … are related to ‘the way we speak’ (prosody)
- Statistics over the features (mean, standard deviation, envelope, …)
- Learning strategy for feature selection, for each emotion (forward/backward selection)
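A minimal sketch of the statistics-plus-forward-selection idea, assuming frame-level features are already extracted and scikit-learn is available; the SVM classifier and cross-validation scoring are stand-ins, not the project’s actual selection criterion.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def utterance_stats(frame_features):
    """Collapse a (n_frames, n_features) matrix into per-feature statistics."""
    return np.concatenate([
        frame_features.mean(axis=0),
        frame_features.std(axis=0),
        frame_features.max(axis=0) - frame_features.min(axis=0),  # range, a crude envelope proxy
    ])

def forward_select(X, y, max_features=10):
    """Greedy forward selection: add the feature that best improves CV accuracy."""
    selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
    while remaining and len(selected) < max_features:
        scores = {j: cross_val_score(SVC(), X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no candidate improves the score, stop early
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_score
```

Backward elimination is the mirror image: start from all features and greedily drop the one whose removal hurts the score least.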
Technical challenges: next up, multimodal emotion recognition
Multimodal emotion recognition
- Compare monomodal systems’ performance with the multimodal system’s performance, for each emotion
- Build intelligent classifiers
- How to synchronize the modalities?
- Fusion at which level of the decision process? (signal level vs. semantic level)
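To make the two fusion levels concrete, here is a hedged sketch with generic scikit-learn classifiers standing in for the project’s models; the weighting used in the decision-level combination is an illustrative choice.

```python
import numpy as np
from sklearn.svm import SVC

def feature_level_fusion(face_feats, voice_feats, labels):
    """Signal-level fusion: concatenate synchronized feature vectors,
    then train a single classifier on the joint vector."""
    X = np.hstack([face_feats, voice_feats])
    return SVC(probability=True).fit(X, labels)

def decision_level_fusion(face_clf, voice_clf, face_x, voice_x, w_face=0.5):
    """Semantic-level fusion: combine per-modality posteriors and pick
    the most likely emotion (both classifiers share the same label set)."""
    p = w_face * face_clf.predict_proba(face_x) \
        + (1.0 - w_face) * voice_clf.predict_proba(voice_x)
    return face_clf.classes_[np.argmax(p, axis=1)]
```

Feature-level fusion forces a common alignment of the modalities up front, while decision-level fusion lets each modality run on its own timescale and only merges the final emotion scores.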
Technical challenges: next up, multimodal emotion synthesis
Multimodal emotion synthesis
- How to amplify the expression of an emotion?
- Build an effective and realistic mapping
- Synchronisation (lips!)
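One simple reading of “amplify”, sketched below under the assumption that the expression is represented as feature-point positions (e.g. MPEG-4 style): scale each point’s displacement from the neutral face by a caricature gain, with an optional clip to keep the result plausible. The gain value is illustrative.

```python
import numpy as np

def amplify_expression(points, neutral_points, gain=1.5, max_disp=None):
    """Exaggerate an expression by scaling each feature point's offset
    from its neutral position by `gain`."""
    displacement = points - neutral_points
    amplified = gain * displacement
    if max_disp is not None:
        # Bound displacements so the caricature stays anatomically plausible.
        amplified = np.clip(amplified, -max_disp, max_disp)
    return neutral_points + amplified
```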
Real-time aspects
- Ideally, the facial modality should run in real time
- The vocal modality need not run in real time
- Goal: minimize the delay between the end of the user’s actions and the system’s reaction
Technology
This has to be discussed within the team…
- Two types of machine-learning techniques seem efficient, and we have the skills: Support Vector Machines and Dynamic Bayesian Networks
- Powerful animation engines (Maya, 3DSMax, …)
- Communication between modules: OpenInterface
The Team!
- Jordi Adell (UPC, Barcelona)
- Ana Huerta (T.U. Madrid)
- Irene Kotsia (A.U. Thessaloniki)
- Benoit Macq (UCL, Belgium)
- Olivier Martin (UCL, Belgium)
- Hannes Pirker (OFAI, Vienna)
- Arman Savran (Boun, Istanbul)
- Rafaël Sebbe (TCTS, Mons)
- [Alexandre Benoît (INPG, Grenoble)]