Multimodal Caricatural Mirror FINAL PROJECT PRESENTATION
Project Summary
Create a multimodal caricatural mirror:
Multimodal = facial + vocal
Caricatural = amplify emotions
Mirror = face your avatar!
Outline
Project Architecture
Visual Modality: Face Tracking, Facial Features Detection, Facial Features Tracking, Facial Expression Recognition, Emotion Modeling, Facial Animation
Audio Modality: Vocal Features Extraction, Prosody Amplification
Multimodal Fusion: Multimodal Synchronized Emotion Synthesis
Project Architecture
The ‘Mamama’ option
Visual chain: User → Face Tracking → Facial Features Tracking → Emotion Recognition → Facial Animation → Avatar
Audio chain: Speech Signal → Prosodic Features Extraction → Prosody Processing (t’ = f(t)) → Prosody Amplification
Fusion of the two chains drives the synchronized avatar
Face Tracking
We chose to use open-source software: the OpenCV face tracker
Trained on a large database (no tuning necessary)
Color tracking using the CAMSHIFT algorithm (based on Mean-Shift)
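A minimal sketch of colour-based CAMSHIFT tracking with OpenCV, assuming an initial face rectangle is already available; the HSV thresholds and termination criteria are illustrative assumptions, not the project's settings:

```python
# Hedged sketch: colour-based face tracking with OpenCV's CAMSHIFT,
# given an initial face box (x, y, w, h) on the first frame.
import cv2
import numpy as np

def track_face(video_path, init_box):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    x, y, w, h = init_box

    # Hue histogram of the initial face region serves as the colour model.
    roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
    hist = cv2.calcHist([roi], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = (x, y, w, h)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CAMSHIFT adapts the search window size and orientation each frame.
        rot_rect, window = cv2.CamShift(backproj, window, term)
        yield rot_rect
    cap.release()
```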
Facial Features Detection
Step 1: Facial Features Detection (1st frame)
Transformation to a grayscale image
Computation of the image’s trace transform (luminance along M vertical lines)
From the sets of local minima, infer the positions of the facial features (eyebrows, eyes and mouth) using a priori knowledge of face morphology (heuristics)
Automatic initialization of the Candide grid (1st frame)
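A rough sketch of scanning vertical lines for luminance minima, assuming the face region is already cropped to grayscale; the number of lines, smoothing window and minimum spacing are placeholders, not the original heuristics:

```python
# Hedged sketch: candidate facial-feature rows from luminance minima along
# M vertical scan lines of the grayscale face region.
import numpy as np
from scipy.signal import argrelmin

def feature_candidates(gray_face, m_lines=10, smooth=9):
    h, w = gray_face.shape
    columns = np.linspace(0, w - 1, m_lines).astype(int)
    candidates = []
    for x in columns:
        profile = gray_face[:, x].astype(float)
        profile = np.convolve(profile, np.ones(smooth) / smooth, mode="same")
        minima = argrelmin(profile, order=5)[0]  # dark rows: brows, eyes, mouth
        candidates.append((x, minima))
    return candidates  # to be matched against face-morphology heuristics
```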
Facial Features Tracking
Step 2: Facial Features Tracking (all frames > 1)
Emotion Recognition (visual modality)
We use Support Vector Machines (SVMs) as the emotion classifier:
Find the hyperplanes such that the margins between classes are maximized, in the feature space induced by an appropriate kernel function
Classification is performed for every frame, with the possibility of introducing temporal dependencies between successive decisions to recover from short tracking errors (not ‘error bursts’)
Good robustness against overfitting (only the support vectors, i.e. the training samples defining the margins, are kept)
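A minimal sketch of per-frame SVM classification with scikit-learn, followed by a simple majority-vote smoothing over successive decisions; the feature vectors, labels, window size and smoothing rule are assumptions for illustration, not the project's actual setup:

```python
# Hedged sketch: per-frame emotion classification with an SVM.
# The real features would come from the tracked Candide grid deformations.
import numpy as np
from sklearn.svm import SVC

# X_train: one row of facial-feature measurements per frame; y_train: labels.
X_train = np.random.rand(200, 12)
y_train = np.random.choice(["neutral", "happy", "angry", "surprised"], 200)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

def smooth(labels, win=5):
    """Majority vote over a short temporal window to fix isolated errors."""
    labels = list(labels)
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - win // 2): i + win // 2 + 1]
        out.append(max(set(window), key=window.count))
    return out

frame_decisions = clf.predict(np.random.rand(50, 12))
smoothed = smooth(frame_decisions)
```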
Emotion Modeling
Facial Animation
Among 3D face models, we chose Candide-3 for the animation
It includes animation units and MPEG-4 FAPs
The animation software is written in C++ using the OpenGL and SDL APIs, which are open source and run on many platforms
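To illustrate how a Candide-style model is typically driven, here is a minimal sketch of deforming the mesh with shape-unit and animation-unit parameters; the matrix names and shapes are assumptions for illustration, not the project's C++ engine:

```python
# Hedged sketch: Candide-style mesh deformation
#   g = g_mean + S @ sigma + A @ alpha
# where S holds static shape units and A holds animation units.
import numpy as np

def deform_mesh(g_mean, S, A, sigma, alpha):
    """g_mean: (3N,) mean vertex coordinates; S: (3N, ns); A: (3N, na)."""
    return g_mean + S @ sigma + A @ alpha

def caricature(alpha, gain=1.8):
    """Amplify a recognized expression by scaling its animation parameters."""
    return gain * np.asarray(alpha)
```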
Vocal Features Extraction
Pitch Variations Amplification
Pitch is extracted using an algorithm based on the autocorrelation function
Pitch variations are then modified using PSOLA
Figure: the downtrend of the pitch is first removed, then the pitch movements are amplified, and finally the downtrend is set back
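A minimal sketch of autocorrelation-based pitch estimation and of amplifying the pitch movements around the downtrend; the pitch range, voicing check and amplification factor are assumptions, and the PSOLA resynthesis step itself is not shown:

```python
# Hedged sketch: frame-wise pitch from the autocorrelation function,
# plus contour amplification around a linear downtrend.
import numpy as np

def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag if ac[lag] > 0 else 0.0   # 0.0 = crude "unvoiced" guess

def amplify_contour(f0, factor=1.5):
    # Remove the downtrend (linear fit), scale the residual movements,
    # then add the downtrend back, as described on the slide.
    t = np.arange(len(f0))
    trend = np.polyval(np.polyfit(t, f0, 1), t)
    return trend + factor * (f0 - trend)
```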
‘Speaking Rate’ Processing
Low-pass filtering (emphasize voiced regions)
Sliding window (~0.75 s): count the number of energy maxima
Estimation of the ‘speaking rate’ over time
‘Speaking-rate’ distortion function: sigmoidal transition function
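A rough sketch of estimating the speaking rate by counting energy maxima in a sliding window over the low-pass-filtered signal; the cutoff frequency, hop size and peak-picking thresholds are illustrative assumptions:

```python
# Hedged sketch: crude speaking-rate estimate from energy peaks
# in a ~0.75 s sliding window over the low-pass-filtered signal.
import numpy as np
from scipy.signal import butter, lfilter, find_peaks

def speaking_rate(signal, sr, win_s=0.75, hop_s=0.1):
    b, a = butter(4, 900 / (sr / 2), btype="low")   # emphasize voiced energy
    energy = lfilter(b, a, signal) ** 2
    win, hop = int(win_s * sr), int(hop_s * sr)
    rates = []
    for start in range(0, len(energy) - win, hop):
        peaks, _ = find_peaks(energy[start:start + win],
                              height=energy.mean(),
                              distance=int(0.08 * sr))
        rates.append(len(peaks) / win_s)   # maxima per second ~ syllable rate
    return np.array(rates)
```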
Multimodal Emotion Synthesis
Changing the ‘speaking rate’ is equivalent to changing the time scale
The ‘speaking rate processing’ stage generates a time-scale distortion function t’ = F(t)
This function is fed to the facial animation engine, which generates an animation synchronized with the output speech signal (‘mamama’ only ;-)
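A minimal sketch of how a monotone time-scale distortion t' = F(t) could be used to re-time the animation parameter tracks so they stay synchronized with the modified speech; the resampling scheme and output frame rate are assumptions:

```python
# Hedged sketch: re-timing animation keyframes with the time-scale
# distortion t' = F(t), assuming F is monotonically increasing.
import numpy as np

def warp_keyframes(times, params, F, fps=25):
    """times: (K,) original keyframe times; params: (K, P) parameter tracks."""
    warped = np.array([F(t) for t in times])      # original -> distorted time
    out_t = np.arange(0.0, warped[-1], 1.0 / fps)  # resample at the output rate
    # Interpolate each animation parameter track on the warped timeline.
    return np.stack([np.interp(out_t, warped, p) for p in params.T], axis=1)
```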
Come and try it yourself this afternoon!