Multimodal Caricatural Mirror
FINAL PROJECT PRESENTATION
Project Summary
Create a multimodal caricatural mirror:
- Multimodal = facial + vocal
- Caricatural = amplify emotions
- Mirror = face your avatar!
Outline
- Project Architecture
- Visual Modality:
  - Face Tracking
  - Facial Features Detection
  - Facial Features Tracking
  - Facial Expression Recognition
  - Emotion Modeling
  - Facial Animation
- Audio Modality:
  - Vocal Features Extraction
  - Prosody Amplification
- Multimodal Fusion:
  - Multimodal Synchronized Emotion Synthesis
Project Architecture
The 'Mamama' option
[Block diagram: the user's face feeds face tracking, facial features tracking and emotion recognition, which drive the facial animation of the avatar; the speech signal feeds prosodic features extraction, prosody processing (t' = f(t)) and prosody amplification; a fusion stage synchronizes the two streams.]
Face Tracking
We chose to use open-source software: the OpenCV face tracker.
- Trained on a large database (no tuning necessary)
- Color tracking using the CAMSHIFT algorithm (Mean-Shift based), as sketched below
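A minimal sketch of what CAMSHIFT color tracking looks like with OpenCV's C++ API; this is not the project's actual code, and the camera index, initial window and histogram size are illustrative assumptions:

```cpp
// Hedged sketch: CAMSHIFT color tracking with OpenCV's C++ API.
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    cv::Mat frame, hsv, backproj;
    cap >> frame;
    if (frame.empty()) return 1;

    // Assumed initial face window; in practice a face detector supplies it.
    cv::Rect window(frame.cols / 3, frame.rows / 3,
                    frame.cols / 3, frame.rows / 3);

    // Hue histogram of the initial face region (the tracked color model).
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    int histSize = 30;
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};
    cv::Mat roi(hsv, window), hist;
    cv::calcHist(&roi, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    for (;;) {
        cap >> frame;
        if (frame.empty()) break;
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        // Back-project the color model, then let CAMSHIFT climb to the
        // mode of the skin-color probability image (the Mean-Shift core).
        cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
        cv::RotatedRect face = cv::CamShift(
            backproj, window,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT,
                             10, 1));
        cv::ellipse(frame, face, cv::Scalar(0, 255, 0), 2);
        cv::imshow("mirror", frame);
        if (cv::waitKey(30) == 27) break;  // Esc quits
    }
    return 0;
}
```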
Facial Features Detection
Step 1: facial features detection (1st frame)
- Convert the image to grayscale
- Compute the image's trace transform (luminance along M vertical lines)
- From the sets of local minima, infer the positions of the facial features (eyebrows, eyes and mouth) using a priori knowledge of face morphology (heuristics); see the sketch after this list
- Automatically initialize the Candide grid (1st frame)
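A toy illustration of the luminance-minima idea: sample M vertical lines through the face box and keep the rows where luminance dips locally, since dark horizontal bands tend to be eyebrows, eyes and mouth. The function name and sampling scheme are hypothetical; the real trace transform and morphology heuristics are richer than this:

```cpp
// Hypothetical sketch: local luminance minima along M vertical lines.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<std::vector<int>> darkRows(const cv::Mat& gray,
                                       const cv::Rect& face, int M) {
    std::vector<std::vector<int>> minima(M);
    for (int m = 0; m < M; ++m) {
        // x-position of the m-th vertical line, evenly spread in the box.
        int x = face.x + (m + 1) * face.width / (M + 1);
        for (int y = face.y + 1; y < face.y + face.height - 1; ++y) {
            uchar v = gray.at<uchar>(y, x);
            if (v < gray.at<uchar>(y - 1, x) && v < gray.at<uchar>(y + 1, x))
                minima[m].push_back(y);  // candidate feature row
        }
    }
    return minima;  // heuristics then map clusters of rows to features
}
```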
Facial Features Tracking
Step 2: facial features tracking (every frame after the first)
Emotion Recognition (visual modality)
We use Support Vector Machines (SVMs) as the emotion classifier:
- Find the hyperplanes that maximize the margins between classes, working implicitly in the feature space induced by an appropriate kernel function
- Classification is performed on every frame, but temporal dependencies between successive decisions can be introduced to recover from short tracking errors (not 'error bursts'); see the sketch after this list
- Good robustness against overfitting (only the training samples that define the margins, i.e. the support vectors, are kept)
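A sketch of the classifier side. OpenCV's ml module is an assumed stand-in (the slides do not name the SVM library), the feature layout is illustrative, and the majority vote is just one simple way to realize the temporal smoothing of successive per-frame decisions:

```cpp
// Hedged sketch: per-frame SVM emotion classification with a short
// majority-vote history to smooth out isolated tracking errors.
#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <deque>
#include <map>

cv::Ptr<cv::ml::SVM> trainEmotionSvm(const cv::Mat& samples,  // CV_32F, one row per frame
                                     const cv::Mat& labels) { // CV_32S emotion ids
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);  // the 'appropriate kernel function'
    svm->train(samples, cv::ml::ROW_SAMPLE, labels);
    return svm;
}

// Majority vote over the last W frames: recovers isolated
// misclassifications from brief tracking errors, not long 'error bursts'.
int smoothedDecision(std::deque<int>& history, int decision, size_t W = 5) {
    history.push_back(decision);
    if (history.size() > W) history.pop_front();
    std::map<int, int> votes;
    for (int d : history) ++votes[d];
    int best = decision, bestCount = 0;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;
}
```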
Emotions Modeling
[figure-only slides]
Facial Animation
Among 3D face models, we chose Candide3 for the animation:
- It includes animation units and MPEG-4 FAPs
- The animation software is written in C++ using the OpenGL and SDL APIs, which are open source and run on many platforms
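For context, Candide-style models deform a neutral mesh linearly with weighted animation-unit displacements, roughly g = g_neutral + Σ_k α_k · AU_k. A minimal plain-C++ sketch of that blend, with illustrative names and none of the OpenGL/SDL plumbing:

```cpp
// Sketch of a Candide-style linear blend of animation units (AUs).
// Assumes each AU displacement field has one vector per mesh vertex
// and alpha holds one weight per AU.
#include <vector>

struct Vec3 { float x, y, z; };

std::vector<Vec3> animate(const std::vector<Vec3>& neutral,
                          const std::vector<std::vector<Vec3>>& aus,
                          const std::vector<float>& alpha) {
    std::vector<Vec3> out = neutral;
    for (size_t k = 0; k < aus.size(); ++k)
        for (size_t i = 0; i < out.size(); ++i) {
            out[i].x += alpha[k] * aus[k][i].x;
            out[i].y += alpha[k] * aus[k][i].y;
            out[i].z += alpha[k] * aus[k][i].z;
        }
    return out;  // deformed vertices, ready for the rendering pass
}
```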
Vocal Features Extraction
Pitch variations amplification:
- Pitch is extracted using an algorithm based on the autocorrelation function (see the sketch after this list)
- Pitch variations are then modified using PSOLA
Figure: the downtrend of the pitch is first removed, then the pitch movements are amplified, and finally the downtrend is set back
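A minimal sketch of frame-level pitch estimation from the autocorrelation function, plus the trend-based amplification suggested by the figure caption. The 60-400 Hz search range and the gain k are assumptions, and PSOLA resynthesis itself is not shown:

```cpp
// Hedged sketch: autocorrelation pitch estimate + trend-based amplification.
#include <vector>

// Pick the lag with the strongest autocorrelation inside an assumed
// 60-400 Hz pitch range; returns 0 when no lag is found (e.g. unvoiced).
double estimatePitch(const std::vector<double>& frame, double fs) {
    int minLag = static_cast<int>(fs / 400.0);
    int maxLag = static_cast<int>(fs / 60.0);
    if (minLag < 1) minLag = 1;
    double bestR = 0.0;
    int bestLag = 0;
    for (int lag = minLag; lag <= maxLag && lag < (int)frame.size(); ++lag) {
        double r = 0.0;
        for (size_t n = lag; n < frame.size(); ++n)
            r += frame[n] * frame[n - lag];
        if (r > bestR) { bestR = r; bestLag = lag; }
    }
    return bestLag > 0 ? fs / bestLag : 0.0;
}

// 'Remove downtrend, amplify movements, set downtrend back' in one step:
// p'[t] = d[t] + k * (p[t] - d[t]), with trend d[t] precomputed (e.g. a
// linear fit over the utterance) and k > 1 exaggerating the movements.
double amplifyPitch(double pitch, double trend, double k) {
    return trend + k * (pitch - trend);
}
```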
'Speaking Rate' Processing
- Low-pass filtering (emphasizes voiced regions)
- Sliding window (~0.75 s): count the energy maxima
- Estimation of the 'speaking rate' over time
- 'Speaking-rate' distortion function: a sigmoidal transition function (see the sketch after this list)
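A rough sketch of the counting step and one plausible shape for the sigmoidal distortion; the window size, the peak test and all sigmoid parameters are assumptions, not values from the project:

```cpp
// Hedged sketch: sliding-window energy-peak counting and a sigmoidal
// distortion of the estimated speaking rate.
#include <vector>
#include <cmath>
#include <algorithm>

// Count local maxima of the (low-pass-filtered) energy contour inside a
// window centred on 'center'; the count per ~0.75 s window is a crude
// estimate of the instantaneous speaking rate.
int countEnergyMaxima(const std::vector<double>& energy,
                      size_t center, size_t halfWin) {
    size_t lo = std::max<size_t>(center > halfWin ? center - halfWin : 0, 1);
    size_t hi = std::min(center + halfWin, energy.size() - 1);
    int maxima = 0;
    for (size_t i = lo; i < hi; ++i)
        if (energy[i] > energy[i - 1] && energy[i] > energy[i + 1])
            ++maxima;
    return maxima;
}

// One plausible sigmoidal transition: rates near the pivot r0 change
// little, rates away from it are pushed further out by up to +/- k,
// with steepness a (r0, a, k are all assumed parameters).
double distortRate(double r, double r0, double a, double k) {
    double s = 1.0 / (1.0 + std::exp(-a * (r - r0)));  // in (0, 1)
    return r + k * (2.0 * s - 1.0);
}
```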
Multimodal Emotion Synthesis
Changing the 'speaking rate' is equivalent to changing the time scale. The 'speaking rate processing' stage therefore generates a time-scale distortion function, t' = F(t) for t < T0, which is fed to the facial animation engine; the engine then generates animation synchronized with the output speech signal ('mamama' only ;-).
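One way such a warp could be represented and queried by the animation engine: a piecewise-linear lookup table mapping original time t to warped time t', sampled at every keyframe timestamp so the animation stays locked to the time-stretched speech. This representation is an illustrative assumption; the slides do not specify it:

```cpp
// Illustrative sketch: piecewise-linear time warp t' = F(t).
// Assumes the table is sorted by strictly increasing t.
#include <vector>
#include <algorithm>

struct WarpPoint { double t, tPrime; };

double warp(const std::vector<WarpPoint>& F, double t) {
    auto it = std::lower_bound(
        F.begin(), F.end(), t,
        [](const WarpPoint& p, double v) { return p.t < v; });
    if (it == F.begin()) return F.front().tPrime;  // clamp before start
    if (it == F.end())   return F.back().tPrime;   // clamp after end
    const WarpPoint& a = *(it - 1);
    const WarpPoint& b = *it;
    double u = (t - a.t) / (b.t - a.t);            // linear interpolation
    return a.tPrime + u * (b.tPrime - a.tPrime);
}
```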
Come and try it for yourself this afternoon!