Multimodal Caricatural Mirror
FINAL PROJECT PRESENTATION
Project Summary
Create a multimodal caricatural mirror:
- Multimodal = facial + vocal
- Caricatural = amplify emotions
- Mirror = face your avatar!
Outline
- Project Architecture
- Visual Modality:
  - Face Tracking
  - Facial Features Detection
  - Facial Features Tracking
  - Facial Expression Recognition
  - Emotion Modeling
  - Facial Animation
- Audio Modality:
  - Vocal Features Extraction
  - Prosody Amplification
- Multimodal Fusion:
  - Multimodal Synchronized Emotion Synthesis
Project Architecture
The 'Mamama' option
[Block diagram: the user's face feeds face tracking, facial features tracking and emotion recognition, which drive the facial animation of the avatar; the speech signal feeds prosodic features extraction, prosody processing (t' = f(t)) and prosody amplification; a fusion stage synchronizes the two streams.]
Face Tracking
We chose to use open-source software: the OpenCV face tracker.
- Trained on a large database (no tuning necessary)
- Color tracking using the CAMSHIFT algorithm (Mean-Shift based), as sketched below
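A minimal sketch of what CAMSHIFT color tracking looks like with OpenCV's C++ API; this is not the project's actual code, and the camera index, initial window and histogram size are illustrative assumptions:

```cpp
// Hedged sketch: CAMSHIFT color tracking with OpenCV's C++ API.
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    cv::Mat frame, hsv, backproj;
    cap >> frame;
    if (frame.empty()) return 1;

    // Assumed initial face window; in practice a face detector supplies it.
    cv::Rect window(frame.cols / 3, frame.rows / 3,
                    frame.cols / 3, frame.rows / 3);

    // Hue histogram of the initial face region (the tracked color model).
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    int histSize = 30;
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};
    cv::Mat roi(hsv, window), hist;
    cv::calcHist(&roi, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    for (;;) {
        cap >> frame;
        if (frame.empty()) break;
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        // Back-project the color model, then let CAMSHIFT climb to the
        // mode of the skin-color probability image (the Mean-Shift core).
        cv::calcBackProject(&hsv, 1, channels, hist, backproj, ranges);
        cv::RotatedRect face = cv::CamShift(
            backproj, window,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT,
                             10, 1));
        cv::ellipse(frame, face, cv::Scalar(0, 255, 0), 2);
        cv::imshow("mirror", frame);
        if (cv::waitKey(30) == 27) break;  // Esc quits
    }
    return 0;
}
```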
Facial Features Detection
Step 1: facial features detection (1st frame)
- Convert the image to grayscale
- Compute the image's trace transform (luminance along M vertical lines)
- From the sets of local minima, infer the positions of the facial features (eyebrows, eyes and mouth) using a priori knowledge of face morphology (heuristics); see the sketch after this list
- Automatically initialize the Candide grid (1st frame)
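A toy illustration of the luminance-minima idea: sample M vertical lines through the face box and keep the rows where luminance dips locally, since dark horizontal bands tend to be eyebrows, eyes and mouth. The function name and sampling scheme are hypothetical; the real trace transform and morphology heuristics are richer than this:

```cpp
// Hypothetical sketch: local luminance minima along M vertical lines.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<std::vector<int>> darkRows(const cv::Mat& gray,
                                       const cv::Rect& face, int M) {
    std::vector<std::vector<int>> minima(M);
    for (int m = 0; m < M; ++m) {
        // x-position of the m-th vertical line, evenly spread in the box.
        int x = face.x + (m + 1) * face.width / (M + 1);
        for (int y = face.y + 1; y < face.y + face.height - 1; ++y) {
            uchar v = gray.at<uchar>(y, x);
            if (v < gray.at<uchar>(y - 1, x) && v < gray.at<uchar>(y + 1, x))
                minima[m].push_back(y);  // candidate feature row
        }
    }
    return minima;  // heuristics then map clusters of rows to features
}
```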
Facial Features Tracking
Step 2: facial features tracking (every frame after the first)
Emotion Recognition (visual modality)
We use Support Vector Machines (SVMs) as the emotion classifier:
- Find the hyperplanes that maximize the margins between classes, working implicitly in the feature space induced by an appropriate kernel function
- Classification is performed on every frame, but temporal dependencies between successive decisions can be introduced to recover from short tracking errors (not 'error bursts'); see the sketch after this list
- Good robustness against overfitting (only the training samples that define the margins, i.e. the support vectors, are kept)
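A sketch of the classifier side. OpenCV's ml module is an assumed stand-in (the slides do not name the SVM library), the feature layout is illustrative, and the majority vote is just one simple way to realize the temporal smoothing of successive per-frame decisions:

```cpp
// Hedged sketch: per-frame SVM emotion classification with a short
// majority-vote history to smooth out isolated tracking errors.
#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <deque>
#include <map>

cv::Ptr<cv::ml::SVM> trainEmotionSvm(const cv::Mat& samples,  // CV_32F, one row per frame
                                     const cv::Mat& labels) { // CV_32S emotion ids
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);  // the 'appropriate kernel function'
    svm->train(samples, cv::ml::ROW_SAMPLE, labels);
    return svm;
}

// Majority vote over the last W frames: recovers isolated
// misclassifications from brief tracking errors, not long 'error bursts'.
int smoothedDecision(std::deque<int>& history, int decision, size_t W = 5) {
    history.push_back(decision);
    if (history.size() > W) history.pop_front();
    std::map<int, int> votes;
    for (int d : history) ++votes[d];
    int best = decision, bestCount = 0;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;
}
```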
Emotions Modeling
[figure-only slides]
Facial Animation
Among 3D face models, we chose Candide3 for the animation:
- It includes animation units and MPEG-4 FAPs
- The animation software is written in C++ using the OpenGL and SDL APIs, which are open source and run on many platforms
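For context, Candide-style models deform a neutral mesh linearly with weighted animation-unit displacements, roughly g = g_neutral + Σ_k α_k · AU_k. A minimal plain-C++ sketch of that blend, with illustrative names and none of the OpenGL/SDL plumbing:

```cpp
// Sketch of a Candide-style linear blend of animation units (AUs).
// Assumes each AU displacement field has one vector per mesh vertex
// and alpha holds one weight per AU.
#include <vector>

struct Vec3 { float x, y, z; };

std::vector<Vec3> animate(const std::vector<Vec3>& neutral,
                          const std::vector<std::vector<Vec3>>& aus,
                          const std::vector<float>& alpha) {
    std::vector<Vec3> out = neutral;
    for (size_t k = 0; k < aus.size(); ++k)
        for (size_t i = 0; i < out.size(); ++i) {
            out[i].x += alpha[k] * aus[k][i].x;
            out[i].y += alpha[k] * aus[k][i].y;
            out[i].z += alpha[k] * aus[k][i].z;
        }
    return out;  // deformed vertices, ready for the rendering pass
}
```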
Vocal Features Extraction
Pitch variations amplification:
- Pitch is extracted using an algorithm based on the autocorrelation function (see the sketch after this list)
- Pitch variations are then modified using PSOLA
Figure: the downtrend of the pitch is first removed, then the pitch movements are amplified, and finally the downtrend is set back
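A minimal sketch of frame-level pitch estimation from the autocorrelation function, plus the trend-based amplification suggested by the figure caption. The 60-400 Hz search range and the gain k are assumptions, and PSOLA resynthesis itself is not shown:

```cpp
// Hedged sketch: autocorrelation pitch estimate + trend-based amplification.
#include <vector>

// Pick the lag with the strongest autocorrelation inside an assumed
// 60-400 Hz pitch range; returns 0 when no lag is found (e.g. unvoiced).
double estimatePitch(const std::vector<double>& frame, double fs) {
    int minLag = static_cast<int>(fs / 400.0);
    int maxLag = static_cast<int>(fs / 60.0);
    if (minLag < 1) minLag = 1;
    double bestR = 0.0;
    int bestLag = 0;
    for (int lag = minLag; lag <= maxLag && lag < (int)frame.size(); ++lag) {
        double r = 0.0;
        for (size_t n = lag; n < frame.size(); ++n)
            r += frame[n] * frame[n - lag];
        if (r > bestR) { bestR = r; bestLag = lag; }
    }
    return bestLag > 0 ? fs / bestLag : 0.0;
}

// 'Remove downtrend, amplify movements, set downtrend back' in one step:
// p'[t] = d[t] + k * (p[t] - d[t]), with trend d[t] precomputed (e.g. a
// linear fit over the utterance) and k > 1 exaggerating the movements.
double amplifyPitch(double pitch, double trend, double k) {
    return trend + k * (pitch - trend);
}
```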
'Speaking Rate' Processing
- Low-pass filtering (emphasizes voiced regions)
- Sliding window (~0.75 s): count the energy maxima
- Estimation of the 'speaking rate' over time
- 'Speaking-rate' distortion function: a sigmoidal transition function (see the sketch after this list)
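A rough sketch of the counting step and one plausible shape for the sigmoidal distortion; the window size, the peak test and all sigmoid parameters are assumptions, not values from the project:

```cpp
// Hedged sketch: sliding-window energy-peak counting and a sigmoidal
// distortion of the estimated speaking rate.
#include <vector>
#include <cmath>
#include <algorithm>

// Count local maxima of the (low-pass-filtered) energy contour inside a
// window centred on 'center'; the count per ~0.75 s window is a crude
// estimate of the instantaneous speaking rate.
int countEnergyMaxima(const std::vector<double>& energy,
                      size_t center, size_t halfWin) {
    size_t lo = std::max<size_t>(center > halfWin ? center - halfWin : 0, 1);
    size_t hi = std::min(center + halfWin, energy.size() - 1);
    int maxima = 0;
    for (size_t i = lo; i < hi; ++i)
        if (energy[i] > energy[i - 1] && energy[i] > energy[i + 1])
            ++maxima;
    return maxima;
}

// One plausible sigmoidal transition: rates near the pivot r0 change
// little, rates away from it are pushed further out by up to +/- k,
// with steepness a (r0, a, k are all assumed parameters).
double distortRate(double r, double r0, double a, double k) {
    double s = 1.0 / (1.0 + std::exp(-a * (r - r0)));  // in (0, 1)
    return r + k * (2.0 * s - 1.0);
}
```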
Multimodal Emotion Synthesis
Changing the 'speaking rate' is equivalent to changing the time scale. The 'speaking rate processing' stage therefore generates a time-scale distortion function, t' = F(t) for t < T0, which is fed to the facial animation engine; the engine then generates animation synchronized with the output speech signal ('mamama' only ;-).
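One way such a warp could be represented and queried by the animation engine: a piecewise-linear lookup table mapping original time t to warped time t', sampled at every keyframe timestamp so the animation stays locked to the time-stretched speech. This representation is an illustrative assumption; the slides do not specify it:

```cpp
// Illustrative sketch: piecewise-linear time warp t' = F(t).
// Assumes the table is sorted by strictly increasing t.
#include <vector>
#include <algorithm>

struct WarpPoint { double t, tPrime; };

double warp(const std::vector<WarpPoint>& F, double t) {
    auto it = std::lower_bound(
        F.begin(), F.end(), t,
        [](const WarpPoint& p, double v) { return p.t < v; });
    if (it == F.begin()) return F.front().tPrime;  // clamp before start
    if (it == F.end())   return F.back().tPrime;   // clamp after end
    const WarpPoint& a = *(it - 1);
    const WarpPoint& b = *it;
    double u = (t - a.t) / (b.t - a.t);            // linear interpolation
    return a.tPrime + u * (b.tPrime - a.tPrime);
}
```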
Come and try it for yourself this afternoon!