Published by Bertram Carter; modified over 8 years ago
1
Collection of multimodal data: Face – Speech – Body
George Caridakis (ICCS), Ginevra Castellano (DIST), Loic Kessous (TAU)
2
Overview
- Objectives
- Scenario
- Equipment specifications
- Subjects & procedure
- Visual aspects
- Acoustic aspects
- Future processing
- Please try this at home…
3
Objectives
- Collection of emotional multimodal data
- Processing of the different modalities
- Holy Grail: “EMOTION RECOGNITION”
4
Scenario
- Inspired by the GEMEP corpus
- Pseudo-language sentence: “Toko, damato ma gali sa”
- Standing body posture
- 10 subjects
- 8 emotions distributed uniformly across the four quadrants of the two-dimensional valence-arousal emotion space (two emotions per quadrant)
- 3 repetitions of an emotion-specific gesture
- 3 repetitions of an emotion-independent gesture
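The requirement that the eight emotions cover the valence-arousal quadrants uniformly can be checked mechanically. This is a minimal sketch that assumes a GEMEP-style placement of the emotions (despair and hot anger as negative/high arousal, irritation and sadness as negative/low, interest and pleasure as positive/low, joy and pride as positive/high). The slides do not give these coordinates, so the `EMOTION_QUADRANT` table is an assumption.

```python
from collections import Counter

# ASSUMPTION: GEMEP-style (valence, arousal) placement; not stated on the slides.
EMOTION_QUADRANT = {
    "despair":    ("negative", "high"),
    "hot anger":  ("negative", "high"),
    "irritation": ("negative", "low"),
    "sadness":    ("negative", "low"),
    "interest":   ("positive", "low"),
    "pleasure":   ("positive", "low"),
    "joy":        ("positive", "high"),
    "pride":      ("positive", "high"),
}

def quadrant_counts(mapping):
    """Count how many emotions fall into each (valence, arousal) quadrant."""
    return Counter(mapping.values())

def is_uniform(mapping):
    """True if all four quadrants are used and each holds the same number of emotions."""
    counts = quadrant_counts(mapping)
    return len(counts) == 4 and len(set(counts.values())) == 1

print(dict(quadrant_counts(EMOTION_QUADRANT)))
print(is_uniform(EMOTION_QUADRANT))  # True
```

With eight emotions and four quadrants, uniformity means exactly two emotions per quadrant.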
5
Emotion-specific gestures
- despair: leave me alone
- hot anger: violent descent of the hands
- irritation: smooth “go away”
- sadness: smooth falling hands
- interest: raise hands
- pleasure: open hands
- joy: italianate/explain
- pride: close hands
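Combining the gesture list with the scenario (10 subjects, 8 emotions, 3 emotion-specific plus 3 emotion-independent repetitions) gives the full recording plan. This is a hypothetical sketch: `take_list`, the tuple layout and the placeholder label for the emotion-independent gesture (which the slides do not describe) are illustrative only.

```python
# Emotion-specific gestures, as listed on the slide.
SPECIFIC_GESTURE = {
    "despair": "leave me alone",
    "hot anger": "violent descent of the hands",
    "irritation": "smooth go away",
    "sadness": "smooth falling hands",
    "interest": "raise hands",
    "pleasure": "open hands",
    "joy": "italianate/explain",
    "pride": "close hands",
}
# HYPOTHETICAL placeholder: the emotion-independent gesture is not described.
INDEPENDENT_GESTURE = "emotion-independent gesture"

N_SUBJECTS = 10
N_REPETITIONS = 3

def take_list(subject_id):
    """All takes for one subject as (subject, emotion, gesture, repetition)."""
    takes = []
    for emotion, specific in SPECIFIC_GESTURE.items():
        for gesture in (specific, INDEPENDENT_GESTURE):
            for rep in range(1, N_REPETITIONS + 1):
                takes.append((subject_id, emotion, gesture, rep))
    return takes

all_takes = [t for s in range(1, N_SUBJECTS + 1) for t in take_list(s)]
print(len(take_list(1)))  # 48 takes per subject (8 x 2 x 3)
print(len(all_takes))     # 480 takes in total
```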
6
Equipment specifications
- 2 DV cameras: full body, face
- Wireless microphone (shirt-mounted)
- PC + external sound card
- Uniform dark background
- 2 artificial light sources
- Light-coloured, long-sleeved shirt ;)
7
Subjects & Procedure
Subjects
- 10 “actors”: 6 males, 4 females
- Emotions: despair, hot anger, irritation, sadness, interest, pleasure, joy, pride
Procedure
- Subject instructions
- Clap before every execution, so that the recorded streams can be synchronized
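The clap gives a sharp transient that can be located in each audio recording and used to align the streams. This is a minimal sketch under stated assumptions: a clap is taken to be the first frame whose energy exceeds 10 times the median frame energy, and `frame_len`, `ratio` and the synthetic test tracks are hypothetical choices, not the authors' procedure. Locating the clap in the video frames is not covered.

```python
import math

def frame_energy(samples, frame_len):
    """Mean squared amplitude of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def clap_onset(samples, rate, frame_len=256, ratio=10.0):
    """Start time (s) of the first frame whose energy exceeds `ratio` times
    the median frame energy. Resolution is one frame (frame_len / rate)."""
    energy = frame_energy(samples, frame_len)
    median = sorted(energy)[len(energy) // 2]
    threshold = ratio * max(median, 1e-12)  # guard against digital silence
    for idx, e in enumerate(energy):
        if e > threshold:
            return idx * frame_len / rate
    raise ValueError("no clap-like transient found")

def stream_offset(ref, other, rate):
    """Seconds to add to `other`'s timestamps so its clap lines up with `ref`'s."""
    return clap_onset(ref, rate) - clap_onset(other, rate)

# Synthetic test: low-level hum with a short loud burst standing in for the clap.
RATE = 8000

def synthetic_track(clap_time, seconds=1.0):
    n = int(RATE * seconds)
    track = [0.01 * math.sin(2 * math.pi * 50 * i / RATE) for i in range(n)]
    start = int(clap_time * RATE)
    for i in range(start, min(start + 100, n)):
        track[i] = 0.9
    return track

mic = synthetic_track(0.25)
camera = synthetic_track(0.50)
offset = stream_offset(mic, camera, RATE)
print(f"offset: {offset:.3f} s")
```

The estimate is quantized to one analysis frame; at 44.1 kHz with 256-sample frames that is about 5.8 ms, well below one PAL video frame (40 ms).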
8
Video quality issues
- Highest possible resolution
- Progressive video (not interlaced)
- Correct exposure
- Good color quality
- No compression artifacts
- Uniform lighting
9
Interlacing / Over-exposure
- Interlacing / de-interlacing
- Over-exposure: 70% zebra pattern
- Prefer a lower exposure so that the signal is not clipped
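Both checks can be approximated in software on captured frames. This is a simplified sketch, assuming a frame is a list of rows of 8-bit luma values: an interlaced frame is split into its two fields and one field is line-doubled ("bob" de-interlacing), and the zebra check reports the fraction of pixels at or above 70% of full scale. Camcorder zebras are referenced to video level (IRE), so mapping 70% onto 0..255 is an assumption, and the function names are hypothetical.

```python
def split_fields(frame):
    """Split an interlaced frame (list of rows) into top and bottom fields."""
    return frame[0::2], frame[1::2]

def bob_deinterlace(field):
    """Line-double a single field back to full frame height."""
    out = []
    for row in field:
        out.append(list(row))
        out.append(list(row))
    return out

def zebra_fraction(luma_rows, level=0.70, max_code=255):
    """Fraction of pixels at or above `level` of full scale (0..max_code).
    With level=1.0 this is the fraction of clipped pixels."""
    threshold = level * max_code
    total = hits = 0
    for row in luma_rows:
        for y in row:
            total += 1
            if y >= threshold:
                hits += 1
    return hits / total if total else 0.0

frame = [[r] * 3 for r in range(4)]       # 4 rows, 3 pixels each
top, bottom = split_fields(frame)
print(bob_deinterlace(top))               # rows 0, 0, 2, 2
print(zebra_fraction([[100, 200], [180, 255]]))             # 0.75
print(zebra_fraction([[100, 200], [180, 255]], level=1.0))  # 0.25
```

Line doubling halves vertical resolution, which is one reason the slides prefer progressive capture over de-interlacing afterwards.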
10
Colour/Lighting
[Figure: sample DV source frame annotated for Y/C resolution, compression artifacts, exposure and overall video quality; ratings shown: Medium, Good]
11
Archiving
- PAL: 720x576 @ 25 frames/second
- DV format: ~36 Mbit/sec, ~16 GBytes/hour
- MPEG-2 @ 4-8 Mbit/sec (DVD quality): ~1.8-3.5 GB/hour
- MPEG-1 @ 1.1 Mbit/sec: ~500 MBytes/hour
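The storage figures follow from the bitrates: bytes per hour = bitrate (bit/s) x 3600 s / 8. A small sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and a constant bitrate; `gigabytes_per_hour` is a hypothetical helper name.

```python
def gigabytes_per_hour(mbit_per_s):
    """Storage for one hour at a constant bitrate, in decimal GB (1e9 bytes)."""
    return mbit_per_s * 1e6 * 3600 / 8 / 1e9

ARCHIVE_FORMATS = [
    ("DV", 36.0),
    ("MPEG-2 (low)", 4.0),
    ("MPEG-2 (high)", 8.0),
    ("MPEG-1", 1.1),
]

for name, rate in ARCHIVE_FORMATS:
    print(f"{name:14s} {rate:5.1f} Mbit/s -> {gigabytes_per_hour(rate):6.3f} GB/hour")
```

This gives 16.2, 1.8, 3.6 and about 0.5 GB/hour, in line with the approximate figures above (the upper MPEG-2 value comes out at 3.6 rather than 3.5 GB in decimal units).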
12
Visual Aspects Summary
Video camera
- DV or better
- Progressive scan capability
- Over-exposure indication (zebra patterns)
Shooting
- Use the zebra patterns at 70%
- Zoom in as much as possible to increase the subject’s resolution
- Facial features must be visible for facial analysis
- Try to avoid occlusions (hair, glasses, clothes, hand movements)
- Uniform lighting conditions
Archive
- DV tapes, DV video or frames (not MPEG-1)
13
Acoustic aspects
Why “Toko, damato ma gali sa”?
- “Toko”: solicitation by naming the interlocutor
- Vowels found in the majority of languages
- Meaning: “Toko, can you open it?” (a request), to maintain a semantic aspect
Recording:
- Sampling frequency: 44.1 kHz
- 16-bit depth, mono
- Uncompressed .wav files
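The audio format can be produced and verified with Python's standard `wave` module. This is a minimal sketch: `write_wav` and `check_format` are hypothetical helpers, and the 440 Hz test tone stands in for real recordings. It also computes the uncompressed data rate implied by the format.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz
SAMPLE_WIDTH = 2      # bytes per sample -> 16-bit PCM
CHANNELS = 1          # mono

def write_wav(samples, fileobj):
    """Write floats in [-1, 1] to `fileobj` as uncompressed 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", max(-32768, min(32767, round(s * 32767))))
        for s in samples
    )
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(frames)

def check_format(fileobj):
    """True if the WAV is 44.1 kHz, 16-bit, mono and uncompressed PCM."""
    with wave.open(fileobj, "rb") as r:
        return (r.getframerate() == SAMPLE_RATE
                and r.getsampwidth() == SAMPLE_WIDTH
                and r.getnchannels() == CHANNELS
                and r.getcomptype() == "NONE")

# One second of a 440 Hz test tone written to an in-memory buffer.
buf = io.BytesIO()
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
write_wav(tone, buf)
buf.seek(0)
ok = check_format(buf)

# Uncompressed data rate: 44100 samples/s x 2 bytes x 1 channel x 3600 s.
BYTES_PER_HOUR = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * 3600
print(ok, BYTES_PER_HOUR / 1e6, "MB/hour")
```

At roughly 318 MB per hour, uncompressed audio is small next to the DV video streams, so there is little reason to compress it.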
14
Future processing
- Process the different modalities: facial feature extraction, gesture expressiveness analysis, acoustic analysis, gesture recognition
- Synchronization
- Modality fusion: RNN, RSOM + Markov, SVM, …
- Emotion recognition
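To make the fusion step concrete, the sketch below shows decision-level (late) fusion by weighted averaging of per-modality class probabilities. It is a deliberately simple, swapped-in technique, not the RNN, RSOM + Markov or SVM approaches listed above. The modality weights and probability values are invented for illustration, and `late_fusion` is a hypothetical name.

```python
def late_fusion(modality_probs, weights=None):
    """Decision-level fusion by weighted averaging of class probabilities.

    modality_probs: {modality: {label: probability}}
    weights:        {modality: non-negative weight}; equal weights if None.
    Returns (best_label, fused_probabilities).
    """
    if weights is None:
        weights = {m: 1.0 for m in modality_probs}
    total = sum(weights[m] for m in modality_probs)
    labels = sorted({label for probs in modality_probs.values() for label in probs})
    fused = {
        label: sum(weights[m] * probs.get(label, 0.0)
                   for m, probs in modality_probs.items()) / total
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused

# Invented per-modality classifier outputs, for illustration only.
scores = {
    "face":   {"joy": 0.6, "pride": 0.3, "sadness": 0.1},
    "speech": {"joy": 0.2, "pride": 0.5, "sadness": 0.3},
    "body":   {"joy": 0.5, "pride": 0.4, "sadness": 0.1},
}
best_equal, fused_equal = late_fusion(scores)
best_weighted, _ = late_fusion(scores, {"face": 1.0, "speech": 3.0, "body": 1.0})
print(best_equal, best_weighted)  # joy pride
```

Giving speech three times the weight of the other modalities is enough to flip the decision from joy to pride. The learned fusion methods listed above aim to capture this kind of trade-off, and cross-modal dependencies, from data rather than from fixed weights.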