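[Figure: Response (V) versus Time (samples); Predicted rating versus True rating for Irritation and Pleasantness, annotated with the values 0.94 and 0.86 (the symbol preceding "=" was lost in extraction). Condition labels: W, M, AM, A, I, AI, IM, AIM.]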
[Figure: Block diagram of the audio-visual system. TRAINING PHASE: a synchronized audio-visual recording feeds two branches. AUDIO PROCESSING: extract speech waveform, pre-processing and windowing, extract acoustic features (PCBF, Energy, F0), add context time. VIDEO PROCESSING: extract video frames, track MPEG-4 facial markers, 3D reconstruction from stereo, articulatory trajectories (Width, Height). Both branches populate an audio-visual lookup table. RECALL PHASE: novel speech signal, acoustic analysis (PCBF, Energy, F0), table lookup by nearest neighbors in acoustic feature space, 3D animation/synthesis.]