Creating a Speech Enabled Avatar from a Single Photograph. Dmitri Bitouk, Shree K. Nayar. Columbia University.
Speech Enabled Avatar: Input photograph; Avatar. Applications: mobile messaging and video conferencing; news reporting and information kiosks; novel user interfaces.
Facial Motion Synthesis Challenges: Mapping phonemes to static mouth shapes produces unrealistic, jerky animations. Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes. Asynchrony: facial motion may precede the corresponding sound.
Related Work: Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc. 2D avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™.
Generic Facial Motion Model (Bitouk 2006): Prototype Surface; Deformed Surface. Legend: facial motion parameters.
Facial Motion Transfer (Bitouk 2006): Prototype Face; Novel Faces.
Hidden Markov Models: Phonemes: /B/, /K/, /AA/, /IY/, etc. With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc. Triphones: phonemes taken in the context of their left and right neighbors. Diagram: states s1, s2, with facial motion parameters as outputs.
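A minimal sketch of how a stress-marked phoneme sequence could be expanded into triphone labels. The `left-center+right` label notation, the `sil` boundary symbol, and the function name `to_triphones` are assumptions made for illustration; the slides do not show the notation actually used.

```python
from typing import List


def to_triphones(phonemes: List[str], boundary: str = "sil") -> List[str]:
    """Expand a phoneme sequence into context-dependent triphone labels.

    Each label has the form 'left-center+right' (assumed notation).
    Utterance edges are padded with a hypothetical boundary symbol.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # Two phonemes with lexical stress on the vowel: /K/ /IY1/
    print(to_triphones(["K", "IY1"]))
```

Each phoneme yields exactly one triphone, so the number of models to train grows with the number of distinct contexts rather than with utterance length.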
Training Hidden Markov Models: Training set consists of motion capture data. Baum-Welch embedded re-estimation. Cluster triphone states to predict triphones not seen in the training set.
Facial Motion Synthesis from Text: Text is fed to a Text-to-Speech Engine, which outputs Speech and Time-labeled phonemes. Hidden Markov Models map the time-labeled phonemes to Facial Motion Parameters.
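A small sketch of one step implied by this pipeline: aligning time-labeled phonemes to video frames so that each frame knows which phoneme is active. The `(phoneme, start, end)` tuple layout, the frame rate, and the `sil` filler label are assumptions for illustration; generating motion parameters from the HMMs themselves is not shown.

```python
from typing import List, Tuple

# (phoneme label, start time in seconds, end time in seconds) - assumed layout
Segment = Tuple[str, float, float]


def phoneme_per_frame(segments: List[Segment], fps: float = 30.0) -> List[str]:
    """Return the active phoneme label for every video frame.

    Frame k is sampled at time k / fps. Frames that fall outside all
    segments receive the hypothetical filler label "sil".
    """
    if not segments:
        return []
    total = max(end for _, _, end in segments)
    n_frames = int(round(total * fps))
    labels = []
    for k in range(n_frames):
        t = k / fps
        label = "sil"
        for phoneme, start, end in segments:
            if start <= t < end:  # half-open interval avoids double counting
                label = phoneme
                break
        labels.append(label)
    return labels


if __name__ == "__main__":
    print(phoneme_per_frame([("B", 0.0, 0.5), ("AA1", 0.5, 1.0)], fps=4))
```

The same alignment would apply whether the time labels come from a text-to-speech engine or from a speech recognizer.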
Fitting the Prototype Model to an Image: 2D Prototype Face; Photograph.
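The slides do not say how the fit is computed. As a hedged illustration only, the sketch below aligns prototype landmarks to marked photograph landmarks with a least-squares similarity transform (scale, rotation, translation). The choice of a similarity transform, the landmark correspondences, and the names `fit_similarity` and `apply_similarity` are all assumptions, not the authors' method.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(src: List[Point], dst: List[Point]) -> Tuple[complex, complex]:
    """Least-squares similarity transform w = a*z + b mapping src onto dst.

    Points are treated as complex numbers: `a` encodes scale and rotation,
    `b` encodes translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form solution on centered points.
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a: complex, b: complex, pts: List[Point]) -> List[Point]:
    """Map points through w = a*z + b."""
    out = []
    for x, y in pts:
        p = a * complex(x, y) + b
        out.append((p.real, p.imag))
    return out


if __name__ == "__main__":
    prototype = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # hypothetical landmarks
    photo = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0)]       # same landmarks in the photo
    a, b = fit_similarity(prototype, photo)
    print(abs(a), b)
```

A global alignment like this would only be a starting point; matching an individual face shape would need a non-rigid refinement on top of it.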
Facial Motion Synthesis
Eye Motion Synthesis
Eyeball Texture Synthesis: Eye Image; Synthesized Eyeball Texture.
Eye Motion Synthesis: Eye Motion Geometry.
Eye Motion and Blinking
Visual Text-to-Speech Synthesis
Facial Motion Synthesis from Speech: Speech is passed through Speech Recognition to obtain Time-labeled phonemes. Hidden Markov Models map the time-labeled phonemes to Facial Motion Parameters.
3D Avatars: Captured Stereo Image with Mirror View and Direct View (Gluckman & Nayar, 2001).
3D Avatars: Mirror View and Direct View; Rectified Images; 3D Model.
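A hedged sketch of how a 3D point could be recovered from rectified views, assuming the rectified mirror and direct views behave like a standard parallel stereo pair. The pinhole parameters (`focal_px`, `baseline_m`, `cx`, `cy`) and the function names are illustrative assumptions; the slides do not give the reconstruction method used.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified, parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def backproject(
    x_px: float,
    y_px: float,
    disparity_px: float,
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> Tuple[float, float, float]:
    """Back-project a pixel with known disparity to a 3D point (X, Y, Z)."""
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    x = (x_px - cx) * z / focal_px
    y = (y_px - cy) * z / focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical numbers: 800 px focal length, 10 cm baseline, 20 px disparity.
    print(backproject(400.0, 300.0, 20.0, 800.0, 0.1, 320.0, 240.0))
```

Applying `backproject` to every matched pixel would give a point cloud from which a face model can be built.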
3D Avatars: Digital projector; point cloud engraved inside a glass cube (Nayar & Anand, 2007).
Limitations and Future Work: Automatic facial feature detection; synthesis of rigid head motion; expressive speech. A web demo of our system will be available in early April.
The End