1
Creating a Speech Enabled Avatar from a Single Photograph
Dmitri Bitouk, Shree K. Nayar
Columbia University
2
Speech Enabled Avatar
Figure labels: Input photograph
3
Speech Enabled Avatar
Figure labels: Input photograph, Avatar
4
Speech Enabled Avatar
Figure labels: Input photograph, Avatar
Applications:
mobile messaging and video conferencing
news reporting and information kiosks
novel user interfaces
5
Facial Motion Synthesis Challenges
Mapping phonemes to static mouth shapes produces unrealistic, jerky animations.
Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes.
Asynchrony: facial motion may precede the corresponding sound.
6
Related Work
Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc.
2D avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™
7
Generic Facial Motion Model (Bitouk 2006)
Facial motion parameters
Figure labels: Prototype Surface, Deformed Surface
8
Generic Facial Motion Model
9
Facial Motion Transfer (Bitouk 2006)
Figure labels: Prototype Face, Novel Faces
10
Facial Motion Transfer (Bitouk 2006)
Figure labels: Prototype Face, Novel Faces
11
Hidden Markov Models
Phonemes: /B/, /K/, /AA/, /IY/, etc.
With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc.
Triphones
Diagram labels: states s1, s2; facial motion parameters
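To make the step from stress-marked phonemes to triphones concrete, here is a minimal sketch. The `left-center+right` label format and the `sil` boundary symbol are assumptions borrowed from common speech-toolkit conventions; the slides do not say which notation the authors used, and `to_triphones` is a hypothetical helper name.

```python
def to_triphones(phonemes, boundary="sil"):
    """Label each phoneme with its left and right neighbors.

    The "left-center+right" format and the "sil" boundary symbol are
    illustrative assumptions, not taken from the slides.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Stress-marked phonemes in the style shown on the slide.
labels = to_triphones(["B", "AA1", "K"])
print(labels)  # ['sil-B+AA1', 'B-AA1+K', 'AA1-K+sil']
```

Each label names one context-dependent unit, which is why the number of distinct triphones grows much faster than the number of phonemes.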
12
Training Hidden Markov Models
Training set consists of motion capture data
Baum-Welch embedded re-estimation
Cluster triphone states to predict triphones not seen in the training set
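The idea of covering unseen triphones through clustering can be sketched as follows. This is a deliberately simplified stand-in, not the authors' procedure: it groups triphones by the center phoneme and a hand-written broad class for each neighbor, whereas real systems usually cluster individual HMM states with data-driven criteria. The `BROAD_CLASS` table, the label format, and all function names are hypothetical.

```python
# Hypothetical broad phonetic classes; a real system would use a far
# larger, linguistically motivated set.
BROAD_CLASS = {
    "B": "bilabial", "P": "bilabial", "M": "bilabial",
    "K": "velar", "G": "velar",
    "AA": "vowel", "IY": "vowel",
    "sil": "silence",
}


def cluster_key(triphone):
    """Map 'left-center+right' to (class of left, center, class of right)."""
    left, rest = triphone.split("-", 1)
    center, right = rest.split("+", 1)
    return (BROAD_CLASS[left], center, BROAD_CLASS[right])


def build_clusters(seen_triphones):
    """Group the triphones observed in training by their cluster key."""
    clusters = {}
    for triphone in seen_triphones:
        clusters.setdefault(cluster_key(triphone), []).append(triphone)
    return clusters


def lookup(triphone, clusters):
    """Return the seen triphones that share a cluster with `triphone`."""
    return clusters.get(cluster_key(triphone), [])


seen = ["B-AA+K", "M-IY+G", "sil-B+AA"]
clusters = build_clusters(seen)
print(lookup("P-AA+G", clusters))  # ['B-AA+K']
```

Here the unseen triphone `P-AA+G` borrows the model of `B-AA+K` because both have a bilabial on the left and a velar on the right of the same vowel.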
13
Facial Motion Synthesis from Text
Text -> Text-to-Speech Engine -> Speech, Time-labeled phonemes
Time-labeled phonemes -> Hidden Markov Models -> Facial Motion Parameters
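One small piece of this pipeline, turning time-labeled phonemes into a per-frame label sequence that a motion model could consume, might look like the sketch below. The `(phoneme, start, end)` tuple layout, the frame rate, and the function name are assumptions for illustration; the slides do not specify the data format or how the HMMs are driven.

```python
def phonemes_to_frames(segments, fps=30):
    """Assign a phoneme label to every animation frame.

    segments: list of (phoneme, start_sec, end_sec), sorted and contiguous.
    Each frame is labeled by the segment containing the frame's midpoint.
    """
    if not segments:
        return []
    n_frames = round(segments[-1][2] * fps)
    frames = []
    idx = 0
    for f in range(n_frames):
        t = (f + 0.5) / fps  # midpoint of frame f, in seconds
        # Advance to the segment that contains time t.
        while idx < len(segments) - 1 and t >= segments[idx][2]:
            idx += 1
        frames.append(segments[idx][0])
    return frames


# Hypothetical timing, as a text-to-speech engine might report it.
segments = [("B", 0.0, 0.1), ("AA1", 0.1, 0.3), ("K", 0.3, 0.4)]
frames = phonemes_to_frames(segments, fps=10)
print(frames)  # ['B', 'AA1', 'AA1', 'K']
```

Hard per-frame labels like these are only the alignment step; the co-articulation and asynchrony effects named earlier are what the trained models are meant to add on top.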
14
Fitting the Prototype Model to an Image
Figure labels: 2D Prototype Face, Photograph
15
Fitting the Prototype Model to an Image
Figure labels: 2D Prototype Face, Photograph
16
Facial Motion Synthesis
17
Eye Motion Synthesis
18
Eyeball Texture Synthesis
Figure labels: Eye Image, Synthesized Eyeball Texture
19
Eye Motion Synthesis
Eye Motion Geometry
20
Eye Motion and Blinking
21
Visual Text-to-Speech Synthesis
23
Facial Motion Synthesis from Speech
Speech -> Speech Recognition -> Time-labeled phonemes
Time-labeled phonemes -> Hidden Markov Models -> Facial Motion Parameters
24
Facial Motion Synthesis from Speech
25
3D Avatars
Captured Stereo Image (Gluckman & Nayar, 2001)
Figure labels: Mirror View, Direct View
26
3D Avatars
Figure labels: Rectified Images (Mirror View, Direct View), 3D Model
27
3D Avatars
Digital projector
Point cloud engraved inside a glass cube (Nayar & Anand, 2007)
28
3D Avatars
29
Limitations and Future Work
Automatic facial feature detection
Synthesis of rigid head motion
Expressive speech
A web demo of our system will be available in early April: www.cs.columbia.edu/CAVE/
30
The End