Published by Doddy Suharto Sugiarto. Modified over 6 years ago.
1
MikeTalk: An Adaptive Man-Machine Interface
Tony Ezzat Volker Blanz Tomaso Poggio
2
TTVS Overview
Input: Text
Output: Photo-realistic talking face uttering text
3
Desktop Agents
4
Desktop Agents
You have received 1 email from Tommy Poggio.
5
Customer Support
6
Customer Support
You have bought 20 shares of SONY at $40 each.
7
Advertisements
8
Advertisements
Hi Tony, would you be interested in a ticket from Boston to New York for $50.00?
9
Modules
10
Phoneme Corpus
Step 1: collect a visual corpus from a subject.
The corpus contains 44 words, one word for each American English phoneme.
11
6 Consonantal Visemes
Step 2: extract one image per phoneme: a viseme.
Group visemes together by visual similarity.
12
9 Vocalic Visemes (+ 1 Silence Viseme)
13
Problem 1: Need to Interpolate!
14
Solution: Morphing!
Simultaneous interpolation of shape & texture (Beier & Neely 1992).
Problem 2: too tedious to specify correspondence by hand across many images!
15
Solution: Optical Flow
(Horn & Schunck 1986) (Lucas & Kanade 1988)
To interpolate between two visemes, optical flow is first computed.
A 2D motion vector field is produced: dx(x,y), dy(x,y)
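As a rough illustration of what a flow estimate involves, the sketch below recovers a single translation (dx, dy) between two synthetic images with a Lucas-Kanade style least-squares fit over one window. It is not the authors' implementation: the function names, the Gaussian test image, and the single-window simplification are assumptions made for the example. A dense field dx(x,y), dy(x,y) would repeat a fit like this over a small neighborhood around each pixel.

```python
import numpy as np

def gaussian_blob(h, w, cx, cy, sigma=6.0):
    """Synthetic test image (hypothetical stand-in for a viseme image)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def estimate_translation(img_a, img_b):
    """Least-squares fit of one (dx, dy) for the whole image."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    iy, ix = np.gradient(img_a)
    it = img_b - img_a  # temporal difference between the two images
    # Brightness constancy at every pixel: ix*dx + iy*dy + it = 0.
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    (dx, dy), *_ = np.linalg.lstsq(A, -it.ravel(), rcond=None)
    return dx, dy

# Image B is image A moved half a pixel to the right.
img_a = gaussian_blob(64, 64, 32.0, 32.0)
img_b = gaussian_blob(64, 64, 32.5, 32.0)
dx, dy = estimate_translation(img_a, img_b)
```

The fit relies on small displacements and smooth images; larger motions are usually handled with a coarse-to-fine pyramid, which this sketch omits.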
16
Morphing
Forward warping A to B
Forward warping B to A
Blending
Hole filling
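The four morphing steps can be sketched as follows, assuming grayscale images and flow fields stored as NumPy arrays. The function names, nearest-pixel splatting, and the row-wise hole-filling rule are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def forward_warp(img, dx, dy, alpha):
    """Push each pixel along alpha times its flow vector; unhit pixels stay NaN."""
    h, w = img.shape
    out = np.full((h, w), np.nan)
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + alpha * dx).astype(int)
    ty = np.rint(ys + alpha * dy).astype(int)
    ok = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    out[ty[ok], tx[ok]] = img[ok]
    return out

def fill_holes(img):
    """Fill NaN holes from the nearest valid neighbor in the same row."""
    out = img.copy()
    for row in out:  # each row is a view, so edits land in `out`
        for x in range(1, row.size):          # left-to-right pass
            if np.isnan(row[x]):
                row[x] = row[x - 1]
        for x in range(row.size - 2, -1, -1):  # right-to-left pass
            if np.isnan(row[x]):
                row[x] = row[x + 1]
    return out

def morph(a, b, dx_ab, dy_ab, dx_ba, dy_ba, alpha):
    """Warp A toward B, warp B toward A, then cross-dissolve the two warps."""
    wa = fill_holes(forward_warp(a, dx_ab, dy_ab, alpha))
    wb = fill_holes(forward_warp(b, dx_ba, dy_ba, 1.0 - alpha))
    return (1.0 - alpha) * wa + alpha * wb

# Toy example: a bright column at x=2 in A moves to x=6 in B.
a = np.zeros((4, 10)); a[:, 2] = 1.0
b = np.zeros((4, 10)); b[:, 6] = 1.0
zero = np.zeros((4, 10))
dx_ab = np.full((4, 10), 4.0)
dx_ba = np.full((4, 10), -4.0)
halfway = morph(a, b, dx_ab, zero, dx_ba, zero, 0.5)
start = morph(a, b, dx_ab, zero, dx_ba, zero, 0.0)
```

In the toy example the halfway frame shows one sharp column at x=4, whereas a plain cross-dissolve without warping would show two faint columns at x=2 and x=6.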
17
Synthesis Database
16 visemes total
256 optical flow fields total (16 × 16), one for each viseme-to-viseme transition
18
Concatenation and Lip Sync
Load the correct viseme transitions
Concatenate viseme transitions
Sample the viseme transitions using audio durations
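One way to picture the sampling step: each viseme-to-viseme transition is given as many video frames as its audio duration covers, and each frame gets a morph position alpha between 0 and 1. The sketch below assumes a fixed frame rate and per-transition durations in seconds; the function name, the 30 fps default, and the viseme labels are hypothetical.

```python
def transition_schedule(visemes, durations, fps=30):
    """Return one (source, target, alpha) tuple per output video frame."""
    if len(durations) != len(visemes) - 1:
        raise ValueError("need one duration per viseme transition")
    frames = []
    for (src, dst), seconds in zip(zip(visemes, visemes[1:]), durations):
        n = max(1, round(seconds * fps))  # frames spent on this transition
        # alpha runs from 0 toward 1; the next transition starts at its own 0
        frames.extend((src, dst, k / n) for k in range(n))
    return frames

# Hypothetical viseme labels and durations for a short word.
schedule = transition_schedule(["sil", "k", "ae", "t"], [0.1, 0.2, 0.1])
```

Each tuple selects a stored transition and a position along it, so longer phonemes simply sample the same transition more densely.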
19
Examples
“1, 2, 3, 4, 5”
“you have received 10 email messages.”
“cat, dog, pig, cow, moose, horse, sheep”
20
Current Work
Coarticulation
Eye + head movements
Emotion
3D instead of 2D
Psychophysics
21
3D With Volker Blanz
22
The End
23
Co-articulation
Problem: the current method does not handle coarticulation, so speech looks overly articulated.
We could record all possible triphones/quadriphones, but this approach requires a lot of data!
The best method is to learn a model for coarticulation, but what is the representation for the lips?
24
Principal Components Analysis
Each image is a vector in a high-dimensional space.
Using PCA, find the optimal set of vectors that span the space.
Project the entire corpus onto those basis vectors.
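A minimal sketch of these three steps using an SVD, assuming each image has been flattened into one row of a matrix. The random 8x8 "images" and the function name are placeholders for illustration, not the corpus from the talk.

```python
import numpy as np

def pca(corpus, k):
    """Return the mean, the top-k basis vectors, and the corpus coefficients."""
    mean = corpus.mean(axis=0)
    centered = corpus - mean
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    coeffs = centered @ basis.T  # projection of every image onto the basis
    return mean, basis, coeffs

rng = np.random.default_rng(0)
corpus = rng.random((20, 8 * 8))  # 20 placeholder images, flattened to vectors
mean, basis, coeffs = pca(corpus, k=2)
approx = mean + coeffs @ basis    # rank-2 approximation of each image
```

Plotting the two columns of `coeffs` against each other gives the kind of 2D trajectory that the following slides inspect for each word.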
25
Top 2 PCA Bases for /buut/
26
Top 2 PCA Bases for /get/
Problem: Too nonlinear!
27
Flow Component Analysis
Compute optical flow from a reference lip image to all other images in the corpus.
Compute PCA on all the flows.
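The same decomposition can be applied to flow fields instead of pixel intensities. The sketch below assumes each flow is stored as a (dx, dy) pair of arrays and flattened into one vector. The flows here are synthetic, built from two made-up deformation modes (a vertical "opening" and a horizontal "stretch"), so the example shows only the bookkeeping, not real lip data.

```python
import numpy as np

h, w = 8, 8
ys, xs = np.mgrid[0:h, 0:w]
zeros = np.zeros((h, w))

# Two hypothetical deformation modes, each flattened as [dx, dy].
mode_open = np.stack([zeros, (ys - (h - 1) / 2) / h]).ravel()
mode_stretch = np.stack([(xs - (w - 1) / 2) / w, zeros]).ravel()

# Synthetic corpus: 30 flows from a reference image, as random mode mixtures.
rng = np.random.default_rng(0)
weights = rng.normal(size=(30, 2))
flows = weights @ np.stack([mode_open, mode_stretch])  # shape (30, 2*h*w)

# PCA on the flow vectors.
mean_flow = flows.mean(axis=0)
_, s, vt = np.linalg.svd(flows - mean_flow, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)  # variance fraction per component
flow_basis = vt[:2]                  # top-2 flow bases
flow_coeffs = (flows - mean_flow) @ flow_basis.T
```

Because the synthetic flows are exact mixtures of two modes, the first two components account for essentially all of the variance; real flows would spread some variance over further components.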
28
Top 2 FPCA Bases for /buut/
29
Top 2 FPCA Bases for /get/
Much more linear behavior!
30
Current Work
Now that we have parameterized the mouth, what is the model for mouth synthesis?
How is that model fit to the PCA data?