MikeTalk: An Adaptive Man-Machine Interface


1 MikeTalk: An Adaptive Man-Machine Interface
Tony Ezzat Volker Blanz Tomaso Poggio

2 TTVS Overview Input: Text
Output: Photo-realistic talking face uttering text

3 Desktop Agents

4 Desktop Agents You have received 1 email from Tommy Poggio.

5 Customer Support

6 Customer Support You have bought 20 shares of SONY at $40 each.

7 Advertisements

8 Advertisements Hi Tony, would you be interested
in a ticket from Boston to New York for $50.00?

9 Modules

10 Phoneme Corpus Step 1: collect a visual corpus from a subject.
The corpus contains 44 words, one word for each American English phoneme.

11 6 Consonantal Visemes Step 2: extract one image per phoneme (a viseme).
Group visemes together by visual similarity.

12 9 Vocalic Visemes (+ 1 Silence Viseme)

13 Problem 1: Need to Interpolate!

14 Solution: Morphing!
Simultaneous interpolation of shape & texture (Beier & Neely 1992). Problem 2: too tedious to specify correspondence by hand across many images!

15 Solution: Optical Flow
(Horn & Schunck 1986) (Lucas & Kanade 1988) To interpolate between two visemes, optical flow is first computed. A 2D motion vector field is produced: dx(x,y), dy(x,y).
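The flow computation on this slide can be sketched with a small Horn and Schunck style solver. This is a minimal NumPy-only illustration, not the implementation used in MikeTalk: the function names, the smoothness weight `alpha`, the iteration count, and the wrap-around border handling are all assumptions made for brevity.

```python
import numpy as np

def neighbor_mean(f):
    """Mean of the 4-connected neighbors (borders wrap around, for brevity)."""
    return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                   np.roll(f, 1, 1) + np.roll(f, -1, 1))

def horn_schunck(img_a, img_b, alpha=1.0, n_iter=200):
    """Estimate a dense flow (dx, dy) that carries img_a toward img_b.

    alpha (hypothetical default) weights the smoothness term; larger
    values give smoother but more damped flow fields.
    """
    a = img_a.astype(float)
    b = img_b.astype(float)
    # Spatial derivatives of the averaged image; np.gradient returns the
    # derivative along rows (y) first, then along columns (x).
    iy, ix = np.gradient(0.5 * (a + b))
    it = b - a  # temporal derivative
    dx = np.zeros_like(a)
    dy = np.zeros_like(a)
    for _ in range(n_iter):
        dx_bar = neighbor_mean(dx)
        dy_bar = neighbor_mean(dy)
        # Residual of the brightness-constancy constraint, normalized.
        t = (ix * dx_bar + iy * dy_bar + it) / (alpha**2 + ix**2 + iy**2)
        dx = dx_bar - ix * t
        dy = dy_bar - iy * t
    return dx, dy
```

Calling `horn_schunck(viseme_a, viseme_b)` on two grayscale images of equal size returns two arrays with the same shape as the inputs, one per flow component. For images scaled to [0, 1], a smaller `alpha` (for example 0.05) keeps the data term from being swamped by the smoothness term.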

16 Morphing: forward warping A to B; forward warping B to A; blending; hole filling
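The four morphing steps listed on this slide can be sketched as follows. This is an illustrative grayscale version under stated assumptions: nearest-pixel forward warping, a simple row-wise hole filler, and a linear cross-dissolve. The function names and the hole-filling strategy are hypothetical and not taken from the slides.

```python
import numpy as np

def forward_warp(img, dx, dy, scale):
    """Push each pixel along scale * (dx, dy); unreached targets stay NaN."""
    h, w = img.shape
    out = np.full((h, w), np.nan)
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + scale * dx).astype(int)
    ty = np.rint(ys + scale * dy).astype(int)
    ok = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    out[ty[ok], tx[ok]] = img[ok]  # on collisions the last write wins
    return out

def fill_holes(warped, fallback):
    """Fill NaN holes from the nearest filled pixel to the left in each row
    (or the first filled pixel if none lies to the left); rows with no
    filled pixel at all are taken from the fallback image."""
    out = warped.copy()
    for row in out:  # each row is a view into out
        valid = ~np.isnan(row)
        if not valid.any():
            continue
        idx = np.where(valid, np.arange(row.size), -1)
        np.maximum.accumulate(idx, out=idx)
        idx[idx < 0] = np.argmax(valid)
        row[:] = row[idx]
    still = np.isnan(out)
    out[still] = fallback[still]
    return out

def morph(img_a, img_b, flow_ab, flow_ba, alpha):
    """Intermediate image for alpha in [0, 1]: 0 gives A, 1 gives B.

    flow_ab and flow_ba are (dx, dy) pairs, A to B and B to A.
    """
    a_warp = fill_holes(forward_warp(img_a, *flow_ab, alpha), img_a)
    b_warp = fill_holes(forward_warp(img_b, *flow_ba, 1.0 - alpha), img_b)
    return (1.0 - alpha) * a_warp + alpha * b_warp
```

Warping both endpoints toward the same intermediate shape before blending is what keeps the cross-dissolve from producing a ghosted double image of the lips.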

17 Synthesis Database 16 visemes total
256 optical flow fields total (16 × 16), from every viseme to every other viseme

18 Concatenation and Lip Sync
Load the correct viseme transitions. Concatenate viseme transitions. Sample the viseme transitions using audio durations.
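One way to picture the sampling step is a frame scheduler that maps each video frame to a viseme transition and a morph position. The sketch below assumes that each transition lasts as long as the corresponding audio segment and that frames are sampled at a fixed rate; the function name, the `fps` parameter, and the viseme labels are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def schedule_frames(visemes, durations, fps=30.0):
    """Map video frames to (source viseme, target viseme, alpha) triples.

    visemes:   n + 1 viseme labels, in utterance order
    durations: n transition durations in seconds (taken from the audio)
    alpha is the position inside the transition, 0 at the source viseme.
    """
    if len(visemes) != len(durations) + 1:
        raise ValueError("need exactly one duration per viseme transition")
    # starts[i] is the start time of transition i; starts[-1] is the total.
    starts = [0.0] + list(accumulate(durations))
    total = starts[-1]
    frames = []
    k = 0
    while k / fps < total:
        t = k / fps
        i = bisect_right(starts, t) - 1  # transition active at time t
        alpha = (t - starts[i]) / durations[i]
        frames.append((visemes[i], visemes[i + 1], alpha))
        k += 1
    return frames
```

Each returned triple identifies which stored flow pair to load (source to target) and how far along the morph to render, so the video stays locked to the audio timeline regardless of how long each phoneme lasts.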

19 Examples “1, 2, 3, 4, 5” “you have received 10 email messages.”
“cat, dog, pig, cow, moose, horse, sheep”

20 Current Work: coarticulation; eye + head movements; emotion; 3D instead of 2D; psychophysics

21 3D With Volker Blanz

22 The End

23 Co-articulation Problem: Current method does not handle coarticulation, so speech looks overly articulated. Can record all possible triphones/quadriphones, but this approach requires a lot of data! Best method is to learn a model for coarticulation, but what is the representation for the lips?

24 Principal Components Analysis
Each image is a vector in a high-dimensional space. Using PCA, find the optimal set of vectors that span the space. Project the entire corpus onto those basis vectors.
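The three PCA steps above can be sketched in a few lines of NumPy. This is an illustrative version that obtains the basis from a singular value decomposition of the mean-centered image matrix; the function name and the return layout are assumptions.

```python
import numpy as np

def pca(images, n_components):
    """PCA on a stack of images with shape (n_images, height, width).

    Returns the mean image vector, the top basis vectors (one per row),
    and the projection coefficients of every image onto that basis.
    """
    x = images.reshape(len(images), -1).astype(float)  # one row per image
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centered @ basis.T
    return mean, basis, coeffs
```

An image is approximated from its coefficients as `mean + coeffs[i] @ basis`, so plotting the first two columns of `coeffs` over an utterance gives the kind of 2D trajectory shown on the following slides.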

25 Top 2 PCA Bases for /buut/

26 Top 2 PCA Bases for /get/
Problem: Too nonlinear!

27 Flow Component Analysis
Compute optical flow from a reference lip image to all other images in the corpus. Compute PCA on all the flows.
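A sketch of the second step, assuming the flows from the reference image have already been computed and stacked into one array. Each flow field (dx and dy together) is flattened into a single vector before PCA, and a low-dimensional coefficient vector can be turned back into a flow field. The function names and array layout are hypothetical.

```python
import numpy as np

def flow_pca(flows, n_components):
    """PCA on flow fields with shape (n_images, 2, height, width).

    Axis 1 holds dx and dy from the reference lip image to each image.
    """
    x = flows.reshape(flows.shape[0], -1).astype(float)
    mean = x.mean(axis=0)
    centered = x - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centered @ basis.T
    return mean, basis, coeffs

def flow_from_coeffs(mean, basis, coeffs, shape):
    """Rebuild one (2, height, width) flow field from its coefficients."""
    return (mean + coeffs @ basis).reshape(shape)
```

Because the parameters now describe pixel displacements rather than pixel intensities, moving along one basis direction slides the lip contours instead of cross-fading them, which is consistent with the more linear trajectories reported on the next slides.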

28 Top 2 FPCA Bases for /buut/

29 Top 2 FPCA Bases for /get/
Much more linear behavior!

30 Current Work Now that we have parameterized the mouth, what is the model for mouth synthesis? How is that model fit to the PCA data?

