Slide 1: Graphics II 91.547 Paper Reviews: Facial Animation (Session 8)
Slide 2: Making Faces
Brian Guenter, Cindy Grimm, Daniel Wood, Henrique Malvar, Frederic Pighin
Microsoft Corporation, University of Washington
Computer Graphics, Proceedings of SIGGRAPH, July 1998.
Slide 3: The Problem
- Animation of human faces is one of the most difficult problems in graphics
  - Many degrees of freedom
  - Humans are sensitive to small discrepancies
- Applications
  - Believable animated characters for movies and television
  - Avatars for VR
  - Video compression for teleconferencing
Slide 4: Approach
- A single 3D mesh model of the speaker's face is developed
- Key locations on the face are marked by colored dots
- Video of the speaking face is recorded from multiple directions
- The positions of the colored dots are tracked in 3D
- The mesh is deformed accordingly
- Texture is extracted from the video frames and applied
Slide 5: Distinctions from Previous Work
- Not a physical or procedural model of the face, as in Cassell (1994), Lee (1995), and Waters (1987)
- Williams (1990) is similar, except it used only a single static texture and tracked points only in 2D
- Bregler et al. use speech recognition techniques to locate visemes in a video sequence, then blend and synthesize new video without a physical 3D model
- Primarily a method to track and recreate facial details
Slide 6: Data Capture
- Face first digitized by a Cyberware laser scanner to build the 3D model
- 182 colored dots affixed to the face
  - Fluorescent colors used
  - Visible and UV light to maximize color differences
  - Arranged so similar colors are as far apart as possible
  - Arranged along the contours of the face
- 6 calibrated video cameras capture the motion
- Head held stationary
Slide 7: Cyberware Scanner
- Digitizes both shape and color
- 1 mm x 1 mm resolution
- 30 seconds to complete a scan
- 3D model available within seconds of completing the scan
Slide 8: Camera Views
Slide 9: Mesh Generation
- Cyberware scan of the head
- 460,000 polygons reduced to 4800 using Hoppe's method (Hoppe 1996)
- Some manual adjustment where the scan was inadequate
  - One side of the nose and ear
  - Inside the mouth
Slide 10: Training the Color Classifier
Slide 11: Determination of Dot Centroids
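As a rough illustration of this step, the sketch below (Python, not from the paper) assumes each frame has already been run through the trained color classifier to produce a per-pixel label image; connected components of each label are then reduced to sub-pixel centroids. The `scipy.ndimage.label` call and the small-component noise threshold are illustrative choices.

```python
import numpy as np
from scipy import ndimage

def dot_centroids(label_image, num_colors):
    """Estimate sub-pixel centroids of colored dots in one video frame.

    label_image: 2D integer array where each pixel holds the color class
                 assigned by the trained classifier (0 = background).
    Returns a list of (color_class, (row, col)) centroid entries.
    """
    centroids = []
    for color in range(1, num_colors + 1):
        mask = (label_image == color)
        # Separate dots of the same color into connected components.
        components, count = ndimage.label(mask)
        for comp_id in range(1, count + 1):
            ys, xs = np.nonzero(components == comp_id)
            if len(ys) < 4:          # discard classifier noise
                continue
            centroids.append((color, (ys.mean(), xs.mean())))
    return centroids
```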
Slide 12: Tracking Dot Movements over Time
Slide 13: Matching Dots Between Cameras
[Figure: rays from the Camera 1 and Camera 2 centers of projection (COPs) through corresponding dots in Image 1 and Image 2; the line of closest approach between the rays is short, so the dots match]
Slide 14: Matching Dots Between Cameras
[Figure: the same construction, but the line of closest approach between the rays is long, so the dots do not match]
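These two slides suggest the usual geometric test for this kind of stereo correspondence: back-project each candidate dot as a 3D ray from its camera's center of projection and accept the pair only if the segment of closest approach between the rays is short. A minimal sketch, with an assumed distance tolerance:

```python
import numpy as np

def closest_approach(p1, d1, p2, d2):
    """Return the midpoint and length of the shortest segment joining
    two rays p1 + s*d1 and p2 + t*d2 (d1, d2 are unit directions)."""
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if abs(denom) < 1e-9:            # nearly parallel rays
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    q1, q2 = p1 + s * d1, p2 + t * d2
    return 0.5 * (q1 + q2), np.linalg.norm(q1 - q2)

def dots_match(cop1, ray1, cop2, ray2, tol=2.0):
    """Two image dots correspond if their back-projected rays nearly intersect;
    the midpoint then serves as the reconstructed 3D dot position."""
    point_3d, gap = closest_approach(cop1, ray1, cop2, ray2)
    return (gap < tol), point_3d
```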
Slide 15: Determining Frame-to-Frame Dot Correspondence
Slide 16: Mesh Deformation (Step 1)
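The slide itself carries no detail, so the following is only a generic sketch of the idea stated on slide 4 (the mesh is deformed to follow the tracked dots): each vertex is moved by a distance-weighted blend of the displacements of its nearest dots. The inverse-distance weighting and neighbor count are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

def deform_mesh(rest_vertices, rest_dots, moved_dots, k=4):
    """Move each vertex by a distance-weighted blend of the displacements
    of its k nearest tracked dots (inverse-distance weighting).

    rest_vertices: (V, 3) mesh vertices in the neutral pose
    rest_dots:     (D, 3) dot positions in the neutral pose
    moved_dots:    (D, 3) dot positions in the current frame
    """
    offsets = moved_dots - rest_dots                      # (D, 3)
    deformed = rest_vertices.copy()
    for i, v in enumerate(rest_vertices):
        dist = np.linalg.norm(rest_dots - v, axis=1)
        nearest = np.argsort(dist)[:k]
        w = 1.0 / (dist[nearest] + 1e-6)                  # inverse-distance weights
        w /= w.sum()
        deformed[i] = v + w @ offsets[nearest]
    return deformed
```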
Slide 17: Areas Treated Specially in the Frame-to-Frame Mapping
Slide 18: Creating the Texture Maps
Slide 19: Removal of Dots from the Texture Images
Slide 20: Texture Mapping
Slide 21: Results
- Animated sequences from arbitrary viewpoints are surprisingly realistic
- Little evidence of the removed dots
- Some artifacts visible
  - Polygonalization artifacts due to the use of only 4500 polygons
  - Occasional jitter due to the algorithms chosen and to dots disappearing from view in more than three of the cameras
  - Teeth look incorrect when viewed from a glancing angle
  - Shading and highlights do not move correctly with the viewpoint
Slide 22: Resulting Animation Sequence
Slide 23: Synthesizing Realistic Facial Expressions from Photographs
Frederic Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, David H. Salesin
Computer Graphics, Proceedings of SIGGRAPH, July 1998.
Slide 24: Approach
- Based upon multiple, uncalibrated views of a human subject
- A user-assisted optimization technique recovers camera parameters and the 3D locations of selected feature points
- A scattered data interpolation technique deforms a generic face mesh to correspond to the subject's face
- One or more texture maps are extracted from the camera images
- The process is repeated for multiple expressions of the subject
  - Morphing techniques animate the change of expression
  - Expressions are combined globally and locally to generate new expressions
Slide 25: Extraction of 3D Points from Multiple Images
- A few distinct feature points (13 in this case) are selected on each image
- An approximate starting point is assumed
  - A generic head model gives the starting 3D points
  - Camera pose parameters are approximated
- Progressive refinement via the algorithm of Szeliski and Kang (1997)
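The refinement drives down how far the projected 3D feature estimates fall from the features actually marked in the images. A hedged sketch of that reprojection error under a simple pinhole camera model (not the Szeliski-Kang algorithm itself, just the quantity it minimizes):

```python
import numpy as np

def reprojection_error(points_3d, observations, rotations, translations, focals):
    """Sum of squared distances between observed 2D features and the
    projections of the current 3D point estimates (simple pinhole model).

    points_3d:    (P, 3) current estimates of the feature points
    observations: (C, P, 2) measured feature locations in each of C images
    rotations:    list of C (3, 3) camera rotation matrices
    translations: list of C (3,) camera translations
    focals:       list of C focal lengths
    """
    error = 0.0
    for R, t, f, obs in zip(rotations, translations, focals, observations):
        cam = points_3d @ R.T + t            # points in camera coordinates
        proj = f * cam[:, :2] / cam[:, 2:3]  # perspective projection
        error += np.sum((proj - obs) ** 2)
    return error
```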
Slide 26: Refinement of the Generic Mesh
- The recovered feature point coordinates serve as the starting constraints
- A smooth interpolation function gives the 3D displacement for all other mesh vertices
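A minimal sketch of scattered data interpolation with radial basis functions: fit weights so the interpolant reproduces the feature-point displacements exactly, then evaluate it at every other vertex. The exponential kernel and the omission of any affine term are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def rbf_displacement_field(feature_rest, feature_target, kernel_width=5.0):
    """Fit a radial-basis-function interpolant to the displacements of the
    constrained feature points, then return a function that evaluates the
    smooth 3D displacement at any other mesh vertex."""
    displacements = feature_target - feature_rest                  # (F, 3)
    diffs = feature_rest[:, None, :] - feature_rest[None, :, :]
    phi = np.exp(-np.linalg.norm(diffs, axis=2) / kernel_width)    # (F, F)
    weights = np.linalg.solve(phi, displacements)                  # (F, 3)

    def displace(vertex):
        r = np.linalg.norm(feature_rest - vertex, axis=1)
        return np.exp(-r / kernel_width) @ weights

    return displace

# Each generic-mesh vertex v is then moved to v + displace(v).
```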
Slide 27: View-Independent Texture Mapping
- Texture map defined in cylindrical coordinates
- A visibility map is constructed for each image
- For each texel (u,v):
  - Compute the 3D point on the surface of the face mesh whose cylindrical projection is (u,v)
  - Project back using the camera geometry to determine (x,y) in each image
  - Weight each image's contribution by a positional certainty and visibility function
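A sketch of the two mappings involved, assuming a simple vertical-axis cylindrical parameterization and a hypothetical `cam.project` camera model; neither is taken from the paper.

```python
import numpy as np

def cylindrical_uv(point, axis_origin=np.zeros(3)):
    """Map a 3D point on the head to cylindrical texture coordinates:
    u = angle around the vertical (y) axis, v = height along it."""
    x, y, z = point - axis_origin
    u = (np.arctan2(z, x) + np.pi) / (2.0 * np.pi)   # 0..1 around the head
    v = y                                            # height, rescaled to 0..1 later
    return u, v

def texel_color(point_3d, images, cameras, weights):
    """Blend the color of one texel from every image in which its surface
    point is visible, weighted by positional certainty * visibility."""
    total, wsum = np.zeros(3), 0.0
    for img, cam, w in zip(images, cameras, weights):
        x, y = cam.project(point_3d)     # hypothetical camera projection method
        total += w * img[int(y), int(x)]
        wsum += w
    return total / max(wsum, 1e-9)
```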
Slide 28: View-Dependent Texture Mapping
- The view-independent method blends resampled images and thus potentially loses detail
- Alternate method:
  - Render the model several times, each time with a separate texture map
  - Blend the results, weighted according to the viewing direction
- Advantage: makes up for a potential lack of detail in the geometric model
- Disadvantage: requires multi-pass rendering and more rendering time
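A hedged sketch of how per-image blending weights could follow the viewing direction: weight each source image by the agreement between its capture direction and the current view. The cosine falloff and its exponent are illustrative choices, not the paper's exact weighting.

```python
import numpy as np

def view_dependent_weights(view_dir, camera_dirs, power=8.0):
    """Weight each captured image by how closely its camera direction agrees
    with the current viewing direction; larger exponents sharpen the falloff."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    weights = []
    for d in camera_dirs:
        cos = max(0.0, view_dir @ (d / np.linalg.norm(d)))
        weights.append(cos ** power)
    w = np.array(weights)
    return w / max(w.sum(), 1e-9)

# The model is rendered once per texture map and the renderings are
# composited with these weights.
```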
Slide 29: Comparison of View-Independent and View-Dependent Texture Generation
[Figure: side-by-side renderings, view independent vs. view dependent]
Slide 30: Expression Morphing
- Because the models for differing expressions share the same topology, morphing reduces to simple linear interpolation of vertex locations
- The face model is rendered once with each of the two textures, and the resulting images are blended
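Because the geometric half of the morph is just vertex interpolation, it fits in a few lines; a minimal sketch:

```python
import numpy as np

def morph_expression(verts_a, verts_b, alpha):
    """Linear interpolation of vertex locations between two expressions
    that share the same mesh topology (alpha = 0 gives A, alpha = 1 gives B)."""
    return (1.0 - alpha) * verts_a + alpha * verts_b

# The model is rendered once with expression A's texture and once with
# expression B's texture, and the two renderings are blended with the
# same alpha to complete the morph.
```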
Slide 31: Global vs. Regional Expression Changes
[Figure: expressions "Neutral", "Happy", "Sad", "Surprised", "Worried", "Fake Smile", combined by a global blend and by a local blend]
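A small sketch of how a local blend could be restricted to one face region with a per-vertex mask; the mask itself is an assumption, and a global blend is the special case of a mask that is 1 everywhere.

```python
import numpy as np

def regional_blend(base_verts, expr_verts, region_mask, alpha=1.0):
    """Apply an expression only within a face region.

    region_mask: (V,) per-vertex weights in [0, 1], e.g. 1 around the mouth
                 and falling smoothly to 0 elsewhere.
    """
    w = alpha * region_mask[:, None]
    return (1.0 - w) * base_verts + w * expr_verts
```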
Slide 32: Painterly Interface
Slide 33: User Interface
Slide 34: Voice Puppetry
Matthew Brand, Mitsubishi Electric Research Laboratory
Computer Graphics, Proceedings of ACM SIGGRAPH, August 1999.
Slide 35: Background
- Significant mutual information between vocal and facial gesture
  - Facial expression adds significantly to the information content of speech
  - Facial expression conveys emotional content
- Synthesis of believable faces is a difficult problem in animation
  - Humans are sensitive to facial details
  - A central problem for the creation of avatars in virtual environments
Slide 36: Previous Work
- Manual lip-synching is a laborious process
- Semi-automated systems are usually based on phonemes
  - Phonemic tokens are mapped to lip poses
  - Problem of coarticulation, i.e. the interaction of adjacent speech components
  - Some systems deal with context using triphonemes (Berger et al. 1997)
  - Discretizing to phonemes loses information on phrasing and dynamics
- Most systems ignore upper facial expressions, which convey a significant portion of the emotional content
Slide 37: Proposed System
- A model of a face's observed dynamics during speech is learned
- A mapping from vocal patterns to facial motion trajectories is learned
- Animation is produced by using voice information to steer the model
[Figure: training video and training voice input feed the model; at synthesis time, voice information drives facial motion]
Slide 38: Advantages
- The system makes full use of forward and backward context in the speech
- A single video training session can be used to animate other persons or creatures to novel audio
- Animates speech and non-speech sounds
- Predicts full facial motion, from neck to hairline
- Can drive 2D, 3D, or image-based animations
Slide 39: Overview of the Process
[Figure: pipeline stages: Training, Remapping, Analysis, Synthesis]
Slide 40: Facial Tracking
- 26 facial points tracked
- Tracking based upon texture features
- Markers added where texture variation was inadequate
Slide 41: Reuse of the Hidden Markov Model State Machine
Slide 42: Mean Position Content for the Learned States
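Once new audio has been decoded into a state sequence, a facial trajectory can be read off from these per-state mean positions. The sketch below simply looks up the means and smooths them over time; it is a stand-in for the paper's trajectory synthesis, which predicts motion from the learned facial dynamics rather than by simple smoothing.

```python
import numpy as np

def trajectory_from_states(state_sequence, state_means, smoothing=5):
    """Turn a decoded state sequence into a facial-point trajectory by
    looking up each state's mean positions and smoothing over time.

    state_sequence: (T,) integer state index per audio frame
    state_means:    (S, P, 2) mean tracked-point positions per state
    """
    raw = state_means[state_sequence].astype(float)    # (T, P, 2)
    kernel = np.ones(smoothing) / smoothing
    smooth = np.empty_like(raw)
    for p in range(raw.shape[1]):
        for c in range(2):
            smooth[:, p, c] = np.convolve(raw[:, p, c], kernel, mode="same")
    return smooth
```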
Slide 43: Results
- Multiple readers of children's stories were recorded for 180 seconds
  - 60 seconds of video selected for training
  - 25 facial features tracked
  - Result: a 26-state model
- Using a different voice, the model was used to animate Mt. Rushmore (see next slide)
- A simple test was conducted to evaluate the opinions of naive users
  - 6 subjects compared synthesized facial motion with the actual tracked motion
  - Half found the actual motion "more natural"
  - Half found the synthetic motion "more natural"
Slide 44: Animation of Jefferson at Mt. Rushmore
[Figure: unanimated vs. animated]