Facial Animation
By: Shahzad Malik
CSC2529 Presentation
March 5, 2003

Motivation
- Realistic human facial animation is a challenging problem (many degrees of freedom, complex deformations)
- Would like expressive and plausible animations of a photorealistic 3D face
- Useful for virtual characters in film and video games

Papers
- Three major areas: Lip Syncing, Face Modeling, and Expression Synthesis
- Video Rewrite (Bregler – SIGGRAPH 1997)
- Making Faces (Guenter – SIGGRAPH 1998)
- Expression Cloning (Noh – SIGGRAPH 2001)

Video Rewrite
- Generate a new video of an actor mouthing a new utterance by piecing together old footage
- Two stages: Analysis and Synthesis

Analysis Stage
- Given footage of the subject speaking, extract mouth position and lip shape
- Hand-label 26 training images:
  - 34 points on the mouth (20 outer boundary, 12 inner boundary, 1 at bottom of upper teeth, 1 at top of lower teeth)
  - 20 points on the chin and jaw line
- Morph the training set to grow it to 351 images

EigenPoints
- Create EigenPoints models from this training set
- Use the derived EigenPoints model to label features in all frames of the training video

EigenPoints (continued)
- Problem: EigenPoints assumes features are undergoing pure translation

Face Warping
- Before EigenPoints labeling, warp each image into a reference plane
- Use a minimization algorithm to register the images

Face Warping (continued)
- Use rigid parts of the face to estimate the warp M
- Warp the face by M^-1
- Perform EigenPoints analysis
- Back-project the features by M onto the face

Audio Analysis
- Want to capture the visual dynamics of speech
- Phonemes are not enough: must consider coarticulation
- Lip shapes for many phonemes are modified by the phoneme's context (e.g. /T/ in "beet" vs. /T/ in "boot")

Audio Analysis (continued)
- Segment speech into triphones
  - e.g. "teapot" becomes /SIL-T-IY/, /T-IY-P/, /IY-P-AA/, /P-AA-T/, and /AA-T-SIL/
- Emphasize the middle of each triphone
- Effectively captures forward and backward coarticulation
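To make the segmentation step concrete, here is a minimal sketch (not the paper's code) that turns a phoneme transcript into overlapping triphones, padding with silence at both ends:

```python
def to_triphones(phonemes):
    """Pad with silence and emit overlapping (prev, center, next) triples."""
    padded = ["SIL"] + list(phonemes) + ["SIL"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

# "teapot" = /T IY P AA T/ ->
# [('SIL','T','IY'), ('T','IY','P'), ('IY','P','AA'),
#  ('P','AA','T'), ('AA','T','SIL')]
print(to_triphones(["T", "IY", "P", "AA", "T"]))
```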

Audio Analysis (continued)
- Training footage audio is labeled with phonemes and their timing
- Use gender-specific HMMs for segmentation
- Convert the transcript into triphones

Synthesis Stage
- Given some new speech utterance:
  - Mark it with phoneme labels
  - Determine the triphones
  - Find a video example with the desired transition in the database
- Compute a matching distance to each triphone:
  error = α·D_p + (1 − α)·D_s

Viseme Classes
- Cluster phonemes into viseme classes
- Use 26 viseme classes (10 consonant, 15 vowel, plus silence):
  (1) /CH/, /JH/, /SH/, /ZH/
  (2) /K/, /G/, /N/, /L/
  …
  (25) /IH/, /AE/, /AH/
  (26) /SIL/

Phoneme Context Distance
- D_p is the phoneme context distance:
  - Distance is 0 if the phonemic categories are the same (e.g. /P/ and /P/)
  - Distance is 1 if the viseme classes are different (e.g. /P/ and /IY/)
  - Distance is between 0 and 1 for different phonemic classes in the same viseme class (e.g. /P/ and /B/)
- Compute over the entire triphone, weighting the center phoneme most
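A hedged sketch of D_p follows. The toy viseme table and the flat 0.5 penalty for same-class phonemes are illustrative assumptions; the slide only fixes the 0 / (0, 1) / 1 structure and the center-weighted sum.

```python
# Toy viseme table (illustrative; the real system uses all 26 classes).
VISEME_CLASS = {"P": 1, "B": 1, "M": 1, "IY": 2, "T": 3, "D": 3}

def phoneme_distance(p, q):
    if p == q:
        return 0.0                       # same phonemic category
    cp, cq = VISEME_CLASS.get(p), VISEME_CLASS.get(q)
    if cp is not None and cp == cq:
        return 0.5                       # same viseme class (assumed penalty)
    return 1.0                           # different viseme classes

def context_distance(tri_a, tri_b, weights=(0.25, 0.5, 0.25)):
    """D_p over a triphone, weighting the center phoneme most."""
    return sum(w * phoneme_distance(a, b)
               for w, a, b in zip(weights, tri_a, tri_b))

print(context_distance(("T", "IY", "P"), ("D", "IY", "B")))  # 0.25
```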

Lip Shape Distance
- D_s is the distance between lip shapes in overlapping triphones
  - e.g. for "teapot", the contours for /IY/ and /P/ should match between /T-IY-P/ and /IY-P-AA/
- Compute the Euclidean distance between 4-element vectors (lip width, lip height, inner lip height, height of visible teeth)
- The best solution depends on neighbors in both directions, so use dynamic programming
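A companion sketch for D_s and the combined matching error; the feature values and α = 0.5 are made-up inputs, and the dynamic-programming search over whole sequences is omitted:

```python
import math

def lip_shape_distance(shape_a, shape_b):
    """Euclidean distance between 4-element lip vectors:
    (lip width, lip height, inner lip height, visible-teeth height)."""
    return math.dist(shape_a, shape_b)

def matching_error(d_p, d_s, alpha=0.5):
    """error = alpha * D_p + (1 - alpha) * D_s"""
    return alpha * d_p + (1.0 - alpha) * d_s

d_s = lip_shape_distance((40.0, 18.0, 6.0, 3.0), (42.0, 17.0, 6.5, 2.0))
print(matching_error(d_p=0.25, d_s=d_s))
```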

Time Alignment of Triphone Videos
- Need to combine the triphone videos
- Choose the portion of overlapping triphones where the lip shapes are as close as possible
- Already computed as part of D_s

Time Alignment to Utterance
- Still need to time-align with the target audio
- Compare the corresponding phoneme transcripts
- The start time of the center phoneme in each triphone is aligned with its label in the target transcript
- The video is then stretched/compressed to fit the time needed between target phoneme boundaries

Combining Lips and Background
- Need to stitch the new mouth movie into the original background face sequence
- Compute the transform M as before
- A warping replacement mask defines the mouth and background portions in the final video
(Figures: mouth mask, background mask)

Combining Lips and Background (continued)
- The mouth shape comes from the triphone image, warped by M
- The jaw shape is a combination of the background jaw and triphone jaw lines
  - Near the ears the jaw depends on the background; near the chin it depends on the mouth
- Illumination matching is used to avoid seams between the mouth and background

Video Rewrite Results ("Emily" sequences)
- Video: 8 minutes of footage, 109 sentences
- Training data: front-facing segments of the video, around 1700 triphones

Video Rewrite Results (JFK sequences)
- 2 minutes of video, 1157 triphones

Video Rewrite
- Image-based facial animation system driven by audio
- Output sequence created from real video
- Allows natural facial movements (eye blinks, head motions)

Making Faces
- Captures facial expressions in 3D from a video sequence
- Provides a 3D model and texture that can be rendered on 3D hardware

Data Capture
- The actor's face is digitized using a Cyberware scanner to get a base 3D mesh
- Six calibrated video cameras capture the actor's expressions
(Figure: the six camera views)

Data Capture (continued)
- 182 dots are glued to the actor's face
- Each dot is one of six colors of fluorescent pigment
- Dots of the same color are placed as far apart as possible
- Dots follow the contours of the face (eyes, lips, nasolabial furrows, etc.)

Dot Labeling
- Each dot needs a unique label
- The dots will be used to warp the 3D mesh, and later for texture generation from the six views
- For each frame in each camera:
  - Classify each pixel as belonging to one of the six color classes
  - Find connected components
  - Compute each component's centroid
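As a sketch of the per-frame detection loop (assuming a per-pixel color classifier has already produced a class map), the connected components and centroids can be computed with SciPy:

```python
import numpy as np
from scipy import ndimage

def detect_dots(class_map, color_id):
    """class_map: HxW array of per-pixel color-class ids (0 = skin).
    Returns the centroids of connected blobs of the given color."""
    mask = (class_map == color_id)
    labels, n = ndimage.label(mask)          # connected components
    return ndimage.center_of_mass(mask, labels, list(range(1, n + 1)))

# Toy frame with two dots of color class 3:
frame = np.zeros((8, 8), dtype=int)
frame[1:3, 1:3] = 3
frame[5:7, 5:7] = 3
print(detect_dots(frame, 3))                 # [(1.5, 1.5), (5.5, 5.5)]
```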

Dot Labeling (continued)
- Need to compute dot correspondences between camera views
- Must handle occlusions and false matches
- Compute all point correspondences between the k cameras and n 2D dots
(Figure: point correspondences)

Dot Labeling (continued)
- For each correspondence:
  - Triangulate a 3D point based on the closest intersection of the rays cast through the 2D dots
  - Check the back-projection error against a threshold
- All 3D candidates below the threshold are stored
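A sketch of the triangulation, assuming calibrated cameras give a ray (origin, direction) per 2D dot. The least-squares closest point to all rays stands in for the paper's solver, and point-to-ray distance stands in for the image-space back-projection check:

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares closest 3D point to k rays (origins, directions: (k,3))."""
    A, b = np.zeros((3, 3)), np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

def max_ray_error(point, origins, directions):
    """Largest distance from the point to any ray (threshold this)."""
    errs = []
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        v = point - o
        errs.append(np.linalg.norm(v - (v @ d) * d))
    return max(errs)
```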

Dot Labeling (continued)
- Project the stored 3D points into a reference view
- Keep points that are within 2 pixels of dots in the reference view
- These points are potential 3D matches for a given 2D dot
- Compute their average as the final 3D position, and assign it to the 2D dot in the reference view

Dot Labeling (continued)
- Need to assign consistent labels to the 3D dot locations across the entire sequence
- Define a reference set of dots D (frame 0)
- Let d_j ∈ D be the neutral location for dot j; the position of d_j at frame i is d_j^i = d_j + v_j^i
- For each reference dot, find the closest 3D dot of the same color within some distance ε
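A minimal sketch of that matching step; it assumes the caller supplies the predicted reference positions d_j + v_j^i and per-dot colors:

```python
import numpy as np

def match_dots(ref_pos, ref_color, cur_pos, cur_color, eps):
    """Nearest same-color dot within eps; returns {ref_index: cur_index}."""
    matches = {}
    for j, (p, c) in enumerate(zip(ref_pos, ref_color)):
        p = np.asarray(p)
        candidates = [i for i, cc in enumerate(cur_color) if cc == c]
        if not candidates:
            continue
        dists = [np.linalg.norm(np.asarray(cur_pos[i]) - p) for i in candidates]
        k = int(np.argmin(dists))
        if dists[k] < eps:
            matches[j] = candidates[k]
    return matches
```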

Moving the Dots
- Move each matched reference dot to its matched location
- For an unmatched reference dot d_k, let n_k be the set of neighboring dots that do have a match in the current frame i; d_k is then moved by a blend of its neighbors' offsets

Constructing the Mesh
- The Cyberware scan has problems:
  - Fluorescent markers cause bumps on the mesh
  - No mouth opening
  - Too many polygons
- Bumps are removed manually
- Split the mouth polygons; add teeth and tongue polygons
- Run a mesh simplification algorithm (Hoppe's algorithm: 460k down to 4800 polygons)

Moving the Mesh
- Move each vertex by a linear combination of the offsets of its nearest dots
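In code, this slide amounts to a weighted sum; the blend coefficients come from the grid-based assignment on the next slides (the values here are made up):

```python
import numpy as np

def move_vertex(rest_position, dot_offsets, alphas):
    """rest_position: (3,); dot_offsets: (n,3) per-dot displacements for
    this frame; alphas: (n,) blend coefficients (mostly zero)."""
    return np.asarray(rest_position) + np.asarray(alphas) @ np.asarray(dot_offsets)

print(move_vertex([0.0, 0.0, 0.0],
                  [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
                  [0.5, 0.25]))               # [0.5 0.5 0. ]
```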

Assigning Blend Coefficients
- Assign blend coefficients for a grid of 1400 evenly distributed points on the face

Assigning Blend Coefficients (continued)
- Label each dot, vertex, and grid point as above, below, or neither (relative to the mouth)
- Find the two closest dots to each grid point p, at distances d_1 and d_2
- D_n is the set of dots within 1.8 · (d_1 + d_2)/2 of p
- Remove dots lying in roughly the same direction
- Assign blend values based on distance from p

Assigning Blend Coefficients (continued)
- If a dot is not in D_n, its blend coefficient is 0
- If a dot is in D_n, it receives a distance-based blend value
- For mesh vertices, find the closest grid points and copy their blend coefficients

Dot Removal
- Substitute skin color for the dot colors
- First low-pass filter the image
- A directional filter prevents color bleeding (black = symmetric, white = directional in the mask)
(Figures: face with dots, low-pass mask)

Dot Removal (continued)
- Extract a rectangular patch of dot-free skin
- High-pass filter this patch
- Register the patch to the center of the dot regions
- Blend it with the low-frequency skin
- Clamp hue values to a narrow range
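A sketch of the frequency-split fill for a single grayscale channel, using a plain Gaussian low-pass; the directional filtering near the lips, the patch registration, and the hue clamp described above are omitted or simplified here:

```python
import numpy as np
from scipy import ndimage

def fill_dot_region(image, dot_mask, skin_patch, sigma=3.0):
    """Replace pixels under dot_mask with low-pass skin plus the
    high-pass detail of a clean skin patch (tiled, not registered).
    Assumes float-valued arrays."""
    low = ndimage.gaussian_filter(image, sigma)
    patch_high = skin_patch - ndimage.gaussian_filter(skin_patch, sigma)
    reps = (image.shape[0] // skin_patch.shape[0] + 1,
            image.shape[1] // skin_patch.shape[1] + 1)
    tiled = np.tile(patch_high, reps)[:image.shape[0], :image.shape[1]]
    out = image.copy()
    out[dot_mask] = low[dot_mask] + tiled[dot_mask]
    return out
```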

Dot Removal (continued)
(Figures: original, low-pass, high-pass, and hue-clamped images)

Texture Generation
- A texture map is generated for each frame
- Project the mesh onto a cylinder
- Compute the mesh location (k, β_1, β_2) for each texel (u, v)
- For each camera, transform the mesh into the camera's view
- For each texel, get its 3D coordinates on the mesh
- Project the 3D point to the camera plane
- Get the color at (x, y) and store it as the texel color

Texture Generation (continued)
- Compute a texel weight (the dot product between the texel's normal on the mesh and the direction to the camera)
- Merge the texture maps from all the cameras based on the weight map
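A minimal sketch of the weighting and merge; normals and view directions are assumed unit-length, and occlusion handling is omitted:

```python
import numpy as np

def texel_weight(normal, to_camera):
    """Clamped dot product between the texel's surface normal and the
    direction from the surface point to the camera."""
    return max(0.0, float(np.dot(normal, to_camera)))

def merge_textures(colors, weights):
    """colors: (k,H,W,3) per-camera texture maps; weights: (k,H,W)."""
    w = np.asarray(weights, dtype=float)[..., None]  # (k,H,W,1)
    total = np.maximum(w.sum(axis=0), 1e-8)          # avoid divide-by-zero
    return (np.asarray(colors) * w).sum(axis=0) / total
```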

Results

Making Faces Summary
- Produces 3D geometry and texture of a face, generated for each frame of video
- Shading and highlights are "glued" to the face
- Nice results, but not fully automated
- Need to repeat the entire process for every new face we want to animate

Expression Cloning
- Allows facial expressions to be mapped from one model to another
(Figure: source model → animation → target model)

Expression Cloning Outline
- Source model + motion capture data (or any animation mechanism) → source animation
- Dense surface correspondences deform the source model onto the target model
- Motion transfer maps the source animation's vertex displacements onto the target, producing the cloned expressions (target animation)

Source Animation Creation
- Use any existing facial animation method, e.g. the "Making Faces" system described earlier, or motion capture data
(Figure: source model and source animation)

Dense Surface Correspondence
- Manually select initial correspondences
- Morph the source model using radial basis functions (RBFs)
- Perform a cylindrical projection (ray through each source vertex into the target mesh)
- Compute the barycentric coordinates of each intersection
(Figures: initial features, after RBF, after projection)
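As a sketch of the RBF morph (the kernel choice and SciPy's RBFInterpolator are stand-ins for the paper's own radial basis fit), one interpolant maps the source features onto the target features and is then applied to every source vertex:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def morph_source(source_vertices, src_features, tgt_features):
    """source_vertices: (n,3); src/tgt_features: (m,3) correspondences.
    Returns the deformed source vertices (approximating the target)."""
    rbf = RBFInterpolator(np.asarray(src_features),
                          np.asarray(tgt_features),
                          kernel="thin_plate_spline")
    return rbf(np.asarray(source_vertices))
```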

Automatic Feature Selection
- Can also automate the initial correspondences
- Use basic facts about human face geometry:
  - Tip of nose = point with the highest Z value
  - Top of head = point with the highest Y value
- Currently uses around 15 such heuristic rules
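Two of those rules in sketch form; the axis conventions (Z toward the viewer, Y up) are assumptions about the models' coordinate frame:

```python
import numpy as np

def pick_features(vertices):
    """vertices: (n,3) array; returns indices of heuristic landmarks."""
    v = np.asarray(vertices)
    return {
        "nose_tip": int(np.argmax(v[:, 2])),  # highest Z value
        "head_top": int(np.argmax(v[:, 1])),  # highest Y value
    }
```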

Example Deformations
- The deformed source closely approximates the target models
(Figures: source, deformed source, target)

Animation with Motion Vectors
- Animate by displacing each target vertex by the motion of its corresponding source point
- Interpolate using the barycentric coordinates of target vertices relative to the source mesh
- Need to project the target model onto the source model (the opposite of what we did before)

Motion Vector Transfer
- Need to adjust both the direction and the magnitude of each motion vector
(Figure: source motion vector vs. the proper target motion vector)

Motion Vector Transfer (continued)
- Attach a local coordinate system to each vertex in the source and deformed source:
  - X-axis = average normal
  - Y-axis = projection of any adjacent edge onto the plane with the X-axis as normal
  - Z-axis = cross product of X and Y
(Figure: local X, Y, Z frames on the source and deformed source)
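A hedged sketch of building the local frame and re-expressing a motion vector in the deformed model's frame; the magnitude adjustment is omitted, and the inputs are assumed to be NumPy vectors:

```python
import numpy as np

def local_frame(avg_normal, adjacent_edge):
    """Rows of the returned 3x3 matrix are the X, Y, Z axes:
    X = average normal; Y = edge projected onto the plane normal to X;
    Z = X cross Y."""
    x = avg_normal / np.linalg.norm(avg_normal)
    y = adjacent_edge - (adjacent_edge @ x) * x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)
    return np.stack([x, y, z])

def transfer_motion(src_frame, deformed_frame, src_motion):
    """Express the motion in source-local coordinates, then re-emit it
    in the deformed source's frame (direction adjustment only)."""
    local = src_frame @ src_motion       # world -> source-local
    return deformed_frame.T @ local      # local -> deformed world
```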

Motion Vector Transfer (continued)
- Compute the transformation between the two coordinate systems
- This mapping determines the deformed source model's motion vectors

Example Motion Transfer
(Figures: models, source and target motion vectors, and adjusted motions — the transferred vectors become more horizontal and smaller on the target)

Results
(Figures: source and targets — angry expression, big open mouth, distorted mouth)

Summary
- Expression cloning (EC) can animate new models using a library of existing expressions
- Transfers motion vectors from source animations to target models
- The process is fast and can be fully automated