Face Animation Overview with Shameless Bias Toward MPEG-4 Face Animation Tools
Dr. Eric Petajan, Chief Scientist and Founder, face2face animation, inc.
eric@f2f-inc.com


Computer-generated Face Animation Methods
- Morph targets / key frames (traditional)
- Speech articulation model (TTS)
- Facial Action Coding System (FACS)
- Physics-based (skin and muscle models)
- Marker-based (dots glued to face)
- Video-based (surface features)

Morph targets / key frames
Advantages
- Complete manual control of each frame
- Good for exaggerated expressions
Disadvantages
- Hard to achieve good lipsync without manual tweaking
- Morph targets must be downloaded to the terminal for streaming animation (delay)

Speech articulation model
Advantages
- High-level control of the face
- Enables TTS
Disadvantages
- Robotic character
- Hard to sync with a real voice

Facial Action Coding System
Advantages
- Very high-level control of the face
- Maps to morph targets
- Explicit specification of emotional states
Disadvantages
- Not good for speech
- Not quantified

Physics-based
Advantages
- Good for realistic skin, muscle and fat
- Collision detection
Disadvantages
- High complexity
- Must be driven by high-level articulation parameters (TTS)
- Hard to drive with motion capture data

Marker-based
Advantages
- Can provide accurate motion data from most of the face
- Face models can be animated directly from surface feature point motion
Disadvantages
- Dots glued to face
- Dots must be manually registered
- Not good for accurate inner lip contour or eyelid tracking

Video-based
Advantages
- Simple to capture video of the face
- Face models can be animated directly from surface feature motion
Disadvantages
- Must have a good view of the face

What is MPEG-4 Multimedia?
- Natural audio and video objects
- 2D and 3D graphics (based on VRML)
- Animation (virtual humans)
- Synthetic speech and audio

Samples versus Objects
- Traditional video coding is sample based (blocks of pixels are compressed)
- MPEG-4 provides visual object representation for better compression and new functionalities
- Objects are rendered in the terminal after decoding object descriptors

Object-based Functionalities
- The user can choose which content layers to display
- Individual objects (text, models) can be searched or stored for later use
- Content is independent of display resolution
- Content can be easily repurposed by the provider for different networks and users

MPEG-4 Object Composition
- Objects are organized in a scene graph
- Scene graphs are specified using a binary format called BIFS (based on VRML)
- Both 2D and 3D objects, properties and transforms are specified in BIFS
- BIFS allows objects to be transmitted once and instanced repeatedly in the scene after transformations
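The transmit-once, instance-repeatedly idea above can be sketched in plain Python. This is a minimal illustration, not the BIFS binary format: the `Mesh`, `Transform`, and `Scene` classes are hypothetical stand-ins for scene-graph nodes, showing how one decoded object can appear under several transforms without duplicating geometry.

```python
# Hypothetical scene-graph sketch (not BIFS itself): one mesh object is
# decoded once and instanced under several transforms, mirroring how
# BIFS lets a node be transmitted once and reused in the scene.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Mesh:
    name: str
    vertices: List[Tuple[float, float, float]]

@dataclass
class Transform:
    translation: Tuple[float, float, float]
    child: Mesh  # the SAME Mesh object may appear under many Transforms

@dataclass
class Scene:
    nodes: List[Transform] = field(default_factory=list)

# One mesh, decoded once...
face = Mesh("face", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])

# ...instanced three times at different positions.
scene = Scene([Transform((x, 0.0, 0.0), face) for x in (0.0, 2.0, 4.0)])

# All instances share the single decoded mesh (no duplicated geometry).
assert all(t.child is face for t in scene.nodes)
print(len(scene.nodes), "instances of", scene.nodes[0].child.name)
```

In VRML terms this corresponds to defining a node once and referencing it again, rather than retransmitting the geometry with each use.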

MPEG-4 Operation Sequence

Faces are Special
- Humans are hard-wired to respond to faces
- The face is the primary communication interface
- Human faces can be automatically analyzed and parameterized for a wide variety of applications

MPEG-4 Face and Body Animation Coding
- Face animation is in MPEG-4 version 1
- Body animation is in MPEG-4 version 2
- Face animation parameters displace feature points from the neutral position
- Body animation parameters are joint angles
- Face and body animation parameter sequences are compressed to low bitrates

Neutral Face Definition
- Head axes parallel to the world axes
- Gaze is in the direction of the Z axis
- Eyelids tangent to the iris
- Pupil diameter is one third of the iris diameter
- Mouth is closed and the upper and lower teeth are touching
- Tongue is flat and horizontal, with the tip of the tongue touching the boundary between the upper and lower teeth

Face Feature Points (figure): numbered MPEG-4 feature points on the right eye, left eye, nose, teeth, mouth and tongue, distinguishing feature points affected by FAPs from other feature points.

Face Animation Parameter Normalization
- Face Animation Parameters (FAPs) are normalized to facial dimensions
- Each FAP is measured as a fraction of the neutral face mouth width, mouth-nose distance, eye separation, or iris diameter
- The 3 head and 2 eyeball rotation FAPs are Euler angles
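The normalization above can be sketched as follows. This is a hedged illustration: the division of each neutral-face distance by 1024, and the 1e-5 rad angle unit, follow my reading of the MPEG-4 FBA conventions, and the measurement values are hypothetical model-space numbers.

```python
# Hedged sketch of FAP normalization: each FAP unit (FAPU) is a
# neutral-face distance divided by 1024 (assumed scale), so FAP values
# are model-independent fractions of facial dimensions.

def fap_units(eye_separation, mouth_width, mouth_nose_dist, iris_diameter):
    """FAP units (FAPUs) derived from neutral-face dimensions."""
    return {
        "ES": eye_separation / 1024.0,    # eye-separation unit
        "MW": mouth_width / 1024.0,       # mouth-width unit
        "MNS": mouth_nose_dist / 1024.0,  # mouth-nose separation unit
        "IRISD": iris_diameter / 1024.0,  # iris-diameter unit
        "AU": 1e-5,                       # angle unit (radians) for the
    }                                     # head/eyeball rotation FAPs

def encode_fap(displacement, fapu):
    """Express a feature-point displacement as an integer FAP value."""
    return round(displacement / fapu)

# Hypothetical neutral-face measurements in model units:
units = fap_units(eye_separation=6.4, mouth_width=5.12,
                  mouth_nose_dist=2.56, iris_diameter=1.28)

# A lip displacement of 0.05 model units, in mouth-width units:
print(encode_fap(0.05, units["MW"]))  # 0.05 / (5.12/1024) = 10
```

Because every FAP is expressed in these per-face units, the same FAP stream drives a narrow face and a wide face proportionally.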

Neutral Face Dimensions for FAP Normalization

FAP Groups

The mouth is closed if the sum of the upper and lower lip FAPs is 0
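The closed-mouth rule above can be expressed as a small check. This is a sketch under the stated convention that vertical lip FAPs are signed displacements from the neutral (closed) position, so paired upper and lower values cancel when the lips meet; the example FAP values are hypothetical.

```python
# Sketch of the closed-mouth constraint: with lip FAPs measured as
# signed displacements from the neutral (closed) position, the mouth
# is closed again exactly when paired upper/lower FAPs sum to zero.

def mouth_closed(upper_lip_faps, lower_lip_faps, tol=0):
    """True if every upper/lower lip FAP pair cancels (within tol)."""
    return all(abs(u + l) <= tol
               for u, l in zip(upper_lip_faps, lower_lip_faps))

print(mouth_closed([3, -2, 0], [-3, 2, 0]))  # True: pairs sum to zero
print(mouth_closed([3, -2, 0], [-1, 2, 0]))  # False: mouth is open
```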

Face Model Independence
- FAPs are always normalized for model independence
- FAPs (and BAPs) can be used without MPEG-4 systems/BIFS
- Private face models can be accurately animated with FAPs
- Face models can be simple or complex depending on terminal resources

MPEG-4 BIFS Face Node
- The Face node contains the FAP node, the Face scene graph, Face Definition Parameters (FDP), FIT, and FAT
- FIT (Face Interpolation Table) specifies interpolation of FAPs in the terminal
- FAT (Face Animation Table) maps FAPs to face model deformation
- FDP information includes face feature point positions and a texture map

Face Model Download
- 3D graphical models (e.g. faces) can be downloaded to the terminal with MPEG-4
- The 3D model specification is based on VRML
- The Face Animation Table (FAT) maps FAPs to face model vertex displacements
- The appearance and animation of downloaded face models is exactly predictable

FAP Compression
- FAPs are adaptively quantized to the desired quality level
- Quantized FAPs are differentially coded
- Adaptive arithmetic coding further reduces the bitrate
- The typical compressed FAP bitrate is less than 2 kilobits/second

FAP Predictive Coding (block diagram): FAP(t) minus the previous reconstructed value (frame delay) is quantized (Q) and arithmetic coded into the bitstream; an inverse quantizer (Q-1) closes the prediction loop.

Face Analysis System
- MPEG-4 does not specify analysis systems
- The face2face face analysis system tracks nostrils for robust operation
- The inner lip contour is estimated using adaptive color thresholding and lip modeling
- Eyelids, eyebrows and gaze direction are also estimated

Nostril Tracking

Inner Lip Contour Estimation

FAP Estimation Algorithm
- Head scale is normalized based on the neutral (closed) mouth width
- Head pitch is approximated based on vertical nostril deviation from the neutral head position
- Head roll is computed from smoothed eye or nostril orientation, depending on availability
- Inner lip FAPs are measured directly from the inner lip contour as deviations from the neutral lip position (closed mouth)
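The head roll step above can be sketched as follows. This is a hedged illustration of one plausible computation, not the face2face implementation: roll is taken as the in-plane angle of the line through a feature pair (eyes or nostrils), and the pixel coordinates are hypothetical.

```python
# Sketch of head roll estimation from a tracked feature pair (eyes or
# nostrils): the roll angle is the orientation of the line through the
# two features in the image plane. Coordinates are hypothetical pixels
# with y increasing downward.

import math

def head_roll(left_pt, right_pt):
    """Roll angle (degrees) of the line through two facial features."""
    dx = right_pt[0] - left_pt[0]
    dy = right_pt[1] - left_pt[1]
    return math.degrees(math.atan2(dy, dx))

# Level nostrils -> zero roll; right nostril 10 px higher -> head tilted.
print(head_roll((100, 200), (140, 200)))            # → 0.0
print(round(head_roll((100, 200), (140, 190)), 1))  # → -14.0
```

In practice these angles would be smoothed over time (as the roll bullet notes) before being converted to rotation FAPs.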

FAP Sequence Smoothing
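The smoothing stage named on this slide is not specified by MPEG-4; a simple causal exponential moving average is one plausible choice (an assumption on my part), trading a little latency for jitter reduction in the estimated FAP sequences.

```python
# Hedged sketch of FAP sequence smoothing: a causal exponential moving
# average damps frame-to-frame jitter in an estimated FAP track. The
# filter choice and alpha value are assumptions, not the MPEG-4 spec.

def smooth(faps, alpha=0.5):
    """Causal exponential smoothing of a FAP time series."""
    out = []
    prev = None
    for f in faps:
        prev = f if prev is None else alpha * f + (1 - alpha) * prev
        out.append(prev)
    return out

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
print(smooth(noisy))  # → [0.0, 0.5, 0.25, 0.625, 0.3125]
```

A larger alpha tracks fast lip motion more closely; a smaller alpha suppresses more measurement noise.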

MPEG-4 Visemes and Expressions
- A weighted combination of 2 visemes and 2 facial expressions for each frame
- The decoder is free to interpret the effect of visemes and expressions after FAPs are applied
- Definitions of visemes and expressions using FAPs can also be downloaded
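The per-frame weighted combination above can be sketched as a blend of two viseme definitions. The lip-FAP vectors and weights below are hypothetical; the point is only the weighted mix, which the decoder is free to apply as it chooses.

```python
# Sketch of the two-viseme blend: each viseme is a vector of lip FAP
# values (hypothetical numbers here), and a frame mixes two of them
# with blend weights, e.g. during a phoneme transition.

def blend(viseme_a, viseme_b, weight_a, weight_b):
    """Weighted combination of two viseme FAP vectors."""
    total = weight_a + weight_b
    return [(weight_a * a + weight_b * b) / total
            for a, b in zip(viseme_a, viseme_b)]

viseme_p = [0, 0, -20]     # closed lips (hypothetical FAP values)
viseme_aa = [30, 25, 10]   # open mouth  (hypothetical FAP values)

# Transition frame: mostly "p", a little "aa".
print(blend(viseme_p, viseme_aa, weight_a=3, weight_b=1))  # → [7.5, 6.25, -12.5]
```

The same weighted-mix form applies to the two facial expressions sent with each frame.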

Visemes

Facial Expressions

Free Face Model Software
- Wireface is an OpenGL-based, MPEG-4 compliant face model
- A good starting point for building high quality face models for web applications
- Reads a FAP file and a raw audio file
- Renders face and audio in real time
- Wireface source is freely available

Body Animation
- Harmonized with the VRML H-Anim spec
- Body Animation Parameters (BAPs) are humanoid skeleton joint Euler angles
- A Body Animation Table (BAT) can be downloaded to map BAPs to skin deformation
- BAPs can be highly compressed for streaming

Body Animation Parameters (BAPs)
- 186 humanoid skeleton Euler angles
- 110 free parameters for use with a downloaded body surface mesh
- Coded using the same codecs as FAPs
- The typical bitrate for coded BAPs is 5-10 kbps

Body Definition Parameters (BDPs)
- Humanoid joint center positions
- Names and hierarchy harmonized with the VRML/Web3D H-Anim working group
- Default positions in the standard for broadcast applications
- Download just the BDPs to accurately animate an unknown body model

Faces Enhance the User Experience
- Virtual call center agents
- News readers (e.g. Ananova)
- Story tellers for the child in all of us
- eLearning
- Program guide
- Multilingual (same face, different voice)
- Entertainment animation
- Multiplayer games

Visual Content for the Practical Internet
- Broadband deployment is happening slowly
- DSL availability is limited and cable is shared
- Talking heads need a high frame rate
- Consumer graphics hardware is cheap and powerful
- MPEG-4 SNHC/FBA tools are matched to available bandwidth and terminals

Visual Speech Processing
- FAPs can be used to improve speech recognition accuracy
- Text-to-speech systems can use FAPs to animate face models
- FAPs can be used in computer-human dialogue systems to communicate emotions, intentions and speech, especially in noisy environments

Video-driven Face Animation
- Facial expressions, lip movements and head motion are transferred to the face model
- FAPs are extracted from talking head video with a special computer vision system
- No face markers or lipstick are required
- Normal lighting is used
- Communicates lip movements and facial expressions with visual anonymity

Automatic Face Animation Demonstration
- FAPs extracted from camcorder video
- FAPs compressed to less than 2 kbits/sec
- 30 frames/sec animation generated automatically
- Face models animated with a bones rig or a fixed deformable mesh (real-time)

What is easy, solved, or almost solved
- Can we do photorealistic non-animated face models? YES
- Can we do near-real-time lip syncing that is indistinguishable from a human? NO

What is really hard
- Synthesizing human speech and facial expressions
- Hair

What we have assumed someone else is solving
- Graphics acceleration
- Video camera cost and resolution
- Multimedia communication infrastructure

Where we need help
- We have a face with 68 parameters, but we need the psychologists to tell us how to drive it autonomously
- We need to embody our agents into graphical models with a couple of thousand parameters to control gaze, gesture and body language, and to do collision detection -> NEED MORE SPEED

Core functionality of the face
- Speech: lips, teeth, tongue
- Emotional expressions: gaze, eyebrows, eyelids, head pose
- Non-verbal communication
- Sensory responsivity
Technical requirements
- Frame rate
- Synchronization
- Latency
- Bitrate
- Spatial resolution
- Complexity
- Common framework with body
Interaction
- Different faces should respond similarly to common commands
- Accessible to everyone

Interaction with other components
- Language and discourse: phoneme to viseme mapping, given/new
- Action in the environment
- Global information: emotional state, personality, culture, world knowledge
- Central time-base and timestamps

Open questions
- Central vs. peripheral functionality
- Degree of interface commonality
- Degree of agent autonomy
- What should the virtual human (VH) be capable of?