Three Topics
– Facial Animation
– 2D Animated Mesh
– MPEG-4 Audio
Facial Animation
A synthetic visual object that lets an animated face reproduce expressions, emotions and speech pronunciation.
-- Facial Definition Parameters (FDP): define the shape and texture of the face
-- Facial Animation Parameters (FAP): represent facial expressions, emotions and speech pronunciation (based on 84 feature points)
-- Facial Animation Parameter Units (FAPU): units derived from the spatial distances between major facial features, so that FAP values scale to any face model
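A minimal sketch of how a FAPU can turn one FAP value into a displacement on faces of different sizes. It assumes each unit is a neutral-face distance divided by 1024; the `NeutralFace` class, the function names and all the numbers are hypothetical and only illustrate the scaling idea.

```python
from dataclasses import dataclass


@dataclass
class NeutralFace:
    """Distances measured on a neutral face, in model units (made-up values)."""
    eye_separation: float
    mouth_width: float


def fapu(distance: float, divisor: int = 1024) -> float:
    """Derive an animation unit as a fraction of a facial distance (assumed divisor)."""
    return distance / divisor


def fap_to_displacement(fap_value: int, unit: float) -> float:
    """Scale a dimensionless FAP value into model units for one face."""
    return fap_value * unit


small_face = NeutralFace(eye_separation=64.0, mouth_width=51.2)
large_face = NeutralFace(eye_separation=128.0, mouth_width=102.4)

fap_value = 256  # the same transmitted value drives both faces
disp_small = fap_to_displacement(fap_value, fapu(small_face.mouth_width))
disp_large = fap_to_displacement(fap_value, fapu(large_face.mouth_width))
```

Because the unit is tied to each face's own proportions, the larger face moves twice as far for the same FAP value, and the motion looks the same relative to the face.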
Facial Animation
Two FAP parameters
1. Viseme
-- A visual correlate of a phoneme
-- A set of 14 static visemes, e.g. #1 is “b, p”; #11 is “e”; #4 is “t, d”. Example: “bed” (visemes #1, #11, #4)
-- Coordinates speech and mouth movement, e.g. the shape of the mouth is influenced by the previous, current and next phonemes
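The viseme lookup and the influence of neighbouring phonemes can be sketched as follows. Only the three visemes quoted on the slide are listed. The mouth-shape vectors, the `coarticulate` helper and its weights are hypothetical, chosen to illustrate the idea rather than taken from the standard.

```python
# Phoneme -> viseme number, limited to the entries quoted on the slide.
VISEME_OF_PHONEME = {"b": 1, "p": 1, "t": 4, "d": 4, "e": 11}


def visemes_for(phonemes):
    """Map a phoneme sequence to viseme numbers."""
    return [VISEME_OF_PHONEME[p] for p in phonemes]


def coarticulate(shapes, weights=(0.25, 0.5, 0.25)):
    """Blend each mouth shape with its previous and next neighbour.

    `shapes` is a list of equal-length tuples (hypothetical mouth parameters).
    At the ends of the sequence the missing neighbour is replaced by the
    current shape. The weights are an assumption, not a normative value.
    """
    blended = []
    for i, cur in enumerate(shapes):
        prev = shapes[i - 1] if i > 0 else cur
        nxt = shapes[i + 1] if i < len(shapes) - 1 else cur
        blended.append(tuple(
            weights[0] * p + weights[1] * c + weights[2] * n
            for p, c, n in zip(prev, cur, nxt)
        ))
    return blended


bed_visemes = visemes_for(["b", "e", "d"])

# Hypothetical (mouth opening, mouth width) per viseme of "bed".
static_shapes = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0)]
smooth_shapes = coarticulate(static_shapes)
```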
Facial Animation
2. Expression
-- Animated by an intensity value, e.g. expression #1 is “Joy”, described as “the eyebrows are relaxed, the mouth is open….”
-- Multiple expressions can be animated simultaneously, e.g. “joy” and “surprise”
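One way to picture several expressions animated at once is an intensity-weighted sum of per-expression poses. This is an illustrative sketch only: the pose parameters, their values and the assumed 0 to 63 intensity range are stand-ins, not the normative decoder behaviour.

```python
MAX_INTENSITY = 63  # assumed intensity range 0..63

# Hypothetical full-intensity poses for two expressions.
EXPRESSIONS = {
    "joy": {"mouth_open": 0.6, "brow_raise": 0.0},
    "surprise": {"mouth_open": 0.8, "brow_raise": 1.0},
}


def blend_expressions(active):
    """Combine expressions given as {name: intensity} into one pose."""
    pose = {}
    for name, intensity in active.items():
        if not 0 <= intensity <= MAX_INTENSITY:
            raise ValueError(f"intensity out of range for {name}: {intensity}")
        weight = intensity / MAX_INTENSITY
        for param, value in EXPRESSIONS[name].items():
            pose[param] = pose.get(param, 0.0) + weight * value
    return pose


only_joy = blend_expressions({"joy": 63, "surprise": 0})
joy_and_surprise = blend_expressions({"joy": 63, "surprise": 63})
```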
2-D Animated Meshes
For natural or synthetic visual objects
– Content manipulation
– Animation
– Augmentation (overlay)
– Transfiguration (merge with, or replace by, a synthetic visual object)
2-D Animated Meshes
Representation
– Partition a 2-D planar region into a set of triangular patches
– The vertices of the triangular patches are referred to as node points
– The node points of the initial mesh are tracked as the VOP (Visual Object Plane) moves with the scene
2-D Animated Meshes
Dynamic mesh
The mesh bit-stream consists of mesh geometry and mesh motion
1. Mesh geometry coding
-- Uniform mesh: a set of rectangles, each split into two triangles
-- Delaunay mesh: the boundary nodes are coded first, then the interior nodes of the mesh
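The uniform mesh construction can be sketched in a few lines. The function name and parameters are hypothetical, and a single fixed diagonal is assumed for every rectangle, which is only one possible way to split them.

```python
def uniform_mesh(cols, rows, cell_w, cell_h):
    """Build a uniform triangular mesh over a cols x rows grid of rectangles.

    Returns (nodes, triangles): nodes are (x, y) points in row-major order,
    triangles are index triples into the node list.
    """
    nodes = [(x * cell_w, y * cell_h)
             for y in range(rows + 1)
             for x in range(cols + 1)]
    triangles = []
    for y in range(rows):
        for x in range(cols):
            top_left = y * (cols + 1) + x
            top_right = top_left + 1
            bottom_left = top_left + (cols + 1)
            bottom_right = bottom_left + 1
            # Split the rectangle along the top-right / bottom-left diagonal.
            triangles.append((top_left, top_right, bottom_left))
            triangles.append((top_right, bottom_right, bottom_left))
    return nodes, triangles


nodes, triangles = uniform_mesh(cols=2, rows=1, cell_w=16, cell_h=16)
```

Since the whole geometry follows from a handful of grid parameters, a uniform mesh needs far less geometry data than a Delaunay mesh, whose node positions have to be coded individually.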
2-D Animated Meshes
2. Mesh motion
- Each mesh node has a motion vector
- The mesh is visited by breadth-first traversal
- Motion vectors are encoded/decoded predictively
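A simplified sketch of predictive motion-vector coding over a breadth-first traversal of the triangles. It assumes a connected mesh, integer motion vectors, and a predictor equal to the floored average of already-coded neighbouring nodes. All function names are hypothetical, and the actual bit-stream syntax and entropy coding are not modelled.

```python
from collections import deque


def traversal(triangles):
    """Return [(node, predictor_nodes)] in breadth-first triangle order."""
    edge_to_tris = {}
    for t, tri in enumerate(triangles):
        for i in range(3):
            edge = frozenset((tri[i], tri[(i + 1) % 3]))
            edge_to_tris.setdefault(edge, []).append(t)

    a, b, c = triangles[0]
    order = [(a, ()), (b, (a,)), (c, (a, b))]  # first triangle seeds the process
    coded, seen, queue = {a, b, c}, {0}, deque([0])
    while queue:
        tri = triangles[queue.popleft()]
        for i in range(3):
            u, v = tri[i], tri[(i + 1) % 3]
            for n in edge_to_tris[frozenset((u, v))]:
                if n in seen:
                    continue
                seen.add(n)
                queue.append(n)
                (w,) = set(triangles[n]) - {u, v}  # node opposite the shared edge
                if w not in coded:
                    coded.add(w)
                    order.append((w, (u, v)))
    return order


def predict(mvs, predictors):
    """Floored average of the predictor motion vectors; (0, 0) if there are none."""
    if not predictors:
        return (0, 0)
    n = len(predictors)
    return (sum(mvs[p][0] for p in predictors) // n,
            sum(mvs[p][1] for p in predictors) // n)


def encode(triangles, mvs):
    """Turn node motion vectors into prediction residuals."""
    residuals = []
    for node, predictors in traversal(triangles):
        px, py = predict(mvs, predictors)
        residuals.append((mvs[node][0] - px, mvs[node][1] - py))
    return residuals


def decode(triangles, residuals):
    """Rebuild node motion vectors from residuals, mirroring the encoder."""
    mvs = {}
    for (node, predictors), (rx, ry) in zip(traversal(triangles), residuals):
        px, py = predict(mvs, predictors)
        mvs[node] = (px + rx, py + ry)
    return mvs


mesh = [(0, 1, 3), (1, 4, 3), (1, 2, 4), (2, 5, 4)]
motion = {0: (2, 0), 1: (2, 1), 2: (3, 1), 3: (2, 0), 4: (3, 0), 5: (4, 1)}
residuals = encode(mesh, motion)
restored = decode(mesh, residuals)
```

Encoder and decoder walk the mesh in the same order, so the decoder always has the predictor nodes available before it needs them. Neighbouring nodes tend to move together, which keeps the residuals small.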
MPEG-4 Audio
Definition
The spatio-temporal combination of audio objects
– An audio object is a single audio stream
– Audio objects can be combined through mixing, effects processing, switching and delaying
– The processing is described using a signal-processing language
MPEG-4 Audio
Natural Audio Coding
– Parametric coding (speech coding): bit rates between 2 and 6 kbit/s
– Code Excited Linear Prediction (CELP): bit rates between 6 and 24 kbit/s
– Advanced Audio Coding (AAC): bit rates starting at 16 kbit/s
MPEG-4 Audio
Text-To-Speech (Synthesized Speech)
– Translate the input text into a string of phoneme symbols
– Retrieve the corresponding synthesis units from a database
– Concatenate the synthesis units to produce the output speech
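The three steps above can be sketched as a toy pipeline. The lexicon, the unit database and the sample values are all hypothetical placeholders. A real synthesizer would use a pronunciation model and recorded speech units, and would smooth the joins between units.

```python
# Hypothetical pronunciation lexicon: word -> phoneme symbols.
LEXICON = {"bed": ["b", "e", "d"]}

# Hypothetical unit database: phoneme -> short list of audio samples.
UNIT_DB = {
    "b": [0.1, 0.2],
    "e": [0.3, 0.4, 0.3],
    "d": [0.2, 0.1],
}


def text_to_phonemes(text):
    """Step 1: translate text into a string of phoneme symbols."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def retrieve_units(phonemes):
    """Step 2: look up the synthesis unit for each phoneme."""
    return [UNIT_DB[p] for p in phonemes]


def concatenate(units):
    """Step 3: join the units into one output sample sequence."""
    samples = []
    for unit in units:
        samples.extend(unit)
    return samples


def synthesize(text):
    return concatenate(retrieve_units(text_to_phonemes(text)))


speech = synthesize("bed")
```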