Introduction to MPEG-4 MC2008 2018/11/19 MC2009
Outline
- Multimedia
- MPEG-4 Profiles
- Key Features of MPEG-4 Systems
- DMIF
- Audiovisual Objects and Scene Graph
- Editing, Composition and Rendering
- Coding Basics
- Coding Techniques
Multimedia
- What is multimedia? A combination of audio, video, images, graphics, and text, covering all human I/O.
- Why does multimedia need to be coded?
Multimedia Coding for Different Applications
- Mobile devices: low data rate, error resilience, scalability
- Streaming service: scalability, low to medium data rate, interactivity
- On-disk distribution (DVD): interactivity
- Broadcast
- On-demand services
Profiles in MPEG-4
- Visual Profiles
- Audio Profiles
- Graphics Profiles
- Scene Graph Profiles
- MPEG-J Profiles
- Object Descriptor Profile
NewPred
H.263 Baseline
Key Features of MPEG-4 Systems
- Provides a consistent and complete architecture for the coded representation of the desired combination of streamed elementary audio-visual information.
- Covers a broad range of applications, functionality and bit rates.
- Through profile and level definitions, establishes a framework that allows consistent progression from simple applications (e.g., an audio broadcast application with graphics) to more complex ones (e.g., a virtual reality home theater).
Key Features of MPEG-4 Systems (2)
A set of tools for the representation of multimedia content:
- a framework for object description (the OD framework),
- BIFS: a binary language for the representation (format) of interactive 2D and 3D multimedia scene descriptions,
- SDM and SyncLayer: a framework for monitoring and synchronizing elementary data streams, and
- MPEG-J: programmable extensions to access and monitor MPEG-4 content.
Key Features of MPEG-4 Systems (3)
MPEG-4 Systems defines an efficient mapping of MPEG-4 content onto existing delivery infrastructures:
- FlexMux: an efficient and simple multiplexing tool to optimize the carriage of MPEG-4 data (into different QoS channels),
- extensions allowing the carriage of MPEG-4 content on MPEG-2 and IP systems, and
- a flexible file format for authoring, streaming and exchanging MPEG-4 data.
MPEG-4 ISO/IEC 14496 Terminal Architecture
Systems
- Timing Model
- Buffer Model
- Multiplexing of Streams
- Synchronization of Streams
- The Compression Layer
- Object Description Framework
- Scene Description Streams
- Audio-visual Streams
- Upchannel Streams
Systems Decoder Model
ISO/IEC 14496 Terminal Architecture
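Network-based Multimedia System
A network-based multimedia system can be viewed as four layers: the Application Layer, the Compression Layer, the Transport Layer, and the Transmission Layer. Traffic shaping and scalable rate control (SRC) are both commonly used to counter the effects of network delay jitter and of limited network resources (such as bandwidth and buffer space). Traffic shaping is a transport-layer method, whereas the SRC discussed in this paper is a compression-layer method.
The basic idea of traffic shaping is to shape the traffic pattern to the desired characteristics before the video is encoded, for example by fixing the maximum delay and the instantaneous peak rate. The whole system, from sender to receiver, then allocates appropriate resources and priorities according to the given QoS.
SRC works the other way around: it compresses the source video so that the compressed result satisfies the existing network resource constraints, for example a playback rate of 10 frames per second and at most 500 ms of accumulated delay. The goal of the SRC in this paper is to manage and use network bandwidth efficiently, so as to provide video quality good enough to support current multimedia application systems.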
The Objectives of DMIF
DMIF: Delivery Multimedia Integration Framework
- to hide the delivery technology details from the DMIF user
- to manage real-time, QoS-sensitive channels
- to allow service providers to log resources per session for usage accounting
- to ensure interoperability between end-systems
DMIF Communication Architecture (figure label: signaling)
High View of a Service Activation
Audiovisual Objects
- An audiovisual scene is composed of “objects”; different objects are mixed on the screen
- Visual: video; animated face and body; 2D and 3D animated meshes; text and graphics
- Audio: general audio (mono, stereo, and multichannel); speech; synthetic sounds (“structured audio”); environmental spatialization
Example of MPEG-4 Video Objects (from Olivier Avaro)
- Rectangular shape video object
- Arbitrary shape video object
- Animated face
The Scene Graph
- Composition
- Description & Synchronization
- Delivery of streaming data
- Interaction with media objects
- Management and identification of intellectual property
Major Components
(figure labels: Media Objects, Composition, Rendering, Scene Graph)
Adding or Removing Objects (1)
Adding or Removing Objects (2) (figure; from Igor S. Pandžić)
Adding or Removing Objects (3)
Applications
- Video conferencing: real-time, automatic; separate the foreground (communication partner) from the background
- Object tracking in video: may allow off-line and semi-automatic operation; separate the moving object from others
MPEG-4 Coding Basics
Toolbox Approach (figure: tools for synthetic scenes and tools for natural scenes; ALGORITHMS, PROFILES)
Coding Techniques
- Video objects: shape, motion vectors, texture
- Audio objects: MPEG AAC (Advanced Audio Coding), TTS (Text-To-Speech)
- Face and body: animation parameters
- 2D mesh: triangular patches, motion vectors
Content-based Audio-Visual Representation
- Audio-Visual Object (AVO)
- Video object component (video object plane, VOP): natural or synthetic; 2D or 3D
- Audio object component: mono, stereo or multi-channel
Video Object Planes (VOP)
- Characteristics of VOPs: may have different spatial and temporal resolutions; may be associated with different degrees of accessibility; sub-VOPs may be separated or overlapping
- VOP types: traditional I, P, B types; S-VOP (sprite) for background
Video Object Plane Type (figure: an S-VOP together with a sequence of I-VOP, B-VOPs, and P-VOPs along the time axis)
Content-based Object Manipulation
- change of the spatial position of a VOP
- application of a spatial scaling factor to a VOP
- change of the speed with which a VOP moves
- insertion of new VOPs
- deletion of an object in the scene
- change of the scene area
Segmentation Process
Depending on the application, segmentation can be performed
- online (real-time) or offline (non-real-time)
- automatically or semi-automatically
Examples
- Video conferencing: real-time, automatic; separate the foreground (communication partner) from the background
- Object tracking in video: may allow off-line and semi-automatic operation; separate the moving object from others
Compression
- Improved coding efficiency: 5-64 kbps for mobile applications; up to 20 Mbps for TV/film applications; subjectively better quality compared to existing standards
- Coding of multiple concurrent data streams: can code multiple views of a scene efficiently, e.g. stereo video
Coding VO in MPEG-4
- Reduce temporal redundancy
- Motion estimation for arbitrarily shaped VOPs: padding and modified block (polygon) matching motion estimation
(figure: I-VOP, P-VOP, and B-VOP along the time axis)
Encoding of Visual Objects (figure labels: binary alpha block, motion vector, context-based arithmetic encoding, texture, DCT)
New Coding Features
- For each macroblock, the motion vectors can be computed on a 16×16 or 8×8 block basis
- Unrestricted motion estimation: prediction can extend over the image boundary
- Overlapped block motion compensation
- Each component of texture can range from 1 to 12 bits
- More robust coding
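To make the "unrestricted motion estimation" item concrete, here is a minimal sketch of fetching a prediction block that reaches past the picture edge. It assumes the common edge-extension rule (out-of-range coordinates are clamped to the nearest border pixel) and a rectangular picture stored as a list of rows; the name `fetch_block` is hypothetical and the handling of arbitrarily shaped VOPs is not covered.

```python
def fetch_block(ref, x, y, size):
    """Return a size x size block from ref whose top-left corner is (x, y).

    Coordinates outside the picture are clamped to the nearest border
    pixel (an assumed edge-extension rule), so the motion vector may
    point partly or wholly outside the reference picture.
    """
    height, width = len(ref), len(ref[0])
    block = []
    for dy in range(size):
        yy = min(max(y + dy, 0), height - 1)  # clamp row index
        row = []
        for dx in range(size):
            xx = min(max(x + dx, 0), width - 1)  # clamp column index
            row.append(ref[yy][xx])
        block.append(row)
    return block


if __name__ == "__main__":
    reference = [[1, 2],
                 [3, 4]]
    # A block starting one pixel above and to the left of the picture.
    print(fetch_block(reference, -1, -1, 2))
```

With clamping, a motion search can evaluate candidate vectors near the border without special cases.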
Robust Video Coding
- Resynchronization: allows insertion of resync markers within each VOP; the video packet header includes the macroblock number, quantizer value and timing information
- Data partition: allows shape, motion and texture data to be separated within a packet
- Reversible VLC: offers partial recovery from errors
Sprite VOP
- Represents the background image
- Can be used for very efficient coding of scenes involving camera pan and zoom
- Much larger than the image itself and thus requires more memory
Example of Sprite VOP
Object Mesh
- Useful for animation, content manipulation, content overlay, merging natural and synthetic video, and others
- Tessellate the object with triangular patches
- Define a motion vector for each node
- The 2D motion of a video object is represented by the motion vectors of the node points
- Motion compensation is achieved by warping the texture map of each patch with an affine transform
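As a rough illustration of the last point, the affine transform of one triangular patch can be recovered from the positions of its three nodes before and after applying their motion vectors. The sketch below is pure Python and solves the 3×3 system with Cramer's rule; the names `affine_from_triangle` and `warp_point` and the parameter order (a, b, c, d, e, f) are assumptions made for illustration, not taken from the standard.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def affine_from_triangle(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f from 3 node pairs.

    src and dst are lists of three (x, y) tuples: node positions before
    and after motion. Returns (a, b, c, d, e, f).
    """
    base = [[float(x), float(y), 1.0] for x, y in src]
    d0 = det3(base)
    if d0 == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    params = []
    for axis in (0, 1):                     # first x', then y'
        target = [float(p[axis]) for p in dst]
        for col in range(3):                # Cramer's rule, one column at a time
            m = [row[:] for row in base]
            for r in range(3):
                m[r][col] = target[r]
            params.append(det3(m) / d0)
    return tuple(params)


def warp_point(params, point):
    """Apply the affine transform to one (x, y) point inside the patch."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


if __name__ == "__main__":
    nodes = [(0, 0), (4, 0), (0, 4)]
    motion = [(1, 1), (1, 1), (1, 1)]       # every node moves by (1, 1)
    moved = [(x + mx, y + my) for (x, y), (mx, my) in zip(nodes, motion)]
    print(warp_point(affine_from_triangle(nodes, moved), (2, 2)))
```

A decoder would apply such a transform to every pixel of the patch; here only a single point is warped to keep the sketch short.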
Example of Object Mesh
Face Animation
- Face model: default face model, or a model downloaded from the encoder
- Low-level facial animation: a set of 66 facial animation parameters
- High-level facial animation: a set of primary facial expressions such as joy, sadness, surprise and disgust
- Speech animation: 14 visemes for mouth shape; text-to-speech synthesizer
Facial Animation (figure; from Eine Übersicht)
Still Texture Coding
- Discrete Wavelet Transform (DWT)
- Spatial and quality scalability
- Uses a 2D Daubechies (9, 3)-tap biorthogonal filter
- The lowest band is losslessly coded by arithmetic coding
- Higher bands are coded by multilevel quantization, zero-tree scanning and arithmetic coding
Audio Coding
- Different bit rates, different types of source material and different algorithms
- Combination of parameter-based coding, LPC-based coding, and time/frequency-based coding
- High quality speech at 2 kbps: Harmonic Vector eXcitation Coding (HVXC)
- Text-to-Speech (TTS)
Natural Audio Coder (figure, from Olivier Dechazal: quality levels Telephone, Cellular, AM, FM, CD versus bit rates of 2, 4, 8, 16, 32, 64 kbit/s for parametric speech (HVXC), high quality speech (CELP), parametric audio (HILN), and general audio (AAC, TwinVQ))
Multiview Video
Stereo Sequence Coding
- Multiview profile of MPEG-2
- The left view sequence Sl is coded first; for the right view sequence, each frame is predicted from the corresponding frame in Sl based on an estimated disparity field, and the prediction error image is coded.
(figure: right view coded as P B B B, left view coded as I B B P)
Intermediate View Synthesis (figure labels: x_{l,n}, x_{c,n}, x_{r,n})
The mesh-based scheme yields a visually more accurate prediction
(figure panels: original left; original right; regular mesh on the left image; corresponding mesh on the right image; predicted right image by BMA (32.03 dB); predicted right image by mesh (27.48 dB))
MPEG-4 Coding Techniques
- Shape Coding
- Shape-adaptive DCT
- Object-based Inter-frame Coding
- Overlapped Motion Estimation
- Bit-plane Coding and FGS
Object-Based Coding
Shape Coding
- Bitmap coding: Context-Based Arithmetic Encoding (CAE)
- Contour coding: chain coding, baseline shape coding, polygon approximation, skeleton-based shape coding
- Quadtree coding
Context-Based Arithmetic Encoding (figure: the bounding box is divided into 16×16 blocks, labeled transparent block, boundary blocks, and opaque block; conditional entropy coding)
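The block labels in this figure can be sketched as a simple classification pass over a binary alpha mask. The code below is a minimal illustration, not the encoder itself: it only assigns each block one of the three labels and does not perform the arithmetic coding. The function name `classify_blocks` and the list-of-rows mask layout are assumptions.

```python
def classify_blocks(mask, block=16):
    """Label each block of a binary alpha mask.

    mask is a list of rows of 0/1 values covering the bounding box.
    A block with no object pixels is "transparent", a block made only of
    object pixels is "opaque", and anything else is "boundary".
    """
    rows, cols = len(mask), len(mask[0])
    labels = []
    for by in range(0, rows, block):
        row_labels = []
        for bx in range(0, cols, block):
            pixels = [mask[y][x]
                      for y in range(by, min(by + block, rows))
                      for x in range(bx, min(bx + block, cols))]
            if not any(pixels):
                row_labels.append("transparent")
            elif all(pixels):
                row_labels.append("opaque")
            else:
                row_labels.append("boundary")
        labels.append(row_labels)
    return labels


if __name__ == "__main__":
    # Tiny 4x4 mask with 2x2 blocks so the result is easy to check by eye.
    alpha = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 1, 1, 1],
             [0, 0, 1, 1]]
    for line in classify_blocks(alpha, block=2):
        print(line)
```

Only the boundary blocks carry real shape information; the other two kinds can be signaled with a block type alone.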
Chain Coding (figure: contour with starting point; direction legends for 4-connected and 8-connected neighborhoods)
Chain code read from the figure: 3 3 3 3 2 3 3 2 2 2 1 2 1 2 1 1 1 1
Chain Coding (figure: contour with starting point; direction legends for 4-connected and 8-connected neighborhoods)
Chain code read from the figure: 0 7 0 6 6 5 6 4 4 3 3 2 0 1 2
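A minimal sketch of 8-connected chain coding follows. The direction numbering is an assumption (0 = east, increasing counter-clockwise, with image rows growing downward); the numbering in the slide's legend may differ. The names `chain_code` and `decode_chain` are hypothetical.

```python
# Assumed Freeman numbering: 0 = east, then counter-clockwise.
# Image coordinates: x grows to the right, y grows downward.
STEPS8 = {
    0: (1, 0), 1: (1, -1), 2: (0, -1), 3: (-1, -1),
    4: (-1, 0), 5: (-1, 1), 6: (0, 1), 7: (1, 1),
}
CODES8 = {step: code for code, step in STEPS8.items()}


def chain_code(points):
    """Encode an ordered list of neighboring contour points as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(CODES8[(x1 - x0, y1 - y0)])
    return codes


def decode_chain(start, codes):
    """Rebuild the contour points from the starting point and the codes."""
    points = [start]
    x, y = start
    for code in codes:
        dx, dy = STEPS8[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    contour = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
    codes = chain_code(contour)
    print(codes)
    print(decode_chain(contour[0], codes) == contour)
```

Only the starting point needs absolute coordinates; every later point costs one 3-bit symbol before entropy coding.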
Differential Chain Code
DCC records the move (forward, leftward or rightward) between two consecutive directional links.
Example from the figure: F F F R L F L R
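The forward/left/right idea can be sketched for a 4-connected chain code as the difference between consecutive direction codes, taken modulo 4. The numbering (0 = east, 1 = north, 2 = west, 3 = south, counter-clockwise) and the function name `differential_chain_code` are assumptions for illustration.

```python
# Difference between consecutive 4-connected codes (mod 4) mapped to a turn.
# Assumed numbering: 0 = east, 1 = north, 2 = west, 3 = south.
TURNS = {0: "F", 1: "L", 3: "R"}


def differential_chain_code(codes):
    """Convert absolute 4-connected direction codes into F/L/R moves."""
    moves = []
    for prev, cur in zip(codes, codes[1:]):
        diff = (cur - prev) % 4
        if diff == 2:
            # A 180-degree reversal does not occur on a simple contour.
            raise ValueError("contour doubles back on itself")
        moves.append(TURNS[diff])
    return moves


if __name__ == "__main__":
    print(differential_chain_code([0, 0, 1, 1, 0, 3]))
```

Because the alphabet shrinks from four absolute directions to three relative moves, and "F" tends to dominate on smooth contours, the differential symbols are usually cheaper to entropy-code.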
Baseline Shape Coding (figure: contour samples S1, S2, ..., S14 above a horizontal baseline)
- Distance between contour sample S23 and the baseline: D(S23)
- Trace the contour and get the distances
- TPs: S7, S9, S12, S22
Polygon Approximation (figure: distances d1, d2, d3)
- Select vertices that are optimal in the rate-distortion sense.
- Splines are adopted to approximate the contour.
Skeleton-Based Shape Coding
Quadtree Coding
Shape-adaptive DCT
Inter-frame Coding
Reconstruction of object shape: MVS = MVPS + MVDS
- MVS: motion vector for shape
- MVPS: prediction
- MVDS: difference (BAC)
The context for Inter-frame Coding
Overlapped Motion Estimation
Weighting Coefficients in Overlapped Motion Estimation
Fine Granularity Scalable (figure: quality, from bad through moderate to good, versus channel bitrate, from low to high)
FGS Video Encoder Structure
Bit-plane Coding (figure)
- Quantized residual: 5 7 8 6 2 4 3 1 4 6 8 2
- Binary transfer: each value is written in binary, from MSB to LSB
- Reordering: 0010000001000000110110010000000101011100100110101101000010101011………
- Run-length coding
- Result: enhancement layer bitstream
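The steps on this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: values are non-negative integers, each plane is scanned in list order, and the run-length stage only counts the zeros before each 1 (it is not the exact symbol set or VLC table of the standard). The function names are hypothetical, and the scan order used in the slide's figure may differ from plain list order.

```python
def bit_planes(values):
    """Split non-negative integers into bit planes, MSB plane first."""
    nbits = max(values).bit_length()
    return [[(v >> b) & 1 for v in values] for b in range(nbits - 1, -1, -1)]


def zero_runs(plane):
    """Run-length view of one plane: zeros before each 1, plus trailing zeros."""
    runs, run = [], 0
    for bit in plane:
        if bit:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs, run


def decode_truncated(planes, kept):
    """Rebuild values from only the first `kept` planes (the rest read as 0)."""
    values = [0] * len(planes[0])
    for plane in planes[:kept]:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return [v << (len(planes) - kept) for v in values]


if __name__ == "__main__":
    residual = [5, 7, 8, 6, 2, 4, 3, 1, 4, 6, 8, 2]
    planes = bit_planes(residual)
    for plane in planes:
        print(plane, zero_runs(plane))
    # Dropping the low planes gives a coarser but still usable residual.
    print(decode_truncated(planes, 2))
```

Because the planes are sent MSB first, the enhancement layer can be cut at any point and the decoder still refines the residual with whatever planes arrived, which is what makes the scheme fine-grained.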
FGS Video Decoder Structure
Binary Shape Encoder
Padding