1
Introduction to MPEG-4 MC2008 2018/11/19 MC2009
2
Outline
- Multimedia
- MPEG-4 Profiles
- Key Features of MPEG-4 Systems
- DMIF
- Audiovisual Objects and Scene Graph
- Editing, Composition and Rendering
- Coding Basics
- Coding Techniques
3
Multimedia
- What is multimedia?
  - Combination of audio, video, images, graphics, and text
  - Coverage of all human I/O
- Why does multimedia need to be coded?
4
5
Multimedia Coding for Different Applications
- Mobile devices: low data rate, error resilience, scalability
- Streaming service: scalability, low to medium data rate, interactivity
- On-disk distribution (DVD): interactivity
- Broadcast
- On-demand services
6
Profiles in MPEG-4
- Visual Profiles
- Audio Profiles
- Graphics Profiles
- Scene Graph Profiles
- MPEG-J Profiles
- Object Descriptor Profile
7
NewPred
8
H.263 Baseline
9
Key Features of MPEG-4 Systems
- Provides a consistent and complete architecture for the coded representation of the desired combination of streamed elementary audio-visual information.
- Covers a broad range of applications, functionality and bit rates.
- Through profile and level definitions, establishes a framework that allows consistent progression from simple applications (e.g., an audio broadcast application with graphics) to more complex ones (e.g., a virtual reality home theater).
10
Key Features of MPEG-4 Systems (2)
A set of tools for the representation of multimedia content:
- a framework for object description (the OD framework),
- BIFS: a binary language for the representation (format) of interactive 2D and 3D multimedia scene descriptions,
- SDM and SyncLayer: a framework for monitoring and synchronizing elementary data streams, and
- MPEG-J: programmable extensions to access and monitor MPEG-4 content.
11
Key Features of MPEG-4 Systems (3)
MPEG-4 Systems defines an efficient mapping of MPEG-4 content onto existing delivery infrastructures:
- FlexMux: an efficient and simple multiplexing tool to optimize the carriage of MPEG-4 data (into different QoS channels),
- extensions allowing the carriage of MPEG-4 content on MPEG-2 and IP systems, and
- a flexible file format for authoring, streaming and exchanging MPEG-4 data.
12
MPEG-4 ISO/IEC 14496 Terminal Architecture
13
Systems
- Timing Model
- Buffer Model
- Multiplexing of Streams
- Synchronization of Streams
- The Compression Layer
  - Object Description Framework
  - Scene Description Streams
  - Audio-visual Streams
  - Upchannel Streams
14
Systems Decoder Model
15
16
ISO/IEC 14496 Terminal Architecture
17
Network-based Multimedia System
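A network-based multimedia system can be viewed as four layers: the Application Layer, the Compression Layer, the Transport Layer, and the Transmission Layer. Traffic shaping and scalable rate control (SRC) are both commonly used to counter the effects of network delay jitter and of limited network resources (such as bandwidth and buffers). Traffic shaping is a transport-layer method, whereas the SRC discussed in this paper is a compression-layer method.
The basic idea of traffic shaping is to shape the traffic pattern to the desired characteristics before the video is encoded, for example by specifying a maximum delay and an instantaneous peak rate. The whole system, from sender to receiver, then allocates suitable resources and priorities according to the given QoS. SRC works the other way around: it compresses the source video so that the compressed result fits the available network resources, for example a playback rate of 10 frames per second with at most 500 ms of accumulated delay. The goal of the SRC in this paper is to manage and use network bandwidth efficiently, so as to deliver video quality good enough to support current multimedia application systems.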
18
The Objectives of DMIF
DMIF: Delivery Multimedia Integration Framework
- to hide the delivery technology details from the DMIF user
- to manage real-time, QoS-sensitive channels
- to allow service providers to log resources per session for usage accounting
- to ensure interoperability between end-systems
19
20
DMIF Communication Architecture
[Figure label: signaling]
21
High View of a Service Activation
22
Audiovisual Objects
- An audiovisual scene is composed of "objects"
- Different objects are mixed on the screen
- Visual
  - Video
  - Animated face and body; 2D and 3D animated meshes
  - Text and graphics
- Audio
  - General audio: mono, stereo, and multichannel
  - Speech
  - Synthetic sounds ("structured audio")
  - Environmental spatialization
23
Example of MPEG-4 Video Objects
- Rectangular shape video object
- Arbitrary shape video object
- Animated face
From Olivier Avaro
24
25
The Scene Graph
26
- Composition
- Description & Synchronization
- Delivery of streaming data
- Interaction with media objects
- Management and identification of intellectual property
27
Major Components
28
[Figure labels: Media Objects, Composition, Rendering, Scene Graph]
29
Adding or Removing Objects (1)
30
Adding or Removing Objects (2)
From Igor S. Pandžić
31
Adding or Removing Objects (3)
Applications
- Video conferencing
  - Real-time, automatic
  - Separate foreground (communication partner) from background
- Object tracking in video
  - May allow off-line and semi-automatic segmentation
  - Separate moving objects from others
32
MPEG-4 Coding Basics
33
Toolbox Approach
- tools for synthetic scenes
- tools for natural scenes
[Figure labels: ALGORITHMS, PROFILES]
34
Coding Techniques
- Video objects
  - Shape
  - Motion vectors
  - Texture
- Audio objects
  - MPEG AAC (Advanced Audio Coding)
  - TTS (Text-To-Speech)
- Face and Body
  - Animation parameters
- 2D Mesh
  - Triangular patches
  - Motion vectors
35
Content-based Audio-Visual Representation
Audio-Visual Object (AVO)
- Video object component (video object plane, VOP)
  - natural or synthetic
  - 2D or 3D
- Audio object component
  - mono, stereo or multi-channel
36
Video Object Planes (VOP)
- Characteristics of VOPs
  - may have different spatial and temporal resolutions
  - may be associated with different degrees of accessibility
  - sub-VOPs may be separated or overlapping
- VOP types
  - Traditional I, P, B types
  - S-VOP (sprite) for background
37
Video Object Plane Type
[Figure: S-VOPs, an I-VOP, P-VOPs and B-VOPs arranged along a time axis]
38
Content-based Object Manipulation
- change of the spatial position of a VOP
- application of a spatial scaling factor to a VOP
- change of the speed with which a VOP moves
- insertion of new VOPs
- deletion of an object in the scene
- change of the scene area
39
Segmentation Process
Depending on the application, segmentation can be performed
- online (real-time) or offline (non-real-time)
- automatically or semi-automatically
Examples
- Video conferencing
  - real-time, automatic
  - separate foreground (communication partner) from background
- Object tracking in video
  - may allow off-line and semi-automatic segmentation
  - separate moving objects from others
40
Compression
- Improved coding efficiency
  - 5-64 kbps for mobile applications
  - up to 20 Mbps for TV/film applications
  - subjectively better quality compared to existing standards
- Coding of multiple concurrent data streams
  - can code multiple views of a scene efficiently, e.g. stereo video
41
Coding VO in MPEG-4
- Reduce temporal redundancy
- Motion estimation for arbitrarily shaped VOPs
  - padding and modified block (polygon) matching motion estimation
[Figure: I-VOP, P-VOP and B-VOP along a time axis]
42
Encoding of Visual Objects
[Figure labels: Binary alpha block, Motion vector, Context-based arithmetic encoding, Texture, DCT]
43
New Coding Features
- For each macroblock, the motion vectors can be computed on a 16×16 or 8×8 block basis
- Unrestricted motion estimation: prediction can extend over the image boundary
- Overlapped block motion compensation
- Each component of texture can range from 1 to 12 bits
- More robust coding
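As a rough illustration of unrestricted motion estimation, the sketch below fetches a prediction block whose motion vector points partly outside the picture by clamping every reference coordinate to the nearest boundary pixel. Edge replication is an assumption made for this example (the slide only states that prediction can extend over the image boundary), and the names `clamp` and `predict_block` are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def predict_block(reference, x, y, mv_x, mv_y, size):
    """Return a size x size prediction block read at (x + mv_x, y + mv_y).

    Coordinates that fall outside the reference picture are clamped,
    which replicates the nearest edge pixel (an assumed padding rule).
    """
    height = len(reference)
    width = len(reference[0])
    block = []
    for row in range(size):
        src_y = clamp(y + mv_y + row, 0, height - 1)
        block.append([
            reference[src_y][clamp(x + mv_x + col, 0, width - 1)]
            for col in range(size)
        ])
    return block


# 4x4 toy reference picture: pixel value = 10 * row + column.
reference = [[10 * r + c for c in range(4)] for r in range(4)]

# The block at x=2 with mv_x=+1 would read columns 3 and 4; column 4 is
# outside the picture, so column 3 is replicated.
block = predict_block(reference, x=2, y=0, mv_x=1, mv_y=0, size=2)
print(block)
```

With restricted motion estimation the same vector would simply be disallowed; here it remains a valid candidate.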
44
Robust Video Coding
- Resynchronization
  - Allows insertion of resync markers within each VOP
  - Video packet header: includes macroblock number, quantizer value and timing information
- Data partitioning
  - Allows shape, motion and texture data to be separated within a packet
- Reversible VLC
  - Offers partial recovery from errors
45
Sprite VOP
- Represents the background image
- Can be used for very efficient coding of scenes involving camera pan and zoom
- Much larger than the image and thus requires more memory
46
Example of Sprite VOP
47
Object Mesh
- Useful for animation, content manipulation, content overlay, merging natural and synthetic video, and others
- Tessellate with triangular patches
- Define a motion vector for each node
- The 2D motion of video objects is represented by the motion vectors of the node points
- Motion compensation is achieved by warping the texture map corresponding to each patch with an affine transform
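The affine warp of one triangular patch can be sketched as follows. The three node positions and their motion vectors give three point correspondences, which determine the six affine parameters exactly. The function names and the parameter layout (a, b, c, d, e, f) are assumptions for this illustration, not taken from the standard; the 3×3 systems are solved with Cramer's rule to keep the example dependency-free.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if d == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_nodes(nodes, motion_vectors):
    """Affine parameters mapping each node to node + motion vector.

    Model: x' = a*x + b*y + c,  y' = d*x + e*y + f.
    """
    matrix = [[x, y, 1.0] for x, y in nodes]
    targets = [(x + u, y + v) for (x, y), (u, v) in zip(nodes, motion_vectors)]
    a, b, c = solve3(matrix, [t[0] for t in targets])
    d, e, f = solve3(matrix, [t[1] for t in targets])
    return (a, b, c, d, e, f)


def warp_point(params, x, y):
    """Apply the affine transform to one point inside the patch."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
motion_vectors = [(1.0, 0.0), (1.0, 2.0), (-1.0, 0.0)]
params = affine_from_nodes(nodes, motion_vectors)
print(warp_point(params, 2.0, 2.0))
```

Every pixel inside the patch is warped with the same six parameters, so only the node motion vectors need to be transmitted.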
48
Example of Object Mesh
49
Face Animation
- Face model
  - Default face model
  - Download from the encoder
- Low-level facial animation
  - A set of 66 facial animation parameters
- High-level facial animation
  - A set of primary facial expressions such as joy, sadness, surprise and disgust
- Speech animation
  - 14 visemes for mouth shape
  - Text-to-speech synthesizer
50
Facial Animation
From Eine Übersicht
51
Still Texture Coding
- Discrete Wavelet Transform (DWT)
- Spatial and quality scalability
- Uses a 2D Daubechies (9, 3)-tap biorthogonal filter
- The lowest band is losslessly coded by arithmetic coding
- Higher bands are coded by multilevel quantization, zero-tree scanning and arithmetic coding
52
Audio Coding
- Different bit rates, different types of source material and different algorithms
- Combination of parameter-based coding, LPC-based coding, and time/frequency-based coding
- High quality speech at 2 kbps: Harmonic Vector eXcitation Coding (HVXC)
- Text-to-Speech (TTS)
53
Natural Audio Coder
[Figure: coders plotted against bit rate (2, 4, 8, 16, 32, 64 kbit/s), with quality levels labeled Telephone, Cellular, AM, FM and CD: parametric speech (HVXC), high quality speech (CELP), general audio (AAC, TwinVQ), parametric audio (HILN)]
From Olivier Dechazal
54
Multiview Video
55
Stereo Sequence Coding
- Multiview profile of MPEG-2
- The left-view sequence Sl is coded first. For the right-view sequence, each frame is predicted from the corresponding frame in Sl based on an estimated disparity field, and the prediction error image is coded.
[Figure: picture types per view. Right view: P B B B. Left view: I B B P.]
56
Intermediate View Synthesis
[Figure labels: x_{l,n}, x_{c,n}, x_{r,n}]
57
The mesh-based scheme yields a visually more accurate prediction
[Figure panels]
- Original left; original right
- Regular mesh on the left image; corresponding mesh on the right image
- Predicted right image by BMA, the block matching algorithm (32.03 dB); predicted right image by mesh (27.48 dB)
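The slide does not say how the dB figures were obtained; assuming they are PSNR values for 8-bit images, the sketch below shows how such a figure could be computed. The function name `psnr` and the toy 2×2 images are made up for illustration.

```python
import math


def psnr(original, predicted, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists of pixel values; `peak` is the maximum
    possible pixel value (255 assumes 8-bit samples).
    """
    squared_error = 0
    count = 0
    for row_o, row_p in zip(original, predicted):
        for o, p in zip(row_o, row_p):
            squared_error += (o - p) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
predicted = [[100, 110], [100, 90]]
print(round(psnr(original, predicted), 2))
```

PSNR measures only the mean squared error, which is why a prediction with a lower PSNR can still look more accurate, as the slide title claims for the mesh-based scheme.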
58
MPEG-4 Coding Techniques
- Shape Coding
- Shape-adaptive DCT
- Object-based Inter-frame Coding
- Overlapped Motion Estimation
- Bit-plane Coding and FGS
59
Object-Based Coding
60
Shape Coding
- Bitmap Coding
  - Context-Based Arithmetic Encoding (CAE)
- Contour Coding
  - Chain Coding
  - Baseline Shape Coding
  - Polygon Approximation
  - Skeleton-Based Shape Coding
- Quadtree Coding
61
Context-Based Arithmetic Encoding
[Figure: bounding box divided into 16×16 blocks, labeled transparent block, boundary blocks and opaque block]
Conditional entropy coding
62
Context-Based Arithmetic Encoding
[Figure: bounding box divided into 16×16 blocks, labeled transparent block, boundary blocks and opaque block]
Conditional entropy coding
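The three block types in the figure can be told apart with a simple pass over the binary alpha mask. The sketch below is only an illustration of that classification step, with a hypothetical function name and a tiny block size so the example stays readable. It does not implement the arithmetic coder itself.

```python
def classify_blocks(mask, block_size=16):
    """Label each block of a binary alpha mask.

    mask is a nested list of 0 (outside the object) and 1 (inside).
    Returns a dict mapping (block_row, block_col) to "transparent",
    "opaque" or "boundary".
    """
    height = len(mask)
    width = len(mask[0])
    labels = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            pixels = [
                mask[y][x]
                for y in range(top, min(top + block_size, height))
                for x in range(left, min(left + block_size, width))
            ]
            total = sum(pixels)
            if total == 0:
                label = "transparent"
            elif total == len(pixels):
                label = "opaque"
            else:
                label = "boundary"
            labels[(top // block_size, left // block_size)] = label
    return labels


# Toy 4x6 bounding box, classified with 2x2 blocks instead of 16x16.
mask = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
labels = classify_blocks(mask, block_size=2)
print(labels)
```

Only the boundary blocks contain a mix of object and background pixels, so they are the ones whose shape has to be coded pixel by pixel.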
63
Chain Coding
[Figure: contour traced from its starting points with link codes 3 3 3 3 2 3 3 2 2 2 1 2 1 2 1 1 1 1; direction numbering shown for 4-connected and 8-connected neighborhoods]
64
Chain Coding
[Figure: contour traced from its starting points with link codes 0 7 0 6 6 5 6 4 4 3 3 2 0 1 2; direction numbering shown for 4-connected and 8-connected neighborhoods]
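A minimal sketch of chain coding follows. The direction numbering in the figure did not survive extraction, so the code assumes the common Freeman convention: code 0 points east and codes increase counterclockwise, with the y axis pointing up. The function names are hypothetical.

```python
# Assumed direction tables: index = chain code, value = (dx, dy), y axis up.
DIRECTIONS_4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]
DIRECTIONS_8 = [(1, 0), (1, 1), (0, 1), (-1, 1),
                (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code(points, directions):
    """Encode consecutive contour points as a list of direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in directions:
            raise ValueError(f"points are not neighbors: step {step}")
        codes.append(directions.index(step))
    return codes


def decode_chain(start, codes, directions):
    """Rebuild the contour points from a starting point and its codes."""
    points = [start]
    for code in codes:
        dx, dy = directions[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points


contour = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3), (0, 2), (0, 1), (0, 0)]
codes = chain_code(contour, DIRECTIONS_8)
print(codes)  # [0, 1, 2, 3, 5, 6, 6]
```

Only the starting point and the codes need to be stored: 2 bits per link for a 4-connected contour, 3 bits per link for an 8-connected one.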
65
Differential Chain Code
DCC records the move (forward, leftward or rightward) between two consecutive directional links.
Example: F F F R L F L R
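As a sketch of the idea, the function below turns a 4-connected chain code into forward/left/right moves. It assumes the codes are numbered 0 = east, 1 = north, 2 = west, 3 = south (counterclockwise, y axis up), which is an assumption and not something the slide states; the function name is hypothetical.

```python
# Turn between consecutive links, modulo 4, under the assumed numbering:
# 0 = same direction, 1 = counterclockwise (left), 3 = clockwise (right).
MOVES = {0: "F", 1: "L", 3: "R"}


def differential_chain_code(codes):
    """Convert a 4-connected chain code into F/L/R moves."""
    moves = []
    for previous, current in zip(codes, codes[1:]):
        turn = (current - previous) % 4
        if turn not in MOVES:
            raise ValueError("contour reverses direction between links")
        moves.append(MOVES[turn])
    return moves


# East, east, north, north, east, south, east.
moves = differential_chain_code([0, 0, 1, 1, 0, 3, 0])
print(moves)  # ['F', 'L', 'F', 'R', 'R', 'L']
```

Because a contour never doubles back on itself, only three symbols are needed per link instead of four.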
66
Baseline Shape Coding
[Figure: contour samples labeled S1 to S14 above a horizontal baseline]
- Baseline (horizontal)
- Distance between contour sample S23 and the baseline: D(S23)
- Trace and get distances: TPs (S7, S9, S12, S22)
67
Polygon Approximation
[Figure labels: d1, d2, d3]
- Select vertices that are optimal in the rate-distortion sense.
- Splines are adopted to approximate the contour.
68
Skeleton-Based Shape Coding
69
Quadtree Coding
70
Shape-adaptive DCT
71
Inter-frame Coding
Reconstruction of Object Shape
MVS = MVPS + MVDS
- MVS: motion vector (MV) for shape
- MVPS: prediction
- MVDS: difference (BAC)
72
The context for Inter-frame Coding
73
Overlapped Motion Estimation
74
Weighting Coefficients in Overlapped Motion Estimation
75
Fine Granularity Scalable
[Figure labels: Bad, Moderate, Good; Low, High; Channel bitrate]
76
FGS Video Encoder Structure
77
Bit-plane Coding
[Figure stages: quantized residual (values shown: 5 7 8 6 2 4 3 1 4 6 8 2), binary transfer (MSB to LSB), reordering, run-length coding, enhancement layer bitstream]
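The stages in the figure can be sketched roughly as follows. The example splits residual magnitudes into bit-planes, most significant plane first, and turns each plane into run-length symbols. The symbol format (run of zeros before a 1, plus an end-of-plane flag) is an assumption for illustration; sign handling, the reordering step and the actual variable-length code tables are left out, and the function names are hypothetical.

```python
def bit_planes(values):
    """Split non-negative integers into bit-planes, MSB plane first."""
    num_planes = max(values).bit_length()
    return [
        [(v >> shift) & 1 for v in values]
        for shift in range(num_planes - 1, -1, -1)
    ]


def run_length_symbols(plane):
    """Code one bit-plane as (zeros before this 1, end-of-plane flag)."""
    positions = [i for i, bit in enumerate(plane) if bit]
    symbols = []
    previous = -1
    for index, pos in enumerate(positions):
        run = pos - previous - 1
        end_of_plane = index == len(positions) - 1
        symbols.append((run, end_of_plane))
        previous = pos
    return symbols


# First eight residual values shown on the slide, treated as magnitudes.
residual = [5, 7, 8, 6, 2, 4, 3, 1]
planes = bit_planes(residual)
for plane in planes:
    print(plane, run_length_symbols(plane))
```

Because the most significant planes are sent first, the enhancement layer bitstream can be cut at any point and still refine the base layer, which is what makes the scheme fine-granularity scalable.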
78
FGS Video Decoder Structure
79
Binary Shape Encoder
80
Padding