Published by Dwight Holmes. Modified over 9 years ago.
1
UNIT V Video Compression
2
Outline
1. Introduction to Video Compression
2. Video Compression with Motion Compensation
3. Search for Motion Vectors
4. H.261
5. H.263
6. MPEG-1, 2, 4, 7
7. Digital Video Interactive
3
Introduction to Video Compression
A video consists of a time-ordered sequence of frames, i.e., images. An obvious approach to video compression is predictive coding based on previous frames: subtract images in time order and code the residual error. Compression improves further if the encoder searches the previous frame for just the right region to subtract.
4
Video Compression with Motion Compensation
Consecutive frames in a video are similar, i.e., temporal redundancy exists. Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image. Instead, the difference between the current frame and other frame(s) in the sequence is coded. The difference has small values and low entropy, which is good for compression.
5
Video Compression with Motion Compensation
Steps of video compression based on Motion Compensation (MC):
1. Motion estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.
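Steps 2 and 3 can be sketched as follows, assuming a motion vector has already been found in step 1. This is a minimal illustration, not the codec's actual implementation; the function names and the frame layout (a list of rows, `frame[row][col]`) are assumptions made for this example.

```python
def predict_block(reference, x, y, mv, n):
    """MC-based prediction: copy the n-by-n block of the reference frame
    that the motion vector mv = (i, j) points to, relative to (x, y)."""
    i, j = mv
    return [[reference[y + j + l][x + i + k] for k in range(n)]
            for l in range(n)]


def prediction_error(target, reference, x, y, mv, n):
    """Difference between the target macroblock and its prediction."""
    predicted = predict_block(reference, x, y, mv, n)
    return [[target[y + l][x + k] - predicted[l][k] for k in range(n)]
            for l in range(n)]
```

With a good motion vector the error block is all zeros or close to it, which is what makes it cheap to code.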
6
Motion Compensation
Each image is divided into macroblocks of size N×N. By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
7
Chap 10 Basic Video Compression Techniques
Motion Compensation
Motion compensation is performed at the macroblock level. The current image frame is referred to as the Target frame. A match is sought between the macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), called the Reference frame(s). The displacement of the reference macroblock to the target macroblock is called a motion vector, MV.
8
Fig. 10.1: Macroblocks and Motion Vector in Video Compression.
9
Figure 10.1 shows the case of forward prediction, in which the Reference frame is taken to be a previous frame. MV search is usually limited to a small immediate neighborhood, with both horizontal and vertical displacements in the range [−p, p]. This makes a search window of size (2p+1)×(2p+1).
10
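Search for Motion Vectors
The difference between two macroblocks can then be measured by their Mean Absolute Difference (MAD):
$$MAD(i, j) = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left| C(x+k,\, y+l) - R(x+i+k,\, y+j+l) \right|$$
where N is the size of the macroblock, k and l are indices for pixels in the macroblock, i and j are the horizontal and vertical displacements, C(x+k, y+l) are pixels in the macroblock in the Target frame, and R(x+i+k, y+j+l) are pixels in the macroblock in the Reference frame.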
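A minimal sketch of the MAD measure, assuming frames are stored as lists of rows (`frame[row][col]`) and that (x, y) is the top-left corner of the target macroblock. The function name and argument order are illustrative choices, not taken from the slides.

```python
def mad(target, reference, x, y, i, j, n):
    """Mean Absolute Difference between the n-by-n target macroblock at
    (x, y) and the reference block displaced by (i, j) from it."""
    total = 0
    for l in range(n):          # row inside the macroblock
        for k in range(n):      # column inside the macroblock
            total += abs(target[y + l][x + k]
                         - reference[y + j + l][x + i + k])
    return total / (n * n)
```

A MAD of zero means the displaced reference block matches the target block exactly.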
11
Search for Motion Vectors
The goal of the search is to find a vector (i, j) as the motion vector MV = (u, v), such that MAD(i, j) is minimum:
$$(u, v) = \{\, (i, j) \mid MAD(i, j) \text{ is minimum},\ i \in [-p, p],\ j \in [-p, p] \,\}$$
12
Sequential Search
Sequential search: sequentially search the whole (2p+1)×(2p+1) window in the Reference frame (also referred to as full search or exhaustive search). A macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel, and their respective MAD is then derived. The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.
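The sequential search can be sketched as below. This is an illustrative version under a few assumptions: frames are lists of rows, (x, y) is the top-left corner of the target macroblock, and candidates that would fall outside the reference frame are simply skipped (the slides do not say how borders are handled).

```python
def mad(target, reference, x, y, i, j, n):
    """Mean Absolute Difference for displacement (i, j)."""
    total = sum(abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
                for l in range(n) for k in range(n))
    return total / (n * n)


def sequential_search(target, reference, x, y, n, p):
    """Try every displacement in [-p, p] x [-p, p]; return (MV, MAD)."""
    height, width = len(reference), len(reference[0])
    best_mad, best_mv = float("inf"), (0, 0)
    for j in range(-p, p + 1):
        for i in range(-p, p + 1):
            # Assumption: skip blocks that leave the reference frame.
            if not (0 <= x + i <= width - n and 0 <= y + j <= height - n):
                continue
            current = mad(target, reference, x, y, i, j, n)
            if current < best_mad:
                best_mad, best_mv = current, (i, j)
    return best_mv, best_mad
```

The two nested loops over the window, on top of the two loops inside `mad`, are what make this method expensive.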
13
Cost of Sequential Search
The sequential search method is very costly. Assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost for obtaining a motion vector for a single macroblock is
$$(2p+1) \cdot (2p+1) \cdot N^2 \cdot 3 \;\Rightarrow\; O(p^2 N^2)$$
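The cost can be worked through numerically. The sketch below counts three operations per pixel comparison over the full (2p+1)×(2p+1) window. The video parameters in the usage note (720×480 at 30 fps, p = 15, N = 16) are assumptions chosen for illustration, since the slide's own example values did not survive extraction.

```python
def full_search_ops_per_macroblock(p, n, ops_per_pixel=3):
    """Operations to find one motion vector with sequential search."""
    return (2 * p + 1) ** 2 * n * n * ops_per_pixel


def full_search_ops_per_second(width, height, fps, p, n):
    """Operations per second when every macroblock of every frame is searched."""
    macroblocks_per_frame = (width // n) * (height // n)
    return full_search_ops_per_macroblock(p, n) * macroblocks_per_frame * fps


if __name__ == "__main__":
    # Assumed example: 720x480 video, 30 fps, p = 15, N = 16.
    print(full_search_ops_per_macroblock(15, 16))            # 738048
    print(full_search_ops_per_second(720, 480, 30, 15, 16))  # 29890944000
```

Under these assumed parameters the search alone needs roughly 29.89 × 10^9 operations per second.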
14
Motion-vector: sequential-search
15
2D Logarithmic Search
Logarithmic search: a cheaper version that is suboptimal but still usually effective. The procedure for 2D Logarithmic Search of motion vectors takes several iterations and is akin to a binary search. Initially, only nine locations in the search window are used as seeds for a MAD-based search; they are marked as ‘1’.
16
After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step size (offset) is reduced to half. In the next iteration, the nine new locations are marked as ‘2’, and so on.
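The iteration described above can be sketched as follows. This is one plausible reading, not the slides' exact procedure: the initial offset of ⌈p/2⌉, the tie-breaking in favor of the current center, and the handling of candidates outside the window or frame (treated as infinitely bad) are all assumptions of this example.

```python
import math


def mad(target, reference, x, y, i, j, n):
    """Mean Absolute Difference for displacement (i, j)."""
    total = sum(abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
                for l in range(n) for k in range(n))
    return total / (n * n)


def log_search(target, reference, x, y, n, p):
    """2D logarithmic search for the motion vector; returns (MV, MAD)."""
    height, width = len(reference), len(reference[0])

    def cost(i, j):
        in_window = abs(i) <= p and abs(j) <= p
        in_frame = 0 <= x + i <= width - n and 0 <= y + j <= height - n
        if in_window and in_frame:
            return mad(target, reference, x, y, i, j, n)
        return float("inf")

    # Center first, so that ties keep the current center.
    neighbours = [(0, 0)] + [(di, dj) for dj in (-1, 0, 1)
                             for di in (-1, 0, 1) if (di, dj) != (0, 0)]
    cu, cv = 0, 0
    offset = math.ceil(p / 2)
    while True:
        candidates = [(cu + di * offset, cv + dj * offset)
                      for di, dj in neighbours]
        cu, cv = min(candidates, key=lambda c: cost(*c))
        if offset == 1:
            break
        offset = math.ceil(offset / 2)   # halve the step size
    return (cu, cv), cost(cu, cv)
```

Because only nine positions are examined per iteration, the search can miss the global minimum when the MAD surface is not smooth, which is why the method is described as suboptimal.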
17
Fig. 10.2: 2D Logarithmic Search for Motion Vectors.
18
Motion-vector: 2D-logarithmic-search
19
Using the same example as in the previous subsection, the total number of operations per second drops to:
20
Hierarchical Search
The search can benefit from a hierarchical (multiresolution) approach in which an initial estimate of the motion vector is obtained from images with a significantly reduced resolution. Figure 10.3 shows a three-level hierarchical search in which the original image is at Level 0, images at Levels 1 and 2 are obtained by down-sampling from the previous levels by a factor of 2, and the initial search is conducted at Level 2. Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.
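A compact sketch of the multiresolution idea follows. Several details are assumptions of this example rather than statements from the slides: down-sampling averages 2×2 pixel groups, the coarse estimate is doubled and refined within a 3×3 neighborhood at each finer level, and x, y, and n are divisible by 2 raised to (levels − 1).

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [[(frame[2 * r][2 * c] + frame[2 * r][2 * c + 1]
              + frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(frame[0]) // 2)]
            for r in range(len(frame) // 2)]


def mad(target, reference, x, y, i, j, n):
    total = sum(abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
                for l in range(n) for k in range(n))
    return total / (n * n)


def best_mv(target, reference, x, y, n, candidates):
    """Candidate displacement with the least MAD (first one wins ties)."""
    height, width = len(reference), len(reference[0])
    best_cost, best = float("inf"), candidates[0]
    for i, j in candidates:
        if 0 <= x + i <= width - n and 0 <= y + j <= height - n:
            cost = mad(target, reference, x, y, i, j, n)
            if cost < best_cost:
                best_cost, best = cost, (i, j)
    return best


def hierarchical_search(target, reference, x, y, n, p, levels=3):
    pyramid = [(target, reference)]
    for _ in range(levels - 1):
        t, r = pyramid[-1]
        pyramid.append((downsample(t), downsample(r)))
    # Full search at the coarsest level with a proportionally reduced p.
    scale = 2 ** (levels - 1)
    t, r = pyramid[-1]
    q = max(1, p // scale)
    window = [(i, j) for j in range(-q, q + 1) for i in range(-q, q + 1)]
    u, v = best_mv(t, r, x // scale, y // scale, n // scale, window)
    # Refine: double the estimate, then check its 3x3 neighborhood.
    for level in range(levels - 2, -1, -1):
        scale = 2 ** level
        t, r = pyramid[level]
        around = [(2 * u, 2 * v)] + [(2 * u + di, 2 * v + dj)
                                     for dj in (-1, 0, 1) for di in (-1, 0, 1)
                                     if (di, dj) != (0, 0)]
        u, v = best_mv(t, r, x // scale, y // scale, n // scale, around)
    return (u, v)
```

Most of the MAD evaluations happen on small blocks at the coarsest level, and each finer level adds only nine more, which is where the savings come from.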
21
Fig. 10.3: A Three-level Hierarchical Search for Motion Vectors.
22
Table 10.1: Comparison of Computational Cost of Motion Vector Search Based on Examples
23
H.261
H.261 is an earlier digital video compression standard; its principle of MC-based compression is retained in all later video compression standards. The standard was designed for videophone, video conferencing, and other audiovisual services over ISDN. The video codec supports bit-rates of p×64 kbps, where p ranges from 1 to 30. The standard requires that the delay of the video encoder be less than 150 msec so that the video can be used for real-time bidirectional video conferencing.
24
Table 10.2: Video Formats Supported by H.261
25
Fig. 10.4: H.261 Frame Sequence.
26
H.261 Frame Sequence
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
- I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame.
- P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous I-frame or P-frame is allowed).
27
H.261 Frame Sequence
Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal. To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video. Motion vectors in H.261 are always measured in units of full pixels, and they have a limited range of ±15 pixels, i.e., p = 15.
28
Intra-frame (I-frame) Coding
Fig. 10.5: I-frame Coding.
29
Intra-frame (I-frame) Coding
Macroblocks are of size 16×16 pixels for the Y frame and 8×8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8×8 blocks. For each 8×8 block, a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
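The zigzag scan step can be illustrated with a short sketch. It assumes the JPEG-style scan order (start at the DC coefficient, then move right, then down-left along anti-diagonals); the function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n-by-n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col on a diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()                 # even diagonals run upward
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The scan places low-frequency coefficients first, so the zeros produced by quantization tend to cluster at the end of the sequence where they code cheaply.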
30
Inter-frame (P-frame) Coding
Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.
31
Inter-frame (P-frame) Coding
For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier. After the prediction, a difference macroblock is derived to measure the prediction error. Each of the 8×8 blocks in the difference macroblock goes through the DCT, quantization, zigzag scan, and entropy coding procedures.
32
Inter-frame (P-frame) Coding
P-frame coding encodes the difference macroblock (not the Target macroblock itself). Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level. The MB itself is then encoded (treated as an Intra MB), and in this case it is termed a non-motion-compensated MB. For the motion vector, the difference MVD is sent for entropy coding:
$$MVD = MV_{Preceding} - MV_{Current}$$
33
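Quantization in H.261
The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock. If DCT and QDCT denote the DCT coefficients before and after quantization, then for DC coefficients in Intra mode:
$$QDCT = \mathrm{round}\left(\frac{DCT}{step\_size}\right) = \mathrm{round}\left(\frac{DCT}{8}\right)$$
For all other coefficients:
$$QDCT = \left\lfloor \frac{DCT}{step\_size} \right\rfloor = \left\lfloor \frac{DCT}{2 \cdot scale} \right\rfloor$$
where scale is an integer in the range [1, 31].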
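A small sketch of the two quantization rules, assuming the commonly cited textbook form: Intra DC coefficients use a fixed step size of 8 with rounding, and all other coefficients use a step size of 2 × scale with the floor operation. The function name is hypothetical, and a production encoder may treat negative coefficients differently.

```python
import math


def quantize_h261(dct, scale, intra_dc=False):
    """Quantize one DCT coefficient following the rules described above."""
    if intra_dc:
        # Fixed step size of 8; round half up.
        return math.floor(dct / 8 + 0.5)
    if not 1 <= scale <= 31:
        raise ValueError("scale must be an integer in [1, 31]")
    return math.floor(dct / (2 * scale))
```

For example, with scale = 4 a coefficient of 45 is quantized to 5, while an Intra DC value of 1016 is quantized to 127.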
34
H.261 Encoder and Decoder
Fig. 10.7 shows a relatively complete picture of how the H.261 encoder and decoder work. A scenario is used where frames I, P1, and P2 are encoded and then decoded. Note: decoded frames (not the original frames) are used as reference frames in motion estimation. The data that goes through the observation points indicated by the circled numbers is summarized in Tables 10.3 and 10.4.
35
Fig. 10.7(a): H.261 Encoder (I-frame). Figure labels: original image, decoded image.
36
Fig. 10.7(b): H.261 Decoder (I-frame). Figure label: decoded image.
37
Fig. 10.7(a): H.261 Encoder (P-frame). Figure labels: original image, decoded image, prediction, prediction error, decoded prediction error.
38
Fig. 10.7(b): H.261 Decoder (P-frame). Figure labels: prediction, decoded (reconstructed) image, decoded prediction error.
39
39
40
Fig. 10.1: Macroblocks and Motion Vector in Video Compression.
41
Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.
42
Syntax of H.261 Video Bitstream
Fig. 10.8 shows the syntax of the H.261 video bitstream: a hierarchy of four layers, namely Picture, Group of Blocks (GOB), Macroblock, and Block.
1. The Picture layer: PSC (Picture Start Code) delineates boundaries between pictures. TR (Temporal Reference) provides a time-stamp for the picture.
43
2. The GOB layer: H.261 pictures are divided into regions of 11×3 macroblocks, each of which is called a Group of Blocks (GOB). Fig. 10.9 depicts the arrangement of GOBs in a CIF or QCIF luminance image. For instance, the CIF image has 2×6 GOBs, corresponding to its image resolution of 352×288 pixels. Each GOB has its Start Code (GBSC) and Group Number (GN). In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB.
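The GOB arithmetic can be checked with a few lines. The sketch assumes 16×16 luminance macroblocks, so a GOB of 11×3 macroblocks covers 176×48 pixels; the function name is illustrative.

```python
MACROBLOCK_SIZE = 16  # luminance macroblock is 16x16 pixels


def h261_gob_grid(width, height, gob_mbs_wide=11, gob_mbs_high=3):
    """Number of GOBs across and down a luminance image."""
    gob_width = gob_mbs_wide * MACROBLOCK_SIZE    # 176 pixels
    gob_height = gob_mbs_high * MACROBLOCK_SIZE   # 48 pixels
    return width // gob_width, height // gob_height


if __name__ == "__main__":
    print(h261_gob_grid(352, 288))  # CIF  -> (2, 6)
    print(h261_gob_grid(176, 144))  # QCIF -> (1, 3)
```

CIF therefore holds 12 GOBs and QCIF holds 3.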
44
3. The Macroblock layer: Each Macroblock (MB) has its own Address indicating its position within the GOB, a Quantizer (MQuant), and six 8×8 image blocks (4 Y, 1 Cb, 1 Cr).
4. The Block layer: For each 8×8 block, the bitstream starts with the DC value, followed by pairs of the length of the zero-run (Run) and the subsequent non-zero value (Level) for the ACs, and finally the End of Block (EOB) code. The range of Run is [0, 63]. Level reflects quantized values; its range is [−127, 127] and Level ≠ 0.
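The Block layer's (Run, Level) representation can be sketched as follows. The input is assumed to be the 64 quantized coefficients of one block, already in zigzag order with the DC value first; the output is a plain Python list rather than an actual variable-length-coded bitstream, and the `"EOB"` marker is a stand-in for the real code word.

```python
def run_level_encode(scanned):
    """Turn zigzag-ordered coefficients into [DC, (Run, Level)..., 'EOB']."""
    dc, acs = scanned[0], scanned[1:]
    symbols = [dc]
    run = 0
    for coefficient in acs:
        if coefficient == 0:
            run += 1                      # extend the current zero-run
        else:
            symbols.append((run, coefficient))
            run = 0
    symbols.append("EOB")                 # trailing zeros are implied
    return symbols
```

Trailing zeros cost nothing beyond the single EOB symbol, which is why quantization followed by zigzag ordering pays off.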
45
Fig. 10.8: Syntax of H.261 Video Bitstream.
46
Fig. 10.9: Arrangement of GOBs in H.261 Luminance Images.
47
H.263
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN). It aims at low bit-rate communications at bit-rates of less than 64 kbps. It uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding for the remaining signal to reduce spatial redundancy (for both Intra-frames and inter-frame prediction).
48
Li & Drew; Department of Internet Media Engineering, 임창훈
Table 10.5: Video Formats Supported by H.263
49
H.263 & Group of Blocks (GOB)
As in H.261, the H.263 standard also supports the notion of Group of Blocks (GOB). The difference is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture. As shown in Fig. 10.10, each QCIF luminance image consists of 9 GOBs and each GOB has 11×1 MBs (176×16 pixels), whereas each 4CIF luminance image consists of 18 GOBs and each GOB has 44×2 MBs (704×32 pixels).
50
Fig. 10.10: Arrangement of GOBs in H.263 Luminance Images.
51
Motion Compensation in H.263
The horizontal and vertical components of the MV are predicted from the median values of the horizontal and vertical components, respectively, of MV1, MV2, MV3 from the "previous", "above", and "above and right" MBs (see Fig. 10.11(a)). For the macroblock with MV(u, v):
$$u_p = \mathrm{median}(u_1, u_2, u_3), \qquad v_p = \mathrm{median}(v_1, v_2, v_3)$$
Instead of coding MV(u, v) itself, the error vector (δu, δv) is coded, where δu = u − u_p and δv = v − v_p.
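The median prediction can be sketched in a few lines. The sketch assumes each motion vector is a (u, v) tuple and that the value actually coded is the current MV minus its component-wise median prediction; the function names are illustrative, and special cases at picture or GOB borders are not handled.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv1, mv2, mv3):
    """Component-wise median of the three neighboring motion vectors."""
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))


def mv_error(mv, mv1, mv2, mv3):
    """Error vector: current MV minus its median prediction."""
    u_p, v_p = predict_mv(mv1, mv2, mv3)
    return (mv[0] - u_p, mv[1] - v_p)
```

Because neighboring macroblocks usually move together, the error vector tends to be small and therefore cheap to entropy-code.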
52
Fig. 10.11: Prediction of Motion Vector in H.263.
53
Half-Pixel Precision
In order to reduce the prediction error, half-pixel precision is supported in H.263, versus full-pixel precision only in H.261. The default range for both the horizontal and vertical components u and v of MV(u, v) is now [−16, 15.5]. The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method, as shown in Fig. 10.12.
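A minimal sketch of the bilinear interpolation, assuming integer pixel values, coordinates expressed in half-pixel units, and the usual rounding offsets (+1 before halving, +2 before dividing by four). The function name and coordinate convention are choices made for this example.

```python
def half_pixel_value(frame, x2, y2):
    """Pixel value at (x2 / 2, y2 / 2), with x2 and y2 in half-pixel units."""
    x, y = x2 // 2, y2 // 2
    half_x, half_y = x2 % 2 == 1, y2 % 2 == 1
    a = frame[y][x]
    if not half_x and not half_y:
        return a                                        # full-pixel position
    if half_x and not half_y:
        return (a + frame[y][x + 1] + 1) // 2           # between A and B
    if half_y and not half_x:
        return (a + frame[y + 1][x] + 1) // 2           # between A and C
    return (a + frame[y][x + 1] + frame[y + 1][x]
            + frame[y + 1][x + 1] + 2) // 4             # center of A, B, C, D
```

A motion vector such as (2.5, −1) then addresses the half-pixel grid directly by doubling both components.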
54
Fig. 10.12: Half-pixel Prediction by Bilinear Interpolation in H.263.
55
Chap 11 MPEG Video Coding
11.1 Overview
MPEG: Moving Pictures Experts Group, established in 1988 for the development of digital video. It is appropriately recognized that proprietary interests need to be maintained within the family of MPEG standards. This is accomplished by defining only a compressed bitstream that implicitly defines the decoder; the compression algorithms, and thus the encoders, are completely up to the manufacturers.
56
11.2 MPEG-1
MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format). MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
- 352×240 for NTSC video at 30 fps
- 352×288 for PAL video at 25 fps
It uses 4:2:0 chroma subsampling. The MPEG-1 standard has five parts: Systems, Video, Audio, Conformance, and Software.
57
Motion Compensation in MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
- In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I- or P-frame. This is the prediction.
- Prediction error: the difference between the MB and its matching MB, sent to the DCT and its subsequent encoding steps.
- The prediction is from a previous frame, i.e., forward prediction.
58
Fig. 11.1: The Need for Bidirectional Search. The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.
59
Motion Compensation in MPEG-1
MPEG introduces a third frame type, the B-frame, and its accompanying bi-directional motion compensation. The MC-based B-frame coding idea is illustrated in Fig. 11.2.
60
Fig. 11.2: B-frame Coding Based on Bidirectional Motion Compensation.
61
Each MB from a B-frame will have up to two motion vectors (MVs), one from the forward and one from the backward prediction. If matching in both directions is successful, then two MVs will be sent, and the two corresponding matching MBs are averaged before comparing to the Target MB for generating the prediction error. If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used, from either the forward or the backward prediction.
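The averaging rule can be sketched as below. The matching MBs are assumed to have been fetched already using their motion vectors, macroblocks are lists of rows of integer pixel values, and the rounded integer average `(f + b + 1) // 2` is an assumption of this example. The function name is hypothetical.

```python
def b_frame_prediction_error(target_mb, forward_mb=None, backward_mb=None):
    """Prediction error of a B-frame macroblock.

    Uses the average of both matches when both are available, otherwise
    whichever single match was found.
    """
    if forward_mb is not None and backward_mb is not None:
        prediction = [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
                      for f_row, b_row in zip(forward_mb, backward_mb)]
    elif forward_mb is not None:
        prediction = forward_mb
    elif backward_mb is not None:
        prediction = backward_mb
    else:
        raise ValueError("no acceptable match in either reference frame")
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_mb, prediction)]
```

When neither direction yields an acceptable match, an encoder would fall back to coding the macroblock in Intra mode; the sketch just raises an error in that case.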
62
Fig. 11.3: MPEG Frame Sequence.
63
Other Major Differences from H.261
Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices (Fig. 11.4):
- Slices may contain variable numbers of macroblocks.
- Slices may start and end anywhere, as long as they fill the whole picture.
- Each slice is coded independently, which gives additional flexibility in bit-rate control.
- The slice concept is important for error recovery.
64
Fig. 11.4: Slices in an MPEG-1 Picture.
65
Fig. 11.5: Layers of MPEG-1 Video Bitstream.
66
11.3 MPEG-2
MPEG-2 is for higher-quality video at a bit-rate of more than 4 Mbps. It defines seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, and Multiview. Within each profile, up to four levels are defined (Table 11.5). The DVD video specification allows only four display resolutions: 720×480, 704×480, 352×480, and 352×240.
67
69
MPEG-2
70
Need for MPEG-2
- MPEG-1 allowed rates of 1.5 Mbps at SIF resolution; higher-resolution coding standards were needed for direct video broadcasting and storage (DVB, DVD).
- MPEG-1 allowed encoding only of progressive-scan sources, not interlaced-scan sources.
- MPEG-1 provides limited error concealment for noisy channels.
- A more flexible choice of formats, resolutions, and bitrates was needed.
71
MPEG-2
- MPEG-2 was designed mainly for storage (DVD, DVB) and transmission on noisy channels (direct terrestrial or satellite TV broadcast).
- MPEG-2 standards were published as ISO/IEC 13818.
- Like MPEG-1, the MPEG-2 standard only specifies the syntax of the bit stream and the semantics/operation of the decoding process, and leaves out the design of the encoder and decoder (to stimulate competition and industry product differentiation), although it provides a reference implementation.
- It was developed between 1991 and 1993; parts of MPEG-2 reached International Standard in 1994, 1996, 1997, and 1999.
- MPEG-3 was originally intended for HDTV at higher bitrates, but was merged with MPEG-2.
72
MPEG-2 parts
- Part 1, Systems: synchronization and multiplexing of audio and video
- Part 2, Video
- Part 3, Audio
- Part 4, Testing compliance
- Part 5, Software simulation
- Part 6, Extensions for Digital Storage Media Command and Control (DSM-CC)
- Part 7, Advanced Audio Coding (AAC)
- Part 9, Extensions for real-time interfaces
- Part 10, Conformance extensions for DSM-CC
- Part 11, Intellectual Property Management and Protection
(Part 8 was withdrawn due to lack of industry interest.)
73
MPEG-2 target applications
- Coding high-quality video at 4-15 Mbps for video on demand (VOD), standard-definition (SD) and high-definition (HD) digital TV broadcasting, and for storing video on digital storage media like the DVD.
- MPEG-2 should have scalable coding and should include error resilience techniques.
- MPEG-2 should provide good NTSC-quality video at 4-6 Mbps and transparent NTSC-quality video at 8-10 Mbps.
- MPEG-2 should provide random access to frames.
- MPEG-2 should be compatible with MPEG-1 (an MPEG-2 decoder should be able to decode an MPEG-1 bitstream).
- Low-cost decoders.
74
MPEG-2 Systems
MPEG-2 Systems offers two types of multiplexed bitstreams:
- Program Stream: consists of a sequence of PESs, similar to and compatible with the MPEG-1 Program (System) Stream, but containing additional features. The MPEG-2 PS is a superset of the MPEG-1 PS. It is suited for error-free transmission environments and has long, variable-length packets (typically 1-2 KB, but possibly up to 64 KB) for coding efficiency. It has features not present in the MPEG-1 PS, such as scrambling of data, assigning different priorities to packets, alignment of elementary stream packets, copyright indication, and fast-forward and fast-reverse indication.
- Transport Stream: designed for transmission through noisy channels. It has a small fixed-size packet of 188 bytes and is suited for cable/satellite TV broadcasting and ATM networks. It allows synchronous multiplexing of programs with independent time bases and fast access to the desired program for channel hopping.
PES (Packetized Elementary Stream) is the central structure used in both Program and Transport Streams; it results from packetizing continuous streams of compressed audio or video.
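To give a feel for the fixed 188-byte packet size, the sketch below estimates how many Transport Stream packets one PES packet occupies. It is a simplification under stated assumptions: every TS packet is taken to carry a 4-byte header and 184 bytes of payload, and adaptation fields, stuffing, and PSI tables are ignored. The function names are illustrative.

```python
import math

TS_PACKET_SIZE = 188   # bytes, fixed
TS_HEADER_SIZE = 4     # bytes, assumed minimal header without adaptation field
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE


def ts_packets_needed(pes_length):
    """Number of TS packets needed to carry a PES packet of pes_length bytes."""
    return math.ceil(pes_length / TS_PAYLOAD_SIZE)


def split_pes(pes_bytes):
    """Split a PES packet into payload chunks of at most 184 bytes each."""
    return [pes_bytes[start:start + TS_PAYLOAD_SIZE]
            for start in range(0, len(pes_bytes), TS_PAYLOAD_SIZE)]
```

Small fixed-size packets limit the damage of a single transmission error to a short stretch of data, which is the motivation given above for using them on noisy channels.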
75
MPEG-2 Systems multiplexing (digital storage media)
76
Packetized Elementary Streams (PES)
77
Program Stream structure (simplified)
78
Transport Stream Structure (simplified)
79
MPEG-2 Profiles and Levels
MPEG-2 is designed to cover a wide range of applications, but not all features are needed by all applications. MPEG-2 groups application features into 7 profiles, and profiles have different levels:
- Simple profile: for low-delay video conferencing applications using only I- and P-frames
- Main profile: the most used; high-quality digital video applications
- SNR (signal-to-noise ratio) scalable: supports multiple grades of video quality
- Spatially scalable: supports multiple grades of resolution
- High: supports multiple grades of quality, resolution, and chroma formats
- 4:2:2
- Multiview
80
MPEG-2 Profiles and Levels (2)
There are 4 levels for each profile:
- Low (for SIF pictures)
- Main (for ITU-R BT.601 resolution pictures)
- High-1440 (for European HDTV resolution pictures)
- High (for North American HDTV resolution pictures)
81
MPEG-2 Profiles and Levels (3)
82
Scalable coding
Scalable coding means coding the audio-video stream into a base layer and some enhancement layers, so that decoding the base layer alone achieves basic quality, and, if the transmission channel allows it, decoding the enhancement layers brings additional quality to the decoded stream.
There are 4 types of scalability:
- SNR scalability
- Spatial scalability
- Temporal scalability
- Hybrid (a combination of the above)
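The base-plus-enhancement idea can be illustrated with a toy SNR-scalable quantizer. This is only a conceptual sketch, not the MPEG-2 procedure: the base layer quantizes coefficients coarsely, and the enhancement layer quantizes what the base layer missed with a finer step. The function names and step sizes are assumptions of this example.

```python
def encode_snr_layers(coefficients, base_step, enhancement_step):
    """Split coefficients into a coarse base layer and a refinement layer."""
    base = [round(c / base_step) for c in coefficients]
    residual = [c - b * base_step for c, b in zip(coefficients, base)]
    enhancement = [round(r / enhancement_step) for r in residual]
    return base, enhancement


def decode_snr_layers(base, base_step, enhancement=None, enhancement_step=None):
    """Reconstruct from the base layer, optionally adding the enhancement."""
    decoded = [b * base_step for b in base]
    if enhancement is not None:
        decoded = [d + e * enhancement_step
                   for d, e in zip(decoded, enhancement)]
    return decoded
```

For coefficients [100, 37] with steps 16 and 4, the base layer alone decodes to [96, 32], and adding the enhancement layer brings this to [100, 36].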
83
SNR scalability
84
Spatial scalability
85
Encoding of interlaced video
- MPEG-2 allows encoding of interlaced video; a frame can be intracoded or intercoded as a whole picture or as a field of a picture.
- Motion estimation/compensation can be between frames or between fields.
86
MPEG-4
87
- MPEG-4, or ISO/IEC 14496, is an international standard describing the coding of audio-video objects.
- The 1st version of MPEG-4 became an international standard in 1999 and the 2nd version in 2000 (6 parts); since then many parts were added, and some are under development today.
- MPEG-4 included object-based audio-video coding for Internet streaming and television broadcasting, but also digital storage.
- MPEG-4 included interactivity and VRML support for 3D rendering.
- It has profiles and levels like MPEG-2.
- It has 27 parts.
88
MPEG-4 parts
- Part 1, Systems: synchronizing and multiplexing audio and video
- Part 2, Visual: coding visual data
- Part 3, Audio: coding audio data, enhancements to Advanced Audio Coding and new techniques
- Part 4, Conformance testing
- Part 5, Reference software
- Part 6, DMIF (Delivery Multimedia Integration Framework)
- Part 7, Optimized reference software for coding audio-video objects
- Part 8, Carriage of MPEG-4 content on IP networks
89
MPEG-4 parts (2)
- Part 9, Reference hardware implementation
- Part 10, Advanced Video Coding (AVC)
- Part 11, Scene description and application engine; BIFS (Binary Format for Scene) and XMT (Extensible MPEG-4 Textual format)
- Part 12, ISO base media file format
- Part 13, IPMP extensions
- Part 14, MP4 file format, version 2
- Part 15, AVC (Advanced Video Coding) file format
- Part 16, Animation Framework eXtension (AFX)
- Part 17, Timed text subtitle format
- Part 18, Font compression and streaming
- Part 19, Synthesized texture stream
90
MPEG-4 parts (3)
- Part 20, Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
- Part 21, MPEG-J Graphics Framework eXtension (GFX)
- Part 22, Open Font Format
- Part 23, Symbolic Music Representation
- Part 24, Audio and systems interaction
- Part 25, 3D Graphics Compression Model
- Part 26, Audio conformance
- Part 27, 3D graphics conformance
91
Motivations for MPEG-4
Broad support for multimedia (MM) facilities is available (2D and 3D graphics, audio, and video), but:
- Content formats are incompatible: 3D graphics formats such as VRML are badly integrated with 2D formats such as Flash or HTML.
- Broadcast formats (MHEG) are not well suited for the Internet.
- Some formats have a binary representation, but not all.
- SMIL, HTML+, etc. solve only part of the problem.
- Both authoring and delivery are cumbersome.
- Support for multiple formats is poor.
92
MPEG-4: Audio/Visual (A/V) Objects
Simple video coding (MPEG-1 and MPEG-2):
- A/V information is represented as a sequence of rectangular frames: the television paradigm.
- Future: Web paradigm, game paradigm, ...?
Object-based video coding (MPEG-4):
- A/V information is a set of related stream objects.
- Individual objects are encoded as needed.
- Temporal and spatial composition into complex scenes.
- Integration of text, "natural" and synthetic A/V.
- A step towards semantic representation of A/V.
- Communication + Computing + Film (TV, ...).
93
Main parts of MPEG-4
1. Systems: scene description, multiplexing, synchronization, buffer management, intellectual property management and protection
2. Visual: coded representation of natural and synthetic visual objects
3. Audio: coded representation of natural and synthetic audio objects
4. Conformance Testing: conformance conditions for bit streams and devices
5. Reference Software: normative and non-normative tools to validate the standard
6. Delivery Multimedia Integration Framework (DMIF): generic session protocol for multimedia streaming
94
Main objectives: rich data
Efficient representation for many data types:
- Video, from very low bit rates to very high quality: 24 kbps to several Mbps (HDTV).
- Music and speech data for a very wide bit-rate range: very low bit-rate speech (1.2-2 kbps), music (6-64 kbps), stereo broadcast quality (128 kbps).
- Synthetic objects: generic dynamic 2D and 3D objects; specific 2D and 3D objects, e.g., human faces and bodies; speech and music can be synthesized by the decoder.
- Text.
- Graphics.
95
Main objectives: robust + pervasive
- Resilience to residual errors: provided by the encoding layer, even under difficult channel conditions, e.g., mobile.
- Platform independence.
- Transport independence: MPEG-2 Transport Stream for digital TV, RTP for Internet applications, DAB (Digital Audio Broadcast), ... However, tight synchronization of media.
- Intellectual property management + protection, for both A/V contents and algorithms.
96
Main objectives: scalability
Scalability:
- Enables partial decoding.
- Audio: scalable sound rendering quality.
- Video: progressive transmission of different quality levels; spatial and temporal resolution.
Profiling:
- Enables partial decoding.
- Solutions for different settings.
- Applications may use a small portion of the standard.
- "Specify minimum for maximum usability".
97
Main objectives: genericity
- Independent representation of objects in a scene.
- Independent access for their manipulation and re-use.
- Composition of natural and synthetic A/V objects into one audiovisual scene.
- Description of the objects and the events in a scene.
- Capabilities for interaction and hyperlinking.
- Delivery-media-independent representation format.
- Transparent communication between different delivery environments.
98
Object-based architecture
99
MPEG-4 as a tool box
- MPEG-4 is a tool box (not a monolithic standard).
- The main issue is not better compression.
- There is no "killer" application (as DTV was for MPEG-2).
- Many new, different applications are possible: enriched broadcasting, remote surveillance, games, mobile multimedia, virtual environments, etc.
- Profiles.
- Binary Interchange Format for Scenes (BIFS): based on VRML 2.0 for 3D objects; "programmable" scenes; efficient communication format.
100
MPEG-4 Systems part
101
MPEG-4 scene, VRML-like model
102
Logical scene structure
103
MPEG-4 Terminal Components
104
Digital Terminal Architecture
105
BIFS tools: scene features
- 3D and 2D scene graph (hierarchical structure)
- 3D and 2D objects (meshes, spheres, cones, etc.)
- 3D and 2D composition, mixing 2D and 3D
- Sound composition, e.g., mixing, "new instruments", special effects
- Scalability and scene control
- Terminal capabilities (TermCap)
- MPEG-J for terminal control
- Face and body animation
- XMT: textual format; a bridge to the Web world
106
BIFS tools: command protocol
- Replace a scene with a new scene: a replace command is an entry point, like an I-frame; the whole context is set to the new value.
- Insert a node in a grouping node: instead of replacing a whole scene, this just adds a node; it enables progressive downloads of a scene.
- Delete a node: deletion of an element costs a few bytes.
- Change a field value, e.g., color, position, or switching an object on/off.
107
BIFS tools: animation protocol
- The BIFS Command Protocol is a synchronized, but non-streaming, medium.
- Anim is for continuous animation of scenes.
- Modification of any value in the scene: viewpoints, transforms, colors, lights.
- The animation stream only contains the animation values.
- Differential coding: extremely efficient.
108
Elementary stream management
- Object description: relations between streams and to the scene.
- Auxiliary streams: IPMP (Intellectual Property Management and Protection); OCI (Object Content Information).
- Synchronization + packetization: time stamps, access unit identification, ...
- System Decoder Model.
- File format: a way to exchange MPEG-4 presentations.
109
An example MPEG-4 scene
110
MPEG-7
Standard for the description of multimedia content:
- XML Schema for content description.
- Does not standardize the extraction of descriptions.
- MPEG-1, 2, and 4 make content available.
- MPEG-7 makes content semantics available.
111
Digital Video Interactive
Digital Video Interactive (DVI) was the first multimedia desktop video standard for IBM-compatible personal computers. It enabled full-screen, full-motion video, as well as stereo audio, still images, and graphics, to be presented on a DOS-based desktop computer. The scope of Digital Video Interactive encompasses a file format, including a digital container format, a number of video and audio compression formats, as well as hardware associated with the file format. [1]
112
Digital Video Interactive
The DVI format specified two video compression schemes, Presentation Level Video or Production Level Video (PLV) and Real-Time Video (RTV), and two audio compression schemes, ADPCM and PCM8. [3][1] The original video compression scheme, called Presentation Level Video (PLV), was asymmetric in that a Digital VAX-11/750 minicomputer was used to compress the video in non-real time to 30 frames per second with a resolution of 320×240. Encoding was performed by Intel at its facilities or at licensed encoding facilities set up by Intel. [4] Video compression involved coding both still frames and motion-compensated residuals using Vector Quantization (VQ) in dimensions 1, 2, and 4. [4]