MPEG-4 Video Compression

The MPEG-4 visual standard has been explicitly optimised for three bit-rate ranges: below 64 kbit/s, an intermediate range in kbit/s, and a high range in Mbit/s.
It provides content-based interactivity by coding and representing video objects rather than video frames.
It represents arbitrarily shaped video objects, where each object can be encoded with different parameters and at different qualities.
The shape of a video object can be represented by a binary plane or a grey-level (alpha) plane. The texture is coded separately from the shape.
It supports both interlaced and progressive material.
It supports the 4:2:0 chrominance format, in which the Cb and Cr components each have half as many samples as the luminance component in both the horizontal and vertical directions.
Each component can be represented with 4 to 12 bits.

MPEG-4 Video Compression Data structure in visual part of MPEG-4

Visual Object Sequence (VS): The complete MPEG-4 scene, which may contain any 2-D or 3-D natural or synthetic objects and their enhancement layers.
Video Object (VO): A video object corresponds to a particular (2-D) object in the scene. In the simplest case this can be a rectangular frame, or it can be an arbitrarily shaped object corresponding to an object or the background of the scene.
Video Object Layer (VOL): Each video object can be encoded in scalable (multi-layer) or non-scalable (single-layer) form, depending on the application, represented by the video object layer (VOL). The VOL provides support for scalable coding. A video object can be encoded using spatial or temporal scalability, going from coarse to fine resolution.
Group of Video Object Planes (GOV): The GOV groups together video object planes. GOVs can provide points in the bitstream where video object planes are encoded independently from each other, and can thus provide random access points into the bitstream. GOVs are optional.
Video Object Plane (VOP): A VOP is a time sample of a video object. VOPs can be encoded independently of each other, or dependent on each other by using motion compensation. A conventional video frame can be represented by a VOP with rectangular shape.
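The containment hierarchy described above can be sketched as nested records. This is only an illustration of the VS > VO > VOL > GOV > VOP nesting; the class and field names are hypothetical and do not correspond to bitstream syntax elements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """One time sample of a video object."""
    time: float
    coding_type: str  # "I", "P" or "B"


@dataclass
class GOV:
    """Optional group of VOPs."""
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:
    """One layer (base or enhancement) of a video object."""
    layer_id: int
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:
    """A 2-D object in the scene, coded as one or more layers."""
    name: str
    layers: List[VOL] = field(default_factory=list)

    def is_scalable(self) -> bool:
        # More than one layer means scalable (multi-layer) coding.
        return len(self.layers) > 1


@dataclass
class VisualObjectSequence:
    """The complete scene: a collection of video objects."""
    objects: List[VO] = field(default_factory=list)


def random_access_times(vol: VOL) -> List[float]:
    """Times of GOVs that begin with an independently coded (I) VOP."""
    return [g.vops[0].time for g in vol.govs
            if g.vops and g.vops[0].coding_type == "I"]
```

For example, a background object coded as a single layer with two GOVs, each starting with an I-VOP, would report two random access times.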

MPEG-4 Video Compression Block diagram of natural video decoding
The shape, texture and motion of every VOP are coded together.
[Figure: block diagram of natural video decoding. Labels: shape of VOPs; MVs of VOP macroblocks; prediction error data; image store of previous frames; reconstruction of VOP from motion-compensated previous VOP bounded by shape; composition of all VOPs.]

MPEG-4: Object-based Shape Coding

MPEG-4 Video Compression Shape coding tool
The shape of a VOP is bounded by a rectangular window whose size is a multiple of 16 pixels in both the horizontal and vertical directions.
The position of the bounding rectangle is chosen such that it contains the minimum number of 16x16 blocks with non-transparent pixels.
The binary matrix representing the shape of a VOP is referred to as the binary mask. In this mask every pixel belonging to the VOP is set to 255, and all other pixels are set to 0.
Every VOP is partitioned into smaller 16x16 Binary Alpha Blocks (BABs) for coding.
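A minimal sketch of the bounding rectangle and the BAB partition, in plain Python. It is a simplification: it takes the tight box around the non-transparent pixels and rounds its size up to multiples of 16, and it does not search over positions to minimise the number of non-transparent blocks. The function names are hypothetical.

```python
BAB_SIZE = 16


def bounding_rectangle(mask):
    """Return (top, left, height, width) around the non-zero mask pixels.

    Height and width are rounded up to multiples of 16. Returns None
    when the mask is fully transparent.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    top, left = min(rows), min(cols)
    height = max(rows) - top + 1
    width = max(cols) - left + 1

    def round_up(n):
        return -(-n // BAB_SIZE) * BAB_SIZE

    return top, left, round_up(height), round_up(width)


def split_into_babs(mask, rect):
    """Cut the bounding rectangle into 16x16 binary alpha blocks.

    Pixels of the rectangle that fall outside the mask array are
    treated as transparent (0).
    """
    top, left, height, width = rect

    def pixel(y, x):
        inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
        return mask[y][x] if inside else 0

    babs = []
    for by in range(top, top + height, BAB_SIZE):
        for bx in range(left, left + width, BAB_SIZE):
            babs.append([[pixel(y, x) for x in range(bx, bx + BAB_SIZE)]
                         for y in range(by, by + BAB_SIZE)])
    return babs
```

A 20x18-pixel object therefore gets a 32x32 window and four BABs.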

Shape Coding BAB: Binary Alpha Block
Context-based arithmetic encoding (CAE) is used to code intra shape, or the update for inter shape.
Context: formed by neighbouring shape pixels. There are two templates, an intra context and an inter context.
Context = Σ_k c_k · 2^k, where c_k is the binary value of context pixel Ck.
The context is used to access a probability table, which generates the probability intervals for arithmetic coding.
Pixels are processed one by one, from left to right and top to bottom, to build up the arithmetic codeword for the BAB.
Each BAB is coded into a single binary arithmetic codeword.
Pixels outside the context bounding box are assumed to be 0.
[Figure: intra context template (pixels C0 to C9 around the current pixel "?") and inter context template (pixels C0 to C8, taken from the current BAB and the motion-compensated (MC) BAB).]
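A sketch of how a context number could be formed from neighbouring shape pixels, with one bit per context pixel. The template offsets below are an assumption made for illustration (the slide figure did not survive in this transcript), and the function name is hypothetical; pixels outside the block are taken as 0, as stated above.

```python
# Assumed intra template: (dx, dy) offsets of C0..C9 relative to the
# current pixel. Illustrative only; check the standard for the real layout.
INTRA_TEMPLATE = [
    (-1, 0), (-2, 0),                               # C0, C1: same row
    (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),  # C2..C6: row above
    (1, -2), (0, -2), (-1, -2),                     # C7..C9: two rows above
]


def context_number(bab, x, y, template=INTRA_TEMPLATE):
    """Pack the binary values of the context pixels into one integer.

    Bit k of the result is 1 when context pixel Ck is non-transparent.
    """
    context = 0
    for k, (dx, dy) in enumerate(template):
        px, py = x + dx, y + dy
        inside = 0 <= py < len(bab) and 0 <= px < len(bab[0])
        bit = 1 if inside and bab[py][px] else 0
        context |= bit << k
    return context
```

With a 10-pixel template the context number ranges from 0 to 1023, so the probability table needs 1024 entries.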

Shape Coding
Each context has a probability of occurrence that is derived from the analysis of shapes and is mapped onto the arithmetic interval 0.0 -> 1.0. This is used to arithmetic code the shape of objects.
For a particular context, the probability table gives the probability that the mask bit is 0, P(0), and the probability that it is 1, P(1).
Context -> P(0), P(1) -> final interval -> codeword for BAB.
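The interval narrowing behind arithmetic coding can be sketched with exact fractions. This is a textbook illustration, not the integer arithmetic coder of the standard; the probability table passed in is hypothetical, and the function name is an assumption.

```python
from fractions import Fraction


def narrow_interval(bits, contexts, p_zero):
    """Narrow [0, 1) once per mask bit and return the final (low, high).

    p_zero maps a context number to P(0) for that context. A 0 bit keeps
    the lower part of the current interval, a 1 bit keeps the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for bit, context in zip(bits, contexts):
        p0 = p_zero[context]
        if bit == 0:
            width *= p0
        else:
            low += width * p0
            width *= 1 - p0
    return low, low + width


# Hypothetical two-entry probability table.
table = {0: Fraction(3, 4), 1: Fraction(1, 4)}
low, high = narrow_interval([0, 1, 0], [0, 1, 0], table)
```

Any number inside the final interval identifies the coded bit sequence; the more probable the bits, the wider the interval and the shorter the codeword.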

MPEG-4 Video Compression Grey Scale Shape coding tool
The grey scale shape information has a structure similar to that of binary shape, with the difference that every pixel (element of the matrix) can take on a range of values (usually 0 to 255) representing the degree of transparency of that pixel.
Grey scale shape information is encoded using a block-based motion-compensated DCT, similar to that of texture coding.
Grey scale shapes are required to feather the boundaries of objects into their backgrounds, so that the object boundaries do not appear harsh.

MPEG-4 Video Compression Motion compensation tools
The approaches for motion compensation in the MPEG-4 standard have adapted the block-based techniques used in the other standards to the VOP structure:
A VOP may be encoded independently of any other VOP. In this case the encoded VOP is called an Intra VOP (I-VOP).
A VOP may be predicted (using motion compensation) based on another previously decoded VOP. Such VOPs are called Predicted VOPs (P-VOPs).
A VOP may be predicted based on past as well as future VOPs. Such VOPs are called Bidirectional Interpolated VOPs (B-VOPs). B-VOPs may only be interpolated based on I-VOPs or P-VOPs.
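The prediction rules above can be sketched by listing, for each VOP in display order, which VOPs it may be predicted from. This sketch assumes each P-VOP uses the nearest preceding I- or P-VOP and each B-VOP uses the nearest I- or P-VOP on either side; the function name is hypothetical.

```python
def reference_vops(types):
    """Return (past_ref, future_ref) indices for each VOP in display order.

    types is a sequence of "I", "P" and "B". None means "no reference".
    """
    anchors = [i for i, t in enumerate(types) if t in ("I", "P")]
    refs = []
    for i, t in enumerate(types):
        past = max((a for a in anchors if a < i), default=None)
        future = min((a for a in anchors if a > i), default=None)
        if t == "I":
            refs.append((None, None))    # coded independently
        elif t == "P":
            refs.append((past, None))    # one previously decoded VOP
        else:
            refs.append((past, future))  # B-VOP: I/P anchors only
    return refs
```

Because a B-VOP needs its future anchor, that anchor has to be decoded before the B-VOP even though it is displayed after it.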

MPEG-4 Video Compression Motion compensation tools
Motion compensated coding modes (I, B, P)

MPEG-4 Video Compression Motion vector computation
MVs of macroblocks totally within an object are predicted in the normal way: conventional MB matching; advanced prediction; unrestricted prediction; bidirectional prediction.
Macroblocks across an object border are padded to minimise the prediction errors at the boundary of objects, and then the prediction is computed.
MVs of macroblocks totally outside an object are not encoded.

MPEG-4 Video Compression Motion compensation - padding
Padding repeats the pixel value at the boundary to the edge of the MB. Overlapping repeats are averaged.
Extended padding repeats this process to MBs that are adjacent to edge MBs.
[Figure: normal padding and extended padding. Panels: process of normal padding of a block; process of extended padding of a block; process of padding of a VOP.]
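A sketch of normal padding on one block: boundary pixel values are repeated along each row, overlapping repeats are averaged, and rows with no object pixels are then filled the same way along columns. The horizontal-then-vertical order and the integer averaging are assumptions, and the function names are hypothetical.

```python
def pad_row(values, opaque):
    """Fill transparent samples of one row from the nearest object samples.

    A sample with object pixels on both sides gets the integer average of
    the two nearest ones. A row with no object pixels is returned unchanged.
    """
    n = len(values)
    out = list(values)
    if not any(opaque):
        return out
    for x in range(n):
        if opaque[x]:
            continue
        left = next((values[i] for i in range(x - 1, -1, -1) if opaque[i]), None)
        right = next((values[i] for i in range(x + 1, n) if opaque[i]), None)
        if left is not None and right is not None:
            out[x] = (left + right) // 2
        else:
            out[x] = left if left is not None else right
    return out


def pad_block(block, mask):
    """Horizontal pass over rows, then vertical pass over still-empty rows."""
    rows = [pad_row(r, m) for r, m in zip(block, mask)]
    row_filled = [any(m) for m in mask]
    for x in range(len(rows[0])):
        column = pad_row([row[x] for row in rows], row_filled)
        for y, value in enumerate(column):
            rows[y][x] = value
    return rows
```

For a block whose only object pixel has value 50, every sample of the padded block becomes 50.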

MPEG-4 Video Compression Texture coding tools
An 8x8 block-based DCT is used.
To encode an arbitrarily shaped VOP, an 8x8 grid is superimposed on the VOP. Using this grid, 8x8 blocks that are internal to the VOP are encoded without modification. Blocks that straddle the VOP boundary are called boundary blocks, and are treated differently from internal blocks.
The transformed blocks are quantized, and individual coefficients can be predicted from neighbouring blocks to further reduce the entropy of the coefficients.
This is followed by a scanning of the coefficients, to reduce the average run length between two coded coefficients.
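The grid classification can be sketched as follows, assuming the binary mask dimensions are multiples of the block size. The labels and the function name are hypothetical.

```python
def classify_blocks(mask, size=8):
    """Label each size x size block of the mask.

    'internal' : every pixel belongs to the VOP
    'boundary' : the block straddles the VOP boundary
    'outside'  : no pixel belongs to the VOP
    """
    labels = []
    for by in range(0, len(mask), size):
        row_labels = []
        for bx in range(0, len(mask[0]), size):
            pixels = [mask[y][x]
                      for y in range(by, by + size)
                      for x in range(bx, bx + size)]
            if all(pixels):
                row_labels.append("internal")
            elif any(pixels):
                row_labels.append("boundary")
            else:
                row_labels.append("outside")
        labels.append(row_labels)
    return labels
```

Only the 'boundary' blocks need the special treatment; 'outside' blocks are not coded at all.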

MPEG-4 Video Compression Texture coding tools
Macroblocks totally within an object are encoded in the normal way.
Macroblocks totally outside an object are not encoded.
Macroblocks across an object border are padded to avoid DCT coefficients ringing in the spatial frequency domain.

MPEG-4 Video Compression Texture decoding tools

MPEG-4 Video Compression Adaptive AC/DC prediction
The direction of the prediction is adaptive and is selected based on a comparison of the horizontal and vertical DC gradients (increase or reduction in value) of the surrounding blocks A, B, and C.
Two types of prediction are possible, DC prediction and AC prediction:
DC prediction: The prediction is performed for the DC coefficient only, and is either from the DC coefficient of block A or from the DC coefficient of block C.
AC prediction: Either the coefficients from the first row or the coefficients from the first column of the current block are predicted from the co-sited coefficients of the selected candidate block. Differences in the quantization of the selected candidate block are accounted for by appropriate scaling by the ratio of quantization step sizes.
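A sketch of the gradient comparison, assuming block A is to the left of the current block, B is above-left, and C is above (the slide figure that defines the positions is missing from this transcript). The function names are hypothetical, and the rounding in the AC scaling is a simplification.

```python
def predict_dc(dc_a, dc_b, dc_c):
    """Choose the DC predictor from the gradients around the current block.

    Assumed layout: A = left, B = above-left, C = above.
    If the DC changes less from B to A than from B to C, the block above
    (C) is used; otherwise the block to the left (A) is used.
    """
    if abs(dc_a - dc_b) < abs(dc_b - dc_c):
        return "from C", dc_c
    return "from A", dc_a


def scale_ac_predictors(coeffs, step_candidate, step_current):
    """Rescale the candidate block's first row or column of AC coefficients
    by the ratio of quantization step sizes."""
    return [round(c * step_candidate / step_current) for c in coeffs]


direction, predictor = predict_dc(100, 100, 50)
residual = 60 - predictor  # only this difference would be coded
```

Here A and B are equal, so C is chosen and a current DC value of 60 leaves a residual of 10.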

MPEG-4 Video Compression Coefficients scanning
1. Zig-zag scan: The coefficients are read out diagonally.
2. Alternate-horizontal scan: The coefficients are read out with an emphasis on the horizontal direction first. Used if there is DC prediction in the horizontal direction.
3. Alternate-vertical scan: Similar to the horizontal scan, but applied in the vertical direction. Used if DC prediction is performed from the vertical direction.
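The zig-zag order can be generated rather than tabulated, as in this sketch. Only the zig-zag scan is shown; the two alternate scans use fixed tables that are not reproduced here. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, column) visiting order of the classic zig-zag scan."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + column == s.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals run upwards to the right
        order.extend(diagonal)
    return order


def scan(block, order):
    """Read a 2-D coefficient block out as a 1-D list in scan order."""
    return [block[r][c] for r, c in order]
```

After quantization most high-frequency coefficients are zero, so reading them last groups the zeros into long runs at the end of the list.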

MPEG-4 Video Compression Quantization of AC Spectral Components
Two types of quantization are available:
The first method uses one of two available quantization matrices to modify the quantization step size depending on the spatial frequency of the coefficient.
The second method uses the same quantization step size for all coefficients.
MPEG-4 also allows for a non-linear quantization of DC values.
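The difference between the two methods can be sketched as below. These are not the normative formulas: the convention that a matrix entry of 16 leaves the step size unchanged, the truncation toward zero, and the example matrix are all assumptions made for illustration.

```python
def quantize_uniform(coeffs, step):
    """Second method: one step size for every coefficient."""
    return [[int(c / step) for c in row] for row in coeffs]


def quantize_weighted(coeffs, step, matrix):
    """First method: the step size is scaled per spatial frequency.

    Assumption: a matrix entry of 16 means 'use the base step unchanged',
    so an entry of 32 doubles the step for that coefficient.
    """
    return [[int(c / (step * w / 16)) for c, w in zip(row, weights)]
            for row, weights in zip(coeffs, matrix)]


coeffs = [[64, 64], [64, -64]]
weights = [[16, 32], [32, 64]]  # hypothetical 2x2 matrix, coarser at high frequency
```

With a base step of 8 the uniform method keeps all four magnitudes at 8, while the weighted method reduces the higher-frequency entries to 4 and 2.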

MPEG-4 Video Compression Interlaced coding mode
Allows progressive and interlaced modes.
Motion compensation for fields or frames, similar to that of MPEG-2.
Modified AC/DC prediction.
Field DCT.
Interlaced I-, P-, and B-VOP coding.
Modified prediction for motion coding.
Modified scan rules.
10% better compression efficiency than MPEG-2.

MPEG-4 Video Compression Interlaced Coding
Frame DCT coding: Each luminance block is composed of lines from two fields alternately. Field DCT coding: Each luminance block is composed of lines from only one of the two fields.
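The two line arrangements can be sketched for a 16x16 luminance macroblock stored as a list of 16 rows. The assumption here is that even-numbered lines belong to one field and odd-numbered lines to the other; the function name is hypothetical.

```python
def luminance_blocks(macroblock, field_dct):
    """Split a 16x16 luminance macroblock into four 8x8 blocks.

    Frame DCT: lines are kept in display order, so each block mixes lines
    from both fields alternately.
    Field DCT: lines are regrouped so the upper two blocks hold one field
    and the lower two blocks hold the other.
    """
    if field_dct:
        lines = macroblock[0::2] + macroblock[1::2]
    else:
        lines = macroblock
    upper, lower = lines[:8], lines[8:]
    return [
        [row[:8] for row in upper], [row[8:] for row in upper],
        [row[:8] for row in lower], [row[8:] for row in lower],
    ]
```

Field DCT tends to help when there is motion between the two fields, because adjacent lines within one field are then better correlated than adjacent lines of the frame.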

MPEG-4 Video Compression Scalability
Object scalability: achieved by the data structures used and the shape coding.
Temporal scalability: achieved by the generalized scalability mechanism.
Spatial scalability: achieved by the generalized scalability mechanism.

MPEG-4 Video Compression Scalable coding general scheme

MPEG-4 Video Compression Temporal scalability
Temporal scalability is achievable for both rectangular frames and arbitrarily shaped VOPs.
The base layer is encoded as conventional MPEG-4 video.
The enhancement layer is encoded using one of the following two mechanisms:
Type 1: The enhancement layer improves the temporal resolution of only a portion of the base layer.
Type 2: The enhancement layer improves the temporal resolution of the entire base layer.

MPEG-4 Video Compression Temporal enhancement types
Type 1: Only a portion of the base layer is enhanced in the enhancement layer.
Type 2: The enhancement layer improves the resolution of the entire base layer.

MPEG-4 Video Compression Temporal scalability Type 1
Only a portion of the VOP in the base layer is enhanced

MPEG-4 Video Compression Temporal Scalability Type 2
The entire VOP in the base layer is enhanced

MPEG-4 Video Compression Spatial scalability
The base layer is coded as conventional MPEG-4 video.
The enhancement layer is encoded using prediction mechanisms from the base layer.

MPEG-4 Video Compression Spatial scalability
VOPs of the enhancement layer are encoded as P-VOPs or B-VOPs.

MPEG-4 Video Compression Error resilience tools
Resynchronization markers: There are unique markers in the bitstream so that, in the case of an error, the decoder can skip the remaining bits until the next marker and restart decoding from that point on.
Data partitioning: This method separates the bits for coding of motion information from those for the texture information. In the event of an error, a more efficient error concealment may be applied, for instance when the error occurs on the texture bits only, by making use of the decoded motion information.
Extended header code: These binary codes allow an optional inclusion of redundant header information, vital for correct decoding of video. This way, the chances of corruption of header information and complete skipping of large portions of the bitstream are reduced.

Reversible VLCs: These VLCs further reduce the influence of errors on the decoded data. RVLCs are codewords that can be decoded in the forward as well as the backward direction. In the event of an error and skipping of the bitstream until the next resynchronization marker, it is still possible to decode portions of the corrupted bitstream in reverse order to limit the influence of the error.
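Backward decoding can be sketched with a small reversible code. The table below is hypothetical (it is not a table from the standard): its codewords are palindromes, and no codeword is a prefix or a suffix of another, so the same bits can be parsed from either end.

```python
# Hypothetical reversible code: palindromic, prefix-free and suffix-free.
RVLC_TABLE = {"0": "a", "11": "b", "101": "c", "1001": "d"}


def decode_forward(bits, table=RVLC_TABLE):
    """Parse a bit string from the start into symbols."""
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in table:
            symbols.append(table[word])
            word = ""
    return symbols


def decode_backward(bits, table=RVLC_TABLE):
    """Parse a bit string from the end, returning symbols in stream order."""
    reversed_table = {code[::-1]: symbol for code, symbol in table.items()}
    return decode_forward(bits[::-1], reversed_table)[::-1]


stream = "0" + "11" + "101" + "1001"  # symbols a, b, c, d
tail = stream[3:]                     # suppose the first three bits are lost
```

Decoding `tail` backward from the next resynchronization point still recovers the last two symbols, which a forward-only code could not guarantee.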

For MPEG-4, resynchronization markers are located at the start of a picture and at the boundaries of objects.
For H.263, resynchronization markers are located at the start of a picture and at the start of Groups of Blocks (GOBs).
[Figure: H.263 and MPEG-4 bitstreams showing the positions of the picture start code, the H.263 resync markers, and the MPEG-4 resync markers.]

MPEG-4 Video Compression Static sprite coding tools
A sprite consists of those regions of a VO that are present in the scene throughout the video segment. An obvious example is a 'background sprite' (also referred to as the 'background mosaic'), which would consist of all pixels belonging to the background in a camera-panning sequence.

MPEG-4 Video Compression Sprite Coding Tools
Low latency sprite coding: transmit only a portion of the sprite in the beginning. The remainder of the sprite is transmitted, piece-wise, as required or as the bandwidth allows. Another method is to transmit the entire sprite in a progressive fashion, starting with a low quality version, and gradually improving its quality by transmitting residual images.

MPEG-4 Video Compression Static Texture – Wavelet Transform
The static texture coding technique is based on a wavelet transform:
An image can be recursively decomposed into four sub-images.
Lx, Ly are low-pass filters in the x and y directions; Hx, Hy are high-pass filters in the x and y directions.
Each sub-image is quantised and entropy coded, choosing the number of bits per sub-image to optimise the quality of the image.
For a 2x2 block of pixels with A B on the top row and C D on the bottom row:
Filtering in x gives Lx = (A+B)/2, (C+D)/2 and Hx = (A-B)/2, (C-D)/2.
Filtering those results in y gives the four sub-images:
LxLy = (A+B+C+D)/4
HxLy = ((A-B)+(C-D))/4
LxHy = ((A+B)-(C+D))/4
HxHy = ((A-B)-(C-D))/4
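A sketch of one decomposition level on a single 2x2 block, assuming the averaging filter (A+B)/2 and the differencing filter (A-B)/2 are applied first in x and then in y. Under that assumption each sub-band value carries an overall factor of 1/4. The function names are hypothetical.

```python
def haar_2x2(a, b, c, d):
    """Decompose the 2x2 block [[a, b], [c, d]] into four sub-band values."""
    # Filter in the x direction (along each row).
    low_top, high_top = (a + b) / 2, (a - b) / 2
    low_bottom, high_bottom = (c + d) / 2, (c - d) / 2
    # Filter the x outputs in the y direction (across the two rows).
    return {
        "LxLy": (low_top + low_bottom) / 2,
        "HxLy": (high_top + high_bottom) / 2,
        "LxHy": (low_top - low_bottom) / 2,
        "HxHy": (high_top - high_bottom) / 2,
    }


def inverse_haar_2x2(bands):
    """Recover (a, b, c, d) from the four sub-band values."""
    ll, hl = bands["LxLy"], bands["HxLy"]
    lh, hh = bands["LxHy"], bands["HxHy"]
    return (ll + hl + lh + hh, ll - hl + lh - hh,
            ll + hl - lh - hh, ll - hl - lh + hh)


bands = haar_2x2(10, 6, 4, 0)
```

Applying the same step again to the image formed by the LxLy values gives the recursive decomposition.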

MPEG-4 Video Compression Wavelet Transform – DC Sub-band
The DC sub-band is encoded using a predictive scheme. Each coefficient is predicted from its left or top neighbour, depending on which is closest. The difference is then arithmetic coded.

MPEG-4 Video Compression Wavelet Transform – AC Sub-band
Many of the coefficients of the AC sub-bands become zero after quantisation.
There is a strong correlation between the amplitudes of the wavelet coefficients across the scales. The zero-tree algorithm exploits this strong correlation: if a node on the tree has a value X, then its descendants will be very similar to it.
The difference patterns are then arithmetic encoded.

MPEG-4 Video Compression Shape adaptive wavelet coding
Generalization of the wavelet transform to arbitrarily shaped VOPs: the number of transformed coefficients in the VOP equals the number of pixels in the VOP.
Generalization of zero-tree coding: no extra bits are necessary for pixels outside the VOP.

MPEG-4 Video Compression Wavelet coding - SNR scalability
[Figure: images decoded from one bitstream at 5 kbits, 8 kbits, and 30 kbits.]

MPEG-4 Video Compression Wavelet coding - spatial scalability
[Figure: images decoded from one bitstream at 14 kbits, 34 kbits, and 47 kbits.]

MPEG-4 Video Compression 12-bit video coding tool
Allows compression of video data with a precision of up to 12 bits/pixel.
The syntax, semantics, and coding tools are extended:
bit precision
extended DC VLC tables
extended quantization mechanism
insertion of marker bits to avoid start code emulation

MPEG-4 Systems Multiplexing
Place media objects anywhere in a given coordinate system.
Apply transforms to change the geometrical or acoustical appearance of a media object.
Group primitive media objects in order to form compound media objects.
Apply streamed data to media objects, in order to modify their attributes (e.g. a sound, a moving texture belonging to an object, animation parameters driving a synthetic face).
Change interactively the user's viewing and listening points anywhere in the scene.

MPEG-4 Systems Multiplexing
[Figure: scope of MPEG-4 Systems. Labels: compress/decompress pairs for each media stream and for the scene description; demultiplexer; compositor.]

MPEG-4 Systems System Decoder Model
[Figure: system decoder model, within the scope of MPEG-4 Systems. Labels: demultiplexer; decoder buffers DB1 to DBn; decoders; composition memories CM1 to CMn; compositor.]

MPEG-4 Systems Flex Mux and Trans Mux
FlexMux: multiplexes groups of logically associated media.
TransMux: multiplexes media for transport (utilises existing standards, e.g. DVB-T, IP over ATM).
