Audio and Video Compression


Lecture 5: Audio and Video Compression

Audio Compression – DPCM Principles

Differential pulse code modulation (DPCM) is a derivative of standard PCM. It exploits the fact that the range of the differences in amplitude between successive samples of the audio waveform is smaller than the range of the actual sample amplitudes. Hence fewer bits are needed to represent the difference signals.

Operation of DPCM
Encoder: The previously digitized sample is held in the register (R). The DPCM signal is computed by subtracting the current contents of the register (Ro) from the new sample output by the ADC (PCM). The register value is then updated before transmission.
Decoder: The decoder simply adds the received DPCM value to the previous register contents (PCM). Since the ADC output contains noise, errors accumulate in the value held in the register.
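The encoder and decoder operation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are made up for this example, the register is assumed to start at zero, and quantization noise is ignored.

```python
def dpcm_encode(pcm_samples):
    """Encode PCM samples as differences from the previous sample."""
    register = 0  # previously digitized sample (assumed to start at 0)
    diffs = []
    for sample in pcm_samples:
        diffs.append(sample - register)  # DPCM value = new sample - register
        register = sample                # update the register
    return diffs


def dpcm_decode(diffs):
    """Rebuild PCM samples by adding each difference to the register."""
    register = 0
    pcm = []
    for d in diffs:
        register += d
        pcm.append(register)
    return pcm


samples = [100, 104, 109, 107, 102]
encoded = dpcm_encode(samples)
print(encoded)               # [100, 4, 5, -2, -5]
print(dpcm_decode(encoded))  # [100, 104, 109, 107, 102]
```

After the first sample, the differences are much smaller than the sample values, which is what allows fewer bits to be used.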

Audio Compression- Third-order predictive DPCM signal encoder and decoder

Operation of DPCM
To reduce this noise effect, predictive methods are used to predict a more accurate version of the previous signal: the prediction uses not only the current signal but also varying proportions of a number of the preceding estimated signals. The proportions used are known as predictor coefficients. The difference signal is computed by subtracting varying proportions of the last three predicted values from the current output of the ADC.

Operation of DPCM
The chosen proportions of the values in R1, R2 and R3 are subtracted from the PCM value. The register contents are then shifted: the value in R2 is transferred to R3, the value in R1 to R2, and the new predicted value goes into R1. The decoder operates in a similar way, adding the same proportions of the last three computed PCM signals to the received DPCM signal.
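A third-order predictor of this kind might look as follows. This is a sketch under stated assumptions: the predictor coefficients (0.5, 0.25, 0.25) are hypothetical values chosen for illustration, the registers start at zero, and the prediction is rounded to an integer so that the encoder and decoder stay exactly in step.

```python
COEFFS = (0.5, 0.25, 0.25)  # hypothetical predictor coefficients for R1, R2, R3


def predict(r1, r2, r3):
    """Weighted sum of the last three values, rounded to an integer."""
    return round(COEFFS[0] * r1 + COEFFS[1] * r2 + COEFFS[2] * r3)


def predictive_encode(pcm_samples):
    r1 = r2 = r3 = 0
    diffs = []
    for sample in pcm_samples:
        diffs.append(sample - predict(r1, r2, r3))
        r1, r2, r3 = sample, r1, r2  # shift: R2 -> R3, R1 -> R2, new value -> R1
    return diffs


def predictive_decode(diffs):
    r1 = r2 = r3 = 0
    pcm = []
    for d in diffs:
        sample = predict(r1, r2, r3) + d  # same prediction as the encoder
        pcm.append(sample)
        r1, r2, r3 = sample, r1, r2
    return pcm


samples = [100, 104, 109, 107, 102, 101]
assert predictive_decode(predictive_encode(samples)) == samples
```

Because both ends compute the prediction from the same three register values, only the difference needs to be sent.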

Adaptive differential PCM (ADPCM)
Additional savings in bandwidth are possible by varying the number of bits used for the difference signal depending on its amplitude (fewer bits to encode smaller difference signals). An international standard for this is defined in ITU-T Recommendation G.721. It is based on the same principle as DPCM, except that an eighth-order predictor is used and the number of bits used to quantize each difference is varied. At a sampling rate of 8 ksps this can be either 4 bits per difference value, producing 32 kbps, to obtain a better quality output than with third-order DPCM, or 2 bits, producing 16 kbps, if lower bandwidth is more important.

Audio Compression – ADPCM subband encoder and decoder schematic
The principle of adaptive differential PCM is to vary the number of bits used for the difference signal depending on its amplitude.

Adaptive differential PCM (ADPCM)
A second ADPCM standard, which is a derivative of G.721 and gives better sound quality, is defined in ITU-T Recommendation G.722. It uses subband coding, in which the input signal prior to sampling is passed through two filters: one which passes only signal frequencies in the range 50 Hz through to 3.5 kHz and the other only frequencies in the range 3.5 kHz through to 7 kHz. By doing this the input signal is effectively divided into two separate equal-bandwidth signals, the first known as the lower subband signal and the second the upper subband signal. Each is then sampled and encoded independently using ADPCM, the sampling rate of the upper subband signal being 16 ksps to allow for the presence of the higher frequency components in this subband.

Adaptive differential PCM (ADPCM)
The use of two subbands has the advantage that different bit rates can be used for each. In general the frequency components in the lower subband have a higher perceptual importance than those in the higher subband. For example, with a bit rate of 64 kbps the lower subband is ADPCM encoded at 48 kbps and the upper subband at 16 kbps. The two bitstreams are then multiplexed together to produce the transmitted (64 kbps) signal, in such a way that the decoder in the receiver is able to divide them back again into two separate streams for decoding.
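One way to picture the multiplexing step is to pack one codeword from each subband into a single byte. The sketch below assumes, purely for illustration, that the 48:16 split is realised as a 6-bit lower-subband codeword and a 2-bit upper-subband codeword produced at the same rate; the bit layout and function names are hypothetical, not taken from the slides.

```python
def mux(lower6, upper2):
    """Pack a 6-bit lower-subband and a 2-bit upper-subband codeword into one byte."""
    if not (0 <= lower6 < 64 and 0 <= upper2 < 4):
        raise ValueError("codeword out of range")
    return (upper2 << 6) | lower6  # assumed layout: upper bits first


def demux(byte):
    """Split a byte back into (lower6, upper2)."""
    return byte & 0x3F, byte >> 6


byte = mux(45, 2)
print(byte)         # 173
print(demux(byte))  # (45, 2)

# 8000 bytes per second x 8 bits = 64 kbps, split as 48 kbps + 16 kbps
assert 8000 * 6 + 8000 * 2 == 64000
```

The decoder can separate the streams again because the position of each codeword within the byte is fixed.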

Adaptive predictive coding
Even higher levels of compression are possible, at the cost of higher complexity. These can be obtained by also making the predictor coefficients adaptive. In practice, the optimum set of predictor coefficients varies continuously, since the coefficients are a function of the characteristics of the audio signal being digitized. To exploit this property, the input speech signal is divided into fixed time segments and, for each segment, the currently prevailing characteristics are determined. The optimum set of coefficients is then computed and used to predict the previous signal more accurately. This type of compression can reduce the bandwidth requirement to 8 kbps while still obtaining an acceptable perceived quality.

Linear predictive coding (LPC) signal encoder and decoder
Linear predictive coding involves the source simply analysing the audio waveform to determine a selection of the perceptual features it contains.

Linear predictive coding
With this type of coding the perceptual features of an audio waveform are analysed first. These are then quantized and sent, and the destination uses them, together with a sound synthesizer, to regenerate a sound that is perceptually comparable with the source audio signal. Although the speech can often sound synthetic, high levels of compression can be achieved. In terms of speech, the three features which determine the perception of a signal by the ear are its:
- Pitch: this is closely related to the frequency of the signal. It is important because the ear is more sensitive to signals in the range 2-5 kHz.
- Period: this is the duration of the signal.
- Loudness: this is determined by the amount of energy in the signal.

Linear predictive coding
The input speech waveform is first sampled and quantized at a defined rate. A block of digitized samples, known as a segment, is then analysed to determine the various perceptual parameters of the speech that it contains. The output of the encoder is a string of frames, one for each segment. Each frame contains fields for pitch and loudness (the period being determined by the sampling rate being used), a notification of whether the signal is voiced (generated through the vocal cords) or unvoiced (vocal cords are opened), and a new set of computed model coefficients.
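The per-segment analysis can be illustrated with a small sketch. It computes only two of the frame fields, loudness (taken here as the segment energy) and the model coefficients; pitch detection and the voiced/unvoiced decision are left out. The slides do not say how the coefficients are computed, so the autocorrelation method with the Levinson-Durbin recursion is used here as an assumption, and all names are illustrative.

```python
def autocorrelation(segment, max_lag):
    """r[k] = sum over n of segment[n] * segment[n - k], for k = 0..max_lag."""
    return [sum(segment[n] * segment[n - k] for n in range(k, len(segment)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients a[1..order] such that x[n] ~ sum(a[k] * x[n - k])."""
    if r[0] == 0:                      # silent segment: nothing to model
        return [0.0] * order
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def analyse_segment(segment, order=2):
    """Build one (partial) frame for a segment of samples."""
    r = autocorrelation(segment, order)
    return {"loudness": r[0], "coefficients": levinson_durbin(r, order)}


segment = [0.9 ** n for n in range(200)]  # toy signal: each sample is 0.9 x the previous
print(analyse_segment(segment))
```

For the toy signal the first coefficient comes out very close to 0.9 and the second close to zero, as expected for a signal in which each sample is 0.9 times the previous one.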

Code-excited LPC (CELPC)
The synthesizers used in most LPC decoders are based on a very basic model of the vocal tract. CELP coders use a more sophisticated model and are intended for applications in which the amount of bandwidth available is limited but the perceived quality of the speech must be of an acceptable standard for use in various multimedia applications. In the CELP model, instead of treating each digitized segment independently for encoding purposes, just a limited set of segments is used, each known as a waveform template. A precomputed set of templates is held by the encoder and the decoder in what is known as the template codebook. Each of the individual digitized samples that make up a particular template in the codebook is differentially encoded.
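The codebook idea can be sketched as a nearest-template search: the encoder sends only the index of the best-matching template, and the decoder looks the template up in its own copy of the codebook. The squared-error criterion and the toy codebook below are assumptions made for this illustration.

```python
def best_template_index(segment, codebook):
    """Return the index of the codebook template closest to the segment."""
    def squared_error(template):
        return sum((s - t) ** 2 for s, t in zip(segment, template))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Toy codebook shared by encoder and decoder (values are illustrative only)
CODEBOOK = [
    [0, 0, 0, 0],
    [1, 2, 3, 4],
    [4, 3, 2, 1],
]

index = best_template_index([1, 2, 2, 4], CODEBOOK)
print(index)            # 1
print(CODEBOOK[index])  # [1, 2, 3, 4], what the decoder reconstructs
```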

Code-excited LPC (CELPC)
All coders of this type have a delay associated with them, which is incurred while each block of digitized samples is analysed by the encoder and the speech is reconstructed at the decoder. The combined delay value is known as the coder's processing delay. In addition, before the speech samples can be analysed it is necessary to buffer the block of samples. The time to accumulate the block of samples is known as the algorithmic delay. The coder's delay is an important parameter: in a conventional telephony application a low-delay coder is required, whereas in an interactive application a delay of several seconds before the speech starts is acceptable.

Perceptual Coding (PC)
LPC and CELP are used for telephony applications and hence for the compression of speech signals. Perceptual coders are designed for the compression of general audio, such as that associated with a digital television broadcast. Using this approach, sampled segments of the source audio waveform are analysed, but only those features that are perceptible to the ear are transmitted. For example, although the human ear is sensitive to signals in the range 15 Hz to 20 kHz, the level of sensitivity to each signal is non-linear; that is, the ear is more sensitive to some signals than others. Also, when multiple signals are present, as in audio, a strong signal may reduce the level of sensitivity of the ear to other signals which are near to it in frequency, an effect known as frequency masking.

Perceptual Coding (PC)
When the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound, an effect known as temporal masking.
Sensitivity of the ear
The dynamic range of the ear is defined as the ratio of the loudest sound it can hear to the quietest sound. The sensitivity of the ear varies with the frequency of the signal. The ear is most sensitive to signals in the range 2-5 kHz, hence the signals in this band are the quietest the ear is sensitive to. The vertical axis gives all the other signal amplitudes relative to this signal (2-5 kHz). Signal A is above the hearing threshold and signal B is below the hearing threshold.

Audio Compression – Perceptual properties of the human ear
Perceptual encoders have been designed for the compression of general audio such as that associated with a digital television broadcast.

Audio Compression – Perceptual properties of the human ear
When an audio sound consisting of multiple frequency signals is present, the sensitivity of the ear changes and varies with the relative amplitude of the signals.

Perceptual Coding (PC)
Signal B is larger than signal A. This causes the basic sensitivity curve of the ear to be distorted in the region of signal B. Signal A will no longer be heard, as it is within the distortion band.

Audio Compression – Variation with frequency of effect of frequency masking
The width of each curve at a particular signal level is known as the critical bandwidth for that frequency.

Variation with frequency of effect of frequency masking
It has been observed that for frequencies less than 500 Hz the critical bandwidth is around 100 Hz, whereas for frequencies greater than 500 Hz the bandwidth increases linearly in multiples of 100 Hz. Hence, if the magnitude of the frequency components that make up an audio sound can be determined, it becomes possible to determine those frequencies that will be masked and therefore do not need to be transmitted.

Audio Compression – Temporal masking caused by loud signal
After the ear hears a loud signal, it takes a further short time before it can hear a quieter sound (temporal masking).

Temporal masking
After the loud sound ceases, it takes a short period of time for the signal amplitude to decay. During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted. In order to exploit this, the input audio waveform must be processed over a time period that is comparable with that associated with temporal masking.
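The decision "below the decay envelope, so not transmitted" can be sketched as a simple threshold test. The slides give no numbers for the envelope, so the linear decay in decibels and the rate of 0.5 dB per millisecond below are purely hypothetical values chosen for illustration.

```python
def masking_threshold_db(masker_db, ms_since_masker, decay_db_per_ms=0.5):
    """Hypothetical decay envelope: falls linearly in dB after the loud sound stops."""
    return max(0.0, masker_db - decay_db_per_ms * ms_since_masker)


def needs_transmission(signal_db, masker_db, ms_since_masker):
    """A signal is sent only if it reaches the decay envelope."""
    return signal_db >= masking_threshold_db(masker_db, ms_since_masker)


# A 60 dB sound shortly after an 80 dB sound is masked; later it is audible.
print(needs_transmission(60, 80, 20))   # False (envelope is still at 70 dB)
print(needs_transmission(60, 80, 100))  # True  (envelope has fallen to 30 dB)
```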

Audio Compression – MPEG perceptual coder schematic

MPEG audio coder
The audio input signal is first sampled and quantized using PCM. The bandwidth available for transmission is divided into a number of frequency subbands using a bank of analysis filters. The bank of filters maps each set of 32 (time-related) PCM samples into an equivalent set of 32 frequency samples. Processing associated with both frequency and temporal masking is carried out by the psychoacoustic model. In a basic encoder, the time duration of each sampled segment of the audio input signal is equal to the time to accumulate 12 successive sets of 32 PCM samples. The 12 sets of 32 PCM samples are converted into frequency components using the DFT.
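The segment duration follows directly from the sample count: 12 sets of 32 samples is 384 samples, so the duration is 384 divided by the sampling rate. The short calculation below assumes sampling rates of 32, 44.1 and 48 kHz for illustration; the slides do not state which rate is used.

```python
def segment_duration_ms(sampling_rate_hz, sets=12, samples_per_set=32):
    """Time needed to accumulate one segment of PCM samples, in milliseconds."""
    return 1000 * sets * samples_per_set / sampling_rate_hz


for rate in (32000, 44100, 48000):  # assumed example sampling rates
    print(rate, round(segment_duration_ms(rate), 2))
# 32000 12.0
# 44100 8.71
# 48000 8.0
```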

MPEG audio coder
The output of the psychoacoustic model is a set of what are known as signal-to-mask ratios (SMRs), which indicate those frequency components whose amplitude is below the audible threshold. They are used so that more bits are allocated to the regions of highest sensitivity than to less sensitive regions. In an encoder, all the frequency components are carried in a frame.

Audio Compression – MPEG perceptual coder schematic
MPEG audio is used primarily for the compression of general audio and, in particular, for the audio associated with various digital video applications.

MPEG audio coder frame format
The header contains information such as the sampling frequency that has been used. The quantization is performed in two stages using a form of companding. The peak amplitude level in each subband is first quantized using 6 bits, and a further 4 bits are then used to quantize each of the 12 frequency components in the subband relative to this level. Collectively this is known as the subband sample (SBS) format. The ancillary data field at the end of the frame is optional and is used, for example, to carry additional coded samples associated with the surround sound that is present with some digital video broadcasts.
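The two-stage quantization can be sketched as a scale factor followed by relative sample codes. The uniform quantizers below are an assumption made to keep the example short; the 6-bit and 4-bit widths come from the slide, while the function names, the full-scale value and the level mapping are illustrative.

```python
import math


def quantize_subband(components, full_scale=1.0):
    """Stage 1: 6-bit code for the peak. Stage 2: 4-bit codes relative to it."""
    peak = max(abs(c) for c in components)
    scale_code = min(63, math.ceil(peak / full_scale * 63))  # 0..63 fits in 6 bits
    if scale_code == 0:
        return 0, [0] * len(components)
    scale = scale_code / 63 * full_scale                     # never below the peak
    codes = [round(c / scale * 7) for c in components]       # -7..7 fits in 4 bits
    return scale_code, codes


def dequantize_subband(scale_code, codes, full_scale=1.0):
    """Rebuild approximate component magnitudes at the decoder."""
    scale = scale_code / 63 * full_scale
    return [code / 7 * scale for code in codes]


components = [0.5, -0.25, 0.1, 0.0, 0.3, -0.5, 0.2, 0.05, -0.1, 0.4, -0.3, 0.15]
scale_code, codes = quantize_subband(components)
print(scale_code, codes)
print(dequantize_subband(scale_code, codes))
```

Quantizing relative to the subband peak means quiet subbands are not penalised by a step size chosen for loud ones.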

MPEG audio coder frame format
At the decoder, the dequantizers determine the magnitude of each signal. The synthesis filters then produce the PCM samples.

Video Compression
One approach to compressing a video source is to apply the JPEG algorithm to each frame independently. This is known as moving JPEG or MJPEG. If a typical movie scene has a minimum duration of 3 seconds then, assuming a frame refresh rate of 60 frames/s, each scene is composed of 180 frames. Hence, by sending only those segments of each frame that have movement associated with them, considerable additional savings in bandwidth can be made. There are two types of compressed frames:
- Those that are compressed independently (I-frames)
- Those that are predicted (P-frames and B-frames)

Video Compression – Example frame sequences: I and P frames
In the context of compression, since video is simply a sequence of digitized pictures, video is also referred to as moving pictures, and the terms "frame" and "picture" are used interchangeably.

Video Compression – I frames
I-frames (intracoded frames) are encoded without reference to any other frames. Each frame is treated as a separate picture and the Y, Cr and Cb matrices are encoded separately using JPEG. For I-frames the level of compression obtained is relatively small. They are good for the first frame relating to a new scene in a movie. I-frames must be repeated at regular intervals so that the whole picture is not lost if a frame is corrupted during transmission. The number of frames/pictures between successive I-frames is known as a group of pictures (GOP). Typical values of GOP are 3 to 12.

Video Compression – P frames
The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame. P-frames are encoded using a combination of motion estimation and motion compensation. The accuracy of the prediction operation is determined by how well any movement between successive frames is estimated. This is known as motion estimation. Since the estimation is not exact, additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved. This is known as motion compensation. The number of P-frames between I-frames is limited to avoid error propagation.

Video Compression – Frame sequences: I-, P- and B-frames
Each frame is treated as a separate (digitized) picture and the Y, Cb and Cr matrices are encoded independently using the JPEG algorithm (DCT, quantization, entropy encoding), except that the quantization threshold values that are used are the same for all DCT coefficients.

Video Compression – PB-frames
A fourth type of frame, known as a PB-frame, has also been defined; it does not refer to a new frame type as such but rather to the way two neighbouring P- and B-frames are encoded as if they were a single frame.

Video Compression
Motion estimation involves comparing small segments of two consecutive frames for differences and, should a difference be detected, a search is carried out to determine to which neighbouring segment the original segment has moved. To limit the time for the search, the comparison is limited to a few segments. This works well in slow-moving applications such as video telephony. For fast-moving video it does not work effectively, hence B-frames (bidirectional frames) are used. Their contents are predicted using both past and future frames. B-frames provide the highest level of compression and, because they are not involved in the coding of other frames, they do not propagate errors.

Video Compression – P-frame encoding
The digitized contents of the Y matrix associated with each frame are first divided into a two-dimensional matrix of 16 x 16 pixel blocks, each known as a macroblock.

Video Compression – P-frame encoding
In the example here, 4 DCT blocks are used for the luminance signal and 1 each for the two chrominance signals. To encode a P-frame, the contents of each macroblock in the frame (known as the target frame) are compared on a pixel-by-pixel basis with the contents of the I- or P-frame used as the reference frame. If a close match is found, then only the address of the macroblock is encoded. If a match is not found, the search is extended to cover an area around the macroblock in the reference frame.

Video Compression – P-frame encoding
To encode a P-frame, the contents of each macroblock in the frame (target frame) are compared on a pixel-by-pixel basis with the contents of the corresponding macroblock in the preceding I- or P-frame.
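The pixel-by-pixel comparison and the extended search can be sketched as a block-matching routine. The matching measure used here, the sum of absolute differences (SAD), and the exhaustive search over a small window are assumptions; the slides do not name a specific measure or search strategy. Frames are plain lists of rows, and a small block size is used so the example stays readable.

```python
def sad(target, reference, ty, tx, ry, rx, size):
    """Sum of absolute pixel differences between two size x size blocks."""
    return sum(abs(target[ty + i][tx + j] - reference[ry + i][rx + j])
               for i in range(size) for j in range(size))


def find_motion_vector(target, reference, ty, tx, size=16, search=4):
    """Search the reference frame around (ty, tx); return (cost, dy, dx)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(target, reference, ty, tx, ry, rx, size)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


# Toy 8 x 8 frames: the target is the reference shifted down 1 and right 2.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
target = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0
           for x in range(8)] for y in range(8)]

print(find_motion_vector(target, reference, ty=3, tx=3, size=2, search=3))
# (0, -1, -2): an exact match one row up and two columns left in the reference
```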

Video Compression – B-frame encoding
To encode a B-frame, any motion is estimated with reference to both the immediately preceding I- or P-frame and the immediately succeeding P- or I-frame.

Video Compression – B-frame encoding
The motion vector and difference matrices are computed using first the preceding frame as the reference frame and then the succeeding frame as the reference. A third motion vector and set of difference matrices are then computed using the target and the mean of the two other predicted sets of values. The set with the lowest set of difference matrices is chosen and is encoded.
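The three-way choice can be sketched for a single block as follows. Blocks are flattened to lists of pixel values, and "lowest set of difference matrices" is interpreted here as the lowest sum of absolute differences, which is an assumption; the mode names are illustrative.

```python
def choose_b_prediction(target, prev_pred, next_pred):
    """Pick the prediction (forward, backward or their mean) with the lowest difference."""
    mean_pred = [(p + n) / 2 for p, n in zip(prev_pred, next_pred)]
    candidates = {
        "forward": prev_pred,       # predicted from the preceding frame
        "backward": next_pred,      # predicted from the succeeding frame
        "interpolated": mean_pred,  # mean of the two predictions
    }

    def cost(pred):
        return sum(abs(t - p) for t, p in zip(target, pred))

    mode = min(candidates, key=lambda m: cost(candidates[m]))
    difference = [t - p for t, p in zip(target, candidates[mode])]
    return mode, difference


mode, difference = choose_b_prediction(
    target=[10, 20, 30, 40],
    prev_pred=[8, 18, 28, 38],
    next_pred=[12, 22, 32, 42],
)
print(mode, difference)  # interpolated [0.0, 0.0, 0.0, 0.0]
```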

Decoding of I-, P- and B-frames
- I-frames: decoded immediately to recreate the original frame.
- P-frames: the received information is decoded and the result is used together with the decoded contents of the preceding I- or P-frame (two buffers are used).
- B-frames: the received information is decoded and the result is used together with the decoded contents of the preceding and succeeding P- or I-frame (three buffers are used).
- PB-frames: not a new frame type as such, but the way two neighbouring P- and B-frames are encoded as if they were a single frame.

Video Compression – Implementation schematic – I-frames
The encoding procedure used for the macroblocks that make up an I-frame is the same as that used in the JPEG standard to encode each 8 x 8 block of pixels.

Implementation Issues
I-frame encoding is the same as the JPEG implementation: FDCT, quantization and entropy encoding. Assuming 4 blocks for the luminance and 2 blocks for the chrominance, each macroblock requires six 8 x 8 pixel blocks to be encoded.

Implementation Issues – P-frames
In the case of P-frames, the encoding of each macroblock is dependent on the output of the motion estimation unit which, in turn, depends on the contents of the macroblock being encoded and the contents of the macroblock in the search area of the reference frame that produces the closest match. There are three possibilities:
- If the two contents are the same, only the address of the macroblock in the reference frame is encoded
- If the two contents are very close, both the motion vector and the difference matrices associated with the macroblock in the reference frame are encoded
- If no close match is found, then the target macroblock is encoded in the same way as a macroblock in an I-frame
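The three possibilities amount to a decision on the matching cost reported by the motion estimation unit. In the sketch below the cost measure and the threshold for "very close" are hypothetical; the slides do not give numeric criteria.

```python
def classify_macroblock(match_cost, close_threshold=512):
    """Decide how a P-frame macroblock is encoded from its best matching cost."""
    if match_cost == 0:
        return "address_only"                  # contents are the same
    if match_cost <= close_threshold:          # hypothetical "very close" limit
        return "motion_vector_and_difference"
    return "intra_coded"                       # encoded as in an I-frame


for cost in (0, 100, 5000):
    print(cost, classify_macroblock(cost))
# 0 address_only
# 100 motion_vector_and_difference
# 5000 intra_coded
```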

Video Compression – Implementation schematic – P-frames
In order to carry out its role, the motion estimation unit, which contains the search logic, utilizes a copy of the (uncoded) reference frame.

Video Compression – Implementation schematic – B-frames
The same procedure is followed for encoding B-frames, except that both the preceding (reference) frame and the frame succeeding the target frame are involved.

Video Compression – Example macroblock encoded bitstream format

Implementation Issues – Bitstream format
For each macroblock it is necessary to identify the type of encoding that has been used. This is the role of the formatter. The fields are:
- Type: indicates the type of frame encoded (I, P or B)
- Address: identifies the location of the macroblock in the frame
- Quantization value: the value used to quantize all the DCT coefficients in the macroblock
- Motion vector: the encoded vector
- Block representation: indicates which of the six 8 x 8 blocks that make up the macroblock are present
- B1, B2, ... B6: the JPEG-encoded DCT coefficients for those blocks present
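The fields listed above map naturally onto a record type. The following dataclass is only an in-memory illustration of the field list: the names, Python types and validation rules are assumptions, and it says nothing about the actual bit widths or ordering used on the wire.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MacroblockRecord:
    frame_type: str                           # "I", "P" or "B"
    address: int                              # location of the macroblock in the frame
    quantization_value: int                   # used for all DCT coefficients
    motion_vector: Optional[Tuple[int, int]]  # None when no vector is sent
    block_present: List[bool]                 # one flag per 8 x 8 block (six in total)
    blocks: List[bytes] = field(default_factory=list)  # coded data for present blocks

    def __post_init__(self):
        if self.frame_type not in ("I", "P", "B"):
            raise ValueError("frame_type must be I, P or B")
        if len(self.block_present) != 6:
            raise ValueError("a macroblock has six 8 x 8 blocks")
        if len(self.blocks) != sum(self.block_present):
            raise ValueError("one coded block is needed per present block")


record = MacroblockRecord(
    frame_type="P",
    address=12,
    quantization_value=8,
    motion_vector=(1, -2),
    block_present=[True, True, True, True, False, False],
    blocks=[b"b1", b"b2", b"b3", b"b4"],  # placeholder payloads
)
print(record.frame_type, record.address, sum(record.block_present))  # P 12 4
```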

Video Compression – MPEG-1 example frame sequence
MPEG-1 uses a video compression technique similar to that of H.261; the digitization format used is the source intermediate format (SIF), with progressive scanning and a refresh rate of 30 Hz (for NTSC) or 25 Hz (for PAL).

Performance
The compression ratio for I-frames is similar to that of JPEG: for video, typically 10:1 through to 20:1, depending on the complexity of the frame contents. P- and B-frames achieve higher compression, in the region of 20:1 through to 30:1 for P-frames and 30:1 through to 50:1 for B-frames.

MPEG
MPEG-1 (ISO Recommendation 11172) uses a resolution of 352 x 288 pixels and is used for VHS-quality audio and video on CD-ROM at a bit rate of 1.5 Mbps.
MPEG-2 (ISO Recommendation 13818) is used in the recording and transmission of studio-quality audio and video. Different levels of video resolution are possible:
- Low: 352 x 288 pixels, comparable with MPEG-1
- Main: 720 x 576 pixels, studio-quality video and audio, at bit rates of up to 15 Mbps
- High: 1920 x 1152 pixels, used in wide-screen HDTV, at bit rates of up to 80 Mbps

MPEG
MPEG-4 is used for interactive multimedia applications over the Internet and over various entertainment networks. The standard contains features that enable a user not only to access a video sequence passively (using, for example, start/stop commands) but also to manipulate the individual elements that make up a scene within a video. In MPEG-4 each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an audio-visual object (AVO) of interest. Each audio and video object has a separate object descriptor associated with it which, provided the creator of the audio and/or video has enabled the facility, allows the object to be manipulated by the viewer prior to it being decoded and played out.

Video Compression – MPEG-1 video bitstream structure: composition
The compressed bitstream produced by the video encoder is hierarchical: at the top level is the complete compressed video (the sequence), which consists of a string of groups of pictures.

Video Compression – MPEG-1 video bitstream structure: format
In order for the decoder to decompress the received bitstream, each data structure must be clearly identified within the bitstream.

Video Compression – MPEG-4 coding principles
Content-based video coding principles, showing how a frame/scene is defined in the form of multiple video object planes.

Video Compression – MPEG-4 encoder/decoder schematic
Before being compressed, each scene is defined in the form of a background and one or more foreground audio-visual objects (AVOs).

Video Compression – MPEG VOP encoder
The audio associated with an AVO is compressed using one of the algorithms described earlier; the choice depends on the available bit rate of the transmission channel and the sound quality required.