Introduction to JPEG and MPEG Ingemar J. Cox University College London
UCL Adastral Park Postgraduate Campus, Nov 27th 2006, Ingemar J. Cox
Outline: Elementary information theory; Lossless compression; Quantization; Fundamentals of images; Discrete Cosine Transform (DCT); JPEG; MPEG-1, MPEG-2
Elementary Information Theory
Elementary Information Theory: How much information does a symbol convey? Intuitively, the more unpredictable or surprising it is, the more information is conveyed. Conversely, if we strongly expected something and it occurs, we have not learnt very much.
Elementary Information Theory: If p is the probability that a symbol will occur, then the amount of information, I, conveyed is $I = -\log_2 p$. The information, I, is measured in bits. It is the optimum code length for the symbol.
Elementary Information Theory: The entropy, H, is the average information per symbol: $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the probability of the i-th symbol. It provides a lower bound on the average number of bits per symbol that lossless compression can achieve.
Elementary Information Theory: A simple example. Suppose we need to transmit four possible weather conditions: 1. Sunny 2. Cloudy 3. Rainy 4. Snowy. If all conditions are equally likely, p(s) = 0.25 and H = 2, i.e. we need a minimum of 2 bits per symbol.
Elementary Information Theory: Suppose instead that it is: 1. Sunny 0.5 of the time 2. Cloudy 0.25 of the time 3. Rainy and 4. Snowy for the remaining 0.25 of the time between them. Then the entropy is less than 2 bits per symbol, because the distribution is no longer uniform.
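The entropy of the two weather distributions can be checked with a short sketch. The individual probabilities for Rainy and Snowy did not survive in the slide text, so the value 0.125 for each is an assumption made here purely for illustration; the function name `entropy` is also just illustrative.

```python
import math

def entropy(probabilities):
    """Average information in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four equally likely weather conditions.
uniform = [0.25, 0.25, 0.25, 0.25]

# ASSUMPTION: Rainy and Snowy are taken as 0.125 each; the slide only
# fixes Sunny = 0.5 and Cloudy = 0.25.
skewed = [0.5, 0.25, 0.125, 0.125]

print(entropy(uniform))  # 2.0
print(entropy(skewed))   # 1.75
```

Under that assumption the skewed distribution needs 1.75 bits per symbol on average, a quarter of a bit less than the uniform case.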
Elementary Information Theory: Variable-length codewords. Huffman codes use integer code lengths; arithmetic codes allow non-integer code lengths.
Elementary Information Theory: Huffman code
Weather   Probability   Information (bits)   Integer code
Sunny     0.5           1                    0
Cloudy
Rainy
Snowy
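A Huffman code for the weather example can be built with a small priority-queue sketch. The probabilities for Rainy and Snowy (0.125 each) are an assumption, since only the Sunny row of the table survives, and `huffman_code` is a hypothetical helper name, not something from the slides.

```python
import heapq
import itertools

def huffman_code(probabilities):
    """Return {symbol: bitstring} for a dict {symbol: probability}."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes with 0 and 1.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

# ASSUMPTION: Rainy and Snowy probabilities are illustrative values.
weather = {"Sunny": 0.5, "Cloudy": 0.25, "Rainy": 0.125, "Snowy": 0.125}
codes = huffman_code(weather)
average_length = sum(weather[s] * len(codes[s]) for s in weather)

for symbol, code in codes.items():
    print(symbol, weather[symbol], code)
print("average bits per symbol:", average_length)
```

With these dyadic probabilities the code lengths are 1, 2, 3 and 3 bits, so the average length equals the entropy. For other distributions a Huffman code is generally slightly longer than the entropy bound.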
Elementary Information Theory: The previous illustration is an example of a lossless code, i.e. we are able to recover the information exactly.
Elementary Information Theory: Note that we have assumed that each symbol is independent of the other symbols, i.e. the current symbol provides no information regarding the next symbol.
Quantization: Quantization is the process of approximating a continuous value (or a large range of values) by a (much) smaller set of values: $x_q = \mathrm{Round}(x / \Delta)$, with reconstruction $\hat{x} = \Delta \, x_q$, where Round(y) rounds y to the nearest integer and $\Delta$ is the quantization stepsize.
Quantization: Quantization plays an important role in lossy compression. This is where the loss happens.
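A minimal sketch of uniform quantization shows where the loss comes from. The sample values and the stepsize of 2 are assumptions chosen for illustration, as are the function names; ties are rounded upwards here, which is one of several reasonable conventions.

```python
import math

def quantize(x, step):
    """Map a value to an integer index: Round(x / step), ties rounded up."""
    return math.floor(x / step + 0.5)

def dequantize(index, step):
    """Reconstruct an approximation of the original value."""
    return index * step

STEP = 2                              # illustrative stepsize
samples = [0.2, 1.7, 3.9, -2.4]       # illustrative input values

indices = [quantize(x, STEP) for x in samples]
reconstructed = [dequantize(i, STEP) for i in indices]
errors = [abs(x - r) for x, r in zip(samples, reconstructed)]

print(indices)        # [0, 1, 2, -1]
print(reconstructed)  # [0, 2, 4, -2]
```

The reconstruction error never exceeds half the stepsize, so a larger stepsize gives fewer distinct indices to code at the cost of a larger error.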
Fundamentals of Images
Fundamentals of images: An image consists of pixels (picture elements). Each pixel represents luminance (and colour), typically with 8 bits per pixel.
Fundamentals of images: Colour. Colour spaces (representations): RGB (red-green-blue); CMY (cyan-magenta-yellow); YUV, with Y = 0.3R + 0.6G + 0.1B (luminance) and colour-difference components U = B - Y and V = R - Y (up to scaling). Images may also be greyscale or binary.
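The RGB to YUV conversion can be sketched as follows, using the rounded luminance weights from the slide. The sketch follows the usual convention in which U is the blue difference and V is the red difference, and it leaves both unscaled; practical standards apply additional scale factors and offsets. The function names are illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to luminance plus two unscaled colour differences."""
    y = 0.3 * r + 0.6 * g + 0.1 * b   # rounded weights, as on the slide
    u = b - y                         # blue difference
    v = r - y                         # red difference
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv."""
    r = v + y
    b = u + y
    g = (y - 0.3 * r - 0.1 * b) / 0.6
    return r, g, b

grey = rgb_to_yuv(128, 128, 128)      # a grey pixel has (near) zero colour difference
orange = rgb_to_yuv(255, 128, 0)
print(grey)
print(orange)
print(yuv_to_rgb(*orange))
```

Because the weights sum to one, any grey pixel maps to U = V = 0 up to rounding, which is why the colour-difference channels carry little energy in weakly coloured regions.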
Fundamentals of images: A TV frame is about 640×480 pixels. If each pixel is represented by 8 bits for each colour, then the total image size is 640×480×3 = 921,600 bytes, or about 7.4 Mbits. At 30 frames per second, this would be about 220 Mbits/second.
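The raw data rate quoted above is simple arithmetic, reproduced here as a short sketch. The constant names are illustrative, and 1 Mbit is taken as 10^6 bits.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 3        # 8 bits for each of three colour components
FRAMES_PER_SECOND = 30

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bits_per_frame = bytes_per_frame * 8
bits_per_second = bits_per_frame * FRAMES_PER_SECOND

print(bytes_per_frame)         # 921600
print(bits_per_frame / 1e6)    # 7.3728
print(bits_per_second / 1e6)   # 221.184
```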
Fundamentals of images: Do we need all these bits?
Fundamentals of images: Figures showing the same image at 8, 7, 6, 5 and 4 bits per pixel.
Fundamentals of images: Do we need all these bits? No! The previous example illustrated the eye's limited sensitivity to small changes in luminance. We can build a perceptual model and only code what is important to the human visual system (HVS). Importance is usually a function of spatial frequency.
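The bit-depth reduction in the preceding image sequence can be imitated with a small sketch. The slides do not say how the reduced images were produced; dropping the least significant bits, as done here, is an assumption and only one of several possible methods. The function name and pixel values are illustrative.

```python
def reduce_bit_depth(pixel, bits):
    """Keep only the `bits` most significant bits of an 8-bit pixel value."""
    shift = 8 - bits
    return (pixel >> shift) << shift

row = [0, 37, 100, 201, 255]          # illustrative 8-bit pixel values

for bits in (8, 7, 6, 5, 4):
    print(bits, [reduce_bit_depth(p, bits) for p in row])
```

At 4 bits per pixel only 16 grey levels remain, yet for many images the result is still recognisable, which is the point the slides make about perceptual redundancy.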
Fundamentals of images: Just as audio has temporal frequencies, images have spatial frequencies. Transforms: Fourier transform; discrete cosine transform; wavelet transform; Hadamard transform.
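Discrete cosine transform: In one common (orthonormal) form, the forward DCT of a signal $x(n)$, $n = 0, \dots, N-1$, is $X(k) = \sqrt{2/N}\, C(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N}$, and the inverse DCT is $x(n) = \sqrt{2/N} \sum_{k=0}^{N-1} C(k)\, X(k) \cos\frac{(2n+1)k\pi}{2N}$, where $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ otherwise.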
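A direct, unoptimised sketch of the forward and inverse 1-D DCT follows. It assumes the orthonormal DCT-II scaling, which may differ from the normalisation used on the original slide, and the function names and test signal are illustrative. Real codecs use fast algorithms rather than this O(N²) form.

```python
import math

def dct(signal):
    """Forward orthonormal DCT-II of a list of numbers."""
    n = len(signal)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                    for i, x in enumerate(signal))
        coefficients.append(scale * total)
    return coefficients

def idct(coefficients):
    """Inverse of dct(): rebuild the signal from its coefficients."""
    n = len(coefficients)
    signal = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        signal.append(total)
    return signal

ramp = [8, 16, 24, 32, 40, 48, 56, 64]   # illustrative 8-sample signal
print([round(c, 3) for c in dct(ramp)])
print([round(x, 6) for x in idct(dct(ramp))])
```

For a smooth signal such as the ramp, most of the energy lands in the first few coefficients, which is what makes the transform useful for compression.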
Basis functions: Figures showing the DCT basis functions: the DC term, followed by the first through seventh terms.
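The eight basis functions pictured on these slides can be generated numerically. This sketch assumes N = 8 samples and orthonormal DCT-II scaling, neither of which is stated in the surviving text; `dct_basis` is a hypothetical helper name.

```python
import math

def dct_basis(k, n=8):
    """Samples of the k-th DCT basis function for an n-point transform."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

basis = [dct_basis(k) for k in range(8)]

# k = 0 is the constant (DC) term; higher k oscillates faster.
for k, function in enumerate(basis):
    print(k, [round(value, 3) for value in function])
```

The basis functions are mutually orthogonal and have unit length, so each DCT coefficient is simply the dot product of the signal with one basis function.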
DCT Example
Example: Signal (figure).
Example: DCT coefficients are: 0 0 0
Example: DCT decomposition (figures): DC term, 2nd AC term, 6th AC term.
Example: summation of DCT terms (figures): first two non-zero coefficients, then all 3 non-zero coefficients.
Example: What if we quantize the DCT coefficients with $\Delta = 1$? Quantized DCT coefficients are: 4 0 -3 0
Example: Approximate reconstruction (figure), compared with exact reconstruction (figure).
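2-D DCT Transform: Let i(x,y) represent an image with N rows and M columns. In one common (orthonormal) form, its DCT I(u,v) is given by $I(u,v) = \frac{2}{\sqrt{NM}}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{M-1} i(x,y) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2M}$, where $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.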
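Because the 2-D DCT is separable, it can be computed by applying a 1-D DCT along every row and then along every column. The sketch below assumes orthonormal DCT-II scaling, which may not match the normalisation on the original slide; the function names and the flat test block are illustrative.

```python
import math

def dct_1d(signal):
    """Orthonormal 1-D DCT-II."""
    n = len(signal)
    result = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        result.append(scale * sum(
            x * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i, x in enumerate(signal)))
    return result

def dct_2d(block):
    """2-D DCT of a list of rows; result[u][v] pairs u with rows, v with columns."""
    rows = [dct_1d(row) for row in block]               # transform along each row
    columns = [dct_1d(column) for column in zip(*rows)]  # then along each column
    return [list(row) for row in zip(*columns)]          # transpose back

flat_block = [[1.0] * 8 for _ in range(8)]               # constant 8x8 block
coefficients = dct_2d(flat_block)
print(round(coefficients[0][0], 6))                      # 8.0
```

A constant block has only a DC coefficient; every AC coefficient is zero up to rounding error.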
Fundamentals of images: Discrete cosine transform. Coefficients are approximately uncorrelated (except the DC term), cf. the strongly correlated pixels of the original 8×8 block. Concentrates more power in the low-frequency coefficients. Computationally efficient. Block-based DCT: compute the DCT on 8×8 blocks of pixels.
Fundamentals of images: Figure: basis functions for the 8×8 DCT (courtesy Wikipedia).
Fundamentals of JPEG
Fundamentals of JPEG: Encoder: DCT → Quantizer → Entropy coder → compressed image data. Decoder: compressed image data → Entropy decoder → Dequantizer → IDCT.
Fundamentals of JPEG: JPEG works on 8×8 blocks. 1. Extract an 8×8 block of pixels. 2. Convert to the DCT domain. 3. Quantize each coefficient, using a different stepsize for each coefficient, based on the sensitivity of the human visual system. 4. Order the coefficients in zig-zag order. 5. Entropy code the quantized values.
Fundamentals of JPEG: Table: a common quantization table.
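The table on the original slide did not survive extraction. As an illustration, the sketch below uses the widely reproduced example luminance table from Annex K of the JPEG standard; whether this is the table the slide showed is an assumption. The helper names and the sample coefficients are hypothetical.

```python
import math

# Example luminance quantization table (JPEG standard, Annex K).
# ASSUMPTION: used here as a stand-in for the table missing from the slide.
LUMINANCE_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(coefficients, table=LUMINANCE_TABLE):
    """Divide each DCT coefficient by its own stepsize and round to nearest."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, table)]

def dequantize_block(levels, table=LUMINANCE_TABLE):
    """Multiply each quantized level by its stepsize."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]

# Illustrative coefficients: a large DC term, one low-frequency AC term,
# and a small high-frequency term that quantizes to zero.
block = [[0.0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[7][7] = 236.0, -23.0, 5.0

levels = quantize_block(block)
restored = dequantize_block(levels)
print(levels[0][0], levels[0][1], levels[7][7])        # 15 -2 0
print(restored[0][0], restored[0][1], restored[7][7])  # 240 -22 0
```

The stepsizes grow towards the bottom right of the table, so high-frequency coefficients are quantized more coarsely and often become zero.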
Fundamentals of JPEG: Zig-zag ordering (figure).
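The zig-zag scan visits the coefficients along anti-diagonals, alternating direction, so that low frequencies come first. A minimal sketch that generates the visiting order is given below; `zigzag_indices` is an illustrative name.

```python
def zigzag_indices(n=8):
    """Return (row, column) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column, one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()                 # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order

def zigzag_scan(block):
    """Flatten a square block into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]

print(zigzag_indices()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

After quantization the non-zero values cluster at the start of this sequence and long runs of zeros at the end, which suits run-length coding.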
Fundamentals of JPEG: Entropy coding. Run-length encoding followed by Huffman or arithmetic coding. The DC term is treated separately, using Differential Pulse Code Modulation (DPCM). A 2-step process: 1. Convert the zig-zag sequence to a symbol sequence. 2. Convert the symbols to a data stream.
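The first step, converting coefficients to symbols, can be sketched as follows. This is a simplified illustration, not the actual JPEG symbol format: it omits the size categories, the special handling of zero runs longer than 15, and the Huffman or arithmetic coding stage. Function names and sample values are hypothetical.

```python
def dpcm(dc_values):
    """Code each block's DC term as the difference from the previous block's DC."""
    previous = 0
    differences = []
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences

def run_length(ac_coefficients):
    """Turn zig-zag ordered AC coefficients into (zero_run, value) symbols."""
    symbols = []
    run = 0
    for value in ac_coefficients:
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    symbols.append("EOB")          # end of block: only zeros (if any) remain
    return symbols

dc_terms = [15, 17, 14]                               # illustrative DC terms of three blocks
ac_terms = [-3, 0, 0, 2, 0, 0, 0, 1] + [0] * 55       # 63 illustrative AC terms

print(dpcm(dc_terms))          # [15, 2, -3]
print(run_length(ac_terms))    # [(0, -3), (2, 2), (3, 1), 'EOB']
```

Neighbouring blocks tend to have similar DC terms, so the differences are small, and the 63 AC terms collapse to a handful of symbols.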
Fundamentals of JPEG: Modes. Sequential. Progressive: spectral selection (send lower-frequency coefficients first) or successive approximation (send lower precision first, and subsequently refine). Lossless. Hierarchical (send a low-resolution image first).
Fundamentals of MPEG-1/2
Fundamentals of MPEG: Video is a sequence of 2D images, with temporal correlation as well as spatial correlation. TV broadcast may be frame-based or field-based.
MPEG: Moving Picture Experts Group. Standard for video compression. Similarities with JPEG.
MPEG: The design is a compromise between bit rate, encoder/decoder complexity, and random access capability.
MPEG: Images have spatial redundancy and perceptual redundancy. Video has spatial redundancy (intraframe coding), temporal redundancy (interframe coding), and perceptual redundancy.
MPEG: Consider a sequence of n frames of video. It consists of I-frames, P-frames and B-frames. A sequence of one I-frame followed by P- and B-frames is known as a GOP (Group of Pictures), e.g. IBBPBBPBBPBBP.
MPEG: I-frames are intraframe coded, with no motion compensation. P-frames are interframe coded, with motion compensation based on past frames only. B-frames are interframe coded, with motion compensation based on past and future frames.
MPEG: Motion-compensated prediction. 1. Divide the current frame, i, into disjoint 16×16 macroblocks. 2. Search a window in the previous frame, i-1, for the closest match. 3. Calculate the prediction error. 4. For each of the four 8×8 blocks in the macroblock, perform DCT-based coding. 5. Transmit the motion vector plus the entropy-coded prediction error (lossy coding).
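The search step can be sketched as an exhaustive block-matching search. The slides do not say how the closest match is measured; the sum of absolute differences (SAD) used here, the search range, and all function names are assumptions for illustration. The toy frames use a 4×4 block rather than a 16×16 macroblock to keep the example small.

```python
def sad(current, previous, top, left, dy, dx, block):
    """Sum of absolute differences between a block and a displaced candidate."""
    total = 0
    for r in range(block):
        for c in range(block):
            total += abs(current[top + r][left + c]
                         - previous[top + dy + r][left + dx + c])
    return total

def best_match(current, previous, top, left, block=16, search=7):
    """Exhaustive search; returns ((dy, dx), cost) of the closest match."""
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r0, c0 = top + dy, left + dx
            if r0 < 0 or c0 < 0 or r0 + block > height or c0 + block > width:
                continue                       # candidate falls outside the frame
            cost = sad(current, previous, top, left, dy, dx, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    cost, dy, dx = best
    return (dy, dx), cost

# Toy 12x12 frames: the current frame is the previous one shifted
# down by 1 row and right by 2 columns (edges clamped).
previous = [[r * 12 + c for c in range(12)] for r in range(12)]
current = [[previous[max(r - 1, 0)][max(c - 2, 0)] for c in range(12)]
           for r in range(12)]

vector, cost = best_match(current, previous, top=4, left=4, block=4, search=2)
print(vector, cost)    # (-1, -2) 0
```

The motion vector points from the block's position back to where its content was in the previous frame. Here the match is exact, so the prediction error is zero; in real video the residual is what gets DCT coded.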
MPEG: Like JPEG, the DC term is treated separately, using DPCM. B-frame compression is high, but because B-frames depend on future frames, they need frame buffering and introduce delay.
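The buffering requirement follows from frame ordering: a B-frame cannot be decoded until the later reference frame it depends on has arrived, so reference frames are sent ahead of the B-frames that precede them in display order. The sketch below is a simplified illustration of that reordering, with a hypothetical function name; it does not model how trailing B-frames are handled across GOP boundaries.

```python
def coding_order(display_order):
    """Reorder frames so each B-frame follows both of its reference frames."""
    ordered = []
    pending_b = []
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            pending_b.append((index, frame_type))   # hold until the next reference
        else:
            ordered.append((index, frame_type))     # I- or P-frame goes out first
            ordered.extend(pending_b)
            pending_b = []
    ordered.extend(pending_b)   # trailing B-frames, left in place in this sketch
    return ordered

print(coding_order("IBBPBBP"))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (6, 'P'), (4, 'B'), (5, 'B')]
```

The encoder has to hold frames 1 and 2 until frame 3 has been coded, and the decoder has to hold frame 3 until frames 1 and 2 have been displayed, which is the source of the extra buffer and delay.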