
1 T325: Technologies for digital media Second semester – 2011/2012 Tutorial 5 – Video and Audio Coding (1-2) Arab Open University – Spring 2012

2 Outline Introduction Video coding in MPEG-2 MPEG audio coding

3 INTRODUCTION

4 Digital vs. Analog – At the beginning Digital video coding techniques have been used since the 1970s in television studios, where equipment costs and the large bandwidths required at the time were not major considerations. Digital vs. Analog Digital techniques allow much greater processing flexibility than analogue ones, and digital material can be re-recorded many times over without loss of quality. BUT the large bandwidth and the higher cost of receivers meant that digital video coding was not appropriate for domestic broadcast systems at that time.

5 Introduction Digital coding became a practicable possibility for the domestic market due to: Rapid reduction in the cost of digital processing hardware → reduced equipment cost Development of highly efficient digital video compression techniques → minimized bandwidth requirements

6 Question What are the advantages of digital techniques over analogue techniques for broadcast TV?

7 Digital vs. Analog for Broadcast TV The effect of transmission impairments on picture quality is far less than in the analogue case → eliminates ‘ghost’ pictures caused by the presence of multiple signal transmission paths and reflections. Digital television allows more channels to be accommodated in a given bandwidth → different types of program material, such as teletext or subtitles in several languages, can be accommodated much more flexibly with digital coding.

8 Questions What do you know about the following standards: JPEG and MPEG? Which MPEG standards have you used?

9 Introduction JPEG stands for Joint Photographic Experts Group Set up to develop standards for the digital coding of still pictures. ‘Joint’ → the work was done jointly by the CCITT (now ITU-T) and the ISO ‘Experts’ → the members were drawn from industry, universities, broadcasting authorities, etc.

10 MPEG Standards MPEG stands for Moving Picture Experts Group. Set up by the ISO to develop coding standards for moving pictures. Defined a number of standards for the compression of moving pictures: MPEG-1 MPEG-2 MPEG-4 MPEG-7 MPEG-21

11 MPEG standards MPEG-1 was designed mainly for the efficient storage of moving pictures on CD-ROM, but in a format not suitable for television. MPEG-2 is effectively a ‘tool box’ of compression techniques which can cater for a wide range of systems, present and future, including low-, standard- and high-definition systems; it is still widely used in digital television. MPEG-1 and -2 also include various audio standards, one of which – the so-called Audio Layer III – is the basis of MP3 coding.

12 MPEG standards MPEG-4 Initially intended to provide very high compression rates, allowing transmission of moving images at rates of 64 kbps or less. Its aims were later extended to the provision of flexible standards for a wide range of audiovisual material. It is proposed for the new HDTV services planned in many countries over the next few years, and was already (in 2008) used in many commercial devices, such as domestic video cameras, personal digital assistants (PDAs) and web-based video services such as the BBC iPlayer.

13 MPEG standards MPEG-7 Specifies the way multimedia content can be indexed, and thus searched for, in a variety of ways relating to the specific medium. It also has intellectual property aspects. Involves the idea of ‘metadata’: data that describes the nature of a multimedia object to ease searching. MPEG-21 Includes additional digital rights management. Will be considered further in Block 2.

14 VIDEO CODING IN MPEG-2

15 Introduction Both in films and television, moving scenes are shown as a series of fixed pictures, generated at a rate of about 25 per second, the effect of motion being produced by changes from one picture to the next. There is often very little change between consecutive pictures; MPEG-2 coding takes advantage of this to achieve high degrees of compression (inter-frame compression). Even in the case of single pictures there can be a good deal of redundancy → it is possible to remove some fine detail without our perceiving any significant loss of quality (intra-frame compression).

16 Introduction Digital audio and video systems are based on the principle of sampling the original sound or image, and processing the samples in order to achieve the desired result, whether transmission, storage or further processing of the sound or vision. The sampling rate ultimately depends on the quantity of ‘information’ in the original signal. The useful information in an audio or video signal depends on the way human beings perceive sound, light intensity and color.

17 Introduction What will be covered in this part? How the luminance and the two chrominance signals are sampled before any compression is applied Compressed coding of still pictures, which involves the use of JPEG techniques The way correlation between successive pictures is used by MPEG The various levels of compression available with MPEG-2 The forms of audio coding used with MPEG-2

18 Sampling formats The human eye is less sensitive to color than to brightness → the chrominance signal does not have to be sampled at as high a rate as the luminance signal.

19 Sampling formats Figure (a) represents the luminance sampling. The figure represents part of a camera scanning raster; circles show the times when the camera output luminance signal is sampled. The samples are taken consecutively along each line at the sampling rate of 13.5 MHz, so a sample is taken every 1/(13.5 × 10⁶) s ≈ 0.074 μs.
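As a quick check of the arithmetic above (a sketch only; the variable names are my own):

```python
# Sampling interval for a 13.5 MHz luminance sampling rate.
luminance_rate_hz = 13.5e6
interval_us = 1 / luminance_rate_hz * 1e6  # seconds -> microseconds
print(round(interval_us, 3))  # -> 0.074
```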

20 Sampling formats The Cb and Cr chrominance signals are sampled at half the luminance rate. 4:2:2 sampling takes chrominance samples which coincide with alternate luminance samples.

21 Sampling format 4:2:0 sampling The chrominance sample values are obtained by averaging the values for corresponding points on two consecutive scan lines, so they represent the chrominance values half-way between these lines. This averaging avoids the more abrupt changes in color that would result from simply omitting half the chrominance samples. This is one of the main formats used for MPEG-2 coding.
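The line-averaging step can be sketched as follows (illustrative only; the function name and sample values are invented, and horizontal subsampling is omitted for brevity):

```python
def chroma_420_between_lines(line_a, line_b):
    """Average corresponding chrominance samples from two consecutive
    scan lines; each result represents the value half-way between them."""
    return [(a + b) / 2 for a, b in zip(line_a, line_b)]

# Two short runs of Cb samples from consecutive lines.
print(chroma_420_between_lines([100, 120], [110, 100]))  # -> [105.0, 110.0]
```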

22 Sampling formats: comparison

23 Sampling formats When even lower resolution is acceptable, source intermediate format (SIF) may be used. Used for MPEG-1 coding. The quality is comparable with that of a VHS video recorder.

24 Sampling format 4:2:0 sampling vs. SIF sampling Sixteen luminance samples are replaced by four, and four chrominance samples are replaced by just one. The net effect is that both the luminance and chrominance resolutions are halved in both the vertical and horizontal directions.

25 The coding of still pictures MPEG is designed to squeeze out as much redundancy as possible in order to achieve high levels of compression. This is done in two stages: Spatial compression uses the fact that, in most pictures, there is considerable correlation between neighboring areas, to compress each picture in a video sequence separately. Temporal compression uses the fact that, in most picture sequences, there is normally very little change during the 1/25 s interval between one picture and the next; the resulting high degree of correlation between consecutive pictures allows a considerable amount of further compression.

26 Discrete Cosine Transform (DCT) The first stage of spatial compression uses a variant of the Fourier transform known as the discrete cosine transform (DCT), applied to 8 × 8 blocks of data. The luminance information for a row of “n” consecutive pixels will consist of “n” numbers. Example of a transform: doubling each number → a form of amplification which would double the picture brightness. A reversible transform is one in which a set of “n” original data values is converted into a new set of “n” values in such a way that the original set can be recovered by applying what is called the inverse transform to the new set. Because the process is applied to digital data, that is to a set of discrete numbers, the transform is called a discrete transform.

27 Discrete Cosine Transform The row of pixels is a digital version of the original analogue signal, a time-varying luminance signal, with the value of each consecutive pixel representing the signal luminance at each consecutive sampling interval. If the picture is a meaningful one, abrupt changes in sample values will be relatively rare. There are a number of transforms that can be applied to digital data samples so as to yield a set of numbers which correspond, in effect, to the amplitudes of the frequency components of the spectrum of the original analogue signal.

28 Activity 5.1 What do you think this implies about the frequency spectrum of the luminance signal? Abrupt changes in a signal correspond to high-frequency components in its spectrum. If there are not many abrupt changes in the luminance values, then the amplitude of the high-frequency components will, in general, be small compared with that of the low-frequency components.

29 Discrete Cosine Transform The DCT is used for JPEG and MPEG coding. The DCT is a reversible transform: applied to “n” original samples, it yields “n” amplitude values, and applying the inverse transform to these “n” amplitudes enables one to recover the original sample values. But converting “n” original values into “n” new ones does not, in itself, achieve anything in terms of compression.

30 Discrete Cosine Transform If the high-frequency components are sufficiently small, then setting them to zero before carrying out the inverse transform will produce a picture which, to a human observer, is effectively the same as the original one → this is the essence of the compression process!
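The round trip, and the effect of zeroing small high-frequency amplitudes, can be sketched with a pure-Python orthonormal DCT (a minimal illustration only; MPEG applies an 8 × 8 two-dimensional version of this, and the sample values here are invented):

```python
import math

def dct(x):
    """Orthonormal DCT-II: n samples in, n frequency amplitudes out."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse transform (DCT-III): recovers the original samples."""
    N = len(X)
    return [sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

samples = [139, 144, 149, 153, 155, 155, 155, 155]  # slowly varying luminance
coeffs = dct(samples)

# Zero the five highest-frequency amplitudes before inverting.
approx = idct(coeffs[:3] + [0.0] * 5)
error = max(abs(a - s) for a, s in zip(approx, samples))
```

Running this, the full round trip idct(dct(samples)) reproduces the samples essentially exactly, while the truncated version differs only slightly from the original — the "effectively the same" picture the slide describes.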

31 Discrete Cosine Transform At the transmitter A DCT is applied to sets of “n” picture samples to yield “n” amplitude values. In most cases the majority of the amplitudes are negligible. This results in a data set containing many zero values, and such a data set can be compressed and transmitted using far fewer bits than the original samples: only the relatively few values that make a significant contribution to the perceived picture are transmitted directly.

32 The discrete cosine transform At the receiving end The inverse DCT is applied to a set of “n” samples consisting of the received values together with the appropriate number of zero-amplitude components. Ignoring the low-amplitude, high-frequency components means that the overall transform process is no longer reversible in a mathematical sense, BUT this does not matter, so long as the recovered picture is sufficiently like the original one to meet the reproduction quality requirements of the system.

33 The discrete cosine transform Figure (a) shows the variation of luminance along part of a picture line of length w. The variation with distance, x, along the line obeys a cosine law, with one complete ‘period’ of the cosine taking place over distance w. The variation of luminance with distance, L1 say, can be represented by the equation L1 = A1 cos(2πx/w).

34 The discrete cosine transform The resulting picture is shown as a thick line in figure (b): peak white at either end of the line and the darkest region in the middle.

35 The discrete cosine transform Figure (c) shows the case when two complete cycles of luminance just fit into length w of the line. The luminance can be expressed as L2 = A2 cos(4πx/w). The resulting pattern is shown in Figure (d).

36 The discrete cosine transform Taking “w” as our unit of length, we can think of the pattern for L1 as having a spatial frequency of one cycle per unit length and of L2 as having a spatial frequency of two cycles per unit length. This idea can be extended to higher frequencies with luminance components of the form Lr = Ar cos(2πrx/w), where r = 1, 2, 3, … and Lr has spatial frequency “r” cycles per unit length.

37 The discrete cosine transform The figure shows examples of five cosine patterns. The spatial frequencies for (a) to (d) are 1, 2, 3 and 4 cycles per unit length respectively. Figure (e) shows a zero-frequency, that is constant-luminance, pattern, which corresponds to a dc component in terms of the usual frequency spectra.

38 The discrete cosine transform By combining spatial luminance patterns in appropriate amounts → by adding components with appropriately chosen amplitudes (so that the rth component is of the form Ar cos(2πrx/w), with amplitude Ar), one can reproduce any luminance pattern. In general, abrupt changes in amplitude values for a set of adjacent pixels will be unlikely. Because of this, the higher spatial frequency components of the pattern may be negligible and do not need to be transmitted. Applying the inverse transform at the receiving end will recover a satisfactory picture despite the absence of the higher components.

39 The discrete cosine transform The DCT yields a finite set of discrete frequency components equal in number to the original number of samples in the segment being analyzed. The frequencies are 1, 2, 3, 4 … times the lowest frequency, together with a zero-frequency (dc) term. If a line segment consists of eight samples, the DCT will yield eight amplitudes for components with spatial frequencies of 0, 1, 2, …, 7 cycles per unit length.

40 The discrete cosine transform – 2D A much higher degree of compression can be achieved by using the DCT simultaneously horizontally (along lines) and vertically (along columns). This is done by applying a two-dimensional DCT to rectangular 8 × 8 blocks of pixels. The two-dimensional DCT applied to the 64 luminance values of an 8 × 8 block yields 64 amplitudes of two-dimensional spatial cosine functions. The spatial frequencies range from 0 (the dc term) to 7 in both directions, and each basis function varies as a cosine in both the horizontal and vertical directions.
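Because the transform is separable, the 2-D version can be built from the 1-D one: transform every row, then every column of the result. A minimal sketch (the uniform test block is invented; a real coder works on actual luminance data):

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def dct_2d(block):
    """2-D DCT via separability: transform rows first, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# A uniform 8 x 8 block: only the dc term should be non-zero.
out = dct_2d([[100] * 8 for _ in range(8)])
```

For a block of constant luminance 100, the dc amplitude comes out as 8 × 100 = 800 under this normalisation and every other amplitude is zero, matching the idea that the dc term carries the block's average brightness.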

41 The discrete cosine transform The 64 two-dimensional cosine functions.

42 Question In general, DCTs can be carried out on arrays of n × n amplitude values. But why is n = 8 chosen?

43 The discrete cosine transform 1. Computation turns out to be more efficient if “n” is a power of 2, so “n” could be chosen to be 2, 4, 8, 16 and so on. 2. The bigger the value of “n”, the more computation is involved and the more time is taken by the transform process. 3. Also, the smaller the value of “n”, the greater the inherent errors in the process. These errors show up as differences between the original amplitudes and the amplitudes obtained by using the inverse DCT on the result of carrying out a DCT on the original amplitudes. Tests on typical data indicate errors of the order of 5% for a 4 × 4 transform and 1% for an 8 × 8 transform. Beyond this point the errors drop very slowly with increasing “n”, being of the order of 0.5% for a 256 × 256 transform. A 1% luminance error is not really perceptible, whereas a 5% error is. So the 8 × 8 transform is often the optimum choice.

44 The discrete cosine transform The DCT computation is much more efficient, and hence faster, if the original block is symmetrical in both the horizontal and vertical directions. Thus, if the DCT is to be applied to the block shown in (a) below, the transform which is used is applied to the extended block of (b).

45 The discrete cosine transform The extended block to which the DCT is applied has twice the height and twice the width of the original block. The original block lies in the top left quarter of the extended block and can be reconstructed by combining the top left quarters of the full two-dimensional cosine functions whose amplitudes have been determined by carrying out the DCT.

46 Fig. Example of an 8 × 8 DCT. In the figure above, each amplitude applies to a different component, and the way the amplitudes are ordered in the right-hand transform output block is shown to the right.

47 The output block is organized so that the horizontal frequencies increase from left to right and the vertical frequencies increase from top to bottom. The top-left component, with zero vertical and horizontal frequencies, is the dc term, which represents the average luminance of the block. The minus signs in the DCT output represent phase differences.

48 The discrete cosine transform Looking at the DCT output block of the figure above: the dc term A00 = 826, the A20 term = 15 and the A14 term = −2. It turns out that each of the components can either be in phase, or 180° out of phase, with any of the others. This is a consequence of applying the transform to a symmetrical block such as (b) above.

49 Thresholding and requantization Humans are not very sensitive to fine detail at low luminance levels. This allows higher spatial frequency components below a certain magnitude to be eliminated. This is known as thresholding: the values of components below a certain threshold are each replaced by a zero value. Threshold tables are stored in the encoder.

50 Thresholding and requantization Also, in general, humans are less sensitive to the contribution of high-frequency components compared with lower ones. This is taken into account by using requantization: fewer bits are used for the higher-frequency components which remain after thresholding than for the low-frequency ones.

51 Thresholding and requantization A requantization table, which is stored in the encoder, is used. Each amplitude value in the DCT output table is divided by the corresponding number in the quantization table and the result, rounded to the nearest integer, is used to replace the original amplitude.

52 Thresholding and requantization Qij is used for the requantization table entries and Aij for the DCT component amplitudes. Example: Q31 = 24. The new quantized values are given by taking the nearest integer to Aij/Qij for each table entry. Example: A12 = 16 and Q12 = 22.

53 Thresholding and requantization A12/Q12 = 16/22 = 0.727, and the nearest integer is 1; this is the value of the (1,2) entry in the requantized values table. A13 = 9 and Q13 = 22, giving 9/22 = 0.409. The nearest integer is 0 – hence the entry at position (1,3).
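The divide-and-round step can be sketched as below (the two sample values are taken from the slide; the helper function name and the use of Python's built-in round are my own choices — ties at exactly .5 do not arise in these examples):

```python
def requantize(amplitudes, q_table):
    """Divide each DCT amplitude by its quantization-table entry
    and round to the nearest integer."""
    return [[int(round(a / q)) for a, q in zip(arow, qrow)]
            for arow, qrow in zip(amplitudes, q_table)]

print(round(16 / 22))  # A12/Q12 = 0.727 -> 1
print(round(9 / 22))   # A13/Q13 = 0.409 -> 0
```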

54 (figure slide)

55 Zig-zag scan and run-length encoding The higher the frequencies, the more zeros there are in the requantized block. In order to take advantage of this, the requantized values are rearranged for further processing in the order shown in the figure, which places them in order of ascending frequency for the horizontal and vertical directions combined. The result of this is that there are relatively long sequences consisting entirely of zeros, in which case run-length encoding leads to useful compression.

56 Zig-zag scan and run-length encoding The dc term is coded separately using differential coding. This just involves sending the difference between the value of the dc term and the value of the dc term of the contiguous block that was encoded immediately before.

57 Zig-zag scan and run-length encoding For the requantized table values, zig-zag scanning gives 103, followed by two zeros, −2, 1, 1, two zeros, 1, 42 zeros, 1, and finally 12 zeros. Assuming that the difference between the current and previous dc terms was 4, the data after zig-zag scanning and run-length coding would be sent as: 4, 2, −2, 0, 1, 0, 1, 2, 1, 42, 1, 12. Note that run lengths and non-zero amplitude values must alternate in the coded sequence, so zero run lengths between contiguous non-zero amplitudes have to be indicated. There are two zeros immediately after the dc term, followed by consecutive non-zero amplitudes of −2, 1 and 1 which are separated by zero run lengths – hence the initial coding of 4, 2, −2, 0, 1, 0, 1, …
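Under the conventions just described — the dc difference first, then alternating zero-run lengths and non-zero amplitudes, with a bare count for a trailing zero run — the scheme can be sketched as follows (the function name is my own):

```python
def run_length_encode(dc_diff, ac):
    """Code the dc difference, then alternate zero-run lengths with
    the non-zero amplitudes that follow them; a trailing run of zeros
    is sent as a bare count."""
    out = [dc_diff]
    run = 0
    for a in ac:
        if a == 0:
            run += 1
        else:
            out.extend([run, a])
            run = 0
    if run:
        out.append(run)
    return out

# The slide's zig-zag output, with the dc value itself dropped
# (it is replaced by the dc difference of 4).
ac = [0, 0, -2, 1, 1, 0, 0, 1] + [0] * 42 + [1] + [0] * 12
print(run_length_encode(4, ac))  # -> [4, 2, -2, 0, 1, 0, 1, 2, 1, 42, 1, 12]
```

Feeding in the slide's zig-zag output with a dc difference of 4 reproduces the transmitted sequence 4, 2, −2, 0, 1, 0, 1, 2, 1, 42, 1, 12.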

58 Activity 5.3 Assuming that the requantized amplitude of the dc term in the contiguous block coded immediately before was 100, what would be the sequence of numbers transmitted after the requantized amplitude data of Table 5.4 had been run-length encoded?

59 Exercise Table 3 below shows a 4 × 4 DCT output block as part of MPEG-2 luminance coding. What would be the sequence of numbers transmitted after requantization using Table 4, zig-zag scanning and run-length encoding, assuming that the amplitude of the dc term in the contiguous block coded immediately before was 770?

60 Huffman coding The final step in the coding of single pictures is to use a technique known as Huffman coding. Huffman coding uses short code words for the most commonly occurring patterns and longer words for those patterns that occur less frequently. This results in significant compression of the data without any loss of information. The coding for the chrominance is essentially the same as for the luminance, but different quantization tables and Huffman code tables are used, based on the statistics of typical sets of chrominance data and on relevant features of our perception of color.
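A minimal sketch of the Huffman construction (illustrative only: real JPEG/MPEG coders use predefined code tables derived from typical statistics rather than building a tree from the data being coded, and the function name is my own):

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code: repeatedly merge the two least-frequent
    nodes, prefixing '0' to one subtree's codes and '1' to the other's."""
    freq = Counter(symbols)
    # (count, tie-break index, {symbol: code-so-far})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, i, merged))
    return heap[0][2]

# The most common symbol (0) should get the shortest code word.
codes = huffman_codes([0] * 10 + [1] * 5 + [-2] * 1)
```

The most frequent symbol ends up with the shortest code, and no code word is a prefix of another, so the bit stream can be decoded unambiguously — the "no loss of information" property the slide mentions.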

61 (figure slide)

62 Summary of spatial coding The coding techniques can be divided into two categories: Reversible or lossless coding: the exact data can be recovered after decoding. Examples: Huffman and run-length encoding. The DCT is also effectively reversible, although some errors are in fact introduced through rounding and other effects. Reversible coding preserves all the information contained in the signal. Non-reversible or lossy coding: causes some information to be lost irrecoverably. Example: requantization (which reduces the number of bits per sample).

