STATISTIC & INFORMATION THEORY (CSNB134) MODULE 11 COMPRESSION.


Recap: In Module 10, we covered Variable Length Coding (VLC). The main objective of VLC is to increase the coding efficiency, which is possible because the symbols occur with different probabilities. Recall that the coding efficiency is $\eta = \dfrac{H}{\bar{R}}$, where $H$ is the entropy of the source and $\bar{R}$ is the average number of bits per symbol. Since the entropy is fixed regardless of the type of coding used, VLC increases the coding efficiency by yielding a lower $\bar{R}$. This also implies that we have, in effect, managed to compress the information!!
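A quick numeric check of the efficiency formula, as a minimal Python sketch. The four-symbol source, its probabilities and the code-word lengths below are hypothetical, chosen only for illustration, and $H$ is measured in bits (base-2 logarithm).

```python
import math

def entropy(probs):
    """Entropy H in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average number of bits per symbol: R_bar = sum(p * length)."""
    return sum(p * n for p, n in zip(probs, lengths))

def coding_efficiency(probs, lengths):
    """Coding efficiency eta = H / R_bar (1.0 means 100%)."""
    return entropy(probs) / average_length(probs, lengths)

# Hypothetical source: four symbols with unequal probabilities.
probs = [0.5, 0.25, 0.125, 0.125]
fixed_lengths = [2, 2, 2, 2]   # fixed-length code: 2 bits per symbol
vlc_lengths = [1, 2, 3, 3]     # a VLC such as 0, 10, 110, 111

print(f"H = {entropy(probs):.2f} bits/symbol")
for name, lengths in [("fixed", fixed_lengths), ("VLC", vlc_lengths)]:
    print(f"{name}: R_bar = {average_length(probs, lengths):.2f}, "
          f"eta = {coding_efficiency(probs, lengths):.1%}")
```

Here H = 1.75 bits/symbol for both codes. The fixed-length code gives R̄ = 2 (η = 87.5%), while the variable-length code gives R̄ = 1.75 (η = 100%): only R̄ changes, which is exactly how VLC raises the efficiency.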

Compression Compression refers to the process of modifying the representation of computerized information so that the capacity required to store it, or the bit-rate required to transmit it, is reduced. Compression is carried out to: (1) reduce storage requirements (2) reduce processing time (3) reduce transmission time. Basically, there are two types of compression: (1) lossless compression (2) lossy compression

Lossless Compression - The information is recovered without any alteration after the decompression process. - The information contained in the compressed bit stream is identical to the information contained in the original source bit stream. - Also called bit-preserving or reversible compression. - E.g. Huffman Coding, Fano Coding, Run Length Coding etc. (Note: We learned Fano and Huffman coding in Module 10; we shall learn Run Length Coding next.)

Run Length Coding Any sequence of repeated characters may be replaced by a shorter form, and the algorithm can be applied to any repeated character. A run of n successive occurrences of a character c is replaced by the character c itself, followed by a special character (the flag), followed by the number n of occurrences. However, the substitution should only take place if the number of occurrences is four or more (why 4? We shall look into this next).

Run Length Coding (cont.) Let the digits 1, 2 and 3 represent Red, Green and Blue; these are the characters $c_i$. Let a scan line of length 33 be 111111111113333333333222222222111; these are the $x_i$. Using ! as the flag, the run-length encoded stream is the series of tuples (1,11), (3,10), (2,9), followed by the literal 111, where 11, 10 and 9 are the run lengths $n_i$. Written out, this is 1!113!102!9111. Note: the literal 111 takes 3 symbols, and its encoded form 1!3 also takes 3 symbols, so nothing is saved; a run of four, 1111, becomes 1!4 and saves one symbol. Thus, we only apply run length coding to runs of 4 or more repeated symbols, since only then is compression possible.
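The encoding procedure can be sketched in Python as below. This is a minimal sketch assuming the conventions in the slides: `!` as the flag, decimal run counts, and a minimum run of 4. The names `rle_tokens` and `rle_encode` are hypothetical helpers, not part of any library. Runs shorter than 4 are copied literally, which is why the trailing 111 stays as it is.

```python
from itertools import groupby

FLAG = "!"    # flag character used in the slides
MIN_RUN = 4   # only runs of 4 or more symbols are replaced

def rle_tokens(data, min_run=MIN_RUN):
    """Split data into runs of identical symbols.

    Long runs become (symbol, count) tuples; shorter runs are kept
    as literal strings. Assumes data never contains the flag itself.
    """
    tokens = []
    for symbol, group in groupby(data):
        n = len(list(group))
        if n >= min_run:
            tokens.append((symbol, n))   # e.g. ('1', 11)
        else:
            tokens.append(symbol * n)    # e.g. '111' stays literal
    return tokens

def rle_encode(data, min_run=MIN_RUN):
    """Write runs as symbol + flag + count, and literals unchanged."""
    parts = []
    for token in rle_tokens(data, min_run):
        if isinstance(token, tuple):
            symbol, n = token
            parts.append(f"{symbol}{FLAG}{n}")
        else:
            parts.append(token)
    return "".join(parts)

# The scan line from the slide: 11 x '1', 10 x '3', 9 x '2', 3 x '1'.
scan_line = "1" * 11 + "3" * 10 + "2" * 9 + "1" * 3
print(rle_tokens(scan_line))   # [('1', 11), ('3', 10), ('2', 9), '111']
print(rle_encode(scan_line))   # 1!113!102!9111
```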

Exercise 1 Derive the Run Length code for the following information: 111111222222233333333111222333333333333 Working - (1,6), (2,7),(3,8),111,222,(3,12) Run Length code - 1!62!73!81112223!12 What is the compression rate?
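To check your answer, the compression rate can be computed by counting symbols. This minimal sketch assumes every character, including the flag `!`, costs one symbol, and reports the rate as compressed : original, the same form used in Exercise 3; `compression_ratio` is a hypothetical helper name.

```python
def compression_ratio(original, compressed):
    """Original size divided by compressed size (sizes in symbols)."""
    return len(original) / len(compressed)

# Exercise 1: the information string and its run length code.
original = "1" * 6 + "2" * 7 + "3" * 8 + "111" + "222" + "3" * 12
encoded = "1!62!73!81112223!12"

print(f"original: {len(original)} symbols, encoded: {len(encoded)} symbols")
print(f"compression rate = 1:{compression_ratio(original, encoded):.2f}")
```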

Exercise 2 Decode the following Run Length code: 3!102!41!93331!11222 Working – (3,10),(2,4),(1,9),333,(1,11),222 Original Information – 3333333333222211111111133311111111111222 Note: Compression is reversible!!! Lossless Compression!!
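Decoding can be sketched in the same style. One caution: because the symbols here are themselves digits and a count can have more than one digit, the flat string alone can be read in more than one way (for example, 1!93331!11222 could also be split as 1!9333 followed by 1!11222). The minimal sketch below therefore decodes from the tuple form shown in the Working line; `rle_decode` is a hypothetical helper name.

```python
def rle_decode(tokens):
    """Rebuild the original string from run length tokens.

    Each token is either a (symbol, count) tuple for an encoded run
    or a plain string of literal symbols.
    """
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            symbol, n = token
            out.append(symbol * n)   # expand c!n into n copies of c
        else:
            out.append(token)        # literal symbols are copied unchanged
    return "".join(out)

# Tokens from the working of Exercise 2: (3,10),(2,4),(1,9),333,(1,11),222
tokens = [("3", 10), ("2", 4), ("1", 9), "333", ("1", 11), "222"]
decoded = rle_decode(tokens)
print(decoded)
print(len(decoded))   # 40
```

Every symbol comes back exactly, which is what makes run length coding lossless.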

Lossy Compression - Also called irreversible compression. - The decompressed information is different from the original uncompressed information. - This mode is suitable for most continuous media, such as sound and video, as well as for many images. - E.g. JPEG, MPEG, MP3 etc. (Note: In this module, we will only learn the lossy compression characteristics of JPEG.)

JPEG (Joint Photographic Experts Group) In this module, we shall learn the basic steps of JPEG compression, which are: (1) Block preparation (2) Discrete Cosine Transform (3) Quantization (4) Run Length coding (5) Huffman coding. JPEG compression is made possible mainly by the spatial correlation of adjacent pixels. Although Run Length coding and Huffman coding are both lossless, JPEG compression as a whole is lossy because of quantization, where some information is discarded.

JPEG (cont.) Block Preparation - An image is divided into individual blocks. - Each block consists of 8*8 pixels. - Why? Each block is treated individually (i.e. the following compression steps are applied to individual blocks, not to the image as a whole), and the pixels within a small block are usually similar to one another, which the following steps exploit.
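Block preparation can be sketched in plain Python as follows. This is a minimal sketch assuming a grey-scale image stored as a list of rows whose height and width are multiples of 8; real encoders also pad partial blocks at the image edges, which is not shown. `split_into_blocks` is a hypothetical helper name.

```python
BLOCK = 8  # JPEG block size: 8*8 pixels

def split_into_blocks(image, size=BLOCK):
    """Split a 2-D list of pixel values into size*size blocks.

    Blocks are returned row by row, left to right. Assumes the image
    height and width are multiples of `size` (no edge padding here).
    """
    height, width = len(image), len(image[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks

# A hypothetical 16*16 grey-scale image: pixel value = row index + column index.
image = [[r + c for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image)
print(len(blocks))     # 4
print(blocks[1][0])    # [8, 9, 10, 11, 12, 13, 14, 15] (first row, top-right block)
```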

JPEG (cont.) Discrete Cosine Transform - The block of 8*8 sample values in the spatial domain is transformed into another block of 8*8 coefficient values in the frequency domain. - Why? - Data are easier to compress in the frequency domain (as the following quantization step shows). - Compression is based on the assumption that the sample values within an individual block of an image are usually similar (spatial correlation), so most of the information ends up in a few large coefficients at low frequencies, while the high-frequency coefficients are small.
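For reference, the forward transform used by JPEG on an 8*8 block is commonly written as $F(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$, with $C(0) = 1/\sqrt{2}$ and $C(k) = 1$ otherwise. The pure-Python sketch below implements this formula and its inverse directly (slow but readable). It omits the level shift of 128 that JPEG applies to 8-bit samples first, and the example block is hypothetical.

```python
import math

N = 8  # JPEG block size

def norm(k):
    """Normalisation factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(i, k):
    """1-D DCT basis value cos((2i + 1) k pi / 2N)."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT: spatial samples f(x, y) -> coefficients F(u, v)."""
    return [[(2 / N) * norm(u) * norm(v) *          # 2/N = 1/4 for N = 8
             sum(block[x][y] * basis(x, u) * basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT: coefficients F(u, v) -> spatial samples f(x, y)."""
    return [[(2 / N) *
             sum(norm(u) * norm(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]

# Hypothetical smooth block: neighbouring samples differ only slightly.
block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
print(round(coeffs[0][0], 1))   # large DC (average brightness) coefficient
print(max(abs(coeffs[u][v]) for u in range(1, N) for v in range(1, N)))
restored = idct_2d(coeffs)
```

For this smooth block the DC coefficient is large, and every coefficient with both u > 0 and v > 0 is zero up to rounding error. Applying `idct_2d` recovers the block, so the DCT itself loses nothing apart from floating-point rounding.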


JPEG (cont.) Quantization - The block of 8*8 DCT coefficients is divided, entry by entry, by an 8*8 quantization table, and the results are rounded to integers. - Why? To allow further compression in the entropy coding steps by discarding insignificant coefficients. (Note: Human eyes are less sensitive to the small coefficients at high frequencies.) Thus, in quantization, these small high-frequency coefficients are rounded to zero and discarded. Note: This is where compression becomes lossy! Since this information is discarded, even if we reverse the encoding process, we cannot recover the original information!!
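The division-and-rounding step, and the loss it causes, can be seen in a short sketch. The table below is the example luminance quantization table given in Annex K of the JPEG standard (ITU-T T.81); encoders may use other, scaled tables. The DCT coefficients are hypothetical, and `quantize`/`dequantize` are illustrative helper names.

```python
# Example luminance quantization table (JPEG standard, Annex K).
# Divisors grow towards the bottom-right, i.e. towards high frequencies.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table=Q_LUMA):
    """Divide each coefficient by its table entry; round() goes to the nearest integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table=Q_LUMA):
    """Multiply back by the table entry, as a decoder would."""
    return [[lev * q for lev, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Hypothetical DCT block: a large DC value plus a few smaller coefficients.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 884.0   # DC coefficient
coeffs[0][1] = -30.0   # low-frequency coefficient
coeffs[6][6] = 40.0    # small high-frequency coefficient
coeffs[7][7] = 20.0    # small high-frequency coefficient

levels = quantize(coeffs)
restored = dequantize(levels)
print(levels[0][0], levels[0][1], levels[6][6], levels[7][7])          # 55 -3 0 0
print(restored[0][0], restored[0][1], restored[6][6], restored[7][7])  # 880 -33 0 0
```

The DC coefficient 884 comes back as 880, and both high-frequency coefficients are quantized to 0, so that detail is gone for good.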


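JPEG (cont.) Run Length Coding - To achieve a higher level of compression by replacing long runs of successive identical coefficient values (mostly zeros after quantization) with short codes. Encoding is carried out in a zig-zag order, starting from the top-left (low-frequency) corner of the block, so that the zero-valued high-frequency coefficients are grouped into long runs. (Figure: an example quantized block in which only a few coefficients, such as 256, 5, 4, 3 and 1, are non-zero and the rest are 0.)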
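The zig-zag scan and the run-length step can be sketched as below, assuming a hypothetical quantized block with only four non-zero coefficients. This is a simplified (value, count) form in the spirit of this module: baseline JPEG actually codes each non-zero AC coefficient together with the number of zeros preceding it, and ends the block with an end-of-block (EOB) symbol. The helper names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n*n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):               # s = row + col: one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        rows = range(lo, hi + 1) if s % 2 else range(hi, lo - 1, -1)
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Read a square block into a 1-D list following the zig-zag path."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length(values):
    """Collapse successive identical values into (value, count) pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1
        else:
            pairs.append([v, 1])
    return [tuple(p) for p in pairs]

# Hypothetical quantized block: a few non-zero values near the top-left.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 55, -3, 2, 1

scanned = zigzag_scan(block)
pairs = run_length(scanned)
print(scanned[:6])   # [55, -3, 2, 0, 1, 0]
print(pairs)         # [(55, 1), (-3, 1), (2, 1), (0, 1), (1, 1), (0, 59)]
```

After the zig-zag scan, the 59 trailing zeros collapse into the single pair (0, 59).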

JPEG (cont.) Huffman coding - To achieve a greater level of compression by using Variable Length Coding (VLC). Do not freak out!! You do not need to fully understand the details of each compression step in JPEG. What is important is to know the basic overview of each step, and that JPEG becomes lossy due to quantization, where some information is discarded!
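Huffman coding was covered in Module 10; as a reminder, the minimal sketch below builds a Huffman code with Python's `heapq` by repeatedly merging the two least frequent subtrees. The symbol stream is hypothetical, and `huffman_code` is an illustrative helper name. Real JPEG files apply Huffman tables to (run, size) symbols derived from the run-length step rather than to raw coefficient values.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # one distinct symbol: use code "0"
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: code so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical symbol stream: 0 is very common, 3 and 5 are rare.
symbols = [0, 0, 0, 0, 0, 1, 1, 3, 5]
code = huffman_code(symbols)
encoded = "".join(code[s] for s in symbols)
print(code)
print(len(encoded))   # 15
```

Here 0 gets a 1-bit code, 1 a 2-bit code, and 3 and 5 get 3-bit codes, so the 9 symbols need 15 bits instead of the 18 bits a 2-bit fixed-length code would use.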

Exercise 3 Assume an image file of 240 kByte is compressed using JPEG2000 with a compression rate of 1:12. Calculate the size of the file in JPEG2000 format.
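Taking a compression rate of 1:12 to mean that the compressed file is 1/12 the size of the original, the working is:

$$\text{compressed size} = \frac{\text{original size}}{12} = \frac{240\ \text{kByte}}{12} = 20\ \text{kByte}$$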

STATISTIC & INFORMATION THEORY (CSNB134) COMPRESSION --END--
