1
Dr. Hadi AL Saadi Image Compression
2
Goal of Image Compression Digital images require huge amounts of space for storage and large bandwidths for transmission. A 640 x 480 color image requires close to 1MB of space. The goal of image compression is to reduce the amount of data required to represent a digital image. Reduce storage requirements and increase transmission rates.
3
Compression: Basic Algorithms
The need for compression: raw video, image and audio files can be very large.
Uncompressed audio (1 minute):
Audio Type       44.1 kHz    22.05 kHz   11.025 kHz
16-bit stereo    10.1 MB     5.05 MB     2.52 MB
16-bit mono      5.05 MB     2.52 MB     1.26 MB
8-bit mono       2.52 MB     1.26 MB     630 KB
Uncompressed images:
Image Type                        File Size
512 x 512 monochrome              0.25 MB
512 x 512 8-bit colour image      0.25 MB
512 x 512 24-bit colour image     0.75 MB
4
Video: can also involve a stream of audio plus video imagery.
Raw video (uncompressed image frames), 512 x 512 true colour, 25 fps: 1125 MB per minute.
HDTV, uncompressed (1920 x 1080, true colour, 25 fps): about 8.7 GB per minute.
Relying on higher bandwidths is not a good option. Compression HAS TO BE part of the representation of audio, image and video formats.
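As a quick sanity check on these figures, here is a minimal Java sketch that recomputes the raw sizes from the parameters quoted above (frame size, bit depth, frame rate and sample rate are taken from these slides; the class name is arbitrary):

public class RawSizes {
    public static void main(String[] args) {
        double MB = 1024.0 * 1024.0, GB = MB * 1024.0;

        // 1 minute of 44.1 kHz, 16-bit (2-byte) stereo audio
        double audio = 44100.0 * 2 * 2 * 60;                   // samples/s * bytes/sample * channels * seconds
        System.out.printf("Audio: %.1f MB%n", audio / MB);     // about 10.1 MB

        // 1 minute of raw 512 x 512 true-colour video at 25 fps (3 bytes per pixel)
        double sdVideo = 512.0 * 512 * 3 * 25 * 60;
        System.out.printf("Video: %.0f MB%n", sdVideo / MB);   // 1125 MB

        // 1 minute of raw 1920 x 1080 true-colour video at 25 fps
        double hdVideo = 1920.0 * 1080 * 3 * 25 * 60;
        System.out.printf("HDTV:  %.1f GB%n", hdVideo / GB);   // about 8.7 GB
    }
}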
5
Approaches
Lossless (entropy coding): information preserving; no information is lost (the signal can be perfectly reconstructed after decompression). Produces a variable bit rate. Important for text and images sent over a network. Low compression ratios, and it is not guaranteed to actually reduce the data size.
Lossy (source coding): not information preserving (the signal is not perfectly reconstructed after decompression). Important for digitized audio and video signals, because the human eye and ear cannot perceive every detail. High compression ratios; can produce any desired constant bit rate.
Trade-off: image quality vs. compression ratio.
6
Data ≠ Information
Data and information are not synonymous terms! Data is the means by which information is conveyed. Data compression aims to reduce the amount of data required to represent a given quantity of information while preserving as much information as possible. The same amount of information can be represented by various amounts of data, e.g.:
Ex1. Your wife, Helen, will meet you at Logan Airport in Boston at 5 minutes past 6:00 pm tomorrow night.
Ex2. Your wife will meet you at Logan Airport at 5 minutes past 6:00 pm tomorrow night.
Ex3. Helen will meet you at Logan at 6:00 pm tomorrow night.
7
Data Redundancy
Let n1 and n2 denote the number of information-carrying units (e.g. bits) in two data sets that represent the same information, with n1 for the original data and n2 for the compressed data.
Compression ratio: C_R = n1 / n2
8
Data Redundancy (cont'd)
Relative data redundancy: R_D = 1 - 1/C_R
Example: if C_R = 10 (i.e. n1 = 10 n2), then R_D = 0.9, meaning 90% of the data in the original representation is redundant.
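A tiny Java sketch of these two definitions (the values of n1 and n2 below are illustrative, not taken from the slides):

public class Redundancy {
    public static void main(String[] args) {
        double n1 = 1_000_000;   // information-carrying units (e.g. bits) before compression (illustrative)
        double n2 =   100_000;   // units after compression (illustrative)

        double cr = n1 / n2;            // compression ratio C_R
        double rd = 1.0 - 1.0 / cr;     // relative data redundancy R_D

        System.out.printf("C_R = %.1f, R_D = %.2f%n", cr, rd);   // C_R = 10.0, R_D = 0.90
    }
}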
9
Types of Data Redundancy (1) Coding (2) Interpixel (3) Psychovisual Compression attempts to reduce one or more of these redundancy types.
10
Coding Redundancy Code: a list of symbols (letters, numbers, bits etc.) Code word: a sequence of symbols used to represent a piece of information or an event (e.g., gray levels). Code word length: number of symbols in each code word
11
Goal: minimize the average symbol length.
Let l(i) be the binary length of the i-th symbol and N the number of different symbols. Suppose the source emits M symbols in total (one every T seconds) and that symbol i has been emitted m(i) times.
Number of bits emitted by the source: B = Σ_i m(i) l(i)
Average length per symbol: L = B / M = (1/M) Σ_i m(i) l(i)
Average bit rate: R = L / T bits per second
12
Symbol probabilities: the probability P(i) of a symbol corresponds to its relative frequency, P(i) = m(i) / M.
Average symbol length: L = Σ_i P(i) l(i)
13
Minimum Average Symbol Length
Basic idea for reducing the average symbol length: assign shorter codewords to symbols that appear more frequently, and longer codewords to symbols that appear less frequently.
Theorem: the average binary length of the encoded symbols is always greater than or equal to the entropy H of the source (the average information content per symbol of the source).
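A short sketch of the definitions above (the symbol counts, codeword lengths and symbol period below are made-up illustrative values):

public class AverageLength {
    public static void main(String[] args) {
        int[] m = {50, 30, 15, 5};   // m(i): how many times symbol i was emitted (illustrative), M = 100
        int[] l = {1, 2, 3, 3};      // l(i): codeword length of symbol i in bits (illustrative)
        double T = 0.001;            // one symbol every T seconds (illustrative)

        long totalBits = 0, M = 0;
        for (int i = 0; i < m.length; i++) {
            totalBits += (long) m[i] * l[i];
            M += m[i];
        }

        double avgLength = (double) totalBits / M;   // average length per symbol (bits/symbol)
        double bitRate = avgLength / T;              // average bit rate (bits/second)
        System.out.printf("L = %.2f bits/symbol, bit rate = %.0f bit/s%n", avgLength, bitRate);
    }
}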
14
Coding Redundancy
For an N x M image:
r_k : the k-th gray level
P(r_k) : probability of r_k
l(r_k) : number of bits used to represent r_k
Average number of bits per pixel: L_avg = Σ_k l(r_k) P(r_k)
15
Coding Redundancy: l(r_k) = constant length, i.e. every gray level is assigned a codeword of the same length (fixed-length coding).
16
Coding Redundancy: l(r_k) = variable length. Consider the probability of the gray levels: assign shorter codewords to the more probable gray levels.
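The gray-level table from the original slide is not reproduced here, so the sketch below uses an assumed 8-level distribution purely as an illustration of how a variable-length code lowers the average number of bits per pixel compared with a fixed 3-bit code:

public class CodingRedundancy {
    public static void main(String[] args) {
        // Assumed gray-level probabilities P(r_k) and variable-length code lengths l(r_k)
        double[] p = {0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02};
        int[] l    = {2, 2, 2, 3, 4, 5, 6, 6};

        double fixedLength = 3.0;   // 8 gray levels need 3 bits each with a fixed-length code
        double avg = 0.0;
        for (int k = 0; k < p.length; k++) avg += p[k] * l[k];

        System.out.printf("L_avg (variable length) = %.2f bits/pixel%n", avg);   // 2.70
        System.out.printf("Compression ratio = %.2f%n", fixedLength / avg);      // about 1.11
    }
}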
17
How do we measure information?
What is the information content of a message/image?
What is the minimum amount of data that is sufficient to completely describe an image without loss of information?
18
Modeling Information
Information generation is assumed to be a probabilistic process. Idea: associate information with probability!
A random event E with probability P(E) contains I(E) = log2(1 / P(E)) = -log2 P(E) units of information.
Note: I(E) = 0 when P(E) = 1 (a certain event carries no information).
19
How much information does a pixel contain?
Suppose that the gray-level values are generated by a random variable; then r_k contains I(r_k) = -log2 P(r_k) units of information.
Average information content of an image: H = -Σ_k P(r_k) log2 P(r_k) units/pixel.
Entropy H: the average information content per symbol of the source.
Entropy reaches its maximum, H = log2 N, when all symbols in the set are equally probable.
20
Entropy is small (it approaches 0) when some symbols are much more likely to appear than the others.
The efficiency of a coding scheme is: η = H / L, i.e. the entropy divided by the average codeword length.
Example: for N = 4 and the probability distributions shown in the figure below, calculate the entropy of each system (one distribution is uniform, P_i = 0.25 for every i; the other is non-uniform, with probabilities 1/3 and 2/3 appearing).
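For the uniform distribution in this example (P_i = 0.25 for all four symbols) the entropy reaches its maximum, log2 4 = 2 bits/symbol. A small Java sketch (the skewed distribution below is an illustrative assumption, not the non-uniform case from the example):

public class Entropy {
    // H = -sum of p log2 p over all symbols with non-zero probability
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p)
            if (pi > 0) h -= pi * Math.log(pi) / Math.log(2);
        return h;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};
        double[] skewed  = {0.70, 0.10, 0.10, 0.10};   // illustrative non-uniform distribution
        System.out.printf("H(uniform) = %.2f bits (maximum, log2 4)%n", entropy(uniform));   // 2.00
        System.out.printf("H(skewed)  = %.2f bits%n", entropy(skewed));                      // about 1.36
    }
}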
21
Entropy Estimation: it is not easy to estimate H reliably!
22
Example 2: A statistical encoding algorithm is being considered for the transmission of a large number of long text files over a public network. Analysis of the file contents has shown that each file comprises only six different characters (M, F, Y, N, O and I), which occur with relative frequencies of 0.25, 0.25, 0.125, 0.125, 0.125 and 0.125 respectively. The encoding algorithm under consideration uses the following set of codewords:
M = 10, F = 11, Y = 010, N = 011, O = 000, I = 001
Compute:
1. the average number of bits per codeword with this algorithm
2. the entropy of the source
3. the minimum number of bits required assuming fixed-length codewords
23
Solution:
1. L = 2*(2*0.25) + 4*(3*0.125) = 2.5 bits per codeword
2. H = -[2(0.25*log2 0.25) + 4(0.125*log2 0.125)] = 2.5 bits per codeword
3. Since there are 6 characters, fixed-length codewords would require a minimum of 3 bits (giving 8 combinations).
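These figures can also be checked mechanically; here is a minimal Java sketch using the probabilities and codeword lengths from Example 2 above:

public class Example2Check {
    public static void main(String[] args) {
        // M, F, Y, N, O, I with codeword lengths 2, 2, 3, 3, 3, 3
        double[] p = {0.25, 0.25, 0.125, 0.125, 0.125, 0.125};
        int[] len  = {2, 2, 3, 3, 3, 3};

        double avg = 0, h = 0;
        for (int i = 0; i < p.length; i++) {
            avg += p[i] * len[i];                        // average bits per codeword
            h -= p[i] * Math.log(p[i]) / Math.log(2);    // entropy of the source
        }
        System.out.printf("Average = %.2f bits, entropy = %.2f bits%n", avg, h);   // both 2.50
    }
}

Because the average codeword length equals the entropy, this particular codeword set is optimal for the given probabilities.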
24
Categories of Compression Techniques
Entropy coding: Run-length coding, Huffman coding, Arithmetic coding
Source coding: Prediction (DPCM, DM), Transformation (FFT, DCT), Vector Quantization
Hybrid coding: JPEG, MPEG, H.261
25
Lossless Compression
26
Lossless Compression (Entropy Coding) Methods
Lossless coding techniques (entropy coding):
Repetitive sequence encoding: Run-Length Encoding
Statistical encoding: Huffman, Arithmetic, LZW
Lossless predictive coding: DPCM
Bitplane encoding
27
Huffman Coding (coding redundancy)
The Huffman code assignment procedure is based on a binary tree structure. Huffman Coding Algorithm, a bottom-up approach:
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
(1) From the list pick the two symbols with the lowest frequency counts. Form a Huffman subtree that has these two symbols as child nodes and create a parent node.
(2) Assign the sum of the children's frequency counts to the parent and insert it into the list such that the order is maintained.
(3) Delete the children from the list.
3. Assign a codeword to each leaf based on the path from the root.
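A compact Java sketch of the algorithm above (the class and method names are arbitrary; as a demonstration it uses the four-symbol source of Example 1 on the next slide, A, B, C, D with probabilities 0.5, 0.3, 0.12, 0.08):

import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class Huffman {
    // A tree node: either a leaf holding a symbol or an internal node with two children.
    static class Node implements Comparable<Node> {
        double prob; char symbol; Node left, right;
        Node(char s, double p) { symbol = s; prob = p; }                         // leaf
        Node(Node l, Node r)   { left = l; right = r; prob = l.prob + r.prob; }  // parent
        public int compareTo(Node o) { return Double.compare(prob, o.prob); }
    }

    // Steps 1-2: repeatedly merge the two nodes with the lowest frequency counts.
    static Node buildTree(char[] symbols, double[] probs) {
        PriorityQueue<Node> list = new PriorityQueue<>();
        for (int i = 0; i < symbols.length; i++) list.add(new Node(symbols[i], probs[i]));
        while (list.size() > 1) {
            Node a = list.poll(), b = list.poll();   // two lowest-probability nodes
            list.add(new Node(a, b));                // parent carries the summed probability
        }
        return list.poll();
    }

    // Step 3: assign a codeword to each leaf by following the path from the root.
    static void assignCodes(Node n, String code, Map<Character, String> out) {
        if (n.left == null && n.right == null) {
            out.put(n.symbol, code.isEmpty() ? "0" : code);
            return;
        }
        assignCodes(n.left, code + "0", out);
        assignCodes(n.right, code + "1", out);
    }

    public static void main(String[] args) {
        char[] syms = {'A', 'B', 'C', 'D'};
        double[] probs = {0.5, 0.3, 0.12, 0.08};

        Map<Character, String> codes = new TreeMap<>();
        assignCodes(buildTree(syms, probs), "", codes);

        double avg = 0;
        for (int i = 0; i < syms.length; i++) avg += probs[i] * codes.get(syms[i]).length();
        System.out.println(codes);                                       // codeword lengths 1, 2, 3, 3
        System.out.printf("Average length = %.2f bits/symbol%n", avg);   // 1.70
    }
}

The exact 0/1 patterns depend on which child is labelled 0 at each merge, so they may differ from the slide's codewords, but the codeword lengths and the average of 1.7 bits/symbol are the same.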
28
Codeword Formation: follow the path from the root of the tree to the leaf symbol, and accumulate the labels of all the branches.
Example 1: Consider a source with the following symbol occurrences:
Symbol        A      B      C      D
Probability   0.5    0.3    0.12   0.08
Building the tree (branch labels 0/1): P(ABCD) = 1 splits into P(A) = 0.5 and P(BCD) = 0.5; P(BCD) splits into P(B) = 0.3 and P(CD) = 0.2; P(CD) splits into P(C) = 0.12 and P(D) = 0.08.
29
code(A) = 1, code(B) = 01, code(C) = 001, code(D) = 000
Average length of this code = 0.5*1 + 0.3*2 + 0.12*3 + 0.08*3 = 1.7 bits/symbol
30
Huffman Coding
Example: five symbols X1 to X5 with probabilities 0.25, 0.25, 0.2, 0.15 and 0.15. Merging the two lowest probabilities at each step gives the intermediate sums 0.3 (= 0.15 + 0.15), 0.45 (= 0.25 + 0.2), 0.55 (= 0.3 + 0.25) and finally 1.
Symbol        X1     X2     X3     X4     X5
Probability   0.25   0.25   0.2    0.15   0.15
Codeword      01     10     11     000    001
Length        2      2      2      3      3
31
Example: A series of messages is to be transferred between two computers over a PSTN network. The messages comprise just the characters A through H. Analysis has shown that each character has a probability of occurrence as follows:
A and B = 0.25; C and D = 0.14; E, F, G and H = 0.055
1. Derive the minimum average number of bits per character.
2. Use Huffman coding to derive the codeword set and prove this is the minimum set by constructing the corresponding Huffman code tree.
3. Derive the average number of bits per character for your codeword set and compare this with fixed-length binary codewords.
Answers:
1. Entropy: H = -[2(0.25*log2 0.25) + 2(0.14*log2 0.14) + 4(0.055*log2 0.055)] ≈ 2.71 bits per codeword
2. Shown in the figure below.
32
Figure: entropy encoding. (a) Codeword generation; (b) Huffman code tree.
33
Example: N = 8 symbols (a, b, c, d, e, f, g and h) with
P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = P(g) = 0.2, P(h) = 0.25
Find:
1. The average length per symbol before coding
2. The entropy
3. The average length per symbol after Huffman coding
4. The efficiency of the code
Solution:
1. Before coding, a fixed-length code for 8 symbols needs log2 8 = 3 bits/symbol.
2. Entropy: H = -Σ_i P(i) log2 P(i) ≈ 2.58 bits/symbol
3. Average length per symbol (with Huffman coding) = 2(0.25) + 2(0.2) + 2(0.2) + 3(0.18) + 4(0.09) + 5(0.05) + 6(0.02) + 6(0.01) = 2.63 bits/symbol
34
4. Efficiency of the code: η = H / L = 2.58 / 2.63 ≈ 98%
Symbol   Probability   Original codeword   Huffman codeword
h        0.25          111                 01
g        0.2           110                 11
f        0.2           101                 10
e        0.18          100                 001
d        0.09          011                 0001
c        0.05          010                 00001
b        0.02          001                 000001
a        0.01          000                 000000
35
H/W: Assume the following file of pixel values:
21 95 169 169 243 243 45 45 45 100 100 120
21 95 169 243 243 243 45 45 45 100 120 120
95 169 169 243 243 45 45 45 45 100 120 120
1. Find the size of the file using fixed-length binary codewords.
2. Use Huffman coding to derive the codeword set.
3. Find the size of the file using the Huffman codeword set.
36
Run-Length Coding (RLC) (interpixel redundancy)
Used to reduce the size of a repeating string of characters (i.e., runs):
1 1 1 1 1 0 0 0 0 0 0 1  →  (1,5) (0,6) (1,1)
a a a b b b b b b c c  →  (a,3) (b,6) (c,2)
Encodes a run of symbols into two bytes: (symbol, count).
Can compress any type of data but cannot achieve the high compression ratios of other compression methods.
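A minimal run-length encoder along these lines (a sketch of the idea, not any particular file format; the class name is arbitrary):

public class RunLength {
    // Encode each run of identical symbols as a (symbol, count) pair.
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char symbol = input.charAt(i);
            int count = 1;
            while (i + count < input.length() && input.charAt(i + count) == symbol) count++;
            out.append('(').append(symbol).append(',').append(count).append(") ");
            i += count;
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(encode("111110000001"));   // (1,5) (0,6) (1,1)
        System.out.println(encode("aaabbbbbbcc"));    // (a,3) (b,6) (c,2)
    }
}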
37
Differential Pulse Code Modulation (DPCM) Coding (interpixel redundancy)
A predictive coding approach. Each pixel value (except at the boundaries) is predicted from its neighbours (e.g., as a linear combination) to form a predicted image. The difference between the original and predicted images yields a differential or residual image, which has a much smaller dynamic range of pixel values. The differential image is then encoded using Huffman coding.
38
A simple DPCM transform coding
A simple DPCM encoding procedure may be described by the following steps for a 2x2 block of monochrome pixels:
Block before:   A   B        Block after:   A     B-A
                C   D                       C-A   D-A
1. Take the top-left pixel, A, as the base value for the block.
2. Calculate the three other transformed values as the difference between the respective pixels and pixel A, i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the transform.
39
Example: Consider the following 2x2 image block:
120   130
125   120
Then we get: X0 = 120, X1 = 10, X2 = 5, X3 = 0.
We can then compress these values by using fewer bits to represent the data.
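The three steps applied to this block, as a small Java sketch (the pixel order is A, B, C, D, i.e. row by row; the inverse transform is included to show that no information is lost):

public class SimpleDpcm {
    // Forward transform: keep the base pixel A, store the differences B-A, C-A, D-A.
    static int[] encode(int a, int b, int c, int d) {
        return new int[] {a, b - a, c - a, d - a};
    }

    // Inverse transform: add the base value back onto each difference.
    static int[] decode(int[] x) {
        return new int[] {x[0], x[0] + x[1], x[0] + x[2], x[0] + x[3]};
    }

    public static void main(String[] args) {
        int[] x = encode(120, 130, 125, 120);
        System.out.printf("X0=%d X1=%d X2=%d X3=%d%n", x[0], x[1], x[2], x[3]);   // 120 10 5 0
        int[] p = decode(x);
        System.out.printf("Decoded block: %d %d %d %d%n", p[0], p[1], p[2], p[3]);
    }
}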
40
Lempel-Ziv-Welch (LZW) Coding (interpixel redundancy)
Requires no a priori knowledge of the pixel probability distribution.
Assigns fixed-length code words to variable-length sequences.
Used in the GIF, TIFF and PDF file formats.
41
ALGORITHM - LZW Compression
BEGIN
  s = next input character;
  while not EOF {
    c = next input character;
    if s + c exists in the dictionary
      s = s + c;
    else {
      output the code for s;
      add string s + c to the dictionary with a new code;
      s = c;
    }
  }
  output the code for s;
END
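A direct Java rendering of this pseudocode (a sketch; the class name is arbitrary, and the starting dictionary is the three-symbol table used in the example on the next slide):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwEncoder {
    static List<Integer> compress(String input, Map<String, Integer> dict) {
        int nextCode = dict.size() + 1;                  // codes in the example start at 1
        List<Integer> output = new ArrayList<>();
        String s = String.valueOf(input.charAt(0));      // s = first input character
        for (int i = 1; i < input.length(); i++) {
            String c = String.valueOf(input.charAt(i));  // c = next input character
            if (dict.containsKey(s + c)) {
                s = s + c;                               // keep extending the recognized string
            } else {
                output.add(dict.get(s));                 // output the code for s
                dict.put(s + c, nextCode++);             // add s + c to the dictionary
                s = c;
            }
        }
        output.add(dict.get(s));                         // output the code for the final s
        return output;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = new HashMap<>();
        dict.put("A", 1); dict.put("B", 2); dict.put("C", 3);
        System.out.println(compress("ABABBABCABABBA", dict));   // [1, 2, 4, 5, 2, 3, 4, 6, 1]
    }
}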
42
Example: LZW compression for the string "ABABBABCABABBA". Let's start with a very simple dictionary (also referred to as a "string table"), initially containing only 3 characters, with codes as follows: A = 1, B = 2, C = 3. Now if the input string is "ABABBABCABABBA", the LZW compression algorithm works as follows:
43
The output codes are: 1 2 4 5 2 3 4 6 1. Instead of sending 14 characters, only 9 codes need to be sent (compression ratio = 14/9 = 1.56).
44
Example 2: Consider the following 4x4, 8-bit image:
39  39  126  126
39  39  126  126
39  39  126  126
39  39  126  126
A 512-word dictionary with the following starting content is assumed:
Dictionary location   Entry
0                     0
1                     1
...                   ...
255                   255
256 - 511             (initially unused)
Table 1: LZW coding of the image
s (recognized sequence)   c (pixel)   Output   Code   Dictionary entry
39                        39          39       256    39-39
39                        126         39       257    39-126
126                       126         126      258    126-126
126                       39          126      259    126-39
39                        39
39-39                     126         256      260    39-39-126
126                       126
126-126                   39          258      261    126-126-39
39                        39
39-39                     126
39-39-126                 126         260      262    39-39-126-126
126                       39
126-39                    39          259      263    126-39-39
39                        126
39-126                    126         257      264    39-126-126
126                                   126
45
The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner. Each successive gray-level value is concatenated with a variable, column 1 of Table 1, called the currently recognized sequence. At the conclusion of the coding, the dictionary contains nine additional code words (locations 256 to 264), and the LZW algorithm has successfully identified several repeating gray-level sequences. The encoded output is obtained by reading the third column from top to bottom. The resulting compression ratio is 1.42:1 (16 pixels x 8 bits = 128 bits uncompressed versus 10 codes x 9 bits = 90 bits).
46
LZW Decompression
47
LZW Decompression Example: the received codes 1 2 4 5 2 3 4 6 1 decode back to ABABBABCABABBA.
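A matching Java decoder sketch (it rebuilds the same dictionary on the fly; the one subtle case, a received code that is not yet in the dictionary, is resolved as s plus the first character of s, although that case does not occur in this particular example):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwDecoder {
    static String decompress(List<Integer> codes, Map<Integer, String> dict) {
        int nextCode = dict.size() + 1;
        StringBuilder out = new StringBuilder();
        String s = dict.get(codes.get(0));
        out.append(s);
        for (int i = 1; i < codes.size(); i++) {
            int k = codes.get(i);
            // If k is not yet in the dictionary, it must encode s + first character of s.
            String entry = dict.containsKey(k) ? dict.get(k) : s + s.charAt(0);
            out.append(entry);
            dict.put(nextCode++, s + entry.charAt(0));   // same entry the encoder would have added
            s = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> dict = new HashMap<>();
        dict.put(1, "A"); dict.put(2, "B"); dict.put(3, "C");
        System.out.println(decompress(List.of(1, 2, 4, 5, 2, 3, 4, 6, 1), dict));   // ABABBABCABABBA
    }
}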
48
Exercises Use LZW to trace encoding the string ABRACADABRA. Write a Java program that encodes a given string using LZW. Write a Java program that decodes a given set of encoded codewords using LZW.