1
Lecture 4: Data Compression Techniques
TSBK01 Image Coding and Data Compression
Jörgen Ahlberg, Div. of Sensor Technology, Swedish Defence Research Agency (FOI)
2
Outline
Huffman coding
Arithmetic coding
  Application: JBIG
Universal coding
LZ-coding: LZ77, LZ78, LZW
  Applications: GIF and PNG
3
Repetition
Coding: assigning binary codewords to (blocks of) source symbols.
Variable-length codes (VLC) and fixed-length codes.
Instantaneous codes ⊂ uniquely decodable codes ⊂ non-singular codes ⊂ all codes.
Tree codes are instantaneous.
Tree code, Kraft's Inequality.
4
Creating a Code: The Data Compression Problem
Assume a source with an alphabet A and known symbol probabilities {p_i}.
Goal: choose the codeword lengths l_i so as to minimize the bitrate, i.e., the average number of bits per symbol ∑ l_i p_i.
Trivial solution: l_i = 0 for all i.
Restriction: we want an instantaneous code, so Kraft's Inequality ∑ 2^(-l_i) ≤ 1 must hold.
Solution (at least in theory): l_i = -log p_i.
5
In practice…
Use some nice algorithm to find the code tree:
–Huffman coding
–Tunstall coding
6
Huffman Coding
Two-step algorithm:
1. Iterate: merge the two least probable symbols, then sort.
2. Assign bits.
(Figure) Example: symbols a, b, c, d with probabilities 0.5, 0.25, 0.125, 0.125 are repeatedly merged and re-sorted; assigning bits along the resulting tree gives the codewords 0, 10, 110, 111.
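To make the merge-and-sort procedure concrete, here is a minimal sketch in Python (not from the lecture; the symbols and probabilities follow the example above):

```python
import heapq
from typing import Dict

def huffman_code(probs: Dict[str, float]) -> Dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least probable nodes."""
    # Heap entries: (probability, tie-breaking counter, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)      # least probable node
        p1, _, codes1 = heapq.heappop(heap)      # second least probable node
        merged = {sym: "0" + cw for sym, cw in codes0.items()}         # one branch gets '0'
        merged.update({sym: "1" + cw for sym, cw in codes1.items()})   # the other gets '1'
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```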
7
Coding of the BMS
Trick: code blocks of symbols (extended source).
Example: p_1 = 1/4, p_2 = 3/4. Applying the Huffman algorithm directly: 1 bit/symbol.
Coding blocks of two symbols:
Block  P(block)  Code
00     9/16      0
01     3/16      10
10     3/16      110
11     1/16      111
⇒ approx 0.85 bits/symbol
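A quick sanity check of the rate on this slide (a sketch, assuming the symbol with probability 3/4 is written as 0):

```python
# Blocks of two symbols from a binary memoryless source with P(0) = 3/4, P(1) = 1/4.
block_probs = {"00": 9/16, "01": 3/16, "10": 3/16, "11": 1/16}
code_lengths = {"00": 1, "01": 2, "10": 3, "11": 3}   # codewords 0, 10, 110, 111

bits_per_block = sum(p * code_lengths[b] for b, p in block_probs.items())
print(bits_per_block / 2)   # 0.84375 bits per source symbol, i.e. roughly 0.85
```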
8
Huffman Coding: Pros and Cons
+ Fast implementations.
+ Error resilient: resynchronizes in ~ l² steps.
– The code tree grows exponentially when the source is extended.
– The symbol probabilities are built into the code.
Hard to use Huffman coding for extended sources / large alphabets, or when the symbol probabilities vary over time.
9
Arithmetic Coding (Shannon-Fano-Elias)
Basic idea: split the interval [0,1] according to the symbol probabilities.
Example: A = {a, b, c, d}, P = {1/2, 1/4, 1/8, 1/8}.
10
(Figure) Example: a Markov source with states a, b, c and transition probabilities between them. Start in b. Code the sequence c c a ⇒ code the interval [0.9, 0.96]. The figure's table shows the decoder side: as the bits 1 1 1 0 1 arrive, the decoder's interval narrows (0.5–1, 0.75–1, 0.875–1, 0.875–0.9375, 0.90625–0.9375), and symbols are output as soon as they are uniquely determined (c after the first bit, then c a).
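A minimal sketch of the interval-narrowing step in Python (not from the lecture). It uses a fixed distribution, the one from the example two slides back, rather than the Markov model of the figure:

```python
from typing import Dict, List, Tuple

def narrow_interval(sequence: List[str], probs: Dict[str, float]) -> Tuple[float, float]:
    """Return the subinterval of [0, 1) representing the sequence; its width
    equals the product of the symbol probabilities."""
    low, high = 0.0, 1.0
    for symbol in sequence:
        width = high - low
        cumulative = 0.0
        for sym, p in probs.items():          # dict order defines the sub-intervals
            if sym == symbol:
                high = low + (cumulative + p) * width
                low = low + cumulative * width
                break
            cumulative += p
    return low, high

P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(narrow_interval(["a", "b", "a"], P))    # (0.25, 0.3125), width 1/2 * 1/4 * 1/2
```

Any number inside the final interval identifies the sequence; the shorter the interval, the more bits are needed to point into it.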
11
An Image Coding Application
Consider the image content in a local environment of a pixel as a state in a Markov model. Such an environment is called a context.
Example (binary image, figure): the current pixel X together with a template of neighbouring pixels (shown with values 0 0 1 1 0) forms the context.
A probability distribution for X can be estimated for each state. Then arithmetic coding is used.
This is the basic idea behind the JBIG algorithm for binary images and data.
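As an illustration (not the exact JBIG template), a context index for a binary image could be formed like this in Python; the 5-pixel causal template is an assumption:

```python
def context_index(image, row, col):
    """Number the context of pixel (row, col) using already-coded neighbours:
    two pixels to the left and three pixels in the row above.
    Out-of-image neighbours count as 0."""
    def px(r, c):
        inside = 0 <= r < len(image) and 0 <= c < len(image[0])
        return image[r][c] if inside else 0

    neighbours = [px(row, col - 2), px(row, col - 1),
                  px(row - 1, col - 1), px(row - 1, col), px(row - 1, col + 1)]
    index = 0
    for bit in neighbours:
        index = (index << 1) | bit
    return index   # 0..31: one estimate of P(X = 1 | context) is kept per index

image = [[0, 1, 1, 0],
         [0, 0, 0, 0]]
print(context_index(image, 1, 2))   # 6, from the neighbour values 0 0 1 1 0
```

Each context keeps its own symbol statistics, and the estimate for the current context drives the arithmetic coder for the pixel X.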
12
Flushing the Coder
The coding process is ended (restarted) and the coder flushed
–after a given number of symbols (FIVO), or
–when the interval is too small for a fixed number of output bits (VIFO).
13
Universal Coding
A universal coder doesn't need to know the statistics in advance; instead, they are estimated from the data.
Forward estimation: estimate the statistics in a first pass and transmit them to the decoder.
Backward estimation: estimate from already transmitted (received) symbols.
14
Universal Coding: Examples
1. An adaptive arithmetic coder
2. An adaptive dictionary technique – the LZ coders [Sayood 5]
3. An adaptive Huffman coder [Sayood 3.4]
(Figure: an adaptive arithmetic coder is an arithmetic coder driven by a statistics estimation block.)
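A sketch of how the statistics-estimation block can work with backward estimation (Python; the add-one initialisation is an assumption). Because encoder and decoder update after each coded symbol, no statistics need to be transmitted:

```python
from collections import Counter

class AdaptiveModel:
    """Backward estimation: probabilities from the counts of already coded symbols."""

    def __init__(self, alphabet):
        self.counts = Counter({s: 1 for s in alphabet})   # start from add-one counts

    def probability(self, symbol):
        return self.counts[symbol] / sum(self.counts.values())

    def update(self, symbol):
        # Called by encoder and decoder after each symbol, keeping them in sync.
        self.counts[symbol] += 1

model = AdaptiveModel("ab")
for s in "aababa":
    p = model.probability(s)   # fed to the arithmetic coder before coding s
    model.update(s)
print(model.counts)            # Counter({'a': 5, 'b': 3})
```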
15
Ziv-Lempel Coding (ZL or LZ)
Named after J. Ziv and A. Lempel (1977).
Adaptive dictionary technique:
–Store previously coded symbols in a buffer.
–Search for the current sequence of symbols to code.
–If found, transmit buffer offset and length.
16
LZ77
(Figure) The previously coded symbols are kept in a search buffer and the upcoming symbols in a look-ahead buffer; the longest match found in the search buffer is transmitted to the decoder as a triplet <offset, length, next symbol>.
If the size of the search buffer is N and the size of the alphabet is M, we need roughly 2⌈log₂ N⌉ + ⌈log₂ M⌉ bits to code a triplet (offset, match length, next symbol).
Variation: use a VLC to code the triplets!
Used in: PKZip, Zip, Lharc, PNG, gzip, ARJ
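A greedy LZ77 encoder sketch in Python (not from the lecture; the buffer sizes and the handling of a match that runs to the end of the input are illustrative choices):

```python
def lz77_encode(data: str, search_size: int = 8, lookahead_size: int = 8):
    """Emit (offset, length, next symbol) triplets using a sliding search buffer."""
    triplets = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Try every starting position in the search buffer, keep the longest match.
        for start in range(max(0, pos - search_size), pos):
            length = 0
            while (pos + length < len(data) and length < lookahead_size - 1
                   and data[start + length] == data[pos + length]):
                length += 1          # matches may run past pos (overlapping copy)
            if length > best_length:
                best_offset, best_length = pos - start, length
        next_symbol = data[pos + best_length] if pos + best_length < len(data) else ""
        triplets.append((best_offset, best_length, next_symbol))
        pos += best_length + 1
    return triplets

print(lz77_encode("abababab"))   # [(0, 0, 'a'), (0, 0, 'b'), (2, 6, '')]
```

The last triplet copies six symbols from an offset of only two: the overlapping-match trick that lets LZ77 code runs longer than the matched source.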
17
Drawback with LZ77
Repetitive patterns with a period longer than the search buffer size are not found.
If the search buffer size is 4, the sequence
a b c d e a b c d e a b c d e a b c d e …
will be expanded, not compressed.
18
LZ78
Store patterns in a dictionary.
Transmit a tuple <dictionary index, next symbol>.
19
LZ78
Input sequence: a b c a b a b c
Output tuples <dictionary index, next symbol>, transmitted to the decoder: <0,a> <0,b> <0,c> <1,b> <4,c>
Dictionary built up (identically at encoder and decoder): 1 a, 2 b, 3 c, 4 ab, 5 abc
Decoded: a b c ab abc
A strategy is needed for limiting the dictionary size!
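A minimal LZ78 encoder sketch in Python (not from the lecture); run on the input above, it reproduces the tuples on this slide:

```python
def lz78_encode(data: str):
    """Emit (dictionary index, next symbol) tuples; index 0 means 'empty prefix'."""
    dictionary = {}           # pattern -> index (1, 2, 3, ...)
    tuples = []
    current = ""
    for symbol in data:
        if current + symbol in dictionary:
            current += symbol                       # keep extending the known pattern
        else:
            tuples.append((dictionary.get(current, 0), symbol))
            dictionary[current + symbol] = len(dictionary) + 1   # new dictionary entry
            current = ""
    if current:                                     # input ended inside a known pattern
        tuples.append((dictionary[current], ""))
    return tuples

print(lz78_encode("abcababc"))   # [(0, 'a'), (0, 'b'), (0, 'c'), (1, 'b'), (4, 'c')]
```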
20
LZW
Modification to LZ78 by Terry Welch, 1984.
Applications: GIF, v42bis.
Patented by UniSys Corp.
Transmit only the dictionary index.
The alphabet is stored in the dictionary in advance.
21
LZW
Input sequence: a b c a b a b c
Initial dictionary (stored in advance at encoder and decoder): 1 a, 2 b, 3 c, 4 d
Output: dictionary indices only. Transmitted so far: 1 2 3 5 5
Encoder dictionary so far: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba, 9 abc
Decoded so far: a b c ab ab
Decoder dictionary so far: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba (one entry behind the encoder)
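A corresponding LZW encoder sketch in Python (not from the lecture; the four-symbol alphabet matches the slide). Unlike LZ78, only indices are emitted:

```python
def lzw_encode(data: str, alphabet: str = "abcd"):
    """Emit dictionary indices only; the alphabet is pre-loaded into the dictionary."""
    dictionary = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
    indices = []
    w = ""
    for symbol in data:
        if w + symbol in dictionary:
            w += symbol                        # keep extending the match
        else:
            indices.append(dictionary[w])      # code the longest known prefix
            dictionary[w + symbol] = len(dictionary) + 1    # grow the dictionary
            w = symbol
    if w:
        indices.append(dictionary[w])          # flush the final match
    return indices

print(lzw_encode("abcababc"))   # [1, 2, 3, 5, 5, 3]
```

The decoder rebuilds the same dictionary from the indices alone, staying one entry behind the encoder as in the snapshot above.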
22
And now for some applications: GIF & PNG
23
GIF
CompuServe Graphics Interchange Format (1987, 89). Features:
–Designed for uploading/downloading images to/from BBSes via the PSTN.
–1-, 4-, or 8-bit colour palettes.
–Interlace for progressive decoding (four passes, starts with every 8th row).
–Transparent colour for non-rectangular images.
–Supports multiple images in one file (”animated GIFs”).
24
GIF: Method
Compression by LZW.
Dictionary size 2^(b+1) 8-bit symbols, where b is the number of bits in the palette.
Dictionary size is doubled if filled (max 4096).
Works well on computer-generated images.
25
GIF: Problems
Unsuitable for natural images (photos):
–Maximum 256 colors (⇒ bad quality).
–Repetitive patterns uncommon (⇒ bad compression).
LZW patented by UniSys Corp.
Alternative: PNG
26
PNG: Portable Network Graphics
Designed to replace GIF. Some features:
–Indexed or true-colour images (≤ 16 bits per plane).
–Alpha channel.
–Gamma information.
–Error detection.
–No support for multiple images in one file. Use MNG for that.
Method:
–Compression by LZ77 using a 32 KB search buffer.
–The LZ77 triplets are Huffman coded.
More information: www.w3.org/TR/REC-png.html
27
Summary
Huffman coding
–Simple, easy, fast
–Complexity grows exponentially with the block length
–Statistics built into the code
Arithmetic coding
–Complexity grows linearly with the block size
–Easily adapted to variable statistics ⇒ used for coding of Markov sources
Universal coding
–Adaptive Huffman or arithmetic coder
–LZ77: Buffer with previously sent sequences
–LZ78: Dictionary instead of buffer
–LZW: Modification to LZ78
28
Summary, cont.
Where are the algorithms used?
–Huffman coding: JPEG, MPEG, PNG, …
–Arithmetic coding: JPEG, JBIG, MPEG-4, …
–LZ77: PNG, PKZip, Zip, gzip, …
–LZW: compress, GIF, v42bis, …
29
Finally
These methods work best if the source alphabet is small and the distribution skewed:
–Text
–Graphics
Analog sources (images, sound) require other methods:
–complex dependencies
–accepted distortion