Lecture 10: Dictionary Coding Thinh Nguyen Oregon State University
Outline: LZ77, LZ78, LZW, Applications
Review of Entropy Coding
Source with symbol probabilities: a 0.5, b 0.3, c 0.2.
Minimize the number of bits to code a, b, c based on the statistical properties of the source.
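As a small illustration of what "statistical properties" buys us, the sketch below computes the Shannon entropy of the three-symbol source and compares it with the average length of one possible prefix code. The probabilities are read from the slide as a = 0.5, b = 0.3, c = 0.2, and the code table (a = 0, b = 10, c = 11) is an assumption chosen for the example, not something the slide specifies.

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Probabilities as read from the slide (assumed mapping of values to symbols).
source = {"a": 0.5, "b": 0.3, "c": 0.2}

# Hypothetical prefix code for illustration only.
code = {"a": "0", "b": "10", "c": "11"}

h = entropy(source.values())
avg_len = sum(source[s] * len(code[s]) for s in source)

print(f"entropy     = {h:.4f} bits/symbol")
print(f"avg. length = {avg_len:.4f} bits/symbol")
```

The average code length cannot drop below the entropy for a symbol-by-symbol code; the gap between the two numbers is the redundancy of this particular code.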
Dictionary Coding
Dictionary (index, pattern): 1 a, 2 b, 3 ab, …, n abc.
The encoder codes the index of a pattern and sends the indices to the decoder.
Both encoder and decoder are assumed to have the same dictionary (table).
Ziv-Lempel Coding (ZL or LZ) Named after J. Ziv and A. Lempel (1977). Adaptive dictionary technique. Store previously coded symbols in a buffer. Search for the current sequence of symbols to code. If found, transmit buffer offset and length.
LZ77
[Figure: sliding window divided into a search buffer and a look-ahead buffer.]
Output triplet <offset, length, next>.
Transmitted to decoder: 8 3 d e 1 2 f
If the size of the search buffer is N and the size of the alphabet is M, we need $\lceil \log_2 N \rceil$ bits for the offset and $\lceil \log_2 M \rceil$ bits for the next symbol, plus the bits for the length field, to code a triplet with a fixed-length code.
Used in: PKZip, Zip, Lharc, PNG, gzip, ARJ.
Variation: Use a VLC to code the triplets!
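The triplet scheme can be sketched in a few lines. This is a minimal, unoptimized illustration, not the method used by any of the tools listed: the function names, the brute-force search, and the default buffer sizes are assumptions made for the example. A triplet with offset 0 and length 0 means "no match, only the next symbol".

```python
def lz77_encode(data, search_size=8, lookahead_size=4):
    """Encode a string as a list of (offset, length, next) triplets."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Keep one symbol in reserve so 'next' always exists.
        max_len = min(lookahead_size - 1, len(data) - i - 1)
        for j in range(max(0, i - search_size), i):
            length = 0
            # The match may run into the look-ahead buffer (overlap).
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triplets):
    """Rebuild the string by copying from already decoded output."""
    out = []
    for off, length, nxt in triplets:
        for _ in range(length):
            out.append(out[-off])  # symbol-by-symbol copy handles overlap
        out.append(nxt)
    return "".join(out)

text = "abcabcabcd"
triplets = lz77_encode(text)
print(triplets)
print(lz77_decode(triplets) == text)
```

Copying one symbol at a time in the decoder is what allows a match to extend past the end of the search buffer, for example when a single symbol is repeated many times.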
Drawback with LZ77
Repetitive patterns with a period longer than the search buffer size are not found.
If the search buffer size is 4, the sequence a b c d e a b c d e a b c d e a b c d e … will be expanded, not compressed.
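The effect can be checked by counting triplets for the period-5 sequence with two search buffer sizes. This is a sketch under the same assumptions as a brute-force LZ77 encoder; the function name and the look-ahead size of 8 are hypothetical choices for the demonstration.

```python
def lz77_triplet_count(data, search_size, lookahead_size=8):
    """Count how many <offset, length, next> triplets LZ77 would emit."""
    i, count = 0, 0
    while i < len(data):
        best_len = 0
        max_len = min(lookahead_size - 1, len(data) - i - 1)
        for j in range(max(0, i - search_size), i):
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            best_len = max(best_len, length)
        i += best_len + 1
        count += 1
    return count

data = "abcde" * 4  # period 5

small = lz77_triplet_count(data, search_size=4)  # window shorter than the period
large = lz77_triplet_count(data, search_size=8)  # window longer than the period

print(len(data), small, large)
```

With a search buffer of 4 no match is ever found, so every symbol costs a full triplet and the output is larger than the input. Once the buffer covers one full period, the count drops sharply.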
LZ78 Store patterns in a dictionary Transmit a tuple <dictionary index, next>
LZ78 example
Output tuple <dictionary index, next>.
Transmitted to decoder: a b c 1 b 4 c
Decoded: a b c a b a b c
Dictionary: 1 a, 2 b, 3 c, 4 ab, 5 abc
Strategy needed for limiting dictionary size!
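The example can be reproduced with a short sketch. The function names are illustrative, and index 0 is used here (as an assumption) to mean "no dictionary entry", which corresponds to the tuples in the example that carry only a symbol.

```python
def lz78_encode(data):
    """Encode a string as (dictionary index, next symbol) tuples."""
    dictionary = {}  # pattern -> index, starting at 1
    out, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch  # keep extending the match
        else:
            out.append((dictionary.get(current, 0), ch))
            dictionary[current + ch] = len(dictionary) + 1
            current = ""
    if current:
        # Input ended inside a known pattern: emit its index, no next symbol.
        out.append((dictionary[current], ""))
    return out

def lz78_decode(tuples):
    """Rebuild the string while growing the same dictionary as the encoder."""
    patterns = [""]  # index 0 is the empty pattern
    out = []
    for index, ch in tuples:
        pattern = patterns[index] + ch
        patterns.append(pattern)
        out.append(pattern)
    return "".join(out)

tuples = lz78_encode("abcababc")
print(tuples)
print(lz78_decode(tuples))
```

Nothing in this sketch bounds the dictionary, which is exactly the issue the slide flags: a practical coder needs a policy such as freezing or resetting the dictionary once it is full.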
LZW
Modification to LZ78 by Terry Welch, 1984.
Applications: GIF, V.42bis.
Patented by Unisys Corp.
Transmit only the dictionary index.
The alphabet is stored in the dictionary in advance.
LZW example
Input sequence: a b c a b a b c …
Output: dictionary index.
Transmitted: 1 2 3 5 5
Decoded: a b c a b a b
Encoder dictionary: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba, 9 abc
Decoder dictionary: 1 a, 2 b, 3 c, 4 d, 5 ab, 6 bc, 7 ca, 8 aba
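A minimal sketch of LZW with the alphabet {a, b, c, d} preloaded as entries 1 to 4 reproduces the indices in the example. The function names and the 1-based numbering are assumptions made to match the slide; the dictionary here is unbounded, unlike in real applications.

```python
def lzw_encode(data, alphabet):
    """Encode a string as a list of dictionary indices."""
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    out, current = [], ""
    for ch in data:
        if current + ch in dictionary:
            current += ch
        else:
            out.append(dictionary[current])
            dictionary[current + ch] = len(dictionary) + 1
            current = ch
    if current:
        out.append(dictionary[current])  # flush the last pattern
    return out

def lzw_decode(codes, alphabet):
    """Decode indices, rebuilding the dictionary one step behind the encoder."""
    if not codes:
        return ""
    table = {i: s for i, s in enumerate(alphabet, start=1)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # Index not yet known to the decoder: it must be prev + prev[0].
            entry = prev + prev[0]
        table[len(table) + 1] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

codes = lzw_encode("abcabab", "abcd")
print(codes)
print(lzw_decode(codes, "abcd"))
```

The decoder adds each entry one step later than the encoder, which is why the decoder dictionary in the example stops at entry 8 while the encoder already holds entry 9.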
And now for some applications: GIF & PNG
GIF
CompuServe Graphics Interchange Format (1987, 1989).
Features:
Designed for up/downloading images to/from BBSes via PSTN.
1-, 4-, or 8-bit colour palettes.
Interlace for progressive decoding (four passes, starts with every 8th row).
Transparent colour for non-rectangular images.
Supports multiple images in one file ("animated GIFs").
GIF: Method
Compression by LZW.
Dictionary size $2^{b+1}$ 8-bit symbols, where b is the number of bits in the palette.
Dictionary size doubled if filled (max 4096).
Works well on computer generated images.
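The doubling rule can be tabulated with a short sketch. The function name is hypothetical, and the sketch only follows the sizes stated on the slide; it ignores details of the actual GIF bitstream such as special control codes.

```python
def gif_dictionary_sizes(b, max_size=4096):
    """List the dictionary sizes reached for a b-bit palette."""
    size = 2 ** (b + 1)
    sizes = [size]
    while size < max_size:
        size *= 2  # dictionary doubled when filled
        sizes.append(size)
    return sizes

for b in (1, 4, 8):
    sizes = gif_dictionary_sizes(b)
    # An index into a dictionary of 2**k entries needs k bits.
    widths = [s.bit_length() - 1 for s in sizes]
    print(b, sizes, widths)
```

For an 8-bit palette the index width therefore grows from 9 to 12 bits, at which point the dictionary has reached its maximum of 4096 entries.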
GIF: Problems
Unsuitable for natural images (photos):
Maximum 256 colors (⇒ bad quality).
Repetitive patterns uncommon (⇒ bad compression).
LZW patented by Unisys Corp.
Alternative: PNG
PNG: Portable Network Graphics
Designed to replace GIF.
Some features:
Indexed or true-colour images (≤ 16 bits per plane).
Alpha channel.
Gamma information.
Error detection.
No support for multiple images in one file. Use MNG for that.
Method:
Compression by LZ77 using a 32KB search buffer.
The LZ77 triplets are Huffman coded.
More information: www.w3.org/TR/REC-png.html
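The combination of LZ77 with Huffman coding is available in Python's standard `zlib` module, which implements the DEFLATE format that PNG uses for its compressed data. The sketch below is only a rough illustration on synthetic byte strings (assumed stand-ins for image rows), not a PNG encoder.

```python
import os
import zlib

# Synthetic "computer generated" data: one short pattern repeated many times.
repetitive = b"abcde" * 2000

# Synthetic "noisy" data: random bytes with no repeated patterns to exploit.
noisy = os.urandom(len(repetitive))

# wbits=15 selects the maximum 32 KB window.
def deflate(data):
    compressor = zlib.compressobj(level=9, wbits=15)
    return compressor.compress(data) + compressor.flush()

packed_repetitive = deflate(repetitive)
packed_noisy = deflate(noisy)

print(len(repetitive), len(packed_repetitive), len(packed_noisy))
print(zlib.decompress(packed_repetitive) == repetitive)
```

The repetitive input shrinks to a small fraction of its size, while the random input barely changes, which mirrors the contrast between computer generated images and noisy natural images.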