CSCI 3280 Tutorial 6
Outline
- Theory part of LZW
- Tree representation of LZW
- Table representation of LZW
Introduction: What is LZW? LZW (Lempel-Ziv-Welch) is a lossless compression method based on LZ78.
Basic idea
- Assume that repeated phrases occur often in the input.
- Use a code to represent each phrase.
- Build a dictionary of the phrases seen so far.
- If a phrase is found in the dictionary, use its code.
- If it is not found, add it to the dictionary and give it a new code.
- The compression ratio is very high when the input contains a lot of repetition.
Algorithm: But how do we handle a byte stream? The algorithm is similar to the one in the lecture notes. However, each node has at most 256 child nodes, one for each possible byte value. [Figure: two dictionary trees rooted at R, labelled Assignment and Lecture]
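One possible way to picture such a node is sketched below. This is only an illustration, not the assignment's required data structure: the names `TrieNode` and `make_root` are hypothetical, and a Python dict stands in for a fixed array of 256 child slots.

```python
class TrieNode:
    """One dictionary node: its code plus up to 256 children (hypothetical layout)."""

    def __init__(self, code):
        self.code = code
        self.children = {}  # byte value (0..255) -> TrieNode


def make_root():
    """Build the root R with one child per byte value, coded 0..255."""
    root = TrieNode(code=None)  # the root itself carries no code
    for byte in range(256):
        root.children[byte] = TrieNode(code=byte)
    return root


root = make_root()
print(len(root.children))            # 256
print(root.children[ord("a")].code)  # 97
```

Because the key is a byte value, no node can ever hold more than 256 children, which matches the limit stated above.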
Tree Structure Example (encoder side)
Input: a b a b c d
The root R starts with one child for each single character (a, b, c, d, ...).
Step 1: output 97; add node 256 ("ab", child b under a).
Step 2: output 97, 98; add node 257 ("ba", child a under b).
Step 3: output 97, 98, 256; add node 258 ("abc", child c under node 256).
Step 4: output 97, 98, 256, 99; add node 259 ("cd", child d under c).
Step 5: output 97, 98, 256, 99, 100. Encode complete!
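The encoder walk above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `lzw_encode_trie` and the dict-based node layout are hypothetical, the dictionary is allowed to grow without limit, and codes are returned as a list of integers rather than packed into bits.

```python
def lzw_encode_trie(data):
    """Encode bytes with a trie dictionary; returns a list of integer codes."""
    # Root level: one node per byte value. Each node is {"code", "children"}.
    root = {b: {"code": b, "children": {}} for b in range(256)}
    next_code = 256
    codes = []
    node = None  # node for the longest phrase matched so far

    for byte in data:
        if node is None:
            node = root[byte]
        elif byte in node["children"]:
            # The phrase extended by this byte is already in the tree: go down.
            node = node["children"][byte]
        else:
            # Not found: output the current phrase and add the extended one.
            codes.append(node["code"])
            node["children"][byte] = {"code": next_code, "children": {}}
            next_code += 1
            node = root[byte]  # restart matching from this byte

    if node is not None:
        codes.append(node["code"])  # flush the last phrase
    return codes


print(lzw_encode_trie(b"ababcd"))  # [97, 98, 256, 99, 100]
```

The printed codes match the encoder output of the example, and the nodes added along the way are 256 to 259 in the same order.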
Tree Structure Example (decoder side)
Input: 97, 98, 256, 99, 100
The root R starts with one child for each single character (a, b, c, d, ...).
Step 1: read 97; output: a.
Step 2: read 98; output: a b; last string = a; add node 256 ("ab", child b under a).
Step 3: read 256; output: a b a b; last string = b; add node 257 ("ba", child a under b).
Step 4: read 99; output: a b a b c; last string = ab; add node 258 ("abc", child c under node 256).
Step 5: read 100; output: a b a b c d; last string = c; add node 259 ("cd", child d under c).
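A decoder that mirrors the tree can store, for every code, a link to its parent node and the character on the edge, then rebuild a string by walking up to the root. The sketch below assumes that layout; the names `lzw_decode_tree` and `string_of` are hypothetical, and the dictionary size is left unlimited.

```python
def lzw_decode_tree(codes):
    """Decode a list of LZW codes using parent links (code -> (parent, byte))."""
    # Single-byte nodes hang directly off the root, so their parent is None.
    parent = {b: (None, b) for b in range(256)}
    next_code = 256

    def string_of(code):
        # Walk from the node up to the root, then reverse the collected bytes.
        out = bytearray()
        while code is not None:
            prev, byte = parent[code]
            out.append(byte)
            code = prev
        out.reverse()
        return bytes(out)

    output = bytearray()
    last = None  # code of the "last string"
    for code in codes:
        if code in parent:
            s = string_of(code)
        elif code == next_code and last is not None:
            # Code not in the tree yet: last string plus its own first char.
            prev = string_of(last)
            s = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        if last is not None:
            # New node: child of the last string, labelled with s's first char.
            parent[next_code] = (last, s[0])
            next_code += 1
        output += s
        last = code
    return bytes(output)


print(lzw_decode_tree([97, 98, 256, 99, 100]))  # b'ababcd'
```

Storing only a parent link and one byte per node keeps each entry small, at the cost of rebuilding the string on every lookup.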
Algorithm: Now let’s see a table structure example.
Table Structure Compression
IN: a b c c c d c c d
Initial table: codes 0 to 255 hold the single characters (0 = NUL, ..., 97 = a, 98 = b, 99 = c, 100 = d, ..., 255 = ASCII-255).

Prefix   Char.   Search   Code   Saved    OUT
NULL     'a'     "a"      97
"a"      'b'     "ab"     256    "ab"     97
"b"      'c'     "bc"     257    "bc"     98
"c"      'c'     "cc"     258    "cc"     99
"c"      'c'     "cc"     258
"cc"     'd'     "ccd"    259    "ccd"    258
"d"      'c'     "dc"     260    "dc"     100
"c"      'c'     "cc"     258
"cc"     'd'     "ccd"    259
(end of input)                            259

A row with a Saved entry means the search failed: the new string is saved under the code shown and the code of the prefix is written to OUT. A row without a Saved entry means the search succeeded and the found string becomes the next prefix.
Entries added to the table: 256 "ab", 257 "bc", 258 "cc", 259 "ccd", 260 "dc".
OUT: 97, 98, 99, 258, 100, 259
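The table-based compression steps can be sketched with a plain lookup table. This is a minimal sketch, assuming an unbounded table and integer codes in a list; the name `lzw_encode_table` is hypothetical and not taken from the tutorial.

```python
def lzw_encode_table(data):
    """Table-based LZW encoding; returns (codes, table)."""
    # Codes 0..255 are preloaded with the single-byte strings.
    table = {bytes([b]): b for b in range(256)}
    next_code = 256
    prefix = b""
    out = []

    for b in data:
        search = prefix + bytes([b])
        if search in table:
            prefix = search               # found: keep extending the prefix
        else:
            out.append(table[prefix])     # output the code of the prefix
            table[search] = next_code     # save the new string
            next_code += 1
            prefix = bytes([b])           # restart from the current char

    if prefix:
        out.append(table[prefix])         # flush the final prefix
    return out, table


codes, table = lzw_encode_table(b"abcccdccd")
print(codes)           # [97, 98, 99, 258, 100, 259]
print(table[b"ccd"])   # 259
```

The run reproduces the OUT sequence of the example and the saved entries 256 to 260.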
Table Structure Decompression
IN: 97, 98, 99, 258, 100, 259
Initial table: codes 0 to 255 hold the single characters (0 = NUL, ..., 97 = a, 98 = b, 99 = c, 100 = d, ..., 255 = ASCII-255).

cW     pW     C      dict(pW)+C   New entry    OUT
97                                             a
98     97     'b'    "ab"         256 "ab"     b
99     98     'c'    "bc"         257 "bc"     c
258    99     'c'    "cc"         258 "cc"     c c   (exception)
100    258    'd'    "ccd"        259 "ccd"    d
259    100    'c'    "dc"         260 "dc"     c c d

OUT: a b c c c d c c d
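Handling of the exception case: Usually C is the first char of the current word (cW). In the exception case, where cW is not yet in the table, C is the first char of the previous word (pW), and the string to output is dict(pW)+C.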
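The two cases for C can be sketched as follows. This is an illustrative sketch only: the name `lzw_decode_table` is hypothetical, the variables `cw`, `pw` and `c` follow the cW, pW and C columns of the decompression example, and the table is assumed to grow without limit.

```python
def lzw_decode_table(codes):
    """Table-based LZW decoding with the exception case handled explicitly."""
    table = {b: bytes([b]) for b in range(256)}  # codes 0..255 preloaded
    next_code = 256
    out = bytearray()
    pw = None  # previous code

    for cw in codes:
        if cw in table:
            word = table[cw]
            c = word[:1]              # usual case: first char of cW's string
        elif cw == next_code and pw is not None:
            c = table[pw][:1]         # exception: first char of pW's string
            word = table[pw] + c
        else:
            raise ValueError("invalid LZW code: %d" % cw)

        if pw is not None:
            table[next_code] = table[pw] + c   # new entry: dict(pW)+C
            next_code += 1
        out += word
        pw = cw
    return bytes(out)


print(lzw_decode_table([97, 98, 99, 258, 100, 259]))  # b'abcccdccd'
```

In this input, code 258 arrives before the decoder has saved it, so the exception branch runs once and produces "cc".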
Exercise: Encode and decode with the tree structure. (This helps to better understand why, in the exception case, C must be the first char of pW.) My understanding of the exercise question: for the exception case to occur, cW must be constructed along the pW branch; otherwise we will not encounter the exception case. Therefore the first char of cW and the first char of pW are the same char.