1
Lossless Compression - II Hao Jiang Computer Science Department Sept. 18, 2007
2
Properties of Huffman Coding
Huffman coding uses longer codewords for symbols with smaller probabilities and shorter codewords for symbols that occur often.
The two longest codewords differ only in the last bit.
The codewords form a prefix code and are uniquely decodable.
H ≤ Average Codeword Length < H + 1
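The prefix property can be checked mechanically. The following is a minimal sketch; the code table and the helper names `is_prefix_free` and `decode` are illustrative assumptions, not part of the slides.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # After sorting, any prefix relation shows up between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Decode a bit string with a prefix code, one bit at a time."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:  # a prefix code never needs look-ahead
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


# Hypothetical code table used only for illustration.
code = {"a": "0", "b": "10", "c": "11"}
print(is_prefix_free(code))    # True
print(decode("010110", code))  # abca
```

Because no codeword is a prefix of another, the decoder can emit a symbol as soon as the bits read so far match a codeword.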
3
Extended Huffman Coding
Huffman coding is not effective when there is a small number of symbols and the probabilities are highly skewed.
Example: A source has 2 symbols a and b, with P(a) = 0.9 and P(b) = 0.1, so H = 0.4690 bits per symbol.
With Huffman coding, the average codeword length is 1 bit per symbol (far from optimal!).
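The entropy figure in this example can be reproduced with a few lines of Python. This is a small sketch; the function name `entropy` is an assumption chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


H = entropy([0.9, 0.1])
print(round(H, 4))  # 0.469
```

With only two symbols, a Huffman code must spend one full bit on each, so the average length of 1 is more than twice this entropy.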
4
Extended Huffman Coding (cont)
We can encode a group of symbols together and get better performance.
For the previous example, the extended source has symbols {aa, ab, ba, bb} and
P(aa) = P(a)*P(a) = 0.81 => 1
P(ab) = P(a)*P(b) = 0.09 => 00
P(ba) = P(b)*P(a) = 0.09 => 011
P(bb) = P(b)*P(b) = 0.01 => 010
Now the average codeword length per symbol is 0.6450 (much better!).
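The per-symbol average can be checked by building a Huffman code over blocks of symbols. The sketch below assumes independent symbols; `huffman_lengths` and `avg_length_per_symbol` are hypothetical helper names, and the exact codewords may differ from the slide (ties can be broken differently) while the codeword lengths and the average stay the same.

```python
import heapq
from itertools import product


def huffman_lengths(probs):
    """Return {symbol: codeword length} for a dict of probabilities."""
    # Heap items: (probability, tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level down.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


def avg_length_per_symbol(single, n):
    """Average Huffman codeword length per source symbol for blocks of n."""
    block = {}
    for combo in product(single, repeat=n):
        p = 1.0
        for s in combo:
            p *= single[s]  # independence assumption
        block["".join(combo)] = p
    lengths = huffman_lengths(block)
    return sum(block[s] * lengths[s] for s in block) / n


source = {"a": 0.9, "b": 0.1}
print(round(avg_length_per_symbol(source, 1), 4))  # 1.0
print(round(avg_length_per_symbol(source, 2), 4))  # 0.645
```

Increasing `n` further moves the average closer to the entropy, at the cost of a codebook that grows exponentially with the block length.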
5
Extended Huffman Coding (cont)
Input sequence: 1223231212
Coding single symbols: P(1) = 0.3, P(2) = 0.5, P(3) = 0.2
Codewords: 1 -> 10, 2 -> 0, 3 -> 11
Average codeword length = 2 * 0.3 + 1 * 0.5 + 2 * 0.2 = 1.5
Coding pairs of symbols: P(12) = 0.6, P(23) = 0.4
Codewords: 12 -> 0, 23 -> 1
Average codeword length = (1 * 0.6 + 1 * 0.4)/2 = 0.5
In the second case, the average codeword length is smaller than the single-symbol entropy. Is this right?
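One way to explore the question is to estimate both entropies from the sequence itself. This is a sketch under the assumption that relative frequencies stand in for probabilities; `empirical_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def empirical_entropy(items):
    """Entropy in bits per item, using relative frequencies as probabilities."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


seq = "1223231212"
pairs = [seq[i:i + 2] for i in range(0, len(seq), 2)]  # 12, 23, 23, 12, 12

h_single = empirical_entropy(seq)       # bits per symbol, symbols taken alone
h_pair = empirical_entropy(pairs) / 2   # bits per symbol, symbols taken in pairs
print(round(h_single, 4), round(h_pair, 4))
```

The single-symbol entropy treats symbols as independent. In this sequence the symbols are strongly correlated (only the pairs 12 and 23 occur), so the pair-based entropy per symbol is much lower, and an average length of 0.5 stays above that lower bound.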
6
Dictionary Based
Dictionary-based methods are another way to capture the correlation between symbols.
Static dictionary
–Good when the data to be compressed is specific to some application.
–For instance, when compressing a student database, the words “Name” and “Student ID” will appear often.
A static dictionary method does not work well if the source characteristics change.
7
Adaptive Dictionary
LZ77 (Jacob Ziv and Abraham Lempel, 1977) encoder
Step n: a b c a a c d a a b c d a b b b a b
Longest match string length = 3, match position 8. Codeword generated is
Step n+1: a b c a a c d a a b c d a b b b a b
If no match, codeword generated is
8
LZ77 Decoder
a b c a a c d a a b c d
Codeword generated is
a b c a a c d a a b c d
Then move the window by 4 characters and repeat.
9
A Special Case
c d d c d c a b a b a b a d b b a b
The output codeword is
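The LZ77 encoder, the decoder, and the special case where a match runs past the end of the search window can be sketched together. The codeword layout used here, a triple (offset, length, next character), and the window size are assumptions for illustration; the exact codeword format on the slides is not reproduced in this text.

```python
def lz77_encode(data, window=16):
    """Greedy LZ77 sketch emitting (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # The match may extend past position i (overlap with look-ahead).
            # One character is always kept back to serve as next_char.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the text; copying one character at a time handles overlaps."""
    out = []
    for offset, length, ch in triples:
        for _ in range(length):
            out.append(out[-offset])
        out.append(ch)
    return "".join(out)


# Special case: the match (length 4) is longer than the offset (2).
print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (2, 4, "c")]))  # abababc

text = "cddcdcabababadbbab"
assert lz77_decode(lz77_encode(text)) == text
```

Copying character by character is what makes the overlapping case work: by the time the decoder needs a character beyond the original window, it has already written it.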
10
LZ78
LZ78 uses an explicit dictionary.
Encoding process example:
Input: a b c b a b a a a
Initial dictionary:
Codeword  Entry
1         a
2         b
3         c
Entries added during encoding:
Output  Codeword  Entry
        4         ab
        5         cb
        6         aba
        7         aa
11
LZ78 Decoding Example
Initial dictionary:
Codeword  Entry
1         a
2         b
3         c
Decoding steps:
Input  Output  Index  Entry
       ab      4      ab
       cb      5      cb
       aba     6      aba
       aa      7      aa
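The LZ78 example can be replayed in code. The sketch below assumes each codeword is a pair (dictionary index, next character) and that the dictionary is preloaded with a, b, c as in the tables; the pair format and the function names are assumptions, since the codewords themselves are not shown in this text.

```python
def lz78_encode(data, alphabet="abc"):
    """LZ78 sketch with a preloaded dictionary; emits (index, char) pairs."""
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    out, s = [], ""
    for ch in data:
        if s + ch in dictionary:
            s += ch  # keep extending the current phrase
        else:
            out.append((dictionary.get(s, 0), ch))
            dictionary[s + ch] = len(dictionary) + 1
            s = ""
    if s:  # input ended in the middle of a known phrase
        out.append((dictionary[s], ""))
    return out, dictionary


def lz78_decode(pairs, alphabet="abc"):
    """Inverse of lz78_encode under the same assumptions."""
    entries = {i + 1: ch for i, ch in enumerate(alphabet)}
    entries[0] = ""  # index 0 stands for the empty phrase
    out = []
    for index, ch in pairs:
        phrase = entries[index] + ch
        out.append(phrase)
        if ch:
            entries[len(entries)] = phrase  # next free index
    return "".join(out)


pairs, dictionary = lz78_encode("abcbabaaa")
print(pairs)  # [(1, 'b'), (3, 'b'), (4, 'a'), (1, 'a')]
print(lz78_decode(pairs))  # abcbabaaa
```

The dictionary built by the encoder ends up with entries 4 = ab, 5 = cb, 6 = aba and 7 = aa, matching the tables above.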
12
LZW Encoder
s = next input character;
while not EOF {
    c = next input character;
    if s + c is in the dictionary
        s = s + c;
    else {
        output the codeword for s;
        add s + c to the dictionary;
        s = c;
    }
}
output the codeword for s;
13
LZW encoding example
The input string: a b a b b a b c a b EOF
Initial dictionary:
Codeword  Symbol
1         a
2         b
3         c
Encoding steps:
s    c    Output  Codeword  Dictionary Item
a    b    1       4         ab
b    a    2       5         ba
a    b
ab   b    4       6         abb
b    a
ba   b    5       7         bab
b    c    2       8         bc
c    a    3       9         ca
a    b
ab   EOF  4
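The LZW encoder pseudocode translates almost line for line into Python. This is a minimal sketch assuming the dictionary starts with a = 1, b = 2, c = 3 as in the example; the function name `lzw_encode` and the `alphabet` parameter are illustrative.

```python
def lzw_encode(data, alphabet="abc"):
    """LZW encoder sketch; returns the list of output codewords."""
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    if not data:
        return []
    out = []
    s = data[0]
    for c in data[1:]:
        if s + c in dictionary:
            s += c  # extend the current match
        else:
            out.append(dictionary[s])
            dictionary[s + c] = len(dictionary) + 1  # next free codeword
            s = c
    out.append(dictionary[s])  # flush the last match at EOF
    return out


print(lzw_encode("ababbabcab"))  # [1, 2, 4, 5, 2, 3, 4]
```

Unlike LZ78, only dictionary indices are output; the character that ended each match becomes the start of the next one.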
14
LZW Decoder
s = empty string;
while ((k = next input code) != EOF) {
    if (k is in the dictionary)
        entry = dictionary entry for k;
    else
        entry = s + s[0];
    output entry;
    if (s is not empty)
        add string (s + entry[0]) to dictionary;
    s = entry;
}
15
LZW Decoding example:
The input string: 1 2 4 5 2 3 4 EOF
Initial dictionary:
Index  Symbol
1      a
2      b
3      c
Decoding steps:
s     k    Entry/Output  Codeword  Dictionary Item
NULL  1    a
a     2    b             4         ab
b     4    ab            5         ba
ab    5    ba            6         abb
ba    2    b             7         bab
b     3    c             8         bc
c     4    ab            9         ca
ab    EOF
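The decoder can be sketched in Python in the same way, again assuming an initial dictionary of a = 1, b = 2, c = 3. The function name `lzw_decode` and the explicit error for an invalid code are additions for illustration.

```python
def lzw_decode(codes, alphabet="abc"):
    """LZW decoder sketch; rebuilds the dictionary while decoding."""
    dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
    out, s = [], ""
    for k in codes:
        if k in dictionary:
            entry = dictionary[k]
        elif s and k == len(dictionary) + 1:
            # Code not yet known: it must be the entry about to be added.
            entry = s + s[0]
        else:
            raise ValueError(f"invalid code {k}")
        out.append(entry)
        if s:
            dictionary[len(dictionary) + 1] = s + entry[0]
        s = entry
    return "".join(out)


print(lzw_decode([1, 2, 4, 5, 2, 3, 4]))  # ababbabcab
print(lzw_decode([1, 4]))                 # aaa
```

The second call exercises the case where a code arrives before the decoder has created its entry: code 4 is not yet in the dictionary, so the entry is rebuilt as `s + s[0]`.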