Lossless Compression - II Hao Jiang Computer Science Department Sept. 18, 2007.

Properties of Huffman Coding
• Huffman coding uses longer codewords for symbols with smaller probabilities and shorter codewords for symbols that occur often.
• The two least probable symbols get codewords of the same (longest) length that differ only in the last bit.
• The codewords form a prefix code and are uniquely decodable.
• H ≤ average codeword length < H + 1, where H is the source entropy in bits per symbol.
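The bound H ≤ average codeword length < H + 1 can be checked numerically. The sketch below is only an illustration: the four-symbol source and the helper names `huffman_code_lengths` and `entropy` are assumptions, not part of the slides.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a Huffman code over probs."""
    # Heap items: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

def entropy(probs):
    """Entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical source, chosen only for illustration.
probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
lengths = huffman_code_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
H = entropy(probs)
print(lengths, round(avg, 4), round(H, 4))
```

For this source the two least probable symbols, c and d, end up with the longest codewords, and the average length falls between H and H + 1.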

Extended Huffman Coding
• Huffman coding is not effective when there are only a small number of symbols and the probabilities are highly skewed.
• Example: a source has 2 symbols, a and b, with P(a) = 0.9 and P(b) = 0.1. The entropy is H = 0.4690 bits/symbol, but Huffman coding gives an average codeword length of 1 bit/symbol (far from optimal!).

Extended Huffman Coding (cont)
• We can encode a group of symbols together and get better performance.
• For the previous example, an extended source has symbols {aa, ab, ba, bb} and
  P(aa) = P(a)*P(a) = 0.81 => 1
  P(ab) = P(a)*P(b) = 0.09 => 00
  P(ba) = P(b)*P(a) = 0.09 => 011
  P(bb) = P(b)*P(b) = 0.01 => 010
• Now the average codeword length per symbol is (0.81*1 + 0.09*2 + 0.09*3 + 0.01*3)/2 = 0.6450 (much better!).
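The per-symbol figure for the two-symbol source with P(a) = 0.9 and P(b) = 0.1 can be reproduced with a small sketch. It assumes symbols are independent, so block probabilities are products; the helper names `huffman_lengths` and `bits_per_symbol` are made up for this illustration.

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Return {symbol: Huffman codeword length}."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:          # one more bit for every merged symbol
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, g1 + g2))
        tie += 1
    return lengths

def bits_per_symbol(single, n):
    """Average Huffman codeword length per source symbol for blocks of n."""
    blocks = {}
    for combo in product(single, repeat=n):
        p = 1.0
        for s in combo:            # independence assumption
            p *= single[s]
        blocks["".join(combo)] = p
    lengths = huffman_lengths(blocks)
    return sum(blocks[b] * lengths[b] for b in blocks) / n

single = {"a": 0.9, "b": 0.1}
print(bits_per_symbol(single, 1))  # single-symbol Huffman
print(bits_per_symbol(single, 2))  # pairs, as on the slide
```

The codeword assignment may differ from the slide in which of ab and ba gets the 2-bit codeword, but the average length is the same. Calling `bits_per_symbol(single, 3)` extends the idea to blocks of three, at the cost of an alphabet that grows exponentially with the block size.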

Extended Huffman Coding (cont)
Sequence: 1223231212
Single symbols: P(1) = 0.3, P(2) = 0.5, P(3) = 0.2
Codewords: 1 -> 10, 2 -> 0, 3 -> 11
Average codeword length = 2 * 0.3 + 1 * 0.5 + 2 * 0.2 = 1.5
Pairs: P(12) = 0.6, P(23) = 0.4
Codewords: 12 -> 0, 23 -> 1
Average codeword length = (1 * 0.6 + 1 * 0.4)/2 = 0.5
In the second case, the average codeword length per symbol is smaller than the single-symbol entropy. Is this right?
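One way to approach the question is to compare the two entropy estimates directly. The sketch below computes the empirical entropy of the sequence 1223231212 first over single symbols and then over non-overlapping pairs; the helper name `empirical_entropy` is an assumption for illustration.

```python
import math
from collections import Counter

def empirical_entropy(items):
    """Entropy in bits per item, using relative frequencies as probabilities."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

seq = "1223231212"

# Model 1: each symbol is drawn independently.
h_single = empirical_entropy(seq)

# Model 2: the source emits non-overlapping pairs (12 or 23).
pairs = [seq[i:i + 2] for i in range(0, len(seq), 2)]
h_pair_per_symbol = empirical_entropy(pairs) / 2

print(round(h_single, 4), round(h_pair_per_symbol, 4))
```

The single-symbol entropy is about 1.485 bits/symbol, while the pair model gives about 0.485 bits/symbol. The 0.5 bits/symbol of the pair code is still above the entropy of the model it was built for, so no bound is violated: the two figures refer to different source models, and the pair model captures the correlation between neighbouring symbols.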

Dictionary Based
• Dictionary-based methods are another way to capture the correlation between symbols.
• Static dictionary
  – Good when the data to be compressed is specific to one application.
  – For instance, when compressing a student database, the words “Name” and “Student ID” will appear often.
• A static dictionary method does not work well if the source characteristics change.

Adaptive Dictionary
• LZ77 (Jacob Ziv and Abraham Lempel, 1977) encoder
Step n: a b c a a c d a a b c d a b b b a b
Longest match string length = 3. Match position = 8. Codeword generated is
Step n+1: a b c a a c d a a b c d a b b b a b
If no match, codeword generated is

• LZ77 Decoder
a b c a a c d a a b c d
Codeword generated is
a b c a a c d a a b c d
Then move the window by 4 characters and repeat.
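The LZ77 encoder and decoder can be sketched as follows. The codeword layout is an assumption, since the codewords did not survive in this transcript: each codeword here is a triple (offset back from the current position, match length, next character). The window size of 16 and the function names are also hypothetical.

```python
def lz77_encode(data, window=16):
    """Greedy LZ77: emit (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Try every start position inside the search window.
        for start in range(max(0, i - window), i):
            length = 0
            # Leave at least one character to send as next_char.
            while (i + length < len(data) - 1
                   and data[start + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - start, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1          # slide past the match and next_char
    return out

def lz77_decode(triples):
    """Rebuild the text from (offset, length, next_char) triples."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])  # copy one character at a time
        buf.append(ch)
    return "".join(buf)

text = "abcaacdaabcdabbbab"        # the slide's sequence without spaces
codes = lz77_encode(text)
print(codes)
print(lz77_decode(codes) == text)
```

With this layout a match of length 3 advances the window by 4 characters (3 copied plus the explicit next character), and a position with no match produces a triple with offset 0 and length 0.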

A Special Case
c d d c d c a b a b a b a d b b a b
The output codeword is
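A well-known special case in LZ77, and plausibly the one this slide illustrates, is a match that runs past the end of the already-decoded text, so that the match length exceeds the offset. A decoder handles it by copying one character at a time. The helper name `copy_match` and its arguments are hypothetical.

```python
def copy_match(history, offset, length):
    """Append `length` characters, each copied from `offset` positions back."""
    out = list(history)
    for _ in range(length):
        # out grows during the loop, so the copy may read
        # characters that this same match has just written.
        out.append(out[-offset])
    return "".join(out)

# Offset 2 but length 5: the match overlaps its own output.
print(copy_match("ab", offset=2, length=5))
```

Copying the whole match as one slice of the history would fail here, because only 2 of the 5 characters exist when the copy starts.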

LZ78
• LZ78 uses an explicit dictionary.
Encoding Process Example:
Input: a b c b a b a a a
Initial dictionary:
Codeword  Entry
1         a
2         b
3         c
Entries added during encoding:
Output  Codeword  Entry
        4         ab
        5         cb
        6         aba
        7         aa

LZ78 Decoding Example
Initial dictionary:
Codeword  Entry
1         a
2         b
3         c
Decoding steps:
Input  Output  Index  Entry
       ab      4      ab
       cb      5      cb
       aba     6      aba
       aa      7      aa
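The LZ78 example can be reproduced with a short sketch. Two assumptions are made: the dictionary is pre-loaded with a, b, c as in the tables above, and each output codeword is a pair (index of the longest known phrase, next character), since the output column is not preserved in this transcript. The function names are hypothetical.

```python
def lz78_encode(data, initial=("a", "b", "c")):
    """Return (pairs, dictionary); every input character must be in `initial`."""
    dictionary = {s: i + 1 for i, s in enumerate(initial)}
    out, s = [], ""
    for ch in data:
        if s + ch in dictionary:
            s += ch                          # keep extending the phrase
        else:
            out.append((dictionary[s], ch))  # known phrase + new character
            dictionary[s + ch] = len(dictionary) + 1
            s = ""
    if s:                                    # leftover phrase at end of input
        out.append((dictionary[s], ""))
    return out, dictionary

def lz78_decode(pairs, initial=("a", "b", "c")):
    """Invert lz78_encode, rebuilding the same dictionary on the fly."""
    entries = {i + 1: s for i, s in enumerate(initial)}
    out = []
    for index, ch in pairs:
        phrase = entries[index] + ch
        out.append(phrase)
        if ch:                               # a new entry per full pair
            entries[len(entries) + 1] = phrase
    return "".join(out)

pairs, dictionary = lz78_encode("abcbabaaa")
print(pairs)
print({v: k for k, v in dictionary.items()})
print(lz78_decode(pairs))
```

On the slide's input the encoder adds entries 4 to 7 (ab, cb, aba, aa), and the decoder emits the same four phrases in order.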

LZW
• Encoder
s = next input character;
while not EOF {
    c = next input character;
    if s + c is in the dictionary
        s = s + c;
    else {
        output the codeword for s;
        add s + c to the dictionary;
        s = c;
    }
}
output the codeword for s;

LZW encoding example
The input string: a b a b b a b c a b EOF
Initial dictionary:
Codeword  Symbol
1         a
2         b
3         c
Encoding steps:
s    c    Output  Codeword  Dictionary item
a    b    1       4         ab
b    a    2       5         ba
a    b
ab   b    4       6         abb
b    a
ba   b    5       7         bab
b    c    2       8         bc
c    a    3       9         ca
a    b
ab   EOF  4
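The encoder pseudocode translates almost line for line into Python. This is a minimal sketch: the function name `lzw_encode` and the `alphabet` parameter are assumptions, and codewords are numbered from 1 to match the example.

```python
def lzw_encode(data, alphabet=("a", "b", "c")):
    """Return the list of LZW codewords for `data`."""
    # Initial dictionary: one codeword per symbol, numbered from 1.
    dictionary = {sym: i + 1 for i, sym in enumerate(alphabet)}
    if not data:
        return []
    out = []
    s = data[0]
    for c in data[1:]:
        if s + c in dictionary:
            s = s + c                      # keep growing the current match
        else:
            out.append(dictionary[s])      # emit the longest known string
            dictionary[s + c] = len(dictionary) + 1
            s = c
    out.append(dictionary[s])              # flush the final match at EOF
    return out

print(lzw_encode("ababbabcab"))
```

On the example input the emitted codewords are 1 2 4 5 2 3 4, the same sequence the decoding example takes as input.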

LZW Decoder
s = empty string;
while ( (k = next input code) != EOF ) {
    entry = dictionary entry for k;
    if (k is not in the dictionary)
        entry = s + s[0];
    output entry;
    if (s is not empty)
        add string (s + entry[0]) to dictionary;
    s = entry;
}

LZW Decoding example:
The input string: 1 2 4 5 2 3 4 EOF
Initial dictionary:
Index  Symbol
1      a
2      b
3      c
Decoding steps:
s     k    Entry/Output  Codeword  Dictionary item
NULL  1    a
a     2    b             4         ab
b     4    ab            5         ba
ab    5    ba            6         abb
ba    2    b             7         bab
b     3    c             8         bc
c     4    ab            9         ca
ab    EOF
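The decoder pseudocode can be sketched in Python in the same way. The function name `lzw_decode` and the `alphabet` parameter are assumptions; the dictionary is seeded with codewords 1 to 3 as in the example.

```python
def lzw_decode(codes, alphabet=("a", "b", "c")):
    """Rebuild the text from a list of LZW codewords."""
    dictionary = {i + 1: sym for i, sym in enumerate(alphabet)}
    s = ""
    out = []
    for k in codes:
        if k in dictionary:
            entry = dictionary[k]
        else:
            # k refers to the entry the encoder created in its last step;
            # it must be s followed by the first character of s.
            entry = s + s[0]
        out.append(entry)
        if s:
            # Mirror the encoder: previous string + first char of this one.
            dictionary[len(dictionary) + 1] = s + entry[0]
        s = entry
    return "".join(out)

print(lzw_decode([1, 2, 4, 5, 2, 3, 4]))
```

The branch for a codeword that is not yet in the dictionary is exercised by inputs such as the code sequence 1 4, which an encoder with this numbering produces for the string aaa.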
