CSE 326: Huffman Coding (Richard Anderson)
Coding theory: conversion, encryption, compression. Binary coding. Variable-length coding. Example symbols: A, B, C, D, E, F (codeword table not preserved).
Decode the following. Code 1: E = 0, T = 11, N = 100, I = 1010, S = 1011. Code 2: E = 0, T = 10, N = 100, I = 0111, S = …
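A minimal decoding sketch, assuming the code is given as a symbol-to-codeword dictionary. The helper `decode` is hypothetical, not from the slides. It reads bits left to right and emits a symbol as soon as the buffered bits match a codeword. That greedy step is only safe for a prefix code, such as Code 1. The value S = 1011 is taken from the tree-construction exercise.

```python
def decode(bits: str, code: dict) -> str:
    """Decode `bits` with a prefix code given as symbol -> codeword."""
    lookup = {cw: sym for sym, cw in code.items()}  # codeword -> symbol
    out, buf = [], ""
    for b in bits:
        buf += b
        # In a prefix code, the first codeword that matches is the only possible one.
        if buf in lookup:
            out.append(lookup[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits {buf!r} do not form a codeword")
    return "".join(out)

# Code 1 (S = 1011 assumed from the tree exercise)
CODE1 = {"E": "0", "T": "11", "N": "100", "I": "1010", "S": "1011"}
print(decode("11010010010101011", CODE1))  # TENNIS
```

Code 2 breaks this strategy because T = 10 is a prefix of N = 100. The same loop would read "100" as T followed by E. A correct reading of N would require looking ahead.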
Prefix code: no prefix of a codeword is itself a codeword. Every prefix code is uniquely decodable. Example codewords for A, B, C, D, E, F (table not preserved).
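The prefix property can be checked directly. The sketch below uses a hypothetical helper `is_prefix_code`. It sorts the codewords lexicographically. If u is a prefix of some later word w, every word sorted between u and w also starts with u. For that reason, comparing each codeword only with its immediate successor is enough.

```python
def is_prefix_code(codewords) -> bool:
    """True if no codeword is a prefix of another (a duplicate also counts as a violation)."""
    words = sorted(codewords)
    # Only neighbours in sorted order need to be compared (see the argument above).
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code(["0", "11", "100", "1010", "1011"]))  # True
print(is_prefix_code(["0", "10", "100", "0111"]))          # False: 10 is a prefix of 100
```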
Prefix codes and binary trees: tree representation of prefix codes. Example code: A = 00, B = 010, C = 0110, D = 0111, E = 10, F = 11.
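This is a minimal sketch of the tree view. It assumes branch labels "0" and "1" and uses a nested-dict representation chosen for illustration. The helpers `build_tree` and `leaves` are hypothetical, not taken from the slides. Internal nodes are dicts and leaves are symbols. Each codeword is read as a root-to-leaf path. A codeword that runs into an existing leaf, or ends on an occupied slot, violates the prefix property.

```python
def build_tree(code: dict) -> dict:
    """Build a binary trie from symbol -> codeword. Internal nodes are dicts
    keyed by '0'/'1'; leaves are symbols. Assumes non-empty codewords."""
    root = {}
    for sym, cw in code.items():
        node = root
        for bit in cw[:-1]:
            child = node.setdefault(bit, {})
            if not isinstance(child, dict):  # an earlier codeword ends here
                raise ValueError(f"{child} is a prefix of {sym}")
            node = child
        if cw[-1] in node:  # slot already holds a leaf or an internal node
            raise ValueError(f"codeword of {sym} collides with another codeword")
        node[cw[-1]] = sym
    return root

def leaves(node, path=""):
    """Yield (symbol, codeword) by walking every root-to-leaf path."""
    if isinstance(node, str):
        yield node, path
        return
    for bit in sorted(node):
        yield from leaves(node[bit], path + bit)

CODE_AF = {"A": "00", "B": "010", "C": "0110", "D": "0111", "E": "10", "F": "11"}
tree = build_tree(CODE_AF)
print(dict(leaves(tree)) == CODE_AF)  # True: the tree encodes exactly this code
```

The same helper can be applied to the code E = 0, T = 11, N = 100, I = 1010, S = 1011 to check a hand-drawn tree.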
Construct the tree for the following code: E = 0, T = 11, N = 100, I = 1010, S = 1011.
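Minimum-length code. Average cost: $\sum_i p_i \, |c_i|$, where $p_i$ is the probability of symbol $i$ and $|c_i|$ is the length of its codeword. Average leaf depth: $\sum_i p_i \, d_T(i)$, where $d_T(i)$ is the depth of the leaf for symbol $i$ in tree $T$ (the same quantity, since $|c_i| = d_T(i)$). Huffman tree: a tree with minimum weighted path length $C(T) = \sum_{\text{leaves } x} w(x) \, d_T(x)$.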
Compute the average leaf depth for: A = 00 (probability 1/4), B = 010 (1/8), C = 0110 (1/16), D = 0111 (1/16), E = 1 (1/2).
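A small sketch of the weighted path length $C(T)$. It uses a hypothetical helper `weighted_path_length` and takes each leaf's depth to be its codeword length. Exact fractions keep the arithmetic readable. When the weights are probabilities summing to 1, $C(T)$ is the average leaf depth, which is also the average number of bits per symbol.

```python
from fractions import Fraction

def weighted_path_length(code: dict, weight: dict):
    """C(T) = sum over leaves of weight(leaf) * depth(leaf); depth = codeword length."""
    return sum(weight[sym] * len(cw) for sym, cw in code.items())

code = {"A": "00", "B": "010", "C": "0110", "D": "0111", "E": "1"}
prob = {"A": Fraction(1, 4), "B": Fraction(1, 8), "C": Fraction(1, 16),
        "D": Fraction(1, 16), "E": Fraction(1, 2)}
assert sum(prob.values()) == 1
print(weighted_path_length(code, prob))  # 15/8
```

The result, 15/8 = 1.875 bits per symbol, compares with 3 bits per symbol for a fixed-length code on five symbols.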
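Huffman code algorithm: derivation. In an optimal code, the two rarest items can be assumed to have the longest codewords, and their codewords can be assumed to differ only in the last bit. Idea: suppose the weights are $w_1, \ldots, w_n$, with $w_1$ and $w_2$ the smallest weights. Start with an optimal code for $w_1 + w_2, w_3, \ldots, w_n$. Extend the codeword for $w_1 + w_2$ by one bit to get codewords for $w_1$ and $w_2$.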
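The derivation translates almost line for line into recursion. This is a sketch, not the course's implementation. It merges the two smallest weights into one item, solves the smaller problem, and then extends the merged item's codeword with 0 and 1. The function name `huffman_recursive` and the use of a tuple `(x, y)` as the merged item's key are illustrative choices. Symbols are assumed to be strings.

```python
def huffman_recursive(weights: dict) -> dict:
    """weights: symbol -> weight (at least one symbol). Returns symbol -> codeword."""
    if len(weights) == 1:
        return {sym: "" for sym in weights}  # a lone leaf sits at depth 0
    x, y = sorted(weights, key=weights.get)[:2]  # the two rarest items
    merged = (x, y)  # fresh key for the combined item (string symbols assumed)
    reduced = {s: w for s, w in weights.items() if s not in (x, y)}
    reduced[merged] = weights[x] + weights[y]
    code = huffman_recursive(reduced)  # code for the n - 1 items
    prefix = code.pop(merged)
    code[x], code[y] = prefix + "0", prefix + "1"  # x and y differ only in the last bit
    return code

w = {"A": 4, "B": 5, "C": 6, "D": 7, "E": 11, "F": 14, "G": 21}
code = huffman_recursive(w)
print(sum(w[s] * len(code[s]) for s in w))  # 178
```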
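Huffman code:
    H = new Heap()            // min-heap of trees, keyed on total weight
    for each w_i:
        T = new Tree(w_i)     // single-leaf tree of weight w_i
        H.Insert(T)
    while H.Size() > 1:
        T1 = H.DeleteMin()
        T2 = H.DeleteMin()
        T3 = Merge(T1, T2)    // new root with children T1, T2 and weight w(T1) + w(T2)
        H.Insert(T3)
    return H.DeleteMin()      // the one remaining tree is the Huffman tree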
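A runnable version of the heap pseudocode, sketched with Python's standard `heapq` module as the min-heap. A tree here is either a symbol (leaf) or a pair `(left, right)`. Symbols are assumed not to be tuples. A running counter breaks weight ties so that `heapq` never has to compare two trees. `heapify` stands in for the loop of `Insert` calls. The name `huffman_code` is hypothetical.

```python
import heapq
import itertools

def huffman_code(weights: dict) -> dict:
    """weights: symbol -> weight (non-empty). Returns symbol -> codeword."""
    tiebreak = itertools.count()
    # Heap entries: (weight, tiebreak, tree).
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # T1 = H.DeleteMin()
        w2, _, t2 = heapq.heappop(heap)  # T2 = H.DeleteMin()
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))  # Insert(Merge(T1, T2))
    _, _, root = heap[0]

    code = {}
    def walk(tree, path):
        if isinstance(tree, tuple):  # internal node: 0 = left, 1 = right
            walk(tree[0], path + "0")
            walk(tree[1], path + "1")
        else:
            code[tree] = path
    walk(root, "")
    return code

w = {"A": 4, "B": 5, "C": 6, "D": 7, "E": 11, "F": 14, "G": 21}
code = huffman_code(w)
print(sum(w[s] * len(code[s]) for s in w))  # 178
```

Each iteration performs two `DeleteMin` calls and one `Insert`, so n weights take O(n log n) time overall.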
Example: Weights 4, 5, 6, 7, 11, 14, 21
Draw a Huffman tree for the following data values and show internal weights: 3, 5, 9, 14, 16, 35
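To check the internal weights in both exercises, the sketch below (the helper `internal_weights` is hypothetical) records the weight of each new internal node in the order the algorithm creates it. Every leaf's weight is counted once per internal ancestor, so the weighted path length equals the sum of the internal weights.

```python
import heapq

def internal_weights(weights):
    """Weights of the internal nodes, in the order Huffman's algorithm creates them."""
    heap = list(weights)
    heapq.heapify(heap)
    created = []
    while len(heap) > 1:
        s = heapq.heappop(heap) + heapq.heappop(heap)  # merge the two lightest trees
        created.append(s)
        heapq.heappush(heap, s)
    return created

for ws in ([4, 5, 6, 7, 11, 14, 21], [3, 5, 9, 14, 16, 35]):
    merged = internal_weights(ws)
    print(merged, "WPL =", sum(merged))
# [9, 13, 20, 27, 41, 68] WPL = 178
# [8, 17, 30, 47, 82] WPL = 184
```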
Correctness proof: the most amazing induction proof. Induction on the number of codewords. Base case: the Huffman algorithm finds an optimal code for n = 1. Inductive step: suppose that the Huffman algorithm finds an optimal code for codes of size n; now consider a code of size n + 1.
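Key lemma: given a tree T, we can find a tree T' in which the two minimum-weight leaves are siblings and $C(T') \le C(T)$. (Swap the two minimum-weight leaves into a deepest sibling pair; moving a smaller weight to a deeper position never increases the weighted path length.)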
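The lemma's swap can be sketched on a code table. This assumes the tree is full, meaning every internal node has two children, so the sibling of a deepest leaf is also a leaf. The helper `make_min_siblings` is hypothetical. It first swaps the lightest symbol x into a deepest slot, then swaps the second-lightest symbol y into the sibling slot. Each swap exchanges a symbol s with a symbol a at least as heavy, at a position at least as deep. The cost therefore changes by $(w_s - w_a)(d_{\text{deep}} - d_s) \le 0$.

```python
def make_min_siblings(code: dict, weight: dict) -> dict:
    """Copy of `code` (symbol -> codeword of a full binary tree) in which the two
    minimum-weight symbols occupy a deepest sibling pair."""
    code = dict(code)
    x, y = sorted(code, key=weight.get)[:2]  # two minimum-weight symbols
    deep = max(code.values(), key=len)  # codeword of a deepest leaf
    sib = deep[:-1] + ("1" if deep[-1] == "0" else "0")  # its sibling (a leaf: full tree)

    def owner(cw):
        return next(s for s, c in code.items() if c == cw)

    def swap(s, t):
        code[s], code[t] = code[t], code[s]

    swap(x, owner(deep))  # x moves to the deepest slot
    swap(y, owner(sib))   # y moves to the sibling slot; x, now at `deep`, is untouched
    return code

w = {"A": 1, "B": 2, "C": 3, "D": 4}
t = {"A": "0", "B": "10", "C": "110", "D": "111"}
cost = lambda c: sum(w[s] * len(cw) for s, cw in c.items())
t2 = make_min_siblings(t, w)
print(cost(t), "->", cost(t2))  # 26 -> 20
```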
Modify the following tree to reduce the WPL
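Finish the induction proof. T: the tree constructed by Huffman. X: any code tree. Show $C(T) \le C(X)$. T' and X': the trees from the lemma, with $C(T') = C(T)$ (Huffman's first merge already makes the minimum-weight leaves siblings, so we may take T' = T) and $C(X') \le C(X)$. T'' and X'': T' and X' with the minimum-weight sibling leaves x and y removed, so that their parent becomes a leaf of weight x + y.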
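X: any tree; X': X modified by the lemma; X'': X' with the two smallest leaves removed. Since x and y at depth d are replaced by one leaf of weight x + y at depth d - 1, $C(X'') = C(X') - x - y$ and $C(T'') = C(T') - x - y$. By the induction hypothesis $C(T'') \le C(X'')$: T'' is the tree Huffman builds for the n weights obtained by merging x and y into x + y, and X'' is a code tree for the same n weights. Therefore $C(T) = C(T') = C(T'') + x + y \le C(X'') + x + y = C(X') \le C(X)$.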
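As a sanity check on the theorem, though not a substitute for the proof, the sketch below compares Huffman's cost with a brute-force minimum over all prefix codes on small inputs. It relies on Kraft's inequality: codeword lengths $l_i$ are achievable by a prefix code exactly when $\sum_i 2^{-l_i} \le 1$. An optimal code corresponds to a full binary tree, whose depth is at most n - 1, so lengths up to n - 1 suffice. Both function names are hypothetical.

```python
import heapq
from itertools import product

def huffman_cost(weights):
    """C(T) of the Huffman tree = sum of the internal node weights."""
    heap = list(weights)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        s = heapq.heappop(heap) + heapq.heappop(heap)
        cost += s
        heapq.heappush(heap, s)
    return cost

def brute_force_cost(weights):
    """Minimum of sum(w_i * l_i) over all length vectors satisfying Kraft's inequality."""
    n = len(weights)
    if n == 1:
        return 0
    L = n - 1  # maximum depth needed for an optimal (full) tree on n leaves
    best = None
    for lengths in product(range(1, L + 1), repeat=n):
        if sum(2 ** (L - l) for l in lengths) <= 2 ** L:  # sum 2^-l <= 1, in integers
            c = sum(w * l for w, l in zip(weights, lengths))
            if best is None or c < best:
                best = c
    return best

ws = [3, 5, 9, 14, 16, 35]
print(huffman_cost(ws), brute_force_cost(ws))  # 184 184
```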