Presentation on theme: "Huffman Codes. Overview  Huffman codes: compressing data (savings of 20% to 90%)  Huffman’s greedy algorithm uses a table of the frequencies of occurrence."— Presentation transcript:

1 Huffman Codes

2 Overview
Huffman codes: compressing data (savings of 20% to 90%)
Huffman’s greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string
C: the alphabet of characters to be encoded

3 Prefix Code
Prefix(-free) code: no codeword is also a prefix of any other codeword (unambiguous)
»Any optimal data compression achievable by a character code can always be achieved with a prefix code
»Simplifies encoding (compression) and decoding
Encoding: abc → 0.101.100 = 0101100
Decoding: 001011101 = 0.0.101.1101 → aabe
»Use a binary tree to represent the prefix code for easy decoding
An optimal code is always represented by a full binary tree, in which every non-leaf node has two children
»|C| leaves and |C|-1 internal nodes (Exercise B.5-3)
»Cost: B(T) = Σ_{c∈C} f(c)·d_T(c), where f(c) is the frequency of c and d_T(c) is its depth in the tree (the length of its codeword)
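The tree-based decoding described above can be sketched in Python. This is a minimal illustration, not part of the slides; it assumes the CLRS running-example codewords (a:0, b:101, c:100, d:111, e:1101, f:1100), which match the slide's encoding and decoding examples.

```python
# Decoding a prefix code by walking a binary tree. Codewords are the
# CLRS running example; any prefix-free code works the same way.
CODE = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def build_tree(code):
    """Build the decoding tree: each node is a dict with '0'/'1' children;
    a leaf carries its decoded character under the key 'char'."""
    root = {}
    for char, word in code.items():
        node = root
        for bit in word:
            node = node.setdefault(bit, {})
        node['char'] = char
    return root

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if 'char' in node:          # reached a leaf: emit and restart at root
            out.append(node['char'])
            node = root
    return ''.join(out)

encoded = ''.join(CODE[c] for c in 'abc')   # '0101100', as on the slide
```

Because no codeword is a prefix of another, the walk hits a leaf exactly when one codeword ends, so no delimiters between codewords are needed.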

4 (Figure: a code tree that is not optimal)

5 Constructing A Huffman Code
C is a set of n characters
»Each character c ∈ C is an object with a frequency, denoted by f[c]
The algorithm builds the tree T in a bottom-up manner
»Begin with |C| leaves and perform a sequence of |C|-1 merge operations
»A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge
The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged
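The bottom-up merging procedure above can be sketched in Python with `heapq` as the min-priority queue. This is an illustrative sketch, not the book's pseudocode; the tie-breaking counter is an implementation detail added so the heap never has to compare subtrees.

```python
import heapq

def huffman(freq):
    """Build a Huffman code from a {char: frequency} map and return a
    {char: codeword} dict. The min-heap (keyed on frequency) identifies
    the two least-frequent objects; |C|-1 merges build the tree bottom-up."""
    heap = [(f, i, c) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least-frequent object
        f2, _, right = heapq.heappop(heap)   # second least-frequent object
        # the merged object's frequency is the sum of the two merged ones
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def assign(node, word):
        if isinstance(node, tuple):          # internal node: recurse
            assign(node[0], word + '0')
            assign(node[1], word + '1')
        else:                                # leaf: record the codeword
            codes[node] = word or '0'        # single-character edge case
    assign(heap[0][2], '')
    return codes
```

On the CLRS frequencies (a:45, b:13, c:12, d:16, e:9, f:5) this yields a code of total cost 224, with the most frequent character getting a one-bit codeword; the exact codewords may differ from the book's, since ties can be broken either way without affecting optimality.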

6 (figure-only slide)

7 Constructing A Huffman Code (Cont.)
Each min-priority-queue operation takes O(lg n), and the loop performs |C|-1 = n-1 merges
Total computation time = O(n lg n)

8 Lemma 16.2 Greedy-Choice
Let C be an alphabet in which each character c ∈ C has frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit
Since each swap does not increase the cost, the resulting tree T'' is also an optimal tree

9 Proof of Lemma 16.2
Let a and b be sibling leaves of maximum depth in an optimal tree T. Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]; since x and y have the lowest frequencies, f[x] ≤ f[a] and f[y] ≤ f[b]
Swap x with a to obtain T', then swap y with b to obtain T''
The cost difference between T and T' is
B(T) - B(T') = f[x]d_T(x) + f[a]d_T(a) - f[x]d_T(a) - f[a]d_T(x) = (f[a] - f[x])(d_T(a) - d_T(x)) ≥ 0
Similarly B(T') - B(T'') ≥ 0, so B(T'') ≤ B(T); but T is optimal, so B(T) ≤ B(T''), hence B(T'') = B(T)
Therefore T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth

10 Greedy-Choice?
Define the cost of a single merger as the sum of the frequencies of the two items being merged
Of all possible mergers at each step, HUFFMAN chooses the one that incurs the least cost

11 Lemma 16.3 Optimal Substructure
Let C' = C - {x, y} ∪ {z}
»f[z] = f[x] + f[y]
Let T' be any tree representing an optimal prefix code for C'. Then the tree T, obtained from T' by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C
Observation: B(T) = B(T') + f[x] + f[y], i.e., B(T') = B(T) - f[x] - f[y]
»For each c ∈ C - {x, y}: d_T(c) = d_T'(c), so f[c]d_T(c) = f[c]d_T'(c)
»d_T(x) = d_T(y) = d_T'(z) + 1
»f[x]d_T(x) + f[y]d_T(y) = (f[x] + f[y])(d_T'(z) + 1) = f[z]d_T'(z) + (f[x] + f[y])

12 Example: B(T') = B(T) - f[x] - f[y]
B(T) = 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224
Merging the leaves x (f=5) and y (f=9) into z (f=14):
B(T') = 45·1 + 12·3 + 13·3 + (5+9)·3 + 16·3 = 210 = B(T) - 5 - 9
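The slide's arithmetic can be checked directly; this snippet just evaluates the two cost sums shown above (frequencies 45, 13, 12, 16, 9, 5 at the depths from the example tree).

```python
# Cost of T: each term is frequency * depth of that leaf.
B_T  = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3      # = 224
# Cost of T': the depth-4 leaves x (f=5) and y (f=9) are replaced by
# a single leaf z with f[z] = 5 + 9 = 14 at depth 3.
B_Tp = 45*1 + 12*3 + 13*3 + (5 + 9)*3 + 16*3      # = 210
# The cost drops by exactly f[x] + f[y], since x and y each sat one
# level below where z now sits.
```

This confirms the observation B(T') = B(T) - f[x] - f[y] on the concrete example.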

13 Proof of Lemma 16.3
Prove by contradiction
Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T'' such that B(T'') < B(T)
Without loss of generality (by Lemma 16.2), T'' has x and y as siblings. Let T''' be the tree T'' with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then
B(T''') = B(T'') - f[x] - f[y] < B(T) - f[x] - f[y] = B(T')
»T''' is better than T', contradicting the assumption that T' represents an optimal prefix code for C'

