1
Huffman Codes
2
Overview
Huffman codes: compressing data (savings of 20% to 90%)
Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string
C: the alphabet of characters
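To make the savings figure concrete, here is a small sketch (not from the slides) using the six-character example that appears later in this deck; the frequencies are in thousands of characters, and the codeword lengths 1, 3, 3, 3, 4, 4 are the optimal lengths assumed for that example:

# Hedged sketch: compare a 3-bit fixed-length code with the assumed Huffman codeword lengths.
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}   # thousands of characters
huff_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # assumed optimal lengths

fixed_bits = 3 * sum(freq.values())                            # 300 (thousand bits)
huff_bits = sum(freq[c] * huff_len[c] for c in freq)           # 224 (thousand bits)
print(fixed_bits, huff_bits, 1 - huff_bits / fixed_bits)       # roughly 25% savings here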
3
Prefix Code
Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous)
»An optimal data compression achievable by a character code can always be achieved with a prefix code
»Simplifies the encoding (compression) and decoding
Encoding: abc → 0·101·100 = 0101100
Decoding: 001011101 = 0·0·101·1101 → aabe
–Use a binary tree to represent prefix codes for easy decoding
An optimal code is always represented by a full binary tree, in which every non-leaf node has two children
»|C| leaves and |C|-1 internal nodes (Exercise B.5-3)
»Cost: B(T) = Σ_{c∈C} f[c]·d_T(c), where f[c] is the frequency of c and d_T(c) is the depth of c's leaf in T (the length of c's codeword)
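As a concrete illustration of why prefix-freeness makes decoding unambiguous, here is a minimal sketch (not part of the slides) using only the codewords assumed in the example above (a = 0, b = 101, c = 100, e = 1101):

# Minimal sketch: encode and decode with a prefix code; codewords assumed from the example.
code = {'a': '0', 'b': '101', 'c': '100', 'e': '1101'}
decode_map = {w: ch for ch, w in code.items()}

def encode(text):
    return ''.join(code[ch] for ch in text)

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:        # prefix-freeness guarantees this match is unambiguous
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(encode('abc'))        # 0101100
print(decode('001011101'))  # aabe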
4
(Figure: binary-tree representation of a prefix code that is not optimal)
5
Constructing A Huffman Code
C is a set of n characters
»Each character c ∈ C is an object with a frequency, denoted by f[c]
The algorithm builds the tree T in a bottom-up manner
»Begin with |C| leaves and perform a sequence of |C|-1 merging operations
»A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge
The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged
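As a concrete sketch of this bottom-up construction (an illustration, not the HUFFMAN pseudocode itself), the following Python uses heapq as the min-priority queue Q, performs the |C|-1 merges, and then reads the codewords off the finished tree:

import heapq
from itertools import count

def huffman(freq):
    # Sketch: build a Huffman code for a dict {character: frequency}.
    tiebreak = count()                          # breaks frequency ties without comparing trees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                         # the min-priority queue Q, keyed on frequency
    for _ in range(len(freq) - 1):              # |C| - 1 merging steps
        f1, _, left = heapq.heappop(heap)       # the two least-frequent objects
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):                     # left edge labelled '0', right edge '1'
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix or '0'         # handle a one-character alphabet
    walk(root, '')
    return codes

# The frequencies used in this deck's example (in thousands of characters):
print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))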
7
Constructing A Huffman Code (Cont.)
Each min-priority-queue operation (EXTRACT-MIN, INSERT) takes O(lg n) time on a binary min-heap, and the loop performs |C|-1 = n-1 merging steps
Total computation time = O(n lg n)
8
Lemma 16.2 (Greedy Choice)
Let C be an alphabet in which each character c ∈ C has a frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit
Since each swap does not increase the cost, the resulting tree T'' is also an optimal tree
9
Proof of Lemma 16.2
Let a and b be two sibling leaves of maximum depth in an optimal tree T; obtain T' from T by exchanging x and a, and T'' from T' by exchanging y and b
Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]
The cost difference between T and T' is
B(T) - B(T') = Σ_{c∈C} f[c]·d_T(c) - Σ_{c∈C} f[c]·d_{T'}(c)
= f[x]·d_T(x) + f[a]·d_T(a) - f[x]·d_{T'}(x) - f[a]·d_{T'}(a)
= f[x]·d_T(x) + f[a]·d_T(a) - f[x]·d_T(a) - f[a]·d_T(x)
= (f[a] - f[x])·(d_T(a) - d_T(x)) ≥ 0
Similarly, exchanging y and b gives B(T') - B(T'') ≥ 0, so B(T'') ≤ B(T); but T is optimal, so B(T) ≤ B(T''), hence B(T'') = B(T)
Therefore T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth
10
Greedy-Choice?
Define the cost of a single merger as the sum of the frequencies of the two items being merged
Of all possible mergers at each step, HUFFMAN chooses the one that incurs the least cost
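One consequence worth noting (not stated on the slide, but easy to check): because each character's frequency is charged once for every merger above its leaf, the sum of all merger costs equals the total cost B(T). A small sketch with the deck's example frequencies:

import heapq

# Sketch: sum the costs of the greedy mergers for the example frequencies.
freqs = [45, 13, 12, 16, 9, 5]
heapq.heapify(freqs)
total = 0
while len(freqs) > 1:
    a, b = heapq.heappop(freqs), heapq.heappop(freqs)
    total += a + b                     # cost of this merger
    heapq.heappush(freqs, a + b)
print(total)                           # 224, which equals B(T) for the example tree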
11
Lemma 16.3 (Optimal Substructure)
Let C' = (C - {x, y}) ∪ {z}
»f[z] = f[x] + f[y]
Let T' be any tree representing an optimal prefix code for C'
Then the tree T, obtained from T' by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C
Observation: B(T) = B(T') + f[x] + f[y], i.e., B(T') = B(T) - f[x] - f[y]
»For each c ∈ C - {x, y}: d_T(c) = d_{T'}(c), so f[c]·d_T(c) = f[c]·d_{T'}(c)
»d_T(x) = d_T(y) = d_{T'}(z) + 1
»f[x]·d_T(x) + f[y]·d_T(y) = (f[x] + f[y])·(d_{T'}(z) + 1) = f[z]·d_{T'}(z) + (f[x] + f[y])
12
B(T') = B(T) - f[x] - f[y]
B(T) = 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224
Merging x:5 and y:9 into z:14,
B(T') = 45·1 + 12·3 + 13·3 + (5+9)·3 + 16·3 = B(T) - 5 - 9 = 210
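A quick numeric check of the arithmetic above (a sketch with the same frequencies and depths, where x:5 and y:9 are merged into z:14):

# Sketch: verify B(T') = B(T) - f[x] - f[y] for the example.
B_T = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3         # 224
B_T_prime = 45*1 + 12*3 + 13*3 + (5 + 9)*3 + 16*3   # z:14 sits at depth 3 in T'
assert B_T_prime == B_T - 5 - 9                     # 210 == 224 - 14
print(B_T, B_T_prime)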
13
Proof of Lemma 16.3
Prove by contradiction. Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T'' such that B(T'') < B(T). Without loss of generality (by Lemma 16.2), T'' has x and y as siblings. Let T''' be the tree T'' with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then
B(T''') = B(T'') - f[x] - f[y] < B(T) - f[x] - f[y] = B(T')
»So T''' is better than T', a contradiction to the assumption that T' represents an optimal prefix code for C'