1
Huffman Codes
2
Overview
Huffman codes: compressing data (savings of 20% to 90%)
Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string
C: the alphabet of characters
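To make the savings figure concrete, here is a small sketch (not from the slides) using the six-character example that appears later in this deck; the frequencies are in thousands of characters, and the codeword lengths 1, 3, 3, 3, 4, 4 are the optimal lengths assumed for that example:

# Hedged sketch: compare a 3-bit fixed-length code with the assumed Huffman codeword lengths.
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}   # thousands of characters
huff_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # assumed optimal lengths

fixed_bits = 3 * sum(freq.values())                            # 300 (thousand bits)
huff_bits = sum(freq[c] * huff_len[c] for c in freq)           # 224 (thousand bits)
print(fixed_bits, huff_bits, 1 - huff_bits / fixed_bits)       # roughly 25% savings here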
3
Prefix Code
Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous)
»An optimal data compression achievable by a character code can always be achieved with a prefix code
»Simplifies the encoding (compression) and decoding
Encoding: abc → 0·101·100 = 0101100
Decoding: 001011101 = 0·0·101·1101 → aabe
–Use a binary tree to represent prefix codes for easy decoding
An optimal code is always represented by a full binary tree, in which every non-leaf node has two children
»|C| leaves and |C|-1 internal nodes (Exercise B.5-3)
»Cost: B(T) = Σ_{c∈C} f[c]·d_T(c), where f[c] is the frequency of c and d_T(c) is the depth of c's leaf in T (the length of c's codeword)
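As a concrete illustration of why prefix-freeness makes decoding unambiguous, here is a minimal sketch (not part of the slides) using only the codewords assumed in the example above (a = 0, b = 101, c = 100, e = 1101):

# Minimal sketch: encode and decode with a prefix code; codewords assumed from the example.
code = {'a': '0', 'b': '101', 'c': '100', 'e': '1101'}
decode_map = {w: ch for ch, w in code.items()}

def encode(text):
    return ''.join(code[ch] for ch in text)

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:        # prefix-freeness guarantees this match is unambiguous
            out.append(decode_map[buf])
            buf = ''
    return ''.join(out)

print(encode('abc'))        # 0101100
print(decode('001011101'))  # aabe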
4
(Figure: binary-tree representation of a prefix code that is not optimal)
5
Constructing A Huffman Code
C is a set of n characters
»Each character c ∈ C is an object with a frequency, denoted by f[c]
The algorithm builds the tree T in a bottom-up manner
»Begin with |C| leaves and perform a sequence of |C|-1 merging operations
»A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge
The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged
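As a concrete sketch of this bottom-up construction (an illustration, not the HUFFMAN pseudocode itself), the following Python uses heapq as the min-priority queue Q, performs the |C|-1 merges, and then reads the codewords off the finished tree:

import heapq
from itertools import count

def huffman(freq):
    # Sketch: build a Huffman code for a dict {character: frequency}.
    tiebreak = count()                          # breaks frequency ties without comparing trees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                         # the min-priority queue Q, keyed on frequency
    for _ in range(len(freq) - 1):              # |C| - 1 merging steps
        f1, _, left = heapq.heappop(heap)       # the two least-frequent objects
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):                     # left edge labelled '0', right edge '1'
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix or '0'         # handle a one-character alphabet
    walk(root, '')
    return codes

# The frequencies used in this deck's example (in thousands of characters):
print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))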
7
Constructing A Huffman Code (Cont.)
Each min-priority-queue operation (EXTRACT-MIN, INSERT) takes O(lg n) time on a binary min-heap, and the loop performs |C|-1 = n-1 merging steps
Total computation time = O(n lg n)
8
Lemma 16.2 (Greedy Choice)
Let C be an alphabet in which each character c ∈ C has a frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit
Since each swap does not increase the cost, the resulting tree T'' is also an optimal tree
9
Proof of Lemma 16.2
Let a and b be two sibling leaves of maximum depth in an optimal tree T; obtain T' from T by exchanging x and a, and T'' from T' by exchanging y and b
Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]
The cost difference between T and T' is
B(T) - B(T') = Σ_{c∈C} f[c]·d_T(c) - Σ_{c∈C} f[c]·d_{T'}(c)
= f[x]·d_T(x) + f[a]·d_T(a) - f[x]·d_{T'}(x) - f[a]·d_{T'}(a)
= f[x]·d_T(x) + f[a]·d_T(a) - f[x]·d_T(a) - f[a]·d_T(x)
= (f[a] - f[x])·(d_T(a) - d_T(x)) ≥ 0
Similarly, exchanging y and b gives B(T') - B(T'') ≥ 0, so B(T'') ≤ B(T); but T is optimal, so B(T) ≤ B(T''), hence B(T'') = B(T)
Therefore T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth
10
Greedy-Choice?
Define the cost of a single merger as the sum of the frequencies of the two items being merged
Of all possible mergers at each step, HUFFMAN chooses the one that incurs the least cost
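One consequence worth noting (not stated on the slide, but easy to check): because each character's frequency is charged once for every merger above its leaf, the sum of all merger costs equals the total cost B(T). A small sketch with the deck's example frequencies:

import heapq

# Sketch: sum the costs of the greedy mergers for the example frequencies.
freqs = [45, 13, 12, 16, 9, 5]
heapq.heapify(freqs)
total = 0
while len(freqs) > 1:
    a, b = heapq.heappop(freqs), heapq.heappop(freqs)
    total += a + b                     # cost of this merger
    heapq.heappush(freqs, a + b)
print(total)                           # 224, which equals B(T) for the example tree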
11
Lemma 16.3 (Optimal Substructure)
Let C' = (C - {x, y}) ∪ {z}
»f[z] = f[x] + f[y]
Let T' be any tree representing an optimal prefix code for C'
Then the tree T, obtained from T' by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C
Observation: B(T) = B(T') + f[x] + f[y], i.e., B(T') = B(T) - f[x] - f[y]
»For each c ∈ C - {x, y}: d_T(c) = d_{T'}(c), so f[c]·d_T(c) = f[c]·d_{T'}(c)
»d_T(x) = d_T(y) = d_{T'}(z) + 1
»f[x]·d_T(x) + f[y]·d_T(y) = (f[x] + f[y])·(d_{T'}(z) + 1) = f[z]·d_{T'}(z) + (f[x] + f[y])
12
B(T') = B(T) - f[x] - f[y]
B(T) = 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224
Merging x:5 and y:9 into z:14,
B(T') = 45·1 + 12·3 + 13·3 + (5+9)·3 + 16·3 = B(T) - 5 - 9 = 210
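A quick numeric check of the arithmetic above (a sketch with the same frequencies and depths, where x:5 and y:9 are merged into z:14):

# Sketch: verify B(T') = B(T) - f[x] - f[y] for the example.
B_T = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3         # 224
B_T_prime = 45*1 + 12*3 + 13*3 + (5 + 9)*3 + 16*3   # z:14 sits at depth 3 in T'
assert B_T_prime == B_T - 5 - 9                     # 210 == 224 - 14
print(B_T, B_T_prime)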
13
Proof of Lemma 16.3
Prove by contradiction. Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T'' such that B(T'') < B(T). Without loss of generality (by Lemma 16.2), T'' has x and y as siblings. Let T''' be the tree T'' with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y]. Then
B(T''') = B(T'') - f[x] - f[y] < B(T) - f[x] - f[y] = B(T')
»So T''' is better than T', a contradiction to the assumption that T' represents an optimal prefix code for C'