Podcast Ch23d
Title: Huffman Compression
Description: Huffman compression; building a Huffman tree
Participants: Barry Kurtz (instructor); John Helfert and Tobie Williams (students)
Textbook: Data Structures for Java; William H. Ford and William R. Topp
Huffman Compression Huffman compression counts the number of occurrences of each 8-bit byte in the data and uses those counts to generate a set of optimal binary codes called prefix codes. The Huffman algorithm is an example of a greedy algorithm. A greedy algorithm makes the locally optimal choice at each step in the hope of producing an optimal solution to the entire problem.
Huffman Compression (continued) The algorithm generates a table that contains the frequency of occurrence of each byte in the file. Using these frequencies, the algorithm assigns each byte a string of bits known as its bit code and writes the bit code to the compressed image in place of the original byte. Compression occurs when the bit codes are, on average, shorter than 8 bits, so that the compressed image contains fewer bits than the original file.
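As a rough illustration of the frequency-table step, here is a minimal Java sketch. It assumes the whole file has already been read into a `byte[]` and uses an `int[256]` array indexed by byte value; the class `ByteFrequency` and method `countFrequencies` are hypothetical names, not taken from the textbook.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: build the frequency table by counting how often
// each of the 256 possible byte values occurs in the data.
class ByteFrequency {
    // Returns freq[v] = number of occurrences of byte value v (0..255).
    static int[] countFrequencies(byte[] data) {
        int[] freq = new int[256];
        for (byte b : data) {
            freq[b & 0xFF]++; // mask so that negative Java bytes map to 128..255
        }
        return freq;
    }

    public static void main(String[] args) {
        byte[] data = "ababbcabaac".getBytes(StandardCharsets.US_ASCII);
        int[] freq = countFrequencies(data);
        for (int v = 0; v < 256; v++) {
            if (freq[v] > 0) {
                System.out.println((char) v + ": " + freq[v]); // only bytes that occur
            }
        }
    }
}
```

Using an array rather than a map keeps the table a fixed 256 entries, one per possible byte value.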
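Huffman Compression (continued)
Character                 a    b    c    d    e    f
Frequency (in thousands)  16   4    8    6    20   3
Fixed-length code word    000  001  010  011  100  101
The file contains 57,000 bytes, or 57,000 × 8 = 456,000 bits; with the 3-bit fixed-length code it needs 57,000 × 3 = 171,000 bits.
Compression Ratio = 456000/171000 = 2.67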
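The fixed-length arithmetic can be checked with a short Java sketch, assuming the six frequencies above belong to bytes a through f. A fixed-length code for six symbols needs the smallest k with 2^k ≥ 6, which is 3 bits. The names `FixedLengthCost`, `bitsPerSymbol`, and `fixedLengthBits` are hypothetical.

```java
// Hypothetical sketch: cost of a fixed-length code for the six-byte example.
class FixedLengthCost {
    // Smallest number of bits k (at least 1) with 2^k >= symbols.
    static int bitsPerSymbol(int symbols) {
        int k = 1;
        while ((1 << k) < symbols) {
            k++;
        }
        return k;
    }

    // Every byte gets the same number of bits, so cost = bits per symbol * total bytes.
    static long fixedLengthBits(long[] freq) {
        long total = 0;
        for (long f : freq) total += f;
        return bitsPerSymbol(freq.length) * total;
    }

    public static void main(String[] args) {
        long[] freq = {16_000, 4_000, 8_000, 6_000, 20_000, 3_000}; // a..f
        long total = 0;
        for (long f : freq) total += f;          // 57,000 bytes
        long original = 8 * total;               // 8 bits per byte uncompressed
        long fixed = fixedLengthBits(freq);      // 3 bits per byte
        System.out.printf("%d / %d = %.2f%n", original, fixed, (double) original / fixed);
    }
}
```

Running `main` prints `456000 / 171000 = 2.67`, matching the slide.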
Huffman Compression (continued) Use a binary tree to represent bit codes. A left edge is a 0 and a right edge is a 1, so the bit code for a byte is the sequence of edge labels on the path from the root to that byte's leaf. Each interior node specifies a frequency count, and each leaf node holds a character and its frequency.
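One way to represent such a tree in Java is sketched below; the class `HuffNode`, its fields, and `collectCodes` are hypothetical names chosen for illustration. Reading the bit codes off the tree amounts to recording 0 for every left edge and 1 for every right edge on the path from the root to each leaf. The interior-node constructor assumes two children, as in a full tree.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a code tree: leaves hold a byte and its frequency,
// interior nodes hold the combined frequency of their two children.
class HuffNode {
    final int value;      // byte value for a leaf; -1 for an interior node
    final long freq;      // frequency count
    final HuffNode left;  // reached by a 0 bit
    final HuffNode right; // reached by a 1 bit

    HuffNode(int value, long freq) {           // leaf
        this(value, freq, null, null);
    }

    HuffNode(HuffNode left, HuffNode right) {  // interior: frequency = sum of children
        this(-1, left.freq + right.freq, left, right);
    }

    private HuffNode(int value, long freq, HuffNode left, HuffNode right) {
        this.value = value;
        this.freq = freq;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }

    // Walk the tree, appending '0' for each left edge and '1' for each right edge.
    static void collectCodes(HuffNode node, String path, Map<Integer, String> codes) {
        if (node == null) return;
        if (node.isLeaf()) {
            codes.put(node.value, path);
            return;
        }
        collectCodes(node.left, path + "0", codes);
        collectCodes(node.right, path + "1", codes);
    }

    public static void main(String[] args) {
        // Hypothetical three-leaf tree: a on the left, c and b below the right child.
        HuffNode root = new HuffNode(new HuffNode('a', 5),
                new HuffNode(new HuffNode('c', 2), new HuffNode('b', 4)));
        Map<Integer, String> codes = new TreeMap<>();
        collectCodes(root, "", codes);
        codes.forEach((v, code) -> System.out.println((char) v.intValue() + " -> " + code));
    }
}
```

For the sample tree this yields a -> 0, b -> 11, c -> 10, and the root's frequency is 11, the sum of its leaves.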
Huffman Compression (continued) Each data byte occurs only in a leaf node, so no bit code is a prefix of another bit code. Such codes are called prefix codes. A full binary tree is one in which each interior node has two children. By converting the tree to a full tree, we can generate better bit codes for our example.
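The prefix property can be tested directly on a list of code words. The hypothetical `isPrefixFree` helper below is a brute-force sketch that compares every ordered pair, which is adequate for small alphabets like the ones in this chapter.

```java
import java.util.List;

// Hypothetical sketch: a set of codes is prefix-free when no code word
// is a prefix of a different code word.
class PrefixCheck {
    static boolean isPrefixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++) {
            for (int j = 0; j < codes.size(); j++) {
                // duplicates also fail, since a word is a prefix of itself
                if (i != j && codes.get(j).startsWith(codes.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(List.of("0", "10", "11"))); // every code is a leaf
        System.out.println(isPrefixFree(List.of("0", "01", "11"))); // "0" is a prefix of "01"
    }
}
```

The first call prints `true` and the second prints `false`.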
Huffman Compression (continued) Compression ratio = 456000/148000 = 3.08
Huffman Compression (continued) To compress a file, replace each char by its prefix code. To uncompress, follow the bit code bit-by-bit from the root of the tree (0 = left, 1 = right) until reaching a leaf, write that leaf's character to the uncompressed file, and then restart at the root. Good compression involves choosing an optimal tree. It can be shown that the optimal bit codes for a file are always represented by a full tree.
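The compress and uncompress steps can be sketched in Java as follows. For readability the "bits" are a `String` of '0' and '1' characters rather than packed bytes, and the tree is assumed to be full with at least two leaves; `PrefixCodec`, `encode`, and `decode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of compression and decompression with a prefix-code tree.
class PrefixCodec {
    // Minimal tree node: a leaf stores a character; an interior node stores two children.
    static final class Node {
        final char ch;
        final Node left, right; // left edge = 0, right edge = 1
        Node(char ch) { this.ch = ch; this.left = null; this.right = null; }
        Node(Node left, Node right) { this.ch = '\0'; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Compress: replace each character by its prefix code.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = codes.get(c);
            if (code == null) throw new IllegalArgumentException("no code for " + c);
            bits.append(code);
        }
        return bits.toString();
    }

    // Uncompress: follow bits from the root; at a leaf, emit its character and restart.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            if (bit != '0' && bit != '1') throw new IllegalArgumentException("not a bit: " + bit);
            cur = (bit == '0') ? cur.left : cur.right;
            if (cur.isLeaf()) {
                out.append(cur.ch);
                cur = root;
            }
        }
        if (cur != root) throw new IllegalArgumentException("incomplete final code");
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical tree with codes a = 0, c = 10, b = 11.
        Node root = new Node(new Node('a'), new Node(new Node('c'), new Node('b')));
        Map<Character, String> codes = new HashMap<>();
        codes.put('a', "0");
        codes.put('c', "10");
        codes.put('b', "11");
        String bits = encode("abcab", codes);
        System.out.println(bits);
        System.out.println(decode(bits, root));
    }
}
```

Because every character sits in a leaf, the decoder never has to look ahead: reaching a leaf always marks the end of exactly one code.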
Huffman Compression (continued) For each byte b in the original file, let f(b) be the frequency of the byte and d(b) be the depth of the leaf node containing b. The depth of the node is also the number of bits in the bit code for b. The cost of the tree is the number of bits necessary to compress the file: Cost(Tree) = Σ f(b) · d(b), summed over every distinct byte b in the file. A Huffman tree minimizes this cost, so it generates the minimum number of bits in the compressed image; its bit codes are optimal prefix codes.
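The cost formula Cost(Tree) = Σ f(b) · d(b) translates into a one-loop Java sketch. The hypothetical `TreeCost.cost` method takes parallel arrays of frequencies and leaf depths; in `main`, the depths for the Huffman tree are the ones used in this chapter's 134,000-bit calculation for the a..f example (2, 4, 2, 3, 2, 4).

```java
// Hypothetical sketch: Cost(Tree) = sum over bytes b of f(b) * d(b).
class TreeCost {
    static long cost(long[] freq, int[] depth) {
        if (freq.length != depth.length) throw new IllegalArgumentException("length mismatch");
        long bits = 0;
        for (int i = 0; i < freq.length; i++) {
            bits += freq[i] * depth[i]; // each occurrence of byte i costs depth[i] bits
        }
        return bits;
    }

    public static void main(String[] args) {
        long[] freq = {16_000, 4_000, 8_000, 6_000, 20_000, 3_000}; // a..f
        int[] fixedDepth = {3, 3, 3, 3, 3, 3};    // every leaf of the 3-bit fixed-length tree
        int[] huffmanDepth = {2, 4, 2, 3, 2, 4};  // leaf depths in the Huffman tree
        System.out.println(cost(freq, fixedDepth));
        System.out.println(cost(freq, huffmanDepth));
    }
}
```

This prints 171000 for the fixed-length tree and 134000 for the Huffman tree.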
Circle all statements that are true for a Huffman tree. (a) A Huffman tree is complete. (b) Every interior node has exactly two children. (c) Each byte is in a leaf node. (d) The total number of bits generated by the tree is minimum. (e) Each interior node contains the product of its children's weights. (f) Each interior node contains the sum of its children's weights. (g) Nodes with lower frequency are near the top of the tree. (h) Nodes with lower frequency are near the bottom of the tree.
Building a Huffman Tree For each of the n distinct bytes that occur in a file, assign the byte and its frequency to a tree node, and insert the node into a minimum priority queue ordered by frequency.
Building a Huffman Tree (continued) Remove the two elements with the lowest frequencies, x and y, from the priority queue, and attach them as children of a new node whose frequency is the sum of the frequencies of its children. Insert the resulting node into the priority queue. In a loop, perform this action n-1 times. Each loop iteration creates one of the n-1 interior nodes of the full tree; after the last iteration, the single node left in the queue is the root.
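A minimal Java sketch of this construction follows, using `java.util.PriorityQueue` (a binary min-heap) as the minimum priority queue. `HuffmanBuilder`, `build`, and `codes` are hypothetical names, the frequencies are supplied as a `Map<Character, Long>`, and the sketch assumes at least two distinct characters. The queue breaks ties between equal frequencies arbitrarily, so individual codes can differ from a hand-built tree while the total cost stays the same.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch of Huffman tree construction with a minimum priority queue.
class HuffmanBuilder {
    static final class Node {
        final char ch;          // meaningful only for leaves
        final long freq;
        final Node left, right; // 0 edge, 1 edge
        Node(char ch, long freq) { this(ch, freq, null, null); }
        Node(Node x, Node y) { this('\0', x.freq + y.freq, x, y); }
        private Node(char ch, long freq, Node left, Node right) {
            this.ch = ch; this.freq = freq; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null; }
    }

    static Node build(Map<Character, Long> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingLong((Node nd) -> nd.freq));
        for (Map.Entry<Character, Long> e : freq.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue())); // one leaf per distinct character
        }
        int n = pq.size();
        for (int i = 0; i < n - 1; i++) {  // n - 1 iterations create n - 1 interior nodes
            Node x = pq.remove();          // lowest frequency
            Node y = pq.remove();          // next lowest
            pq.add(new Node(x, y));        // parent frequency = x.freq + y.freq
        }
        return pq.remove();                // the only node left is the root
    }

    // Read the codes off the tree: 0 for a left edge, 1 for a right edge.
    static void codes(Node node, String path, Map<Character, String> out) {
        if (node.isLeaf()) { out.put(node.ch, path); return; }
        codes(node.left, path + "0", out);
        codes(node.right, path + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Long> freq = new TreeMap<>();
        String letters = "abcdef";
        long[] counts = {16, 4, 8, 6, 20, 3}; // frequencies in thousands
        for (int i = 0; i < letters.length(); i++) freq.put(letters.charAt(i), counts[i]);
        Map<Character, String> table = new TreeMap<>();
        codes(build(freq), "", table);
        table.forEach((ch, code) -> System.out.println(ch + " -> " + code));
    }
}
```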
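Building a Huffman Tree (continued) Because a minimum priority queue removes the least frequently occurring characters first, they are merged early and end up deeper in the tree, with longer bit codes; the more frequently occurring chars are merged later, stay closer to the root, and get shorter bit codes.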
Building a Huffman Tree (continued) For the Huffman tree, the compressed file contains (16(2) + 4(4) + 8(2) + 6(3) + 20(2) + 3(4)) × 1000 = 134,000 bits, which corresponds to a compression ratio of 456,000/134,000 ≈ 3.4.
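The 134,000-bit figure can be cross-checked without building the tree explicitly. Each merge creates an interior node whose weight counts every leaf below it once, so the sum of all interior-node weights equals Σ f(b) · d(b). The hypothetical `huffmanBits` sketch below relies on that observation and keeps only the weights in the priority queue.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: Huffman cost = sum of the weights of all interior nodes,
// because each leaf's frequency is added once per ancestor, i.e. d(b) times.
class HuffmanCost {
    static long huffmanBits(long[] freq) {
        PriorityQueue<Long> pq = new PriorityQueue<>();
        for (long f : freq) pq.add(f);
        long total = 0;
        while (pq.size() > 1) {
            long merged = pq.remove() + pq.remove(); // two lowest weights
            total += merged;                         // weight of the new interior node
            pq.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] freq = {16_000, 4_000, 8_000, 6_000, 20_000, 3_000}; // a..f
        long bits = huffmanBits(freq);
        System.out.printf("%d bits, ratio %.2f%n", bits, 456_000.0 / bits);
    }
}
```

For the example, the interior weights are 7, 13, 21, 36, and 57 thousand, which add up to 134 thousand bits; `main` prints `134000 bits, ratio 3.40`.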
The file "data.txt" contains the following ASCII characters: ababbcabaac (a) Construct a Huffman tree for the file. (b) What are the Huffman codes for the characters? Huffman codes: a->0 b->11 c->10 (c) Write out the bits in the compressed file. Bits in the compressed file: 01101111100110010
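A short Java sketch can confirm the answers for "data.txt". It counts the characters (a = 5, b = 4, c = 2) and applies the code table given in the answer; `DataTxtExercise` and its methods are hypothetical names. The total, 5·1 + 4·2 + 2·2 = 17 bits, matches the length of the answer string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: check the "data.txt" answers (contents: ababbcabaac).
class DataTxtExercise {
    // Code table from the answer: a->0, b->11, c->10.
    static final Map<Character, String> CODES = Map.of('a', "0", 'b', "11", 'c', "10");

    // Frequencies in order of first appearance.
    static Map<Character, Integer> frequencies(String text) {
        Map<Character, Integer> freq = new LinkedHashMap<>();
        for (char ch : text.toCharArray()) freq.merge(ch, 1, Integer::sum);
        return freq;
    }

    // Replace each character by its code.
    static String compress(String text) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) throw new IllegalArgumentException("no code for " + ch);
            bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String text = "ababbcabaac";
        System.out.println(frequencies(text)); // {a=5, b=4, c=2}
        String bits = compress(text);
        System.out.println(bits + " (" + bits.length() + " bits)");
    }
}
```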
Consider a file containing the following characters: beabdcbacaacbdecdeaaeb (a) Construct a Huffman tree for the file. (b) What are the Huffman codes? Huffman codes: a->00 b->10 c->010 d->011 e->11 (c) Write out the bits in the compressed file.
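To check an answer to part (c), the Java sketch below compresses the string with the code table given in part (b) and decodes the result again by collecting bits until they match a code word. That shortcut works only because the codes are prefix codes. `SecondExercise` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compress the exercise string and decode it back.
class SecondExercise {
    // Code table from part (b).
    static final Map<Character, String> CODES =
            Map.of('a', "00", 'b', "10", 'c', "010", 'd', "011", 'e', "11");

    static String compress(String text) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) throw new IllegalArgumentException("no code for " + ch);
            bits.append(code);
        }
        return bits.toString();
    }

    // With prefix codes, the first code word that matches the pending bits is the only match.
    static String decompress(String bits) {
        Map<String, Character> reverse = new HashMap<>();
        CODES.forEach((ch, code) -> reverse.put(code, ch));
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            pending.append(bit);
            Character ch = reverse.get(pending.toString());
            if (ch != null) {
                out.append(ch.charValue());
                pending.setLength(0); // start collecting the next code
            }
        }
        if (pending.length() != 0) throw new IllegalArgumentException("trailing bits");
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "beabdcbacaacbdecdeaaeb";
        String bits = compress(text);
        System.out.println(bits);
        System.out.println(bits.length() + " bits");
        System.out.println(decompress(bits).equals(text));
    }
}
```

With frequencies a = 6, b = 5, c = 4, d = 3, e = 4, the compressed file should contain 6(2) + 5(2) + 4(3) + 3(3) + 4(2) = 51 bits, and decoding should reproduce the original string.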