Podcast Ch23e
Title: Implementing Huffman Compression
Description: Overview; compress() method; building the Huffman tree; decompression
Participants: Barry Kurtz (instructor); John Helfert and Tobie Williams (students)
Textbook: Data Structures for Java; William H. Ford and William R. Topp
Implementing Huffman Compression The implementation of Huffman compression uses a priority queue, bit arrays, inheritance, and binary files. The class HCompress does the Huffman compression and writes progress messages to a text area.
Implementing Huffman Compression (continued) HCompress has a constructor that takes a file name as an argument, along with a reference to a JTextArea object. It opens the source file and creates a binary output file by adding the extension ".huf" to the name. The public method compress() executes the compression steps.
Implementing Huffman Compression (continued) The methods compressionRatio() and size() expose internal parameters of the compression process. compressionRatio() gives the ratio of the source file size to the compressed file size, and size() gives the number of nodes in the Huffman tree. The method displayTree() displays the resulting Huffman tree.
Implementing Huffman Compression (continued) After creating an HCompress object, call the method compress() that writes a compressed image to the output file. Messages output to the text area trace the progress of the compression. The method displayTree() outputs the Huffman tree in vertical format. Use it only for small trees.
Example of Huffman Compression

    JTextArea textArea = new JTextArea(30, 80);
    ...
    HCompress hc = new HCompress("demo.dat", textArea);
    hc.compress();
    if (hc.size() <= 11)
        textArea.append(hc.displayTree());
    // output the compression ratio
    textArea.append("The compression ratio = " +
                    hc.compressionRatio() + "\n\n");
Example of Huffman Compression (continued)
Output:

    Frequency analysis ...
       File size: 57000 characters
       Number of unique characters: 6
    Building the Huffman tree ...
       Number of nodes in Huffman tree: 11
    Generating the Huffman codes ...
    Tree has 11 entries. Root index = 10

    Index  Sym    Freq  Parent  Left  Right  NBits  Bits
        0  a     16000       9    -1     -1      2  10
        1  b      4000       6    -1     -1      4  0111
        2  c      8000       8    -1     -1      2  00
        3  d      6000       7    -1     -1      3  010
        4  e     20000       9    -1     -1      2  11
        5  f      3000       6    -1     -1      4  0110
        6  Int    7000       7     5      1
        7  Int   13000       8     3      6
        8  Int   21000      10     2      7
        9  Int   36000      10     0      4
       10  Int   57000       0     8      9
Example of Huffman Compression (continued)

    Generating the compressed file
    The compression ratio is 3.389830508474576

[Figure: Huffman tree]
Summary of compress()
Call freqAnalysis().
    Read the file and tabulate the number of occurrences of each byte.
    Compute the size of the file, to support the computation of the compression ratio.
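The frequency analysis step can be sketched as follows. This is a minimal illustration, not the textbook's freqAnalysis(): the class name FreqAnalysisSketch and its methods are hypothetical, and a byte array stands in for the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of frequency analysis; not the textbook's freqAnalysis().
class FreqAnalysisSketch {
    // counts[v] = number of occurrences of byte value v (0..255) in data
    static int[] countBytes(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            counts[b & 0xFF]++;          // mask so the index is unsigned
        }
        return counts;
    }

    // the file size is the sum of all the counts
    static int totalSize(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        return n;
    }

    // number of byte values that occur at least once
    static int uniqueCount(int[] counts) {
        int n = 0;
        for (int c : counts) if (c > 0) n++;
        return n;
    }

    public static void main(String[] args) {
        byte[] data = "abbaccabcaabbcaa".getBytes(StandardCharsets.US_ASCII);
        int[] counts = countBytes(data);
        System.out.println("File size: " + totalSize(counts) + " characters");
        System.out.println("Number of unique characters: " + uniqueCount(counts));
    }
}
```

Each unique byte value becomes one leaf of the Huffman tree, so uniqueCount() also determines the tree size.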
Summary of compress() (continued)
Call buildTree().
    Construct the Huffman tree for the file in an array.
Call generateCodes().
    For each leaf node, follow the path to the root and determine the bit code for the byte.
    In the process, determine the cost of the tree, which is the total number of code bits generated.
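The leaf-to-root walk can be sketched on the tree from the example output. The arrays below are copied from that table (with -1 used for the root's parent); the class and method names are hypothetical and this is not the textbook's generateCodes(). Because the walk goes upward, the bits are collected in reverse and flipped at the end.

```java
// Hypothetical sketch of code generation on an array-based Huffman tree.
class GenerateCodesSketch {
    // columns of the example table: indices 0..5 are leaves a..f, 10 is the root
    static final int[] FREQ   = {16000, 4000, 8000, 6000, 20000, 3000,
                                 7000, 13000, 21000, 36000, 57000};
    static final int[] PARENT = {9, 6, 8, 7, 9, 6, 7, 8, 10, 10, -1};
    static final int[] LEFT   = {-1, -1, -1, -1, -1, -1, 5, 3, 2, 0, 8};
    static final int ROOT = 10;
    static final int LEAVES = 6;

    // follow parent links from the leaf to the root; a left child contributes
    // bit 0 and a right child bit 1, then reverse to get root-to-leaf order
    static String codeFor(int leaf) {
        StringBuilder sb = new StringBuilder();
        int child = leaf;
        while (child != ROOT) {
            int p = PARENT[child];
            sb.append(LEFT[p] == child ? '0' : '1');
            child = p;
        }
        return sb.reverse().toString();
    }

    // cost of the tree: sum over the leaves of frequency * code length
    static long cost() {
        long total = 0;
        for (int i = 0; i < LEAVES; i++) {
            total += (long) FREQ[i] * codeFor(i).length();
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < LEAVES; i++) {
            System.out.println((char) ('a' + i) + " -> " + codeFor(i));
        }
        System.out.println("cost in bits = " + cost());
    }
}
```

For this tree the cost is 134000 bits, which is 16750 bytes of code data for a 57000-byte source file.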
Summary of compress() (continued)
Write the 16-bit size of the Huffman tree to the compressed file.
Write the Huffman tree to the compressed file.
Write the total number of bits in the bit codes to the compressed file.
Call writeCompressedData().
    Read the source file again. For each byte, write its bit code to the compressed file.
Summary of compress() (concluded) From the actions of compress(), we see that the format of the compressed file is as follows:

    tree size (16-bit short) | Huffman tree nodes | total number of code bits (32-bit int) | bit codes
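A sketch of writing the part of the file that precedes the bit codes is shown below. The per-node layout used here (one byte for the value, then two shorts for the child indices) is an assumption made for illustration; the actual layout is whatever DiskHuffNode writes, which these slides do not show. HeaderSketch and header() are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the compressed-file header; node layout is assumed.
class HeaderSketch {
    static byte[] header(byte[] value, short[] left, short[] right, int totalBits) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(value.length);        // 16-bit tree size
            for (int i = 0; i < value.length; i++) {
                out.writeByte(value[i]);         // assumed: byte stored in node
                out.writeShort(left[i]);         // assumed: index of left child
                out.writeShort(right[i]);        // assumed: index of right child
            }
            out.writeInt(totalBits);             // 32-bit count of code bits
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            // cannot happen for an in-memory stream
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] value = {'a', 'b', 0};            // two leaves and a root
        short[] left = {-1, -1, 0};
        short[] right = {-1, -1, 1};
        System.out.println(header(value, left, right, 5).length + " header bytes");
    }
}
```

Under this assumed layout the header costs 2 + 5n + 4 bytes for a tree of n nodes, which is why Huffman compression pays off only when the source file is much larger than the tree.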
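Building the Huffman Tree The DiskHuffNode class contains the data (the byte value b) and the locations of the children (the indices left and right); these are the fields that decompress() reads back from the compressed file. Its subclass HuffNode contains the remaining attributes required by the Huffman compression implementation.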
Building the Huffman Tree (continued) The HCompress method buildTree() executes the Huffman algorithm to build the tree.
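The Huffman algorithm with a priority queue can be sketched as follows. This is an illustration under stated assumptions, not the textbook's buildTree(): the names BuildTreeSketch and Node are hypothetical, the queue holds array indices ordered by frequency, and the input must contain at least one leaf. Leaves occupy the first slots of the array and each merged node takes the next free slot, so the root ends up last.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of building an array-based Huffman tree.
class BuildTreeSketch {
    static final int NIL = -1;

    static class Node {
        int freq;
        int parent = NIL, left = NIL, right = NIL;
        Node(int freq) { this.freq = freq; }
    }

    // leafFreqs[i] is the frequency of leaf i; returns a tree of 2n-1 nodes
    static Node[] buildTree(int[] leafFreqs) {
        int n = leafFreqs.length;
        Node[] tree = new Node[2 * n - 1];
        // min-queue of node indices, smallest frequency first
        PriorityQueue<Integer> pq = new PriorityQueue<>(
            (x, y) -> Integer.compare(tree[x].freq, tree[y].freq));
        for (int i = 0; i < n; i++) {
            tree[i] = new Node(leafFreqs[i]);
            pq.add(i);
        }
        int next = n;                      // next free slot for an interior node
        while (pq.size() > 1) {
            int l = pq.poll();             // smallest becomes the left child
            int r = pq.poll();             // next smallest becomes the right child
            tree[next] = new Node(tree[l].freq + tree[r].freq);
            tree[next].left = l;
            tree[next].right = r;
            tree[l].parent = next;
            tree[r].parent = next;
            pq.add(next);
            next++;
        }
        return tree;
    }

    public static void main(String[] args) {
        // frequencies of a..f from the example output
        Node[] tree = buildTree(new int[] {16000, 4000, 8000, 6000, 20000, 3000});
        for (int i = 0; i < tree.length; i++) {
            System.out.println(i + " " + tree[i].freq + " " + tree[i].parent
                               + " " + tree[i].left + " " + tree[i].right);
        }
    }
}
```

With the example frequencies no two queue entries are ever tied, so this sketch reproduces the Left and Right columns of the example table. It marks the root's parent as -1, whereas the example output prints 0 there.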
Building the Huffman Tree (continued) The HCompress method generateCodes() determines the bit codes. To output the bit codes, the method writeCompressedData() declares a BitArray object, compressedData, whose bit size is the cost of the Huffman tree. Upon the conclusion of input, writeCompressedData() calls the write() method of the BitArray class to output the bits to the compressed file.
writeCompressedData()

    // reread the source file and write the
    // Huffman codes specified by the Huffman
    // tree to the stream dest
    private void writeCompressedData() throws IOException
    {
        // vector that will contain the Huffman codes
        // for the compressed file
        BitArray compressedData = new BitArray(totalBits);
        int bitPos, i, j;
        int b;

        // close the source file and reopen it
        source.close();
        source = new DataInputStream(
                    new FileInputStream(fname));

        // bitPos is used to put bits into compressedData
        bitPos = 0;
writeCompressedData() (continued)

        // re-read the source file and generate the Huffman
        // codes in compressedData
        while (true)
        {
            try
            {
                // try to input a byte
                b = source.readUnsignedByte();
            }
            catch (EOFException eofex)
            {
                // we are at end-of-file
                break;
            }

            // index of the tree node containing b
            i = charLoc[b];
writeCompressedData() (concluded)

            // put the bit code for tree[i].b into
            // the bit vector
            for (j = 0; j < tree[i].numberOfBits; j++)
            {
                // only need to call set() if
                // tree[i].bits.bit(j) is 1
                if (tree[i].bits.bit(j) == 1)
                    compressedData.set(bitPos);
                // always advance bitPos
                bitPos++;
            }
        }

        // write the bit codes to the output file
        compressedData.write(dest);
    }
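The bit-packing idea behind compressedData.set(bitPos) can be sketched without the textbook's BitArray class. In this hypothetical version the codes are given as strings of '0' and '1' indexed by byte value, and bits are stored most significant bit first within each byte; the bit order of the real BitArray is not shown in these slides, so treat that as an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the BitArray usage in writeCompressedData().
class BitPackSketch {
    // codeOf[v] is the bit code for byte value v; totalBits is the tree cost
    static byte[] encode(byte[] src, String[] codeOf, int totalBits) {
        byte[] packed = new byte[(totalBits + 7) / 8];   // all bits start at 0
        int bitPos = 0;
        for (byte b : src) {
            String code = codeOf[b & 0xFF];
            for (int j = 0; j < code.length(); j++) {
                // only 1 bits need to be set
                if (code.charAt(j) == '1') {
                    packed[bitPos / 8] |= 0x80 >> (bitPos % 8);
                }
                bitPos++;                                // always advance
            }
        }
        return packed;
    }

    // read back the bit at position pos (0 or 1)
    static int bit(byte[] packed, int pos) {
        return (packed[pos / 8] >> (7 - pos % 8)) & 1;
    }

    public static void main(String[] args) {
        String[] codeOf = new String[256];
        codeOf['a'] = "10";
        codeOf['c'] = "00";
        codeOf['e'] = "11";
        byte[] packed = encode("ace".getBytes(StandardCharsets.US_ASCII), codeOf, 6);
        StringBuilder sb = new StringBuilder();
        for (int pos = 0; pos < 6; pos++) sb.append(bit(packed, pos));
        System.out.println(sb);   // the six code bits, in order
    }
}
```

The last byte is usually only partly filled, which is why the total bit count is stored in the file: the decoder needs it to know where the code bits stop.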
Implementing Huffman Decompression The class HDecompress performs Huffman decompression. The public method decompress() decodes the file.
Implementing Huffman Decompression (continued) The HDecompress method decompress() sequences through the bits of the compressed image, tracing a path from the root node to a leaf node for each code and writing the corresponding byte to the uncompressed file.
decompress()

    // decompress the file
    public void decompress() throws IOException
    {
        int i, bitPos;
        // treeSize and totalBits are read from
        // the compressed file
        short treeSize;
        int totalBits;
        int decompressedFileSize = 0;

        textArea.append("Decompressing ... \n");

        // input the Huffman tree size
        treeSize = source.readShort();

        // treeSize DiskHuffNode nodes are read from
        // the compressed file into the tree
        DiskHuffNode[] tree = new DiskHuffNode[treeSize];
decompress() (continued)

        // input the tree
        for (i = 0; i < treeSize; i++)
        {
            tree[i] = new DiskHuffNode();
            tree[i].read(source);
        }

        // input the number of bits of Huffman code
        totalBits = source.readInt();

        // allocate a 1-bit bit array, whose contents
        // we immediately replace by the bits in the
        // compressed file
        BitArray bits = new BitArray(1);

        // read totalBits number of binary bits from
        // the compressed file into bits
        bits.read(source, totalBits);
decompress() (continued)

        // restore the original file by using the
        // Huffman codes to traverse the tree and
        // write out the corresponding characters
        bitPos = 0;
        while (bitPos < totalBits)
        {
            // root of the tree is at index treeSize-1
            i = treeSize - 1;

            // follow the bits until we arrive at leaf node
            while (tree[i].left != HuffNode.NIL)
            {
                // if bit is 0, go left; otherwise, go right
                if (bits.bit(bitPos) == 0)
                    i = tree[i].left;
                else
                    i = tree[i].right;
                // we have used the current bit; move to
                // the next one
                bitPos++;
            }
decompress() (concluded)

            // we are at a leaf node; output the
            // character to the file
            dest.writeByte(tree[i].b);
            decompressedFileSize++;
        }

        textArea.append("Decompressed file " +
            decompressedFileName + " (" +
            decompressedFileSize + ") characters\n");

        // close the two streams
        source.close();
        dest.close();
        filesOpen = false;
    }
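The decoding loop can be tried in isolation with a small sketch. Here the tree is given as parallel arrays taken from the example table earlier in the slides, and the bit stream is a string of '0' and '1' characters instead of a BitArray; DecodeSketch and decode() are hypothetical names. The sketch assumes the bit string is a whole number of valid codes.

```java
// Hypothetical sketch of the root-to-leaf decoding walk in decompress().
class DecodeSketch {
    static final int NIL = -1;

    // tree from the example table: leaves 0..5 are a..f, the root is last
    static final char[] SYM  = {'a', 'b', 'c', 'd', 'e', 'f', 0, 0, 0, 0, 0};
    static final int[] LEFT  = {NIL, NIL, NIL, NIL, NIL, NIL, 5, 3, 2, 0, 8};
    static final int[] RIGHT = {NIL, NIL, NIL, NIL, NIL, NIL, 1, 6, 7, 4, 9};

    static String decode(String bits) {
        int root = LEFT.length - 1;
        StringBuilder out = new StringBuilder();
        int bitPos = 0;
        while (bitPos < bits.length()) {
            int i = root;                        // each code starts at the root
            while (LEFT[i] != NIL) {             // interior node: consume one bit
                i = (bits.charAt(bitPos) == '0') ? LEFT[i] : RIGHT[i];
                bitPos++;
            }
            out.append(SYM[i]);                  // leaf reached: emit its symbol
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // codes: f=0110 a=10 c=00 d=010 e=11
        System.out.println(decode("0110100001011"));   // prints "faced"
    }
}
```

No separator is needed between codes because no code is a prefix of another: every code ends exactly when the walk reaches a leaf.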
After compressing a file, the following is the resulting Huffman tree and the corresponding bit stream. Decompress and determine the original file. Original file: abbaccabcaabbcaa
Program 23-2 The application Program23_2.java in Chapter 23 of the software supplement is a GUI application that uses the Huffman algorithms. The figure provides a snapshot of the running application.