Podcast Ch23e Title: Implementing Huffman Compression

Description: Overview; compress() method; building the Huffman tree; decompression
Participants: Barry Kurtz (instructor); John Helfert and Tobie Williams (students)
Textbook: Data Structures for Java; William H. Ford and William R. Topp

Implementing Huffman Compression
The implementation of Huffman compression uses a priority queue, bit arrays, inheritance, and binary files. The class HCompress performs the Huffman compression and writes progress messages to a text area.

Implementing Huffman Compression (continued)
HCompress has a constructor that takes a file name and a reference to a JTextArea object as arguments. It opens the source file and creates a binary output file by adding the extension ".huf" to the name. The public method compress() executes the compression steps.

Implementing Huffman Compression (continued)
The methods compressionRatio() and size() report internal parameters of the compression process: compressionRatio() gives the compression ratio, and size() gives the number of nodes in the Huffman tree. The method displayTree() displays the resulting Huffman tree.

Implementing Huffman Compression (continued)
After creating an HCompress object, call the method compress(), which writes a compressed image to the output file. Messages output to the text area trace the progress of the compression. The method displayTree() outputs the Huffman tree in vertical format. Use it only for small trees.

Example of Huffman Compression
    JTextArea textArea = new JTextArea(30, 80);
    ...
    HCompress hc = new HCompress("demo.dat", textArea);
    hc.compress();
    if (hc.size() <= 11)
        textArea.append(hc.displayTree());
    // output the compression ratio
    textArea.append("The compression ratio = " +
                    hc.compressionRatio() + "\n\n");

Example of Huffman Compression (continued)
Output:
    Frequency analysis ...
    File size: 57000 characters
    Number of unique characters: 6
    Building the Huffman tree ...
    Number of nodes in Huffman tree: 11
    Generating the Huffman codes ...
    Tree has 11 entries. Root index = 10

    Index  Sym  Freq   Parent  Left  Right  NBits  Bits
    0      a    16000  9       -1    -1     2      10
    1      b    4000   6       -1    -1     4      0111
    2      c    8000   8       -1    -1     2      00
    3      d    6000   7       -1    -1     3      010
    4      e    20000  9       -1    -1     2      11
    5      f    3000   6       -1    -1     4      0110
    6      Int  7000   7       5     1
    7      Int  13000  8       3     6
    8      Int  21000  10      2     7
    9      Int  36000  10      0     4
    10     Int  57000  0       8     9

Example of Huffman Compression (continued)
    Generating the compressed file
    The compression ratio is 3.389830508474576
[Figure: Huffman tree]

Summary of compress()
Call freqAnalysis()
    Read the file and tabulate the number of occurrences of each byte.
    Compute the size of the file, to support the computation of the compression ratio.
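The frequency-analysis step can be illustrated with a minimal sketch. The class and method names below (FreqAnalysisSketch, countBytes, totalBytes) are hypothetical and are not taken from the textbook's HCompress source; the sketch only assumes that each byte value 0..255 gets one counter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the frequency-analysis step; not the textbook's code.
final class FreqAnalysisSketch {
    // Returns a 256-entry table: counts[v] is how often byte value v occurs.
    static int[] countBytes(InputStream in) {
        int[] counts = new int[256];
        try {
            int b;
            // read() returns a value in 0..255, or -1 at end of stream
            while ((b = in.read()) != -1) {
                counts[b]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // The file size in bytes is the sum of all the counts.
    static long totalBytes(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        int[] counts = countBytes(new ByteArrayInputStream(data));
        System.out.println("a=" + counts['a'] + " b=" + counts['b']
                + " size=" + totalBytes(counts));
    }
}
```

The byte values with a nonzero count become the leaves of the Huffman tree, and the total gives the file size used in the compression ratio.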

Summary of compress() (continued)
Call buildTree()
    Construct the Huffman tree for the file in an array.
Call generateCodes()
    For each leaf node, follow the path to the root and determine the bit code for the byte.
    In the process, determine the cost of the tree, which is the total number of code bits generated.
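As a hedged illustration of the leaf-to-root walk, the sketch below hard-codes the Parent, Left, and Freq columns from the sample output table (the six-symbol demo file). The class name GenerateCodesSketch and its methods are hypothetical; the textbook's generateCodes() stores codes in bit arrays rather than strings.

```java
// Hypothetical sketch: the arrays mirror the Parent/Left/Freq columns of the sample table.
final class GenerateCodesSketch {
    static final int[] PARENT = {9, 6, 8, 7, 9, 6, 7, 8, 10, 10, 0};
    static final int[] LEFT = {-1, -1, -1, -1, -1, -1, 5, 3, 2, 0, 8};
    static final int[] FREQ = {16000, 4000, 8000, 6000, 20000, 3000}; // leaves a..f
    static final int ROOT = 10;

    // Walk from a leaf up to the root, collecting one bit per edge:
    // 0 if the node is its parent's left child, 1 if it is the right child.
    static String codeFor(int leaf) {
        StringBuilder bits = new StringBuilder();
        int child = leaf;
        while (child != ROOT) {
            int parent = PARENT[child];
            bits.append(LEFT[parent] == child ? '0' : '1');
            child = parent;
        }
        // bits were gathered leaf-to-root, but a code is read root-to-leaf
        return bits.reverse().toString();
    }

    // Cost of the tree: sum over the leaves of frequency times code length.
    static long cost() {
        long total = 0;
        for (int leaf = 0; leaf < FREQ.length; leaf++) {
            total += (long) FREQ[leaf] * codeFor(leaf).length();
        }
        return total;
    }

    public static void main(String[] args) {
        for (int leaf = 0; leaf < FREQ.length; leaf++) {
            System.out.println((char) ('a' + leaf) + " -> " + codeFor(leaf));
        }
        System.out.println("cost in bits: " + cost());
    }
}
```

For the sample table the walk reproduces the Bits column (for example, b is 0111), and the cost comes to 134000 bits, or 16750 bytes of code data for the 57000-byte source file.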

Summary of compress() (continued)
Write the 16-bit size of the Huffman tree to the compressed file.
Write the Huffman tree to the compressed file.
Write the total number of bits in the bit codes to the compressed file.
Call writeCompressedData()
    Read the source file again. For each byte, write its bit code to the compressed file.

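Summary of compress() (concluded)
From the actions of compress(), we see that the format of the compressed file is as follows:
    1. Size of the Huffman tree (16 bits)
    2. The Huffman tree
    3. Total number of bits in the bit codes
    4. The bit codes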
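The header portion of this layout can be sketched with java.io.DataOutputStream. The per-node field widths used here (one byte for the symbol, a 16-bit index for each child) are an assumption made for illustration, as are the names HufHeaderSketch and writeHeader; the source does not spell out how a tree node is serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the file header; the node layout is an assumption.
final class HufHeaderSketch {
    static byte[] writeHeader(byte[] sym, short[] left, short[] right, int totalBits) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeShort(sym.length);      // 16-bit size of the Huffman tree
            for (int i = 0; i < sym.length; i++) {
                out.writeByte(sym[i]);       // ASSUMED: 1 byte of data per node
                out.writeShort(left[i]);     // ASSUMED: 16-bit left child index
                out.writeShort(right[i]);    // ASSUMED: 16-bit right child index
            }
            out.writeInt(totalBits);         // total number of bits in the bit codes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();            // the bit codes would follow this header
    }

    public static void main(String[] args) {
        // Three-node tree: leaves 'a' and 'b' under a root at index 2.
        byte[] sym = {'a', 'b', 0};
        short[] left = {-1, -1, 0};
        short[] right = {-1, -1, 1};
        byte[] header = writeHeader(sym, left, right, 16);
        System.out.println("header bytes: " + header.length);
    }
}
```

A reader of the file consumes the fields in the same order, which is what decompress() does with readShort(), the node reads, and readInt().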

Building the Huffman Tree
The class DiskHuffNode contains the data and the locations of the children. Its subclass HuffNode contains the remaining attributes required by the Huffman compression implementation.

Building the Huffman Tree (continued)
The HCompress method buildTree() executes the Huffman algorithm to build the tree.
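A minimal sketch of the Huffman algorithm on an array-based tree follows. It uses java.util.PriorityQueue as the priority queue and parallel int arrays instead of the textbook's HuffNode objects; the class name BuildTreeSketch and its fields are hypothetical. It assumes at least one symbol, and the order in which equal frequencies are removed is unspecified.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch of Huffman tree construction in an array; not the textbook's code.
final class BuildTreeSketch {
    static final int NIL = -1;
    int size;                         // number of nodes: 2 * leaves - 1
    int[] freq, parent, left, right;  // parallel arrays indexed by node

    static BuildTreeSketch build(int[] leafFreq) {
        int n = leafFreq.length;      // assumed to be at least 1
        BuildTreeSketch t = new BuildTreeSketch();
        t.size = 2 * n - 1;
        t.freq = new int[t.size];
        t.parent = new int[t.size];   // the root keeps parent 0, as in the sample table
        t.left = new int[t.size];
        t.right = new int[t.size];
        Arrays.fill(t.left, NIL);
        Arrays.fill(t.right, NIL);

        // The queue holds node indices ordered by frequency, smallest first.
        PriorityQueue<Integer> pq =
                new PriorityQueue<>((x, y) -> Integer.compare(t.freq[x], t.freq[y]));
        for (int i = 0; i < n; i++) {
            t.freq[i] = leafFreq[i];
            pq.add(i);
        }
        // Repeatedly join the two least frequent nodes under a new interior node.
        for (int next = n; next < t.size; next++) {
            int a = pq.remove();
            int b = pq.remove();
            t.freq[next] = t.freq[a] + t.freq[b];
            t.left[next] = a;
            t.right[next] = b;
            t.parent[a] = next;
            t.parent[b] = next;
            pq.add(next);
        }
        return t;                     // the root is at index size - 1
    }

    public static void main(String[] args) {
        // Frequencies of a..f from the sample output.
        BuildTreeSketch t = build(new int[] {16000, 4000, 8000, 6000, 20000, 3000});
        for (int i = 0; i < t.size; i++) {
            System.out.println(i + " " + t.freq[i] + " " + t.parent[i]
                    + " " + t.left[i] + " " + t.right[i]);
        }
    }
}
```

With the sample frequencies there are no ties, so the sketch yields the same 11-node tree as the sample output table, with the root at index 10 and frequency 57000.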

Building the Huffman Tree (continued)
The HCompress method generateCodes() determines the bit codes. To output the bit codes, the method writeCompressedData() declares a BitArray object, compressedData, whose bit size is the cost of the Huffman tree. Upon the conclusion of input, writeCompressedData() calls the write() method of the BitArray class to output the bits to the compressed file.

writeCompressedData()
    // reread the source file and write the
    // Huffman codes specified by the Huffman
    // tree to the stream dest
    private void writeCompressedData() throws IOException
    {
        // vector that will contain the Huffman codes
        // for the compressed file
        BitArray compressedData = new BitArray(totalBits);
        int bitPos, i, j;
        int b;

        // close the source file and reopen it
        source.close();
        source = new DataInputStream(
                     new FileInputStream(fname));

        // bitPos is used to put bits into compressedData
        bitPos = 0;

writeCompressedData() (continued)
        // re-read the source file and generate the Huffman
        // codes in compressedData
        while (true)
        {
            try
            {
                // try to input a byte
                b = source.readUnsignedByte();
            }
            catch (EOFException eofex)
            {
                // we are at end-of-file
                break;
            }

            // index of the tree node containing b
            i = charLoc[b];

writeCompressedData() (concluded)
            // put the bit code for tree[i].b into
            // the bit vector
            for (j = 0; j < tree[i].numberOfBits; j++)
            {
                // only need to call set() if
                // tree[i].bits.bit(j) is 1
                if (tree[i].bits.bit(j) == 1)
                    compressedData.set(bitPos);
                // always advance bitPos
                bitPos++;
            }
        }

        // write the bit codes to the output file
        compressedData.write(dest);
    }
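The bit-packing loop can be tried out without the textbook's BitArray class. The sketch below swaps in java.util.BitSet as a stand-in and represents each code as a string of '0' and '1' characters. The names BitPackingSketch, sampleCodes, pack, and render are hypothetical, and every character of the input is assumed to have a code in the table.

```java
import java.util.BitSet;

// Hypothetical sketch: java.util.BitSet stands in for the textbook's BitArray.
final class BitPackingSketch {
    // Codes for a..f, copied from the Bits column of the sample output.
    static String[] sampleCodes() {
        String[] codeOf = new String[128];
        codeOf['a'] = "10";
        codeOf['b'] = "0111";
        codeOf['c'] = "00";
        codeOf['d'] = "010";
        codeOf['e'] = "11";
        codeOf['f'] = "0110";
        return codeOf;
    }

    // Append the code of every character to one bit vector.
    static BitSet pack(String text, String[] codeOf) {
        BitSet bits = new BitSet();
        int bitPos = 0;
        for (char ch : text.toCharArray()) {
            String code = codeOf[ch];          // assumed non-null for every input char
            for (int j = 0; j < code.length(); j++) {
                // a fresh BitSet is all zeros, so only 1 bits need to be set
                if (code.charAt(j) == '1') {
                    bits.set(bitPos);
                }
                bitPos++;                      // always advance
            }
        }
        return bits;
    }

    // Show the first totalBits bits as text.
    static String render(BitSet bits, int totalBits) {
        StringBuilder sb = new StringBuilder(totalBits);
        for (int i = 0; i < totalBits; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = pack("abe", sampleCodes());
        System.out.println(render(bits, 8));
    }
}
```

A BitSet does not remember trailing zero bits, which illustrates why the compressed file stores the total number of code bits separately from the bits themselves.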

Implementing Huffman Decompression
The class HDecompress performs Huffman decompression. The public method decompress() decodes the file.

Implementing Huffman Decompression (continued)
The HDecompress method decompress() sequences through the bits of the compressed image, tracing paths from the root node to leaf nodes. Each time it reaches a leaf, it writes the corresponding byte to the uncompressed file and starts again at the root.

decompress()
    // decompress the file
    public void decompress() throws IOException
    {
        int i, bitPos;
        // treeSize and totalBits are read from
        // the compressed file
        short treeSize;
        int totalBits;
        int decompressedFileSize = 0;

        textArea.append("Decompressing ... \n");

        // input the Huffman tree size
        treeSize = source.readShort();

        // treeSize DiskHuffNode nodes are read from
        // the compressed file into the tree
        DiskHuffNode[] tree = new DiskHuffNode[treeSize];

decompress() (continued)
        // input the tree
        for (i = 0; i < treeSize; i++)
        {
            tree[i] = new DiskHuffNode();
            tree[i].read(source);
        }

        // input the number of bits of Huffman code
        totalBits = source.readInt();

        // allocate a 1-bit bit array, whose contents
        // we immediately replace by the bits in the
        // compressed file
        BitArray bits = new BitArray(1);

        // read totalBits number of binary bits from
        // the compressed file into bits
        bits.read(source, totalBits);

decompress() (continued)
        // restore the original file by using the
        // Huffman codes to traverse the tree and
        // write out the corresponding characters
        bitPos = 0;
        while (bitPos < totalBits)
        {
            // root of the tree is at index treeSize-1
            i = treeSize-1;

            // follow the bits until we arrive at leaf node
            while (tree[i].left != HuffNode.NIL)
            {
                // if bit is 0, go left; otherwise, go right
                if (bits.bit(bitPos) == 0)
                    i = tree[i].left;
                else
                    i = tree[i].right;

                // we have used the current bit; move to
                // the next one
                bitPos++;
            }

decompress() (concluded)
            // we are at a leaf node; output the
            // character to the file
            dest.writeByte(tree[i].b);
            decompressedFileSize++;
        }

        textArea.append("Decompressed file " +
                        decompressedFileName + " (" +
                        decompressedFileSize + ") characters\n");

        // close the two streams
        source.close();
        dest.close();
        filesOpen = false;
    }
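The decoding loop can be exercised on its own with a small sketch. It hard-codes the Left and Right columns of the sample output table and takes the bit stream as a string of '0' and '1' characters instead of a BitArray; the class name DecodeSketch is hypothetical, and the input is assumed to be a well-formed sequence of complete codes.

```java
// Hypothetical sketch: decodes a bit string with the tree from the sample table.
final class DecodeSketch {
    static final int NIL = -1;
    static final char[] SYM = {'a', 'b', 'c', 'd', 'e', 'f'};            // leaves 0..5
    static final int[] LEFT = {-1, -1, -1, -1, -1, -1, 5, 3, 2, 0, 8};
    static final int[] RIGHT = {-1, -1, -1, -1, -1, -1, 1, 6, 7, 4, 9};

    static String decode(String bits) {
        StringBuilder out = new StringBuilder();
        int root = LEFT.length - 1;          // the root is the last node in the array
        int bitPos = 0;
        while (bitPos < bits.length()) {
            int i = root;
            // follow the bits until we arrive at a leaf node
            while (LEFT[i] != NIL) {
                // 0 goes left, 1 goes right
                i = (bits.charAt(bitPos) == '0') ? LEFT[i] : RIGHT[i];
                bitPos++;
            }
            out.append(SYM[i]);              // at a leaf: emit its symbol
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("10011111"));
    }
}
```

Because no code is a prefix of another, the walk never needs a separator between codes: reaching a leaf marks the end of one code, and the next bit starts again at the root.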

After compressing a file, the following is the resulting Huffman tree and the corresponding bit stream. Decompress and determine the original file.
[Figure: Huffman tree and bit stream]
Original file: abbaccabcaabbcaa

Program 23-2
The application Program23_2.java in Chapter 23 of the software supplement is a GUI application that uses the Huffman algorithms. The figure provides a snapshot of the running application.