UTILITIES Group 3 Xin Li Soma Reddy. Data Compression To reduce the size of files stored on disk and to increase the effective rate of transmission by.

Presentation transcript:

UTILITIES Group 3 Xin Li Soma Reddy

Data Compression
To reduce the size of files stored on disk and to increase the effective rate of transmission by modems.

A Standard coding scheme

File Compression
Compression
– Reducing the number of bits required for data representation.
Two phases
– The encoding phase (compressing)
– The decoding phase (uncompressing)
Strategy
– Ensure that the most frequent characters have the shortest representation.

A Binary Trie
– A left branch represents 0 and a right branch represents 1.
– The path to a node indicates its representation.
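The path rule above can be sketched in a few lines of Java. This is only an illustration: the names TrieSketch, Node, insert and decode are hypothetical and are not the classes used later in this presentation, and the three codes in main are made up for the example.

```java
// Hypothetical sketch of a binary trie; not the presentation's HuffmanTree class.
class TrieSketch {
    static final class Node {
        Node left, right;   // left branch = 0, right branch = 1
        char symbol;        // meaningful only at a leaf
        boolean isLeaf() { return left == null && right == null; }
    }

    // Place a symbol at the node reached by following its bit string.
    static void insert(Node root, char symbol, String bits) {
        Node cur = root;
        for (char b : bits.toCharArray()) {
            if (b == '0') {
                if (cur.left == null) cur.left = new Node();
                cur = cur.left;
            } else {
                if (cur.right == null) cur.right = new Node();
                cur = cur.right;
            }
        }
        cur.symbol = symbol;
    }

    // Decode by walking from the root; restart at the root after each leaf.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char b : bits.toCharArray()) {
            cur = (b == '0') ? cur.left : cur.right;
            if (cur == null) throw new IllegalArgumentException("no such path");
            if (cur.isLeaf()) {
                out.append(cur.symbol);
                cur = root;
            }
        }
        if (cur != root) throw new IllegalArgumentException("trailing bits");
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, 'a', "0");   // made-up codes for illustration
        insert(root, 'b', "10");
        insert(root, 'c', "11");
        System.out.println(decode(root, "010110")); // prints abca
    }
}
```

Because every symbol sits in a leaf, the decoder never has to look ahead: reaching a leaf always ends exactly one character.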

Representation of the original code by a tree

A Slightly Better Tree

A Full Tree
All nodes either are leaves or have two children.

A Prefix Code
– No character code is a prefix of another character code.
– Guaranteed if the characters are only in leaves.
– Can be decoded unambiguously.
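The prefix property can be checked directly on a list of code strings. The sketch below is a simple quadratic check, assuming codes are given as strings of '0' and '1'; PrefixCheckSketch and isPrefixFree are hypothetical names, and the sample codes are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the presentation's code.
class PrefixCheckSketch {
    // True when no code is a prefix of a different entry in the list.
    // Duplicate codes count as a violation, since they cannot be told apart.
    static boolean isPrefixFree(List<String> codes) {
        for (int i = 0; i < codes.size(); i++) {
            for (int j = 0; j < codes.size(); j++) {
                if (i != j && codes.get(j).startsWith(codes.get(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(Arrays.asList("0", "10", "11"))); // prints true
        System.out.println(isPrefixFree(Arrays.asList("0", "01", "11"))); // prints false
    }
}
```

In the second list, "0" is a prefix of "01", so a decoder reading a 0 could not tell whether the character is complete.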

An Optimal Prefix Code Tree

Optimal Prefix Code

Huffman’s Algorithm
– Constructs an optimal prefix code.
– The weight of a tree is the sum of the frequencies of its leaves.
– Works by repeatedly merging the two minimum-weight trees.
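A minimal sketch of the merging loop, assuming a priority queue ordered by tree weight. HuffmanSketch, Node, build, collect and totalBits are hypothetical names rather than the presentation's HuffmanTree class, and the frequencies in main are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; not the HuffmanTree class from the presentation.
class HuffmanSketch {
    static final class Node implements Comparable<Node> {
        final int weight;       // sum of the leaf frequencies below this node
        final char symbol;      // meaningful only at a leaf
        final Node left, right;

        Node(char symbol, int weight) {
            this.symbol = symbol; this.weight = weight;
            this.left = null; this.right = null;
        }
        Node(Node left, Node right) {
            this.symbol = '\0'; this.weight = left.weight + right.weight;
            this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
        @Override public int compareTo(Node o) { return Integer.compare(weight, o.weight); }
    }

    // Repeatedly merge the two minimum-weight trees until one tree remains.
    // Assumes freq is non-empty.
    static Node build(Map<Character, Integer> freq) {
        PriorityQueue<Node> forest = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue()));
        }
        while (forest.size() > 1) {
            Node a = forest.poll();
            Node b = forest.poll();
            forest.add(new Node(a, b));
        }
        return forest.poll();
    }

    // Read each code off the tree: left adds 0, right adds 1.
    static void collect(Node n, String path, Map<Character, String> codes) {
        if (n.isLeaf()) {
            codes.put(n.symbol, path.isEmpty() ? "0" : path); // single-symbol input
            return;
        }
        collect(n.left, path + "0", codes);
        collect(n.right, path + "1", codes);
    }

    // Total encoded length in bits: sum of frequency times code length.
    static long totalBits(Map<Character, Integer> freq, Map<Character, String> codes) {
        long bits = 0;
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            long f = e.getValue();
            bits += f * codes.get(e.getKey()).length();
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 45); freq.put('b', 13); freq.put('c', 12); // made-up counts
        freq.put('d', 16); freq.put('e', 9);  freq.put('f', 5);
        Map<Character, String> codes = new HashMap<>();
        collect(build(freq), "", codes);
        System.out.println(codes + " total bits: " + totalBits(freq, codes));
    }
}
```

With these counts the merges produce trees of weight 14, 25, 30, 55 and 100, so the encoded text needs 224 bits, against 300 bits for a fixed 3-bit code over the same 100 characters.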

Initial Stage of Huffman’s Algorithm

Huffman’s Algorithm After the First Merge

Huffman’s Algorithm After the Second Merge

Huffman’s Algorithm After the Third Merge

Huffman’s Algorithm After the Fourth Merge

Huffman’s Algorithm After the Fifth Merge

Huffman’s Algorithm After the Final Merge

Implementation
– BitInputStream Class
– BitOutputStream Class
– CharCounter Class
– HuffmanTree Class
– Hzip Class
– HZIPInputStream Class
– HZIPOutputStream Class

BitInputStream Class
Wraps an InputStream and provides bit-at-a-time input
Main Methods:
– readBit: reads one bit as a 0 or 1
– getBit: gets an individual bit in an 8-bit byte
– close: closes the underlying stream
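One way such a wrapper might look is sketched below. It is not the presentation's BitInputStream: the class name BitInputSketch, the -1 end-of-stream return value, and the bit order (least significant bit first) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of bit-at-a-time input; bit order is an assumption.
class BitInputSketch {
    private final InputStream in;
    private int buffer;         // the byte currently being consumed
    private int bufferPos = 8;  // 8 means the buffer is exhausted

    BitInputSketch(InputStream in) { this.in = in; }

    // Returns 0 or 1, or -1 once the underlying stream is exhausted.
    int readBit() throws IOException {
        if (bufferPos == 8) {
            buffer = in.read();
            if (buffer == -1) return -1;
            bufferPos = 0;
        }
        return getBit(buffer, bufferPos++);
    }

    // Extracts bit number pos (0 = least significant) from an 8-bit value.
    static int getBit(int pack, int pos) { return (pack >> pos) & 1; }

    void close() throws IOException { in.close(); }

    public static void main(String[] args) throws IOException {
        BitInputSketch bits = new BitInputSketch(new ByteArrayInputStream(new byte[] {5}));
        StringBuilder sb = new StringBuilder();
        int bit;
        while ((bit = bits.readBit()) != -1) sb.append(bit);
        bits.close();
        System.out.println(sb); // prints 10100000
    }
}
```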

BitOutputStream Class
Wraps an OutputStream and provides bit-at-a-time output
Main Methods:
– writeBit: writes one bit (0 or 1)
– writeBits: writes an array of bits
– setBit: sets an individual bit in an 8-bit byte
– flush: flushes buffered bits
– close: closes the underlying stream
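The output side can be sketched the same way. Again this is an illustration, not the presentation's BitOutputStream: the name BitOutputSketch, the least-significant-bit-first order, and the zero padding of a partial final byte are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of bit-at-a-time output; bit order is an assumption.
class BitOutputSketch {
    private final OutputStream out;
    private int buffer;     // bits collected so far
    private int bufferPos;  // number of bits currently in the buffer (0..7)

    BitOutputSketch(OutputStream out) { this.out = out; }

    void writeBit(int bit) throws IOException {
        buffer = setBit(buffer, bufferPos++, bit);
        if (bufferPos == 8) flush();
    }

    void writeBits(int[] bits) throws IOException {
        for (int b : bits) writeBit(b);
    }

    // Writes any buffered bits as one byte; unused high bits stay 0.
    void flush() throws IOException {
        if (bufferPos == 0) return;
        out.write(buffer);
        out.flush();
        buffer = 0;
        bufferPos = 0;
    }

    void close() throws IOException {
        flush();
        out.close();
    }

    // Returns pack with bit number pos (0 = least significant) set to bit.
    static int setBit(int pack, int pos, int bit) {
        return bit == 1 ? pack | (1 << pos) : pack & ~(1 << pos);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BitOutputSketch bits = new BitOutputSketch(sink);
        bits.writeBits(new int[] {1, 0, 1});
        bits.close();
        System.out.println(sink.toByteArray()[0]); // prints 5
    }
}
```

Since the last byte is padded with zero bits, a decoder needs some other way to know where the data ends, for example stored character counts or an end-of-file symbol.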

CharCounter Class
Maintains character counts
Main Methods:
– getCount: returns the number of occurrences of a character
– setCount: sets the number of occurrences of a character
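A counter of this kind could be as small as the sketch below. CharCounterSketch is a hypothetical stand-in: it assumes one counter per byte value (256 slots) and a constructor that counts every byte of a stream, which may differ from the presentation's CharCounter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the 256-slot table is an assumption.
class CharCounterSketch {
    private final int[] counts = new int[256]; // one slot per byte value

    CharCounterSketch() { }

    // Counts every byte read from the stream until end of input.
    CharCounterSketch(InputStream in) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            counts[ch]++;
        }
    }

    int getCount(int ch) { return counts[ch & 0xff]; }

    void setCount(int ch, int count) { counts[ch & 0xff] = count; }

    public static void main(String[] args) throws IOException {
        byte[] data = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        CharCounterSketch cc = new CharCounterSketch(new ByteArrayInputStream(data));
        System.out.println(cc.getCount('a')); // prints 5
    }
}
```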

HuffmanTree Class
Manipulates Huffman coding trees
Main Methods:
– getCode: obtains the code of a given character
– getChar: obtains the character for a given code
– createTree: constructs the Huffman coding tree

HuffmanTree Class (cont)
Main Methods:
– writeEncodingTable: writes an encoding table to an output stream
– readEncodingTable: reads the encoding table from an input stream

Hzip Class
Main Methods:
– compress: adds “.huf” to the filename
– uncompress: adds “.uc” to the filename
– main

HZIPInputStream Class
Contains an uncompression wrapper
Main Method:
– read: returns an uncompressed byte from the wrapped input stream

HZIPOutputStream Class
Contains a compression wrapper
– Writes to HZIPOutputStream are compressed and sent to the output stream being wrapped.
– No writing is actually done until close.
Main Method:
– close

Programming Project Part 1
– Storing the character counts in the encoding table gives the uncompression algorithm the ability to perform extra consistency checks.
– Code is added to verify that the result of the uncompression has the same character counts as the encoding table claimed.

Part 1 Implementation
Add several public methods.
In HZIPInputStream class:
    public HuffmanTree getTree() {
        return codeTree;
    }
In HuffmanTree class:
    public CharCounter getCharCounter() {
        return theCounts;
    }

Part 1 Implementation (cont)
In Hzip class, uncompress method:
    // Counts stored in the encoding table of the compressed file.
    HuffmanTree tree = hzin.getTree();
    CharCounter newcc1 = tree.getCharCounter();
    // Counts recomputed from the stream in.
    CharCounter newcc2 = new CharCounter(in);
    for (int i = 0; i < BitUtils.DIFF_BYTES; i++) {
        if (newcc2.getCount(i) != newcc1.getCount(i)) {
            System.out.println("There is an error in the uncompressing process.");
            File file1 = new File(inFile);
            file1.delete();
            break; // one mismatch is enough to report the error
        }
    }

Part 2
Check the size of the resulting compressed file and abort if the size is larger than or equal to the original.

Part 2 Implementation
In Hzip class, compress method:
    File originFile = new File(inFile);
    File compreFile = new File(compressedFile);
    if (originFile.length() < compreFile.length()) {
        System.out.println("The size of the resulting compressed file is larger than the original.");
        compreFile.delete();
        return;
    } else if (originFile.length() == compreFile.length()) {
        System.out.println("The size of the resulting compressed file is equal to the original.");
        compreFile.delete();
        return;
    }

Run Example
To compress a text file whose size is six bytes:
    C:\>set path=c:/j2sdk1.4.1_01/bin
    C:\>javac Hzip.java
    C:\>javac HZIPInputStream.java
    C:\>javac HZIPOutputStream.java
    C:\>java Hzip -c file1.txt
    The size of the resulting compressed file is larger than the original.
    C:\>

Conclusion
– Text compression is an important technique that allows us to increase both effective disk capacity and effective modem speed.
– It is an area of active research.
– Huffman’s algorithm typically achieves compression of 25% on text files.