Data Compression and Huffman Trees (HW 4) Data Structures Fall 2008 Modified by Eugene Weinstein.

Presentation transcript:

Data Compression and Huffman Trees (HW 4) Data Structures Fall 2008 Modified by Eugene Weinstein

Representing Text (ASCII) A way of representing characters as bits.
– Characters are 'a', 'b', '1', '%', '\n', '\t', …
– Each character is represented by a unique 7-bit code, so there are 128 possible characters.
– This is a STATIC-LENGTH (fixed-length) ENCODING: to encode a long text, we encode it character by character.
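To make the fixed-length idea concrete, here is a minimal Java sketch (illustrative only, not part of the homework) that encodes a string character by character into 7-bit codes:

// A minimal sketch of fixed-length (ASCII-style) encoding: every
// character gets the same 7 bits, regardless of how frequent it is.
public class FixedLengthDemo {
    public static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            // Binary form of the code point, left-padded to width 7.
            String code = Integer.toBinaryString(c);
            while (code.length() < 7) code = "0" + code;
            bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // 'a' = 1100001, 'b' = 1100010
        System.out.println(encode("ab")); // prints 11000011100010
    }
}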

Inefficiency of ASCII
– Realization: in many natural files we are much more likely to see the letter 'e' than the character '&', yet both are encoded using 7 bits!
– Solution: use variable-length encoding! The encoding for 'e' should be shorter than the encoding for '&'.

Variable-Length Coding
– Assume we know the distribution of characters (e.g., 'e' appears 1000 times, '&' appears 1 time).
– Each character will be encoded using a number of bits roughly inversely proportional to its frequency (made precise later).
– Need a 'prefix-free' encoding: if 'e' = 001, then we cannot assign '&' any code that begins with 001. Since the encoding is variable-length, the decoder must be able to tell where each code ends.

Encoding Trees
– Think of the encoding as an (unbalanced) binary tree, where a left edge is a 0 and a right edge is a 1.
– Data is in leaf nodes only (this is what makes the code prefix-free).
– Example: 'e' = 0, 'a' = 10, 'b' = 11 (in the slide's tree, 'e' is the left child of the root, and 'a', 'b' hang under the right child).
– How to decode '01110'?
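A minimal Java sketch of tree-based decoding, assuming the example code above ('e' = 0, 'a' = 10, 'b' = 11); the Node class here is illustrative, not the homework's HuffmanNode:

public class DecodeDemo {
    static class Node {
        char letter;       // meaningful only at leaves
        Node left, right;  // left = bit 0, right = bit 1
        Node(char letter) { this.letter = letter; }
        Node(Node left, Node right) { this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    public static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.left : cur.right;
            if (cur.isLeaf()) {          // hit a leaf: emit its letter,
                out.append(cur.letter);  // then restart from the root
                cur = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node root = new Node(new Node('e'), new Node(new Node('a'), new Node('b')));
        System.out.println(decode(root, "01110")); // prints "eba"
    }
}

Running it answers the slide's question: '01110' splits as 0 | 11 | 10 and decodes to "eba".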

Cost of a Tree
– For each character c_i, let f_i be its frequency in the file.
– Given an encoding tree T, let d_i be the depth of c_i in the tree (the number of bits needed to encode that character).
– The length of the file after encoding it with the coding scheme defined by T will be C(T) = Σ_i d_i · f_i.
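For instance, take the tree above with made-up frequencies f_e = 1000, f_a = 500, f_b = 250 and depths d_e = 1, d_a = d_b = 2. Then C(T) = 1000·1 + 500·2 + 250·2 = 2500 bits, whereas 7-bit ASCII would spend 7 · 1750 = 12250 bits on the same 1750 characters.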

Creating an Optimal T
Problem: find a tree T for which C(T) is minimal.
Solution (Huffman 1952):
– Create a one-node tree for each character; the weight W(T) of a tree is the frequency of its character.
– Repeat n-1 times (n = number of characters):
  – Select the two trees T', T'' with the lowest weights.
  – Merge them together to form a new tree T, and set W(T) = W(T') + W(T'').
Implement using a min-heap. What is the running time? (See the sketch below.)
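A minimal sketch of the merging loop in Java, using java.util.PriorityQueue as a stand-in for the course's BinaryHeap; the Node class and build method are illustrative, not the homework classes. With a min-heap, the n initial inserts and n-1 merges each cost O(log n), so the whole construction runs in O(n log n).

import java.util.PriorityQueue;

public class HuffmanDemo {
    static class Node implements Comparable<Node> {
        char letter;   // meaningful only at leaves
        int weight;    // character frequency (leaf) or combined weight (internal)
        Node left, right;
        Node(char letter, int weight) { this.letter = letter; this.weight = weight; }
        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.weight = left.weight + right.weight;  // W(T) = W(T') + W(T'')
        }
        public int compareTo(Node other) { return Integer.compare(weight, other.weight); }
    }

    public static Node build(char[] letters, int[] freqs) {
        PriorityQueue<Node> heap = new PriorityQueue<>();
        for (int i = 0; i < letters.length; i++)
            heap.add(new Node(letters[i], freqs[i]));  // one tree per character
        while (heap.size() > 1) {                      // repeat n-1 times
            Node a = heap.poll();                      // two lowest-weight trees
            Node b = heap.poll();
            heap.add(new Node(a, b));                  // merge them
        }
        return heap.poll();                            // root of the Huffman tree
    }
}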

Optimality Intuition
– Need to show that Huffman's algorithm indeed results in a tree T with optimal (minimal) C(T) = Σ_i d_i · f_i.
– The two least-frequent letters should be at the bottom of the tree as siblings (otherwise we could swap one of them with a deeper, more frequent letter and improve the cost).
– Intuitively, when we combine two trees we can think of the result as a single new letter with their combined weight, and apply the same argument again.

Homework
Implement:
– public class HuffmanTree: has a traversal/code-printing method.
– public class HuffmanNode (Comparable): contains a letter and an integer frequency; has accessor (getter) methods.
– public class BinaryHeap (given in class).
Read a file 'huff.txt' which includes letters and frequencies:
– A 20 E 24 G 3 H 4 I 17 L 6 N 5 O 10 S 8 V 1 W 2
Create a Huffman tree (algorithm: see the book). Print the "legend": the code of each character, as sketched below.
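One way to print the legend is a depth-first traversal that accumulates the code along each root-to-leaf path. This is a minimal sketch of a method one might add to the illustrative HuffmanDemo class above; it is not the homework's HuffmanTree method.

// Inside HuffmanDemo: print each character's code, e.g. printLegend(root, "").
static void printLegend(Node node, String code) {
    if (node == null) return;
    if (node.left == null && node.right == null) {
        System.out.println(node.letter + " = " + code);  // leaf: emit its code
        return;
    }
    printLegend(node.left, code + "0");   // a left edge contributes a 0
    printLegend(node.right, code + "1");  // a right edge contributes a 1
}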

Tips and Implementation Notes
– HuffmanNode should be Comparable to work with BinaryHeap. How to implement the compareTo method? (See the sketch below.)
– Implement a toString method in BinaryHeap, and print the heap after every rearrangement (useful for debugging).
– Understand the binary heap operations: insert and deleteMin.
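One natural answer: compare nodes by frequency, so that BinaryHeap's deleteMin hands back the least frequent tree first. The sketch below assumes the fields named in the homework slide (a letter and an integer frequency); the exact field and accessor names are assumptions, not given in the slides.

public class HuffmanNode implements Comparable<HuffmanNode> {
    private char letter;     // the character this (leaf) node represents
    private int frequency;   // its frequency, read from huff.txt
    // child references for the tree would go here as well

    public HuffmanNode(char letter, int frequency) {
        this.letter = letter;
        this.frequency = frequency;
    }

    public char getLetter()   { return letter; }      // accessor (getter) methods
    public int getFrequency() { return frequency; }

    // Order nodes by frequency: lower frequency = "smaller", so it
    // comes out of the min-heap (deleteMin) first.
    public int compareTo(HuffmanNode other) {
        return Integer.compare(this.frequency, other.frequency);
    }
}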