Huffman Coding: An Application of Binary Trees and Priority Queues

Slides:



Advertisements
Similar presentations
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Advertisements

Lecture 4 (week 2) Source Coding and Compression
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
22C:19 Discrete Structures Trees Spring 2014 Sukumar Ghosh.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture3.
Greedy Algorithms (Huffman Coding)
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
Compression & Huffman Codes
CSCI 3 Chapter 1.8 Data Compression. Chapter 1.8 Data Compression  For the purpose of storing or transferring data, it is often helpful to reduce the.
A Data Compression Algorithm: Huffman Compression
CS 206 Introduction to Computer Science II 04 / 29 / 2009 Instructor: Michael Eckmann.
Chapter 9: Huffman Codes
CS 206 Introduction to Computer Science II 12 / 10 / 2008 Instructor: Michael Eckmann.
Lossless Data Compression Using run-length and Huffman Compression pages
Data Compression Basics & Huffman Coding
1 Lossless Compression Multimedia Systems (Module 2) r Lesson 1: m Minimum Redundancy Coding based on Information Theory: Shannon-Fano Coding Huffman Coding.
Huffman code uses a different number of bits used to encode characters: it uses fewer bits to represent common characters and more bits to represent rare.
x x x 1 =613 Base 10 digits {0...9} Base 10 digits {0...9}
CSE Lectures 22 – Huffman codes
Data Structures and Algorithms Huffman compression: An Application of Binary Trees and Priority Queues.
Topic 20: Huffman Coding The author should gaze at Noah, and... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass.
CS 46B: Introduction to Data Structures July 30 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak.
Chapter 2 Source Coding (part 2)
Algorithm Design & Analysis – CS632 Group Project Group Members Bijay Nepal James Hansen-Quartey Winter
MA/CSSE 473 Day 31 Student questions Data Compression Minimal Spanning Tree Intro.
Huffman Codes. Encoding messages  Encode a message composed of a string of characters  Codes used by computer systems  ASCII uses 8 bits per character.
Huffman Encoding Veronica Morales.
1 Analysis of Algorithms Chapter - 08 Data Compression.
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
CS-2852 Data Structures LECTURE 13B Andrew J. Wozniewicz Image copyright © 2010 andyjphoto.com.
Data Structures Week 6: Assignment #2 Problem
1 Huffman Codes Drozdek Chapter Objectives You will be able to Construct an optimal variable bit length code for an alphabet with known probability.
 The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
Huffman Coding. Huffman codes can be used to compress information –Like WinZip – although WinZip doesn’t use the Huffman algorithm –JPEGs do use Huffman.
1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-217, ext: 1204, Lecture 4 (Week 2)
Huffman coding Content 1 Encoding and decoding messages Fixed-length coding Variable-length coding 2 Huffman coding.
Priority Queues, Trees, and Huffman Encoding CS 244 This presentation requires Audio Enabled Brent M. Dingle, Ph.D. Game Design and Development Program.
Huffman Codes Juan A. Rodriguez CS 326 5/13/2003.
Huffman’s Algorithm 11/02/ Weighted 2-tree A weighted 2-tree T is an extended binary tree with n external nodes and each of the external nodes is.
Main Index Contents 11 Main Index Contents Complete Binary Tree Example Complete Binary Tree Example Maximum and Minimum Heaps Example Maximum and Minimum.
1 Algorithms CSCI 235, Fall 2015 Lecture 30 More Greedy Algorithms.
Lossless Decomposition and Huffman Codes Sophia Soohoo CS 157B.
1 Data Compression Hae-sun Jung CS146 Dr. Sin-Min Lee Spring 2004.
1 Huffman Codes. 2 ASCII use same size encoding for all characters. Variable length codes can produce shorter messages than fixed length codes Huffman.
Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5.
Lecture 12 Huffman Algorithm. In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly.
Lecture on Data Structures(Trees). Prepared by, Jesmin Akhter, Lecturer, IIT,JU 2 Properties of Heaps ◈ Heaps are binary trees that are ordered.
Design & Analysis of Algorithm Huffman Coding
Huffman Codes ASCII is a fixed length 7 bit code that uses the same number of bits to define each character regardless of how frequently it occurs. Huffman.
Expression Tree The inner nodes contain operators while leaf nodes contain operands. a c + b g * d e f Start of lecture 25.
HUFFMAN CODES.
Review GitHub Project Guide Huffman Coding
Assignment 6: Huffman Code Generation
Madivalappagouda Patil
ISNE101 – Introduction to Information Systems and Network Engineering
Data Compression If you’ve ever sent a large file to a friend, you may have compressed it into a zip archive like the one on this slide before doing so.
Chapter 8 – Binary Search Tree
Huffman Compression.
Chapter 9: Huffman Codes
Huffman Coding.
Advanced Algorithms Analysis and Design
Huffman Coding CSE 373 Data Structures.
Huffman Encoding Huffman code is method for the compression for standard text documents. It makes use of a binary tree to develop codes of varying lengths.
Data Structure and Algorithms
Topic 20: Huffman Coding The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass.
Podcast Ch23d Title: Huffman Compression
Algorithms CSCI 235, Spring 2019 Lecture 30 More Greedy Algorithms
Huffman Coding Greedy Algorithm
Algorithms CSCI 235, Spring 2019 Lecture 31 Huffman Codes
Presentation transcript:

Huffman Coding: An Application of Binary Trees and Priority Queues CS 102

Encoding and Compression of Data Fax Machines ASCII Variations on ASCII min number of bits needed cost of savings patterns modifications CS 102

Purpose of Huffman Coding An Introduction to Huffman Coding March 21, 2000 Purpose of Huffman Coding Proposed by Dr. David A. Huffman in 1952 “A Method for the Construction of Minimum Redundancy Codes” Applicable to many forms of data transmission Our example: text files CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 The Basic Algorithm Huffman coding is a form of statistical coding Not all characters occur with the same frequency! Yet all characters are allocated the same amount of space 1 char = 1 byte, be it e or x CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 The Basic Algorithm Any savings in tailoring codes to frequency of character? Code word lengths are no longer fixed like ASCII. Code word lengths vary and will be shorter for the more frequently used characters. CS 102 Mike Scott

The (Real) Basic Algorithm An Introduction to Huffman Coding March 21, 2000 The (Real) Basic Algorithm 1. Scan text to be compressed and tally occurrence of all characters. 2. Sort or prioritize characters based on number of occurrences in text. 3. Build Huffman code tree based on prioritized list. 4. Perform a traversal of tree to determine all code words. 5. Scan text again and create new file using the Huffman codes. CS 102 Mike Scott

Building a Tree Scan the original text An Introduction to Huffman Coding March 21, 2000 Building a Tree Scan the original text Consider the following short text: Eerie eyes seen near lake. Count up the occurrences of all characters in the text CS 102 Mike Scott

Building a Tree Scan the original text An Introduction to Huffman Coding March 21, 2000 Building a Tree Scan the original text Eerie eyes seen near lake. What characters are present? E e r i space y s n a r l k . CS 102 Mike Scott

Building a Tree Scan the original text An Introduction to Huffman Coding March 21, 2000 Building a Tree Scan the original text Eerie eyes seen near lake. What is the frequency of each character in the text? Char Freq. Char Freq. Char Freq. E 1 y 1 k 1 e 8 s 2 . 1 r 2 n 2 i 1 a 2 space 4 l 1 CS 102 Mike Scott

Building a Tree Prioritize characters An Introduction to Huffman Coding March 21, 2000 Building a Tree Prioritize characters Create binary tree nodes with character and frequency of each character Place nodes in a priority queue The lower the occurrence, the higher the priority in the queue CS 102 Mike Scott

Building a Tree Prioritize characters An Introduction to Huffman Coding March 21, 2000 Building a Tree Prioritize characters Uses binary tree nodes public class HuffNode { public char myChar; public int myFrequency; public HuffNode myLeft, myRight; } priorityQueue myQueue; CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree The queue after inserting all nodes Null Pointers are not shown E 1 i y l k . r 2 s n a sp 4 e 8 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree While priority queue contains two or more nodes Create new node Dequeue node and make it left subtree Dequeue next node and make it right subtree Frequency of new node equals sum of frequency of left and right children Enqueue new node back into queue CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree E 1 i y l k . r 2 s n a sp 4 e 8 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8 2 E 1 i 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 E 1 i 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 E 1 i 1 2 y 1 l 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 2 k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 y 1 l 1 E 1 i 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree r 2 s 2 n 2 a 2 2 2 sp 4 e 8 y 1 l 1 E 1 i 1 2 k 1 . 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree r 2 s 2 n 2 a 2 2 sp 4 e 8 2 2 k 1 . 1 E 1 i 1 y 1 l 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree n 2 a 2 2 sp 4 e 8 2 2 E 1 i 1 y 1 l 1 k 1 . 1 4 r 2 s 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree n 2 a 2 2 e 8 sp 4 2 4 2 k 1 . 1 E 1 i 1 r 2 s 2 y 1 l 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree e 8 2 4 2 2 sp 4 r 2 s 2 y 1 l 1 k 1 . 1 E 1 i 1 4 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree e 8 2 4 4 2 2 sp 4 r 2 s 2 n 2 a 2 y 1 l 1 k 1 . 1 E 1 i 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree e 8 4 4 2 sp 4 r 2 s 2 n 2 a 2 k 1 . 1 4 2 2 E 1 i 1 y 1 l 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 4 4 4 2 sp 4 e 8 2 2 r 2 s 2 n 2 a 2 k 1 . 1 E 1 i 1 y 1 l 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 4 4 4 e 8 2 2 r 2 s 2 n 2 a 2 E 1 i 1 y 1 l 1 6 2 sp 4 k 1 . 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 4 4 6 4 e 8 2 sp 4 2 2 r 2 s 2 n 2 a 2 k 1 . 1 E 1 i 1 y 1 l 1 What is happening to the characters with a low number of occurrences? CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 4 6 e 8 2 2 2 sp 4 k 1 . 1 E 1 i 1 y 1 l 1 8 4 4 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 4 6 e 8 8 2 2 2 sp 4 4 4 k 1 . 1 E 1 i 1 y 1 l 1 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 8 e 8 4 4 10 r 2 s 2 n 2 a 2 4 6 2 2 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 8 e 8 10 4 4 4 6 2 2 r 2 s 2 n 2 a 2 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 10 16 4 6 2 2 e 8 8 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 4 4 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 10 16 4 6 e 8 8 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree 26 16 10 4 e 8 8 6 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree After enqueueing this node there is only one node left in priority queue. 26 16 10 4 e 8 8 6 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Building a Tree Dequeue the single node left in the queue. This tree contains the new code words for each character. Frequency of root node should equal number of characters in text. E 1 i sp 4 e 8 2 y l k . r s n a 6 10 16 26 Eerie eyes seen near lake.  26 characters CS 102 Mike Scott

Encoding the File Traverse Tree for Codes An Introduction to Huffman Coding March 21, 2000 Encoding the File Traverse Tree for Codes Perform a traversal of the tree to obtain new code words Going left is a 0 going right is a 1 code word is only completed when a leaf node is reached E 1 i sp 4 e 8 2 y l k . r s n a 6 10 16 26 CS 102 Mike Scott

Encoding the File Traverse Tree for Codes An Introduction to Huffman Coding March 21, 2000 Encoding the File Traverse Tree for Codes Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 E 1 i sp 4 e 8 2 y l k . r s n a 6 10 16 26 CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Encoding the File Rescan text and encode file using new code words Eerie eyes seen near lake. Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 0000101100000110011100010101101101001111101011111100011001111110100100101 Why is there no need for a separator character? . CS 102 Mike Scott

Encoding the File Results An Introduction to Huffman Coding March 21, 2000 Encoding the File Results Have we made things any better? 73 bits to encode the text ASCII would take 8 * 26 = 208 bits 0000101100000110011100010101101101001111101011111100011001111110100100101 If modified code used 4 bits per character are needed. Total bits 4 * 26 = 104. Savings not as great. CS 102 Mike Scott

An Introduction to Huffman Coding March 21, 2000 Decoding the File How does receiver know what the codes are? Tree constructed for each text file. Considers frequency for each file Big hit on compression, especially for smaller files Tree predetermined based on statistical analysis of text files or file types Data transmission is bit based versus byte based CS 102 Mike Scott

Decoding the File Once receiver has tree it scans incoming bit stream 0  go left 1  go right E 1 i sp 4 e 8 2 y l k . r s n a 6 10 16 26 10100011011110111101111110000110101 CS 102

Summary Huffman coding is a technique used to compress files for transmission Uses statistical coding more frequently used symbols have shorter code words Works well for text and fax transmissions An application that uses several data structures CS 102