Lecture 12 Huffman Algorithm. In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly.

Slides:



Advertisements
Similar presentations
Compression techniques. Why we need compression. Types of compression –Lossy and lossless Concentrate on lossless techniques. Run Length coding. Entropy.
Advertisements

Introduction to Computer Science 2 Lecture 7: Extended binary trees
Lecture 4 (week 2) Source Coding and Compression
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Huffman code and ID3 Prof. Sin-Min Lee Department of Computer Science.
Binary Trees CSC 220. Your Observations (so far data structures) Array –Unordered Add, delete, search –Ordered Linked List –??
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture3.
SIMS-201 Compressing Information. 2  Overview Chapter 7: Compression Introduction Entropy Huffman coding Universal coding.
Huffman Encoding Dr. Bernard Chen Ph.D. University of Central Arkansas.
Greedy Algorithms (Huffman Coding)
Lecture 10 : Huffman Encoding Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University Lecture notes : courtesy.
Data Compressor---Huffman Encoding and Decoding. Huffman Encoding Compression Typically, in files and messages, Each character requires 1 byte or 8 bits.
Lecture04 Data Compression.
Compression & Huffman Codes
Huffman Encoding 16-Apr-17.
1 Huffman Codes. 2 Introduction Huffman codes are a very effective technique for compressing data; savings of 20% to 90% are typical, depending on the.
Huffman Coding: An Application of Binary Trees and Priority Queues
Is ASCII the only way? For computers to do anything (besides sit on a desk and collect dust) they need two things: 1. PROGRAMS 2. DATA A program is a.
Compression & Huffman Codes Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
CS 206 Introduction to Computer Science II 04 / 29 / 2009 Instructor: Michael Eckmann.
Document and Query Forms Chapter 2. 2 Document & Query Forms Q 1. What is a document? A document is a stored data record in any form A document is a stored.
Chapter 9: Huffman Codes
CS 206 Introduction to Computer Science II 12 / 10 / 2008 Instructor: Michael Eckmann.
CSE 143 Lecture 18 Huffman slides created by Ethan Apter
Lossless Data Compression Using run-length and Huffman Compression pages
1 Lossless Compression Multimedia Systems (Module 2) r Lesson 1: m Minimum Redundancy Coding based on Information Theory: Shannon-Fano Coding Huffman Coding.
Huffman Codes Message consisting of five characters: a, b, c, d,e
CSE Lectures 22 – Huffman codes
Data Structures and Algorithms Huffman compression: An Application of Binary Trees and Priority Queues.
Huffman Codes. Encoding messages  Encode a message composed of a string of characters  Codes used by computer systems  ASCII uses 8 bits per character.
Huffman Encoding Veronica Morales.
Lecture Objectives  To learn how to use a Huffman tree to encode characters using fewer bytes than ASCII or Unicode, resulting in smaller files and reduced.
CS-2852 Data Structures LECTURE 13B Andrew J. Wozniewicz Image copyright © 2010 andyjphoto.com.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
ICS 220 – Data Structures and Algorithms Lecture 11 Dr. Ken Cosh.
Huffman Coding. Huffman codes can be used to compress information –Like WinZip – although WinZip doesn’t use the Huffman algorithm –JPEGs do use Huffman.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing.
Introduction to Algorithms Chapter 16: Greedy Algorithms.
Prof. Amr Goneid, AUC1 Analysis & Design of Algorithms (CSCE 321) Prof. Amr Goneid Department of Computer Science, AUC Part 8. Greedy Algorithms.
Huffman coding Content 1 Encoding and decoding messages Fixed-length coding Variable-length coding 2 Huffman coding.
CS654: Digital Image Analysis Lecture 34: Different Coding Techniques.
Chapter 3 Data Representation. 2 Compressing Files.
1 Data Compression Hae-sun Jung CS146 Dr. Sin-Min Lee Spring 2004.
بسم الله الرحمن الرحيم My Project Huffman Code. Introduction Introduction Encoding And Decoding Encoding And Decoding Applications Applications Advantages.
CSE 143 Lecture 22 Huffman slides created by Ethan Apter
Huffman code and Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Compression and Huffman Coding. Compression Reducing the memory required to store some information. Lossless compression vs lossy compression Lossless.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 18.
Design & Analysis of Algorithm Huffman Coding
Huffman Codes ASCII is a fixed length 7 bit code that uses the same number of bits to define each character regardless of how frequently it occurs. Huffman.
HUFFMAN CODES.
CSC317 Greedy algorithms; Two main properties:
Compression & Huffman Codes
Assignment 6: Huffman Code Generation
Huffman Coding Based on slides by Ethan Apter & Marty Stepp
Data Compression If you’ve ever sent a large file to a friend, you may have compressed it into a zip archive like the one on this slide before doing so.
The Huffman Algorithm We use Huffman algorithm to encode a long message as a long bit string - by assigning a bit string code to each symbol of the alphabet.
Chapter 9: Huffman Codes
Advanced Algorithms Analysis and Design
Chapter 11 Data Compression
Huffman Coding CSE 373 Data Structures.
Huffman Encoding Huffman code is method for the compression for standard text documents. It makes use of a binary tree to develop codes of varying lengths.
Data Structure and Algorithms
Huffman Encoding.
Podcast Ch23d Title: Huffman Compression
Algorithms CSCI 235, Spring 2019 Lecture 30 More Greedy Algorithms
Huffman Coding Greedy Algorithm
Algorithms CSCI 235, Spring 2019 Lecture 31 Huffman Codes
Presentation transcript:

Lecture 12 Huffman Algorithm

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. Huffman coding is a form of statistical coding Not all characters occur with the same frequency! Yet all characters are allocated the same amount of space

Huffman Encoding Huffman coding, however, makes coding more efficient. In this mechanism we assign shorter codes to characters that occur more frequently and longer codes that occur less frequently. For example, E and T, the two characters that occur most frequently in the English language, are assigned one bit each. C,G,K,R,S,U and W are the next most frequently and are assigned three bits each, and so on.

In a given piece of text, only some of the characters will require the maximum bit length. The overall length of the transmission, therefore, is shorter than that resulting from fixed-length encoding. Difficulty arises, however, if the bit patterns associated with each character are assigned randomly. Consider the example given figure.

Bit assignments based on frequency of the character. E : 0 T : 1 A: 00 I : 01 M : 10 C : 000 D : 001 G : 010 O : 100 R : 101 S : 110 U : 111

Multiple interpretations of transmitted data code sent First interpretation Second interpretation Third interpretation E I G O T U E A M R E I S D G O U M

Huffman coding is designed to counter this ambiguity while retaining the bit count advantages of a compression code. Not only does it vary the length of the code based on the frequency of the character represented, but each character code is chosen in such a way that prefix of another code. For example, no three-bit code has the same pattern as the first three bits of a four or five- bit code.

Character Tree Using the character set from the example above, let’s examine how a Huffman code is built. 1.First we organize the entire character set into a row, ordered according to frequency from highest to lowest (or vice Versa). Each character is now a node at the leaf level of a tree. E = 15 T = 12 A = 10 I = 08 M = 07 N = 06 C = 05 D = 05 G = 04 K = 04 O = 03 R = 03 S = 02 U = 02

2. Next, we find the two nodes with the smallest combined frequency weighting and join them to form a third node, resulting in a simple two-level tree, the weight of the new node is the combined weight of the original two nodes. This node, one level up from the leaves, is eligible to be combined with other nodes. Remember the same weight of the two nodes chosen must be smaller than the combination of any other possible choices.

3. We repeat step 2 until all of the nodes, on every level, are combined into a single tree.

Assigning the codes Once the tree is complete, we use it to assign codes to each character. First we assign a bit value to each branch (see figure F.7). Starting from the root (top node). We assign 0 to the branch and 1 to the right branch and repeat this pattern at each node. Which branch becomes 0 and which becomes 1 is left to the designer as long as the assignments are consistent throughout the tree.

Code assignment

Code assignment table. CharacterFrequencyCodeCharacterFrequencycode E15100D50101 T12111G40110 A10000K40111 I8001O M71010R N61011S c50100U211011

DECODING A message encoded in this fashion can be interpreted without ambiguity, using the following process : 1.The receiver stores the first three bits received in memory and attempts to match them with one of the three bit codes. If a match is found, that character is selected and the three bits are discarded. The receiver then repeats this step with the next three bits.

2. If a match is not found, the receiver reads the neat bit from the stream and adds it to the first three. It then attempts to find a match among the four-bit codes. If a match is found, the corresponding character is selected and the bits are discarded. 3. If a match is not found, the receiver reads the neat bit from the stream and tries to match all the five bits to one of the five bit codes. If a match is found, the character is selected and the bits are discarded.

Unambiguous transmission sent Interpreted S N T I C