Huffman Codes. Juan A. Rodriguez, CS 326, 5/13/2003.

Presentation Content: Introduction; Encoding; Huffman’s Algorithm; Huffman’s Code; Dynamic Huffman Encoding; Quiz

Introduction Suppose we have to encode a text that comprises n characters. Huffman code is a coding scheme that yields a shorter bit string by assigning shorter code words to more frequent characters and longer code words to less frequent characters. The same idea was used in the mid-19th century by Samuel Morse, where frequent letters such as e (.) and a (._) are assigned short sequences of dots and dashes, while infrequent letters such as q (_ _ . _) and z (_ _ . .) have longer ones.

Encoding Fixed-length encoding assigns to each character a bit string of the same length. That is what the standard seven-bit ASCII code does.

Encoding Variable-length encoding assigns code words of different lengths to different characters. Huffman code is an example of variable-length encoding.

Encoding Prefix codes: no code word is a prefix of another code word. To decode, simply scan the bit string until the first group of bits that is a code word for some character, emit that character, and repeat until the end of the bit string is reached. This property makes decoding simple and unambiguous.
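The scan-until-a-code-word-matches procedure can be sketched in Python (the deck’s own code is in ML; this sketch and its four-character code table are invented here purely for illustration):

```python
# Sketch of prefix-code decoding. The code table below is a hypothetical
# prefix code (no entry is a prefix of another), not the one built later
# in these slides.
CODE = {"0": "e", "10": "a", "110": "q", "111": "z"}

def decode(bits: str) -> str:
    """Scan the bit string left to right; emit a character as soon as the
    accumulated bits match a code word, then start accumulating again."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in CODE:          # first group of bits that is a code word
            out.append(CODE[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

print(decode("0101100111"))  # 0|10|110|0|111 -> "eaqez"
```

Because no code word is a prefix of another, the first match is always the right one, so no backtracking is ever needed.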

Huffman’s Algorithm Suppose you have a 1000-character data file over a ten-character alphabet whose character frequencies are known.

Huffman’s Algorithm Step 1: Initialize n one-node trees and label them with the characters of the alphabet: D, I, _, U, E, J, L, R, O, S.

Huffman’s Algorithm Step 2: Record the frequency of each character in its tree’s root to indicate the tree’s weight: D .1, I .2, _ .01, U .13, E .18, J .1, L .06, R .03, O .15, S .04.

Huffman’s Algorithm Step 3: Find the 2 trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight.

Huffman’s Algorithm Step 3 continued: the two lowest-weight trees are merged repeatedly. With these frequencies, one valid order of merges (ties broken arbitrarily) is: _ (.01) + R (.03) → .04; .04 + S (.04) → .08; L (.06) + .08 → .14; D (.1) + J (.1) → .2; U (.13) + .14 → .27; O (.15) + E (.18) → .33; I (.2) + .2 → .4; .27 + .33 → .6; .4 + .6 → 1.0. [The tree diagrams shown on these slides do not survive in this transcript.]

Huffman’s Algorithm Step 4: We take the convention that going left down the binary tree means adding a 0, and going right down the binary tree means adding a 1.
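Reading off 0s and 1s along the path to each leaf yields the code words. A small Python sketch of that traversal (the tiny hand-built tree here is an invented example, not the ten-character tree from these slides):

```python
# Assign code words by walking the tree: append "0" going left and "1"
# going right, per the convention on this slide. A leaf is a character
# string; an internal node is a (left, right) pair.
def codewords(node, prefix="", table=None):
    if table is None:
        table = {}
    if isinstance(node, str):        # leaf: record this character's code
        table[node] = prefix or "0"  # degenerate one-node tree gets "0"
        return table
    left, right = node
    codewords(left, prefix + "0", table)
    codewords(right, prefix + "1", table)
    return table

tree = (("a", "b"), "c")   # c is more frequent, so it sits nearer the root
print(codewords(tree))     # {'a': '00', 'b': '01', 'c': '1'}
```

Note that more frequent characters end up closer to the root and therefore receive shorter code words.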

Huffman’s Algorithm Step 4 continued: reading the edge labels from the root down to each leaf gives that character’s code word. [The labeled tree diagram on this slide is not reproducible in this transcript.]

Huffman’s Algorithm The algorithm is greedy: it makes locally optimal choices in the hope that they yield a globally optimal solution. Notice that the result is a full binary tree: every non-leaf node has two children. This is true of the tree for any optimal prefix code.

Huffman’s Algorithm The operation that we need to perform repeatedly is the extraction of the two sub-trees with the smallest frequencies. This can be implemented using a priority queue.

Example implementation of Huffman’s algorithm in ML. Building the initial queue takes O(n log n) time, since each enqueue operation takes O(log n). Then we perform n-1 merges, each of which takes O(log n). Thus this implementation of Huffman’s algorithm takes O(n log n).

datatype HTree = Leaf of char * int
               | Branch of HTree * int * HTree

fun huffmanTree(alpha : (char * int) list) : HTree =
  let
    val alphasize = length(alpha)
    fun freq(node : HTree) : int =
      case node of
          Leaf(_, i) => i
        | Branch(_, i, _) => i
    val q = new_heap (fn (x, y) => Int.compare(freq x, freq y)) alphasize
    fun merge(i : int) : HTree =
      if i = 0 then extract_min(q)
      else
        let
          val x = extract_min(q)
          val y = extract_min(q)
        in
          insert q (Branch(x, freq(x) + freq(y), y));
          merge(i - 1)
        end
  in
    app (fn (c : char, i : int) : unit => insert q (Leaf(c, i))) alpha;
    merge(alphasize - 1)
  end
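The ML version relies on a heap library assumed by the course (new_heap, extract_min, insert). For readers who want something directly runnable, here is a rough equivalent sketch in Python using the standard heapq module; the tuple-based tree representation and sample weights are invented for this illustration:

```python
import heapq
from itertools import count

# Python sketch equivalent to the ML huffmanTree above: repeatedly extract
# the two lowest-weight trees from a min-heap and merge them.
# A tree is either ('leaf', char, weight) or ('branch', left, weight, right).
def huffman_tree(alpha):
    """alpha: list of (char, weight) pairs; returns the root of the tree."""
    tick = count()      # unique tie-breaker so heapq never compares trees
    heap = [(w, next(tick), ('leaf', c, w)) for c, w in alpha]
    heapq.heapify(heap)                 # O(n); each later push/pop is O(log n)
    while len(heap) > 1:                # n - 1 merges in total
        wx, _, x = heapq.heappop(heap)
        wy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (wx + wy, next(tick), ('branch', x, wx + wy, y)))
    return heap[0][2]

root = huffman_tree([('a', 45), ('b', 13), ('c', 12), ('d', 16), ('e', 9), ('f', 5)])
print(root[2])  # total weight at the root: 100
```

As in the ML version, the n-1 heap extractions and insertions dominate the running time, giving O(n log n) overall.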

Huffman’s Code The output of Huffman’s algorithm is the Huffman code. [The code-word table shown on this slide is not reproduced in the transcript.]

Huffman’s Code Encoding of LORI: given the probabilities and code-word lengths, the expected number of bits per character in the code is the sum over all characters of probability times code-word length. Had we used a fixed-length encoding for the same ten-character alphabet, we would need at least 4 bits per character, since ⌈log2 10⌉ = 4.
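The expected-bits-per-character computation can be checked numerically. This sketch uses a small hypothetical four-character code (the slides’ own table is not reproduced in the transcript, so these probabilities and lengths are invented):

```python
# Expected bits per character is the probability-weighted code-word length,
# sum(p_i * l_i). Hypothetical 4-character example, not the slides' table.
probs   = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
lengths = {'a': 1,   'b': 2,    'c': 3,     'd': 3}

expected = sum(probs[ch] * lengths[ch] for ch in probs)
fixed = 2                          # 4 characters need 2 bits at fixed length
ratio = (fixed - expected) / fixed # fraction of memory saved
print(expected, ratio)             # 1.75 bits/char, 12.5% compression ratio
```

Here the variable-length code saves 12.5% over the 2-bit fixed-length code; the slides’ 35.5% figure is the same kind of quantity computed for their ten-character alphabet.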

Huffman’s Code This code achieves a compression ratio (a standard measure of a compression algorithm’s effectiveness) of 35.5%: Huffman’s encoding of this text uses 35.5% less memory than its fixed-length encoding. Extensive experiments with Huffman codes have shown that the compression ratio for this scheme typically falls between 20% and 80%.

Dynamic Huffman Encoding Huffman’s encoding yields an optimal (minimal-length) encoding, provided the probabilities of character occurrences are known in advance. Drawback: it requires a preliminary scan of the given text to count the character frequencies. We use the algorithm to compute an optimal prefix tree, and then scan the text a SECOND time, writing out the code word of each character.
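The two-pass (static) scheme described here, one scan to count and a second scan to encode, can be sketched end to end in Python (an illustrative sketch only; the sample text and helper names are invented, and the slides’ own code is in ML):

```python
import heapq
from collections import Counter
from itertools import count

# Two-pass static Huffman: scan once to count frequencies, build the tree,
# then scan the text a second time to emit code words.
def encode(text):
    freq = Counter(text)                       # first scan: count characters
    tick = count()                             # tie-breaker for the heap
    heap = [(w, next(tick), c) for c, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                       # merge two lightest trees
        wx, _, x = heapq.heappop(heap)
        wy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (wx + wy, next(tick), (x, y)))
    table = {}
    def walk(node, prefix):                    # leaves are chars, nodes pairs
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            table[node] = prefix or "0"        # single-character text
    walk(heap[0][2], "")
    bits = "".join(table[c] for c in text)     # second scan: emit code words
    return bits, table

bits, table = encode("LORI LIKES ORIOLES")
print(len(bits))  # fewer bits than the 8 * len(text) a byte encoding needs
```

The need for that first counting pass is exactly the drawback the slide names, and it is what dynamic (adaptive) Huffman coding removes.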

Dynamic Huffman Encoding Dynamic (adaptive) Huffman coding builds the tree incrementally, in such a way that the coding is always optimal for the sequence of characters already seen.

Huffman’s Code Quiz Given the following bit string, what does it decode into?