Huffman Coding Yancy Vance Paredes
Outline Background Motivation Huffman Algorithm Sample Implementation Running Time Analysis Proof of Correctness Application
Background Lossless compression technique. Typical space savings are around 20% to 90%, depending on the data. Developed by David A. Huffman. Published in 1952.
Motivation /1
Let’s say we want to store the string "go go gophers" (13 characters, counting the two spaces).
How do we usually do it?
– ASCII: 7 bits + 1 more bit per character, so 13 * 8 bits = 104 bits
– Can we reduce it? There are only 8 unique characters: g, o, p, h, e, r, s, space
– 8 symbols fit in 3 bits each (2^3 = 8) instead of 8 bits, so 13 * 3 bits = 39 bits
We saved 65 bits!
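The arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example and do not come from the slides.

```python
text = "go go gophers"

# Distinct symbols in the string, including the space.
alphabet = set(text)

# Smallest fixed width that can number every symbol: 8 symbols -> 3 bits.
bits_per_char = max(1, (len(alphabet) - 1).bit_length())

ascii_bits = len(text) * 8              # 7 bits + 1 more bit per character
fixed_bits = len(text) * bits_per_char  # one fixed-width code per character
saved_bits = ascii_bits - fixed_bits

print(len(text), len(alphabet), bits_per_char)
print(ascii_bits, fixed_bits, saved_bits)
```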
Motivation /2
What if we use fewer bits for the more frequent characters?
Character  Frequency  Code  Total Bits
g          3          00    6
o          3          01    6
p          1          100   3
h          1          1010  4
r          1          1011  4
e          1          1100  4
s          1          1101  4
Space      2          111   6
Motivation /3 The total number of bits used is lowered to 37 – This is a prefix code: no codeword is a prefix of another codeword, so it is easy to encode and decode
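As a quick check of the 37-bit total and of the prefix property, the sketch below encodes the string with the variable-length code from the slides. The helper name is_prefix_free is an assumption made for this example.

```python
from collections import Counter

text = "go go gophers"

# Variable-length code taken from the slides.
code = {
    "g": "00", "o": "01", "p": "100", "h": "1010",
    "r": "1011", "e": "1100", "s": "1101", " ": "111",
}

freq = Counter(text)

# Total bits = sum over characters of frequency * codeword length.
total_bits = sum(freq[ch] * len(code[ch]) for ch in freq)

# Encoding is plain concatenation of codewords.
encoded = "".join(code[ch] for ch in text)


def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another one."""
    words = sorted(codewords)
    # After sorting, a prefix is always immediately followed by a word it prefixes.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


print(total_bits, len(encoded), is_prefix_free(code.values()))
```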
Motivation /4 How do we decode? – 0 means go LEFT – 1 means go RIGHT How to decode the following?
How to Decode?
Character  Code
g          00
o          01
p          100
h          1010
r          1011
e          1100
s          1101
Space      111
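The LEFT/RIGHT rule can be sketched as a walk over a small tree built from the code table. This is a minimal illustration, not the demo program from the talk: the tree is represented with nested dictionaries, and the names build_tree and decode are assumptions. The bit string used below is made up for the example.

```python
CODE = {
    "g": "00", "o": "01", "p": "100", "h": "1010",
    "r": "1011", "e": "1100", "s": "1101", " ": "111",
}


def build_tree(code):
    """Build a decoding tree: key "0" is the LEFT child, key "1" the RIGHT child."""
    root = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol  # the leaf stores the decoded character
    return root


def decode(bits, root):
    """Follow 0 = LEFT, 1 = RIGHT; emit a character at each leaf, then restart."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


tree = build_tree(CODE)
print(repr(decode("0001111", tree)))
```

Because no codeword is a prefix of another, the walk never has to look ahead: the first leaf it reaches is the only possible match.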
Huffman Algorithm
A greedy algorithm
Constructs an optimal prefix code
– Huffman code
HUFFMAN(C)
    n = |C|
    Q = C
    for i = 1 to n-1
        allocate a new node z
        z.left = x = EXTRACT_MIN(Q)
        z.right = y = EXTRACT_MIN(Q)
        z.freq = x.freq + y.freq
        INSERT(Q,z)
    return EXTRACT_MIN(Q)
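The pseudocode maps fairly directly onto Python's heapq module. The following is a sketch under a few assumptions rather than the demo program from the talk: symbols are strings, internal nodes are (left, right) tuples, the input has at least one symbol, and the function names huffman and codes are made up for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman(freq):
    """Build a Huffman tree from a {symbol: frequency} mapping."""
    order = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(order), symbol) for symbol, f in freq.items()]
    heapq.heapify(heap)                       # Q = C
    for _step in range(len(freq) - 1):        # for i = 1 to n-1
        fx, _ox, x = heapq.heappop(heap)      # x = EXTRACT_MIN(Q)
        fy, _oy, y = heapq.heappop(heap)      # y = EXTRACT_MIN(Q)
        # z.freq = x.freq + y.freq; INSERT(Q, z)
        heapq.heappush(heap, (fx + fy, next(order), (x, y)))
    return heapq.heappop(heap)[2]             # return EXTRACT_MIN(Q)


def codes(tree, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if isinstance(tree, tuple):
        left, right = tree
        table = codes(left, prefix + "0")
        table.update(codes(right, prefix + "1"))
        return table
    return {tree: prefix or "0"}  # a lone symbol still needs one bit


freq = Counter("go go gophers")
table = codes(huffman(freq))
total_bits = sum(freq[s] * len(table[s]) for s in freq)
print(table, total_bits)
```

The individual codewords depend on how ties between equal frequencies are broken, so they may differ from the table on the earlier slide, but the total for "go go gophers" comes out to the same 37 bits.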
Sample Implementation See program demo
Running Time Analysis
Assume that Q is implemented as a min heap (priority queue)
Building Q takes O(n)
The for loop executes n-1 times
– Each heap operation takes O(lg n)
– Thus, the loop contributes O(n lg n)
– Total running time is O(n lg n)!
Proof of Correctness /1 Show that the problem of determining an optimal prefix code exhibits the following properties: – Greedy choice – Optimal substructure
Proof of Correctness /2 To compute the cost of a tree:
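The formula that belongs on this slide is not present in the text. The usual textbook definition is given here as an assumption about what the slide showed: C is the alphabet, c.freq is the frequency of character c (matching the freq attribute in the pseudocode), and d_T(c) is the depth of c's leaf in the tree T, which equals the length of its codeword. The symbols B and d_T are not taken from the slides.

```latex
B(T) = \sum_{c \in C} c.\mathit{freq} \cdot d_T(c)

% Worked example for "go go gophers" with the code table from the Motivation slides:
% g and o at depth 2, p and Space at depth 3, and h, r, e, s at depth 4.
B(T) = 3 \cdot 2 + 3 \cdot 2 + 1 \cdot 3 + 4 \cdot (1 \cdot 4) + 2 \cdot 3 = 37
```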
Proof of Correctness /3 Greedy choice
Proof of Correctness /4 Optimal substructure
Application Commonly used as the back-end of some multimedia codecs – JPEG, MP3
Summary Background Motivation Huffman Algorithm Sample Implementation Running Time Analysis Proof of Correctness Application
References Chapter 13: Greedy Algorithm