HUFFMAN CODES
Greedy Algorithm Design
Steps of greedy algorithm design:
1. Formulate the optimization problem.
2. Show that the greedy choice can lead to an optimal solution, so that the greedy choice is always safe (Greedy-Choice Property).
3. Demonstrate that an optimal solution to the original problem = greedy choice + an optimal solution to the subproblem (Optimal Substructure Property).
These two properties are a good clue that a greedy strategy will solve the problem.
Encoding and Compression of Data
Definition: reduce the size of data (the number of bits needed to represent it).
Benefits: reduced storage; reduced transmission cost, latency and bandwidth.
Applicable to many forms of data transmission; our example is text files.
Huffman Codes
Proposed by David A. Huffman in 1952 in "A Method for the Construction of Minimum-Redundancy Codes". Used for compressing data (sequences of characters); widely used and very efficient (savings of 20-90%). The algorithm uses a table of the characters' frequencies of occurrence and outputs a binary string.
Huffman Coding
Huffman codes can be used to compress information, like WinZip does (although WinZip doesn't use the Huffman algorithm itself); JPEGs do use Huffman coding as part of their compression process. Not all characters occur with the same frequency! The basic idea is that instead of storing each character of a file as an 8-bit ASCII value, we store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits. On average this decreases the file size, often by about half.
Computer Data Encoding
How do we represent data in binary? Historical solution: fixed-length codes. Encode every symbol by a unique binary string of a fixed length. Example: ASCII (8-bit code).
Total Space Usage in Bits - Fixed-Length Codes
Assume an ℓ-bit fixed-length code. For a file of n characters we need nℓ bits.
Example: suppose an alphabet of 3 symbols {A, B, C} and a file of 1,000,000 characters. We need 2 bits per symbol for a fixed-length code, for a total of 2,000,000 bits.
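The arithmetic above can be sketched in Python (a hypothetical snippet, not from the slides):

```python
import math

# Fixed-length coding: every symbol gets the same number of bits,
# so a file of n characters needs n * ceil(log2(|alphabet|)) bits.
alphabet_size = 3          # {A, B, C}
n = 1_000_000              # characters in the file
bits_per_symbol = math.ceil(math.log2(alphabet_size))
total_bits = n * bits_per_symbol
print(bits_per_symbol, total_bits)  # 2 2000000
```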
Variable-Length Codes
Idea: in order to save space, use fewer bits for frequent characters and more bits for rare characters.
Huffman Codes
Example: a file of 100,000 characters, containing only 'a' to 'f'.

Char  Frequency (in thousands)  Fixed-length codeword  Variable-length codeword
'a'   45                        000                    0
'b'   13                        001                    101
'c'   12                        010                    100
'd'   16                        011                    111
'e'   9                         100                    1101
'f'   5                         101                    1100

Fixed-length: 3 bits represent the 6 characters, e.g. "abc" = "000001010"; encoding the file takes 100,000 * 3 = 300,000 bits.
Variable-length: frequent characters get short codewords and infrequent characters get long codewords, e.g. "abc" = "0101100"; encoding the file takes (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1000 = 224,000 bits, saving approximately 25%.
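As a sketch, the two totals for the 100,000-character example file can be recomputed in Python (frequencies and variable-length codewords as in the example):

```python
# Frequencies per 100,000-character file; variable-length codewords
# from the example (a is most frequent, so it gets the 1-bit code).
freq = {'a': 45_000, 'b': 13_000, 'c': 12_000,
        'd': 16_000, 'e': 9_000, 'f': 5_000}
var_code = {'a': '0', 'b': '101', 'c': '100',
            'd': '111', 'e': '1101', 'f': '1100'}

fixed_bits = sum(freq.values()) * 3                            # 3-bit fixed code
var_bits = sum(f * len(var_code[c]) for c, f in freq.items())  # weighted lengths
print(fixed_bits, var_bits)  # 300000 224000
```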
Prefix Codes
With a variable-length code we can use an idea called a prefix code, in which no codeword is also a prefix of some other codeword. A prefix code can always achieve the optimal data compression among character codes, and prefix codes are desirable because they simplify decoding.
How do we decode?
With a fixed-length code, we know where every character starts, since all characters use the same number of bits.
Example: A = 000, B = 101, C = 100
A A A B B C = 000 000 000 101 101 100
How do we decode?
With a variable-length code, we use a prefix code, where no code is a prefix of another.
Example: A = 0, B = 10, C = 11. None of the above codes is a prefix of another.
How do we decode?
Example: A = 0, B = 10, C = 11. For the string A A A B B C, the encoding is 0 0 0 10 10 11 = 000101011.
Prefix Code
Example: A = 0, B = 10, C = 11. Decode the string 000101011:
0 | 0 | 0 | 10 | 10 | 11 = A A A B B C
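Prefix-code decoding can be sketched by growing a chunk bit by bit until it matches a codeword; prefix-freeness guarantees the match is unambiguous.

```python
code = {'0': 'A', '10': 'B', '11': 'C'}   # the prefix code from the example
bits = '000101011'

decoded, chunk = [], ''
for b in bits:
    chunk += b
    if chunk in code:                # a complete codeword cannot be a prefix
        decoded.append(code[chunk])  # of any other, so emit it immediately
        chunk = ''
print(''.join(decoded))  # AAABBC
```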
Idea
Consider a binary tree, with 0 meaning a left turn and 1 meaning a right turn.
Example: consider the paths from the root to each of the leaves A, B, C, D:
A: 0    B: 10    C: 110    D: 111
Observe: this is a prefix code, since each leaf ends a path, with no continuation. If the tree is full, then we are not "wasting" bits. If we make sure that the more frequent symbols are closer to the root, then they will have shorter codes.
Huffman Codes
A file of 100,000 characters. The coding schemes can be represented by trees: each leaf is labeled with a character and its frequency of occurrence; each internal node is labeled with the sum of the frequencies of the leaves in its subtree.
The fixed-length code's tree (root 100, with internal nodes 86, 58, 28 and 14 above the leaves a:45, b:13, c:12, d:16, e:9, f:5) is not a full binary tree: one internal node has only one child.
The variable-length code's tree (root 100; a:45 directly below the root, then internal nodes 55, 25, 30 and 14 above b:13, c:12, d:16, e:9, f:5) is a full binary tree: every nonleaf node has 2 children.
Huffman Codes - Greedy Algorithm
To find an optimal code for a file:
1. The coding must be unambiguous. Consider codes in which no codeword is also a prefix of another codeword => Prefix Codes. Prefix codes are unambiguous: once the codewords are decided, it is easy to compress (encode) and decompress (decode).
2. The file size must be smallest. => The code can be represented by a full binary tree, with the less frequent characters near the bottom.
Let C be the alphabet (e.g. C = {'a','b','c','d','e','f'}). For each character c, the number of bits to encode all of c's occurrences is freq(c) * depth(c). The cost of the tree T, i.e. the number of bits to encode the file, is
B(T) = Σ_{c ∈ C} freq(c) * depth(c)
E.g. with the variable-length code, "abc" is coded as "0101100".
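The cost B(T) for the variable-length tree can be checked directly; the depths are just the codeword lengths (a sketch):

```python
# B(T) = sum over c in C of freq(c) * depth(c), in thousands of bits.
freq  = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
depth = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}   # codeword lengths
cost = sum(freq[c] * depth[c] for c in freq)
print(cost)  # 224 (thousand bits), matching the 224,000-bit total
```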
Huffman Codes - Greedy Algorithm
1. Consider all pairs <frequency, symbol>.
2. Choose the two lowest frequencies and make the two nodes siblings, with the parent (root of the new subtree) having the combined frequency.
3. Iterate.
Huffman Codes
How do we find the optimal prefix code? Huffman's code (1952) was invented to solve it: a greedy approach, with Q a min-priority queue.
Initial queue: f:5  e:9  c:12  b:13  d:16  a:45
Merge f:5 and e:9 into 14.  Queue: c:12  b:13  14  d:16  a:45
Merge c:12 and b:13 into 25.  Queue: 14  d:16  25  a:45
Merge 14 and d:16 into 30.  Queue: 25  30  a:45
Merge 25 and 30 into 55.  Queue: a:45  55
Merge a:45 and 55 into the root, 100.
Huffman Codes
Q: a min-priority queue.

HUFFMAN(C)
  build Q from C
  for i = 1 to |C| - 1
      allocate a new node z
      z.left = x = EXTRACT-MIN(Q)
      z.right = y = EXTRACT-MIN(Q)
      z.freq = x.freq + y.freq
      insert z into Q in its correct position
  return EXTRACT-MIN(Q)

If Q is implemented as a binary min-heap: "build Q from C" is O(n); the for loop runs O(n) times; each EXTRACT-MIN(Q) and each insert is O(lg n). So HUFFMAN(C) is O(n lg n).
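The pseudocode above can be sketched as runnable Python using heapq as the min-priority queue (the tuple node layout and the tie-breaking counter are my own choices, not from the slides):

```python
import heapq
from collections import Counter

def huffman(text):
    """Return a {char: codeword} dict for the characters of text."""
    # Nodes are tuples (freq, tiebreak, char, left, right); char is None
    # for internal nodes. The tiebreak counter keeps tuples comparable.
    q, tiebreak = [], 0
    for ch, f in Counter(text).items():
        q.append((f, tiebreak, ch, None, None))
        tiebreak += 1
    heapq.heapify(q)                      # "build Q from C" -- O(n)
    while len(q) > 1:                     # |C| - 1 merges
        x = heapq.heappop(q)              # EXTRACT-MIN, O(lg n)
        y = heapq.heappop(q)
        heapq.heappush(q, (x[0] + y[0], tiebreak, None, x, y))
        tiebreak += 1
    root = q[0]

    codes = {}
    def walk(node, prefix):
        _, _, ch, left, right = node
        if ch is not None:                # leaf: record its codeword
            codes[ch] = prefix or '0'     # 1-symbol-alphabet edge case
        else:
            walk(left, prefix + '0')
            walk(right, prefix + '1')
    walk(root, '')
    return codes

codes = huffman("Eerie eyes seen near lake.")
print(len(codes['e']))  # 2 -- the most frequent character gets the shortest code
```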
Example
Building a Tree - Scan the Original Text
(This example is from "An Introduction to Huffman Coding", Mike Scott, March 21, 2000.)
Consider the following short text: Eerie eyes seen near lake.
Count up the occurrences of all characters in the text.
Building a Tree - Scan the Original Text
Eerie eyes seen near lake. What characters are present?
E  e  r  i  space  y  s  n  a  l  k  .
Building a Tree - Scan the Original Text
Eerie eyes seen near lake. What is the frequency of each character in the text?

Char   Freq.    Char  Freq.    Char  Freq.
E      1        y     1        k     1
e      8        s     2        r     2
i      1        n     2        a     2
space  4        l     1        .     1
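The counting step is a one-liner with Python's Counter (a sketch matching the table):

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)            # character -> number of occurrences
print(freq['e'], freq['E'], freq[' '], len(text))  # 8 1 4 26
```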
Building a Tree - Prioritize Characters
Create a binary tree node for each character, holding the character and its frequency. Place the nodes in a priority queue: the lower the occurrence, the higher the priority in the queue.
Building a Tree
The queue after inserting all nodes (lowest frequency = highest priority):
E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8

Now repeatedly remove the two lowest-frequency nodes and insert a new parent node whose frequency is their sum:
1. Merge E:1 and i:1 into a node of frequency 2.
   Queue: y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  (E i):2  sp:4  e:8
2. Merge y:1 and l:1 into 2.
   Queue: k:1  .:1  r:2  s:2  n:2  a:2  (E i):2  (y l):2  sp:4  e:8
3. Merge k:1 and .:1 into 2.
   Queue: r:2  s:2  n:2  a:2  (E i):2  (y l):2  (k .):2  sp:4  e:8
4. Merge r:2 and s:2 into 4.
   Queue: n:2  a:2  (E i):2  (y l):2  (k .):2  sp:4  (r s):4  e:8
5. Merge n:2 and a:2 into 4.
   Queue: (E i):2  (y l):2  (k .):2  sp:4  (r s):4  (n a):4  e:8
6. Merge (E i):2 and (y l):2 into 4.
   Queue: (k .):2  sp:4  (E i y l):4  (r s):4  (n a):4  e:8
7. Merge (k .):2 and sp:4 into 6.
   Queue: (E i y l):4  (r s):4  (n a):4  (k . sp):6  e:8
What is happening to the characters with a low number of occurrences? They sink ever deeper into the tree, ending up with the longest code words.
8. Merge (r s):4 and (n a):4 into 8.
   Queue: (E i y l):4  (k . sp):6  e:8  (r s n a):8
9. Merge (E i y l):4 and (k . sp):6 into 10.
   Queue: e:8  (r s n a):8  (E i y l k . sp):10
10. Merge e:8 and (r s n a):8 into 16.
    Queue: (E i y l k . sp):10  (e r s n a):16
11. Merge the last two nodes into the root, with frequency 26.

The final tree contains the new code words for each character. The frequency of the root node equals the number of characters in the text: "Eerie eyes seen near lake." has 26 characters.
Encoding the File - Traverse Tree for Codes
Perform a traversal of the tree to obtain the new code words. Going left is a 0; going right is a 1. A code word is completed only when a leaf node is reached.
Encoding the File - Traverse Tree for Codes
Reading the code words off the final tree (left = 0, right = 1; the slide gives space = 011 and e = 10 explicitly, and the remaining codes follow from the tree):

Char  Code      Char   Code
E     0000      space  011
i     0001      e      10
y     0010      r      1100
l     0011      s      1101
k     0100      n      1110
.     0101      a      1111
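The traversal can be sketched over a nested-tuple tree, with leaves as strings and internal nodes as (left, right) pairs; the tree literal below is my reconstruction of the final tree for this example.

```python
# Reconstruction of the final tree for "Eerie eyes seen near lake.":
# left = 0, right = 1.
tree = (((('E', 'i'), ('y', 'l')), (('k', '.'), ' ')),
        ('e', (('r', 's'), ('n', 'a'))))

def codes(node, prefix=''):
    if isinstance(node, str):        # leaf: a single character
        return {node: prefix}
    left, right = node
    return {**codes(left, prefix + '0'), **codes(right, prefix + '1')}

table = codes(tree)
print(table[' '], table['e'])  # 011 10 -- matching the codes on the slide
```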
Encoding the File
Rescan the text and encode the file using the new code words: Eerie eyes seen near lake.
Why is there no need for a separator character? Because no code word is a prefix of another, the decoder always knows where one character ends and the next begins.
Encoding the File - Results
Have we made things any better? The new code takes 84 bits to encode the text (six characters of frequency 1 at 4 bits each, plus r, s, n, a at 4 bits, space at 3 bits and e at 2 bits: 6*4 + 8*4 + 4*3 + 8*2 = 84), while ASCII would take 8 * 26 = 208 bits.
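A quick sketch verifying the totals, with the code lengths taken from my reconstruction of the tree (e = 2 bits, space = 3 bits, everything else 4 bits):

```python
# Code lengths per character in the reconstructed Huffman tree.
code_len = {'E': 4, 'i': 4, 'y': 4, 'l': 4, 'k': 4, '.': 4,
            'r': 4, 's': 4, 'n': 4, 'a': 4, ' ': 3, 'e': 2}
text = "Eerie eyes seen near lake."
huffman_bits = sum(code_len[c] for c in text)
ascii_bits = 8 * len(text)
print(huffman_bits, ascii_bits)  # 84 208
```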
Decoding the File
How does the receiver know what the codes are? The tree is constructed for each text file, since it depends on each file's character frequencies. Data transmission is bit-based rather than byte-based.
Decoding the File
Once the receiver has the tree, it scans the incoming bit stream: 0 = go left, 1 = go right.
A. elk nay sir   B. eel snake   C. eel kin sly   D. eek snarl nil   E. eel snarl.
The stream decodes, character by character, to e e l space s n a r l ..., i.e. answer E: "eel snarl."
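Decoding by tree-walking can be sketched as follows (same reconstructed tree as above; the example bit string 10100011, encoding "eel", is my own):

```python
# Walk the tree: 0 = left, 1 = right; emit a character and restart
# at the root whenever a leaf is reached.
tree = (((('E', 'i'), ('y', 'l')), (('k', '.'), ' ')),
        ('e', (('r', 's'), ('n', 'a'))))

def decode(bits, tree):
    out, node = [], tree
    for b in bits:
        node = node[0] if b == '0' else node[1]
        if isinstance(node, str):    # reached a leaf: emit and restart
            out.append(node)
            node = tree
    return ''.join(out)

print(decode('10100011', tree))  # eel  (e=10, e=10, l=0011)
```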
Summary
Huffman coding is a technique used to compress files for transmission. It uses statistical coding: more frequently used symbols have shorter code words. It works well for text and fax transmissions, and it is an application that uses several data structures.