Lecture 4 (week 2) Source Coding and Compression
Dr.-Ing. Khaled Shawky Hassan, Room: C3-222, ext: 1204
Huffman Coding
Proposed by Dr. David A. Huffman in 1952: “A Method for the Construction of Minimum Redundancy Codes”
Applicable to many forms of data transmission; mostly used example: text files
Huffman Coding: What We Will Discuss
Huffman Coding Algorithm: the procedure to build Huffman codes
Extended Huffman Codes (new)
Adaptive Huffman Coding (new): update procedure and decoding procedure
Shannon-Fano Coding
The second code based on Shannon’s theory; it is a suboptimal code (it took a graduate student, Huffman, to fix it!)
Algorithm:
1. Start with empty codes
2. Compute frequency statistics for all symbols
3. Order the symbols in the set by frequency
4. Split the set into two parts whose total frequencies are as close as possible (almost half-and-half!), i.e., minimize the difference
5. Add ‘0’ to the codes in the first set and ‘1’ to the rest
6. Recursively assign the rest of the code bits to the two subsets, until sets cannot be split (a Java sketch of this procedure follows)
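A compact Java sketch of this procedure (written for this text, not taken from the slides); it assumes the symbols are already sorted by descending frequency and splits each range at the point where the weight difference is smallest:

import java.util.*;

public class ShannonFano {
    // Recursively assign code bits to the (already sorted, descending) range [lo, hi].
    static void build(int[] freq, String[] codes, int lo, int hi) {
        if (lo >= hi) return;                      // a single symbol: nothing left to split
        int total = 0;
        for (int i = lo; i <= hi; i++) total += freq[i];
        // Find the split that makes the two halves' weights as equal as possible.
        int acc = 0, split = lo, bestDiff = Integer.MAX_VALUE;
        for (int i = lo; i < hi; i++) {
            acc += freq[i];
            int diff = Math.abs((total - acc) - acc);
            if (diff < bestDiff) { bestDiff = diff; split = i; }
        }
        // '0' for the first set, '1' for the rest, then recurse on each half.
        for (int i = lo; i <= split; i++) codes[i] += "0";
        for (int i = split + 1; i <= hi; i++) codes[i] += "1";
        build(freq, codes, lo, split);
        build(freq, codes, split + 1, hi);
    }

    public static void main(String[] args) {
        // The example from the next slide: A = {a..f}, weights {9, 8, 6, 5, 4, 2}.
        String[] sym = {"a", "b", "c", "d", "e", "f"};
        int[] freq = {9, 8, 6, 5, 4, 2};
        String[] codes = new String[sym.length];
        Arrays.fill(codes, "");
        build(freq, codes, 0, sym.length - 1);
        for (int i = 0; i < sym.length; i++)
            System.out.println(sym[i] + " -> " + codes[i]);
    }
}

On these weights it prints a = 00, b = 01, c = 10, d = 110, e = 1110, f = 1111, for an average length of 85/34 = 2.5 bits/symbol.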
Shannon-Fano Coding: Example
Assume a sequence A = {a, b, c, d, e, f} with the occurrence weights {9, 8, 6, 5, 4, 2}, respectively.
Apply Shannon-Fano coding and discuss the sub-optimality.
Shannon-Fano Coding
[A sequence of slides steps through the example as figures: the sorted set {a:9, b:8, c:6, d:5, e:4, f:2} is split recursively into nearly equal halves, with one more code bit assigned at each split.]
Shannon-Fano Coding
Shannon-Fano does not always produce optimal prefix codes; the ordering is performed only once, at the beginning!
Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraint that each symbol is represented by a code with an integral number of bits.
Such prefix-free codes are therefore sometimes simply called Huffman codes.
Symbol-by-symbol Huffman coding (the easiest one) is only optimal if the probabilities of the symbols are independent and each is some power of a half, i.e., (1/2)^n.
Huffman Coding
Proposed by Dr. David A. Huffman in 1952: “A Method for the Construction of Minimum Redundancy Codes”
Applicable to many forms of data transmission; our example: text files
In general, Huffman coding is a form of statistical coding, as not all characters occur with the same frequency!
Huffman Coding: The Same Idea
Why Huffman coding (like all entropy coding): code word lengths are no longer fixed like ASCII. Code word lengths vary and are shorter for the more frequently used characters, i.e., a shorter average code length overall!
Huffman Coding: The Algorithm
1. Scan the text to be compressed and count the occurrences of all characters (a small counting sketch follows this list).
2. Sort or prioritize the characters based on their number of occurrences in the text (from low to high).
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and encode the characters into a new coded file using the Huffman codes.
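Steps 1-2 are the only text-dependent part; a small illustrative sketch in Java (the class and variable names are mine, not the slides'):

import java.util.*;

public class CharCount {
    public static void main(String[] args) {
        String text = "Eerie eyes seen near lake.";
        // Step 1: count the occurrence of every character.
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray())
            freq.merge(c, 1, Integer::sum);
        // Step 2 then orders these counts from low to high (e.g. via a priority queue).
        System.out.println(freq);   // { =4, .=1, E=1, a=2, e=8, i=1, k=1, l=1, n=2, r=2, s=2, y=1}
    }
}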
Huffman Coding: Building The Tree
Example: Consider the following short text: “Eerie eyes seen near lake.”
Count up the occurrences of all characters in the text.
Distinct characters: E e r i space y s n a l k .
Huffman Coding: Building The Tree
Example: “Eerie eyes seen near lake.” What is the frequency of each character in the text?

Char   Freq.     Char   Freq.     Char   Freq.
E      1         space  4         a      2
e      8         y      1         l      1
r      2         s      2         k      1
i      1         n      2         .      1
Huffman Coding: Building The Tree
1. Create a binary tree node for each character, storing the character and its frequency
2. Place the nodes in a priority queue: the lower the occurrence count, the higher the priority in the queue
Huffman Coding: Building The Tree
Uses binary tree nodes (OOP-like view; second bonus assignment: construct the Huffman tree as follows!)

public class HuffNode {
    public char myChar;
    public int myFrequency;
    public HuffNode myLeft, myRight;
}
PriorityQueue<HuffNode> myQueue;
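The fragment above leaves the queue's ordering policy implicit. With java.util.PriorityQueue it can be made explicit through a comparator on the frequency field; a minimal sketch under that assumption:

import java.util.Comparator;
import java.util.PriorityQueue;

// Lowest frequency first = highest priority, matching the rule above.
PriorityQueue<HuffNode> myQueue =
        new PriorityQueue<>(Comparator.comparingInt((HuffNode n) -> n.myFrequency));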
Huffman Coding: Building The Tree
The queue after inserting all nodes (null pointers are not shown):
E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8
Huffman Coding: Building The Tree
While the priority queue contains two or more nodes:
Create a new node
Dequeue a node and make it the left sub-tree
Dequeue the next node and make it the right sub-tree
The frequency of the new node equals the sum of the frequencies of its left and right children
Enqueue the new node back into the queue in the right order!
(A Java sketch of this loop follows.)
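A runnable sketch of this loop, reusing the HuffNode field names from the earlier fragment (the constructors are my addition; note that ties in the queue may be broken differently than on the slides, yielding a different but equally optimal tree):

import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class BuildTree {
    static class HuffNode {
        char myChar;          // only meaningful in leaves
        int myFrequency;
        HuffNode myLeft, myRight;
        HuffNode(char c, int f) { myChar = c; myFrequency = f; }
        HuffNode(HuffNode l, HuffNode r) {       // internal node
            myLeft = l; myRight = r;
            myFrequency = l.myFrequency + r.myFrequency;
        }
    }

    static HuffNode build(Map<Character, Integer> freq) {
        PriorityQueue<HuffNode> q =
                new PriorityQueue<>(Comparator.comparingInt((HuffNode n) -> n.myFrequency));
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            q.add(new HuffNode(e.getKey(), e.getValue()));
        // While two or more nodes remain: dequeue two, join them, enqueue the parent.
        while (q.size() > 1) {
            HuffNode left = q.poll();
            HuffNode right = q.poll();
            q.add(new HuffNode(left, right));
        }
        return q.poll();   // the root; its frequency equals the total character count
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : "Eerie eyes seen near lake.".toCharArray())
            freq.merge(c, 1, Integer::sum);
        System.out.println("root frequency = " + build(freq).myFrequency);  // 26
    }
}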
Huffman Coding: Building The Tree
[A long sequence of slides animates the construction step by step: the two lowest-frequency nodes are repeatedly dequeued, joined under a new parent whose frequency is the sum of theirs, and the parent is enqueued back in order. What is happening to the characters with a low number of occurrences? They are merged first, so they end up deepest in the tree. After enqueueing the final node there is only one node left in the priority queue.]
Huffman Coding: Building The Tree
Dequeue the single node left in the queue. This tree contains the new code words for each character. The frequency of the root node should equal the number of characters in the text.
[Final tree diagram; root frequency = 26]
“Eerie eyes seen near lake.” → 26 characters
Encoding the File: Traverse the Tree
Perform a traversal of the tree to obtain the new code words. Going left is a 0, going right is a 1; a code word is only complete when a leaf node is reached.
[The final 26-node tree from the previous slide is used.]
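A sketch of that traversal in Java (assuming the HuffNode shape from the earlier slides; the helper name collect is mine):

import java.util.Map;
import java.util.TreeMap;

// HuffNode as in the earlier sketch: myChar, myLeft, myRight.
static void collect(HuffNode n, String prefix, Map<Character, String> out) {
    if (n == null) return;
    if (n.myLeft == null && n.myRight == null) {   // leaf: the code word is complete
        out.put(n.myChar, prefix);
        return;
    }
    collect(n.myLeft, prefix + "0", out);    // going left appends a 0
    collect(n.myRight, prefix + "1", out);   // going right appends a 1
}

// Usage: Map<Character, String> codes = new TreeMap<>(); collect(root, "", codes);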
Huffman Coding (contd.)
The code words, read off the tree (left = 0, right = 1; one consistent reading of the tree diagram):

Char   Code      Char   Code
E      0000      space  011
i      0001      e      10
y      0010      r      1100
l      0011      s      1101
k      0100      n      1110
.      0101      a      1111
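A quick check of the gain (computed from the frequency table above): total encoded length = Σ freq × code length = 6×1×4 (E, i, y, l, k, .) + 4×3 (space) + 8×2 (e) + 4×2×4 (r, s, n, a) = 24 + 12 + 16 + 32 = 84 bits, i.e. 84/26 ≈ 3.2 bits per character, versus 26 × 8 = 208 bits in plain 8-bit ASCII.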
Huffman Coding: Example (2)
Init: Create a one-letter set out of each letter, with its probability
1. Sort the sets according to probability (lowest first)
2. Insert the prefix ‘1’ into the codes of the letters in the top (lowest-probability) set
3. Insert the prefix ‘0’ into the codes of the letters in the second set
4. Merge the top two sets
Repeat steps 1-4 until a single set containing all letters remains (the slides step through several rounds of this loop); a sketch of this variant follows.
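This set-merging view leads to an equally direct implementation: instead of building a tree, keep per-letter code strings and prepend a bit to every letter of a merged set. A Java sketch of this variant (written for this text; it borrows the Example (3) symbols and weights below as concrete data, and needs Java 16+ for the local record):

import java.util.*;

public class SetMergeHuffman {
    public static void main(String[] args) {
        // Each set carries its total weight and its members' codes so far.
        record SymSet(double weight, Map<Character, String> codes) {}
        List<SymSet> sets = new ArrayList<>();
        char[] sym = {'a', 'b', 'c', 'd', 'e'};
        double[] w = {0.2, 0.4, 0.2, 0.1, 0.1};
        for (int i = 0; i < sym.length; i++) {
            Map<Character, String> m = new TreeMap<>();
            m.put(sym[i], "");
            sets.add(new SymSet(w[i], m));
        }
        while (sets.size() > 1) {
            // 1. Sort the sets by probability, lowest first.
            sets.sort(Comparator.comparingDouble(SymSet::weight));
            SymSet lo = sets.remove(0), hi = sets.remove(0);
            // 2./3. Prepend '1' to the lowest set's codes, '0' to the second's.
            Map<Character, String> merged = new TreeMap<>();
            lo.codes().forEach((c, code) -> merged.put(c, "1" + code));
            hi.codes().forEach((c, code) -> merged.put(c, "0" + code));
            // 4. Merge the top two sets.
            sets.add(new SymSet(lo.weight() + hi.weight(), merged));
        }
        System.out.println(sets.get(0).codes());  // e.g. {a=11, b=00, c=10, d=011, e=010}
    }
}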
Example Summary
Average code length: l = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol
Entropy: H = −Σ_{s=a..e} P(s) log2 P(s) ≈ 2.122 bits/symbol
Redundancy: l − H ≈ 0.078 bits/symbol
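The entropy figure follows directly from the weights; a worked computation (not on the original slide):

\[
H = -\bigl(0.4\log_2 0.4 + 2\cdot 0.2\log_2 0.2 + 2\cdot 0.1\log_2 0.1\bigr)
  \approx 0.529 + 0.929 + 0.664 = 2.122\ \text{bits/symbol},
\]

so the redundancy is \(2.2 - 2.122 \approx 0.078\) bits/symbol.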
Huffman Coding: Example (3)
Symbols: {a, b, c, d, e}; weights: {0.2, 0.4, 0.2, 0.1, 0.1}. Required: maximum-variance tree!
Merge the two lowest-weight sets at each step, breaking ties in favor of the already-merged set:
d(0.1) + e(0.1) → de(0.2)
de(0.2) + a(0.2) → dea(0.4)
dea(0.4) + c(0.2) → deac(0.6)
deac(0.6) + b(0.4) → deacb(1.0)
Code lengths: b = 1, c = 2, a = 3, d = e = 4
Example: Tree 1 (the maximum-variance tree built above)
[Tree diagram] Average code length: l = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bits/symbol
Huffman Coding: Example (3)
Symbols: {a, b, c, d, e}; weights: {0.2, 0.4, 0.2, 0.1, 0.1}. Required: minimum-variance tree!
Merge the two lowest-weight sets at each step, breaking ties in favor of the sets merged longest ago:
d(0.1) + e(0.1) → de(0.2)
a(0.2) + c(0.2) → ac(0.4)
de(0.2) + b(0.4) → deb(0.6)
deb(0.6) + ac(0.4) → deacb(1.0)
Code lengths: a = b = c = 2, d = e = 3
Example: Minimum Variance Tree
[Tree diagram. One consistent reading of the code assignment: a = 11, c = 00, b = 10, d = 010, e = 011]
Average code length: l = 0.4×2 + (0.1 + 0.1)×3 + (0.2 + 0.2)×2 = 2.2 bits/symbol
Example: Yet Another Tree
[Tree diagram] Average code length: l = 0.4×1 + (0.2 + 0.2 + 0.1 + 0.1)×3 = 2.2 bits/symbol
Min Variance Huffman Trees
Huffman codes are not unique: all versions yield the same average length. Which one should we choose? The one with the minimum variance in codeword lengths, i.e., the minimum-height tree. Why? It ensures the least amount of variability in the encoded bit stream.
Another Example! Consider the source:
A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18
H ≈ 0.816 bits/symbol
Huffman code: a → 0, b → 11, c → 10; l = 1.2 bits/symbol
Redundancy = l − H ≈ 0.384 bits/symbol on average (47% of the entropy!)
Q: Could we do better?
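The missing numbers follow from the definition of entropy; a worked check (not on the original slide):

\[
H = -\bigl(0.8\log_2 0.8 + 0.02\log_2 0.02 + 0.18\log_2 0.18\bigr)
  \approx 0.258 + 0.113 + 0.445 = 0.816\ \text{bits/symbol},
\]

so \(l - H \approx 1.2 - 0.816 = 0.384\) bits/symbol, about 47% of \(H\).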
Extended Huffman Codes
Example 1: Consider encoding sequences of two letters from the source above (3^2 = 9 pair probabilities). Building a Huffman code over the pairs gives
l = (0.64×1 + 0.144×2 + 0.144×3 + 0.0324×4 + 0.016×5 + 0.016×6 + 0.0036×7 + 0.0036×8 + 0.0004×8)/2 = 1.7228/2 ≈ 0.861 bits/symbol
Redundancy ≈ 0.046 bits/symbol
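The per-pair average above can be verified mechanically without listing the code words: in Huffman construction, the sum of all merged (internal-node) weights equals the expected code length. A small self-contained Java check (illustrative, written for this text):

import java.util.PriorityQueue;

public class ExtendedHuffmanCheck {
    public static void main(String[] args) {
        double[] p = {0.8, 0.02, 0.18};   // P(a), P(b), P(c)
        // Probabilities of all 3^2 = 9 two-letter sequences.
        PriorityQueue<Double> q = new PriorityQueue<>();
        for (double x : p)
            for (double y : p)
                q.add(x * y);
        // Huffman merging: the accumulated merge weights equal E[code length].
        double avgPairLen = 0;
        while (q.size() > 1) {
            double merged = q.poll() + q.poll();
            avgPairLen += merged;
            q.add(merged);
        }
        System.out.printf("%.4f bits/pair = %.4f bits/symbol%n",
                avgPairLen, avgPairLen / 2);   // ≈ 1.7228 and 0.8614
    }
}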
Extended Huffman Codes (Remarks)
The idea can be extended further: consider all possible n^m sequences (we did 3^2 = 9)
In theory, by considering more sequences we can improve the coding!! (but is it applicable?)
In reality, the exponential growth of the alphabet makes this impractical; e.g., for length-3 ASCII sequences: 256^3 = 2^24 ≈ 16M
Practical consideration: most sequences would have zero frequency → other methods are needed (Adaptive Huffman Coding)