Math 221 Huffman Codes
Suppose you have a file…

Letter    Code  Frequency  Total Bits
a         000   10         30
e         001   15         45
i         010   12         36
s         011   3          9
t         100   4          12
space     101   13         39
newline   110   1          3
Total                      174
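As a quick check of the table's arithmetic, here is a small Python sketch (not from the slides; the names `FREQ`, `FIXED_CODE` and `total_bits` are our own). Every codeword has 3 bits, so the total is the sum of frequency times codeword length.

```python
# Symbol frequencies from the table ("sp" = space, "nl" = newline).
FREQ = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}

# The fixed-length code from the table: every symbol gets 3 bits.
FIXED_CODE = {"a": "000", "e": "001", "i": "010", "s": "011",
              "t": "100", "sp": "101", "nl": "110"}

def total_bits(code, freq):
    """Bits needed for the whole file: sum of frequency * codeword length."""
    return sum(freq[sym] * len(code[sym]) for sym in freq)

print(total_bits(FIXED_CODE, FREQ))  # 174
```

Since the file has 58 characters in all, 3 bits each gives 3 × 58 = 174.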
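Represent the Code with a Tree
[Figure: binary tree for the 3-bit code, edges labeled 0 (left) and 1 (right), leaves a, e, i, s, t, sp (space), nl (newline).]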
Some Terminology
A tree is a connected collection of nodes and edges with no “closed circuits”: any path that ends at the same node it started from must retrace itself.
A node with no edges coming down from it is a leaf.
A node connected to a node directly above it is a child of that node.
We will only consider binary trees, i.e. each node has at most two children.
Convention: 0 means go to the left, 1 means go to the right.
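To make the 0-left/1-right convention concrete, here is a Python sketch that reads codewords off a tree. The representation is an assumption of ours, not the slides': a leaf is a symbol string, an internal node is a `(left, right)` pair, and `None` marks a missing child. `FIXED_TREE` and `codewords` are hypothetical names.

```python
# Tree for the 3-bit code. The newline's parent has no right child (None).
FIXED_TREE = (
    (("a", "e"), ("i", "s")),
    (("t", "sp"), ("nl", None)),
)

def codewords(tree, prefix=""):
    """Return {symbol: bit string} by walking every root-to-leaf path."""
    if tree is None:           # empty slot: no symbol lives here
        return {}
    if isinstance(tree, str):  # leaf: the path taken so far is its codeword
        return {tree: prefix}
    left, right = tree
    table = codewords(left, prefix + "0")         # 0 = go left
    table.update(codewords(right, prefix + "1"))  # 1 = go right
    return table

print(codewords(FIXED_TREE)["nl"])  # 110
```

Walking the whole tree this way recovers exactly the codes in the first table.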
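Important
If a code is represented by the leaves of a binary tree, then a binary string can be uniquely decoded! Since every symbol sits at a leaf, no codeword is the beginning of another, so reading bits from the root until you reach a leaf always yields exactly one symbol.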
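Decoding by walking the tree can be sketched in Python as follows. The tree representation is our own assumption (a leaf is a symbol string, an internal node is a `(left, right)` pair, `None` is a missing child), and `decode` is a hypothetical helper.

```python
# Tree for the 3-bit code (0 = left, 1 = right).
FIXED_TREE = (
    (("a", "e"), ("i", "s")),
    (("t", "sp"), ("nl", None)),
)

def decode(bits, tree):
    """Decode a string of '0'/'1' characters by walking down from the root."""
    out = []
    node = tree
    for b in bits:
        if b not in "01":
            raise ValueError(f"not a bit: {b!r}")
        node = node[0] if b == "0" else node[1]
        if node is None:
            raise ValueError("bit string does not follow the code")
        if isinstance(node, str):  # reached a leaf: emit it, restart at the root
            out.append(node)
            node = tree
    if node is not tree:
        raise ValueError("bit string ends in the middle of a codeword")
    return out

print(decode("000011100", FIXED_TREE))  # ['a', 's', 't']
```

Because a symbol is emitted only at a leaf, there is never a choice about where one codeword ends and the next begins.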
Improving the Code
Notice that the newline has no sibling. So we can move it up into its parent node and get a shorter code! This shortens the newline's codeword from three bits (110) to two (11).
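The same move can be done mechanically: replace any internal node that has only one child by that child. A Python sketch, assuming a tree representation of our own (leaf = symbol string, internal node = `(left, right)` pair, `None` = missing child); `collapse` and `codewords` are hypothetical names.

```python
FREQ = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}

# Tree for the 3-bit code: the newline's parent has only one child.
FIXED_TREE = (
    (("a", "e"), ("i", "s")),
    (("t", "sp"), ("nl", None)),
)

def collapse(tree):
    """Replace every internal node with a single child by that child."""
    if tree is None or isinstance(tree, str):
        return tree
    left, right = collapse(tree[0]), collapse(tree[1])
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def codewords(tree, prefix=""):
    """Return {symbol: bit string}, with 0 = left and 1 = right."""
    if tree is None:
        return {}
    if isinstance(tree, str):
        return {tree: prefix}
    table = codewords(tree[0], prefix + "0")
    table.update(codewords(tree[1], prefix + "1"))
    return table

code = codewords(collapse(FIXED_TREE))
print(code["nl"])                                  # 11
print(sum(FREQ[s] * len(code[s]) for s in FREQ))  # 173
```

Since the newline occurs only once, this saves a single bit (174 down to 173). A real improvement needs a different tree shape altogether.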
Huffman’s Algorithm
Every leaf is given its frequency as a weight; at the start, each leaf is a one-node tree. In this algorithm the weight of a tree is the sum of the weights of its leaves. At each stage, join the two trees with the lowest weight by making them the two children of a new node. Repeat until only one tree remains.
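Here is a minimal Python sketch of the algorithm, using a min-heap (the standard `heapq` module) to find the two lightest trees at each stage. Trees are nested tuples, an assumption of this sketch: a leaf is a symbol string and an internal node is a `(left, right)` pair. When two weights tie, the heap needs a tiebreaker; we use an insertion counter, which is our own choice.

```python
import heapq
from itertools import count

def huffman_tree(freq):
    """Build a Huffman tree by repeatedly joining the two lightest trees.

    Assumes freq is a non-empty {symbol: weight} dict.
    """
    order = count()  # tiebreaker, so the heap never compares trees directly
    heap = [(w, next(order), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lightest tree
        w2, _, t2 = heapq.heappop(heap)  # second-lightest tree
        # The new tree's weight is the sum of the weights of its leaves.
        heapq.heappush(heap, (w1 + w2, next(order), (t1, t2)))
    return heap[0][2]

def codewords(tree, prefix=""):
    """Return {symbol: bit string}, with 0 = left and 1 = right."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a one-symbol file still needs one bit
    table = codewords(tree[0], prefix + "0")
    table.update(codewords(tree[1], prefix + "1"))
    return table

FREQ = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}
code = codewords(huffman_tree(FREQ))
print(sum(FREQ[s] * len(code[s]) for s in FREQ))  # 146
```

This sketch puts the lighter tree on the left, so its codewords can differ from a hand-drawn tree's. Any tree Huffman's algorithm produces is optimal, though, so the total number of bits is the same.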
Our example
We start with seven one-node trees, weighted by frequency: a 10, e 15, i 12, s 3, t 4, sp 13, nl 1.
Join s (3) and nl (1) into T1 (weight 4), which leaves a 10, e 15, i 12, t 4, sp 13, T1 4.
Join t (4) and T1 (4) into T2 (weight 8), which leaves a 10, e 15, i 12, sp 13, T2 8.
Join T2 (8) and a (10) into T3 (weight 18), which leaves e 15, i 12, sp 13, T3 18.
Join i (12) and sp (13) into T4 (weight 25), which leaves e 15, T3 18, T4 25.
Join e (15) and T3 (18) into T5 (weight 33), which leaves T4 25, T5 33.
Join T4 (25) and T5 (33) into T6 (weight 58). Only one tree is left. And we are done!
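To replay these stages, a Python sketch can run the algorithm with a min-heap and print each join, naming new trees T1, T2, and so on. The naming scheme and the insertion-order tiebreaker are our own assumptions. With them, the joins come out the same as above, though the order of the two trees inside a join can differ.

```python
import heapq
from itertools import count

FREQ = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}

def huffman_trace(freq):
    """Run Huffman's algorithm and describe each join as a string."""
    order = count()  # tiebreaker for equal weights
    heap = [(w, next(order), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    steps = []
    stage = 0
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)  # lightest tree
        w2, _, n2 = heapq.heappop(heap)  # second-lightest tree
        stage += 1
        name = f"T{stage}"
        steps.append(f"join {n1} ({w1}) and {n2} ({w2}) -> {name} ({w1 + w2})")
        heapq.heappush(heap, (w1 + w2, next(order), name))
    return steps

for line in huffman_trace(FREQ):
    print(line)
# join nl (1) and s (3) -> T1 (4)
# join t (4) and T1 (4) -> T2 (8)
# join T2 (8) and a (10) -> T3 (18)
# join i (12) and sp (13) -> T4 (25)
# join e (15) and T3 (18) -> T5 (33)
# join T4 (25) and T5 (33) -> T6 (58)
```

The tie at the second stage (t and T1, both weight 4) does not matter here, because those two are joined to each other either way.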
Our New Code

Letter    Code   Frequency  Total Bits
a         001    10         30
e         01     15         30
i         10     12         24
s         00000  3          15
t         0001   4          16
space     11     13         26
newline   00001  1          5
Total                       146
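A short Python sketch to check the new code (the names are ours). It confirms that no codeword is a prefix of another and recomputes the total. The codeword for i, 10, is read off the final tree.

```python
FREQ = {"a": 10, "e": 15, "i": 12, "s": 3, "t": 4, "sp": 13, "nl": 1}
NEW_CODE = {"a": "001", "e": "01", "i": "10", "s": "00000",
            "t": "0001", "sp": "11", "nl": "00001"}

def is_prefix_free(code):
    """True if no codeword is the beginning of another codeword."""
    words = sorted(code.values())
    # In sorted order, a codeword that is a prefix of some other codeword
    # is also a prefix of the word immediately after it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

total = sum(FREQ[s] * len(NEW_CODE[s]) for s in FREQ)
print(is_prefix_free(NEW_CODE))  # True
print(total, 174 - total)        # 146 28
```

That is 28 bits fewer than the fixed 3-bit code, a saving of about 16%.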