Huffman Codes

Information coding:
– Most information transmission machines (computer terminals, the Voyager spacecraft) use a binary code.
– Why? An electric signal is either present or absent at any specific time.

Suppose the Voyager on-board camera is sensitive to four shades of gray:
– White
– Light gray
– Dark gray
– Black

The camera picture is digitized into 400*600 “dots”, then transmitted by radio to Earth in a single stream of signals, to be reconstructed and printed.
In designing a binary code, we want to decide how to encode the “color” of each dot in binary, so that:
– 1) No signals are wasted (efficiency)
– 2) The code is recognizable later (decodable)

Example (try 1): encode
– White – 0001
– Light gray – 0010
– Dark gray – 0100
– Black – 1000

WASTEFUL! At 4 digits per symbol (dot), one picture of 400*600 = 240,000 dots would cost 4*240,000 = 960,000, almost a million signals.

How many digits do you need?
– 1 is not enough: only 2 values
– 2 is OK: 4 values
– 3 is too much
– …
Fixed-length code of length 2 (2 yes/no questions suffice to identify the color). No problem on the receiving end: every two digits define a dot.

Try 2:
– W – 00
– LG – 01
– DG – 10
– B – 11

Encoding mechanism: a decision tree. Start at the root and follow the branch for each digit until a leaf is reached.

[Figure: decision tree of depth 2 with leaves W, LG, DG, B]
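The fixed-length scheme can be sketched in a few lines of Python. This is only an illustration: the names `FIXED_CODE`, `digits_needed`, `encode` and `decode` are hypothetical and do not come from the slides.

```python
# Fixed-length code from the slide: two binary digits per shade.
FIXED_CODE = {"W": "00", "LG": "01", "DG": "10", "B": "11"}

def digits_needed(n_symbols):
    """Smallest fixed codeword length that can distinguish n_symbols values."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, with no float rounding.
    return max(1, (n_symbols - 1).bit_length())

def encode(dots):
    """Concatenate the 2-digit codeword of every dot into one signal stream."""
    return "".join(FIXED_CODE[d] for d in dots)

def decode(signal, width=2):
    """Cut the stream into chunks of `width` digits and look each one up."""
    reverse = {bits: shade for shade, bits in FIXED_CODE.items()}
    return [reverse[signal[i:i + width]] for i in range(0, len(signal), width)]
```

Because every codeword has the same length, the receiver needs no separators: it simply reads two digits at a time.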
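There are other tree shapes with four leaf nodes.

[Figure: a second decision tree with leaves W, LG, DG, B at depths 1, 2, 3, 3]

Which one is better? The criterion is the weighted average length. Suppose we have these probabilities:
– W – .40
– LG – .30
– DG – .18
– B – .12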
VARIABLE-LENGTH CODE
– Weighted average for tree 1 = .40*2 + .30*2 + .18*2 + .12*2 = 2
– Weighted average for tree 2 = .40*1 + .30*2 + .18*3 + .12*3 = 1.9

On average, tree 2 is better: a picture of 400*600 = 240,000 dots costs only 1.9*240,000 = 456,000 signals, less than half of the first try (4 digits per dot).
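The two weighted averages can be checked with a short Python sketch. The probabilities and codeword lengths are the ones used in the sums above; the names `PROB`, `TREE1_LEN`, `TREE2_LEN` and `weighted_average_length` are hypothetical.

```python
# Probabilities of the four shades, as used in the weighted sums above.
PROB = {"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}

# Codeword length (leaf depth) of each shade in the two trees.
TREE1_LEN = {"W": 2, "LG": 2, "DG": 2, "B": 2}  # fixed-length tree
TREE2_LEN = {"W": 1, "LG": 2, "DG": 3, "B": 3}  # variable-length tree

def weighted_average_length(prob, length):
    """Expected number of digits per symbol: sum of probability * length."""
    return sum(prob[s] * length[s] for s in prob)
```

Calling `weighted_average_length(PROB, TREE1_LEN)` and `weighted_average_length(PROB, TREE2_LEN)` gives 2 and 1.9 respectively, up to floating-point rounding.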
General problem:
– Given n symbols with their respective probabilities, which is the best tree (code)?
– That is, which tree needs the fewest digits (yes/no questions) on average to identify a symbol?

Construct the tree from the leaves to the root:
– 1) Label each leaf with its probability.
– 2) Determine the two fatherless nodes with the smallest probabilities. In case of a tie, choose arbitrarily.
– 3) Create a father for these two nodes; label the father with the sum of the two probabilities.
– 4) Repeat 2) and 3) until there is only 1 fatherless node (the root).
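The four construction steps can be sketched in Python with a priority queue from the standard `heapq` module. This is one possible implementation, not necessarily the one the lecture has in mind: the function name `huffman_code` is hypothetical, ties are broken by insertion order (an arbitrary choice, which step 2 allows), and symbols are assumed not to be tuples, because tuples are used here to represent internal nodes.

```python
import heapq
from itertools import count

def huffman_code(prob):
    """Build a Huffman code for a dict {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Step 1: every leaf is labelled with its probability.
    heap = [(p, next(tie), sym) for sym, p in prob.items()]
    heapq.heapify(heap)
    # Steps 2-4: repeatedly give the two smallest fatherless nodes a father
    # labelled with the sum of their probabilities, until only the root is left.
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tie), (left, right)))
    root = heap[0][2]

    code = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples: left adds 0, right adds 1.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            code[node] = prefix or "0"  # a single-symbol alphabet still needs one digit

    walk(root, "")
    return code
```

For the four shades with probabilities .40, .30, .18 and .12, the resulting codewords have lengths 1, 2, 3 and 3. The exact digits depend on how ties and left/right placement are resolved.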
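In our case:

[Figure: Huffman tree with leaves B, DG, LG, W]

By convention, left is 0, right is 1. Using this method, the code obtained is a minimum-redundancy code, or Huffman code. So we have codeword lengths:
– W – 1 digit
– LG – 2 digits
– DG – 3 digits
– B – 3 digits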
Sample Huffman code: minimize the average number of yes/no questions necessary to distinguish 1 of 5 symbols that occur with known probabilities.
– a – 01 (probability 0.28)
– b – 11 (probability 0.25)
– c – 10 (probability 0.21)
– d – 001 (probability 0.15)
– e – 000 (probability 0.11)
Weighted average length = 2*(0.28 + 0.25 + 0.21) + 3*(0.15 + 0.11) = 2*0.74 + 3*0.26 = 2.26

The Huffman code is always a prefix code, that is, a code that satisfies the prefix condition: no codeword is a prefix of another codeword.
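The prefix condition is easy to test mechanically. The sketch below compares every pair of codewords; the names `is_prefix_code` and `SAMPLE` are hypothetical, and `SAMPLE` is the five-symbol code listed earlier.

```python
# The five-symbol sample Huffman code.
SAMPLE = {"a": "01", "b": "11", "c": "10", "d": "001", "e": "000"}

def is_prefix_code(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )
```

`is_prefix_code(SAMPLE)` returns `True`. A code that assigns 0 to one symbol and 01 to another returns `False`, because 0 is a prefix of 01.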
Not a prefix code:
– a:0 b:1 c:00 d:01
– If met with 00, it is ambiguous: we can’t tell whether it is aa or c.

Not a prefix code:
– a:0 b:01 c:10
– a is a prefix of b, and 010 can be read as ac or as ba.

A prefix code:
– Not ambiguous.
– At any point, it’s possible to delimit the symbol.

Example.
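Both points can be illustrated with a small Python sketch: a decoder that delimits symbols as soon as a codeword is complete (valid only for prefix codes), and a counter of the ways a digit string can be split into codewords (valid for any code). The function names `decode_prefix` and `count_parsings` are hypothetical.

```python
def decode_prefix(bits, code):
    """Decode a digit string with a prefix code, emitting a symbol as soon as
    the digits read so far form a codeword."""
    reverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing digits do not form a codeword")
    return out

def count_parsings(bits, code):
    """Count the ways `bits` can be split into codewords of `code`."""
    words = list(code.values())
    ways = [1] + [0] * len(bits)  # ways[k] = number of parsings of bits[:k]
    for end in range(1, len(bits) + 1):
        for w in words:
            if end >= len(w) and bits[end - len(w):end] == w:
                ways[end] += ways[end - len(w)]
    return ways[-1]
```

With the code a:0 b:1 c:00 d:01, `count_parsings("00", ...)` finds two parsings (aa and c). With a prefix code every decodable string has exactly one parsing, which is why `decode_prefix` never needs to look ahead.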