Published by Lenard Dorsey. Modified over 8 years ago.
1
Huffman Coding (2nd Method)
2
Huffman coding (2nd Method)
The Huffman code is a source code whose average code word length approaches the fundamental limit set by the entropy of a discrete memoryless source. The code is “optimum” in that, among codes that encode one symbol at a time, it provides the smallest average code word length for a given discrete memoryless source.
The Huffman coding procedure is as follows:
The source symbols (messages) are listed in order of decreasing probability. The two source symbols of lowest probability are assigned a 0 and a 1. (This part of the step is referred to as the splitting stage.)
3
These two source symbols (messages) are regarded as being “combined” into a new source symbol (message) with probability equal to the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value.
This procedure is repeated until only two source symbols (messages) are left, to which a 0 and a 1 are assigned.
The code of each original source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol as well as its successors.
4
The Huffman code can be stated in the form of an algorithm as follows:
1. List the source symbols (messages) in order of decreasing probability.
2. Assign 0 and 1 to the two source symbols of lowest probability.
3. Combine these two source symbols into a new message.
4. The probability of this new message is equal to the sum of the probabilities of the two original symbols.
5. Place the probability of this new message in the list according to its value.
6. Repeat this procedure until only two source symbols are left, to which a 0 and a 1 are assigned.
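The six steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `huffman_codes`, the `place_high` flag (which decides where a combined message goes among entries of equal probability) and the small tolerance `EPS` for comparing floating-point probabilities are all assumptions made for this sketch.

```python
EPS = 1e-12  # tolerance for comparing floating-point probabilities (assumption)


def huffman_codes(probs, place_high=True):
    """Build a Huffman code with the list-merging procedure.

    probs: dict mapping each symbol to its probability.
    place_high: if True, a combined message is placed above entries of
    equal probability; if False, below them.
    """
    # Step 1: list the symbols in order of decreasing probability.
    # Each entry is (probability, symbols covered by this entry).
    entries = sorted(((p, [s]) for s, p in probs.items()),
                     key=lambda e: -e[0])
    codes = {s: "" for s in probs}

    # Step 6: repeat until a single combined entry remains.
    while len(entries) > 1:
        # Step 2: the two lowest entries get 0 (upper) and 1 (lower).
        upper, lower = entries[-2], entries[-1]
        for s in upper[1]:
            codes[s] = "0" + codes[s]
        for s in lower[1]:
            codes[s] = "1" + codes[s]

        # Steps 3 and 4: combine them; the new probability is the sum.
        merged = (upper[0] + lower[0], upper[1] + lower[1])
        entries = entries[:-2]

        # Step 5: put the combined entry back according to its value.
        idx = 0
        if place_high:
            while idx < len(entries) and entries[idx][0] > merged[0] + EPS:
                idx += 1
        else:
            while idx < len(entries) and entries[idx][0] >= merged[0] - EPS:
                idx += 1
        entries.insert(idx, merged)

    return codes
```

Prepending each new bit to the code words of all symbols inside a combined entry is equivalent to tracing the bits backward from the last stage to the first, so no separate back-tracking pass is needed.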
5
Example: Construct the Huffman code for the following values and find the average code length, entropy, efficiency and redundancy.

Message       m1    m2    m3    m4    m5
Probability   0.4   0.2   0.2   0.1   0.1
6
Step 1: Arrange the messages in order of decreasing probability.
Step 2: Assign 0 and 1 to the two messages having the lowest probability.
Step 3: Combine these two messages into a new message and place its probability in the probability list according to its value.
Steps 4 and 5: Repeat this procedure.
Step 6: Write the code word for each message by tracking back from the last stage to the first stage.
7
Step 1: Arrange the messages in order of decreasing probability.
Step 2: The two messages having the lowest probability are assigned 0 and 1. The two messages with the lowest probabilities are m4 and m5, as shown in the figure.

Message   Probability   Bit
m1        0.4
m2        0.2
m3        0.2
m4        0.1           0
m5        0.1           1

m4 and m5 form a new message with probability (0.1 + 0.1) = 0.2.
8
Step 3: Now regard the two messages m4 and m5 as being combined into a new message, and place the probability of the new combined message in the list according to its value. Place the combined message as high as possible when its probability is equal to that of other messages. This is shown in the figure.

Message   Stage I    Stage II
m1        0.4        0.4
m2        0.2        0.2
m3        0.2        0.2 (0)
m4        0.1 (0)    0.2 (1)
m5        0.1 (1)

Each stage column is a list sorted by probability, so after Stage I its entries no longer line up with the message named in the first column. In Stage II the second entry (0.2) is the new combined message. The two lowest entries of Stage II in turn form a new message with probability (0.2 + 0.2) = 0.4.
9
Step 4: Now consider the two messages of lowest probability in Stage II of the figure and assign 0 and 1 to these two messages. Regard these two messages as combined to form a new message with probability (0.2 + 0.2) = 0.4. Place the probability of the combined message according to its value in Stage III. Place it as high as possible if other messages have the same probability. This is shown in the figure.
Step 5: Follow the same procedure until only two messages remain, and assign 0 and 1 to them. All this is as shown.
10
Message   Stage I    Stage II   Stage III   Stage IV
m1        0.4        0.4        0.4         0.6 (0)
m2        0.2        0.2        0.4 (0)     0.4 (1)
m3        0.2        0.2 (0)    0.2 (1)
m4        0.1 (0)    0.2 (1)
m5        0.1 (1)

Read the encircled bits from Stage IV back to Stage I (0, 1, 1) to get the code for m5 as 011.
11
Step 6: How to write the code for a message?
Consider the green path shown in the figure. This path is used to write the code for message m5. Start from Stage IV and track back to Stage I along the dotted path, writing down the 0s and 1s encountered, starting from Stage IV. The code word for message m5 is 011. Code words for the other messages are written in the same way and are shown in the table.
12
To find the average code word length:
The average code word length is given as
$L = \sum_{i} P_i L_i$
= (0.4 x 2) + (0.2 x 2) + (0.2 x 2) + (0.1 x 3) + (0.1 x 3) = 2.2 bits/message

Message       m1    m2    m3    m4    m5
Probability   0.4   0.2   0.2   0.1   0.1
Code word     00    10    11    010   011
13
Find the entropy of the source:
The entropy of the source is given as
$H = \sum_{i} P_i \log_2 (1/P_i)$
= 0.4 log2(1/0.4) + 0.2 log2(1/0.2) + 0.2 log2(1/0.2) + 0.1 log2(1/0.1) + 0.1 log2(1/0.1)
= 0.52877 + 0.46439 + 0.46439 + 0.33219 + 0.33219
H = 2.12193 bits/message
The code efficiency is
η = (H / L) x 100%
η = 2.12193 / 2.2 = 0.9645 = 96.45% (rounding H to 2.12 before dividing gives 96.36%)
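As a cross-check of the arithmetic in this example, the average code word length, entropy and efficiency can be computed directly from the code table. The sketch below is only illustrative; the helper name `code_metrics` and the dictionary layout are assumptions, while the probabilities and code words are the ones from the example.

```python
import math


def code_metrics(probs, codes):
    """Return (average code word length, entropy, efficiency H/L)."""
    avg_len = sum(p * len(codes[m]) for m, p in probs.items())
    entropy = sum(p * math.log2(1 / p) for p in probs.values() if p > 0)
    return avg_len, entropy, entropy / avg_len


# Probabilities and code words of the five-message example.
probs = {"m1": 0.4, "m2": 0.2, "m3": 0.2, "m4": 0.1, "m5": 0.1}
codes = {"m1": "00", "m2": "10", "m3": "11", "m4": "010", "m5": "011"}

L, H, eta = code_metrics(probs, codes)
print(f"L = {L:.3f} bits, H = {H:.5f} bits, efficiency = {100 * eta:.2f}%")
```

Working at full floating-point precision avoids the small shift in the last digits of the efficiency that appears when H is rounded before dividing by L.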
14
Example: Consider the same memoryless source as in the example above, with all the data unchanged. Find the Huffman code by moving the probability of the combined message as low as possible. Track backward through the various steps to find the code words of this second Huffman code.
All the steps are the same as those followed for the first Huffman code in the example above, except that the combined message is placed as low as possible. This is shown in the figure.
15
Message   Stage I    Stage II   Stage III   Stage IV
m1        0.4        0.4        0.4         0.6 (0)
m2        0.2        0.2        0.4 (0)     0.4 (1)
m3        0.2        0.2 (0)    0.2 (1)
m4        0.1 (0)    0.2 (1)
m5        0.1 (1)

Read the encircled bits from Stage IV back to Stage I (0, 0, 1, 1) to get the code for m5 as 0011.
16
To find the code words:
The procedure is the same as in the previous example. Follow the green path in the figure to obtain the code for message m5 as 0011. The code words for the other messages are obtained in the same way. They are listed below.

Message   Probability   Code word
m1        0.4           1
m2        0.2           01
m3        0.2           000
m4        0.1           0010
m5        0.1           0011
17
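Note that the average code word length is still (0.4 x 1) + (0.2 x 2) + (0.2 x 3) + (0.1 x 4) + (0.1 x 4) = 2.2 bits per message, the same as in the previous example. What changes is the spread of the code word lengths: they now range from 1 to 4 bits instead of 2 to 3 bits, so the less probable messages need more bits and the variance of the code word length rises from 0.16 to 1.36.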
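To compare the two codes numerically, the following sketch computes the average code word length and the variance of the code word length for both code tables. The helper name `length_stats` and the labels `code_high` and `code_low` are assumptions made for this illustration; the code words themselves are taken from the two tables above.

```python
def length_stats(probs, codes):
    """Return (mean, variance) of the code word length."""
    mean = sum(p * len(codes[m]) for m, p in probs.items())
    var = sum(p * (len(codes[m]) - mean) ** 2 for m, p in probs.items())
    return mean, var


probs = {"m1": 0.4, "m2": 0.2, "m3": 0.2, "m4": 0.1, "m5": 0.1}

# Combined message placed as high as possible (first example).
code_high = {"m1": "00", "m2": "10", "m3": "11", "m4": "010", "m5": "011"}
# Combined message placed as low as possible (second example).
code_low = {"m1": "1", "m2": "01", "m3": "000", "m4": "0010", "m5": "0011"}

for name, codes in (("high", code_high), ("low", code_low)):
    mean, var = length_stats(probs, codes)
    print(f"{name}: average = {mean:.2f} bits, variance = {var:.2f}")
```

A smaller variance means the encoder output rate fluctuates less from message to message, which is the usual reason for preferring the "as high as possible" placement.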
18
Example: Construct the Huffman code for the following values and find the average code length, entropy, efficiency and redundancy.

Symbol        s0     s1     s2      s3      s4      s5       s6
Probability   0.25   0.25   0.125   0.125   0.125   0.0625   0.0625
19
Solution: The Huffman code for the source alphabet is as shown in the figure.

Symbol   Stage I      Stage II    Stage III   Stage IV   Stage V    Stage VI
S0       0.25         0.25        0.25        0.25       0.5        0.5 (0)
S1       0.25         0.25        0.25        0.25       0.25 (0)   0.5 (1)
S2       0.125        0.125       0.25        0.25 (0)   0.25 (1)
S3       0.125        0.125       0.125 (0)   0.25 (1)
S4       0.125        0.125 (0)   0.125 (1)
S5       0.0625 (0)   0.125 (1)
S6       0.0625 (1)

The encircled bits on the dotted path correspond to the code for symbol S0, i.e. S0 = 10.
20
Follow the path indicated by the dotted line to obtain the code word for symbol S0 as 10. The code words for the remaining symbols are obtained in the same way. They are listed in the table.

Symbol   Probability   Code word   Code word length
S0       0.25          10          2 bits
S1       0.25          11          2 bits
S2       0.125         001         3 bits
S3       0.125         010         3 bits
S4       0.125         011         3 bits
S5       0.0625        0000        4 bits
S6       0.0625        0001        4 bits
21
To compute the efficiency:
The average code length is
$L = \sum_{i} P_i L_i$
From the table,
L = (0.25 x 2) + (0.25 x 2) + (0.125 x 3) x 3 + (0.0625 x 4) x 2
L = 2.625 bits/symbol
22
The average information per message is
$H = \sum_{i} P_i \log_2 (1/P_i)$
H = [0.25 log2(4)] x 2 + [0.125 log2(8)] x 3 + [0.0625 log2(16)] x 2
H = [0.25 x 2 x 2] + [0.125 x 3 x 3] + [0.0625 x 4 x 2]
H = 2.625 bits/message
23
Code efficiency
η = (H / L) x 100 = (2.625 / 2.625) x 100
η = 100%, so the redundancy is zero.
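The 100% result follows from the shape of this particular source. As a short worked derivation (writing $P_i$ for the probability and $L_i$ for the code word length of symbol $i$): every probability in this example is a negative power of two, and each code word length in the table equals $\log_2(1/P_i)$, so the entropy and the average code length are the same sum.

```latex
\[
H = \sum_i P_i \log_2\frac{1}{P_i} = \sum_i P_i\, L_i = L
\qquad\text{whenever } P_i = 2^{-L_i} \text{ for every } i .
\]
```

For sources whose probabilities are not all powers of two, such as the five-message example earlier, the code word lengths cannot all equal $\log_2(1/P_i)$, and the efficiency falls below 100%.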
24
Difference Between Huffman & Shannon Code
The question is whether another method would provide a better code efficiency. According to information theory, a perfect code should offer an average code length of 2.176 bits, or 134,882 bits in total. For comparison purposes, the former example will be encoded by the Huffman algorithm:
25
Problem: Construct the Shannon-Fano code and the Huffman code from the following given values. Compare the average code length, entropy, code efficiency and redundancy.
26
For this example data, the Shannon-Fano code does not offer the best code efficiency. This is not necessarily the case for every frequency distribution, but at best Shannon-Fano coding provides a result similar to Huffman coding; it never does better. Neither code reaches the optimum of 134,882 bits.
27
Another Comparative Look at Huffman vs Shannon Fano Code
28
Entropy
The average information per message is termed entropy. It is denoted by H. The average information (H) contained in a message generated by a source is determined by multiplying the information of each message by its probability of occurrence and summing over the entire alphabet set.
If P_i is the probability of occurrence of symbol a_i, the information content of the message a_i is log2(1/P_i).
The entropy H for an alphabet of n symbols is calculated as
$H = \sum_{i=1}^{n} P_i \log_2 (1/P_i)$
29
EXAMPLE: A source generates messages from the alphabet set (a, b, c, d, e, f, g, h). Calculate the entropy of the source for the probabilities of occurrence of the symbols indicated within the brackets.
a (0.48), b (0.08), c (0.12), d (0.02), e (0.12), f (0.04), g (0.06), h (0.08)
Solution: Calculation of the entropy (H) of the source
H = 0.48 log2(1/0.48) + 0.08 log2(1/0.08) + 0.12 log2(1/0.12) + 0.02 log2(1/0.02) + 0.12 log2(1/0.12) + 0.04 log2(1/0.04) + 0.06 log2(1/0.06) + 0.08 log2(1/0.08)
= 2.367 bits/symbol
30
Redundancy
If we wish to encode the alphabet set of the above example, one obvious way is to assign a fixed-length 3-bit code to each of its eight symbols. But this does not use the full information-carrying capability of the code, since the entropy of the source is only 2.367 bits. We can reduce the average number of bits required to encode the alphabet by using a variable-length code instead of the fixed 3-bit code. Some of the symbols can be assigned fewer than three bits so that the average code length is reduced.
The average code length L is the expected value given below.
$L = \sum_{i=1}^{n} P_i L_i$
The redundancy is then calculated as
Redundancy = Average code length - Entropy of the source = L - H
31
EXAMPLE: A source generates messages from the alphabet set (a, b, c, d, e, f, g, h) with probabilities and code words as indicated below. Calculate the average code length and the redundancy.

Symbol   Probability   Code
a        0.48          1
b        0.08          0000
c        0.12          001
d        0.02          01111
e        0.12          010
f        0.04          01110
g        0.06          0110
h        0.08          0001
32
Solution
The entropy H of the source is:
H = 0.48 log2(1/0.48) + 0.08 log2(1/0.08) + 0.12 log2(1/0.12) + 0.02 log2(1/0.02) + 0.12 log2(1/0.12) + 0.04 log2(1/0.04) + 0.06 log2(1/0.06) + 0.08 log2(1/0.08)
= 2.367 bits/symbol
The average code length (L) is
L = 0.48 x 1 + 0.08 x 4 + 0.12 x 3 + 0.02 x 5 + 0.12 x 3 + 0.04 x 5 + 0.06 x 4 + 0.08 x 4
= 2.38 bits/symbol
The redundancy R is
R = L - H = 2.38 - 2.367 = 0.013 bit per symbol, which is about 0.5% of the average code length.
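The redundancy calculation can be checked from the code table itself. The sketch below also verifies that no code word is a prefix of another, which a variable-length code of this kind needs in order to be decoded unambiguously. The names `table` and `is_prefix_free` are assumptions for this illustration; the probabilities and code words are those of the example.

```python
import math

# (symbol, probability, code word) rows of the example table.
table = [
    ("a", 0.48, "1"), ("b", 0.08, "0000"), ("c", 0.12, "001"),
    ("d", 0.02, "01111"), ("e", 0.12, "010"), ("f", 0.04, "01110"),
    ("g", 0.06, "0110"), ("h", 0.08, "0001"),
]

avg_len = sum(p * len(code) for _, p, code in table)      # L
entropy = sum(p * math.log2(1 / p) for _, p, _ in table)  # H
redundancy = avg_len - entropy                            # R = L - H


def is_prefix_free(words):
    """True if no word is a prefix of another word.

    After sorting, a word that is a prefix of any other word is also a
    prefix of its immediate successor, so adjacent pairs suffice.
    """
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(f"L = {avg_len:.3f}, H = {entropy:.3f}, R = {redundancy:.3f}")
print("prefix-free:", is_prefix_free([code for _, _, code in table]))
```

Because the script carries full precision through the subtraction, its value of R can differ in the last digit from a hand calculation that rounds H first.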