Published by Paulina Carr. Modified over 9 years ago.
1
ENTROPY & RUN LENGTH CODING
2
Contents
- What is Entropy Coding?
- Huffman Encoding
- Huffman Encoding Example
- Arithmetic Coding
- Encoding Algorithm for Arithmetic Coding
- Decoding Algorithm for Arithmetic Coding
- Run Length Encoding
- Questions and Answers
- References
3
What is Entropy Coding? Entropy coding is a lossless compression scheme. One of the main types of entropy coding creates and assigns a unique prefix-free code to each unique symbol that occurs in the input. These entropy encoders then compress data by replacing each fixed-length input symbol with the corresponding variable-length prefix-free output code word.
4
(continued) The length of each code word is approximately proportional to the negative logarithm of the symbol's probability, so the most common symbols get the shortest codes. According to Shannon's source coding theorem, the optimal code length for a symbol is $-\log_b P$, where $b$ is the number of symbols used to make output codes (2 for binary codes) and $P$ is the probability of the input symbol.
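A minimal Python sketch of Shannon's formula. `ideal_code_length` gives $-\log_b P$ and `entropy` gives its probability-weighted average. Both function names are illustrative and do not come from the slides. The sample probabilities 0.4, 0.2, 0.1, 0.1, 0.2 are the relative frequencies from the Huffman example that follows, and 0.9 is the arithmetic-coding example from later in the deck.

```python
import math

def ideal_code_length(p, b=2):
    """Shannon's optimal code length -log_b(p) for a symbol of probability p,
    measured in base-b output digits (bits when b = 2)."""
    return -math.log2(p) if b == 2 else -math.log(p, b)

def entropy(probs, b=2):
    """Average ideal code length: sum over symbols of p * (-log_b p)."""
    return sum(p * ideal_code_length(p, b) for p in probs if p > 0)

print(ideal_code_length(0.5))                         # 1.0: seen half the time -> one bit
print(round(ideal_code_length(0.9), 3))               # 0.152
print(round(entropy([0.4, 0.2, 0.1, 0.1, 0.2]), 3))   # 2.122 bits per symbol
```

No code built from whole-bit code words can average fewer than about 2.12 bits per symbol for those five frequencies.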
5
Entropy Encoding Techniques:
- Huffman coding
- Arithmetic coding
6
Huffman Encoding: For each encoding unit (letter, symbol or any character), associate a frequency. You can also use the percentage or probability of occurrence of the encoding unit. Create a binary tree whose children are the two encoding units with the smallest frequencies/probabilities. The frequency of the root is the sum of the frequencies/probabilities of its leaves. Treat that root as a new encoding unit and repeat the procedure until all the encoding units are covered by a single binary tree.
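The procedure above can be sketched in Python with a min-heap (the standard `heapq` module). This is a minimal sketch under two assumptions: symbols are strings, and internal nodes are stored as `(left, right)` tuples. The function name `huffman_codes` is illustrative. When frequencies tie, the slides allow any choice ("connect any two"). This sketch breaks ties by insertion order, so for the example below it builds a different tree than the slides do. The total cost is the same, though: 220 bits per 100 symbols, an average of 2.2 bits per symbol.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from {symbol: frequency}; symbols are assumed to be strings."""
    order = count()                      # tie-breaker so heapq never compares two subtrees
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:                   # a lone symbol still needs a 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two smallest frequencies...
        f2, _, right = heapq.heappop(heap)
        # ...become children of a new node whose frequency is their sum
        heapq.heappush(heap, (f1 + f2, next(order), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 = left branch, 1 = right branch
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: the path from the root is the code word
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}))
# {'B': '00', 'R': '01', 'C': '100', 'D': '101', 'A': '11'}
```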
7
Example, step I: Assume that the relative frequencies are A: 40, B: 20, C: 10, D: 10, R: 20 (I chose simpler numbers than the real frequencies). The smallest numbers are 10 and 10 (C and D), so connect those.
8
Example, step II: C and D have already been used, and the new node above them (call it C+D) has value 20. The smallest values are B, C+D, and R, all of which have value 20. Connect any two of these.
9
Example, step III: The smallest value is R (20), while A and B+C+D both have value 40. Connect R to either of the others.
10
Example, step IV: Connect the final two nodes.
11
Example, step V: Assign 0 to left branches and 1 to right branches. Each encoding is a path from the root: A = 0, B = 100, C = 1010, D = 1011, R = 11. Each path terminates at a leaf. Do you see why encoded strings are decodable?
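As a quick check on this table, the frequency-weighted average code length works out to 2.2 bits per symbol. A fixed-length code for five symbols would need 3 bits each. The names `FREQS`, `CODES` and `average_code_length` in this sketch are illustrative.

```python
# Relative frequencies (step I) and the final code table (step V) from the example.
FREQS = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}
CODES = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def average_code_length(freqs, codes):
    """Frequency-weighted average number of bits per encoded symbol."""
    total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)   # 40*1 + 20*3 + ... = 220
    return total_bits / sum(freqs.values())

print(average_code_length(FREQS, CODES))   # 2.2
```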
12
Unique prefix property: A = 0, B = 100, C = 1010, D = 1011, R = 11. No bit string is a prefix of any other bit string. For example, if we added E = 01, then A (0) would be a prefix of E. Similarly, if we added F = 10, then F would be a prefix of three other encodings (B = 100, C = 1010, and D = 1011). The unique prefix property holds because, in a binary tree, a leaf is not on the path to any other node.
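Because of the unique prefix property, a decoder can read bits left to right. It emits a symbol as soon as the buffered bits match a code word, with no lookahead and no separators. This minimal sketch uses the table above. The function names and the sample string "ABRACADABRA" are illustrative choices.

```python
CODES = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def prefix_encode(text, codes):
    """Concatenate the code word of every symbol."""
    return "".join(codes[ch] for ch in text)

def prefix_decode(bits, codes):
    """Emit a symbol as soon as the buffered bits form a code word.
    This is unambiguous only because no code word is a prefix of another."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)

bits = prefix_encode("ABRACADABRA", CODES)
print(bits, len(bits))              # 01001101010010110100110 23
print(prefix_decode(bits, CODES))   # ABRACADABRA
```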
13
Data compression: Huffman encoding is a simple example of data compression, that is, representing data in fewer bits than it would otherwise need. A more sophisticated method is GIF (Graphics Interchange Format) compression, for .gif files. Another is JPEG (Joint Photographic Experts Group), for .jpg files. Unlike the others, JPEG is lossy: it loses information. This is generally OK for photographs (if you don't compress them too much), because decompression adds "fake" data very similar to the original.
14
Arithmetic Coding
15
Arithmetic Coding: Among methods that assign each symbol its own code word, Huffman coding has been proven optimal. Yet Huffman codes have to be an integral number of bits long, while the entropy value of a symbol is almost always a fractional number. As a result, Huffman coding cannot reach the theoretically possible compression.
16
Arithmetic Coding (cont.): For example, if a statistical model assigns a 90% probability to a given character, the optimal code size would be about 0.15 bits ($-\log_2 0.9 \approx 0.152$). A Huffman code must assign at least a 1-bit code to the symbol, which is more than six times longer than necessary.
17
Arithmetic Coding (cont.): Arithmetic coding bypasses the idea of replacing each input symbol with a specific code. Instead, it replaces a whole stream of input symbols with a single floating-point output number in the interval [0, 1).
18
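Suppose that we want to encode the message "BILL GATES". Each character gets a probability and a sub-range of [0, 1) whose width equals that probability. Each range includes its low end but not its high end.

Character   Probability   Range
(space)     1/10          0.0 - 0.1
A           1/10          0.1 - 0.2
B           1/10          0.2 - 0.3
E           1/10          0.3 - 0.4
G           1/10          0.4 - 0.5
I           1/10          0.5 - 0.6
L           2/10          0.6 - 0.8
S           1/10          0.8 - 0.9
T           1/10          0.9 - 1.0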
19
Encoding Algorithm for Arithmetic Coding:

low = 0.0 ; high = 1.0 ;
while not EOF do
    range = high - low ;
    read(c) ;
    high = low + range × high_range(c) ;
    low = low + range × low_range(c) ;
end do
output(low) ;
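The pseudocode can be run almost line for line in Python. This sketch makes one deliberate substitution: it uses exact rationals (`fractions.Fraction`) where the pseudocode uses floating point. Plain floats would pick up rounding error after a few symbols, and exact arithmetic keeps the output identical to the hand-computed table. Practical arithmetic coders generally use fixed-precision integer arithmetic with renormalization instead, which is beyond this sketch. `build_ranges`, `arithmetic_encode`, `TABLE` and `RANGES` are illustrative names.

```python
from fractions import Fraction

def build_ranges(probs):
    """Give each symbol the half-open interval [low, high) whose width is its
    probability, laid out in the order the symbols are listed."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs:
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def arithmetic_encode(message, ranges):
    """Narrow [low, high) once per symbol; any number in the final interval encodes the message."""
    low, high = Fraction(0), Fraction(1)
    for c in message:
        width = high - low                 # range = high - low
        c_low, c_high = ranges[c]
        high = low + width * c_high        # computed from the old low, as in the pseudocode
        low = low + width * c_low
    return low, high

# Probabilities (in tenths) from the "BILL GATES" table, space listed first.
TABLE = [(" ", 1), ("A", 1), ("B", 1), ("E", 1), ("G", 1),
         ("I", 1), ("L", 2), ("S", 1), ("T", 1)]
RANGES = build_ranges([(s, Fraction(n, 10)) for s, n in TABLE])

low, high = arithmetic_encode("BILL GATES", RANGES)
print(float(low), float(high))   # 0.2572167752 0.2572167756
```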
20
(continued) To encode the first character B properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30:
range = 1.0 - 0.0 = 1.0
high = 0.0 + 1.0 × 0.3 = 0.3
low = 0.0 + 1.0 × 0.2 = 0.2
After the first character is encoded, the low end of the range changes from 0.00 to 0.20 and the high end changes from 1.00 to 0.30.
21
(continued) The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new sub-range of 0.20 to 0.30. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Thus, this number is further restricted to 0.25 to 0.26.
22
(continued) Note that any number from 0.25 up to (but not including) 0.26 is a legal encoding of "BI". Thus, the encoder can select whichever number in that interval is best suited for binary representation. This works only if the length of the encoded message is known or an EOF symbol is used.
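One way to read "best suited for binary representation" is as the number in the final interval with the shortest binary expansion. That reading is an assumption, and the helper name `shortest_binary_fraction` is hypothetical. For "BI" the interval [0.25, 0.26) contains 0.25, which is 0.01 in binary, so two bits suffice.

```python
import math
from fractions import Fraction

def shortest_binary_fraction(low, high):
    """Return the shortest bit string b1...bk such that the binary fraction
    0.b1...bk lies in the half-open interval [low, high)."""
    if not (0 <= low < high <= 1):
        raise ValueError("need 0 <= low < high <= 1")
    k = 0
    while True:
        k += 1
        scale = 2 ** k
        n = math.ceil(low * scale)        # smallest numerator with n / 2^k >= low
        if Fraction(n, scale) < high:     # still below the top of the interval?
            return format(n, f"0{k}b")    # n as k bits, zero-padded on the left

print(shortest_binary_fraction(Fraction("0.25"), Fraction("0.26")))   # 01
bits = shortest_binary_fraction(Fraction("0.2572167752"), Fraction("0.2572167756"))
print(bits, len(bits))
```

The width of an interval bounds the number of bits needed. A width of 4e-10 is larger than 2^-32, so the whole "BILL GATES" message fits in at most 32 bits.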
23
[Figure: at each step the current interval is split into the character ranges (space), A, B, E, G, I, L, S, T, and it narrows with every character of "BILL GATES": [0.0, 1.0) → [0.2, 0.3) → [0.25, 0.26) → [0.256, 0.258) → [0.2572, 0.2576) → [0.2572, 0.25724) → [0.257216, 0.25722) → [0.2572164, 0.2572168) → [0.25721676, 0.2572168) → [0.257216772, 0.257216776) → [0.2572167752, 0.2572167756)]
24
(continued)
Character   Low            High
B           0.2            0.3
I           0.25           0.26
L           0.256          0.258
L           0.2572         0.2576
(space)     0.25720        0.25724
G           0.257216       0.257220
A           0.2572164      0.2572168
T           0.25721676     0.2572168
E           0.257216772    0.257216776
S           0.2572167752   0.2572167756
25
(continued) So the final value 0.2572167752 uniquely encodes the message "BILL GATES". If the length of the encoded message is known at the decoding end, any value from 0.2572167752 up to (but not including) 0.2572167756 also works.
26
Arithmetic Coding (Decoding): Decoding is the inverse process. Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'. To remove the effect of 'B' from 0.2572167752, first subtract the low value of B, 0.2, giving 0.0572167752. Then divide by the width of the range of 'B', 0.1. This gives a value of 0.572167752.
27
Decoding (continued): Then calculate where that value lands, which is in the range of the next letter, 'I'. The process repeats until the value reaches 0 or the known length of the message is reached.
28
Arithmetic Decoding Algorithm:

r = input_number
repeat
    search c such that r falls in its range
    output(c) ;
    r = r - low_range(c) ;
    r = r ÷ (high_range(c) - low_range(c)) ;
until EOF or the length of the message is reached
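Here is a matching Python sketch of the decoding loop. As before, it uses exact `fractions.Fraction` arithmetic in place of floating point, which is a substitution for illustration. It stops after a known message length, which is one of the two stopping conditions on the slide. `build_ranges`, `arithmetic_decode`, `TABLE` and `RANGES` are illustrative names.

```python
from fractions import Fraction

def build_ranges(probs):
    """Half-open interval [low, high) per symbol, widths equal to the probabilities."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs:
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def arithmetic_decode(value, ranges, length):
    """Decode `length` symbols from a number given as a decimal string or Fraction."""
    r = Fraction(value)
    out = []
    for _ in range(length):
        for c, (c_low, c_high) in ranges.items():   # search c such that r falls in its range
            if c_low <= r < c_high:
                break
        else:
            raise ValueError("value lies outside [0, 1)")
        out.append(c)
        r = (r - c_low) / (c_high - c_low)          # remove the effect of c
    return "".join(out)

TABLE = [(" ", 1), ("A", 1), ("B", 1), ("E", 1), ("G", 1),
         ("I", 1), ("L", 2), ("S", 1), ("T", 1)]
RANGES = build_ranges([(s, Fraction(n, 10)) for s, n in TABLE])

print(arithmetic_decode("0.2572167752", RANGES, 10))   # BILL GATES
```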
29
r              c         Low   High   Range
0.2572167752   B         0.2   0.3    0.1
0.572167752    I         0.5   0.6    0.1
0.72167752     L         0.6   0.8    0.2
0.6083876      L         0.6   0.8    0.2
0.041938       (space)   0.0   0.1    0.1
0.41938        G         0.4   0.5    0.1
0.1938         A         0.1   0.2    0.1
0.938          T         0.9   1.0    0.1
0.38           E         0.3   0.4    0.1
0.8            S         0.8   0.9    0.1
0.0
30
Arithmetic Coding Summary: In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol. The new range is proportional to the predefined probability attached to that symbol. Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.
31
(continued) In theory, the coding rate approaches the high-order entropy of the source. Arithmetic coding is not as popular as Huffman coding because it requires multiplication and division (×, ÷).
32
Run Length Encoder/Decoder
33
What is RLE? A compression technique that represents data using values and run lengths, where a run length is the number of consecutive equal values. For example, RLE turns 1110011111 into 1 3 0 2 1 5: the values 1, 0, 1 with run lengths 3, 2, 5.
34
Advantages of RLE:
- Useful for compressing data that contains repeated values, e.g. the output of a filter, where many consecutive values are 0.
- Very simple compared with other compression techniques.
- Reversible (lossless) compression; decompression is just as easy.
35
Applications: I-frame compression in video uses a run length encoder.
36
RLE Effectiveness:
- Compression effectiveness depends on the input: there must be consecutive runs of values to maximize compression.
- Best case: all values are the same, and any length can be represented using two values.
- Worst case: no repeating values, and the compressed data is twice the length of the original!
- RLE should only be used in situations where we know for sure that the data has repeating values.
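Both extreme cases follow from one observation: the encoded output holds exactly two numbers (value, run length) per run. This small sketch counts runs with the standard `itertools.groupby` to show the best and worst case. The function name `rle_encoded_length` is illustrative.

```python
from itertools import groupby

def rle_encoded_length(values):
    """Length of the flat [value, run_length, ...] output: two numbers per run."""
    runs = sum(1 for _ in groupby(values))   # groupby yields one group per run of equal values
    return 2 * runs

print(rle_encoded_length([0] * 1000))        # 2: best case, whatever the input length
print(rle_encoded_length(list(range(10))))   # 20: worst case, twice the input length
```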
37
Encoder - Algorithm:
1. Start on the first element of the input.
2. Examine the next value.
3. If it is the same as the previous value, keep a counter of consecutive values. Keep examining the next value until a different value or the end of input is reached, then output the value followed by the counter. Repeat.
4. If it is not the same as the previous value, output the previous value followed by 1 (its run length). Repeat.
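The steps above translate into a short Python sketch. Its output has the same flat layout as the Matlab `RLE_encode` (value, run length, value, run length, ...). Returning an empty list for empty input is an assumption, since the slides do not cover that case. The name `rle_encode` is illustrative.

```python
def rle_encode(values):
    """Return a flat [value, run_length, ...] list, the layout RLE_encode produces."""
    encoded = []
    if not values:
        return encoded                  # assumption: empty input encodes to nothing
    prev, run = values[0], 1            # start on the first element
    for v in values[1:]:                # examine the next value
        if v == prev:
            run += 1                    # same as previous: extend the run
        else:
            encoded += [prev, run]      # run ended: output value, then its count
            prev, run = v, 1
    encoded += [prev, run]              # output the final run
    return encoded

print(rle_encode([1, 1, 1, 0, 0, 1, 1, 1, 1, 1]))   # [1, 3, 0, 2, 1, 5]
```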
38
Encoder – Matlab Code
% Run Length Encoder
% EE113D Project
function encoded = RLE_encode(input)
  n = numel(input);              % number of input values
  run_length = 1;
  encoded = [];
  for i = 2:n
    if input(i) == input(i-1)
      run_length = run_length + 1;                  % extend the current run
    else
      encoded = [encoded input(i-1) run_length];    % output the finished run
      run_length = 1;
    end
  end                                               % end of for loop
  if n > 1
    % Add last value and run length to output
    encoded = [encoded input(n) run_length];
  else
    % Special case if input is of length 1
    encoded = [input(1) 1];
  end
39
Encoder – Matlab Results
>> RLE_encode([1 0 0 0 0 2 2 2 1 1 3])
ans = 1 1 0 4 2 3 1 2 3 1
>> RLE_encode([0 0 0 0 0 0 0 0 0 0 0])
ans = 0 11
>> RLE_encode([0 1 2 3 4 5 6 7 8 9])
ans = 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1
40
Encoder:
- Input comes from a separate .asm file, in the form of a vector, e.g. 'array .word 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0'.
- Output is declared as data memory space, originally initialized to all -1. Examine memory to get the output.
- Immediate problem: the output size is not known until run time (it depends on the input size as well as the input pattern), and a variable-size array cannot be initialized.
41
Encoder Solution:
- Limit user input to a preset length (16).
- Initialize the output to the worst-case size (double the input length: 32).
- Initialize the output to all -1's (we only handle positive numbers and 0 as inputs).
- The output ends where -1 first appears, or when the output length equals the worst case.
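The fixed-buffer scheme can be simulated in Python to show why it works. The output buffer is preallocated at the worst-case size and filled with the -1 sentinel, and a reader stops at the first -1 or at the end of the buffer. The constants and function names are hypothetical. Rejecting inputs longer than 16 values is one assumed interpretation of "limit user input". The sample input is the slide's example vector with its final 0 dropped so that it fits the 16-value limit.

```python
MAX_INPUT = 16                # preset input length limit
MAX_OUTPUT = 2 * MAX_INPUT    # worst case: every run has length 1 -> 32 slots
SENTINEL = -1                 # safe end marker because inputs are 0 or positive

def rle_encode_fixed(values):
    """Write (value, run length) pairs into a preallocated, -1-filled buffer."""
    if len(values) > MAX_INPUT:
        raise ValueError("input longer than the preset limit")
    if any(v < 0 for v in values):
        raise ValueError("negative inputs would clash with the -1 sentinel")
    out = [SENTINEL] * MAX_OUTPUT
    pos, i = 0, 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1                              # extend the current run
        out[pos], out[pos + 1] = values[i], j - i + 1
        pos += 2
        i = j + 1
    return out

def read_output(buffer):
    """Output ends at the first -1, or at the end of the worst-case buffer."""
    result = []
    for x in buffer:
        if x == SENTINEL:
            break
        result.append(x)
    return result

buf = rle_encode_fixed([4, 5, 5, 2, 7, 3, 6, 9, 9, 10, 10, 10, 10, 10, 10, 0])
print(read_output(buf))   # [4, 1, 5, 2, 2, 1, 7, 1, 3, 1, 6, 1, 9, 2, 10, 6, 0, 1]
```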
42
Decoder – Matlab Code
% Run Length Decoder
% EE113D Project
% The input to this function should be the output from Run Length Encoder,
% which means it assumes an even number of elements in the input. The first
% element is a value followed by the run count. Thus all odd elements in
% the input are assumed to be the values and even elements the run counts.
%
function decoded = RLE_decode(encoded)
  n = numel(encoded);
  index = 1;
  decoded = [];
  % iterate through the input
  while (index <= n)
    % get value which is followed by the run count
    value = encoded(index);
    run_length = encoded(index + 1);
    for i = 1:run_length
      % loop adding 'value' to output 'run_length' times
      decoded = [decoded value];
    end
    % put index at next value element (odd element)
    index = index + 2;
  end
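For comparison, here is a Python sketch of the same decoder. It makes the same assumption as `RLE_decode`: an even-length input in which odd positions (1st, 3rd, ...) hold values and even positions hold run counts. Raising an error on odd-length input is an added check that the Matlab version does not perform. The name `rle_decode` is illustrative.

```python
def rle_decode(encoded):
    """Expand a flat [value, run_length, value, run_length, ...] list."""
    if len(encoded) % 2 != 0:
        raise ValueError("expected value/run-length pairs")
    decoded = []
    for value, run_length in zip(encoded[0::2], encoded[1::2]):
        decoded.extend([value] * run_length)   # add 'value' run_length times
    return decoded

print(rle_decode([0, 12]))                                # twelve zeros
print(rle_decode([0, 1, 1, 1, 2, 1, 3, 1, 4, 1, 5, 1]))   # [0, 1, 2, 3, 4, 5]
```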
43
Decoder – Matlab Results
>> RLE_decode([0 12])
ans = 0 0 0 0 0 0 0 0 0 0 0 0
>> RLE_decode([0 1 1 1 2 1 3 1 4 1 5 1])
ans = 0 1 2 3 4 5
>> RLE_decode(RLE_encode([0 0 3 1 4 4 5 6 10]))
ans = 0 0 3 1 4 4 5 6 10
45
References
1. http://en.wikipedia.org/wiki/Entropy_encoding
2. www.cis.upenn.edu/~matuszek/cit594-2002/Slides/huffman.ppt
3. is.cs.nthu.edu.tw/course/2012Spring/ISA530100/chapt06.ppt
4. ihoque.bol.ucla.edu/presentation.ppt