1
CSCI 3 Chapter 1.8 Data Compression
2
Chapter 1.8 Data Compression
For the purpose of storing or transferring data, it is often helpful to reduce the size of the data involved. The technique for accomplishing this is called data compression.
3
Generic Data Compression Techniques
Run-length encoding
Relative encoding
Frequency-dependent encoding
Adaptive dictionary encoding
4
Run-length encoding
Run-length encoding produces its best results when the data being compressed consist of long sequences of the same value. Run-length encoding (RLE) is a simple and popular data compression algorithm. It is based on the idea of replacing a long sequence of the same symbol with a shorter sequence, and it is a good introduction to the data compression field for newcomers.
5
Run-length encoding
It replaces each long sequence of the same value with a code indicating the value that is repeated and the number of times it occurs in the sequence.
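A minimal sketch of the idea, assuming the data is a Python string and each run is stored as a (symbol, count) pair. The function names `rle_encode` and `rle_decode` are hypothetical, chosen for illustration.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with a (symbol, count) pair."""
    # groupby yields one group per run of consecutive equal symbols.
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (symbol, count) pair back into a run of symbols."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("WWWWBBBW")
print(encoded)              # [('W', 4), ('B', 3), ('W', 1)]
print(rle_decode(encoded))  # WWWWBBBW
```

Note that a run of length 1 still costs a full pair, which is why RLE helps only when long runs are common.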
6
Run-length encoding
Example: in the string abcdeeeeeeeeeefghi, the letter “e” is repeated 10 times in a row. RLE compression would describe it as "4 non-repeating bytes (abcd), followed by 10 'e' characters, followed by 4 non-repeating bytes (fghi)".
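The description above mixes literal (non-repeating) stretches with repeated runs. A rough sketch of that tokenization follows; the token format and the `min_run` threshold of 3 are assumptions for illustration, not part of any particular file format.

```python
def rle_tokens(data: str, min_run: int = 3) -> list[tuple]:
    """Split data into ("literal", text) and ("run", symbol, count) tokens.

    min_run is a hypothetical threshold: shorter repeats stay in the
    literal stretch, since encoding them as runs would not save space.
    """
    tokens: list[tuple] = []
    literal = ""
    i = 0
    while i < len(data):
        # Find where the run of data[i] starting at position i ends.
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run_length = j - i
        if run_length >= min_run:
            if literal:  # flush any pending non-repeating bytes first
                tokens.append(("literal", literal))
                literal = ""
            tokens.append(("run", data[i], run_length))
        else:
            literal += data[i:j]
        i = j
    if literal:
        tokens.append(("literal", literal))
    return tokens


print(rle_tokens("abcd" + "e" * 10 + "fghi"))
# [('literal', 'abcd'), ('run', 'e', 10), ('literal', 'fghi')]
```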
7
Run-length encoding
Example of run-length encoding: each run of zeros is replaced by two characters in the compressed file, a zero to indicate that compression is occurring, followed by the number of zeros in the run.
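A minimal sketch of this zero-run scheme, assuming the data is a list of byte values (0 to 255). The limit `max_count=255`, so that each count fits in one byte, is an assumption; longer runs are split into several pairs.

```python
def zero_rle_encode(data: list[int], max_count: int = 255) -> list[int]:
    """Replace each run of zeros with [0, run_length]; other values pass through."""
    out: list[int] = []
    i = 0
    while i < len(data):
        if data[i] != 0:
            out.append(data[i])
            i += 1
            continue
        # Count consecutive zeros, stopping at max_count (assumed 1-byte count).
        run = 0
        while i < len(data) and data[i] == 0 and run < max_count:
            run += 1
            i += 1
        out.extend([0, run])
    return out


def zero_rle_decode(encoded: list[int]) -> list[int]:
    """A zero marks compression; the value after it is the number of zeros."""
    out: list[int] = []
    i = 0
    while i < len(encoded):
        if encoded[i] == 0:
            out.extend([0] * encoded[i + 1])
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return out


samples = [17, 8, 0, 0, 0, 0, 0, 54, 0, 0, 3]
print(zero_rle_encode(samples))  # [17, 8, 0, 5, 54, 0, 2, 3]
```

An isolated zero becomes two values, so this scheme pays off only when runs of zeros are long.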
8
Relative encoding
Its approach is to record the differences between consecutive data blocks rather than entire blocks. Each block is encoded in terms of its relationship to the previous block.
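A simplified sketch of relative (delta) encoding, assuming each "block" is a single integer value such as a sensor reading. Real uses, for example consecutive video frames, compare much larger blocks, but the principle is the same: small differences need fewer bits than full values.

```python
def delta_encode(blocks: list[int]) -> list[int]:
    """Keep the first value, then store each value minus the one before it."""
    if not blocks:
        return []
    return [blocks[0]] + [cur - prev for prev, cur in zip(blocks, blocks[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild each value by adding its difference to the previous value."""
    values: list[int] = []
    for d in deltas:
        values.append(d if not values else values[-1] + d)
    return values


readings = [1000, 1002, 1001, 1005, 1005]
print(delta_encode(readings))  # [1000, 2, -1, 4, 0]
```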
9
Frequency-dependent encoding
In the English language, the letters e, t, a, and i are used more frequently than the letters z, q, and x. So, when constructing a code for English text, space can be saved by using short bit patterns to represent the former letters and longer bit patterns to represent the latter ones.
10
Frequency-dependent encoding
The result would be a code in which English text has shorter representations than would be obtained with uniform-length codes. Example: the Huffman code, named after D. A. Huffman, who developed the procedure in the 1950s.
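To make the comparison concrete, here is a sketch with a hypothetical variable-length code for seven letters (the bit patterns are invented for illustration, not taken from any standard table). No pattern is a prefix of another, so the bit stream can still be decoded unambiguously.

```python
# Hypothetical prefix code: frequent letters get short patterns.
CODE = {"e": "00", "t": "01", "a": "100", "i": "101",
        "z": "1100", "q": "1101", "x": "1110"}


def encode(text: str) -> str:
    """Concatenate the variable-length pattern for each letter."""
    return "".join(CODE[ch] for ch in text)


text = "tietea"
bits = encode(text)
uniform_bits = 3 * len(text)  # a uniform-length code for 7 letters needs 3 bits each
print(bits, len(bits), uniform_bits)  # 01101000100100 14 18
```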
11
Frequency-dependent encoding
Huffman codes: most of the frequency-dependent codes in use today are Huffman codes.
12
Huffman codes
The following figure shows a histogram of the byte values from a large ASCII file. More than 96% of this file consists of only 31 characters: the lowercase letters, the space, the comma, the period, and the carriage return. This observation can be used to design an appropriate compression scheme for this file.
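A histogram like the one described can be computed by counting byte values. This sketch, with hypothetical helper names, also measures what fraction of the data the k most common byte values cover. The sample string is made up and far too short to reproduce the figure's statistics; the functions assume non-empty input.

```python
from collections import Counter


def byte_histogram(data: bytes) -> Counter:
    """Count how many times each byte value (0-255) occurs."""
    return Counter(data)  # iterating over bytes yields integers


def top_share(data: bytes, k: int) -> float:
    """Fraction of the data covered by its k most common byte values."""
    counts = byte_histogram(data)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(data)


sample = b"the quick brown fox jumps over the lazy dog, again and again.\n"
print(top_share(sample, 31))
```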
13
Huffman codes
14
Assign only a few bits (one or two) to the characters that occur most often. Infrequent characters, such as !, @, #, $, and %, may require a dozen or more bits. In mathematical terms, the optimal situation is reached when the number of bits used for each character equals the negative base-2 logarithm of the character's probability of occurrence, that is, about log2(1/p) bits for a character with probability p.
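A small sketch of the ideal code lengths, estimating each probability p from symbol counts in a made-up sample string and computing log2(1/p). Real codes must use whole bits, so these ideal lengths are targets that a Huffman code approximates.

```python
import math
from collections import Counter


def ideal_bits(text: str) -> dict[str, float]:
    """Optimal code length for each symbol: log2(1/p) bits."""
    counts = Counter(text)
    total = len(text)
    # p = n / total, so log2(1/p) = log2(total / n)
    return {ch: math.log2(total / n) for ch, n in counts.items()}


print(ideal_bits("aaaabbcd"))  # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
```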
15
Huffman codes
Huffman encoding. The encoding table assigns a code to each of the seven letters used in this example, based on its probability of occurrence. The original data stream composed of these 7 characters is translated by this table into the Huffman-encoded data. Since the Huffman codes have different lengths, the binary data need to be regrouped into standard 8-bit bytes for storage and transmission.
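The slide's probability table is not reproduced here, so this sketch uses hypothetical probabilities for seven letters A to G. It builds a Huffman code by repeatedly merging the two least likely nodes, encodes a sample message, and regroups the variable-length bit string into 8-bit bytes. Padding the final byte with zeros is an assumption; a real format would also need to record the bit length or use an end marker.

```python
import heapq
from itertools import count


def huffman_code(freqs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least likely nodes."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Prefix 0 onto every code in one subtree and 1 onto the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def pack_bits(bits: str) -> bytes:
    """Regroup the bit string into 8-bit bytes, padding the last byte with 0s."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Hypothetical probabilities (not the slide's table).
probs = {"A": 0.30, "B": 0.20, "C": 0.15, "D": 0.12, "E": 0.10, "F": 0.08, "G": 0.05}
code = huffman_code(probs)
bits = "".join(code[ch] for ch in "ABACADAEFG")
packed = pack_bits(bits)
print(code)
print(len(bits), "bits ->", len(packed), "bytes")
```

The counter in each heap entry keeps the ordering well defined when two nodes have equal probability.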
16
Huffman codes
17
Compression
Image: GIF (“Jiff”), JPEG (“JAY-peg”)
Audio & Video: MPEG, MP3
18
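1.9 Communication Errors
Parity Bits
A simple method of detecting errors is based on the principle that if each bit pattern being manipulated has an odd number of 1s and a pattern with an even number of 1s is encountered, an error must have occurred. An extra bit, the parity bit, is attached to each pattern so that its total number of 1s has the required form. An encoding system in which each pattern contains
an odd number of 1s uses odd parity;
an even number of 1s uses even parity.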
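A minimal sketch of computing a parity bit for a bit pattern written as a string of "0" and "1" characters. Placing the parity bit at the left end is an assumption made here for illustration.

```python
def parity_bit(pattern: str, odd: bool = True) -> str:
    """Return the bit that makes the total number of 1s odd (or even)."""
    ones = pattern.count("1")
    if odd:
        return "0" if ones % 2 == 1 else "1"
    return "1" if ones % 2 == 1 else "0"


def add_parity(pattern: str, odd: bool = True) -> str:
    """Prepend the parity bit to the original pattern."""
    return parity_bit(pattern, odd) + pattern


print(add_parity("01000001"))  # 101000001
print(add_parity("01000110"))  # 001000110
```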
19
1.9 Communication Errors
Parity Bits
Odd parity (parity bit, then the original 8-bit pattern):
1 01000001
0 01000110
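On the receiving side, a sketch of the check: under odd parity, any pattern that arrives with an even number of 1s signals an error. The helper name and the choice of which bit to flip are illustrative.

```python
def has_odd_parity(received: str) -> bool:
    """True if the received pattern still contains an odd number of 1s."""
    return received.count("1") % 2 == 1


sent = "101000001"  # parity bit 1 followed by 01000001
# Simulate a transmission error by flipping the bit at index 4.
corrupted = sent[:4] + ("1" if sent[4] == "0" else "0") + sent[5:]
print(has_odd_parity(sent))       # True
print(has_odd_parity(corrupted))  # False: the single-bit error is detected
```

A parity bit detects any odd number of flipped bits, but two errors in the same pattern cancel out and go unnoticed.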