Published by Chad Potter. Modified over 9 years ago.
1
CMSC 100
Storing Data: Huffman Codes and Image Representation
Professor Marie desJardins
Tuesday, September 18, 2012
Tue 9/18/12 -- CMSC 100 -- Data Compression
2
Data Compression: Motivation
- Memory is a finite resource: the more data we have, the more space it takes to store
- Same with bandwidth: the more data we need to send, the more time it takes
- Data compression can reduce space and bandwidth
  - Lossless compression: Store the exact same data in less space
  - Lossy compression: Store an approximation of the data in less space
3
Time and Space Tradeoffs
- Data compression trades (computational) time for space and bandwidth:
  - It takes time to convert the original data D to the compressed format D_C
  - It takes time to convert the compressed data D_C back to a viewable format D'
- Two measures of how well compression works: compression ratio and space savings
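The formulas for the two measures did not survive in this transcript. As a hedged reconstruction, the commonly used definitions are sketched below; the notation |D| for the size of the original data and |D_C| for the size of the compressed data is an assumption, not taken from the slide.

```latex
\text{compression ratio} = \frac{|D|}{|D_C|},
\qquad
\text{space savings} = 1 - \frac{|D_C|}{|D|}
```

Under these definitions, a ratio above 1 and a savings value close to 1 both indicate strong compression.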
4
Lossless vs. Lossy Compression
- Lossless: Save space without losing any information
  - Take advantage of repetition and self-similarity (e.g., solid-color regions in an image)
- Lossy: Save space but lose some information
  - Lose resolution or detail (e.g., “pixelate” an image or remove very high/low frequencies in a sound file)
5
Encoding Strategies
- Run-length encoding: replace n instances of object x with the pair of numbers (n, x)
- Frequency-dependent encoding: use shorter representations (fewer bits) for objects that appear more frequently in a document
- Relative or differential encoding: when x is followed by y, represent y by the difference y-x (which is often small in images etc. and can therefore be represented by a short code)
- Dictionary encoding: create an index of all of the objects (e.g., words) in a document, then replace each object with its index location (can save space if there is a lot of repetition)
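Three of these strategies are small enough to sketch directly. The following is a minimal illustration in Python, not the course's own code; the function names are hypothetical, and frequency-dependent encoding is left out because Huffman coding covers it.

```python
from itertools import groupby


def run_length_encode(items):
    """Replace each run of n equal items x with the pair (n, x)."""
    return [(len(list(group)), value) for value, group in groupby(items)]


def differential_encode(values):
    """Keep the first value, then store each y as the difference y - x
    from the value x that precedes it."""
    if not values:
        return []
    return [values[0]] + [y - x for x, y in zip(values, values[1:])]


def dictionary_encode(words):
    """Build an index of distinct words (in order of first appearance)
    and replace each word with its position in that index."""
    index = {}
    encoded = []
    for word in words:
        if word not in index:
            index[word] = len(index)
        encoded.append(index[word])
    return list(index), encoded


print(run_length_encode("aaabcc"))
print(differential_encode([100, 102, 101, 101]))
print(dictionary_encode("the cat saw the other cat".split()))
```

Each function returns a plain list, so the outputs can be compared by eye with the inputs to see where the savings would come from: long runs, small differences, and repeated words.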
6
Image and Sound Formats
- Images
  - Row-by-row bitmaps in different color spaces:
    - RGB (one byte per color = 24 bits = about 17M different colors), a.k.a. “True Color” (used in JPEG formats)
      (How much storage for one True Color 2Kx3K digital camera image?)
    - Color palette: Use only one byte to index 256 of the 17M 24-bit colors (used in GIF formats)
      (How much storage for one 24-bit color 200x300 image on a website?)
  - Variable resolution provides different image sizes and levels of fidelity to an original (continuous or very high-resolution digital) image
- Sound
  - Convert continuous sound to digital by sampling (variable-rate)
  - Each sample can be represented with varying levels of resolution (“bit depth”)
    (MP3: 44K samples/second, 16 bits/sample – how much storage for one minute of sound?)
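The storage questions on this slide are straightforward multiplication. The sketch below works them through under stated assumptions: "2Kx3K" is read as 2000 x 3000 pixels, the palette table is counted as 256 entries of 3 bytes, and the sound is a single channel of raw samples before any compression. The function names are hypothetical.

```python
def true_color_bytes(width, height):
    """Raw 24-bit RGB bitmap: 3 bytes per pixel."""
    return width * height * 3


def palette_bytes(width, height, palette_size=256):
    """One index byte per pixel, plus a palette table of 3-byte colors."""
    return width * height + palette_size * 3


def raw_audio_bytes(seconds, samples_per_second=44_000,
                    bits_per_sample=16, channels=1):
    """Uncompressed samples; one channel is an assumption."""
    return seconds * samples_per_second * channels * bits_per_sample // 8


camera_image = true_color_bytes(2000, 3000)   # 18,000,000 bytes
web_image = true_color_bytes(200, 300)        # 180,000 bytes
web_image_palette = palette_bytes(200, 300)   # 60,768 bytes
one_minute = raw_audio_bytes(60)              # 5,280,000 bytes

print(camera_image, web_image, web_image_palette, one_minute)
```

Stereo sound would double the audio figure, and reading K as 1024 rather than 1000 would raise the image figures slightly.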
7
Compression Ratio: Example
Suppose I have a 2M .PNG (bitmap) image and I store it in a compressed .JPG file that is 187K.
- What is the compression ratio?
- What is the space savings?
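One way to work this example is sketched below. It assumes the usual definitions (ratio = original size / compressed size, savings = 1 - compressed size / original size) and assumes 2M means 2000K; neither assumption is confirmed by the transcript, and the function names are hypothetical.

```python
def compression_ratio(original_size, compressed_size):
    """How many times smaller the compressed file is (assumed definition)."""
    return original_size / compressed_size


def space_savings(original_size, compressed_size):
    """Fraction of the original space that is no longer needed (assumed definition)."""
    return 1 - compressed_size / original_size


original_kb = 2000   # 2M, assuming 1M = 1000K
compressed_kb = 187

ratio = compression_ratio(original_kb, compressed_kb)
savings = space_savings(original_kb, compressed_kb)

print(f"compression ratio: about {ratio:.2f} to 1")
print(f"space savings: about {savings:.2%}")
```

With these assumptions the ratio comes out a little under 11 to 1 and the savings a little over 90 percent. Reading 2M as 2048K would shift both numbers slightly upward.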
8
Huffman Coding
- Lossless frequency-based encoding
- Huffman coding is (space-)optimal in the sense that if we know the exact distribution (frequency) of every object, we will be able to represent the document in the shortest possible number of bits (among codes that give each object its own codeword)
- Downside: It takes a while to compute
- Goal #1: Length of each object's code should be related to its frequency
  - Specifically: length is proportional to the negative log of the frequency
- Goal #2: Code should be unambiguous
  - Since objects will be encoded at different lengths, as we read the bits, we need to know when we’ve reached the end of one object and should begin processing the next one
  - This type of code is called a prefix code: no codeword is the beginning of another codeword
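Both goals can be checked with a few lines of Python. The sketch below is illustrative only: the frequencies and the codeword sets are made up for the example, and the function names are hypothetical.

```python
import math


def ideal_length(relative_frequency):
    """Goal #1: the ideal code length in bits is the negative log (base 2)
    of the object's relative frequency."""
    return -math.log2(relative_frequency)


def is_prefix_free(codewords):
    """Goal #2: True if no codeword is the beginning of another.
    After sorting, any codeword that is a prefix of another one is
    immediately followed by a codeword that starts with it."""
    ordered = sorted(codewords)
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


# Made-up frequencies: an object seen half the time deserves 1 bit,
# one seen an eighth of the time deserves 3 bits.
for p in (1 / 2, 1 / 4, 1 / 8):
    print(p, ideal_length(p))

print(is_prefix_free(["0", "10", "110", "111"]))   # unambiguous
print(is_prefix_free(["0", "01", "11"]))           # "0" begins "01"
```

The second codeword set fails because a reader who sees "0" cannot tell whether the object has ended or whether "01" is still in progress.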
9
Using a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C]
How would you represent “HELLO” using this code?
Note: By convention, the left branch is 0; the right branch is 1
10
Interpreting a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C; left branch is 0, right branch is 1]
What does “1110000110110111110” mean in this code?
11
Interpreting a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C; left branch is 0, right branch is 1]
What does “1110000110110111110” mean in this code?
Decoded so far: C
12
Interpreting a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C; left branch is 0, right branch is 1]
What does “1110 | 000110110111110” mean in this code?
Decoded so far: C
13
Interpreting a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C; left branch is 0, right branch is 1]
What does “1110 | 000110110111110” mean in this code?
Decoded so far: CH
14
Interpreting a Prefix Code
[Figure: prefix-code tree with leaves A, E, L, H, O, S, C; left branch is 0, right branch is 1]
What does “1110 | 000 | 110 | 110 | 1111 | 10” mean in this code?
Decoded: CHOOSE
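The walk-through above can be reproduced in code. In the sketch below, the codewords for C, H, O, S and E are read off the worked decoding on these slides; the codewords for A and L are not shown in the transcript and are inferred by elimination (they are the only two unused leaves of a full tree), so treat them as an assumption. The table and function names are hypothetical.

```python
# Codewords for C, H, O, S, E come from the slides' decoding of CHOOSE.
# A and L are inferred by elimination, not confirmed by the transcript.
CODE = {
    "A": "01",
    "E": "10",
    "H": "000",
    "L": "001",
    "O": "110",
    "C": "1110",
    "S": "1111",
}


def encode(text):
    """Concatenate the codeword of each letter."""
    return "".join(CODE[letter] for letter in text)


def decode(bits):
    """Read bits left to right; emit a letter as soon as the bits read so far
    form a complete codeword. This works because no codeword is the
    beginning of another."""
    lookup = {codeword: letter for letter, codeword in CODE.items()}
    letters = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            letters.append(lookup[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits: {current}")
    return "".join(letters)


print(decode("1110000110110111110"))   # CHOOSE
print(encode("HELLO"))                 # depends on the inferred codeword for L
```

Decoding needs no separators between codewords, which is the point of the prefix property: the "|" marks on the slides are only there to help the reader.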
15
Decode the Message
[Figure: prefix-code tree; leaf labels as extracted: AOSPC TLW C!PMUS Y RE; left branch is 0, right branch is 1]
0111110010100101011011100011110111110110 010 00111111110 010 0110001110 010 0110001110 010 0110001110 010 0001100000100100000000110 010 011111001000000 01110 01
16
Encoding Algorithm
- Frequency distribution:
  - Set of k objects, o_1 ... o_k
  - Number of times each object appears in the document, n_1 ... n_k
- Construct a Huffman code as follows:
  1. Pick the two least frequent objects, o_i and o_j
  2. Replace them with a single combined object, o_ij, with frequency n_i + n_j
  3. If there are at least two objects left, go to step 1
- Visually:
  - Each of the original objects is a leaf (bottom node) in the prefix tree
  - Each combined object represents a 0/1 split where the “children” are the two objects that were combined
  - In the last step, we combine two subtrees into a single final prefix tree
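The three steps map naturally onto a priority queue. The following is a minimal sketch of that idea in Python, not the course's implementation; the function name, the tie-breaking counter, and the small frequency table are all made up for illustration. Ties between equally frequent objects can be broken either way, so the exact codewords may differ from a hand-drawn tree while the total number of bits stays the same.

```python
import heapq
from itertools import count


def huffman_code(frequencies):
    """Build a prefix code from a {symbol: count} table.
    Symbols are assumed to be strings; combined objects are stored as
    (left, right) tuples."""
    order = count()  # unique tie-breaker so the heap never compares subtrees
    heap = [(n, next(order), symbol) for symbol, n in frequencies.items()]
    heapq.heapify(heap)

    # Steps 1-3: repeatedly combine the two least frequent objects.
    while len(heap) > 1:
        n_i, _, o_i = heapq.heappop(heap)
        n_j, _, o_j = heapq.heappop(heap)
        heapq.heappush(heap, (n_i + n_j, next(order), (o_i, o_j)))

    _, _, tree = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # combined object: 0/1 split
            walk(node[0], prefix + "0")    # left branch is 0
            walk(node[1], prefix + "1")    # right branch is 1
        else:                              # original object: a leaf
            codes[node] = prefix or "0"    # lone-symbol edge case

    walk(tree, "")
    return codes


# Hypothetical frequency table, small enough to check by hand.
example = {"a": 5, "b": 2, "c": 1, "d": 1}
codes = huffman_code(example)
for symbol in sorted(codes):
    print(symbol, codes[symbol])
```

In this example the most frequent symbol gets a 1-bit codeword and the two rarest get 3-bit codewords, which is the behavior Goal #1 asks for.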
17
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
18
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
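The frequency distribution can be recomputed mechanically. The short sketch below uses Python's `collections.Counter` and treats each character, including the space, as one object; the variable names are hypothetical.

```python
from collections import Counter

sentence = "SHE SELLS SEASHELLS BY THE SEASHORE"

# Count every character; the space is a symbol like any other.
freq = Counter(sentence)

for symbol, n in sorted(freq.items()):
    label = "_" if symbol == " " else symbol
    print(label, n)

print("total symbols:", sum(freq.values()))
```

The counts match the table on the slide, and they add up to the 35 characters of the sentence.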
19
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2]
20
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3]
21
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3; 2 + 2 = 4; 3 + 4 = 7]
22
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3; 2 + 2 = 4; 3 + 4 = 7; L(4) + H(4) = 8]
23
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3; 2 + 2 = 4; 3 + 4 = 7; L(4) + H(4) = 8; E(7) + _(5) = 12]
24
Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: partial Huffman tree. Merges so far: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3; 2 + 2 = 4; 3 + 4 = 7; L(4) + H(4) = 8; E(7) + _(5) = 12; 7 + 8 = 15]
25
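Encoding Example
SHE SELLS SEASHELLS BY THE SEASHORE
Frequency distribution: A – 2, B – 1, E – 7, H – 4, L – 4, O – 1, R – 1, S – 8, T – 1, Y – 1, _ (space) – 5
[Figure: complete Huffman tree. Merges: O(1) + B(1) = 2; T(1) + R(1) = 2; Y(1) + A(2) = 3; 2 + 2 = 4; 3 + 4 = 7; L(4) + H(4) = 8; E(7) + _(5) = 12; 7 + 8 = 15; 12 + S(8) = 20; 15 + 20 = 35 (root)]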
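To see what the finished tree buys, the sketch below adds up the encoded length of the sentence. The depth of each leaf is my reading of the merge order recoverable from the slides (for example, that the node of weight 15 joins the 7 and the L/H node, and that S joins the 12), so it is an assumption rather than something the transcript states outright. The fixed-length comparison assumes every symbol gets the same number of bits.

```python
import math

# Frequencies from the slide; " " is the space symbol.
freq = {"A": 2, "B": 1, "E": 7, "H": 4, "L": 4, "O": 1,
        "R": 1, "S": 8, "T": 1, "Y": 1, " ": 5}

# Codeword length = depth of the leaf in the final tree (assumed reading).
depth = {"S": 2,
         "E": 3, " ": 3, "L": 3, "H": 3,
         "Y": 4, "A": 4,
         "O": 5, "B": 5, "T": 5, "R": 5}

huffman_bits = sum(freq[s] * depth[s] for s in freq)

# Fixed-length alternative: 11 distinct symbols need 4 bits each.
bits_per_symbol = math.ceil(math.log2(len(freq)))
fixed_bits = sum(freq.values()) * bits_per_symbol

print("Huffman:", huffman_bits, "bits")
print("Fixed-length:", fixed_bits, "bits")
```

Under these assumptions the Huffman code needs 108 bits against 140 for the fixed-length code. The same total can be obtained by adding up the weights of all the combined nodes (2 + 2 + 3 + 4 + 7 + 8 + 12 + 15 + 20 + 35).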
26
Green Eggs and Ham
27
Green Eggs and Ham
I am Sam
Sam I am
That Sam-I-am!
I do not like that Sam-I-am!
Do you like green eggs and ham?
I do not like them, Sam-I-am.
I do not like green eggs and ham.
Symbols (not letters!) are words. Ignore spaces and punctuation.
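A first step for this exercise is turning the text into word symbols and counting them. The sketch below is one possible reading of the instructions, with two assumptions that the slide does not settle: the hyphens in "Sam-I-am" count as punctuation (giving three words), and capitalization is ignored (so "That" and "that" are the same symbol). The variable names are hypothetical.

```python
import re
from collections import Counter

text = """I am Sam
Sam I am
That Sam-I-am!
I do not like that Sam-I-am!
Do you like green eggs and ham?
I do not like them, Sam-I-am.
I do not like green eggs and ham."""

# Assumptions: lowercase everything, and treat anything that is not a
# letter (spaces, hyphens, punctuation) as a separator between words.
words = re.findall(r"[a-z]+", text.lower())
word_freq = Counter(words)

for word, n in word_freq.most_common():
    print(word, n)

print("total words:", len(words), "distinct words:", len(word_freq))
```

The resulting counts play the role of n_1 ... n_k in the encoding algorithm, with whole words as the objects instead of letters.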