Lecture #20: Image and Data Compression
Data Compression
How big?
–Image: 1024 x 1024 x 3 bytes/pixel = 3 million bytes (3 MB)
–Audio x 10 min x 60 sec/min x 2 = 58 million bytes (58 MB)
–Video: 640 x 480 x 10 minutes = 307,200 pixels/frame x 600 sec x 30 fps = 5.5 billion pixels, or 16.6 billion bytes at 3 bytes/pixel (17 GB)
Compression (reduce the size)
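The image and video figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes 3 bytes per pixel for video, as for the still image, and it leaves out the audio figure because the sample rate does not appear in the text above.

```python
# Raw (uncompressed) sizes for the image and video examples.
image_bytes = 1024 * 1024 * 3               # width x height x 3 bytes per pixel

video_pixels = 640 * 480 * (10 * 60) * 30   # pixels/frame x seconds x frames/second
video_bytes = video_pixels * 3              # assumption: 3 bytes per pixel

print(image_bytes)    # 3145728
print(video_pixels)   # 5529600000
print(video_bytes)    # 16588800000
```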
Problem
Reduce the size of a data object
–Text
–Image
–Audio
–Video
How to do it
–Cheat in ways that the user can't see
–Coherence
Ways to cheat
Text generally uses fewer than 128 distinct characters.
–Use 7 bits instead of 8 (12.5% smaller)
For text, some characters are more common than others
–Use fewer bits for common characters, more bits for infrequently used characters
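One standard way to give common characters shorter codes is Huffman coding. The slides do not name a method, so using Huffman here is an assumption, and `huffman_code` is a hypothetical helper name. The sketch counts only the encoded bits and ignores the cost of storing the code table.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each character to a bit string."""
    freq = Counter(text)
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # only one distinct symbol
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)       # the two least frequent groups
        w2, _, b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in a.items()}
        merged.update({ch: "1" + code for ch, code in b.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "four score and seven years ago, our fathers brought forth on this continent"
code = huffman_code(text)
encoded_bits = sum(len(code[ch]) for ch in text)
print(encoded_bits, "bits, versus", 7 * len(text), "bits at 7 bits per character")
```

Frequent characters such as the space end up with codes no longer than those of rare characters such as the comma.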
Ways to cheat
People can't see more than 64 levels of gray
–Use 6 bits instead of 8 (25% smaller)
People don't see color as well as B/W
–Use 6 bits for B/W and much less for color
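Dropping from 8 bits to 6 bits of gray can be sketched as below. The helper names `to_6_bits` and `from_6_bits` are hypothetical, and the reconstruction rule (repeating the top bits to fill the low ones) is one common choice, not something the slides specify.

```python
def to_6_bits(gray8):
    """Keep the 6 most significant bits of an 8-bit gray level (0..255 -> 0..63)."""
    return gray8 >> 2

def from_6_bits(gray6):
    """Expand 0..63 back to 0..255 by repeating the top two bits in the low bits."""
    return (gray6 << 2) | (gray6 >> 4)

worst = max(abs(v - from_6_bits(to_6_bits(v))) for v in range(256))
print(worst)  # largest round-trip error in gray levels: 3
```

The round trip is lossy, but the error never exceeds 3 of 256 levels, which is the kind of change the slide claims viewers cannot see.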
Coherence
If we know the previous value of something, then we generally have a good idea what the next value will be
3 Techniques
–Run length encoding
–Reuse of subsequences
–Prediction and error
Run length encoding
Values are frequently repeated.
–Instead of storing each value, store a single value with a count of how many times to repeat it
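A minimal sketch of run length encoding follows. The function names and the `max_run` cap (so that a count fits in one byte) are assumptions made for illustration.

```python
def rle_encode(values, max_run=255):
    """Collapse repeated neighbours into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v and runs[-1][1] < max_run:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

row = ["white"] * 5 + ["red"] * 3 + ["white"] * 4
runs = rle_encode(row)
print(runs)  # [('white', 5), ('red', 3), ('white', 4)]
```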
12 x 10 = 120 pixels
120 pixels x 3 bytes/pixel = 360 bytes
Run encoded
–RGB: 3 bytes
–Count: 1 byte
–Entries: 23
–Space: 4 x 23 = 92 bytes
–Compression: (360 - 92)/360 = 74%
Run encoded - with indexed color
–4 colors: 12 bytes
–Index: 2 bits
–Count: 6 bits
–Entries: 23
–Space: 12 + 1 x 23 = 35 bytes
–Compression: (360 - 35)/360 = 90%
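The two space calculations can be reproduced with the sketch below. The variable names are illustrative; the numbers (23 entries, 4 palette colors, a 2-bit index plus a 6-bit count packed into one byte) come from the slides.

```python
pixels = 12 * 10
raw = pixels * 3                      # 3 bytes per pixel

entries = 23
rgb_runs = entries * (3 + 1)          # 3-byte color + 1-byte count per entry

palette = 4 * 3                       # 4 colors x 3 bytes each
indexed_runs = palette + entries * 1  # 2-bit index + 6-bit count = 1 byte per entry

def saving(compressed):
    """Fraction of the raw size that compression removes."""
    return (raw - compressed) / raw

print(raw, rgb_runs, indexed_runs)        # 360 92 35
print(round(saving(rgb_runs) * 100))      # 74
print(round(saving(indexed_runs) * 100))  # 90
```

A 6-bit count limits each run to 64 pixels, so longer runs would need to be split into several entries.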
Run encoding HELLO Works well
Run encoding Works Badly
Run encoding Works well
Run encoding
Not good: too much variation in the rose
Run encoding - text
"four score and seven years ago, our fathers brought forth on this continent"
Not good: no repetition
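To see why run encoding fails here, the sketch below run-encodes the sentence and compares sizes. It assumes one byte per character and one byte per count, which is an illustrative layout rather than anything the slides define.

```python
from itertools import groupby

text = "four score and seven years ago, our fathers brought forth on this continent"

# groupby yields one group per run of identical adjacent characters.
runs = [(ch, len(list(group))) for ch, group in groupby(text)]

raw_bytes = len(text)          # 1 byte per character
rle_bytes = 2 * len(runs)      # 1 byte for the character + 1 byte for the count

print(len(runs) == len(text))  # True: every run has length 1
print(raw_bytes, rle_bytes)
```

No character is ever followed by itself, so every run has length 1 and the "compressed" form is twice the size of the original.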
Run Encoding - Audio Not good No repetition
Reuse common sequences
Works really well Used in GIF format
Reuse common sequences
Works fairly well
–Blacks are good
–Rose has some similarities
Reuse common sequences
Works really well
Reuse common sequences Works poorly
Reuse common sequences - Video
Works really well
–Copy pieces from last frame into this frame
–One technique in MPEG
Reuse common sequences - Text
–Reuses words and phrases
–Works fairly well
–Most common text compression technique
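The dictionary method used by GIF is LZW, which can be sketched as below. This is a simplified illustration, not the GIF encoder itself: it works on text with character codes below 256, returns a plain list of integer codes, and omits the variable-width bit packing and table resets that real implementations use.

```python
def lzw_compress(text):
    """Replace repeated substrings with codes from a growing table."""
    table = {chr(i): i for i in range(256)}    # start with all single characters
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate                # keep extending a known sequence
        else:
            codes.append(table[current])       # emit the longest known sequence
            table[candidate] = len(table)      # remember the new, longer one
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text, reconstructing the same table while reading the codes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):               # sequence defined by this very step
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)

phrase = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(phrase)
print(len(phrase), "characters ->", len(codes), "codes")
```

The decoder never receives the table; it rebuilds it from the codes, which is why only references to earlier data need to be stored.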
Prediction + error
–Given previous values, predict what the next value will be
–When it is not quite right, store the error
–The error almost always takes fewer bits than the value
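The simplest possible predictor is "the next value equals the previous value". The sketch below uses that predictor as an assumption for illustration; the function names are hypothetical and the input is assumed to be non-empty.

```python
def encode(samples):
    """Store the first sample, then each sample's error against the previous one."""
    errors = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        errors.append(cur - prev)     # error = actual - predicted
    return errors

def decode(errors):
    """Rebuild the samples by adding each error to the running prediction."""
    samples = [errors[0]]
    for e in errors[1:]:
        samples.append(samples[-1] + e)
    return samples

samples = [100, 102, 103, 103, 105, 108, 110, 111]
errors = encode(samples)
print(errors)  # [100, 2, 1, 0, 2, 3, 2, 1]
```

After the first entry, every stored number fits in 2 bits here, while the original values need 7.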
Linear prediction
A line through the previous values predicts the next value.
Error at successive points: little, more, still more, less, less, little.
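A line through the two previous samples gives the prediction 2 x (last value) - (value before that). The sketch below assumes that two-point form and hypothetical function names; the slides do not say how many previous points the line is fitted to.

```python
def linear_residuals(samples):
    """Keep the first two samples, then store each error against a straight-line
    extrapolation of the two samples before it."""
    residuals = list(samples[:2])
    for i in range(2, len(samples)):
        predicted = 2 * samples[i - 1] - samples[i - 2]
        residuals.append(samples[i] - predicted)
    return residuals

def reconstruct(residuals):
    """Invert linear_residuals by repeating the same prediction and adding the error."""
    samples = list(residuals[:2])
    for e in residuals[2:]:
        samples.append(2 * samples[-1] - samples[-2] + e)
    return samples

ramp = [10, 20, 30, 40, 50, 60]     # a straight line
bumpy = [10, 20, 30, 25, 20, 40]    # changes direction twice

print(linear_residuals(ramp))    # [10, 20, 0, 0, 0, 0]
print(linear_residuals(bumpy))   # [10, 20, 0, -15, 0, 25]
```

The error is zero wherever the data continues in a straight line and grows where the curve bends.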
Linear Prediction
Look closer: little error, more error
Linear Prediction
Prediction + error
–Shades of black
–Follows shade of rose
–Rose detail is stored as error relative to the shade
Prediction + error + cheating = JPEG
JPEG Comparisons
Video
–Copy from previous frame
–Store error for small details
–MPEG
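The idea of copying the previous frame and storing only the error can be sketched on a toy one-dimensional "frame". This is an illustration only: real MPEG encoders work on blocks with motion compensation and transform-code the residual, none of which is shown, and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """List (position, error) pairs for pixels that differ from the previous frame."""
    return [(i, c - p) for i, (p, c) in enumerate(zip(previous, current)) if c != p]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the stored errors."""
    frame = list(previous)
    for i, error in delta:
        frame[i] += error
    return frame

previous = [10, 10, 10, 50, 50, 10, 10, 10]
current = [10, 10, 10, 52, 50, 10, 10, 11]

delta = frame_delta(previous, current)
print(delta)  # [(3, 2), (7, 1)]
```

Only two small errors are stored for a frame of eight pixels; unchanged pixels cost nothing.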
Text
N-Grams
–Use the last N letters to predict the next letter
–Store errors
–English is quite regular
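An N-gram predictor can be sketched as follows. The design is an assumption made for illustration: the model is built adaptively from the text seen so far, `None` marks "the prediction was right", and a wrong prediction is recorded by storing the actual letter. All names are hypothetical.

```python
from collections import Counter, defaultdict

def _predict(model, context):
    """Most frequent letter seen after this context so far, or None if unseen."""
    counts = model.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def ngram_encode(text, n=2):
    """Emit None when the last n letters predict the next letter, else the letter."""
    model = defaultdict(Counter)
    out = []
    for i, ch in enumerate(text):
        context = text[max(0, i - n):i]
        out.append(None if _predict(model, context) == ch else ch)
        model[context][ch] += 1               # learn from what actually came next
    return out

def ngram_decode(symbols, n=2):
    """Rebuild the text by running the same predictor and applying corrections."""
    model = defaultdict(Counter)
    text = ""
    for sym in symbols:
        context = text[max(0, len(text) - n):]
        ch = _predict(model, context) if sym is None else sym
        model[context][ch] += 1
        text += ch
    return text

sample = "the cat sat on the mat. the cat sat on the mat."
encoded = ngram_encode(sample)
print(encoded.count(None), "of", len(sample), "letters predicted correctly")
```

Because the decoder updates the same model in the same order, it makes the same predictions and needs only the corrections.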
Review
Cheat
–Exploit weaknesses in what people can perceive
Coherence
–Run encoding (count repetitions)
–Reuse (reference pieces from previous data)
–Predict + error
Know when each technique will or will not work