Compression
JPG compression example: original image, 10:1 compression, 45:1 compression.

Content
- Introduction
- Techniques for compression: run-length, Lempel-Ziv, Huffman
- MPEG-4
- Conclusion

In nature, science, and human affairs, where do we see compression and decompression?

Motivation for Compression
Compression is especially important in video, voice, and fax applications, where very large amounts of data are transmitted. Data compression can increase throughput considerably.
Example: at 40,000 picture elements (pixels) per square inch, an 8.5" x 11" page contains 3,740,000 bits. Over a 56 kbps line, transmitting the page takes about 67 seconds. If the data is compressed by a factor of 10, the transmission time drops to about 6.7 seconds per page.
These days, data compression is commonly used by modems, fax machines, video conferencing equipment, your TiVo, etc.
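A quick back-of-the-envelope check of the numbers above, as a small Python sketch. It assumes 1 bit per pixel (a bilevel fax-style image); the page size, pixel density, and line rate are taken from the slide.

```python
# Check of the transmission-time example: bits per page and seconds on a 56 kbps line.
PIXELS_PER_SQ_IN = 40_000              # assumed 1 bit per pixel
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
LINE_RATE_BPS = 56_000

bits = PIXELS_PER_SQ_IN * PAGE_W_IN * PAGE_H_IN
print(f"bits per page: {bits:,.0f}")                           # 3,740,000
print(f"uncompressed:    {bits / LINE_RATE_BPS:.1f} s")        # ~66.8 s
print(f"10:1 compressed: {bits / 10 / LINE_RATE_BPS:.1f} s")   # ~6.7 s
```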

Practical applications of data compression
Compression realizes cost savings in the design of systems. Examples: modems, analog fax, compressed voice for cellular radio, digital voice, compressed video, CD music, the iPod. Without compression, these applications would not be feasible.
(Diagram: Device 1 communicating with Device 2 across a bandwidth bottleneck.)

Principles behind Compression
Types of techniques:
1. Redundancy reduction: remove redundancy from the message. Usually lossless.
2. Reduce information content: reduce the total amount of information in the message. Leads to some sacrifice of quality. Usually lossy.

Categories of compression
1. Data compression: used for data files and program files. Lossless. E.g., WinZip, gzip, compress.
2. Audio compression: compresses digitized voice (e.g., cellular) and music. Typically lossy (e.g., RealAudio, MP3); lossless formats also exist for hi-fi music.
3. Image compression: removes redundancy within the frame. Different formats: BMP (bitmap file) is lossless but creates large files; GIF is lossless (LZW) though limited to 256 colors; JPEG is lossy.
4. Video compression: removes intra- and inter-frame redundancy. Lossy. Examples: MPEG, QuickTime, RealVideo.

Compressibility of different data patterns
Record each day's weather as one bit: 0 = cloudy day, 1 = sunny day.
Sets 1-5: five example bit patterns (shown on the original slide).
In which set is the information content the highest? How would you store these patterns of information in the most economical way?

Compression Techniques
Common compression techniques:
- The "Seinfeld" method: yada, yada, yada...
- Run-length encoding
- The Lempel-Ziv method
- Huffman coding
Marcy: "Speaking of ex's, my old boyfriend came over late last night, and, yada yada yada, anyway, I'm really tired today."

Spot the difference… That's it. The image was compressed 48 times while you watched.

RUN-LENGTH ENCODING Source: NY Times, June 18, 1998.

RUN-LENGTH ENCODING
Look for sequences of repeating characters and replace each sequence with a 3-character code:
- a special character that indicates suppression
- the character to be suppressed
- the frequency (count of repeated characters)
Examples (using S as the suppression character):
$******55.72 becomes $S*655.72
GunsbbbbbbbbbButter becomes GunsSb9Butter
What does the efficiency of this method depend on?
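A minimal encoder in this style, as a sketch. It assumes 'S' as the suppression marker, single-digit counts, and that runs of three or fewer characters are left alone (since the 3-character code would not save space); these choices are mine, not from the slide.

```python
# Minimal run-length encoder in the style described above.
def rle_encode(text: str, marker: str = "S") -> str:
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend the run of identical characters
        run = j - i
        if 4 <= run <= 9:               # worth replacing: marker + character + count
            out.append(f"{marker}{text[i]}{run}")
        else:
            out.append(text[i] * run)   # short (or very long) runs are left as-is
        i = j
    return "".join(out)

print(rle_encode("$******55.72"))          # $S*655.72
print(rle_encode("GunsbbbbbbbbbButter"))   # GunsSb9Butter
```

The efficiency depends on how long and how frequent the runs of repeated characters are. (A real implementation would also need to escape the marker character itself.)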

Lempel-Ziv

Lempel-Ziv Algorithm
This algorithm looks for repeating sequences of patterns in a message and replaces them with a token that points back to the most recent occurrence. A token [a,b] means: go back a characters and copy b characters from there.
Original message: The rain in spain falls mainly on the plain.
Encoding proceeds step by step as repeated patterns (such as "ain") are found:
The rain [3,3] spain falls mainly on the plain.
The rain [3,3] sp [9,4] falls mainly on the plain.
The rain [3,3] sp [9,4] falls m [11,3] ly on the plain.
The rain [3,3] sp [9,4] falls m [11,3] ly on [34,4] plain.
The rain [3,3] sp [9,4] falls m [11,3] ly on [34,4] pl [15,3].
The final message contains 27 literal characters and 5 tokens. Each token needs 2 bytes, so the space required is 37 bytes versus the original 44 bytes. (Note: since each token takes two bytes, a replacement is made only if the repeating pattern is more than two bytes long.)
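A toy sliding-window encoder in the spirit of this example, as a sketch; it greedily replaces any repeat of three or more characters with a [back, length] token pointing at the most recent match. Its output differs in minor details from the hand-worked example above (it is case-sensitive, so it does not match "the" against "The", and spaces may be absorbed into tokens differently); the min_match and max_len parameters are my own choices.

```python
# Toy sliding-window (LZ77-style) encoder: replace repeats with [back, length] tokens.
def lz_encode(msg: str, min_match: int = 3, max_len: int = 16) -> str:
    out = []
    i = 0
    while i < len(msg):
        best_back, best_len = 0, 0
        # search the already-seen text for the longest (and most recent) match
        for back in range(1, i + 1):
            length = 0
            while (length < max_len and i + length < len(msg)
                   and msg[i - back + length] == msg[i + length]):
                length += 1
            if length > best_len:        # ties keep the smaller back = most recent
                best_back, best_len = back, length
        if best_len >= min_match:
            out.append(f"[{best_back},{best_len}]")
            i += best_len
        else:
            out.append(msg[i])
            i += 1
    return "".join(out)

print(lz_encode("The rain in spain falls mainly on the plain."))
```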

Huffman coding
Consider a language with only 4 characters: T, E, L, K. Here is a pattern in this language: T E E E L E E E K E
Probability of T = 0.1, probability of E = 0.7, probability of L = 0.1, probability of K = 0.1.
If we use 2-bit codes for each character, say 00 = T, 01 = E, 10 = L, 11 = K, then we need 20 bits to store this pattern.
Question: can we do better, i.e., store the pattern in fewer bits?
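For concreteness, the fixed-length baseline as a two-line sketch (the 2-bit code assignments follow the slide):

```python
# Fixed-length (2-bit) encoding of the example pattern: 10 symbols x 2 bits = 20 bits.
FIXED = {"T": "00", "E": "01", "L": "10", "K": "11"}
pattern = "TEEELEEEKE"
bits = "".join(FIXED[ch] for ch in pattern)
print(bits, len(bits), "bits")   # 20 bits
```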

HUFFMAN CODING Algorithm

HUFFMAN CODING EXAMPLE
Symbols: T, L, K, E. Resulting codes: T: 000, L: 001, K: 01, E: 1.
Algorithm:
1. Treat each character or symbol as a leaf node of a tree (ordered by probability of occurrence).
2. Merge the two lowest-probability nodes into a node whose probability is the sum of the two merged nodes.
3. Repeat this process until no unmerged nodes remain. The final node is the root of the tree.
4. Label each pair of branches, starting from the root, with 0 and 1.
5. The code word for a symbol is the string of labels on the path from the root node to that symbol's leaf.
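A compact sketch of this procedure, using a priority queue of subtrees keyed by probability. The exact bit patterns depend on how ties and branch labels are chosen, so they may differ from the slide, but the code lengths (3, 3, 2, 1) and therefore the compression are the same.

```python
import heapq

def huffman_codes(probs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code table from symbol probabilities."""
    # Each heap entry: (probability, tie_breaker, {symbol: code_so_far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # lowest-probability subtree
        p2, _, codes2 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in codes1.items()}         # label one branch 0...
        merged.update({s: "1" + c for s, c in codes2.items()})   # ...and the other 1
        counter += 1
        heapq.heappush(heap, (p1 + p2, counter, merged))
    return heap[0][2]

codes = huffman_codes({"T": 0.1, "E": 0.7, "L": 0.1, "K": 0.1})
print(codes)                                   # code lengths: T, L -> 3 bits, K -> 2, E -> 1
encoded = "".join(codes[ch] for ch in "TEEELEEEKE")
print(len(encoded), "bits")                    # 15 bits, versus 20 with fixed 2-bit codes
```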

Decoding a Message (start from the left)
Codes: T: 000, L: 001, K: 01, E: 1
Because no code word is a prefix of another, the bit stream can be decoded unambiguously from left to right. For example, the bit stream 000 1 01 01 01 1 1 1 1 001 1 decodes to TEKKKEEEELE.
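A small decoding sketch for the table above; it simply walks the bit stream left to right and emits a symbol whenever the bits read so far form a complete code word (the example bit stream is the one reconstructed above).

```python
# Decode a Huffman-coded bit stream by scanning it left to right.
# This works because the code is prefix-free: no code word starts another code word.
CODES = {"T": "000", "L": "001", "K": "01", "E": "1"}
DECODE = {bits: sym for sym, bits in CODES.items()}   # invert the code table

def huffman_decode(bitstream: str) -> str:
    out, current = [], ""
    for bit in bitstream:
        current += bit
        if current in DECODE:          # a complete code word has been read
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("bit stream ended in the middle of a code word")
    return "".join(out)

print(huffman_decode("000101010111110011"))   # TEKKKEEEELE
```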

SAVINGS FROM HUFFMAN CODING
The original string had 10 characters, each 2 bits long: total length = 20 bits.
With the Huffman codes:
T, once: 1 x 3 = 3 bits
L, once: 1 x 3 = 3 bits
K, once: 1 x 2 = 2 bits
E, 7 times: 7 x 1 = 7 bits
Total = 15 bits
Savings = (20 - 15) / 20 = 25%
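Equivalently, in terms of the symbol probabilities given earlier, the average code length is 0.1 x 3 + 0.1 x 3 + 0.1 x 2 + 0.7 x 1 = 1.5 bits per symbol, versus 2 bits per symbol for the fixed-length code: the same 25% saving.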

Applications and Standards
- MNP Class 5 is a modem standard that uses run-length encoding.
- V.42 bis is a newer standard for high-speed modems. These modems use the Lempel-Ziv compression method and can compress by a factor of 3.5 to 4.
- Image and video standards: JPEG (still images), H.261, MPEG-1 (for rates up to 1.5 Mbps), MPEG-2 (for rates up to 40 Mbps).
- Audio compression standards: ADPCM, LPC (Linear Predictive Coding), MPEG Audio (e.g., MP3).
In general, the compression ratio depends upon the nature of the data.

MPEG-4
- The "bane" of DVD?
- A standard for transmitting video and sound.
- Meshes existing MPEG-2 inter- and intra-frame advancements with VRML.
- What about MPEG-7?

MPEG-4

Conclusion
Anything can be compressed more… but can the original form be recreated?
The Big Bang: the ultimate decompression!