Introduction to Information Technology LECTURE 5: COMPRESSION


1 Introduction to Information Technology LECTURE 5 COMPRESSION

2 Why Do We Need Compression?

3 How Large is a Digitized Image File? In-Class Example

4 Downloading an Image File In-Class Example

5 Reducing the Size of an Image File In-Class Example

6 How Big is a Digital Video File?
Color screen: 512 x 512 pixels
3 bits per color per pixel = 9 bits/pixel
Scene changes: 60 frames/second
512 x 512 x 9 x 60 x 3600 = roughly 500 billion bits/hour
At that rate, a three-hour film like "The Godfather" requires 191 GB of storage.
COMPRESSION is a critical requirement.
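The arithmetic is easy to check; here is a small Python sketch (the figures come from the slide; the variable names are illustrative, not part of the lecture):

```python
# Uncompressed digital video, using the slide's figures
PIXELS_PER_FRAME = 512 * 512   # screen resolution
BITS_PER_PIXEL = 9             # 3 bits for each of 3 color channels
FRAMES_PER_SECOND = 60
SECONDS_PER_HOUR = 3600

bits_per_hour = (PIXELS_PER_FRAME * BITS_PER_PIXEL
                 * FRAMES_PER_SECOND * SECONDS_PER_HOUR)
print(f"{bits_per_hour:,} bits/hour")   # 509,607,936,000: ~500 billion

# A three-hour film such as "The Godfather":
movie_gigabytes = 3 * bits_per_hour / 8 / 1e9
print(f"{movie_gigabytes:.0f} GB")      # 191 GB
```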

7 COMPRESSION
Compression techniques can significantly reduce the bandwidth and memory required for sending, receiving, and storing data.
Most computers are equipped with modems that compress or decompress all information leaving or entering via the phone line.
With a mutually recognized system (e.g. WinZip), the amount of data can be significantly diminished.
Examples of compression techniques we'll discuss:
Compressing BINARY DATA STREAMS: variable length coding (e.g. Huffman coding) and universal coding (e.g. WinZip)
IMAGE-SPECIFIC COMPRESSION: GIF and JPEG
VIDEO AND MUSIC COMPRESSION: MPEG and MP3

8 WHY CAN WE COMPRESS INFORMATION?
Compression is possible because information usually contains redundancies: content that is repeated.
For example, two still images from a video sequence are often similar. This fact can be exploited by transmitting only the changes from one image to the next.
Likewise, a line of text often contains redundancies:
"Ask not what your country can do for you - ask what you can do for your country."
File compression programs remove this redundancy.

9 Redundancy Enables Compression
This quote from John F. Kennedy's inaugural address contains 79 units: 61 letters + 16 spaces + 1 dash + 1 period. Each requires one unit (one byte) of memory.
"Ask not what your country can do for you - ask what you can do for your country."
To reduce memory space, we look for redundancies:
"ask" appears two times
"what" appears two times
"your" appears two times
"country" appears two times
"can" appears two times
"do" appears two times
"for" appears two times
"you" appears two times
Nearly half of the sentence is redundant.
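A few lines of Python make this redundancy visible (the snippet is illustrative, not part of the original slides):

```python
from collections import Counter

quote = ("Ask not what your country can do for you - "
         "ask what you can do for your country.")

# Count each word, ignoring case and the dash/period
words = [w.strip(".-").lower() for w in quote.split()]
print({w: n for w, n in Counter(words).items() if w and n > 1})
# {'ask': 2, 'what': 2, 'your': 2, 'country': 2,
#  'can': 2, 'do': 2, 'for': 2, 'you': 2}
```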

10 Text Files Contain High Redundancy
In English and other languages, certain words appear over and over, e.g. "the", "and", "from", "of", "because".
Consequently, a large text file can often be reduced by 50% through compression algorithms.
Similarly, programming languages contain a high degree of redundancy: a small number of commands are used over and over again.

11 WHY ELSE CAN WE COMPRESS INFORMATION?
We can only hear certain frequencies.
Our eyesight can only resolve so much detail.
We can only process so much information at one time.

12 WHY ELSE CAN WE COMPRESS INFORMATION?
Some characters occur more frequently than others. It's possible to represent frequently occurring characters with a smaller number of bits during transmission.
This may be accomplished by a variable length code, as opposed to a fixed length code like ASCII.
An example of a simple variable length code is Morse code. "E" occurs more frequently than "Z", so we represent "E" with a shorter code:
E = .     T = -     Z = - - . .     Q = - - . -

13 SOME BACKGROUND: INFORMATION THEORY
Variable length coding exploits the fact that some information occurs more frequently than other information. The mathematical theory behind this concept is known as INFORMATION THEORY.
Claude E. Shannon developed modern information theory at Bell Labs in 1948. He saw the relationship between the probability of appearance of a transmitted signal and its information content.
This realization enabled the development of compression techniques.

14 A LITTLE PROBABILITY
Shannon (and others) found that information can be related to probability.
An event has a probability of 1 (or 100%) if we believe this event will occur.
An event has a probability of 0 (or 0%) if we believe this event will not occur.
The probability that an event will occur takes on values anywhere from 0 to 1.
Consider a coin toss: heads and tails each have a probability of 0.5.
In two tosses, the probability of tossing two heads is 1/2 x 1/2 = 1/4, or 0.25.
In three tosses, the probability of tossing all tails is 1/2 x 1/2 x 1/2 = 1/8, or 0.125.
We compute probability this way because the result of each toss is independent of the results of the other tosses.

15 ENTROPY CONCEPT
As part of information theory, Shannon developed the concept of ENTROPY.
If the probability of a binary event is 0.5 (like a coin toss), then on average you need one bit to represent the result of this event.
As the probability of a binary event increases or decreases, the number of bits you need, on average, to represent the result decreases.
The figure expresses that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear. Let's do an example...
[Figure: binary entropy curve; average bits required versus probability of an event]
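The curve in the figure is Shannon's binary entropy function. In its general form (standard information theory, not written out on the slide), the entropy of a source whose events have probabilities p_i is:

```latex
H = -\sum_{i} p_i \log_2 p_i \quad \text{bits per event}
```

For a fair coin (p = 0.5), H = 1 bit. For the 0.8/0.2 store patrons of the next slide, H is about 0.72 bits, so on average each patron's gender can be conveyed in less than one bit.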

16 EXAMPLE FROM TEXT: A MEN'S SPECIALTY STORE
The probability of male patrons is 0.8; the probability of female patrons is 0.2.
Assume for this example that patrons enter the store in groups of two. Calculate the probabilities of the different pairings:
Event A, Male-Male: P(MM) = 0.8 x 0.8 = 0.64
Event B, Male-Female: P(MF) = 0.8 x 0.2 = 0.16
Event C, Female-Male: P(FM) = 0.2 x 0.8 = 0.16
Event D, Female-Female: P(FF) = 0.2 x 0.2 = 0.04
We could assign the longest codes to the most infrequent events while maintaining unique decodability.

17 EXAMPLE CONTINUED
Let's assign a unique string of bits to each event, based on the probability of that event occurring:
Event A, Male-Male: 0
Event B, Male-Female: 10
Event C, Female-Male: 110
Event D, Female-Female: 111
Given a received code of 01010110100, determine the events:
0 = A (MM), 10 = B (MF), 10 = B (MF), 110 = C (FM), 10 = B (MF), 0 = A (MM)
The above example has used a variable length code.
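How much does this code save? Its expected length, weighting each codeword's length by its probability (a standard calculation, not shown on the slide), is:

```latex
\bar{L} = \sum_i p_i \, \ell_i = (0.64)(1) + (0.16)(2) + (0.16)(3) + (0.04)(3) = 1.56 \ \text{bits per pair}
```

compared with 2 bits per pair for a fixed length code: a saving of 22%.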

18 VARIABLE LENGTH CODING
Unlike fixed length codes such as ASCII, variable length codes:
Assign the longest codes to the most infrequent events.
Assign the shortest codes to the most frequent events.
Require each code word to be uniquely identifiable regardless of length.
Examples of variable length coding: Morse code, Huffman coding.
Variable length coding takes advantage of the probabilistic nature of information. If we have total uncertainty about the information we are conveying, fixed length codes are preferred.

19 MORSE CODE
Characters are represented by patterns of dots and dashes. More frequently used letters use short code symbols. Short pauses are used to separate the letters.
Represent "Hello" using Morse code:
H = . . . .
E = .
L = . - . .
L = . - . .
O = - - -

20 HUFFMAN CODE
The Huffman coding procedure finds the optimum, uniquely decodable, variable length code associated with a set of events, given their probabilities of occurrence.
It creates a binary code tree:
Nodes connected by branches, terminating in leaves.
The top node is the root.
Two branches leave each node.
[Figure: binary code tree starting at the root, branches descending through nodes to leaves labeled A, B, C, D]

21 HUFFMAN CODING
A = 0, B = 10, C = 110, D = 111
Given the adjacent Huffman code tree, decode the following sequence: 11010001110
110 = C, 10 = B, 0 = A, 0 = A, 111 = D, 0 = A
[Figure: the same code tree, with root, branches, nodes, and leaves A, B, C, D]

22 HUFFMAN CODE CONSTRUCTION
First, list all events in descending order of probability:
Event A: .3
Event B: .3
Event C: .13
Event D: .12
Event E: .1
Event F: .05
Pair the two events with the lowest probabilities and add their probabilities: .1 (Event E) + .05 (Event F) = .15

23 HUFFMAN CODE CONSTRUCTION
Repeat for the pair with the next lowest probabilities: .12 (Event D) + .13 (Event C) = .25
Remaining nodes: A (.3), B (.3), {C, D} (.25), {E, F} (.15)

24 HUFFMAN CODE CONSTRUCTION
Repeat for the pair with the next lowest probabilities: .15 ({E, F}) + .25 ({C, D}) = .40
Remaining nodes: A (.3), B (.3), {C, D, E, F} (.40)

25 HUFFMAN CODE CONSTRUCTION
Repeat for the pair with the next lowest probabilities: .3 (Event A) + .3 (Event B) = .60
Remaining nodes: {A, B} (.60), {C, D, E, F} (.40)

26 HUFFMAN CODE CONSTRUCTION
Repeat for the last pair: .60 + .40 = 1.0, which becomes the root.
Then add 0s to the left branches and 1s to the right branches, and read each event's code from the root down to its leaf.
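The whole construction can be mechanized with a priority queue. Below is a minimal Python sketch (not from the lecture; huffman_code is an illustrative name). It reproduces the code lengths derived above, though the exact 0/1 labels at each branch depend on tie-breaking and may differ from the slide's tree:

```python
import heapq
import itertools

def huffman_code(probs):
    """Huffman construction: repeatedly merge the two least probable
    nodes, prepending a bit to every codeword in each merged subtree."""
    tiebreak = itertools.count()  # keeps heapq from ever comparing dicts
    heap = [(p, next(tiebreak), {event: ""}) for event, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # lowest probability
        p2, _, right = heapq.heappop(heap)   # next lowest
        for event in left:
            left[event] = "0" + left[event]   # 0s on the left branch
        for event in right:
            right[event] = "1" + right[event] # 1s on the right branch
        heapq.heappush(heap, (p1 + p2, next(tiebreak), {**left, **right}))
    return heap[0][2]

probs = {"A": .3, "B": .3, "C": .13, "D": .12, "E": .1, "F": .05}
print(huffman_code(probs))
# A and B get 2-bit codes; C, D, E, F get 3-bit codes,
# matching the tree built on slides 22-26.
```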

27 QUESTION
Given the code we just constructed:
Event A: 00    Event B: 01
Event C: 100   Event D: 101
Event E: 110   Event F: 111
How can you decode the string 0000111010110001000000111?
Starting from the leftmost bit, find the shortest bit pattern that matches one of the codes in the list. The first bit is 0, but we don't have an event represented by 0. We do have one represented by 00, which is event A.
Continue applying this procedure:
00 = A, 00 = A, 111 = F, 01 = B, 01 = B, 100 = C, 01 = B, 00 = A, 00 = A, 00 = A, 111 = F
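This matching procedure is mechanical enough to code directly. A minimal sketch (illustrative names; the code table is the one just constructed):

```python
def decode(bits, code):
    """Decode a prefix code: grow the current bit pattern until it
    matches a codeword, emit that event, then start over."""
    lookup = {codeword: event for event, codeword in code.items()}
    events, pattern = [], ""
    for bit in bits:
        pattern += bit
        if pattern in lookup:
            events.append(lookup[pattern])
            pattern = ""
    return events

code = {"A": "00", "B": "01", "C": "100",
        "D": "101", "E": "110", "F": "111"}
print(decode("0000111010110001000000111", code))
# ['A', 'A', 'F', 'B', 'B', 'C', 'B', 'A', 'A', 'A', 'F']
```

Because no codeword is a prefix of another, the shortest match is always the right one and no backtracking is needed.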

28 In-Class Problem
Construct a Huffman code tree for the following events:
Probability (Event A) = .5
Probability (Event B) = .3
Probability (Event C) = .1
Probability (Event D) = .1

29 In-Class Problem
Using the Huffman code tree below, decode the following sequence:
[Figure: code tree and bit sequence given in class; not reproduced in the transcript]

30 UNIVERSAL CODING
Huffman coding has its limits: you must know a priori the probabilities of the characters or symbols you are encoding. What if a document is "one of a kind"?
Universal coding schemes do not require knowledge of the statistics of the events to be coded. Universal coding is based on the realization that any stream of data consists of some repetition.
Lempel-Ziv coding is one form of universal coding presented in the text:
Compression results from reusing frequently occurring strings.
Works better for long data streams; inefficient for short strings.
Used by WinZip to compress information.
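To make the idea concrete, here is a minimal sketch of LZW, the Lempel-Ziv-Welch variant mentioned on slide 34 (the snippet is illustrative, not from the lecture). It emits an index for the longest string already in its table and grows the table as it goes:

```python
def lzw_compress(data):
    """LZW: emit table indices for the longest already-seen string;
    every new string extends the table, so repetition pays off."""
    table = {chr(i): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current, output = "", []
    for ch in data:
        if current + ch in table:
            current += ch                    # keep extending the match
        else:
            output.append(table[current])    # emit longest match found
            table[current + ch] = next_code  # remember the new string
            next_code += 1
            current = ch
    if current:
        output.append(table[current])
    return output

text = "ask what you can do " * 4
print(len(text), "characters ->", len(lzw_compress(text)), "codes")
```

Note that no symbol statistics are supplied up front; the table is built from the repetition in the data itself.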

31 Lossless Versus Lossy Compression
LOSSLESS CODING: Every detail of the original data is restored upon decoding.
The examples of compression we've discussed thus far are lossless.
A lossless approach is absolutely essential for information like financial or engineering data.
LOSSY CODING: Some information is lost.
Lossy coding can be applied to data in which we can tolerate some loss of information:
Human vision can tolerate some loss of image sharpness (fax images, photographs, video clips).
Human hearing can tolerate some loss of fidelity in sound.
Fidelity = the faithfulness of our reproduction of an image or sound after compression and decompression.

32 IMAGE COMPRESSION
A near photographic quality image: 1,280 rows of 800 pixels each, with 24 bits of color information per pixel.
Total = 1,280 x 800 x 24 = 24,576,000 bits
A 56 Kbps modem transfers 56,000 bits/sec. How long does it take to download?
24,576,000 / 56,000 = 439 seconds, or about 7.3 minutes
Obviously, image compression is essential.

33 IMAGES ARE WELL-SUITED FOR COMPRESSION
Images have more redundancy than other types of data: they contain a large amount of structure, and the human eye is very tolerant of approximation error.
Two types of image compression:
Lossless coding: every detail of the original data is restored upon decoding. Examples: run length encoding, GIF, lossless-mode JPEG.
Lossy coding: a portion of the original data is lost, but the loss is largely undetectable to the human eye. Good for images and audio. Example: JPEG.
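Run length encoding, the simplest technique in that list, just replaces each run of identical pixel values with a (value, run length) pair. A minimal sketch (illustrative, not from the lecture):

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse each run of identical values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

# A scan line with long runs of identical pixels compresses well:
row = [255] * 12 + [0] * 3 + [255] * 10
print(run_length_encode(row))  # [(255, 12), (0, 3), (255, 10)]
```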

34 IMAGE COMPRESSION
JPEG (Joint Photographic Experts Group):
The JPEG standard defines 29 distinct coding systems for compression, 2 of them for lossless compression.
Lossless JPEG uses a technique called predictive coding, which attempts to predict pixels later in the image from previous pixels in that same image.
Lossy JPEG consists of image simplification: removing image complexity at some loss of fidelity.
GIF (Graphics Interchange Format):
Developed by CompuServe.
A lossless image compression system.
An application of Lempel-Ziv-Welch (LZW) coding.
JPEG and GIF are the two compressed image formats most often encountered on the Web.

35 DIGITAL VIDEO COMPRESSION - MPEG
MPEG (Moving Picture Experts Group) is a series of standards for compressing streaming digital information. DVDs use MPEG coding.
MPEG achieves compression results on the order of 1/35 of the original size.
If we examine two still images from a video sequence, we will almost always find that they are similar: many pixels will not change from one image to the next.
This fact can be exploited by transmitting only the changes from one image to the next, a technique called IMAGE DIFFERENCE CODING.
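A toy illustration of image difference coding, using 1-D "frames" (this sketch is not from the lecture, and real MPEG is far more elaborate, adding motion compensation and transform coding):

```python
def frame_delta(prev, curr):
    """Image difference coding: record only the pixels that changed,
    as (position, new value) pairs."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    curr = list(prev)
    for i, value in delta:
        curr[i] = value
    return curr

frame1 = [10, 10, 10, 80, 80, 10, 10, 10]
frame2 = [10, 10, 10, 10, 80, 80, 10, 10]  # the bright object moved right
delta = frame_delta(frame1, frame2)
print(delta)  # [(3, 10), (5, 80)] -- only 2 of 8 pixels need transmitting
assert apply_delta(frame1, delta) == frame2
```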

36 MPEG Audio Layer-3 (MP3)
The MPEG compression standard includes a specification for compressing sound. Its technical name is MPEG Audio Layer-3; the acronym is MP3.
CDs store music in an uncompressed format. Music is sampled 44,100 times per second:
44,100 samples/second x 16 bits/sample x 2 channels = 1,411,200 bits per second
It would take a prohibitively long time to download a song over a 56 Kbps modem, so compression is essential.
MP3 reduces the file size by more than ten times. How? PERCEPTUAL NOISE SHAPING, an example of lossy compression.
NEW TOPIC: AUDIO AS INFORMATION