Chapter 7 Special Section: Focus on Data Compression

7A Objectives

Understand the essential ideas underlying data compression.
Become familiar with the different types of compression algorithms.
Be able to describe the most popular data compression algorithms in use today and know the applications for which each is suitable.

7A.1 Introduction

Data compression is important to storage systems because it allows more bytes to be packed into a given storage medium than when the data is uncompressed. Some storage devices (notably tape) compress data automatically as it is written, resulting in less tape consumption and significantly faster backup operations. Compression also reduces file transfer time, saving time and communications bandwidth.

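A good metric for compression is the compression factor (or compression ratio), given by:

$$\text{compression factor} = \left(1 - \frac{\text{compressed size}}{\text{uncompressed size}}\right) \times 100\%$$

If we have a 100KB file that we compress to 40KB, we have a compression factor of:

$$\left(1 - \frac{40}{100}\right) \times 100\% = 60\%$$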
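A minimal sketch of this calculation, assuming the convention factor = (1 − compressed size / uncompressed size) × 100%. The function name `compression_factor` is hypothetical; it is rearranged so that whole-number sizes give exact results.

```python
def compression_factor(compressed_size, uncompressed_size):
    """Percentage of the original size saved by compression."""
    # Same as (1 - compressed/uncompressed) * 100, rearranged to avoid float rounding
    return (uncompressed_size - compressed_size) * 100 / uncompressed_size

print(compression_factor(40, 100))  # 60.0
```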

Compression is achieved by removing data redundancy while preserving information content. The information content of a group of bytes (a message) is its entropy. Data with low entropy permit a larger compression ratio than data with high entropy.

Entropy, H, is a function of symbol frequency. It is the weighted average of the number of bits required to encode the symbols of a message. A symbol x_i occurring with probability P(x_i) needs -log_2 P(x_i) bits, so its contribution to the entropy is:

$$H(x_i) = -P(x_i)\log_2 P(x_i)$$

The entropy of the entire message is the sum of the individual symbol entropies:

$$H = \sum_i -P(x_i)\log_2 P(x_i)$$

The average redundancy for each character in a message is given by:

$$\sum_i P(x_i)\, l_i \;-\; \sum_i -P(x_i)\log_2 P(x_i)$$

where l_i is the length, in bits, of the code assigned to symbol x_i.
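To make the redundancy formula concrete, here is a small sketch. The four-symbol source and its code lengths are hypothetical, chosen so that every logarithm is exact; the function names are illustrative only.

```python
import math

def entropy(probs):
    """H = sum of -P(x_i) * log2 P(x_i), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_redundancy(probs, code_lengths):
    """Average code length sum(P(x_i) * l_i) minus the entropy H."""
    avg_len = sum(p * l for p, l in zip(probs, code_lengths))
    return avg_len - entropy(probs)

# Hypothetical source with probabilities 1/2, 1/4, 1/8, 1/8 (entropy 1.75 bits)
probs = [0.5, 0.25, 0.125, 0.125]
print(average_redundancy(probs, [2, 2, 2, 2]))  # 0.25  (fixed 2-bit code)
print(average_redundancy(probs, [1, 2, 3, 3]))  # 0.0   (code lengths match -log2 P)
```

A fixed-length code wastes a quarter of a bit per symbol here, while a variable-length code whose lengths equal -log_2 P(x_i) has no redundancy at all.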

Consider the message: HELLO WORLD!

The letter L has a probability of 3/12 = 1/4 of appearing in this message. The number of bits required to encode this symbol is -log_2(1/4) = 2. In the whole message, L occurs three times, O twice, and each of the other seven symbols (H, E, space, W, R, D, and !) once. Using our formula, the average entropy of the entire message is:

$$H = -\frac{3}{12}\log_2\frac{3}{12} - \frac{2}{12}\log_2\frac{2}{12} - 7\cdot\frac{1}{12}\log_2\frac{1}{12} \approx 3.022$$

This means that the theoretical minimum number of bits per character is 3.022. Theoretically, the message could be sent using only 37 bits (3.022 × 12 = 36.26).
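The worked example can be checked with a few lines of Python. This sketch counts symbol frequencies with `collections.Counter`, applies the entropy formula, and rounds the total up to whole bits; the function name `message_entropy` is hypothetical.

```python
import math
from collections import Counter

def message_entropy(message):
    """Entropy in bits per character, from the symbol frequencies of the message itself."""
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in Counter(message).values())

msg = "HELLO WORLD!"
h = message_entropy(msg)
min_bits = math.ceil(h * len(msg))  # cannot send a fraction of a bit
print(f"{h:.3f} bits/char, at least {min_bits} bits total")
# 3.022 bits/char, at least 37 bits total
```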

7A.2 Statistical Coding

The entropy metric just described forms the basis for statistical data compression. Two widely used statistical coding algorithms are Huffman coding and arithmetic coding. Huffman coding builds a binary tree from the letter frequencies in the message. The binary code for each character is read directly from the tree. Symbols with the highest frequencies end up at the top of the tree and receive the shortest codes.

The process of building the tree begins by counting the occurrences of each symbol in the text to be encoded:

HIGGLETY PIGGLTY POP
THE DOG HAS EATEN THE MOP
THE PIGS IN A HURRY
THE CATS IN A FLURRY
HIGGLETY PIGGLTY POP
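A minimal counting sketch in Python. It assumes the five lines of the rhyme are joined by single spaces, and that spaces count as symbols while line breaks do not; the slides' own frequency table may treat these characters differently.

```python
from collections import Counter

# The rhyme as one string (assumption: lines joined by single spaces)
text = ("HIGGLETY PIGGLTY POP THE DOG HAS EATEN THE MOP "
        "THE PIGS IN A HURRY THE CATS IN A FLURRY "
        "HIGGLETY PIGGLTY POP")

freq = Counter(text)
# List symbols from most to least frequent, breaking ties alphabetically
for symbol, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(repr(symbol), count)
```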

Next, place the letters and their frequencies into a forest of trees that each have two nodes: one for the letter, and one for its frequency.

We start building the tree by joining the nodes having the two lowest frequencies.

And then we again join the nodes with the two lowest frequencies.

And again.

Here is our finished tree.
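The merging steps above can be sketched with Python's `heapq` module acting as the forest: each pass pops the two lowest-frequency trees and pushes their combination. This is an illustration, not the text's exact procedure. Trees are represented as nested tuples (an assumption of this sketch), ties are broken by insertion order, and the function names are hypothetical.

```python
import heapq
import itertools
from collections import Counter

def build_huffman_tree(text):
    """Join the two lowest-frequency trees until one tree remains.

    Leaves are symbols; an internal node is a (left, right) tuple.
    Returns (total_frequency, root).
    """
    tie = itertools.count()  # tie-breaker so heapq never has to compare two trees
    forest = [(freq, next(tie), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        f1, _, t1 = heapq.heappop(forest)  # lowest frequency
        f2, _, t2 = heapq.heappop(forest)  # next lowest
        heapq.heappush(forest, (f1 + f2, next(tie), (t1, t2)))
    total, _, root = forest[0]
    return total, root

def leaf_depths(tree, depth=0, out=None):
    """Depth of each leaf, which equals the length of that symbol's code."""
    if out is None:
        out = {}
    if isinstance(tree, tuple):
        leaf_depths(tree[0], depth + 1, out)
        leaf_depths(tree[1], depth + 1, out)
    else:
        out[tree] = depth
    return out

total, root = build_huffman_tree("HELLO WORLD!")
depths = leaf_depths(root)
print(total, depths)
```

The root's frequency equals the message length. Frequent symbols such as L end up no deeper than rare ones, which is why they get shorter codes.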

This is the code derived from this tree.
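Reading codes off a tree amounts to recording the path to each leaf. In this sketch a left edge is written as 0 and a right edge as 1 (an assumed convention), and the four-leaf tree is hypothetical rather than the one from the slides. Decoding walks the same tree and restarts at the root after each leaf. This works because no code is a prefix of another.

```python
def assign_codes(tree, prefix="", codes=None):
    """Left edge adds '0', right edge adds '1'. Assumes at least two leaves."""
    if codes is None:
        codes = {}
    if isinstance(tree, tuple):
        assign_codes(tree[0], prefix + "0", codes)
        assign_codes(tree[1], prefix + "1", codes)
    else:
        codes[tree] = prefix
    return codes

def encode(message, codes):
    return "".join(codes[ch] for ch in message)

def decode(bits, tree):
    """Follow edges from the root; emit a symbol at each leaf, then restart."""
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return "".join(out)

# Hypothetical tree: A is the most frequent symbol, C and D the least
tree = ("A", ("B", ("C", "D")))
codes = assign_codes(tree)
bits = encode("ABACAD", codes)
print(codes)               # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
print(bits)                # 01001100111
print(decode(bits, tree))  # ABACAD
```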

The second type of statistical coding, arithmetic coding, partitions the real number interval between 0 and 1 into segments according to symbol probabilities. An abbreviated algorithm for this process is given in the text. Arithmetic coding is computationally intensive, and it runs the risk of causing divide underflow. Variations in floating-point representation among various systems can also cause the terminal condition (a zero value) to be missed.
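The interval-narrowing idea can be sketched with Python's `fractions.Fraction`. Exact rational arithmetic sidesteps the floating-point pitfalls just mentioned, at the cost of speed. Two assumptions differ from the text's algorithm: the probability table is hypothetical, and the decoder is told the message length instead of watching for a terminal condition.

```python
from fractions import Fraction

def build_ranges(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def arithmetic_encode(message, probs):
    ranges = build_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        # Narrow the current interval to this symbol's share of it
        low, high = low + width * s_low, low + width * s_high
    return low, high  # any number in [low, high) identifies the message

def arithmetic_decode(value, length, probs):
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)

probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}
low, high = arithmetic_encode("AB", probs)
print(low, high)                         # 1/4 3/8
print(arithmetic_decode(low, 2, probs))  # AB
```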

For most data, statistical coding methods offer excellent compression ratios. Their main disadvantage is that they require two passes over the data to be encoded: the first pass calculates probabilities, and the second encodes the message. This approach is unacceptably slow for storage systems, where data must be read, written, and compressed within one pass over a file.

7A.3 LZ Dictionary Systems

Ziv-Lempel (LZ) dictionary systems solve the two-pass problem by using values in the data as a dictionary to encode itself. The LZ77 compression algorithm employs a text window in conjunction with a lookahead buffer. The text window serves as the dictionary. If text in the lookahead buffer matches text in the dictionary, the location and length of the matching text in the window are output.
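A compact LZ77 sketch follows. It uses the common (offset, length, next character) triple format, which is one of several LZ77 output variants. The window and lookahead sizes are small hypothetical defaults, and the linear search is for clarity rather than speed. Everything runs in a single pass over the input.

```python
def lz77_encode(data, window_size=32, lookahead_size=8):
    """Encode data as (offset, length, next_char) triples.

    offset: how far back in the window the match starts (0 = no match)
    length: how many characters match
    next_char: the first character after the match, sent literally
    """
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window_size), i):  # candidate starts in the window
            length = 0
            # Leave room for next_char; a match may run past i into the lookahead.
            while (length < lookahead_size - 1
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    out = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # one char at a time so overlapping matches work
        out.append(next_char)
    return "".join(out)

text = "ABRACADABRA ABRACADABRA"
triples = lz77_encode(text)
print(triples)
print(lz77_decode(triples) == text)  # True
```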

LZ77 implementations include PKZIP and IBM’s RAMAC RVA 2 Turbo disk array. The simplicity of LZ77 lends itself well to hardware implementation. LZ78 is another dictionary coding system. It removes the LZ77 constraint of a fixed-size window; instead, it builds a trie as the data is read. Where LZ77 uses pointers to locations in a dictionary, LZ78 uses pointers to nodes in the trie.
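The trie-building behavior of LZ78 can be sketched as follows. The trie is flattened into a Python dict keyed by (parent node, character), which is an implementation choice of this sketch. The output is a list of (node index, next character) pairs, where index 0 is the empty phrase at the trie root. Function names are hypothetical.

```python
def lz78_encode(data):
    """Emit (phrase_index, next_char) pairs; index 0 is the trie root (empty phrase)."""
    trie = {}        # (parent_index, char) -> child index
    next_index = 1
    node = 0
    out = []
    for ch in data:
        if (node, ch) in trie:
            node = trie[(node, ch)]   # extend the current phrase one level deeper
        else:
            out.append((node, ch))    # longest known phrase + one new character
            trie[(node, ch)] = next_index
            next_index += 1
            node = 0
    if node != 0:
        out.append((node, ""))        # flush a phrase that ended exactly at end of input
    return out

def lz78_decode(pairs):
    phrases = [""]                    # phrases[i] is the text of trie node i
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)

pairs = lz78_encode("ABABABA")
print(pairs)               # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
print(lz78_decode(pairs))  # ABABABA
```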

7A.4 GIF and PNG Compression

GIF compression uses a variant of LZ78 called LZW, for Lempel-Ziv-Welch. It improves upon LZ78 through its efficient management of the size of the trie. Terry Welch, the designer of LZW, was employed by the Sperry Corporation (which later merged into Unisys) when he created the algorithm, and Unisys subsequently held the patent on it. Owing to royalty disputes, development of an alternative format, PNG, was hastened.
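A textbook LZW sketch follows. The dictionary starts with all 256 single-byte strings, and new entries get the next free code. GIF's actual encoder adds details not shown here, such as variable code widths and clear/end codes. The sketch also assumes characters in the range 0 to 255.

```python
def lzw_encode(data):
    """Classic LZW: dictionary starts with codes 0-255 for single characters."""
    dictionary = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                            # keep extending the current string
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # add the new string with the next code
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[0]                  # code being defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
print(len(text), "characters ->", len(codes), "codes")
print(lzw_decode(codes) == text)  # True
```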

PNG compresses image data with DEFLATE, which combines two types of compression: LZ77 compression is applied first, and its output is then Huffman coded. The advantage that GIF holds over PNG is that GIF supports multiple images in one file. MNG is an extension of PNG that supports multiple images in one file. GIF, PNG, and MNG are primarily used for graphics compression. To compress larger, photographic images, JPEG is often more suitable.
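Python's standard `zlib` module implements DEFLATE, the LZ77-plus-Huffman method that PNG relies on. PNG also filters each scanline before compressing, which is not shown here. The sample bytes below are arbitrary and chosen to be repetitive.

```python
import zlib

# Repetitive sample data: the LZ77 stage finds long matches,
# and the Huffman stage gives short codes to the most frequent symbols.
data = b"HIGGLETY PIGGLTY POP " * 20
packed = zlib.compress(data, 9)   # level 9 = best compression
restored = zlib.decompress(packed)

print(len(data), "->", len(packed), "bytes")
print(restored == data)           # True: DEFLATE is lossless
```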

7A.5 JPEG Compression

Photographic images incorporate a great deal of information. However, much of that information can be lost without objectionable deterioration in image quality. With this in mind, JPEG allows user-selectable image quality, but even at the “best” quality levels, JPEG makes an image file smaller owing to its multiple-step compression algorithm. It’s important to remember that JPEG is lossy, even at the highest quality setting. It should be used only when the loss can be tolerated. The JPEG algorithm is illustrated on the next slide.

[Figure: The JPEG algorithm]
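The sketch below shows where JPEG's loss comes from by implementing two of its stages in plain Python: the 8×8 discrete cosine transform (DCT) and quantization. It is a simplification under several assumptions. It uses a single uniform quantization step, whereas real JPEG uses a quantization table. It omits level shifting, color conversion, zigzag ordering, and entropy coding. The ramp-shaped block and all function names are hypothetical.

```python
import math

N = 8  # JPEG works on 8x8 blocks

def _scale(k):
    # Orthonormal DCT scaling factors
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT-II: pixel values -> frequency coefficients."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]

def idct2(coeffs):
    """Inverse 2-D DCT: frequency coefficients -> pixel values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]

def quantize(coeffs, step):
    """The lossy step: dividing and rounding discards precision permanently."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]

# Hypothetical smooth block: brightness rises from top (0) to bottom (112)
block = [[16 * x for _ in range(N)] for x in range(N)]

levels = quantize(dct2(block), step=50)
restored = idct2(dequantize(levels, step=50))

nonzero = sum(1 for row in levels for level in row if level != 0)
max_error = max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N))
print(f"{nonzero} of {N * N} coefficients survive; max pixel error {max_error:.2f}")
```

For a smooth block, almost all quantized coefficients become zero, which makes the later entropy-coding stage very effective. The reconstruction no longer matches the original exactly.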

7A.6 MP3 Compression

MP3 provides a good example of the tremendous power of applied mathematics and algorithms. MP3 singularly changed the way that the world buys and enjoys music. The algorithm is not new: Karlheinz Brandenburg formulated the basis for MP3 in 1987 while a doctoral student at the University of Erlangen-Nuremberg. A refined version was adopted by the Moving Picture Experts Group (MPEG) for the audio component of movies.

MP3 is one of several related audio compression standards. It is officially known as MPEG-1 Audio Layer III and is specified in MPEG-1 Part 3, which ISO adopted as an ISO/IEC standard. MP3 takes as its input a pulse-code modulation (PCM) stream, typically sampled at 44.1 kHz. Uncompressed, a 3-minute song would require about 32MB of storage! The next slide illustrates PCM encoding.
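The 32MB figure follows from simple arithmetic. The slide gives only the 44.1 kHz sampling rate, so this sketch also assumes CD-style 16-bit samples and two channels (stereo).

```python
SAMPLE_RATE = 44_100   # samples per second, per channel (from the slide)
BITS_PER_SAMPLE = 16   # assumption: CD-style 16-bit samples
CHANNELS = 2           # assumption: stereo
SECONDS = 3 * 60       # a 3-minute song

bits_per_second = SAMPLE_RATE * BITS_PER_SAMPLE * CHANNELS
total_bytes = bits_per_second * SECONDS // 8
print(bits_per_second)                # 1411200
print(total_bytes)                    # 31752000
print(f"{total_bytes / 1e6:.1f} MB")  # 31.8 MB
```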

[Figure: PCM encoding]
As the sampling rate increases, fidelity improves, but much more data is created.
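PCM can be sketched as two steps: sample a signal at regular intervals, then quantize each sample to a fixed number of bits. Here the analog input is assumed to be a pure sine tone, samples are quantized to signed 16-bit integers, and `pcm_encode` with its parameters is hypothetical. Raising the sampling rate produces proportionally more samples for the same stretch of sound.

```python
import math

def pcm_encode(freq_hz, sample_rate, duration_ms, bits=16):
    """Sample a pure tone and quantize each sample to a signed `bits`-bit integer."""
    max_level = 2 ** (bits - 1) - 1                # 32767 for 16-bit samples
    n_samples = sample_rate * duration_ms // 1000  # integer math avoids float rounding
    return [round(max_level * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n_samples)]

# The same 10 ms of a 440 Hz tone at two sampling rates
low_rate = pcm_encode(440, 8_000, 10)
cd_rate = pcm_encode(440, 44_100, 10)
print(len(low_rate), len(cd_rate))  # 80 441
```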

MP3 relies on psychoacoustic coding: exploiting the imperfections of human hearing. There is no point in encoding sounds humans can’t hear. For the same reason, sounds at the margins of hearing can tolerate more lossiness than those in the mid-ranges. Bandpass filterbanks pass certain frequencies and block others. Scalefactor bands determine where lossiness can be tolerated in the output. The next slide illustrates the MP3 encoding process.
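The idea that inaudible sounds need not be encoded can be illustrated with an approximation of the absolute threshold of hearing: Terhardt's formula, which is common in the perceptual-coding literature. This is only a toy. MP3's actual psychoacoustic model also accounts for masking between nearby sounds, and the function names and sample spectrum here are hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute threshold of hearing (Terhardt's formula), in dB SPL."""
    f = freq_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def drop_inaudible(components):
    """Keep only (freq_hz, level_db) components louder than the threshold in quiet."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]

# Hypothetical spectrum: very low and very high tones need far more energy to be heard
spectrum = [(20, 30.0), (1000, 30.0), (3300, 0.0), (16000, 40.0)]
print(drop_inaudible(spectrum))  # [(1000, 30.0), (3300, 0.0)]
```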

[Figure: The MP3 encoding process]
Note: The MP3 standard specifies only the algorithm; the implementation is left to hardware and software companies.

Section 7A Conclusion

Two approaches to data compression are statistical data compression and dictionary systems. Statistical coding requires two passes over the input; dictionary systems require only one. LZ77 and LZ78 are two popular dictionary systems. GIF, PNG, MNG, and JPEG are used for image compression; MP3 (and others) for audio. JPEG is lossy, so it is not suitable for all types of images.