Lecture 10: data compression

Outline
Basics of Data Compression
Text & Numeric Compression
Image Compression
Audio Compression
Video Compression
Data Security Through Encryption

Learning Outcomes
Differentiate between lossless and lossy data compression.

Basics of Data Compression
Digital compression concepts:
Compression techniques replace a file with another that is smaller.
Compressed data requires less storage and can be transmitted faster.
Decompression techniques expand the compressed file to recover the original data, either exactly or in facsimile.
A pair of compression and decompression techniques that work together is called a codec.

Basics of Data Compression (cont)
Compress / Decompress = CoDec.
Main function of a codec: to reduce the redundancy in data.
How? By replacing definable patterns with shorter sequences of symbols.
[Diagram: uncompressed data -> compression (coder) -> compressed data -> decompression (decoder); coder and decoder together form the codec.]

Basics of Data Compression (cont)
Types of codecs:
Lossy codecs produce only an approximation of the original data, e.g. audio and digital images.
Lossless codecs always reproduce the original file exactly upon decompression, e.g. text and numeric data.

Basics of Data Compression (cont)
Speed of compression / decompression: the amount of time required to compress and decompress data, either symmetrically or asymmetrically.
A symmetric codec takes approximately the same amount of time to compress and to decompress a file, e.g. video teleconferencing transmission.
An asymmetric codec has simple, fast decompression, but its compression is more complicated and significantly slower, e.g. storing and accessing data on CD-ROM / DVD.

Basics of Data Compression (cont)
Codec methods can be distinguished in 3 ways:
Syntactic method (entropy encoding): attempts to reduce the redundancy of symbolic patterns without any attention to the type of information represented (ignores the source of the information).
Semantic method (source coding): considers special properties of the type of information represented (helps to transform or reduce the amount of non-essential information in the original).
Hybrid method: combines both syntactic and semantic approaches. First prepare the data using a semantic method, then reduce it further with entropy encoding.

Entropy encoding: run-length coding, Huffman coding, arithmetic coding
Source coding:
  Prediction: DPCM, DM
  Transformation: FFT, DCT
  Layered coding: bit position, subsampling, sub-band coding
Hybrid: JPEG, MPEG, H.261, DVI RTV, DVI PLV

Text and Numeric Compression
Several methods exist for compressing files that represent text and numeric data.
1) Run Length Encoding (RLE)
A simple and direct form of compression.
Based on the assumption that a great deal of redundancy is present in the repetition of particular sequences of symbols.
Example: the text sequence A BB CC DDDDDDDDD EE F GGGGG becomes A BB CC D#9 EE F G#5 after compression.
Fax transmissions use RLE for data reduction.

Text and Numeric Compression (cont)
Run Length Encoding (RLE)
Some data contain sequences of identical bytes.
RLE replaces these runs with a marker and a counter that gives the number of occurrences.
3-byte encoding:
Uncompressed data: AAAAOOOOOOOOBBBBCCCDCC
Compressed data: A#4 O#8 B#4 C C C D C C
The # acts as the marker, followed by a number indicating the number of occurrences.
Each run of four or more identical bytes is compressed. The run CCC is left as it is because a 3-byte code would not make it any shorter.

Text and Numeric Compression (cont)
Run Length Encoding (RLE)
RLE can also be coded using 2 bytes: the first byte gives the number of occurrences and the second gives the data.
2-byte encoding:
Uncompressed data: AAAAOOOOOOOOBBBBCCCDCC
Compressed data: 4A 8O 4B 3C D C C
The original data is 22 bytes (AAAAOOOOOOOOBBBBCCCDCC). RLE compresses it down to 11 bytes (4A 8O 4B 3C D C C), half the original size.

Text and Numeric Compression (cont)
Run Length Encoding (RLE)
RLE compresses more efficiently when the runs are really long.
Example: AAAAAAAAAAAAAAAAAAAA becomes 20A.
Instead of 20 bytes, the storage is brought down to just 2 bytes (1 byte for '20' and 1 byte for 'A').
The RLE compression ratio can be measured by the formula: (original size / compressed size) : 1
For the previous example, the compression ratio is 22/11 : 1, which is 2:1.
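
The 2-byte scheme and the ratio formula can be sketched in a few lines of Python. This is only an illustration: the function names and the min_run threshold are assumptions, and the decoder assumes the data symbols are never digits, so that a count can be told apart from a symbol.

```python
from itertools import groupby

def rle_encode(text, min_run=3):
    """Return RLE tokens: '<count><symbol>' for long runs, bare symbols otherwise."""
    tokens = []
    for symbol, group in groupby(text):
        run_length = len(list(group))
        if run_length >= min_run:
            tokens.append(f"{run_length}{symbol}")
        else:
            # Short runs are cheaper to store as they are.
            tokens.extend(symbol * run_length)
    return tokens

def rle_decode(tokens):
    """Expand the tokens back to the original text (symbols must not be digits)."""
    return "".join(
        token[-1] * int(token[:-1]) if len(token) > 1 else token
        for token in tokens
    )

def encoded_size(tokens):
    """Bytes needed: 1 byte for the count plus 1 for the symbol, or 1 for a bare symbol."""
    return sum(2 if len(token) > 1 else 1 for token in tokens)

def compression_ratio(original_size, compressed_size):
    return original_size / compressed_size

data = "AAAAOOOOOOOOBBBBCCCDCC"
tokens = rle_encode(data)
print(" ".join(tokens))                                    # 4A 8O 4B 3C D C C
print(compression_ratio(len(data), encoded_size(tokens)))  # 2.0
```

With these assumptions the sketch reproduces the 22-byte to 11-byte example, and a run of twenty A's becomes the single token 20A.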

Text and Numeric Compression (cont)
Run Length Encoding (RLE)
RLE on a repetitive data source
Consider this: 1, 3, 4, 1, 3, 4, 1, 3, 4, 1, 3, 4
RLE: 4(1,3,4), which translates to 4 occurrences of the sequence 1, 3, 4.
RLE on differencing
Consider this: 1, 2, 4, 5, 7, 8, 10
RLE can also take the differences between adjacent values and encode them. In this case, the difference between 1 and 2 is 1, between 2 and 4 is 2, between 4 and 5 is 1, and so on.
The respective differences are 1, 2, 1, 2, 1, 2. Further compression: 3(1,2)
The starting value 1 has to be kept as well so that the sequence can be rebuilt.
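
Both ideas on this slide can be tried out with a small Python sketch. The helper names differences and repeat_unit are made up for this illustration, and the search for the repeating unit is a simple brute-force one.

```python
def differences(values):
    """Differences between adjacent values, e.g. [1, 2, 4] -> [1, 2]."""
    return [b - a for a, b in zip(values, values[1:])]

def repeat_unit(seq):
    """Return (count, unit) where unit repeated count times gives seq.

    The shortest such unit is returned; a sequence with no repetition
    comes back as (1, seq).
    """
    n = len(seq)
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return n // size, seq[:size]
    return 0, []  # only reached for an empty sequence

print(repeat_unit([1, 3, 4, 1, 3, 4, 1, 3, 4, 1, 3, 4]))  # (4, [1, 3, 4])

steps = differences([1, 2, 4, 5, 7, 8, 10])
print(steps)               # [1, 2, 1, 2, 1, 2]
print(repeat_unit(steps))  # (3, [1, 2])
```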

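Text and Numeric Compression (cont)
Run Length Encoding (RLE)
RLE is only good for long runs!
As the previous examples show, only long runs of identical data are worth compressing.
If we don't have long runs of data, no compression might be achieved. In some cases, data expansion might happen!
Consider this: "SAMSUNGNOKIASONYACER" = 20 bytes
No character is immediately repeated, so there is no run to encode. Coding every character as a count and a symbol (1S 1A 1M ...) would take 40 bytes.
Counting how often each letter occurs (3S 3A M U 3N G 2O K I Y C E R) takes 17 bytes, but it is not run-length encoding: the order of the characters is lost, so the text cannot be recovered.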
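
A quick check in Python shows why this string is a bad case for RLE. The sketch assumes the plain scheme in which every run, even of length 1, is stored as one count byte plus one symbol byte.

```python
from itertools import groupby

text = "SAMSUNGNOKIASONYACER"

# Split the text into runs of identical adjacent characters.
runs = [(symbol, len(list(group))) for symbol, group in groupby(text)]

longest_run = max(length for _, length in runs)
pair_coded_size = 2 * len(runs)  # one count byte + one symbol byte per run

print(len(text))        # 20
print(longest_run)      # 1
print(pair_coded_size)  # 40
```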

Text and Numeric Compression (cont)
2) Huffman Codes
A form of statistical encoding that exploits the overall distribution or frequency of symbols in a source.
Produces an optimal coding for a passage-based source by assigning the fewest bits to encode each symbol, given the probability of its occurrence.
E.g. if a passage contains a lot of the character "e", it makes sense to replace it with the shortest possible sequence of bits. Less frequent characters are given longer codes; refer to the Huffman tree.

Text and Numeric Compression (cont)
Huffman Codes
This technique is based on the probability distribution of symbols or characters.
Characters that occur most often are assigned the shortest codes.
The code length increases as the frequency of occurrence decreases.
Huffman codes are determined by successively constructing a binary tree.
The leaves of the tree represent the characters to be coded.

Text and Numeric Compression (cont)
Huffman Codes
Characters are arranged in descending order of probability.
The tree is built by repeatedly adding the two lowest probabilities and re-sorting.
This process goes on until the last two probabilities sum to 1.
Once this process is complete, a Huffman binary tree can be generated.

Text and Numeric Compression (cont)
Huffman Codes
The code words are formed by assigning 0s and 1s to the branches and then tracing the path from the root node to each end node (leaf).
If the last two probabilities do not sum to 1, there is most likely a mistake in the process.
This probability of 1, which forms the last symbol, is the root of the binary tree.

Text and Numeric Compression (cont)
Huffman Codes (example)
Let's say you have this probability distribution: A = 0.10; B = 0.35; C = 0.16; D = 0.2; E = 0.19
The characters are listed in order of decreasing probability: B = 0.35; D = 0.2; E = 0.19; C = 0.16; A = 0.10
The two characters with the lowest probability are combined: A = 0.10 and C = 0.16 -> AC = 0.26
Re-sort, and the new list is: B = 0.35; AC = 0.26; D = 0.2; E = 0.19
Then repeat the combining step: D = 0.2 and E = 0.19 -> DE = 0.39
Re-sort the list again and we get: DE = 0.39; B = 0.35; AC = 0.26

Text and Numeric Compression (cont)
Huffman Codes (example - continued)
Again, take the two lowest probabilities and repeat the process: B = 0.35 and AC = 0.26 -> BAC = 0.61
Re-sort, and you get the new list: BAC = 0.61; DE = 0.39
Finally, BAC and DE are combined, and you get BACDE = 1.0
From all the combinations of probabilities that you have made, a binary tree is constructed.
Each edge from a node to a sub-node is assigned either a 1 or a 0.

Text and Numeric Compression (cont)
Huffman Codes (resulting binary tree)
P(BACDE) = 1.0
  0: P(BAC) = 0.61
    0: P(B) = 0.35
    1: P(AC) = 0.26
      0: P(C) = 0.16
      1: P(A) = 0.10
  1: P(DE) = 0.39
    0: P(D) = 0.2
    1: P(E) = 0.19
Huffman code for each character:
Character  Probability  Code word
A          0.10         011
B          0.35         00
C          0.16         010
D          0.20         10
E          0.19         11
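
The merging procedure from the example can be sketched with Python's heapq module. This is a minimal illustration, not a production encoder: the function name huffman_codes is an assumption, and the choice of which branch gets 0 and which gets 1 is arbitrary, so the bit patterns may differ from the table even though the code lengths agree.

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Build a prefix code by repeatedly merging the two least likely entries."""
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # lowest probability
        p2, _, codes2 = heapq.heappop(heap)  # second lowest
        # Prefix one subtree with 0 and the other with 1.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probabilities = {"A": 0.10, "B": 0.35, "C": 0.16, "D": 0.20, "E": 0.19}
codes = huffman_codes(probabilities)
for symbol in sorted(codes):
    print(symbol, codes[symbol])

average_bits = sum(probabilities[s] * len(codes[s]) for s in codes)
print(round(average_bits, 2))  # 2.26 bits per character
```

A, C get 3-bit codes and B, D, E get 2-bit codes, as in the table, for an average of 2.26 bits per character compared with 3 bits for a fixed-length code over five symbols.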

Text and Numeric Compression (cont)
3) LZW compression (Lempel-Ziv-Welch)
Based on recognizing common string patterns.
Basic strategy: replace strings in a file with bit codes rather than replacing individual characters with bit codes.
Often achieves a greater compression rate than both previous methods when strings recur in the data.
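
The string-replacement strategy can be illustrated with a textbook-style LZW sketch in Python. It assumes the input uses only characters with code points below 256, starts the dictionary with those 256 single characters, and returns a list of integer codes rather than packed bits.

```python
def lzw_compress(text):
    """Replace ever longer strings by dictionary codes (LZW)."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for char in text:
        candidate = current + char
        if candidate in dictionary:
            current = candidate              # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            current = char
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text, re-creating the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the string that is being defined right now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(output)

print(lzw_compress("ABABAB"))  # [65, 66, 256, 256]
text = "TOBEORNOTTOBEORTOBEORNOT"
print(lzw_decompress(lzw_compress(text)) == text)  # True
```

The decoder never receives the dictionary; it rebuilds it from the codes, which is why no table has to be stored with the compressed data.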

Image Compression
Image compression involves reducing the size of image data files while retaining the necessary information.

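Image Compression (cont'd)
Besides the compression ratio, another way to state the compression is in bits per pixel.
For an NxN image: bits per pixel = (number of bits) / (number of pixels) = (8 x number of bytes) / (N x N)
Example: a 256x256 image compressed to 6,554 bytes has (8 x 6,554) / (256 x 256) = 52,432 / 65,536, or about 0.8 bits per pixel.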
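
The bits-per-pixel figure for the 256x256 example can be worked out as follows. The function names are chosen for this sketch, and the compression ratio additionally assumes that the original image used 8 bits (1 byte) per pixel, which the slide does not state.

```python
def bits_per_pixel(compressed_bytes, width, height):
    """Bits of compressed data spent on each pixel."""
    return 8 * compressed_bytes / (width * height)

def compression_ratio(original_bytes, compressed_bytes):
    return original_bytes / compressed_bytes

bpp = bits_per_pixel(6554, 256, 256)
print(round(bpp, 2))  # 0.8

# Assumption: the uncompressed image has 1 byte per pixel.
ratio = compression_ratio(256 * 256, 6554)
print(round(ratio, 1))  # 10.0
```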

Image Compression (cont'd)
The importance of reducing file size:
To reduce the amount of storage needed.
To reduce the bandwidth requirement when sending the file over the network.

Image Compression (cont'd)
The amount of data required for a digital image is enormous.
A single 512x512, 8-bit image requires 2,097,152 bits (256 KB) for storage.
A single 512x512 RGB color image requires three times as much: 6,291,456 bits (768 KB, or 786,432 bytes).
Transmitting the RGB image over a 56.6 kbps modem would take about 111 seconds (roughly 1.85 minutes).
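
The storage and transmission figures can be rechecked with a few lines of Python. The sketch assumes 24 bits per pixel for the RGB image (8 bits per channel) and a modem rate of 56,600 bits per second with no protocol overhead.

```python
def image_bits(width, height, bits_per_pixel):
    """Uncompressed size of an image in bits."""
    return width * height * bits_per_pixel

def transmit_seconds(bits, bits_per_second):
    return bits / bits_per_second

gray_bits = image_bits(512, 512, 8)
rgb_bits = image_bits(512, 512, 24)

print(gray_bits, gray_bits // 8 // 1024)  # 2097152 bits, 256 KB
print(rgb_bits // 8, rgb_bits // 8 // 1024)  # 786432 bytes, 768 KB

seconds = transmit_seconds(rgb_bits, 56_600)
print(round(seconds), round(seconds / 60, 2))  # 111 seconds, 1.85 minutes
```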

Huffman Coding (Image example)
The Huffman algorithm can be described in five steps:
1. Find the gray-level probabilities for the image by finding the histogram.
2. Order the input probabilities (histogram magnitudes) from smallest to largest.
3. Combine the smallest two by addition.
4. Repeat from step 2 until only two probabilities are left.
5. Working backward along the tree, generate the code by alternating assignment of 0 and 1.

Huffman Coding (Image example)
We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns.
The histogram of the image is given below:
[Figure: histogram, number of pixels against gray level]

Huffman Coding (Image example)
Step 1: Find the gray-level probabilities.
g0 = 20/100 = 0.2
g1 = 30/100 = 0.3
g2 = 10/100 = 0.1
g3 = 40/100 = 0.4
Step 2: Order the probabilities (listed here with the largest at the top).
g3 -> 0.4
g1 -> 0.3
g0 -> 0.2
g2 -> 0.1

Huffman Coding (Image example)
Step 3: Combine the smallest two by addition.
0.4, 0.3, 0.2, 0.1 -> 0.4, 0.3, 0.3 (0.2 + 0.1 = 0.3)
Step 4: Reorder and add until two values remain.
0.4, 0.3, 0.3 -> 0.6, 0.4 (0.3 + 0.3 = 0.6)

Huffman Coding (Image example)
Step 5: Generate the code. The final code is given in the following table (all codes are binary).
Original gray level (natural code)  Probability  Huffman code
g0: 00                              0.2          010
g1: 01                              0.3          00
g2: 10                              0.1          011
g3: 11                              0.4          1

Huffman Coding (Image example)
Note that the gray level with the highest probability is assigned the fewest bits.
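
How much the Huffman code saves for this image can be estimated by weighting each code length by its probability. The sketch below also computes the entropy of the histogram as a reference point; the entropy comparison is an addition for illustration and is not part of the original slides.

```python
from math import log2

probabilities = {"g0": 0.2, "g1": 0.3, "g2": 0.1, "g3": 0.4}
huffman_code = {"g0": "010", "g1": "00", "g2": "011", "g3": "1"}

# Expected number of bits per pixel with the Huffman code.
average_bits = sum(probabilities[g] * len(huffman_code[g]) for g in probabilities)

# Entropy: the lower bound for any symbol-by-symbol lossless code.
entropy = -sum(p * log2(p) for p in probabilities.values())

print(round(average_bits, 2))  # 1.9 bits/pixel instead of 2 for the natural code
print(round(entropy, 3))       # 1.846
```

For the 100-pixel image this comes to about 190 bits instead of 200 with the natural 2-bit code.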

Run-Length Coding (Image example)
RLC is an image compression method that works by counting the number of adjacent pixels with the same gray-level value.
This count, called the run length, is then coded and stored.
There are many variations of RLC:
Basic methods: used for binary images.
Extended versions: for gray-scale images.

Run-Length Coding (Image example)
There can be two types of RLC:
Horizontal RLC: count along rows.
Vertical RLC: count along columns.
The number of bits used for the coding depends on the number of pixels in a row: if the row has 2^n pixels, then the required number of bits is n.
A 256x256 image requires 8 bits, since 2^8 = 256.

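Run-Length Coding (Image example)
The next step is to define a convention for the first RLC number in a row: does it represent a run of 0's or of 1's?
Consider the following 8x8 binary image:
0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 1 1 1 0 0
0 1 1 1 0 0 1 0
0 0 1 0 0 1 1 0
1 1 1 1 0 1 0 0
0 0 0 0 0 0 0 0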

Run-Length Coding (Image example)
Apply RLC to this image, using horizontal RLC, where the first RLC number in each row represents a run of 0's.
The RLC numbers are:
First row: 8
Second row: 0,4,4
Third row: 1,2,5
Fourth row: 1,5,2
Fifth row: 1,3,2,1,1
Sixth row: 2,1,2,2,1
Seventh row: 0,4,1,1,2
Eighth row: 8
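
Horizontal RLC with the "first number is a run of 0's" convention can be sketched as a pair of small Python functions. The function names are assumptions for this illustration; decoding the eight lists of RLC numbers above gives back eight rows of 8 pixels each.

```python
def rlc_encode_row(row):
    """Run lengths of a binary row; the first number counts leading 0's."""
    runs = []
    current, length = 0, 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)  # is 0 when the row starts with a 1
            current, length = pixel, 1
    runs.append(length)
    return runs

def rlc_decode_row(runs):
    """Expand run lengths back into pixels, starting with 0's."""
    row, value = [], 0
    for length in runs:
        row.extend([value] * length)
        value = 1 - value  # runs alternate between 0's and 1's
    return row

rlc_numbers = [
    [8], [0, 4, 4], [1, 2, 5], [1, 5, 2],
    [1, 3, 2, 1, 1], [2, 1, 2, 2, 1], [0, 4, 1, 1, 2], [8],
]
image = [rlc_decode_row(runs) for runs in rlc_numbers]
for row in image:
    print(" ".join(str(pixel) for pixel in row))
```

Because the convention fixes what the first number means, a row that starts with a 1 simply begins with a run length of 0.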

Image Compression
Popular formats for compressing digital images:
1) GIF (Graphics Interchange Format) compression
Uses the LZW codec for lossless compression (8-bit images).
Looks for repeated horizontal patterns along each scan line.
Can handle multiple images.
2) TIFF (Tagged Image File Format) compression
Based on the LZW method.
Widely used by a variety of applications and hardware platforms.

Image Compression (cont)
3) PNG (Portable Network Graphics) compression
Designed as a replacement for GIF, using a lossless method for transmitting single bitmap images over computer networks.
Cannot handle multiple images, but improves compression rates and can handle 24-bit true color.
Looks for repeated horizontal and vertical patterns along each scan line.

Image Compression (cont)
4) JPEG (Joint Photographic Experts Group) compression
A general-purpose standard for still images (continuous-tone graphics).
The user can choose the compression rate, but image quality is sacrificed in proportion to it: a greater compression rate means poorer image quality.
Advantage: its wide acceptance and support in a variety of applications.

Audio Compression
The choice of sampling rates (frequency and amplitude) is very important in controlling the size of an audio file.
Higher sampling rates mean higher fidelity, but cost more in storage space and transmission time.
A widely used method is ADPCM (Adaptive Differential Pulse Code Modulation).

Video Compression
Transmitting standard full-screen color imagery as video at 30 fps requires a data rate of nearly 28 MB per second, so video compression is absolutely essential.
One idea is to reduce the frame rate (from 30 fps to 15 fps), but this sacrifices a lot of the motion in the video.
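
The figure of nearly 28 MB per second is consistent with one common reading of "standard full screen color": 640x480 pixels at 24 bits (3 bytes) per pixel. That frame size and color depth are assumptions made for this sketch; the slide does not state them.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps

# Assumed frame format: 640x480 pixels, 3 bytes per pixel.
rate_30 = raw_video_rate(640, 480, 3, 30)
rate_15 = raw_video_rate(640, 480, 3, 15)

print(rate_30, round(rate_30 / 1_000_000, 1))  # 27648000 bytes/s, 27.6 MB/s
print(rate_15, round(rate_15 / 1_000_000, 1))  # 13824000 bytes/s, 13.8 MB/s
```

Halving the frame rate only halves the data rate, which is still far too high for most links, so it cannot replace compression.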

Video Compression (cont)
Intraframe (spatial) compression:
Reduces the redundant information contained within a single image or frame.
On its own it is not sufficient to achieve the data rates needed for transmitting video in practical applications.

Video Compression (cont)
Interframe (temporal) compression
The idea is that much of the data in video images is repeated frame after frame.
This technique eliminates the redundancy of information between frames.
It must identify the key frame (master frame).
Key frame: the basis for deciding how much motion or how many changes take place in succeeding frames.

Video Compression (cont)
Interframe (temporal) compression
Example: the background (sky, road, and grass) stays the same and only the car is moving.
The first frame is stored as the key frame, and it holds enough information to be reconstructed independently.
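
The key-frame idea can be shown with a toy Python sketch: the first frame is stored in full, and each later frame is stored only as the list of pixels that changed. This is a simplified illustration under stated assumptions (frames are flat lists of pixel values, all the same length), not how MPEG or any real codec stores frames.

```python
def encode_frames(frames):
    """Store the first frame whole and every later frame as (index, value) changes."""
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        changes = [
            (i, new) for i, (old, new) in enumerate(zip(previous, frame)) if old != new
        ]
        deltas.append(changes)
        previous = frame
    return key_frame, deltas

def decode_frames(key_frame, deltas):
    """Rebuild every frame by applying the changes to the previous frame."""
    frames = [list(key_frame)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames

# A "car" (value 9) moving right across an unchanging background (value 0).
frames = [
    [0, 9, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0, 0, 0, 0],
]
key_frame, deltas = encode_frames(frames)
print(deltas)  # [[(1, 0), (2, 9)], [(2, 0), (3, 9)]]
print(decode_frames(key_frame, deltas) == frames)  # True
```

Only two pixel changes per frame need to be stored here, because the background never changes.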

Video Compression (cont)
MPEG (Moving Picture Experts Group) compression
Uses a prediction approach with three kinds of pictures: intra pictures (I pictures), predicted pictures (P pictures) and bi-directional pictures (B pictures).
Some compressed frames are the differences from predictions based on past frames used as a reference; others are based on both past and future frames from the sequence.

Video Compression (cont) Spatial vs temporal compression

Summary
Compressing data means reducing the effective size of a data file for storage or transmission.
Particular paired compression/decompression methods are called codecs.
Codecs that cannot reproduce the original file exactly are called lossy methods; those that reproduce the original exactly are called lossless methods.
Text and numbers usually call for lossless methods.
Image, video and sound codecs are usually lossy.