Multimedia Data Compression

Data Compression
Definition: data compression is the process of encoding information using fewer bits, so that it takes less storage space or less bandwidth during transmission. There are two types of compression: lossy data compression and lossless data compression.

Types of Compression
Lossless data compression: the original content of the data is not lost or changed when it is compressed (encoded). Examples: RLE (run-length encoding), dictionary-based coding, arithmetic coding.

Types of Compression (cont.)
Lossy data compression: the original content of the data is lost to a certain degree when compressed; the parts of the data that matter least are discarded. The loss factor determines whether there is a loss of quality between the original image and the image after it has been compressed and played back (decompressed). The more compression, the more likely that quality will be affected. Even if the quality difference is not noticeable, these are still considered lossy compression methods. Examples: JPEG (Joint Photographic Experts Group), MPEG (Moving Picture Experts Group).

Why Compression?
With more colors, higher resolution, and faster frame rates you produce better-quality video, but you need more computing power and more storage space. A simple calculation (see below) shows that 24-bit color video at 640 × 480 resolution and 30 frames per second requires an astonishing 26 megabytes of data per second! Not only does this surpass the capabilities of many home computer systems, it also overburdens existing storage systems.

Why Compression? (cont.)
640 (horizontal) × 480 (vertical) = 307,200 pixels per frame
307,200 pixels × 3 bytes per pixel = 921,600 bytes per frame
921,600 bytes × 30 frames per second = 27,648,000 bytes per second
27,648,000 ÷ 1,048,576 (bytes per megabyte) ≈ 26.36 megabytes per second!
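The same arithmetic can be scripted as a quick sanity check; the figures are the ones from the slide, and the variable names are ours (a minimal sketch):

    # Uncompressed data rate of 24-bit, 640x480, 30 fps video.
    width, height = 640, 480        # pixels per frame
    bytes_per_pixel = 3             # 24-bit color
    fps = 30                        # frames per second

    bytes_per_frame = width * height * bytes_per_pixel    # 921,600
    bytes_per_second = bytes_per_frame * fps              # 27,648,000
    print(round(bytes_per_second / 1_048_576, 2))         # 26.37 (slide truncates to 26.36)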

Classification of Compression Algorithms
Entropy coding (statistical): lossless; independent of the data's characteristics; e.g., RLE, Huffman coding, LZW, arithmetic coding.
Source coding: lossy; may consider the semantics of the data; depends on the characteristics of the data; e.g., DCT, DPCM, ADPCM, color model transforms.
Hybrid coding (used by most multimedia systems): combines entropy coding with source coding; e.g., JPEG-2000, H.264, MPEG-2, MPEG-4.

Data Compression
A branch of information theory whose goal is to minimize the amount of information to be transmitted: transform a sequence of characters into a new string of bits with the same information content and a length as short as possible.

Concepts
Coding (the code) maps source messages from an alphabet (A) into code words (B). A source message (symbol) is the basic unit into which a string is partitioned; it can be a single letter or a string of letters. EXAMPLE: "aa bbb cccc ddddd eeeeee fffffffgggggggg", with A = {a, b, c, d, e, f, g, space} and B = {0, 1}.

Taxonomy of Codes
Block-block: source messages and code words of fixed length; e.g., ASCII.
Block-variable: source messages fixed, code words variable; e.g., Huffman coding.
Variable-block: source messages variable, code words fixed; e.g., RLE.
Variable-variable: source messages variable, code words variable; e.g., arithmetic coding.

Example of Block-Block Coding
"aa bbb cccc ddddd eeeeee fffffffgggggggg" requires 120 bits: 40 symbols (35 letters plus 5 spaces) at 3 bits each.
Code table: a = 000, b = 001, c = 010, d = 011, e = 100, f = 101, g = 110, space = 111.

Example of Variable-Variable Coding
"aa bbb cccc ddddd eeeeee fffffffgggggggg" requires only 30 bits: 15 bits for the seven symbols plus 5 × 3 bits for the spaces (don't forget the spaces).
Code table: aa = 0, bbb = 1, cccc = 10, ddddd = 11, eeeeee = 100, fffffff = 101, gggggggg = 110, space = 111.
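Both bit counts are easy to verify with a short script; the code tables are the ones above, while the tokenizing regex is our own minimal sketch:

    import re

    message = "aa bbb cccc ddddd eeeeee fffffffgggggggg"

    # Block-block: every one of the 40 symbols costs 3 bits.
    print(3 * len(message))                              # 120

    # Variable-variable: code word lengths from the table above.
    bits = {"aa": 1, "bbb": 1, "cccc": 2, "ddddd": 2,
            "eeeeee": 3, "fffffff": 3, "gggggggg": 3, " ": 3}
    tokens = re.findall(r" |a+|b+|c+|d+|e+|f+|g+", message)
    print(sum(bits[t] for t in tokens))                  # 30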

Concepts (cont.)
A code is distinct if each code word can be distinguished from every other (the mapping is one-to-one). It is uniquely decodable if every code word is identifiable when immersed in a sequence of code words. E.g., with the previous table the message 11 could be decoded as either ddddd (code word 11) or bbbbbb (code word 1 twice), so that code is not uniquely decodable.

Static Codes
The mapping is fixed before transmission; a message is represented by the same code word every time it appears in the ensemble. Huffman coding is an example. Static codes are better for independent sequences, but the probabilities of symbol occurrences must be known in advance.

Dynamic Codes
The mapping changes over time; also referred to as adaptive coding. Dynamic codes attempt to exploit locality of reference (periodic, frequent occurrences of messages); dynamic Huffman coding is an example. Hybrids are possible: build a set of codes and select one based on the input.

Traditional Evaluation Criteria
Algorithm complexity: running time. Amount of compression: redundancy, compression ratio. How do we measure these?

Measure of Information
Consider symbols s_i and the probability of occurrence of each symbol, p(s_i). In the case of fixed-length coding, the smallest number of bits per symbol needed is L ≥ ⌈log2(N)⌉, where N is the number of symbols. Example: a message with 5 symbols needs 3 bits, since ⌈log2 5⌉ = 3.
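This bound is one line of Python (a sketch; the variable names are ours):

    import math

    N = 5                          # number of distinct symbols
    L = math.ceil(math.log2(N))    # smallest fixed code word length
    print(L)                       # 3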

Variable-Length Coding: Entropy
What is the minimum number of bits per symbol? Answer: Shannon's result. The theoretical minimum average number of bits per code word is known as the entropy
H = -Σ_i p(s_i) · log2 p(s_i).

Entropy Example
Alphabet = {A, B}, with p(A) = 0.4 and p(B) = 0.6. Compute the entropy:
H = -0.4 · log2(0.4) - 0.6 · log2(0.6) ≈ 0.97 bits.
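The same computation in Python (a minimal sketch; the function name is ours):

    import math

    def entropy(probs):
        # Shannon entropy: H = -sum over i of p_i * log2(p_i)
        return -sum(p * math.log2(p) for p in probs)

    print(round(entropy([0.4, 0.6]), 2))   # 0.97 bits per symbol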

Compression Ratio
Compare the average message length and the average code word length, e.g., average L(message) / average L(code word). Example: for the messages {aa, bbb, cccc, ddddd, eeeeee, fffffff, gggggggg} the average message length is 5 symbols. If we use the code words from the variable-variable example, {0, 1, 10, 11, 100, 101, 110}, the average code word length is 15/7 ≈ 2.14 bits. Compression ratio: 5 / (15/7) ≈ 2.33.
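Scripted, using the messages and code words above:

    messages = ["aa", "bbb", "cccc", "ddddd", "eeeeee", "fffffff", "gggggggg"]
    codewords = ["0", "1", "10", "11", "100", "101", "110"]

    avg_msg = sum(len(m) for m in messages) / len(messages)     # 5.0 symbols
    avg_cw = sum(len(c) for c in codewords) / len(codewords)    # ~2.14 bits
    print(avg_msg / avg_cw)                                     # ~2.33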

Symmetry
Symmetric compression: requires the same time for encoding and decoding; used for live-mode applications (e.g., teleconferencing).
Asymmetric compression: compression is performed once, when enough time is available; decompression is performed frequently and must be fast; used for retrieval-mode applications (e.g., an interactive CD-ROM).

Entropy Coding Algorithms (Content-Dependent Coding)
Run-length encoding (RLE) replaces a sequence of identical consecutive bytes with the number of occurrences. The number of occurrences is indicated by a special flag (e.g., !). Example: abcccccccccdeffffggg (20 bytes) becomes abc!9def!4ggg (13 bytes).
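A minimal sketch of this flag-based RLE, assuming the flag byte never occurs in the input and that runs shorter than four bytes are copied verbatim (which is why ggg stays unencoded in the example):

    from itertools import groupby

    FLAG = "!"

    def rle_encode(data: str) -> str:
        out = []
        for ch, run in groupby(data):
            n = len(list(run))
            if n >= 4:
                out.append(f"{ch}{FLAG}{n}")   # e.g., ccccccccc -> c!9
            else:
                out.append(ch * n)             # short runs are not worth encoding
        return "".join(out)

    print(rle_encode("abcccccccccdeffffggg"))  # abc!9def!4ggg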

Variations of RLE (Zero-Suppression Technique)
Assumes that only one symbol appears often (the blank). Replace a sequence of blanks with an M-byte and a byte giving the number of blanks in the sequence. Example: M3, M4, M14, … Other definitions are possible, for example: M4 = 8 blanks, M5 = 16 blanks, M4M5 = 24 blanks.
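A sketch of the simple count-byte variant; representing the M-byte as the literal character 'M' and suppressing only runs of three or more blanks are our illustrative choices, not part of any standard:

    import re

    def zero_suppress(text: str) -> str:
        # Replace each run of 3 or more blanks with M<count>.
        return re.sub(r" {3,}", lambda m: f"M{len(m.group())}", text)

    print(zero_suppress("a   b    c"))   # aM3bM4c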

Huffman Encoding
Statistical encoding: depends on the occurrence frequency of single characters or sequences of data bytes. To determine a Huffman code it is useful to construct a binary tree: the leaves are the characters to be encoded, and each node carries the occurrence probability of the characters belonging to its subtree. Characters are stored with their probabilities; the length (number of bits) of the coded characters differs, and the shortest code is assigned to the most frequently occurring character. Example question: what does a Huffman code look like for symbols with occurrence probabilities P(A) = 8/20, P(B) = 3/20, P(C) = 7/20, P(D) = 2/20?

Huffman Encoding (Example)
Step 1: Sort all symbols according to their probabilities, from smallest to largest; these are the leaves of the Huffman tree: P(C) = 0.09, P(E) = 0.11, P(D) = 0.13, P(A) = 0.16, P(B) = 0.51.

Huffman Encoding (Example)
Step 2: Build the binary tree from the leaves up. Policy: always connect the two nodes with the smallest probabilities. E.g., P(CE) and P(DA) were both smaller than P(B), hence C with E and D with A were connected first: P(CE) = 0.09 + 0.11 = 0.20, P(DA) = 0.13 + 0.16 = 0.29, then P(CEDA) = 0.20 + 0.29 = 0.49, and finally P(CEDAB) = 0.49 + 0.51 = 1.

Huffman Encoding (Example)
Step 3: Label the left branch of every node with 0 and the right branch with 1. The resulting tree: the root P(CEDAB) = 1 has children P(CEDA) = 0.49 (branch 0) and P(B) = 0.51 (branch 1); P(CEDA) has children P(CE) = 0.20 (0) and P(DA) = 0.29 (1); P(CE) has children P(C) = 0.09 (0) and P(E) = 0.11 (1); P(DA) has children P(D) = 0.13 (0) and P(A) = 0.16 (1).

Huffman Encoding (Example)
Step 4: Read each symbol's code word off the path from the root to its leaf:
Symbol A = 011, Symbol B = 1, Symbol C = 000, Symbol D = 010, Symbol E = 001.
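The whole procedure in a compact Python sketch using a priority queue; with the tie-breaking used here it happens to reproduce the code from Step 4, though other equally valid Huffman codes exist for the same probabilities:

    import heapq

    def huffman_code(probs):
        # Heap entries: (probability, tie_breaker, {symbol: code_so_far}).
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            p1, _, left = heapq.heappop(heap)    # smallest probability
            p2, _, right = heapq.heappop(heap)   # second smallest
            merged = {s: "0" + c for s, c in left.items()}          # left branch: 0
            merged.update({s: "1" + c for s, c in right.items()})   # right branch: 1
            heapq.heappush(heap, (p1 + p2, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
    print(huffman_code(probs))
    # {'C': '000', 'E': '001', 'D': '010', 'A': '011', 'B': '1'}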

Summary
Compression algorithms are of great importance when processing and transmitting audio, images, and video.