Lossless Compression, Multimedia Systems (Module 2), Lesson 1: Minimum Redundancy Coding based on Information Theory (Shannon-Fano Coding, Huffman Coding)

Slide 1: Lossless Compression, Multimedia Systems (Module 2)
- Lesson 1: Minimum Redundancy Coding based on Information Theory: Shannon-Fano Coding, Huffman Coding
- Lesson 2: Adaptive Coding based on Statistical Modeling: Adaptive Huffman, Arithmetic Coding
- Lesson 3: Dictionary-based Coding: LZW

Slide 2: Lossless Compression, Multimedia Systems (Module 2, Lesson 1)
Summary:
- Compression: with loss and without loss
- Shannon: Information Theory
- Shannon-Fano Coding Algorithm
- Huffman Coding Algorithm
Sources:
- The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly
- Dr. Ze-Nian Li's course material

Slide 3: Compression
Why compression? All media, be it text, audio, graphics, or video, has "redundancy". Compression attempts to eliminate this redundancy.
What is redundancy?
- If one representation of a media content M takes X bytes and another takes Y bytes (Y < X), then X is a redundant representation relative to Y.
- Other forms of redundancy: if the representation of a medium captures content that is not perceivable by humans, then removing such content will not affect the perceived quality. For example, audio frequencies outside the human hearing range can be dropped without any harm to the audio's quality.
- Is there a representation with an optimal size Z that cannot be improved upon? This question is tackled by information theory.

Slide 4: Compression, Lossless vs. Compression with Loss
- Lossless compression: M -> Compress -> Uncompress -> M (the original is recovered exactly).
- Compression with loss: M -> Compress -> Uncompress -> M', where M' is not equal to M (the original is only approximated).
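To make the lossless case concrete, here is a minimal round-trip sketch (an illustration added here, not from the slides) using Python's standard zlib module; the recovered message equals M exactly.

    import zlib

    M = b"All media, be it text, audio, graphics or video, has redundancy." * 100

    compressed = zlib.compress(M)          # lossless: no information is discarded
    recovered = zlib.decompress(compressed)

    assert recovered == M                  # M -> Compress -> Uncompress -> M, exactly
    print(f"original: {len(M)} bytes, compressed: {len(compressed)} bytes")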

Slide 5: Information Theory
According to Shannon, the entropy of an information source S is defined as:
    H(S) = Σ_i p_i log2(1/p_i)
- log2(1/p_i) indicates the amount of information contained in symbol S_i, i.e., the number of bits needed to code symbol S_i.
- For example, in an image with a uniform distribution of gray-level intensities, p_i = 1/256, so the number of bits needed to code each gray level is 8, and the entropy of the image is 8 bits per pixel.
- Q: What is the entropy of a source with M symbols where each symbol is equally likely? Entropy H(S) = log2(M).
- Q: How about an image in which half of the pixels are white and half are black? Entropy H(S) = 1 bit per pixel.
- There is an excellent primer by Dr. Schnieder on this subject.
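As an added illustration (not part of the slide), a few lines of Python evaluate the entropy formula and reproduce the answers above; the helper name entropy is chosen here for the example.

    import math

    def entropy(probs):
        """H(S) = sum_i p_i * log2(1/p_i), skipping zero-probability symbols."""
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    # Uniform 256 gray levels: p_i = 1/256 -> 8 bits per symbol
    print(entropy([1 / 256] * 256))              # 8.0

    # Half white, half black pixels -> 1 bit per pixel
    print(entropy([0.5, 0.5]))                   # 1.0

    # M equally likely symbols -> log2(M)
    M = 32
    print(entropy([1 / M] * M), math.log2(M))    # 5.0 5.0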

Slide 6: Information Theory, Discussion
- Entropy is a measure of how much information is encoded in a message. The higher the entropy, the higher the information content. We could also say entropy is a measure of uncertainty in a message; information and uncertainty are equivalent concepts.
- The unit of entropy (in coding theory) is bits per symbol. The unit is determined by the base of the logarithm: base 2 gives binary digits (bits); base 10 gives decimal digits.
- Entropy gives the actual number of bits of information contained in a message source. Example: if the probability of the character 'e' appearing in this slide is 1/16, then the information content of this character is log2(16) = 4 bits. So the character string "eeeee" has a total information content of 20 bits (contrast this with an 8-bit ASCII coding, which would use 40 bits to represent "eeeee").
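A tiny added calculation of the self-information in that example, assuming the stated probability of 1/16 for 'e':

    import math

    p_e = 1 / 16                          # assumed probability of 'e' from the slide
    info_per_char = math.log2(1 / p_e)    # 4 bits of information per occurrence
    print(info_per_char * len("eeeee"))   # 20 bits in total, vs. 5 * 8 = 40 bits in ASCII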

Slide 7: Data Compression = Modeling + Coding
Data compression consists of taking a stream of symbols and transforming them into codes.
- The model is a collection of data and rules used to process input symbols and determine their probabilities.
- A coder uses the model (probabilities) to emit codes when it is given input symbols.
Let's take Huffman coding to demonstrate the distinction. The pipeline is: input stream -> symbols -> Model -> probabilities -> Encoder -> codes -> output stream.
- The output of the Huffman encoder is determined by the model (probabilities): the higher the probability, the shorter the code.
- Model A could determine the raw probability of each symbol occurring anywhere in the input stream (p_i = number of occurrences of S_i / total number of symbols).
- Model B could determine probabilities based on the last 10 symbols of the input stream, continuously re-computing the probabilities (see the sketch below).
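A minimal sketch contrasting the two models (an added illustration; the names model_a and model_b are hypothetical): Model A uses global frequencies over the whole stream, while Model B looks only at a sliding window of the last 10 symbols.

    from collections import Counter

    def model_a(stream):
        """Model A: raw probability of each symbol over the whole stream."""
        counts = Counter(stream)
        total = len(stream)
        return {sym: c / total for sym, c in counts.items()}

    def model_b(stream, pos, window=10):
        """Model B: probabilities from the last `window` symbols before position `pos`."""
        recent = stream[max(0, pos - window):pos]
        counts = Counter(recent)
        total = len(recent) or 1
        return {sym: c / total for sym, c in counts.items()}

    text = "abracadabra abracadabra"
    print(model_a(text))             # static probabilities over the full stream
    print(model_b(text, len(text)))  # adaptive view of only the last 10 symbols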

Slide 8: The Shannon-Fano Encoding Algorithm
1. Calculate the frequencies of the symbols (organize them as a list).
2. Sort the list in decreasing order of frequency.
3. Divide the list into two halves, with the total frequency counts of the two halves being as close as possible to each other.
4. Assign the upper half the code bit 0 and the lower half the code bit 1.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a corresponding code leaf on the tree.
Example (the running example from the textbook: symbols A, B, C, D, E with counts 15, 7, 6, 6, 5; 39 symbols in total):
Symbol | Count | Info. -log2(p_i) | Code | Subtotal (# of bits)
A      | 15    | 1.38             | 00   | 30
B      |  7    | 2.48             | 01   | 14
C      |  6    | 2.70             | 10   | 12
D      |  6    | 2.70             | 110  | 18
E      |  5    | 2.96             | 111  | 15
It takes a total of 89 bits to encode roughly 85.25 bits of information (pretty good, huh!).
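For illustration (this code is an addition, not the slides' own), a compact Python version of the five steps; run on the example counts it reproduces 2-bit codes for A, B, C and 3-bit codes for D, E, 89 bits in total.

    def shannon_fano(freqs):
        """freqs: dict of symbol -> count. Returns dict of symbol -> code string."""
        # Steps 1-2: organize as a list and sort by decreasing frequency.
        items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        codes = {sym: "" for sym, _ in items}

        def split(group):
            if len(group) <= 1:
                return
            total = sum(c for _, c in group)
            best_diff, cut = float("inf"), 1
            # Step 3: find the split point where the two halves' counts are closest.
            for i in range(1, len(group)):
                upper = sum(c for _, c in group[:i])
                diff = abs(total - 2 * upper)
                if diff < best_diff:
                    best_diff, cut = diff, i
            # Step 4: the upper half gets a 0 bit, the lower half gets a 1 bit.
            for sym, _ in group[:cut]:
                codes[sym] += "0"
            for sym, _ in group[cut:]:
                codes[sym] += "1"
            # Step 5: recurse on each half until every symbol is a leaf.
            split(group[:cut])
            split(group[cut:])

        split(items)
        return codes

    counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
    codes = shannon_fano(counts)
    print(codes)                                           # {'A': '00', 'B': '01', 'C': '10', ...}
    print(sum(counts[s] * len(codes[s]) for s in counts))  # 89 bits in total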

Slide 9: The Huffman Algorithm
1. Initialization: put all nodes in an OPEN list L and keep it sorted at all times (e.g., A B C D E).
2. Repeat the following steps until the list L has only one node left:
   a. From L, pick the two nodes having the lowest frequencies and create a parent node for them.
   b. Assign the sum of the children's frequencies to the parent node and insert it into L.
   c. Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.
Example (same symbols and counts as before; the code column shows one valid assignment):
Symbol | Count | Info. -log2(p_i) | Code | Subtotal (# of bits)
A      | 15    | 1.38             | 0    | 15
B      |  7    | 2.48             | 100  | 21
C      |  6    | 2.70             | 101  | 18
D      |  6    | 2.70             | 110  | 18
E      |  5    | 2.96             | 111  | 15
This encoding takes a total of 87 bits.
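An added sketch of the algorithm, using Python's heapq as the sorted OPEN list L; on the example counts it gives A a 1-bit code and every other symbol a 3-bit code, 87 bits in total (the exact 0/1 labels per branch may differ from the slide's tree).

    import heapq
    from itertools import count

    def huffman(freqs):
        """freqs: dict of symbol -> count. Returns dict of symbol -> code string."""
        tiebreak = count()  # keeps heap comparisons well-defined for equal frequencies
        # OPEN list L: leaf nodes, kept ordered by frequency (min-heap).
        heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            # Pick the two nodes with the lowest frequencies...
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            # ...create a parent whose frequency is their sum, and re-insert it into L.
            heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
        # Walk the final tree, assigning 0 to one branch and 1 to the other.
        codes = {}
        def assign(node, prefix):
            if isinstance(node, tuple):
                assign(node[0], prefix + "0")
                assign(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        assign(heap[0][2], "")
        return codes

    counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
    codes = huffman(counts)
    print(codes)
    print(sum(counts[s] * len(codes[s]) for s in counts))  # 87 bits in total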

Slide 10: Huffman Algorithm, Discussion
- Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. There is an overhead for sending this, but it is negligible if the data file is big.
- Unique prefix property: no code is a prefix of any other code (all symbols sit at the leaf nodes), which gives the decoder unambiguous, unique decipherability.
- If prior statistics are available and accurate, then Huffman coding is very good.
- Number of bits per symbol needed for Huffman coding: 87 / 39 = 2.23.
- Number of bits per symbol needed for Shannon-Fano coding: 89 / 39 = 2.28.
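A small added example of why the unique prefix property makes decoding unambiguous: the decoder scans the bit string left to right and emits a symbol whenever the accumulated bits match a code. The code table used here is one valid Huffman assignment for the running example.

    def encode(text, codes):
        return "".join(codes[ch] for ch in text)

    def decode(bits, codes):
        # Invert the table; because no code is a prefix of another,
        # a greedy left-to-right scan is unambiguous.
        inverse = {code: sym for sym, code in codes.items()}
        out, buf = [], ""
        for bit in bits:
            buf += bit
            if buf in inverse:
                out.append(inverse[buf])
                buf = ""
        return "".join(out)

    codes = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}  # example assignment
    message = "ABACADAEAA"
    bits = encode(message, codes)
    assert decode(bits, codes) == message
    print(bits, "->", decode(bits, codes))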