Lossless Compression - Multimedia Systems (Module 2, Lesson 2)

Presentation transcript:

1 Lossless Compression - Multimedia Systems (Module 2, Lesson 2)
Summary:
- Adaptive Coding
- Adaptive Huffman Coding: Sibling Property, Update Algorithm
- Arithmetic Coding: Coding and Decoding; Issues: EOF problem, Zero-frequency problem
Sources:
- The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly
- Introduction to Data Compression, by Khalid Sayood
- The Squeeze Page at SFU

2 Adaptive Coding
Motivations:
- The previous algorithms (both Shannon-Fano and Huffman) require statistical knowledge of the source, which is often not available (e.g., live audio or video).
- Even when it is available, it can be a heavy overhead to obtain and transmit.
- Higher-order models incur even more overhead. For example, a 255-entry probability table would be required for an order-0 model, and an order-1 model would require 255 such probability tables. (An order-1 model considers the probabilities of pairs of symbols.)
The solution is to use adaptive algorithms. Adaptive Huffman Coding is one such mechanism that we will study. The idea of adaptiveness is, however, also applicable to other compression algorithms.

3 Adaptive Coding
ENCODER
    Initialize_model();
    do {
        c = getc( input );
        encode( c, output );
        update_model( c );
    } while ( c != eof );

DECODER
    Initialize_model();
    while ( (c = decode( input )) != eof ) {
        putc( c, output );
        update_model( c );
    }

The key is that both encoder and decoder use exactly the same initialize_model and update_model routines.

4 The Sibling Property
The node numbers are assigned in such a way that:
1. A node with a higher weight has a higher node number.
2. A parent node always has a higher node number than its children.
In a nutshell, the sibling property requires that the nodes (internal and leaf) be arranged in order of increasing weight. The update procedure swaps nodes that are in violation of the sibling property.
- Nodes in violation of the sibling property are identified by using the notion of a block.
- All nodes that have the same weight are said to belong to one block. (A small check of this property is sketched below.)
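As a small illustration (not from the slides), the following sketch checks both conditions over a tree stored in an array indexed by node number. The node layout (weight plus parent index) and the parent assignment in main are assumptions, chosen to be consistent with the weights of the slide-6 example tree.

    #include <stdio.h>

    /* Assumed node layout: each node has a weight and the node number of its
       parent (-1 for the root). Node numbers follow the lesson's convention:
       0 is the NYT node, the root has the highest number. */
    struct node {
        long weight;
        int  parent;    /* -1 for the root */
    };

    /* Returns 1 if the sibling property holds, 0 otherwise. */
    static int sibling_property_holds(const struct node *t, int n)
    {
        for (int i = 0; i + 1 < n; i++)
            if (t[i].weight > t[i + 1].weight)
                return 0;               /* weights must not decrease with node number */
        for (int i = 0; i < n; i++)
            if (t[i].parent != -1 && t[i].parent <= i)
                return 0;               /* a parent must have a higher node number */
        return 1;
    }

    int main(void)
    {
        /* Weights taken from the slide-6 example; the parent links are one
           assignment consistent with those weights (assumed for illustration). */
        struct node t[9] = {
            { 0,  4},   /* #0 NYT      */
            { 2,  4},   /* #1 B        */
            { 2,  5},   /* #2 C        */
            { 2,  5},   /* #3 D        */
            { 2,  6},   /* #4 internal */
            { 4,  6},   /* #5 internal */
            { 6,  8},   /* #6 internal */
            {10,  8},   /* #7 E        */
            {16, -1},   /* #8 root     */
        };
        printf("sibling property holds: %d\n", sibling_property_holds(t, 9));
        return 0;
    }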

5 Flowchart of the update procedure
The flowchart on this slide can be read as the following steps:
1. START. Is this the first appearance of the symbol? If yes: the NYT node gives birth to a new NYT node and a new external node for the symbol; increment the weight of the external node and of the old NYT node and adjust the node numbers; go to the old NYT node and continue at step 4. If no: go to the symbol's external node.
2. Is the node's number the maximum in its block? If not, switch the node with the highest-numbered node in the block.
3. Increment the node's weight.
4. Is this the root node? If yes, STOP. If no, go to the parent node and return to step 2.
- The Huffman tree is initialized with a single node, known as the Not-Yet-Transmitted (NYT) or escape code. This code is sent every time a new character, one not yet in the tree, is encountered, followed by the ASCII encoding of the character. This allows the decompressor to distinguish between a code and a new character. The procedure also creates a new node for the character and a new NYT from the old NYT node.
- The root node will have the highest node number because it has the highest weight.
A code sketch of this procedure follows below.
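The sketch below mirrors the flowchart; it is an illustration, not the slides' implementation. The tree layout (struct htree) and the helpers spawn_from_nyt, node_of_symbol, highest_in_block and swap_nodes are assumptions and are only declared, not implemented, here.

    /* Sketch only: node layout and helpers below are assumptions for illustration. */
    struct htree {
        int  root;      /* node number of the root (highest number)            */
        int  *parent;   /* parent[i] = node number of node i's parent          */
        long *weight;   /* weight[i] = weight of node i                        */
        int  *seen;     /* seen[c] != 0 once character c is already in the tree */
    };

    int  spawn_from_nyt(struct htree *t, int symbol);      /* NYT gives birth; returns old NYT node */
    int  node_of_symbol(const struct htree *t, int symbol);
    int  highest_in_block(const struct htree *t, int node); /* highest-numbered node of equal weight */
    void swap_nodes(struct htree *t, int a, int b);

    void update_model(struct htree *t, int symbol)
    {
        int node;

        if (!t->seen[symbol]) {
            /* First appearance: NYT gives birth to a new NYT node and an external
               node for the symbol; the helper increments the weights of the new
               external node and the old NYT node and adjusts node numbers. */
            int old_nyt = spawn_from_nyt(t, symbol);
            if (old_nyt == t->root)
                return;                        /* very first symbol: old NYT was the root */
            node = t->parent[old_nyt];         /* continue climbing from its parent */
        } else {
            node = node_of_symbol(t, symbol);  /* go to the symbol's external node */
        }

        for (;;) {
            /* If the node's number is not the maximum in its block (nodes of equal
               weight), switch it with the highest-numbered node in the block
               (a node is never swapped with its own parent). */
            int top = highest_in_block(t, node);
            if (top != node && top != t->parent[node])
                swap_nodes(t, node, top);

            t->weight[node]++;                 /* increment the node's weight */

            if (node == t->root)
                return;                        /* reached the root: STOP */
            node = t->parent[node];            /* otherwise go to the parent and repeat */
        }
    }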

6 Example
[Tree diagram] Example Huffman tree after some symbols have been processed, in accordance with the sibling property.
Counts (number of occurrences): B:2, C:2, D:2, E:10.
Nodes, in increasing order of node number and weight: NYT #0, B #1 (W=2), C #2 (W=2), D #3 (W=2), internal #4 (W=2), internal #5 (W=4), internal #6 (W=6), E #7 (W=10), Root #8 (W=16).
[Second diagram] The initial Huffman tree consists of the single NYT node (#0).

7 Example
[Tree diagram] Huffman tree after the first appearance of symbol A.
Counts (number of occurrences): A:1, B:2, C:2, D:2, E:10.
Nodes: NYT #0, A #1 (W=1), internal #2 (W=1), B #3 (W=2), C #4 (W=2), D #5 (W=2), internal #6 (W=2+1), internal #7 (W=4), internal #8 (W=6+1), E #9 (W=10), Root #10 (W=16+1).

8 Increment
[Tree diagram] An increment in the count for A propagates up to the root.
Counts: A:1+1, B:2, C:2, D:2, E:10.
Nodes: NYT #0, A #1 (W=1+1), internal #2 (W=1+1), B #3 (W=2), C #4 (W=2), D #5 (W=2), internal #6 (W=3+1), internal #7 (W=4), internal #8 (W=7+1), E #9 (W=10), Root #10 (W=17+1).

9 Swapping
[Two tree diagrams] Another increment in the count for A results in a swap: swap nodes 1 and 5.
Before the swap (counts: A:2+1, B:2, C:2, D:2, E:10): NYT #0, A #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), D #5 (W=2), internal #6 (W=4), internal #7 (W=4), internal #8 (W=8), E #9 (W=10), Root #10 (W=18).
After the swap and increment (counts: A:3, B:2, C:2, D:2, E:10): NYT #0, D #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), A #5 (W=2+1), internal #6 (W=4), internal #7 (W=4+1), internal #8 (W=8+1), E #9 (W=10), Root #10 (W=18+1).

10 Swapping … contd.
[Tree diagram] Another increment in the count for A propagates up.
Counts: A:3+1, B:2, C:2, D:2, E:10.
Nodes: NYT #0, D #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), A #5 (W=3+1), internal #6 (W=4), internal #7 (W=5+1), internal #8 (W=9+1), E #9 (W=10), Root #10 (W=19+1).

11 Swapping … contd.
[Tree diagram] Another increment in the count for A causes a swap of a sub-tree: swap nodes 5 and 6.
Counts: A:4+1, B:2, C:2, D:2, E:10.
Nodes before the swap: NYT #0, D #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), A #5 (W=4), internal #6 (W=4), internal #7 (W=6), internal #8 (W=10), E #9 (W=10), Root #10 (W=20).

12 Swapping … contd.
[Tree diagram] Further swapping is needed to fix the tree: swap nodes 8 and 9.
Counts: A:4+1, B:2, C:2, D:2, E:10.
Nodes: NYT #0, D #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), internal #5 (W=4), A #6 (W=4+1), internal #7 (W=6), internal #8 (W=10), E #9 (W=10), Root #10 (W=20).

13 Swapping … contd.
[Tree diagram] Tree after the swap, with the increment for A propagated to the root.
Counts: A:5, B:2, C:2, D:2, E:10.
Nodes: NYT #0, D #1 (W=2), internal #2 (W=2), B #3 (W=2), C #4 (W=2), internal #5 (W=4), A #6 (W=5), internal #7 (W=6), E #8 (W=10), internal #9 (W=10+1), Root #10 (W=20+1).

14 Arithmetic Coding
Arithmetic coding is based on the concept of interval subdivision.
- In arithmetic coding, a source ensemble is represented by an interval between 0 and 1 on the real number line.
- Each symbol of the ensemble narrows this interval.
- As the interval becomes smaller, the number of bits needed to specify it grows.
- Arithmetic coding assumes an explicit probabilistic model of the source.
- It uses the probabilities of the source messages to successively narrow the interval used to represent the ensemble. A high-probability message narrows the interval less than a low-probability message, so high-probability messages contribute fewer bits to the coded ensemble.

15 Arithmetic Coding: Description
In the following discussion, we will use:
- M as the size of the alphabet of the data source,
- N[x] as symbol x's probability,
- Q[x] as symbol x's cumulative probability (i.e., Q[i] = N[0] + N[1] + ... + N[i]).
Assuming we know the probabilities of each symbol of the data source, we can allocate to each symbol an interval whose width is proportional to its probability, such that the intervals do not overlap. This can be done by using the cumulative probabilities as the two ends of each interval: the two ends of symbol x's interval are Q[x-1] and Q[x]. Symbol x is said to own the range [Q[x-1], Q[x]). A small snippet computing these quantities for the example alphabet is given below.
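As an illustration (not from the slides), the cumulative table and the per-symbol ranges can be computed as follows. The array names N and Q follow the definitions above, the probabilities are those of the encoder example on slide 18, and Q[-1] is treated as 0.

    #include <stdio.h>

    #define M 4   /* alphabet size, using the example alphabet {A, B, C, D} */

    int main(void)
    {
        const char   sym[M] = { 'A', 'B', 'C', 'D' };
        const double N[M]   = { 0.4, 0.3, 0.2, 0.1 };  /* symbol probabilities */
        double Q[M];                                    /* cumulative probabilities */

        /* Q[i] = N[0] + N[1] + ... + N[i] */
        double acc = 0.0;
        for (int i = 0; i < M; i++) {
            acc += N[i];
            Q[i] = acc;
        }

        /* Symbol x owns the range [Q[x-1], Q[x]); Q[-1] is taken to be 0. */
        for (int x = 0; x < M; x++) {
            double lo = (x == 0) ? 0.0 : Q[x - 1];
            printf("%c owns [%.2f, %.2f)\n", sym[x], lo, Q[x]);
        }
        return 0;
    }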

16 Arithmetic Coding: Encoder
We begin with the interval [0,1) and subdivide it iteratively.
- For each symbol read, the current interval is divided according to the probabilities of the alphabet.
- The sub-interval corresponding to the symbol is picked as the interval to proceed with.
- The procedure continues until all symbols in the message have been processed.
- Since the symbols' intervals do not overlap, each possible message is assigned a unique interval.
- We can represent the message by the interval's two ends [L, H). In fact, any single value inside the interval is enough as the encoded code, and usually the left end L is selected.

17 Arithmetic Coding Algorithm
    L = 0.0;  H = 1.0;
    while ( (x = getc(input)) != EOF ) {
        R = (H - L);
        H = L + R * Q[x];
        L = L + R * Q[x-1];
    }
    Output(L);
R is the interval range, and H and L are the two ends of the current code interval; x is the new symbol to be encoded. L and H are initialized to 0 and 1 respectively. (Note that H must be updated before L, since both updates use the old value of L.)

18 Arithmetic Coding: Encoder example
Symbol x | Probability N[x] | [Q[x-1], Q[x])
A        | 0.4              | [0.0, 0.4)
B        | 0.3              | [0.4, 0.7)
C        | 0.2              | [0.7, 0.9)
D        | 0.1              | [0.9, 1.0)
String: BCAB. The diagram on the slide shows the interval narrowing for B, C, A, B in turn; the code sent is the lower end of the final interval (worked out below).
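The following small program, added here as an illustration rather than taken from the slides, applies the encoder loop of the previous slide to the string BCAB using the table above and prints the interval after each symbol. With these probabilities it narrows the interval to approximately [0.4, 0.7), [0.61, 0.67), [0.61, 0.634) and finally [0.6196, 0.6268), so the code sent (the lower end) is 0.6196.

    #include <stdio.h>
    #include <string.h>

    /* Cumulative probabilities for the example alphabet A, B, C, D:
       A owns [0.0,0.4), B [0.4,0.7), C [0.7,0.9), D [0.9,1.0). */
    static const double Qhi[4] = { 0.4, 0.7, 0.9, 1.0 };   /* Q[x]   */
    static const double Qlo[4] = { 0.0, 0.4, 0.7, 0.9 };   /* Q[x-1] */

    int main(void)
    {
        const char *msg = "BCAB";
        double L = 0.0, H = 1.0;

        for (size_t i = 0; i < strlen(msg); i++) {
            int x = msg[i] - 'A';          /* map 'A'..'D' to 0..3 */
            double R = H - L;              /* current interval range */
            H = L + R * Qhi[x];            /* update H before L (both use the old L) */
            L = L + R * Qlo[x];
            printf("after %c: [%g, %g)\n", msg[i], L, H);
        }
        printf("code sent: %g\n", L);      /* the lower end of the final interval */
        return 0;
    }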

19 Decoding Algorithm
When decoding, the code value v is located within the current code interval to find the symbol x such that Q[x-1] <= v < Q[x]. The procedure iterates until all symbols are decoded.
    v = input_code();
    for (;;) {
        x = find_symbol_straddling_this_range(v);
        putc(x);
        R = Q[x] - Q[x-1];
        v = (v - Q[x-1]) / R;
    }
The slide's table traces the decoding of the example code, with columns v, output character x, Q[x-1], Q[x], and R, producing B, C, A, B in turn (a worked trace is given below).
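As with the encoder, the sketch below is an illustration rather than the slides' code: it decodes the example code 0.6196 for four symbols using the same table, printing B (range [0.4,0.7), R=0.3), C ([0.7,0.9), R=0.2), A ([0.0,0.4), R=0.4) and B again, with v taking approximately the values 0.6196, 0.732, 0.16 and 0.4. Note that it stops only because it knows the message length; without that (or an EOF symbol) the loop would run forever, which is exactly the EOF problem of the next slide.

    #include <stdio.h>

    /* Same example table as the encoder: A [0.0,0.4), B [0.4,0.7), C [0.7,0.9), D [0.9,1.0). */
    static const char   sym[4] = { 'A', 'B', 'C', 'D' };
    static const double Qlo[4] = { 0.0, 0.4, 0.7, 0.9 };
    static const double Qhi[4] = { 0.4, 0.7, 0.9, 1.0 };

    /* The symbol whose range contains v (find_symbol_straddling_this_range). */
    static int find_symbol(double v)
    {
        int x = 0;
        while (x < 3 && v >= Qhi[x]) x++;
        return x;
    }

    int main(void)
    {
        double v = 0.6196;           /* the code produced for "BCAB" */
        /* Stop after 4 symbols because the message length is known here. */
        for (int i = 0; i < 4; i++) {
            int x = find_symbol(v);
            double R = Qhi[x] - Qlo[x];
            printf("v=%g -> %c  (range [%g,%g), R=%g)\n", v, sym[x], Qlo[x], Qhi[x], R);
            v = (v - Qlo[x]) / R;
        }
        return 0;
    }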

20 Arithmetic Coding: Issues
- The zero-frequency problem: Each symbol's predicted probability must not be zero, or the interval will become zero and interval renormalization will fail. This is called the zero-frequency problem. Models that adapt online may encounter this problem when decaying their counts.
- The EOF problem:
  - Assume we pick the lower end of the interval as the encoded code. Two messages may then yield the same code if one message is identical to the other except for a suffix consisting of some finite number of repetitions of the first symbol of the table (first in the table, not in the sequence). For example, BCAB, BCABA, BCABAA and BCABAAA all have the same lower interval end but different upper ends (try it): since A is the first symbol in the table, Q[A-1] = 0, so appending A leaves the lower end unchanged (L = L + R*0 = L).
  - The simplest solution is to let the decoder know the length of the encoded message, either because the message size is fixed or because it is transmitted up front. However, this is not feasible if the data size is not known beforehand, as with live broadcast data, or if obtaining it is too costly, as with tapes whose size is unknown at the beginning.
  - Another solution is to introduce a special EOF symbol into the alphabet. The symbol takes a small interval and is used only at the end of the message. When the decoder detects the EOF symbol, it knows the end of the message has been reached. A sketch of this is given below.
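One way to realize the EOF-symbol solution, sketched here as an illustration (the '$' symbol, its probability, the rescaled table and the message are assumptions, not part of the slides): the EOF symbol is simply one more table entry, and the decoder loop from slide 19 terminates when it is produced. The sketch also transmits the midpoint of the final interval rather than its lower end, purely so the decoded value does not land exactly on an interval boundary.

    #include <stdio.h>
    #include <string.h>

    /* Extend the example alphabet with a hypothetical EOF symbol '$' that owns
       a small interval at the top of the range; the other ranges are rescaled
       accordingly (the values below are assumptions for illustration only). */
    enum { SYM_A, SYM_B, SYM_C, SYM_D, SYM_EOF, NSYM };
    static const char   sym[NSYM] = { 'A', 'B', 'C', 'D', '$' };
    static const double Qlo[NSYM] = { 0.00, 0.38, 0.67, 0.86, 0.96 };
    static const double Qhi[NSYM] = { 0.38, 0.67, 0.86, 0.96, 1.00 };

    static int index_of(char c)            /* map a character to its table entry */
    {
        for (int x = 0; x < NSYM; x++)
            if (sym[x] == c) return x;
        return SYM_EOF;
    }

    /* Encode msg followed by the EOF symbol; return a value inside the final
       interval (the midpoint, to stay away from interval boundaries). */
    static double encode(const char *msg)
    {
        double L = 0.0, H = 1.0;
        for (size_t i = 0; i <= strlen(msg); i++) {
            int x = (i < strlen(msg)) ? index_of(msg[i]) : SYM_EOF;
            double R = H - L;
            H = L + R * Qhi[x];
            L = L + R * Qlo[x];
        }
        return (L + H) / 2.0;
    }

    /* Decoder loop that stops on the EOF symbol instead of a known length. */
    static void decode(double v)
    {
        for (;;) {
            int x = 0;
            while (x < NSYM - 1 && v >= Qhi[x]) x++;   /* symbol straddling v */
            if (x == SYM_EOF)
                break;                                 /* end of message reached */
            putchar(sym[x]);
            v = (v - Qlo[x]) / (Qhi[x] - Qlo[x]);
        }
        putchar('\n');
    }

    int main(void)
    {
        double code = encode("BA");   /* decoding prints "BA" and then stops on '$' */
        decode(code);
        return 0;
    }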