Data Compression Meeting, October 25, 2002: Arithmetic Coding

2 Outline
What is Arithmetic Coding?
– Lossless compression
– Based on probabilities of symbols appearing
Representing Real Numbers
Basic Arithmetic Coding
Context
Adaptive Coding
Comparison with Huffman Coding

3 Real Numbers
How can we represent a real number? In decimal notation, any real number x in the interval [0,1) can be represented as .b1 b2 b3 ... where 0 ≤ bi ≤ 9. There is nothing special about base 10, though; we can do this in any base, and in particular in base 2.

4 Reals in Binary
Any real number x in the interval [0,1) can be represented in binary as .b1 b2 ... where each bi is a bit.
[Figure: the unit interval from 0 to 1 with x marked and its binary representation.]

5 First Conversion
L := 0; R := 1; i := 1
while x > L *
    if x < (L+R)/2 then bi := 0; R := (L+R)/2;
    if x ≥ (L+R)/2 then bi := 1; L := (L+R)/2;
    i := i + 1
end{while}
bj := 0 for all j ≥ i
* Invariant: x is always in the interval [L,R)
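A direct transcription of this procedure (a minimal Python sketch, not from the slides; the bit limit guards against non-terminating expansions, and the tie x = (L+R)/2 is folded into the bit-1 case so the loop always makes progress):

    def to_binary_bisect(x, max_bits=16):
        # Convert x in [0,1) to bits by repeatedly halving [L,R).
        # Invariant: x is always in [L, R).
        L, R = 0.0, 1.0
        bits = []
        while x > L and len(bits) < max_bits:
            mid = (L + R) / 2
            if x < mid:
                bits.append(0)   # keep the lower half
                R = mid
            else:
                bits.append(1)   # keep the upper half
                L = mid
        return bits              # all later bits are 0

    print(to_binary_bisect(5 / 8))   # [1, 0, 1]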

6 Conversion using Scaling
Always scale the interval back to unit size, but x must be changed as part of the scaling: x := 2x when the lower half is kept, and x := 2x - 1 when the upper half is kept.
[Figure: each half of the unit interval rescaled to [0,1), with x remapped accordingly.]

7 Binary Conversion with Scaling
y := x; i := 0
while y > 0 *
    i := i + 1
    if y < 1/2 then bi := 0; y := 2y;
    if y ≥ 1/2 then bi := 1; y := 2y - 1;
end{while}
bj := 0 for all j > i
* Invariant: x = .b1 b2 ... bi + y/2^i
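The same conversion as a runnable Python sketch (not from the slides; exact fractions keep the arithmetic clean, and the max_bits cutoff is illustrative):

    from fractions import Fraction

    def to_binary_scaled(x, max_bits=8):
        # Invariant after i steps: x = .b1 b2 ... bi + y / 2**i
        y = Fraction(x)
        bits = []
        while y > 0 and len(bits) < max_bits:
            y *= 2               # shift the next bit into the integer position
            if y >= 1:
                bits.append(1)
                y -= 1
            else:
                bits.append(0)
        return bits

    print(to_binary_scaled(Fraction(1, 3)))    # [0, 1, 0, 1, 0, 1, 0, 1]
    print(to_binary_scaled(Fraction(17, 27)))  # [1, 0, 1, 0, 0, 0, 0, 1]

These two calls reproduce the worked examples on slide 9.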

8 Proof of the Invariant
Initially x = 0 + y/2^0.
Assume x = .b1 b2 ... bi + y/2^i.
– Case 1: y < 1/2. Then b(i+1) = 0 and y' = 2y, so
  .b1 b2 ... bi b(i+1) + y'/2^(i+1) = .b1 b2 ... bi 0 + 2y/2^(i+1) = .b1 b2 ... bi + y/2^i = x.
– Case 2: y ≥ 1/2. Then b(i+1) = 1 and y' = 2y - 1, so
  .b1 b2 ... bi b(i+1) + y'/2^(i+1) = .b1 b2 ... bi 1 + (2y-1)/2^(i+1) = .b1 b2 ... bi + 1/2^(i+1) + 2y/2^(i+1) - 1/2^(i+1) = .b1 b2 ... bi + y/2^i = x.

9 Example
x = 1/3:
y       i   b
1/3     1   0
2/3     2   1
1/3     3   0
2/3     4   1
...     ... ...
so 1/3 = .010101...

x = 17/27:
y       i   b
17/27   1   1
7/27    2   0
14/27   3   1
1/27    4   0
...     ... ...

10 Arithmetic Coding
Basic idea in arithmetic coding:
– Represent each string x of length n by a unique interval [L,R) in [0,1).
– The width R - L of the interval [L,R) represents the probability of x occurring.
– The interval [L,R) can itself be represented by any number, called a tag, within the half-open interval.
– Find some k such that the k most significant bits of the tag are in the interval [L,R). That is, .t1 t2 t3 ... tk is in the interval [L,R).
– Then t1 t2 t3 ... tk is the code for x.

11 Example of Arithmetic Coding (1)
[Figure: with P(a) = 1/3, P(b) = 2/3, the unit interval is split at 1/3 into a and b; b is split into ba and bb, and bb into bba and bbb, so bba gets the interval [15/27, 19/27).]
tag = 17/27 = .10100001...
code = 101
1. The tag must be in the half-open interval.
2. The tag can be chosen to be (L+R)/2.
3. The code is the significant bits of the tag.

12 Some Tags are Better than Others
[Figure: with P(a) = 1/3, P(b) = 2/3, the unit interval is split at 1/3 into a and b; b is split into ba and bb, and ba is split again, so bab gets the interval [11/27, 15/27).]
Using tag = (L+R)/2: tag = 13/27 = .011110..., code = 0111.
Alternative: tag = 14/27 = .100001..., code = 1.

13 Example of Codes
P(a) = 1/3, P(b) = 2/3.

string  interval         tag = (L+R)/2   code
aaa     [0/27, 1/27)     1/54            0
aab     [1/27, 3/27)     2/27            0001
aba     [3/27, 5/27)     4/27            001
abb     [5/27, 9/27)     7/27            01
baa     [9/27, 11/27)    10/27           01011
bab     [11/27, 15/27)   13/27           0111
bba     [15/27, 19/27)   17/27           101
bbb     [19/27, 27/27)   23/27           11

Average code length: .95 bits/symbol; entropy lower bound: .92 bits/symbol.

14 Code Generation from Tag
If the binary tag is .t1 t2 t3 ... = (L+R)/2 in [L,R), then we want to choose k to form the code t1 t2 ... tk.
Short code:
– Choose k as small as possible so that L ≤ .t1 t2 ... tk < R.
Guaranteed code:
– Choose k = ceiling(log2(1/(R-L))) + 1.
– Then L ≤ .t1 t2 ... tk b1 b2 b3 ... < R for any bits b1 b2 b3 ...
– For fixed-length strings this provides a good prefix code.
– Example: [ … , … ), tag = … . Short code: 0. Guaranteed code: … .
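The guaranteed-code rule is easy to check in code (a small Python sketch under the slide's definitions; guaranteed_code is an illustrative name, not from the slides):

    import math
    from fractions import Fraction

    def guaranteed_code(L, R):
        # Take k = ceiling(log2(1/(R-L))) + 1 bits of the tag (L+R)/2;
        # any continuation of these k bits still lies in [L, R).
        k = math.ceil(math.log2(1 / (R - L))) + 1
        y = (L + R) / 2          # the tag, as an exact fraction
        bits = []
        for _ in range(k):
            y *= 2
            bits.append(1 if y >= 1 else 0)
            if y >= 1:
                y -= 1
        return bits

    # bba from the earlier example: interval [15/27, 19/27), tag 17/27
    print(guaranteed_code(Fraction(15, 27), Fraction(19, 27)))  # [1, 0, 1, 0]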

15 Guaranteed Code Example
P(a) = 1/3, P(b) = 2/3.

string  interval         tag = (L+R)/2   short code   prefix code
aaa     [0/27, 1/27)     1/54            0            000001
aab     [1/27, 3/27)     2/27            0001         00010
aba     [3/27, 5/27)     4/27            001          00100
abb     [5/27, 9/27)     7/27            01           0100
baa     [9/27, 11/27)    10/27           01011        01011
bab     [11/27, 15/27)   13/27           0111         0111
bba     [15/27, 19/27)   17/27           101          1010
bbb     [19/27, 27/27)   23/27           11           110

16 Arithmetic Coding Algorithm
P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(a(i-1))
Encode x1 x2 ... xn:
Initialize L := 0 and R := 1;
for i = 1 to n do
    W := R - L;
    L := L + W * C(xi);
    R := L + W * P(xi);
t := (L+R)/2;
choose code for the tag
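A minimal Python sketch of this encoder, using exact fractions to avoid rounding drift (the model is the one used in the example on the next slide; encode_interval is an illustrative name):

    from fractions import Fraction

    # P(a)=1/4, P(b)=1/2, P(c)=1/4 and the cumulative values C
    P = {'a': Fraction(1, 4), 'b': Fraction(1, 2), 'c': Fraction(1, 4)}
    C = {'a': Fraction(0), 'b': Fraction(1, 4), 'c': Fraction(3, 4)}

    def encode_interval(s):
        # Narrow [L,R) once per symbol, exactly as in the loop above.
        L, R = Fraction(0), Fraction(1)
        for x in s:
            W = R - L
            L = L + W * C[x]
            R = L + W * P[x]
        return L, R, (L + R) / 2   # final interval and its midpoint tag

    L, R, tag = encode_interval("abca")
    print(L, R, tag)               # 5/32 21/128 41/256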

17 Arithmetic Coding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Encode abca:

symbol  W     L     R
              0     1
a       1     0     1/4
b       1/4   1/16  3/16
c       1/8   5/32  6/32
a       1/32  5/32  21/128

tag = (5/32 + 21/128)/2 = 41/256 = .00101001
L = .00101000...
R = .00101010...
code = 00101
prefix code = 00101001
(W := R - L; L := L + W C(x); R := L + W P(x))

18 Decoding (1)
Assume the length is known to be 3, and the code 0001 is received, which converts to the tag .0001000...
[Figure: the unit interval split at 1/3 into a and b; the tag falls in the a part.]
output a

19 Decoding (2)
Assume the length is known to be 3, and the code 0001 is received, which converts to the tag .0001000...
[Figure: a is split into aa and ab; the tag falls in aa.]
output a

20 Decoding (3)
Assume the length is known to be 3, and the code 0001 is received, which converts to the tag .0001000...
[Figure: aa is split into aaa and aab; the tag falls in aab.]
output b

21 Arithmetic Decoding Algorithm
P(a1), P(a2), ..., P(am)
C(ai) = P(a1) + P(a2) + ... + P(a(i-1))
Decode b1 b2 ... bm, where the number of symbols is n.
Initialize L := 0 and R := 1; t := .b1 b2 ... bm
for i = 1 to n do
    W := R - L;
    find j such that L + W * C(aj) ≤ t < L + W * (C(aj) + P(aj));
    output aj;
    L := L + W * C(aj);
    R := L + W * P(aj);
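The matching decoder, sketched in Python under the same assumptions as the encoder sketch above (the P and C tables are repeated so the snippet is self-contained):

    from fractions import Fraction

    P = {'a': Fraction(1, 4), 'b': Fraction(1, 2), 'c': Fraction(1, 4)}
    C = {'a': Fraction(0), 'b': Fraction(1, 4), 'c': Fraction(3, 4)}

    def decode_interval(tag, n):
        # Invert the encoder: at each step pick the symbol whose
        # subinterval of [L,R) contains the tag.
        L, R = Fraction(0), Fraction(1)
        out = []
        for _ in range(n):
            W = R - L
            for a in P:
                lo = L + W * C[a]
                hi = lo + W * P[a]
                if lo <= tag < hi:
                    out.append(a)
                    L, R = lo, hi
                    break
        return ''.join(out)

    print(decode_interval(Fraction(5, 32), 4))   # abca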

22 Decoding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
tag = .00101000... = 5/32

W     L     R       output
      0     1
1     0     1/4     a
1/4   1/16  3/16    b
1/8   5/32  6/32    c
1/32  5/32  21/128  a

23 Decoding Issues
There are two ways for the decoder to know when to stop decoding:
1. Transmit the length of the string.
2. Transmit a unique end-of-string symbol.

24 Practical Arithmetic Coding
Scaling:
– By scaling we can keep L and R in a reasonable range of values so that W = R - L does not underflow.
– The code can be produced progressively, not just at the end.
– This complicates decoding somewhat.
Integer arithmetic coding avoids floating point altogether.

25 Context
Consider a 1-symbol context. Example: 3 contexts.
[Table: rows indexed by the previous symbol (a, b, c), columns by the next symbol (a, b, c); each entry is the conditional probability of the next symbol given the previous one.]

26 Example with Context
Encode acc:
– First symbol a: the equally likely model gives P(a) = 1/3.
– Second symbol c: use the a model.
– Third symbol c: use the c model.
[Figure: nested intervals for a, ac, and acc; the final interval for acc has width 4/15.]
Can choose 0101 as code.

27 Arithmetic Coding with Context
Maintain the probabilities for each context.
– For the first symbol, use the equal-probability model.
– For each successive symbol, use the model for the previous symbol.
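A sketch of the model-selection logic in Python (the conditional probabilities below are made-up placeholders, since the slide's table values did not survive; only the selection rule is the point):

    from fractions import Fraction

    EQUAL = {s: Fraction(1, 3) for s in 'abc'}   # equal-probability model
    MODELS = {                                   # hypothetical P(next | prev)
        'a': {'a': Fraction(1, 5), 'b': Fraction(2, 5), 'c': Fraction(2, 5)},
        'b': {'a': Fraction(1, 2), 'b': Fraction(1, 4), 'c': Fraction(1, 4)},
        'c': {'a': Fraction(1, 5), 'b': Fraction(1, 5), 'c': Fraction(3, 5)},
    }

    def model_for(prev):
        # Equal model for the first symbol, else the previous symbol's model.
        return EQUAL if prev is None else MODELS[prev]

    def string_probability(s):
        p, prev = Fraction(1), None
        for x in s:
            p *= model_for(prev)[x]   # the coding interval shrinks by this factor
            prev = x
        return p

    print(string_probability('acc'))  # 1/3 * P(c|a) * P(c|c)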

28 Adaptation
Simple solution, starting from the equally probable model:
– Initially all symbols have frequency 1.
– After symbol x is coded, increment its frequency by 1.
– Use the new model for coding the next symbol.
Example with alphabet a, b, c, d: after aabaac is encoded, the probability model is
a 5/10, b 2/10, c 2/10, d 1/10.
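The counting scheme as a runnable Python sketch (not from the slides; the arithmetic-coding step itself is elided, only the model update is shown):

    from collections import Counter
    from fractions import Fraction

    def adaptive_model(coded, alphabet='abcd'):
        # Start every symbol at frequency 1; bump a count after coding that symbol.
        freq = Counter({s: 1 for s in alphabet})
        for x in coded:
            # ... here x would be arithmetic-coded with the current freq table ...
            freq[x] += 1
        total = sum(freq.values())
        return {s: Fraction(freq[s], total) for s in alphabet}

    print(adaptive_model('aabaac'))
    # {'a': Fraction(1, 2), 'b': Fraction(1, 5), 'c': Fraction(1, 5), 'd': Fraction(1, 10)}
    # i.e. 5/10, 2/10, 2/10, 1/10, as on the slide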

29 Zero Frequency Problem
How do we weight symbols that have not occurred yet?
– Equal weights? Not so good with many symbols.
– An escape symbol, but what should its weight be?
– When a new symbol is encountered, send the escape symbol, followed by the symbol coded in the equally probable model. (Both encoded arithmetically.)
Example with alphabet a, b, c, d: after aabaac is encoded, the probability model is
a 4/7, b 1/7, c 1/7, d 0, escape 1/7.
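The escape variant, sketched the same way (the token '<esc>' is an illustrative stand-in for the escape symbol):

    from collections import Counter
    from fractions import Fraction

    def escape_model(coded, alphabet='abcd'):
        # Unseen symbols get probability 0; one escape count stands in for all of them.
        freq = Counter(coded)           # counts only symbols seen so far
        total = sum(freq.values()) + 1  # +1 for the escape symbol
        model = {s: Fraction(freq[s], total) for s in alphabet}
        model['<esc>'] = Fraction(1, total)
        return model

    print(escape_model('aabaac'))
    # a 4/7, b 1/7, c 1/7, d 0, <esc> 1/7, matching the slide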

30 Arithmetic vs. Huffman
Both compress very well. For an m-symbol grouping:
– Huffman is within 1/m of entropy.
– Arithmetic is within 2/m of entropy.
Context:
– Huffman needs a tree for every context.
– Arithmetic needs only a small table of frequencies for every context.
Adaptation:
– Huffman has an elaborate adaptive algorithm.
– Arithmetic has a simple adaptive mechanism.
Bottom line: arithmetic is more flexible than Huffman.

31 Acknowledgements
Thanks to Richard Ladner. Most of these slides were taken directly or modified slightly from slides for lectures 5 and 6 of his Winter 2002 CSE 490gz Data Compression class.