
Data Compression: Arithmetic Coding

Arithmetic Coding: Introduction
Allows using "fractional" parts of bits! Used in PPM, JPEG/MPEG (as an option), and bzip. More time-costly than Huffman, but an integer implementation is not too bad.

Arithmetic Coding (message intervals)
Assign each symbol an interval in the range from 0 (inclusive) to 1 (exclusive), e.g.:

  p(a) = .2   f(a) = .0   ->  a: [.0, .2)
  p(b) = .5   f(b) = .2   ->  b: [.2, .7)
  p(c) = .3   f(c) = .7   ->  c: [.7, 1)

The interval for a particular symbol will be called the symbol interval (e.g., for b it is [.2, .7)).

Arithmetic Coding: Encoding Example
Coding the message sequence: bac

  start:      [0, 1)
  after 'b':  [.2, .7)    (b's interval within [0, 1))
  after 'a':  [.2, .3)    (a's interval within [.2, .7))
  after 'c':  [.27, .3)   (c's interval within [.2, .3))

The final sequence interval is [.27, .3).

Arithmetic Coding
To code a sequence of symbols c_1 ... c_n with probabilities p[c], use the following recurrences, where f[c] is the cumulative probability up to symbol c (not included):

  l_0 = 0,  s_0 = 1
  l_i = l_{i-1} + s_{i-1} * f[c_i]
  s_i = s_{i-1} * p[c_i]

The final interval size is s_n = ∏_{i=1..n} p[c_i]. The interval [l_n, l_n + s_n) for a message sequence will be called the sequence interval.
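
These recurrences are easy to check in code. Below is a minimal sketch (Python; the names p, f, and sequence_interval are mine, not from the slides) that reproduces the bac example:

```python
p = {'a': 0.2, 'b': 0.5, 'c': 0.3}   # symbol probabilities p[c]
f = {'a': 0.0, 'b': 0.2, 'c': 0.7}   # cumulative probabilities f[c]

def sequence_interval(msg):
    """Return (l, s): left endpoint and size of the sequence interval."""
    l, s = 0.0, 1.0                  # start with the interval [0, 1)
    for c in msg:
        l += s * f[c]                # l_i = l_{i-1} + s_{i-1} * f[c_i]
        s *= p[c]                    # s_i = s_{i-1} * p[c_i]
    return l, s

l, s = sequence_interval("bac")
print(l, l + s)                      # ~0.27 0.3, i.e. the interval [.27, .3)
```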

Uniquely defining an interval
Important property: the intervals for distinct messages of length n never overlap. Therefore, specifying any number in the final interval uniquely determines the message. Decoding is similar to encoding, but at each step we determine the message symbol and then reduce the interval accordingly.

Arithmetic Coding: Decoding Example
Decoding the number .49, knowing the message is of length 3:

  .49 in [.2, .7)  -> 'b';  rescale: (.49 - .2) / .5 = .58
  .58 in [.2, .7)  -> 'b';  rescale: (.58 - .2) / .5 = .76
  .76 in [.7, 1)   -> 'c'

The message is bbc.
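
A matching decoder sketch (again with illustrative names, reusing p and f from the encoder sketch above): at each step, find the symbol interval containing the number, then rescale it back to [0, 1).

```python
def decode(x, n):
    """Decode n symbols from a number x lying in the final sequence interval."""
    msg = []
    for _ in range(n):
        for c in ('a', 'b', 'c'):        # find the symbol interval containing x
            if f[c] <= x < f[c] + p[c]:
                msg.append(c)
                x = (x - f[c]) / p[c]    # rescale x back to [0, 1)
                break
    return ''.join(msg)

print(decode(0.49, 3))                   # bbc
```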

Representing a real number
Binary fractional representation: .b_1 b_2 b_3 ... = b_1/2 + b_2/4 + b_3/8 + ...
So how about just using the shortest binary fractional representation in the sequence interval? E.g.:

  [0, .33)   -> .01
  [.33, .66) -> .1
  [.66, 1)   -> .11

Algorithm to emit the bits of x in [0, 1):
  1. x = 2 * x
  2. If x < 1, output 0
  3. else x = x - 1; output 1
  (repeat from step 1)
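
The algorithm above translates directly into code. A minimal sketch (the function name is mine):

```python
def binary_fraction(x, nbits):
    """Emit the first nbits of the binary fractional expansion of x in [0, 1)."""
    bits = []
    for _ in range(nbits):
        x *= 2                       # step 1: shift the next bit into the integer part
        if x < 1:
            bits.append('0')         # step 2
        else:
            x -= 1                   # step 3
            bits.append('1')
    return '.' + ''.join(bits)

print(binary_fraction(0.625, 3))     # .101  (= 1/2 + 1/8)
```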

Representing a code interval
We can view a binary fractional number as an interval by considering all of its completions; e.g., .101 stands for [.101, .110) = [.625, .75). We will call this the code interval.

Selecting the code interval
To find a prefix code, find a binary fractional (dyadic) number whose code interval is contained in the sequence interval. One can use l + s/2 truncated to 1 + ⌈log (1/s)⌉ bits.
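
Combining the two previous sketches gives an illustration of this rule (code_word is a hypothetical helper of mine; it relies on binary_fraction from the sketch above):

```python
import math

def code_word(l, s):
    """Truncate l + s/2 to 1 + ceil(log2(1/s)) bits."""
    nbits = 1 + math.ceil(math.log2(1 / s))
    return binary_fraction(l + s / 2, nbits)

# For the "bac" sequence interval [.27, .3):
print(code_word(0.27, 0.03))   # .0100100 -> code interval [.28125, .2890625) inside [.27, .3)
```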

Bound on Arithmetic length
Note that 1 + ⌈log (1/s)⌉ = ⌈log (2/s)⌉, so the code length from the previous slide can be written either way.

Bound on Length
Theorem: For a text of length n, the arithmetic encoder generates at most

  1 + ⌈log (1/s)⌉ = 1 + ⌈log ∏_{i=1..n} (1/p_i)⌉
                  ≤ 2 + ∑_{i=1..n} log (1/p_i)
                  = 2 + ∑_{k=1..|Σ|} n·p_k log (1/p_k)
                  = 2 + n·H_0  bits

where p_k is the empirical frequency of the k-th alphabet symbol. In practice the output is close to n·H_0 bits, with a small additional overhead due to rounding.
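
As a quick numeric illustration (mine, not from the slides), the bound can be checked on the bac example, where s = .03:

```python
import math

s = 0.2 * 0.5 * 0.3                            # final interval size = .03
bits = 1 + math.ceil(math.log2(1 / s))         # bits actually emitted: 7
bound = 2 + sum(math.log2(1 / p) for p in (0.5, 0.2, 0.3))  # 2 + sum log(1/p_i)
print(bits, bound)                             # 7 <= ~7.06, so the bound holds
```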

Integer Arithmetic Coding
The problem is that operations on arbitrary-precision real numbers are expensive. Key ideas of the integer version:
- Keep integers in the range [0..R) where R = 2^k
- Use rounding to generate the integer interval
- Whenever the sequence interval falls into the top, bottom, or middle half, expand the interval by a factor of 2
Integer arithmetic coding is an approximation.

Integer Arithmetic (scaling)
If l ≥ R/2 (top half): output 1 followed by m 0s, set m = 0, and expand the message interval by 2.
If u < R/2 (bottom half): output 0 followed by m 1s, set m = 0, and expand the message interval by 2.
If l ≥ R/4 and u < 3R/4 (middle half): increment m and expand the message interval by 2.
In all other cases, just continue...
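
Put together, the three rules become a small loop. A sketch (Python; the names and the emit callback are illustrative, and this is only the scaling step, not a complete coder):

```python
# l, u: current integer interval endpoints in [0, R); m: pending middle-half
# expansions; emit: a callback that receives output bits as a string.
def scale(l, u, m, R, emit):
    while True:
        if u < R // 2:                        # bottom half
            emit('0' + '1' * m); m = 0
            l, u = 2 * l, 2 * u
        elif l >= R // 2:                     # top half
            emit('1' + '0' * m); m = 0
            l, u = 2 * (l - R // 2), 2 * (u - R // 2)
        elif l >= R // 4 and u < 3 * R // 4:  # middle half: defer the bit
            m += 1
            l, u = 2 * (l - R // 4), 2 * (u - R // 4)
        else:
            return l, u, m                    # all other cases: just continue

scale(6, 7, 0, 16, print)                     # emits 0, then 1, then 1
```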

Arithmetic ToolBox (ATB)
Viewed as a state machine: the ATB takes the current state (L, s), a symbol c, and a distribution (p_1, ..., p_σ), and moves to the new state (L', s'). Therefore, even the distribution can change over time.

K-th order models: PPM
Use the previous k characters as the context, making use of conditional probabilities (this is the changing distribution). Base the probabilities on counts: e.g., if "th" has been seen 12 times, followed by "e" 7 of those times, then the conditional probability p(e|th) = 7/12. Keep k small so that the dictionary does not get too large (typically less than 8).

PPM: Partial Matching
Problem: what do we do if we have not seen the current context followed by the character before? We cannot code 0 probabilities! The key idea of PPM is to reduce the context size if the previous match has not been seen: if the character has not been seen before with the current context of size 3, send an escape message and try the context of size 2, then again an escape message and the context of size 1, and so on (see the sketch below). Keep statistics for each context size < k. The escape is a special character with some probability; different variants of PPM use different heuristics for this probability.
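
A sketch of this escape cascade in code (the helper names encode_symbol and encode_with, and the use of '$' for the escape, are assumptions on my part):

```python
# counts maps a context string to a dict of successor counts, as in the
# example table on the later slide.
def encode_symbol(c, history, counts, k, encode_with):
    for j in range(min(k, len(history)), -1, -1):
        ctx = history[len(history) - j:]      # current context of size j
        seen = counts.get(ctx, {})
        if c in seen:
            encode_with(c, ctx)               # code c using p(c | ctx)
            return
        if seen:
            encode_with('$', ctx)             # escape, then retry with a shorter context
    encode_with(c, None)                      # order -1: uniform over the alphabet
```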

PPM + Arithmetic ToolBox
Feed the ATB the conditional distribution p[σ | context], where σ is either a character c or esc; the state (L, s) moves to (L', s') as before. Encoder and decoder must know the protocol for selecting the same conditional probability distribution (the PPM variant).

PPM: Example Contexts (String = ACCBACCACBA, k = 2)

  Context | Counts       Context | Counts       Context | Counts
  --------+---------     --------+---------     --------+---------
  Empty   | A = 4        A       | C = 3        AC      | B = 1
          | B = 2                | $ = 1                | C = 2
          | C = 5        B       | A = 2                | $ = 2
          | $ = 3                | $ = 1        BA      | C = 1
                         C       | A = 1                | $ = 1
                                 | B = 2        CA      | C = 1
                                 | C = 2                | $ = 1
                                 | $ = 3        CB      | A = 2
                                                        | $ = 1
                                                CC      | A = 1
                                                        | B = 1
                                                        | $ = 2
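
The table can be reproduced programmatically; a sketch (context_counts is my name, and setting $ to the number of distinct successors is the convention this particular table uses):

```python
from collections import defaultdict

# For every context of length 0..k, count the symbols that follow it; '$' is
# set to the number of distinct successors (one escape per new symbol).
def context_counts(text, k):
    counts = defaultdict(lambda: defaultdict(int))
    for i, c in enumerate(text):
        for j in range(k + 1):
            if i - j >= 0:
                counts[text[i - j:i]][c] += 1
    for ctx in counts:
        counts[ctx]['$'] = len(counts[ctx])
    return counts

tables = context_counts("ACCBACCACBA", 2)
print(dict(tables['']))    # A = 4, C = 5, B = 2, $ = 3  (the Empty row)
print(dict(tables['AC']))  # C = 2, B = 1, $ = 2         (the AC row)
```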

You find this at: compression.ru/ds/