1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 7 (W5)

2 Arithmetic Coding Def.: Arithmetic coding is similar to Huffman coding in that both achieve compression by reducing the average number of bits required to represent a symbol. Unlike Huffman coding, however, arithmetic coding can represent symbols with a fractional number of bits (using a floating-point, or rather fixed-point, representation of the code)!!!

3 Arithmetic Coding Huffman Coding ● Replaces each input symbol with a codeword ● Hard to adapt to changing statistics ● Needs to store the codeword table ● Minimum codeword length is 1 bit Arithmetic Coding ● Replaces the entire input with a single floating-point number ● Adaptive coding is very easy ● No need to keep and send a codeword table ● Fractional codeword length

4 Arithmetic Coding Recall the extended Huffman code: ● N: alphabet size ● M: max codeword length (order of extension) ● N^M: the total number of entries that need codewords, and the new codewords grow longer as well :( ● Encodes sequences even if they never appear in the input! Arithmetic Coding Features ● Normalizes the range [0, N^M] to values in [0, 1) ● Only considers the sequences that actually appear in the file ● Maps each input sequence to a unique tag (a code, but a floating-point value) in [0, 1)

Notes About Ranges 5 ● Square brackets '[' and ']' mean the adjacent number is included ● Parentheses '(' and ')' mean the adjacent number is excluded ● The assigned ranges can then be used for encoding and decoding strings of symbols over the alphabet ● Algorithms that use ranges for coding are often referred to as range coders! ● The floating-point tag still has to be encoded further into a binary format !!?? (HOW?)

Fixed-Point Arithmetic (Signed) 6 ● Let us assume an 8-bit binary sign-magnitude fixed-point representation comprising a sign bit, three integer bits, and four fractional bits ● The sign bit only represents the sign of the value (0 = positive, 1 = negative). [Only 0 is used in arithmetic coding] ● Let us give an example; assume: – three integer bits, which can represent an integer in the range 0 to 7. [Not relevant to arithmetic coding] – four fractional bits after the binary point, i.e., values from 0.0 to 0.9375 in steps of 1/16, divided as follows:
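As a rough illustration of this representation, here is a minimal Python sketch (not from the slides) that packs a value into the assumed 1-sign / 3-integer / 4-fraction bit layout; the helper name to_fixed_point is made up for this example.

    def to_fixed_point(value):
        # Assumed layout: 1 sign bit, 3 integer bits, 4 fractional bits
        sign = 0 if value >= 0 else 1
        mag = abs(value)
        integer = int(mag)                       # 0..7 representable
        frac = int(round((mag - integer) * 16))  # quantize to multiples of 1/16
        if frac == 16:                           # handle rounding overflow
            integer, frac = integer + 1, 0
        assert integer <= 7, "integer part does not fit in 3 bits"
        return f"{sign}|{integer:03b}|{frac:04b}"

    print(to_fixed_point(0.772352))   # '0|000|1100' (0.75 is the nearest multiple of 1/16)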

Fixed-Point Arithmetic (Signed) 7 8-bit binary sign-magnitude fixed-point representation

Fixed-Point Arithmetic (Signed) 8 Fixed-point representation: Integer part + Sign

Fixed-Point Arithmetic (Signed) 9 Fixed-point representation: Fraction Part

Encoding Strings 10 ● It is possible to encode a single symbol by its probability range ● The interval size is proportional to the symbol probability ● More symbols can be encoded by partitioning the current range according to the next symbol's probabilities, e.g., [0, 1) -> [0.2, 0.6) -> [0.5, 0.60) -> [0.55, 0.58) ● The first symbol restricts the tag position to one of the intervals ● Once the tag falls into an interval, it never gets out of it ● The reduced interval is partitioned recursively as more symbols are processed

Example: 11 ● Map the symbols to the real line range [0, 1) ● The order does not matter; however, the decoder needs to use the same order ● Symbol probabilities: P(1) = 0.8, P(2) = 0.02, P(3) = 0.18 ● Partition: – 1: [0, 0.8) -> 0 to 0.799...9 – 2: [0.8, 0.82) -> 0.8 to 0.8199...9 – 3: [0.82, 1) -> 0.82 to 0.999...9
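To make the mapping concrete, here is a small Python sketch of my own (the helper name build_intervals is made up) that turns the probabilities above into the cumulative intervals used on this slide:

    probs = {1: 0.8, 2: 0.02, 3: 0.18}

    def build_intervals(probs):
        # Assign each symbol a half-open interval [low, high) whose width equals its probability
        intervals, cum = {}, 0.0
        for sym, p in probs.items():
            intervals[sym] = (cum, cum + p)
            cum += p
        return intervals

    print(build_intervals(probs))
    # close to {1: (0.0, 0.8), 2: (0.8, 0.82), 3: (0.82, 1.0)}, up to float rounding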

Example: 12 Now we need to encode the input sequence "1321":
Start: [0, 1), Range 100%
After "1": [0, 0.8), Range 80%
After "3": [0.656, 0.8), Range 14.4%
After "2": [0.7712, 0.77408), Range 0.288%
After "1": [0.7712, 0.773504), Range ≈ 0.23%
Tag (float) = (High + Low)/2 = (0.773504 + 0.7712)/2 = 0.772352 !!!

Example: (contd.) 13 The range chart is nothing more than a CDF, where Fx(1) = 0.8, Fx(2) = 0.82, and Fx(3) = 1.0.
Tag (float) = (High + Low)/2 = (0.773504 + 0.7712)/2 = 0.772352
Range = (0.773504 – 0.7712) = 0.002304
Accuracy in bits ≥ ceil(log2(1/Range)) = ceil(8.76) = 9 bits!!
To find the binary word, after the sign, we compute:
2^-1*1 + 2^-2*1 + 2^-3*0 + 2^-4*0 + 2^-5*0 + 2^-6*1 + 2^-7*0 + 2^-8*1 + 2^-9*1 = 0.771484375
Final word: 0|110001011 ... You may take out the sign at the end!

Example: (contd.) 14 Tag (float) = (High + Low)/2 = 0.772352 and accuracy in bits ≥ 9 bits!!
Straightforward method =>
Tag*2 = (0.772352)*2 = 1.544704 > 1 → 1st bit = 1
(Ans-1)*2 = (0.544704)*2 = 1.089408 > 1 → 2nd bit = 1
(Ans-1)*2 = (0.089408)*2 = 0.178816 < 1 → 3rd bit = 0
(Ans-0)*2 = (0.178816)*2 = 0.357632 < 1 → 4th bit = 0
(Ans-0)*2 = (0.357632)*2 = 0.715264 < 1 → 5th bit = 0
(Ans-0)*2 = (0.715264)*2 = 1.430528 > 1 → 6th bit = 1
(Ans-1)*2 = (0.430528)*2 = 0.861056 < 1 → 7th bit = 0
(Ans-0)*2 = (0.861056)*2 = 1.722112 > 1 → 8th bit = 1
(Ans-1)*2 = (0.722112)*2 = 1.444224 > 1 → 9th bit = 1 --- stop at the required precision → 9 bits
Final word: 0|110001011 ... You may take out the sign at the end!
This value is exactly 0.110001011 (binary) = 0.771484375
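The same repeated-doubling expansion can be checked with a few lines of Python (my own sketch, not part of the slides):

    def to_binary_fraction(x, bits):
        # Repeated doubling: the integer part of 2*x gives the next bit
        out = []
        for _ in range(bits):
            x *= 2
            bit = int(x)        # 1 if 2*x >= 1, else 0
            out.append(bit)
            x -= bit            # keep only the fractional part
        return out

    bits = to_binary_fraction(0.772352, 9)
    print(bits)                                              # [1, 1, 0, 0, 0, 1, 0, 1, 1]
    print(sum(b * 2**-(i + 1) for i, b in enumerate(bits)))  # 0.771484375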

Encoding Pseudo-Code 15 Arithmetic coding pseudo-code (input symbols to read = "1321", obtained from ReadSymbol()):

    CDF = {0; 0.8; 0.82; 1}            % leading 0 so that CDF(n-1) exists for the first symbol
    LOW = 0.0; HIGH = 1.0;
    while (not EOF) {                  % EOF --> no more input symbols
        n = ReadSymbol() + 1;          % symbols are integers; +1 skips the leading 0 entry
        RANGE = HIGH - LOW;            % current interval width
        HIGH  = LOW + RANGE * CDF(n);  % update HIGH
        LOW   = LOW + RANGE * CDF(n-1);% update LOW
    }

● Keep track of three values: LOW, HIGH, RANGE ● Any two are sufficient, e.g., only LOW and RANGE ● Any value between LOW and HIGH can encode the input string
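For reference, here is a small runnable Python version of the same loop (a sketch under my own naming, not the lecture's code), applied to the sequence "1321" with the CDF above:

    # Cumulative distribution: CDF[k] = P(symbol <= k), with CDF[0] = 0
    CDF = [0.0, 0.8, 0.82, 1.0]

    def arithmetic_encode(symbols, cdf):
        low, high = 0.0, 1.0
        for s in symbols:                  # s is 1, 2 or 3
            rng = high - low               # current interval width
            high = low + rng * cdf[s]      # new upper end
            low  = low + rng * cdf[s - 1]  # new lower end
        return low, high

    low, high = arithmetic_encode([1, 3, 2, 1], CDF)
    print(low, high, (low + high) / 2)     # ~0.7712  ~0.773504  ~0.772352 (the tag)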

Decoding Procedure: Concept Decoder 16 Decode by locating the tag within the (rescaled) symbol thresholds at every step. Drawback: need to recalculate all the thresholds each time

Decoding Procedure: Simplified Decoder 17 Instead of recalculating the thresholds, rescale the received value back to [0, 1) after each decoded symbol; e.g., decode the lower end of the tag interval, x = 0.7712

Simplified Decoder Pseudo-Code 18

    CDF = {0, 0.8, 0.82, 1}
    low = 0; high = 1;
    x = GetEncodedNumber();
    while (x != low) {
        n = DecodeOneSymbol(x);                    % find n with CDF(n-1) <= x < CDF(n)
        output symbol n;
        x = (x - CDF(n-1)) / (CDF(n) - CDF(n-1));  % rescale x back to [0, 1)
    };

But it still needs high-precision floating-point operations!
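A runnable Python sketch of the simplified decoder is given below (my own code, not the lecture's). Because the floating-point test "x != low" is fragile, this sketch assumes the number of symbols is known and decodes exactly that many:

    CDF = [0.0, 0.8, 0.82, 1.0]

    def arithmetic_decode(x, cdf, n_symbols):
        # Find the interval containing x, output its symbol, then rescale x back to [0, 1)
        out = []
        for _ in range(n_symbols):
            n = next(k for k in range(1, len(cdf)) if cdf[k - 1] <= x < cdf[k])
            out.append(n)
            x = (x - cdf[n - 1]) / (cdf[n] - cdf[n - 1])
        return out

    print(arithmetic_decode(0.772352, CDF, 4))   # [1, 3, 2, 1]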

Uniqueness and Efficiency 19 How do we represent the final tag uniquely and efficiently? Answer: 1- Take the binary representation of the (unique) tag value T(X^m), where X^m is the sequence {x1 ... xm} 2- Truncate T(X^m) to l(X^m) = ceil(log2(1/P(x1 ... xm))) + 1 bits, i.e., 1 bit longer than the Shannon code, due to selecting the TAG in the middle of the interval!! Example (symbols 1, 2, 3 with P = 0.8, 0.02, 0.18 appearing once each): l = ceil(log2(1/(0.8*0.02*0.18))) + 1 = ceil(8.44) + 1 = 10 bits
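The length formula is easy to verify numerically; this small Python check is mine, not the slides':

    from math import ceil, log2

    # Codeword length needed when the tag is placed in the middle of the final interval:
    #   l = ceil(log2(1 / P(sequence))) + 1
    def tag_length(probabilities):
        p = 1.0
        for prob in probabilities:
            p *= prob
        return ceil(log2(1.0 / p)) + 1

    print(tag_length([0.8, 0.02, 0.18]))        # 10 (the slide's example)
    print(tag_length([0.8, 0.18, 0.02, 0.8]))   # 10, for the full "1321" sequence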

Efficiency of Arithmetic Code 20 Assuming an i.i.d. sequence, the average length per symbol satisfies H(X) ≤ L/m < H(X) + 2/m, so L/m → H(X) for large m.
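The bound behind this statement can be written out as follows (a standard derivation, reconstructed here since the slide's equation did not survive extraction):

    \[
      l(X^m) = \left\lceil \log_2 \frac{1}{P(X^m)} \right\rceil + 1
             < \log_2 \frac{1}{P(X^m)} + 2
    \]
    \[
      \Rightarrow\; L = E\big[\,l(X^m)\,\big] < H(X^m) + 2 = m\,H(X) + 2
      \;\;\text{(i.i.d.)}
      \quad\Rightarrow\quad
      H(X) \le \frac{L}{m} < H(X) + \frac{2}{m}
    \]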

Another Example with a Sequence 21 A = {a1, a2, a3}, P = {0.7, 0.1, 0.2} Input sequence for encoding: {a1 a2 a3 ...} Output tag for decoding: cdf: FX(1) = 0.7, FX(2) = 0.8, FX(3) = 1.0
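Since the full sequence is elided on the slide, here is a sketch (mine) that encodes just the prefix a1 a2 a3 with this CDF; it yields the interval [0.546, 0.56):

    # Encode the prefix a1 a2 a3 (symbols 1, 2, 3) with P = {0.7, 0.1, 0.2}
    CDF = [0.0, 0.7, 0.8, 1.0]
    low, high = 0.0, 1.0
    for s in [1, 2, 3]:
        rng = high - low
        low, high = low + rng * CDF[s - 1], low + rng * CDF[s]
    print(low, high)   # ~0.546  ~0.56 : any tag in this interval encodes a1 a2 a3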

Another Example with a Sequence “iou” 22 To send and encode “iou”: send any number C such that Low ≤ C < High (the final interval for “iou”), using a binary fraction of 9 bits, i.e., 9 bits for arithmetic coding

Another Example with a Sequence “CAE$” 23 A source outputs symbols from {A, B, C, D, E, F, $}, where $ is the termination symbol. Their probabilities are as follows: P(A) = 0.2, P(B) = 0.1, P(C) = 0.2, P(D) = 0.05, P(E) = 0.3, P(F) = 0.05, P($) = 0.1

Another Example with a Sequence “CAE$” 24 Assume we have the input sequence = C A E $. With the symbols ordered A, B, C, D, E, F, $ on [0, 1), the interval narrows as: C -> [0.3, 0.5), A -> [0.3, 0.34), E -> [0.322, 0.334), $ -> [0.3328, 0.334). Code: any number in [0.3328, 0.334)
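The interval above can be reproduced with the following Python sketch (my own; it assumes the alphabetical symbol order A, B, C, D, E, F, $ shown on the slide's table):

    probs = {'A': 0.2, 'B': 0.1, 'C': 0.2, 'D': 0.05, 'E': 0.3, 'F': 0.05, '$': 0.1}

    # Build each symbol's [low, high) sub-interval from the cumulative probabilities
    intervals, cum = {}, 0.0
    for sym, p in probs.items():
        intervals[sym] = (cum, cum + p)
        cum += p

    low, high = 0.0, 1.0
    for sym in "CAE$":
        rng = high - low
        s_low, s_high = intervals[sym]
        low, high = low + rng * s_low, low + rng * s_high
    print(low, high)   # ~0.3328  ~0.334 : any number in this interval encodes "CAE$"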

Conclusion 25 For an i.i.d. sequence of length m: H(X) ≤ L/m < H(X) + 2/m Do you see that arithmetic coding looks worse than Huffman??!! So why consider AC? Remember, this bound is for coding the entire block of m symbols at once... You’d need N^m codewords in Huffman... which is too much! For Huffman, m must be kept small, but for AC it can be VERY large!