compress! From a theoretical viewpoint, block Huffman codes achieve the best efficiency.

Example with P(A) = 0.8, P(B) = 0.2:

code for single symbols: A -> 0, B -> 1
  average length L_1 = 1.0 bit per symbol

code for blocks of three symbols:
  block   AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  prob.   0.512  0.128  0.128  0.032  0.128  0.032  0.032  0.008
  cdwd    0      100    101    11100  110    11101  11110  11111
  average length L_3 = 2.184 bits per three symbols, i.e. L_3/3 = 0.728 bit per symbol

In general L_n/n -> H(X) as n -> infinity: longer blocks approach the entropy.
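
A quick numerical check of these figures, sketched in Python (the block codeword lengths are read off the table above; names are mine):

```python
import math

p = {"A": 0.8, "B": 0.2}
# codeword lengths of the 3-symbol block code shown above
length = {"AAA": 1, "AAB": 3, "ABA": 3, "BAA": 3,
          "ABB": 5, "BAB": 5, "BBA": 5, "BBB": 5}

def prob(word):
    q = 1.0
    for s in word:
        q *= p[s]
    return q

L1 = 1.0                                          # one bit per single symbol
L3 = sum(prob(w) * l for w, l in length.items())  # 2.184 bits per 3 symbols
H  = -sum(q * math.log2(q) for q in p.values())   # entropy H(X) ~ 0.722 bit

print(L1, L3, L3 / 3, H)                          # 1.0  2.184  0.728  0.7219...
```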

problem of block Huffman codes

From a practical viewpoint, block Huffman codes have some problems:
- a large table is needed for encoding/decoding
    -> run-length Huffman code, arithmetic code
- probabilities must be known in advance
    -> Lempel-Ziv codes

Today we study these three coding techniques.

run-length Huffman code (1/3)

run-length Huffman code: a coding scheme which is good for "biased" sequences
- we focus on a binary information source: alphabet = {A, B} with P(A) >> P(B)
- used for data compression in facsimile systems

run and run-length

A run is a sequence of consecutive identical symbols.
For example, A B B A A A A A B A A A B contains runs of A of lengths 1, 0, 5 and 3
(each run terminated by a B).
The message can be recovered if the lengths of the runs are given,
so we encode the lengths of the runs, not the pattern itself.

upper-bound the run-length

A small problem: there can be very, very long runs.
Solution: put an upper bound on the run-length (run-length limited, RLL, coding).

With upper bound = 3, a run of length l is represented by "3" repeated floor(l/3) times,
followed by l mod 3:

  run length      0   1   2   3     4     5     6       7       ...
  representation  0   1   2   3+0   3+1   3+2   3+3+0   3+3+1   ...

Here "3" means "three or more A's (the run continues)" and 0, 1, 2 mean
"that many A's followed by a B".
For example, ABBAAAAABAAAB is read as: one A followed by B, zero A's followed by B,
three or more A's, two A's followed by B, three or more A's, zero A's followed by B.

run-length Huffman code

A run-length Huffman code is a Huffman code defined to encode the lengths of runs.
It is effective when the symbol probabilities are strongly biased.

Example with P(A) = 0.9, P(B) = 0.1 and upper bound 3:

  run length    block pattern   prob.    codeword
  0             B               0.1      10
  1             AB              0.09     110
  2             AAB             0.081    111
  3 or more     AAA             0.729    0

  ABBAAAAABAAAB  -> 1, 0, 3+, 2, 3+, 0  -> 110 10 0 111 0 10
  AAAABAAAAABAAB -> 3+, 1, 3+, 2, 2     -> 0 110 0 111 111
  AAABAAAAAAAAB  -> 3+, 0, 3+, 3+, 2    -> 0 10 0 0 111
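
The encodings above can be reproduced with a short Python sketch (a minimal illustration; the helper names run_lengths and rl_huffman_encode are mine, and the sketch assumes the message ends with a B, as in the examples):

```python
def run_lengths(msg, bound=3):
    """Split msg into runs of 'A' (each terminated by 'B'), capping each at `bound`.

    A value equal to `bound` means "bound or more A's, run continues";
    smaller values mean "that many A's followed by a B".
    """
    out, run = [], 0
    for sym in msg:
        if sym == "A":
            run += 1
            if run == bound:         # emit the "bound or more" marker and keep going
                out.append(bound)
                run = 0
        else:                        # a 'B' terminates the current run
            out.append(run)
            run = 0
    return out

# Huffman codewords for the bounded run lengths (the table above)
codeword = {0: "10", 1: "110", 2: "111", 3: "0"}

def rl_huffman_encode(msg):
    return " ".join(codeword[l] for l in run_lengths(msg))

print(run_lengths("ABBAAAAABAAAB"))          # [1, 0, 3, 2, 3, 0]
print(rl_huffman_encode("ABBAAAAABAAAB"))    # 110 10 0 111 0 10
print(rl_huffman_encode("AAAABAAAAABAAB"))   # 0 110 0 111 111
print(rl_huffman_encode("AAABAAAAAAAAB"))    # 0 10 0 0 111
```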

comparison

P(A) = 0.9, P(B) = 0.1; the entropy of X is
H(X) = -0.9 log2 0.9 - 0.1 log2 0.1 = 0.469 bit.

code 1: a naive Huffman code (A -> 0, B -> 1)
  average codeword length = 1 bit per symbol

code 2: blocked code (3-symbol blocks)
  block   AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  prob.   0.729  0.081  0.081  0.009  0.081  0.009  0.009  0.001
  cdwd    0      100    110    1010   1110   1011   11110  11111
  average codeword length = 1.661 bits per 3 symbols = 0.55 bit per symbol

comparison (cont'd)

code 3: run-length Huffman code (upper bound = 7)

  run length   0     1      2      3      4      5      6      7 or more
  prob.        0.1   0.09   0.081  0.073  0.066  0.059  0.053  0.478
  codeword     110   1000   1001   1010   1011   1110   1111   0

Consider n typical runs:
  before coding: (0.1n x 1) + ... + (0.478n x 7) = 5.215n symbols (A's or B's)
  after coding:  (0.1n x 3) + ... + (0.478n x 1) = 2.466n bits (0's or 1's)
so the average codeword length per source symbol is 2.466 / 5.215 = 0.47 bit.

RLL is a small trick, but it fully exploits the Huffman coding technique.
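
A small Python check of the 0.47 figure (a sketch using the table above; it recomputes the expected number of source symbols and code bits per run, matching the slide's 5.215n and 2.466n up to rounding):

```python
p = 0.9                                                    # P(A)
probs    = [p**l * (1 - p) for l in range(7)] + [p**7]     # run lengths 0..6 and "7 or more"
codebits = [3, 4, 4, 4, 4, 4, 4, 1]                        # |110|, |1000|, ..., |1111|, |0|
symbols  = [l + 1 for l in range(7)] + [7]                 # l A's + terminating B; "7+" = 7 A's, no B

exp_symbols = sum(q * s for q, s in zip(probs, symbols))   # ~ 5.217 source symbols per run
exp_bits    = sum(q * b for q, b in zip(probs, codebits))  # ~ 2.465 code bits per run
print(exp_symbols, exp_bits, exp_bits / exp_symbols)       # ~ 0.47 bit per source symbol
```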

arithmetic code (2/3)

arithmetic code: a coding scheme which does not use a translation table
- table lookup is replaced by "on-the-fly" computation
- no translation table is needed, but a slightly more complicated computation is needed
- it can be proved that its average codeword length approaches H(X)
A coding scheme which is advantageous for implementation.

preliminary

Consider the n-th order extension of a source with P(A) = p, P(B) = 1 - p;
we encode one of the 2^n patterns in {A, B}^n.

Example with p = 0.7 and n = 3: the 8 data patterns w_0, ..., w_7 in dictionary order.
P(w_i) is the probability that w_i occurs, and A(w_i) is the accumulation of the
probabilities before w_i:

  A(w_i) = sum_{j=0}^{i-1} P(w_j) = A(w_{i-1}) + P(w_{i-1})

  #        0      1      2      3      4      5      6      7
  w_i      AAA    AAB    ABA    ABB    BAA    BAB    BBA    BBB
  P(w_i)   0.343  0.147  0.147  0.063  0.147  0.063  0.063  0.027
  A(w_i)   0      0.343  0.490  0.637  0.700  0.847  0.910  0.973

(speaker notes) For simplicity, we consider a binary case in which A and B occur with
probabilities p and 1 - p, respectively. Assume that we encode a sequence of n symbols,
so there are 2^n possible sequences in total. If p = 0.7 and n = 3, we have 2^3 = 8
possible sequences. Order them, for example in dictionary order, and name them
w_0, ..., w_7. We write P(w_i) for the probability that w_i occurs, and A(w_i) for the
probability that some w_j with j < i occurs. Thus A(w_i) is the sum of P(w_j) over
j < i, and A(w_i) = A(w_{i-1}) + P(w_{i-1}).
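
The P(w_i), A(w_i) table can be recomputed with a short Python sketch (dictionary order comes from itertools.product; names are illustrative):

```python
from itertools import product

p = 0.7
words = ["".join(w) for w in product("AB", repeat=3)]   # AAA, AAB, ..., BBB in dictionary order

P = {w: p ** w.count("A") * (1 - p) ** w.count("B") for w in words}
A, acc = {}, 0.0
for w in words:            # A(w_i) accumulates the probabilities of w_0 .. w_{i-1}
    A[w] = acc
    acc += P[w]

for w in words:
    print(w, round(P[w], 3), round(A[w], 3))
# AAA 0.343 0.0,  AAB 0.147 0.343,  ...,  BBB 0.027 0.973
```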

illustration of probabilities

The 8 data patterns define a partition of the interval [0, 1):
AAA gets [0, 0.343), AAB gets [0.343, 0.490), ..., BBB gets [0.973, 1.0).
In general, w_i occupies the interval I_i = [A(w_i), A(w_i) + P(w_i));
P(w_i) is the size of the interval and A(w_i) is its left end.
For example, ABB occupies [0.637, 0.700), and A(BAA) = A(ABB) + P(ABB) = 0.700.

basic idea: represent w_i by a value x in I_i
problem to solve: we need a translation between w_i and x

about the translation

Two directions of translation are needed:
- [encode] the translation from w_i to x
- [decode] the translation from x to w_i
Both use recursive computation instead of a static table.

Think of the intervals as arranged in a binary tree: the interval of size P(w) starting
at A(w) that belongs to a parent string w is split between its two children wA and wB;
"the land of a parent is divided and inherited by its two children".

[encode] the translation from w_i to x

Recursively determine P( ) and A( ) for the prefixes of w_i:
- P(e) = 1, A(e) = 0   (e is the null string)
- for wA:  P(wA) = P(w) p,          A(wA) = A(w)
- for wB:  P(wB) = P(w) (1 - p),    A(wB) = A(w) + P(w) p

This determines the subinterval which corresponds to a given sequence of symbols.

Example with p = P(A) = 0.7, finding the interval of ABB:
  P(e) = 1,        A(e) = 0
  P(A) = 0.7,      A(A) = 0
  P(AB) = 0.21,    A(AB) = 0.49
  P(ABB) = 0.063,  A(ABB) = 0.637
so ABB inherits the interval [0.637, 0.637 + 0.063) = [0.637, 0.700).

(speaker notes) We traverse the tree on the fly, computing the values of P( ) and A( ).
At the beginning we are at the root node e, with P(e) = 1 and A(e) = 0 (intuitively,
the size of the considered interval is 1 and its left end is 0, meaning that we have
not yet partitioned [0, 1)). The P( ) and A( ) values of descendant nodes are computed
when they are needed, using the rules in the gray box: P( ) is the size of the current
interval and A( ) is its left end. The tree at the bottom of the slide shows the
computation for the sequence ABB.
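
The same recursion, sketched in Python (a minimal illustration; interval is my name for the helper):

```python
def interval(word, p=0.7):
    """Return (A(word), P(word)): the left end and the size of word's interval."""
    A, P = 0.0, 1.0                  # the null string owns the whole interval [0, 1)
    for sym in word:
        if sym == "A":               # wA keeps the left part of w's interval
            P = P * p
        else:                        # wB starts right after the wA part
            A = A + P * p
            P = P * (1 - p)
    return A, P

print(interval("ABB"))               # ~ (0.637, 0.063): ABB owns [0.637, 0.700)
```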

[encode] the translation from w_i to x (cont'd)

We know the interval I_i; which x in I_i should we choose?
x should have the shortest binary representation within the interval.
Choose x = A(w_i) + P(w_i), but trimmed to ceil(-log2 P(w_i)) fractional places.
The length of x is then about -log2 P(w_i) bits: almost ideal.

(speaker notes) In the encoding procedure we determine the interval which corresponds to
the sequence to be encoded, but the interval contains infinitely many points. Which point
should be chosen as x? We would like x to have the shortest possible binary representation
within the interval. For this, we can take the first ceil(-log2 P(w_i)) bits of the binary
fraction of A(w_{i+1}) = A(w_i) + P(w_i). Note that A(w_i) and A(w_{i+1}) first differ at
or before the ceil(-log2 P(w_i))-th bit, so this choice lets the decoder distinguish w_i
from its neighbours. The average codeword length of the arithmetic code is evaluated
approximately on a later slide, and it achieves almost the same efficiency as Huffman codes.
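
A sketch of this trimming rule in Python (illustrative only; it ignores the edge case in which the trimmed value coincides with the right end of the interval):

```python
import math

def codeword_for(A, P):
    """Trim A + P to ceil(-log2 P) binary fraction bits (a sketch, edge cases ignored)."""
    n = math.ceil(-math.log2(P))     # number of fractional bits to keep
    x = A + P                        # right end of the interval ...
    bits = ""
    for _ in range(n):               # ... trimmed to n binary places
        x *= 2
        b = int(x)
        bits += str(b)
        x -= b
    return bits

print(codeword_for(0.637, 0.063))    # '1011': 0.1011 in binary = 0.6875, inside [0.637, 0.700)
```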

choice of x (sketch in decimal notation)

Find the x in [0.123456, 0.126543) that is shortest in decimal.
Round off digits of the right end 0.126543, but not too many:
0.12654, 0.1265 and 0.126 still lie in the interval, but 0.12 does not,
so the shortest choice is x = 0.126.

  0.126543
-)0.123456
  0.003087

The number of fraction places that x must have is the position of the most significant
nonzero digit of 0.126543 - 0.123456 = 0.003087, i.e. about -log10(size of the interval)
places.

[decode] the translation from x to w_i

Given x, determine the leaf node whose interval contains x.
The procedure is almost the same as the first half of the encoding translation:
compute a threshold value, compare, and move to the left or right child.

Example with x = 0.600 and p = 0.7:
  at e:  P(e) = 1,    A(e) = 0;     threshold A(B) = 0.7;     0.600 < 0.7   -> first symbol A
  at A:  P(A) = 0.7,  A(A) = 0;     threshold A(AB) = 0.49;   0.600 >= 0.49 -> second symbol B
  at AB: P(AB) = 0.21, A(AB) = 0.49; threshold A(ABB) = 0.637; 0.600 < 0.637 -> third symbol A
0.600 is contained in the interval of ABA ... decoding completed.
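
The decoding walk can be sketched in Python as well (decode is an illustrative name; it mirrors the encode recursion):

```python
def decode(x, n, p=0.7):
    """Recover the n-symbol word whose interval contains x (compare and descend)."""
    word, A, P = "", 0.0, 1.0
    for _ in range(n):
        threshold = A + P * p        # boundary between the wA and wB sub-intervals
        if x < threshold:            # x lies in the left child: next symbol is A
            word += "A"
            P = P * p
        else:                        # x lies in the right child: next symbol is B
            word += "B"
            A, P = threshold, P * (1 - p)
    return word

print(decode(0.600, 3))              # ABA
```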

performance, summary

An n-symbol pattern w with probability P(w) is encoded to a codeword of length about
ceil(-log2 P(w)), so the average codeword length per symbol is

  (1/n) sum_{w in V^n} P(w) ceil(-log2 P(w))  ~  (1/n) sum_{w in V^n} -P(w) log2 P(w)  =  H(X).

Arithmetic coding is thus almost optimum without using a translation table.
However, it needs much computation with good precision (use approximation?).
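
A rough numerical illustration of this claim for the p = 0.7 source from the earlier slides (a sketch; it enumerates all 2^n patterns, so only small n are feasible; the per-symbol length stays within 1/n of H(X) and creeps toward it as n grows):

```python
import math
from itertools import product

p = 0.7
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # ~ 0.881 bit per symbol

for n in (1, 3, 6, 12):
    avg = 0.0
    for word in product("AB", repeat=n):
        P = p ** word.count("A") * (1 - p) ** word.count("B")
        avg += P * math.ceil(-math.log2(P))             # codeword length ~ ceil(-log2 P)
    print(n, avg / n)    # per-symbol length, at most H + 1/n, decreasing toward H
print(H)
```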

Lempel-Ziv codes (3/3)

Lempel-Ziv codes: coding schemes which do not need the probability distribution
- the encoder learns the statistical behavior of the source
- the translation table is constructed in an adaptive manner
- they work fine even for information sources with memory
A coding scheme in which we do not have to know the probabilities of the messages in advance.

probability in advance?

So far we have assumed that the probabilities of the symbols are known.
In the real world, the symbol probabilities are often not known in advance.
Scan the data twice?
- first scan: count the number of occurrences of each symbol
- second scan: Huffman coding
This delays the encoding operation and adds the overhead of transmitting the translation table.

Lempel-Ziv algorithms

For information sources whose symbol probabilities are not known:
- LZ77: lha, gzip, zip, zoo, etc.
- LZ78: compress, arc, stuffit, etc.
- LZW:  GIF, TIFF, etc.
They work fine for any information source: universal coding.

LZ77

proposed by A. Lempel and J. Ziv in 1977
idea: represent a substring of the data by a reference to a substring which has occurred previously

algorithm overview:
- process the data from the beginning
- partition the data into blocks in a dynamic manner
- represent each block by a three-tuple (i, l, x):
  "rewind i symbols, copy l symbols, and append x"

encoding example of LZ77

Consider encoding ABCBCDBDCBCD (with * as an end marker):
  A    -> first occurrence of A            -> (0, 0, A)
  B    -> first occurrence of B            -> (0, 0, B)
  C    -> first occurrence of C            -> (0, 0, C)
  BCD  -> "BC" occurred 2 symbols back     -> (2, 2, D)
  BD   -> "B" occurred 3 symbols back      -> (3, 1, D)
  CBCD -> "CBCD" occurred 6 symbols back   -> (6, 4, *)

decoding example of LZ77

Decode (0, 0, A), (0, 0, B), (0, 0, C), (2, 2, D), (3, 1, D), (6, 4, *):
the tuples reproduce A, B, C, BCD, BD, CBCD, i.e. ABCBCDBDCBCD.

possible problem:
- large blocks are good, because we can copy more symbols at once, but
- large blocks are bad, because a codeword must then contain large integers.
This trade-off degrades the performance.
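
A minimal decoder for such three-tuples, sketched in Python (lz77_decode is an illustrative name; '*' is kept as the end marker from the example):

```python
def lz77_decode(tuples):
    """Rebuild the message from (rewind, copy, append) three-tuples."""
    out = []
    for rewind, copy, append in tuples:
        start = len(out) - rewind
        for k in range(copy):        # copying symbol by symbol allows overlapping copies
            out.append(out[start + k])
        out.append(append)
    return "".join(out)

code = [(0, 0, "A"), (0, 0, "B"), (0, 0, "C"),
        (2, 2, "D"), (3, 1, "D"), (6, 4, "*")]
print(lz77_decode(code))             # ABCBCDBDCBCD*   ('*' is the end marker)
```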

LZ78

proposed by A. Lempel and J. Ziv in 1978
represent each block by a two-tuple (b, x):
"copy the block which appeared b blocks before, and append x"

encoding example of LZ78

Consider encoding ABCBCBCDBCDE:
  block #1: A    -> first occurrence of A           -> (0, A)
  block #2: B    -> first occurrence of B           -> (0, B)
  block #3: C    -> first occurrence of C           -> (0, C)
  block #4: BC   -> the block 2 blocks before + C   -> (2, C)
  block #5: BCD  -> the block 1 block before + D    -> (1, D)
  block #6: BCDE -> the block 1 block before + E    -> (1, E)

decoding example of LZ78

Decode (0, A), (0, B), (0, C), (2, C), (1, D), (1, E):
the blocks are A, B, C, BC, BCD, BCDE, i.e. ABCBCBCDBCDE.

advantage against LZ77: large blocks are good, because we can copy more symbols.
Is there anything wrong with large blocks here? Not really,
so the performance is slightly better than LZ77.
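
And a matching sketch for the two-tuples of LZ78 (lz78_decode is an illustrative name):

```python
def lz78_decode(tuples):
    """Rebuild the message from (back, append) two-tuples (back = 0 means no copy)."""
    blocks = []
    for back, append in tuples:
        prev = blocks[-back] if back > 0 else ""   # the block `back` positions earlier
        blocks.append(prev + append)
    return "".join(blocks)

code = [(0, "A"), (0, "B"), (0, "C"), (2, "C"), (1, "D"), (1, "E")]
print(lz78_decode(code))             # ABCBCBCDBCDE
```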

summary of LZ algorithms

In LZ algorithms the translation table is constructed adaptively, which suits
- information sources with unknown symbol probabilities
- information sources with memory

LZW is also good material for learning about intellectual property:
Unisys, CompuServe, the GIF format, ...

summary of today's class

- Huffman codes are good, but sometimes not practical
- run-length Huffman code: simple, but effective for certain types of sources
- arithmetic code: not so practical, but has strong backing from theory
- LZ codes: practical, practical, practical