CSE 589 Applied Algorithms Spring 1999

Slides:



Advertisements
Similar presentations
CY2G2 Information Theory 1
Advertisements

15-583:Algorithms in the Real World
Source Coding Data Compression A.J. Han Vinck. DATA COMPRESSION NO LOSS of information and exact reproduction (low compression ratio 1:4) general problem.
Lecture 10: Dictionary Coding
Lecture 6 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Lossless Compression - II Hao Jiang Computer Science Department Sept. 18, 2007.
Algorithm Programming Some Topics in Compression Bar-Ilan University תשס"ח by Moshe Fresko.
Lempel-Ziv Compression Techniques Classification of Lossless Compression techniques Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 LZ78 Encoding Algorithm.
Lempel-Ziv Compression Techniques
Algorithm Programming Some Topics in Compression
Lempel-Ziv-Welch (LZW) Compression Algorithm
Lempel-Ziv Compression Techniques
Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn
Lossless Compression in Multimedia Data Representation Hao Jiang Computer Science Department Sept. 20, 2007.
Data Compression Arithmetic coding. Arithmetic Coding: Introduction Allows using “fractional” parts of bits!! Used in PPM, JPEG/MPEG (as option), Bzip.
Basics of Compression Goals: to understand how image/audio/video signals are compressed to save storage and increase transmission efficiency to understand.
ENGS Lecture 1 ENGS 4 - Lecture 11 Technology of Cyberspace Winter 2004 Thayer School of Engineering Dartmouth College Instructor: George Cybenko,
Algorithms in the Real World
Noiseless Coding. Introduction Noiseless Coding Compression without distortion Basic Concept Symbols with lower probabilities are represented by the binary.
Text Compression Spring 2007 CSE, POSTECH. 2 2 Data Compression Deals with reducing the size of data – Reduce storage space and hence storage cost Compression.
15-853Page :Algorithms in the Real World Data Compression II Arithmetic Coding – Integer implementation Applications of Probability Coding – Run.
Source Coding-Compression
296.3Page 1 CPS 296.3:Algorithms in the Real World Data Compression: Lecture 2.5.
Lexical Analysis CSE 340 – Principles of Programming Languages Fall 2015 Adam Doupé Arizona State University
Page 110/6/2015 CSE 40373/60373: Multimedia Systems So far  Audio (scalar values with time), image (2-D data) and video (2-D with time)  Higher fidelity.
Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression.
Fundamental Data Structures and Algorithms Aleks Nanevski February 10, 2004 based on a lecture by Peter Lee LZW Compression.
Cryptographic Attacks on Scrambled LZ-Compression and Arithmetic Coding By: RAJBIR SINGH BIKRAM KAHLON.
Compression.  Compression ratio: how much is the size reduced?  Symmetric/asymmetric: time difference to compress, decompress?  Lossless; lossy: any.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Introduction to Theory of Automata By: Wasim Ahmad Khan.
Fundamental Data Structures and Algorithms Margaret Reid-Miller 24 February 2005 LZW Compression.
Data Compression Meeting October 25, 2002 Arithmetic Coding.
Data Compression Reduce the size of data.  Reduces storage space and hence storage cost. Compression ratio = original data size/compressed data size.
Lecture 7 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Data Compression 황승원 Fall 2010 CSE, POSTECH 2 2 포항공과대학교 황승원 교 수는 데이터구조를 수강하 는 포항공과대학교 재학생 들에게 데이터구조를 잘해 야 전산학을 잘할수 있으니 더욱 열심히 해야한다고 말 했다. 포항공과대학교 A 데이터구조를.
CS 1501: Algorithm Implementation LZW Data Compression.
compress! From theoretical viewpoint...
Page 1KUT Graduate Course Data Compression Jun-Ki Min.
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval" Basics
Lampel ZIV (LZ) code The Lempel-Ziv algorithm is a variable-to-fixed length code Basically, there are two versions of the algorithm LZ77 and LZ78 are the.
15-853Page :Algorithms in the Real World Data Compression III Lempel-Ziv algorithms Burrows-Wheeler Introduction to Lossy Compression.
CS 1501: Algorithm Implementation
Information theory Data compression perspective Pasi Fränti
Lecture 4: Data Compression Techniques TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency.
Lecture # 2.
Compression & Huffman Codes
Data Compression.
Digital Image Processing Lecture 20: Image Compression May 16, 2005
Increasing Information per Bit
Lexical Analysis CSE 340 – Principles of Programming Languages
Lempel-Ziv-Welch (LZW) Compression Algorithm
Algorithms in the Real World
Applied Algorithmics - week7
Lempel-Ziv Compression Techniques
Lempel-Ziv-Welch (LZW) Compression Algorithm
Lempel-Ziv-Welch (LZW) Compression Algorithm
Huffman Coding, Arithmetic Coding, and JBIG2
Chapter 7 Special Section
Why Compress? To reduce the volume of data to be transmitted (text, fax, images) To reduce the bandwidth required for transmission and to reduce storage.
Data Compression Reduce the size of data.
Lempel-Ziv Compression Techniques
UNIT IV.
CSE 589 Applied Algorithms Spring 1999
فشرده سازي داده ها Reduce the size of data.
CSE 589 Applied Algorithms Spring 1999
Chapter 7 Special Section
CSE 589 Applied Algorithms Spring 1999
High-capacity Reversible Data-hiding for LZW Codes
Lempel-Ziv-Welch (LZW) Compression Algorithm
Presentation transcript:

CSE 589 Applied Algorithms Spring 1999 Arithmetic Coding Dictionary Coding

Arithmetic Coding Huffman coding works well for larger alphabets and gets to within one bit of the entropy lower bound. Can we do better. Yes Basic idea in arithmetic coding: represent each string x of length n by an interval [l,r) in [0,1). The width r-l of the interval [l,r) represents the probability of x occurring. The interval [l,r) can itself be represented by any number, called a tag, within the half open interval. The k significant bits of the tag .t1t2t3... is the code of x. That is, . .t1t2t3...tk000... is in the interval [l,r). CSE 589 - Lecture 12 - Spring 1999

Example of Arithmetic Coding (1) 1. tag must be in the half open interval. 2. tag can be chosen to be (l+r)/2. 3. code is significant bits of the tag. 1/3 a 15/27 .100011100... 2/3 bba b 19/27 .101101000... bb tag = 17/27 = .101000010... code = 101 1 CSE 589 - Lecture 12 - Spring 1999

Some Tags are Better than Others 1/3 a 11/27 .011010000... ba bab 15/27 .100011100... 2/3 b Using tag = (l+r)/2 tag1 = 13/27 = .011110110... code1 = 0111 tag2 = 14/27 = .100001001... code2 = 1 1 CSE 589 - Lecture 12 - Spring 1999

Example of Codes P(a) = 1/3, P(b) = 2/3. tag = (l+r)/2 code 0/27 0/27 .000000000... aaa .000001001... 0 aaa aa 1/27 .000010010... aab .000100110... 0001 aab 3/27 .000111000... aba .001001100... 001 aba a 5/27 .001011110... ab abb .010000101... 01 abb 9/27 .010101010... baa .010111110... 01011 baa 11/27 .011010000... ba bab .011110111... 0111 bab 15/27 .100011100... bba b .101000010... 101 bba 19/27 .101101000... bb bbb .110110100... 11 bbb 1 27/27 .111111111... .95 bits/symbol .92 entropy lower bound CSE 589 - Lecture 12 - Spring 1999

Code Generation from Tag If binary tag is .t1t2t3... = (r-l)/2 in [l,r) then we want to choose k to form the code t1t2...tk. Short code: choose k to be as small as possible so that l < .t1t2...tk000... < r. Guaranteed code: choose l < .t1t2...tkb1b2b3... < r for any bits b1b2b3... for fixed length strings provides a good prefix code. example: [.000000000..., .000010010...), tag = .000001001... Short code: 0 Guaranteed code: 000001 CSE 589 - Lecture 12 - Spring 1999

Arithmetic Coding Algorithm P(a1), P(a2), … , P(am) C(ai) = P(a1) + P(a2) + … + P(ai-1) Encode x1x2...xn Initialize l := 0 and r := 1; for i = 1 to n do w := r - l; l := l + wC(xi); r := l + wP(xi); t := (l+r)/2; choose code for the tag CSE 589 - Lecture 12 - Spring 1999

Arithmetic Coding Example P(a) = 1/4, P(b) = 1/2, P(c) = 1/4 C(a) = 0, C(b) = 1/4, C(c) = 3/4 abca symbol w l r 0 1 a 1 0 1/4 b 1/4 1/16 3/16 c 1/8 5/32 6/32 a 1/32 5/32 21/128 tag = (5/32 + 21/128)/2 = 41/256 = .001010010... l = .001010000... r = .001010100... code = 00101 prefix code = 00101001 w := r - l; l := l + w C(x); r := l + w P(x) CSE 589 - Lecture 12 - Spring 1999

Decoding (1) Assume the length is known to be 3. 0001 which converts to the tag .0001000... .0001000... output a a b 1 CSE 589 - Lecture 12 - Spring 1999

Decoding (2) Assume the length is known to be 3. 0001 which converts to the tag .0001000... .0001000... aa output a a ab b 1 CSE 589 - Lecture 12 - Spring 1999

Decoding (3) Assume the length is known to be 3. 0001 which converts to the tag .0001000... .0001000... aa aab output b a ab b 1 CSE 589 - Lecture 12 - Spring 1999

Arithmetic Decoding Algorithm P(a1), P(a2), … , P(am) C(ai) = P(a1) + P(a2) + … + P(ai-1) Decode b1b2...bm, number of symbols is n. Initialize l := 0 and r := 1; t := .b1b2...bm000... for i = 1 to n do w := r - l; find j such that l + wC(aj) < t < l + w(C(aj)+P(aj)) output aj; l := l + wC(aj); r := l + wP(aj); CSE 589 - Lecture 12 - Spring 1999

Decoding Example P(a) = 1/4, P(b) = 1/2, P(c) = 1/4 C(a) = 0, C(b) = 1/4, C(c) = 3/4 00101 tag = .00101000... = 5/32 w l r output 0 1 1 0 1/4 a 1/4 1/16 3/16 b 1/8 5/32 6/32 c 1/32 5/32 21/128 a CSE 589 - Lecture 12 - Spring 1999

Practical Arithmetic Coding Scaling: By scaling we can keep l and r in a reasonable range of values so that w = r - l does not underflow. The code can be produced progressively, not at the end. Complicates decoding some. Integer arithmetic coding avoids floating point altogether. CSE 589 - Lecture 12 - Spring 1999

Coding with Scaling (1) Assume the length is known to be 3. bba Scaling Principle: If [r,l) is contained in [0,.5) the double, r := 2r; l := 2l and output 0. If [r,l) is contained in [.5,1) then shift by .5 and double, r := 2(r -.5); l := 2(l-.5) and output 1. 1/3 a l = 2/3 r = 1 2/3 b 1 CSE 589 - Lecture 12 - Spring 1999

Coding with Scaling (2) Assume the length is known to be 3. bba 1 output 1 and scale 1/3 a l = 5/9 r = 1 l = 1/9 r = 1 bb 2/3 b bb 1 CSE 589 - Lecture 12 - Spring 1999

Coding with Scaling (3) Assume the length is known to be 3. bba 10 output 0 and scale 1/3 a bba l = 1/9 r = 11/27 l = 2/9 r = 22/27 bba bb 2/3 b bb 1 CSE 589 - Lecture 12 - Spring 1999

Coding with Scaling (4) Assume the length is known to be 3. bba 101 r = 22/27 = .110100001 (l+r)/2 = 14/27 = .100001001... 1/3 output 1 a bba l = 1/9 r = 11/27 l = 2/9 r = 22/27 bba bb 2/3 b bb 1 CSE 589 - Lecture 12 - Spring 1999

Notes on Arithmetic Coding Arithmetic codes come close to the entropy lower bound. Grouping symbols is effective for arithmetic coding. Arithmetic codes can be used effectively on small symbol sets. Advantage over Huffman. Context can be added so that more than one probability distribution can be used. The best coders in the world use this method. There are very effective adaptive arithmetic coding methods. CSE 589 - Lecture 12 - Spring 1999

Dictionary Coding Most popular methods are based on Ziv and Lempel’s seminal work in 1977 and 1978. Basic idea: Maintain a dictionary of commonly used strings. Each commonly used string has an index. Static dictionary, fixed and does not change. Dynamic dictionary, adapts to the changing string. CSE 589 - Lecture 12 - Spring 1999

Static Dictionary 0 a 6 bc 1 b 7 bcc 2 c 8 ada 3 d 9 abc 4 aa 10 dda 5 ab 11 aaaa Encoding: from the current position find the longest string in source string that matches a string in the dictionary. Output its index. Decoding: for each index output the corresponding string in the dictionary. CSE 589 - Lecture 12 - Spring 1999

Static Dictionary Example 0 a 6 bc 1 b 7 bcc 2 c 8 ada 3 d 9 abc 4 aa 10 dda 5 ab 11 aaaa a a b c c a d b a a a a d d a 30 bits with 2 bits/symbol a a b c c a d b a a a a d d a 4 7 0 3 1 11 10 28 bits at 4 bits/symbol CSE 589 - Lecture 12 - Spring 1999

Dynamic Dictionary For a static dictionary both the encoder and decoder have to have the dictionary. Dynamic dictionary The encoder builds the dictionary as it scans the input. The decoder emulates the encoder, building the same dictionary as it decodes the string. CSE 589 - Lecture 12 - Spring 1999

LZW Compression Invented by Ziv and Lempel in 1978 and improved upon by Welch in 1984. Unix compress and GIF are based on LZW In LZW both encoder and decoder share the same indexes of the symbol alphabet ahead of time. For standard symbols sets like ASCII this is no problem. CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Algorithm Repeat find the longest match w in the dictionary output the index of w put wa in the dictionary where a was the unmatched symbol CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (1) Dictionary a b r a c a d a b a r a b r a 0 a 1 b 2 c 3 d 4 r CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (2) Dictionary a b r a c a d a b a r a b r a 0 a 1 b 2 c 3 d 4 r 5 ab CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (3) Dictionary a b r a c a d a b a r a b r a 0 a 1 b 2 c 3 d 4 r 5 ab 6 br 0 1 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (4) Dictionary a b r a c a d a b a r a b r a 0 a 1 b 2 c 3 d 4 r 5 ab 6 br 7 ra 0 1 4 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (5) Dictionary a b r a c a d a b a r a b r a 0 a 1 b 2 c 3 d 4 r 5 ab 6 br 7 ra 8 ac 0 1 4 0 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (6) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 2 c 3 d 4 r 5 ab 6 br 7 ra 8 ac 0 1 4 0 2 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (7) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 3 d 4 r 5 ab 6 br 7 ra 8 ac 0 1 4 0 2 0 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (8) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 4 r 5 ab 6 br 7 ra 8 ac 0 1 4 0 2 0 3 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (9) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 12 aba 4 r 5 ab 6 br 7 ra 8 ac 0 1 4 0 2 0 3 5 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (10) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 12 aba 4 r 13 ar 5 ab 6 br 7 ra 8 ac 0 1 4 0 2 0 3 5 0 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (11) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 12 aba 4 r 13 ar 5 ab 14 rab 6 br 7 ra 8 ac 0 1 4 0 2 0 3 5 0 7 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (12) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 12 aba 4 r 13 ar 5 ab 14 rab 6 br 15 bra 7 ra 8 ac 0 1 4 0 2 0 3 5 0 7 6 CSE 589 - Lecture 12 - Spring 1999

LZW Encoding Example (13) Dictionary a b r a c a d a b a r a b r a 0 a 9 ca 1 b 10 ad 2 c 11 da 3 d 12 aba 4 r 13 ar 5 ab 14 rab 6 br 15 bra 7 ra 8 ac 0 1 4 0 2 0 3 5 0 7 6 0 CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Algorithm Emulate the Encoder in building the dictionary. Decode each index according to its index. Problem: the current index have an incomplete entry because it is currently being added to the dictionary. The problem is solved because there is enough information in the incomplete entry to continue decoding. CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (1) Dictionary 0 1 2 4 3 6 0 a 1 b CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (2) Dictionary 0 1 2 4 3 6 0 a 1 b 2 a... a CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (3) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 b... a b CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (4) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 ab... a b ab The next index is 4, but it is incomplete! CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (5) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba a b ab The entry has a first symbol which is all we need to complete it. CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (6) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 aba... a b ab aba CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (7) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 ba... a b ab aba ba CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (8) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab a b ab aba ba complete 6 CSE 589 - Lecture 12 - Spring 1999

LZW Decoding Example (9) Dictionary 0 1 2 4 3 6 0 a 1 b 2 ab 3 ba 4 aba 5 abab 6 bab 7 bab... a b ab aba ba bab CSE 589 - Lecture 12 - Spring 1999

Trie Data Structure for Dictionary Fredkin (1960) 0 a 9 ad 1 b 10 da 2 c 11 aba 3 d 12 ar 4 r 13 ra 5 ab 14 abr 6 br 7 ac 8 ca 0 1 2 3 4 a b c d r a 13 b 5 c 7 d 9 r 12 a 8 r 6 a 10 a 11 r 14 Depending on the size of the dictionary it might be wise to have two array levels to minimize searching. CSE 589 - Lecture 12 - Spring 1999

Notes on Dictionary Coding Extremely effective when there are repeated patterns in the data that are widely spread. Where local context is not as significant. text some graphics program sources or binaries Variants of LZW are pervasive. CSE 589 - Lecture 12 - Spring 1999