Applied Algorithmics - week7


Huffman Coding (David A. Huffman, 1951)

Huffman coding uses the frequencies of symbols in a string to build a variable-rate prefix code:
- Each symbol is mapped to a binary string.
- More frequent symbols have shorter codes.
- No code is a prefix of another.

Example (tree figure omitted): A = 0, B = 100, C = 101, D = 11
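Since no code word is a prefix of another, a decoder can consume bits greedily and emit a symbol as soon as its buffer matches a code word. A minimal sketch (the helper names are mine, not from the slides):

```python
# Decoding the prefix code above (A=0, B=100, C=101, D=11).
# Prefix-freeness guarantees that the first code word matching the
# buffer is the right one, so greedy decoding is unambiguous.
code = {"A": "0", "B": "100", "C": "101", "D": "11"}
decode_table = {v: k for k, v in code.items()}

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:        # a complete code word has been read
            out.append(decode_table[buf])
            buf = ""
    assert buf == "", "input ended in the middle of a code word"
    return "".join(out)

print(decode("010011101"))  # -> ABDC
```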

Variable Rate Codes

Example:
1) A = 00; B = 01; C = 10; D = 11
2) A = 0; B = 100; C = 101; D = 11

Two different encodings of AABDDCAA:
0000011111100000 (16 bits)
00100111110100 (14 bits)
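A small sketch (not part of the original slides) that reproduces both encodings and their lengths:

```python
# Fixed-rate vs. variable-rate encoding of the same message.
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "100", "C": "101", "D": "11"}

def encode(msg: str, code: dict) -> str:
    return "".join(code[ch] for ch in msg)

msg = "AABDDCAA"
for code in (fixed, variable):
    bits = encode(msg, code)
    print(bits, len(bits))   # 16 bits for the fixed code, 14 for the variable one
```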

Cost of Huffman Trees

Let A = {a1, a2, ..., am} be the alphabet in which each symbol ai has probability pi. We can define the cost of the Huffman tree HT as

C(HT) = ∑i=1..m pi·ri

where ri is the length of the path from the root to ai. The cost C(HT) is the expected length (in bits) of a code word represented by the tree HT. The value of C(HT) is called the bit rate of the code.

Cost of Huffman Trees - example

Let a1=A, p1=1/2; a2=B, p2=1/8; a3=C, p3=1/8; a4=D, p4=1/4, with path lengths r1=1, r2=3, r3=3, and r4=2 (tree figure omitted).

C(HT) = 1·1/2 + 3·1/8 + 3·1/8 + 2·1/4 = 1.75
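The same computation in code, assuming the probabilities and path lengths above:

```python
# Expected code length (bit rate) of a prefix code: sum of p_i * r_i.
probs = {"A": 0.5, "B": 0.125, "C": 0.125, "D": 0.25}
depth = {"A": 1, "B": 3, "C": 3, "D": 2}   # path lengths from the tree

bit_rate = sum(probs[s] * depth[s] for s in probs)
print(bit_rate)   # 1.75
```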

Huffman Tree Property

Input: probabilities p1, p2, ..., pm for symbols a1, a2, ..., am from alphabet A.
Output: a tree that minimizes the average number of bits (the bit rate) needed to code a symbol from A. I.e., the goal is to minimize the function C(HT) = ∑i=1..m pi·ri, where ri is the length of the path from the root to leaf ai. Such a tree is called a Huffman tree, and the corresponding code a Huffman code, for alphabet A.

Construction of Huffman Trees

Form a (tree) node for each symbol ai with weight pi.
Insert all nodes into a priority queue PQ (e.g., a heap) ordered by the nodes' probabilities.
while (the priority queue has more than one node)
    min1 ← remove-min(PQ); min2 ← remove-min(PQ);
    create a new (tree) node T;
    T.weight ← min1.weight + min2.weight;
    T.left ← min1; T.right ← min2;
    insert(PQ, T)
return (the last node in PQ)
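Below is a runnable sketch of this pseudocode, using Python's heapq module as the priority queue. The node layout and tie-breaking are my choices, so the exact code words may differ from the slides, but the bit rate of the resulting code is the same:

```python
# Huffman tree construction: repeatedly merge the two lightest nodes.
import heapq
from itertools import count

def huffman_tree(probs):
    tick = count()  # unique tie-breaker so the heap never compares subtrees
    pq = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(pq)
    while len(pq) > 1:                       # "more than one node"
        p1, _, left = heapq.heappop(pq)      # min1
        p2, _, right = heapq.heappop(pq)     # min2
        heapq.heappush(pq, (p1 + p2, next(tick), (left, right)))
    return pq[0][2]                          # the last node in PQ

def codes(tree, prefix=""):
    if isinstance(tree, str):                # leaf: a symbol
        return {tree: prefix or "0"}
    left, right = tree
    return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

print(codes(huffman_tree({"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.1, "E": 0.1})))
```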

Construction of Huffman Trees - example

P(A) = 0.4, P(B) = 0.1, P(C) = 0.3, P(D) = 0.1, P(E) = 0.1

(The original slides animate the construction with tree figures; the merge steps they show are:)
1. Merge the two smallest, D (0.1) and E (0.1), into a node of weight 0.2.
2. Merge B (0.1) and the {D,E} node (0.2) into a node of weight 0.3.
3. Merge C (0.3) and the {B,D,E} node (0.3) into a node of weight 0.6.
4. Merge A (0.4) and the {C,B,D,E} node (0.6) into the root of weight 1.0.

Reading 0 for each left edge and 1 for each right edge gives the codes:
A = 0, B = 100, C = 11, D = 1010, E = 1011

Huffman Codes

Theorem: For any source S, the Huffman code can be computed efficiently in time O(n·log n), where n is the size of the source S.
Proof (sketch): The time complexity of the Huffman coding algorithm is dominated by the use of the priority queue: each of the O(n) merge steps costs O(log n).
One can also prove that Huffman coding creates the most efficient set of prefix codes for a given text, and it is one of the most efficient entropy coders.

Basics of Information Theory

The entropy of an information source (string) S built over alphabet A = {a1, a2, ..., am} is defined as:

H(S) = ∑i pi·log2(1/pi)

where pi is the probability that symbol ai occurs in S. The term log2(1/pi) indicates the amount of information contained in ai, i.e., the number of bits needed to code ai. For example, in an image with a uniform distribution of gray-level intensities, all pi = 1/256, so the number of bits needed to encode each gray level is log2(256) = 8, and the entropy of this image is 8 bits per pixel.

Huffman Code vs. Entropy

P(A) = 0.4, P(B) = 0.1, P(C) = 0.3, P(D) = 0.1, P(E) = 0.1

Entropy:
0.4·log2(10/4) + 0.1·log2(10) + 0.3·log2(10/3) + 0.1·log2(10) + 0.1·log2(10) ≈ 2.05 bits per symbol

Huffman code (using the code lengths from the tree above):
0.4·1 + 0.1·3 + 0.3·2 + 0.1·4 + 0.1·4 = 2.10 bits per symbol

Not bad, not bad at all.
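A quick check of both numbers, assuming the code lengths from the tree built above:

```python
# Entropy vs. Huffman bit rate for the example distribution.
from math import log2

p = {"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.1, "E": 0.1}
length = {"A": 1, "B": 3, "C": 2, "D": 4, "E": 4}   # Huffman code lengths

entropy = sum(pi * log2(1 / pi) for pi in p.values())
bit_rate = sum(p[s] * length[s] for s in p)
print(round(entropy, 2), bit_rate)   # 2.05 2.1
```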

Lempel-Ziv-Welch Compression

The Lempel-Ziv-Welch (LZW) compression algorithm is an example of a dictionary-based method, in which longer fragments of the input text are replaced by much shorter references to code words stored in a special set called the dictionary. LZW is a lossless data compression algorithm, published by Terry Welch in 1984 as an improved version of the LZ78 dictionary coding algorithm developed by Abraham Lempel and Jacob Ziv.

LZW Compression

The key insight of the method is that it is possible to build the dictionary of previously seen strings automatically, while the text is being compressed. The dictionary starts off with 256 entries, one for each possible character (single-byte string). Every time a string not already in the dictionary is seen, a longer string, consisting of that string appended with the single character following it in the text, is stored in the dictionary (see the pseudocode and the runnable sketch below).

LZW Compression

The output consists of integer indices into the dictionary. These are initially 9 bits each and, as the dictionary grows, can increase up to 16 bits. A special symbol is reserved for "flush the dictionary", which takes the dictionary back to the original 256 entries and 9-bit indices. This is useful when compressing a text whose characteristics vary, since a dictionary built from early material is not of much use later in the text. This use of variably increasing index sizes is one of Welch's contributions; another was to specify an efficient data structure to store the dictionary.
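As an illustration (my sketch, not from the slides; real implementations differ in details such as when exactly the width grows and how the flush symbol is allocated), the index width needed to address a dictionary of a given size is:

```python
# Index width rule: wide enough to address every entry, from 9 up to 16 bits.
def index_width(dict_size: int) -> int:
    return min(16, max(9, (dict_size - 1).bit_length()))

for size in (257, 512, 513, 1024, 70000):
    print(size, index_width(size))   # widths grow 9 -> 9 -> 10 -> 10 -> 16
```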

LZW Compression - example

Fibonacci language: w0 = a, w1 = b, wi = wi-1·wi-2 for i > 1. For example, w6 = babbababbabba. We show how LZW compresses babbababbabba. (The slide's figure, which lays the text out over positions 1-14 with the code words CW0-CW5 and a trailing "virtual part" marked underneath, is omitted.) The text is parsed into the phrases b | a | b | ba | bab | babb | a, creating the dictionary entries cw0, ..., cw5 listed on the next slide.

In general: CWi = CWj ∘ First(CWi+1) with j < i. And in particular: CW4 = CW3 ∘ First(CW5).

LZW Compression - example

The dictionary built while compressing babbababbabba (trie figure omitted):
cw-2 = b, cw-1 = a (initial single-character entries)
cw0 = ba
cw1 = ab
cw2 = bb
cw3 = bab
cw4 = babb
cw5 = babba

LZW Compression - compression stage

cw ← ε;
while ( read next symbol s from IN )
    if cw·s exists in the dictionary then
        cw ← cw·s;
    else
        add cw·s to the dictionary;
        save cw in OUT;
        cw ← s;
save cw in OUT;  (flush the last code word once the input ends)
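A runnable sketch of this compression stage (the alphabet parameter, the names, and the use of plain integers instead of variable-width bit fields are my simplifications). Running it on the Fibonacci word from the example reproduces the parse b | a | b | ba | bab | babb | a:

```python
# LZW compression: extend the current word cw while cw+s is in the
# dictionary; otherwise add cw+s, emit cw's index, and restart from s.
def lzw_compress(text: str, alphabet: str = "ab") -> list:
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    out, cw = [], ""
    for s in text:
        if cw + s in dictionary:
            cw += s                               # keep extending the match
        else:
            dictionary[cw + s] = len(dictionary)  # add cw·s to the dictionary
            out.append(dictionary[cw])            # save cw in OUT
            cw = s
    out.append(dictionary[cw])                    # flush the last code word
    return out

print(lzw_compress("babbababbabba"))  # [1, 0, 1, 2, 5, 6, 0]
```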

Decompression stage

Input IN - compressed file of integers. Output OUT - decompressed file of characters. |IN| = Z - size of the compressed file.

Copy all numbers from file IN to vector V[256 .. Z+255].
Create vector F[256 .. Z+255] containing the first character of each code word.
Create vector CW[256 .. Z+255] of all code words:
for i = 256 to Z+255 do
    if V[i] < 256 then CW[i] ← Concatenate(char(V[i]), F[i+1])
    else CW[i] ← Concatenate(CW[V[i]], F[i+1])
Write to the output file OUT all code words without their last symbols.
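For comparison, here is a runnable decoder sketch. It is not the slides' vector formulation but the standard textbook single-pass reconstruction, which must handle one tricky case explicitly: an index that refers to the dictionary entry being created at that very step.

```python
# LZW decompression: rebuild the dictionary while decoding. If an index
# is not yet in the dictionary, it was added during the step that emitted
# it, so the entry must be prev + First(prev).
def lzw_decompress(codes: list, alphabet: str = "ab") -> str:
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    prev = dictionary[codes[0]]
    out = [prev]
    for idx in codes[1:]:
        if idx in dictionary:
            entry = dictionary[idx]
        else:                              # the "defined this very step" case
            entry = prev + prev[0]
        dictionary[len(dictionary)] = prev + entry[0]  # prev·First(entry)
        out.append(entry)
        prev = entry
    return "".join(out)

print(lzw_decompress([1, 0, 1, 2, 5, 6, 0]))  # babbababbabba
```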

Theorem: For any input string S, the LZW algorithm computes its compressed counterpart in time O(n), where n is the length of S.
Proof (sketch): The most complex and expensive operations are those performed on the dictionary. However, with the help of hash tables, all dictionary operations can be performed in overall linear time. The decompression process is also linear.