Adaptive Huffman Coding. Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2005-2006


Adaptive Huffman Coding

Why Adaptive Huffman Coding? Huffman coding suffers from the fact that the decompressor must have some knowledge of the probabilities of the symbols in the compressed file; transmitting this information can cost extra bits. If this information is unavailable, compressing the file requires two passes. First pass: find the frequency of each symbol and construct the Huffman tree. Second pass: compress the file.
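The two-pass procedure can be sketched as follows; this is a minimal illustration in Python, not the course's reference code, and the names `huffman_code` and `encode` are chosen here:

```python
from collections import Counter
import heapq

def huffman_code(message):
    # First pass over the data: count symbol frequencies,
    # then build the Huffman tree (assumes >= 2 distinct symbols).
    freq = Counter(message)
    # Heap entries: (weight, tiebreak id, {symbol: codeword-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # the two lightest subtrees...
        w2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, i, merged))  # ...are merged
    return heap[0][2]

def encode(message, codes):
    # Second pass: only now can the file actually be compressed.
    return "".join(codes[s] for s in message)
```

For the message "aabbbcccc", for instance, the most frequent symbol c gets a 1-bit codeword while a and b get 2 bits each, so the encoded message takes 14 bits; the point of the slide is that neither pass can start before the whole input has been read once.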

The key idea The key idea is to build a Huffman tree that is optimal for the part of the message already seen, and to reorganize it when needed, to maintain its optimality.

Pro & Con - I Adaptive Huffman coding determines the mapping to codewords using a running estimate of the source symbol probabilities. This gives effective exploitation of locality. For example, suppose a file starts out with a run of a character that is not repeated again in the file. In static Huffman coding that character will end up low in the tree because of its low overall count, thus taking many bits to encode. In adaptive Huffman coding the character will be inserted at the highest possible leaf, and only later pushed down the tree by higher-frequency characters.

Pro & Con - II Only one pass over the data is needed. Overhead: in static Huffman we must somehow transmit the model used for compression, i.e. the tree shape; this costs about 2n bits in a clever representation. As we will see, in adaptive schemes the overhead is about n log n bits. Sometimes encoding needs a few more bits than static Huffman (without overhead), but adaptive schemes generally compare well with static Huffman once overhead is taken into account.

Some history Adaptive Huffman coding was first conceived independently by Faller (1973) and Gallager (1978). Knuth contributed improvements to the original algorithm (1985), and the resulting algorithm is referred to as algorithm FGK. A more recent version of adaptive Huffman coding is described by Vitter (1987) and called algorithm V.

An important question By better exploiting locality, adaptive Huffman coding is sometimes able to do better than static Huffman coding, i.e., for some messages it achieves greater compression... but we have established the optimality of static Huffman coding, in the sense of minimal redundancy. Is there a contradiction? No: that optimality result holds among codes that use one fixed codeword per symbol for the whole message, while an adaptive code changes its codewords as it goes.

Algorithm FGK - I The basis for algorithm FGK is the sibling property (Gallager 1978): a binary code tree with nonnegative weights has the sibling property if each node (except the root) has a sibling and if the nodes can be numbered in order of nondecreasing weight with each node adjacent to its sibling; moreover, the parent of a node is higher in the numbering. A binary prefix code is a Huffman code if and only if its code tree has the sibling property.
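The definition can be turned directly into a small check. A sketch, with names of my choosing, where the tree is given as weight and parent maps keyed by node number 1..m:

```python
def has_sibling_property(weight, parent):
    """weight[i] and parent[i] for nodes numbered 1..m (root's parent: None).
    Checks Gallager's sibling property for this particular numbering."""
    m = len(weight)
    # Weights must be nondecreasing in the node numbering.
    if any(weight[i] > weight[i + 1] for i in range(1, m)):
        return False
    # Nodes 1&2, 3&4, ... must be siblings (adjacent in the numbering).
    for k in range(1, m, 2):
        if parent.get(k) is None or parent.get(k) != parent.get(k + 1):
            return False
    # Each node's parent must be higher in the numbering.
    return all(parent[i] is None or parent[i] > i for i in range(1, m + 1))
```

For example, two weight-1 leaves numbered 1 and 2 under a root numbered 3 satisfy the property; exchanging the weights of nodes 1 and 2 with 2 and 1 would break the nondecreasing-weight condition.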

Algorithm FGK - II Note that the node numbering corresponds to the order in which the nodes are combined by Huffman's algorithm: first nodes 1 and 2, then nodes 3 and 4, and so on. [Figure: example Huffman tree with leaves a(2), b(3), c(5), d(5), e(6), f(11)]

Algorithm FGK - III In algorithm FGK, both encoder and decoder maintain dynamically changing Huffman code trees. For each symbol the encoder sends the codeword for that symbol in the current tree and then updates the tree. The problem is to change quickly the tree that is optimal after t symbols (not necessarily distinct) into the tree that is optimal for t+1 symbols. If we simply increment the weight of the (t+1)-th symbol and of all its ancestors, the sibling property may no longer hold, so we would have to rebuild the tree.

Algorithm FGK - IV Suppose the next symbol is "b": if we simply update the weights, the sibling property is violated! The nodes are no longer ordered by nondecreasing weight, so this is no longer a Huffman tree. [Figure: the example tree after naively incrementing the weight of b and its ancestors]

Algorithm FGK - V The solution can be described as a two-phase process. First phase: the original tree is transformed into another valid Huffman tree for the first t symbols, one with the property that the simple increment process can be applied successfully. Second phase: the increment process, as described previously.

Algorithm FGK - V (cont.) The first phase starts at the leaf of the (t+1)-th symbol. We swap this node, together with its whole subtree but not its numbering, with the highest-numbered node of the same weight. The new current node is the parent of this latter node. The process is repeated until we reach the root.

Algorithm FGK - VI First phase: node 2, nothing to be done; node 4, to be swapped with node 5; node 8, to be swapped with node 9; root reached, stop! Then the second phase (incrementing) is applied. [Figure: the example tree during the two-phase update for symbol b]
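Putting the two phases together, the update can be sketched as follows. This is an illustrative Python sketch assuming an explicit node representation of my choosing, not the deck's code:

```python
class Node:
    def __init__(self, weight, number, symbol=None):
        self.weight, self.number, self.symbol = weight, number, symbol
        self.parent = self.left = self.right = None

def attach(parent, left, right):
    parent.left, parent.right = left, right
    left.parent = right.parent = parent
    return parent

def swap(a, b):
    # Exchange the subtrees rooted at a and b, but not their numbers:
    # numbers stay attached to the positions in the ordering.
    a.number, b.number = b.number, a.number
    pa, pb = a.parent, b.parent
    side_a = "left" if pa.left is a else "right"
    side_b = "left" if pb.left is b else "right"
    setattr(pa, side_a, b)
    setattr(pb, side_b, a)
    a.parent, b.parent = pb, pa

def fgk_update(nodes, leaf):
    node = leaf
    while node is not None:
        # Phase 1: swap with the highest-numbered node of equal weight
        # (never with one's own parent, which can share the weight).
        cand = max((x for x in nodes if x.weight == node.weight),
                   key=lambda x: x.number)
        if cand is not node and cand is not node.parent:
            swap(node, cand)
        node.weight += 1          # Phase 2: increment along the path
        node = node.parent
```

On a small tree with leaves a(1), b(1), c(2) and the numbering a=1, b=2, internal=3, c=4, root=5, updating for symbol a swaps a with b, then the internal node with c, and leaves the weights nondecreasing in the node numbering, i.e. the sibling property holds again.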

Why does FGK work? The two-phase procedure builds a valid Huffman tree for t+1 symbols, as the sibling property is satisfied. In fact, we swap each node whose weight is to be increased with the highest-numbered node of the same weight, so after the increment there is no higher-numbered node with the previous weight.

The Not Yet Seen problem - I When the algorithm starts, and sometimes during encoding, we encounter a symbol that has not been seen before. How do we handle this? We use a single 0-node (with weight 0) that represents all the unseen symbols. When a new symbol appears, we send the code for the 0-node followed by some bits to identify the new symbol. If each time we send log n bits to identify the symbol, the total overhead is n log n bits. It is possible to do better by sending only the index of the symbol in the list of the currently unseen symbols; in this way we can save some bits on average.
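The difference between the two identification schemes is easy to quantify. A small sketch (function names are mine, and fixed-length indices are assumed in both cases):

```python
import math

def fixed_id_overhead(n):
    # Naive scheme: each of the n new symbols is identified
    # with ceil(log2 n) bits, n * ceil(log2 n) bits in total.
    return n * math.ceil(math.log2(n))

def shrinking_id_overhead(n):
    # Better scheme: send the index in the list of *currently* unseen
    # symbols; the k-th new symbol needs ceil(log2(n - k + 1)) bits.
    return sum(math.ceil(math.log2(n - k + 1)) for k in range(1, n + 1))
```

With n = 8, for instance, the naive scheme spends 24 bits on identification while the shrinking-list scheme spends 17; the last unseen symbol even costs 0 bits, since there is only one candidate left.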

The Not Yet Seen problem - II Then the 0-node is split into two sibling leaves: one for the new symbol, with weight 1, and a new 0-node. Then the tree is updated as seen before, in order to satisfy the sibling property.

Algorithm FGK - summary The algorithm starts with only one leaf node, the 0-node. As symbols arrive, new leaves are created and the tree is recomputed each time. Each symbol is coded with its codeword in the current tree, and then the tree is updated. Unseen symbols are coded with the 0-node codeword followed by some additional bits to specify the symbol.

Algorithm FGK - VII Algorithm FGK compares favourably with static Huffman coding if we also consider overhead costs (it is used in the Unix utility compact). Exercise: construct the static Huffman tree and the FGK tree for the message "e eae de eabe eae dcf" and evaluate the number of bits needed for the coding with both algorithms, ignoring the overhead for Huffman. Solution: FGK needs about 60 bits, Huffman about 52 bits; the FGK figure is obtained using the minimum number of bits for the element in the list of unseen symbols.

Algorithm FGK - VIII If T is the total number of bits transmitted by algorithm FGK for a message of length t containing n distinct symbols, then S ≤ T ≤ 2S + t, where S is the performance of the static Huffman code (Vitter 1987). So the performance of algorithm FGK is never much worse than twice optimal.

Algorithm V - I In his 1987 work, Vitter introduced two improvements over algorithm FGK, calling the new scheme algorithm Λ: as a tribute to his work, the algorithm has become famous with the letter V flipped upside-down.

The key ideas - I Swapping of nodes during encoding and decoding is onerous. In algorithm FGK the number of swappings (counting a double cost for the updates that move a swapped node two levels higher) is bounded by ⌊l_t/2⌋, where l_t is the length of the codeword of the added symbol in the old tree (this bound requires some effort to prove and is due to the work of Vitter). In algorithm V, the number of swappings per update is bounded by 1.

The key ideas - II Moreover, algorithm V minimizes not only Σ_j w_j l_j, as Huffman and FGK do, but also max_j l_j, i.e. the height of the tree, and Σ_j l_j, i.e. the tree is better suited to code the next symbol, given that it could be represented by any of the leaves. These two objectives are reached through a new numbering scheme, called implicit numbering.

Implicit numbering The nodes of the tree are numbered in increasing order by level: nodes on one level are numbered lower than the nodes on the next higher level, and nodes on the same level are numbered in increasing order from left to right. If this numbering is satisfied (and in FGK it is not always satisfied), certain types of updates cannot occur.

An invariant The key to minimizing the other kinds of interchanges is to maintain the following invariant: for each weight w, all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w. The interchanges in algorithm V are designed to restore the implicit numbering when a new symbol is read, and to preserve the invariant.
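Both the implicit numbering and the invariant can be checked mechanically. A sketch, where the `N` node class and the function names are mine:

```python
class N:
    def __init__(self, weight, left=None, right=None):
        self.weight, self.left, self.right = weight, left, right

def implicit_order(root):
    # Collect levels top-down, then reverse: lowest level first, and
    # left-to-right within each level (Vitter's implicit numbering).
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c]
    return [n for level in reversed(levels) for n in level]

def invariant_holds(root):
    # For each weight w, all leaves of weight w must precede all
    # internal nodes of weight w (and weights must be nondecreasing).
    keys = [(n.weight, n.left is not None) for n in implicit_order(root)]
    return keys == sorted(keys)
```

For example, a root of weight 6 whose left child is a leaf of weight 3 and whose right child is an internal node of weight 3 (with leaves 1 and 2 below it) satisfies the invariant; mirroring the two children of the root violates it, because the internal weight-3 node would then precede the weight-3 leaf.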

Algorithm V - II If T is the total number of bits transmitted by algorithm V for a message of length t containing n distinct symbols, then S ≤ T ≤ S + t, where S is again the performance of the static Huffman code. At worst, then, Vitter's adaptive method may transmit one more bit per codeword than the static Huffman method. Empirically, algorithm V slightly outperforms algorithm FGK.