More on Canonical Huffman coding

As we have seen, canonical Huffman coding allows faster decompression and a more efficient use of memory. But... until now we have supposed that the code lengths are given. In order to prove the viability of canonical Huffman coding, we need a way to compute the code lengths.

Computing code lengths - I

The efficiency of computing the code lengths affects only encoding performance, and encoding is less important than decoding. However, when we use whole words as source symbols, it is common to have an alphabet composed of many thousands of different symbols (words). In this case efficient solutions can make a real difference w.r.t. the use of the traditional Huffman tree.

The problem

Input data: a file of n integers, where the i-th integer is the number of times symbol i appears in the text. If f_i is the i-th integer in the file, the probability of the i-th symbol is p_i = f_i / (f_1 + ... + f_n). Only small changes are needed if we directly have a file with the probabilities.

Output data: the Huffman code lengths.
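As a minimal sketch of this preprocessing (the one-count-per-line file format and the function name are my assumptions, not part of the slides):

    def read_probabilities(path):
        # Read one occurrence count per line; symbol i is line i.
        with open(path) as fh:
            freqs = [int(line) for line in fh if line.strip()]
        total = sum(freqs)
        return [f / total for f in freqs]   # p_i = f_i / (f_1 + ... + f_n)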

The key idea

We use a heap data structure. It makes it easy to find the smallest value, which is located at the root, and there is an elegant and efficient way to store the binary tree in memory using a simple vector.

A closer view to the heap - I

As we have already seen, the left child of a node in position i is in position 2i, while the right child is stored in location 2i+1. This means that the parent of a node in position k is in location ⌊k/2⌋. What is the depth of the heap if there are n elements? SOL. ⌊log₂ n⌋
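These index rules translate directly into code; a small sketch (the helper names are mine), assuming a 1-based array with position 0 unused:

    def left(i):   return 2 * i        # left child of the node in position i
    def right(i):  return 2 * i + 1    # right child
    def parent(k): return k // 2       # floor(k/2)

    def heap_depth(n):
        return n.bit_length() - 1      # floor(log2(n)), the depth of an n-element heap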

A closer view to the heap - II

How is this heap stored in an array?
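The figure from the original slide is not reproduced here; as an illustrative stand-in (the values below are made up), a min-heap and its array layout look like this:

    #           2
    #         /   \
    #        5     3
    #       / \   /
    #      9   8 7
    #
    # position: 1  2  3  4  5  6
    A = [None,  2, 5, 3, 9, 8, 7]   # A[0] unused; children of A[i] sit at A[2i] and A[2i+1]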

A closer view to the heap - III

Removing the smallest item and reorganizing the heap costs 2 comparisons for each level of the tree. [figure: a worked example in which the last element replaces the removed root and is compared, level by level, with the smaller of its children]
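A sketch of this step in Python (function names are mine; this version compares values directly, while the indirect variant used later compares pointed-to frequencies): finding the smaller child costs one comparison per level, and deciding whether to swap costs the other.

    def sift_down(A, k, h):
        # Restore the min-heap property in A[1..h] below position k.
        while 2 * k <= h:
            c = 2 * k
            if c + 1 <= h and A[c + 1] < A[c]:   # comparison 1: find the smaller child
                c += 1
            if A[k] <= A[c]:                     # comparison 2: heap property holds, stop
                break
            A[k], A[c] = A[c], A[k]
            k = c

    def remove_min(A, h):
        # Remove and return the smallest item of the heap A[1..h];
        # afterwards the heap occupies A[1..h-1].
        smallest = A[1]
        A[1] = A[h]                # the last element replaces the root
        sift_down(A, 1, h - 1)     # ...and is sifted down
        return smallest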

Construction of the heap

It is possible to prove that constructing a heap from an unsorted list of n items requires about 2n comparisons. By the way, how much does it cost, in the worst case, to sort the vector with one of the popular sorting algorithms? SOL. O(n log n)
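The ~2n bound comes from building the heap bottom-up rather than by repeated insertion; a sketch reusing sift_down from the previous snippet:

    def build_heap(A, n):
        # Sift down every internal node, from the last one back to the root.
        # Most nodes are near the bottom and sift only a little,
        # which is what keeps the total near 2n comparisons.
        for k in range(n // 2, 0, -1):
            sift_down(A, k, n)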

Computing code lengths - II

The algorithm works by constructing an initial array with 2n positions:
- the last n positions store the frequencies of the symbols;
- the first half of the array contains a heap, used to efficiently locate the symbol with the lowest frequency.

As entries are removed from the heap, the freed space is used to store branches of the tree.

Computing code lengths - III

The frequencies, too, are overwritten, to store the pointers that constitute the tree. At the end the array contains only a one-element heap and the Huffman tree. The array evolves as follows:

    initially:        positions 1..n: heap     | positions n+1..2n: leaves (frequencies)
    during phase 2:   positions 1..h: heap     | positions h+1..2n: leaves & tree pointers
    at the end (h=1): position 1: trivial heap | positions 2..2n: tree pointers

Computing code lengths: phase 1

Frequencies are read from the file and stored in the last n positions of the array (let's call it A). Each position i, for i = 1..n, points to the corresponding frequency: A[i] = n+i. Then the first half of A is ordered into a heap with the method seen before. Since the heap stores pointers, in practice we must ensure that A[A[i]] ≤ A[A[2i]] and A[A[i]] ≤ A[A[2i+1]]. At the end A[1] stores m1 such that A[m1] = min{A[n+1...2n]}.

Computing code lengths: phase 2

h = n
while h > 1 {
    m1 = A[1]                -- take the root of the heap
    h = h - 1                -- the old last position, h+1, is no longer part of the heap
    "reorder the heap"
    m2 = A[1]                -- now m1, m2 point to the two smallest frequencies
    A[h+1] = A[m1] + A[m2]   -- the new item is saved in position h+1
    A[1] = h + 1             -- the new element is pushed back into the heap
    A[m1] = A[m2] = h + 1    -- the two smallest frequencies are discarded and
                             -- changed into tree pointers
    "reorder the heap"
}
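Putting phases 1 and 2 together, here is a runnable sketch (the function name is mine; note the indirect comparisons A[A[·]], because the heap stores pointers rather than frequencies):

    def huffman_merge(freqs):
        # Phases 1-2: build the 2n-position array and perform the n-1 merges.
        # On return, positions 2..2n hold parent pointers (assumes n >= 2).
        n = len(freqs)
        A = [0] * (2 * n + 1)               # A[0] unused, to keep 1-based indexing
        for i, f in enumerate(freqs, start=1):
            A[n + i] = f                    # leaves (frequencies) in A[n+1..2n]
            A[i] = n + i                    # heap of pointers in A[1..n]

        def sift_down(k, h):
            # Indirect min-heap: compare the pointed-to values A[A[.]].
            while 2 * k <= h:
                c = 2 * k
                if c + 1 <= h and A[A[c + 1]] < A[A[c]]:
                    c += 1
                if A[A[k]] <= A[A[c]]:
                    break
                A[k], A[c] = A[c], A[k]
                k = c

        for k in range(n // 2, 0, -1):      # phase 1: order A[1..n] into a heap
            sift_down(k, n)

        h = n
        while h > 1:                        # phase 2
            m1 = A[1]                       # pointer to the smallest frequency
            h -= 1
            A[1] = A[h + 1]                 # the freed last element replaces the root
            sift_down(1, h)                 # "reorder the heap"
            m2 = A[1]                       # pointer to the second smallest
            A[h + 1] = A[m1] + A[m2]        # the new aggregate is saved in position h+1
            A[1] = h + 1                    # ...and its pointer is pushed back into the heap
            A[m1] = A[m2] = h + 1           # the two children become tree pointers
            sift_down(1, h)                 # "reorder the heap" again
        return A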

Example

[figure: a worked example of phase 2 on a small array, showing the heap, the pointers m1 and m2, and the aggregate stored in the freed position h+1 after each merge]

Computing code lengths: phase 3

After n-1 iterations a single aggregate remains in the heap, in position 2, and A[1] = 2, as this is the only item left in the heap. To find the depth in the tree of a particular leaf we can simply start from it and follow the pointers until we reach location 2. The number of pointers followed is the desired codeword length.

for i = n+1 to 2n {                        -- the leaves are in positions n+1..2n
    d = 0; r = i                           -- d is the counter, r is the current element
    while r > 2 { d = d + 1; r = A[r] }    -- follow the pointers to location 2
    A[i] = d                               -- now A[i] is the length of codeword i
}
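On the array returned by the huffman_merge sketch above, this phase reads:

    def code_lengths_naive(A, n):
        # Follow the parent pointers from each leaf up to position 2.
        for i in range(n + 1, 2 * n + 1):   # the leaves live in positions n+1..2n
            d, r = 0, i                     # d counts the pointers followed
            while r > 2:
                d += 1
                r = A[r]                    # hop toward location 2
            A[i] = d                        # A[i] is now the length of codeword i
        return A[n + 1:]                    # lengths in symbol order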

The cost of the algorithm

- first phase → O(n)
- second phase → O(n log n): a heap of n elements is reordered about 2n times, and each reordering takes at most ⌊log₂ n⌋ iterations, each of which has a constant cost
- third phase → O(n²) in the worst case: there is one iteration for each bit of each codeword. How many bits in total? With a uniform distribution → about n·log n bits; in the worst case the i-th symbol requires i bits to be coded, for about n²/2 bits in total.

Revised third phase - I

Note that nodes are added to the tree from position h toward position 2; for this reason all pointers of the tree go "from right to left", i.e. from higher positions to lower ones. Then, if we start from position 2 and proceed towards position 2n, whenever we find a node we have always already found its parent!

Revised third phase - II

So a very efficient, O(n), algorithm to find the code lengths is to start from position 2, which has length 0, and then proceed towards position 2n, labeling each position with the length of its parent augmented by 1. The third phase then becomes

A[2] = 0
for i = 3 to 2n
    A[i] = A[A[i]] + 1    -- A[i] points to the parent, whose depth A[A[i]] is already known
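Combined with the huffman_merge sketch from phase 2, this yields the complete computation; the test frequencies below are my own and produce the skewed tree mentioned in the cost analysis:

    def code_lengths(freqs):
        n = len(freqs)
        A = huffman_merge(freqs)         # phases 1-2 from the earlier sketch
        A[2] = 0                         # the root has depth 0
        for i in range(3, 2 * n + 1):
            A[i] = A[A[i]] + 1           # the parent's depth A[A[i]] is already computed
        return A[n + 1:]                 # the code length of symbol i is A[n+i]

    print(code_lengths([1, 1, 2, 4, 8]))   # -> [4, 4, 3, 2, 1]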