ENHANCED EXTRACTION FROM HUFFMAN ENCODED FILES
Shmuel T. Klein (Bar Ilan University)    Dana Shapira (Ariel University)
PSC, August 2015

FIXED AND VARIABLE LENGTH CODES
[slide figure: example contrasting a fixed length code with a variable length (Huffman) code]
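A toy illustration of the point of this slide (the symbols and frequencies below are assumed for the example, not taken from the slides): with skewed frequencies, a variable length (Huffman) code beats a fixed length code on average codeword length.

# Fixed-length vs. variable-length (Huffman) coding of the same source.
freqs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
fixed = {s: 2 for s in freqs}                  # 4 symbols -> 2 bits each
huffman = {"a": 1, "b": 2, "c": 3, "d": 3}     # Huffman codeword lengths

avg = lambda lengths: sum(freqs[s] * lengths[s] for s in freqs)
print(avg(fixed), avg(huffman))                # 2.0 vs. 1.75 bits per symbol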

• Divide the encoded file into blocks of size b
• Use an auxiliary bit vector to indicate the beginning of each block
• Time – O(b)
• Time vs. memory storage tradeoff
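A minimal runnable sketch of this blocking idea, under the assumption (one common variant) that the boundary of every b-th codeword is sampled, so extracting the i-th symbol decodes at most b codewords from the nearest sampled position; the toy prefix code and the helper names are ours, not the slides'.

# Block-based random access to a Huffman-encoded bit string.
def build_samples(bits, decode_table, b):
    # bits: string of '0'/'1'; decode_table: codeword -> symbol.
    # Returns the bit offsets of codewords number 0, b, 2b, ...
    samples, count, cw = [], 0, ""
    for j, bit in enumerate(bits):
        if cw == "" and count % b == 0:
            samples.append(j)
        cw += bit
        if cw in decode_table:          # a full codeword has been read
            count += 1
            cw = ""
    return samples

def extract_blocked(bits, decode_table, samples, b, i):
    # Return the i-th (0-based) encoded symbol, decoding at most b codewords.
    pos, cw, skipped = samples[i // b], "", 0
    while True:
        cw += bits[pos]; pos += 1
        if cw in decode_table:
            if skipped == i % b:
                return decode_table[cw]
            skipped, cw = skipped + 1, ""

decode = {"0": "a", "10": "b", "11": "c"}       # toy canonical prefix code
bits = "010110111010"                           # encodes a b c a c b b
samples = build_samples(bits, decode, b=3)      # -> [0, 5, 10]
assert extract_blocked(bits, decode, samples, 3, 4) == "c"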

• Grossi, Gupta and Vitter – Wavelet trees

RANK AND SELECT
• rank_1(B,i) (resp. rank_0(B,i)) – the number of 1s (resp. 0s) up to and including position i in B
• select_1(B,i) (resp. select_0(B,i)) – returns the index of the i-th 1 (resp. 0) in B
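A minimal reference implementation of these two operations on a plain bit list (the real data structures answer them in constant or near-constant time with o(n) extra space; this sketch only pins down the semantics used in the slides).

def rank(B, bit, i):
    # Number of occurrences of `bit` in B[0..i], inclusive.
    return sum(1 for b in B[:i + 1] if b == bit)

def select(B, bit, i):
    # Index of the i-th (1-based) occurrence of `bit` in B, or -1 if absent.
    count = 0
    for j, b in enumerate(B):
        if b == bit:
            count += 1
            if count == i:
                return j
    return -1

B = [1, 0, 1, 1, 0, 0, 1]
assert rank(B, 1, 3) == 3        # three 1s in B[0..3]
assert select(B, 0, 2) == 4      # the second 0 is at index 4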

• Random Access to Huffman files
• Enhanced Direct Access
• Rank and Select using Skeleton Wavelet trees
• Reduced Skeleton trees
• Experimental Results

Canonical Huffman tree: when scanning its leaves from left to right, they appear in non-decreasing order of their depth.
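A short sketch of the standard canonical code construction that yields this property: symbols are processed in non-decreasing order of codeword length and receive consecutive binary values, so shallower leaves end up to the left of deeper ones. (The toy lengths at the end are assumed, only to show the shape of the output.)

def canonical_code(lengths):
    # lengths: dict symbol -> codeword length.
    code, prev_len, table = 0, 0, {}
    for sym, ln in sorted(lengths.items(), key=lambda x: (x[1], x[0])):
        code <<= (ln - prev_len)        # shift when moving to a longer length
        table[sym] = format(code, "0{}b".format(ln))
        code += 1
        prev_len = ln
    return table

print(canonical_code({"a": 1, "b": 2, "c": 3, "d": 3}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}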

• T = A--HUFFMAN--WAVELET--TREE--MATTERS
• Σ = {-, E, A, T, F, M, R, H, L, N, S, U, V, W}
• fr = {8, 5, 4, 4, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1}
• E(T) = …
[slide figure: wavelet tree over Σ, with leaves grouped from left to right as -, E A T, F M, R H L N S U V W]

[slide figure repeated: the wavelet tree bitmaps, leaves -, E A T, F M, R H L N S U V W]
• E(T) = …
• The bitmaps can be stored as a single bit stream

EXTRACT
extract(v, i)
    cw ← ε                          // start with the empty codeword
    while v is not a leaf
        if B_v[i] = 0
            i ← rank_0(B_v, i)      // position within the left child
            v ← left(v)
            cw ← cw · 0
        else
            i ← rank_1(B_v, i)      // position within the right child
            v ← right(v)
            cw ← cw · 1
    return D(cw)
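A runnable sketch of this procedure on a code-shaped wavelet tree. The layout is an assumption made to keep the code short: the tree shape is given by a prefix code (symbol → codeword), each internal node is identified by its codeword prefix and stores the routing bits of the symbols that pass through it, and D is the codeword-to-symbol map.

def build(text, code):
    # Return {node_prefix: bitmap}; '' is the root node.
    bitmaps = {}
    for ch in text:
        cw = code[ch]
        for d, bit in enumerate(cw):
            bitmaps.setdefault(cw[:d], []).append(int(bit))
    return bitmaps

def rank(B, bit, i):
    return sum(1 for b in B[:i + 1] if b == bit)

def extract(bitmaps, decode, i):
    # Return the i-th (0-based) symbol of the encoded text.
    path = ""                        # current node = codeword prefix read so far
    while path not in decode:        # stop once the path is a full codeword
        B = bitmaps[path]
        bit = B[i]
        i = rank(B, bit, i) - 1      # 0-based position within the chosen child
        path += str(bit)
    return decode[path]

code = {"a": "0", "b": "10", "c": "11"}
decode = {v: k for k, v in code.items()}
text = "abcacbb"
bm = build(text, code)
assert [extract(bm, decode, i) for i in range(len(text))] == list(text)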

[slide figure: skeleton Huffman tree, with the pruned leaves covering E A T, F M, R H L N S U V W]
• A canonical Huffman tree from which all full subtrees of depth h ≥ 1 have been pruned.
• The prefix is the shortest necessary in order to identify the length of the codeword.
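The reason the pruned prefix suffices: in a canonical code, once the length ℓ of the current codeword is known, the symbol can be computed arithmetically instead of walking the full subtree. Below is a minimal sketch of that arithmetic, with assumed helper tables first_code[ℓ] (numeric value of the first codeword of length ℓ) and symbols[ℓ] (symbols of length ℓ in canonical order), not the authors' exact layout.

def build_tables(lengths):
    # lengths: dict symbol -> codeword length of a canonical code.
    max_len = max(lengths.values())
    symbols = {l: sorted(s for s, ln in lengths.items() if ln == l)
               for l in range(1, max_len + 1)}
    first_code, fc = {}, 0
    for l in range(1, max_len + 1):
        first_code[l] = fc
        fc = (fc + len(symbols[l])) << 1
    return first_code, symbols

def decode_one(bits, pos, first_code, symbols):
    # Read bits only until the length is determined, then index arithmetically.
    value, l = 0, 0
    while True:
        value = (value << 1) | bits[pos + l]
        l += 1
        if value < first_code[l] + len(symbols[l]):     # length identified
            return symbols[l][value - first_code[l]], pos + l

first_code, symbols = build_tables({"a": 1, "b": 2, "c": 3, "d": 3})
print(decode_one([1, 1, 1, 0], 0, first_code, symbols))   # ('d', 3)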

THE SELECT OPERATION
[slide figures: select traced on the skeleton wavelet tree, leaves -, E A T, F M, R H L N S U V W]

DERIVING A LOWER BOUND AND AN EXACT VALUE
• Deriving a lower bound:
   o If select(x,i) = j, then the index of the i-th occurrence of x is ≥ j
• Deriving an exact value:
   o First occurrence of x: if select(x,1) = j and extract(v_root, j) = x
   o Otherwise:
        counter ← 0; k ← 1; m ← 0
        while counter < i and m ≤ n
            m ← select(x,k)
            if extract(v_root, m) = x then counter++
            k++
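A small sketch of that verification loop: select() on the pruned structure may report positions of a superset of x (hence only a lower bound), and extract() is used to check which of those candidates really are x. The callbacks select_superset and extract below, and the 0-based indexing, are assumptions made for the sketch.

def exact_select(x, i, n, select_superset, extract):
    # Index of the i-th true occurrence of x, or None if there is none.
    counter, k, m = 0, 0, -1
    while counter < i and m < n:
        m = select_superset(x, k)        # candidate position (lower bound)
        if m is None or m >= n:
            return None
        if extract(m) == x:              # verify the candidate
            counter += 1
        k += 1
    return m if counter == i else None

text = "xyxxy"
candidates = list(range(len(text)))      # toy superset: every position
sel = lambda x, k: candidates[k] if k < len(candidates) else None
ext = lambda m: text[m]
assert exact_select("x", 2, len(text), sel, ext) == 2    # 2nd 'x' is at index 2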

REDUCED SKELETON TREES
• The reduced skeleton tree prunes the skeleton Huffman tree at internal nodes at which the lengths of the current codewords belong to a set of size at most 2.
• [slide figure: canonical, skeleton and reduced Huffman trees for 200 elements of a Zipf distribution with weights p_i = 1/(i·H_n), where H_n is the n-th harmonic number]

EXTRACT FOR REDUCED SKELETON TREE
    else   // h(v) ≠ 0
        cw ← cw · B_v[h(v)·i .. h(v)·(i+1)-1]
        if flag(v) = 1 then
            if cw < first codeword of length |cw| then
                remove trailing 0 from cw
        return D(cw)
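A toy sketch of the flag(v) = 1 case above, under the assumption that a pruned leaf stores fixed-width slots of h bits in which a one-bit-shorter codeword suffix is padded with a trailing 0; first_code_of_len is an assumed helper returning the numeric value of the first canonical codeword of a given length.

def decode_slot(prefix, slot, first_code_of_len):
    # prefix: bits read along the skeleton path; slot: the h stored bits.
    cw = prefix + slot                       # candidate full codeword
    if int(cw, 2) < first_code_of_len(len(cw)):
        cw = cw[:-1]                         # shorter codeword: drop the padding 0
    return cw

# Toy canonical code a='0', b='10', c='110', d='111'; first codewords per length:
first = {1: 0, 2: 2, 3: 6}
fc = lambda l: first[l]
# A leaf reached with prefix '1' holds codewords of lengths 2 or 3, so h = 2.
print(decode_slot("1", "00", fc))   # '10'  -> b (slot is '0' plus a padding 0)
print(decode_slot("1", "11", fc))   # '111' -> d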

THANK YOU !!!