Delta Encoding in the compressed domain A semi compressed domain scheme with a compressed output.

Presentation transcript:

Delta Encoding in the compressed domain A semi compressed domain scheme with a compressed output

Agenda Delta encoding types and schemes Applications The algorithm principles Results Similar works Contributions

The Problem We would like a version-updating algorithm that transforms a compressed reference into a compressed version without decoding and re-encoding the reference.

What is “Delta Encoding” Definition: Delta Encoding is the task of compactly encoding a new version as a set of copy and add commands using a reference.
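As a rough illustration of the definition above, here is a minimal sketch of applying a delta made of copy and add commands to an uncompressed reference. The tuple layout `("copy", start, length)` / `("add", text)` and the function name `apply_delta` are assumptions made for this example, not a format taken from the slides.

```python
def apply_delta(reference: str, delta: list) -> str:
    """Rebuild a version from a reference and a list of copy/add commands.

    Hypothetical command layout (an assumption of this sketch):
      ("copy", start, length) -> take reference[start:start + length]
      ("add", text)           -> append new text not found in the reference
    """
    out = []
    for cmd in delta:
        if cmd[0] == "copy":
            _, start, length = cmd
            out.append(reference[start:start + length])
        elif cmd[0] == "add":
            out.append(cmd[1])
        else:
            raise ValueError(f"unknown command: {cmd[0]!r}")
    return "".join(out)


reference = "the quick brown fox"
delta = [("copy", 0, 10), ("add", "red"), ("copy", 15, 4)]
version = apply_delta(reference, delta)
```

Only the three-character string "red" travels as new data; the rest of the version is described by positions in the reference.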

Types Of Delta Encoding Uncompressed domain Compressed domain Semi Compressed domain The proposed Semi Compressed domain with compressed output

Why Semi Compressed Scheme Textual data is produced in an uncompressed form In most cases, digital data is first acquired and then compressed This work focuses on the data network path

Compression Base We use LZSS (Storer-Szymanski) as the compression base LZSS has a mixed structure of (off,len) pointers and literal strings LZSS is a repetition-based algorithm (LZ family)
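To make the mixed (off,len) and string structure concrete, here is a minimal sketch of decoding such a token stream. It assumes that `off` is the distance back from the current end of the output, which is the usual LZ77/LZSS convention but is not stated in the slides; the token representation (Python strings and tuples) is also just for illustration.

```python
def lzss_decode(tokens):
    """Expand a list of LZSS-style tokens into text.

    A token is either a literal string or an (off, length) pointer.
    Assumption: off counts backwards from the current end of the output.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)  # literal characters
        else:
            off, length = tok
            if off <= 0 or off > len(out):
                raise ValueError(f"pointer {tok} reaches outside the output")
            start = len(out) - off
            # Copy one character at a time so that a pointer may overlap
            # the text it is producing (length > off).
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


# Ten literal characters followed by the two pointers used in the slides.
text = lzss_decode(["0123456789", (10, 10), (10, 20)])
```

Because every pointer refers to earlier output, changing one region of the text can invalidate later pointers that depend on it, which is why the scheme needs dependency chain breaking.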

Delta Compression The Schemes

Uncompressed Domain version reference Delta Encoder Decoder

Compressed Domain Ver c Ref c Delta Encoder Decoder version

Semi Compressed Domain version Ref c Delta Encoder Decoder version

The Proposed Semi Compressed Domain With Compressed Output version Ref c Delta Encoder Decoder Ver c

The Main Differences 1.Delta file has additional new commands 2.The decoder manipulates the compressed reference to become the compressed version 3.Decoder outputs the compressed version

Applications Forward and reverse proxies Caching devices Traffic accelerators Server farming Low bandwidth networks Online storage & backups Version & source control The intermediate devices do not use the data; they only transfer it!

Application – The Topology

The Key Benefits Eliminates the need to extract, compare and re-encode → reduction in CPU consumption Network hop-by-hop scheme of data caching Reduces storage space Reduces decompression work space

The Algorithmic Steps For Each Scheme Type

Uncompressed Domain (table columns: step, Server, Network, Client)
1. Decompress (R c) → R; Decode (R c) → R
2. Delta Encode (R, V) → Δ; Delta Decode (R, Δ) → V
3. Compress (V) → V c
4. Store V c → R c’
5. Send Δ; Store Δ
6. Send Δ

Compressed Domain (table columns: step, Server, Network, Client)
1. Compress (V) → V c; Delta Decode (R c, Δ) → V
2. Delta Encode (R c, V c) → Δ; Compress (V) → V c
3. Store V c → R c’
4. Store Δ
5. Send Δ
6.

Semi Compressed Domain With Compressed Output (table columns: step, Server, Network, Client)
1. Delta Encode (R c, V) → Δ; Delta Decode (R c, Δ) → V c
2. Decode (R c, Δ) → V c; Store V c → R c’
3. Store Δ; Decode (V c) → V
4. Store Δ; Send Δ
5.
6.

The Algorithm Principles Iterative Steps Of Encode And Compare Local Reference Approach Dependency chain breaking

Constraints And Assumptions 1. Both versions are highly correlated 2. The changes are local and sparse 3. The change size is very small compared to the size of the version 4. We do not seek an optimal solution but rather to show that a comprehensive solution exists

The Algorithm Principles [Figure: Ref: (10,10)(10,20); Ver; 1st Ver: Local Reconstruction: (10,4)]

The Algorithm Principles How to detect mismatch type How to handle a mismatch Dependency chain breaking Synchronizing the encoder to continue encode and compare

The Algorithm Principles - Replacement Determined by scanning forward in both the version and the temporary local reconstructed buffer Bounded by the maximum change length ( > i ) and by O ( i * synch )

The Algorithm Principles - Insertion Determined by skipping forward in the version and comparing to the temporary local reconstructed buffer Bounded by the maximum change length ( > j ) and by O ( j * synch )

The Algorithm Principles - Deletion Determined by skipping forward in the temporary local reconstructed buffer Bounded by the maximum change length ( > j ) and by O ( j * synch )
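One way to picture the three detection rules is a bounded look-ahead that tries each mismatch type for growing change lengths until the two sides agree again for `synch` characters. The sketch below is an illustrative guess at such a scan, not the authors' encoder: the function name, the parameters `max_change` and `synch`, and the order in which the three types are tried are all assumptions.

```python
def classify_mismatch(version, recon, v, r, max_change=8, synch=4):
    """Classify the mismatch between version[v] and recon[r].

    recon stands for the temporary local reconstructed buffer.
    Returns (kind, length); kind is "unknown" if no resynchronization
    is found within max_change. Cost is O(max_change * synch) compares.
    """
    def in_sync(vi, ri):
        a = version[vi:vi + synch]
        b = recon[ri:ri + synch]
        return len(a) == synch and a == b

    for k in range(1, max_change + 1):
        if in_sync(v + k, r + k):   # both sides skip k: replacement
            return ("replacement", k)
        if in_sync(v + k, r):       # only the version skips k: insertion
            return ("insertion", k)
        if in_sync(v, r + k):       # only the buffer skips k: deletion
            return ("deletion", k)
    return ("unknown", 0)


recon = "abcdefgh12"
replaced = classify_mismatch("abcdXfgh12", recon, 4, 4)
inserted = classify_mismatch("abcdXefgh12", recon, 4, 4)
deleted = classify_mismatch("abcdfgh12", recon, 4, 4)
```

Short or repetitive text can make more than one type match at the same length, so a real encoder would need a tie-breaking policy that this sketch does not attempt.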

Handling A Mismatch According to mismatch type –Add or remove characters –Add or remove pointers –Split pointers into 3 parts Prefix – up to the change The change Postfix – after the change

Handling A Mismatch - Example [Figure: Ref: (10,10)(10,20); Ver; 1st Ver: Local Reconstruction: (10,4)] Output to Delta file: SplitTo3 command for pointer (10,10): (10,4) [ 6 ] (10,5) And we need to break the dependency chain of pointer (10,20)
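The split of pointer (10,10) into a prefix, the change, and a postfix can be sketched as follows. This is a minimal illustration that covers only a same-length replacement inside a single pointer, and it reads `[ 6 ]` as a one-character literal, which is an assumption. The helper name `split_to_3` and its parameters are hypothetical, and the sketch does not perform dependency chain breaking.

```python
def split_to_3(pointer, change_start, change_len, replacement):
    """Split an (off, length) pointer around a changed region.

    Returns (prefix, change, postfix); prefix or postfix is None when empty.
    Assumption: the replacement has the same length as the region it
    replaces, so the postfix can keep the original offset.
    """
    off, length = pointer
    if change_start < 0 or change_len < 0 or change_start + change_len > length:
        raise ValueError("change lies outside the pointer")
    if len(replacement) != change_len:
        raise ValueError("this sketch only handles same-length replacement")
    prefix = (off, change_start) if change_start > 0 else None
    post_len = length - change_start - change_len
    postfix = (off, post_len) if post_len > 0 else None
    return prefix, replacement, postfix


parts = split_to_3((10, 10), change_start=4, change_len=1, replacement="6")
```

The lengths add up to the original pointer length (4 + 1 + 5 = 10), so later elements keep their positions in the uncompressed text.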

Handling A Mismatch - Advance If the mismatch covers a set of elements –We will replace the entire section (pointers might be split and characters replaced) –Break the dependency chain

Handling A Mismatch - Advance [Figure: Ref: (10,10)(10,20); Ver with the change xxxxxxx; 1st Ver: Local Reconstruction: (10,4)] Result to Delta file: SplitTo3 command: (10,4) [ xxxxxx ]; SplitTo3 command: [ x ] (20,9)! (=CB); ADDP (30,10) Exceptional case: self pointer For (10,20) we use the local reconstructed buffer to continue the reconstruction

Handling A Mismatch - Advance R c = (10,10)(10,20) V c = (10,4)xxxxxx(0,0)(0,0)x(20,9)(30,10), which simplifies to V c = (10,4)xxxxxxx(20,9)(30,10) Delta File (3 bits per command, offset = 16 bits, length = 8 bits): 1. Copy [0,9] 2. SplitTo3 (10,4) [xxxxxx] 0 3. SplitTo3 0 [x] (20,9) 4. ADDP (30,10) Total of 172 bits Re-encoding V produces 208 bits of output: (10,4)x(1,6)(10,3)(20,10)(10,6) Saving 36 of 208 bits (about 17%) in this short sample

Handling A Mismatch - LSP LSP is calculated according to the reference LSP might be located beyond the version’s change Synchronization of the encoder’s internal data structures

Chain Breaking A must, due to the repetition-based nature of LZ-based compression Quarantines – restricted zones and change tags Pointer modifications are bounded by window size – first occurrence elimination Part of the encoder’s implementation (Hash, tags …)

The Delta File Commands COPY – instructs the decoder to copy part of the reference ADDP – adds a pointer to the compressed version ADDS – same, but adds a string

The Delta File Commands SplitTo3 – instructs the decoder to break an element into 3 parts ADJUSTJP – instructs the decoder to adjust pointer offsets CTag (optional) – marks the boundaries of a specific tagged change for the decoder (uncompressed)

The Decoder Modifies the compressed reference to become the compressed version Linear in time and space Does not need temporary decompression space

The Decoder R c = (10,10)(10,20) Delta File: 1.Copy [0,9] 2.SplitTo3 (10,4) [xxxxxx] 0 3.SplitTo3 0 [x] (20,9) 4.ADDP (30,10) V c = (10,4)xxxxxxx(20,9)(30,10)
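The decoder example above can be traced with a small sketch that applies the delta commands directly to the compressed reference tokens and never expands them to text. Several details are assumptions made for illustration: the reference is taken to start with ten literal elements (suggested by Copy [0,9], and shown here as the letters a to j), Copy indexes reference elements inclusively, each SplitTo3 consumes the next reference element, and a part written as 0 is empty. The function name and tuple encoding are hypothetical.

```python
def apply_cdelta(ref_tokens, commands):
    """Turn a compressed reference into a compressed version.

    Tokens are literal strings or (off, length) pointers.
    Hypothetical command encoding (an assumption of this sketch):
      ("COPY", first, last)               copy reference elements first..last
      ("SPLIT3", prefix, change, postfix) replace the next reference element
      ("ADDP", pointer) / ("ADDS", text)  append a new pointer / string
    """
    out = []
    cursor = 0  # index of the next unread reference element
    for cmd in commands:
        op = cmd[0]
        if op == "COPY":
            _, first, last = cmd
            out.extend(ref_tokens[first:last + 1])
            cursor = last + 1
        elif op == "SPLIT3":
            if cursor >= len(ref_tokens):
                raise ValueError("no reference element left to split")
            cursor += 1  # the split element itself is not copied
            for part in cmd[1:]:
                if part:  # 0, None or "" marks an empty part
                    out.append(part)
        elif op in ("ADDP", "ADDS"):
            out.append(cmd[1])
        else:
            raise ValueError(f"unknown command: {op!r}")
    return out


r_c = list("abcdefghij") + [(10, 10), (10, 20)]
delta_file = [
    ("COPY", 0, 9),
    ("SPLIT3", (10, 4), "xxxxxx", 0),
    ("SPLIT3", 0, "x", (20, 9)),
    ("ADDP", (30, 10)),
]
v_c = apply_cdelta(r_c, delta_file)
```

The output is a token list in a single pass over the commands, which matches the claim that the decoder needs no temporary decompression space.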

Results Linear time and space for encoding and decoding Constant-bounded number of additional comparisons (locality) Throughput is very similar to base LZSS encoding/decoding

Results

Similar Works T. Serebro - Modeling delta encoding of compressed files (2006) S. Klein & D. Shapira - Compressed delta encoding for lzss encoded files (2007)

Contributions Comprehensive solution Addresses insertion, deletion and replacement local reference approach – no right to left decoding CDELTA -New Delta File scheme Ongoing Dependency chain breaking

Contributions Utilizes the fact that textual data is produced uncompressed Network perspective - devices along the path store and forward data (decoder with compressed output) Implementation of the algorithms – a proof of concept

Thank You

Chain Breaking