Evaluating memory compression and deduplication
Yuhui Deng, Liangshan Song, Xinyu Huang
Department of Computer Science, Jinan University

Agenda
- Motivation
- Memory compression
- Memory deduplication
- Characteristics of memory data
- Evaluation
- Conclusions

Motivation
Many programs require more RAM than is available (e.g., in-memory databases). For example, the maximum number of virtual machines that can run on a physical machine is in most cases limited by the amount of RAM on that machine. Furthermore, when the amount of physical RAM is less than what the programs require, the runtime of programs that page or swap is likely to be dominated by disk access time. Therefore, we need more RAM!

Data compression
Data compression transforms a string of characters into a new string that contains the same information but whose length is as small as possible. Compression algorithms are categorized as lossless or lossy.

Memory compression
Memory compression reserves some memory space that would normally be used directly by programs. It compresses relatively unused memory pages and stores the compressed pages in the reserved space, thereby enlarging the effective memory space. We employ six lossless compression algorithms (arithmetic coding, Huffman coding, LZ77, LZ78, LZW, and RLE) to compress memory data.
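
As a concrete illustration of the simplest of these six algorithms, the sketch below shows a byte-oriented run-length encoder and decoder of the kind that benefits from the long zero runs common in memory pages. This is a minimal sketch for illustration only, not the implementation evaluated in the paper; the function names are ours.

    def rle_compress(page: bytes) -> bytes:
        """Encode a page as (count, value) byte pairs; runs are capped at 255."""
        out = bytearray()
        i = 0
        while i < len(page):
            value = page[i]
            run = 1
            while i + run < len(page) and page[i + run] == value and run < 255:
                run += 1
            out += bytes((run, value))
            i += run
        return bytes(out)

    def rle_decompress(data: bytes) -> bytes:
        """Invert rle_compress: expand each (count, value) pair."""
        out = bytearray()
        for i in range(0, len(data), 2):
            out += bytes([data[i + 1]]) * data[i]
        return bytes(out)

    # A mostly-zero 4 KB page compresses very well; random data does not.
    page = bytes(4096)
    assert rle_decompress(rle_compress(page)) == page
    print(len(rle_compress(page)) / len(page))  # ratio for an all-zero page, about 0.008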

Data deduplication
Data deduplication involves chunking and duplicate detection. The chunking phase splits data into non-overlapping data blocks (chunks). The duplicate detection phase uses hash algorithms to determine whether another chunk with exactly the same content has already been stored. Chunking methods fall into four categories: whole file chunking (WFC), fixed-size partition (FSP), content-defined chunking (CDC), and sliding block (SB).
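
A minimal sketch of the FSP flavour of this pipeline is shown below: the data is split into fixed-size chunks and a per-chunk SHA-1 digest is used to detect duplicates. The helper names and the choice of SHA-1 are ours for illustration.

    import hashlib

    def fsp_chunks(data: bytes, chunk_size: int = 4096):
        """Fixed-size partition: split data into non-overlapping chunks."""
        for offset in range(0, len(data), chunk_size):
            yield data[offset:offset + chunk_size]

    def deduplicate(data: bytes, chunk_size: int = 4096):
        """Return (unique_chunks, dedup_ratio) using hash-based duplicate detection."""
        seen = {}                      # digest -> index of the stored chunk
        unique = []
        for chunk in fsp_chunks(data, chunk_size):
            digest = hashlib.sha1(chunk).digest()
            if digest not in seen:     # first occurrence: store the chunk
                seen[digest] = len(unique)
                unique.append(chunk)
        stored = sum(len(c) for c in unique)
        return unique, stored / len(data)

    # Example: three identical 4 KB pages deduplicate to one stored chunk.
    data = bytes(4096) * 3
    _, ratio = deduplicate(data)
    print(ratio)                       # 0.333...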

Memory deduplication
Memory deduplication periodically calculates a hash value for every physical memory page using a hash algorithm such as MD5 or SHA-1. The calculated hash is then compared against the existing hashes in a database dedicated to storing page hash values. If the hash is already in the database, the memory page does not need to be stored again; a pointer to the first instance is inserted in place of the duplicate page. Otherwise, the new hash is inserted into the database and the new memory page is stored.
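
The sketch below models that mechanism for 4 KB pages: a dictionary plays the role of the hash database, and duplicate pages are replaced by a reference to the first stored instance. MD5 is used as in the description above; the class and attribute names are ours.

    import hashlib

    PAGE_SIZE = 4096

    class PageStore:
        """Toy model of memory deduplication: store each distinct page once."""
        def __init__(self):
            self.hash_db = {}      # MD5 digest -> index into self.pages
            self.pages = []        # unique page contents
            self.mapping = []      # page number -> index of the stored page

        def add_page(self, page: bytes):
            digest = hashlib.md5(page).digest()
            if digest in self.hash_db:             # duplicate: insert a pointer only
                self.mapping.append(self.hash_db[digest])
            else:                                  # new content: store page and digest
                self.hash_db[digest] = len(self.pages)
                self.pages.append(page)
                self.mapping.append(self.hash_db[digest])

    store = PageStore()
    for page in (bytes(PAGE_SIZE), bytes(PAGE_SIZE), bytes([1]) * PAGE_SIZE):
        store.add_page(page)
    print(len(store.pages), store.mapping)   # 2 unique pages, mapping [0, 0, 1]

A real system would additionally compare candidate pages byte by byte before merging them, since two different pages could in principle hash to the same value.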

Evaluation environment

Table 1. Features of memory page traces
  Trace name    Description                          Size (GByte)
  1  espresso   A circuit simulator                  1.47
  2  gcc-2.7.2  A GNU C/C++ compiler                 1.13
  3  gnuplot    A GNU plotting utility               1.47
  4  grobner    Calculates Grobner basis functions   3.28
  5  lindsay    A hypercube simulator                1.11
  6  p2c        A Pascal->C transformer              1.36
  7  rscheme    An implementation of Scheme          0.25

Table 2. Configuration of the experimental platform
  Component   Description
  CPU         Intel Core 2 T6400 (2 MB cache, 2.00 GHz, 800 MHz FSB)
  Memory      2 GB DDR2, 800 MHz
  Hard disk   WDC WD2500BEVT-60ZCT1 (250 GB, 5400 RPM)
  Chipset     Intel 4 Series, ICH9M

Statistic results

Table 3. Statistic results of the page image traces (columns: Zero (%), Continuous zeros (%), Bound (%), Low (%), Power(2,n) (%), Entropy; rows: espresso, gcc-2.7.2, gnuplot, grobner, lindsay, p2c, rscheme)

Entropy is normally employed as a measure of redundancy: the entropy of a source, H = Σ_i p_i · log2(1/p_i) where p_i is the probability of symbol i, is the average number of bits required to encode each symbol present in the source. The lower the entropy, the more compressible the data.

The Zero column indicates that a large fraction of the memory data consists of zero bytes across all seven traces. The Continuous zeros column gives the percentage of pages that contain runs of zeros longer than 32 bytes. The Bound column gives the percentage of memory data that falls in runs of zeros longer than 32 bytes which start or end at a page boundary. The Low column shows the percentage of memory data with low values, ranging between 1 and 9. The Power(2,n) column gives the percentage of integral power-of-two values, i.e., the decimal values 2, 4, 8, 16, 32, 64, 128, and 255.
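
These per-page statistics are straightforward to compute. The hedged sketch below shows how the Zero and Entropy columns could be derived for a single page (the helper name is ours; the remaining columns follow the same pattern).

    import math
    from collections import Counter

    def page_stats(page: bytes):
        """Zero-byte fraction and byte-level Shannon entropy of one page."""
        zero_fraction = page.count(0) / len(page)
        counts = Counter(page)
        entropy = sum((n / len(page)) * math.log2(len(page) / n)
                      for n in counts.values())
        return zero_fraction, entropy          # entropy in bits per byte, 0..8

    print(page_stats(bytes(4096)))             # all-zero page: fraction 1.0, entropy 0
    print(page_stats(bytes(range(256)) * 16))  # uniform bytes: fraction 1/256, entropy 8.0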

Compression ratio
The compression ratio is defined as the size of the compressed memory data divided by the size of the uncompressed memory data. The block size in this evaluation is 8 KB, which is equal to two memory pages. The results show that the LZ algorithms (LZ77, LZ78, LZW) obtain significant compression ratios (around 0.4), and the gnuplot trace achieves the best compression ratio across the six algorithms.
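
The measurement itself is simple. Below is a hedged sketch that computes the ratio (and, anticipating the next slide, the compression and decompression times) for one 8 KB block, using Python's zlib as a stand-in for the LZ77-family coders since the paper's own implementations are not reproduced here.

    import time
    import zlib

    def measure(block: bytes):
        """Return (compression ratio, compression time, decompression time) for one block."""
        t0 = time.perf_counter()
        compressed = zlib.compress(block, 6)   # DEFLATE: LZ77 plus Huffman coding
        t1 = time.perf_counter()
        restored = zlib.decompress(compressed)
        t2 = time.perf_counter()
        assert restored == block
        return len(compressed) / len(block), t1 - t0, t2 - t1

    block = bytes(4096) + bytes([1, 2, 3, 4]) * 1024   # 8 KB block: two "pages"
    ratio, c_time, d_time = measure(block)
    print(f"ratio={ratio:.3f} compress={c_time * 1e3:.3f} ms decompress={d_time * 1e3:.3f} ms")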

Compression/decompression time
The compression time of LZ77 and the decompression time of LZW exceed 70 milliseconds and 50 milliseconds, respectively. This is not acceptable: it is longer than the access time of a fast hard disk such as the latest Hitachi Ultrastar 15K series, so compression would cost more than simply going to disk. LZ78 strikes a good balance between compression ratio, compression time, and decompression time. (Figures: compression time and decompression time.)

Impact of block size (LZ78)
(Figures: (a) compression ratio, (b) compression time, (c) decompression time.)
The compression ratio decreases as the block size increases (from 4 KByte to 128 KByte) across the seven traces. Figures (b) and (c) reveal that the bigger the block size, the higher the compression and decompression times. This pattern is reasonable, because a larger data block is more compressible and requires more time to compress and decompress. However, the performance degradation is not linearly proportional to the block size.
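
The same trend is easy to reproduce with a hedged sweep over block sizes, again using zlib rather than the paper's LZ78 implementation and a synthetic byte pattern as a stand-in for the traces.

    import zlib

    data = (bytes(2048) + bytes(range(256)) * 8) * 64     # synthetic stand-in for a trace
    for kb in (4, 8, 16, 32, 64, 128):
        size = kb * 1024
        blocks = [data[i:i + size] for i in range(0, len(data), size)]
        compressed = sum(len(zlib.compress(b, 6)) for b in blocks)
        print(f"{kb:3d} KB blocks: ratio = {compressed / len(data):.3f}")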

Deduplication ratio
This evaluation adopts three chunking schemes: FSP, CDC, and SB. The results show that FSP-4K and SB-4K achieve the best deduplication ratio across the seven traces. When the chunking size of FSP is increased from 4 KByte to 32 KByte, the deduplication ratio increases significantly, i.e., less data is deduplicated. The deduplication ratio of CDC is close to 1.

Deduplication/restore time
From a deduplication ratio standpoint, FSP-4K and SB-4K are the best candidates for memory deduplication. However, the deduplication time of SB-4K is about 40 times higher than that of FSP-4K, although the restore times are comparable. Therefore, FSP-4K is the best candidate policy for memory deduplication. Please note that the Y axis of the two figures is in microseconds.

The chunking size has opposite effects on the performance of compression and deduplication: the probability that identical characters are contained within a chunk grows as the chunk size increases, which favours compression, while the probability that two chunks are exactly identical decreases as the chunk size grows, which hurts deduplication.

Conclusion
Memory deduplication greatly outperforms memory block compression. Fixed-size partition (FSP) achieves the best performance compared with content-defined chunking (CDC) and sliding block (SB). The optimal chunking size of FSP is equal to the size of a memory page. We believe the analysis results in this paper can provide useful insights for designing and implementing systems that require abundant memory resources to enhance system performance.

Thanks!