An Evaluation of Using Deduplication in Swappers Weiyan Wang, Chen Zeng.



Motivation
Deduplication detects duplicate pages in storage
  - NetApp, Data Domain: a billion-dollar business
We explore another direction: using deduplication in swappers
Our experimental results indicate that using deduplication in swappers is beneficial when duplicate pages are common

What is a swapper?
A mechanism to expand the usable address space
  - Swap out: move a page from memory to the swap area
  - Swap in: move a page from the swap area back to memory
The swap area is on disk
[Figure: page P1 and its page table entry (labels: pte', Free, Used)]

Why is deduplication useful?
Writes to disk are slow
  - Disk access is much slower than memory access
When duplicate pages exist:
  - Do we really need to swap out all of them?
  - If a duplicate of a page already appears in the swap area, we can save one I/O
[Figure: pages P1, P3, P2 and a second copy of P1]

Architecture
Swap out a page
  - Compute its checksum
  - Look up the checksum in the dedup cache
  - YES (found): skip the pageout
  - NO (not found): page out and add the checksum to the dedup cache
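The swap-out flow above can be sketched in user space. This is only an illustration under stated assumptions, not the authors' kernel patch: a 32-bit FNV-1a hash stands in for the truncated SHA-1 checksum, a fixed-size open-addressing table stands in for the radix tree, and every name (`demo_swapper`, `demo_swap_out`, `DEMO_CACHE_SLOTS`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE   4096
#define DEMO_CACHE_SLOTS 1024   /* hypothetical capacity, must be a power of two */

/* Stand-in checksum (assumption): 32-bit FNV-1a instead of truncated SHA-1. */
static uint32_t demo_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

struct demo_slot {
    int used;
    uint32_t checksum;
    unsigned long swap_index;   /* where the first copy was written */
    unsigned int count;         /* pages sharing this checksum */
};

struct demo_swapper {
    struct demo_slot cache[DEMO_CACHE_SLOTS];
    unsigned long next_swap_index;
    unsigned long pageouts;     /* number of simulated disk writes */
};

/* Returns the swap index the page maps to; *did_io is 1 if a write happened. */
static unsigned long demo_swap_out(struct demo_swapper *s,
                                   const unsigned char *page, int *did_io)
{
    assert(s && page && did_io);
    uint32_t sum = demo_checksum(page);
    size_t i = sum & (DEMO_CACHE_SLOTS - 1);

    /* Linear probing over the stand-in dedup cache. */
    for (size_t n = 0; n < DEMO_CACHE_SLOTS; n++) {
        struct demo_slot *e = &s->cache[i];
        if (e->used && e->checksum == sum) {
            /* YES branch: equal checksums are treated as equal pages. */
            e->count++;
            *did_io = 0;
            return e->swap_index;
        }
        if (!e->used) {
            /* NO branch: page out and remember the checksum. */
            e->used = 1;
            e->checksum = sum;
            e->swap_index = s->next_swap_index++;
            e->count = 1;
            s->pageouts++;
            *did_io = 1;
            return e->swap_index;
        }
        i = (i + 1) & (DEMO_CACHE_SLOTS - 1);
    }

    /* Cache full: fall back to a plain pageout without deduplication. */
    s->pageouts++;
    *did_io = 1;
    return s->next_swap_index++;
}
```

Swapping out the same contents twice through `demo_swap_out` performs one simulated write and returns the same swap index both times.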

Computing Checksum
SHA-1 checksum (160 bits)
  - Collision probability of one in 2^80
  - We only use the first 32 bits (one in 2^16)
Related to the implementation of the dedup cache
  - Only the checksum is stored
We assume two pages are identical if their checksums are equal
  - Trades consistency for performance
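The 2^80 and 2^16 figures are consistent with the birthday bound, although the slide does not name it, so treat that reading as an assumption. For an n-bit checksum and k distinct pages, the probability that at least two pages collide is approximately:

```latex
\[
P_{\text{collision}}(k, n) \;\approx\; 1 - \exp\!\left(-\frac{k(k-1)}{2^{\,n+1}}\right)
\]
% With n = 32 and k = 2^{16} distinct pages:
\[
\frac{k(k-1)}{2^{\,n+1}} \approx \frac{2^{32}}{2^{33}} = \frac{1}{2},
\qquad
P_{\text{collision}}(2^{16}, 32) \approx 1 - e^{-1/2} \approx 0.39
\]
```

Under this reading, a collision becomes likely once roughly 2^(n/2) distinct pages have been checksummed: about 2^80 pages for full SHA-1, but only about 2^16 pages (256 MB of distinct 4 KB pages) for the 32-bit prefix.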

Dedup Cache
Dedup cache: a radix tree
  - Maps checksum -> dedup_entry_t
  - A trie with O(|key|) lookup and update overhead
  - Already well implemented in the kernel
The key in the radix tree is 32 bits
  - We keep only the first 32 bits of a checksum as the key
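To illustrate the O(|key|) behaviour, here is a minimal user-space trie over 32-bit keys. It is a sketch only and is not the kernel's radix tree API: the fan-out of 256 (8 bits per level, 4 levels) and the names `demo_trie_node`, `demo_trie_insert`, and `demo_trie_lookup` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_FANOUT 256   /* 8 key bits consumed per level */
#define DEMO_LEVELS 4     /* 4 levels cover a 32-bit key */

struct demo_trie_node {
    void *slot[DEMO_FANOUT];   /* child nodes, or values at the last level */
};

static struct demo_trie_node *demo_trie_new(void)
{
    return calloc(1, sizeof(struct demo_trie_node));
}

/* Insert value under key; returns 0 on success, -1 on allocation failure. */
static int demo_trie_insert(struct demo_trie_node *root, uint32_t key, void *value)
{
    assert(root);
    struct demo_trie_node *n = root;

    /* Walk the three upper levels, creating interior nodes on demand. */
    for (int level = DEMO_LEVELS - 1; level > 0; level--) {
        unsigned idx = (key >> (8 * level)) & 0xff;
        if (!n->slot[idx]) {
            n->slot[idx] = demo_trie_new();
            if (!n->slot[idx])
                return -1;
        }
        n = n->slot[idx];
    }
    n->slot[key & 0xff] = value;
    return 0;
}

/* Lookup always takes at most DEMO_LEVELS steps, independent of tree size. */
static void *demo_trie_lookup(const struct demo_trie_node *root, uint32_t key)
{
    assert(root);
    const struct demo_trie_node *n = root;

    for (int level = DEMO_LEVELS - 1; level > 0; level--) {
        n = n->slot[(key >> (8 * level)) & 0xff];
        if (!n)
            return NULL;
    }
    return n->slot[key & 0xff];
}
```

The number of steps depends only on the key length, which is why truncating the checksum to 32 bits keeps both lookup and update cheap.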

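Entries in Dedup Cache
The index of a page in the swap area
The number of duplicate pages for a given checksum
A lock for consistency

typedef struct {
    swp_entry_t base;   /* index of the page in the swap area */
    atomic_t count;     /* number of duplicate pages with this checksum */
    spinlock_t lock;    /* lock for consistency */
} dedup_entry_t;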

Changes to Linux Kernel
Swap cache
  - Maps swp_entry_t -> page
  - Avoids repeatedly swapping in the same page
  - This happens when a swapped-out page is shared by multiple processes
Example
  - Processes A and B share the page P
  - P is swapped out, and the PTEs in A and B are updated
  - A wants to access P
  - B wants to access P

Will the dedup cache grow infinitely?
Swap counter for each swp_entry_t
  - Counts the number of references held in memory
  - counter++ when:
    - one more PTE contains the swp_entry_t
    - it is in the swap cache
    - it is in the dedup cache
  - counter-- when a page is swapped in
  - Remove the swp_entry_t from the dedup cache and the swap cache when counter = 2, since only the two caches still reference it
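The counting rule can be modelled in a few lines. The following is a sketch of one reading of the slide, not the kernel code: the swap cache and the dedup cache each hold one reference, every PTE that contains the entry holds one more, and the entry is dropped from both caches once the counter falls back to 2. The names `demo_swap_slot`, `demo_slot_swap_out`, `demo_slot_add_pte`, and `demo_slot_swap_in` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_swap_slot {
    int counter;           /* references from PTEs plus the two caches */
    bool in_swap_cache;
    bool in_dedup_cache;
};

/* First swap-out: the slot enters both caches, one reference each. */
static void demo_slot_swap_out(struct demo_swap_slot *s)
{
    assert(s);
    s->in_swap_cache = true;
    s->in_dedup_cache = true;
    s->counter = 2;
}

/* One more PTE now contains this swap entry. */
static void demo_slot_add_pte(struct demo_swap_slot *s)
{
    assert(s && s->counter >= 2);
    s->counter++;
}

/*
 * A process swaps the page back in, dropping its PTE reference.
 * Returns true when only the two caches were left, in which case the
 * slot is removed from both of them and can be reclaimed.
 */
static bool demo_slot_swap_in(struct demo_swap_slot *s)
{
    assert(s && s->counter > 2);
    s->counter--;
    if (s->counter == 2) {
        s->in_swap_cache = false;
        s->in_dedup_cache = false;
        s->counter = 0;
        return true;
    }
    return false;
}
```

With two processes sharing a page, the counter reaches 4 after swap-out, and the slot is released only after both processes have swapped the page back in.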

Reference Counters
[Figure: processes A and B, the swap cache, and the dedup cache referencing a page in the swap area; counter values (4) and (2) are shown]

Changes to Swap Cache
The swap cache maintains the mapping between a swap_entry and a page
We change that mapping to one between a swap_entry and a list of pages with the same contents
Why do we need a list?

Possible Inconsistency
Swap out page P1 to swap_entry E1
Swap out page P2, a duplicate of P1
  - The mapping E1 -> P2 cannot be added to the swap cache
Swap in P1: the mapping is deleted
Swap in P2: Oops! No mapping for E1 is left
Swap cache contents: E1 -> P1

Our Solution
Swap out page P1 to swap_entry E1
Swap out page P2, a duplicate of P1
  - The mapping E1 -> P2 is added to the list
Swap in P1: only P1 is deleted
Swap in P2: delete E1 -> P2
Swap cache states: E1 -> P1, then E1 -> P1, P2, then E1 -> P2
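The list-based mapping can be sketched as follows. This is an illustrative user-space model rather than the modified swap cache itself: pages are represented by integer ids, the list is a small fixed array, and `demo_sc_entry`, `demo_sc_add`, `demo_sc_contains`, and `demo_sc_remove` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DUPS 8   /* hypothetical cap on duplicates per swap entry */

struct demo_sc_entry {
    unsigned long swap_index;     /* the swap entry, e.g. E1 */
    int pages[DEMO_MAX_DUPS];     /* ids of pages with the same contents */
    size_t n_pages;
};

/* Add a page to the entry's list; returns 0 on success, -1 if the list is full. */
static int demo_sc_add(struct demo_sc_entry *e, int page_id)
{
    assert(e);
    if (e->n_pages == DEMO_MAX_DUPS)
        return -1;
    e->pages[e->n_pages++] = page_id;
    return 0;
}

static bool demo_sc_contains(const struct demo_sc_entry *e, int page_id)
{
    assert(e);
    for (size_t i = 0; i < e->n_pages; i++)
        if (e->pages[i] == page_id)
            return true;
    return false;
}

/*
 * Remove one page on swap-in. Returns the number of pages still mapped
 * (0 means the whole entry can be dropped), or -1 if the page was not mapped.
 */
static int demo_sc_remove(struct demo_sc_entry *e, int page_id)
{
    assert(e);
    for (size_t i = 0; i < e->n_pages; i++) {
        if (e->pages[i] == page_id) {
            e->n_pages--;
            e->pages[i] = e->pages[e->n_pages];   /* fill the hole with the last page */
            return (int)e->n_pages;
        }
    }
    return -1;
}
```

Removing P1 leaves P2 mapped under the same entry, so the later swap-in of P2 still finds its mapping.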

Experimental Evaluation
We run our experiments on VMware with Linux
Our test program sequentially accesses an array
  - Each element is 4 KB in size
  - We vary the percentage of duplicate pages in that array
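The slides do not include the test program, so the following is only a guess at its shape: fill an array of 4 KB elements so that a chosen percentage of them are identical, then touch each element in order. The fill pattern, the way duplicates are spread through the array, and the names `demo_fill_pages`, `demo_count_duplicates`, and `demo_touch_pages` are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ELEM_SIZE 4096   /* each array element is one 4 KB page */
#define DEMO_FILL_BYTE 0xAB   /* contents shared by all duplicate pages */

/* Fill n_pages elements so that about dup_percent of them are identical. */
static void demo_fill_pages(unsigned char *buf, size_t n_pages, unsigned dup_percent)
{
    assert(buf && dup_percent <= 100);
    for (size_t i = 0; i < n_pages; i++) {
        unsigned char *page = buf + i * DEMO_ELEM_SIZE;
        memset(page, DEMO_FILL_BYTE, DEMO_ELEM_SIZE);
        /* Spread duplicates evenly: page i is a duplicate when the running
         * quota floor(i * dup_percent / 100) advances at this index. */
        bool dup = ((i + 1) * dup_percent / 100) != (i * dup_percent / 100);
        if (!dup)
            memcpy(page, &i, sizeof i);   /* stamp the index to make it unique */
    }
}

/* Count the pages that still hold the shared duplicate contents. */
static size_t demo_count_duplicates(const unsigned char *buf, size_t n_pages)
{
    unsigned char ref[DEMO_ELEM_SIZE];
    size_t dups = 0;
    memset(ref, DEMO_FILL_BYTE, sizeof ref);
    for (size_t i = 0; i < n_pages; i++)
        if (memcmp(buf + i * DEMO_ELEM_SIZE, ref, DEMO_ELEM_SIZE) == 0)
            dups++;
    return dups;
}

/* Sequential access: read one byte from every page, in order. */
static unsigned long demo_touch_pages(const unsigned char *buf, size_t n_pages)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n_pages; i++)
        sum += buf[i * DEMO_ELEM_SIZE];
    return sum;
}
```

To exercise a swapper, the array would have to be larger than the memory available to the process, so that the sequential pass forces pages out and back in; timing that pass for different `dup_percent` values would then give access-time comparisons of the kind the evaluation describes.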

All of the pages are duplicates
Deduplication significantly reduces the access time

No Duplicate Pages
However, deduplication also incurs a significant overhead

Overheads in Deduplication
Major overheads:
  - Calculating checksums: 35 µs
    - Whenever a page is swapped in or swapped out, we calculate its checksum
  - Maintaining the reference counter
    - The explicitly required locks impose significant overhead: 65 µs on average in our experiments

Conclusion
Deduplication is a double-edged sword in swappers
  - When many duplicate pages are present, deduplication reduces the access time by orders of magnitude
  - When few duplicate pages are present, the overhead is non-negligible