1
An Evaluation of Using Deduplication in Swappers
Weiyan Wang, Chen Zeng
2
Motivation
- Deduplication detects duplicate pages in storage
- NetApp, Data Domain: a billion-dollar business
- We explore another direction: using deduplication in swappers
- Our experimental results indicate that using deduplication in swappers is beneficial
3
What is a swapper?
- A mechanism to expand the usable address space
- Swap out: move a page from memory to the swap area
- Swap in: move a page from the swap area back into memory
- The swap area is on disk
[Diagram: page P1 is swapped out; its memory frame goes from used to free and its PTE is updated to point at the swap area]
4
Why is deduplication useful?
- Writes to disk are slow; disk accesses are much slower than memory accesses!
- When duplicate pages exist, do we really need to swap out all of them?
- If a duplicate page already appears in the swap area, we can save one I/O
[Diagram: pages P1, P2, P3 in memory and a single copy of P1 in the swap area]
5
Architecture
Swap-out path for a page (sketched in code below):
- Compute the page's checksum
- Look the checksum up in the dedup cache
- YES (hit): skip pageout
- NO (miss): add the checksum to the dedup cache, then page out
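The swap-out decision above can be compressed into a short C sketch. Every dedup_* helper and pageout_to_swap() below is a hypothetical name used only to show the control flow; this is not the authors' actual code.

#include <linux/mm.h>

/* Hypothetical helpers standing in for the real machinery. */
extern u32 dedup_page_checksum(struct page *page);        /* SHA-1, first 32 bits */
extern bool dedup_cache_contains(u32 key);                /* is this checksum known? */
extern void dedup_reuse_slot(struct page *page, u32 key); /* map to the existing slot */
extern void dedup_cache_add(u32 key, struct page *page);
extern int pageout_to_swap(struct page *page);

/* Swap-out path with deduplication, mirroring the flow chart on the slide. */
static int dedup_swap_out(struct page *page)
{
    u32 key = dedup_page_checksum(page);

    if (dedup_cache_contains(key)) {
        /* YES: a page with this checksum already sits in the swap area,
         * so point this page at that slot and skip the disk write. */
        dedup_reuse_slot(page, key);
        return 0;
    }

    /* NO: remember the checksum, then write the page out as usual. */
    dedup_cache_add(key, page);
    return pageout_to_swap(page);
}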
6
Computing Checksums
- SHA-1 checksum (160 bits)
  - Collision probability of roughly one in 2^80
- We only use the first 32 bits (roughly one in 2^16)
  - Related to the implementation of the dedup cache
- Only the checksum is stored
  - We assume two pages are identical if their checksums are equal
  - Trades consistency for performance
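A minimal sketch of this step, assuming the modern kernel crypto shash interface, which may differ from the hash API available in the 2.6.26 kernel used here; the names and details below are illustrative rather than the authors' implementation.

#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>
#include <crypto/hash.h>

/* Hash one page with SHA-1 and keep only the first 32 bits as the key.
 * Allocating the transform on every call is wasteful; a real swapper would
 * allocate it once at initialization time. */
static u32 dedup_page_checksum(struct page *page)
{
    struct crypto_shash *tfm = crypto_alloc_shash("sha1", 0, 0);
    u8 digest[20];                  /* SHA-1 digest is 160 bits */
    u32 key = 0;
    void *addr;

    if (IS_ERR(tfm))
        return 0;

    addr = kmap_atomic(page);       /* map the page so its contents can be read */
    {
        SHASH_DESC_ON_STACK(desc, tfm);

        desc->tfm = tfm;
        if (crypto_shash_digest(desc, addr, PAGE_SIZE, digest) == 0)
            memcpy(&key, digest, sizeof(key));   /* truncate to the first 32 bits */
    }
    kunmap_atomic(addr);

    crypto_free_shash(tfm);
    return key;
}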
7
Dedup Cache
- The dedup cache is a radix tree: checksum -> dedup_entry_t
- A trie with O(|key|) lookup and update overhead
- Already well implemented in the kernel
- The key in the radix tree is 32 bits, so we keep only the first 32 bits of a checksum as the key
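A sketch of the cache built on the stock kernel radix tree API (RADIX_TREE, radix_tree_lookup, radix_tree_insert); the tree name, the wrappers, and the GFP flag are assumptions, locking around the tree is omitted, and dedup_entry_t is the entry type shown on the next slide.

#include <linux/radix-tree.h>

/* The dedup cache: a radix tree keyed by the truncated 32-bit checksum. */
static RADIX_TREE(dedup_cache, GFP_ATOMIC);

/* Look up the entry for a checksum; returns NULL on a miss. */
static dedup_entry_t *dedup_cache_lookup(u32 key)
{
    return radix_tree_lookup(&dedup_cache, key);
}

/* Insert an entry; returns 0 on success, -EEXIST if the key already exists. */
static int dedup_cache_insert(u32 key, dedup_entry_t *entry)
{
    return radix_tree_insert(&dedup_cache, key, entry);
}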
8
Entries in the Dedup Cache
- The index of the page in the swap area
- The number of duplicate pages with a given checksum
- A lock for consistency

typedef struct {
    swp_entry_t base;    /* index of the copy kept in the swap area */
    atomic_t count;      /* number of duplicates with this checksum */
    spinlock_t lock;     /* protects the entry */
} dedup_entry_t;
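For illustration, a hypothetical helper that builds such an entry when a page with a new checksum is written to swap slot base (the function name and GFP flag are assumptions):

#include <linux/slab.h>
#include <linux/swap.h>

/* Create a dedup cache entry for a page just written to swap slot 'base'. */
static dedup_entry_t *dedup_entry_create(swp_entry_t base)
{
    dedup_entry_t *entry = kmalloc(sizeof(*entry), GFP_ATOMIC);

    if (!entry)
        return NULL;

    entry->base = base;             /* where the on-disk copy lives */
    atomic_set(&entry->count, 1);   /* one page with this checksum so far */
    spin_lock_init(&entry->lock);
    return entry;
}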
9
Changes to the Linux Kernel
- Swap cache: swap_entry_t -> page
- Avoids repeatedly swapping in the same page
- This happens when a swapped-out page is shared by multiple processes
- Example: processes A and B share page P
  - P is swapped out; the PTEs in A and B are updated
  - A accesses P and swaps it in; when B then accesses P, the swap cache lets it reuse the in-memory copy instead of reading the disk again
10
Will the dedup cache grow infinitely?
- Keep a swap counter for each swap_entry_t: the number of references to it in memory
- counter++ when:
  - one more PTE contains the swap_entry_t
  - it is added to the swap cache
  - it is added to the dedup cache
- counter-- when a page is swapped in
- Remove the swap_entry_t from the dedup cache and the swap cache when counter == 2 (see the sketch below)
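A hedged sketch of that rule; swap_count_of(), dedup_cache_remove(), and swap_cache_remove() are hypothetical names for the corresponding bookkeeping, used only to make the counter == 2 condition concrete:

#include <linux/swap.h>

/* Hypothetical bookkeeping helpers. */
extern int swap_count_of(swp_entry_t entry);       /* current reference count */
extern void dedup_cache_remove(u32 key);
extern void swap_cache_remove(swp_entry_t entry);

/* Called after a PTE reference to 'entry' has been dropped by a swap-in. */
static void dedup_maybe_release(swp_entry_t entry, u32 key)
{
    /* A count of 2 means only the swap cache and the dedup cache still
     * reference this slot, so no process needs it any more. */
    if (swap_count_of(entry) == 2) {
        dedup_cache_remove(key);
        swap_cache_remove(entry);
    }
}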
11
Reference Counters
[Diagram: a swap slot referenced by the PTEs of processes A and B, the swap cache, and the dedup cache has a count of 4; once the process references are gone, only the swap cache and the dedup cache remain and the count drops to 2]
12
Changes to the Swap Cache
- The swap cache maintains the mapping between a swap_entry and a page
- We change that mapping to a swap_entry and a list of pages with the same contents (see the sketch below)
- Why do we need a list? The next slide shows the inconsistency that arises without one
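A purely illustrative shape for the changed mapping; the real swap cache is built on the page-cache radix tree, and the struct names here are invented to show the entry-to-list-of-pages idea:

#include <linux/list.h>
#include <linux/mm_types.h>
#include <linux/swap.h>

/* One node per swap slot, holding every in-memory page with these contents. */
struct dedup_swap_cache_node {
    swp_entry_t entry;           /* the shared swap slot, e.g. E1 */
    struct list_head pages;      /* list of struct dedup_swap_cache_page */
};

/* One list element per duplicate page currently in memory. */
struct dedup_swap_cache_page {
    struct page *page;           /* e.g. P1 or P2 */
    struct list_head link;       /* links into dedup_swap_cache_node.pages */
};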
13
Possible Inconsistency
- Swap out page P1 to swap entry E1 (swap cache: E1 -> P1)
- Swap out page P2, a duplicate of P1: the mapping E1 -> P2 cannot be added to the swap cache
- Swap in P1: the mapping E1 -> P1 is deleted
- Swap in P2: oops! there is no mapping left for P2
14
Our Solution
- Swap out page P1 to swap entry E1 (swap cache: E1 -> P1)
- Swap out page P2, a duplicate of P1: P2 is added to the list (swap cache: E1 -> P1, P2)
- Swap in P1: only P1 is removed from the list (swap cache: E1 -> P2)
- Swap in P2: delete E1 -> P2
15
Experimental Evaluation
- We run our experiments on VMware with Linux 2.6.26
- Our test program sequentially accesses an array; each element is 4 KB (one page)
- We vary the percentage of duplicate pages in the array (a sketch of such a program follows below)
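A user-space sketch of such a test program; the working-set size, the fill pattern, and the default duplicate percentage are assumptions, since the slides only give the 4 KB element size. The duplicate percentage is taken as the first command-line argument.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096                 /* one array element = one page */
#define NPAGES  (512 * 1024)         /* ~2 GB working set; pick larger than RAM to force swapping */

int main(int argc, char **argv)
{
    int dup_percent = argc > 1 ? atoi(argv[1]) : 50;   /* % of duplicate pages */
    unsigned char *array = malloc((size_t)NPAGES * PAGE_SZ);
    volatile unsigned long sink = 0;
    size_t i;

    if (!array)
        return 1;

    /* Fill the array: a chosen fraction of pages get identical contents,
     * the rest are made unique by embedding their index. */
    for (i = 0; i < NPAGES; i++) {
        unsigned char *p = array + i * PAGE_SZ;

        if ((int)(i % 100) < dup_percent) {
            memset(p, 0xAB, PAGE_SZ);           /* duplicate page */
        } else {
            memset(p, 0, PAGE_SZ);
            memcpy(p, &i, sizeof(i));           /* unique page */
        }
    }

    /* Sequential access pass; this is the part whose time is measured. */
    for (i = 0; i < NPAAGES; i++)
        sink += array[i * PAGE_SZ];

    printf("done (%lu)\n", sink);
    return 0;
}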
16
All of the pages are duplicates
- Deduplication significantly reduces the access time
17
No Duplicate Pages
- However, deduplication also incurs a significant overhead
18
Overheads in Deduplication
Major overheads:
- Calculating checksums: 35 us
  - The checksum is computed whenever a page is swapped in or swapped out
- Maintaining the reference counters
  - Explicitly acquiring locks imposes significant overhead: 65 us on average in our experiments
19
Conclusion
- Deduplication is a double-edged sword in swappers
- When many duplicate pages are present, deduplication reduces the access time by orders of magnitude
- When few duplicate pages are present, the overhead is non-negligible