1
Using Deduplicating Storage for Efficient Disk Image Deployment
Xing Lin, Mike Hibler, Eric Eide, Robert Ricci (University of Utah)
2
this talk: VF, which uses a deduplicating storage system within a fast disk-imaging system. Results: 3× decrease in storage, negligible run-time overhead. Techniques: “don’t be the bottleneck”, Aligned Fixed-size Chunking.
3
disk image server: images are loaded onto clients on demand, so the server must be fast – deliver data as fast as clients can receive it.
4
[figure: disk image server]
5
Utah Emulab: 1,000+ disk images, 21 TB total. Amazon EC2: 37,000+ public AMIs. Image deployment needs to be fast, and image storage needs to be compact.
6
deduplication: add a dedup. storage system to the image server.
7
deduplication: each image is stored as a small “recipe” of block fingerprints. Image 1: fingerprint 1; fingerprint 2; fingerprint 3; … Image 2: fingerprint 1; fingerprint 2; fingerprint 19; … Blocks that appear in multiple images (here, fingerprints 1 and 2) are stored only once.
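A minimal sketch of the fingerprint/recipe idea (not the Venti implementation; DedupStore and its methods are illustrative names): split data into fixed-size blocks, store each distinct block once under its fingerprint, and keep only the ordered fingerprint list as the image’s recipe.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: blocks are indexed by their fingerprint."""

    def __init__(self, block_size=32 * 1024):
        self.block_size = block_size
        self.blocks = {}                   # fingerprint -> block bytes, stored once

    def put_image(self, data):
        """Split an image into fixed-size blocks; return its small "recipe"."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha1(block).hexdigest()
            if fp not in self.blocks:      # a duplicate block is not stored again
                self.blocks[fp] = block
            recipe.append(fp)
        return recipe

    def get_image(self, recipe):
        """Rebuild an image by concatenating the blocks named in its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)
```

Two images that differ in only a few blocks share almost all of their stored blocks; each image costs only its recipe plus whatever blocks are actually new.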
8
dedup. for disk images: images are often derived from other images – users add packages to testbed “base” images, users snapshot their work in progress, … – so there is a lot of duplicated data across images!
9
[figure: disk image server]
10
disk image server + dedup. disk image storage. Problem: dedup. storage can be slow. Our contribution: add dedup. without slowing the system down.
11
why is frisbee fast?
– compression: lower network bandwidth, smaller files
– use filesystem info: fewer disk writes, sequential disk writes
– pipeline (disk read, net xfer, decompress, disk write): keep receiving, keep the disk busy, keep the pipeline filled
– independent “chunks”: new clients can join
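The client-side pipeline is the part that must never stall. A minimal sketch of the idea (illustrative only, not Frisbee’s code): run receive, decompress, and disk-write as separate threads connected by bounded queues, so the network, CPU, and disk are all busy at once.

```python
import queue
import threading
import zlib

def run_pipeline(compressed_chunks, write_block, depth=4):
    """Toy three-stage pipeline: receive -> decompress -> write.

    `compressed_chunks` is an iterable standing in for the network receiver;
    `write_block` stands in for a sequential disk write.
    """
    rx_q = queue.Queue(maxsize=depth)      # bounded queues keep stages in step
    wr_q = queue.Queue(maxsize=depth)
    DONE = object()

    def receiver():
        for chunk in compressed_chunks:
            rx_q.put(chunk)
        rx_q.put(DONE)

    def decompressor():
        while (chunk := rx_q.get()) is not DONE:
            wr_q.put(zlib.decompress(chunk))
        wr_q.put(DONE)

    def writer():
        while (block := wr_q.get()) is not DONE:
            write_block(block)

    threads = [threading.Thread(target=f) for f in (receiver, decompressor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```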
12
from frisbee to VF. Frisbee: disk images stored as files. VF: disk image data stored in Venti and reformed into chunks by the Chunkmaker. [Quinlan & Dorward, FAST ’02] [Rhea et al., ATC ’08]
13
image corpus: 430 Linux images from Utah Emulab – 76 “standard” images and 354 user-created images – based on RedHat, Fedora, CentOS, & Ubuntu.
14
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
15
compression: where should it happen? [figure: a partition is captured, stored in Venti, and later retrieved by the image server]
16
compression, option 1: compress the captured disk image and store the compressed image in Venti – poor deduplication (1.11×).
18
compression, option 2: store uncompressed disk data in Venti and compress at retrieval time – compression is too slow (compress: 30.29 MB/s vs. disk write: 71.07 MB/s).
20
compression, option 3 (ours): compress each dedup block and store the compressed dedup blocks in Venti – this preserves opportunities for dedup; the server retrieves & concatenates compressed blocks to form chunks; 6% more chunks vs. original Frisbee.
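A hedged sketch of this per-block scheme (illustrative names, not VF’s API): each dedup block is compressed on its own before being fingerprinted and stored, so identical blocks still deduplicate, and the serving path only has to concatenate blocks that are already compressed.

```python
import hashlib
import zlib

def store_image(data, blocks, block_size=32 * 1024):
    """Compress each dedup block individually, store it under its fingerprint.

    `blocks` is a dict standing in for the content-addressed store
    (fingerprint -> compressed block). Returns the image's recipe.
    """
    recipe = []
    for off in range(0, len(data), block_size):
        compressed = zlib.compress(data[off:off + block_size])
        fp = hashlib.sha1(compressed).hexdigest()
        blocks.setdefault(fp, compressed)   # identical blocks still dedup
        recipe.append(fp)
    return recipe

def build_chunk_payload(fingerprints, blocks):
    """Serving path: concatenate already-compressed blocks; no compression here."""
    return b"".join(blocks[fp] for fp in fingerprints)
```

Compressing per block rather than per image keeps the expensive compression off the serving path while still letting identical blocks map to the same stored block.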
21
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
22
use filesystem info: exclude unallocated sectors from the image, promote sequential disk writes, and process the “stream” of allocated sectors.
23
make dedup blocks from the sector stream via “fixed-size chunking” [figure: the allocated-sector stream 1 2 3 4 5 6 7 8 is cut into dedup blocks 1234 and 5678, which are stored in Venti]
24
sector allocations & frees move the dedup block boundaries, so fixed-size chunking over the sector stream leads to poor deduplication across disk images. [figure: the stream 1 2 3 4 5 6 7 8 yields blocks 1234 and 5678, but once some sectors are freed and a b c are allocated, the stream a b c 3 4 5 6 7 8 yields shifted blocks, so the shared sectors no longer form matching blocks]
25
aligned fixed-size chunking: block boundaries are based on sector offsets, and partially filled blocks are “padded” with zero sectors. [figure: with offset-aligned boundaries and zero-sector padding, the streams 1 2 3 4 5 6 7 8 9 and a b c 3 4 5 6 7 8 9 produce identical blocks for their shared sectors – those blocks deduplicate in Venti]
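A minimal sketch of aligned fixed-size chunking, under stated assumptions (512-byte sectors, 4 sectors per dedup block; illustrative only): blocks are cut at fixed sector-offset boundaries on disk, and offsets with no allocated sector are filled with zero sectors, so the same allocated sectors always land in the same block no matter what was allocated or freed around them.

```python
SECTOR_SIZE = 512          # assumption: bytes per sector
SECTORS_PER_BLOCK = 4      # assumption: dedup block = 4 aligned sectors

def aligned_fixed_size_chunks(allocated):
    """Aligned fixed-size chunking.

    `allocated` maps sector offset -> sector bytes (allocated sectors only).
    Yields (block_offset, block_bytes). Unallocated offsets inside a block are
    zero-padded, so block boundaries depend only on disk offsets, never on the
    allocation pattern of neighboring sectors.
    """
    if not allocated:
        return
    zero = b"\x00" * SECTOR_SIZE
    for base in range(0, max(allocated) + 1, SECTORS_PER_BLOCK):
        offsets = range(base, base + SECTORS_PER_BLOCK)
        if any(o in allocated for o in offsets):           # skip all-free blocks
            yield base, b"".join(allocated.get(o, zero) for o in offsets)
```

Two images that share, say, sectors 4-7 therefore emit byte-identical blocks at offset 4, even if one of them has extra sectors allocated just before or after that range.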
26
how big should dedup blocks be?
– small (say, 4K): better dedup (more likely to match); slower (more accesses to Venti); lower compression ratio (less data per block); more metadata per image
– big (say, 48K): lower dedup (less likely to match); faster (fewer accesses to Venti); higher compression ratio (more data per block); less metadata per image
27
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
28
pipeline: speed through parallelism. Choose the maximum storage benefit (i.e., the smallest dedup block size) that doesn’t slow down the pipeline. [figure: the serving pipeline becomes Venti read, net xfer, decompress, disk write]
29
[figure: of the candidate dedup block sizes evaluated, three slow the pipeline; 32K is selected]
30
image corpus @ 32K
– (compressed) image data: 239.89 GB
– (compressed) data in Venti: 73.62 GB
– deduplication ratio: 3.26
– image metadata: 1.49 GB
– total space savings versus Frisbee: 67.8%
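How these figures relate (a quick arithmetic check; the 233 GB baseline-Frisbee total is taken from the storage-savings result later in the talk):

```python
image_data   = 239.89   # GB: compressed image data in VF's chunk format
venti_data   = 73.62    # GB: deduplicated, compressed data stored in Venti
metadata     = 1.49     # GB: per-image metadata (recipes, chunk headers)
frisbee_base = 233.0    # GB: baseline Frisbee storage (from the later result slide)

dedup_ratio = image_data / venti_data                 # ~3.26
savings = 1 - (venti_data + metadata) / frisbee_base  # ~0.678, i.e. 67.8%
print(f"{dedup_ratio:.2f}  {savings:.1%}")
```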
31
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
32
independent chunks: a chunk is Frisbee’s network protocol unit – it contains multiple groups of sectors, and a client requests chunks until it has them all. [figure: image server with Venti, the Chunkmaker, and per-image metadata (chunk headers; fingerprints)]
33
independent chunks: serving a client’s request for a chunk
– find the precomputed chunk metadata: the chunk header and the dedup block fingerprints
– retrieve the dedup blocks from Venti
– concatenate the blocks with the header and transmit to the client
– cache the constructed chunk
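A rough sketch of that request path (ChunkServer, its fields, and the dict-backed stores are illustrative assumptions, not VF’s real interfaces):

```python
class ChunkMetadata:
    """Precomputed by the Chunkmaker: one chunk header plus the ordered
    fingerprints of the (compressed) dedup blocks making up this chunk."""
    def __init__(self, header, fingerprints):
        self.header = header              # bytes: Frisbee chunk header
        self.fingerprints = fingerprints  # list of dedup block fingerprints

class ChunkServer:
    def __init__(self, metadata, venti):
        self.metadata = metadata          # chunk id -> ChunkMetadata
        self.venti = venti                # fingerprint -> block bytes
        self.cache = {}                   # chunk id -> constructed chunk

    def serve_chunk(self, chunk_id):
        """Build (or reuse) a chunk and return the bytes to transmit."""
        if chunk_id in self.cache:        # reuse a previously constructed chunk
            return self.cache[chunk_id]
        meta = self.metadata[chunk_id]
        blocks = [self.venti[fp] for fp in meta.fingerprints]  # Venti lookups
        chunk = meta.header + b"".join(blocks)
        self.cache[chunk_id] = chunk
        return chunk
```

Caching the constructed chunk means later clients asking for the same chunk can be served without touching Venti again.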
34
evaluation: storage savings, synchronized deployment, staggered deployment
35
storage savings: load our image corpus into Venti – 430 Linux images, loaded from oldest to newest – and track storage as images are added: compressed, dedup’ed data in Venti vs. the storage required by “baseline Frisbee”.
36
[figure: storage as images are added – baseline Frisbee reaches 233 GB, VF reaches 75 GB, a 3× reduction]
37
disk image deployment setup: 1 Gbps switched LAN; a single server running “baseline Frisbee” or VF, configured to distribute data at 500 Mbps; up to 20 client machines – Dell PowerEdge R710s (see paper for specs).
38
synchronized deployment: deploy a single disk image to 1 client, to 8 clients that start at the same time, or to 16 clients that start at the same time; measure the time to deploy over 10 trials (image: 1.4 GB of uncompressed data).
39
[figure: synchronized deployment times – 2% increase in run time for VF]
40
staggered deployment: deploy a single disk image to 20 clients organized into 5 groups, with groups starting at 5-second intervals; measure the time to deploy over 10 trials.
41
[figure: staggered deployment times – 3% increase in run time for VF]
42
conclusions: VF combines deduplicating storage with a high-performance disk distribution system – 3× reduction in required storage, 2–3% run-time overhead. “don’t be the bottleneck” through careful design: obtain the dedup benefit with AFC, preserve existing optimizations.