1
Using Deduplicating Storage for Efficient Disk Image Deployment
Xing Lin, Mike Hibler, Eric Eide, Robert Ricci (University of Utah)
2
this talk: VF, which uses a deduplicating storage system within a fast disk-imaging system. Results: 3× decrease in storage, negligible run-time overhead. Techniques: “don’t be the bottleneck”, Aligned Fixed-size Chunking.
3
disk image server: images are loaded onto clients on demand, so the server must be fast – deliver data as fast as clients can receive it.
4
[figure: disk image server]
5
Utah Emulab: 1,000+ disk images, 21 TB total. Amazon EC2: 37,000+ public AMIs. Image deployment needs to be fast, and image storage needs to be compact.
6
deduplication: add a dedup. storage system to the image server.
7
deduplication: each image is stored as a small “recipe” of block fingerprints. Image 1: fingerprint 1; fingerprint 2; fingerprint 3; … Image 2: fingerprint 1; fingerprint 2; fingerprint 19; … Blocks that appear in multiple images (here, fingerprints 1 and 2) are stored only once.
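A minimal sketch of the fingerprint/recipe idea (not the Venti implementation; DedupStore and its methods are illustrative names): split data into fixed-size blocks, store each distinct block once under its fingerprint, and keep only the ordered fingerprint list as the image’s recipe.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: blocks are indexed by their fingerprint."""

    def __init__(self, block_size=32 * 1024):
        self.block_size = block_size
        self.blocks = {}                   # fingerprint -> block bytes, stored once

    def put_image(self, data):
        """Split an image into fixed-size blocks; return its small "recipe"."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha1(block).hexdigest()
            if fp not in self.blocks:      # a duplicate block is not stored again
                self.blocks[fp] = block
            recipe.append(fp)
        return recipe

    def get_image(self, recipe):
        """Rebuild an image by concatenating the blocks named in its recipe."""
        return b"".join(self.blocks[fp] for fp in recipe)
```

Two images that differ in only a few blocks share almost all of their stored blocks; each image costs only its recipe plus whatever blocks are actually new.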
8
dedup. for disk images: images are often derived from other images – users add packages to testbed “base” images, users snapshot their work in progress, … – so there is a lot of duplicated data across images!
9
[figure: disk image server]
10
disk image server + dedup. disk image storage. Problem: dedup. storage can be slow. Our contribution: add dedup. without slowing the system down.
11
why is frisbee fast?
– compression: lower network bandwidth, smaller files
– use filesystem info: fewer disk writes, sequential disk writes
– pipeline (disk read, net xfer, decompress, disk write): keep receiving, keep the disk busy, keep the pipeline filled
– independent “chunks”: new clients can join
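The client-side pipeline is the part that must never stall. A minimal sketch of the idea (illustrative only, not Frisbee’s code): run receive, decompress, and disk-write as separate threads connected by bounded queues, so the network, CPU, and disk are all busy at once.

```python
import queue
import threading
import zlib

def run_pipeline(compressed_chunks, write_block, depth=4):
    """Toy three-stage pipeline: receive -> decompress -> write.

    `compressed_chunks` is an iterable standing in for the network receiver;
    `write_block` stands in for a sequential disk write.
    """
    rx_q = queue.Queue(maxsize=depth)      # bounded queues keep stages in step
    wr_q = queue.Queue(maxsize=depth)
    DONE = object()

    def receiver():
        for chunk in compressed_chunks:
            rx_q.put(chunk)
        rx_q.put(DONE)

    def decompressor():
        while (chunk := rx_q.get()) is not DONE:
            wr_q.put(zlib.decompress(chunk))
        wr_q.put(DONE)

    def writer():
        while (block := wr_q.get()) is not DONE:
            write_block(block)

    threads = [threading.Thread(target=f) for f in (receiver, decompressor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```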
12
from frisbee to VF. Frisbee: disk images stored as files. VF: disk image data stored in Venti and reformed into chunks by the Chunkmaker. [Quinlan & Dorward, FAST ’02] [Rhea et al., ATC ’08]
13
image corpus: 430 Linux images from Utah Emulab – 76 “standard” images and 354 user-created images – based on RedHat, Fedora, CentOS, & Ubuntu.
14
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
15
compression: where should it happen? [figure: a partition is captured, stored in Venti, and later retrieved by the image server]
16
compression, option 1: compress the captured disk image and store the compressed image in Venti – poor deduplication (1.11×).
18
compression, option 2: store uncompressed disk data in Venti and compress at retrieval time – compression is too slow (compress: 30.29 MB/s vs. disk write: 71.07 MB/s).
20
compression, option 3 (ours): compress each dedup block and store the compressed dedup blocks in Venti – this preserves opportunities for dedup; the server retrieves & concatenates compressed blocks to form chunks; 6% more chunks vs. original Frisbee.
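A hedged sketch of this per-block scheme (illustrative names, not VF’s API): each dedup block is compressed on its own before being fingerprinted and stored, so identical blocks still deduplicate, and the serving path only has to concatenate blocks that are already compressed.

```python
import hashlib
import zlib

def store_image(data, blocks, block_size=32 * 1024):
    """Compress each dedup block individually, store it under its fingerprint.

    `blocks` is a dict standing in for the content-addressed store
    (fingerprint -> compressed block). Returns the image's recipe.
    """
    recipe = []
    for off in range(0, len(data), block_size):
        compressed = zlib.compress(data[off:off + block_size])
        fp = hashlib.sha1(compressed).hexdigest()
        blocks.setdefault(fp, compressed)   # identical blocks still dedup
        recipe.append(fp)
    return recipe

def build_chunk_payload(fingerprints, blocks):
    """Serving path: concatenate already-compressed blocks; no compression here."""
    return b"".join(blocks[fp] for fp in fingerprints)
```

Compressing per block rather than per image keeps the expensive compression off the serving path while still letting identical blocks map to the same stored block.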
21
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
22
use filesystem info: exclude unallocated sectors from the image, promote sequential disk writes, and process the “stream” of allocated sectors.
23
make dedup blocks from the sector stream via “fixed-size chunking” [figure: the allocated-sector stream 1 2 3 4 5 6 7 8 is cut into dedup blocks 1234 and 5678, which are stored in Venti]
24
sector allocations & frees move the dedup block boundaries, so fixed-size chunking over the sector stream leads to poor deduplication across disk images. [figure: the stream 1 2 3 4 5 6 7 8 yields blocks 1234 and 5678, but once some sectors are freed and a b c are allocated, the stream a b c 3 4 5 6 7 8 yields shifted blocks, so the shared sectors no longer form matching blocks]
25
aligned fixed-size chunking: block boundaries are based on sector offsets, and partially filled blocks are “padded” with zero sectors. [figure: with offset-aligned boundaries and zero-sector padding, the streams 1 2 3 4 5 6 7 8 9 and a b c 3 4 5 6 7 8 9 produce identical blocks for their shared sectors – those blocks deduplicate in Venti]
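A minimal sketch of aligned fixed-size chunking, under stated assumptions (512-byte sectors, 4 sectors per dedup block; illustrative only): blocks are cut at fixed sector-offset boundaries on disk, and offsets with no allocated sector are filled with zero sectors, so the same allocated sectors always land in the same block no matter what was allocated or freed around them.

```python
SECTOR_SIZE = 512          # assumption: bytes per sector
SECTORS_PER_BLOCK = 4      # assumption: dedup block = 4 aligned sectors

def aligned_fixed_size_chunks(allocated):
    """Aligned fixed-size chunking.

    `allocated` maps sector offset -> sector bytes (allocated sectors only).
    Yields (block_offset, block_bytes). Unallocated offsets inside a block are
    zero-padded, so block boundaries depend only on disk offsets, never on the
    allocation pattern of neighboring sectors.
    """
    if not allocated:
        return
    zero = b"\x00" * SECTOR_SIZE
    for base in range(0, max(allocated) + 1, SECTORS_PER_BLOCK):
        offsets = range(base, base + SECTORS_PER_BLOCK)
        if any(o in allocated for o in offsets):           # skip all-free blocks
            yield base, b"".join(allocated.get(o, zero) for o in offsets)
```

Two images that share, say, sectors 4-7 therefore emit byte-identical blocks at offset 4, even if one of them has extra sectors allocated just before or after that range.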
26
how big should dedup blocks be?
– small (say, 4K): better dedup (more likely to match); slower (more accesses to Venti); lower compression ratio (less data per block); more metadata per image
– big (say, 48K): lower dedup (less likely to match); faster (fewer accesses to Venti); higher compression ratio (more data per block); less metadata per image
27
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
28
pipeline: speed through parallelism. Choose the maximum storage benefit (i.e., the smallest dedup block size) that doesn’t slow down the pipeline. [figure: the serving pipeline becomes Venti read, net xfer, decompress, disk write]
29
[figure: of the candidate dedup block sizes evaluated, three slow the pipeline; 32K is selected]
30
image corpus @ 32K
– (compressed) image data: 239.89 GB
– (compressed) data in Venti: 73.62 GB
– deduplication ratio: 3.26
– image metadata: 1.49 GB
– total space savings versus Frisbee: 67.8%
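How these figures relate (a quick arithmetic check; the 233 GB baseline-Frisbee total is taken from the storage-savings result later in the talk):

```python
image_data   = 239.89   # GB: compressed image data in VF's chunk format
venti_data   = 73.62    # GB: deduplicated, compressed data stored in Venti
metadata     = 1.49     # GB: per-image metadata (recipes, chunk headers)
frisbee_base = 233.0    # GB: baseline Frisbee storage (from the later result slide)

dedup_ratio = image_data / venti_data                 # ~3.26
savings = 1 - (venti_data + metadata) / frisbee_base  # ~0.678, i.e. 67.8%
print(f"{dedup_ratio:.2f}  {savings:.1%}")
```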
31
addressing the challenges: compression, use filesystem info, pipeline, independent “chunks”
32
independent chunks: a chunk is Frisbee’s network protocol unit – it contains multiple groups of sectors, and a client requests chunks until it has them all. [figure: image server with Venti, the Chunkmaker, and per-image metadata (chunk headers; fingerprints)]
33
independent chunks: serving a client’s request for a chunk
– find the precomputed chunk metadata: the chunk header and the dedup block fingerprints
– retrieve the dedup blocks from Venti
– concatenate the blocks with the header and transmit to the client
– cache the constructed chunk
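A rough sketch of that request path (ChunkServer, its fields, and the dict-backed stores are illustrative assumptions, not VF’s real interfaces):

```python
class ChunkMetadata:
    """Precomputed by the Chunkmaker: one chunk header plus the ordered
    fingerprints of the (compressed) dedup blocks making up this chunk."""
    def __init__(self, header, fingerprints):
        self.header = header              # bytes: Frisbee chunk header
        self.fingerprints = fingerprints  # list of dedup block fingerprints

class ChunkServer:
    def __init__(self, metadata, venti):
        self.metadata = metadata          # chunk id -> ChunkMetadata
        self.venti = venti                # fingerprint -> block bytes
        self.cache = {}                   # chunk id -> constructed chunk

    def serve_chunk(self, chunk_id):
        """Build (or reuse) a chunk and return the bytes to transmit."""
        if chunk_id in self.cache:        # reuse a previously constructed chunk
            return self.cache[chunk_id]
        meta = self.metadata[chunk_id]
        blocks = [self.venti[fp] for fp in meta.fingerprints]  # Venti lookups
        chunk = meta.header + b"".join(blocks)
        self.cache[chunk_id] = chunk
        return chunk
```

Caching the constructed chunk means later clients asking for the same chunk can be served without touching Venti again.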
34
evaluation: storage savings, synchronized deployment, staggered deployment
35
storage savings: load our image corpus into Venti – 430 Linux images, loaded from oldest to newest – and track storage as images are added: compressed, dedup’ed data in Venti vs. the storage required by “baseline Frisbee”.
36
[figure: storage as images are added – baseline Frisbee reaches 233 GB, VF reaches 75 GB, a 3× reduction]
37
disk image deployment setup: 1 Gbps switched LAN; a single server running “baseline Frisbee” or VF, configured to distribute data at 500 Mbps; up to 20 client machines – Dell PowerEdge R710s (see paper for specs).
38
synchronized deployment: deploy a single disk image to 1 client, to 8 clients that start at the same time, or to 16 clients that start at the same time; measure the time to deploy over 10 trials (image: 1.4 GB of uncompressed data).
39
[figure: synchronized deployment times – 2% increase in run time for VF]
40
staggered deployment: deploy a single disk image to 20 clients organized into 5 groups, with groups starting at 5-second intervals; measure the time to deploy over 10 trials.
41
[figure: staggered deployment times – 3% increase in run time for VF]
42
conclusions: VF combines deduplicating storage with a high-performance disk distribution system – 3× reduction in required storage, 2–3% run-time overhead. “don’t be the bottleneck” through careful design: obtain the dedup benefit with AFC, preserve existing optimizations.