
1 Toward Achieving Tapeless Backup at PB Scales
Hakim Weatherspoon, University of California, Berkeley
Frontiers in Distributed Information Systems, San Francisco
Thursday, July 31, 2003

2 OceanStore Context: Ubiquitous Computing
Computing everywhere:
– Desktop, laptop, palmtop.
– Cars, cellphones.
– Shoes? Clothing? Walls?
Connectivity everywhere:
– Rapid growth of bandwidth in the interior of the net.
– Broadband to the home and office.
– Wireless technologies such as CDMA, satellite, laser.

3 Archival Storage
Where is persistent information stored?
– Want: geographic independence for availability, durability, and freedom to adapt to circumstances.
How is it protected?
– Want: encryption for privacy, secure naming and signatures for authenticity, and Byzantine commitment for integrity.
Is it available/durable?
– Want: redundancy with continuous repair and redistribution for long-term durability.

4 Path of an Update

5 Questions about Data
How do we use redundancy to protect against data loss?
How do we verify data?
How many resources are used to keep data durable? Storage? Bandwidth?

6 Archival Dissemination Built into Update
Erasure codes:
– Redundancy without the overhead of strict replication.
– Produce n fragments, where any m are sufficient to reconstruct the data (m < n). Rate r = m/n; storage overhead is 1/r.
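To make the "any m of n" property concrete, here is a minimal, self-contained sketch of an erasure code over a prime field (a Reed-Solomon-style toy written purely for illustration; it is not the coder used in OceanStore, and all names and parameters below are made up). The m data symbols define a polynomial of degree at most m - 1; evaluating it at n distinct points yields n fragments, and any m of them recover the data by Lagrange interpolation.

```python
# Toy erasure code: any m of the n fragments reconstruct the data.
# Illustrative sketch only; not the production coder from the talk.
P = 2**61 - 1  # prime modulus; all arithmetic is in the field Z_P


def _interpolate(points, x):
    """Evaluate, at x, the unique degree < len(points) polynomial
    passing through the given (xi, yi) points (Lagrange form, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return total


def encode(data, n):
    """Encode m data symbols into n fragments (m = len(data), m < n)."""
    pts = list(enumerate(data))  # polynomial passes through (i, data[i])
    return [(x, _interpolate(pts, x)) for x in range(n)]


def decode(fragments, m):
    """Recover the m data symbols from any m surviving fragments."""
    return [_interpolate(fragments, i) for i in range(m)]


if __name__ == "__main__":
    data = [7, 21, 1970, 3]                               # m = 4 symbols
    frags = encode(data, n=8)                             # r = 4/8, overhead 1/r = 2x
    survivors = [frags[1], frags[4], frags[6], frags[7]]  # any 4 of 8 suffice
    assert decode(survivors, m=4) == data
```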

7 Durability
Fraction of Blocks Lost Per Year (FBLPY)*:
– r = 1/4, erasure-encoded block (e.g., m = 16, n = 64).
– Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time.
– The n = 4 fragment case is equivalent to replication on four servers.
* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, in Proc. of IPTPS 2002.
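Under a deliberately simplified model where each of the n fragments independently survives a repair epoch with probability p (an illustrative assumption; the cited IPTPS paper's FBLPY analysis is more detailed), a block is lost only when fewer than m fragments survive. A short sketch of that binomial calculation, with made-up survival probabilities:

```python
from math import comb

def block_loss_probability(n, m, p_survive):
    """P[block lost] = P[fewer than m of n fragments survive an epoch],
    assuming independent fragment survival with probability p_survive.
    (Simplified illustrative model, not the exact FBLPY analysis cited above.)"""
    return sum(
        comb(n, k) * p_survive**k * (1 - p_survive)**(n - k)
        for k in range(m)
    )

# Same rate r = 1/4, same storage overhead, very different durability:
print(block_loss_probability(n=4,  m=1,  p_survive=0.9))   # 4-way replication
print(block_loss_probability(n=64, m=16, p_survive=0.9))   # 16-of-64 coding
```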

8 Naming and Verification Algorithm
Use a cryptographically secure hash algorithm to detect corrupted fragments.
Verification tree:
– n is the number of fragments.
– Store log(n) + 1 hashes with each fragment.
– Total of n*(log(n) + 1) hashes.
– Top hash is the block GUID (B-GUID).
– Fragments and blocks are self-verifying.
(Figure: the data hashes to Hd; fragments F1..F4 hash to H1..H4; H12 = hash of H1 and H2, H34 = hash of H3 and H4, H14 = hash of H12 and H34; the B-GUID is the top hash over H14 and Hd. Each fragment is stored with the sibling hashes on its path, e.g. Fragment 1 carries H2, H34, Hd, and the fragment data; the data block is stored with H14.)

9 Naming and Verification Algorithm
(Animation step of the previous slide's figure: the verification tree is built bottom-up from the fragment hashes, and each fragment stores the sibling hashes needed to recompute the B-GUID, e.g. Fragment 1 carries H2, H34, Hd, and the fragment data.)
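A minimal sketch of this self-verifying layout for n = 4 fragments, mirroring the figure. The hash function, byte layout, and helper names are illustrative choices, not the system's actual on-disk format:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Cryptographically secure hash (SHA-256 here, purely illustrative)."""
    return hashlib.sha256(b"".join(parts)).digest()

def build_tree(fragments, data):
    """Build the verification tree for n = 4 fragments and return
    (b_guid, per-fragment sibling hashes), as in the slide's figure."""
    f1, f2, f3, f4 = fragments
    hd = h(data)
    h1, h2, h3, h4 = h(f1), h(f2), h(f3), h(f4)
    h12, h34 = h(h1, h2), h(h3, h4)
    h14 = h(h12, h34)
    b_guid = h(h14, hd)                      # top hash names the block
    siblings = [                             # log(n)+1 hashes stored per fragment
        (h2, h34, hd), (h1, h34, hd),
        (h4, h12, hd), (h3, h12, hd),
    ]
    return b_guid, siblings

def verify_fragment1(f1, sib, b_guid):
    """Recompute the root from fragment 1 and its stored sibling hashes."""
    h2, h34, hd = sib
    h12 = h(h(f1), h2)
    return h(h(h12, h34), hd) == b_guid

# Usage sketch with made-up fragment contents.
frags = [b"frag-1", b"frag-2", b"frag-3", b"frag-4"]
b_guid, sibs = build_tree(frags, data=b"original block data")
assert verify_fragment1(frags[0], sibs[0], b_guid)
assert not verify_fragment1(b"corrupted", sibs[0], b_guid)
```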

10 Enabling Technology
(Figure: a GUID-named block and its encoded fragments.)

11 Complex Objects I
(Figure: a data block d is the unit of coding; its verification tree yields the GUID of d, and the encoded fragments are the unit of archival storage.)

12 Complex Objects II
(Figure: data blocks d1..d9 are each coded and fragmented as on the previous slide; indirect blocks form a B-tree over the data blocks, rooted at a block M whose GUID is the version GUID, VGUID.)

13 Complex Objects III
(Figure: version i (VGUID_i) and version i+1 (VGUID_i+1) share unchanged data blocks; modified blocks d'8 and d'9 are written copy-on-write under a new root M, which keeps a backpointer to the previous version. AGUID = hash{name + keys} permanently names the object across versions.)
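A minimal sketch of this naming scheme (the hash choice, key format, and Version class are illustrative assumptions, not the system's actual data structures): the AGUID is derived from the object name and the owner's key, while each read-only version is named by a VGUID over its root block and carries a backpointer to its predecessor.

```python
import hashlib

def sha1(*parts: bytes) -> bytes:
    return hashlib.sha1(b"".join(parts)).digest()

def aguid(name: str, owner_key: bytes) -> bytes:
    """AGUID = hash{name + keys}: the permanent, self-certifying object name."""
    return sha1(name.encode(), owner_key)

class Version:
    """One read-only version: the VGUID names the root block; copy-on-write
    versions chain together through a backpointer (illustrative sketch)."""
    def __init__(self, root_block: bytes, prev: "Version | None" = None):
        self.prev = prev
        backptr = prev.vguid if prev else b""
        self.vguid = sha1(root_block, backptr)

# Usage sketch: two versions of the same object share one AGUID.
a = aguid("home/alice/mail", owner_key=b"<alice public key>")
v1 = Version(b"<root block M, version i>")
v2 = Version(b"<root block M', version i+1>", prev=v1)   # copy on write
assert v2.prev.vguid == v1.vguid
```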

14 Mutable Data
Need mutable data for a real system:
– An entity in the network maintains the A-GUID to V-GUID mapping.
– Byzantine commitment for integrity: verifies client privileges, creates a serial order, and atomically applies each update.
Versioning system:
– Each version is inherently read-only.

15 Deployment
PlanetLab global network:
– 98 machines at 42 institutions in North America, Europe, Asia, and Australia.
– 1.26 GHz PIII (1 GB RAM), 1.8 GHz PIV (2 GB RAM).
– North American machines (2/3) are on Internet2.

16 Deployment
Deployed the storage system in November 2002:
– ~50 physical machines.
– 100 virtual nodes: 3 clients, 93 storage servers, 1 archiver, 1 monitor.
– Supports the OceanStore API: NFS, IMAP, etc.
– Fault injection.
– Fault detection and repair.

17 Performance
Performance of the archival layer:
– Performance of an OceanStore server in archiving objects.
– Analyze the operations of archiving data (this includes signing updates in a BFT protocol).
– Two configurations: no archiving, and synchronous archiving (m = 16, n = 32).
Experiment environment:
– OceanStore servers were analyzed on a 42-node cluster.
– Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs, 1.5 GB ECC PC133 SDRAM, and two 36 GB IBM UltraStar 36LZX hard drives.
– The machines use a single Intel PRO/1000 XF gigabit Ethernet adaptor to connect to a Packet Engines gigabit switch, and run a Linux 2.4.17 SMP kernel.

18 Performance: Throughput
Data throughput:
– No archive: 8 MB/s.
– Archive: 2.8 MB/s.

19 Performance: Latency
Latency:
– Fragmentation: y-intercept 3 ms, slope 0.3 s/MB.
– Archive latency = no-archive latency + fragmentation latency.
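Read as a linear model, the slide's numbers give fragmentation latency of roughly 3 ms + 0.3 s/MB times the update size. A tiny sketch of that model (purely a restatement of the slide's fit, with illustrative inputs):

```python
def fragmentation_latency_ms(update_mb, intercept_ms=3.0, slope_s_per_mb=0.3):
    """Fragmentation latency from the slide's linear fit: 3 ms + 0.3 s/MB."""
    return intercept_ms + slope_s_per_mb * 1000.0 * update_mb

def archive_latency_ms(no_archive_ms, update_mb):
    """Archive latency = no-archive latency + fragmentation latency."""
    return no_archive_ms + fragmentation_latency_ms(update_mb)

# Example: a 2 MB update adds roughly 3 + 600 = 603 ms of fragmentation time.
print(fragmentation_latency_ms(2.0))
```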

20 Closer Look: Update Latency
The threshold signature dominates small-update latency:
– Common RSA tricks are not applicable.
– Batch updates to amortize the signature cost.
– Tentative updates hide latency.

Update latency (ms):
Key Size  Update Size  5% Time  Median Time  95% Time
512b      4 kB         39       40           41
512b      2 MB         1037     1086         1348
1024b     4 kB         98       99           100
1024b     2 MB         1098     1150         1448

Latency breakdown:
Phase      Time (ms)
Check      0.3
Serialize  6.1
Apply      1.5
Archive    4.5
Sign       77.8
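The breakdown shows the threshold signature (about 78 ms of a roughly 90 ms total) dominating small updates, which is why batching helps. A small illustrative calculation of the amortized per-update cost under the simple assumption that one signature covers a whole batch while the other phases are paid per update (batch sizes below are made up):

```python
# Per-update phase costs from the slide's breakdown, in milliseconds.
PHASES = {"check": 0.3, "serialize": 6.1, "apply": 1.5, "archive": 4.5}
SIGN_MS = 77.8  # one threshold signature can cover a whole batch

def amortized_update_ms(batch_size: int) -> float:
    """Per-update latency when the signature cost is amortized over a batch
    (illustrative model only)."""
    return sum(PHASES.values()) + SIGN_MS / batch_size

for batch in (1, 4, 16, 64):
    print(batch, round(amortized_update_ms(batch), 1))
```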

21 Current Situation
Stabilized the routing layer under churn and extraordinary circumstances.
NSF infrastructure grant:
– Deploy the code as a service for Berkeley.
– Target 1/3 PB.
Future collaborations:
– CMU for a PB store.
– Internet Archive?

22 Conclusion
A storage-efficient, self-verifying mechanism:
– Erasure codes are good.
Self-verifying data assist in:
– Secure read-only data.
– Secure caching infrastructures.
– Continuous adaptation and repair.
For more information: http://oceanstore.cs.berkeley.edu/
Papers:
– Pond: the OceanStore Prototype
– Naming and Integrity: Self-Verifying Data in P2P Systems

