Slide 1: Toward Achieving Tapeless Backup at PB Scales
Hakim Weatherspoon, University of California, Berkeley
Frontiers in Distributed Information Systems, San Francisco. Thursday, July 31, 2003
Slide 2: OceanStore Context: Ubiquitous Computing
Computing everywhere:
–Desktop, laptop, palmtop.
–Cars, cellphones.
–Shoes? Clothing? Walls?
Connectivity everywhere:
–Rapid growth of bandwidth in the interior of the net.
–Broadband to the home and office.
–Wireless technologies such as CDMA, satellite, laser.
Slide 3: Archival Storage
Where is persistent information stored?
–Want: geographic independence for availability, durability, and freedom to adapt to circumstances.
How is it protected?
–Want: encryption for privacy, secure naming and signatures for authenticity, and Byzantine commitment for integrity.
Is it available/durable?
–Want: redundancy with continuous repair and redistribution for long-term durability.
Slide 4: Path of an Update
[Figure: the path of an update.]
Slide 5: Questions about Data
–How do we use redundancy to protect against data loss?
–How do we verify data?
–How many resources (storage, bandwidth) are needed to keep data durable?
Slide 6: Archival Dissemination Built into Update
Erasure codes:
–Redundancy without the overhead of strict replication.
–Produce n fragments, of which any m are sufficient to reconstruct the data (m < n).
–Rate r = m/n; storage overhead is 1/r.
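A minimal sketch of the storage arithmetic above, assuming nothing about the coder itself: an (n, m) erasure code has rate r = m/n, and the stored bytes are 1/r times the original data.

```python
def erasure_overhead(m: int, n: int) -> float:
    """Storage overhead of an (n, m) erasure code relative to the raw data size."""
    assert 0 < m < n
    rate = m / n          # r = m/n
    return 1 / rate       # overhead = 1/r

if __name__ == "__main__":
    # Parameter pairs mentioned in the talk: any 16 of 32 (or 64) fragments suffice.
    for m, n in [(16, 32), (16, 64)]:
        print(f"m={m}, n={n}: rate={m/n:.2f}, storage overhead={erasure_overhead(m, n):.0f}x")
```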
Slide 7: Durability
Fraction of Blocks Lost Per Year (FBLPY)*:
–r = 1/4, erasure-encoded block (e.g., m = 16, n = 64).
–Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time.
–The n = 4 fragment case is equivalent to replication on four servers.
* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, in Proc. of IPTPS 2002.
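To see why more fragments at the same rate help, here is a back-of-the-envelope sketch under a simplified independent-failure model (not the FBLPY analysis of the cited paper): a block is lost only if fewer than m of its n fragments survive, and the hypothetical per-fragment survival probability is a free parameter.

```python
from math import comb

def block_loss_probability(m: int, n: int, p_survive: float) -> float:
    """P(block lost) = P(fewer than m of n fragments survive),
    assuming independent fragment survival with probability p_survive."""
    return sum(comb(n, k) * p_survive**k * (1 - p_survive)**(n - k)
               for k in range(m))

if __name__ == "__main__":
    p = 0.9  # hypothetical per-fragment survival probability over one repair period
    # Same rate r = 1/4, different fragment counts:
    print(block_loss_probability(1, 4, p))     # n = 4 case: plain 4-way replication
    print(block_loss_probability(16, 64, p))   # m = 16, n = 64: vastly smaller loss probability
```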
Slide 8: Naming and Verification Algorithm
Use a cryptographically secure hash algorithm to detect corrupted fragments.
Verification tree:
–n is the number of fragments.
–Store log(n) + 1 hashes with each fragment, for a total of n*(log(n) + 1) hashes.
–The top hash is the block GUID (B-GUID).
–Fragments and blocks are self-verifying.
[Figure: verification tree over the data (hash Hd) and fragments F1-F4, with leaf hashes H1-H4, interior hashes H12 and H34, and root H14 feeding the B-GUID. Fragment 1, for example, is stored together with H2, H34, and Hd.]
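A sketch of one plausible construction of this tree for n = 4 fragments, assuming SHA-1 purely for illustration and assuming the B-GUID is the hash of the fragment-tree root concatenated with the data hash Hd (the exact hash and concatenation order in OceanStore may differ). Each fragment ships with its log(n) + 1 = 3 sibling hashes, so any holder of the B-GUID can verify it.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """SHA-1 used here only for illustration."""
    return hashlib.sha1(b"".join(parts)).digest()

def build_tree(fragments, data):
    """Build the 2-level hash tree for n = 4 fragments; return B-GUID and per-fragment proofs."""
    H1, H2, H3, H4 = (h(f) for f in fragments)
    H12, H34 = h(H1, H2), h(H3, H4)
    H14 = h(H12, H34)
    Hd = h(data)
    bguid = h(H14, Hd)                    # top hash names the block
    proofs = [                            # log(n) + 1 = 3 hashes stored with each fragment
        (H2, H34, Hd), (H1, H34, Hd),
        (H4, H12, Hd), (H3, H12, Hd),
    ]
    return bguid, proofs

def verify_fragment(index, fragment, proof, bguid):
    """Recompute the path from one fragment up to the top hash and compare with the B-GUID."""
    sibling, uncle, Hd = proof
    leaf = h(fragment)
    pair = (leaf, sibling) if index % 2 == 0 else (sibling, leaf)
    inner = h(*pair)
    half = (inner, uncle) if index < 2 else (uncle, inner)
    return h(h(*half), Hd) == bguid

frags = [b"F1", b"F2", b"F3", b"F4"]
bguid, proofs = build_tree(frags, b"original data")
assert verify_fragment(0, frags[0], proofs[0], bguid)
assert not verify_fragment(0, b"corrupted", proofs[0], bguid)
```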
Slide 10: Enabling Technology
[Figure: GUID and fragments.]
Slide 11: Complex Objects I
[Figure: a single data block is the unit of coding; its verification tree yields the GUID of d, and the encoded fragments are the unit of archival storage.]
Slide 12: Complex Objects II
[Figure: a data object as a B-tree of blocks. Data blocks d1-d9 hang off indirect blocks and a root block M named by a VGUID; each block is the unit of coding, with its own GUID, verification tree, and encoded fragments as the unit of archival storage.]
Slide 13: Complex Objects III
[Figure: two versions of an object, VGUID_i and VGUID_i+1. The new version copies on write only the changed data blocks (d'8, d'9) and the indirect blocks above them, keeps a backpointer to the previous root M, and the object as a whole is named by AGUID = hash{name + keys}.]
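A rough sketch of the copy-on-write idea in the figure, using hypothetical names rather than OceanStore's actual data structures: every version is a set of immutable, hash-named blocks, so a new version reuses the GUIDs of unchanged blocks and only re-encodes the blocks that changed, while the AGUID is derived from the object name and owner key.

```python
import hashlib

def guid(payload: bytes) -> str:
    return hashlib.sha1(payload).hexdigest()

class Version:
    """One read-only version: a map from block index to immutable block GUID."""
    def __init__(self, blocks, prev=None):
        self.blocks = dict(blocks)      # index -> block GUID
        self.prev = prev                # backpointer to the previous version's VGUID
        self.vguid = guid(repr(sorted(self.blocks.items())).encode())

def aguid(name: str, owner_key: bytes) -> str:
    """AGUID = hash{name + keys}, as on the slide."""
    return guid(name.encode() + owner_key)

def copy_on_write(old: Version, changed: dict) -> Version:
    """A new version shares every unchanged block GUID with the old one."""
    blocks = dict(old.blocks)
    blocks.update({i: guid(data) for i, data in changed.items()})
    return Version(blocks, prev=old.vguid)

v_i = Version({i: guid(b"d%d" % i) for i in range(1, 10)})
v_next = copy_on_write(v_i, {8: b"d8'", 9: b"d9'"})   # only d8 and d9 change
assert v_i.blocks[1] == v_next.blocks[1] and v_i.blocks[8] != v_next.blocks[8]
```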
Slide 14: Mutable Data
Need mutable data for a real system:
–An entity in the network maintains the A-GUID to V-GUID mapping.
–Byzantine commitment for integrity: verifies client privileges, creates a serial order, and atomically applies updates.
Versioning system:
–Each version is inherently read-only.
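A hedged sketch of that update path (a toy structure that omits the Byzantine agreement entirely; names are hypothetical): check privileges, append the update to a serial order, then atomically advance the A-GUID to the new read-only version's V-GUID.

```python
class Serializer:
    """Toy stand-in for the serializing entity: orders updates and tracks A-GUID -> V-GUID."""
    def __init__(self):
        self.latest = {}                       # A-GUID -> current V-GUID
        self.log = []                          # the serial order of committed updates

    def commit(self, a_guid, update, new_vguid, is_authorized):
        if not is_authorized(a_guid, update):  # verify client privileges
            raise PermissionError("update rejected")
        self.log.append((a_guid, update))      # create a serial order
        self.latest[a_guid] = new_vguid        # atomically advance the mapping
        return self.latest[a_guid]
```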
Slide 15: Deployment
PlanetLab global network:
–98 machines at 42 institutions, in North America, Europe, Asia, and Australia.
–1.26 GHz PIII (1 GB RAM) and 1.8 GHz PIV (2 GB RAM) machines.
–Two-thirds of the machines (those in North America) are on Internet2.
Slide 16: Deployment
Deployed the storage system in November 2002:
–~50 physical machines, 100 virtual nodes (3 clients, 93 storage servers, 1 archiver, 1 monitor).
–Supports the OceanStore API (NFS, IMAP, etc.).
–Fault injection.
–Fault detection and repair.
Slide 17: Performance
Performance of the archival layer:
–Performance of an OceanStore server archiving objects.
–Analyzes the operations of archiving data (including signing updates in the BFT protocol).
–Configurations: no archiving, and synchronous archiving (m = 16, n = 32).
Experiment environment:
–OceanStore servers were analyzed on a 42-node cluster.
–Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs, 1.5 GB ECC PC133 SDRAM, and two 36 GB IBM UltraStar 36LZX hard drives.
–The machines use a single Intel PRO/1000 XF gigabit Ethernet adaptor to connect to a Packet Engines switch, and run a Linux 2.4.17 SMP kernel.
Slide 18: Performance: Throughput
Data throughput:
–No archive: 8 MB/s.
–Archive: 2.8 MB/s.
Slide 19: Performance: Latency
Latency:
–Fragmentation: y-intercept 3 ms, slope 0.3 s/MB.
–Archive latency = no-archive latency + fragmentation latency.
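The linear model from the slide, written out as a sketch (the no-archive latency is taken as a separately measured input):

```python
def fragmentation_latency_ms(update_size_mb: float) -> float:
    """Linear fit from the slide: 3 ms y-intercept, 0.3 s/MB (i.e., 300 ms/MB) slope."""
    return 3.0 + 300.0 * update_size_mb

def archive_latency_ms(no_archive_ms: float, update_size_mb: float) -> float:
    """Archive latency = no-archive latency + fragmentation latency."""
    return no_archive_ms + fragmentation_latency_ms(update_size_mb)

# e.g. fragmenting a 2 MB update adds roughly 3 + 600 = 603 ms
print(fragmentation_latency_ms(2.0))
```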
Slide 20: Closer Look: Update Latency
The threshold signature dominates small-update latency:
–Common RSA tricks are not applicable.
–Batch updates to amortize signature cost.
–Tentative updates hide latency.

Update latency (ms):
Key size   Update size   5% time   Median   95% time
512 b      4 kB          39        40       41
512 b      2 MB          1037      1086     1348
1024 b     4 kB          98        99       100
1024 b     2 MB          1098      1150     1448

Latency breakdown:
Phase      Time (ms)
Check      0.3
Serialize  6.1
Apply      1.5
Archive    4.5
Sign       77.8
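A quick arithmetic sketch of the batching argument, using the breakdown above: signing (77.8 ms) dwarfs the other phases (~12.4 ms), so issuing one signature per batch of k updates amortizes the dominant cost. This sketch assumes, for simplicity, that the other phases still run once per update.

```python
SIGN_MS = 77.8
OTHER_MS = 0.3 + 6.1 + 1.5 + 4.5   # check + serialize + apply + archive = 12.4 ms

def per_update_latency_ms(batch_size: int) -> float:
    """One threshold signature per batch; other phases assumed to run per update."""
    return OTHER_MS + SIGN_MS / batch_size

for k in (1, 4, 16):
    print(f"batch of {k}: ~{per_update_latency_ms(k):.1f} ms per update")
```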
Slide 21: Current Situation
Stabilized routing layer under churn and extraordinary circumstances.
NSF infrastructure grant:
–Deploy code as a service for Berkeley.
–Target 1/3 PB.
Future collaborations:
–CMU for a PB store.
–Internet Archive?
Slide 22: Conclusion
Storage-efficient, self-verifying mechanism:
–Erasure codes are good.
Self-verifying data assist in:
–Secure read-only data.
–Secure caching infrastructures.
–Continuous adaptation and repair.
For more information: http://oceanstore.cs.berkeley.edu/
Papers: Pond: the OceanStore Prototype; Naming and Integrity: Self-Verifying Data in P2P Systems.