OceanStore: An Architecture for Global-Scale Persistent Storage
Authors: J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao
University of California, Berkeley
http://oceanstore.cs.berkeley.edu
Presentation Overview
- Purpose and Vision of OceanStore
- Data Location and Routing
- Deep Archival Storage
- Current Status
Applications for Persistent Storage
- Storage for ubiquitous computing
- Need for transparency
- Large, inexpensive memory allows for this
- Personal Information Management tools: calendars, contact lists, etc.
- E-mail
- Need consistency
- Need privacy and security
- Repositories, digital libraries
OceanStore Goals
OceanStore will provide persistent storage for ubiquitous computing.
- Consistent
- Highly available
- Durable
- Information divorced from location
Unique goals:
- Levels of trusted and untrusted servers
- Nomadic data
Data Location and Routing
- Routing is location independent: objects are addressed by GUID.
- A randomized hierarchical distributed data structure (Plaxton et al.) tracks the location of objects.
- Routing is tiered:
  - Local routing is probabilistic.
  - The backup is a highly redundant randomized hierarchical distributed data structure.
Probabilistic Routing
- Attenuated Bloom filters
- Multiple hashes on the same data
- Can give a false-positive answer
[Figure: GUID 4356 and GUID 7382 hashed into bit positions of a filter; hash values shown: Hash1(x) = 0, Hash2(x) = 3, Hash3(x) = 4, Hash1(x) = 2]
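The Bloom filter idea on this slide can be sketched as follows. This is a minimal illustration, not OceanStore's implementation: the filter width, the number of hash functions, and the use of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit vector probed by several hash functions (sizes are assumed)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
bf.add("GUID 4356")
bf.add("GUID 7382")
print(bf.might_contain("GUID 4356"))  # always True: no false negatives
```

A lookup can never miss an item that was added, but an item that was never added may still test positive if other items happen to have set all of its bit positions. That is the false-positive case the slide mentions.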
Attenuated Bloom Filters
- The union (bitwise OR) of neighbor-node filters yields a single filter summarizing those neighbors' objects.
- Cheap and easy
- Probabilistic
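One way to picture an attenuated Bloom filter is as a short list of filters per outgoing link, where the filter at depth i is the union of the filters of nodes i hops away through that link. The sketch below assumes that reading. The topology, the GUIDs 1111 and 9999, the filter sizes, and the helper names are hypothetical and chosen only for illustration.

```python
import hashlib

NUM_BITS = 64    # filter width (assumed)
NUM_HASHES = 3   # hash functions per item (assumed)


def bloom_bits(item):
    """Return an int bitmask with the item's hash positions set."""
    mask = 0
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % NUM_BITS)
    return mask


def make_filter(items):
    """Bloom filter (as an int) holding every item in `items`."""
    bits = 0
    for item in items:
        bits |= bloom_bits(item)
    return bits


def matches(filter_bits, item):
    """True if every bit the item hashes to is set in the filter."""
    probe = bloom_bits(item)
    return (filter_bits & probe) == probe


def next_hop(link_filters, item):
    """Pick the link whose filter matches at the shallowest depth.

    Returns (link, hops) or None, in which case a caller would fall back
    to the slower global location mechanism.
    """
    depth_count = max(len(levels) for levels in link_filters.values())
    for depth in range(depth_count):
        for link, levels in link_filters.items():
            if depth < len(levels) and matches(levels[depth], item):
                return link, depth + 1
    return None


# Hypothetical view from node A: A links to B and C; B links on to D and E.
stored = {"B": ["GUID 4356"], "C": ["GUID 1111"],
          "D": ["GUID 7382"], "E": ["GUID 9999"]}
links = {
    # depth 1: B itself; depth 2: union (bitwise OR) of D's and E's filters
    "B": [make_filter(stored["B"]),
          make_filter(stored["D"]) | make_filter(stored["E"])],
    "C": [make_filter(stored["C"]), make_filter([])],
}

print(next_hop(links, "GUID 4356"))
print(next_hop(links, "GUID 7382"))
```

Because each depth is only a union of bit vectors, the structure is cheap to maintain, and the answer is probabilistic: a false positive may send a query down a link that does not lead to the object.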
Wide-Scale Data Location
- Bits in an object's GUID become node IDs in a random hierarchical tree.
- Each link in the tree is graded by how much of the node ID matches:
  - L1 = no match
  - L2 = LSB match
- Every level on a node has 16 links, each to the closest matching node by ping time.
Random Trees
- Roots occur where the highest-level links occur.
- The desired node ID is found by traversing links of equal or greater level that carry the desired bit strings.
- Only a disjoint (partitioned) network prevents object location.
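The level-by-level traversal can be sketched as routing that fixes one more low-order hex digit of the target ID at every hop. This is a simplified illustration under several assumptions: node IDs are four hex digits, every node's link table is built from global knowledge of a small hypothetical node set, and the first matching node is kept rather than the closest one by ping time. Levels are counted from 0 here, so level 0 corresponds to L1 (no match) on the slide.

```python
def shared_suffix_len(a, b):
    """Number of low-order digits on which two equal-length IDs agree."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def build_links(node, all_nodes):
    """links[level][digit] -> a node that matches `node` in its `level`
    low-order digits and has `digit` as its next digit.

    With hex digits each level holds up to 16 links. A real system would
    keep the closest candidate by latency; this sketch keeps the first.
    """
    links = []
    for level in range(len(node)):
        row = {}
        for other in all_nodes:
            if shared_suffix_len(node, other) >= level:
                row.setdefault(other[-1 - level], other)
        links.append(row)
    return links


def route(start, target, tables):
    """Follow links that match one more low-order digit of `target` per hop."""
    path = [start]
    current = start
    while current != target:
        level = shared_suffix_len(current, target)
        nxt = tables[current][level].get(target[-1 - level])
        if nxt is None:
            return path, False  # no such link; not handled in this sketch
        path.append(nxt)
        current = nxt
    return path, True


# Hypothetical node IDs, chosen only for illustration.
nodes = ["0325", "B4F8", "9098", "7598", "4598"]
tables = {n: build_links(n, nodes) for n in nodes}

print(route("0325", "4598", tables))
# (['0325', 'B4F8', '9098', '7598', '4598'], True)
```

Each hop strictly increases the number of matching low-order digits, so a route takes at most as many hops as there are digits in an ID.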
Example
Deep Archival Storage
- Assumed uncorrelated faults
- Highly redundant fragments
- Intelligently distributed to both trusted and untrusted systems
Erasure Codes
- Reed-Solomon codes
- Transform n fragments into 2n or 4n fragments
- Any n fragments from the larger set suffice to reconstruct the data carried by the original n fragments.
- Expensive code calculations
[Figure: Using Erasure Codes; data blocks B1 B2 B3 B4 and coded blocks P1 P2 P3 P4]
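The "any n of 2n" property can be demonstrated with a small Reed-Solomon-style code built on polynomial interpolation. This is a sketch under simplifying assumptions, not the coding used by OceanStore: it works over the prime field GF(257) instead of the GF(2^8) arithmetic that production codes typically use, and the function names are made up for the example.

```python
from itertools import combinations

P = 257  # prime modulus (assumed for simplicity; symbols are 0..256)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, total_fragments):
    """Treat data[i] as the polynomial's value at x = i, then sample more points.

    Fragments 0..n-1 are the data itself; the rest are redundant fragments.
    """
    points = list(enumerate(data))
    return [(x, lagrange_eval(points, x)) for x in range(total_fragments)]


def decode(fragments, n):
    """Rebuild the n data symbols from any n distinct (x, y) fragments."""
    points = fragments[:n]
    return [lagrange_eval(points, x) for x in range(n)]


data = [66, 49, 50, 51]               # n = 4 data symbols
fragments = encode(data, 2 * len(data))  # 2n = 8 fragments

# Every choice of 4 surviving fragments recovers the original data.
print(all(decode(list(subset), 4) == data
          for subset in combinations(fragments, 4)))  # True
```

The nested loops in `lagrange_eval` hint at why the slide calls the code calculations expensive: naive encoding and decoding cost grows quadratically with the number of fragments.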
Smaller Example
Using erasure codes is similar to using parity bits in strings of bits.

  b0 b1 b2 b3  p
   1  0  1  1  1    parity: (1+0+1+1) % 2 = 3 % 2 = 1
   1  ?  1  1  1    recovery: (1+1+1+1) % 2 = 4 % 2 = 0, so the missing bit is 0
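The parity example can be reproduced in a few lines. This is a minimal sketch in which the function names are hypothetical and a lost bit is represented by `None`.

```python
def parity(bits):
    """Even-parity bit: 1 if the number of 1s is odd, else 0."""
    return sum(bits) % 2


def recover_missing(word):
    """Fill in the single missing bit (marked None) of data bits plus parity.

    With even parity, all bits including the parity bit sum to an even
    number, so the missing bit equals the sum of the known bits modulo 2.
    """
    missing = word.index(None)
    known_sum = sum(b for b in word if b is not None)
    repaired = list(word)
    repaired[missing] = known_sum % 2
    return repaired


data = [1, 0, 1, 1]
p = parity(data)                 # (1+0+1+1) % 2 = 1
received = [1, None, 1, 1, p]    # b1 was lost in transit
print(recover_missing(received))  # [1, 0, 1, 1, 1]
```

A single parity bit tolerates the loss of only one bit; erasure codes such as Reed-Solomon generalize the idea so that many fragments can be lost.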
Current State
- Pond: a prototype system
- Tapestry: infrastructure for fault-resilient, decentralized location and routing
- Fast becoming a reality
Questions and Comments