Published by Ann Justina Robinson. Modified over 9 years ago.
1
Pond: the OceanStore Prototype
Sean Rhea, Patrick Eaton, Dennis Geels, Hakim Weatherspoon, Ben Zhao, and John Kubiatowicz
University of California, Berkeley
Proc. of the 2nd USENIX Conf. on File and Storage Technologies (FAST '03)
Presented by Park, Seon-Yeong
2
2/26 Ubiquitous Computing
[Figure labels: Telephone, SPO, Watch, PDA, Cell Phone, Digital TV, PC, Storage Pool]
3
3/26 OceanStore Overview
Internet-scale, Cooperative File System
Applications
–Calendars, Email, Contact Lists, Large Digital Libraries, Repositories for Scientific Data, Distributed Design Tools, etc.
Requirements
–Universal Availability
–Durability
–Understandable Consistency Model
–Privacy vs. Information Sharing
4
4/26 Data Model (1/2)
Data Object
Analogous to a File in a Traditional File System
Named by an Active Globally-Unique Identifier (AGUID)
–Location Independent
–Prevents Name Space Collisions
AGUID = SHA-1(Application-specified Name + Owner's Public Key)
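A minimal sketch of the AGUID derivation on this slide, assuming the name and key are simply concatenated as bytes before hashing (the slide does not give the exact encoding). The function name `make_aguid` and the key values are hypothetical.

```python
import hashlib


def make_aguid(name: str, owner_public_key: bytes) -> str:
    """Hash the application-specified name together with the owner's public key.

    Assumption: plain concatenation of UTF-8 name bytes and raw key bytes.
    """
    h = hashlib.sha1()
    h.update(name.encode("utf-8"))
    h.update(owner_public_key)
    return h.hexdigest()


# Hypothetical placeholder keys, not real public keys.
alice_key = b"alice-public-key-bytes"
bob_key = b"bob-public-key-bytes"

# The same application-specified name under two owners yields two AGUIDs,
# which is how including the key avoids name space collisions.
aguid_alice = make_aguid("calendar", alice_key)
aguid_bob = make_aguid("calendar", bob_key)
```

Because the identifier depends only on the name and the key, it says nothing about where the object is stored, which matches the "Location Independent" property.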
5
5/26 Data Model (2/2)
Data Object
Sequence of Read-only Versions
Block Reference
–Cryptographically-secure Hash of the Child Block's Contents
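The block-reference idea can be sketched as a one-level hash tree. This is an illustration under assumptions: SHA-1 is used as the secure hash, a version has a single root block that just lists the hashes of its data blocks, and the names `block_ref` and `build_version` are hypothetical.

```python
import hashlib
from typing import List, Tuple


def block_ref(contents: bytes) -> bytes:
    """Reference to a block: a secure hash of the block's contents."""
    return hashlib.sha1(contents).digest()


def build_version(data_blocks: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (root_block, root_ref) for a one-level tree over data_blocks.

    The root block is the concatenation of its children's references,
    so the root reference changes whenever any child changes.
    """
    root_block = b"".join(block_ref(b) for b in data_blocks)
    return root_block, block_ref(root_block)


root1, v1 = build_version([b"hello ", b"world"])
root2, v2 = build_version([b"hello ", b"there"])

# The unchanged first block has the same reference in both versions,
# so two read-only versions can share it.
shared_first_block = root1[:20] == root2[:20]
```

Since every reference is derived from contents, an existing version can never be modified in place; a change produces new blocks and a new root reference.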
6
6/26 Underlying Technology
Access Control
Data Update
Primary Replica
Archival Storage
Secondary Replica
Data Read
Data Location & Routing (Tapestry)
7
7/26 Access Control
Reader Restriction
–Encrypt All Data
–Distribute the Encryption Key to Users with Read Permission
Writer Restriction
–Access Control List (ACL) for Each Object
–All Writes Must be Signed so that Well-behaved Servers and Clients can Verify Them against the ACL
9
9/26 Data Update (1/2)
Update
–Adds a New Version to the Head of the Version Stream
–Array of Potential Actions, each Guarded by a Predicate
Predicate Examples
–Checking the Latest Version Number, Comparing a Region of Bytes to an Expected Value, etc.
Action Examples
–Replacing a Set of Bytes, Appending New Data, Truncating the Object, etc.
[Figure: update message fields: Timestamp, Client ID, ..., Client Signature]
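A rough sketch of predicate-guarded updates. It assumes that the first action whose predicate holds is applied and that the update is rejected if none hold; the slide does not spell out this rule. `ObjectState` and `apply_update` are hypothetical names, and timestamps and signatures are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ObjectState:
    version: int
    data: bytes


Predicate = Callable[[ObjectState], bool]
Action = Callable[[bytes], bytes]


def apply_update(
    state: ObjectState, guarded_actions: List[Tuple[Predicate, Action]]
) -> Optional[ObjectState]:
    """Return a new version if some predicate holds, otherwise None.

    The old state is never mutated, mirroring read-only versions.
    """
    for predicate, action in guarded_actions:
        if predicate(state):
            return ObjectState(state.version + 1, action(state.data))
    return None


update = [
    # Predicate: latest version number is 3. Action: append new data.
    (lambda s: s.version == 3, lambda d: d + b"def"),
    # Predicate: first byte equals an expected value. Action: truncate.
    (lambda s: s.data[:1] == b"a", lambda d: d[:1]),
]

current = ObjectState(version=3, data=b"abc")
new_version = apply_update(current, update)
rejected = apply_update(ObjectState(version=5, data=b"xyz"), update)
```

Here `new_version` becomes the head of the version stream while `current` remains readable as the previous version.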
10
10/26 Data Update (2/2)
[Figure labels: Application, Primary Replica (Inner Ring), Archival Storages, Secondary Replicas]
11
11/26 Primary Replica
Inner Ring
–A Set of Servers that Implement an Object's Primary Replica
–Applies Updates and Creates New Versions: Serialization, Access Control, Creation of Archival Fragments
Update Agreement
–Byzantine Agreement Protocol: Distributed Decision Process in which All Non-faulty Participants Reach the Same Decision
–A Group of Size 3f + 1 Tolerates no more than f Faulty Servers
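The 3f + 1 bound can be checked with a few lines of arithmetic. The helper names below are hypothetical; they only restate the bound from the slide.

```python
def max_faulty(ring_size: int) -> int:
    """Largest f such that ring_size >= 3f + 1."""
    if ring_size < 1:
        raise ValueError("ring_size must be at least 1")
    return (ring_size - 1) // 3


def min_ring_size(f: int) -> int:
    """Smallest inner ring that tolerates f faulty servers."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


four_server_ring = max_faulty(4)   # tolerates 1 faulty server
seven_server_ring = max_faulty(7)  # tolerates 2 faulty servers
```

Note that growing a ring from 4 to 6 servers does not raise the tolerance; the next step up requires 7.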
12
12/26 Archival Storage
Simple Replication
–Tolerating One Failure Requires an Additional 100% Storage Cost
Erasure Codes
–Efficient and Stable Storage for Archival Copies
–Storage Cost Increases by a Factor of N/M
–Original Block can be Reconstructed from Any M of the N Fragments
[Figure: a Block is split into Fragments 1..M and Encoded by an Erasure Code into Fragments 1..N, with M < N]
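To make the N/M trade-off concrete, here is a toy erasure code with M = 2 and N = 3 built from XOR parity. It is a stand-in chosen for brevity, not the code used by the prototype, and `encode` and `decode` are hypothetical names.

```python
from typing import Dict, List


def _xor(p: bytes, q: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(p, q))


def encode(block: bytes) -> List[bytes]:
    """Split the block into M = 2 halves and add one parity fragment (N = 3)."""
    if len(block) % 2:
        raise ValueError("this toy code needs an even-length block")
    half = len(block) // 2
    a, b = block[:half], block[half:]
    return [a, b, _xor(a, b)]


def decode(fragments: Dict[int, bytes]) -> bytes:
    """Rebuild the block from any two fragments, keyed by index 0, 1, or 2."""
    if len(fragments) < 2:
        raise ValueError("need at least M = 2 fragments")
    if 0 in fragments and 1 in fragments:
        a, b = fragments[0], fragments[1]
    elif 0 in fragments:
        a = fragments[0]
        b = _xor(a, fragments[2])  # b = a XOR parity
    else:
        b = fragments[1]
        a = _xor(b, fragments[2])  # a = b XOR parity
    return a + b


fragments = encode(b"oceanstore")
# Fragment 1 is lost; fragments 0 and 2 still suffice.
recovered = decode({0: fragments[0], 2: fragments[2]})
```

This code survives the loss of any one fragment at a storage cost of N/M = 1.5 times the block, whereas surviving one failure with simple replication costs 2 times the block.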
13
13/26 Secondary Replica
Whole-block Caching
–Avoids the Cost of Erasure Codes on Frequently-read Objects
Push-based Update
–Sent Every Time the Primary Replica Applies an Update
Dissemination Tree
–Application-level Multicast Tree Rooted at the Primary Replica
–Parent Nodes are Pre-existing Replicas that Serve the Object
15
15/26 Data Read
[Figure labels: Application, Primary Replica (Inner Ring), Archival Storages, Secondary Replica]
1. AGUID
2. Latest VGUID (Version GUID)
3. Search for Blocks in Secondary Replicas
4. Search for Enough Fragments in Archival Storage
17
17/26 Data Location & Routing (1/4)
Tapestry
–Decentralized Object Location and Routing System
–Assigns Globally Unique Identifiers (GUIDs) to Hosts and Resources
–Location Independent
–Locality Aware
18
18/26 Data Location & Routing (2/4)
Routing Example
–Messages are Routed to the Destination ID Digit by Digit
–***8 => **98 => *598 => 4598
[Figure: nodes B4F8, 9098, 0325, 2BB8, 7598, 4598, 87CA, 0098, 3E98, 1598, D598, 2118 linked by level L1 to L4 routing hops]
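The digit-by-digit rule can be sketched with the node IDs from this slide. The sketch makes two simplifying assumptions: every node knows the full node list (real nodes keep per-level routing tables), and ties are broken lexicographically rather than by network locality. `shared_suffix_len` and `route` are hypothetical names.

```python
from typing import List


def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that a and b have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def route(source: str, dest: str, nodes: List[str]) -> List[str]:
    """Hop towards dest, matching at least one more trailing digit per hop."""
    path = [source]
    current = source
    while current != dest:
        need = shared_suffix_len(current, dest) + 1
        candidates = [n for n in nodes if shared_suffix_len(n, dest) >= need]
        if not candidates:
            raise ValueError("destination is not reachable from the node list")
        # Prefer the smallest improvement so the path resolves one digit at
        # a time; the lexicographic tie-break is a placeholder for locality.
        current = min(candidates, key=lambda n: (shared_suffix_len(n, dest), n))
        path.append(current)
    return path


NODES = ["B4F8", "9098", "0325", "2BB8", "7598", "4598",
         "87CA", "0098", "3E98", "1598", "D598", "2118"]

path = route("0325", "4598", NODES)
matched_digits = [shared_suffix_len(n, "4598") for n in path]
```

Each hop in `path` matches one more trailing digit of 4598, following the ***8 => **98 => *598 => 4598 pattern; the specific intermediate nodes depend on the tie-break and need not match the figure.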
19
19/26 Data Location & Routing (3/4)
Location Independent & Locality Aware
[Figure: level L1 to L4 routing links with Replica Location Pointers]
20
20/26 Data Location & Routing (4/4) Routing Table
21
21/26 Prototype Prototype Software Architecture
22
22/26 Experimental Results (1/2) Update Performance
23
23/26 Experimental Results (2/2)
Comparison with NFS
[Figure panels: Write, Read, Read/Write]
24
24/26 Related Work
Other Peer-to-peer File Systems
PAST [Rows01] and CFS [Dabe01]
–No Write Sharing
IVY [Muth02] and Pangaea [Sait02]
–Provide Both Read and Write Sharing, but
–No Single Point of Consistency
25
25/26 Conclusion
Operational OceanStore Prototype
–Universal Accessibility, Fault Tolerance, Security, and Information Sharing
Future Research
–Improving Performance: Efficient Threshold Schemes and Archival Data Generation
–Self-maintenance
–Stability and Fault Tolerance
–Supporting More Applications
26
26/26 Discussion
System Design Choice
–Security vs. Fast Response
–Simple vs. Complicated Design
Storage Service Provider (SSP)
–Independent SSP vs. Confederation of Companies such as IBM and AT&T
Efficient Storage Usage
27
27/26 Primary Replica (Ext.)
Modification of the Byzantine Agreement Protocol
Public Key Cryptography
–Symmetric-key Message Authentication Codes (MACs) within the Inner Ring
–Public-key Cryptography for All Other Machines
Proactive Threshold Signatures
–Flexibility in Choosing the Membership of the Inner Ring
–Single Public Key with l Private Key Shares
–Any k Correctly Generated Signature Shares among the l can be Combined into a Full Signature
–Independent Sets of Key Shares can be Used to Control Membership
Responsible Party
–Chooses the Hosts that Make Up Inner Rings