Naming and Integrity: Self-Verifying Data in Peer-to-Peer Systems
Hakim Weatherspoon, Chris Wells, John Kubiatowicz
University of California, Berkeley


Naming and Integrity: Self-Verifying Data in Peer-to-Peer Systems
Hakim Weatherspoon, Chris Wells, John Kubiatowicz
University of California, Berkeley
Future Directions in Distributed Computing (FuDiCo), Thursday, June 6, 2002

OceanStore Context: Ubiquitous Computing
Computing everywhere:
– Desktop, laptop, palmtop.
– Cars, cellphones.
– Shoes? Clothing? Walls?
Connectivity everywhere:
– Rapid growth of bandwidth in the interior of the net.
– Broadband to the home and office.
– Wireless technologies such as CDMA, satellite, laser.

Archival Storage
Where is persistent information stored?
– Want: geographic independence for availability, durability, and freedom to adapt to circumstances.
How is it protected?
– Want: encryption for privacy, secure naming and signatures for authenticity, and Byzantine commitment for integrity.
Is it available/durable?
– Want: redundancy with continuous repair and redistribution for long-term durability.

Path of an Update

Questions about Data?
How do we use redundancy to protect against data loss?
How do we verify data?

Archival Dissemination Built into Update
Erasure codes:
– Redundancy without the overhead of strict replication.
– Produce n fragments, of which any m suffice to reconstruct the data (m < n). Rate r = m/n; storage overhead is 1/r. (A minimal encoding sketch follows.)
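The slide defines the code only by its parameters (any m of n fragments reconstruct the block, rate r = m/n). Below is a minimal Python sketch of one such code, using Reed-Solomon-style polynomial evaluation over a prime field; it is illustrative only and is not the archiver's actual codec.

```python
# Minimal (m, n) erasure-code sketch: any m of the n fragments recover the data.
P = 2**31 - 1  # a Mersenne prime; all arithmetic is mod P

def encode(data_words, n):
    """Treat the m data words as polynomial coefficients and emit n
    evaluation points (x, y); any m of them determine the polynomial."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data_words)) % P)
            for x in range(1, n + 1)]

def decode(fragments, m):
    """Recover the m data words from any m fragments by Lagrange
    interpolation of the degree-(m-1) polynomial."""
    pts = fragments[:m]
    coeffs = [0] * m
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1              # L_j(x) as a coefficient list
        for k, (xk, _) in enumerate(pts):
            if k == j:
                continue
            nxt = [0] * (len(basis) + 1)   # multiply basis by (x - xk)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xk * b) % P
                nxt[i + 1] = (nxt[i + 1] + b) % P
            basis = nxt
            denom = denom * (xj - xk) % P
        scale = yj * pow(denom, P - 2, P) % P
        for i in range(m):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

frags = encode([104, 105, 33], n=8)               # m = 3, n = 8, rate r = 3/8
assert decode(frags[2:5], m=3) == [104, 105, 33]  # any 3 of the 8 suffice
```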

Durability
Fraction of Blocks Lost Per Year (FBLPY)*:
– r = 1/4, erasure-encoded block (e.g. m = 16, n = 64).
– Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time.
– The n = 4 fragment case (m = 1) is equivalent to replication on four servers.
* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, Proc. IPTPS 2002.
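The durability comparison can be made concrete with a simple independent-failure model: a block is lost only when more than n − m of its fragments are lost before repair. A hedged sketch follows; the per-fragment loss probability p is an assumed placeholder, and the full FBLPY analysis in the cited paper models repair epochs and server lifetimes rather than a single p.

```python
from math import comb

def block_loss_prob(m, n, p):
    """Probability a block is unrecoverable when each of its n fragments
    is independently lost with probability p, i.e. fewer than m survive."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n - m + 1, n + 1))

# Same rate r = 1/4 and the same per-fragment loss probability:
print(block_loss_prob(1, 4, 0.1))    # replication on 4 servers: all 4 copies must be lost
print(block_loss_prob(16, 64, 0.1))  # 16-of-64 coding: at least 49 of 64 must be lost (far smaller)
```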

Naming and Verification Algorithm
Use a cryptographically secure hash algorithm to detect corrupted fragments.
Verification tree:
– n is the number of fragments.
– Store log(n) + 1 hashes with each fragment.
– Total of n·(log(n) + 1) hashes.
– The top hash is the block GUID (B-GUID).
– Fragments and blocks are self-verifying.
[Figure: a hash tree over the fragment hashes H1–H4 (H12, H34, H14) and the data hash Hd; the top hash is the B-GUID. Fragment 1, for example, is stored with the sibling hashes H2, H34, and Hd alongside its fragment data.]
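A hedged sketch of the verification tree in Python: the hash function (SHA-1 here) and byte layout are assumptions; the slide specifies only that each fragment carries log(n) + 1 hashes and that the top hash is the B-GUID.

```python
import hashlib

def H(*parts):
    h = hashlib.sha1()          # hash choice is an assumption
    for p in parts:
        h.update(p)
    return h.digest()

def build_tree(fragments, data):
    """Return (bguid, proofs); proofs[i] holds fragment i's log(n)
    sibling hashes plus Hd, i.e. log(n) + 1 hashes."""
    n = len(fragments)
    assert n and n & (n - 1) == 0, "sketch assumes n is a power of two"
    hd = H(data)
    level = [H(f) for f in fragments]          # H1 .. Hn
    groups = [[i] for i in range(n)]           # which leaves sit under each node
    proofs = [[] for _ in range(n)]
    while len(level) > 1:
        nxt, nxt_groups = [], []
        for a in range(0, len(level), 2):
            left, right = level[a], level[a + 1]
            for i in groups[a]:
                proofs[i].append(right)        # sibling to the right
            for i in groups[a + 1]:
                proofs[i].append(left)         # sibling to the left
            nxt.append(H(left, right))
            nxt_groups.append(groups[a] + groups[a + 1])
        level, groups = nxt, nxt_groups
    for p in proofs:
        p.append(hd)
    return H(level[0], hd), proofs             # top hash = B-GUID

def verify(fragment, index, proof, bguid):
    """Recompute the path from one fragment up to the B-GUID."""
    node = H(fragment)
    *siblings, hd = proof
    for depth, sib in enumerate(siblings):
        node = H(sib, node) if (index >> depth) & 1 else H(node, sib)
    return H(node, hd) == bguid

frags = [bytes([i]) * 64 for i in range(4)]    # toy fragments, n = 4
bguid, proofs = build_tree(frags, b"original block data")
assert all(verify(frags[i], i, proofs[i], bguid) for i in range(4))
```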

Enabling Technology
[Figure: a GUID naming a set of encoded fragments.]

Complex Objects I
[Figure: a unit of coding (data d) is erasure-coded into encoded fragments, the unit of archival storage; a verification tree over the fragments yields the GUID of d.]

Complex Objects II
[Figure: a data object is split into data blocks d1–d9 arranged as a B-tree with indirect blocks; each block is a unit of coding, erasure-coded into fragments (the unit of archival storage) with its own verification tree and GUID; the GUID of the root block is the version GUID (VGUID).]
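A minimal sketch of how a version GUID falls out of the block tree (block size, fan-out, and hash function are assumptions): each indirect block holds the GUIDs of its children, so the GUID of the root block names the entire version.

```python
import hashlib

BLOCK = 8 * 1024   # assumed data-block size
FANOUT = 16        # assumed indirect-block fan-out

def guid(b: bytes) -> bytes:
    return hashlib.sha1(b).digest()

def vguid_of(data: bytes) -> bytes:
    """Split the object into data blocks, then hash upward through indirect
    blocks of child GUIDs until one root GUID (the VGUID) remains."""
    level = [guid(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)] or [guid(b"")]
    while len(level) > 1:
        level = [guid(b"".join(level[i:i + FANOUT]))
                 for i in range(0, len(level), FANOUT)]
    return level[0]
```

Because every block is named by its hash, changing one data block changes only the GUIDs on its path to the root; unchanged subtrees can be shared between versions, which is the copy-on-write behavior shown in the next slide.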

Complex Objects III
[Figure: two versions of an object, VGUID_i and VGUID_i+1. An update that changes blocks d8 and d9 to d'8 and d'9 copies only the blocks on the path to the root (copy-on-write); the new version keeps a backpointer to the old one. AGUID = hash{name + keys} names the object across all of its versions.]
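The slide gives AGUID = hash{name + keys}. A minimal sketch, assuming "keys" means the owner's public key:

```python
import hashlib

def aguid(name: bytes, owner_public_key: bytes) -> bytes:
    """Active GUID: the stable, self-certifying name of the mutable
    object, independent of any particular version."""
    return hashlib.sha1(name + owner_public_key).digest()
```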

Mutable Data
Need mutable data for a real system:
– An entity in the network maintains the A-GUID to V-GUID mapping.
– Byzantine commitment for integrity: verifies client privileges, creates a serial order, and atomically applies each update.
Versioning system:
– Each version is inherently read-only.
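A hedged sketch of the A-GUID to V-GUID mapping kept by that entity: each committed update appends a new read-only version. Privilege checking, serialization, and the Byzantine commitment protocol itself are elided; vguid_of is the helper from the Complex Objects sketch above.

```python
class VersionLog:
    """Maps one A-GUID to its sequence of V-GUIDs; every committed update
    yields a new immutable version, and old versions stay readable."""
    def __init__(self, aguid: bytes):
        self.aguid = aguid
        self.versions = []                # append-only list of V-GUIDs

    def commit(self, new_object_bytes: bytes) -> bytes:
        # A real inner ring would first verify the client's privileges and
        # agree on a serial order via Byzantine commitment before this point.
        new_vguid = vguid_of(new_object_bytes)
        self.versions.append(new_vguid)   # atomically publish the new version
        return new_vguid

    def latest(self) -> bytes:
        return self.versions[-1]
```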

Archiver I
Archiver server architecture:
– Requests to archive objects are received through the network layers.
– The consistency mechanism decides when to archive an object.
[Figure: server architecture, a Java Virtual Machine with thread scheduler and dispatch on top of the operating system, asynchronous disk and network I/O, and modules for consistency, location & routing, the archiver, and introspection.]

Archiver Control Flow
[Figure: the consistency stage(s) issue a GenerateFragsChkptReq to the GenerateChkpt stage; a GenerateFragsReq drives the GenerateFrags stage, and a DisseminateFragsReq drives the Disseminator stage, which sends the fragments to storage servers; each stage returns its response (GenerateFragsResp, GenerateFragsChkptResp, DisseminateFragsResp).]
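A hedged sketch of this control flow as a staged, queue-driven pipeline (the prototype is built on a staged event-driven architecture; the handlers below are placeholders, and only the request direction is shown, with responses flowing back elided):

```python
import queue, threading, time

class Stage:
    """One archiver stage: a thread draining an event queue and handing
    its output to the next stage's queue."""
    def __init__(self, name, handler, downstream=None):
        self.name, self.handler, self.downstream = name, handler, downstream
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            event = self.inbox.get()
            out = self.handler(event)
            if out is not None and self.downstream is not None:
                self.downstream.inbox.put(out)

# Wire the pipeline from the slide, last stage first:
disseminator = Stage("Disseminator",
                     lambda req: print("send fragments to storage servers"))
gen_frags    = Stage("GenerateFrags",
                     lambda req: ("DisseminateFragsReq", req[1]), disseminator)
gen_chkpt    = Stage("GenerateChkpt",
                     lambda req: ("GenerateFragsReq", req[1]), gen_frags)

# The consistency stage(s) kick things off with a checkpoint request:
gen_chkpt.inbox.put(("GenerateFragsChkptReq", b"object bytes"))
time.sleep(0.2)   # let the daemon threads drain the queues
```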

Performance
Performance of the archival layer:
– Performance of an OceanStore server archiving objects.
– Analyze the operations involved in archiving data (this includes signing updates in a BFT protocol).
– Three configurations: no archiving; inlined archiving (synchronous, m = 16, n = 32); delayed archiving (asynchronous, m = 16, n = 32).
Experiment environment:
– OceanStore servers were analyzed on a 42-node cluster.
– Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with two 1.0 GHz Pentium III CPUs, 1.5 GB of ECC PC133 SDRAM, and two 36 GB IBM UltraStar 36LZX hard drives.
– Each machine uses a single Intel PRO/1000 XF gigabit Ethernet adaptor to connect to a Packet Engines gigabit switch, and runs a Linux SMP kernel.

Performance: Throughput
Data throughput:
– No archive: 5 MB/s.
– Delayed: 3 MB/s.
– Inlined: 2.5 MB/s.

Performance: Latency
Latency:
– Archive only: y-intercept 3 ms, slope 0.3 s/MB.
– Inlined archive = no archive + archive only (the latencies add).
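The latency bullets describe a linear model; a small worked sketch (the 3 ms intercept and 0.3 s/MB slope come from the slide, while the example update size and no-archive latency are assumed):

```python
def inlined_latency_ms(update_mb, no_archive_ms):
    """Inlined archiving = no-archive latency + archive-only latency,
    where archive-only latency is roughly 3 ms + 0.3 s/MB."""
    archive_only_ms = 3.0 + 300.0 * update_mb
    return no_archive_ms + archive_only_ms

# e.g. a 1 MB update with an assumed 30 ms no-archive latency:
print(inlined_latency_ms(1.0, 30.0))   # about 333 ms with inlined archiving
```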

Future Directions
Caching for performance:
– Automatic replica placement.
Automatic repair.

Caching
Automatic replica placement:
– Replicas are soft state.
– They can be constructed and destroyed as necessary.
Prefetching:
– Reconstruct replicas from fragments in advance of use.

Efficient Repair
Global:
– A global sweep and repair is not efficient.
– Want detection/notification when a node is removed from the system.
– Not as effective as distributed mechanisms.
Distributed:
– Exploit the DOLR's distributed information and locality properties.
– Efficient detection and then reconstruction of fragments (sketched below).
[Figure: a ring of L1 heartbeats among the servers holding a block's fragments, at routing levels L1–L3.]
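A hedged sketch of the "ring of L1 heartbeats" idea: the servers holding fragments of a block monitor one another at the first routing level, and a missed heartbeat triggers reconstruction of only the missing fragments from any m survivors. The timeout and the fetch/re-encode callbacks below are placeholders standing in for the DOLR machinery.

```python
import time

class FragmentRing:
    """Servers holding fragments of one block, heartbeating one another;
    a silent server's fragment is rebuilt from the surviving m."""
    TIMEOUT = 30.0   # seconds, assumed

    def __init__(self, servers):
        self.last_seen = {s: time.time() for s in servers}

    def heartbeat(self, server):
        self.last_seen[server] = time.time()

    def suspects(self):
        now = time.time()
        return [s for s, t in self.last_seen.items() if now - t > self.TIMEOUT]

    def repair(self, block_guid, fetch_any_m, reencode):
        """fetch_any_m: pull m surviving fragments via the DOLR (elided);
        reencode: rebuild the block and regenerate the missing fragments."""
        for server in self.suspects():
            block = reencode(fetch_any_m(block_guid))
            print(f"re-disseminate the fragment formerly held by {server}")
```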

Conclusion
Storage-efficient, self-verifying mechanism:
– Erasure codes are good.
Self-verifying data assist in:
– Secure read-only data.
– Secure caching infrastructures.
– Continuous adaptation and repair.
For more information: