
OceanStore: Exploiting Peer-to-Peer for a Self-Repairing, Secure and Persistent Storage Utility
John Kubiatowicz, University of California at Berkeley

OceanStore Context: Ubiquitous Computing

Computing everywhere:
– Desktop, laptop, palmtop
– Cars, cellphones
– Shoes? Clothing? Walls?

Connectivity everywhere:
– Rapid growth of bandwidth in the interior of the net
– Broadband to the home and office
– Wireless technologies such as CDMA, satellite, laser

Where is persistent data?

Utility-based Infrastructure?

[Diagram: a storage utility federated across providers such as Pac Bell, Sprint, IBM, AT&T, and a "Canadian OceanStore"]

– Data service provided by a storage federation
– Cross-administrative domain
– Contractual quality of service ("someone to sue")

What are the advantages of a utility?

For clients:
– Outsourcing of responsibility: someone else worries about quality of service
– Better reliability: the utility can muster greater resources toward durability; the system is not disabled by local outages; the utility can focus resources (manpower) on the security-vulnerable aspects of the system
– Better data mobility: starting with a secure network model opens the door to sharing

For the utility provider:
– Economies of scale: dynamically redistribute resources between clients; focused manpower can serve many clients simultaneously

Key Observation: Want Automatic Maintenance

Can't possibly manage billions of servers by hand!

The system should automatically:
– Adapt to failure
– Exclude malicious elements
– Repair itself
– Incorporate new elements

The system should be secure and private:
– Encryption, authentication

The system should preserve data over the long term (accessible for 1000 years):
– Geographic distribution of information
– New servers added from time to time
– Old servers removed from time to time
– Everything just works

OceanStore: Everyone's Data, One Big Utility

"The data is just out there"

How many files in the OceanStore?
– Assume 10^10 people in the world
– Say 10,000 files/person (very conservative?)
– So 10^14 files in OceanStore!
– If 1 gig files (ok, a stretch), get 1 mole of bytes! (or a yottabyte if you are a computer person)

Truly impressive number of elements... but small relative to physical constants.

Aside: SIMS school study: 1.5 exabytes/year (1.5 × 10^18 bytes)
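A quick check of the slide's arithmetic (the 10^9 bytes per file is the slide's own "1 gig files" assumption):

\[
10^{10}\ \text{people} \times 10^{4}\ \tfrac{\text{files}}{\text{person}} = 10^{14}\ \text{files},
\qquad
10^{14}\ \text{files} \times 10^{9}\ \tfrac{\text{bytes}}{\text{file}} = 10^{23}\ \text{bytes}.
\]

10^23 bytes is within an order of magnitude of both a mole of bytes (6.02 × 10^23) and a yottabyte (10^24), which is the loose sense in which the slide uses those terms.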

Outline

– Motivation
– Applications we have now: what would it actually mean to have such a utility?
– Why peer-to-peer? A new way of thinking?
– A peek at OceanStore: self-verifying data, active data authentication, archival data maintenance
– Prototype: it's alive
– Conclusion

Pushing the Vision

Secure Object Storage

[Diagram: clients (with TCPA hardware) connected through a client data manager to OceanStore]

Security: access and content controlled by the client:
– Privacy through data encryption
– Optional use of cryptographic hardware for revocation
– Authenticity through hashing and active integrity checking

Flexible self-management and optimization:
– Performance and durability
– Efficient sharing

MINO: Wide-Area E-mail Service

[Diagram: traditional mail gateways and clients on the local network and Internet, backed by OceanStore replicas; the client stack runs IMAP and SMTP proxies over a mail object layer over the OceanStore client API]

Complete mail solution:
– Inbox
– IMAP folders
both stored as OceanStore objects

OceanStore Concepts Applied to Tape-less Backup

– Self-replicating, self-repairing, self-managing
– No need for actual tape in the system (although it could be there to keep with tradition)

The Berkeley PetaByte Archival Service

High-Performance, Authenticated Access to Legacy Archives via HTTP

Why Peer-to-Peer?

Peer-to-Peer is:

Old view:
– A bunch of flakey high-school students stealing music

New view:
– A philosophy of systems design at extreme scale
– Probabilistic design when it is appropriate
– New techniques aimed at unreliable components
– A rethinking (and recasting) of distributed algorithms
– Use of physical, biological, and game-theoretic techniques to achieve guarantees

OceanStore Assumptions

Untrusted infrastructure:
– The OceanStore is comprised of untrusted components
– Individual hardware has finite lifetimes
– All data is encrypted within the infrastructure

Mostly well-connected:
– Data producers and consumers are connected to a high-bandwidth network most of the time
– Exploit multicast for quicker consistency when possible

Promiscuous caching:
– Data may be cached anywhere, anytime

Responsible party:
– Some organization (e.g., a service provider) guarantees that your data is consistent and durable
– Not trusted with the content of data, merely its integrity

In short: peer-to-peer plus quality-of-service.

The Peer-To-Peer View: Irregular Mesh of "Pools"

The Thermodynamic Analogy

Large systems have a variety of latent order:
– Connections between elements
– Mathematical structure (erasure coding, etc.)
– Distributions peaked about some desired behavior

Permits "stability through statistics":
– Exploit the behavior of aggregates (redundancy)

Subject to entropy:
– Servers fail, attacks happen, the system changes

Requires continuous repair:
– Apply energy (i.e., through servers) to reduce entropy

The Biological Inspiration

Biological systems are built from (extremely) faulty components, yet:
– They operate with a variety of component failures: redundancy of function and representation
– They have stable behavior: negative feedback
– They are self-tuning: optimization of the common case

Introspective (autonomic) computing:
– Components for performing
– Components for monitoring and model building
– Components for continuous adaptation

[Diagram: feedback loop between the performing components and the monitor/adapt cycle]

Important Example: DOLR (Decentralized Object Location and Routing)

[Diagram: objects named by GUIDs (GUID1, GUID2) are published to, and located through, the DOLR overlay]
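To make the abstraction concrete, here is a minimal Java sketch of a DOLR-style interface; the operation names follow the published Tapestry API, but the types and signatures are assumptions for illustration.

    // Minimal sketch of the DOLR abstraction (operation names after the
    // Tapestry API; types are illustrative assumptions).
    interface DOLR {
        // Advertise that this server holds a replica of the object.
        void publishObject(byte[] objectGuid);
        // Withdraw the advertisement.
        void unpublishObject(byte[] objectGuid);
        // Route a message to some nearby replica of the object.
        void routeToObject(byte[] objectGuid, byte[] message);
        // Route a message to a specific node in the overlay.
        void routeToNode(byte[] nodeGuid, byte[] message);
    }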

Single Node Tapestry: the DOLR is like a router

[Architecture diagram, top to bottom:]
– Applications: OceanStore, application-level multicast, other applications
– Application interface / upcall API
– Router, backed by the routing table and object-pointer DB, with dynamic node management
– Transport protocols
– Network link management

It's Alive!

PlanetLab testbed:
– 104 machines at 43 institutions in North America, Europe, and Australia (~60 machines utilized)
– 1.26 GHz PIII (1 GB RAM), 1.8 GHz PIV (2 GB RAM)
– North American machines (2/3) on Internet2

Tapestry Java deployment:
– 50,000 lines of Java code
– 6-7 nodes on each physical machine
– IBM Java JDK 1.3
– Node virtualization inside the JVM and SEDA
– Scheduling between virtual nodes increases latency

Object Location with Tapestry

Stability under extreme circumstances (May 2003: 1.5 TB over 4 hours)

– The DOLR model generalizes to many simultaneous apps

A Peek at OceanStore

OceanStore Data Model

Versioned objects:
– Every update generates a new version
– Can always go back in time ("time travel")

Each version is read-only:
– Can have a permanent name
– Much easier to repair

An object is a signed mapping between a permanent name and its latest version:
– Write access control and integrity involve managing these mappings

[Diagram: comet analogy: updates at the head, older versions trailing behind]
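A minimal Java sketch of this data model; the class and field names are assumptions for illustration, not the prototype's actual code.

    import java.util.ArrayList;
    import java.util.List;

    // Each version is an immutable snapshot named by a hash (VGUID).
    final class Version {
        final byte[] vguid;   // secure hash naming this read-only version
        final byte[] data;    // the snapshot itself
        Version(byte[] vguid, byte[] data) { this.vguid = vguid; this.data = data; }
    }

    // An object is a permanent name (AGUID) mapped to its version history.
    final class VersionedObject {
        final byte[] aguid;                            // permanent name
        private final List<Version> history = new ArrayList<>();

        VersionedObject(byte[] aguid) { this.aguid = aguid; }

        // Every update appends a new version; old versions stay readable.
        void applyUpdate(Version v) { history.add(v); }

        Version latest()          { return history.get(history.size() - 1); }
        Version timeTravel(int i) { return history.get(i); }  // go back in time
    }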

Self-Verifying Objects

[Diagram: a version VGUID_i is the root of a B-tree of data blocks (d_1 ... d_9) reached through indirect blocks; the next version VGUID_i+1 shares unchanged blocks via copy-on-write and a backpointer, replacing only the modified ones (d'_8, d'_9)]

– AGUID = hash{name + keys}
– Updates produce heartbeats plus read-only data
– Heartbeat: {AGUID, VGUID, timestamp}, signed
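The self-verifying property follows from naming blocks by their hashes. A small Java sketch (SHA-1 is an assumption here; the slide does not name the hash function):

    import java.security.MessageDigest;
    import java.util.Arrays;

    // Blocks are named by their hashes, so any fetched block can be
    // checked against the GUID used to request it.
    final class SelfVerify {
        static byte[] guidFor(byte[] content) throws Exception {
            return MessageDigest.getInstance("SHA-1").digest(content);
        }
        // Genuine iff the block hashes to the GUID that named it.
        static boolean verify(byte[] guid, byte[] fetchedBlock) throws Exception {
            return Arrays.equals(guid, guidFor(fetchedBlock));
        }
    }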

OceanStore API: Universal Conflict Resolution

Consistency is a form of optimistic concurrency:
– Updates contain predicate-action pairs
– Each predicate is tried in turn: if none match, the update is aborted; otherwise, the action of the first true predicate is applied

Role of the Responsible Party (RP):
– Updates are submitted to the RP, which chooses a total order

[Diagram: IMAP/SMTP, NFS/AFS, NTFS (soon?), HTTP, and native clients sit on the OceanStore API, which provides (1) conflict resolution, (2) versioning/branching, (3) access control, (4) archival storage]
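A minimal Java sketch of predicate-action updates, following the semantics the slide describes (names and types are assumptions for illustration):

    import java.util.List;
    import java.util.Optional;
    import java.util.function.Function;
    import java.util.function.Predicate;

    // An update is an ordered list of predicate-action pairs evaluated
    // against the object's current version.
    record Clause(Predicate<byte[]> predicate, Function<byte[], byte[]> action) {}

    final class Update {
        final List<Clause> clauses;
        Update(List<Clause> clauses) { this.clauses = clauses; }

        // Try each predicate in turn; apply the action of the first that
        // matches, or abort (empty result) if none do.
        Optional<byte[]> applyTo(byte[] currentVersion) {
            for (Clause c : clauses) {
                if (c.predicate().test(currentVersion)) {
                    return Optional.of(c.action().apply(currentVersion));
                }
            }
            return Optional.empty(); // update aborted
        }
    }

The RP simply picks a total order over such updates and applies them one at a time, which is how conflicting writes are resolved deterministically.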

Two Types of OceanStore Data

Active data: "floating replicas"
– Per-object virtual server
– Interaction with other replicas for consistency
– May appear and disappear like bubbles

Archival data: OceanStore's stable store
– m-of-n coding, like a hologram: data is coded into n fragments, any m of which are sufficient to reconstruct it (e.g., m = 16, n = 64)
– Coding overhead is proportional to n/m (e.g., 4); the other parameter, the rate, is 1/overhead
– Fragments are cryptographically self-verifying

Most data in the OceanStore is archival!
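The parameter arithmetic, as a small Java sketch (bookkeeping only; an actual erasure code such as Reed-Solomon would produce the fragments, and these names are illustrative):

    // m-of-n coding parameters: any m of the n fragments reconstruct the block.
    final class ErasureParams {
        final int m, n;                    // e.g., m = 16, n = 64
        ErasureParams(int m, int n) { this.m = m; this.n = n; }

        double overhead() { return (double) n / m; }  // storage blow-up: 64/16 = 4
        double rate()     { return (double) m / n; }  // rate = 1/overhead = 0.25

        // Each fragment carries ~1/m of the block (rounded up).
        int fragmentBytes(int blockBytes) { return (blockBytes + m - 1) / m; }

        // The block survives as long as at least m fragments do.
        boolean recoverable(int liveFragments) { return liveFragments >= m; }
    }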

The Path of an OceanStore Update

[Diagram: an update flows from clients to the inner-ring servers, then out to the second-tier caches]

Self-Organizing Soft-State Replication

Simple algorithms for placing replicas on nodes in the interior:
– Intuition: locality properties of Tapestry help select positions for replicas
– Tapestry helps associate parents and children to build the multicast tree

Preliminary results are encouraging.

Current investigations:
– Game theory
– Thermodynamics

Archival Dissemination of Fragments

[Diagram: fragments of an object disseminated across many archival servers]

Differing Degrees of Responsibility

The inner ring provides quality of service:
– Handles live data and write access control
– Focus utility resources on this vital service
– Compromised servers must be detected quickly

The caching service can be provided by anyone:
– Data is encrypted and self-verifying
– Pay-for-service "caching kiosks"?

Archival storage and repair:
– Read-only data: easier to authenticate and repair
– Trade off redundancy for responsiveness

These could be provided by different companies!

Aside: Why Erasure Coding? High Durability/Overhead Ratio!

[Plot: fraction of blocks lost per year (FBLPY) versus repair interval, comparing whole-block replication against fragmentation]

– Exploit the law of large numbers for durability!
– With 6-month repair, FBLPY for replication is about 0.03; for fragmentation it is lower by many orders of magnitude
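For intuition, here is the standard m-of-n loss analysis (a textbook model under an independent-failure assumption, not taken from the slide): if each fragment independently survives a repair interval with probability p, a block is lost only when fewer than m of its n fragments survive:

\[
P_{\text{loss}} = \sum_{k=0}^{m-1} \binom{n}{k}\, p^{k} (1-p)^{\,n-k}
\]

With m = 16 and n = 64 this binomial tail is vastly smaller than the loss probability of 4 whole replicas at the same 4x storage overhead, which is the slide's point.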

The Dissemination Process: Achieving Failure Independence

[Diagram: an introspection pipeline; network monitoring, probes, and human input feed a model builder (anti-correlation analysis/models, mutual information, data mining), whose model drives a set creator that generates independent sets of servers; the inner ring disseminates fragments to each such set]

– Independent set generation yields effective dissemination

Extreme Durability?

Exploiting the infrastructure for repair:
– The DOLR permits an efficient heartbeat mechanism to notice servers going away for a while, or going away forever!
– A continuous sweep through the data is also possible
– The erasure code provides flexibility in timing

Data is transferred from physical medium to physical medium:
– No "tapes decaying in a basement"
– Information becomes fully virtualized

Thermodynamic analogy: use of energy (supplied by servers) to suppress entropy.
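A toy Java sketch of the repair loop this implies; every name here is an assumption for illustration, and a real implementation would drive it through the DOLR's heartbeats:

    // Sweep over blocks, count live fragments, and re-disseminate when
    // redundancy drops. Erasure coding gives timing slack: repair is safe
    // any time at least m fragments are still alive.
    final class RepairSweep {
        final int m;          // minimum fragments needed (e.g., 16)
        final int threshold;  // repair when live count falls below this (e.g., 48)

        RepairSweep(int m, int threshold) { this.m = m; this.threshold = threshold; }

        void check(byte[] blockGuid, int liveFragments, Repairer repairer) {
            if (liveFragments < m) {
                throw new IllegalStateException("block lost: fewer than m fragments remain");
            }
            if (liveFragments < threshold) {
                // Reconstruct from any m fragments, then re-inject up to n.
                repairer.reconstructAndRedisseminate(blockGuid);
            }
        }
    }
    interface Repairer { void reconstructAndRedisseminate(byte[] blockGuid); }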

PondStore Prototype

OceanStore Prototype

All major subsystems operational:
– Self-organizing Tapestry base
– Primary replicas use Byzantine agreement
– Secondary replicas self-organize into a multicast tree
– Erasure-coding archive
– Application interfaces: NFS, IMAP/SMTP, HTTP

280K lines of Java (J2SE v1.3):
– JNI libraries for cryptography and erasure coding

PlanetLab deployment (FAST 2003, "Pond" paper):
– 104 machines at 43 institutions in North America, Europe, and Australia (~60 machines utilized)
– 1.26 GHz PIII (1 GB RAM), 1.8 GHz PIV (2 GB RAM)
– OceanStore code running with 1000 virtual-node emulations

Event-Driven Architecture of an OceanStore Node

[Diagram: stages connected by message queues, with arrows to and from the outside world]

– Data-flow style: arrows indicate the flow of messages
– Potential to exploit small multiprocessors at each physical node

Closer Look: Write Cost

[Table: latency in ms on the PondStore prototype, broken down by phase (validate, serialize, apply, archive, sign result) for 4 kB and 2 MB writes]

Small writes:
– Signature dominates
– Threshold signatures are slow: 70+ ms to sign, compared to 5 ms for regular signatures

Large writes:
– Encoding dominates
– Archive cost is per byte; signature cost is per write

The future:
– Tentative writes remove the inner ring from the latency path

Conclusions

Peer-to-peer systems are a hot area:
– Large amounts of redundancy and connectivity
– New organizations for large-scale systems
– Thermodynamics of systems
– Continuous introspection and repair

Help the infrastructure to help you:
– Decentralized object location and routing (DOLR)
– Object-based storage
– Self-organizing redundancy
– Continuous repair

OceanStore properties:
– Provides security, privacy, and integrity
– Provides extreme durability
– Lower maintenance cost through redundancy, continuous adaptation, self-diagnosis, and repair

For More Info

– OceanStore vision paper (ASPLOS 2000): "OceanStore: An Architecture for Global-Scale Persistent Storage"
– Pond implementation paper (FAST 2003): "Pond: the OceanStore Prototype"
– Tapestry algorithms paper (SPAA 2002): "Distributed Object Location in a Dynamic Network"
– Tapestry deployment paper (JSAC, to appear): "Tapestry: A Resilient Global-scale Overlay for Service Deployment"
– Bloom filters for probabilistic routing (INFOCOM 2002): "Probabilistic Location and Routing"
– CACM paper (February 2003): "Extracting Guarantees from Chaos"