OceanStore: An Infrastructure for Global-Scale Persistent Storage John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels,

OceanStore: An Infrastructure for Global-Scale Persistent Storage. John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao. A few slides are taken from John Kubiatowicz’s presentation.

Vision: What is OceanStore? “A utility infrastructure to span the globe and provide continuous access to persistent information” (source: Berkeley OceanStore website).

Where does the data come from? All kinds of information, from desktops, laptops, palmtops, cell phones, and embedded devices.

Persistence of data: data should be unaffected even if a device is lost or replaced. Reliable, durable data; “deep archival” storage (will last forever); automatic maintenance.

Continuous access to data: connectivity even to the tiniest devices, possibly intermittent, with variable bandwidth and latency. Availability: highly available, replicated on a global scale, comparable to LAN-based networked storage, fault-tolerant and DoS-tolerant.

How much data? Scale: geographically distributed. Assume $10^{10}$ users, each with 10,000 files or objects, for $10^{14}$ files/objects in total. With each file = 1 MB ($10^{6}$ bytes), that is $10^{20}$ bytes = 100 exabytes.
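The slide's back-of-the-envelope estimate can be checked in a few lines. This is a minimal sketch using the slide's assumptions as constants; taking 1 MB as $10^{6}$ bytes (decimal units) is what makes the total come out to exactly 100 exabytes.

```python
# Storage estimate from the slide's assumptions (decimal units throughout).
USERS = 10**10           # assumed number of users
FILES_PER_USER = 10_000  # files or objects per user
FILE_SIZE = 10**6        # 1 MB per file, in bytes
EXABYTE = 10**18         # 1 EB in bytes

total_files = USERS * FILES_PER_USER
total_bytes = total_files * FILE_SIZE

print(f"{total_files:.0e} files, {total_bytes:.0e} bytes = {total_bytes // EXABYTE} EB")
# 1e+14 files, 1e+20 bytes = 100 EB
```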

Data service economics: users pay a monthly fee to ISPs / data service providers. Essentially, you use others’ storage capacity via subscription. (Cloud computing had not been introduced then …)

Assumptions. Untrusted infrastructure: servers may crash or leak information, but most servers function correctly. “Financially responsible” servers ensure integrity, but only clients are trusted with cleartext. Nomadic data: data is divorced from location but flows freely within the storage infrastructure. Promiscuous caching: “anywhere, anytime”. Proximity of location is important for performance.

System overview. Persistent object: each object has a Globally Unique Identifier (GUID), a 160-bit SHA-1 hash, which gives secure identification that is globally unique and unforgeable; about $2^{80}$ unique objects can exist before collisions become likely (birthday paradox). Data is encrypted unless it is public. Read: try a fast probabilistic replica search (Bloom filter), then fall back to a slower deterministic search (Tapestry). [Tapestry is a practical P2P network that uses Plaxton routing.]
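The $2^{80}$ figure for a 160-bit GUID follows from the birthday approximation $p \approx 1 - e^{-n^2 / (2 \cdot 2^{160})}$. The sketch below evaluates that standard approximation. It is not taken from the OceanStore paper, and the function name is hypothetical. It shows that collisions become likely around $2^{80}$ objects, while the slide's $10^{14}$ objects are far from that point.

```python
import math

def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Approximate chance that at least two of n uniformly random hashes collide.

    Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits)).
    math.expm1 keeps precision when p is tiny.
    """
    space = 2 ** hash_bits
    return -math.expm1(-(n_objects * n_objects) / (2 * space))

print(collision_probability(2**80))   # about 0.39: collisions now likely
print(collision_probability(10**14))  # about 3.4e-21 for the slide's 10^14 objects
```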

System overview. Write: updates with predicates [as in Bayou; what is Bayou?]. Each update creates a new version, in principle, so updates need to be grouped and old objects retired. [Think of a shared calendar.]

What is Bayou? The Bayou system is a platform of replicated, highly available databases on which to build collaborative applications. It supports weak consistency models.

System overview. Application interface: sessions (sequences of reads/writes); session guarantees [Bayou]; consistency levels ranging from weak consistency to ACID; Unix file-sharing semantics. Active and archival forms: the active form is the latest version, with an update handle; the archival form is an “erasure-coded” read-only version.

Comparison with Bayou. Similarities: updates with predicates [example: compare-version, i.e. check whether this version number is greater than X]; each update creates a new version, in principle; both use replication to improve availability at the expense of consistency. Differences: anti-entropy (Bayou) vs. promiscuous caching (OceanStore); encryption (unlike Bayou); designated servers vs. anytime, anywhere.

Naming: self-certifying path names (Mazières), a mechanism that avoids central naming, certification, and key distribution. Object GUID = hash of the owner’s key and a human-readable name. Other objects: server GUID = hash of the server’s public key; archival GUID = hash of the data. Read restriction: through client encryption of the data. Write restriction: ACLs are associated with each object and respected by servers.
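The three kinds of GUID can be sketched with SHA-1 from Python's `hashlib`. This is an illustration under stated assumptions. The byte encoding (a 4-byte length prefix before each part), the helper name `sha1_guid`, and all inputs are hypothetical, and the "keys" are placeholder bytes rather than real key material. The point is the self-certifying property: anyone holding the data can recompute its GUID and check it without trusting the server that supplied it.

```python
import hashlib

def sha1_guid(*parts: bytes) -> str:
    """160-bit SHA-1 digest (40 hex characters) over the given byte strings.

    Each part is length-prefixed so (b"ab", b"c") and (b"a", b"bc") hash
    differently; this encoding is an assumption, not OceanStore's format.
    """
    h = hashlib.sha1()
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()

# Hypothetical inputs; the "keys" are placeholders, not real public keys.
owner_key = b"owner-public-key-placeholder"
server_key = b"server-public-key-placeholder"
readable_name = b"/calendar/2000"
archived_data = b"archived block contents"

object_guid = sha1_guid(owner_key, readable_name)  # hash(owner key, readable name)
server_guid = sha1_guid(server_key)                # hash(server public key)
archival_guid = sha1_guid(archived_data)           # hash(data)

# Self-certifying: a client that fetches the data recomputes and checks the GUID.
fetched = archived_data
print(sha1_guid(fetched) == archival_guid, len(object_guid) * 4)  # True 160
```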

Addressing and routing: address an object by its GUID. A message carries the GUID and a small predicate and is routed to the closest replica of that GUID that matches the predicate. This combines data location and routing, so there is no central name service to attack.

Addressing and routing. Fast, probabilistic search algorithm: the Bloom filter, a probabilistic set-membership test using a bit vector. Each set element is hashed by several hash functions, each of which sets one bit of the vector, and the filter is the union (OR) of the bit vectors of all elements. Attenuated Bloom filter: an array of d Bloom filters, where the i-th Bloom filter is the union of the filters of all nodes i hops away. Slow, deterministic algorithm: Tapestry.

Bloom filter: quickly tests whether an element w belongs to a set S (figure taken from Wikipedia). For each element of S, compute K hash functions and set the corresponding slots of the bit array. There is a small risk of false positives. In the figure, w does not belong to S; K = 3.
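A minimal Bloom filter makes the K-hash description concrete. This is a sketch, not OceanStore's implementation. It derives the K slot indices by salting SHA-1 with the hash-function index, a common trick chosen here for brevity, and the filter size `m_bits` is an arbitrary assumption. `union` is the bitwise OR that the previous slide uses to combine filters.

```python
import hashlib

class BloomFilter:
    """Bit-vector Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m_bits: int = 64, k: int = 3) -> None:
        self.m = m_bits
        self.k = k
        self.bits = 0  # a Python int used as an m-bit vector

    def _positions(self, item: str):
        # Derive k slot indices by salting SHA-1 with the hash-function index.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Combining filters is a bitwise OR of their bit vectors.
        out = BloomFilter(self.m, self.k)
        out.bits = self.bits | other.bits
        return out

bf = BloomFilter(m_bits=64, k=3)
for guid in ["obj-a", "obj-b", "obj-c"]:
    bf.add(guid)
print(bf.might_contain("obj-a"))  # True: Bloom filters have no false negatives
print(bf.might_contain("obj-z"))  # usually False, but may be a false positive
```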

Attenuated Bloom filter: quickly tests whether an object w is stored at a site S and, if not, in which direction to route the query. It gives a hint of how far away a match might be found.
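The routing hint can be sketched under one assumption: each outgoing link carries an attenuated filter of depth d, and filter i on a link summarizes the objects stored i hops away along that link. The query is forwarded on the link that matches at the smallest depth. Link names, objects, and the filter parameters `M` and `K` are hypothetical. Because Bloom filters can give false positives, a match is only a hint, and a failed probabilistic search falls back to the deterministic Tapestry search.

```python
import hashlib

M, K = 128, 3  # assumed filter size in bits and number of hash functions

def positions(item: str):
    # K slot indices derived by salting SHA-1 with the hash index.
    for i in range(K):
        d = hashlib.sha1(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(d, "big") % M

def to_bloom(items) -> int:
    bits = 0
    for item in items:
        for p in positions(item):
            bits |= 1 << p
    return bits

def matches(bits: int, item: str) -> bool:
    return all((bits >> p) & 1 for p in positions(item))

def attenuated(levels) -> list:
    """levels[i] = objects stored at nodes i + 1 hops away along one link."""
    return [to_bloom(objs) for objs in levels]

def route_hint(links: dict, item: str):
    """Return (link, hops) for the link matching at the smallest depth, else None."""
    best = None
    for link, filters in links.items():
        for depth, bits in enumerate(filters, start=1):
            if matches(bits, item):
                if best is None or depth < best[1]:
                    best = (link, depth)
                break  # the shallowest match on this link is what counts
    return best

# Hypothetical neighbourhood with depth d = 3 filters on each link.
links = {
    "east": attenuated([{"x"}, {"y"}, {"z"}]),
    "west": attenuated([{"q"}, {"w"}, set()]),
}
print(route_hint(links, "w"))  # normally ('west', 2); a false positive could change it
```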

Addressing and routing: probabilistic and deterministic search [figure].

Attenuated Bloom Filter

Tapestry uses Plaxton routing. Using the routing tables, the query is forwarded toward the root node of the object. The root points to the server storing the object. Extra copies may be discovered earlier as the query moves toward the root. [Figure: servers S and S’ both store object O; location pointers (O,S) and (O,S’) are left at the nodes along each server’s path to the root of O.]
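The digit-by-digit idea behind Plaxton routing can be sketched as follows. Each hop moves to a node whose ID shares at least one more trailing digit with the target GUID, and the walk stops at the node that matches the most trailing digits, which acts as the object's root. This is a simplification: it uses a global node list instead of per-node neighbor tables, and among equal candidates it takes the first one rather than the nearest. Tapestry's surrogate routing for IDs that no node matches is not modeled, and all IDs are hypothetical.

```python
def shared_suffix(a: str, b: str) -> int:
    """Number of trailing digits two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def plaxton_route(start: str, target: str, nodes: list) -> list:
    """Forward toward the target's root, fixing one more trailing digit per hop.

    Simplified: a global node list stands in for per-node routing tables.
    """
    path = [start]
    current = start
    while True:
        level = shared_suffix(current, target)
        candidates = [n for n in nodes if shared_suffix(n, target) > level]
        if not candidates:
            return path  # no node matches more digits: current is the root
        # Prefer the smallest improvement, i.e. resolve the next digit.
        current = min(candidates, key=lambda n: shared_suffix(n, target))
        path.append(current)

nodes = ["0325", "B4F7", "9AA7", "C2A7", "E713"]  # hypothetical hex node IDs
print(plaxton_route("0325", "42A7", nodes))
# ['0325', 'B4F7', '9AA7', 'C2A7']  ('C2A7' shares 3 trailing digits with 42A7)
```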

Updates: based on versioning and conflict resolution, i.e. no locking. An update is a list of actions guarded by predicates. Commit: apply the actions of the first true predicate. Abort: no predicate is true. Conflict resolution works on encrypted data. Possible predicates: compare-version, compare-size, compare-block, search. Possible actions: replace-block, insert-block, delete-block, append.
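The commit/abort rule can be sketched on plaintext blocks. This is a simplification with several assumptions. Encryption, compare-block, search, insert-block, and delete-block are omitted. All helper names are hypothetical. The object is mutated in place, whereas OceanStore keeps the old version. `compare_version` takes a comparison operator, so it covers both an equality check and the "greater than X" example from the Bayou comparison.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Obj:
    version: int = 0
    blocks: list = field(default_factory=list)

Predicate = Callable[[Obj], bool]
Action = Callable[[Obj], None]

def compare_version(op, x: int) -> Predicate:
    # e.g. compare_version(operator.gt, 3) asks "is the version greater than 3?"
    return lambda o: op(o.version, x)

def compare_size(op, n: int) -> Predicate:
    # Size is measured in blocks here, an assumption for brevity.
    return lambda o: op(len(o.blocks), n)

def replace_block(i: int, data: bytes) -> Action:
    def act(o: Obj) -> None:
        o.blocks[i] = data
    return act

def append_block(data: bytes) -> Action:
    return lambda o: o.blocks.append(data)

def apply_update(obj: Obj, clauses: List[Tuple[Predicate, List[Action]]]) -> bool:
    """Commit the actions of the first true predicate; abort if none is true."""
    for predicate, actions in clauses:
        if predicate(obj):
            for action in actions:
                action(obj)
            obj.version += 1  # each committed update yields a new version
            return True
    return False  # abort

cal = Obj(version=1, blocks=[b"mon: free", b"tue: free"])  # a tiny shared calendar
update = [
    (compare_version(operator.eq, 1), [replace_block(1, b"tue: meeting")]),
    (compare_size(operator.lt, 5), [append_block(b"wed: meeting")]),
]
print(apply_update(cal, update), cal.version)  # True 2 (first clause fired)
print(apply_update(cal, update), cal.version)  # True 3 (stale version, second clause fired)
print(apply_update(cal, [(compare_version(operator.gt, 10), [])]))  # False (abort)
```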

Update on ciphertext

Updates: serializing updates. No single server is trusted. Byzantine agreement among the primary-tier servers serializes updates. The secondary tier gossips tentative data during the commit, and the commit is then multicast to the secondary tier. [Figure: primary and secondary tiers.]
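The slide does not state how large the primary tier is. A standard fact about Byzantine agreement is that tolerating f arbitrarily faulty servers requires at least $n \ge 3f + 1$ servers. The helper names below are hypothetical, and the sketch only does that arithmetic. It does not implement an agreement protocol.

```python
def max_byzantine_faults(n_servers: int) -> int:
    """Largest f with n >= 3f + 1, the classic bound for Byzantine agreement."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return (n_servers - 1) // 3

def servers_needed(f: int) -> int:
    """Minimum tier size that tolerates f Byzantine servers."""
    return 3 * f + 1

for n in (4, 7, 10):
    print(n, "primary-tier servers tolerate", max_byzantine_faults(n), "faulty ones")
```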

Archival: archival forms are produced when objects are idle, using erasure codes (redundant fragmentation). Simplest example: a parity fragment, where any (n-1) of the n fragments suffice. Reed-Solomon codes generalize this. Fragmentation improves reliability. [Figure: a file is split into fragments; any 4 can reconstruct the block.]
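The parity example can be made concrete with XOR. This sketch splits a block into k data fragments and adds one parity fragment, so any k of the n = k + 1 fragments rebuild the block. With k = 4 this matches "any 4 can reconstruct" from the figure. It tolerates only one lost fragment, whereas Reed-Solomon codes tolerate several. The function names and zero-padding scheme are assumptions for illustration.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block: bytes, k: int) -> list:
    """Split block into k equal data fragments (zero-padded) plus one parity fragment."""
    size = -(-len(block) // k)  # ceiling division
    padded = block.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]  # parity = XOR of all data fragments

def decode(frags: list, length: int) -> bytes:
    """Rebuild the block when at most one fragment (given as None) is missing."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can recover only one loss")
    if missing:
        present = [f for f in frags if f is not None]
        frags = list(frags)
        # XOR of the surviving fragments equals the lost one.
        frags[missing[0]] = reduce(xor_bytes, present)
    return b"".join(frags[:-1])[:length]  # drop parity and padding

block = b"OceanStore archival block"
frags = encode(block, 4)                   # 4 data + 1 parity = 5 fragments
damaged = frags[:2] + [None] + frags[3:]   # lose fragment 2
print(decode(damaged, len(block)) == block)  # True
```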

Dynamic optimization (introspection): observation modules collect and summarize information incrementally and update the system database, and optimization modules periodically process the observation database. Replica management: maintain replica count and location. Periodic migration: e.g. following a work-home-work-home… pattern.