OceanStore: An Architecture for Global-Scale Persistent Storage John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels,


OceanStore: An Architecture for Global-Scale Persistent Storage John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, and Ben Zhao Ελευθερία Φιλτζαντζίδη, 2002

OceanStore  Ubiquitous Computing: Car, Clothing, Books, Houses.  Computing devices must have high performance.  Computing devices should consume low power.  Computing devices must be transparent to the user.  Persistent information is necessary for transparency.  Where does persistent information reside?

OceanStore  Requirements for a persistent infrastructure.  Connectivity through: cable-modems, DSL, cell phones and wireless data services.  Information must be kept secure.  Information must be extremely durable.  Archiving of information should be automatic and reliable.  Information must be divorced from location.  OceanStore is a utility infrastructure for persistent storage.

OceanStore  As a rough estimate, OceanStore will provide service to about 10^10 users, each with at least 10,000 files.  OceanStore must therefore support over 10^14 files.  Consumers will pay a monthly fee in exchange for access to persistent storage services.  Companies buy and sell capacity from each other.  The core of the system is composed of a multitude of highly connected “pools”.

The OceanStore system

Two Unique Goals  Untrusted Infrastructure  Servers may crash without warning or leak information to third parties.  Only clients can be trusted.  All the information that enters the OceanStore is encrypted.  Nomadic Data: Data that is allowed to flow freely.  Promiscuous Caching: Data can be cached anywhere, anytime.

Applications  Groupware and personal information management tools (calendars, email, contact lists and distributed design tools).  OceanStore can be used to create large digital libraries and repositories for scientific data.  OceanStore provides an ideal platform for new streaming applications, such as sensor data aggregation and dissemination.

System Architecture  System Overview  Naming  Access Control  Data Location and Routing  Update Model  Deep Archival Storage  Introspection

System Overview  The fundamental unit is the persistent object.  Objects exist in both active and archival forms.  Active Object: the latest version of the object's data together with a handle for update.  Archival Object:  A permanent, read-only version of the object.  Archival objects are encoded with an erasure code.  The OceanStore API provides: sessions, session guarantees, updates and callbacks.  OceanStore provides an array of familiar interfaces, such as a Unix file system interface and a transactional interface.

Naming  Objects are identified by a GUID, a pseudo-random, fixed-length bit string.  An object's GUID is the secure hash of the owner’s key and some human-readable name (a self-certifying path).  Certain objects act as directories, mapping human-readable names to GUIDs (as in SDSI).  A user can choose several directories as “roots”. The system as a whole has no “roots”.
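The naming rule above can be sketched in a few lines. This is only an illustration: the slide says "secure hash" without naming one, so SHA-256 is an assumption, and `object_guid`, the length prefix, and the placeholder key bytes are hypothetical names and choices made for this example.

```python
import hashlib


def object_guid(owner_public_key: bytes, name: str) -> str:
    """Return a fixed-length GUID: a secure hash of the owner's key and a readable name."""
    h = hashlib.sha256()  # assumption: the slide does not say which hash is used
    # Length-prefix the key so bytes cannot shift between the key and the name.
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


alice_key = b"alice-public-key-bytes"  # placeholder bytes, not a real key
guid = object_guid(alice_key, "/home/alice/calendar")
```

Because the GUID depends on the owner's key, anyone holding that key and the name can recompute the GUID and check it, which is what makes the name self-certifying.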

Access Control  Reader restriction  All data that is not completely public is encrypted. The encryption key is distributed to those users with read permission.  Writer restriction  All writes can be verified against an access control list (ACL).  An owner of an object can choose the ACL x for an object foo by providing a signed certificate.

Data Location and Routing  OceanStore messages are labeled with a destination GUID, a random number, and a small predicate.  OceanStore combines data location and routing.  The task of routing is handled by the aggregate resources of many different nodes.  Messages route directly to destinations.  The underlying infrastructure has more up-to-date routing information.  The routing mechanism is a two-tiered approach:  Probabilistic algorithm.  Deterministic algorithm.

Probabilistic Algorithm  Is fully distributed and uses a constant amount of storage per server.  Uses an attenuated Bloom filter: an array of D ordinary Bloom filters.  The first filter contains the objects stored locally on the node.  The i-th Bloom filter is the union of the Bloom filters of all nodes at distance i, through any path, from the current node.  Bloom filter: a method for representing a set A.  It consists of a vector B of m bits and k hash functions h_1, h_2, ..., h_k with range {1..m}.  For each element a of the set A, the bits at positions h_1(a), h_2(a), ..., h_k(a) of the vector B are set to 1.  Given a query for b, we check the bits at positions h_1(b), h_2(b), ..., h_k(b): if any of them is 0, b is certainly not in A; otherwise b is assumed to be in A, with some probability of a false positive.
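The Bloom filter and its attenuated variant can be sketched as follows. This is a minimal sketch, not OceanStore's implementation: deriving the k hash functions by salting SHA-256, the 0-based bit positions, and the helper name `first_matching_depth` are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A set summary: a vector of m bits and k hash functions (positions are 0-based here)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Assumption: derive the k hash functions by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Any 0 bit proves absence; all 1 bits mean "probably present".
        return all(self.bits[pos] for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        merged = BloomFilter(self.m, self.k)
        merged.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return merged


def first_matching_depth(attenuated, guid):
    """Return the smallest depth whose filter may contain guid, or None."""
    for depth, bloom in enumerate(attenuated):
        if guid in bloom:
            return depth
    return None


# Depth 0 summarizes local objects; depth 1 is the union over nodes one hop away.
local = BloomFilter(m=1024, k=3)
local.add("guid-A")
neighbor_1, neighbor_2 = BloomFilter(1024, 3), BloomFilter(1024, 3)
neighbor_1.add("guid-B")
neighbor_2.add("guid-C")
attenuated = [local, neighbor_1.union(neighbor_2)]
```

A query is forwarded toward the link whose attenuated filter matches at the smallest depth. A match may be a false positive, so the result is a hint that the deterministic algorithm can back up.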

The Probabilistic Query Process

The Global Algorithm  OceanStore uses a variation on Plaxton et al.’s hierarchical distributed data structure.  The basic Plaxton scheme:  Every server in the system is assigned a random node-ID.  Each link is labeled with a level number.  In OceanStore each object is mapped to a single node whose node-ID matches the object’s GUID in the most bits. This node is called the object’s root.  The location of a replica is “published” in the infrastructure.  This process requires O(log n) hops.
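Root selection and digit-by-digit routing can be illustrated with a toy model. Several simplifications are assumptions of this sketch: IDs are short hex strings, matching starts from the least significant digit, ties are broken by the larger ID, and every node sees a global list of node-IDs instead of the per-level neighbor links of the real scheme. The function names are hypothetical.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Count how many trailing digits two IDs have in common."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


def object_root(guid: str, node_ids):
    """Pick the node whose ID matches the GUID in the most digits (ties: larger ID)."""
    return max(node_ids, key=lambda nid: (shared_suffix_len(nid, guid), nid))


def route(start: str, guid: str, node_ids):
    """Hop toward the root, matching more digits of the GUID at every hop."""
    path, current = [start], start
    while True:
        level = shared_suffix_len(current, guid)
        better = [n for n in node_ids if shared_suffix_len(n, guid) > level]
        if not better:
            return path
        # Prefer the candidate that resolves the fewest additional digits.
        current = min(better, key=lambda n: (shared_suffix_len(n, guid), n))
        path.append(current)


nodes = ["0325", "B4F8", "9098", "7598", "1234"]
root = object_root("4598", nodes)
path = route("0325", "4598", nodes)
```

Each hop fixes at least one more digit, so the number of hops is bounded by the number of digits in an ID, which is where the logarithmic hop count comes from.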

The Global Algorithm

[Figure: inserting an object and searching for object #62942. Labels: Root Node, Search Client, Object Location.]

Update Model  The update model is based on conflict resolution, as in the Bayou system.  Update: a list of predicates associated with actions.  Commit: the actions of the earliest true predicate are applied atomically.  Abort: no predicate holds and no changes are applied.  Possible predicates:  compare-version, compare-size: applied to the unencrypted metadata of an object.  compare-block: possible because the encryption technology is a position-dependent block cipher.  search: is performed directly on the ciphertext.
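The predicate/action structure of an update can be sketched as below. This is a simplified model under stated assumptions: `ObjectState`, `apply_update`, `compare_version` and `append_block` are hypothetical names, blocks are plain bytes rather than ciphertext, and atomicity is not modeled beyond running in a single thread.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ObjectState:
    version: int = 0
    blocks: List[bytes] = field(default_factory=list)


Predicate = Callable[[ObjectState], bool]
Action = Callable[[ObjectState], None]
Update = List[Tuple[Predicate, List[Action]]]


def compare_version(expected: int) -> Predicate:
    """Predicate over unencrypted metadata: does the version match?"""
    return lambda obj: obj.version == expected


def append_block(block: bytes) -> Action:
    return lambda obj: obj.blocks.append(block)


def apply_update(obj: ObjectState, update: Update) -> bool:
    """Evaluate predicates in order; apply the actions of the first true one."""
    for predicate, actions in update:
        if predicate(obj):
            for action in actions:
                action(obj)
            obj.version += 1
            return True   # commit
    return False          # abort: nothing was changed


doc = ObjectState(version=3, blocks=[b"a"])
update = [(compare_version(3), [append_block(b"b")])]
first = apply_update(doc, update)    # commits: version was 3
second = apply_update(doc, update)   # aborts: version is now 4
```

Replaying the same update aborts the second time, which is how a version predicate detects that another writer got there first.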

Update Model  Operations applied to ciphertext:  replace block, append block: supported by the position-dependent block cipher.  insert block, delete block: supported by keeping two sets of blocks, index blocks and data blocks.

Update Model  Someone must choose the final commit order of updates.  OceanStore uses two tiers of replicas.  A primary tier of replicas:  Byzantine agreement protocol.  Small number of replicas located in high-bandwidth, high-connectivity regions of the network.  Stronger consistency guarantees.  A secondary tier of replicas:  Epidemic algorithm.  They are organized into dissemination trees.  Contain both tentative and committed data.  Secondary replicas order tentative updates in timestamp order.  Lesser degree of consistency.

Update Model  After generating an update, a client sends it to the object’s primary tier, as well as to several random secondary replicas for that object.  The primary tier performs the Byzantine agreement protocol. The secondary replicas propagate the update among themselves epidemically.  The result is multicast down the dissemination tree.
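The interplay of tentative and committed data on a secondary replica can be sketched as follows. This toy model leaves out the Byzantine agreement, the epidemic exchange and the dissemination tree; it only shows how a replica's view changes. The class and method names are hypothetical.

```python
class SecondaryReplica:
    """Holds committed updates in primary-tier order plus tentative ones by timestamp."""

    def __init__(self):
        self.committed = []   # update ids, in the order fixed by the primary tier
        self.tentative = {}   # update id -> client timestamp

    def receive_tentative(self, update_id: str, timestamp: int) -> None:
        # Arrives from a client or from another secondary replica.
        if update_id not in self.committed:
            self.tentative[update_id] = timestamp

    def receive_commit(self, update_id: str) -> None:
        # Arrives down the dissemination tree once the primary tier has agreed.
        self.tentative.pop(update_id, None)
        if update_id not in self.committed:
            self.committed.append(update_id)

    def view(self):
        """Committed updates first, then tentative updates in timestamp order."""
        pending = sorted(self.tentative, key=lambda u: (self.tentative[u], u))
        return self.committed + pending


replica = SecondaryReplica()
replica.receive_tentative("u2", timestamp=20)
replica.receive_tentative("u1", timestamp=10)
before = replica.view()
replica.receive_commit("u2")   # the primary tier happened to serialize u2 first
after = replica.view()
```

The final commit order may differ from the timestamp order, which is why data on the secondary tier carries a lesser degree of consistency until the commit arrives.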

Deep Archival Storage  The archival mechanism employs erasure codes (interleaved Reed-Solomon, Tornado codes).  Erasure coding treats data as a series of fragments (say n) and transforms these fragments into a greater number of coded fragments.  The fragments are spread widely. Any n of the coded fragments are sufficient to reconstruct the original data.  Fragmentation increases reliability and survivability.
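The "any n of the coded fragments" property can be demonstrated with a toy code. This is a Reed-Solomon-style sketch built on polynomial interpolation over the prime field of size 257, not the interleaved Reed-Solomon or Tornado codes named above. The field size, the function names and the one-symbol-per-fragment layout are assumptions of the example, and `pow(x, -1, P)` needs Python 3.8 or later.

```python
P = 257  # a prime just above the byte range; fragment values lie in 0..256


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Turn n data symbols into m >= n coded fragments, keyed by position 1..m (m < P)."""
    points = list(enumerate(data, start=1))   # data symbol i sits at x = i
    return {x: _interpolate(points, x) for x in range(1, m + 1)}


def decode(fragments, n):
    """Rebuild the n data symbols from any n surviving fragments."""
    points = list(fragments.items())[:n]
    if len(points) < n:
        raise ValueError("not enough fragments to reconstruct the data")
    return [_interpolate(points, x) for x in range(1, n + 1)]


data = [72, 105, 33]                 # n = 3 symbols
fragments = encode(data, m=6)        # 6 coded fragments, spread over 6 servers
survivors = {x: fragments[x] for x in (2, 5, 6)}   # three servers were lost
recovered = decode(survivors, n=3)
```

With n = 3 and m = 6, any three of the six servers can disappear and the data is still recoverable. Plain replication at the same storage cost keeps only two copies, so losing both servers that hold a given block destroys it.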

Introspection  Introspection augments a system’s normal operation (computation) with observation and optimization.  [Figures: The Cycle of Introspection; An architecture for introspective systems in OceanStore]

Status  The first implementation is written in Java.  It uses the Unix file system interface and a read-only proxy for the WWW.  The authors have explored the security guarantees that are required for OceanStore.  Included components:  A prototype for the probabilistic algorithm.  Prototype archival systems that use Reed-Solomon and Tornado codes for redundancy encoding.  An introspective prefetching mechanism for a local file system.