Give Your Data the Edge: A Scalable Data Delivery Platform
University of Arizona, University of North Carolina, Open Networking Lab, Princeton University
Data Management Challenge
[Diagram: a distributed set of collaborators and data management experts share, pre-stage, and write back data across institutional resources and commodity cloud storage (S3, CyVerse, DropBox, XSEDE).]
Speaker notes: Emphasize the value, but also the limitations, of existing resources: (1) R/W performance of local disk, (2) scalable read bandwidth, (3) persistent storage, and (4) popular data sets. Then introduce the UG, RG, and AG (user, replica, and acquisition gateways), and show how they are tied together by (1) the HTTP data plane and (2) the MS (Metadata Service). The result is a shared, global volume.
Our Goal
Enable a scalable number of collaborators (and their applications) to share access to data independent of where it is stored, in a way that (1) minimizes the operational burden on users and (2) maximizes the use of commodity infrastructure.
Syndicate Solution
[Diagram: a shared volume built from Syndicate gateways (SG), a CDN, and a Metadata Service, over CyVerse, DropBox, and XSEDE.]
Speaker notes: Speak to the value of a CDN. Syndicate leverages the same kind of service Netflix uses, but for scientific data.
Syndicate Solution
[Diagram: the shared volume spanning Syndicate gateways (SG), the Metadata Service, the CDN, and back-end stores (S3, CyVerse, DropBox, XSEDE), annotated as follows.]
Metadata Service: manages data consistency and key distribution.
Application-facing gateways: bridge application workflows and HTTP transport; e.g., Jupyter, Hadoop.
Data-store-facing gateways: acquire data from existing data stores; e.g., CyVerse, XSEDE.
Cloud-storage-facing gateways: treat cloud storage as a block device.
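The annotation that gateways treat cloud storage as a block device can be illustrated with a minimal Python sketch, assuming a file is split into fixed-size blocks and each block version is stored as its own object. The `ObjectStore` class (an in-memory stand-in for S3), `BLOCK_SIZE`, and the `file_id/block_id.version` key layout are hypothetical choices for illustration, not Syndicate's actual format.

```python
"""Sketch: treating a flat object store (e.g. S3) as a block device.

Hypothetical throughout: BLOCK_SIZE, the key layout, and ObjectStore are
illustrative stand-ins, not Syndicate's actual interfaces or storage format.
"""
from __future__ import annotations

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to trace


class ObjectStore:
    """In-memory stand-in for a cloud object store: put/get whole objects by key."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def block_key(file_id: str, block_id: int, version: int) -> str:
    """Hypothetical key layout: each version of each block is a separate object."""
    return f"{file_id}/{block_id}.{version}"


def write_file(store: ObjectStore, file_id: str, data: bytes, version: int) -> list[str]:
    """Split data into BLOCK_SIZE blocks, store each as one object, return the keys in order."""
    manifest = []
    for block_id, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        key = block_key(file_id, block_id, version)
        store.put(key, data[offset:offset + BLOCK_SIZE])
        manifest.append(key)
    return manifest


def read_range(store: ObjectStore, manifest: list[str], offset: int, length: int) -> bytes:
    """Return bytes [offset, offset + length), fetching only the overlapping blocks.

    Assumes the requested range lies within the file.
    """
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    chunk = b"".join(store.get(manifest[i]) for i in range(first, last + 1))
    start = offset - first * BLOCK_SIZE
    return chunk[start:start + length]
```

For example, writing `b"ACGTACGTTTGA"` as version 1 produces three 4-byte block objects, and `read_range(store, manifest, 3, 5)` fetches only blocks 0 and 1 to return `b"TACGT"`. Giving each block version its own key keeps stored objects immutable, which makes them safe to cache downstream.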
Syndicate Solution
As easy as mounting Dropbox: the shared volume auto-mounts in cloud VMs.
[Diagram: shared volume, Syndicate gateways (SG), Metadata Service, CDN, S3, CyVerse, DropBox, XSEDE.]
OpenCloud: Service Delivery Platform
[Diagram: shared volume, Syndicate gateways (SG), Metadata Service, S3, CyVerse, DropBox, XSEDE.]
The “Value-Add” Strategy
Syndicate = a value-add storage service layered over a CDN, an object store, and a NoSQL DB:
CDN: scalable read bandwidth (Akamai HyperCache & RequestRouter)
Object Store: data durability (S3, Glacier, DropBox, Box, Swift)
NoSQL DB: data consistency (Google App Engine)
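One way to see how the three commodity pieces combine is the read path below, a minimal Python sketch under stated assumptions: a dict plays the object store (durability), a `Cache` class plays the CDN (read bandwidth), and a `MetadataService` dataclass plays the NoSQL-backed metadata service (consistency; Google App Engine on the slide). The URL scheme, class names, and the use of SHA-256 hashes for verification are hypothetical. The technique shown is versioned, immutable block URLs: a write publishes a new version instead of overwriting, so cached copies never need invalidation.

```python
"""Sketch: consistent reads through an HTTP cache using versioned block URLs.

Hypothetical throughout: block_url's scheme, MetadataService, and Cache are
illustrative stand-ins, not Syndicate's actual interfaces.
"""
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def block_url(file_id: str, block_id: int, version: int) -> str:
    # The version is part of the URL, so the content behind any one URL never changes.
    return f"http://origin.example.org/{file_id}/{block_id}.{version}"


@dataclass
class BlockInfo:
    version: int
    sha256: str


@dataclass
class MetadataService:
    """Consistent record of each file's latest block versions and their hashes."""
    manifests: dict[str, list[BlockInfo]] = field(default_factory=dict)

    def commit(self, origin: dict[str, bytes], file_id: str, block_id: int, data: bytes) -> None:
        """Write a new block version to the origin, then publish it in the manifest."""
        blocks = self.manifests.setdefault(file_id, [])
        if block_id > len(blocks):
            raise ValueError("blocks must be created in order")
        version = blocks[block_id].version + 1 if block_id < len(blocks) else 1
        origin[block_url(file_id, block_id, version)] = data  # old versions are left untouched
        info = BlockInfo(version, hashlib.sha256(data).hexdigest())
        if block_id < len(blocks):
            blocks[block_id] = info
        else:
            blocks.append(info)


@dataclass
class Cache:
    """CDN stand-in: serves repeat requests for a URL without contacting the origin."""
    origin: dict[str, bytes]
    entries: dict[str, bytes] = field(default_factory=dict)
    origin_fetches: int = 0

    def get(self, url: str) -> bytes:
        if url not in self.entries:  # cache miss: fetch from the origin once
            self.origin_fetches += 1
            self.entries[url] = self.origin[url]
        return self.entries[url]


def read_block(ms: MetadataService, cache: Cache, file_id: str, block_id: int) -> bytes:
    """Look up the latest version in the MS, fetch it via the cache, and verify its hash."""
    info = ms.manifests[file_id][block_id]
    data = cache.get(block_url(file_id, block_id, info.version))
    if hashlib.sha256(data).hexdigest() != info.sha256:
        raise ValueError("block failed integrity check")
    return data
```

Two reads of the same block cost one origin fetch; after a new write, the reader learns the new version from the metadata service and the cache fetches it once. Checking the hash recorded in the metadata service means the reader does not have to trust the cache itself.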
Value-Add Storage Service
[Diagram labels: OpenCloud; commodity clouds; private clouds; Internet2 backbone; regional & campus; end users; HPC; Amazon AWS S3; iRODS; RR; Google Cloud Platform; MS.]
Latency matters. Shared state matters. Sufficient resources matter.
Syndicate Value Proposition
Cloud-Ready: allows users to mount shared volumes into cloud-hosted virtual machines (VMs) with minimal operational overhead.
Scalable Read Bandwidth: provides scalable read bandwidth (i.e., supports a scalable number of users) with minimal operational overhead.
Provider Independence: allows users to take advantage of cost/performance tradeoffs among multiple storage providers (and to spread risk across those providers) with minimal operational overhead.
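Provider independence can be sketched as replicating each block to several back ends and reading from the cheapest one that is reachable. This is a minimal Python illustration; the `Provider` class, the provider names, and the relative cost values are hypothetical and do not reflect real pricing or Syndicate's replica placement policy.

```python
"""Sketch: replicate blocks across providers, read from the cheapest reachable one.

Hypothetical throughout: Provider, the provider names, and the relative cost
values are illustrative, not real prices or Syndicate's placement policy.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    read_cost: float  # relative cost units (hypothetical)
    available: bool = True
    objects: dict[str, bytes] = field(default_factory=dict)


def replicate(providers: list[Provider], key: str, data: bytes) -> None:
    """Write the block to every provider so that losing any one of them loses no data."""
    for provider in providers:
        provider.objects[key] = data


def read_cheapest(providers: list[Provider], key: str) -> tuple[str, bytes]:
    """Return (provider name, data) from the cheapest available provider holding key."""
    for provider in sorted(providers, key=lambda p: p.read_cost):
        if provider.available and key in provider.objects:
            return provider.name, provider.objects[key]
    raise KeyError(f"{key!r} is unavailable at every provider")
```

If the cheapest provider becomes unavailable, reads fall back to the next cheapest replica, which is the risk-spreading half of the tradeoff; the price of that resilience is writing every block more than once.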
Syndicate Value Proposition
Secure-by-Default: allows users to securely share files across organizational boundaries, at scale, with minimal operational overhead.
Adapt to Existing Workflows: makes it easy to integrate existing user workflows, datasets, and toolkits, and to extend and customize the platform to meet specific community requirements (e.g., privacy).
Sustainable Design: provides a general-purpose storage platform that leverages commodity storage and network caches at every opportunity. Commodity!!!
Value to NSF: no up-front capital investment; a pay-as-you-go approach.