S4: A Simple Storage Service for Sciences Matei Ripeanu Adriana Iamnitchi University of British Columbia University of South Florida.

Slides:



Advertisements
Similar presentations
Jens G Jensen Atlas Petabyte store Supporting Multiple Interfaces to Mass Storage Providing Tape and Mass Storage to Diverse Scientific Communities.
Advertisements

Abstraction Layers Why do we need them? –Protection against change Where in the hourglass do we put them? –Computer Scientist perspective Expose low-level.
Case Study: Photo.net March 20, What is photo.net? An online learning community for amateur and professional photographers 90,000 registered users.
Creating HIPAA-Compliant Medical Data Applications with Amazon Web Services Presented by, Tulika Srivastava Purdue University.
Storage Technologies Learning Objectives: –Renew acquaintance with disk and file system characteristics –Describe operational limitations of conventional.
Security by Design A Prequel for COMPSCI 702. Perspective “Any fool can know. The point is to understand.” - Albert Einstein “Sometimes it's not enough.
AMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION? Mayur Palankar and Adriana Iamnitchi University of South Florida Matei Ripeanu University of British Columbia.
JamesRH  7 major AWS Services (  Amazon E-Commerce Service (ECS)  Amazon.
STANFORD UNIVERSITY INFORMATION TECHNOLOGY SERVICES IT Services Storage And Backup Low Cost Central Storage (LCCS) January 9,
The Total Cost of (Non) Ownership of Storage In The Cloud Jinesh Varia Technology Evangelist.
Amazon Web Services (aws) B. Ramamurthy. Introduction  Amazon.com, the online market place for goods, has leveraged the services that worked for their.
PROVENANCE FOR THE CLOUD (USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES(FAST `10)) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer Harvard.
High Performance Computing Course Notes Grid Computing.
Data Grids Jon Ludwig Leor Dilmanian Braden Allchin Andrew Brown.
VoipNow Core Solution capabilities and business value.
CS-550: Distributed File Systems [SiS]1 Resource Management in Distributed Systems: Distributed File Systems.
Small-World File-Sharing Communities Adriana Iamnitchi, Matei Ripeanu and Ian Foster,
Are P2P Data-Dissemination Techniques Viable in Today's Data- Intensive Scientific Collaborations? Samer Al-Kiswany – University of British Columbia joint.
Nikolay Tomitov Technical Trainer SoftAcad.bg.  What are Amazon Web services (AWS) ?  What’s cool when developing with AWS ?  Architecture of AWS 
1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Matei Ripeanu.
An Introduction to Cloud Computing. The challenge Add new services for your users quickly and cost effectively.
Chapter 7 Configuring & Managing Distributed File System
1 Exploring Data Reliability Tradeoffs in Replicated Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh Advisor: Professor.
EE616 Technical Project Video Hosting Architecture By Phillip Sutton.
Cloud Computing. What is Cloud Computing? Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable.
Module 6: Designing Active Directory Security in Windows Server 2008.
MODULE – 8 OBJECT-BASED AND UNIFIED STORAGE
1 Configurable Security for Scavenged Storage Systems NetSysLab The University of British Columbia Abdullah Gharaibeh with: Samer Al-Kiswany, Matei Ripeanu.
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
Cloud Computing Characteristics A service provided by large internet-based specialised data centres that offers storage, processing and computer resources.
Session 7 Windows Platform Eng. Dina Alkhoudari. Learning Objectives Active Directory review Managing users and groups Single Master Operations Delegation.
1 presentation of article: Small-World File-Sharing Communities Article: Adriana Iamnitchi, Matei Ripeanu, Ian Foster Presentation: Periklis Akritidis.
A Distributed Paging RAM Grid System for Wide-Area Memory Sharing Rui Chu, Yongzhen Zhuang, Nong Xiao, Yunhao Liu, and Xicheng Lu Reporter : Min-Jyun Chen.
Data Transfers in the Grid: Workload Analysis of Globus GridFTP Nicolas Kourtellis, Lydia Prieto, Gustavo Zarrate, Adriana Iamnitchi University of South.
Instrumentation of the SAM-Grid Gabriele Garzoglio CSC 426 Research Proposal.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
4 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved. Computer Software Chapter 4.
1 NETE4631 Working with Cloud-based Storage Lecture Notes #11.
Amit Warke Jerry Philip Lateef Yusuf Supraja Narasimhan Back2Cloud: Remote Backup Service.
ESFRI & e-Infrastructure Collaborations, EGEE’09 Krzysztof Wrona September 21 st, 2009 European XFEL.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.
National Computational Science National Center for Supercomputing Applications National Computational Science GSI Online Credential Retrieval Requirements.
MSE Portfolio Presentation 1 Doug Smith November 13, 2008
High Energy FermiLab Two physics detectors (5 stories tall each) to understand smallest scale of matter Each experiment has ~500 people doing.
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
Introduction to Active Directory
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
3/12/2013Computer Engg, IIT(BHU)1 CLOUD COMPUTING-1.
USGS GRID Exploratory Status Review Stuart Doescher Mike Neiers USGS/EDC May
File Grouping for Scientific Data Management: Lessons from Experimenting with Real Traces Shyamala Doraimani* and Adriana Iamnitchi University of South.
The Globus Toolkit The Globus project was started by Ian Foster and Carl Kesselman from Argonne National Labs and USC respectively. The Globus toolkit.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Amazon Web Services. Amazon Web Services (AWS) - robust, scalable and affordable infrastructure for cloud computing. This session is about:
Course: Cluster, grid and cloud computing systems Course author: Prof
Amazon Web Services (aws)
100% Exam Passing Guarantee & Money Back Assurance
Scalable Web Apps Target this solution to brand leaders responsible for customer engagement and roll-out of global marketing campaigns. Implement scenarios.
Amazon Storage- S3 and Glacier
(ITI310) SESSIONS 6-7-8: Active Directory.
Geographically distributed storage over multiple universities
Scalable Web Apps Target this solution to brand leaders responsible for customer engagement and roll-out of global marketing campaigns. Implement scenarios.
AWS COURSE DEMO BY PROFESSIONAL-GURU. Amazon History Ladder & Offering.
Building a Database on S3
A Software-Defined Storage for Workflow Applications
Distributed Ledger Technology (DLT) and Blockchain
AWS Cloud Computing Masaki.
Data Management Components for a Research Data Archive
Presentation transcript:

S4: A Simple Storage Service for Sciences Matei Ripeanu Adriana Iamnitchi University of British Columbia University of South Florida

2 Services as utilities have gained traction –Economy of scale  lower costs –One of the present drivers for Grid computing Success story: Amazon Simple Storage Service (S3) –S3 growth is capacity constrained –Direct access to storage: open protocols, APIs –Performance claims: Infinite data durability, 99.99% availability, fast access –Billing: pay-as-you go $0.15/month/GB stored; $ /GB transferred Science communities are huge storage users The Situation

3 The Motivating Questions The immediate question: Is offloading data storage to a storage utility feasible and cost-effective for science Grids? The long-term question: How should a storage utility that targets scientific applications look like?

4 Characterize S3 –Does it live up to its own objectives? Toy scenario: consider a representative scientific application (DZero) –Is the functionality provided adequate? –Estimate performance and costs Q: Is offloading data storage from an in-house storage system to S3 feasible and cost-effective for science Grids? The Approach

5 The Answer: Risky. New risk: direct monetary loss –Magnified as there is no built-in solution to limit loss –In addition to well-known risk in distributed systems Security mechanisms -- too simple to be useful for large collaborations –Access control using ACLs, hard to use in large systems, needs at least groups –No support for delegation –Implicit trust between users and the S3 service No transaction ‘receipts’, no support for un-repudiabiliy But … standard techniques to deal with these problems

6 The Answer: Costly. Scenario: S3 used by a high-energy physics collaboration The DØ Experiment Traces from January ‘03 to March ’05 (27 months) 375TB stored, 5.2 PB processed, 561 users, 13 countries DataS3 ProcessingDØ Storage $675K Access$462K (per year) S3 DØ $675K $66K Add caching: 4TB/site cooperative cache S3 EC2 $675K $44K S3 DØ $200K…$400K $66K Move processing to EC2 Realize that data gets ‘cold’:  archive cold raw-data  throw away cold derived data (keep definitions)

7 Unbundle performance characteristics –S3: high-availability, high-durability, high-access performance, bundled at a single pricing point Applications often do not need all three Each characteristic requires different resources and generates different costs –Solution: classes of service that allow applications to specify their requirements and chose pricing point Exploit usage patterns –e.g., data gets cold Facilitate the use of application-level information to reduce costs –E.g., raw vs. derived data Guidelines for a Simple Storage Service for Sciences (S4)

8 Questions? To access the S3 evaluation technical report:

9 Two level namespace –Buckets (think directories) Unique names Two goals: data organization and charging –Data objects Opaque object (max 5GB) Metadata (attribute-value, up to 4K) Functionality –Simple put/get functionality –Limited search functionality –Objects are immutable, cannot be renamed Data access protocols –SOAP –REST –BitTorrent Simple Storage Service (S3) Architecture

10 Security –Identities Assigned by S3 when initial contract is ‘signed’ –Authentication Public/private key scheme But private key is generated by Amazon! –Access control Access control lists (limited to 100 principals) ACL attributes –FullControl, –Read & Write (for buckets only for writes) –ReadACL & WriteACL (for buckets or objects) –Auditing (pseudo) S3 can provide a log record S3 Architecture (…cont)

11 Durability –Perfect (but based on limited scale experiment) Availability –Four weeks of traces, about 3000 access requests from 5 PlanetLab nodes –Retry protocol, exponential back-off, –‘Cleaned’ data 99.03% availability after original access 99.55% availability after first retry 100% availability after second retry Access performance S3 Evaluation

12 CharacteristicsResources and techniques to provide them High-performance data access Geographical replication to improve access locality, high-speed storage, fat networks. DurabilityReplication at various scales: RAID, erasure codes, multiple locations, multiple media;. Availabilityservice replication, hot-swap technologies, multi ‑ hosting, increase availability for auxiliary services (e.g., authentication, access control)

13 Application classDurabilityAvailabilityHigh access speed CacheNoDependsYes Long-term archivalYesNo Online productionNoYes Batch productionNo Yes