Bringing cloud technology to distributed data infrastructures
EGI CF 2013
Martin Hellmich (presenter), Jedrzej Rybicki, Maciej Brzeźniak

Presentation transcript:

A bit of context
Towards a pan-European Collaborative Data Infrastructure
Production Services: Safe Replication, Data Staging, Metadata, AAI
Research & Development: Scalable Federation Architectures, Data Preservation, Data Access and Transfer Workflows

Three Projects
Cloud storage integration
– iRODS managing an OpenStack Swift backend
– Extending DPM with S3 storage
In-storage processing
– Call Hadoop jobs from iRODS

My Goal
Show the projects
Find interest in the communities (we are interdisciplinary)
Start a discussion about cloud integration
– Backend or frontend?
– Outsource or restructure?
– Where are the limitations?

The Cloud Integration Projects
iRODS-OpenStack
– Expose existing S3/OpenStack storage (managed otherwise)
– iRODS frontend protocols
– Local storage as cache
DPM-S3
– Add new storage to DPM
– Expose HTTP only (but grid-aware, X509, VOMS)
– Outsource storage and network traffic

iRODS-OpenStack Swift
Maciej Brzeźniak

Sidestep: iRODS compound resources
iRODS resources: Cache, Archive, Virtual
iRODS compound resources:
– Virtual resource
– Maps from PUT/GET to POSIX
– Provides a cache
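
The cache-plus-archive idea behind a compound resource can be sketched in a few lines of Python. This is a toy model under stated assumptions, not iRODS code: the class names `CompoundResource` and `DictArchive` and their `put`/`get` methods are hypothetical, and the archive is an in-memory stand-in for an object store such as Swift/S3.

```python
import shutil
from pathlib import Path


class DictArchive:
    """Hypothetical in-memory stand-in for an object store (PUT/GET only)."""

    def __init__(self):
        self.objects = {}

    def put(self, name, path):
        self.objects[name] = Path(path).read_bytes()

    def get(self, name, path):
        Path(path).write_bytes(self.objects[name])


class CompoundResource:
    """Toy compound resource: a POSIX cache directory in front of an archive."""

    def __init__(self, cache_dir, archive):
        self.cache_dir = Path(cache_dir)
        self.archive = archive

    def put(self, name, source_path):
        # Write to the local cache first, then copy the cached file to the archive.
        cached = self.cache_dir / name
        shutil.copyfile(source_path, cached)
        self.archive.put(name, cached)
        return cached

    def get(self, name):
        # Serve from the cache when possible; otherwise stage from the archive.
        cached = self.cache_dir / name
        if not cached.exists():
            self.archive.get(name, cached)
        return cached
```

Clients only ever see the POSIX path returned by `put` and `get`; the PUT/GET traffic to the archive stays behind the resource.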

iRODS managing an S3 backend
Ingredients:
– iRODS server
– S3 Driver (in C)
– iRODS-S3 Driver Glue
– Swift-to-S3 frontend
Diagram labels: iRODS, Site Disks, OpenStack Swift/S3

Achievements
– Transparent cloud storage
– Cloud auth through central accounts
– Low overhead through iRODS
– Speedups with caching
Limitations:
– File size limit (2/5 GB)
– Issue moving files inside the cloud
Diagram labels: iRODS, Site Disks, S3/OpenStack

DPM-S3
Martin Hellmich

DPM now uses dmlite
Diagram label: S3

Sidestep: the S3 protocol
HTTP + custom headers
Access ID + Secret Key + HTTP Cmd + Time => Signature
Can be:
– Header: Authorization: AWS AWSAccessKeyId:Signature
– In URL: ?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=NpgCjnDzr%2BWFzoENXmpNDUsSn8%3D&Expires=
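
How the signature is derived can be illustrated with a short Python sketch of query-string signing in the style of AWS signature version 2 (HMAC-SHA1 over the verb, the expiry time and the resource, then Base64). The function name `sign_s3_url`, the host `s3.example.org` and the credentials in the usage note are assumptions for illustration only; a real deployment would normally rely on an S3 client library.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_s3_url(access_key, secret_key, method, bucket, key, expires):
    """Build a query-string-authenticated URL (signature version 2 style)."""
    resource = f"/{bucket}/{key}"
    # String to sign: verb, Content-MD5, Content-Type, Expires, resource.
    # Content-MD5 and Content-Type are left empty in this sketch.
    string_to_sign = f"{method}\n\n\n{expires}\n{resource}"
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The host name is a placeholder, not a real endpoint.
    return (
        f"https://s3.example.org{quote(resource)}"
        f"?AWSAccessKeyId={quote(access_key, safe='')}"
        f"&Signature={quote(signature, safe='')}"
        f"&Expires={expires}"
    )
```

For example, `sign_s3_url("AKIDEXAMPLE", "not-a-real-secret", "GET", "bucket", "dir/file.dat", 1700000000)` returns a URL that stays valid until the given Unix time. Only the holder of the secret key can produce it, but any plain HTTP client can use it.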

Extending DPM with S3 Storage
Ingredients:
– dmlite
– dmlite-plugins-s3
– Amazon S3
– OpenStack Swift S3 frontend
– Ceph/RadosGW
Diagram labels: Site Disks, S3, Signed URL redirect

Achievements
– Only nameserver traffic local
– Cloud storage managed with central account
– Grid-enabled HTTP
– Standard HTTP clients
– File size limit (or S3 client)
Diagram labels: Site Disks, S3, Signed URL redirect
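
The signed-URL redirect can be pictured as a small lookup-and-redirect step: the site resolves the logical file name locally and then hands the client a time-limited URL, so the data itself never passes through the site. The sketch below is an illustration under assumptions, not dmlite code; `redirect_to_storage`, the `catalogue` mapping and the `sign` callable are hypothetical names.

```python
import time


def redirect_to_storage(logical_name, catalogue, sign, lifetime=300, now=None):
    """Return (status, headers) redirecting a client to cloud storage.

    catalogue maps logical file names to (bucket, key) pairs and stands in
    for the local nameserver. sign(bucket, key, expires) must return a
    signed URL; both are assumptions made for this sketch.
    """
    if logical_name not in catalogue:
        return 404, {}
    bucket, key = catalogue[logical_name]
    # The signed URL is valid for `lifetime` seconds from now.
    expires = int(time.time() if now is None else now) + lifetime
    return 302, {"Location": sign(bucket, key, expires)}
```

Because the response is an ordinary HTTP 302, a standard HTTP client follows it without any S3-specific support.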

In-Storage Processing
Jedrzej Rybicki & Benedikt von St. Vieth

Motivation
Example HPC workflow:
Diagram, first panel: Site, High Performance Computing, Storage, preprocessing
Diagram, second panel: Site, High Performance Computing, Storage + preprocessing

Sidestep: iRODS rules
Condition: $objPath like /x/y/z/* Or $rescName == demoResc8
Rule: printHello { print_hello; }
Act freely on certain triggers
At least C and Python
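
The condition/action pattern of such rules can be mimicked in plain Python. This is only an analogy written for illustration and does not use the iRODS rule language or API: `make_rule`, `fire` and the event dictionary keys are hypothetical names, with the path pattern and resource name taken from the slide.

```python
from fnmatch import fnmatchcase


def make_rule(condition, action):
    """Pair a condition (event -> bool) with an action (event -> result)."""
    return {"condition": condition, "action": action}


def fire(rules, event):
    """Run the action of every rule whose condition matches the event."""
    return [rule["action"](event) for rule in rules if rule["condition"](event)]


# Mirrors the slide: trigger on a path pattern or on a resource name.
rules = [
    make_rule(
        lambda e: fnmatchcase(e["objPath"], "/x/y/z/*")
        or e["rescName"] == "demoResc8",
        lambda e: f"hello from {e['objPath']}",
    )
]
```

Calling `fire(rules, {"objPath": "/x/y/z/file", "rescName": "other"})` runs the action once, while an event that matches neither the path pattern nor the resource name runs nothing.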

In-Storage Processing
Benedikt von St. Vieth & Jedrzej Rybicki

Achievements
– Everything is a file
– Easy job specification in Apache Pig
– Caching of results
– Predefined scripts or custom jobs?

Summary
– There are different ways to integrate cloud storage for different scenarios
– Storage-based computing can be made transparent

Thank you!
Project contacts:
OpenStack/iRODS
– Maciej Brzeźniak (PSNC)
DPM-S3
– Martin Hellmich (CERN)
In-storage processing on iRODS
– Jedrzej Rybicki / Benedikt von St. Vieth (JSC)
Any questions?