Published by Christina Marylou Campbell. Modified over 9 years ago.
1
Bringing cloud technology to distributed data infrastructures
EGI CF 2013
Martin Hellmich (presenter), Jedrzej Rybicki, Maciej Brzeźniak
2
A bit of context
Towards a pan-European Collaborative Data Infrastructure
Production Services: Safe Replication, Data Staging, Metadata, AAI
Research & Development: Scalable Federation Architectures, Data Preservation, Data Access and Transfer Workflows
3
Three Projects
Cloud storage integration
– iRODS managing an OpenStack Swift backend
– Extending DPM with S3 storage
In-storage processing
– Call Hadoop jobs from iRODS
4
My Goal
Show the projects
Find interest in the communities (we are interdisciplinary)
Start discussion about cloud integration
– Backend or frontend?
– Outsource or restructure?
– Where are limitations?
5
The Cloud Integration Projects
iRODS-OpenStack
– Expose existing S3/OpenStack storage (managed otherwise)
– iRODS frontend protocols
– Local storage as cache
DPM-S3
– Add new storage to DPM
– Expose HTTP only (but grid-aware, X509, VOMS)
– Outsource storage and network traffic
6
iRODS-OpenStack Swift
Maciej Brzezniak
7
Sidestep: iRODS compound resources
iRODS resources:
– Cache
– Archive
– Virtual
iRODS compound resources:
– Virtual resource
– Maps from PUT/GET to POSIX
– Provides a cache
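The compound-resource idea on this slide (a virtual resource that maps PUT/GET onto POSIX files and keeps a cache in front of an archive) can be sketched in a few lines of Python. This is an illustration only, not iRODS code: the class `CompoundResource`, its methods, and the dict standing in for the archive backend are all hypothetical, and object names are assumed to be flat (no subdirectories).

```python
import os
import tempfile


class CompoundResource:
    """Hypothetical model: a POSIX cache directory in front of an archive."""

    def __init__(self, cache_dir, archive):
        self.cache_dir = cache_dir   # local POSIX directory used as cache
        self.archive = archive       # dict standing in for an object store

    def _cache_path(self, name):
        return os.path.join(self.cache_dir, name)

    def put(self, name, data):
        # PUT: write to the POSIX cache first, then sync to the archive.
        with open(self._cache_path(name), "wb") as f:
            f.write(data)
        self.archive[name] = data

    def get(self, name):
        # GET: serve from the cache if present, otherwise stage from the archive.
        path = self._cache_path(name)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(self.archive[name])
        with open(path, "rb") as f:
            return f.read()

    def evict(self, name):
        # Drop the cached copy; the archive copy remains.
        os.remove(self._cache_path(name))


if __name__ == "__main__":
    resource = CompoundResource(tempfile.mkdtemp(), {})
    resource.put("sample.dat", b"payload")
    resource.evict("sample.dat")
    print(resource.get("sample.dat"))  # staged back from the archive
```

In this sketch a read after eviction transparently re-stages the object, which is the behaviour a client would expect from a cache placed in front of slower archive storage.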
8
iRODS managing an S3 backend
Ingredients:
– iRODS server
– S3 Driver (in C)
– iRODS-S3 Driver Glue
– Swift-to-S3 frontend
[Diagram: iRODS, Site, Disks, OpenStack Swift/S3]
9
Achievements
Transparent cloud storage
Cloud auth through central accounts
Low Overhead through iRODS
Speedups with caching
Limitations:
– Filesize limit (2/5GB)
– Issue moving files inside the cloud
[Diagram: iRODS, Site, Disks, S3/OpenStack]
10
DPM-S3
Martin Hellmich
11
DPM now uses dmlite
[Diagram: S3]
12
Sidestep: the S3 protocol
HTTP + custom headers
Access ID + Secret Key + HTTP Cmd + Time => Signature
Can be:
– Header: Authorization: AWS AWSAccessKeyId:Signature
– In URL: ?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=NpgCjnDzr%2BWFzoENXmpNDUsSn8%3D&Expires=1175139620
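The signing step on this slide can be sketched as follows, assuming the legacy AWS Signature Version 2 scheme, which is what the `AWSAccessKeyId`, `Signature` and `Expires` parameters suggest. The function names and the placeholder credentials are hypothetical, and the sketch ignores `x-amz-` headers, Content-MD5 and Content-Type, so it is a minimal illustration and not a complete client.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_v2(secret_key, verb, resource, time_value):
    """Return the Base64-encoded HMAC-SHA1 signature over the request."""
    # Fields: verb, empty Content-MD5, empty Content-Type, time, resource.
    string_to_sign = f"{verb}\n\n\n{time_value}\n{resource}"
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def presigned_query(access_key, secret_key, verb, resource, expires):
    """Build the '?AWSAccessKeyId=...&Signature=...&Expires=...' suffix."""
    signature = sign_v2(secret_key, verb, resource, expires)
    return (f"?AWSAccessKeyId={quote(access_key, safe='')}"
            f"&Signature={quote(signature, safe='')}"
            f"&Expires={expires}")


def authorization_header(access_key, secret_key, verb, resource, date):
    """Build the header form; the Date value takes the place of Expires."""
    signature = sign_v2(secret_key, verb, resource, date)
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    # Placeholder credentials and resource path, for illustration only.
    print(presigned_query("EXAMPLEKEYID", "not-a-real-secret",
                          "GET", "/bucket/object", 1175139620))
```

Because the signature covers the verb, the resource and the expiry time, a signed URL can be handed to any standard HTTP client without sharing the secret key, and it stops working once the expiry time has passed.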
13
Extending DPM with S3 Storage
[Diagram: Site, Disks, S3, Signed URL redirect]
Ingredients:
– dmlite
– dmlite-plugins-s3
– Amazon S3
– OpenStack Swift S3 frontend
– Ceph/RadosGW
14
Achievements
Only nameserver traffic local
Cloud storage managed with central account
Grid-enabled HTTP
Standard HTTP clients
Filesize limit (or S3 client)
[Diagram: Site, Disks, S3, Signed URL redirect]
15
In-Storage Processing
Jedrzej Rybicki & Benedikt von St. Vieth
16
Motivation
Example HPC workflow:
[Diagram 1: Site, High Performance Computing, Storage, preprocessing]
[Diagram 2: Site, High Performance Computing, Storage + preprocessing]
17
Sidestep: iRODS rules
Condition: $objPath like /x/y/z/* Or $rescName == demoResc8
Rule: printHello { print_hello; }
Act freely on certain triggers
At least C and Python
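The rule mechanism on this slide (a condition on session variables such as $objPath or $rescName, plus an action that fires when the condition holds) can be mimicked in plain Python. This is a toy model, not the iRODS rule engine or its API: `Rule`, `fire` and the event dictionary are hypothetical names, and the `like` operator is approximated with shell-style wildcards from `fnmatch`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, str]], bool]
    action: Callable[[Dict[str, str]], str]


def fire(rules: List[Rule], event: Dict[str, str]) -> List[str]:
    """Run the action of every rule whose condition matches the event."""
    return [rule.action(event) for rule in rules if rule.condition(event)]


# Mirrors the slide: $objPath like /x/y/z/* or $rescName == demoResc8
print_hello = Rule(
    name="printHello",
    condition=lambda e: fnmatchcase(e.get("objPath", ""), "/x/y/z/*")
    or e.get("rescName") == "demoResc8",
    action=lambda e: "hello",
)

if __name__ == "__main__":
    event = {"objPath": "/x/y/z/data.bin", "rescName": "someResc"}
    print(fire([print_hello], event))
```

Keeping the condition separate from the action is what lets arbitrary processing be attached to storage events without changing the clients that trigger them.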
18
In-Storage Processing
Benedikt von St. Vieth & Jedrzej Rybicki
19
Achievements
Everything is a file
Easy job specification in Apache Pig
Caching of results
Predefined scripts or custom jobs?
20
Summary
There are different ways to integrate cloud storage for different scenarios
Storage-based computing can be made transparent
21
Thank you!
Project contacts
OpenStack/iRODS
– Maciej Brzezniak (PSNC)
DPM-S3
– Martin Hellmich (CERN)
In-storage processing on iRODS
– Jedrzej Rybicki / Benedikt von St. Vieth (JSC)
Any Questions?