Introduction to Data Management in EGI Vincenzo Spinoso vincenzo.spinoso@egi.eu EGI.eu/INFN
Outline: categorisation of data services in EGI; status and future plans.
Components: data management is performed by interoperable components, and different components address different needs: storage management at site level; transfer between sites; security; catalogue and metadata.
How are data managed at site level? Storage endpoints
Storage endpoints: A unique namespace is provided to the client. Authentication and encryption guarantee confidentiality and integrity. Several protocols are supported for file access and transfer. Distributing data across several disk servers guarantees scalability at site level. If tapes are provided, access to tape is transparent.
Storage endpoints (diagram): DPM; Lustre or GPFS; StoRM.
What about interoperability, access, transfers?
Access, transfers (diagram: DPM and StoRM behind an abstraction layer exposing SRM, GridFTP, WebDAV and NFS/pNFS, labelled «Storage element»). Applications and users can interact with the endpoints using different protocols. SRM offers storage management: transparent disk/tape management, an interface between different transfer protocols, a standard interface. GridFTP offers advanced data transfer: parallel streams, fault tolerance, security (authorization, encryption), optimization.
Access, transfers (continued): WebDAV offers a «web-based network file system»: widely supported by many OSes, standard (IETF). NFS 4.1 provides «local access» (fast, POSIX).
Access, transfers (diagram): DPM behind an abstraction layer exposing SRM, GridFTP, WebDAV and NFS/pNFS.
Data transfer scheduling Can transfers be scheduled?
Data transfer scheduling: schedule continuous, sustained data transfers across multiple endpoints; prioritize inter-VO and intra-VO file transfers. Many different clients are available for several protocols (SRM, GridFTP, WebDAV…). Useful in the VO management context to control data transfers.
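The prioritisation idea above can be illustrated with a toy priority queue. This is only a sketch under stated assumptions, not the scheduler of any EGI service: the class names, the two-level priority (a per-VO rank, then a per-job priority inside the VO) and the example.org URLs are all hypothetical.

```python
import heapq
from itertools import count


class TransferQueue:
    """Toy scheduler: lower numbers are served first (hypothetical model)."""

    def __init__(self, vo_rank):
        # vo_rank: dict mapping VO name -> rank between VOs (inter-VO priority)
        self._vo_rank = vo_rank
        self._heap = []
        self._seq = count()  # tie-breaker: keeps submission order stable

    def submit(self, vo, source, destination, priority=0):
        # Sort key: VO rank first (inter-VO), then job priority (intra-VO),
        # then submission order.
        key = (self._vo_rank[vo], priority, next(self._seq))
        heapq.heappush(self._heap, (key, vo, source, destination))

    def next_job(self):
        """Return (vo, source, destination) of the next transfer, or None."""
        if not self._heap:
            return None
        _, vo, source, destination = heapq.heappop(self._heap)
        return vo, source, destination


if __name__ == "__main__":
    queue = TransferQueue(vo_rank={"vo.a": 0, "vo.b": 1})
    queue.submit("vo.b", "gsiftp://se1.example.org/f1", "gsiftp://se2.example.org/f1")
    queue.submit("vo.a", "gsiftp://se1.example.org/f2", "gsiftp://se2.example.org/f2", priority=5)
    queue.submit("vo.a", "gsiftp://se1.example.org/f3", "gsiftp://se2.example.org/f3", priority=1)
    while (job := queue.next_job()) is not None:
        print(job)
```

In this model all transfers of a higher-ranked VO run before those of a lower-ranked one; a real scheduler would more likely share bandwidth between VOs rather than strictly order them.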
Catalogue Where are my files? lfn:grid/20150407/store/data/run1312
Catalogue: LFC provides a hierarchical view of files to users, with a UNIX-like client interface; Logical File Name (LFN) to Storage URL (SURL) mappings; authorization on the namespace.
EGI «whole picture»: a really complex infrastructure based on elementary «bricks»; each VO chooses its «recipe» of components. Mature and stable: integration in a unified release controls the stability of the «off-line» machinery; operations control the stability of the «on-line» machinery.
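The LFN-to-SURL mapping can be pictured with a small in-memory model. This is a minimal sketch, not the LFC client API: the `ToyCatalogue` class and the SURLs on example.org are hypothetical, and only the LFN is taken from the slide above.

```python
class ToyCatalogue:
    """Hypothetical in-memory stand-in for a file catalogue."""

    def __init__(self):
        self._replicas = {}  # LFN -> list of SURLs holding a copy of the file

    def register(self, lfn, surl):
        """Record that a replica of `lfn` is stored at `surl`."""
        self._replicas.setdefault(lfn, []).append(surl)

    def replicas(self, lfn):
        """Answer "where are my files?" for one logical name."""
        return list(self._replicas.get(lfn, []))

    def listdir(self, directory):
        """UNIX-like listing: immediate children of a logical directory."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for lfn in self._replicas:
            if lfn.startswith(prefix):
                children.add(lfn[len(prefix):].split("/", 1)[0])
        return sorted(children)


if __name__ == "__main__":
    catalogue = ToyCatalogue()
    lfn = "lfn:grid/20150407/store/data/run1312"
    # Hypothetical storage URLs at two sites.
    catalogue.register(lfn, "srm://se1.example.org/data/run1312")
    catalogue.register(lfn, "srm://se2.example.org/data/run1312")
    print(catalogue.replicas(lfn))
    print(catalogue.listdir("lfn:grid/20150407/store/data"))
```

The user only handles the logical name; which site actually holds the bytes is resolved by the lookup.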
Globus Online provides robust and easy-to-use file transfer capabilities: web interface, transfer management, performance monitoring, retries after failures, automatic recovery when possible. It is a service, hosted at www.globusonline.eu (US), but the files that the service moves among EGI sites DO NOT LEAVE Europe: GridFTP «3rd party transfer» is used, so files are copied directly between the EGI endpoints.
iRODS provides a high-level abstraction layer on top of storage resources: users focus on their data, not on where they are on the data grid. Provides a native metadata catalogue. Multiple authentication plugins (password, PAM, GSI…). Multiple access protocols (POSIX, S3, RADOS…). Rule-oriented approach: «policies» can be easily implemented as data management tasks. Ongoing integration in the EGI infrastructure.
FedCloud IaaS capabilities. Computing: VM Management, VM Marketplace. Storage: Block Storage, Object Storage.
Block Storage: persistent block-level storage to use with VMs. Simple usage: use as any other block device from VMs; snapshotable. High performance: consistent and low-latency performance; SSDs (in some sites). Scale to your needs: from GB to TB; create and attach to VMs on demand.
Object Storage: data storage infrastructure for storing and retrieving data from anywhere at any time. API access: simple REST APIs for managing and accessing data. Scalable: store as much data as needed; get accounted only for the space used. Sharing: define ACLs on each object; share your data publicly.
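To give a feel for "simple REST APIs", the sketch below builds (but does not send) an HTTP PUT request that would upload one object. Everything specific here is an assumption for illustration: the endpoint URL, the container/object path layout and the bearer-token header are hypothetical and would have to be replaced with whatever the actual object storage service documents.

```python
from urllib.request import Request


def build_upload_request(endpoint, container, name, payload, token):
    """Build a PUT request for one object (hypothetical URL layout and auth)."""
    url = f"{endpoint.rstrip('/')}/{container}/{name}"
    return Request(
        url,
        data=payload,  # raw bytes of the object
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    request = build_upload_request(
        "https://objects.example.org/v1",  # hypothetical endpoint
        "results",
        "run1312.dat",
        b"some bytes",
        token="not-a-real-token",
    )
    # Only inspect the request; sending it would need a real service.
    print(request.get_method(), request.full_url)
```

Because the interface is plain HTTP, the same request could be issued from any device with network access, which is the point made about object storage above.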
Block Storage vs Object Storage
Access. Block Storage: only from within a VM, and only at the same site where the VM is located. Object Storage: from any device connected to the internet.
Sharing. Block Storage: not possible. Object Storage: possible (data can be kept private or public).
Accounting. Block Storage: for the entire volume, regardless of how much of it is actually used. Object Storage: only for the data stored.
Integration. Block Storage: easy with any application capable of reading/writing files on a local disk. Object Storage: requires a client to be integrated within the application.
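The accounting difference can be made concrete with a small calculation. This is an illustrative sketch only: the function names and the sample sizes are made up, and no real accounting system or pricing is implied.

```python
def block_accounted_gb(volume_size_gb, used_gb):
    """Block storage: the whole volume is accounted, whatever is actually used."""
    if used_gb > volume_size_gb:
        raise ValueError("cannot use more than the volume size")
    return volume_size_gb


def object_accounted_gb(object_sizes_gb):
    """Object storage: only the data actually stored is accounted."""
    return sum(object_sizes_gb)


if __name__ == "__main__":
    # Same 20 GB of data, stored two ways (made-up numbers).
    print(block_accounted_gb(volume_size_gb=100, used_gb=20))  # 100
    print(object_accounted_gb([5, 10, 5]))                     # 20
```

With these sample numbers, a 100 GB volume holding 20 GB of data is accounted as 100 GB, while the same 20 GB kept as objects is accounted as 20 GB.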
Use Cases Block Storage Object Storage Application hosting Data Processing Database Large Data File Storage & Backup Static Content Media Serving & Sharing Big Data
To integrate a product in UMD, please follow the instructions at https://wiki.egi.eu/wiki/EGI_Software_Component_Delivery Questions?