1 SRM-Lite: overcoming the firewall barrier for data movement
Arie Shoshani, Alex Sim, Viji Natarajan
Lawrence Berkeley National Laboratory
SDM Center All-Hands Meeting, November 2007
2 Outline
- What are Storage Resource Managers (SRMs)
- Requirements for using SRM behind firewalls
- Satisfying the requirements
- Architecture
- Potential uses
3 Storage Resource Managers
SRMs are middleware components whose function is to provide:
- dynamic space allocation AND file management in spaces for storage components on the local or wide-area network
- Based on a common standard
Examples of storage systems currently supported by SRMs: Unix-based disk pools, dCache, CASTOR, CCLRC RAL, GPFS
[Diagram labels: client/user applications; SRM (BeStMan), SRM (DPM), SRM (StoRM), SRM/dCache, SRM/CASTOR]
4 Storage Resource Managers: Main concepts
- Non-interference with local policies
- Advance space reservations
- Dynamic space management
- Pinning files in spaces
- Support abstract concept of a file name: Site URL (SURL)
- Temporary assignment of file names for transfer: Transfer URL (TURL)
- Directory management and ACLs
- Multi-file requests (srmRequestToPut, srmRequestToGet, srmCopy)
- Transfer protocol negotiation
- Peer to peer request support
- Support for asynchronous multi-file requests
- Support abort, suspend, and resume operations
- SRM relies on other services for data movement (GridFTP, HTTPS, SCP, …)
5 Concepts: Site URL and Transfer URL
- Provide: Site URL (SURL)
  - URL known externally, e.g. in Replica Catalogs
  - e.g. srm://ibm.cnaf.infn.it:8444/dteam/test
- Get back: Transfer URL (TURL)
  - Path can be different from the SURL (SRM internal mapping)
  - Protocol chosen by SRM based on request protocol preference
  - e.g. gsiftp://ibm139.cnaf.infn.it:2811//gpfs/dteam/test
- One SURL can have many TURLs
  - Files can be replicated in multiple storage components
  - Files may be in near-line and/or on-line storage
  - In a light-weight SRM (a single file system on disk) the SURL can be the same as the TURL except for the protocol
- File sharing is possible
  - Same physical file, but many requests
  - Needs to be managed by SRM
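The light-weight case, where the TURL differs from the SURL only in its protocol, can be illustrated with a small sketch. The function name `surl_to_turl` and its parameters are hypothetical and not part of any SRM API; a full SRM may also remap the host and path internally, which this sketch does not attempt.

```python
from urllib.parse import urlsplit


def surl_to_turl(surl, protocol, transfer_port=None):
    """Hypothetical light-weight mapping: keep host and path, swap the protocol.

    Assumes a single file system on disk, so no internal path remapping
    is needed. The transfer port is optional because the transfer service
    usually listens on a different port than the SRM service.
    """
    parts = urlsplit(surl)
    if parts.scheme != "srm":
        raise ValueError(f"not a Site URL: {surl}")
    netloc = parts.hostname
    if transfer_port is not None:
        netloc = f"{netloc}:{transfer_port}"
    return f"{protocol}://{netloc}{parts.path}"
```

For example, `surl_to_turl("srm://ibm.cnaf.infn.it:8444/dteam/test", "gsiftp", transfer_port=2811)` keeps the host and path and replaces only the scheme and port.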
6 Earth Science Grid Analysis Environment (in production for 4 years)
- >5000 users, 160 TBs managed
- SRMs are used and inter-communicate in several sites
[Diagram labels: sites LBNL, LLNL, ISI, NCAR, ORNL, ANL; Tomcat servlet engine; SOAP; RMI; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and MyProxy client; GRAM gatekeeper; CAS (Community Authorization Services) and CAS client; DRM (Storage Resource Management); HRM (Storage Resource Management); gridFTP server; gridFTP striped server; openDAPg server; disk; MSS (Mass Storage System); HPSS (High Performance Storage System)]
7 Robust Data Movement provided by SRMs and DataMover
- Problem: move thousands of files robustly
  - Takes many hours
  - Need error recovery
    - Mass storage system failures
    - Network failures
- Solution: use Storage Resource Managers (SRMs)
  - File streaming paradigm
  - By reserving and releasing storage space automatically
- Problem: too slow
- Solution:
  - In GridFTP: use parallel streams and large FTP windows
  - Pre-stage files from MSS
  - Use concurrent transfers
[Diagram: example setup for Earth System Grid (ESG). Labels: NCAR; Anywhere; LBNL; disk cache (two); DataMover; SRM-COPY (thousands of files); SRM-GET (one file at a time); SRM (performs writes); SRM (performs reads); GridFTP GET (pull mode); network transfer; get list of files; stage files; archive files; MSS]
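The error-recovery idea on this slide can be sketched as a retry loop around each file transfer. This is a minimal illustration and not DataMover's actual logic: `transfer_with_retry` and the `transfer` callable are hypothetical names, and transient failures are assumed to surface as `OSError`.

```python
import time


def transfer_with_retry(files, transfer, max_attempts=3, delay=0.0):
    """Move each file, retrying on failures that are assumed to be transient.

    `transfer` is a hypothetical callable that moves one file and raises
    OSError on a network or mass-storage failure.
    Returns (done, failed) lists of file names.
    """
    done, failed = [], []
    for name in files:
        for attempt in range(1, max_attempts + 1):
            try:
                transfer(name)
                done.append(name)
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(name)  # give up on this file, continue with the rest
                else:
                    time.sleep(delay)    # wait before the next attempt
    return done, failed
```

A failure of one file does not stop the request; the remaining files are still attempted, and the caller gets a list of files that need attention.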
8 File tracking shows recovery from transient failures
[Plot: file tracking. Total: 45 GBs]
9 Requirements for SRM-Lite
- Run SRM behind a firewall
  - Cannot have third party transfers (source/target is local)
- May not be able to run GridFTP
  - Remote site may not support it
  - Some communities choose not to use GSI
- Need support for multi-file transfer
  - Or entire directory
- Need support for asynchronous request
  - Also support for intermediate status of request
- Need to support concurrent file transfers
10 Satisfying the Requirements: SRM-Lite
- Run SRM behind a firewall
  - Must have a client tool (SRM-Lite)
- May not be able to run GridFTP
  - Support high-performance SCP: use HPN-SSH from Pittsburgh Supercomputing Center
  - But also use other transfer protocols (GridFTP, bbcp, https, …)
- Need support for multi-file transfer
  - Manage queues for large requests
- Need support for asynchronous request
  - SRM-Lite returns a “request token”; token can be used for “request status”
- Need to support concurrent file transfers
  - Use multi-threading to manage concurrent transfers
  - Monitor transfers and recover from mid-transfer interruptions
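The request-token and concurrent-transfer ideas can be sketched with a thread pool. This is an illustration under assumptions, not SRM-Lite's implementation: the class `TransferQueue`, its methods, and the `transfer(source, target)` callable are all hypothetical.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TransferQueue:
    """Hypothetical sketch: asynchronous multi-file requests with status tokens."""

    def __init__(self, transfer, max_workers=4):
        self._transfer = transfer  # callable(source, target); raises on failure
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._requests = {}        # token -> {source: Future}

    def submit(self, pairs):
        """Queue all (source, target) pairs and return a request token at once."""
        token = uuid.uuid4().hex
        self._requests[token] = {
            src: self._pool.submit(self._transfer, src, dst) for src, dst in pairs
        }
        return token

    def status(self, token):
        """Intermediate per-file status: 'pending', 'done', or 'failed'."""
        report = {}
        for src, future in self._requests[token].items():
            if not future.done():
                report[src] = "pending"  # queued or still running
            elif future.exception() is not None:
                report[src] = "failed"
            else:
                report[src] = "done"
        return report

    def wait(self, token):
        """Block until every file in the request has finished, then report."""
        for future in self._requests[token].values():
            future.exception()  # waits for completion without re-raising
        return self.status(token)
```

`submit` returns before any file has moved, so the caller can poll `status(token)` while transfers run on up to `max_workers` threads.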
11 Scenario A: firewall at one site
Process steps:
- Login to ORNL using OTP
- At ORNL invoke SRM-Lite
  - User composes XML input file, srmlite.xml, for selected files/directories to copy from/to another site
  - Or, user gives command line option for a selected file/directory
- SRM-Lite uses srmlite.xml or command line input to automatically
  - Push/pull files to/from NERSC
  - Use multiple threads for concurrent transfers
Put example:
- Source: file:////my_directory/file_foo
- Target: scp://host/target_dir/file_foo
Get example:
- Source: GridFTP://host/target_dir/file_foo
- Target: file:////my_directory/file_foo
[Diagram labels: ORNL: OTP login, SRM-Lite, srmlite.xml, local commands and protocols, disk cache; SSH channel (SCP); GridFTP/FTP/BBCP/HTTP transfers; NERSC: SSH server, disk cache]
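Because one side of every transfer is local, the put and get examples can be told apart from the URL schemes alone. The sketch below assumes that rule; `transfer_direction` is a hypothetical helper, not an SRM-Lite function.

```python
from urllib.parse import urlsplit


def transfer_direction(source, target):
    """Classify a request as 'put' (push) or 'get' (pull).

    Assumption: exactly one of the two URLs uses the file: scheme, since
    third-party transfers are not possible from behind the firewall.
    """
    source_is_local = urlsplit(source).scheme == "file"
    target_is_local = urlsplit(target).scheme == "file"
    if source_is_local and not target_is_local:
        return "put"  # local file pushed to the remote site
    if target_is_local and not source_is_local:
        return "get"  # remote file pulled to the local site
    raise ValueError("exactly one of source/target must be a local file: URL")
```

`urlsplit` lowercases the scheme, so `GridFTP://` and `gridftp://` are treated alike.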
12 Scenario B: one end has a firewall, the other end has SRM
Put example:
- Source: file:////my_directory/file_foo
- Target: srm://host/target_dir/file_foo
[Diagram labels: ORNL: OTP login, SRM-Lite, srmlite.txt, disk cache; SRM request; GridFTP/FTP/SCP transfers; NERSC: SRM, disk cache, HPSS]
13 Scenario C: firewalls at both ends
Process steps:
- Login to site1 using OTP
- At site1 invoke SRM-Lite
- SRM-Lite at site1 uses SSH to invoke SRM-Lite at site2
- Use SSH channel for SCP
- Same as before:
  - User composes XML input file, srmlite.xml, for selected files/directories to copy from/to another site
  - Or, user gives command line option for a selected file/directory
[Diagram labels: site1: OTP login, SRM-Lite, srmlite.xml, disk cache; SSH channel (SCP); site2: SSH server, SRM-Lite, disk cache]
14 Scenario C: SRM-Lite manages MSS access
[Diagram labels: site1: OTP login, SRM-Lite, srmlite.xml, disk cache, HPSS; SSH channel (SCP); site2: SSH server, SRM-Lite, disk cache, HPSS]
15 GUI for SRM-Lite
- Used in ESG
- Special version for data movement to user workstations
- Called DataMover-Lite
- Versions exist for Linux, PC, Mac
16 Usage
- Combustion project
  - The Applied Partial Differential Equations Center (APDEC), John Bell
  - Efficient, robust data movement from sites behind firewalls
  - At DoE and DoD sites
- Kepler-SRM-Lite actor
  - To be used for managing multi-file transfers from sites behind firewalls
  - Launch SRM-Lite remotely through SSH
  - Initial version: help from NCSU (Pierre Mouallem)
  - Two modes
    - Entire request
    - Streaming file requests
  - To be used in CPES workflows first with Norbert’s help