Data Transport to the Cloud
David Aikema
University of Cape Town
Outline
- Brad Frank talked about ARCADE and MeerKAT
- Rob Simmonds discussed SKA regional centres and delivery system objectives
- Now a brief outline of a prototype for staging data from the MeerKAT archive for further analysis
  - Scenario
  - Why schedule data transfers?
  - Related work
  - Architecture
  - Software
Scenario
- MeerKAT archive at CHPC
- Much of the data analysis is to be done elsewhere
  - IDIA / ARC
  - ASTRON (Netherlands)
- Science products produced at these facilities need to be stored back in the archive
Why schedule data transfers?
- Allows priorities to be set on which data is moved next
  - Adhere to user/project resource allocations
  - Avoid starvation
- Manage the network to maximize performance
  - Handle congestion, particularly on long-distance links (ASTRON)
  - Keep the WAN busy by keeping data in flight
  - Use efficient WAN data transfer protocols
- Allows checks to see if data is already available at other locations
- Support subscriptions to datasets
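The first group of points (priorities, allocations, starvation) can be illustrated with a minimal sketch. It is not the prototype's scheduler: the class name TransferScheduler, the "lower value = more urgent" convention, the byte-based allocations, and the linear aging rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Request:
    dataset: str
    project: str
    priority: float      # assumed convention: lower value = more urgent
    size_bytes: int
    submitted_at: float
    seq: int             # submission order, used only to break ties


class TransferScheduler:
    """Hypothetical scheduler: priority ordering, per-project allocations, aging."""

    def __init__(self, allocations, aging_rate=1.0):
        self.remaining = dict(allocations)   # project -> bytes still allowed
        self.aging_rate = aging_rate         # priority units gained per unit of waiting
        self.pending = []
        self._seq = count()

    def submit(self, dataset, project, priority, size_bytes, now):
        self.pending.append(
            Request(dataset, project, priority, size_bytes, now, next(self._seq))
        )

    def _effective_priority(self, req, now):
        # Aging: the longer a request waits, the more urgent it becomes,
        # so low-priority requests cannot starve indefinitely.
        return req.priority - self.aging_rate * (now - req.submitted_at)

    def next_transfer(self, now):
        # Only consider requests whose project still has allocation left.
        eligible = [
            r for r in self.pending
            if self.remaining.get(r.project, 0) >= r.size_bytes
        ]
        if not eligible:
            return None
        best = min(eligible, key=lambda r: (self._effective_priority(r, now), r.seq))
        self.pending.remove(best)
        self.remaining[best.project] -= best.size_bytes
        return best


# Usage: an old low-priority request overtakes a newly arrived high-priority one.
sched = TransferScheduler({"projA": 100, "projB": 100})
sched.submit("old-low", "projA", priority=10, size_bytes=10, now=0)
sched.submit("new-high", "projB", priority=1, size_bytes=10, now=20)
first = sched.next_transfer(now=20)
second = sched.next_transfer(now=20)
```

With aging_rate set to 0 the scheduler falls back to strict priority order, which is the case where a steady stream of urgent requests could starve older ones.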
Related work
- CERN tools
  - Phedex, Rucio, FTS, …
  - Somewhat relevant but closely tied to specific projects
- LIGO Data Replicator
- GridFTP / Globus
- NGAS
- Apache OODT
- (HT)Condor / Stork
Components
- Twisted framework (Python)
- RabbitMQ queuing system
- Globus (Software-as-a-Service)
Overview (architecture diagram; component labels as they appear on the slide)
- Archive Interface
- Incoming request
- Request Handler
- Staging Queue
- Staging Agent
- Staging Buffer
- Remote Storage
- Distribution Policy
- Transfer Queue
- Transfer Agent
- Globus
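One plausible reading of how these components connect can be sketched as follows. The ordering of the stages is an assumption inferred from the diagram labels, not something the slide states. Standard-library queues stand in for the RabbitMQ queues, and the submit_transfer callable stands in for Globus; StagingPipeline and all parameter names are hypothetical.

```python
from queue import Queue


class StagingPipeline:
    """Hypothetical wiring: request -> staging queue -> buffer -> transfer queue -> remote."""

    def __init__(self, read_from_archive, choose_destinations, submit_transfer):
        self.read_from_archive = read_from_archive      # stand-in for the archive interface
        self.choose_destinations = choose_destinations  # stand-in for the distribution policy
        self.submit_transfer = submit_transfer          # stand-in for a Globus transfer
        self.staging_queue = Queue()                    # stand-in for a RabbitMQ queue
        self.transfer_queue = Queue()                   # stand-in for a RabbitMQ queue
        self.staging_buffer = {}                        # dataset name -> staged bytes

    def handle_request(self, dataset):
        # Request handler: accept an incoming request and queue it for staging.
        self.staging_queue.put(dataset)

    def run_staging_agent(self):
        # Staging agent: copy data from the archive into the staging buffer,
        # then queue one transfer per destination chosen by the policy.
        while not self.staging_queue.empty():
            dataset = self.staging_queue.get()
            self.staging_buffer[dataset] = self.read_from_archive(dataset)
            for destination in self.choose_destinations(dataset):
                self.transfer_queue.put((dataset, destination))

    def run_transfer_agent(self):
        # Transfer agent: hand each queued transfer to the transfer service.
        completed = []
        while not self.transfer_queue.empty():
            dataset, destination = self.transfer_queue.get()
            self.submit_transfer(dataset, destination, self.staging_buffer[dataset])
            completed.append((dataset, destination))
        return completed


# Usage with in-memory stand-ins for the archive and the remote storage sites.
remote_storage = {}


def fake_transfer(dataset, destination, data):
    remote_storage.setdefault(destination, {})[dataset] = data


pipeline = StagingPipeline(
    read_from_archive=lambda name: f"contents of {name}".encode(),
    choose_destinations=lambda name: ["IDIA", "ASTRON"],
    submit_transfer=fake_transfer,
)
pipeline.handle_request("obs-001")
pipeline.run_staging_agent()
completed = pipeline.run_transfer_agent()
```

Separating the staging and transfer queues lets slow archive reads and slow WAN transfers proceed independently, which is presumably why the diagram shows two queues and two agents.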