ATLAS Use and Experience of FTS

1 ATLAS Use and Experience of FTS
FTS workshop 16 Nov 05

2 Outline
- Intro to ATLAS DDM
- How we use FTS
- SC3 Tier 0 exercise experience
- Things we like
- Things we would like

3 ATLAS DDM System
- Moves from a file based system to one based on datasets
- Hides file level granularity from users
- A hierarchical structure makes cataloging more manageable
- However file level access is still possible
- Scalable global data discovery and access via a catalog hierarchy
- No global physical file replica catalog (but global dataset replica catalog and global logical file catalog)
[Diagram labels: Datasets, Sites, Files]

4 ATLAS DDM System
- As well as catalogs for datasets and locations we have ‘site services’ to replicate data
- We use ‘subscriptions’ of datasets to sites held in a global catalog
- Site services take care of the replica resolution, transfer and registration at the destination site
[Diagram labels: Site ‘X’, Site ‘Y’, Dataset ‘A’, (Container) Dataset ‘B’, Data block1, Data block2, File1, File2]
Subscriptions:
  Dataset ‘A’ | Site ‘X’
  Dataset ‘B’ | Site ‘Y’

5 Subscription Agents
This is what runs on the VO Boxes. File states are kept in a site local MySQL DB.

Agent              Function                            File state
Fetcher            Finds incomplete datasets           unknownSURL
ReplicaResolver    Finds remote SURL                   knownSURL
MoverPartitioner   Assigns Mover agents                assigned
Mover              Moves file (uses FTS here!)         toValidate
ReplicaVerifier    Verifies local replica              validated
BlockVerifier      Verifies whole dataset complete     done
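The agent chain above can be read as a simple linear state machine. The following is a minimal sketch under that reading, assuming each listed file state is the one a file is left in after the agent on the same row has run; the function name `next_agent` is hypothetical and is not taken from the ATLAS DDM code.

```python
# Agent chain as listed on the slide: (agent, file state it leaves behind).
# Assumption: the state on each row is the state AFTER that agent has run.
AGENT_CHAIN = [
    ("Fetcher", "unknownSURL"),
    ("ReplicaResolver", "knownSURL"),
    ("MoverPartitioner", "assigned"),
    ("Mover", "toValidate"),        # the Mover is the agent that uses FTS
    ("ReplicaVerifier", "validated"),
    ("BlockVerifier", "done"),
]


def next_agent(state):
    """Return the agent that picks up a file in `state`, or None once done.

    Raises ValueError for a state that is not in the chain.
    """
    states = [s for _, s in AGENT_CHAIN]
    idx = states.index(state)
    if idx == len(AGENT_CHAIN) - 1:
        return None
    return AGENT_CHAIN[idx + 1][0]
```

With this reading, a file that is reset to ‘unknownSURL’ after a failed transfer is picked up again by the ReplicaResolver.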

6 Within the Mover agent
- The python Mover agent reads in an XML file catalog of source files to copy
- The destination file name is based on the SRM endpoint + dataset name + source filename

  <File ID="bc340aff dcc-98aa c4bb07">
    <physical>
      <pfn filetype="" name="srm://castorgridsc.cern.ch/castor/cern.ch/grid/atlas/ddm_tier0/perm/esd.0003/esd.0003._5645.1"/>
    </physical>
    <logical/>
    <metadata att_name="destination" att_value="
    <metadata att_name="fsize" att_value=" "/>
    <metadata att_name="md5sum" att_value=""/>
  </File>
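A minimal sketch of the two steps on this slide: reading source files out of an XML file catalog and deriving the destination name as SRM endpoint + dataset name + source filename. The sample catalog, its `catalog` wrapper element, the host names and the function names are all hypothetical; only the `File`/`physical`/`pfn`/`metadata` layout follows the fragment shown on the slide.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical catalog with made-up values; only the element layout
# mirrors the fragment on the slide.
SAMPLE_CATALOG = """<catalog>
  <File ID="hypothetical-guid-0001">
    <physical>
      <pfn filetype="" name="srm://source.example.org/data/esd.0003/esd.0003._0001.1"/>
    </physical>
    <logical/>
    <metadata att_name="fsize" att_value="1024"/>
    <metadata att_name="md5sum" att_value=""/>
  </File>
</catalog>"""


def read_catalog(xml_text):
    """Return one dict per <File> entry: guid, source SURL and metadata."""
    root = ET.fromstring(xml_text)
    entries = []
    for file_el in root.iter("File"):
        pfn = file_el.find("physical/pfn").get("name")
        meta = {m.get("att_name"): m.get("att_value")
                for m in file_el.findall("metadata")}
        entries.append({"guid": file_el.get("ID"), "source": pfn, "meta": meta})
    return entries


def destination_surl(endpoint, dataset, source_surl):
    """Build the destination as SRM endpoint + dataset name + source filename."""
    filename = posixpath.basename(source_surl)
    return "/".join([endpoint.rstrip("/"), dataset, filename])
```

For example, `destination_surl("srm://dest.example.org/atlas/", "esd.0003", entry["source"])` joins the three parts with single slashes, whether or not the endpoint ends in one.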

7 Within the Mover agent
- We create a file of source and dest SURLs and submit the bulk job to FTS (using CLI via python commands module)
- Then query every x seconds using glite-transfer-status to see if status changes
  - ‘Done’: mark all files as successfully copied
  - ‘Hold’, ‘Failed’: some or all files failed so look through the output for successes and failures
- Failed files: the file is put back to the ‘unknownSURL’ state and goes again through the chain of agents (max 5 times x 3 FTS retries = 15 retries overall)
- Successful files: the destination file is validated by using SRM commands directly (getFileMetaData) to compare file size with source catalog file size
  - Would like to know if this stage is really necessary or if FTS already does it (or will in future?) (more later…)
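The polling and retry logic on this slide can be sketched as follows. This is an illustration under stated assumptions, not the Mover agent's code: `get_status` is a hypothetical callable standing in for a call to glite-transfer-status, the 30 second interval is a placeholder for the slide's "x seconds", and the `failed` state for a file that has used up its passes is invented here because the slide does not name one.

```python
import time

MAX_AGENT_PASSES = 5   # from the slide: at most 5 passes through the agent chain
FTS_RETRIES = 3        # from the slide: 3 FTS retries per pass
TERMINAL_STATES = {"Done", "Hold", "Failed"}  # job states the slide reacts to


def poll_until_terminal(get_status, job_id, interval=30.0, sleep=time.sleep):
    """Query the job status every `interval` seconds until it is terminal.

    `get_status` is a hypothetical callable (job_id -> status string) that
    would wrap glite-transfer-status in a real agent.
    """
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval)


def file_next_state(copied_ok, passes_used):
    """Decide where a file goes once its FTS job has finished.

    `passes_used` counts passes through the agent chain, including this one.
    """
    if copied_ok:
        return "toValidate"      # handed on for the file size check
    if passes_used < MAX_AGENT_PASSES:
        return "unknownSURL"     # back to the start of the agent chain
    return "failed"              # hypothetical: the slide names no such state
```

Passing `sleep` as a parameter keeps the loop testable without waiting; 5 passes x 3 FTS retries gives the 15 attempts quoted on the slide.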

8 Using FTS within SC3
- ATLAS’ SC3 is a Tier 0 exercise where we produce RAW data at CERN and replicate reconstructed data to Tier 1 sites (using FTS!)
- We started officially on 2nd Nov so have been running for ~2 weeks now
- Before that, ~1 month of small scale testing using the FTS pilot service - this was very useful for testing integration of FTS and debugging site problems with SRM paths etc.

9 Results so far 1 - 7 Nov

10 Results so far.. Put latest plots here Nov

11 What worked well
- The service is very reliable
  - virtually no failures connecting to service (apart from when CERN had unstable network)
  - 99.9% of failures are problems with sites/humans
- It hasn’t lost any of our job information
- The interface is friendly and self-explanatory
- The throughput rate is fast enough, but we haven’t really stressed it so far
- Response to reported errors is good (fts-support)

12 What we would like
- Staging from tape
  - In theory this is not a problem for us in SC3 but will be in the future
  - Would like FTS to deal with staging from tape properly (rather than giving SRM get timeouts), having a ‘staging’ status and perhaps enabling us to query through FTS whether files are on tape or disk
- Integration with replica catalogs
  - We use LFC (LCG) and Oracle/Globus RLS (through POOL FC interface) (OSG)
  - So we can say move LFN x from site y to site z and FTS calls a service that takes care of resolution and registration
- Bandwidth monitoring within FTS
- Error reporting
  - lists again… would like to know who to tell in case of error. Can you give a hint based on the error?

13 What we would like
- TierX to TierY transfers handled by the network fabric, so channels between all sites should exist
- support priorities, with possibility to do late reshuffling
- plugins to allow interactions with experiment's services. Example of plug-ins - or experiment-specific services:
  - catalog interactions (not exclusively grid catalogs)
  - plugins to zip files on the fly (transparently to users but very good for MSS) - after transfer starts and/or before files are stored on storage
  - an idea is for FTS to provide a callback?
  - Must understand VO agents framework and what can be done with that!
- reliable: keep retrying until told to stop
  - but allow real-time monitoring of errors for transfer (parseable errors preferable) so that we can do reshuffling of transfers, cancel them, etc
- signal conditions such as source missing, destination down, etc

14 Some Questions (maybe already answered today!)
- Would like to understand how to optimise (no of files per bulk etc)
- Do you distinguish between permanent errors (channel doesn’t exist) and temporary errors (SRM timeout)? I.e. not retrying permanent errors and is there a way to report this to us so we don’t retry either?
- Do we need our own verification stage or are we just repeating what FTS does?
- ‘Duration’ - is this time from submission to completion or ‘Active’ time?
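To make the permanent versus temporary distinction concrete, here is a rough sketch of what a client could do if errors were parseable. The two hint strings are simply the examples given on the slide, not actual FTS error messages, and the function names are hypothetical; a real classifier would need the service's actual error vocabulary.

```python
# Hint strings are the slide's own examples, NOT real FTS error messages.
PERMANENT_HINTS = ("channel doesn't exist",)
TEMPORARY_HINTS = ("srm timeout",)


def classify_error(message):
    """Label an error message as 'permanent', 'temporary' or 'unknown'."""
    # Normalise typographic apostrophes so both spellings match.
    text = message.lower().replace("\u2019", "'")
    if any(hint in text for hint in PERMANENT_HINTS):
        return "permanent"
    if any(hint in text for hint in TEMPORARY_HINTS):
        return "temporary"
    return "unknown"


def should_retry(message):
    """Retry everything except errors recognised as permanent."""
    return classify_error(message) != "permanent"
```

Treating unknown errors as retryable is a design choice made for this sketch: it keeps the existing retry behaviour and only skips retries when an error is positively identified as permanent.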

15 Conclusion
- We are happy with the FTS service so far - it’s given us some good results
  - But we haven’t tested it until it breaks!
- Probably the most reliable part of SC3 in our experience
- We would like to see it integrated with more components to reduce our workload (staging, catalogs)
- Look forward to further developments!

