BNL FTS Services
Hironori Ito
File Transfer Service
FTS is the lowest-level data transfer (movement) service in gLite.
It operates asynchronously: the submit command (glite-transfer-submit) returns the prompt to the user immediately.
It runs as a Tomcat/Java web service and uses an Oracle database as the transfer catalog.
Every storage site must be known to the FTS server (listed in services.xml).
To transfer between two storage sites, a channel must be defined between them.
Each transfer channel operates in either urlcopy or srmcopy mode.
Each transfer channel has several adjustable parameters for performance tuning.
[Diagram labels: FTS, CERN Castor, DQ2, BNL dCache]
FTS Configurations
Server: lcg03.usatlas.bnl.gov (alias fts.usatlas.bnl.gov), an LCG box with 2 x 3.4 GHz CPUs and 2 GB RAM
FTS v1.5
MyProxy service: fts01.usatlas.bnl.gov (will be moved to a different machine)
Web service: interface to the FTS services
  FileTransfer
  ChannelManagement
Channel agents: individual directional channel services between two storage end points
  e.g. BNLDCACHE-UTASW2, RALSRM-BNLDCACHE
VO agents: VO-specific tasks
Channel Management
List of channels at the BNL FTS:
  All Tier 1 sites to BNL (SRMCOPY), e.g. CNAFSRM-BNLDCACHE
  All US Tier 2 sites to BNL (URLCOPY), e.g. UTASW2-BNLDCACHE (two exceptions: UC and BU also have SRM end points)
  BNL to all US Tier 2 sites (URLCOPY), e.g. BNLDCACHE-UCTPGFTP
  Source STAR channels for all US Tier 2 sites (URLCOPY), e.g. STAR-BUATLASTIER2
  Source STAR channel for BNL (URLCOPY), e.g. STAR-BNLDCACHE
  Destination STAR channels for MWT2 sites (URLCOPY), e.g. UCGFTP-STAR
  Destination STAR channel for BNL (URLCOPY), e.g. BNLDCACHE-STAR
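The mix of dedicated and STAR channels above can be illustrated with a small lookup sketch. It is not FTS code: the table holds only the example channel names from this slide, and the lookup order (dedicated pair first, then source STAR, then destination STAR) is an assumption made for illustration, as is the function name resolve_channel.

```python
from typing import Dict, Optional

# Example channels taken from the slide, mapped to their copy mode.
CHANNELS: Dict[str, str] = {
    "CNAFSRM-BNLDCACHE": "srmcopy",
    "UTASW2-BNLDCACHE": "urlcopy",
    "BNLDCACHE-UCTPGFTP": "urlcopy",
    "STAR-BUATLASTIER2": "urlcopy",
    "STAR-BNLDCACHE": "urlcopy",
    "UCGFTP-STAR": "urlcopy",
    "BNLDCACHE-STAR": "urlcopy",
}


def resolve_channel(source: str, dest: str,
                    channels: Dict[str, str] = CHANNELS) -> Optional[str]:
    """Return the name of the channel that would carry source -> dest.

    Assumed order (hypothetical, for illustration only):
    1. the dedicated SOURCE-DEST channel,
    2. the source STAR channel STAR-DEST,
    3. the destination STAR channel SOURCE-STAR.
    Returns None when no channel matches.
    """
    for name in (f"{source}-{dest}", f"STAR-{dest}", f"{source}-STAR"):
        if name in channels:
            return name
    return None


if __name__ == "__main__":
    for src, dst in [("CNAFSRM", "BNLDCACHE"),
                     ("RALSRM", "BNLDCACHE"),
                     ("UCGFTP", "ELSEWHERE")]:
        print(src, "->", dst, ":", resolve_channel(src, dst))
```

In this sketch a site without a dedicated channel falls back to a STAR channel, which is where all such sites end up sharing one set of channel settings.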
Channel Functional Tests
All Tier 1-to-BNL channels have been tested and are functional.
Channels between BNL and the US Tier 2 sites, in both directions, have been tested and are functional.
STAR-BNL channels have been tested with many non-US Tier 2 sites.
  The Lyon and RAL Tier 2s have been tested and are functional.
  The FZK Tier 2s are functional for only some sites.
Now that the DQ2 server is deployed, more channels are being tested via DQ2 transfer requests.
Open question: should the functional test be run once a day for every site, with problems reported to the site managers?
Channel Performance and Tuning
Tuning parameters available for each channel:
  Bandwidth: maximum bandwidth for the channel
  Number of concurrent files: number of concurrent file transfers per channel
  Number of streams: number of TCP streams per file transfer
  TCP block size: TCP block size for each stream
  Transfer timeout: timeout for a single file transfer
  Number of retries: number of retries per request
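The parameter list above can be pictured as one record per channel. The sketch below is a plain Python model, not the FTS channel-management interface: the class name, field names, units, and all example values other than the channel name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSettings:
    """Hypothetical per-channel record mirroring the tuning parameters."""
    name: str
    bandwidth_mb_s: float        # maximum bandwidth for the channel
    concurrent_files: int        # concurrent file transfers per channel
    tcp_streams: int             # TCP streams per file transfer
    tcp_block_size_bytes: int    # TCP block size for each stream
    transfer_timeout_s: int      # timeout for a single file transfer
    retries: int                 # retries per request

    def __post_init__(self) -> None:
        # Reject values that cannot describe a working channel.
        for field_name in ("bandwidth_mb_s", "concurrent_files", "tcp_streams",
                           "tcp_block_size_bytes", "transfer_timeout_s"):
            if getattr(self, field_name) <= 0:
                raise ValueError(f"{field_name} must be positive")
        if self.retries < 0:
            raise ValueError("retries must not be negative")

    def max_tcp_streams(self) -> int:
        """Upper bound on parallel TCP streams the channel can open."""
        return self.concurrent_files * self.tcp_streams


# Illustrative values only; they are not the settings used at BNL.
example = ChannelSettings(
    name="BNLDCACHE-UCTPGFTP",
    bandwidth_mb_s=100.0,
    concurrent_files=10,
    tcp_streams=4,
    tcp_block_size_bytes=1024 * 1024,
    transfer_timeout_s=3600,
    retries=3,
)

if __name__ == "__main__":
    print(example.name, "opens at most", example.max_tcp_streams(), "streams")
```

The product of concurrent files and streams per file is worth watching, since it bounds how many parallel streams a single channel can direct at one storage end point.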
Channel Performance and Tuning (continued)
Tuning and benchmarking have been done from BNL dCache to the UC and UTA storage systems.
Some parameter values do not suit every site.
  For example, a TCP block size of 1 MB may crash the UTA storage, while it is appropriate for UC.
Measured rates:
  BNL-UTA channel: ~10 MB/s
  BNL-UC channel: at least 40 MB/s
Open question: the same tuning could be done for all sites if necessary?
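To put the two measured rates in perspective, the short calculation below estimates how long a dataset would take on each channel. The 1000 GB dataset size is a hypothetical example, and the conversion assumes decimal units (1 GB = 1000 MB) and a constant sustained rate.

```python
def transfer_time_hours(dataset_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move dataset_gb at a sustained rate of rate_mb_s.

    Assumes 1 GB = 1000 MB and a constant rate; real transfers vary.
    """
    if rate_mb_s <= 0:
        raise ValueError("rate_mb_s must be positive")
    seconds = dataset_gb * 1000.0 / rate_mb_s
    return seconds / 3600.0


if __name__ == "__main__":
    dataset_gb = 1000.0  # hypothetical dataset size
    for channel, rate in [("BNL-UTA", 10.0), ("BNL-UC", 40.0)]:
        hours = transfer_time_hours(dataset_gb, rate)
        print(f"{channel}: {hours:.1f} h at {rate:.0f} MB/s")
```

Under these assumptions the same dataset takes roughly 28 hours at 10 MB/s and roughly 7 hours at 40 MB/s.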
FTS Experience
The current FTS server is quite capable of supporting the current level of usage.
  If necessary, individual channel services can be moved to separate machines.
The retry intervals for FTS channels need to be investigated.
  When an FTS transfer fails while many requests are queued, the next queued request is observed to start too quickly, producing a DoS-like load on the SRM server.
  This has been observed in the last few weeks, after the global installation and operation of the DQ2 services.
  During transfers from BNL to ASGC, the ASGC SRM was not working. This caused a high rate of SRM requests to the BNL SRM server (dcsrm), making it almost inoperative.
    SRM requests from ASGC arrived at about 10 Hz.
    The FTS channel used for BNL-ASGC was configured for 50 concurrent transfers, which made things worse.
  Similar incidents have been observed due to a PIC SRM problem and BNL's own SRM problem.
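One common way to keep failed transfers from restarting too quickly is a capped exponential backoff between retries. The sketch below is a generic illustration, not FTS behaviour or an FTS setting: the function names, the 30 s base delay, the doubling factor, and the 1800 s cap are all assumptions. The second function is simple arithmetic showing how the number of concurrent slots and the wait before each slot restarts set the steady request rate seen by an SRM server.

```python
def retry_delay(attempt: int, base_s: float = 30.0,
                factor: float = 2.0, cap_s: float = 1800.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay grows geometrically from base_s and never exceeds cap_s.
    All default values are hypothetical.
    """
    if attempt < 0:
        raise ValueError("attempt must not be negative")
    delay = base_s
    for _ in range(attempt):
        delay *= factor
        if delay >= cap_s:
            # Stop early so very large attempt numbers cannot overflow.
            return cap_s
    return min(delay, cap_s)


def steady_request_rate(concurrent_slots: int, delay_s: float) -> float:
    """Requests per second if every slot fails and restarts after delay_s."""
    if concurrent_slots < 0 or delay_s <= 0:
        raise ValueError("need non-negative slots and a positive delay")
    return concurrent_slots / delay_s


if __name__ == "__main__":
    for attempt in range(8):
        print(f"retry {attempt}: wait {retry_delay(attempt):.0f} s")
    # 50 slots restarting every 5 s (an assumed figure) give 10 requests/s.
    print(steady_request_rate(50, 5.0), "requests/s")
```

In this model, lengthening the wait before a failed slot restarts lowers the request rate in direct proportion, and so does lowering the number of concurrent slots on the channel.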
FTS Experience (continued)
Possible STAR channel problems:
  At the time of the ASGC-BNL problem, the transfers were using the STAR-ASGC channel. Stopping this channel for BNL's sake could therefore have forced ASGC to stop all transfers to ASGC. (ASGC has since created an ASGC-BNL channel.)
  It is highly preferable for Tier 1 storage managers to be the channel managers of all transfers into and out of their own Tier 1 storage.
  The proposed destination STAR channel will make management even more difficult under DoS-like conditions, since the transfers can be mediated by any FTS server. (Stopping or slowing traffic via a firewall works, but turning off or slowing an FTS channel is far simpler.)
  One user can dominate all transfers at a storage end point if that site has only a STAR channel.
  No per-site tuning is available.
Accounting and statistics for FTS transfers are not currently available.