BNL FTS services Hironori Ito.

File Transfer Service
The lowest-level data transfer/movement service in gLite; it provides asynchronous operation to users: the command (glite-transfer-submit) returns the prompt to the user immediately.
It runs as a Tomcat/Java web service with an Oracle database as the transfer catalog.
All storage sites must be known to the FTS server (services.xml).
For a transfer between two storage sites, a channel must be defined between those two sites.
Each channel is configured as either urlcopy or srmcopy.
Each channel has several adjustable parameters for performance tuning.
[Diagram: DQ2 transfers between CERN Castor and BNL dCache mediated by FTS]
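As an illustration of the asynchronous model, a minimal sketch of submitting and polling a job against the BNL endpoint is shown below; the endpoint URL and SURLs are placeholders, and option details can vary between FTS releases.

    # Submit a transfer job; the command prints a job ID and returns immediately (placeholder endpoint and SURLs)
    glite-transfer-submit -s https://fts.usatlas.bnl.gov:8443/glite-data-transfer-fts/services/FileTransfer \
        srm://source-se.example/pnfs/example/file1 \
        srm://dcsrm.usatlas.bnl.gov/pnfs/usatlas.bnl.gov/example/file1

    # Check on the job later using the returned ID (placeholder shown)
    glite-transfer-status -s https://fts.usatlas.bnl.gov:8443/glite-data-transfer-fts/services/FileTransfer <job-id>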

FTS Configuration
lcg03.usatlas.bnl.gov (alias fts.usatlas.bnl.gov): dual 3.4 GHz CPUs, 2 GB RAM, LCG box, running FTS v1.5.
MyProxy service on fts01.usatlas.bnl.gov (to be moved to a different machine).
Web service: the interface to the FTS services, exposing FileTransfer and ChannelManagement.
Channel agents: individual directional channel services between two storage endpoints, e.g. BNLDCACHE-UTASW2, RALSRM-BNLDCACHE.
VO agents: handle VO-specific tasks.
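Before submitting, a client needs a VOMS proxy, and a delegated credential is kept on the MyProxy server listed above for long-running transfers; a minimal sketch assuming the standard VOMS and MyProxy client tools are installed:

    # Create an ATLAS VOMS proxy for the client
    voms-proxy-init --voms atlas
    # Store a delegated credential on the BNL MyProxy server for the FTS to use (-d uses the proxy DN as the username)
    myproxy-init -s fts01.usatlas.bnl.gov -d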

Channel Management
Channels defined at the BNL FTS (a sketch of defining and inspecting such channels follows below):
All Tier-1 sites to BNL (SRMCOPY), e.g. CNAFSRM-BNLDCACHE.
All US Tier-2 sites to BNL (URLCOPY), e.g. UTASW2-BNLDCACHE (two exceptions: UC and BU also have SRM endpoints).
BNL to all US Tier-2 sites (URLCOPY), e.g. BNLDCACHE-UCTPGFTP.
Source STAR channels for all US Tier-2 sites (URLCOPY), e.g. STAR-BUATLASTIER2.
Source STAR channel for BNL (URLCOPY), e.g. STAR-BNLDCACHE.
Destination STAR channels for the MWT2 sites (URLCOPY), e.g. UCGFTP-STAR.
Destination STAR channel for BNL (URLCOPY), e.g. BNLDCACHE-STAR.
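Channels such as these are created and inspected through the ChannelManagement interface; a hedged sketch with the admin CLI is shown below, where the glite-transfer-channel-add argument order is an assumption and should be checked against the installed FTS version:

    # Define a directional channel from the UTA Tier-2 storage to BNL dCache (assumed argument order)
    glite-transfer-channel-add UTASW2-BNLDCACHE UTASW2 BNLDCACHE

    # Print the current definition and parameters of an existing channel
    glite-transfer-channel-list CNAFSRM-BNLDCACHE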

Channel Functional Tests
Channels from all Tier-1s to BNL have been tested and are functional.
Channels between the US Tier-2s and BNL (both directions) have been tested and are functional.
The STAR-BNL channels have been tested against many non-US Tier-2 sites: the Lyon and RAL Tier-2s are tested and functional, while the FZK Tier-2s are functional only for some sites.
As DQ2 servers are deployed, more channels are also being exercised via DQ2 transfer requests.
Proposal: run a functional test once a day for every site and report problems to the site managers (see the sketch below).
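A minimal sketch of such a daily functional test is shown below: it submits one small test file over each channel and logs failures for follow-up with the site managers. The endpoint, site list, file paths, and log location are all hypothetical placeholders.

    #!/bin/bash
    # Hedged sketch of a once-a-day channel functional test (placeholder endpoint, sites, and SURLs)
    FTS=https://fts.usatlas.bnl.gov:8443/glite-data-transfer-fts/services/FileTransfer
    SRC="srm://dcsrm.usatlas.bnl.gov/pnfs/usatlas.bnl.gov/atlas/functional-test/testfile"  # small test file at BNL
    for SITE in UTASW2 UCTPGFTP BUATLASTIER2; do
        DST="srm://se.${SITE}.example/atlas/functional-test/testfile.$(date +%F)"          # placeholder destination SURL
        if JOB=$(glite-transfer-submit -s "$FTS" "$SRC" "$DST"); then
            echo "$(date -u) $SITE submitted job $JOB" >> /var/log/fts-functional-test.log
        else
            echo "$(date -u) $SITE SUBMIT FAILED: report to site manager" >> /var/log/fts-functional-test.log
        fi
    done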

Channel Performance and Tuning
Tuning parameters available for each channel:
Bandwidth: maximum bandwidth for the channel.
Number of concurrent files: concurrent file transfers per channel.
Number of streams: TCP streams per transfer.
TCP block size: TCP block size per stream.
Transfer timeout: timeout for a single file transfer.
Number of retries: retries per request.
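These parameters are set per channel through the ChannelManagement interface; a hedged sketch with the admin CLI follows, where the option letters are assumptions and should be verified against the installed FTS version:

    # Assumed options: -f concurrent files, -T streams per file (check glite-transfer-channel-set --help)
    glite-transfer-channel-set -f 10 -T 5 BNLDCACHE-UTASW2

    # Verify the resulting channel settings
    glite-transfer-channel-list BNLDCACHE-UTASW2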

Channel Performance and Tuning (continued)
Tuning/benchmarking has been done from BNL dCache to the UC and UTA storage systems.
Some parameter values are not suitable for every site: for example, a TCP block size of 1 MB can crash the UTA storage, while it is appropriate for UC.
BNL-UTA channel: ~10 MB/s.
BNL-UC channel: at least ~40 MB/s.
Could this be done for all sites if necessary?

FTS Experience
The current FTS server is quite capable of supporting the current level of usage; if necessary, the individual channel agents can be split off to separate machines.
The retry intervals for FTS channels need to be investigated: when a transfer fails while many requests are queued, the next queued transfer starts too quickly, producing a DoS-like load on the SRM server. This has been observed over the last few weeks, following the global installation/operation of the DQ2 services.
For transfers from BNL to ASGC, the ASGC SRM was not working. This caused a high rate of SRM requests (about 10 Hz from ASGC) to the BNL SRM server (dcsrm), making it almost inoperative. The fact that the channel used for BNL-ASGC allowed 50 concurrent transfers made things worse (see the mitigation sketch below).
Similar incidents have been observed due to a PIC SRM problem and to BNL's own SRM problem.
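When a destination SRM is down, the quickest mitigation on the FTS side is to pause or throttle the affected channel; a hedged sketch follows, where the channel name, state name, and option letters are assumptions for this FTS version:

    # Pause the misbehaving channel entirely (assumed state name and channel name)
    glite-transfer-channel-set -S Inactive BNLDCACHE-ASGC

    # ...or throttle it by lowering the number of concurrent files (assumed option)
    glite-transfer-channel-set -f 2 BNLDCACHE-ASGC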

FTS Experience (continued)
Possible problems with STAR channels:
At the time of the ASGC-BNL problem, the transfers were going over the STAR-ASGC channel, so stopping that channel to protect BNL would have forced ASGC to stop all of its incoming transfers. (ASGC has since created a dedicated ASGC-BNL channel.)
It is highly preferable for Tier-1 storage managers to be the channel managers for all transfers in and out of their own Tier-1 storage.
The proposed destination STAR channels would make management even more difficult under DoS-like conditions, since the transfers can be mediated by any FTS server. (Stopping or slowing traffic via a firewall works, but turning an FTS channel off, or slowing it down, is far simpler.)
If a site has only a STAR channel, one user can dominate all transfers to or from that storage endpoint, and no per-site tuning is available.
Accounting/statistics for FTS transfers are not currently available.