BNL Service Challenge 3 Site Report
Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard
USATLAS Computing Facility, Brookhaven National Lab
2 Services at BNL
- FTS (version 2.3.1) client + server, with its backend Oracle and MyProxy servers. FTS does the job of reliable file transfer from CERN to BNL.
- Most functionalities were implemented. FTS became reliable in controlling data transfer after several rounds of redeployment for bug fixing: a short timeout value causing excessive failures, and incompatibility with dCache/SRM.
- FTS does not support direct data transfer between CERN and the BNL dCache data pool servers (dCache SRM third-party transfer). Transfers actually go through a few dCache GridFTP door nodes at BNL, which presents a scalability issue; we had to move these door nodes to non-blocking network ports to distribute the traffic.
- Both BNL and RAL found that the number of streams per file could not be more than 10 (possibly a bug).
- Networking to CERN: the network for dCache was upgraded to 2 x 1 Gbps around June. Shared link with a long round trip time (>140 ms, while the RTT from European sites to CERN is about 20 ms). Occasional packet losses were observed along the BNL-CERN path. 1.5 Gbps aggregate bandwidth was observed by iperf with 160 TCP streams (a rough bandwidth-delay estimate follows below).
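As a rough illustration of why so many parallel streams are needed on this path, the sketch below computes the bandwidth-delay product and the number of window-limited TCP streams required to fill the link. Only the 2 x 1 Gbps capacity and >140 ms RTT come from the measurements above; the 256 KiB per-stream TCP window is an assumed value for illustration.

```python
# Back-of-the-envelope estimate for the CERN-BNL path.
# From the slide: ~2 Gbit/s aggregated link, >140 ms RTT.
# Assumption (not from the slide): a 256 KiB TCP window per stream.

LINK_BPS = 2e9               # 2 x 1 Gbps
RTT_S = 0.140                # round trip time, seconds
WINDOW_BYTES = 256 * 1024    # assumed per-stream TCP window

bdp_bytes = LINK_BPS / 8 * RTT_S            # data that must be "in flight" to fill the pipe
per_stream_bps = WINDOW_BYTES * 8 / RTT_S   # ceiling for one window-limited stream
streams_needed = LINK_BPS / per_stream_bps

print(f"bandwidth-delay product: {bdp_bytes / 1e6:.0f} MB")
print(f"one window-limited stream: {per_stream_bps / 1e6:.0f} Mbit/s")
print(f"streams needed to fill the link: {streams_needed:.0f}")
```

With these assumptions the pipe needs roughly 35 MB in flight and on the order of 130 streams, which is consistent with the ~1.5 Gbps iperf measured using 160 streams.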
3 Services at BNL
- dCache/SRM (v1.6.5-2, with the SRM 1.1 interface; 332 nodes in total (3.06 GHz, 2 GB memory and 3 SCSI 140 GB drives each) with about 170 TB of disk; multiple GridFTP, SRM and dCap doors): the USATLAS production dCache system.
  - All nodes run Scientific Linux 3 with the XFS module compiled.
  - Experienced high load on the write pool servers during large data transfers; fixed by replacing the EXT file systems with XFS.
  - The core server crashed once; the reason was identified and fixed.
  - Small buffer space (1.0 TB) for data written into the dCache system.
  - dCache can now deliver up to 200 MB/second for input/output (limited by network speed).
- LFC (1.3.4) client and server installed at BNL as the replica catalog server. Tested the basic functionalities: lfc-ls, lfc-mkdir, etc. (a minimal usage sketch follows this slide). Will populate LFC with the entries from our production Globus RLS server.
- An ATLAS VO Box (DDM + LCG VO Box) was deployed at BNL.
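A minimal sketch of the kind of basic functionality test mentioned above for LFC, driving the lfc-mkdir and lfc-ls command-line clients through subprocess. The LFC_HOST value and the /grid/atlas/sc3-test path are placeholders, not the actual BNL configuration.

```python
# Hypothetical smoke test for the LFC installation: create a test directory
# and list its parent using the lfc-mkdir / lfc-ls clients named on the slide.
import os
import subprocess

os.environ["LFC_HOST"] = "lfc.example.gov"   # placeholder catalog server hostname

def lfc(*args):
    """Run an LFC client command and return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout

lfc("lfc-mkdir", "/grid/atlas/sc3-test")     # hypothetical test path
print(lfc("lfc-ls", "-l", "/grid/atlas"))
```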
4 BNL dCache Configuration
[Diagram of the BNL dCache system: dCap, SRM and GridFTP doors (control channel) in front of read and write pools (data channel), managed by the Pnfs Manager and Pool Manager, with HPSS as the tape backend; clients include dCap clients, GridFTP clients, SRM clients and the Oak Ridge batch system.]
5 CERN Storage System
6 Data Transfer from CERN to BNL (ATLAS Tier 1)
7 Transfer Plots
[Transfer rate plots; annotation: Castor2 LSF plugin problem.]
8 BNL SC3 data transfer
All data are actually routed through the GridFTP doors. [Plot: SC3 transfers as monitored at BNL.]
9 Data Transfer Status
- BNL stabilized FTS data transfer with a high successful-completion rate, as shown in the left image.
- We attained a 150 MB/second rate for about one hour with a large number (>50) of parallel file transfers.
- CERN FTS had a limit of 50 files per channel, which is not enough to fill up the CERN-BNL data channel (see the estimate below).
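A rough estimate, not from the slide, of why 50 files per channel falls short: from the observed 150 MB/s with ~50 concurrent transfers, each file averages about 3 MB/s over this high-RTT path, so filling a channel of roughly 2 Gbps (~250 MB/s, an assumed target) would take considerably more than 50 concurrent files.

```python
# Rough estimate of the concurrency needed to fill the CERN-BNL channel.
# Only the 150 MB/s at ~50 parallel files comes from the slide; the 250 MB/s
# target (roughly a full 2 Gbps link) is an assumption for illustration.
observed_rate_mb_s = 150.0
concurrent_files = 50
per_file_mb_s = observed_rate_mb_s / concurrent_files    # ~3 MB/s per file

target_rate_mb_s = 250.0
files_needed = target_rate_mb_s / per_file_mb_s

print(f"average per-file rate: {per_file_mb_s:.1f} MB/s")
print(f"concurrent files needed for {target_rate_mb_s:.0f} MB/s: {files_needed:.0f}")
```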
10 Final Data Transfer Reports
11 Lessons Learned From SC2
- Four file transfer servers with a 1 Gigabit WAN connection to CERN met the performance/throughput challenge (70~80 MB/second disk to disk).
- Enabled data transfer between dCache/SRM and the CERN SRM at openlab; designed our own script to control the SRM data transfers.
- Enabled data transfer between BNL GridFTP servers and CERN openlab GridFTP servers controlled by the Radiant software.
- Many components needed to be tuned: with a 250 ms RTT and a high packet loss rate, we had to use multiple TCP streams and multiple concurrent file transfers to fill up the network pipe.
- Sluggish parallel file I/O with EXT2/EXT3: lots of processes in the "D" state, and the more file streams, the worse the file system performance. Slight improvement with XFS; file system parameters still need tuning (a sketch for spotting "D"-state processes follows below).
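For reference, a small Linux-only sketch of how the "D" (uninterruptible sleep) processes seen during parallel I/O can be counted, by reading the state field of /proc/<pid>/stat. This is a generic diagnostic, not the tooling used at BNL.

```python
# Count processes in "D" (uninterruptible sleep) state on a Linux host.
# Note: the naive split below assumes process names without spaces.
import glob

def d_state_processes():
    stuck = []
    for stat_path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(stat_path) as f:
                fields = f.read().split()
        except OSError:
            continue                      # process exited while we were scanning
        pid, comm, state = fields[0], fields[1], fields[2]
        if state == "D":
            stuck.append((pid, comm.strip("()")))
    return stuck

stuck = d_state_processes()
print(f"{len(stuck)} processes in D state")
for pid, comm in stuck:
    print(pid, comm)
```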
12 Some Issues
- The Service Challenge also challenges resources:
  - Tuned the network pipes and optimized the configuration and performance of the BNL production dCache system and its associated OS and file systems.
  - Required more than one staff member's involvement to stabilize the newly deployed FTS, dCache and network infrastructure. The staffing level decreased as the services became stable.
- Limited resources are shared by experiments and users:
  - At CERN, the SC3 infrastructure is shared by multiple Tier 1 sites. Due to the heterogeneous nature of the Tier 1 sites, data transfer for each site should be optimized non-uniformly based on the site's characteristics: network RTT, packet loss rate, experiment requirements, etc.
  - At BNL, the network and dCache are also used by production users. We need to closely monitor SRM and the network to avoid impacting production activities.
- At CERN, James Casey alone handles answering email, setting up the system, reporting problems and running data transfers. He provides 7/16 support by himself.
  - How do we scale to 7/24 production support and a production center? How do we handle the time difference between the US and CERN?
  - CERN support phone (tried once, but the operator did not speak English).
13 What has been done
- SC3 Tier 2 data transfer: data were transferred to three selected Tier 2 sites.
- SC3 tape transfer: tape data transfer was stabilized at 60 MB/second with loaned tape resources. Met the goal defined at the beginning of the Service Challenge.
- The full chain of data transfer was exercised.
14 ATLAS SC3 Service Phase
15 ATLAS SC3 Service Phase goals
- Exercise ATLAS data flow
- Integration of the data flow with the ATLAS Production System
- Tier-0 exercise
- More information: https://uimon.cern.ch/twiki/bin/view/Atlas/DDMSc3
16 ATLAS-SC3 Tier0
- Quasi-RAW data generated at CERN and reconstruction jobs run at CERN; no data transferred from the pit to the computer centre.
- "Raw data" and the reconstructed ESD and AOD data are replicated to Tier 1 sites using agents on the VO Boxes at each site.
- Exercising use of the CERN infrastructure (Castor 2, LSF), the LCG Grid middleware (FTS, LFC, VO Boxes) and the Distributed Data Management (DDM) software.
17 ATLAS Tier-0

  Data type   File size      Rate      Files/day    Throughput   Volume/day
  RAW         1.6 GB/file    0.2 Hz    17K f/day    320 MB/s     27 TB/day
  ESD         0.5 GB/file    0.2 Hz    17K f/day    100 MB/s     8 TB/day
  AOD         10 MB/file     2 Hz      170K f/day   20 MB/s      1.6 TB/day
  AODm        500 MB/file    0.04 Hz   3.4K f/day   20 MB/s      1.6 TB/day

[Diagram of the flow between the event filter (EF), CASTOR, the CPU farm and the Tier 1s, which receive RAW, ESD (2x) and AODm (10x). Aggregate flow annotations: 0.44 Hz / 37K f/day / 440 MB/s; 1 Hz / 85K f/day / 720 MB/s; 0.4 Hz / 190K f/day / 340 MB/s; 2.24 Hz / 170K f/day (temp), 20K f/day (perm) / 140 MB/s. A consistency check of the per-datatype figures follows below.]
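The per-datatype numbers above are internally consistent; the short check below recomputes throughput and file counts from the file sizes and event rates quoted on the slide (pure arithmetic, no assumptions).

```python
# Recompute the slide's per-datatype figures:
#   throughput (MB/s) = file size (MB) x rate (Hz)
#   files/day = rate (Hz) x 86400,  volume/day = throughput x 86400
datatypes = {
    # name: (file size in MB, rate in Hz)
    "RAW":  (1600, 0.2),
    "ESD":  (500, 0.2),
    "AOD":  (10, 2.0),
    "AODm": (500, 0.04),
}

for name, (size_mb, hz) in datatypes.items():
    mb_per_s = size_mb * hz
    files_per_day = hz * 86400
    tb_per_day = mb_per_s * 86400 / 1e6
    print(f"{name:4s} {mb_per_s:6.0f} MB/s  {files_per_day/1e3:6.1f}K files/day  {tb_per_day:5.1f} TB/day")
```

This reproduces 320/100/20/20 MB/s and roughly 27.6/8.6/1.7/1.7 TB/day, closely matching the rounded values on the slide.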
18 ATLAS-SC3 Tier-0
- The main goal is a 10% exercise: reconstruct "10%" of the number of events ATLAS will get in 2007, using "10%" of the full resources that will be needed at that time.
- Tier-0: ~300 kSI2k
  - "EF" to CASTOR: 32 MB/s
  - Disk to tape: 44 MB/s (32 for RAW and 12 for ESD+AOD)
  - Disk to WN: 34 MB/s
  - T0 to each T1: 72 MB/s
  - 3.8 TB to "tape" per day
- Tier-1 (on average): ~8500 files per day, at a rate of ~72 MB/s (these are the full 2007 rates scaled by 10%; see the check below).
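The 10% rates follow from the full 2007 rates on the previous slide; the mapping of the 320/440/340 MB/s flows is inferred from the matching figures, not stated explicitly. The sketch also checks the "~8500 files per day at ~72 MB/s" Tier-1 figure; the implied average file size is a derived value, not quoted on the slides.

```python
# The 10% exercise rates are the full Tier-0 rates scaled by 0.1.
SCALE = 0.10
full_rates_mb_s = {
    "EF to CASTOR": 320,   # RAW out of the event filter
    "disk to tape": 440,
    "disk to WN":   340,
}
for flow, full in full_rates_mb_s.items():
    print(f"{flow:13s} {full * SCALE:4.0f} MB/s")

# Tier-1 check: ~72 MB/s sustained for a day vs ~8500 files/day.
tb_per_day = 72 * 86400 / 1e6
avg_file_mb = 72 * 86400 / 8500
print(f"72 MB/s for 24h = {tb_per_day:.1f} TB/day, i.e. ~{avg_file_mb:.0f} MB per file on average")
```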
19 ATLAS DDM Monitoring
[Plot: 24h before the 4-day intervention, 29/10 - 1/11.] We achieved quite a good rate in the testing phase, with a sustained 20-30 MB/s to three sites (PIC, BNL and CNAF).
20 Data Distribution
- Used a generated "dataset" containing 6035 files (3 TB) and tried to replicate it to BNL, CNAF and PIC.
- BNL: data transfer is under way.
- PIC: 3600 files copied and registered
  - 2195 'failed replication' after 5 retries by us x 3 FTS retries; problem under investigation
  - 205 'assigned', still waiting to be copied
  - 31 'validation failed' since the SE is down
  - 4 'no replicas found' (LFC connection error)
- CNAF: 5932 files copied and registered
  - 89 'failed replication'
  - 14 'no replicas found'
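The per-state file counts reported for PIC and CNAF each add up to the full 6035-file dataset; the quick bookkeeping check below uses only the numbers from the slide.

```python
# Check that every file of the 6035-file dataset is accounted for per site.
DATASET_FILES = 6035

sites = {
    "PIC": {"copied and registered": 3600, "failed replication": 2195,
            "assigned": 205, "validation failed": 31, "no replicas found": 4},
    "CNAF": {"copied and registered": 5932, "failed replication": 89,
             "no replicas found": 14},
}

for site, states in sites.items():
    total = sum(states.values())
    status = "OK" if total == DATASET_FILES else "MISMATCH"
    print(f"{site}: {total} of {DATASET_FILES} files accounted for ({status})")
```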
21 General view of SC3
- When everything is running smoothly, ATLAS gets good results.
- The middleware (FTS) is stable, but there were still lots of compatibility issues: FTS does not work with the new version of dCache/SRM (version 1.3).
- ATLAS DDM software dependencies can also cause problems when sites upgrade middleware.
- Not managed to exhaust anything (production s/w, LCG m/w).
- Still far from concluding the exercise, and not running stably in any way. The exercise will continue, adding new sites.
22 SC3 re-run
- We will upgrade the BNL dCache OS to RHEL 4 and dCache to 1.6.6, starting Dec/07/2005.
- We will add a few more dCache pool nodes if the software upgrades do not meet our expectations.
- FTS should be upgraded if the necessary fix to prevent channel blocking is ready before the new year.
- LCG BDII needs to report the status of dCache and FTS (before Christmas).
- We would like to schedule a test period at the beginning of January for stability and scalability.
- Everything should be ready by January 9.
- The re-run will start on January 16.
23 BNL Service Challenge 4 Plan
- Several steps are needed to set up hardware and services (e.g. choose, procure, start install, end install, make operational), starting in January and ending before the beginning of March: LAN, tape system.
- FTS, LFC, DDM, LCG VO Boxes and other baseline services will be maintained with an agreed SLA and supported by the USATLAS VO.
- Dedicated LHC dCache/SRM write pool providing up to 17 terabytes of storage (24 hours' worth of data); to be done in sync with the LAN/WAN work (a quick sizing check follows below).
- Deploy and strengthen the necessary monitoring infrastructure based on Ganglia, Nagios, MonALISA and LCG R-GMA (February).
- Drill for service integration (March): simulate network failures and server crashes, and exercise how the support center will respond to the issues.
- Tier 0/Tier 1 end-to-end high performance network operational: bandwidth, stability and performance.
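The "~17 TB, 24 hours' worth of data" sizing for the dedicated write pool is consistent with the 200 MB/second SC4 target rate given on the next slide; a one-line check:

```python
# 24 hours of data at the SC4 target rate of 200 MB/s.
target_mb_s = 200
tb_per_day = target_mb_s * 86400 / 1e6
print(f"{target_mb_s} MB/s for 24h = {tb_per_day:.2f} TB")   # ~17.3 TB
```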
24 BNL Service Challenge 4 Plan
- April 2006: establish stable data transfer at 200 MB/second to disk and 200 MB/second to tape.
- May 2006: disk and computing farm upgrades.
- July 1, 2006: stable data transfer driven by the ATLAS production system and ATLAS data management infrastructure between T0 and T1 (200 MB/second), and provide services satisfying the SLA (service level agreement).
- Details of Tier 2 involvement are also being planned (February and March):
  - Tier 2 dCache: UC dCache needs to be stabilized and operational in February; UTA and BU need to have dCache in March.
  - Baseline client tools should be deployed at the Tier 2 centers.
  - Baseline services should support Tier 1 to Tier 2 data transfer before SC4 starts.