
BNL Grid Projects

2 Outline  Network/dCache  USATLAS Tier 1 Network Design  TeraPaths  Service Challenge 3  Service Challenge 4 Planning  USATLAS OSG Configuration  LCG 2 Status  3D (Distributed Deployment of Databases) Project  PHENIX Data Transfer (Non-USATLAS)

Network/dCache

4 Current Network Configuration  How is (was) our network configured for SC3? What performance did we observe? What adjustments did we make? How significant is (or has been) the firewall? How many servers of what kind did we use for dCache?

5 Network in the Past  SC3 throughput-phase performance: the peak rate, sustained for several hours, was 150 MB/s; the average transfer rate was 120 MB/s.  During the SC3 service phase we re-installed the dCache system and tuned it while the phase was running; we experienced some data transfer problems, but we could still maintain a transfer rate of around 100 MB/s for several hours.  Adjustments made after the SC3 throughput phase:  September: the dCache write pool disks were changed from RAID 0 to RAID 5 to add redundancy for the precious data. The file system was switched to EXT3 because an XFS bug crashed the RAID 5 based disks; performance was degraded for the following several weeks.  December: we upgraded the dCache OS to RHEL 4.0 and redeployed XFS on the dCache write pool nodes.  We constantly hit a 1 Gbps performance bottleneck. There was excessive traffic between the door nodes (SW9) and the pool nodes (SW7). That traffic already ran over an aggregated Ethernet channel (3×1 Gbps) between the two ATLAS switches, but the hashing algorithm always sent it down one physical fiber, leading to an imbalanced load distribution (illustrated in the sketch below).  We finally relocated all dCache servers onto one network switch to avoid inter-switch traffic.  We did not find any performance issues associated with the firewall, but the firewall does drop some packets between the two ATLAS subnets, which prevents job submission from the ATLAS grid gatekeeper to the Condor pool. This problem does not affect SC3 data transfer to the BNL dCache system.
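The load imbalance described above is easy to reproduce in miniature: if the per-link hash only uses fields that are identical for all door-to-pool traffic, every flow maps to the same member of the aggregate. The sketch below is a toy model under that assumption; the actual hashing used by the ATLAS switches is not specified in the slides.

```python
# Toy model of hash-based link selection on a 3 x 1 Gbps aggregated Ethernet channel.
# Assumption (mine, not from the slide): the switch hashes only on the (src, dst)
# address pair, which is constant for the door-to-pool inter-switch traffic, so every
# flow maps to the same physical member link.

N_LINKS = 3

def pick_link(src, dst, src_port=0, dst_port=0, use_ports=False):
    """Return the index of the aggregate member link chosen for a flow."""
    key = (src, dst, src_port, dst_port) if use_ports else (src, dst)
    return hash(key) % N_LINKS

# Twelve transfers between the same pair of switches (SW9 doors -> SW7 pools).
flows = [("sw9-door", "sw7-pool", 20000 + i, 2811) for i in range(12)]

# Address-only hashing: every flow lands on the same link of the 3-link aggregate.
print({pick_link(s, d) for (s, d, sp, dp) in flows})                          # e.g. {1}

# Hashing that also includes the L4 ports spreads the flows across the members.
print({pick_link(s, d, sp, dp, use_ports=True) for (s, d, sp, dp) in flows})  # e.g. {0, 1, 2}
```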

6 Current dCache Configuration  dCache consists of write pool nodes, read pool nodes and core services (courtesy of Zhenping):  PNFS core server node: 1 (dedicated), RHEL 4.0, Dell 3.0 GHz  SRM server (door) node: 1 (dedicated), RHEL 4.0, Dell 3.0 GHz  GridFTP and DCAP core server nodes (doors): 4 (dedicated), RHEL 4, Dell 3.0 GHz  Internal/external read pool nodes: 322 (shared), 145 TB, SL3, mix of Penguin 3.0 GHz and Dell 3.4 GHz  Internal/external write pool nodes: 8 (dedicated), 1 TB, RHEL 4.0, Dell 3.0 GHz  Total: TB

7 One BNL dCache Instance [Diagram: DCap, SRM, and GridFTP doors in front of the read and write pools; PnfsManager and PoolManager core services; HPSS as the tape back end; control channels from DCap, GridFTP, and SRM clients and the Oak Ridge batch system, with data channels going directly to the pools.]

8 Future Network/dCache Plan  The USATLAS Tier 1 network design is shown in the following slides.  The network bandwidth to the ACF will be 20 Gbps of redundant external connectivity; the BNL to CERN connection is 10 Gbps.  dCache should be expanded to accommodate LHC data. We want to avoid mixing LHC data traffic with the remaining ATLAS production traffic: either we create a dedicated dCache instance, or we dedicate a fraction of the dCache resources (a separate dCache write pool group) to LHC data transfer. Zhenping and I prefer a dedicated dCache instance, since the number of nodes in the BNL dCache managed by the current dCache technology is running into its limit. In any case, over the next several months the LHC fraction of dCache should be able to handle 200 MB/s and hold one day's worth of data on disk (about 16.5 TB). We need about 20 TB of local disk space (20% of which will be used for RAID 5 redundancy); see the sizing sketch below.  10 nodes, each with 2 TB of local disk.
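The disk figures above follow from simple arithmetic. A minimal sizing sketch, assuming the quoted 200 MB/s is 200 MiB/s, a 24-hour buffer, and the 20% RAID 5 overhead quoted in the slide:

```python
# Rough buffer-sizing check for the dedicated LHC dCache pool described above.
# Assumptions (mine, not from the slide): "200 MB/s" is treated as 200 MiB/s,
# "one day" as 86400 s, and the RAID 5 / redundancy overhead as the quoted 20%.

MIB = 2**20
TIB = 2**40

rate_bytes_per_s = 200 * MIB          # sustained T0 -> T1 transfer rate
seconds_per_day = 86_400

one_day_bytes = rate_bytes_per_s * seconds_per_day
print(f"one day of data ~ {one_day_bytes / TIB:.1f} TiB")    # ~16.5 TiB, matching the slide

usable_fraction = 0.80                 # 20% lost to RAID 5 / redundancy
raw_needed_bytes = one_day_bytes / usable_fraction
print(f"raw disk needed ~ {raw_needed_bytes / TIB:.1f} TiB")  # ~20.6 TiB, i.e. the ~20 TB quoted

nodes = 10
print(f"per node ~ {raw_needed_bytes / nodes / TIB:.1f} TiB") # ~2 TiB, i.e. 10 nodes x 2 TB each
```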

USATLAS Tier 1 Network Design

10 Current Unsolved/Unsettled Issues  LHCOPN does not address Tier 2 site issues. What is the policy on trusting non-US Tier 2 sites? We simplify the issue and treat these non-US Tier 2 sites as regular internet end points.  LHCOPN includes T0, all T1 sites, and their existing connections: all T0-to-BNL and other ATLAS T1-to-BNL traffic will be treated as LHCOPN traffic and can share the network resources provided by US LHCNet.  If one Tier 1 goes down, its LHC traffic will be routed via another Tier 1 and use a fraction of that Tier 1's network resources. This type of traffic does not affect the BNL internal network design. The AUP for this should be negotiated among the Tier 1 sites; it is not done yet.

11 User Scenarios  1. LHC data is transferred via the LHCOPN from CERN to BNL. Data goes into dCache and is then migrated into HPSS. A small fraction of the data will be read immediately by users at Tier 2s. (Volume_{LHC})  2. All Tier 2s upload their simulation/analysis data to the Tier 1 dCache. The data is immediately replicated within the dCache cluster and migrated into HPSS. (Volume_{Tier 2})  3. Physicists at Tier 3s read input data from the Tier 1 dCache read pool, run analysis/transformation at their home institutions, and upload the result data to the Tier 1 dCache write pool. The results are then immediately replicated into the dCache read pool and archived into HPSS. (Volume_{Physicists} = Volume_{Inputs} + Volume_{Results})  4. BNL owns a fraction of the ATLAS reconstruction data (ESD, AOD/TAG). This data is read from dCache and sent to other Tier 1 sites; similarly, BNL needs to read the same type of data from the other Tier 1s. (Volume_{T1} = Volume_{in} + Volume_{out})  5. European Tier 2s (2+ sites) need to read data from BNL; this traffic will be treated as regular internet traffic.  The total data volume that we put on the network links and backplane: Volume_{Total} = 2*Volume_{LHC} + 3*Volume_{Tier 2} + Volume_{Inputs} + 3*Volume_{Results} + Volume_{T1} + Volume_{Others}, as computed in the sketch below.
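A minimal sketch of the traffic bookkeeping above. The multipliers come straight from the slide's formula; the per-scenario volumes below are made-up placeholders, and the comments give my reading of where each multiplier comes from.

```python
# Illustrative only: the daily volumes are placeholders, not BNL figures.
# Multipliers per the slide's formula: LHC data counts twice, Tier 2 uploads
# and Tier 3 results count three times (write, replicate, archive), inputs once.

volumes_tb_per_day = {
    "LHC":     16.5,   # scenario 1: raw data from CERN
    "Tier2":    5.0,   # scenario 2: simulation/analysis uploads
    "Inputs":   2.0,   # scenario 3: data read by Tier 3 physicists
    "Results":  1.0,   # scenario 3: results uploaded back
    "T1":       4.0,   # scenario 4: inter-Tier-1 exchange (in + out)
    "Others":   0.5,   # remaining internet traffic
}

total = (2 * volumes_tb_per_day["LHC"]
         + 3 * volumes_tb_per_day["Tier2"]
         + volumes_tb_per_day["Inputs"]
         + 3 * volumes_tb_per_day["Results"]
         + volumes_tb_per_day["T1"]
         + volumes_tb_per_day["Others"])

print(f"Volume_Total ~ {total:.1f} TB/day on the LAN links and switch backplane")
```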

12 Requirements  dCache couples the ACF subsystems (grid and computing cluster) even more tightly, with the computing cluster serving as part of the data storage system; data is constantly replicated among them. Any connection restriction (firewall conduits) between them will potentially impact functionality and performance.  We should isolate the internal ATLAS traffic within the ATLAS network domain.  We need to optimize the network traffic volume between the BNL campus and the ACF.  What fraction of the data (items 1, 2, 3 above) are we going to filter through the firewall? Any traffic that we choose to firewall may double or triple the load on the link between the BNL campus and the ACF.  Operational issues in the BNL campus network should not impact the ACF internal traffic between the different USATLAS subnets.  We should not overload the BNL firewall with large volumes of physics data.

13 USATLAS/BNL LAN

14 Option 1 [Diagram: dCache, HPSS, and the ACF farm sit behind the ATLAS DL2 router with ACL/policy routing; CERN LHCOPN traffic and internet/analysis traffic (Tier 1s, US ATLAS Tier 2s, other Tier 2s) arrive on separate paths; all traffic between any two hosts in the ACF is routed or switched locally.]

15 Option 2 [Diagram: a dedicated LHC/SC4 dCache and the grid/HPSS/ACF farm sit behind the ATLAS DL2 router with ACL/policy routing; CERN LHCOPN traffic and internet traffic (Tier 1s, USATLAS Tier 2s, other Tier 2s) arrive on separate paths. LHC data to HPSS stays internal to ATLAS and never leaves the ATLAS router.]

16 Option 3 [Diagram: the LHC dCache and the grid/HPSS/ACF farm sit behind the ATLAS DL2 router with ACL/policy routing and a single network cable; CERN LHCOPN traffic and internet traffic (Tier 1s, USATLAS Tier 2s, other Tier 2s) arrive on separate paths. LHC data to HPSS is external to ATLAS and leaves the ATLAS router.]

17 Option 4 [Diagram: all traffic, including CERN LHCOPN traffic, USATLAS Tier 2 traffic, and other internet traffic (Tier 1s, Tier 2s), is routed via DL2 to the LHC dCache, HPSS, ACF farm, and grid systems.] LHC data to HPSS is routed via DL2, so the traffic must leave the ATLAS router.  Disadvantage: all ATLAS traffic may double or triple the load on the BNL/USATLAS link.  All traffic is routed via DL2.  Network management is not easy?  The firewall becomes the bottleneck.  Does not utilize the ATLAS router's routing capability.

TeraPaths

19 QoS/MPLS  QoS/MPLS technology can be manually deployed into the BNL campus/USATLAS network now. The behavior is well understood, and LAN QoS expertise is now available in house.  The TeraPaths software system is under intensive re-development to bring it to production quality. It will be ready by the end of February. We will need one month (March) to verify it and deploy it into our production network infrastructure. When SC4 starts, we can quantitatively manage how the BNL LAN sends and receives data. The following month will focus on deploying the software package at the Tier 2 sites participating in SC4.

20  This project investigates the integration and use of LAN QoS and MPLS-based differentiated network services in the ATLAS data-intensive distributed computing environment as a way to manage the network as a critical resource.  The collaboration includes BNL and the University of Michigan, with other collaborators from OSCARS (ESnet), Lambda Station (FNAL), and the TeraPaths monitoring project (SLAC). What Is TeraPaths?

21 TeraPaths System Architecture [Diagram: QoS requests enter via a web page, APIs, or the command line at Site A (the initiator); each site runs a route planner, scheduler, user manager, site monitor, and router manager on top of hardware drivers; the initiator coordinates with Site B (remote) through WAN web services and WAN monitoring.]
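To make the request path concrete, here is a minimal sketch of what a bandwidth reservation handed to a site's TeraPaths service might contain. The endpoint URL, field names, host names, and the JSON-over-HTTP interface are assumptions for illustration only, not the project's actual API.

```python
# Hypothetical sketch only: the endpoint and request fields are invented for
# illustration and do not reflect the real TeraPaths web-service interface.
import json
import urllib.request

reservation = {
    "src_host":  "dcache-door01.example.bnl.gov",   # hypothetical initiator host
    "dst_host":  "t2-storage.example.umich.edu",    # hypothetical remote host
    "bandwidth_mbps": 400,                          # e.g. the EF class used in the SC2005 demo
    "dscp_class": "EF",                             # expedited forwarding
    "start_time": "2006-03-01T14:00:00Z",
    "duration_s": 3600,
}

req = urllib.request.Request(
    "https://terapaths.example.bnl.gov/api/reserve",  # hypothetical endpoint
    data=json.dumps(reservation).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())   # site scheduler / router manager would act on the request
```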

22 TeraPaths SC2005  Two bbcp instances periodically copied data from BNL disk to UMICH disk: one used class 2 traffic (200 Mbps) and the other used class EF (expedited forwarding, 400 Mbps). Iperf generated background traffic. The total allocated network resource was 800 Mbps.  We could quantitatively control shared network resources for mission-critical tasks.  Verified the effectiveness of MPLS/LAN QoS and its impact on prioritized traffic, background best-effort traffic, and overall network performance.

Service Challenge 3

24 What was the SC3 configuration, hardware, software, middleware we used?

25 Services at BNL  FTS client + server (FTS 1.3) with its back-end Oracle and MyProxy servers.  FTS does the job of reliable file transfer from CERN to BNL.  Most functionality was implemented. It became reliable in controlling data transfer after several rounds of redeployments for bug fixing: a short timeout value causing excessive failures, and incompatibility with dCache/SRM.  It does not support DIRECT data transfer from CERN to the BNL dCache data pool servers (dCache SRM third-party data transfer). The data transfers actually go through a few dCache GridFTP door nodes at BNL, which presents a scalability issue. We had to move these door nodes to non-blocking network ports to distribute the traffic.  Both BNL and RAL discovered that the number of streams per file could not be more than 10 (a bug?).  Networking to CERN:  The network for dCache was upgraded to 2×1 Gbps around June.  Shared link with a long round trip time (>140 ms), while the RTT from European sites to CERN is about 20 ms; see the back-of-the-envelope sketch below.  Occasional packet losses were discovered along the BNL-CERN path.  1.5 Gbps aggregated bandwidth was observed by iperf with 160 TCP streams.
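The need for so many parallel streams follows from the bandwidth-delay product: per-stream throughput is capped at roughly window/RTT. A minimal back-of-the-envelope sketch, assuming an illustrative 256 KiB TCP window per stream (the actual tuned window sizes are not given in the slides):

```python
# Why ~160 TCP streams were needed on the BNL-CERN path.
# The 256 KiB per-stream window below is an assumed example value, not a measured one.

rtt_s = 0.140                      # >140 ms round trip time, per the slide
window_bytes = 256 * 1024          # assumed per-stream TCP window (illustrative)

per_stream_bps = window_bytes * 8 / rtt_s
print(f"per-stream ceiling ~ {per_stream_bps / 1e6:.1f} Mbit/s")   # ~15 Mbit/s

target_bps = 2e9                   # 2 x 1 Gbps links to fill
streams_needed = target_bps / per_stream_bps
print(f"streams needed to fill 2 Gbit/s ~ {streams_needed:.0f}")   # ~130, consistent with the
                                                                   # ~160 streams / 1.5 Gbit/s observed
```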

26 Services Used at BNL SC3  dCache/SRM with the SRM 1.3 interface; the detailed configuration can be found in Slide 6.  All read pool nodes run Scientific Linux 3 with the XFS module compiled.  We experienced high load on the write pool servers during large data transfers; this was fixed by replacing the EXT file systems with XFS.  A core server crashed once; the reason was identified and fixed.  Small buffer space (1.0 TB) for data written into the dCache system.  dCache can now deliver up to 200 MB/s for input/output (limited by network speed).  LFC (1.3.4) client and server were installed at the BNL replica catalog server.  The server was installed and the basic functionality was tested: lfc-ls, lfc-mkdir, etc. (a sketch of such a test is below).  We will populate LFC with the entries from our production Globus RLS server.  The ATLAS VO Box (DDM + LCG VO box) was deployed at BNL.  Two instances of the Distributed Data Management (DDM) software (DQ2) were deployed at BNL, one for Panda production and one for the SC3 service phase.
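A minimal sketch of the kind of basic LFC functional test mentioned above. lfc-ls and lfc-mkdir are the client commands named in the slide and LFC_HOST selects the catalog server; the host name and path below are hypothetical placeholders, not the real BNL setup.

```python
# Minimal LFC smoke test: create a directory in the catalog and list it back.
# The host and path are hypothetical placeholders, not the real BNL server.
import os
import subprocess

env = dict(os.environ, LFC_HOST="lfc.example.bnl.gov")   # hypothetical catalog host
test_dir = "/grid/atlas/sc3/functional-test"             # hypothetical LFC path

# lfc-mkdir creates the catalog directory; lfc-ls -l lists the parent back.
subprocess.run(["lfc-mkdir", test_dir], env=env, check=True)
listing = subprocess.run(["lfc-ls", "-l", os.path.dirname(test_dir)],
                         env=env, check=True, capture_output=True, text=True)
print(listing.stdout)
```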

27 How did the SC3 infrastructure evolve?  FTS was upgraded from 1.2 to 1.3.  dCache was upgraded to a newer release (Dec 7, 2005).  The write pool file system was migrated from EXT3 to XFS before the Service Challenge 3 throughput phase. After the SC3 throughput phase we migrated the underlying disks from RAID 0 to RAID 5 for better reliability, but this triggered an XFS file system bug on RAID 5 disks and crashed the server. We had to switch back to the EXT3 file system; that avoided the bug but significantly reduced performance. The recent OS upgrade on the dCache write pool and core servers alleviated the XFS bug (it did not fix it), so we migrated back to XFS for better performance.  The dCache software on the read pools was upgraded as well. The OS on the read pool nodes has not changed since the May/June upgrade.

28 BNL SC3 data transfer [Plot: transfer rates as monitored at BNL and at CERN are consistent; all data is actually routed through the GridFTP doors.]

29 Data Transfer Status  BNL stabilized FTS data transfer with a high successful-completion rate, as shown in the left image from the throughput phase.  We attained a 150 MB/s rate for about one hour with a large number (> 50) of parallel file transfers during the SC3 throughput phase.

30 Final SC3 Throughput Data Transfer Results

31 Lessons Learned From SC2  Four file transfer servers with a 1 Gigabit WAN connection to CERN.  Met the performance/throughput challenges (70-80 MB/s disk to disk).  Enabled data transfer between dCache/SRM and the CERN SRM at openlab.  Designed our own scripts to control SRM data transfer.  Enabled data transfer between BNL GridFTP servers and CERN openlab GridFTP servers controlled by the Radiant software.  Many components needed to be tuned:  Long round trip time and a high packet drop rate meant we had to use multiple TCP streams and multiple concurrent file transfers to fill the network pipe.  Sluggish parallel file I/O with EXT2/EXT3: many processes sat in I/O-wait state, and the more file streams, the worse the file system performance.  Slight improvement with XFS; file system parameters still need tuning.

32 Some Issues During SC3 Throughput Phase  A service challenge also challenges resources:  We tuned the network pipes and optimized the configuration and performance of the BNL production dCache system and its associated OS and file systems.  It required more than one staff member's involvement to stabilize the newly deployed FTS, dCache, and network infrastructure.  The staffing level decreased as the services became stable.  Limited resources are shared by experiments and users.  At CERN, the SC3 infrastructure is shared by multiple Tier 1 sites.  Due to the heterogeneous nature of the Tier 1 sites, data transfer for each site should be optimized individually based on the site's characteristics, e.g. network RTT, packet loss rate, experiment requirements, etc.  At BNL, the network and dCache are also used by production users.  We need to closely monitor SRM and the network to avoid impacting production activities.  At CERN, James Casey alone handles answering email, setting up the system, reporting problems, and running the data transfers. He provides 7/16 support himself.  How do we scale to 7/24 production support / a production center?  How do we handle the time difference between the US and CERN?  CERN support phone (tried once, but the operator did not speak English).

33 Some Issues During SC3 Service Phase  FTS was changed from version 1.3 to 1.4 at CERN. FTS 1.4 was supposed to support direct third-party transfers, but when direct data transfer into the pools (bypassing the doors) was used, it could not handle the long waits, which led to channel lockup. We therefore had to switch to glite-url-copy, which handles transfers into dCache in an ad-hoc way.  dCache was constantly improved for better performance and reliability over the past several months, and has recently reached a stable state.  The SC3 service phase exposed several problems when it started. We took the opportunity to find and fix them; performance and stability were continuously improved over the course of SC3, and we were able to achieve high performance by the end of SC3. A good learning experience indeed.  SC operations need to be improved to report problems in a timely way.

34 What has been done  The SC3 throughput phase showed good data transfer bandwidth.  SC3 Tier 2 data transfer: data was transferred to three selected Tier 2 sites.  SC3 tape transfer: tape data transfer was stabilized at 60 MB/s with loaned tape resources, meeting the goal defined at the beginning of the Service Challenge.  The full chain of data transfer was exercised.  SC3 service phase: we showed very good peak performance.

35 General view of SC3  When everything ran smoothly, BNL got very good results: 100 MB/s.  The middleware (FTS) is stable, but there were still many compatibility issues:  FTS does not work effectively with the new version of dCache/SRM (version 1.3).  We had to turn off FTS-controlled direct data transfer into the dCache pools, since many timeout errors completely blocked the data transfer channel.  We need to improve SC operations, including performance monitoring and timely problem reporting, to prevent degradation and allow quick fixes.  We fixed many dCache issues after its upgrade and tuned the dCache system to work under the FTS/ATLAS DDM (DQ2) system.  We achieved the best performance among the dCache sites that participated in the ATLAS SC3 service phase; 15 TB of data was transferred to BNL. Sites using the CASTOR SRM showed better performance.

SC3 re-run and SC4 Planning

37 SC3 re-run  We upgraded the BNL dCache core server OS to RHEL 4 and the dCache software, starting Dec 7, 2005.  We will add a few more dCache pool nodes if the software upgrade does not meet our expectations.  FTS should be upgraded if the fix needed to prevent channel blocking is ready before the new year.  The LCG BDII needs to report the status of dCache and FTS (before Christmas).  We would like to schedule a test period at the beginning of January for stability and scalability.  Everything should be ready by January 9.  The re-run will start on January 16.

38 What will our SC4 configuration look like (network, servers, software, etc.)?  The physical network location for SC4 is shown in Slide 15.  We subscribed two subnets (a /24 and a /23) to the LHCOPN. The current dCache instance will be on these two subnets; the new dCache instance for LHC/SC4 will be in the /24 exclusively.  10 dCache write/read pool servers.  4 door servers (RAL has already merged door nodes with pool nodes; we will evaluate whether that is doable at BNL).  2 core servers (dCache PNFS manager and SRM server).  The newest dCache production release.

39 BNL Service Challenge 4 Plan  Several steps are needed to set up hardware or services (e.g. choose, procure, start install, end install, make operational), starting in January and ending before the beginning of March.  LAN, tape system.  FTS, LFC, DDM, LCG VO boxes and other baseline services will be maintained under the agreed SLA and supported by the USATLAS VO.  A dedicated LHC dCache/SRM write pool providing up to 17 terabytes of storage (24 hours' worth of data), to be done in sync with the LAN and WAN work.  Deploy and strengthen the necessary monitoring infrastructure based on Ganglia, Nagios, MonALISA and LCG R-GMA (February).  Drill for service integration (March): simulate network failures and server crashes, and exercise how the support center will respond.  Tier 0/Tier 1 end-to-end high-performance network operational: bandwidth, stability and performance.

40 BNL Service Challenge 4 Plan  April 2006: establish stable data transfer at 200 MB/s to disk and 200 MB/s to tape.  May 2006: disk and computing farm upgrades.  July 1, 2006: stable data transfer driven by the ATLAS production system and ATLAS data management infrastructure between T0 and T1 (200 MB/s), and provide services that satisfy the SLA (service level agreement).  Details of involving the Tier 2s are in planning too (February and March).  Tier 2 dCache: the UC dCache needs to be stabilized and operational in February; UTA and BU need to have dCache in March.  Baseline client tools should be deployed at the Tier 2 centers.  Baseline services should support Tier 1 to Tier 2 data transfer before SC4 starts.

3D project

42 Oracle part  Tier 0 – Tier 1  Oracle  Oracle Streams replication  BNL joined the 3D replication testbed.  Streams replication was set up successfully between CERN and BNL in Oct 2005.  Several experiments foresee Oracle clusters for online systems.  Focus on Oracle database clusters as the main building block for Tier 0 and Tier 1.  Propose to set up pre-production services for March and full service after 6 months of deployment experience.

43 BNL 3D Oracle Production Schedule  Dec 2005: h/w setup (done); two nodes with 500 GB fibre channel storage.  Jan 2006: h/w acceptance tests, RAC (Real Application Clusters) setup.  March 2006: service starts.  May 2006: service review -> h/w defined for full production.  September 2006: full database service in place.

44 MySQL Database replication at BNL  Oracle – MySQL replication:  Database: ATLAS TAG DB  DB server at BNL: dbdevel2 (MySQL)  Use case: Oracle at CERN to MySQL at BNL (push)  Tool: Octopus replicator (Java-based extraction, transformation and loading)  Thanks to Julius Hrivnac (LAL, Orsay) and Kristo Karr (ANL) for a successful collaboration.  More details are in the Twiki.

45 MySQL Database replication at BNL  MySQL – MySQL replication:  Databases:  Geometry DB ATLASDD  MySQL conditions DBs LArNBDC2 and LArIOVDC2  MySQL DB servers at BNL:  dbdevel1.usatlas.bnl.gov (MySQL)  db1.usatlas.bnl.gov (MySQL)  We collected the first experience with CERN-BNL ATLAS DB replication.  The procedure uses both mysqldump and on-line replication (a sketch of the mysqldump step is below).  The current versions correspond to the most recent ATLAS production release.
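A hedged sketch of the mysqldump-based copy step mentioned above: dump one database at the source and replay it at the destination. The source host, user names, and credentials handling are placeholders, not the real CERN/BNL configuration; only the destination server and database names are taken from the slide.

```python
# Sketch of a one-shot mysqldump copy; host/user names are placeholders.
import subprocess

SRC_HOST = "source-db.example.cern.ch"   # hypothetical source MySQL server
DST_HOST = "db1.usatlas.bnl.gov"         # destination server named on the slide
DATABASE = "ATLASDD"                     # geometry DB named on the slide

# mysqldump writes SQL to stdout; the mysql client on the destination replays it.
dump = subprocess.Popen(
    ["mysqldump", "-h", SRC_HOST, "-u", "reader", DATABASE],
    stdout=subprocess.PIPE,
)
load = subprocess.run(
    ["mysql", "-h", DST_HOST, "-u", "writer", DATABASE],
    stdin=dump.stdout,
)
dump.stdout.close()
dump.wait()
```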

LCG 2 at BNL

47 Summary  The LCG setup at BNL is partially functional. The LCG VO box was used in SC3. There are no technical difficulties or hurdles preventing the CE and SE from becoming fully functional.  Deployed on a mix of hardware (Dell 3.0 GHz and some VA Linux nodes): we deployed a CE, RB, SE, proxy server, monitoring node (R-GMA), and a collection of worker nodes. Some services are combined on a single server.

48 Progress and To Do  OS and LCG system installation and configuration is automated and can be redone on new hardware within 2 hours.  Managed via RPM and updatable via local YUM repositories, which are automatically rebuilt from CERN and other upstream sources.  GUMS controls the LCG grid-mapfile.  Site information is being published correctly, and some SFTs (site functional tests) run from CERN operations complete successfully.  We still need to configure LCG to run Condor jobs on the ATLAS pool.

BNL USATLAS Grid Testbed

50 BNL USATLAS OSG Configuration [Diagram: grid users submit grid job requests over the internet to the OSG gatekeepers; the RHIC/USATLAS job scheduler (Condor) dispatches jobs to the farm; storage is provided by SRM/GridFTP servers, dCache, NFS, Panasas disks, and HPSS movers backed by HPSS.]

PHENIX Data Transfer Activities

52 Courtesy of Y. Watanabe

53

54 Data Transfer to CCJ  The 2005 RHIC run ended on June 24; the plot above shows the last day of the run.  The total data transferred to CCJ (the Computing Center in Japan) was 260 TB (polarized p+p raw data).  100% of the data was transferred via the WAN; the tool used was GridFTP. No 747 involved.  Average data rate: 60-90 MB/s; peak performance of 100 MB/s recorded in the Ganglia plot. About 5 TB/day! Courtesy of Y. Watanabe

55 Network Monitoring on NAT Box

56 [Plot; axis label: Month and Year]

57 Network Monitoring at Perimeter Router

58 Network Monitoring at CCJ, JAPAN

59 Our Role  Provide effective and efficient network/grid solutions for data transfer.  Install grid tools on the PHENIX buffer boxes.  Tune the performance of the network path along the PHENIX counting house / RCF / BNL LAN.  Install Ganglia monitoring tools for the data transfer.  Diagnose problems and provide fixes.  For future PHENIX data transfers we will continue to play these roles; we will integrate dCache/SRM into the future transfers and automate them.  Ofer maintains the PHENIX dCache/SRM pools and works on pilot data transfers from the PHENIX dCache/SRM to CCJ.

60 Lessons Learned  Four monitoring systems (BNL NAT Ganglia, router MRTG (Multi-Router Traffic Grapher), CCJ Ganglia, and the data transfer monitoring) caught errors at an early stage.  The EXT3 file system is not designed for high-performance data transfer.  XFS has much better disk I/O performance at high bandwidth; this experience was reused in LHC Service Challenge 3 for the ATLAS experiment.  The Broadcom BCM95703 copper gigabit network card has far fewer packet errors than the Intel Pro/1000.  There were several ESnet/SINET network outages; traffic was rerouted to alternative paths. Problems were promptly discovered and resolved by on-call personnel and network engineers. Because of the large disk caches at both ends, no data were lost due to the network outages.