U.S. ATLAS Facilities Jim Shank Boston University (Dantong Yu, Rob Gardner, Kaushik De, Torre Wenaus, others)

2 Outline  U.S. Facilities: Computing and storage.  SRM.  ATLAS Distributed Data Management.  ATLAS Production Support.  Service Challenge and Data Challenges.  Problems to be discussed.

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 3 U.S. Facilities  Tier 1 – Brookhaven National Laboratory  Northeast Tier 2 – Boston University, Harvard University  Midwest Tier 2 – University of Chicago, Indiana University  Southwest Tier 2 – University of Texas at Arlington, Oklahoma University, Langston University  Western Tier 2 – SLAC + others  Great Lakes Tier 2 – University of Michigan, Michigan State University  Also a muon calibration center  All U.S. facilities use the integrated Panda/DQ2 system for managed production and distributed analysis  All U.S. facilities are dedicated to ATLAS, but run other VO jobs through the Open Science Grid.

4 US Facilities and Open Science Grid

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 5 US ATLAS Capacities  Includes internationally managed ATLAS resources plus some capacity retained under US ATLAS control to support local analysis needs

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 6 Capacity Projections for US Tier 2’s  Totals include dedicated capacities committed to international ATLAS plus those retained under US control for local physicists

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 7 U.S. Sites Availability  Tier 1 is available 24x7 except for scheduled downtimes  Some Tier 2's are available 24x7, others on a best-effort basis  Past experience from ATLAS Computing System Commissioning (CSC) production  Lost production time was ~1% for the U.S. facilities combined in 2006  The Panda recovery mechanism allows sites to operate independently for many hours in the event of a short Tier 1 or Tier 0 downtime – jobs are recovered when services resume (for example, see the spikes in walltime usage plots after each valley; a sketch of the idea follows below)  The U.S. shift team coordinates all production (managed, DA…) and helps sites by reporting and debugging problems
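The recovery behavior described above boils down to buffering and replay. A minimal sketch of the idea (not PanDA's actual pilot code; the endpoint URL and the report format below are invented for illustration):

```python
import json
import urllib.request
from collections import deque

PANDA_URL = "https://pandaserver.example.org/updateJob"  # hypothetical endpoint

pending = deque()  # job-status reports buffered while the server is unreachable

def post(report: dict) -> bool:
    """Try to deliver one status report; return False if the server is down."""
    data = json.dumps(report).encode()
    req = urllib.request.Request(PANDA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except OSError:
        return False

def report_status(report: dict) -> None:
    """Queue a report, then flush the backlog oldest-first while the server answers."""
    pending.append(report)
    while pending and post(pending[0]):
        pending.popleft()  # delivered; drop it from the backlog
    # Anything left stays queued until the next call, so the site keeps running
    # jobs through a short Tier 0/Tier 1 outage without losing their results.

# e.g. report_status({"jobID": 42, "state": "finished"})
```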

8 BNL Computing Facility Upgrade  Addition of 4.5 people (a 45% increase), focused on data management and operations.  Equipment upgrade  CPU farm: 700 kSI2k → 1300 kSI2k (a factor of ~2); 160 dual-core, dual-CPU AMD nodes were added, chosen for better power consumption than the Intel pre-Woodcrest Xeon processors.  dCache-managed, Linux-farm-mounted disk: 140 TB → 420 TB  Upgraded the ATLAS LAN backbone to a fully redundant 20 Gb/sec  Replaced 25 TB of obsolescent NFS-served Fibre Channel/RAID disk with new equipment  Major mass storage upgrade: a new storage silo with 8 drives (2.6 PB capacity at ~400 MB/sec)

9 BNL Tier 1 WAN Storage Interfaces and Logical View  [Diagram: 2x10 Gb/s WAN / LHC OPN VLAN feeding gridftp doors (4 nodes) and the SRM front ends (dCache SRM and HRM SRM, 1 node each, plus gridftp on 2 nodes / 0.8 TB local); behind them, NFS RAID (20 TB), the HPSS Mass Storage System, the dCache write pool (10 nodes / 2.1 TB RAID5), and the farm pool (474 nodes / 460 TB), connected over the Tier 1 VLANs at 20 Gb/s with 1 Gb/s logical connections to individual nodes.]

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 10 Midwest Tier2 Hardware Profile Processors –72 dual-CPU, dual-core AMD Opteron 285 (2.6 GHz) nodes: 154k SI2K Storage –500 GB local scratch –5 x 500 GB hardware RAID5 per node (2.5 TB/node dCache pools) –65 TB dCache Edge servers –for dCache, DQ2, NFS (OSG, /home), management services Interconnect and 10G WAN connectivity –Cisco 6509 at UC, Force10 at IU; 10G blades (for four hosts, 2 at each site) Cluster management –Cyclades terminal servers for console logging –Ethernet-accessible power distribution units for power management

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 11 Layout Dual role for worker nodes –Four processing cores –dCache R/W pool (2.5 TB) –500 GB scratch Edge servers –3 dCache services nodes dc1: gridFTP, dcap, SRM dc2: pnfs server, Postgres dc3: admin, gridFTP, dcap –DQ2 –OSG gatekeeper –Login Network –UC: Cisco, w/10G iWIRE to Starlight –IU: Force10, w/10G iLIGHT to Starlight Other services deployed: –OpenPBS, Ganglia, Nagios IU site same except all nodes on public network and Force10 switch

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 12 10G Network Tests Tests used gridftpPRO with several hosts at each end. Plots show copy rates of ~200 MB/s from IU to UC; another test, UC to IU, reached ~400 MB/s; one 30-minute interval achieved 539 MB/s.

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 13 Network Testing II 10 simultaneous transfers were executed during each iteration (256 iterations in total), each based on a bbftpPRO file transfer command. Each transfer used 10 parallel streams to move a 1.7 GB file. Each file transfer was performed between two different hosts, one at UC, the other at IU. Each host had a 1 Gb/s capable NIC. TCP window size, MTU limits, etc. were not adjusted (a driver sketch follows below).
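For illustration, one iteration of such a test can be driven with a short script. This sketch uses globus-url-copy (the standard gridFTP client, whose -p flag requests parallel TCP streams) rather than the bbftpPRO command the slide describes, and the host names and file paths are invented:

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host pairs; the real tests paired specific UC and IU nodes.
PAIRS = [(f"uct2-c{i:03d}.uchicago.edu", f"iut2-c{i:03d}.iu.edu")
         for i in range(10)]
FILE_GB = 1.7

def one_transfer(src_host: str, dst_host: str) -> None:
    # -p 10 asks gridFTP for 10 parallel streams, as in the slide's setup.
    subprocess.run(
        ["globus-url-copy", "-p", "10",
         f"gsiftp://{src_host}/scratch/testfile",
         f"gsiftp://{dst_host}/scratch/testfile"],
        check=True)

start = time.time()
with ThreadPoolExecutor(max_workers=len(PAIRS)) as pool:
    # 10 simultaneous transfers, one per host pair
    list(pool.map(lambda p: one_transfer(*p), PAIRS))
elapsed = time.time() - start
print(f"aggregate rate ~ {len(PAIRS) * FILE_GB * 1024 / elapsed:.0f} MB/s")
```

Running this in a loop (256 iterations) and logging the per-iteration rate reproduces the kind of time series shown in the plots.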

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 14 Tier2 Operations Model Developing an operations model of weekly shifts –one person is designated as the primary shift operator, to proactively monitor equipment, cluster services, ATLAS jobs, and DDM services –a secondary person serves as backup Interface to the ATLAS production system group through an RT trouble ticketing system Coverage is during normal working hours –with best effort on weekends and evenings One day a month of scheduled downtime for maintenance

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 15 PanDA

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 16 CPU Usage/day For Successful Jobs ATLAS Computing System Commissioning (CSC) exercise

17 CPU Usage/day For Successful Jobs

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 18 CSC: What Worked Well  The U.S. Production Shift Team  Strong coordination between the Tier 1 and all Tier 2 sites: regularly scheduled meetings (weekly phone calls, face-to-face every 6 months)  Coordinated Tier 2 purchases (sometimes from the same vendor) and sharing of experience  Coordinated OSG deployment (phased for the Tier 2's where possible to avoid production interruptions)  Coordinated DQ2 deployment – but all sites run independent DQ2 site services for robustness

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 19 CSC: What Needs Improvement  Data management  We are still experimenting with different storage solutions – dCache, IBRIX, GFS, NFS…  Not every site has an SRM/DRM solution implemented yet  DDM operations and scalability issues  Central services  Database services at Tier 1  Scaling to 2008 capacities  Gradually ramping up in 2007

20 SRM Throughput Performance  During the Service Challenge throughput phase, we observed our system reach 250 MB/s for one day (intensive writes into BNL).  Intensive monitoring and system tuning; experts ran the system.  Many manual interventions and much coordination between CERN, BNL, and the other Tier 1s.  When the Service Challenge is coupled with data migration to HPSS and farm nodes, along with USATLAS production, the dCache system sustains 120 MB/s. This is real-life performance (intensive reads + writes).  Problems (a benchmark sketch follows below):  PNFS & SRM performance bottlenecks.  The Linux SCSI driver/buffer cache cannot efficiently handle parallel read and write streams.  Exclusive read or write performance is good, but we see a 50%~80% performance degradation when mixing read and write streams.  Software RAID, the Linux Volume Manager, and the file system affect disk I/O performance too, but these effects are relatively minor compared to the Linux kernel buffer cache problem.
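The mixed-stream degradation can be reproduced with a small benchmark. A minimal sketch, assuming a pool mounted at a hypothetical path; a realistic run would use files much larger than RAM and drop the page cache between phases so the numbers reflect the disks rather than memory:

```python
import os
import threading
import time

PATH = "/mnt/pool"   # hypothetical dCache pool mount point
CHUNK = 1 << 20      # 1 MiB
N_CHUNKS = 2048      # 2 GiB per stream; use far more than RAM in a real test
buf = os.urandom(CHUNK)

def writer(name):
    with open(os.path.join(PATH, name), "wb") as f:
        for _ in range(N_CHUNKS):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())   # make sure the data really hit the disks

def reader(name):
    with open(os.path.join(PATH, name), "rb") as f:
        while f.read(CHUNK):
            pass

def measure(threads):
    """Run the given streams concurrently and return aggregate MiB/s."""
    start = time.time()
    for t in threads: t.start()
    for t in threads: t.join()
    return len(threads) * N_CHUNKS / (time.time() - start)

writer("seed")   # create a file for the read phases
w = measure([threading.Thread(target=writer, args=("w0",))])
r = measure([threading.Thread(target=reader, args=("seed",))])
mixed = measure([threading.Thread(target=writer, args=("w1",)),
                 threading.Thread(target=reader, args=("seed",))])
print(f"write-only {w:.0f} MiB/s, read-only {r:.0f} MiB/s, mixed {mixed:.0f} MiB/s")
```

On the hardware described above, the mixed figure would come out 50%~80% below the exclusive read or write figures.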

21 Solutions to Problems  Filesystem: write pools ext3 → xfs (more important on RHEL3, less improvement on RHEL4). Have other sites verified this?  Tune the Postgres databases behind PNFS and SRM to improve performance (Postgres shm buffers, splitting the DB from core services, HW RAID disk).  Reconfigure the Linux kernel buffer and cache? One topic to be discussed.  Avoid mixing read and write operations to disks:  dCache has a central flushing system which alternates between writing data into dCache and migrating data into HPSS.  Put the SRM database in memory to improve the transaction rate, since the SRM DB is currently transient and no history needs to be kept.  Multiple SRM servers (DNS round robin or IP Virtual Server; see the sketch below).  An alternative OS to improve disk I/O performance: BNL ran an intensive Sun Thumper test (Ofer Rind), shown in the following slides. Is any site interested in this?
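Of these options, DNS round robin is the lightest-weight: clients keep using a single alias while DNS spreads requests across the servers behind it. A small sketch of the client-side effect (the alias srm.example.org is invented):

```python
import itertools
import socket

def srm_endpoints(alias: str, port: int = 8443):
    """Resolve every A record behind a DNS alias (the round-robin pool)."""
    infos = socket.getaddrinfo(alias, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})  # unique server IPs

# e.g. one DNS alias fronting several SRM servers
servers = itertools.cycle(srm_endpoints("srm.example.org"))

def next_srm_url(path: str) -> str:
    """Hand each request to the next server in the pool."""
    return f"srm://{next(servers)}:8443{path}"
```

With plain DNS round robin the rotation happens in the resolver rather than in code like this; an IP Virtual Server achieves the same spreading at the network layer with health checks added.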

22 SUN Thumper Test Results  150 clients sequentially reading 5 random 1.4 GB files.  Throughput was 350 MB/s for almost 1 hour; the test was done with dccp, and GridFTP shows the same results.  75 clients sequentially writing 3 x 1.4 GB files and 75 clients sequentially reading 4 x 1.4 GB randomly selected files.  dccp throughput: 200 MB/s write & 100 MB/s read.  srmcp throughput: 100 MB/s read & 170 MB/s write.

23 USATLAS DDM Operation  USATLAS started using DQ2 to manage production more than a year ago.  BNL physicists have started using DQ2 to transfer data.  Four instances are deployed at BNL; each Tier 2 has one or two instances.  The Tier 1 added many customizations to DDM to clean up failed/stuck data transfers in FTS/dCache and invalid datasets and their registrations in the LRC (a schematic sketch follows below).  Running DQ2 still relies on DQ2 experts, which requires a large DDM operations team; BNL has 2.5 FTE on DQ2 operation and maintenance.  The reliability/stability of DDM needs to improve for the coming ramp-up in data transfers; DQ2 experts currently monitor intensively during nights and weekends.  A new operations page is being developed to ease basic DQ2 operation and to involve the regular RHIC/USATLAS operators in monitoring and emergency recovery.
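The cleanup customizations amount to a periodic sweep over the transfer and catalog state. A schematic sketch in which every helper (list_fts_transfers, cancel_transfer, lrc_delete) is a hypothetical stand-in for the real FTS and LRC interfaces:

```python
STUCK_AFTER = 6 * 3600  # treat an Active transfer as stuck after 6 h (arbitrary)

def list_fts_transfers():
    """Stand-in for an FTS channel query; yields (job_id, state, age_s, lfn)."""
    return [("job-001", "Failed",        120, "lfn:/grid/atlas/file1"),
            ("job-002", "Active",   8 * 3600, "lfn:/grid/atlas/file2"),
            ("job-003", "Active",        600, "lfn:/grid/atlas/file3")]

def cancel_transfer(job_id):           # stand-in for the FTS cancel call
    print("cancel FTS job:", job_id)

def lrc_delete(lfn):                   # stand-in for dropping a bad LRC entry
    print("remove LRC registration:", lfn)

def sweep():
    """One pass: cancel failed/stuck transfers and drop their LRC registrations
    so that DQ2 site services can retry them cleanly."""
    for job_id, state, age_s, lfn in list_fts_transfers():
        if state == "Failed" or (state == "Active" and age_s > STUCK_AFTER):
            cancel_transfer(job_id)
            lrc_delete(lfn)

if __name__ == "__main__":
    sweep()  # in production this would run from cron every few minutes
```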

24 ATLAS Production Support at BNL  USATLAS production uses a high-performance MySQL cluster.  Never underestimate users' capacity to use up your DB.  Work with users interactively to develop insight into their requirements, and then provide a solution. Plan ahead!

25 MySQL DB Status

26 MySQL DB for Panda Production  A joint effort between the ACF (T1) and the USATLAS Computing group.  A memory-based MySQL Cluster with 2 front-end nodes and 2 memory-based storage nodes.  Users experience DB slowness and instability with each "lock time out".  As far as query performance goes, here are the top 5 queries by time, before and after the addition of indexes. Lock time is on the order of 0.01 s, vs. 1~2 s before indexing; server load fell from 2.25 to 1.5, and CPU utilization went from 30% to 15%.

Table         Op      Code Tag                                                   Time: before / after
========================================================================
filesTable4   UPDATE  DBProxy.updateInFilesReturnPandaIDs                        <1
jobsArchived  SELECT  (no tag; PandaOverview.getErrors)
Datasets      SELECT  DBProxy.queryDatasetWithMap
filesTable4   SELECT  DBProxy.countFilesWithMap
jobsActive4   UPDATE  DBProxy.getJobs (important one to speed up job dispatch)

 Will continue to scale database performance by two orders of magnitude. Options to further improve the MySQL DB (an index sketch follows below): 1) the InnoDB engine; 2) a master/slave architecture; 3) the FroNTier project; 4) continued optimization of the DB programming in the Panda production system; 5) Oracle RAC.
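The index additions themselves are one statement per table. A hedged sketch using the classic MySQLdb driver; the indexed columns are guesses inferred from the query names above, not the actual Panda schema, and the connection parameters are invented:

```python
import MySQLdb  # the MySQL-Python driver of that era (mysqlclient today)

# Columns are assumptions based on the slow-query tags above.
INDEXES = [
    "CREATE INDEX idx_files_pandaid ON filesTable4 (PandaID)",
    "CREATE INDEX idx_jobs_dispatch ON jobsActive4 (jobStatus, computingSite)",
    "CREATE INDEX idx_ds_name       ON Datasets (name)",
]

conn = MySQLdb.connect(host="pandadb.example.org", user="panda",
                       passwd="secret", db="pandadb")
cur = conn.cursor()
for stmt in INDEXES:
    cur.execute(stmt)

# Verify the optimizer now uses an index instead of a full table scan.
cur.execute("EXPLAIN SELECT * FROM jobsActive4 "
            "WHERE jobStatus='activated' AND computingSite='BNL_ATLAS_1'")
for row in cur.fetchall():
    print(row)
conn.close()
```

Checking EXPLAIN output before and after is how a change like the lock-time drop reported above (1~2 s down to ~0.01 s) gets confirmed.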

27 WLCG Service Challenge 4 (SC4)

28 ATLAS SC4 Service Phase Activity

29 Various Measurements of USATLAS FTS Data Transfer  [Plot, Dec 1, 2006 – Jan 22, 2007, with regions marked "normal US production" and "load generator (not limit)": file transfer rate (MB/sec) from the Tier 2s to BNL by US production, as measured in the BNL FTS.] The typical transfer rate from the Tier 2s to BNL is several MB/sec. Note: this is nowhere close to the limit of the BNL facility, as the load generator can produce much higher transfer rates.

30 Number of Files Transferred  [Plot, Dec 1, 2006 – Jan 22, 2007, with regions marked "normal US production" and "load generator (not limit)": number of files of different sizes transferred per day from the Tier 2s to BNL by US production.] DQ2 transfers several thousand files a day for US production.

31 Summary  Developed a useful dCache benchmark (throughput; number of dCache dccp/SRM transactions per second).  Backup slides contain the benchmark results (see Dantong Yu).  The BNL dCache will be upgraded to dCache 1.7 on Jan 30.  Developing an operations document to allow the existing operations staff to monitor Grid-based USATLAS production, to report and solve problems, and to improve on and meet the service level agreement.

32 SRM Performance Issues  Cleanup of the SRM DB showed a significant performance improvement:  Before the cleanup, with 40 simultaneous SRM operations we observed a large number of SRM errors and system performance decreased dramatically.  After the cleanup, at 70 simultaneous SRM operations dCache still sustains stable data transfers; further intensive tests are needed to find the threshold.  The SRM transaction rate is determined by SRM load. Copying 450 short files (1 KB per file) with different client concurrencies (a driver sketch follows below):  10 clients: 120 SRM transactions per minute.  50 clients: 30 SRM transactions per minute.  70 clients: 26 SRM transactions per minute.  Tested new hardware & tried an in-memory DB (tmpfs):  60 concurrent file transfers (disk): 46 transactions per minute.  60 concurrent file transfers (tmpfs): 63 transactions per minute (40% better).
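Transaction-rate numbers like these can be collected with a small driver that runs N concurrent srmcp clients over a batch of short files. A sketch with an invented SRM endpoint and paths:

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

SRM = "srm://dcsrm.example.org:8443/pnfs/example.org/data/test"  # invented
N_FILES = 450
CLIENTS = 10  # vary across runs: 10, 50, 70 ...

def copy_one(i: int) -> None:
    # Each SRM transaction pulls one 1 KB file to local scratch.
    subprocess.run(["srmcp",
                    f"{SRM}/small{i:04d}",
                    f"file:////tmp/small{i:04d}"],
                   check=True)

start = time.time()
with ThreadPoolExecutor(max_workers=CLIENTS) as pool:
    list(pool.map(copy_one, range(N_FILES)))
minutes = (time.time() - start) / 60
print(f"{CLIENTS} clients: {N_FILES / minutes:.0f} SRM transactions per minute")
```

Sweeping CLIENTS while holding N_FILES fixed reproduces the concurrency series above, and pointing the SRM's Postgres data directory at disk vs. tmpfs reproduces the 46 vs. 63 transactions-per-minute comparison.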

33 Backup slides

34 Current Usage continues…  [Plot, Dec 1, 2006 – Jan 22, 2007, with regions marked "normal US production" and "load generator (not limit)".]

35 Current Usage continues…

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 36 CSC Production in 2006

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 37 U.S. ATLAS Production

38 dccp Performance Issues  dccp tests on small files (1 KB):  10 concurrent files: 222 files per minute  20 concurrent files: 150 files per minute

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 39 Software Profile Platform: SLC4 –Linux uct2-grid EL.cernsmp #1 SMP Fri Oct 6 12:07:54 CEST 2006 i686 athlon i386 GNU/Linux –xfs filesystem: benchmarked at 133 MB/s R/W OpenPBS –Simple: one queue with a 72 hour wall-time limit Cluster management tools from ACT –Image “cloner” and “beo_exec” command script dCache full bundle (server, client, postgres, dcap) OSG GUMS –Configured to authorize atlas and OSG proxies ATLAS –Releases: kitval –DQ2 site services installed via dq2.sh

40 Service Challenge 4 Disk/Tape Split

41 January BNL / Tier 2 Transfer Stability Exercise

42 BNL / Tier 2 Production Traffic Monitored by NetFlow

43 BNL and Tier 2 Production Data Transfer
AOD: 141K files, 139 TB
ESD: 128K files, 91 TB
Total: 1372K files, 550 TB

J. Shank US ATLAS Status wLCG Workshop 24 Jan., 2007 CERN 44 Tier 1 Utilization (2)  Heavy Utilization and Demand