1 U.S. ATLAS Facilities. Jim Shank, Boston University (with Danton Yu, Rob Gardner, Kaushik De, Torre Wenaus, and others). wLCG Workshop, CERN, 24 Jan. 2007.

2 Outline • U.S. Facilities: computing and storage. • SRM. • ATLAS Distributed Data Management. • ATLAS Production Support. • Service Challenges and Data Challenges. • Problems to be discussed.

3 U.S. Facilities • Tier 1 – Brookhaven National Laboratory • Northeast Tier 2 – Boston University, Harvard University • Midwest Tier 2 – University of Chicago, Indiana University • Southwest Tier 2 – University of Texas at Arlington, Oklahoma University, Langston University • Western Tier 2 – SLAC + others • Great Lakes Tier 2 – University of Michigan, Michigan State University • Also a muon calibration center • All U.S. facilities use the integrated Panda/DQ2 system for managed production and distributed analysis • All U.S. facilities are dedicated to ATLAS, but run other VO jobs through the Open Science Grid.

4 US Facilities and Open Science Grid

5 US ATLAS Capacities • Includes internationally managed ATLAS resources plus some capacity retained under US ATLAS control to support local analysis needs

6 Capacity Projections for US Tier 2s • Totals include dedicated capacities committed to international ATLAS plus those retained under US control for local physicists

7 U.S. Sites Availability • Tier 1 is available 24x7 except for scheduled downtimes • Some Tier 2s are available 24x7, others on a best-effort basis • Past experience comes from ATLAS Computing System Commissioning (CSC) production • Lost production time was ~1% for U.S. facilities combined in 2006 • The Panda recovery mechanism allows sites to operate independently for many hours in the event of a short Tier 1 or Tier 0 downtime; jobs are recovered when services resume (for example, see the spikes in the walltime usage plots after each valley) • The U.S. shift team coordinates all production (managed, DA…) and helps sites by reporting and debugging problems

8 BNL Computing Facility Upgrade • Addition of 4.5 people: a 45% increase, with focus on data management and operation. • Equipment upgrade: • CPU farm: 700 kSI2k → 1300 kSI2k (factor of ~2); 160 dual-core, dual-CPU AMD nodes were added, chosen for better power consumption than the pre-Woodcrest Intel Xeon processors. • dCache-managed, Linux-farm-mounted disk: 140 TB → 420 TB. • Upgraded the ATLAS LAN backbone to fully redundant 20 Gb/s. • Replaced 25 TB of obsolescing NFS-served Fibre Channel / RAID disk with new equipment. • Major mass storage upgrade: a new storage silo with 8 drives (2.6 PB capacity at ~400 MB/s).

9 BNL Tier 1 WAN Storage Interfaces and Logical View (diagram): HPSS mass storage system and NFS RAID (20 TB) behind an HRM SRM (1 node) and a dCache SRM (1 node); gridftp (2 nodes / 0.8 TB local) and dCache gridftp doors (4 nodes); dCache write pool (10 nodes / 2.1 TB RAID5) and farm pool (474 nodes / 460 TB); Tier 1 VLANs at 20 Gb/s internally, WAN connectivity via a 2 x 10 Gb/s LHC OPN VLAN.

10 Midwest Tier2 Hardware Profile Processors –72 Dual CPU, dual core AMD Opteron 285 (2.6 GHz): 154k SI2K Storage –500 GB local scratch –5 x 500 GB Hardware RAID5 / node (2.5 TB/node dCache pools) –65 TB dCache Edge servers –for dCache, DQ2, NFS (OSG, /home), mgt services Interconnect and 10G WAN connectivity –Cisco 6509/UC, Force10/IU; 10G blades (for four hosts, 2 at each site) Cluster management –Cyclades terminal servers for console logging –Ethernet accessible power distribution units for power management

11 Layout Dual role for worker nodes –Four processing cores –dCache R/W pool (2.5 TB) –500 GB scratch Edge servers –3 dCache services nodes: dc1: gridFTP, dcap, SRM; dc2: pnfs server, Postgres; dc3: admin, gridFTP, dcap –DQ2 –OSG gatekeeper –Login Network –UC: Cisco, w/10G iWIRE to Starlight –IU: Force10, w/10G iLIGHT to Starlight Other services deployed: –OpenPBS, Ganglia, Nagios IU site same except all nodes on public network and Force10 switch http://plone.mwt2.org/monitors

12 10G Network Tests. Tests using griftpPRO with several hosts at each end. Plots show copy rates of ~200 MB/s IU to UC. Another test, UC to IU, reached ~400 MB/s. One 30-minute interval achieved 539 MB/s.

13 Network Testing II. Each iteration executed 10 simultaneous transfers (256 iterations in total), each based on a bbftpPRO file transfer command. Each transfer used 10 parallel streams to move a 1.7 GB file. Each file transfer was performed between two different hosts, one at UC and the other at IU. Each host had a 1 Gb/s capable NIC. TCP window size, MTU limits, etc. were not tuned.
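
As an illustration of the harness behind numbers like these (not the actual MWT2 test scripts): a minimal Python sketch that runs 10 simultaneous transfers per iteration and reports aggregate MB/s. The transfer command is a placeholder for the bbftpPRO invocation, and the host names are hypothetical.

    #!/usr/bin/env python3
    # Minimal sketch of an iteration-based transfer test: N simultaneous copies
    # per iteration, aggregate MB/s reported. Placeholder command and host names;
    # the real test used bbftpPRO with 10 parallel streams per transfer.
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    TRANSFERS_PER_ITERATION = 10            # from the slide
    FILE_SIZE_MB = 1.7 * 1000               # 1.7 GB per file, in decimal MB
    HOST_PAIRS = [(f"uct2-{i:02d}", f"iut2-{i:02d}")          # hypothetical host names
                  for i in range(TRANSFERS_PER_ITERATION)]

    def one_transfer(pair):
        src, dst = pair
        # Placeholder so the sketch stays runnable; substitute the real transfer command here.
        subprocess.run(["echo", f"transfer {src} -> {dst}"], check=True)

    def one_iteration():
        start = time.time()
        with ThreadPoolExecutor(max_workers=TRANSFERS_PER_ITERATION) as pool:
            list(pool.map(one_transfer, HOST_PAIRS))
        elapsed = time.time() - start
        return TRANSFERS_PER_ITERATION * FILE_SIZE_MB / elapsed   # aggregate MB/s

    if __name__ == "__main__":
        for i in range(3):                  # the slide describes 256 iterations
            print(f"iteration {i}: {one_iteration():.1f} MB/s")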

14 Tier2 Operations Model Developing an operations model of weekly shifts –one person designated as a primary shift operator to proactively monitor equipment, cluster services, ATLAS jobs and DDM services –a secondary person serves as backup Interface to the ATLAS production system group through email and the RT trouble ticketing system Coverage is during normal working hours –with best effort on weekends and evenings One day a month scheduled downtime for maintenance

15 PanDA

16 CPU Usage/day for Successful Jobs (ATLAS Computing System Commissioning (CSC) exercise)

17 CPU Usage/day for Successful Jobs

18 CSC: What Worked Well • U.S. Production Shift Team • Strong coordination between Tier 1 and all Tier 2 sites (regularly scheduled meetings: weekly phone, face-to-face every 6 months) • Coordinated Tier 2 purchases (sometimes from the same vendor), sharing of experience • Coordinated OSG deployment (phased for Tier 2s where possible to avoid production interruption) • Coordinated DQ2 deployment, but all sites run independent DQ2 site services for robustness

19 CSC: What Needs Improvement • Data management • We are still experimenting with different storage solutions: dCache, IBRIX, GFS, NFS… • Not every site has an SRM/DRM solution implemented yet • DDM operations and scalability issues • Central services • Database services at Tier 1 • Scaling to 2008 capacities • Gradually ramping up in 2007

20 SRM Throughput Performance • During the Service Challenge throughput phase we observed our system reach 250 MB/s for one day (intensive writes into BNL). • Intensive monitoring and system tuning; experts run the system. • Many manual interventions and coordination between CERN, BNL, and other Tier 1s. • When the Service Challenge is coupled with data migration to HPSS and farm nodes, along with USATLAS production, the dCache system can sustain 120 MB/s. This is real-life performance (intensive reads + writes). • Problems: • PNFS and SRM performance bottlenecks. • The Linux SCSI driver/buffer cache cannot efficiently handle parallel read and write streams. • Exclusive read or write performance is good, but we see a 50%-80% performance degradation when mixing read and write streams. • Software RAID, Linux Volume Manager, and the file system also affect disk I/O performance, but these are relatively minor compared to the Linux kernel buffer cache problem.
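
To make the last two bullets concrete, here is a minimal sketch of how the mixing effect can be measured on a pool filesystem. It is not the benchmark BNL ran; the directory, file size, and thread counts are placeholders, and a realistic test would use files much larger than RAM so the page cache cannot hide the disk behaviour.

    #!/usr/bin/env python3
    # Sketch: compare exclusive-write, exclusive-read, and mixed read+write
    # throughput on a filesystem. Paths and sizes are illustrative only.
    import os
    import threading
    import time

    POOL_DIR = "/tmp/pool-test"     # placeholder: point at the pool filesystem under test
    FILE_MB = 256                   # a real test would use files far larger than RAM
    CHUNK = 1 << 20                 # 1 MiB I/O size
    DATA = os.urandom(CHUNK)

    def write_file(path, mb=FILE_MB):
        with open(path, "wb") as f:
            for _ in range(mb):
                f.write(DATA)
            os.fsync(f.fileno())

    def read_file(path):
        with open(path, "rb") as f:
            while f.read(CHUNK):
                pass

    def timed(label, *fns):
        threads = [threading.Thread(target=fn) for fn in fns]
        t0 = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(f"{label}: {FILE_MB * len(fns) / (time.time() - t0):.1f} MB/s aggregate")

    os.makedirs(POOL_DIR, exist_ok=True)
    a = os.path.join(POOL_DIR, "a.dat")
    b = os.path.join(POOL_DIR, "b.dat")
    write_file(b)                           # pre-create a file to read back

    timed("write only", lambda: write_file(a))
    timed("read only ", lambda: read_file(b))
    # Mixed streams: one writer and one reader at the same time; the slide
    # reports a 50-80% degradation here relative to the exclusive cases.
    timed("mixed R+W ", lambda: write_file(a), lambda: read_file(b))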

21 Solutions to Problems • Filesystem: write pools ext3 -> xfs (more important on RHEL3, less improvement on RHEL4). Have other sites verified this? • Tune the Postgres database behind PNFS and SRM to improve performance (Postgres shm buffers, DB and core services split, HW RAID disk). • Reconfigure the Linux kernel buffer and cache? One topic to be discussed. • Avoid mixing read and write operations to disks • dCache 1.7.0 has a central flushing system which alternates between writing data into dCache and migrating data into HPSS • Put the SRM database in memory to improve transaction rate; the SRM DB is currently transient, so no history needs to be kept. • Multiple SRM servers (DNS round robin or IP Virtual Server). • Alternative OS to improve disk I/O performance: BNL did an intensive Sun Thumper test (Ofer Rind), as shown in the following slides. Is any site interested in this?

22 Sun Thumper Test Results • 150 clients sequentially reading 5 random 1.4 GB files. • Throughput is 350 MB/s for almost 1 hour; the test was done with dccp, and GridFTP shows the same results. • 75 clients sequentially writing 3 x 1.4 GB files and 75 clients sequentially reading 4 x 1.4 GB randomly selected files. • dccp throughput is 200 MB/s write and 100 MB/s read. • srmcp throughput is 100 MB/s read and 170 MB/s write.

23 USATLAS DDM Operation • USATLAS began using DQ2 to manage production more than a year ago. • BNL physicists have started using DQ2 to transfer data. • Four instances are deployed at BNL; other Tier 2s have one or two instances each. • Tier 1 added many customizations to DDM to clean up failed/stuck data transfers in FTS/dCache and invalid datasets and their registrations in the LRC. • Running DQ2 still relies on DQ2 experts, which requires a large DDM operations team; BNL has 2.5 FTE on DQ2 operation and maintenance. • The reliability/stability of DDM needs to improve for the ramp-up of data transfers; intensive monitoring by DQ2 experts continues during nights and weekends. • A new operations page is being developed to ease basic DQ2 operation and to involve regular RHIC/USATLAS operators in monitoring and emergency recovery.

24 ATLAS Production Support at BNL • USATLAS production uses a high-performance MySQL cluster. • Never underestimate users' capacity to use up your DB. • Work with users interactively to develop insight into their requirements and then provide a solution. Plan ahead!

25 MySQL DB Status

26 MySQL DB for Panda Production • Joint effort between the ACF (T1) and the USATLAS Computing group. • Memory-based MySQL Cluster with 2 front-end nodes and two memory-based storage nodes. • Users experience DB slowness and instability with each "lock time out". • For query performance, the top 5 queries by time are listed below, with query times before and after the addition of indexes. Lock time is on the order of 0.01 s, vs. 1-2 s before indexing. Server load went from 2.25 to 1.5, and CPU utilization from 30% to 15%.
Table         Op      Code tag                              Time before   after
=============================================================================
filesTable4   UPDATE  DBProxy.updateInFilesReturnPandaIDs   12.44         <1
jobsArchived  SELECT  no tag (PandaOverview.getErrors)      120.82        2.67
Datasets      SELECT  DBProxy.queryDatasetWithMap           13.01         11.0
filesTable4   SELECT  DBProxy.countFilesWithMap             8.92          2.17
jobsActive4   UPDATE  DBProxy.getJobs (important one to
                      speed up job dispatch)                17.94         3.25
• Will continue to scale database performance by two orders of magnitude. Options to further improve the MySQL DB: 1) InnoDB engine, 2) master/slave architecture, 3) FroNtier project, 4) continued optimization of DB programming in the Panda production system, 5) Oracle RAC.
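
A minimal sketch of the kind of change behind the before/after numbers above: add an index on the column a hot query filters on and time the query around it. The table names come from the slide, but the column name (PandaID), index name, and connection settings are assumptions for illustration, not the actual Panda schema or the DDL that was applied.

    #!/usr/bin/env python3
    # Sketch: time a hot query, add an index, time it again.
    # Table names are from the slide; column name, index name, and DSN are placeholders.
    import time
    import mysql.connector   # assumes MySQL Connector/Python is available

    conn = mysql.connector.connect(host="pandadb.example.org",   # placeholder host
                                   user="panda", password="***",
                                   database="panda")
    cur = conn.cursor()

    def timed_query(sql, params=()):
        t0 = time.time()
        cur.execute(sql, params)
        cur.fetchall()
        return time.time() - t0

    HOT_QUERY = "SELECT * FROM filesTable4 WHERE PandaID = %s"   # PandaID column is assumed

    print("before:", round(timed_query(HOT_QUERY, (12345,)), 3), "s")

    # Index on the column the hot SELECT/UPDATE statements filter on (assumed name).
    cur.execute("CREATE INDEX idx_filestable4_pandaid ON filesTable4 (PandaID)")
    conn.commit()

    print("after: ", round(timed_query(HOT_QUERY, (12345,)), 3), "s")

    cur.close()
    conn.close()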

27 WLCG Service Challenge 4 (SC4)

28 ATLAS SC4 Service Phase Activity

29 Various Measurements on USATLAS FTS Data Transfer (plot covering Dec 1, 2006 to Jan 22, 2007, with "load generator (not limit)" and "normal US production" periods annotated). The plot shows the file transfer rate (MB/s) from Tier 2s to BNL by US production, measured in the BNL FTS. The typical transfer rate is several MB/s from Tier 2s to BNL. Note: this is nowhere near the limit of the BNL facility, as the load generator can produce much larger transfer rates.

30 Number of Files Transferred (plot covering Dec 1, 2006 to Jan 22, 2007, with "normal US production" and "load generator (not limit)" periods annotated). The plot shows the number of files of different sizes transferred per day from Tier 2s to BNL by US production. DQ2 transfers several thousand files a day for US production.

31 Summary • Developed a useful dCache benchmark (throughput, number of dCache dccp/SRM transactions per second). • Backup slides contain benchmark results (see Danton Yu). • BNL dCache will be upgraded to dCache 1.7 on Jan 30. • Developing an operations document to allow existing operators to monitor Grid-based USATLAS production, report and solve problems, and improve on and meet the service level agreement.

32 SRM Performance Issues • Cleanup of the SRM DB showed a significant performance improvement: • Before cleanup, with 40 simultaneous SRM operations we observed a large number of SRM errors and system performance decreased dramatically. • After cleanup, dCache still sustains stable data transfers with 70 simultaneous SRM operations; further intensive tests are needed to find the threshold. • The SRM transaction rate is determined by SRM load. Copying 450 short files (1 KB per file) with different client concurrencies: • 10 clients: 120 SRM transactions per minute. • 50 clients: 30 SRM transactions per minute. • 70 clients: 26 SRM transactions per minute. • Tested new hardware and tried an in-memory DB (tmpfs): • 60 concurrent file transfers (disk): 46 transactions per minute. • 60 concurrent file transfers (tmpfs): 63 transactions per minute (40% better).
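
For context, a minimal sketch of the kind of concurrency sweep behind numbers like these: launch N clients copying small files and report transactions per minute. The srmcp invocation is replaced by a placeholder command and the endpoints are hypothetical; this is not the exact test run at BNL.

    #!/usr/bin/env python3
    # Sketch of a client-concurrency sweep: N workers copy 450 small files and
    # the script reports transactions per minute. Placeholder command/endpoints.
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    SRC = "file:////tmp/1kb-testfile"                     # placeholder 1 KB source file
    DST = "srm://dcsrm.example.org:8443/pnfs/test/f{}"    # placeholder SRM endpoint

    def one_copy(i):
        # Placeholder so the sketch stays runnable; substitute the real srmcp command here.
        subprocess.run(["echo", "srmcp", SRC, DST.format(i)], check=True)

    def sweep(n_clients, n_files=450):
        t0 = time.time()
        with ThreadPoolExecutor(max_workers=n_clients) as pool:
            list(pool.map(one_copy, range(n_files)))
        minutes = (time.time() - t0) / 60
        return n_files / minutes            # transactions per minute

    for n in (10, 50, 70):
        print(f"{n:3d} clients: {sweep(n):.0f} transactions/minute")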

33 Backup slides

34 Current Usage continues… (plot covering Dec 1, 2006 to Jan 22, 2007, with "normal US production" and "load generator (not limit)" periods annotated)

35 Current Usage continues…

36 CSC Production in 2006

37 U.S. ATLAS Production

38 dccp Performance Issues • dccp test on small files (1 KB): • 10 concurrent files: 222 files per minute • 20 concurrent files: 150 files per minute

39 Software Profile Platform: SLC4 –Linux uct2-grid6 2.6.9-42.0.3.EL.cernsmp #1 SMP Fri Oct 6 12:07:54 CEST 2006 i686 athlon i386 GNU/Linux –xfs filesystem: benchmarked at 133 MB/s R/W OpenPBS –Simple: one queue with a 72-hour wall-time limit Cluster management tools from ACT –Image "cloner" and "beo_exec" command script dCache 1.6.6 full bundle (server, client, postgres, dcap) OSG 0.4.1 GUMS –Configured to authorize atlas and OSG proxies ATLAS –Releases: 11.0.3 11.0.42 11.0.5 12.0.3 12.0.31 12.3.0 kitval –DQ2 site services installed via dq2.sh

40 Service Challenge 4 Disk/Tape Split

41 January BNL / Tier 2 Transfer Stability Exercise

42 BNL / Tier 2 Production Traffic Monitored by NetFlow

43 BNL and Tier 2 Production Data Transfer: AOD 141K files, 139 TB; ESD 128K files, 91 TB; total 1372K files, 550 TB.

44 Tier 1 Utilization (2) • Heavy utilization and demand

