U.S. ATLAS Computing Facilities
U.S. ATLAS Physics & Computing Review
Bruce G. Gibbard, BNL
11 January 2000
US ATLAS Computing Facilities
Facilities procured, installed, and operated...
– ...to meet US "MOU" obligations
  - Direct IT responsibility (Monte Carlo, for example)
  - Support for detector construction, testing, and calibration
  - Support for software development and testing
– ...to enable effective participation by US physicists in the ATLAS physics program
  - Direct access to and analysis of physics data sets
  - Support for simulation, re-reconstruction, and reorganization of data associated with that analysis
Setting the Scale
Uncertainties in defining requirements:
– Five years of detector, algorithm, and software development
– Five years of computer technology evolution
Start from the ATLAS estimate and rules of thumb
Adjust for the US ATLAS perspective (experience and priorities)
Adjust for the details of the architectural model of the US ATLAS facilities
ATLAS Estimate & Rules of Thumb
Tier 1 Center in '05 should include...
– 30,000 SPECint95 for analysis
– 10,000-20,000 SPECint95 for simulation
– … TBytes/year of on-line (disk) storage
– 200 TBytes/year of near-line (robotic tape) storage
– 100 Mbit/sec connectivity to CERN
Assume no major raw data processing or handling outside of CERN
US ATLAS Perspective
US ATLAS facilities must be adequate to meet any reasonable US ATLAS computing need
(The US role in ATLAS should not be constrained by a computing shortfall; rather, the US role should be enhanced by computing strength)
– Store and re-reconstruct 10-30% of events
– Take the high end of the simulation capacity range
– Take the high end of the disk capacity range
– Augment analysis capacity
– Augment CERN link bandwidth
Adjusted for US ATLAS Perspective
US ATLAS Tier 1 Center in '05 should include...
– 10,000 SPECint95 for re-reconstruction
– 50,000 SPECint95 for analysis
– 20,000 SPECint95 for simulation
– 100 TBytes/year of on-line (disk) storage
– 300 TBytes/year of near-line (robotic tape) storage
– Dedicated OC12 (622 Mbit/sec) connectivity to CERN
Architectural Model
Consists of transparent, hierarchically distributed, Grid-connected computing resources:
– Primary ATLAS Computing Centre at CERN
– US ATLAS Tier 1 Computing Center at BNL
  - National in scope, at ~20% of CERN
– US ATLAS Tier 2 Computing Centers
  - Six, each regional in scope, at ~20% of Tier 1
  - Likely one of them at CERN
– US ATLAS institutional computing facilities
  - Local (LAN) in scope, not project supported
– US ATLAS individual desktop systems
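As a rough, illustrative sketch of how the ~20% scaling rules above compound across the hierarchy: only the fractions and the count of six Tier 2 centers come from the slide; the absolute CERN capacity used below is an assumed placeholder, not a figure from this review.

```python
# Minimal sketch of the tiered scaling rule described above.
# Only the ~20% fractions and the six Tier 2 centers come from the slide;
# the absolute CERN analysis capacity is an assumed placeholder.

CERN_ANALYSIS_SI95 = 250_000   # hypothetical CERN analysis capacity (placeholder)
TIER1_FRACTION = 0.20          # Tier 1 at ~20% of CERN
TIER2_FRACTION = 0.20          # each Tier 2 at ~20% of Tier 1
N_TIER2 = 6                    # six regional Tier 2 centers

tier1 = TIER1_FRACTION * CERN_ANALYSIS_SI95
tier2_each = TIER2_FRACTION * tier1
us_total = tier1 + N_TIER2 * tier2_each

print(f"Tier 1:       {tier1:,.0f} SPECint95")
print(f"Each Tier 2:  {tier2_each:,.0f} SPECint95")
print(f"US aggregate: {us_total:,.0f} SPECint95")
# With these fractions the six Tier 2 centers collectively add ~120% of the
# Tier 1 capacity, i.e. the distributed model roughly doubles what a single
# US center alone would provide.
```

The point of the sketch is only the compounding of the fractions; the actual US ATLAS capacity targets appear on the "Adjusted for Architectural Model" slide below.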
Schematic of Model (figure)
Distributed Model
Rationale (benefits):
– Improved user access to computing resources
  - Local geographic travel
  - Higher-performance regional networks
– Enable local autonomy
  - Less widely shared
  - More locally managed resources
– Increased capacities
  - Encourage integration of other equipment and expertise (institutional, base program)
  - Additional funding options (Com Sci, NSF)
Distributed Model
But increased vulnerability (risk):
– Increased dependence on networks
– Increased dependence on GRID infrastructure R&D
– Increased dependence on facility modeling tools
– More complex management
Risk/benefit analysis must yield a positive result
Adjusted for Architectural Model
US ATLAS facilities in '05 should include...
– 10,000 SPECint95 for re-reconstruction
– 85,000 SPECint95 for analysis
– 35,000 SPECint95 for simulation
– 190 TBytes/year of on-line (disk) storage
– 300 TBytes/year of near-line (robotic tape) storage
– Dedicated OC (… Mbit/sec) Tier 1 connectivity to each Tier 2
– Dedicated OC12 (622 Mbit/sec) connectivity to CERN
GRID Infrastructure
GRID infrastructure software must supply:
– Efficiency (optimizing hardware use)
– Transparency (optimizing user effectiveness)
Projects:
– PPDG: distributed data services (later talk by D. Malon)
– APOGEE: complete GRID infrastructure, including distributed resource management, modeling, instrumentation, etc.
– GriPhyN: staged development toward delivery of a production system
The alternative to success with these projects is an overall system that is difficult to use and/or inefficient
US ATLAS involvement includes ANL and LBNL
Facility Modeling
The performance of a complex distributed system is difficult, but necessary, to predict
MONARC, an LHC-centered project:
– Provides a toolset for modeling such systems
– Develops guidelines for designing such systems
– Is currently capable of relevant analyses
US ATLAS involvement: later talk by K. Sliwa
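To give a flavor of the kind of question such modeling tools address, here is a minimal, self-contained toy that is not MONARC: it simulates analysis jobs queuing for a single tier's processors, with the CPU count, arrival rate, and service time all assumed purely for illustration.

```python
import heapq
import random

# Toy discrete-event model of analysis jobs arriving at one computing tier.
# All parameters are illustrative assumptions, not MONARC inputs or results.
random.seed(1)

N_CPUS = 50            # processors at the tier (assumed)
ARRIVAL_RATE = 0.4     # jobs per minute (assumed)
MEAN_SERVICE = 120.0   # CPU minutes per job (assumed)
N_JOBS = 2000          # jobs to simulate

free_at = [0.0] * N_CPUS      # time at which each CPU next becomes free
heapq.heapify(free_at)

t = 0.0
waits = []
for _ in range(N_JOBS):
    t += random.expovariate(ARRIVAL_RATE)             # next job arrival
    service = random.expovariate(1.0 / MEAN_SERVICE)   # its CPU demand
    cpu_free = heapq.heappop(free_at)                  # earliest available CPU
    start = max(t, cpu_free)
    waits.append(start - t)                            # time spent queued
    heapq.heappush(free_at, start + service)

print(f"Offered load: {ARRIVAL_RATE * MEAN_SERVICE / N_CPUS:.0%} of capacity")
print(f"Mean queue wait: {sum(waits) / len(waits):.0f} min")
print(f"Jobs that had to wait: {sum(w > 0 for w in waits) / len(waits):.0%}")
```

Even this toy shows the qualitative behavior such studies quantify: queuing delay grows sharply as a tier approaches full loading.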
Components of Model: Tier 1
Full-function facility
– Dedicated connectivity to CERN
– Primary site for storage/serving
  - Cache/replicate CERN data needed by US ATLAS
  - Archive and serve over the WAN all data of interest to US ATLAS
– Computation
  - Primary site for re-reconstruction (perhaps the only site)
  - Major site for simulation and analysis (~2 x a Tier 2)
– Repository of technical expertise and support
  - Hardware, OS's, utilities, and other standard elements of US ATLAS
  - Network, AFS, GRID, and other infrastructure elements of the WAN model
Components of Model: Tier 2
Limit personnel and maintenance support costs
Focused-function facility
– Excellent connectivity to Tier 1 (network + GRID)
– Tertiary storage via network at Tier 1 (none local)
– Primary analysis site for its region
– Major simulation capabilities
– Major on-line storage cache for its region
Leverage local expertise and other resources
– Part of the site selection criteria (~1 FTE contributed, for example)
Technology Trends & Choices
CPU
– Range: commodity processors -> SMP servers
– Factor of 2 decrease in price/performance in 1.5 years
Disk
– Range: commodity disk -> RAID disk
– Factor of 2 decrease in price/performance in 1.5 years
Tape storage
– Range: desktop storage -> high-end storage
– Factor of 2 decrease in price/performance in … years
Price/Performance Evolution (chart)
From a Harvey Newman presentation, Third LCB Workshop, Marseilles, Sept.; data as of Dec. 1996
Technology Trends & Choices
For costing purposes:
– Start with familiar, established technologies
– Project forward along the observed exponential slopes
This is a conservative approach:
– There are no known near-term show stoppers for these established technologies
– A new technology would have to be more cost effective to supplant the projection of an established technology
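A minimal sketch of that projection rule, assuming the factor-of-2-per-1.5-years slopes quoted two slides earlier for CPU and disk; the year-2000 starting unit costs below are placeholders, not figures from this review.

```python
# Sketch of projecting unit costs along an observed exponential slope.
# Halving times (~1.5 years for CPU and disk) come from the earlier slide;
# the year-2000 starting costs are assumed placeholders.

def projected_unit_cost(cost_now: float, years_out: float, halving_years: float) -> float:
    """Cost per unit capacity after extrapolating the exponential trend."""
    return cost_now * 0.5 ** (years_out / halving_years)

cpu_cost_2000 = 100.0   # $/SPECint95 in 2000 (assumed)
disk_cost_2000 = 50.0   # $/GByte in 2000 (assumed)

for year in (2001, 2003, 2005):
    dt = year - 2000
    cpu = projected_unit_cost(cpu_cost_2000, dt, 1.5)
    disk = projected_unit_cost(disk_cost_2000, dt, 1.5)
    print(f"{year}: CPU ~${cpu:.0f}/SPECint95, disk ~${disk:.0f}/GByte")
```

With a 1.5-year halving time, unit costs fall by roughly a factor of 10 over the five years to 2005.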
Technology Trends & Choices
CPU-intensive processing
– Farms of commodity processors: Intel/Linux
I/O-intensive processing and serving
– Mid-scale SMPs (Sun, IBM, etc.)
On-line storage (disk)
– Fibre Channel connected RAID
Near-line storage (robotic tape system)
– STK / 9840 / HPSS
LAN
– Gigabit Ethernet
Composition of Tier 1
– Commodity processor farms (Intel/Linux)
– Mid-scale SMP servers (Sun)
– Fibre Channel connected RAID disk
– Robotic tape / HSM system (STK / HPSS)
Current Tier 1 Status
The US ATLAS Tier 1 facility is currently operating as a small (~5%) adjunct to the RHIC Computing Facility (RCF)
Deployment includes:
– Intel/Linux farms (28 CPUs)
– Sun E450 server (2 CPUs)
– 200 GBytes of Fibre Channel RAID disk
– Intel/Linux web server
– Archiving via a low-priority HPSS Class of Service
– Shared use of an AFS server (10 GBytes)
Current Tier 1 Status
The platforms and technologies chosen for the RCF are common to ATLAS
– Allows a wide range of services with only 1 FTE of contributed system administration (plus the US ATLAS librarian)
– Significant divergence of direction between US ATLAS and RHIC has been allowed for
– Complete divergence, while extremely unlikely, would exceed current staffing estimates
RAID Disk Subsystem (figure)
Intel/Linux Processor Farm (figure)
Intel/Linux Nodes (figure)
Composition of Tier 2 (Initial One)
– Commodity processor farms (Intel/Linux)
– Mid-scale SMP servers
– Fibre Channel connected RAID disk
Staff Estimate (In Pseudo Detail)
Time Evolution of Facilities
Tier 1 functioning as an early prototype
– Ramp up to meet needs and to validate the design
Assume 2 years for a Tier 2 to become fully established
– Initiate the first Tier 2 in 2001
  - True Tier 2 prototype
  - Demonstrate Tier 1 - Tier 2 interaction
– Second Tier 2 initiated in 2002 (at CERN?)
– Four remaining Tier 2's initiated in 2003
Fully operational by 2005
All six are to be identical (CERN exception?)
Staff Evolution
Network
Tier 1 connectivity to CERN and to the Tier 2's is critical
– Must be guaranteed and allocable (dedicated and differentiated)
– Must be adequate (triage of functions is disruptive)
– Should grow with need; OC12 should be practical by 2005, when serious data will flow
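For a rough sense of scale: the 100 Mbit/sec ATLAS baseline and the OC12 rate of 622 Mbit/sec appear in the slides, while the 50% sustained-utilization factor below is an assumption for illustration.

```python
# Rough throughput arithmetic for the proposed wide-area links.
# Link rates are from the slides; the 50% sustained utilization is an
# illustrative assumption.

def tbytes_per_day(mbit_per_s: float, utilization: float = 0.5) -> float:
    """Sustained transfer volume per day for a given link rate."""
    bytes_per_s = mbit_per_s * 1e6 / 8 * utilization
    return bytes_per_s * 86400 / 1e12

for name, rate in [("100 Mbit/sec (ATLAS baseline to CERN)", 100),
                   ("OC12, 622 Mbit/sec", 622)]:
    print(f"{name}: ~{tbytes_per_day(rate):.1f} TBytes/day sustained")
```

At 50% utilization an OC12 moves roughly 3-4 TBytes/day, on the order of a PByte/year, which is commensurate with the hundreds of TBytes/year of storage quoted earlier; the 100 Mbit/sec baseline moves only about 0.5 TBytes/day.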
WAN Configurations and Cost (FY 2000 k$)
Annual Equipment Costs for Tier 1 Center (FY 2000 k$)
Annual Equipment Costs for Tier 2 Center (FY 2000 k$)
Integrated Facility Capacities by Year
US ATLAS Facilities Annual Costs (FY 2000 k$)
Major Milestones