
Slide 1: U.S. ATLAS Computing Facilities
Bruce G. Gibbard, Brookhaven National Laboratory
Mid-year Review of U.S. LHC Software and Computing Projects
NSF Headquarters, Arlington, Virginia, July 8, 2003

Slide 2: Mission of US ATLAS Computing Facilities
- Supply capacities to the ATLAS Distributed Virtual Offline Computing Center
  - At levels agreed to in a computing resource MoU (yet to be written)
- Guarantee the computing required for effective participation by U.S. physicists in the ATLAS physics program
  - Direct access to and analysis of physics data sets
  - Simulation, re-reconstruction, and reorganization of data as required to support such analyses

Slide 3: ATLAS Facilities Model
- ATLAS will employ the ATLAS Virtual Offline Computing Facility to process and analyze its data
- A "cloud"-mediated set of resources including:
  - CERN Tier 0
  - All regional facilities (Tier 1's), typically ~200 users each
  - Some national facilities (Tier 2's)
- All members of the ATLAS Virtual Organization (VO) must contribute, in funds or in kind (personnel, equipment), in proportion to author count (a toy arithmetic sketch follows this slide)
- All members of the ATLAS VO will have defined access rights
- Typically only a subset of the resources at a regional or national center is integrated into the Virtual Facility
  - The non-integrated portion, over which regional control is retained, is expected to be used to augment resources supporting analyses of regional interest
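The proportionality rule above can be made concrete with a toy calculation. This is only an illustrative sketch: the author counts and the total capacity target below are hypothetical placeholders, not numbers from this review.

```python
# Toy sketch of "contribute in funds or in kind, proportional to author count".
# Author counts and the capacity target are hypothetical, not from the talk.

def contribution_shares(author_counts, total_capacity_ksi2k):
    """Split a total capacity target across VO members by author count."""
    total_authors = sum(author_counts.values())
    return {
        member: total_capacity_ksi2k * n / total_authors
        for member, n in author_counts.items()
    }

if __name__ == "__main__":
    authors = {"US": 400, "CERN": 350, "Rest of ATLAS": 1250}  # hypothetical
    for member, share in contribution_shares(authors, 10_000).items():
        print(f"{member}: {share:,.0f} kSPECint2000")
```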

Slide 4: Analysis Model: All ESD Resident on Disk
- Enables ~24-hour selection/regeneration passes (versus ~a month if tape-stored): faster, better-tuned, more consistent selection (a rough turnaround estimate follows this slide)
- Allows navigation to individual events (to all processed, though not Raw, data) without recourse to tape and its associated delay: faster, more detailed analysis of larger, consistently selected data sets
- Avoids contention between analyses over ESD disk space and the need to develop complex algorithms to optimize management of that space: better results with less effort
- Complete set on disk at the US Tier 1; cost impact discussed later
- Reduced sensitivity to the performance of multiple Tier 1's, the intervening (transatlantic) network, and middleware: improved system reliability, availability, robustness, and performance; cost impact discussed later
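A back-of-the-envelope estimate shows why a disk-resident ESD gives day-scale rather than month-scale selection passes. The sample size and I/O rates below are assumed round numbers chosen for illustration only; they are not figures from the slide.

```python
# Back-of-the-envelope sketch of the disk-vs-tape turnaround argument.
# All numbers are illustrative assumptions, not figures from the talk.

ESD_SIZE_TB = 200        # assumed full ESD sample size
DISK_READ_GBPS = 2.0     # assumed aggregate disk-farm read rate (GB/s)
TAPE_READ_MBPS = 60.0    # assumed aggregate tape read rate (MB/s)

esd_bytes = ESD_SIZE_TB * 1e12

disk_hours = esd_bytes / (DISK_READ_GBPS * 1e9) / 3600
tape_days = esd_bytes / (TAPE_READ_MBPS * 1e6) / 86400

print(f"Full ESD pass from disk: ~{disk_hours:.0f} hours")  # ~28 hours
print(f"Full ESD pass from tape: ~{tape_days:.0f} days")    # ~39 days
```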

Slide 5: 2.3 US ATLAS Facilities
A coordinated grid of distributed resources including ...
- Tier 1 Facility at Brookhaven – Rich Baker / Bruce Gibbard
  - Currently operational at ~1% of required 2008 capacity
- 5 permanent Tier 2 facilities – Saul Youssef
  - Scheduled for selection beginning in 2004
  - Currently there are 2 prototype Tier 2's:
    - Indiana U – Fred Luehring / University of Chicago – Rob Gardner
    - Boston U – Saul Youssef
- 7 currently active Tier 3 (institutional) facilities
- WAN coordination activity – Shawn McKee
- Program of Grid R&D activities – Rob Gardner
  - Based on Grid projects (PPDG, GriPhyN, iVDGL, EU Data Grid, EGEE, etc.)
- Grid production & production support effort – Kaushik De / Pavel Nevski

Slide 6: Facilities Organization Chart (chart not transcribed)

Slide 7: (figure or chart only; no text transcribed)

Slide 8: WBS 2.3 Personnel Increase for FY '04 (table not transcribed)
- An important but not fully funded request

Slide 9: 2.3.1 Tier 1 Facility
- Functions:
  - Primary U.S. data repository for ATLAS
  - Programmatic event selection and AOD & DPD regeneration from ESD
  - Chaotic high-level analysis by individuals, especially for large-data-set analyses
  - Significant source of Monte Carlo
  - Re-reconstruction as needed
  - Technical support for smaller US computing resource centers
- Co-located and operated with the RHIC Computing Facility
  - To date a very synergistic relationship
  - Some recent increased divergence
  - Substantial benefit from cross-use of idle resources (2000 CPU's)

Slide 10: Tier 1 Facility Evolution for FY '04
- No staff increase nor equipment procurement for FY '03
  - The only new equipment for FY '02 was based on a DOE end-of-year funding supplement: a 10 TByte disk addition and the upgrade of a single tape drive
- Result has been capacities lower than expected and needed
  - Compute capacities applied to ATLAS Data Challenge 1 (DC1) were ~2x less than expected by ATLAS based on US author count
  - Only very efficient facility utilization and supplemental production at Tier 2's & 3's resulted in an acceptable level of US contribution
- Modest equipment upgrades planned for FY '04 (for DC2); see the capacity-factor sketch after this list
  - Disk: 12 TBytes -> 25 TBytes (factor of ~2)
  - CPU farm: 30 kSPECint2000 -> 130 kSPECint2000 (factor of ~4)
    - First processor farm upgrade since FY '01 (3 years)
  - Robotic tape storage: 30 MBytes/sec -> 60 MBytes/sec (factor of 2)
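For reference, the upgrade factors quoted above follow directly from the before/after capacities. A trivial sketch using the slide's numbers (the rounding to "factor of ~2 / ~4" is mine):

```python
# Quick check of the FY '04 upgrade factors quoted on the slide.
upgrades = {
    "disk (TB)":          (12, 25),
    "CPU farm (kSI2000)": (30, 130),
    "tape I/O (MB/s)":    (30, 60),
}

for name, (before, after) in upgrades.items():
    print(f"{name}: {before} -> {after}  (x{after / before:.1f})")
```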

Slide 11: Capital Equipment (table/chart not transcribed)

Slide 12: (figure or chart only; no text transcribed)

Slide 13: Need for Tier 1 Facility Staff Increase
- Procurement, installation, and operation of additional equipment
- Need for an ATLAS-specific Linux OS: RH 7.3 versus RHIC's RH 9
- Investigation of alternate disk technologies
  - In particular, CERN Linux disk-server-like approaches
- Increased complexity of cyber security and AAA for the Grid
- Major increases in user base and level of activity in 2004:
  - Grid3/PreDC2, a Grid demonstration exercise in preparation for DC2
  - DC2, ATLAS Data Challenge 2
  - LHC Computing Grid (LCG) deployment (LCG-0 -> LCG-1)

Slide 14: (figure or chart only; no text transcribed)

Slide 15: (figure or chart only; no text transcribed)

Slide 16: Cost Impact of All ESD on Local Disk
- Assumptions:
  - Increase from 480 TB to 1 PB of total disk
  - Some associated increase in CPU and infrastructure
  - Simple extension of current technology
    - A conservative technology choice, so the cost may be overestimated
  - Personnel requirement unchanged
    - The alternative is effort spent optimizing transfer and caching schemes
- Tier 1 Facility cost differential through 2008 (first full year of LHC operation)
  - Since facility cost is not dominated by hardware, reduction to a "1/3 disk model" certainly reduces cost, but not dramatically (a rough scaling sketch follows this slide)
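The closing point, that shrinking the disk buys less than one might expect, is a simple scaling argument. A minimal sketch follows; the disk share of total facility cost is a hypothetical placeholder, and only the ~1 PB figure comes from the slide.

```python
# Illustrative sketch of the "not dominated by hardware" argument.
# The 25% disk cost fraction is a hypothetical placeholder, not a review figure.

DISK_TB_FULL = 1000          # full-ESD model (~1 PB, from the slide)
DISK_TB_THIRD = 1000 / 3     # "1/3 disk model"

disk_fraction_of_total = 0.25   # assumed share of total facility cost in disk
savings_fraction = disk_fraction_of_total * (1 - DISK_TB_THIRD / DISK_TB_FULL)

print(f"Total facility cost saved by the 1/3-disk model: ~{savings_fraction:.0%}")
# With disk at ~25% of total cost, cutting disk by 2/3 saves only ~17% overall.
```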

Slide 17: 2.3.2 Tier 2 Facilities
- 5 permanent Tier 2 facilities
  - Primary resource for simulation
  - Empower individual institutions and small groups to do autonomous analyses using more directly accessible and locally managed resources
- 2 prototype Tier 2's, selected for their ability to rapidly contribute to Grid development:
  - Indiana University / University of Chicago (effective FY '03)
  - Boston University
- Permanent Tier 2's will be selected to leverage strong institutional resources
  - Selection of the first two is scheduled for spring 2004
- Currently 7 active Tier 3's in addition to the prototype Tier 2's; all are candidate Tier 2's
- The aggregate of the 5 permanent Tier 2's will be comparable to the Tier 1 in CPU

Slide 18: Tier 2 Facilities Evolution
- First significant iVDGL-funded equipment procurements now underway (Moore's law: don't buy it until you need it)
- Second round scheduled for summer FY '04
- At the time of DC2, aggregate Tier 2 capacities will be comparable to those of the Tier 1; later in 2004, very significantly more

Slide 19: 2.3.3 Networking
- Responsible for:
  - Specifying both the national and international WAN requirements of US ATLAS
  - Communicating requirements to the appropriate network infrastructure suppliers (ESnet, Internet2, etc.)
  - Monitoring the extent to which WAN requirements ...
    - ... are currently being met
    - ... will continue to be met as they increase in the future
- Small base-program-supported effort includes:
  - Interacting with ATLAS facility site managers and technical staff
  - Participating in HENP networking forums
  - Adopting/adapting/developing, deploying, and operating WAN monitoring tools
- WAN upgrades are not anticipated during the next year
  - Currently Tier 1 & 2 sites are at OC12, except UC, which is planning an OC3 -> OC12 upgrade by fall
  - Upcoming exercises require ~1 TByte/day (~15% of OC12 theoretical capacity; see the sketch after this list)
  - Competing RHIC utilization at BNL is currently also in the ~15% range
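The ~15% figure can be checked from the OC-12 line rate of 622.08 Mbit/s, treating the full line rate as usable (the "theoretical capacity" simplification the slide makes). A minimal sketch:

```python
# Check of "~1 TByte/day is ~15% of OC12 theoretical capacity".
OC12_BITS_PER_SEC = 622.08e6
SECONDS_PER_DAY = 86_400

oc12_tbytes_per_day = OC12_BITS_PER_SEC * SECONDS_PER_DAY / 8 / 1e12
utilization = 1.0 / oc12_tbytes_per_day   # 1 TByte/day requirement

print(f"OC12 theoretical capacity: ~{oc12_tbytes_per_day:.1f} TB/day")  # ~6.7
print(f"1 TB/day utilization: ~{utilization:.0%}")                      # ~15%
```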

Slide 20: 2.3.4 Grid Tools & Services
- Responsible for the development, evaluation, and creation of an integrated Grid-based system for distributed production processing and user analysis
- Primary point of contact and coordination with Grid projects (PPDG, GriPhyN, iVDGL, EDG, EGEE, etc.)
  - Accept, evaluate, and integrate tools & services from Grid projects
  - Transmit requirements and feedback to Grid projects
- Responsible for supporting the integration of ATLAS applications with Grid tools & services

Slide 21: (figure or chart only; no text transcribed)

Slide 22: 2.3.5 Grid Production
- Responsible for deploying, production-scale testing & hardening, operating, monitoring, and documenting the performance of systems for production processing & user analysis
- Primary point of contact to ATLAS production activities, including the transmission of ...
  - ... production requests to, and facility availability from, the rest of US ATLAS computing management
  - ... requirements to ATLAS production for optimal use of US resources
  - ... feedback to the Tools & Services effort regarding production-scale issues
- Responsible for integration, on an activity-by-activity basis, of US ATLAS production contributions into overall ATLAS production
- An increase of 2.65 project-supported FTE's was requested for FY '04 to address growing production demands, but the budget supports only 1.65

Slide 23: Increasing Grid Production
- Two significant production activities in FY '04 (only DC1 in FY '03):
  - The Grid3/PreDC2 exercise
  - DC2
- While each is anticipated to be a few months in duration, experience from DC1 indicates that near-continuous ongoing production is more likely
- Production is moving from being Facility centric to being Grid centric
- In its newness, Grid computing is a more complex and less stable production environment and currently requires more effort
- Level of effort:
  - During DC1 (less than 50% Grid, using 5 sites): 3.35 FTE's (0.85 Project)
  - For Grid3/PreDC2/DC2 (~100% Grid, using 11 sites): a minimum of 6 FTE's
- Reductions below this level (forced by budget constraints):
  - Will reduce the efficiency of resource utilization
  - Will force some fallback from Grid- to Facility-type production to meet commitments

Slide 24: (figure or chart only; no text transcribed)

Slide 25: 3 Major Near Term Milestones
- LCG deployment, including the US Tier 1
  - LCG-0, exercise of deployment mechanisms: completed May '03
    - Substantial comment was offered on the mechanisms, which seemed well received
  - LCG-1, initial deployment beginning: July '03
  - LCG-1, full-function, reliable, manageable service: Jan '04
- PreDC2/Grid3 exercise: Nov '03
  - Full geographic chain: Tier 2 -> Tier 1 -> Tier 0 -> Tier 1 -> Tier 2, plus analysis
  - Goals: test the DC2 model, forge the Tier 0 / Tier 1 staff link, initiate Grid analysis
- ATLAS DC2: April '04 (slippage by ~3 months is not unlikely)
  - DC1 scale in number of events (~10^7), but x2 in CPU & storage for Geant4
  - Exercising the complete geographic chain (Tier 2 -> Tier 1 -> Tier 0 -> Tier 1 -> Tier 2)
  - Goal: use of LCG-1 for Grid computing as input to the Computing Model Document

Slide 26: Near Term Schedule (schedule chart not transcribed)

Slide 27: (figure or chart only; no text transcribed)

Slide 28: A U.S. ATLAS Physics Analysis Center at BNL
- Motivation:
  - Position the U.S. to ensure active participation in ATLAS physics analysis
  - Builds on the existing Tier 1 ATLAS Computing Center, CORE software leadership at BNL, and theorists who are already working closely with experimentalists
  - This BNL center will become a place where U.S. physicists come with their students and post-docs
- Scope and timing:
  - Hire at least 1 key physicist per year starting in 2003, adding to the excellent existing staff, to cover all aspects of ATLAS physics analysis: tracking, calorimetry, muons, trigger, simulation, etc.
  - Expect the total staff, including migration from D0, to reach ~25 by 2007
  - The first hire will arrive on August 26, 2003
  - The plan is to have a few of the members in residence at CERN for 1-2 years on a rotating basis
- Cost: base funding
  - Will need a DOE increment to the declining BNL HEP base program: additional base funding of ~$200k/year in FY03 => $1.5M in FY07
(H. Gordon, BNL, DOE Annual HEP Program Review, April 22, 2002)

