U.S. ATLAS Tier-1 Site Report
Michael Ernst
U.S. ATLAS Facilities Workshop – March 23, 2015


Tier-1 High Value Equipment Deployment Plan
Capacity planning is based on pledges (23% author share) plus 20% for US physicists.
FY15 equipment deployment: equipment (i.e. CPU, disk, central servers) is replenished after 4-5 years of operation.

Tier-1 Middleware Deployment (1/2)
The middleware services below are running on VMs.
8 CEs; two of them accept both OSG and ATLAS jobs, the other six are dedicated to ATLAS jobs (an illustrative submit sketch follows the list):
– gridgk01 (BNL_ATLAS_1) is a GRAM CE, for all OSG VOs, OSG release
– gridgk02 (BNL_ATLAS_2) is an HTCondor CE, for all OSG VOs, OSG release
– gridgk03 (BNL_ATLAS_3) is a GRAM CE, OSG release
– gridgk04 (BNL_ATLAS_4) is a GRAM CE, OSG release
– gridgk05 (BNL_ATLAS_5) is a GRAM CE, OSG release
– gridgk06 (BNL_ATLAS_6) is a GRAM CE, OSG release
– gridgk07 (BNL_ATLAS_7) is an HTCondor CE, OSG release
– gridgk08 (BNL_ATLAS_8) is a GRAM CE, OSG release
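For illustration only, a Condor-G submit description targeting these CEs might look like the sketch below; the fully qualified hostnames and the payload file names are assumptions (only the short names gridgk01/gridgk02 appear above).

```
# Minimal Condor-G submit sketch (illustrative; FQDNs and file names are assumptions)
universe      = grid

# GRAM (GT5) CE, e.g. gridgk01 / BNL_ATLAS_1
grid_resource = gt5 gridgk01.usatlas.bnl.gov/jobmanager-condor

# For an HTCondor CE such as gridgk02 / BNL_ATLAS_2 one would use instead:
# grid_resource = condor gridgk02.usatlas.bnl.gov gridgk02.usatlas.bnl.gov:9619

executable    = payload.sh
output        = job.out
error         = job.err
log           = job.log
queue
```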

Tier-1 Middleware Deployment (2/2)
We have two GUMS servers, gums.racf.bnl.gov and gumsdev.racf.bnl.gov. They are configured identically, although one is production and the other is dev.
We have one RSV monitoring host, services02.usatlas.bnl.gov (OSG release), that runs RSV probes against all CEs and the dCache SE.
Redundant Condor-G submit hosts for ATLAS APF.

Previous, Ongoing and New ATLAS Cloud Efforts/Activities (John Hover)
Previous/Ongoing:
– Some smaller-scale commercial cloud R&D (Amazon, Google) performed at BNL and CERN.
– Currently running at medium scale (1000 jobs) on ~10 academic clouds using the University of Victoria Cloud Scheduler.
– Heavy investment in and conversion to OpenStack at CERN; the majority of Central Services now run on the Agile Infrastructure.
– OpenStack cluster at Brookhaven National Lab (720 cores).
New:
– Collaboration with AWS and ESnet on large-scale exploration of AWS resources for ATLAS production and analysis (~30k concurrent jobs in November).
– Aiming to scale up to ~100k concurrent jobs running in 3 AWS regions, with 100G AWS-ESnet network peerings.
– Use of S3 storage AWS-internally and between AWS and ATLAS object storage instances.

ATLAS Production Data Distribution Services on Amazon Web Services (example: us-east-1 region)
– BeStMan SE distributed among 3 EC2 VMs: 1 SRM server and 2 GridFTP servers.
– Data paths: SRM/GridFTP protocol data transmission; S3 / HTTP(S) direct access via FTS or APIs; S3 / HTTP(S) via S3FS (see the mount sketch below).
– s3fs (FUSE-based file system): 3 buckets mapped into 3 mount points per VM/region: ATLASUSERDISK, ATLASPRODDISK, ATLASDATADISK.
– Backed by the Simple Storage Service (S3).
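As a rough illustration of the S3FS path, each of the buckets named above would be mounted into a directory on the VM with s3fs; this is a sketch only, and the mount paths, credentials file and endpoint URL are assumptions.

```
# Illustrative s3fs mounts, one per bucket (paths, credentials and endpoint are assumptions)
s3fs ATLASDATADISK /mnt/atlasdatadisk -o passwd_file=/etc/passwd-s3fs -o url=https://s3.amazonaws.com
s3fs ATLASPRODDISK /mnt/atlasproddisk -o passwd_file=/etc/passwd-s3fs -o url=https://s3.amazonaws.com
s3fs ATLASUSERDISK /mnt/atlasuserdisk -o passwd_file=/etc/passwd-s3fs -o url=https://s3.amazonaws.com
```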

“Impedance mismatch” between commercial and scientific computing

ATLAS Autonomous Multicore Provisioning: Dynamic Allocation with Condor (William Strecker-Kellogg)

ATLAS Tree Structure
Use hierarchical group quotas in Condor:
– Leaf nodes in the hierarchy get jobs submitted to them and correspond 1:1 with PanDA queues.
– Surplus resources from underutilized queues are automatically allocated to other, busier queues.
Quotas determine the steady-state allocation when all queues are busy:
– The quota of a parent group is the sum of its children's quotas (see the diagram on the next slide and the config sketch after it).

ATLAS Tree Structure (numbers in parentheses are quotas)
atlas (12000)
  analysis (2000)
    short (1000)
    long (1000)
  prod (10000)
    single (3500)
    mcore (5500)
    himem (1000)
grid (40)
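A hedged sketch of how such a hierarchy might be expressed in an HTCondor negotiator configuration is shown below; the group names are derived from the diagram, and the exact names, the placement of the grid group, and the accept_surplus settings used at BNL are assumptions.

```
# Illustrative condor_config fragment for the quota tree above
# (group names and surplus flags are assumptions, not the site's actual configuration)
GROUP_NAMES = group_atlas, group_atlas.analysis, group_atlas.analysis.short, \
              group_atlas.analysis.long, group_atlas.prod, group_atlas.prod.single, \
              group_atlas.prod.mcore, group_atlas.prod.himem, group_grid

GROUP_QUOTA_group_atlas                = 12000
GROUP_QUOTA_group_atlas.analysis       = 2000
GROUP_QUOTA_group_atlas.analysis.short = 1000
GROUP_QUOTA_group_atlas.analysis.long  = 1000
GROUP_QUOTA_group_atlas.prod           = 10000
GROUP_QUOTA_group_atlas.prod.single    = 3500
GROUP_QUOTA_group_atlas.prod.mcore     = 5500
GROUP_QUOTA_group_atlas.prod.himem     = 1000
GROUP_QUOTA_group_grid                 = 40

# accept_surplus is toggled per group, e.g. letting the analysis leaves share with each other
GROUP_ACCEPT_SURPLUS_group_atlas.analysis.short = True
GROUP_ACCEPT_SURPLUS_group_atlas.analysis.long  = True
```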

Surplus Sharing
Surplus sharing is controlled by a boolean accept_surplus flag on each queue.
– Quotas and surplus are normalized in units of CPUs.
Groups with the flag set can share with their siblings.
– Parent groups with the flag allow surplus to "flow down" the tree from their siblings to their children.
– Parent groups without the accept_surplus flag constrain surplus sharing to among their own children.

Surplus Sharing Scenario
analysis has a quota of 2000 and no accept_surplus; short and long have a quota of 1000 each and accept_surplus on:
– short=1600, long=400 … possible
– short=1500, long=700 … impossible (the total of 2200 exceeds the analysis quota of 2000)

Partitionable Slots
Each batch node is configured so it can be partitioned into arbitrary slices of CPUs.
– In Condor terminology: partitionable slots are automatically sliced into dynamic slots.
Multicore jobs are thus accommodated with no administrative effort.
– The farm is filled depth first (the default is breadth first) to reduce fragmentation; only minimal (~1-2%) defragmentation is necessary (see the config sketch below).
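The fragment below is a minimal, hedged illustration of how partitionable slots, depth-first filling and light defragmentation can be configured in HTCondor; the specific values, and whether the depth-first knob shown matches the HTCondor version in use at the site, are assumptions.

```
# Illustrative condor_config (a sketch, not the site's actual settings)

# One partitionable slot per machine, owning all resources
NUM_SLOTS                 = 1
NUM_SLOTS_TYPE_1          = 1
SLOT_TYPE_1               = cpus=100%, memory=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = True

# Fill machines depth first so whole-node gaps for 8-core jobs open up sooner
NEGOTIATOR_DEPTH_FIRST    = True

# Run the condor_defrag daemon to drain a small number of machines at a time
DAEMON_LIST               = $(DAEMON_LIST) DEFRAG
DEFRAG_MAX_CONCURRENT_DRAINING    = 4
DEFRAG_DRAINING_MACHINES_PER_HOUR = 1.0
```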

Where's the problem?
Everything works perfectly when all jobs are single-core. However, multicore jobs cannot compete fairly for surplus resources:
– Negotiation is greedy: if 7 slots are free, they won't match an 8-core job but will match 7 single-core jobs in the same cycle.
– If any multicore queue competes for surplus with single-core queues, the multicore queue will always lose.
A solution outside Condor is needed; the ultimate goal is to maximize farm utilization.

Dynamic Allocation
A script watches PanDA queues for demand:
– Queues that have few or no pending jobs are considered empty.
– Short spikes are smoothed out in the demand calculation.
The script is aware of Condor's group structure:
– It builds the tree dynamically from a database, which allows the group hierarchy to be altered without rewriting the script.

Dynamic Allocation (continued)
The script figures out which queues are allowed to accept_surplus, based on comparing the "weight" of queues, where weight is defined as the size of a job in the queue (number of cores):
– It can cope with any combination of demands.
– It prevents starvation by allowing surplus into the "heaviest" queues first.
– It avoids single-core and multicore queues competing for the same resources.
– It can shift the balance between entire sub-trees of the hierarchy (e.g. analysis vs. production).
A minimal sketch of this decision logic follows.
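The Python sketch below illustrates the heaviest-first accept_surplus decision described above. It is a hypothetical reconstruction, not the production script: the real tool reads demand from a database and reconfigures Condor, whereas the queue names, numbers and the choose_surplus_queues helper here are invented for illustration.

```python
# Hypothetical sketch of the accept_surplus decision (not the production script).
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    weight: int   # cores per job in this queue (e.g. 8 for mcore, 1 for single-core)
    demand: int   # smoothed number of pending jobs seen in the PanDA queue


def choose_surplus_queues(queues):
    """Return the names of the queues that should get accept_surplus = True.

    Heaviest first: among queues with real demand, only those whose weight equals
    the largest demanded weight receive surplus. This keeps single-core and
    multicore queues from competing for the same slots and prevents starvation
    of the multicore queues.
    """
    demanded = [q for q in queues if q.demand > 0]
    if not demanded:
        return set()
    max_weight = max(q.weight for q in demanded)
    return {q.name for q in demanded if q.weight == max_weight}


if __name__ == "__main__":
    # Hypothetical demand snapshot; names follow the tree on the earlier slide.
    snapshot = [
        Queue("prod.single", weight=1, demand=500),
        Queue("prod.mcore",  weight=8, demand=120),
        Queue("prod.himem",  weight=1, demand=0),
    ]
    print("accept_surplus ->", sorted(choose_surplus_queues(snapshot)))
    # Only the 8-core queue wins surplus while it has demand.
```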

Results

Dips in serial production jobs (magenta) are filled in by multicore jobs (pink).
– Some inefficiency remains due to fragmentation: there is an irreducible average wait time for 8 cores on a single machine to become free.
Results look promising and will even allow opportunistic workload to backfill if all ATLAS queues drain.
– This is currently impossible because Condor does not support preemption of dynamic slots; the Condor team is close to providing a solution.

OSG Opportunistic Usage at the Tier-1 Center (Bo Jayatilaka; Simone Campana, ATLAS SW&C Week)


Tier-1 Production Network Connectivity
BNL is connected to ESnet at 200G; the Tier-1 facility is connected to ESnet at 200G via the BNL Science DMZ:
– 30G OPN production circuit
– 10G OPN backup circuit
– 40G general IP services
– 100G LHCONE production circuit
All circuits can "burst" up to the maximum of 200G, depending on available bandwidth.

BNL WAN Connectivity in 2013 (diagram: OPN, R&E and virtual circuits, LHCONE; from ATLAS Software and Computing Week, October 24)

BNL Perimeter and Science DMZ

Current Implementation

(diagram: worker nodes and 12 PB of disk storage)

LAN Connectivity: WN ↔ T1 Disk Storage
– All WNs are connected at 1 Gbps.
– Typical bandwidth is 10-20 MB/s, with peaks at 50 MB/s.
– Analysis queues are configured for direct read.

BNL in ATLAS Distributed Data Management (Apr-Dec 2012)
Plots from the ATLAS Distributed Data Management dashboard show data transfer activity (data export and import, in MB/s and PB) between BNL and other ATLAS sites, with BNL in navy blue.
– Monthly average transfer rate up to 800 MB/s; daily peaks have been observed up to 5 times higher.
– About 2 PB transferred from BNL to ATLAS T1s & T2s.

CERN/T1 -> BNL Transfer Performance
Regular ATLAS production plus test traffic. Observations (all in the context of ATLAS):
– Never exceeded ~50 Gbit/s.
– CERN (ATLAS EOS) -> BNL is limited at ~1.5 GB/s.
– Achieved >60 Gbit/s between CERN and BNL.

FTS 3 Service at BNL

From BNL to T1s


From T1s to BNL

From T1s and T2s to BNL in