ATLAS Distributed Computing in LHC Run 2
Simone Campana (CERN), on behalf of the ATLAS collaboration
CHEP 2015, 14/04/2015
The Run-2 Challenge
Run 1: trigger rate ~400 Hz, pile-up ~20. Run 2: trigger rate ~1 kHz, pile-up ~40.
A new detector (e.g. tracking, calorimeters).
Resources constrained by a "flat budget": no increase in funding for computing.
How to face the Run-2 challenge
New ATLAS distributed computing systems: Rucio for Data Management, Prodsys-2 for Workload Management, FAX and the Event Service to optimize resource usage.
More efficient utilization of resources: improvements in simulation/reconstruction, limiting resource consumption (e.g. memory sharing in multi-core jobs), optimized workflows (Derivation Framework / Analysis Model).
Leveraging opportunistic resources in addition to pledged ones: Grid, Cloud, HPC, Volunteer Computing.
A new data lifecycle management model.
New ATLAS distributed computing systems
Distributed Data Management: Rucio
The new ATLAS Data Management system, Rucio [1], has been in production since 1 December 2014.
Rucio is a sophisticated system, offering more features than its predecessor (DQ2).
[Plots: transferred files and transferred volume vs. time, ~1M files/day and ~2 PB/week.]
Already at this early stage, Rucio matches the core-functionality performance of the previous DDM system.
[Plots: Rucio and DQ2 deletion rates vs. time, ~5M files/day.]
Most of Rucio's potential, still unexplored, will be leveraged in production during Run 2.
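As a flavour of how Rucio is driven, here is a minimal usage sketch (not from the talk): declaring a replication rule from Python. It assumes a configured Rucio client environment; the dataset name and RSE expression are hypothetical, and the exact RuleClient.add_replication_rule signature should be checked against the Rucio documentation.

```python
# Hedged sketch: creating a Rucio replication rule for a dataset.
from rucio.client.ruleclient import RuleClient

rule_client = RuleClient()

# Hypothetical dataset identifier (scope, name).
dids = [{"scope": "mc15_13TeV", "name": "mc15_13TeV.123456.example.DAOD"}]

rule_ids = rule_client.add_replication_rule(
    dids=dids,
    copies=2,                               # keep two replicas
    rse_expression="tier=2&type=DATADISK",  # hypothetical: any two Tier-2 DATADISK endpoints
    lifetime=30 * 24 * 3600,                # rule expires after 30 days (seconds)
)
print(rule_ids)
```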
Remote data access: the Xrootd ATLAS Federation (FAX)
We deployed a federated storage infrastructure: all data accessible from any location. Goal reached: ~100% of data covered.
Increased resiliency against storage failures: FAILOVER.
Jobs can run at sites without local data but with free CPUs: OVERFLOW (up to 10% of jobs).
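The FAILOVER idea can be illustrated with a short sketch (not ATLAS production code): a job first tries its local storage element and, if that fails, retries the same file through the federation redirector. The paths and redirector host are hypothetical; PyROOT's ROOT.TFile.Open is used for the xrootd access.

```python
# Hedged sketch of FAX failover: local replica first, federation second.
import ROOT

LOCAL_PREFIX = "root://localse.example.org//atlas/rucio"          # hypothetical
FAX_REDIRECTOR = "root://fax-redirector.example.org//atlas/rucio"  # hypothetical

def open_with_failover(lfn):
    """Try the local replica first, then fall back to the FAX federation."""
    for prefix in (LOCAL_PREFIX, FAX_REDIRECTOR):
        f = ROOT.TFile.Open(prefix + lfn)
        if f and not f.IsZombie():
            return f
    raise IOError("could not open %s locally or via FAX" % lfn)

# Usage (hypothetical file name):
# events = open_with_failover("/mc15_13TeV/DAOD/file.root").Get("CollectionTree")
```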
Remote data access: FAX
[Plots: FAX site reliability; FAX failover rate, ~1000 jobs/day recovered (about 1% of failures).]
[Plots: FAX overflow CPU/wall-clock-time efficiency (local 83% vs. FAX 76%) and FAX overflow job efficiency (local 84% vs. FAX 43%).]
Distributed Production and Analysis
We developed a new service for simulated and detector data processing: Prodsys-2 [2].
Prodsys-2 core components (see the sketch below):
Request I/F: allows production managers to define a request.
DEFT: translates the user request into task definitions.
JEDI: generates the job definitions.
PanDA: executes the jobs on the distributed infrastructure.
JEDI + PanDA also provide the new framework for Distributed Analysis.
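A conceptual sketch of the request → task → job chain described above (hypothetical code, not the real Prodsys-2): DEFT turns a production request into task definitions, JEDI splits each task into job definitions, and PanDA would then broker those jobs to sites.

```python
# Hedged sketch of the DEFT/JEDI roles in Prodsys-2.
def deft_define_tasks(request):
    """One task per input dataset in the request (simplified)."""
    return [{"task_id": i,
             "input_dataset": ds,
             "step": request["step"],
             "files": request["files"][ds]}
            for i, ds in enumerate(request["datasets"])]

def jedi_define_jobs(task, files_per_job=10):
    """Split a task's input files into job-sized chunks."""
    files = task["files"]
    return [{"task_id": task["task_id"],
             "job_id": n,
             "inputs": files[i:i + files_per_job]}
            for n, i in enumerate(range(0, len(files), files_per_job))]

# Hypothetical request: two input datasets for a simulation step.
request = {"step": "simul",
           "datasets": ["mc15_13TeV.evgen.A", "mc15_13TeV.evgen.B"],
           "files": {"mc15_13TeV.evgen.A": [f"A.{i}" for i in range(25)],
                     "mc15_13TeV.evgen.B": [f"B.{i}" for i in range(7)]}}
tasks = deft_define_tasks(request)
jobs = [j for t in tasks for j in jedi_define_jobs(t)]  # 3 + 1 = 4 job definitions
# PanDA would dispatch each job to a site with free CPUs and accessible inputs.
```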
Prodsys-2 has been in production since 1 December 2014; JEDI has been in use for analysis since 8 August 2014.
Prodsys-2 and JEDI bring a large set of improvements:
Built-in file merging capability.
Dynamic job definition, optimizing resource scheduling.
Automated recovery of lost data.
Advanced task management interface.
New monitoring.
[Plots: cores of running jobs vs. time, peaking around 150k, across the Prodsys-1 + PanDA, Prodsys-1 + JEDI/PanDA and Prodsys-2 + JEDI/PanDA phases (05/2014 to 05/2015); completed analysis jobs vs. time around the migration to JEDI (07/2014 to 08/2014).]
More efficient utilization of resources
Simulation
Simulation is CPU intensive.
Integrated Simulation Framework: mixing of full Geant4 and fast simulation within an event. Work in progress, target is 2016.
More events per 12-hour job mean larger output files, fewer transfers and merges, and less I/O; alternatively, shorter and more granular jobs suit opportunistic resources.
Reconstruction
Reconstruction is memory hungry and requires non-negligible CPU (about 40% of simulation's, i.e. ~20% of ATLAS CPU usage).
AthenaMP [3]: multi-processing reduces the memory footprint (sketched below).
[Plots: Athena memory profile, serial vs. multi-process (MP), against the 2 GB/core reference; running jobs, single-core vs. multi-core; reconstruction time (s/event) vs. time.]
Code and algorithm optimization largely reduced the CPU needs of reconstruction [4].
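The memory saving comes from the fork-after-initialization pattern. This is not AthenaMP itself, just a minimal Python sketch of the idea: the parent loads large, read-only state (geometry, conditions) once, then forks workers that share those pages copy-on-write, so the cost is roughly one copy instead of one per core.

```python
# Hedged sketch of fork-after-initialization with copy-on-write sharing.
import os

def load_conditions():
    # Stand-in for loading geometry/conditions data (large in real Athena jobs).
    return bytearray(100 * 1024 * 1024)  # 100 MB of shared, read-only payload

def process_events(worker_id, conditions, events):
    for evt in events:
        pass  # stand-in: reconstruct the event using the shared conditions

if __name__ == "__main__":
    conditions = load_conditions()        # paid once, before forking
    all_events = list(range(1000))
    n_workers = 4
    chunks = [all_events[i::n_workers] for i in range(n_workers)]

    pids = []
    for wid, chunk in enumerate(chunks):
        pid = os.fork()                   # child shares parent's pages (COW)
        if pid == 0:
            process_events(wid, conditions, chunk)
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
```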
Analysis Model
Common analysis data format, xAOD: replaces the AOD and group ntuples of any kind; readable by both Athena and ROOT.
Data reduction framework [5]: Athena produces group-derived data samples (DxAOD), centrally via Prodsys.
Based on a train model: one input, N outputs, reducing the data from PB to TB scale.
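The train model can be pictured with a short sketch (illustrative only, not the Derivation Framework): a single pass over the input feeds N independent skims ("carriages"), each with its own selection and its own output. The selections and stream names are hypothetical.

```python
# Hedged sketch of the derivation "train": one input pass, N skimmed outputs.
def run_train(events, carriages):
    outputs = {name: [] for name in carriages}
    for evt in events:                      # single pass over the input
        for name, select in carriages.items():
            if select(evt):                 # carriage-specific skim
                outputs[name].append(evt)   # goes to that carriage's output
    return outputs

carriages = {
    "DAOD_HIGGS":   lambda e: e["n_leptons"] >= 2,
    "DAOD_EXOTICS": lambda e: e["met"] > 200.0,
    "DAOD_TOP":     lambda e: e["n_jets"] >= 4,
}

events = [{"n_leptons": 2, "met": 50.0, "n_jets": 5},
          {"n_leptons": 0, "met": 250.0, "n_jets": 2}]
print({name: len(out) for name, out in run_train(events, carriages).items()})
```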
Leveraging opportunistic resources
At peak rate, almost 50% of ATLAS production relies on opportunistic resources.
[Plot: cores used by ATLAS running jobs vs. time (05/2014 to 03/2015), peaking near 200k against a pledge of ~100k.]
Efficient utilization of the widest possible variety of opportunistic resources is vital for ATLAS.
Enabling the use of non-Grid resources is a long-term investment, beyond purely opportunistic use.
(Opportunistic) Cloud Resources
We invested a lot of effort in enabling the use of Cloud resources [6].
For example, the ATLAS HLT farm at the CERN ATLAS pit (P1) was instrumented with a Cloud interface to run simulation: Sim@P1 [7].
[Plot: simulated events vs. time (07/09/14 to 04/10/14) for T1s, T2s and CERN P1; ~20M events/day, with P1 contributing approximately 5%.]
The HLT farm was also dynamically reconfigured to run reconstruction on multi-core resources (Reco@P1). We expect to be able to do the same with other clouds.
HPCs
High Performance Computers were designed for massively parallel applications, unlike the typical HEP use case, but we can parasitically benefit from empty cycles that others cannot use (e.g. single-core job slots).
The ATLAS production system has been extended to leverage HPC resources [8].
24-hour test at the Oak Ridge Titan system (#2 HPC machine in the world, 299,008 cores): ATLAS event generation used 200,000 CPU hours on 90k parallel cores, equivalent to 70% of our Grid resources.
[Plot: running EVNT, SIMUL and RECO jobs vs. time (08/09/14 to 05/10/14) at MPPMU, LRZ and CSCS, averaging 1,700 running jobs.]
Mira at Argonne: Sherpa generation using 12,244 nodes with 8 threads per node, i.e. 97,952 parallel Sherpa processes.
The goal is to validate as many workflows as possible. Today approximately 5% of ATLAS production runs on HPCs.
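How embarrassingly parallel HEP work fills a large HPC allocation can be sketched as follows (hypothetical, not the ATLAS HPC integration): each MPI rank independently generates its own disjoint slice of events, with no inter-rank communication during the event loop.

```python
# Hedged sketch: filling an HPC allocation with independent event generation.
# Launch with e.g. `mpirun -n <many> python this_script.py` (mpi4py required).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

TOTAL_EVENTS = 10_000_000            # hypothetical production request
events_per_rank = TOTAL_EVENTS // size
first_event = rank * events_per_rank

def generate_event(event_number):
    pass                              # stand-in for the actual generator call

for evt in range(first_event, first_event + events_per_rank):
    generate_event(evt)               # each rank works on a disjoint event range

# Each rank writes its own output; results are merged downstream.
```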
Volunteer Computing
Enabling users' laptops and desktops to run ATLAS simulation [9]: http://atlasathome.cern.ch/
[Plots: number of running jobs vs. time and number of users/hosts vs. time (scales of ~4k and ~16k), spanning April 2014 to March 2015.]
Event Service
Efficient utilization of opportunistic resources implies short payloads (get off the resource quickly if the owner needs it back).
We developed a system to deliver payloads as short as a single event: the Event Service [10]. It will be commissioned during 2015.
Event Service Schematic
[Diagram: event requester, event dispatcher with event-level bookkeeping, event data service with asynchronous data cache, parallel payload event loop, output stager and merge into an object store.]
A fine-grained dispatcher intelligently manages requests every few minutes per node; assigned events are efficiently fetched, locally or over the WAN, and buffered asynchronously; they are processed free of fetch latency; outputs are uploaded in near real time and merged on job completion.
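The flow in the schematic can be mimicked with a small sketch (hypothetical, not the real Event Service): a dispatcher hands out small event ranges, workers process them, and each fine-grained output is shipped to an object store as soon as it is ready, so little work is lost if the node is reclaimed.

```python
# Hedged sketch of event-level dispatching with near-real-time output upload.
import queue

def dispatcher(total_events, range_size=50):
    q = queue.Queue()
    for first in range(0, total_events, range_size):
        q.put((first, min(first + range_size, total_events)))  # event range
    return q

def upload_to_object_store(key, payload):
    pass  # stand-in for an object-store PUT, done as each range completes

def worker(worker_id, q):
    while True:
        try:
            first, last = q.get_nowait()      # request the next event range
        except queue.Empty:
            return
        output = [f"processed event {e}" for e in range(first, last)]
        upload_to_object_store(f"worker{worker_id}_{first}_{last}", output)

q = dispatcher(total_events=1000)
for wid in range(4):
    worker(wid, q)   # in reality these run in parallel on (opportunistic) nodes
```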
New data lifecycle management model, a.k.a. "you can get unpledged CPU but not so much unpledged disk"
Dynamic Data Replication and Reduction
[Diagram: data popularity driving dynamic replication and dynamic reduction, with cached vs. pinned space.]
18 months ago…
[Plot: disk occupancy at T1s vs. time, split into primary (pinned), default (pinned) and dynamically managed space.]
23 PB on disk had been created in the previous 3 months and never accessed; 8 PB of data on disk had never been touched.
T1 dynamically managed space (green in the plot) was unacceptably small, compromising our strategy of dynamic replication and cleaning of popular/unpopular data.
A large fraction of primary space was occupied by old and unused data.
The new data lifecycle model
Every dataset has a lifetime, set at creation.
The lifetime can be infinite (e.g. RAW data).
The lifetime can be extended if the dataset is accessed.
Datasets with expired lifetime can disappear from disk and tape at any time.
ATLAS Distributed Computing flexibly manages data replication and reduction within the boundaries of lifetime and retention (see the sketch after this list):
Increase or reduce the number of copies based on data popularity.
Redistribute data between T1s and T2s.
Move data to tape and free up disk space.
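A minimal sketch of the lifetime rules above (an assumption-laden illustration, not the production policy engine): a lifetime set at creation, possibly infinite, extended on access, and a check for whether a dataset is eligible for deletion.

```python
# Hedged sketch of dataset lifetimes in the lifecycle model.
from datetime import datetime, timedelta

class Dataset:
    def __init__(self, name, lifetime_days=None):
        self.name = name
        self.created = datetime.utcnow()
        # None means an infinite lifetime (e.g. RAW data never expires).
        self.expires = (self.created + timedelta(days=lifetime_days)
                        if lifetime_days is not None else None)

    def touch(self, extension_days=90):
        """Accessing a dataset extends its lifetime (hypothetical policy)."""
        if self.expires is not None:
            self.expires = max(self.expires,
                               datetime.utcnow() + timedelta(days=extension_days))

    def is_expired(self):
        """Expired datasets may disappear from disk and tape at any time."""
        return self.expires is not None and datetime.utcnow() > self.expires

raw = Dataset("data15_13TeV.RAW", lifetime_days=None)      # infinite lifetime
daod = Dataset("mc15_13TeV.DAOD_TOP", lifetime_days=180)   # finite lifetime
daod.touch()                                                # popular data survive
print(raw.is_expired(), daod.is_expired())
```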
Implications of the model
We will use more tape; access to tape remains "centralized".
For the first time we will "delete" from tape: in the steady state we will delete approximately as much as we write.
Access through storage back doors is not accounted for today. We will improve this, but most people use the official tools (PanDA/Rucio).
After the first (partial) run of the model…
[Plots: T1 tape occupancy vs. time; number of dataset accesses; T1 disk occupancy vs. time, split into pinned and cached space.]
Only 1.2 PB of disk-resident data older than one year has never been accessed; it was 8 PB before.
Conclusions
A lot of hard work went into preparing ATLAS Software and Computing for Run 2: a balanced mixture of evolution and revolution.
Commissioning of the new systems was carried out in a non-disruptive manner.
Our systems are ready for the new challenges, and we have not yet explored many of their new capabilities.
References to relevant ATLAS contributions
[1] CHEP ID 205 - The ATLAS Data Management system, Rucio: commissioning, migration and operational experiences (Vincent Garonne)
[2] CHEP ID 100 - Scaling up ATLAS production system for the LHC Run 2 and beyond: project ProdSys2 (Alexei Klimentov)
[3] CHEP ID 165 - Running ATLAS workloads within massively parallel distributed applications using the Athena Multi-Process framework (AthenaMP) (Vakhtang Tsulaia)
[4] CHEP ID 147 - Preparing ATLAS Reconstruction for LHC Run 2 (Jovan Mitrevski)
[5] CHEP ID 164 - New Petabyte-scale Data Derivation Framework for ATLAS (James Catmore)
[6] CHEP ID 146 - Evolution of Cloud Computing in ATLAS (Ryan Taylor)
References to relevant ATLAS contributions (continued)
[7] CHEP ID 169 - Design, Results, Evolution and Status of the ATLAS simulation in Point1 project (Franco Brasolin)
[8] CHEP ID 92 - ATLAS computing on the HPC Piz Daint machine (Michael Arthur Hostettler)
[8] CHEP ID 153 - Bringing ATLAS production to HPC resources - A use case with the Hydra supercomputer of the Max Planck Society (Luca Mazzaferro)
[8] CHEP ID 152 - Integration of the PanDA workload management system with the Titan supercomputer at OLCF (Sergey Panitkin)
[8] CHEP ID 140 - Fine grained event processing on HPCs with the ATLAS Yoda system (Vakhtang Tsulaia)
[9] CHEP ID 170 - ATLAS@Home: Harnessing Volunteer Computing for HEP (David Cameron)
[10] CHEP ID 183 - The ATLAS Event Service: A new approach to event processing (Torre Wenaus)