ATLAS Distributed Computing in LHC Run2


1 ATLAS Distributed Computing in LHC Run2
Simone Campana – CERN, on behalf of the ATLAS collaboration. CHEP2015, 14/04/2015

2 The Run-2 Challenge
Trigger rate: ~400 Hz (Run 1) vs ~1 kHz (Run 2); pile-up: ~20 (Run 1) vs ~40 (Run 2); a new detector (e.g. tracking, calorimeters).
Resources constrained by a "flat budget" (no increase in funding for computing).

3 How to face the Run-2 challenge
New ATLAS distributed computing systems: Rucio for data management, Prodsys-2 for workload management, FAX and the Event Service to optimize resource usage.
More efficient utilization of resources: improvements in simulation and reconstruction to limit resource consumption (e.g. memory sharing in multi-core) and to optimize workflows (Derivation Framework / Analysis Model).
Leveraging opportunistic resources in addition to pledged ones: Grid, Cloud, HPC, volunteer computing.
A new data lifecycle management model.

4 New ATLAS distributed computing systems

5 Distributed Data Management: Rucio
The new ATLAS data management system, Rucio[1], has been in production since 1 December 2014. Rucio is a sophisticated system, offering more features than its predecessor.
Already at this early stage it delivers performance equivalent to the previous DDM system in the core functionalities: transfers of ~1M files/day and ~2 PB/week, and deletion rates of ~5M files/day (matching DQ2).
Most of Rucio's potential, still unexplored, will be leveraged in production during Run 2.

6 Remote data access: the Xrootd ATLAS Federation (FAX)
We deployed a federated storage infrastructure: all data accessible from any location. Goal reached: ~100% of the data is covered.
It increases resiliency against storage failures (FAILOVER) and lets jobs run at sites without the data but with free CPUs (OVERFLOW, up to 10% of jobs). A failover sketch follows below.
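The failover idea can be illustrated with a minimal sketch: try the local storage first, then fall back to the federation redirector over the WAN. This assumes PyROOT with an XRootD-enabled ROOT build; the storage prefixes and the file name are hypothetical placeholders, not ATLAS production values.

```python
# Illustrative sketch of FAX failover, not ATLAS production code.
import ROOT  # PyROOT, assumed available with XRootD support

LOCAL_PREFIX = "root://local-se.example.org//atlas/rucio/"          # hypothetical local SE
FAX_REDIRECTOR = "root://fax-redirector.example.org//atlas/rucio/"  # hypothetical redirector

def open_with_failover(lfn):
    """Open a file from local storage if possible, otherwise through the federation."""
    for prefix in (LOCAL_PREFIX, FAX_REDIRECTOR):
        f = ROOT.TFile.Open(prefix + lfn)
        if f and not f.IsZombie():
            return f  # success: local access, or WAN access via FAX
    raise IOError("file %s not reachable locally or via the federation" % lfn)

# Hypothetical file name, for illustration only:
events_file = open_with_failover("mc15_13TeV/AOD.example._000001.pool.root.1")
```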

7 Remote data access: FAX
[Plots: FAX site reliability; FAX failover rate in jobs/day, with ~1000 jobs/day recovered (~1% of failures); FAX overflow CPU/wall-clock-time efficiency, local 83% vs FAX 76%; FAX overflow job efficiency, local 84% vs FAX 43%.]

8 Distributed Production and Analysis
We developed a new service for simulated and detector data processing: Prodsys-2[2]. Its core components are:
Request I/F: allows production managers to define a request
DEFT: translates the user request into task definitions
JEDI: generates the job definitions
PanDA: executes the jobs on the distributed infrastructure
JEDI + PanDA will also provide the new framework for distributed analysis. A conceptual sketch of the request-to-job chain follows below.
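As a rough illustration of the chain just described (request → DEFT tasks → JEDI job definitions → PanDA execution), here is a small conceptual sketch. It is not ProdSys-2 code; the class names, fields and the 1000-events-per-job figure are invented for illustration.

```python
# Conceptual sketch of the request -> task -> job chain; NOT ProdSys-2 code.
from dataclasses import dataclass
from typing import List

@dataclass
class Request:            # defined by a production manager via the Request I/F
    campaign: str
    steps: List[str]      # e.g. ["EVNT", "SIMUL", "RECO"]
    n_events: int

@dataclass
class Task:               # what DEFT produces from a request
    step: str
    n_events: int

@dataclass
class Job:                # what JEDI carves out of a task, sized to the resources
    step: str
    first_event: int
    n_events: int

def deft(request: Request) -> List[Task]:
    """Translate a request into one task per processing step."""
    return [Task(step=s, n_events=request.n_events) for s in request.steps]

def jedi(task: Task, events_per_job: int = 1000) -> List[Job]:
    """Split a task into job definitions; PanDA then brokers and runs them."""
    return [Job(task.step, first, min(events_per_job, task.n_events - first))
            for first in range(0, task.n_events, events_per_job)]

jobs = [j for t in deft(Request("mc15", ["EVNT", "SIMUL", "RECO"], 10000)) for j in jedi(t)]
```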

9 Prodsys-2 has been in production since 1 December 2014; JEDI has been in use for analysis since 8 August 2014
Prodsys-2 and JEDI offer a very large set of improvements: built-in file merging capability, dynamic job definition optimizing resource scheduling, automated recovery of lost data, an advanced task management interface, and new monitoring.
[Plots: number of cores of running jobs vs time, up to ~150k, through the transitions Prodsys-1 + PanDA → Prodsys-1 + JEDI/PanDA → Prodsys-2 + JEDI/PanDA (May 2014 – May 2015); completed analysis jobs vs time, with the migration to JEDI between 01/07/14 and 30/08/14.]

10 More efficient utilization of resources

11 Simulation
Simulation is CPU intensive.
Integrated Simulation Framework: mixing of full GEANT4 and fast simulation within a single event. Work in progress, target is 2016 (see the sketch below).
The payoff: more events per 12-hour job, larger output files, fewer transfers and less merging, less I/O; or, alternatively, shorter and more granular jobs for opportunistic resources.
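To make the "mixing within an event" idea concrete, here is a purely illustrative sketch: each particle in an event is routed either to a full (Geant4-like) simulator or to a fast one. The routing policy, the simulator stand-ins and all thresholds are invented; this is not the ATLAS Integrated Simulation Framework.

```python
# Illustrative sketch of per-particle routing between full and fast simulation.

def simulate_event(particles, full_sim, fast_sim, needs_full_sim):
    """Simulate one event, mixing full and fast simulation per particle."""
    hits = []
    for p in particles:
        simulator = full_sim if needs_full_sim(p) else fast_sim
        hits.extend(simulator(p))
    return hits

# Hypothetical routing policy: full simulation only for central, high-energy particles.
route = lambda p: abs(p["eta"]) < 2.5 and p["energy_gev"] > 10.0
full_sim = lambda p: [("full", p["energy_gev"])]   # stand-in for the Geant4 path
fast_sim = lambda p: [("fast", p["energy_gev"])]   # stand-in for a parametrized simulator

particles = [{"eta": 0.5, "energy_gev": 50.0}, {"eta": 3.1, "energy_gev": 5.0}]
hits = simulate_event(particles, full_sim, fast_sim, route)
```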

12 Reconstruction
Reconstruction is memory hungry and requires non-negligible CPU (~40% of simulation's, ~20% of total ATLAS CPU usage).
AthenaMP[3]: multi-processing reduces the memory footprint (Athena memory profile, serial vs MP, relative to the 2 GB/core mark). A sketch of the fork-and-share idea follows below.
Code and algorithm optimization largely reduced the CPU needed for reconstruction[4] (reconstruction time per event, single-core and multi-core).
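The memory saving comes from forking workers after initialization, so large read-only structures are shared copy-on-write rather than duplicated per core. Below is a minimal generic Python sketch of that pattern, not AthenaMP itself; the data sizes and worker count are placeholders.

```python
# Minimal sketch of the fork-and-share idea behind AthenaMP (illustrative only).
import multiprocessing as mp

GEOMETRY = None  # stands in for detector geometry / conditions data

def initialize():
    """Load the big read-only structures once, before forking."""
    global GEOMETRY
    GEOMETRY = list(range(10_000_000))  # placeholder for hundreds of MB of data

def reconstruct(event_id):
    """Worker: uses GEOMETRY inherited from the parent via copy-on-write."""
    return event_id, sum(GEOMETRY[:100])  # trivial stand-in for reconstruction

if __name__ == "__main__":
    initialize()
    ctx = mp.get_context("fork")         # fork start method: share pages with the parent
    with ctx.Pool(processes=4) as pool:  # e.g. 4 workers on a 4-core slot
        results = pool.map(reconstruct, range(100))
```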

13 Analysis Model
Common analysis data format, the xAOD: a replacement for the AOD and for group ntuples of any kind, readable both by Athena and by ROOT.
Data reduction framework[5]: Athena produces group-derived data samples (DxAOD), centrally via Prodsys, based on a train model (one input, N outputs), going from PB to TB. A sketch of the train model follows below.
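The train model means a single pass over one input sample feeds N derivation "carriages", each applying its own selection and writing its own, much smaller, output. The sketch below is illustrative only; the selections and output names are hypothetical and this is not the ATLAS derivation framework code.

```python
# Illustrative sketch of the derivation "train": one input read, N skimmed outputs.

def run_train(events, derivations):
    """derivations: dict mapping output name -> selection function."""
    outputs = {name: [] for name in derivations}
    for event in events:                      # one read of the input ...
        for name, selection in derivations.items():
            if selection(event):              # ... N independent skims
                outputs[name].append(event)
    return outputs

derivations = {
    "DxAOD_HIGGS": lambda e: e.get("n_leptons", 0) >= 2,    # hypothetical selection
    "DxAOD_EXOTICS": lambda e: e.get("met_gev", 0) > 200,   # hypothetical selection
}
skims = run_train([{"n_leptons": 2, "met_gev": 50}], derivations)
```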

14 Leveraging opportunistic resources
At peak rate, almost 50% of ATLAS production relies on opportunistic resources (number of cores of ATLAS running jobs reaching ~200k against a pledge of ~100k, May 2014 – March 2015).
Efficient utilization of the largest possible variety of opportunistic resources is vital for ATLAS.
Enabling the utilization of non-Grid resources is a long-term investment, beyond their opportunistic use.

15 (Opportunistic) Cloud Resources
We invested a lot of effort in enabling the usage of Cloud resources[6]. The ATLAS HLT farm at the CERN ATLAS pit (P1), for example, was instrumented with a Cloud interface in order to run simulation (events vs time, 07/09/14 – 04/10/14: ~20M events/day summed over T1s and T2s, with CERN P1 contributing approximately 5%).
The HLT farm was also dynamically reconfigured to run reconstruction on multi-core resources. We expect to be able to do the same with other clouds.

16 HPCs
High-Performance Computers were designed for massively parallel applications, different from the HEP use case, but we can parasitically benefit from empty cycles that others cannot use (e.g. single-core job slots). The ATLAS production system has been extended to leverage HPC resources[8].
EVNT, SIMUL and RECO jobs at MPPMU, LRZ and CSCS: an average of 1,700 running jobs (08/09/14 – 05/10/14).
A 24h test at the Oak Ridge Titan system (the #2 HPC machine in the world, 299,008 cores): ATLAS event generation consumed 200,000 CPU hours on 90k parallel cores (equivalent to 70% of our Grid resources); Sherpa generation used nodes with 8 threads per node, i.e. 97,952 parallel Sherpa processes.
The goal is to validate as many workflows as possible. Today approximately 5% of ATLAS production runs on HPCs.

17 Volunteer Computing
Enabling users' laptops and desktops to run ATLAS simulation[9] (running jobs vs time, reaching ~4k; users/hosts vs time, reaching ~16k; April 2014 – March 2015).

18 Event Service
Efficient utilization of opportunistic resources implies short payloads: get off the resource quickly if its owner needs it back. We developed a system to deliver payloads as short as a single event: the Event Service[10]. The Event Service will be commissioned during 2015.

19 Event Service Schematic
[Schematic] A fine-grained dispatcher intelligently manages event-level bookkeeping and serves event requests from each node every few minutes. The assigned events are efficiently fetched, locally or over the WAN, from the data repositories and buffered asynchronously in a data cache, so the parallel payload runs its event loop free of fetch latency. Worker outputs are uploaded in near real time to an object store by the output stager and merged on job completion. A conceptual sketch follows below.
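The sketch below captures only the core flow of the schematic: a dispatcher hands out small ranges of event IDs, workers process them and upload each output immediately, so work lost on eviction is at most a few events. It is purely illustrative and not the ATLAS Event Service implementation.

```python
# Conceptual sketch of the Event Service flow (illustrative only).
import queue
import threading

def dispatcher(event_ids, work_q, chunk=10):
    """Fine-grained dispatcher: enqueue small event ranges for the workers."""
    for i in range(0, len(event_ids), chunk):
        work_q.put(event_ids[i:i + chunk])

def worker(work_q, object_store):
    """Fetch a small range, process it, upload the output right away."""
    while True:
        try:
            events = work_q.get_nowait()
        except queue.Empty:
            return                           # nothing left: release the resource quickly
        output = [e * 2 for e in events]     # stand-in for the real payload
        object_store.append(output)          # "upload" in ~real time

work_q, object_store = queue.Queue(), []
dispatcher(list(range(100)), work_q)
threads = [threading.Thread(target=worker, args=(work_q, object_store)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Final merge on job completion, as in the schematic:
merged = [e for chunk in object_store for e in chunk]
```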

20 New data lifecycle management model
a.k.a. "you can get unpledged CPU but not so much unpledged disk"

21 Dynamic Data Replication and Reduction
[Diagram: data popularity feeding dynamic replication and dynamic reduction, distinguishing cached from pinned copies.]

22 8 PB of data on disk never touched
The situation 18 months ago (disk occupancy at T1s vs time, split into primary/pinned, default/pinned and dynamically managed space): 23 PB on disk created in the previous 3 months and never accessed, and 8 PB of data on disk never touched at all.
The T1 dynamically managed space (green) was unacceptably small, which compromised our strategy of dynamic replication and cleaning of popular/unpopular data. A large fraction of the primary space was occupied by old and unused data.

23 The new data lifecycle model
Every dataset has a lifetime set at creation. The lifetime can be infinite (e.g. for RAW data) and can be extended if the dataset is accessed. Datasets with an expired lifetime can disappear at any time from disk and tape.
Within the boundaries of lifetime and retention, ATLAS Distributed Computing flexibly manages data replication and reduction: increasing or reducing the number of copies based on data popularity, redistributing data to T2s rather than T1s and vice versa, and moving data to tape to free up disk space. A sketch of the lifetime logic follows below.
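The lifetime rules above can be summarized in a minimal sketch. This is illustrative only; the real policy lives in Rucio and ADC operations, and the dataset names, lifetime and extension values here are hypothetical.

```python
# Minimal sketch of the dataset lifetime model (illustrative only).
from datetime import datetime, timedelta

class Dataset:
    def __init__(self, name, lifetime_days=None):
        self.name = name
        self.created = datetime.utcnow()
        # lifetime_days=None means an infinite lifetime (e.g. RAW data)
        self.expires = (self.created + timedelta(days=lifetime_days)
                        if lifetime_days is not None else None)

    def touch(self, extension_days=90):
        """Accessing the dataset extends its lifetime (extension value is hypothetical)."""
        if self.expires is not None:
            self.expires = max(self.expires,
                               datetime.utcnow() + timedelta(days=extension_days))

    def expired(self):
        """Expired datasets may be removed from disk and tape at any time."""
        return self.expires is not None and datetime.utcnow() > self.expires

raw = Dataset("data15_13TeV.RAW")                        # infinite lifetime
dxaod = Dataset("mc15.DxAOD_HIGGS", lifetime_days=180)   # hypothetical finite lifetime
dxaod.touch()                                            # a user access pushes the expiry out
deletion_candidates = [d for d in (raw, dxaod) if d.expired()]
```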

24 Implications of the model
We will use more tape, and access to tape remains "centralized". For the first time we will "delete" from tape: in the steady state we will delete approximately as much as we write.
Access through storage back doors is not accounted for today. We will improve this, but most people use the official tools (PanDA/Rucio).

25 After the first (partial) run of the model …
[Plots: T1 tape occupancy vs time; number of dataset accesses; T1 disk occupancy vs time, split into pinned and cached.] The volume of data never accessed and older than 1 year is down to 1.2 PB; it was 8 PB before.

26 Conclusions
A lot of hard work went into preparing the ATLAS Software and Computing for Run 2: a balanced mixture of evolution and revolution. The commissioning of the new systems was carried out in a non-disruptive manner.
Our systems are ready for the new challenges, and we have not yet explored many of their new capabilities.

27 References to relevant ATLAS contributions
[1] CHEP ID, The ATLAS Data Management system - Rucio: commissioning, migration and operational experiences (Vincent Garonne)
[2] CHEP ID, Scaling up ATLAS production system for the LHC Run 2 and beyond: project ProdSys2 (Alexei Klimentov)
[3] CHEP ID, Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP) (Vakhtang Tsulaia)
[4] CHEP ID, Preparing ATLAS Reconstruction for LHC Run 2 (Jovan Mitrevski)
[5] CHEP ID, New Petabyte-scale Data Derivation Framework for ATLAS (James Catmore)
[6] CHEP ID, Evolution of Cloud Computing in ATLAS (Ryan Taylor)

28 References to relevant ATLAS contributions
[7] CHEP ID, Design, Results, Evolution and Status of the ATLAS simulation in Point1 project (Franco Brasolin)
[8] CHEP ID 92, ATLAS computing on the HPC Piz Daint machine (Michael Arthur Hostettler)
[8] CHEP ID, Bringing ATLAS production to HPC resources - A use case with the Hydra supercomputer of the Max Planck Society (Luca Mazzaferro)
[8] CHEP ID, Integration of PanDA workload management system with Titan supercomputer at OLCF (Sergey Panitkin)
[8] CHEP ID, Fine grained event processing on HPCs with the ATLAS Yoda system (Vakhtang Tsulaia)
[9] CHEP ID, Harnessing Volunteer Computing for HEP (David Cameron)
[10] CHEP ID, The ATLAS Event Service: A new approach to event processing (Torre Wenaus)

