Run-2 Computing Model
Simone Campana, CERN-IT
ATLAS Computing Jamboree, 3 December 2014
An asterisk (*) in this presentation indicates that the item will be covered in detail in a jamboree session.

Intro: the challenges of Run-2
LHC operation (Run-1 values in parentheses):
- Trigger rate 1 kHz (~400 Hz)
- Pile-up above 30 (~20)
- 25 ns bunch spacing (50 ns)
- Centre-of-mass energy roughly doubled
- Different detector
Constraints of a 'flat budget':
- Limited increase of resources
- And we still have the data from Run-1

How to face the challenge
New ATLAS distributed computing systems:
- Rucio for Data Management
- Prodsys-2 for Workload Management
- FAX and the Event Service to optimize resource usage
More efficient utilization of resources:
- More flexibility in the computing model (clouds/tiers)
- Limit avoidable resource consumption (multicore)
- Optimize workflows (Derivation Framework / Analysis Model)
Leveraging opportunistic resources:
- Grid, Cloud, HPC
New data lifecycle management model

New ATLAS distributed computing systems

Distributed Data Management: Rucio
Rucio builds on the data management lessons learned in Run-1:
- Space optimization and fragmentation of space tokens: Rucio implements multiple ownership of files and logical quotas, so we will be able to gradually eliminate many space tokens (*)
- Integration of "new" technologies and protocols: with Rucio we can use protocols other than SRM for data transfers and storage management (*)
- Better support for metadata, introduced gradually over the next months
Rucio has been in production since Monday, Dec. 1st; new Rucio features will be leveraged once we are comfortable with the core functionality. (A sketch of the ownership/quota idea follows below.)
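To illustrate the key point, replacing per-activity space tokens with per-account ownership, quotas and replication rules, here is a minimal sketch. This is not the Rucio API: `Account`, `ReplicationRule` and the quota check are hypothetical names used only to show the idea.

```python
# Illustrative sketch only, not the Rucio API. It models the Run-2 idea that
# space is governed by per-account quotas and replication rules rather than
# by a zoo of space tokens.
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    quota_tb: float          # logical quota on a storage endpoint (RSE)
    used_tb: float = 0.0

@dataclass
class ReplicationRule:
    account: str             # the owner of the rule (user, group, 'prod', ...)
    dataset: str
    rse: str                 # storage endpoint, e.g. 'SOMESITE_DATADISK'
    size_tb: float
    lifetime_days: int | None = None   # None = pinned indefinitely

def add_rule(rule: ReplicationRule, accounts: dict[str, Account]) -> bool:
    """Accept a rule only if the owning account stays within its quota."""
    acc = accounts[rule.account]
    if acc.used_tb + rule.size_tb > acc.quota_tb:
        return False                       # quota exceeded: rule refused
    acc.used_tb += rule.size_tb            # space is accounted to the owner,
    return True                            # not to a dedicated space token

# Example: two owners share the same DATADISK endpoint.
accounts = {"higgs-group": Account("higgs-group", quota_tb=50.0),
            "jdoe":        Account("jdoe",        quota_tb=2.0)}
ok = add_rule(ReplicationRule("higgs-group", "data15.AOD.x", "SOMESITE_DATADISK",
                              size_tb=10.0, lifetime_days=180), accounts)
print(ok, accounts["higgs-group"].used_tb)   # True 10.0
```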

Remote data access: FAX
Goal reached: more than 96% of data covered.
We deployed a federated storage infrastructure (*): all data is accessible from any location. Analysis (and production) will be able to access remote (off-site) files: jobs can run at sites without the data but with free CPUs. We call this "overflow". (A minimal brokering sketch follows below.)
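To make the "overflow" idea concrete, here is a toy brokering decision: prefer sites that hold the input data, and if none has free slots, send the job to a federated site that can read the files remotely through FAX. Site names and attributes are hypothetical; this is not the real PanDA brokerage.

```python
# Toy brokering sketch for FAX "overflow". Hypothetical names only.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_slots: int
    has_input: bool          # does the site hold the job's input dataset?
    federated: bool          # can it read remote files through FAX?

def choose_site(sites: list[Site], overflow_allowed: bool = True) -> Site | None:
    """Prefer sites with local data; otherwise overflow to a federated site."""
    with_data = [s for s in sites if s.has_input and s.free_slots > 0]
    if with_data:
        return max(with_data, key=lambda s: s.free_slots)
    if overflow_allowed:
        remote = [s for s in sites if s.federated and s.free_slots > 0]
        if remote:
            return max(remote, key=lambda s: s.free_slots)   # input read via FAX
    return None   # otherwise the job waits in the queue

sites = [Site("SITE_A", free_slots=0,   has_input=True,  federated=True),
         Site("SITE_B", free_slots=500, has_input=False, federated=True)]
print(choose_site(sites).name)   # SITE_B: idle CPUs, data read remotely
```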

Workload Management: Prodsys-2
Prodsys-2 relies on the JEDI/PanDA core:
- Same engine for analysis and production
- Allows resource scheduling to be optimized (*): MCORE vs SCORE, analysis vs production, high- vs low-memory
- Minimizes data traffic: merging at T2s
- New monitoring system
- Integrated with Rucio
Prodsys-2 has been in production since Monday, Dec. 1st; JEDI has already been in production since the summer. (A task-to-queue matching sketch follows below.)
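A minimal sketch of the kind of matching implied by the bullet above, between task requirements (cores, memory, activity) and site queues. The attribute names are hypothetical, chosen only to mirror the MCORE/SCORE and high/low-memory split on the slide; this is not JEDI code.

```python
# Illustrative task-to-queue matching along the dimensions named on the slide.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cores: int            # 1 = SCORE, 8 = MCORE
    memory_mb: int
    activity: str         # "production" or "analysis"

@dataclass
class Queue:
    site: str
    cores: int
    max_memory_mb: int
    activities: tuple[str, ...]

def match(task: Task, queues: list[Queue]) -> list[Queue]:
    """Return the queues where this task can be scheduled."""
    return [q for q in queues
            if q.cores == task.cores
            and q.max_memory_mb >= task.memory_mb
            and task.activity in q.activities]

queues = [Queue("SITE_A", cores=8, max_memory_mb=16000, activities=("production",)),
          Queue("SITE_B", cores=1, max_memory_mb=4000,  activities=("analysis", "production"))]
reco = Task("reco_mcore", cores=8, memory_mb=14000, activity="production")
print([q.site for q in match(reco, queues)])   # ['SITE_A']
```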

Event Service (*)
Efficient utilization of opportunistic resources implies short payloads (get out of the resource quickly if the owner needs it back). We developed a system to deliver payloads as short as a single event: the Event Service. It is based on core components such as:
- The new 'JEDI' extension to PanDA, which allows it to manage fine-grained workloads
- The new parallel framework AthenaMP, which brings multi/many-core concurrency to ATLAS processing and can manage independent streams of events in parallel
- Newly available object stores, which provide highly scalable cloud storage for small, event-scale outputs
Usage will surely be extended beyond opportunistic resources. (A sketch of the dispatch loop follows below.)
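The mechanics described above, a dispatcher handing out small event ranges and workers shipping each finished output straight to an object store, can be sketched as follows. The function names, including `upload_to_object_store`, are hypothetical stand-ins; the point is that a preempted worker loses at most the range it was working on.

```python
# Sketch of the Event Service idea: fine-grained event ranges, each output
# pushed to an object store as soon as the range finishes. Hypothetical names.
from collections import deque

def upload_to_object_store(key: str, payload: bytes) -> None:
    """Stand-in for an object-store PUT."""
    print(f"uploaded {key} ({len(payload)} bytes)")

def dispatcher(total_events: int, range_size: int) -> deque:
    """Split a job into small event ranges that can be handed out one by one."""
    return deque((start, min(start + range_size, total_events))
                 for start in range(0, total_events, range_size))

def worker(ranges: deque, preempt_after: int | None = None) -> int:
    """Process ranges until none are left or the opportunistic slot is reclaimed."""
    done = 0
    while ranges:
        if preempt_after is not None and done >= preempt_after:
            break                      # slot reclaimed: unprocessed ranges stay queued
        first, last = ranges.popleft()
        output = f"events {first}-{last}".encode()   # stand-in for real processing
        upload_to_object_store(f"out/range_{first}_{last}", output)
        done += 1
    return done

ranges = dispatcher(total_events=1000, range_size=10)    # 100 ranges of 10 events
worker(ranges, preempt_after=7)   # preempted early: only the remaining ranges are re-queued
print(len(ranges), "ranges left for another worker")
```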

More efficient utilization of resources

Flexible utilization of resources (*)
The Run-1 model rigidly defines the T0/T1/T2 roles. We need more flexibility. Examples:
- Different kinds of jobs (e.g. reconstruction) can run at various sites regardless of tier level
- Custodial copies of data can be hosted at various sites regardless of tier level
- T0 will be able to spill over into the Grid in case of a resource shortage at CERN
- We will use AthenaMP for production and Athena/ROOT for analysis: we need the flexibility to use both multi- and single-core resources
Some T2s are equivalent to T1s in terms of disk storage and CPU power. In general, sites today are connected by fast and reliable networks: let's use them.

Running AthenaMP on the grid
Scheduling multicore resources (*) is not an easy task: statically allocating resources for multicore jobs is not what sites want. To limit inefficiencies, dynamic allocation needs:
- a steady flow of long multicore jobs
- a steady flow of short single-core jobs
The target would be most of the production on multicore and analysis on single core. We need to work in this direction and get all sites on board. We (naively?) expect no loss of resources because we allocate them in 8-core slots, yet today we get 30%+ more resources in SCORE than in MCORE. (A toy allocation sketch follows below.)
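A toy model of the dynamic-allocation argument: a site drains 8 single-core slots to build one MCORE slot, and the cores idling while the longest remaining single-core job finishes are the inefficiency that a steady flow of short single-core jobs keeps small. Numbers and names are illustrative only.

```python
# Toy model of draining 8 single-core slots into one 8-core (MCORE) slot.
# The wasted core-hours are the cores that sit idle while the longest
# remaining single-core job finishes. Illustrative only.

def drain_cost(remaining_hours: list[float]) -> float:
    """Core-hours lost while draining a node whose single-core jobs still
    have `remaining_hours` left to run (one entry per busy core)."""
    finish = max(remaining_hours)                      # node is free only when the last job ends
    return sum(finish - r for r in remaining_hours)    # idle time accumulated by the others

# Long single-core jobs still running: draining is expensive.
print(drain_cost([10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0]))   # 28.0 core-hours lost

# Short single-core jobs (a steady flow of them): draining is cheap.
print(drain_cost([1.0, 0.8, 0.9, 0.5, 0.2, 0.7, 0.3, 0.6]))    # ~3 core-hours lost
```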

Analysis Model
Common analysis data format: xAOD
- Replacement of the AOD and of group ntuples of any kind
- Readable both by Athena and ROOT
Data reduction framework
- Athena produces group-derived data samples (DxAOD), centrally via Prodsys
- Based on the train model: one input, N outputs, from PB to TB
(A minimal train-model sketch follows below.)
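The "train" in the data reduction framework means a single read of the input xAOD feeding many derivation selections, each writing its own DxAOD. A minimal sketch, with hypothetical skim predicates and event records standing in for the real derivation code:

```python
# Minimal train-model sketch: one pass over the input, N outputs.
# The skim predicates and event dicts are hypothetical stand-ins.
from typing import Callable, Iterable

def run_train(events: Iterable[dict],
              derivations: dict[str, Callable[[dict], bool]]) -> dict[str, list[dict]]:
    """Read the input once and route each event to every derivation whose
    skim selection accepts it (each output list is a would-be DxAOD)."""
    outputs: dict[str, list[dict]] = {name: [] for name in derivations}
    for event in events:                       # single read of the xAOD
        for name, keep in derivations.items():
            if keep(event):
                outputs[name].append(event)    # in reality: slimmed and written to file
    return outputs

events = [{"n_muons": 2, "met": 30.0}, {"n_muons": 0, "met": 250.0}]
trains = {"DxAOD_MUON": lambda e: e["n_muons"] >= 2,
          "DxAOD_EXOT": lambda e: e["met"] > 200.0}
out = run_train(events, trains)
print({k: len(v) for k, v in out.items()})   # {'DxAOD_MUON': 1, 'DxAOD_EXOT': 1}
```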

[Diagram: derivation and analysis data flow. The Derivation Framework (Prodsys-2) runs over xAOD on DATADISK/DATATAPE and produces ~100 DxAOD derivations, each about 1% of the xAOD size. Group analysis and user analysis read the DxAODs via PanDA/JEDI, with GROUPDISK at 5% of today's size and LOCALGROUPDISK at the same size as today.]

Leveraging opportunistic resources

(Opportunistic) Cloud Resources
We invested a lot of effort in enabling the usage of cloud resources. The HLT farm, for example, has been instrumented with a cloud interface in order to run simulation (Sim@P1): about 20M events/day, with CERN-P1 contributing approximately 5% of the 4-day sum. The HLT farm was also dynamically reconfigured to run reconstruction on multicore resources (Reco@P1?). We expect to be able to do the same with other clouds.

HPCs
High Performance Computers were designed for massively parallel applications (different from the HEP use case), but we can parasitically benefit from empty cycles that others cannot use (e.g. single-core job slots). The ATLAS production system has been extended to leverage HPC resources:
- 24h test on the Oak Ridge Titan system (#2 HPC machine in the world, 299,008 cores): ATLAS event generation used 200,000 CPU hours on 90K parallel cores (the equivalent of 70% of our Grid resources)
- EVNT, SIMUL and RECO jobs at MPPMU, LRZ and CSCS, with on average 1,700 running jobs
- Mira@ARGONNE: Sherpa generation using 12,244 nodes with 8 threads per node, i.e. 97,952 parallel Sherpa processes
The goal is to validate as many workflows as possible.

New data lifecycle management model

Space management crisis
[Plot: disk occupancy at T1s, broken down into Primary (pinned), Default (pinned) and dynamically managed space; 23 PB on disk was created in the last 3 months and never accessed.]
- 8 PB of data on disk has never been touched
- The T1 dynamically managed space (green in the plot) is unacceptably small; this compromises our strategy of dynamic replication and cleaning of popular/unpopular data
- A lot of the primary space is occupied by old and unused data

The new data lifecycle model
Every dataset will have a lifetime, set at creation:
- The lifetime can be infinite (e.g. RAW data)
- The lifetime can be extended, e.g. if the dataset has been accessed recently, or if there is a known exception
Every dataset will have a retention policy:
- E.g. RAW needs at least 2 copies on tape; at least one copy of the AODs is needed on tape
Lifetimes are being agreed with the ATLAS Computing Resources management. (A minimal sketch of the lifetime logic follows below.)
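A minimal sketch of the lifetime logic described above, assuming a dataset record with a creation time, a lifetime (possibly infinite), a last-access time and a retention policy expressed as a minimum number of tape copies. All names are hypothetical.

```python
# Sketch of per-dataset lifetime and retention logic. Hypothetical names.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Dataset:
    name: str
    created: datetime
    lifetime: timedelta | None        # None = infinite lifetime (e.g. RAW)
    last_accessed: datetime | None = None
    min_tape_copies: int = 0          # retention policy, e.g. 2 for RAW, 1 for AOD
    tape_copies: int = 0

def expired(ds: Dataset, now: datetime, extension: timedelta) -> bool:
    """A dataset expires when its lifetime has passed; recent access extends it."""
    if ds.lifetime is None:
        return False
    expiry = ds.created + ds.lifetime
    if ds.last_accessed is not None:
        expiry = max(expiry, ds.last_accessed + extension)
    return now > expiry

def deletable(ds: Dataset, now: datetime, extension: timedelta) -> bool:
    """Only expired datasets whose retention policy is already satisfied
    (enough tape copies) are candidates for deletion."""
    return expired(ds, now, extension) and ds.tape_copies >= ds.min_tape_copies

now = datetime(2014, 12, 3)
aod = Dataset("data12.AOD", created=datetime(2013, 1, 1),
              lifetime=timedelta(days=365), min_tape_copies=1, tape_copies=1)
print(deletable(aod, now, extension=timedelta(days=180)))   # True: old, unused, safe on tape
```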

Effect of the data lifecycle model
Datasets with an expired lifetime can disappear at any time from (data)disk and datatape (groupdisk and localgroupdisk are exempt). "Organized" expiration lists will be distributed to the groups.
ATLAS Distributed Computing will flexibly manage data replication and reduction within the boundaries of lifetime and retention, for example:
- Increase/reduce the number of copies based on data popularity (see the sketch below)
- Re-distribute data to T2s rather than T1s and vice versa
- Move data to tape and free up disk space
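One of the actions listed above, adjusting the number of disk copies to data popularity, could look like the following toy rule. The thresholds and the scaling are invented purely for illustration.

```python
# Toy popularity rule: scale the number of disk replicas with recent accesses,
# within fixed bounds. Thresholds are invented for illustration.
def target_replicas(accesses_last_90d: int, min_copies: int = 1, max_copies: int = 5) -> int:
    """More recent accesses -> more disk copies; unused data keeps the minimum."""
    if accesses_last_90d == 0:
        return min_copies             # candidate to shrink on disk or go to tape
    extra = accesses_last_90d // 100  # roughly one extra copy per ~100 accesses
    return min(max_copies, min_copies + 1 + extra)

for n in (0, 10, 250, 2000):
    print(n, "->", target_replicas(n))   # 0 -> 1, 10 -> 2, 250 -> 4, 2000 -> 5
```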

Cautious Implementation – First Dry Run
[Plots of data volume (TB) versus time in months, from the first dry run of the lifetime model, with the annotations:]
- T1 disk: delete 3 of 18 PB
- T1 tape: delete 10 of 26 PB
- T1 disk: delete 0.3 of 17 PB

Further Implications
We will use more tape (*), both in terms of volume and number of accesses.
- Access to tape remains "centralized", through PanDA + Rucio
- For the first time we will "delete" from tape; we should discuss how to do this efficiently
- In the steady state, we will delete approximately as much as we write, from both disk and tape. How do we do this efficiently?
- Access through storage backdoors is not accounted for today. We will improve this, but watch out for the deletion lists, and preferably use the official tools (PanDA/Rucio)

Impact: staging from tape
What happens if we remove all "unused" data from disk and keep it only on tape? ("Unused" here means not accessed in 9 months.)
[Plots of data staged per week (TB): a simulation based on last year's data access (annotated 15 TB) and the tape access from reconstruction and reprocessing in 2014 (annotated 750 TB).]
We would have to restage about 20 TB/week from tape, compared with 1 PB/week for reco/repro: a 2% increase. In terms of number of files, it is a 10% increase.
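The 2% figure quoted above is simply the ratio of the extra staging rate to the existing reco/repro tape traffic; a one-line check:

```python
# Extra tape staging from the lifetime model vs. existing reco/repro traffic.
extra_tb_per_week = 20.0            # restaging of "unused" data pushed to tape
reco_repro_tb_per_week = 1000.0     # ~1 PB/week read from tape by reco/repro
print(f"{extra_tb_per_week / reco_repro_tb_per_week:.0%}")   # 2%
```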

Conclusions
For many topics I have just given an introduction; more discussion should follow in the next sessions and days.

Simone Campana – ATLAS Computing Jamboree, 3 December 2014