Review of the WLCG experiments compute plans

Review of the WLCG experiments compute plans
Simone Campana (CERN)
WLCG workshop, 19/06/17

Evolution of the WLCG infrastructure
Evolution in the direction of:
- a network-centric model
- consolidation of storage
- diversification of facilities ...
[Figure: WLCG at HL-LHC]
... and diversification of compute resources. No need to wait until 2026 for this.
[Figure: WLCG at HL-LHC, I. Fisk's representation]

Compute resources and experiments' computing models
Diversification of compute resources:
- Provisioning interface: grid CE, cloud web service, login to a batch head node, none, ...
- Resource availability: pledged vs opportunistic, flat vs elastic, ...
- Resource retention: from long-lived resources to highly volatile ones.
Diversification of computing models:
- The LHC experiments are different. This is most obvious for ALICE and LHCb (special physics focus), but it is also true for ATLAS and CMS (different detector layouts and different sociologies).
- This has an impact on the computing models and motivates different choices and different focuses.

Job requirements
- LHC experiments run a mix of single-core and multi-core jobs.
- Generally the jobs require well below 2 GB of memory, but there are special cases (e.g. Heavy-Ion and Upgrade samples) requiring more.
- Flexibility to provision what you need at low or no extra cost will be increasingly important.
- This is a challenge for site administrators and for the batch systems: do we have the right (modern) tools? (A toy packing illustration follows below.) http://cern.ch/go/jx9S
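A toy illustration of the scheduling challenge (invented node size and job mix, assuming 2 GB per core; not any experiment's real requirements): greedily packing only 8-core jobs can strand cores, while backfilling with single-core jobs fills the node.

```python
"""Toy illustration only: greedy packing of 8-core and 1-core jobs
(each assumed to need 2 GB per core) onto a single worker node.
Node size and job mix are invented for the example."""

NODE_CORES = 20
NODE_MEM_GB = 40

def pack(jobs):
    """jobs: list of (cores, mem_gb) requests; returns (accepted jobs, idle cores)."""
    free_cores, free_mem = NODE_CORES, NODE_MEM_GB
    accepted = []
    for cores, mem in jobs:
        if cores <= free_cores and mem <= free_mem:
            accepted.append((cores, mem))
            free_cores -= cores
            free_mem -= mem
    return accepted, free_cores

multicore = [(8, 16)] * 4    # 8-core jobs at 2 GB/core
singlecore = [(1, 2)] * 30   # single-core jobs at 2 GB

# Only 8-core jobs: the third one no longer fits and 4 cores stay idle ...
accepted, idle = pack(multicore)
print("multi-core only:", len(accepted), "jobs,", idle, "idle cores")

# ... whereas backfilling with single-core jobs uses the whole node.
accepted, idle = pack(multicore[:2] + singlecore)
print("mixed:", len(accepted), "jobs,", idle, "idle cores")
```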

Containers
- A lightweight technology to provide isolation, with many benefits for experiments and sites, e.g.:
  - payload isolation from the pilot credentials
  - an SL6 environment on other distributions
- A lot of (coherent) interest from the experiments, with a focus on Singularity for the worker-node environment (see the sketch below).
- This workshop is a great opportunity to discuss this and push it further.
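A minimal pilot-side sketch of the idea, assuming a hypothetical SL6 image on CVMFS and a hypothetical payload script; the experiments' actual pilots differ, but the Singularity invocation pattern is similar: the container sees the payload working directory and CVMFS, not the pilot credentials.

```python
"""Minimal pilot-side sketch (not the experiments' actual pilot code):
run a payload inside a Singularity container so it gets an SL6 environment
and cannot see the pilot credentials. Image path and payload script are
hypothetical placeholders."""
import subprocess

SL6_IMAGE = "/cvmfs/some-repo.cern.ch/images/sl6.img"   # hypothetical image location
PAYLOAD = ["/srv/payload/run_payload.sh"]               # hypothetical payload script

cmd = [
    "singularity", "exec",
    "--contain",                              # do not expose the host filesystem by default
    "--bind", "/cvmfs",                       # experiment software via CVMFS
    "--bind", "/srv/payload:/srv/payload",    # only the payload work dir, not the pilot proxy
    SL6_IMAGE,
] + PAYLOAD

result = subprocess.run(cmd)
print("payload exited with status", result.returncode)
```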

Clouds
LHC experiments leverage cloud resources for data processing. Two options:
1. The experiment workflow management system instantiates VMs (generally through Condor) and the VMs join an experiment Condor pool to which pilots are submitted.
2. A grid site instantiates VMs and the VMs join the site batch system.
These are effectively the same thing, but in (2) the facility administers the worker nodes while in (1) the experiment operates them. I prefer (2).
Condor generalizes the provisioning of and access to cloud processing units and has proved to be reliable: a de facto standard. (A minimal contextualization sketch follows below.)
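Purely as an illustration of the contextualization idea behind option (1), the sketch below uses boto3 to start cloud VMs whose user-data configures them to join an HTCondor pool. In practice the experiments generally let HTCondor itself drive the provisioning; the AMI ID, instance type, region and pool address here are invented placeholders.

```python
"""Illustration only: start cloud VMs whose contextualization (user-data)
makes them join an HTCondor pool, as in option (1). The AMI ID, instance
type, region and pool address are invented placeholders."""
import boto3

USER_DATA = """#!/bin/bash
# Hypothetical contextualization: point condor at the experiment pool and start it
echo 'CONDOR_HOST = pool.example.cern.ch' >> /etc/condor/condor_config.local
systemctl restart condor
"""

ec2 = boto3.client("ec2", region_name="eu-west-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical worker-node image
    InstanceType="m4.2xlarge",         # 8 vCPUs, matching a typical multi-core slot
    MinCount=1,
    MaxCount=10,                       # burst up to 10 VMs
    UserData=USER_DATA,
)
for instance in response["Instances"]:
    print("started", instance["InstanceId"])
```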

Elastic resources
Different interests due to different needs:
- ATLAS needs large CPU resources for relatively low-priority Geant4 simulation (calorimeters), so high-priority tasks are pushed through processing shares with flat provisioning.
- CMS Geant4 simulation is faster, so there is strong interest in provisioning for peaks for bursty activities.
A toy comparison of the two strategies is sketched below.
[Figures: CMS: provisioning for peaks; ATLAS: flat provisioning]
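A toy comparison of the two strategies (invented slot counts and limits, not any experiment's real provisioning logic): flat provisioning keeps capacity constant, while provisioning for peaks scales it with the queue up to a burst limit.

```python
"""Toy comparison only (invented numbers): flat provisioning versus
provisioning for peaks, both driven by the number of queued jobs."""

FLAT_SLOTS = 1000          # constant capacity kept running
MAX_BURST_SLOTS = 5000     # upper limit when provisioning for peaks

def flat_provisioning(queued_jobs: int) -> int:
    """Keep the same number of slots regardless of the queue."""
    return FLAT_SLOTS

def peak_provisioning(queued_jobs: int) -> int:
    """Scale the number of slots with the queue, up to a burst limit."""
    return min(queued_jobs, MAX_BURST_SLOTS)

for queued in (200, 1500, 8000):
    print(queued, "queued jobs ->",
          "flat:", flat_provisioning(queued),
          "peak:", peak_provisioning(queued))
```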

Elastic resources
- All experiments started by integrating cloud resources for simulation.
- But the goal, and the real advantage, is to leverage cloud flexibility for all use cases, including the challenging ones: high memory, many cores, long jobs, ...
- CMS has run the full production chain on commercial clouds, carefully considering the economic model as well.

HPCs
- Rather complicated resources to exploit. The main challenge comes from diversity (e.g. site policies and CPU architectures).
- HPCs are built for a use case that does not map well onto our embarrassingly parallel workloads ... but they potentially offer a large amount of capacity, so the experiments invest in integrating them.
- Invest in common tools: software distribution, data handling, edge services. Try to negotiate policies consensually.

Rationalizing the provisioning layer
- In ATLAS, the strong need to rationalize resource provisioning at HPCs initiated the Harvester project.
- Harvester is an edge service creating a communication channel between the resources and the WMS (PanDA).
- A plugin-based architecture allows many of the functionalities to be leveraged for grids and clouds as well (see the illustrative sketch below).
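The sketch below illustrates the plugin idea in a few lines of Python; it is not the actual Harvester code or API, and the class names and demand source are invented. The point is that the core cycle stays the same while the resource-specific submitter plugin changes.

```python
"""Illustrative sketch of a plugin-based edge service in the spirit of Harvester;
NOT the actual Harvester code or API. Plugin classes and the demand source are
invented: the core cycle is fixed, the submitter plugin varies per resource
(HPC batch system, grid CE, cloud, ...)."""
from abc import ABC, abstractmethod
from typing import List

class SubmitterPlugin(ABC):
    """One plugin per resource type."""
    @abstractmethod
    def submit(self, n_workers: int) -> List[str]:
        """Start n_workers workers and return their identifiers."""

class LocalBatchSubmitter(SubmitterPlugin):
    def submit(self, n_workers: int) -> List[str]:
        # A real plugin would call sbatch/qsub, a cloud API or a CE client here.
        return [f"worker-{i}" for i in range(n_workers)]

class EdgeService:
    """Core loop: fetch demand from the WMS, hand it to the configured plugin."""
    def __init__(self, submitter: SubmitterPlugin):
        self.submitter = submitter

    def fetch_demand(self) -> int:
        # Placeholder: a real service would query the WMS (e.g. PanDA) here.
        return 3

    def cycle(self) -> None:
        workers = self.submitter.submit(self.fetch_demand())
        print("started workers:", workers)

EdgeService(LocalBatchSubmitter()).cycle()
```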

Volatile resources
- Considerable processing capacity can be exploited for short periods of time (or with reduced QoS): HLT farms between fills, the spot market on clouds, backfill on HPCs, ...
- In many cases one must expect pre-emption of such resources at any time. It can be softer (SIGTERM followed by SIGKILL) or harder (SIGKILL with no mercy nor regret).
- Several solutions are on the table, with different advantages and different challenges:
  - use Machine/Job Features to size the workload at runtime depending on what the resource can offer;
  - reduce the data-processing granularity to one (or a few) events in a check-pointable way;
  - ad-hoc short jobs, plus extra care in retries, monitoring and alarming.
A minimal sketch of the first two ideas follows below.
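A minimal sketch of the first two ideas, assuming the site publishes the $MACHINEFEATURES/shutdowntime file from the WLCG Machine/Job Features proposal; the event loop and checkpoint function are invented placeholders.

```python
"""Minimal sketch (assumptions, not production code) of a payload on a volatile
resource: it reads the advertised shutdown time from Machine/Job Features
($MACHINEFEATURES/shutdowntime) and handles SIGTERM by checkpointing after the
current event. Event processing and checkpointing are placeholders."""
import os
import signal
import time

stop_requested = False

def on_sigterm(signum, frame):
    """Soft pre-emption: finish the current event, then checkpoint and exit."""
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, on_sigterm)

def seconds_until_shutdown() -> float:
    """Read the advertised shutdown time, if the site publishes MJF."""
    mjf_dir = os.environ.get("MACHINEFEATURES")
    if not mjf_dir:
        return float("inf")
    try:
        with open(os.path.join(mjf_dir, "shutdowntime")) as f:
            return float(f.read().strip()) - time.time()
    except (OSError, ValueError):
        return float("inf")

def process_event(i):          # placeholder for real event processing
    time.sleep(1)

def checkpoint(i):             # placeholder: persist state so a retry can resume
    print(f"checkpointed after event {i}")

SECONDS_PER_EVENT = 60         # illustrative estimate of per-event cost
for i in range(1_000_000):
    if stop_requested or seconds_until_shutdown() < 2 * SECONDS_PER_EVENT:
        checkpoint(i)
        break
    process_event(i)
```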

Lightweight sites: Vac and @HOME
Vac: clusters of autonomous hypervisors manage VMs.
- In production at many UK sites; VMs in production for ALICE, ATLAS, LHCb and for VOs of the GridPP DIRAC service.
- A major component for LHCb, integrating clouds, the HLT farm and generic VMs; Condor can connect the VMs to an HTCondor pool.
- Usage is reported to APEL as "virtual" CEs.
LHC@HOME: an interesting outreach project.
- Provides non-negligible processing capacity (1-2% in ATLAS) and a technology for lightweight sites.
- Based on BOINC and various solutions for data handling.