Ian Bird, CERN, 2nd February 2016


Introduction  Brainstorming on longer term – HL-LHC era  LHCC: agreed that by the final LHCC meeting of 2016 we would propose a concrete timetable for working towards an upgrade TDR for HL-LHC by ~2020.  May seem that this aims mainly at ATLAS and CMS, since the planning for LHCb and ALICE for the phase 1 upgrades are more advanced.  However, we should really aim to be as inclusive as possible and indeed to have an eye on the broader HEP and global science communities as we think about this. 2 Feb 2016

Assumptions and boundaries
- The current guidance for Run 2 and the foreseeable future is to plan around flat budgets for computing. There is presently no indication that this is likely to change on any timescale, and there is some concern that budgets might decrease.
- However, we should first understand, and state, what we think the cost of HL-LHC computing will be to deliver the advertised physics.
- Anticipated data volumes are of the order of an exabyte of raw data per year, combined across the four experiments, without major changes in the philosophy of data selection/filtering.
- With the naïve extrapolations made so far, computing needs are 5-10 times in excess of what technology evolution suggests will be affordable (and that evolution is itself very uncertain); a rough numerical illustration follows below.
- Given these two conditions, we must plan for innovative changes in the computing models, from DAQ to final analysis, in order to optimise the physics output.
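As a rough illustration of that gap, the following Python sketch compares a naively extrapolated compute need with what a flat budget buys if price/performance keeps improving; the growth rates and the ten-year horizon are assumptions chosen for the example, not WLCG planning numbers.

```python
# Illustrative only: all parameters are assumptions, not official estimates.
def flat_budget_capacity(years, gain_per_year=0.20):
    """Relative capacity a flat budget buys after `years`, assuming
    price/performance improves by `gain_per_year` annually."""
    return (1.0 + gain_per_year) ** years

def naive_need(years, growth_per_year=0.50):
    """Relative compute need under a naive extrapolation of event
    complexity and trigger rates (assumed growth rate)."""
    return (1.0 + growth_per_year) ** years

years_to_hl_lhc = 10  # roughly 2016 -> mid-2020s
afford = flat_budget_capacity(years_to_hl_lhc)
need = naive_need(years_to_hl_lhc)
print(f"affordable growth: x{afford:.1f}, naive need: x{need:.1f}, "
      f"shortfall: x{need / afford:.1f}")  # shortfall lands in the 5-10x range
```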

HL-LHC computing cost parameters
[Diagram: the cost of computing is driven by three interacting areas]
- Experiment core algorithms – the business of the experiments: the amount of raw data, thresholds, but also reconstruction and simulation algorithms. The cost of simulation/processing should be a design criterion of the detectors.
- Software performance – performance/architectures/memory etc.; tools to support automated build/validation; collaboration with externals via the HSF.
- Federated, distributed computing infrastructure – new grid/cloud models; optimise CPU/disk/network; economies of scale via clouds, joint procurements, etc.

Cost drivers – 1
- The amount of raw data to be processed. Can this be made more intelligent – e.g. by moving the reconstruction "online" (which does not necessarily have to be at the pit), so that the decision on what to record is based on accurate reconstruction before recording?
- Is it even realistic now to assume that we can blindly write raw data at the maximum rate and sort it out later?
- Storage cost reduction with compression (lossless and lossy) – this will not reduce the offline load, but may help manage the flow; a small illustration follows below.
- Clearly most of this implies in-line automated calibration and alignment, and single-pass, high-quality reconstruction.
- Ultimately, data volume management might require physics choices.
- Consideration of computing costs (especially for simulation) during the design of the detector upgrades (e.g. the ATLAS calorimeter is expensive to simulate).
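To illustrate the lossless/lossy distinction, the sketch below zeroes the low mantissa bits of float data (a lossy step) before standard lossless compression; the data, precision choice and compressor are placeholders for the example, not what any experiment actually does.

```python
# Toy lossy + lossless compression of float data; parameters are illustrative.
import random
import struct
import zlib

def truncate_mantissa(values, keep_bits=10):
    """Lossy step: zero the low (23 - keep_bits) mantissa bits of each
    float32, keeping roughly keep_bits bits of relative precision."""
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    out = []
    for v in values:
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        (trunc,) = struct.unpack("<f", struct.pack("<I", bits & mask))
        out.append(trunc)
    return out

def packed(values):
    return struct.pack(f"<{len(values)}f", *values)

random.seed(1)
hits = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # stand-in for detector floats

raw = packed(hits)
lossless_only = zlib.compress(raw, 9)
lossy_then_lossless = zlib.compress(packed(truncate_mantissa(hits)), 9)
print(f"raw: {len(raw)} B, lossless: {len(lossless_only)} B, "
      f"lossy+lossless: {len(lossy_then_lossless)} B")
```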

Cost drivers – 2
- New algorithms and a re-thinking of how reconstruction and simulation are performed.
- This area needs some inspired thinking and innovation around algorithms.
- Adaptation and optimisation of code for different architectures, which will continually evolve.
- Re-thinking of how data and I/O are managed during processing, to ensure optimal use of the available processing power.
- All of this probably requires a continual process of automating build systems and physics validation, since it is highly likely that key algorithms must be continually optimised across architectures and compilers. The current situation of a single executable for all architectures will not be optimal (a sketch of such a build/validation matrix follows below).
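A minimal sketch of what an automated build-and-validation matrix across architectures and compilers might look like; the architecture and compiler names, thresholds and stub functions are all invented placeholders, not a real experiment workflow.

```python
# Hypothetical build + physics-validation matrix; everything here is a stub.
import itertools

ARCHITECTURES = ["x86_64-avx2", "x86_64-avx512", "aarch64", "gpu-cuda"]
COMPILERS = ["gcc-9", "clang-11"]

def build(arch, compiler):
    """Stand-in for the real build step; would return a path to artefacts."""
    return f"build/{arch}/{compiler}"

def validate(artefact):
    """Stand-in for physics validation: compare reference distributions and
    report (passed, max relative deviation)."""
    return True, 1e-4

for arch, compiler in itertools.product(ARCHITECTURES, COMPILERS):
    artefact = build(arch, compiler)
    passed, deviation = validate(artefact)
    status = "OK" if passed and deviation < 1e-3 else "CHECK"
    print(f"{arch:>14} / {compiler:<8} -> {status} (max dev {deviation:.1e})")
```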

Cost drivers – 3
- Optimisation of infrastructure cost. There are many parameters that could be investigated here.
- Reduce the amount of disk we need, so that more CPU/tape can be acquired. Today disk is the largest single cost, and is mostly driven by the number of replicas of data (a simple cost-model sketch follows below).
- Can we get economies of scale through joint procurements, large-scale cloud providers, etc.? On this timescale commercial computing costs (for CPU and cache) may be much cheaper than our internal costs, if we can reach a large enough scale.
- We will likely have significant network bandwidths (1-10 Tb/s) at reasonable commercial cost.
- We should think about very different distributed computing models – for example a large-scale (virtual) data centre integrating our large centres and commercially procured resources. Data would not be moved out of that "centre" but processed in situ and made available remotely to the physics community. Such a model would minimise storage costs, and enable potentially different analysis models, since all of the data could be (virtually) co-located. The simulation load would be maintained at remote centres.
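A minimal toy cost model for the disk-replica point, to make the trade-off concrete; every price and volume below is an invented placeholder, not a WLCG figure.

```python
# Toy infrastructure cost model; all numbers are assumed placeholders.
COST_PER_TB = {"disk": 30.0, "tape": 5.0}   # arbitrary units per TB per year
COST_PER_CORE = 10.0                        # arbitrary units per core per year

def yearly_cost(dataset_tb, disk_replicas, tape_copies, cores):
    disk = dataset_tb * disk_replicas * COST_PER_TB["disk"]
    tape = dataset_tb * tape_copies * COST_PER_TB["tape"]
    cpu = cores * COST_PER_CORE
    return {"disk": disk, "tape": tape, "cpu": cpu, "total": disk + tape + cpu}

# Vary only the average number of disk replicas and watch the disk share.
for replicas in (3.0, 2.0, 1.2):
    c = yearly_cost(dataset_tb=500_000, disk_replicas=replicas,
                    tape_copies=2.0, cores=1_000_000)
    print(f"{replicas:.1f} disk replicas: total {c['total']:.2e}, "
          f"disk share {c['disk'] / c['total']:.0%}")
```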

Goal  Assume we need to save factor 10 in cost over what we may expect from Moore’s law  1/3 from reducing infrastructure cost  1/3 from software performance (better use of clock cycles, accelerators, etc. etc)  1/3 from more intelligence – write less data, move processing closer to experiment (keep less) - writing lots of data is not a goal 2 Feb 2016

Other considerations
- On this timescale, computing (and thus physics) is going to be limited by cost, so we must make every effort to maximise what can be done:
- in terms of infrastructure,
- but also where it costs real skilled effort (such as algorithms and performance optimisation).
- This requires collaboration across boundaries (experiments, countries, etc.).
- There is significant scope here for common projects and developments, common libraries, frameworks, etc.
- A significant level of commonality is going to be required by our reviewers (scientific and funding).
- Four separate solutions are not going to be sustainable; we must justify where differences need to be made.

Radical changes?
- Move the processing that produces science data (for physics analysis), e.g. AODs or ntuples(?), as close to the DAQ as possible before distribution.
- A serious cost benefit, but a serious political problem to convince the funding agencies (this is why we have a grid now).
- More or less the LHCb and ALICE models for Run 3.
- Limit how much data we ship around, to limit disk costs – today 2/3 of the budget is disk.
- The idea of a "virtual data centre" where the bulk data is processed and stored.
- Potentially a way to get economies of scale, etc.

A Virtual Data Centre for Science?
[Diagram: data centres (DCs) interconnected at Tb/s, with attached compute, clients, and simulation factories/HPC]
- The DCs are primarily academic, but a hybrid with commercial resources is allowed.
- Data are replicated across all the DCs.
- Compute: commercial or private, chosen on cost/connectivity.
- Data input from external sources: instruments, simulation factories.
- Clients see a single DC and access data products, either existing or produced on demand.
- Allows new types of analysis models.

Zephyr  Was a (failed) proposal to the EC last year to build a distributed data infrastructure along these lines  Was to develop a prototype “Zettabyte” data infrastructure (EC call)  Essentially a “RAID” between data centres (storage systems), along lines of previous slide  Policy driven replication of data between centres  Model allows to use commercial DC’s in a ”transparent” way if desired  Some interest expressed by OB in building such a prototype anyway with Tier 1s 2 Feb 2016

HEP Facility timescale
[Timeline chart: design, prototype, construction and physics phases of LHC, SuperKEKB, HL-LHC, FCC, LBNF and ILC, alongside other science facilities SKA, LSST and CTA]
- Integrated view between Europe (ESPP), the USA (P5) and Japan.

Longer term  Should WLCG evolve to a broader scope for HEP in general that covers the evolution and needs of planned and future facilities and experiments?  Other astro(-particle) experiments will share many facilities of HEP  Need to ensure that facilities don’t have to support many different infrastructures 2 Feb 2016

Today's goals
- Open brainstorming, seeded by questions and brief inputs.
- Not answers! Ideas and suggestions.
- Feel free to provide input – e.g. one slide of ideas.
- This will need to be followed up after the meeting.
- We would like to have a document that:
  - outlines the areas of R&D work that we need to do;
  - gives perhaps 2-3 strawman models of what computing models could look like in 10 years.

Agenda topics
- Initial thoughts from the experiments on their main areas of concern.
- Infrastructure models – what could we do?
- Workflows and tools – what convergence is possible?
- Technology evolution – how to track and predict it?
- How to support the HSF effort.
- Modelling of infrastructure/sites etc. – we need something to test ideas (at a gross level); can we build a cost model?
- Software (tomorrow) – commonalities, how can the HSF help, etc.?

Timeline  Need fairly quickly (~1 month) outline of areas of work and R&D effort  Will discuss outcome of this workshop with LHCC in March  By end of 2016 need to have a timeline that leads to a TDR by ~2020  Timeline of R&D, testing, prototyping etc  Hopefully good ideas can be deployed earlier! 2 Feb 2016