Ian Bird, WLCG Workshop, Copenhagen, 12th November 2013

Topics
- Summary of the computing model update
- Longer term: HL-LHC
- What should HEP computing look like in 10 years, and how should we address the problem?

Computing model update
- Requested by the LHCC: initial draft delivered in September; final version due for the LHCC meeting in December.
- Goals:
  - Optimise the use of resources.
  - Reduce operational costs (effort at grid sites, ease of deployment and operation, support and maintenance of grid middleware).
- Evolution of the computing models: significant improvements already made, areas of work now and anticipated, including several common projects.
- Evolution of the grid model, using new technologies:
  - Cloud/virtualisation.
  - Data federations, intelligent data placement/caching, data popularity service.

Contents
- Experiment computing models: ongoing changes with respect to the original model and plans for the future; structured to allow comparison across the experiments.
- Technology review: what is likely for CPU, disk, tape and network technologies; expected cost evolution.
- Resource requirements during Run 2.
- Software performance: general considerations and experiment-specific actions, in particular optimising experiment software; what has already been done and what is anticipated.
- Evolution of the grid and data management services, aiming to reduce the costs of operation and support.

Computing models
- Focus on use of resources/capabilities rather than "Tier roles":
  - Already happening: LHCb use of Tier 2s for analysis, CMS use of Tier 2s for MC reconstruction, use of Tier 1s for prompt reconstruction, etc.
  - Data access peer-to-peer: removal of the hierarchical structure.
- Data federations, based on xrootd, with many commonalities:
  - Optimising data access from jobs: remote access, remote I/O (see the sketch below).
  - More intelligent data placement/caching; pre-placement vs dynamic caching.
  - Data popularity services being introduced.
- Reviews of (re-)processing passes and of the numbers of data replicas.
- Use of HLT farms and opportunistic resources now important.
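A minimal sketch of the remote-access idea behind xrootd data federations, assuming ROOT with its Python bindings is available on the worker node; the redirector host, file path and tree name are hypothetical placeholders, not real federation endpoints:

```python
# Remote data access through an xrootd federation (illustrative only).
import ROOT

# Opening via the federation redirector: xrootd resolves the nearest replica,
# so the job does not need to know where the file is actually stored.
url = "root://federation-redirector.example.org//store/data/run123/events.root"
f = ROOT.TFile.Open(url)
if f and not f.IsZombie():
    tree = f.Get("Events")   # hypothetical tree name; read only the branches the job needs
    print("entries:", tree.GetEntries() if tree else 0)
    f.Close()
```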

Software
- Moore's law only helps us if we can make use of the new multi-core CPUs with specialised accelerators, etc. (vectorisation, GPUs, …); we no longer benefit from simple increases in clock speed.
- Ultimately this requires HEP software to be re-engineered to exploit parallelism at all levels: vectors, instruction pipelining, instruction-level parallelism, hardware threading, multi-core, multi-socket.
- Need to focus on commonalities: GEANT, ROOT; build up common libraries.
- This requires significant effort and investment in the HEP community:
  - Concurrency Forum already initiated.
  - Ideas to strengthen this as a collaboration, to provide a roadmap and to incorporate and credit additional effort.
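As an illustration of only the coarsest of these levels (it is not code from any experiment framework), the toy sketch below runs an event loop across all cores of a node; the real re-engineering additionally targets vectors, accelerators and finer-grained concurrency, typically in C++:

```python
# Event-level parallelism with Python's multiprocessing (illustrative toy).
from multiprocessing import Pool

def reconstruct(event):
    # placeholder "reconstruction": sum of hypothetical hit energies
    return sum(hit["energy"] for hit in event["hits"])

def toy_events(n):
    return [{"hits": [{"energy": 0.1 * i} for i in range(10)]} for _ in range(n)]

if __name__ == "__main__":
    events = toy_events(10_000)
    with Pool() as pool:                      # one worker per available core
        results = pool.map(reconstruct, events, chunksize=256)
    print("processed", len(results), "events")
```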

Distributed computing
- Drivers:
  - Operational cost of grid sites.
  - Ability to easily use opportunistic resources (commercial clouds, HPC, clusters, …) with ~zero configuration.
  - Maintenance cost of grid middleware.
- Simplifying the grid middleware layer:
  - Complexity has moved to the application layer, where it fits better; ubiquitous use of pilot jobs, etc. (see the sketch below).
  - Cloud technologies give a way to implement job submission and management.
  - Run 2 will see a migration to a more cloud-like model.
  - Centralisation of key grid services is already happening.
  - Leading to a more lightweight and robust implementation of distributed computing.
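A minimal, hypothetical sketch of the pilot-job pattern mentioned above: a lightweight agent starts on whatever resource is available (grid worker node, cloud VM, HPC node) and pulls its real work from a central task queue. The endpoint and payload format are invented for illustration; production systems (DIRAC, PanDA, AliEn, glideinWMS, …) are far more elaborate:

```python
# Skeleton of a pilot agent (illustrative only, not a real workload manager client).
import json
import subprocess
import time
import urllib.request

TASK_QUEUE = "https://workload-manager.example.org/tasks/next"  # hypothetical endpoint

def fetch_task():
    try:
        with urllib.request.urlopen(TASK_QUEUE, timeout=30) as resp:
            return json.load(resp)   # e.g. {"id": 42, "cmd": ["sim.py", "--events", "100"]}
    except Exception:
        return None                  # queue empty or unreachable

def main():
    while True:
        task = fetch_task()
        if task is None:
            time.sleep(60)           # idle: wait and ask again
            continue
        # run the payload; the pilot only provides the environment and reporting
        subprocess.run(task["cmd"], check=False)

if __name__ == "__main__":
    main()
```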

Technology outlook
- Effective yearly growth: CPU 20%, disk 15%, tape 15%.
- Assumes 75% of the budget goes to additional capacity and 25% to replacement.
- Other factors: infrastructure, network and increasing power costs.
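A worked example (with illustrative numbers only) of what these effective growth rates imply for installed capacity under a flat budget:

```python
# Capacity compounds even though spending does not.
cpu_growth, disk_growth = 0.20, 0.15   # effective yearly growth from the slide

def capacity_after(years, yearly_growth, start=1.0):
    """Relative installed capacity after `years` of flat-budget purchasing."""
    return start * (1.0 + yearly_growth) ** years

for label, g in [("CPU", cpu_growth), ("Disk", disk_growth)]:
    print(f"{label}: x{capacity_after(5, g):.2f} after 5 years, "
          f"x{capacity_after(10, g):.2f} after 10 years")
# CPU: x2.49 after 5 years, x6.19 after 10 years
# Disk: x2.01 after 5 years, x4.05 after 10 years
```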

Evolution of requirements
- Estimated evolution of requirements (NB: does not reflect the outcome of the current RSG scrutiny). Chart shows the actual deployed capacity; line: extrapolation of actual resources; curves: expected potential growth of technology with a constant budget (CPU 20% yearly growth, disk 15% yearly growth).
- Higher trigger (data) rates driven by physics needs; based on the understanding of likely LHC parameters and foreseen technology evolution (CPU, disk, tape).
- Experiments work hard to fit within the constant-budget scenario.

A lot more to come …

ALICE & LHCb, Run 3 (Predrag Buncic, ECFA Workshop, Aix-les-Bains, 3 October 2013)
- LHCb: 40 MHz collision rate, software trigger running at 5-40 MHz, 20 kHz output (0.1 MB/event), ~2 GB/s peak output to storage.
- ALICE: 50 kHz readout at 1.5 MB/event (~75 GB/s peak output), with reconstruction + compression before storage.
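A quick check of the peak-rate arithmetic on this slide: the raw data rate is simply the event rate times the event size.

```python
# Event rate x event size -> data rate, before any compression.
def data_rate_gb_per_s(rate_hz, event_size_mb):
    return rate_hz * event_size_mb / 1000.0   # MB/s -> GB/s

print("ALICE Run 3:", data_rate_gb_per_s(50_000, 1.5), "GB/s")   # 75.0 GB/s
print("LHCb  Run 3:", data_rate_gb_per_s(20_000, 0.1), "GB/s")   # 2.0 GB/s
```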

ATLAS & CMS, Run 4 (Predrag Buncic, ECFA Workshop, Aix-les-Bains, 3 October 2013)
- CMS: Level 1 trigger + HLT, 5-10 kHz output (2 MB/event) to storage.
- ATLAS: Level 1 trigger + HLT, 10 kHz output (4 MB/event), ~40 GB/s peak output to storage.

Data: Outlook for HL-LHC (Predrag Buncic, ECFA Workshop, Aix-les-Bains, 3 October 2013)
- Very rough estimate of the new RAW data per year of running, using a simple extrapolation of the current data volume scaled by the output rates (chart in PB).
- Still to be added: derived data (ESD, AOD), simulation, user data…
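A hedged sketch of the kind of extrapolation described here; all numbers in the example are placeholders, not the figures behind the actual chart:

```python
# Scale a known yearly RAW volume by the ratio of trigger output rates.
def extrapolate_raw_per_year(current_raw_pb, current_rate_khz, future_rate_khz):
    """Future RAW volume per year, assuming volume scales with output rate."""
    return current_raw_pb * (future_rate_khz / current_rate_khz)

# Invented example: 10 PB/year today at 1 kHz, future output rate 10 kHz
print(extrapolate_raw_per_year(10.0, 1.0, 10.0), "PB/year")   # 100.0 PB/year
```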

CPU: Online + Offline (Predrag Buncic, ECFA Workshop, Aix-les-Bains, 3 October 2013)
- Very rough estimate of new CPU requirements for online and offline processing per year of data taking, using a simple extrapolation of current requirements scaled by the number of events.
- Little headroom left; we must work on improving the performance.
- (Chart, in MHS06: online and grid CPU needs compared with the Moore's law limit and the historical growth of 25%/year; room for improvement.)

- Expect 10 Tb/s networks.
- Network access to facilities and data will be cheap.
- Moving data around is expensive (needs disk!).
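A back-of-the-envelope illustration (my numbers and assumed link efficiency, not figures from the slide) of why networks at this scale change the picture:

```python
# Time to move a dataset over a fast wide-area link.
def transfer_time_s(data_bytes, link_bits_per_s, efficiency=0.8):
    """Seconds to move `data_bytes`, with an assumed protocol/link efficiency."""
    return data_bytes * 8 / (link_bits_per_s * efficiency)

one_pb = 1e15                          # bytes
link = 10e12                           # 10 Tb/s
print(f"{transfer_time_s(one_pb, link) / 60:.1f} minutes per PB")   # ~16.7 minutes
```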

Problems
- No economies of scale (operations costs): 10 large centres would be much better than 150 smaller ones.
- Too distributed: too much disk cache needed.
- Current inability to use CPU effectively:
  - Evolution of commodity and HPC architectures.
  - Breakdown of Moore's law (physics).

Opportunities
- Fantastic networking, as much as you want.
- No reason at all to have data local to the physicist.
- Much more of "offline" goes "inline": don't store everything. HLT farms will significantly increase in size; why not carry this further?
- A change in funding models is needed.

Long term?
- Current models do not simply scale; we need to re-think. What is the most cost-effective way to deploy computing?
- Proposing to hold a series of workshops to brainstorm radical computing model changes on the 10-year timescale:
  - How can we benefit from economies of scale?
  - How does HEP collaborate with other sciences (big data, e-infrastructures, etc.)?

HEP computing in 10 years?
- We still use the computing model of the 1970s.
- Opportunity to really re-think how we produce "science data", and how physicists can use or query it.
- Opportunity to (re-)build true commonalities: within HEP, with other sciences, and with other big-data communities.

Long-term strategy
HEP computing needs a forum where these strategic issues can be coordinated, since they impact the entire community:
- Build on leadership in large-scale data management and distributed computing: make our experience relevant to other sciences, generate long-term collaborations and retain expertise.
- Scope and implementation of long-term e-infrastructures for HEP; relationship with other sciences and funding agencies.
- Data preservation and reuse; open and public access to HEP data.
- Significant investment in software to address rapidly evolving computer architectures is necessary.
- HEP must carefully choose where to invest our (small) development effort: high added-value in-house components, while making use of open-source or commercial components where possible.
- HEP collaboration on these and other key topics with other sciences and industry.

The Content Delivery Network Model (Torre Wenaus, BNL, CHEP 2013, Amsterdam)
- Content delivery network: deliver data quickly and efficiently by placing data of interest close to its clients.
- Diagram: the client issues a data request to a content request interface; delivery is via the nearest edge server, using a cached copy if available; locally used data is cached in the edge servers, backed by the data provider services (a minimal cache sketch follows below).
- Most of the web operates this way.
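A minimal sketch of the edge-cache logic in the diagram, not of any production WLCG service:

```python
# Serve a cached copy if the edge server has it, otherwise fetch from the
# data provider, cache it locally, and then serve it.
class EdgeServer:
    def __init__(self, origin_fetch):
        self.cache = {}                    # key -> bytes held at this edge site
        self.origin_fetch = origin_fetch   # callable that reads from the provider

    def request(self, key):
        if key not in self.cache:          # cache miss: go back to the origin once
            self.cache[key] = self.origin_fetch(key)
        return self.cache[key]             # later requests are served locally

# toy origin standing in for the central data provider services
origin = {"run123/evts-0001.root": b"...payload..."}
edge = EdgeServer(lambda k: origin[k])

edge.request("run123/evts-0001.root")      # first call: fetched from the origin
edge.request("run123/evts-0001.root")      # second call: served from the edge cache
```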

The Content Delivery Network Model (Torre Wenaus, BNL, CHEP 2013, Amsterdam)
A growing number of HEP services are designed to operate broadly on the CDN model:
- Frontier conditions DB: central DB + web service cached by http proxies; in production ~10 years (CDF, CMS, ATLAS, …).
- CernVM File System (CVMFS): central file repository + web service cached by http proxies and accessible as a local file system; in production a few years (LHC experiments, OSG, …).
- Xrootd-based federated distributed storage: global namespace, with local xrootd acting much like an edge service for the federated store; xrootd in production 10+ years, federations ~now (CMS AAA, ATLAS FAX, …; see Brian's talk).
- Event service: requested events delivered to a client agnostic as to event origin (cache, remote file, on-demand generation); ATLAS implementation coming in 2014.
- Virtual data service: the ultimate event service, backed by data provenance and regeneration infrastructure; a few years away?
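A hedged illustration of the shared pattern behind Frontier and CVMFS, a central web service read through a local caching http proxy such as squid; the proxy host and service URL below are placeholders, not real endpoints:

```python
# Clients read via the site's caching proxy, so repeated reads stay local.
import requests

SITE_PROXY = {"http": "http://squid.mysite.example:3128"}   # hypothetical site squid

def get_conditions(run_number):
    # central service URL invented for illustration
    url = f"http://conditions.example.org/payload?run={run_number}"
    resp = requests.get(url, proxies=SITE_PROXY, timeout=10)
    resp.raise_for_status()
    return resp.content    # identical requests are answered from the squid cache

# every job at the site issues the same request; only the first one reaches
# the central service, the rest are served by the local proxy
payload = get_conditions(123456)
```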

What might this look like? (Torre Wenaus, BNL, CHEP 2013, Amsterdam)
- Inside the CDN "torus":
  - Large-scale data factories: consolidation of Tier 1s and large Tier 2s.
  - Their function is to deliver the datasets requested; no need to keep transferring data around, essentially scale the storage to the CPU capacity.
  - Connected by very high speed networks.
- The distinction between "online" and "offline" could move to this boundary, at the client interface.
- At this point we can think about new models of analysing data: query the dataset rather than event-loop style? (See the sketch below.)
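A sketch of the "query the dataset" idea in contrast to an event loop, using numpy as a stand-in for whatever columnar engine such a facility would provide; the column names are invented:

```python
# Declarative selection over columns instead of looping over events.
import numpy as np

# toy columnar dataset: one array per quantity, one entry per event
events = {
    "n_muons": np.array([0, 2, 1, 2, 3]),
    "missing_et": np.array([12.0, 85.0, 40.0, 110.0, 20.0]),
}

# "query": all events with at least two muons and missing ET above 50 GeV
mask = (events["n_muons"] >= 2) & (events["missing_et"] > 50.0)
selected_met = events["missing_et"][mask]
print(selected_met)        # [ 85. 110.]
```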