Ideas and Plans towards CMS Software and Computing for Phase 2
Maria Girone, David Lange, for the CMS Offline and Computing group
April 12, 2015

CMS Upgrade Strategy - Overview
- Upgrades 2013/14 (ongoing, during LS1): complete the muon coverage (ME4), improve the muon trigger (ME1), DT electronics, replace HCAL photo-detectors in forward and outer calorimeters (HPD → SiPM). Goal: complete the original detector, address operational issues, start the upgrade for high PU.
- Phase 1 upgrades 2017/18/19: new Pixels, HCAL SiPMs and electronics, L1-Trigger. Preparatory work during LS1: new beam pipe, test slices of new systems (Pixel cooling, HCAL, L1-trigger). Goal: maintain/improve performance at high PU.
- Phase 2 upgrades, towards LS3 (Technical Proposal in preparation): further Trigger/DAQ upgrade, Barrel ECAL electronics upgrade, Tracker replacement / Track Trigger, End-Cap Calorimeter replacement. Goal: maintain/improve performance at extreme PU; sustain rates and radiation doses.

For CMS, the big step up in computing scale comes with the HL-LHC program and the Phase-2 upgrade. CMS at Phase 2 (HL-LHC):
- The CMS detector configuration is still to be determined.
- A much higher trigger output rate (potentially 10 kHz).
- Even higher luminosity and pileup (140+ interactions per crossing).
HL-LHC therefore represents a big increase in the scale of the CMS computing program.

Fitting Offline and Computing into this plan. Offline and Computing take an incoming rate and complexity of events and must handle them within a resource envelope: we are constrained on both ends. Talking concretely about Offline and Computing 10 years in advance is somewhat academic; CMS has been writing a Technical Proposal during LS1, and what follows is mostly based on that draft document. Disruptive technology changes can completely alter the landscape, and we are counting on them: HEP does not drive the technology, we are a small piece of a much larger market. Industrial areas have surpassed us, and even other sciences are catching up.

Resource Growth. Based on the technology section of the WLCG Computing Model Evolution document, we currently see a 25% increase in processing capacity and a 20% increase in storage per year for the same money. That means a doubling every ~3 years for CPU and ~4 years for storage, i.e. a factor of roughly 8 in CPU and 6 in storage by the timescale of CMS Phase-II (the arithmetic is sketched below). This assumes flat funding, the best scenario we can hope for, and past experience in technology evolution may not represent future performance.
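As a quick sanity check of these numbers, a minimal sketch of the compound-growth arithmetic; the roughly ten-year horizon to Phase-II is my assumption, not stated explicitly on the slide:

```python
import math

# Compound growth of capacity per unit cost under flat funding.
# Assumed annual gains (from the slide): +25% CPU, +20% storage.
cpu_rate, disk_rate = 0.25, 0.20

def growth(rate, years):
    """Capacity multiplier after `years` at a fixed annual growth rate."""
    return (1.0 + rate) ** years

# Doubling times: log(2) / log(1 + rate)
print(round(math.log(2) / math.log(1 + cpu_rate), 1))   # ~3.1 years (CPU)
print(round(math.log(2) / math.log(1 + disk_rate), 1))  # ~3.8 years (storage)

# Over a ~10 year horizon to Phase-II these compound to roughly
print(round(growth(cpu_rate, 10), 1))   # ~9.3x CPU (the slide quotes ~8)
print(round(growth(disk_rate, 10), 1))  # ~6.2x storage (the slide quotes ~6)
```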

Estimating Resource Needs. CMS is planning for 5-7.5 kHz of data in Phase-II; in this scenario CMS would collect 25-37 billion raw events per year (see the sketch below). Estimating from the current software and using the upgrade simulation, each of these events is both more complicated to reconstruct and larger than the events we will collect in 2015. (Figure: Run 3 and Run 4 projections.)
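For orientation, a minimal sketch of where the 25-37 billion raw events per year come from; the live time of about 5x10^6 seconds per year used here is my assumption, not a number from the slide:

```python
# Raw events per year = HLT output rate x live seconds of data taking.
# The live time below is an assumed, illustrative value.
live_seconds = 5.0e6

for rate_hz in (5000, 7500):                      # 5-7.5 kHz HLT output
    events = rate_hz * live_seconds
    print("%.1f kHz -> %.1fB raw events/year" % (rate_hz / 1000.0, events / 1e9))
# 5.0 kHz -> 25.0B, 7.5 kHz -> 37.5B, matching the 25B-37B quoted above
```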

Size of the processing problem. Factoring in the trigger rate and taking a weighted average of the data and simulation tasks, the computing challenge grows by a large factor relative to Run 2 (the arithmetic is sketched below). Anticipating a factor of 8 in CPU improvements and a factor of 2 in code improvements, we are left with a deficit of a factor of 3-15. Anticipating a factor of 6 in storage improvements, and with Phase-II events 4-5 times larger, we still have a deficit of 4-5 in storage. (Figure: scale of computing resource needs relative to Run 2, including the increase in projected HLT output rate.)
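The deficit arithmetic can be made explicit with a small sketch; the overall multiplier relative to Run 2 is not legible in this transcript, so it is left as a free parameter, while the 8x technology and 2x code-improvement factors are those quoted above:

```python
# Residual CPU deficit = (need relative to Run 2) / (technology gain x code gain).
# Gain factors are those quoted on the slide; the overall need is left free.
tech_gain, code_gain = 8.0, 2.0

def cpu_deficit(need_vs_run2):
    """Shortfall remaining after technology and software improvements."""
    return need_vs_run2 / (tech_gain * code_gain)

# A residual deficit of 3-15x, as quoted, would correspond under these
# assumptions to an overall need of roughly 50-240x Run 2:
for need in (50, 240):
    print(need, "x Run 2 ->", round(cpu_deficit(need), 1), "x deficit")
```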

Scale of solutions. It is unlikely that we will get a factor of 5 more money, nor will the experiment be willing to record a factor of 5 fewer events: big improvements are needed. CMS is investigating many areas.

Big uncertainty from projecting commodity computing. Without a big change in our cost model for computing, we need to maximize the performance per cost of commodity computing:
1. Processor evolution: the move towards multi/many-core, driven by performance per Watt.
2. Increasing memory-bandwidth limitations.
3. Storage and disk evolution?
4. Network evolution?

Big uncertainty from the software, even assuming today's computing technologies. We don't yet know the CMS Phase-II detector design: there is currently a 4x spread in the CPU resources needed to reconstruct the different possibilities. We continue to achieve factors of performance improvement in our reconstruction software, by modernizing and improving our track reconstruction and through steadily improving pileup-mitigation techniques. (Figure: LS1 performance improvements.)

Targets for improvement in CMS processing capacity:
- Reconstruction: improving the number of events that can be reconstructed per computing unit per Swiss Franc is the single biggest saving.
- Analysis selection, skimming, and production of reduced user formats.
- Simulation, user analysis, calibration, and monitoring.

Blurring the Site Boundaries. In Run 2, CMS computing resources are intended to work more like a coherent system than a collection of sites with specialized functions; improved networks have been key to this.
- Data federation makes CMS datasets available transparently across the Grid (see the access sketch below).
- One central queue serves all resources and all workflows.
- The HLT farm is now an integrated resource when we are not in data-taking mode.
(Diagram: HLT, Tier-0, Tier-1, and Tier-2 resources serving GEN-SIM, MC RECO, DATA RECO, and ANALYSIS workflows.)
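As an illustration of what transparent access through the data federation looks like from a user's point of view, a hedged pyROOT sketch; the redirector hostname and the file path are placeholders rather than anything taken from the slide:

```python
import ROOT

# Illustrative only: open a file through an XRootD redirector instead of a
# site-local path. The redirector hostname and the LFN are placeholders.
redirector = "root://cms-xrd-global.cern.ch/"
lfn = "/store/mc/SomeCampaign/SomeDataset/AODSIM/file.root"

f = ROOT.TFile.Open(redirector + lfn)
if f and not f.IsZombie():
    events = f.Get("Events")
    print("Entries:", events.GetEntries())
    f.Close()
```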

Multi-threaded CMSSW framework, now in production commissioning. We have developed a next-generation framework for CMSSW based on a multi-threading approach, which gives CMS a natural platform to investigate highly parallelizable algorithms. The short-term focus: this framework allows us to process the higher Run 2 trigger rates efficiently and to adapt to computing technology trends. Current results: good scaling in CPU performance up to at least 8 cores in a job, and substantial application memory savings in CMS reconstruction. A development plan is in place to scale the threading performance to much higher levels over the next years (a configuration sketch is shown below). Detailed reports during CHEP.
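A minimal sketch of how a job configuration would request threads from the multi-threaded framework; the option names are assumed from typical CMSSW python configurations and may differ between releases:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECO")

# Ask the framework for a multi-threaded job: several threads sharing a set
# of concurrent event streams. Treat the exact option names as an assumption
# of this sketch.
process.options = cms.untracked.PSet(
    numberOfThreads = cms.untracked.uint32(8),
    numberOfStreams = cms.untracked.uint32(8),
)
```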

Big changes in the CMS analysis data model: MiniAOD. CMS has developed a new user analysis data format that is 5-10x smaller than our Run 1 format. Its design and content are based on Run 1 experience across analysis groups, with a re-evaluation of the physics requirements on stored objects and on variable precision; it targets most (80-90%) analysis needs. Potential for big analysis improvements in Run 2:
- Increased analysis agility within limited computing resources: recreating MiniAOD is much faster than rerunning the reconstruction, for a vast range of performance improvements.
- Substantial robustness improvements in analysis results by reducing user data processing.
- A basis for R&D towards more I/O-performant analysis data models (a minimal read-back sketch is shown below).
Detailed reports during CHEP.
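A hedged FWLite-style sketch of reading back one of the slimmed MiniAOD collections; the file name is a placeholder, and the collection label is assumed from standard MiniAOD content:

```python
from DataFormats.FWLite import Events, Handle

# Read one of the slimmed MiniAOD collections in FWLite.
# The file name is a placeholder; "slimmedMuons" is the usual MiniAOD label.
events = Events("miniAOD_sample.root")
muons = Handle("std::vector<pat::Muon>")

for i, event in enumerate(events):
    event.getByLabel("slimmedMuons", muons)
    print("event", i, "muons:", muons.product().size())
    if i >= 9:
        break
```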

HL-LHC focused R&D activities and ideas (reports during CHEP on many areas). Towards using heterogeneous many-core resources:
- A prototype tracking demonstrator for the Intel Xeon Phi is in development; initial results from a generic algorithm are encouraging (a toy illustration of this data-parallel style follows below).
- Trigger on many-core or specialized resources.
- Simulation (Geant4 or beyond) on heterogeneous resources.
- Resource scheduling, I/O, and event-processing frameworks for heterogeneous resources; parallel processing models.
- Developing a computing-model simulation to study how to optimize data access.
Also: improving tracking and other algorithms at high pile-up (continuous improvement activities), and evaluating the improved cost/performance of using specialized centers for dedicated workflows.
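To make the flavour of the many-core work concrete, a toy sketch of the kind of data-parallel kernel such demonstrators target; this is purely illustrative and not the CMS prototype:

```python
import numpy as np

# Toy illustration (not the CMS demonstrator) of the data-parallel style the
# many-core tracking R&D targets: fit many track candidates simultaneously
# with vectorized operations instead of looping over candidates one by one.
n_tracks, n_hits = 10000, 8
rng = np.random.default_rng(0)

z = np.linspace(0.1, 1.0, n_hits)                  # detector layer positions
true_slope = rng.normal(0.0, 1.0, size=(n_tracks, 1))
hits = true_slope * z + rng.normal(0.0, 0.01, size=(n_tracks, n_hits))

# Straight-line least-squares slope (through the origin) for all tracks at once.
slope = (hits * z).sum(axis=1) / (z * z).sum()
print("mean |slope residual|:", np.abs(slope - true_slope.ravel()).mean())
```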

HL-LHC focused R&D activities (reports during CHEP on many areas):
- Evaluating "big data" data-reduction and selection techniques for I/O-intensive analysis procedures; efficient physics-object skimming/tagging would also promote re-use of selection criteria.
- Developing tools and infrastructure (profilers, development tools, etc.).
- Prototyping container technology (e.g., Docker) for CMSSW.
- Evaluating metrics for performance per unit of power, in addition to simply raw performance.
- Investigating data-analytics techniques; ideas for next-generation data popularity.

Outlook. CMS is facing a much more complex computing program for Phase-II due to detector changes, increased trigger rates, and event complexity.
- Technology evolution alone will not close the gap between naive resource extrapolations based on constant funding and the resource needs of the HL-LHC era.
- In addition, we need to evolve so that we are ready to use future commodity computing technologies effectively and cost-effectively.
CMS is investigating a number of R&D areas and is seeking to extend our collaboration between experiments, sites, and groups. We have organized our R&D around weekly meetings (now open outside CMS), including an active collaboration between CMS and TechLab on (CMSSW) benchmark performance measurements on new architectures, and CMS members are closely following the HEP Software Foundation (including via its startup team).