Presentation transcript:

16 September 2014 — Ian Bird; SPC

General
- ALICE and LHCb: detector upgrades during LS2; plans for changing computing strategies are more advanced.
- CMS and ATLAS: major upgrades for LS3.
  - Run 3 data is expected to be manageable with Run 2 strategies and expected improvements.
  - Run 4 requires more imaginative solutions and needs R&D.

ALICE Upgrade for Run 3

[Diagram: detector readout at 50 kHz with ~23 MB/event, reduced by the O2 facility to ~1.5 MB/event, giving a peak of ~75 GB/s to storage.]

Increased LHC luminosity will result in interaction rates of 50 kHz for Pb-Pb and 200 kHz for p-p and p-Pb collisions. Several detectors (including the TPC) will have continuous readout to address pile-up and avoid trigger-generated dead time. The Online/Offline (O2) facility at Point 2 will be tasked with reducing the recorded data volume by doing the online reconstruction. The ALICE physics program in Run 3 will focus on precise measurements of heavy-flavor hadrons, low-momentum quarkonia and low-mass dileptons.
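As a back-of-envelope check of the numbers in the diagram (a sketch, not ALICE code; the per-event sizes are the approximate figures quoted above, and decimal units are assumed):

```python
# Back-of-envelope check of the Run 3 Pb-Pb data rates quoted above.
# Illustrative only: 23 MB and 1.5 MB are the approximate per-event sizes
# before and after O2 online reconstruction/compression; decimal GB assumed.

interaction_rate_hz = 50e3          # 50 kHz Pb-Pb
raw_event_size_mb = 23.0            # before O2 processing
stored_event_size_mb = 1.5          # after O2 processing

readout_gb_s = interaction_rate_hz * raw_event_size_mb / 1000
storage_gb_s = interaction_rate_hz * stored_event_size_mb / 1000
reduction = raw_event_size_mb / stored_event_size_mb

print(f"Detector readout : ~{readout_gb_s:,.0f} GB/s")   # ~1,150 GB/s, i.e. >1 TB/s
print(f"Peak to storage  : ~{storage_gb_s:,.0f} GB/s")   # 75 GB/s, matching the figure
print(f"Per-event size reduction: ~x{reduction:.0f}")    # ~x15 from these two sizes
```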

O2
- The Online-Offline (O2) facility will provide:
  - DAQ functionality: detector readout, data transport and event building
  - HLT functionality: data compression, clustering algorithms, tracking algorithms
  - Offline functionality: calibration, full event processing and reconstruction, up to analysis object data
- Final compression factor of x20 (raw data vs data on disk)
- Requires the same software to operate in the challenging online environment and on the Grid
  - Possibly using hardware accelerators (FPGAs, GPUs) and operating in a distributed, heterogeneous computing environment
- ALFA software framework (ALICE + FAIR)
  - Developed in collaboration with FAIR at GSI

Computing Model
- The O2 facility will play the major role in raw data processing.
- The Grid will continue to play important roles:
  - MC simulation, end-user and organized data analysis, raw data reconstruction (reduced)
  - Custodial storage for a fraction of the RAW data
- Expecting that the Grid (CPU & storage) will keep growing at the same rate (20-25% per year; see the illustration below).
- Data management and the CPU needs for simulation will be the biggest challenges:
  - The O2 facility will provide a large part of the raw data store
  - Simulation needs to be sped up by using new approaches (G4, GV, fast simulation, ...)
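A quick illustration of what the quoted 20-25% per-year flat-budget growth implies over the run-up to Run 3 (a sketch; the six-year horizon is an assumption, not from the slide):

```python
# Cumulative growth of Grid capacity under constant-budget technology
# evolution, for the 20-25% per-year range quoted on the slide.
# The 6-year horizon (roughly 2014 -> Run 3) is an assumption for illustration.

years = 6
for annual_growth in (0.20, 0.25):
    factor = (1 + annual_growth) ** years
    print(f"{annual_growth:.0%}/year over {years} years -> x{factor:.1f} capacity")
# -> roughly a factor 3-4 more Grid capacity at flat cost
```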


Towards the LHCb Upgrade (Run 3, 2020)
- We do not plan a revolution for LHCb Upgrade computing
- Rather an evolution to fit in the following boundary conditions:
  - Luminosity levelling at 2x10^33 cm^-2 s^-1
    - Factor 5 c.f. Run 2
  - 100 kHz HLT output rate for the full physics programme
    - Factor 8-10 more than in Run 2
  - Flat funding for offline computing resources (a rough sizing of this gap is sketched below)
- Computing milestones for the LHCb upgrade:
  - TDR: 2017Q1
  - Computing model: 2018Q3
- Therefore only brainstorming at this stage, to devise a model that keeps within the boundary conditions
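To give a sense of these boundary conditions (a back-of-envelope sketch; the ~20%/year flat-budget capacity growth is borrowed from the growth rates quoted elsewhere in these slides, and the six-year horizon is an assumption):

```python
# Rough sizing of the LHCb upgrade computing gap from the boundary conditions
# above. Illustrative only: the 20%/year flat-budget growth and the 6-year
# horizon (2014 -> 2020) are assumptions, not LHCb numbers, and the extra
# per-event complexity from the factor 5 in luminosity is not included.

hlt_rate_increase = (8, 10)        # factor 8-10 more HLT output than Run 2
flat_budget_growth = 1.20 ** 6     # ~x3 capacity at flat cost over 6 years

for f in hlt_rate_increase:
    gap = f / flat_budget_growth
    print(f"x{f} more data with ~x{flat_budget_growth:.1f} more capacity "
          f"-> need ~x{gap:.1f} from smarter processing/storage")
# -> roughly a factor 2.5-3.5 must come from changes such as the TurboDST
#    ideas on the following slides
```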

Evolution of the LHCb data processing model
- Run 1:
  - Loose selection in HLT (no PID)
  - First-pass offline reconstruction
  - Stripping
    - Selects ~50% of the HLT output rate for physics analysis
  - Offline calibration
  - Reprocessing and re-stripping
- Run 2:
  - Online calibration
  - Deferred HLT2 (with PID)
  - Single-pass offline reconstruction
    - Same calibration as HLT
    - No reprocessing before LS2
  - Stripping and re-stripping
    - Selects ~90% of the HLT output rate for physics analysis
- Given sufficient resources in the HLT farm, the online reconstruction could be made ~identical to offline

TurboDST: brainstorming for Run 3
- In Run 2, online (HLT) reconstruction will be very similar to offline (same code, same calibration, fewer tracks)
  - If it can be made identical, why then write RAW data out of the HLT rather than the reconstruction output?
- In Run 2 LHCb will record 2.5 kHz of "TurboDST"
  - RAW data plus the result of the HLT reconstruction and HLT selection
  - Equivalent to a microDST (MDST) from the offline stripping
  - Proof of concept: can a complete physics analysis be done based on an MDST produced in the HLT?
    - i.e. no offline reconstruction, hence no offline realignment and reduced opportunity for PID recalibration
    - RAW data remains available as a safety net
  - If successful, can we drop the RAW data?
    - HLT writes out ONLY the MDST???
- Currently just ideas, but this would allow a 100 kHz HLT output rate without an order of magnitude more computing resources (see the sketch below)
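A toy comparison of the output bandwidth implied by the two options (purely illustrative; the per-event sizes below are hypothetical placeholders, not LHCb numbers):

```python
# Toy comparison of HLT output bandwidth: full RAW vs MDST-only ("TurboDST"
# without keeping the RAW safety net). The event sizes are hypothetical
# placeholders chosen only to illustrate the scaling; they are NOT LHCb figures.

hlt_rate_hz = 100e3                 # 100 kHz upgrade HLT output rate
raw_event_kb = 100                  # hypothetical RAW event size
mdst_event_kb = 15                  # hypothetical MDST (HLT reco output) size

def output_gb_per_s(rate_hz, event_kb):
    """Bandwidth out of the HLT for a given rate and event size (decimal GB)."""
    return rate_hz * event_kb / 1e6

print(f"RAW  : {output_gb_per_s(hlt_rate_hz, raw_event_kb):.1f} GB/s")
print(f"MDST : {output_gb_per_s(hlt_rate_hz, mdst_event_kb):.1f} GB/s")
# Dropping RAW in favour of the MDST cuts the downstream data volume by
# roughly the ratio of the two event sizes, which is what would make a
# 100 kHz output rate affordable without an order of magnitude more resources.
```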


ATLAS Reconstruction Software
- A speedup by a factor of 3 has been achieved, allowing ATLAS to safely operate at a 1 kHz EF output rate and reconstruct in a timely manner at Tier-0, without compromising the quality of the reconstruction output.
- Speedup achieved by: algorithmic improvements; clean-ups; the Eigen matrix library; the Intel math library; the switch to 64-bit; SL5 → SL6; ... (a toy vectorisation sketch follows below)
- Runs 3 and 4: must efficiently exploit the CPU architectures of 10 years from now → massive multithreading, vectors, ... → work starts now.
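As a loose illustration of why vectorised math libraries and "vectors" matter (a toy Python/NumPy sketch, not ATLAS code; the measured speedup will depend on the machine):

```python
# Toy illustration (not ATLAS code) of the gain from vectorised math of the
# kind provided by libraries such as Eigen: the same 3x3-matrix-times-vector
# operation done in a Python loop vs as a single vectorised NumPy call.
import time
import numpy as np

rng = np.random.default_rng(0)
matrices = rng.standard_normal((100_000, 3, 3))   # e.g. per-track transforms
vectors = rng.standard_normal((100_000, 3))

t0 = time.perf_counter()
looped = np.array([m @ v for m, v in zip(matrices, vectors)])
t1 = time.perf_counter()
vectorised = np.einsum("nij,nj->ni", matrices, vectors)
t2 = time.perf_counter()

assert np.allclose(looped, vectorised)
print(f"per-element loop : {t1 - t0:.3f} s")
print(f"vectorised call  : {t2 - t1:.3f} s")
```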

ATLAS: New Analysis Framework
- One format replaces the separate ROOT-readable and Athena-readable formats.
- Common reduction framework and analysis framework for all physics groups.

ATLAS: Distributed Computing
- NEW for Run 2: Distributed Data Management system (Rucio) being commissioned.
- NEW for Run 2: Production system (ProdSys2 = JEDI + DEFT) being commissioned.
- NEW for Run 2: Distributed Data Management strategy:
  - All datasets will be assigned a lifetime (infinite for RAW).
  - Disk (versus tape) residency will be algorithmically managed.
  - Builds upon Rucio and ProdSys2 capabilities.
  - The ATLAS computing model [spreadsheet] has embodied similar concepts for two years.
  - The new strategy will be progressively implemented in advance of Run 2.
- Runs 3 and 4:
  - Storage is likely to be even more constrained than CPU.
  - Build on the new [for Run 2] Distributed Data Management strategy.
  - Add dynamic, automated "store versus recompute" decisions (a sketch of such a decision rule follows below).
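As an illustration of what a dynamic "store versus recompute" decision could look like (a minimal sketch under hypothetical cost inputs; this is not the ATLAS implementation):

```python
# Minimal sketch of a "store versus recompute" decision for a derived dataset.
# All cost parameters are hypothetical inputs for illustration; a real system
# would be driven by measured popularity, CPU and storage accounting.

from dataclasses import dataclass

@dataclass
class Dataset:
    size_tb: float                   # size of the derived data on disk
    recompute_cpu_hours: float       # CPU needed to regenerate it from its parent
    expected_accesses_per_year: float

def keep_on_disk(ds: Dataset,
                 disk_cost_per_tb_year: float = 20.0,    # hypothetical cost units
                 cpu_cost_per_hour: float = 0.01) -> bool:
    """Keep the dataset if storing it for a year is cheaper than
    recomputing it every time it is expected to be accessed."""
    storage_cost = ds.size_tb * disk_cost_per_tb_year
    recompute_cost = (ds.expected_accesses_per_year
                      * ds.recompute_cpu_hours * cpu_cost_per_hour)
    return storage_cost <= recompute_cost

# Example: a rarely-used, cheap-to-remake dataset would be dropped from disk.
print(keep_on_disk(Dataset(size_tb=50, recompute_cpu_hours=2_000,
                           expected_accesses_per_year=1)))   # False -> recompute on demand
```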


Maria Girone, CERN
- CMS is facing a huge increase in the scale of the computing expected to be needed for Run 4.
- The WLCG Computing Model Evolution document predicts a 25% processing capacity and 20% storage increase per year:
  - a factor of 8 in processing and 6 in storage between now and Run 4 (see the back-of-envelope check below).
- Even assuming a factor of 2 from code improvements, the deficit is 4-13 in processing and 4-6 in storage.

Phase            Pile-up   HLT output   Reco time (vs Run 2)   AOD size (vs Run 2)   Total weighted avg increase (vs Run 2)
Phase I (2019)   50        1 kHz        4                      1.4                   3
Phase II (2024)  140       5 kHz        …                      …                     …
Phase II (2024)  …         … kHz        …                      …                     …
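The factors of 8 and 6 follow from compounding the predicted annual growth over the period to Run 4 (a back-of-envelope check; the ten-year horizon is an assumption):

```python
# Back-of-envelope check of the "factor of 8 in processing and 6 in storage"
# figures: compound the predicted annual growth over ~10 years to Run 4
# (the 10-year horizon is an assumption for illustration).

years = 10
processing_growth = 1.25 ** years   # 25%/year -> ~9.3
storage_growth = 1.20 ** years      # 20%/year -> ~6.2

print(f"Processing capacity after {years} years: x{processing_growth:.1f}")
print(f"Storage capacity after {years} years:    x{storage_growth:.1f}")
# Both are in the right ballpark of the factors 8 and 6 quoted on the slide,
# and fall well short of the needs implied by the table above.
```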

Maria Girone, CERN
- It is unlikely we will get a factor of 5 more money, nor will the experiment be willing to take a factor of 5 less data; big improvements are needed (the sketch below shows how the breakdown limits the gain from any single improvement).
- Roughly 40% of the CMS processing capacity is devoted to tasks identified as reconstruction:
  - Prompt reconstruction, re-reconstruction, data and simulation reco.
  - Looking at lower-cost and lower-power massively parallel systems like ARM and high-performance processors like GPUs (both can lower the average cost of processing).
- ~20% of the offline computing capacity is in areas identified as selection and reduction:
  - Analysis selection, skimming, production of reduced user formats.
  - Looking at dedicated data reduction solutions like event catalogs and big-data tools like MapReduce.
- The remaining 40% is a mix:
  - A lot of different activities with no single area on which to concentrate optimization effort.
  - Simulation already has a strong ongoing optimization effort.
  - User analysis activities developed by many people.
  - Smaller-scale calibration and monitoring activities.
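To illustrate why no single optimization closes the gap (a worked example using the 40% / 20% / 40% split above; the assumed per-area speedups are hypothetical):

```python
# How much total CMS processing capacity is freed if each area is sped up,
# using the 40% / 20% / 40% split quoted above. The per-area speedup factors
# are hypothetical, for illustration only.

areas = {
    "reconstruction (40%)":        (0.40, 2.0),   # e.g. factor 2 from GPUs/ARM
    "selection & reduction (20%)": (0.20, 2.0),   # e.g. shared skims / MapReduce
    "mixed activities (40%)":      (0.40, 1.2),   # harder to optimize as a whole
}

remaining = 0.0
for name, (share, speedup) in areas.items():
    new_share = share / speedup
    remaining += new_share
    print(f"{name}: {share:.0%} -> {new_share:.0%} of today's capacity")

print(f"Overall: x{1 / remaining:.1f} effective gain")
# Even with a factor 2 on reconstruction AND on selection, the overall gain
# stays well below the factors needed for Run 4 on the previous slide.
```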

Maria Girone, CERN
- In order to exploit new hardware, investment is needed in parallelization:
  - CMS has already achieved >99% parallel-safe code and has excellent efficiency up to 8 cores (see the Amdahl's-law sketch below).
- CMS maintains triggers that are as open as possible, and datasets are reduced to optimized selections for particular activities:
  - Nearly all the selections are serial passes through the data by users and groups.
  - This relies on multiple distributed copies and many reads of each event.
  - Techniques that reuse the selections can reduce the total processing needed.
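A quick Amdahl's-law illustration of what a >99% parallel fraction implies for multi-core scaling (a sketch; the exact parallel fraction and core counts are illustrative):

```python
# Amdahl's-law illustration of the ">99% parallel-safe code, excellent
# efficiency up to 8 cores" statement above. The parallel fraction is taken
# as exactly 0.99 for illustration.

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

p = 0.99
for cores in (8, 16, 32, 64):
    s = amdahl_speedup(p, cores)
    print(f"{cores:>3} cores: speedup x{s:.1f}  (efficiency {s / cores:.0%})")
# At 8 cores the efficiency is still ~93%, but it drops quickly at higher core
# counts, which is why further investment in parallelization is needed.
```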