Recent performance improvements in ALICE simulation/digitization

Slides:



Advertisements
Similar presentations
Status GridKa & ALICE T2 in Germany Kilian Schwarz GSI Darmstadt.
Advertisements

5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Copyright © 200\8 Quest Software High Performance PL/SQL Guy Harrison Chief Architect, Database Solutions.
Memory Consistency in Vector IRAM David Martin. Consistency model applies to instructions in a single instruction stream (different than multi-processor.
Reconstruction and Analysis on Demand: A Success Story Christopher D. Jones Cornell University, USA.
1: Operating Systems Overview
This material in not in your text (except as exercises) Sequence Comparisons –Problems in molecular biology involve finding the minimum number of edit.
27-Jun-15 Profiling code, Timing Methods. Optimization Optimization is the process of making a program as fast (or as small) as possible Here’s what the.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
1 History of compiler development 1953 IBM develops the 701 EDPM (Electronic Data Processing Machine), the first general purpose computer, built as a “defense.
Testing Michael Ernst CSE 140 University of Washington.
ALICE Simulation Framework Ivana Hrivnacova 1 and Andreas Morsch 2 1 NPI ASCR, Rez, Czech Republic 2 CERN, Geneva, Switzerland For the ALICE Collaboration.
Axel Naumann. Outline  Static Code Analysis  Coverity  Reporting Tools, Report Quality  "Demo": Examples Axel Naumann Application Area Meeting2.
Introduction Advantages/ disadvantages Code examples Speed Summary Running on the AOD Analysis Platforms 1/11/2007 Andrew Mehta.
Andreas Morsch, CERN EP/AIP CHEP 2003 Simulation in ALICE Andreas Morsch For the ALICE Offline Project 2003 Conference for Computing in High Energy and.
Heather Kelly PPA Scientific Computing Apps LAT was launched as part of the Fermi Gamma-ray Space Telescope on June 11 th 2008.
The Virtual MonteCarlo D.Adamova 2, V.Berejnoi 1, R.Brun 1, F.Carminati 1, A.Fassó 1, E.Futo 1, I.Gonzalez 3, I.Hrivnacova 4, A.Morsch 1 1 CERN, Geneva;
Navigation Timing Studies of the ATLAS High-Level Trigger Andrew Lowe Royal Holloway, University of London.
Geant4 in production: status and developments John Apostolakis (CERN) Makoto Asai (SLAC) for the Geant4 collaboration.
The CMS Simulation Software Julia Yarba, Fermilab on behalf of CMS Collaboration 22 m long, 15 m in diameter Over a million geometrical volumes Many complex.
P4Helpers P4Helpers.h provides static helper functions for kinematic calculation on objects deriving from I4Momentum. Anthony Denton-Farnworth, The University.
AliRoot survey P.Hristov 11/06/2013. Offline framework  AliRoot in development since 1998  Directly based on ROOT  Used since the detector TDR’s for.
Real-Time systems By Dr. Amin Danial Asham.
FORTRAN History. FORTRAN - Interesting Facts n FORTRAN is the oldest Language actively in use today. n FORTRAN is still used for new software development.
Statistical feature extraction, calibration and numerical debugging Marian Ivanov.
Claudio Grandi INFN-Bologna CHEP 2000Abstract B 029 Object Oriented simulation of the Level 1 Trigger system of a CMS muon chamber Claudio Grandi INFN-Bologna.
ATLAS Distributed Computing perspectives for Run-2 Simone Campana CERN-IT/SDC on behalf of ADC.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Lecture #20: Profiling NetBeans Profiler 6.0.
1 Reconstruction tasks R.Shahoyan, 25/06/ Including TRD into track fit (JIRA PWGPP-1))  JIRA PWGPP-2: Code is in the release, need to switch setting.
AliRoot Classes for access to Calibration and Alignment objects Magali Gruwé CERN PH/AIP ALICE Offline Meeting February 17 th 2005 To be presented to detector.
Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.
MONTE CARLO TRANSPORT SIMULATION Panda Computing Week 2012, Torino.
MAUS Status A. Dobbs CM43 29 th October Contents MAUS Overview Infrastructure Geometry and CDB Detector Updates CKOV EMR KL TOF Tracker Global Tracking.
Barthélémy von Haller CERN PH/AID For the ALICE Collaboration The ALICE data quality monitoring system.
Monthly video-conference, 18/12/2003 P.Hristov1 Preparation for physics data challenge'04 P.Hristov Alice monthly off-line video-conference December 18,
Data Formats and Impact on Federated Access
WP12 - General Development News Sandro Wenzel
EvtGen implementation/usage in the ALICE analysis framework
Installation of the ALICE Software
Architectures of Digital Information Systems Part 1: Interrupts and DMA dr.ir. A.C. Verschueren Eindhoven University of Technology Section of Digital.
Current Generation Hypervisor Type 1 Type 2.
Chapter 9 Polymorphism.
Progress on NA61/NA49 software virtualisation Dag Toppe Larsen Wrocław
Learning to Program D is for Digital.
7. Inheritance and Polymorphism
DPG Activities DPG Session, ALICE Monthly Mini Week
Visualization of embedding
Work report Xianghu Zhao Nov 11, 2014.
Testing UW CSE 160 Winter 2017.
Releases and developments
History of compiler development
Software Life Cycle Models
Main Memory Management
The latest developments in preparations of the LHC community for the computing challenges of the High Luminosity LHC Dagmar Adamova (NPI AS CR Prague/Rez)
Testing UW CSE 160 Spring 2018.
Compiler Construction
Testing UW CSE 160 Winter 2016.
Framework for the acceptance and efficiency corrections
Polymorphism Professor Hugh C. Lauer CS-2303, System Programming Concepts (Slides include materials from The C Programming Language, 2nd edition, by Kernighan.
Use of Geant4 in experiment interactive frameworks AliRoot
Simulation Framework Subproject cern
(Computer fundamental Lab)
Paper by D.L Parnas And D.P.Siewiorek Prepared by Xi Chen May 16,2003
Data Structures & Algorithms
CSE 153 Design of Operating Systems Winter 19
Chapter 4: Simulation Designs
Programming with Shared Memory Specifying parallelism
Agenda SICb Session Status of SICb software migration F.Ranjard
Presentation transcript:

Recent performance improvements in ALICE simulation/digitization Sandro Wenzel / CERN-ALICE WLCG meeting, San Francisco, 9.10.2016

Outline Motivation Initial profile situation Developments to increase software performance Performance increase Outlook

Motivation Improve CPU performance of complete algorithmic pipeline in ALICE in the current AliRoot framework in ALICE specific code (independent of improvements in external simulation packages) as preparation for ALICE-O2 and higher luminosity requirements Start with analysis of simulation/digitization as the most important CPU consumer

Initial profile Status March 2016: Use valgrind/igprof on typical MC simulation scenarios using Geant3 or Geant4 p-p benchmark Pb-Pb benchmark Main CPU users: TPC digitization (expected) simulation physics routines (expected) TGeo geometry routines (expected) dynamic_casts on ~6% level (unexpected) accessing thread local storage variables on ~2% level (unexpected) accessing ROOT containers on ~10% level (unexpected) lots of calls to very small functions on the 1% level (unexpected)

Performance optimization campaign A dedicated campaign was started to remove the “unexpected” hotspots: “grab” the low-hanging fruits !! ROOT related issues: often not offering fast container access slow/non-optimal access to TLorentzVector no type-strictness of ROOT containers dynamic_cast oriented type checking … generic sorting algorithm are based on virtual functions, … Other typical problems: overuse of virtual function paradigm (when not strictly necessary) cache access problems due to wrong loop order etc. campaign to remove unnecessary virtual functions + inline campaign in AliROOT + Geant3 + VMC help the compiler doing optimizations

Overall Results Improvements measured on running a Pb-Pb simulation (typical event) - Geant3 ~20% gain in total runtime achieved ( from ~2359s to ~1966s ) Original Tuned compiler flags Code optimizations in AliRoot/ROOT RunSimulation 1462s 1367s 1182s RunSDigitization 683s 692s 585s Total simulation + all digitization + other parts 2359s 2274s 1966s

Details: Dynamic cast problem Overuse of dynamic casting in AliRoot: ROOT does not offer strongly typed containers, preventing compile-time checks on objects put into them users are then inclined to perform type checks at runtime Example: typically we write TObjArray *fDetectorModules // supposed to store objects of type DetectorModule and derived retrieving objects … one is inclined to say if (dynamic_cast<DetectorModule>(fDetectorModules->At(i))) … This is an expensive operation at every read from fDetectorModule which happens every step although fDetectorModule never changes sums up to almost ~6ish % Action taken: perform type checks only at initial write to containers use static_casts to read from (const) containers achieved complete elimination of dynamic_cast problem from profile

Details on some ROOT issues Non-optimal ROOT Container access AliROOT relies heavily on ROOT containers example 1: no fast way to get element from TArrayI fast; both TArrayI::At( ) and TArrayI::operator [](int) both perform bounding checks example II: retrieving element TMatrixT<>::operator(row, col) always performs assert checks + boundary checks ticket ROOT-5472 opened a while ago problems have been fixed in AliROOT with custom “fast-access” functions TLorentzVector heavily used in parts of simulation/digitisation access operators/constructors non-inline and convoluted (leading to 1% to 2% overall cost) an “optimizing” patch has been submitted to ROOT and was accepted ROOT Sorting sorting a TClonesArray is slower than sorting a “std::vector<T>” because the type of element is not known —> preventing inlining of “sort/compare” functor A template version of TClonesArray::Sort<T> has been implemented, speeding up sorting by factor ~2 plan to submit patch to ROOT team

Avoid access to thread-local variables Another major problem turned out to be accessing “thread-local storage” variables noticeable by excessive appearance of _tls_get_addr_ in profiler outputs strange since AliRoot not using threading for moment major reason was found to be accessing the singleton Virtual Monte Carlo object with TVMC::GetMC() problem of this function was: “virtual” + “non-inline” + “thread local local storage object” Solution taken: cache a reference to MC object in AliROOT simulation base class … not longer need to call GetMC() thread local storage problem almost completely resolved apart from some things remaining deep inside ROOT/TGeo

Miscellaneous points Systematic study of build system etc. Optimized the build flags for Geant3 previously compiled in a “conservative” mode reason: true RELEASE mode compilation for Geant3 leads to numeric instabilities did a systematic scan of compiler flags and identified the cause of numeric instabilities can now build Geant3 almost in RELEASE mode: -O2 + -fno-strict-overflow

Outlook After this pass of optimization steps, we get a clearer view on the real important algorithms in AliRoot geometry in simulation TPC in digitization Next steps will be a tackling of those parts Concrete ideas exist to decrease the time spent in geometry routines via usage of “VecGeom” modern, high-performance geometry package for simulation e.g., plan to using the VecGeom engine in Geant4/Geant3 via the Virtual Monte Carlo interface Also take a look at reconstruction algorithms