16th Geant4 Collaboration Meeting, SLAC, September 2011
P. Mato, CERN
For the last 40 years, HEP event processing frameworks have had the same structure
  ◦ initialize; loop events { loop modules {…} }; finalize
  ◦ O-O has not added anything substantial
  ◦ It is simple, intuitive, easy to manage and scalable
Current frameworks were designed in the late 1990s
  ◦ We now know better what is really needed
  ◦ Unnecessary complexity impacts performance
Clear consensus that we need to adapt HEP applications to new-generation CPUs
  ◦ Multi-process, multi-threading, GPUs, vectorization, etc.
  ◦ One job per core scales well, but requires too much memory and sequential file merging
P. Mato/CERN 2
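The classic structure described on this slide can be sketched in C++ as follows. This is a minimal illustration, not code from any existing framework: `Event`, `Module`, `Tagger` and `runSequential` are hypothetical names, and the event "data" is just a log of which modules ran.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record: here it only logs which modules touched it.
struct Event {
  int number = 0;
  std::vector<std::string> log;
};

// Hypothetical module interface: the body of "loop modules {...}".
class Module {
 public:
  virtual ~Module() = default;
  virtual void initialize() {}
  virtual void execute(Event& evt) = 0;
  virtual void finalize() {}
};

// Trivial module that appends its name to the event log.
class Tagger : public Module {
 public:
  explicit Tagger(std::string name) : name_(std::move(name)) {}
  void execute(Event& evt) override { evt.log.push_back(name_); }
 private:
  std::string name_;
};

// initialize; loop events { loop modules }; finalize.
// One event at a time, modules always in the same fixed order.
std::vector<Event> runSequential(std::vector<std::unique_ptr<Module>>& modules, int nEvents) {
  for (auto& m : modules) m->initialize();
  std::vector<Event> done;
  for (int i = 0; i < nEvents; ++i) {           // loop events
    Event evt;
    evt.number = i;
    for (auto& m : modules) m->execute(evt);    // loop modules
    done.push_back(std::move(evt));
  }
  for (auto& m : modules) m->finalize();
  return done;
}
```

Every event passes through every module, strictly in sequence; this sequential baseline is what the proposal wants to break into concurrently schedulable pieces.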
Covering a broad scope of applications and environments
  ◦ Single-user desktop/laptop case: minimize latency
  ◦ Production (throughput computing): minimize memory and other resources
Investigations are starting and prototypes are being developed
  ◦ An opportunity to collaborate and to organize the work better
Universal framework for simulation, reconstruction, analysis and high-level trigger applications
Common framework for use by any experiment
Decomposition of the processing of each event into ‘chunks’ that can be executed concurrently
Ability to process several events concurrently
Optimal scheduling and associated data structures
Minimize any processing requiring exclusive access to resources, because it breaks concurrency
Supporting various hardware/software technologies
Facilitate the integration of existing LHC application code (the algorithmic part)
Quick delivery of running prototypes: the opportunity of the 18-month LHC shutdown
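To illustrate processing several events concurrently, here is a minimal sketch assuming each event can be processed as a pure function of its input, with no shared mutable state. `processEvent` and `processConcurrently` are hypothetical names, the per-event work is a placeholder, and `std::async` stands in for whatever task system a real framework would use.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-event work: a pure function of the event number, so
// several events can be processed at the same time without shared state.
long long processEvent(int eventNumber) {
  long long sum = 0;
  for (int i = 1; i <= 1000; ++i) sum += (static_cast<long long>(eventNumber) * i) % 97;
  return sum;
}

// Keep up to maxInFlight events in flight at once; results are returned in
// event order. Events are launched in batches for simplicity.
std::vector<long long> processConcurrently(int nEvents, std::size_t maxInFlight) {
  assert(maxInFlight > 0);
  std::vector<long long> results;
  std::vector<std::future<long long>> inFlight;
  for (int evt = 0; evt < nEvents; ++evt) {
    inFlight.push_back(std::async(std::launch::async, processEvent, evt));
    if (inFlight.size() == maxInFlight) {  // batch full: wait for all of it
      for (auto& f : inFlight) results.push_back(f.get());
      inFlight.clear();
    }
  }
  for (auto& f : inFlight) results.push_back(f.get());  // last partial batch
  return results;
}
```

A real scheduler would start a new event as soon as any in-flight event finishes, instead of waiting for the whole batch, so that one slow event does not idle the other cores.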
Current frameworks used by LHC experiments support all data processing applications
  ◦ High-level trigger, reconstruction, analysis, etc.
  ◦ Nothing really new here
But simulation applications are designed with one big ‘chunk’ in which all Geant4 processing happens
  ◦ We want to improve full and fast simulation using the common set of services and infrastructure
  ◦ See later for the implications for Geant4
Running on the major platforms
  ◦ Linux, MacOSX, Windows
Frameworks can be shared between experiments
  ◦ E.g. Gaudi is used by LHCb, ATLAS, HARP, MINERVA, GLAST, BES3, etc.
We can do better this time :-)
  ◦ Expect to work closely with the LHC experiments
  ◦ Aim to support at least ATLAS and CMS
Special emphasis on requirements from:
  ◦ New experiments, e.g. Linear Collider, SuperB, etc.
  ◦ Different processing paradigms, e.g. fixed-target experiments, astroparticle physics
Framework with the ability to schedule concurrent tasks
  ◦ Full data dependency analysis would be required (no global data or hidden dependencies)
  ◦ Need to resolve the DAGs (Directed Acyclic Graphs)
Not much gain expected with the ‘chunks’ as designed today
  ◦ See CMS estimates at CHEP’10 (*)
  ◦ Algorithm decomposition can be influenced by the framework capabilities
‘Chunks’ could be processed by different hardware/software
  ◦ CPU, GPU, threads, processes, etc.
[Figure: Input, Processing and Output stages along a time axis]
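The DAG resolution mentioned above can be sketched as follows, assuming each algorithm declares the named data products it reads and writes, and that each product has exactly one producer. `AlgInfo` and `scheduleWaves` are hypothetical names; grouping algorithms into "waves" with Kahn's topological sort is a simple static schedule chosen for illustration, not the dynamic scheduler the slide calls for.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical algorithm ("chunk") description. With no global data, the
// declared inputs/outputs are the complete dependency information.
struct AlgInfo {
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Build the producer -> consumer DAG and group algorithms into waves: every
// member of a wave has all inputs produced by earlier waves, so members of
// one wave could run concurrently. Uses Kahn's algorithm.
std::vector<std::vector<std::string>> scheduleWaves(const std::vector<AlgInfo>& algs) {
  std::map<std::string, std::size_t> producer;  // data product -> algorithm index
  for (std::size_t i = 0; i < algs.size(); ++i)
    for (const auto& out : algs[i].outputs) producer[out] = i;

  std::vector<std::set<std::size_t>> consumers(algs.size());
  std::vector<std::size_t> pending(algs.size(), 0);  // unresolved producers
  for (std::size_t i = 0; i < algs.size(); ++i) {
    std::set<std::size_t> deps;
    for (const auto& in : algs[i].inputs) {
      auto it = producer.find(in);
      if (it == producer.end()) throw std::runtime_error("no producer for " + in);
      deps.insert(it->second);
    }
    pending[i] = deps.size();
    for (auto d : deps) consumers[d].insert(i);
  }

  std::vector<std::vector<std::string>> waves;
  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < algs.size(); ++i)
    if (pending[i] == 0) ready.push_back(i);
  std::size_t scheduled = 0;
  while (!ready.empty()) {
    std::vector<std::string> wave;
    std::vector<std::size_t> next;
    for (auto i : ready) {
      wave.push_back(algs[i].name);
      ++scheduled;
      for (auto c : consumers[i])
        if (--pending[c] == 0) next.push_back(c);  // all inputs now available
    }
    waves.push_back(wave);
    ready = next;
  }
  if (scheduled != algs.size()) throw std::runtime_error("dependency cycle");
  return waves;
}
```

For example, with Unpack feeding both Calo and Tracking, and Match reading Tracks and Clusters, the result is three waves, with Calo and Tracking in the same wave because neither reads the other's output. A missing producer or a cycle raises an exception, which is how hidden dependencies would surface.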
Need to deal with the tails of sequential processing
  ◦ See Rene’s presentation (*)
Introducing pipeline processing
  ◦ Never tried before!
  ◦ Exclusive access to resources can be pipelined, e.g. file writing
Need to design a very powerful scheduler
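A minimal sketch of pipelining an exclusive resource, assuming file writing is the only step that must be serialized: worker threads process events concurrently, while one writer thread, the sole owner of the output, writes records in event order as soon as they become available. `BlockingQueue` and `runPipeline` are hypothetical names, and a `std::vector<std::string>` stands in for the output file.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue connecting two pipeline stages.
template <typename T>
class BlockingQueue {
 public:
  void push(T value) {
    { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(value)); }
    cv_.notify_one();
  }
  void close() {
    { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
    cv_.notify_all();
  }
  // Blocks until an item is available; returns nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
    if (q_.empty()) return std::nullopt;
    T v = std::move(q_.front());
    q_.pop();
    return v;
  }
 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
  bool closed_ = false;
};

// nWorkers threads "process" events concurrently; a single writer thread,
// the only one touching the output, appends records in event order.
std::vector<std::string> runPipeline(int nEvents, int nWorkers) {
  BlockingQueue<std::pair<int, std::string>> done;  // (event number, record)
  std::vector<std::string> output;                  // stands in for a file

  std::thread writer([&] {
    std::map<int, std::string> early;  // events that finished out of order
    int nextToWrite = 0;
    while (auto item = done.pop()) {
      early.emplace(item->first, std::move(item->second));
      // Flush every record whose predecessors have all been written.
      for (auto it = early.find(nextToWrite); it != early.end(); it = early.find(nextToWrite)) {
        output.push_back(std::move(it->second));
        early.erase(it);
        ++nextToWrite;
      }
    }
  });

  std::vector<std::thread> workers;
  for (int w = 0; w < nWorkers; ++w)
    workers.emplace_back([&, w] {
      for (int evt = w; evt < nEvents; evt += nWorkers)  // round-robin split
        done.push({evt, "event " + std::to_string(evt)});
    });
  for (auto& t : workers) t.join();
  done.close();  // no more events: the writer drains the queue and exits
  writer.join();
  return output;
}
```

Because the writer buffers out-of-order events and flushes them as soon as their predecessors are written, the output stays ordered while writing overlaps with processing instead of forming a sequential tail at the end.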
Concrete algorithms can be parallelized with some effort
  ◦ Making use of threads, OpenMP, MPI, GPUs, etc.
  ◦ But it is difficult to integrate them into a complete application, e.g. MT-G4 with Parallel Gaudi
  ◦ Performance-wise, it only makes sense to parallelize the complete application, not only parts of it
Developing and validating parallel code is difficult
  ◦ ‘Physicists’ should be spared from this
  ◦ In any case, concurrency will limit what can and cannot be done in the algorithmic code
At the framework level you have the overall view and control of the application
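The point that parallelizing only parts of an application does not pay off can be quantified with Amdahl's law (a standard result, not taken from the slides). If a fraction p of the run time is parallelized over N workers and the rest stays sequential, the speedup is:

```latex
S(N) = \frac{1}{(1-p) + \frac{p}{N}}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1-p}
```

For example, parallelizing half of the work (p = 0.5) on N = 8 cores gives S = 1/(0.5 + 0.0625) ≈ 1.78, and no number of cores can push it beyond 2; with p = 0.95 the same 8 cores give about 5.93.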
It is not simple, but we are not alone
  ◦ Technologies like Apple’s Grand Central Dispatch (GCD) are designed to help write applications without having to fiddle directly with threads and locking (and getting it terribly wrong)
New paradigms for concurrent programming
  ◦ The developer needs to factor out the processing into ‘chunks’ with their dependencies, and let the framework (system) create and manage a ‘pool’ of threads that takes care of executing the ‘chunks’
  ◦ Tries to eliminate lock-based code and make it more efficient
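As a portable analogue of the GCD model (not GCD itself, whose C API uses calls such as `dispatch_async`), here is a minimal fixed-size thread pool: the developer hands over ‘chunks’ as callables and gets back futures, while thread creation and locking stay inside the pool. `ThreadPool` and `submit` are hypothetical names; there is no work stealing, prioritization or dependency handling.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Fixed-size pool: callers submit chunks and never create threads or lock.
class ThreadPool {
 public:
  explicit ThreadPool(unsigned nThreads) {
    assert(nThreads > 0);
    for (unsigned i = 0; i < nThreads; ++i)
      workers_.emplace_back([this] { workerLoop(); });
  }
  ~ThreadPool() {
    { std::lock_guard<std::mutex> lock(m_); stopping_ = true; }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // already-queued chunks still run
  }
  // Enqueue a callable; the future delivers its result (or its exception).
  template <typename F>
  auto submit(F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    // packaged_task is move-only, so share it to fit in a std::function.
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
    std::future<R> result = task->get_future();
    { std::lock_guard<std::mutex> lock(m_); tasks_.push([task] { (*task)(); }); }
    cv_.notify_one();
    return result;
  }
 private:
  void workerLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (stopping_ && tasks_.empty()) return;
        job = std::move(tasks_.front());
        tasks_.pop();
      }
      job();  // run the chunk outside the lock
    }
  }
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex m_;
  std::condition_variable cv_;
  bool stopping_ = false;
};
```

Usage: `ThreadPool pool(4); auto f = pool.submit([] { return 42; }); int v = f.get();`. A production system would add dependency tracking, so that a chunk is enqueued only once the chunks it depends on have completed.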
Geant4 [core] is a toolkit and should continue to be one
  ◦ Facilitates the integration into existing applications/frameworks
However, Geant4 applications should be based on a new and more modern framework
  ◦ Configuration, proper scripting, interactivity, I/O, analysis, etc.
  ◦ Plugin-based (physics processes/models, visualization drivers, etc.)
  ◦ Ability to run full and fast MC together using common infrastructure (e.g. geometry, conditions, etc.)
      E.g. today’s frameworks allow running different ‘tracking algorithms’ in the same program
Clearly defined input and output types
Make use of the common set of foundation packages (math, vectors, utility classes, etc.)
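A plugin-based design could be sketched as a registry that maps a configuration string to a factory, so that, for instance, a full and a fast simulation model can both be instantiated in the same program. `PhysicsModel`, `PluginRegistry`, `FullSimModel` and `FastSimModel` are hypothetical names, not Geant4 classes; real plugin systems usually also load shared libraries at run time, which this sketch omits.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plugin interface, e.g. for a physics model.
class PhysicsModel {
 public:
  virtual ~PhysicsModel() = default;
  virtual std::string name() const = 0;
};

// Maps a configuration key to a factory, so components can be chosen at
// run time (e.g. from a script) instead of being hard-wired.
class PluginRegistry {
 public:
  using Factory = std::function<std::unique_ptr<PhysicsModel>()>;
  static PluginRegistry& instance() {
    static PluginRegistry reg;  // constructed on first use
    return reg;
  }
  bool add(const std::string& key, Factory f) {
    return factories_.emplace(key, std::move(f)).second;  // false if duplicate
  }
  std::unique_ptr<PhysicsModel> create(const std::string& key) const {
    auto it = factories_.find(key);
    if (it == factories_.end()) throw std::runtime_error("unknown plugin: " + key);
    return it->second();
  }
  std::vector<std::string> keys() const {
    std::vector<std::string> out;
    for (const auto& kv : factories_) out.push_back(kv.first);
    return out;  // sorted, because std::map is ordered
  }
 private:
  std::map<std::string, Factory> factories_;
};

// Two example plugins that register themselves during static initialization.
class FullSimModel : public PhysicsModel {
 public:
  std::string name() const override { return "FullSim"; }
};
class FastSimModel : public PhysicsModel {
 public:
  std::string name() const override { return "FastSim"; }
};

namespace {
const bool fullRegistered = PluginRegistry::instance().add(
    "FullSim", [] { return std::unique_ptr<PhysicsModel>(new FullSimModel); });
const bool fastRegistered = PluginRegistry::instance().add(
    "FastSim", [] { return std::unique_ptr<PhysicsModel>(new FastSimModel); });
}  // namespace
```

A configuration script would then select components by name, e.g. `PluginRegistry::instance().create("FastSim")`, and both models can coexist in one process on top of shared infrastructure.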
With an approach like GCD we could exercise different factorizations
  ◦ Processing each event (set of primary particles) could be the ‘chunk’ (same as GeantMT)
We could also go to the sub-event level
  ◦ Development of Rene’s ideas of ‘baskets’ of particles organized by particle type, volume shape, etc.
  ◦ Would need to develop an efficient summing (‘reduce’) of the results
  ◦ Would require studying the reproducibility of results (random number sequences)
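The sub-event ‘basket’ idea, with its reduce step and reproducibility concern, could be sketched as below. All of the following are hypothetical assumptions for illustration: baskets are keyed only by particle type, the "transport" simply deposits a random fraction of each particle's energy, and each basket gets its own random engine seeded from the event seed and an FNV-1a hash of the basket key. Per-basket seeding is one possible way to make results independent of thread scheduling; it is not taken from the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical particle record for the sketch.
struct Particle {
  std::string type;  // basket key; a real scheme might also use volume, shape, ...
  double energy;     // GeV
};

// Group particles into baskets keyed by particle type.
std::map<std::string, std::vector<Particle>> makeBaskets(const std::vector<Particle>& ps) {
  std::map<std::string, std::vector<Particle>> baskets;
  for (const auto& p : ps) baskets[p.type].push_back(p);
  return baskets;
}

// 64-bit FNV-1a: a fixed, portable hash (std::hash may differ between libraries).
std::uint64_t fnv1a(const std::string& s) {
  std::uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
  return h;
}

// Toy "transport" of one basket: deposit a random fraction of each energy.
// The engine is seeded from (eventSeed, basket key) instead of a shared global
// stream, so a basket sees the same numbers whichever thread runs it, and when.
double transportBasket(const std::string& key, const std::vector<Particle>& basket,
                       std::uint64_t eventSeed) {
  const std::uint64_t k = fnv1a(key);
  std::seed_seq seq{static_cast<std::uint32_t>(eventSeed),
                    static_cast<std::uint32_t>(eventSeed >> 32),
                    static_cast<std::uint32_t>(k), static_cast<std::uint32_t>(k >> 32)};
  std::mt19937_64 engine(seq);
  double deposit = 0.0;
  for (const auto& p : basket) {
    // Top 53 bits of the engine output -> uniform double in [0, 1).
    const double u = static_cast<double>(engine() >> 11) / 9007199254740992.0;
    deposit += p.energy * u;
  }
  return deposit;
}

// Process baskets concurrently, then reduce the partial sums in a fixed
// (key-sorted) order so floating-point rounding is identical on every run.
double simulateEvent(const std::vector<Particle>& ps, std::uint64_t eventSeed) {
  const auto baskets = makeBaskets(ps);
  std::map<std::string, std::future<double>> partial;
  for (const auto& kv : baskets)
    partial.emplace(kv.first, std::async(std::launch::async, transportBasket, kv.first,
                                         std::cref(kv.second), eventSeed));
  double total = 0.0;
  for (auto& kv : partial) total += kv.second.get();  // ordered reduce
  return total;
}
```

Two runs with the same event seed give bit-identical totals regardless of which basket finishes first, because each basket's random sequence is fixed and the partial sums are added in key order. Changing how particles are grouped changes the random numbers each particle sees, which is exactly the reproducibility question raised on the slide.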
Collaboration of CERN with FNAL, DESY and possibly other labs
  ◦ Start with a small number of people (at the beginning)
  ◦ Open to people willing to collaborate
  ◦ Strong interactions with ATLAS and CMS (and others)
      E.g. instrumentation of existing applications to provide requirements
  ◦ Strong collaboration with the Geant4 team
Quick delivery of running prototypes (I and II)
  ◦ First prototype in 12 months :-)
Agile project management with ‘short’ cycles
  ◦ Weekly meetings to review progress and update plans
We need to evaluate some of the existing technologies and design partial prototypes of critical parts
  ◦ See next slide
The idea would be to organize these R&D activities in short cycles
  ◦ Coordinating the interested people to cover all aspects
  ◦ Coming to conclusions (yes/no) within a few months
Investigate current LHC applications to gather requirements
  ◦ Dependencies, data access patterns, opportunities for concurrency, etc.
Investigate the design and implementation of state-of-the-art concurrency frameworks
  ◦ Scheduling (static, dynamic, adaptive), memory model, I/O
Prototype framework elements
  ◦ Identify ‘exemplar’ algorithms to be parallelized
  ◦ Data structures and memory allocation strategies
  ◦ New languages (C++11,…) and libraries (OpenCL,…)
Understanding I/O
[Timeline figure, markers "Today" and "LHC shutdown"; phases: project definition; R&D, technology evaluation and design of critical parts; Complete Prototype I; initial adaptation of LHC and Geant4 applications; Complete Prototype II, with experience of porting LHC applications; first production-quality release]
Presented a proposal for the development of a new generic data processing framework with concurrency, to exploit new CPU/GPU architectures
Geant4 should be a main player, providing simulation-specific requirements and taking advantage of the new framework
  ◦ Would imply some re-engineering of Geant4
Need an R&D program to evaluate existing technologies and develop partial prototypes of critical parts
Some initial ideas for the project definition are being outlined