Computing Performance Recommendations #10, #11, #12, #15, #16, #17
Recommendation 10 We recommend extending the scope of the computing professionals to review and optimize all Geant4 code. Periodic meetings held at CERN over the past two to three years to review computing performance in Geant4 have been replaced by internal Geant4 efforts, in communication with external (experiment) users and inviting expert representatives from the LHC experiments. These meetings proved useful for exchanging feedback and for discussing issues arising from studies made within the experiments' frameworks. In 2007, software experts from FNAL joined the Geant4 Collaboration with an explicit mandate to contribute to code reviews and performance studies of Geant4. Selected classes from specific domains were identified and code reviews were organised (*). The fixes suggested by the code reviews were promptly applied to the code and made available in the most recent releases. (*) See also the notes attached to this slide.
Recommendation 11 We recommend that Geant4 encourage users to monitor their applications, and provide feedback so additional “hot spots” can be identified. This has been communicated to users. Issues identified by profiling were addressed by Geant4 FNAL experts, GATE developers and CMS contributors in collaboration with Geant4 developers. Participation of computing experts from the experiments has been encouraged; fixes suggested by CMS in the CMS performance task force have been evaluated and, in most cases, applied to the code and released. Fixes: a 12-15% speed-up (CMS) in QGSP_BERT; about 15% (GATE) in the low-energy physics tables. Monitoring by ATLAS: the feedback was useful for fixing the issues that occurred.
Recommendation 12 We recommend the creation of a performance optimization guide. It is likely that such information already exists and just needs to be collected into one document. Presentations at the Geant4 Workshop 2007 summarised many of the options available for improving application performance: better use of Geant4, better user classes and, for appropriate applications, event biasing. A first draft web page with tips on improving performance is available as a Twiki document. A link will be added in the User Documentation (Feb 2009). Once reasonably complete, the information is planned to be included as a separate document or as a dedicated chapter in the Geant4 Users’ Guide.
Recommendation 15 We recommend systematic tracking of code performance for each part of the code, and for each physics model. Comparisons with previous versions should be an integral part of the release notes. CPU performance of the Geant4 code is already systematically checked and verified at every public release and/or patch. Benchmark tests have been implemented and grouped into a benchmark suite to verify CPU performance at different levels (pure geometry and tracking, tracking in a magnetic field, EM and hadronic processes at integration level). Results are compared against previous releases taken as reference, with particular care over the system on which the tests are built and executed, so that the same system conditions apply; where this is not possible, the same tests are re-run on the older releases as well. (… continued on next slide)
Recommendation 15 (cont.) In addition, experts from the EM working group and from physics validation run validation tests based on well-defined physics observables, to assess the correctness of the physics results and the overall performance of the various physics models. The results of these tests (in particular the physics validation ones) are partially available from the web sites of the EM and hadronic working groups. They are NOT available through the release notes, although any performance issue or relevant improvement is mentioned in a dedicated section of the release notes. Publishing a large number, or all, of the results, plots and summaries, properly formatted and in time for the release notes, would require a significant effort; it cannot be done with the current manpower without seriously affecting (delaying) the release schedule. A web spreadsheet page summarising the results of the CPU benchmarks run at each release is being put in place.
Recommendation 16 We recommend that Geant4 keep itself abreast of developments in the area of multi cores and advanced instructions, so it can take advantage of them when there is sufficient infrastructure and support to do so. A multi-core version of Geant4 is under development as part of the PhD thesis project of Xin Dong, under the supervision of Prof. Gene Cooperman (Northeastern Univ.), who developed the existing event-level parallel version of Geant4. Prototypes have been created with successive refinements: starting from a fork-based multi-process version that shares nothing explicitly (memory is shared only via Linux copy-on-write), then enabling reuse of the parts of Geant4 that consume significant memory by separating out the read-only parts from those that change during event simulation, progressively, starting from the geometry and the physics tables of key electromagnetic processes. A presentation of the status was made at the 2008 Workshop. A first beta release of a multi-core-enabled revised version of Geant4 9.1 is proposed for April 2009, with draft documentation for identifying potential parallelisation problems. A second version, based on Geant4 9.2, is planned within 2009.
Recommendation 17 We recommend that Geant4 publish a plan regarding the expected computing performance of the toolkit over the next five years. It is very hard to predict the computing performance of future releases of Geant4. The only way we can make an approximate estimate is to use the experience of the last 4 years. On this basis we forecast that, on a single core with constant hardware, CPU time will be reduced by around 4-6% per year. This would come from code improvements, principally as a result of code reviews of key classes and the corresponding implementation and interface improvements. The uncertainty in this forecast is significant, and we estimate that the resulting reduction over five years could range between 5 and 30%. An additional one-time improvement of order 15% is expected in one area (low-energy Livermore EM processes), where an ongoing assessment is addressing hot spots identified in collaboration with Geant4 users (GATE developers). (… continued on next slide)
Recommendation 17 (cont.) In general, we expect that over the next 5 years the throughput (simulated events/minute) of a typical application of the Geant4 toolkit will follow the growth curve of available improvements in CPU performance, potentially by a factor of between 10 and 30. This will be due to more efficient and faster CPUs, the ability to run separate jobs in parallel on multi-core CPUs, and the development of a multi-thread-capable variant of the Geant4 code. Our performance benchmarking will be used to identify and address any new bottlenecks arising from the changed environment of multi-core machines and from the continuing increase in the importance of memory accesses for performance. The development of a multi-core-capable variant of Geant4 is an essential part of these plans.
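The two Recommendation 17 forecasts can be related to yearly rates with simple arithmetic. Assuming, for illustration only, that yearly gains compound multiplicatively and uniformly over five years (the slides do not say how they combine):

```latex
\begin{aligned}
&\text{CPU time, 4\% reduction per year: } 1-(1-0.04)^{5} = 1-0.815 \approx 18.5\% \\
&\text{CPU time, 6\% reduction per year: } 1-(1-0.06)^{5} = 1-0.734 \approx 26.6\% \\
&\text{Throughput, factor 10 in 5 years: } 10^{1/5} \approx 1.58 \text{ per year} \\
&\text{Throughput, factor 30 in 5 years: } 30^{1/5} \approx 1.97 \text{ per year}
\end{aligned}
```

Under this assumption a 4-6% yearly reduction compounds to roughly 18-27% over five years, inside the quoted 5-30% range, while a factor of 10 to 30 in throughput corresponds to roughly 58% to 97% more simulated events per minute each year.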