Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course.

Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course

Contents Parallel geometry Moving objects Fast simulation (Shower parameterization) Computing performance Kernel III - M.Asai (SLAC)2

Geant4 v9.4 Parallel geometry

Parallel navigation In the previous versions, we have already had several ways of utilizing a concept of parallel world. But the usages are quite different to each other. –Ghost volume for shower parameterization assigned to G4GlobalFastSimulationManager –Readout geometry assigned to G4VSensitiveDetector –Importance field geometry for geometry importance biasing assigned to importance biasing process –Scoring geometry assigned to scoring process We merge all of them into common parallel world scheme. –Readout geometry for sensitive detector will be kept for backward compatibility. –Other current “parallel world schemes” became obsolete. Kernel III - M.Asai (SLAC)4

Parallel navigation Occasionally, it is not straightforward to define sensitivity, importance or envelope to be assigned to volumes in the mass geometry. –Typically a geometry built machinery by CAD, GDML, DICOM, etc. has this difficulty. New parallel navigation functionality allows the user to define more than one worlds simultaneously. –New G4Transportation process sees all worlds simultaneously. –A step is limited not only by the boundary of the mass geometry but also by the boundaries of parallel geometries. –Materials, production thresholds and EM field are used only from the mass geometry. –In a parallel world, the user can define volumes in arbitrary manner with sensitivity, regions with shower parameterization, and/or importance field for biasing. Volumes in different worlds may overlap. Kernel III - M.Asai (SLAC)5

Parallel navigation G4VUserParrallelWorld is the new base class where the user implements a parallel world. –The world physical volume of the parallel world is provided by G4RunManager as a clone of the mass geometry. –All UserParallelWorlds must be registered to UserDetectorConstruction. –Each parallel world has its dedicated G4Navigator object, that is automatically assigned when it is constructed. Though all worlds will be comprehensively taken care by G4Transportation process for their navigations, each parallel world must have its own process to achieve its purpose. –For example, in case the user defines a sensitive detector to a parallel world, a process dedicated to this world is responsible to invoke this detector. G4SteppingManager sees only the detectors in the mass geometry. The user has to have G4ParallelWorldScoringProcess in his physics list. Kernel III - M.Asai (SLAC)6

exampleN07 Mass geometry –sandwich of rectangular absorbers and scintilators Parallel scoring geometry –Cylindrical layers Kernel III - M.Asai (SLAC)7

Defining a parallel world main() (exampleN07.cc) G4VUserDetectorConstruction* geom = new ExN07DetectorConstruction; G4VUserParallelWorld* parallelGeom = new ExN07ParallelWorld("ParallelScoringWorld"); geom->RegisterParallelWorld(parallelGeom); runManager->SetUserInitialization(geom); –The name defined in the G4VUserParallelWorld constructor is used as the physical volume name of the parallel world, and must be used for G4ParallelWorldScoringProcess (next slide). void ExN07ParallelWorld::Construct() G4VPhysicalVolume* ghostWorld = GetWorld(); G4LogicalVolume* worldLogical = ghostWorld->GetLogicalVolume(); –The world physical volume (“ghostWorld”) is provided as a clone of the world volume of the mass geometry. The user cannot create it. –You can fill contents regardless of the volumes in the mass geometry. –Logical volumes in a parallel world needs not to have a material. Kernel III - M.Asai (SLAC)8

G4ParallelWorldScoringProcess void ExN07PhysicsList::ConstructProcess() { AddTransportation(); ConstructParallelScoring(); ConstructEM(); } void ExN07PhysicsList::ConstructParallelScoring() { G4ParallelWorldScoringProcess* theParallelWorldScoringProcess = new G4ParallelWorldScoringProcess("ParaWorldScoringProc"); theParallelWorldScoringProcess->SetParallelWorld("ParallelScoringWorld"); theParticleIterator->reset(); while( (*theParticleIterator)() ){ G4ProcessManager* pmanager = theParticleIterator->value()->GetProcessManager(); pmanager->AddProcess(theParallelWorldScoringProcess); pmanager->SetProcessOrderingToLast(theParallelWorldScoringProcess, idxAtRest); pmanager->SetProcessOrdering(theParallelWorldScoringProcess, idxAlongStep, 1); pmanager->SetProcessOrderingToLast(theParallelWorldScoringProcess, idxPostStep); } Kernel III - M.Asai (SLAC)9 G4ParallelWorldScoringProcess must be defined after G4Transportation but prior to any EM processes. Name of the parallel world defined by G4VUserParallelWorld constructor AlongStep must be 1, while AtRest and PostStep must be last

Geant4 v9.4 Moving objects

11 + + + …… 4D RT Treatment Plan Source: Lei Xing, Stanford University Upper Jaws Lower Jaws MLC Ion chamber Kernel III - M.Asai (SLAC)

Moving objects In some applications, it is essential to simulate the movement of some volumes. –E.g. particle therapy simulation Geant4 can deal with moving volume –In case speed of the moving volume is slow enough compared to speed of elementary particles, so that you can assume the position of moving volume is still within one event. Two tips to simulate moving objects : 1.Use parameterized volume to represent the moving volume. 2.Do not optimize (voxelize) the mother volume of the moving volume(s). Kernel III - M.Asai (SLAC)12

Moving objects - tip 1 Use parameterized volume to represent the moving volume. –Use event number as a time stamp and calculate position/rotation of the volume as a function of event number. void MyMovingVolumeParameterisation::ComputeTransformation (const G4int copyNo, G4VPhysicalVolume *physVol) const { static G4RotationMatrix rMat; G4int eID = 0; const G4Event* evt = G4RunManager::GetRunManager()->GetCurrentEvent(); if(evt) eID = evt->GetEventID(); G4double t = 0.1*s*eID; G4double r = rotSpeed*t; G4double z = velocity*t+orig; while(z>0.*m) {z-=8.*m;} rMat.set(CLHEP::HepRotationX(-r)); physVol->SetTranslation(G4ThreeVector(0.,0.,z)); physVol->SetRotation(&rMat0); } Kernel III - M.Asai (SLAC)13 Null pointer must be protected. This method is also invoked while geometry is being closed at the beginning of run, i.e. event loop has not yet began. You are responsible not to make the moving volume get out of (protrude from) the mother volume. Here, event number is converted to time. (0.1 sec/event) Position and rotation are set as the function of event number.

Moving objects - tip 2 Do not optimize (voxelize) the mother volume of the moving volume(s). –If moving volume gets out of the original optimized voxel, the navigator gets lost. motherLogical -> SetSmartless( number_of_daughters ); –With this method invocation, the one-and-only optimized voxel has all daughter volumes. –For the best performance, use hierarchal geometry so that each mother volume has least number of daughters. Kernel III - M.Asai (SLAC)14

Geant4 v9.4 Fast simulation (shower parameterization)

Fast simulation - Generalities Fast Simulation, also called as shower parameterization, is a shortcut to the "ordinary" tracking. Fast Simulation allows you to take over the tracking and implement your own "fast" physics and detector response. The classical use case of fast simulation is the shower parameterization where the typical several thousand steps per GeV computed by the tracking are replaced by a few ten of energy deposits per GeV. Parameterizations are generally experiment dependent. Geant4 provides a convenient framework. Kernel III - M.Asai (SLAC)16

Parameterization features Parameterizations take place in an envelope. An envelope is a region, that is typically a mother volume of a sub-system or of a major module of such a sub-system. Parameterizations are often dependent to particle types and/or may be applied only to some kinds of particles. They are often not applied in complicated regions. Kernel III - M.Asai (SLAC)17  e

Models and envelope Concrete models are bound to the envelope through a G4FastSimulationManager object. This allows several models to be bound to one envelope. The envelope is simply a G4Region which has G4FastSimulationManager. All [grand[…]]daughter volumes will be sensitive to the parameterizations. A model may returns back to the "ordinary" tracking the new state of G4Track after parameterization (alive/killed, new position, new momentum, etc.) and eventually adds secondaries (e.g. punch-through) created by the parameterization. Kernel III - M.Asai (SLAC)18 G4FastSimulationManager ModelForElectrons ModelForPions « envelope » (G4Region) G4LogicalVolume

Fast Simulation The Fast Simulation components are indicated in white. When the G4Track comes in an envelope, the G4FastSimulationManagerProcess looks for a G4FastSimulationManager. If one exists, at the beginning of each step in the envelope, each model is asked for a trigger. In case a trigger is issued, the model is applied at the point the G4track is. Otherwise, the tracking proceeds with a normal tracking. Kernel III - M.Asai (SLAC)19 G4FastSimulationManager ModelForElectrons ModelForPions G4LogicalVolume Multiple Scattering G4Transportation G4FastSimulationManagerProcess Process xxx G4Track G4ProcessManager Placements Envelope (G4LogicalVolume)

G4FastSimulationManagerProcess The G4FastSimulationManagerProcess is a process providing the interface between the tracking and the fast simulation. It has to be set to the particles to be parameterized: – The process ordering must be the following: [n-3] … [n-2] Multiple Scattering [n-1] G4FastSimulationManagerProcess [ n ] G4Transportation –It can be set as a discrete process or it must be set as a continuous & discrete process if using ghost volumes. Kernel III - M.Asai (SLAC)20

Geant4 v9.4 Computing performance

Different levels of parallelism Job execution = O(1~10) runs –X-section data files, external geometry description Run = O(10^2~10^9) events –X-sections in memory, optimized geometry in memory, histograms –Event loop Event = O(10~10^9) tracks –Primary tracks, secondary tracks –Hits, score Track = O(1~10^3) steps –Travelling in geometry –Generating secondary tracks Step –Geometrical navigation –Physics processes –Hits, score Kernel III - M.Asai (SLAC)22  Run parallelism  Event parallelism  Track parallelism  Multi-job

DIANE (DIstributed ANalysis Environment) DIANE is a tool which helps application communities and smaller Virtual Organizations using the distributed computing infrastructures more efficiently. The automatic control and scheduling of computations on a set of distributed worker nodes leads to an improvement of the quality of service of the EGEE/LCG Grid. –http://it-proj-diane.web.cern.ch/it-proj-diane/http://it-proj-diane.web.cern.ch/it-proj-diane/ This is a “multi-job” approach based on GRID environment, Geant4 offers one example to illustrate the use of DIANE. Similar approaches with commercial Cloud computing facilities are seen in company users. Kernel III - M.Asai (SLAC)23

MPI (Message Passing Interface) MPI is a language-independent communications protocol used to program parallel computers. MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today. MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Geant4 offers a built-in MPI layer. Currently, LAM, MPICH2 and Open MPI are supported. Geant4 also offers a couple of examples which illustrate the use of MPI. “Run parallelism” approach. For example of exMPI01 in geant4.9.3/examples/extended/parallel/MPI, core i7 took 122 seconds (single thread), 62 seconds (2 threads) and 34 seconds (4 threads) wall clock time. Kernel III - M.Asai (SLAC)24

Kernel III - M.Asai (SLAC)25

TOP-C (Task Oriented Parallel C/C++) TOP-C is a tool of “task-oriented” master-slave architecture to make an application parallelized with a distributed memory model based on MPI. –Shared memory model (thread-based for a multi-processor node) is under development. See later slides. TOP-C is developed and maintained by G. Cooperman and his team at Northeastern University. –http://www.ccs.neu.edu/home/gene/topc.htmlhttp://www.ccs.neu.edu/home/gene/topc.html Geant4 offers a couple of examples to illustrate the use of TOP-C. “Event parallelism” approach. Kernel III - M.Asai (SLAC)26

Parallelism GRID (and Cloud) is surely a valid option. But it is outside of parallelization of Geant4 itself. –Use it if you have such an environment. For parallelism inside Geant4, we can qualitatively say : Kernel III - M.Asai (SLAC)27 Run parallelismEvent parallelismTrack parallelism Network traffic+ (less traffic)- (more traffic)--- (much more traffic) Load distribution - +??? Advantage forSimpler geometry Lower energy Complicated geometry Higher energy Ultra-high energy ???

An issue for multi-threading One of advantages of multi-thread / multi-core is efficient memory consumption. –MPI approach basically requires full copy of memory space for each slave. –For example, large x-section tables and complicated geometry could be shared by threads. X-section table in Geant4 has a caching mechanism. –Once a track accesses to the x-section for a certain particle in a certain material at a certain energy, the next access is likely for the same particle in the same material and at the nearby energy. –This caching mechanism never works if a table is shared by threads. Geant4 geometry is “dynamic”. –To reduce the memory size required for complicated geometry, Geant4 has a concept of “parameterized volume”. A volume returns its position, rotation, material, shape, size, etc. as a function of the “copy number”. And the copy number and some of these attributes are cashed. –This caching mechanism never works if geometry is shared by threads. Kernel III - M.Asai (SLAC)28

Geant4 approach Making cashes thread-local using TOP-C shared memory model. TLS (Thread-local-storage) –static/global variables to thread-local with “__thread” (gcc) –Automatic TLS conversions with patched C++ parser. For non-thread-safe variables –Lock with mutex (mutual exclusion) : potential performance bottle neck Development of semi-automatic conversion tool. The first Geant4MT beta-release (based on Geant4 version 9.4) is foreseen within a couple of months. Kernel III - M.Asai (SLAC)29

Kernel III - M.Asai (SLAC)30

Medium/longer term developments Year 2011 –More automatised conversion of Geant4  Geant4MT –More thread-safety check of STL and CLHEP –Expecting (caching mechanism of) x-section tables and geometry to be fully multi-threaded By splitting class data members so that R/W data members to be thread local Year 2012-2013 (-2014?) –Major architectural revision Moving “dynamic” components of x-section tables and geometry to track object –Planning all the x-section tables and geometry to be fully shared by threads Kernel III - M.Asai (SLAC)31

General purpose GPU? Though the new Fermi Architecture supports C++, it supports only for the data processing. It does not yet support object instantiation/deletion in GPU. –It may/will support at PTX 2.0. –But what do we do for secondary tracks? Size of L1 cach (16/48 KB) and L2 cache (768 KB) are too small. Accessing to the main memory is too costly (>>100 Cycles). –Calculating x-section for every step is faster than accessing to x-section tables. How do we do for date-driven tables? –Sharing complicated geometry in main memory does not offer any benefit. Could our user live with just replicated boxes? GPU seems not to be feasible, at least for the near future, to a particle-transport type Monte Carlo simulation like Geant4. –“Density- or probability-transport” calculation with simplest geometry may fit to GPGPU. Kernel III - M.Asai (SLAC)32

Geant4 v9.4 Tips for computing performance

Some tips to consider - 1 We are making our best effort to improve the speed of Geant4 toolkit. But, since it is a toolkit, a user may also make the simulation unnecessarily slow. For general applications –Check methods which are invoked frequently, e.g. UserSteppingAction(), ProcessHits(), ComputeTransformation(), GetField() etc. –In such methods, avoid string manipulation, file access or cout, unnecessary object instantiation or deletion, or unnecessary massive polynomial calculation such as sin(), cos(), log(), exp(). For relatively complex geometry or high energy applications –Kill unnecessary secondary particles as soon as possible. –Use stacking action wisely. Abort unnecessary events at the earliest stage. –Utilize G4Region for regional cut-offs, user limits. –For geometry, consider replica rather than parameterized volume as much as possible. Also consider nested parameterization. –Do not keep too many trajectories. For relatively simple geometry or low energy applications –Do not store the random number engine status for each event. Kernel III - M.Asai (SLAC)34

Some tips to consider - 2 Chop out unnecessary objects in memory. This is not only the issue of memory size of your CPU, but also the matter of cache-hit rate. –By default cross-section tables of EM processes are built for the energy range of 0.1 keV to 10 TeV. If your simulation does not require higher energies, cut higher part out. Do not change the granularity of sampling bins (7 bins per decade). –Delete unnecessary materials. –Limit size (number of bins) of scoring meshes. If you believe your simulation is unnecessarily slow, your application may have: –Memory leak –Geometry overlap Kernel III - M.Asai (SLAC)35

Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course.

Similar presentations

Presentation on theme: "Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course.

Similar presentations

Presentation on theme: "Geant4 v9.4 Kernel III Makoto Asai (SLAC) Geant4 Tutorial Course."— Presentation transcript:

Similar presentations

About project

Feedback