A Thread-Parallel Geant4 with Shared Geometry
Gene Cooperman and Xin Dong
College of Computer and Information Science, Northeastern University
360 Huntington Avenue, Boston, MA, USA
{gene,
Joint work with the Geant4 team: John Apostolakis …
Supported by the Openlab program: Sverre Jarp
Outline Concept Methodology Implementation
Memory layout for multiple threads
TLS: thread-local storage
At compile time, any variable with static storage duration that is declared with __thread is placed in a TLS section, and each new thread that is created receives its own copy of that space.
TLS syntax and effect
static type variable -> static __thread type variable
(global) type variable -> (global) __thread type variable
extern type variable -> extern __thread type variable
Each thread initializes and holds its own copy of the data.
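A minimal sketch of the effect described above, assuming GCC or Clang (both accept `__thread`) and C++11 `std::thread` for brevity. The names `tlsCounter`, `tlsCalls`, `Seen`, `bump` and `runTwoThreads` are hypothetical. Each worker increments only its own copies, and the main thread's copies keep their initial values.

```cpp
#include <cassert>
#include <thread>

// Global __thread variable: one copy per thread, each starting at 0.
// Another translation unit would declare it as:
//   extern __thread int tlsCounter;
__thread int tlsCounter = 0;

// File-scope static __thread: internal linkage, still one copy per thread.
static __thread int tlsCalls = 0;

// What one worker observed in its own copies.
struct Seen { int counter; int calls; };

// Increments this thread's copies only, then records them.
static void bump(int n, Seen *out) {
  for (int i = 0; i < n; ++i) ++tlsCounter;
  ++tlsCalls;
  out->counter = tlsCounter;
  out->calls = tlsCalls;
}

// Runs two workers with different loop counts.
void runTwoThreads(Seen *a, Seen *b) {
  std::thread t1(bump, 3, a);
  std::thread t2(bump, 5, b);
  t1.join();
  t2.join();
}
```

With a plain global instead of `__thread`, the two workers would race on a single counter; here each sees exactly its own increments.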
First implementation: all data replicated for each thread
The memory image is huge because every thread holds its own copy of the data.
Outline Concept Methodology Implementation
Multi-threaded Geant4: current implementation
Data that is not changed by ProcessOneEvent should be shared.
Three questions for the shared data model
1. Which data can be safely shared?
– Data is initialized dynamically.
– The Geant4 source code does not explicitly mark which data is shared.
2. How do we share the data?
– Each instance may contain both read-only data members (sharable) and read-write data members.
– Read-write data members must stay unshared, but C++ does not allow __thread on a non-static data member.
3. What is the correct way to initialize the worker thread?
– Shared data is allocated and initialized by the main (master) thread.
– Workers make thread-private copies of the read-write data members.
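The restriction in question 2 can be seen in a small hypothetical class (`Volume` is illustrative, not a Geant4 class). GCC and Clang accept `__thread` on a static data member, but such a member is one slot per class per thread: every instance in the same thread sees the same value, so a per-instance field such as a copy number cannot be made thread-private this way.

```cpp
#include <cassert>

// Hypothetical class with one sharable member and one member
// that each thread would need to modify privately.
class Volume {
public:
  explicit Volume(double size) : fSize(size) {}

  // Read-only after construction: can be shared by all threads.
  double fSize;

  // __thread int fCopyNo;  // rejected: a non-static member lives inside
  //                        // each object, not in static storage.

  // Accepted, but only one slot per class per thread,
  // shared by every Volume instance in that thread.
  static __thread int sCopyNo;
};

__thread int Volume::sCopyNo = 0;
```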
1. Which data can be safely shared?
Statically expanding ProcessOneEvent down to every variable access is not feasible because of:
– complicated inheritance relationships
– virtual methods.
Instead, use valgrind to check memory accesses dynamically at run time.
– valgrind --tool=helgrind a.out checks for data races.
– If two threads access and change the same variable without adequate locking, this tool issues an error message.
– For fullCMS, it is not practical to check how much data is changed by ProcessOneEvent.
– Instead, use unit tests for each module of Geant4, especially geometry and navigation.
2. How do we share the data?
An example: class G4PVReplica
[Figure: physical volumes of one G4PVReplica instance with copyno 0, 1, 2, 3, 4, 5, …, visited by two worker threads]
Multi-threaded Geant4: first implementation
No shared instances; each thread's instance holds its own copyno.
Multi-threaded Geant4: current implementation
G4PVReplica instances are shared; each thread sees its own private copyno.
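One way to obtain "shared instance, private copyno" is sketched below under stated assumptions: each shared object receives an instance ID when the master constructs it, and each thread owns a `__thread` array indexed by that ID. `SharedReplica`, `tCopyNo`, `gNextReplicaId` and the fixed capacity `kMaxReplicas` are hypothetical simplifications, not Geant4 code.

```cpp
#include <cassert>
#include <thread>

const int kMaxReplicas = 16;  // assumption: fixed upper bound on instances

// One slot per replica instance, in a separate array for each thread.
__thread int tCopyNo[kMaxReplicas];

int gNextReplicaId = 0;  // assumption: only the master constructs replicas

// Hypothetical stand-in for a shared G4PVReplica.
class SharedReplica {
public:
  explicit SharedReplica(int nReplicas)
      : fNReplicas(nReplicas), fId(gNextReplicaId++) {}
  int  GetMultiplicity() const { return fNReplicas; }  // shared, read-only
  void SetCopyNo(int n) { tCopyNo[fId] = n; }          // this thread only
  int  GetCopyNo() const { return tCopyNo[fId]; }      // this thread only
private:
  const int fNReplicas;
  const int fId;
};

// A worker visits every other copy number of the same shared replica
// and checks that it always reads back the value it just wrote.
void Walk(SharedReplica *r, int start, bool *ok) {
  *ok = true;
  for (int i = start; i < r->GetMultiplicity(); i += 2) {
    r->SetCopyNo(i);
    std::this_thread::yield();  // invite interleaving with the other worker
    if (r->GetCopyNo() != i) *ok = false;
  }
}
```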
3. What is the correct way to initialize the worker thread?
For the main (master) thread, initialize data in the standard way.
A worker thread begins initialization only after the main thread has finished its own initialization.
For a worker thread:
– The run manager skips some initialization routines; for example, it skips the Construct() method of the detector construction class.
– The worker thread initializes thread-private data only (for example, copyno in the case of G4PVReplica).
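The ordering rule above can be sketched as follows, assuming a hypothetical `SharedGeometry` built by a hypothetical `MasterInitialise()`. Workers are created only after it returns, and each worker sets up only its `__thread` data. The completion of a `std::thread` constructor synchronizes with the start of the new thread, so the master's writes are visible to the workers without locks.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared geometry: built once by the master, then read-only.
struct SharedGeometry {
  std::vector<double> halfLengths;
};

// Thread-private state that each worker sets up for itself.
__thread int tCurrentCopyNo = -1;                // -1: not initialised
__thread const SharedGeometry *tGeom = nullptr;  // this thread's view

// Master: full initialisation, including the detector construction step.
SharedGeometry *MasterInitialise() {
  SharedGeometry *g = new SharedGeometry;
  g->halfLengths = {1.0, 2.0, 3.0};
  return g;
}

// Worker: skip construction; initialise thread-private data only.
void WorkerRun(const SharedGeometry *g, double *sum) {
  tGeom = g;           // refer to the shared data, no copy
  tCurrentCopyNo = 0;  // private data gets its own initial value
  double s = 0.0;
  for (double h : tGeom->halfLengths) s += h;
  *sum = s;
}

// Workers are created only after MasterInitialise() has returned.
void RunAll(double *sums, int nWorkers) {
  SharedGeometry *g = MasterInitialise();
  std::vector<std::thread> workers;
  for (int i = 0; i < nWorkers; ++i)
    workers.emplace_back(WorkerRun, g, &sums[i]);
  for (std::thread &w : workers) w.join();
  delete g;
}
```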
Outline Concept Methodology Implementation
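testG4Navigator1.cc with multiple threads
#include <pthread.h>
#include <unistd.h>
// BuildGeometry(), testG4Navigator1() and testG4Navigator2() are defined elsewhere in this test.
G4VPhysicalVolume *myTopNode;
int sleepTime = 10;
void *my_worker_thread1(void *waitTime_ptr)
{
  // Stagger the threads: a worker can wait here until the first thread has finished
  sleep(*(int *)waitTime_ptr);
  testG4Navigator1(myTopNode);
  testG4Navigator2(myTopNode);
  // Keep the thread alive for sleepTime seconds so valgrind can analyze it
  sleep(sleepTime);
  return NULL;
}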
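testG4Navigator1.cc (continued)
int main()
{
  pthread_t tid1, tid2;
  int waitTime1 = 0, waitTime2 = 1;  // illustrative start delays in seconds (not given on the slide)
  myTopNode = BuildGeometry();  // build the geometry in the main thread
  G4GeometryManager::GetInstance()->CloseGeometry(false);  // close the geometry before the workers start
  pthread_create(&tid1, NULL, my_worker_thread1, &waitTime1);
  pthread_create(&tid2, NULL, my_worker_thread1, &waitTime2);
  pthread_join(tid1, NULL);
  pthread_join(tid2, NULL);
  return 0;
}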
Start and analyze output
Start:
– valgrind --tool=helgrind --log-file=testG4Navigator1output testG4Navigator1
Analyze output, example 1:
==538== Possible data race during write of size 4 at 0x56360A0
==538== at 0x42B944: G4PVReplica::SetCopyNo(int) (G4PVReplica.cc:180)
==538== by 0x4191E7: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:636)
==538== Old state: owned exclusively by thread #2
==538== New state: shared-modified by threads #2, #3
==538== Reason: this thread, #3, holds no locks at all
Navigation writes the copy number of a shared G4PVReplica (G4PVReplica::SetCopyNo) from both threads.
Start and analyze output (continued)
Analyze output, example 2:
==538== Possible data race during write of size 8 at 0x5635F68
==538== at 0x415218: G4LogicalVolume::SetSolid(G4VSolid*) (G4LogicalVolume.icc:217)
==538== by 0x419201: G4ParameterisedNavigation::LevelLocate(G4NavigationHistory&, G4VPhysicalVolume const*, int, CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const*, bool, CLHEP::Hep3Vector&) (G4ParameterisedNavigation.cc:641)
==538== Old state: shared-readonly by threads #2, #3
==538== New state: shared-modified by threads #2, #3
==538== Reason: this thread, #3, holds no consistent locks
==538== Location 0x5635F68 has never been protected by any lock
Navigation writes the solid pointer of a shared logical volume (G4LogicalVolume::SetSolid) from both threads.
Start and analyze output (continued)
Analyze output, example 3:
==538== Possible data race during write of size 8 at 0x5634E18
==538== at 0x40B1FD: G4Box::SetXHalfLength(double) (G4Box.cc:118)
==538== by 0x407E6D: G4LinScale::ComputeDimensions(G4Box&, int, G4VPhysicalVolume const*) const (testG4Navigator1.cc:67)
==538== Old state: owned exclusively by thread #2
==538== New state: shared-modified by threads #2, #3
==538== Reason: this thread, #3, holds no locks at all
The test's parameterisation (G4LinScale::ComputeDimensions) rewrites the half-length of a shared G4Box from both threads.
Shared instances in geometry
Only these three kinds of geometry objects are currently shared:
Physical volumes
– G4VPhysicalVolume, thread-private data members: G4RotationMatrix *frot; G4ThreeVector ftrans;
– G4PVReplica, thread-private data members: G4int fcopyNo;
Logical volumes
– Thread-private data members: G4Material* fMaterial; G4VSolid* fSolid; G4MaterialCutsCouple* fCutsCouple; G4VSensitiveDetector* fSensitiveDetector; G4Region* fRegion;
Solids
– We may need more copies of each solid used by G4Parameterised.
Share logical volumes: step 1
Add a new class:
class G4LogicalVolumePrivateData
{
public:
  G4Material* fMaterial;
  G4VSolid* fSolid;
  G4MaterialCutsCouple* fCutsCouple;
  G4VSensitiveDetector* fSensitiveDetector;
  G4Region* fRegion;
};
class G4LogicalVolume {…}
In class G4LogicalVolume, delete all thread-private data members; they now live in G4LogicalVolumePrivateData.
Share logical volumes: step 2
Create a new class:
class G4LogicalVolumeObjectCounter
{
public:
  PrivateObjectManager* shadowOffset;  // shadow pointer for offset
  static __thread PrivateObjectManager* offset;
  int AddNew() {...}
  void WorkerCopy() {...}
  void FreeWorker() {...}
};
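The slide leaves the method bodies elided, so the following is only a guess at one plausible shape, with hypothetical names (`ObjectCounter`, `PrivateData`). `AddNew()` (master only) appends a pointer to zero-initialised private data to the master's table and publishes that table through `shadowOffset`; `WorkerCopy()` gives a worker its own copy of the pointer table; `FreeWorker()` releases the worker's table. Right after `WorkerCopy()`, a worker's entries still point at the master's data.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <thread>

// Hypothetical stand-in for G4LogicalVolumePrivateData.
struct PrivateData { int material; int solid; };

// Hypothetical counter in the spirit of G4LogicalVolumeObjectCounter.
// Assumption: offset is a per-thread array of pointers, one entry per
// registered instance; shadowOffset is the master's table.
class ObjectCounter {
public:
  PrivateData **shadowOffset = nullptr;  // master's table, readable by workers
  static __thread PrivateData **offset;  // this thread's table
  int count = 0;                         // instances registered so far

  // Master only: register one instance and return its index (objectOrder).
  int AddNew() {
    offset = static_cast<PrivateData **>(
        std::realloc(offset, (count + 1) * sizeof(PrivateData *)));
    offset[count] = new PrivateData();   // zero-initialised private data
    shadowOffset = offset;               // publish the (possibly moved) table
    return count++;
  }

  // Worker: start from a copy of the master's pointer table.
  void WorkerCopy() {
    offset = static_cast<PrivateData **>(
        std::malloc(count * sizeof(PrivateData *)));
    std::memcpy(offset, shadowOffset, count * sizeof(PrivateData *));
  }

  // Worker: release this thread's table (the entries are not freed here).
  void FreeWorker() {
    std::free(offset);
    offset = nullptr;
  }
};

__thread PrivateData **ObjectCounter::offset = nullptr;
```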
Share logical volumes: step 3
Add two data members to G4LogicalVolume:
class G4LogicalVolume { … static G4LogicalVolumeObjectCounter objectCounter; int objectOrder; … };
G4LogicalVolumeObjectCounter G4LogicalVolume::objectCounter;  // definition of the static member
Modify all constructors of G4LogicalVolume:
G4LogicalVolume::G4LogicalVolume(…)
{
  objectOrder = objectCounter.AddNew();  // allocate private data
  …
  // initialize in a similar way to the original constructor
  …
}
Share logical volumes: step 4
Redefine the read-write data members to make them thread-private, so that existing code that uses them compiles unchanged:
#define fMaterial (objectCounter.offset[objectOrder]->fMaterial)
objectCounter.offset is a new static, thread-local array; objectOrder is the unique instance ID described in the concept slides.
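A self-contained sketch of the step-4 redirection, with hypothetical stand-in types (`Solid`, `PrivateData`, `ObjectCounter`, `LogicalVolume`) and a fixed-capacity per-thread table instead of a dynamically sized one. The macro is defined after the private-data struct (otherwise it would rewrite that struct's own member declaration), and it relies on the preprocessor rule that a macro name is not re-expanded inside its own expansion, so `->fSolid` reaches the real member.

```cpp
#include <cassert>

// Hypothetical stand-ins for Geant4 types.
struct Solid { double halfLength; };
struct PrivateData { Solid *fSolid; };  // thread-private members (step 1)

const int kMaxVolumes = 8;  // assumption: fixed capacity keeps the sketch short

// Per-thread table of private data, indexed by objectOrder.
struct ObjectCounter {
  static __thread PrivateData *offset[kMaxVolumes];
  int count = 0;
  int AddNew() {  // master only
    offset[count] = new PrivateData();
    return count++;
  }
};
__thread PrivateData *ObjectCounter::offset[kMaxVolumes];

// Step 4: every use of fSolid below now reaches this thread's slot.
#define fSolid (objectCounter.offset[objectOrder]->fSolid)

class LogicalVolume {
public:
  explicit LogicalVolume(Solid *s) {
    objectOrder = objectCounter.AddNew();  // step 3: allocate private data
    fSolid = s;                            // source line left unchanged
  }
  Solid *GetSolid() const { return fSolid; }
  void SetSolid(Solid *s) { fSolid = s; }

  static ObjectCounter objectCounter;
  int objectOrder;
};

#undef fSolid  // keep the macro from leaking into unrelated code

ObjectCounter LogicalVolume::objectCounter;
```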
Worker logical volumes: step 5
A worker starts only after the master has initialized all data.
1. When a worker starts, it copies the offset contents from the main thread using the method WorkerCopy() of G4LogicalVolumeObjectCounter.
2. For each logical volume, the worker constructor is called to allocate memory for the thread-private data and initialize it.
3. In some cases the thread-private data is constant and can be shared by all threads; then step 2 is skipped.
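Step 5 can be sketched under similar assumptions: hypothetical names, a fixed capacity, and `int` fields standing in for Geant4 pointers. The worker copies the master's pointer table, then replaces each entry with a private copy initialised from the master's values, except for volumes flagged as constant, which keep pointing at the master's data. For brevity the worker's copies are never freed; a `FreeWorker()`-style routine would release them.

```cpp
#include <cassert>
#include <cstring>
#include <thread>

const int kMaxVolumes = 8;  // assumption: fixed capacity for the sketch

// Stand-in for G4LogicalVolumePrivateData (ints instead of Geant4 pointers).
struct PrivateData { int material; int solid; };

PrivateData *masterTable[kMaxVolumes];  // master's table, readable by workers
int volumeCount = 0;                    // written by the master only

__thread PrivateData *tTable[kMaxVolumes];  // a worker's own table
__thread PrivateData **offset = nullptr;    // table this thread reads through

// Master: the normal constructor path registers each volume.
int AddVolume(int material, int solid) {
  masterTable[volumeCount] = new PrivateData{material, solid};
  return volumeCount++;
}

// Worker, called once when the thread starts.
void WorkerInit(const bool *isConstant) {
  // 1. Copy the master's pointers (the role of WorkerCopy()).
  std::memcpy(tTable, masterTable, volumeCount * sizeof(PrivateData *));
  // 2. "Worker constructor": a private copy initialised from the master.
  // 3. Constant data keeps the master's pointer and stays shared.
  for (int i = 0; i < volumeCount; ++i)
    if (!isConstant[i]) tTable[i] = new PrivateData(*masterTable[i]);
  offset = tTable;
}
```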
Share logical volumes final results
Share physical volumes
The solid for a G4Parameterised instance
Physics tables: the other large consumer of memory in Geant4
Questions?
Thank you!