Geant4 medical simulations in a distributed computing environment. S. Guatelli, A. Mantero, J. Moscicki, M. G. Pia. 4th Workshop on Geant4 Bio-medical Developments / Geant4 Physics Validation, INFN Genova, July 2005
Outline: Problem: how to obtain a quick response – Brief introduction to DIANE – How to parallelize a Geant4 application – Project: parallelization of two Geant4 medical physics applications (Brachytherapy, IMRT); study of the performance using a dedicated cluster and using the GRID – First results – Work in progress
Execution time: Brachy. Number of events needed for sufficient statistics in a dosimetric study of a single brachytherapy source: 20 M events. Execution time of 20 M events on a Pentium IV, 3 GHz: 16650 s (about 4.6 h, roughly 5 h). Clinical use: a quick response means a time of the order of minutes.
Execution time: IMRT. Number of events needed for sufficient statistics: 10^9 events. Execution time of 10^9 events: ~228 h, i.e. about 9.5 days. A quick response is required for clinical use.
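As a back-of-the-envelope check on these numbers, the sketch below computes how many CPUs would be needed to bring each run down to a target wall-clock time. It assumes ideal linear speedup (no dispatch, I/O or merging overhead), converts the ~228 h IMRT figure to seconds, and uses a hypothetical 10-minute target, since the slides only say "order of minutes".

```python
import math

def cpus_needed(sequential_s: float, target_s: float) -> int:
    """Smallest CPU count N such that sequential_s / N <= target_s.

    Assumes ideal linear speedup: no task dispatch, I/O or merging overhead.
    """
    return math.ceil(sequential_s / target_s)

brachy_s = 16650          # Brachy, 20 M events, Pentium IV 3 GHz (measured above)
imrt_s = 228 * 3600       # IMRT, 10^9 events, ~228 h converted to seconds
target_s = 10 * 60        # hypothetical clinical target: 10 minutes

print(cpus_needed(brachy_s, target_s))  # 28
print(cpus_needed(imrt_s, target_s))    # 1368
```

These are lower bounds: any per-task overhead or load imbalance raises the number of CPUs actually required.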
Goals: speed adequate for clinical use; transparent configuration in sequential or parallel mode; transparent access to the GRID through an intermediate software layer. Means: parallelisation and access to distributed computing resources.
Project: parallelization of the Geant4 IMRT and Brachytherapy applications through DIANE. Performance tests: – run on a single machine – run on a dedicated cluster – run on the GRID
Previous studies of the parallelization of a Geant4-based medical application: IMRT Geant4 simulation and Anaphe analysis on a dedicated Beowulf cluster (S. Chauvie et al., IRCC Torino, Siena 2002). (Figure: Beowulf cluster, nodes Node01 to Node04 connected through a switch to the IRCC LAN.) Speed OK, but expensive hardware investment and maintenance. Alternative strategy: DIANE (DIstributed ANalysis Environment), providing parallelisation and access to the GRID, i.e. transparent access to a distributed computing environment.
DIANE (DIstributed ANalysis Environment): a prototype for an intermediate layer between applications and the GRID. Parallel cluster processing – makes fine tuning and customisation easy – transparently uses GRID technology – application independent. Hides the complex details of the underlying technology. Developed by J. Moscicki, CERN.
Practical example: how to "dianize" a Geant4 application – look at the Geant4 extended example ExDIANE in the parallel directory – completely transparent to the user: same G4 code – documentation specific to Geant4 applications available at www.cern.ch/diane/
Run through DIANE

    #-*-python-*-
    Application = "G4Analysis"

    WorkerInitData = {
        'G4ApplicationComponentName': "G4MedLinac",
        'initMacroFile': """
    /control/verbose 1
    /run/verbose 1
    /control/saveHistory
    /run/initialize
    /tracking/storeTrajectory 1
    /Jaws/X1/DistanceFromAxis -5.0 cm
    /Jaws/X2/DistanceFromAxis 5.0 cm
    /Jaws/Y1/DistanceFromAxis -5.0 cm
    /Jaws/Y2/DistanceFromAxis 5.0 cm
    /Jaws/update/energy 6.0 MeV
    /sourceType MeV
    """
    }

    JobInitData = {
        'runParams': {
            'seed': 0,
            'eventNumber': ...,                  # total number of events in the job
            'macroFileTemplate': "/run/beamOn "  # Geant4 command that starts each run
        },
        'eventsPerWorker': 10000,
        'workerInit': WorkerInitData
    }

Example of a macro file
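The job description gives a total event count (eventNumber) and a task size (eventsPerWorker). The helper below is a hypothetical sketch, not DIANE code, of how those two numbers could be turned into a list of tasks, each with its own random seed and a /run/beamOn line. The seed-per-task scheme (base seed plus task index) and the 25,000-event total are assumptions chosen for illustration.

```python
def make_tasks(event_number: int, events_per_task: int, base_seed: int = 0) -> list[dict]:
    """Split event_number events into tasks of at most events_per_task events.

    Hypothetical sketch: each task gets its own seed (base_seed + index) so that
    workers draw independent random sequences, plus a /run/beamOn macro line.
    """
    if event_number <= 0 or events_per_task <= 0:
        raise ValueError("event counts must be positive")
    tasks = []
    remaining = event_number
    index = 0
    while remaining > 0:
        n = min(events_per_task, remaining)   # last task may be smaller
        tasks.append({
            "seed": base_seed + index,
            "events": n,
            "macro": f"/run/beamOn {n}",
        })
        remaining -= n
        index += 1
    return tasks

tasks = make_tasks(event_number=25_000, events_per_task=10_000)
for t in tasks:
    print(t)
```

With these inputs the job becomes three tasks of 10,000, 10,000 and 5,000 events.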
Run in parallel mode on a dedicated cluster. Type the command: Diane.startjob -j macrofileName.mac --wms=IPLIST, where IPLIST is the name of a file containing the list of the machines the user intends to use.
Practical example. Example: Geant4 simulation with analysis. The total number of events of the simulation is divided into tasks; each task produces a file with histograms; the job result is the sum of the histograms produced by the tasks. Master-worker model: the client starts a job, the workers perform the tasks and produce histograms, and the master integrates the results.
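The master-worker scheme above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DIANE's implementation: a uniform random sampler stands in for the Geant4 simulation, a histogram is a plain list of bin counts, and threads from `concurrent.futures` stand in for the workers (real workers are separate processes on separate machines). All names (`run_task`, `merge`, `run_job`) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_BINS = 10
X_MIN, X_MAX = 0.0, 10.0

def run_task(seed: int, n_events: int) -> list[int]:
    """Worker: toy stand-in for one Geant4 task; returns a histogram (bin counts)."""
    rng = random.Random(seed)              # independent random stream per task
    hist = [0] * N_BINS
    width = (X_MAX - X_MIN) / N_BINS
    for _ in range(n_events):
        x = rng.uniform(X_MIN, X_MAX)      # placeholder for a scored quantity
        b = min(int((x - X_MIN) / width), N_BINS - 1)
        hist[b] += 1
    return hist

def merge(histograms: list[list[int]]) -> list[int]:
    """Master: job result = bin-by-bin sum of the task histograms."""
    total = [0] * N_BINS
    for h in histograms:
        total = [a + b for a, b in zip(total, h)]
    return total

def run_job(total_events: int, events_per_task: int, n_workers: int) -> list[int]:
    """Client/master: split the job into tasks, farm them out, integrate results."""
    sizes = [events_per_task] * (total_events // events_per_task)
    if total_events % events_per_task:
        sizes.append(total_events % events_per_task)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_task, range(len(sizes)), sizes))
    return merge(results)

job_hist = run_job(total_events=50_000, events_per_task=10_000, n_workers=4)
print(sum(job_hist))  # 50000: every event lands in exactly one bin
```

Because each task's seed depends only on its index, the merged histogram is the same regardless of how many workers run the tasks or in which order they finish.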
Resources of this project: – dedicated cluster of 4 PCs in Genova (Pentium IV, 3 GHz), for preliminary tests – dedicated cluster of 30 bi-processor PCs (Xeon, 2.8 GHz), thanks to H.C. Lee, Academia Sinica Computing Center, Taiwan – LSF cluster at CERN, to study a realistic case of running on a cluster shared by several users – GRID, to run in a distributed computing environment
Running on a single CPU, to study the efficiency of DIANE. Plot (preliminary): overhead of DIANE for the Brachy-Iridium source simulation, as a function of the number of events. Execution time using DIANE means running the dianized simulation sequentially, with the job divided into tasks. This study is useful to optimize the number of events per task.
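To see why the number of events per task matters, here is a toy cost model, assumed for illustration and not taken from the measurements: each task adds a fixed overhead (dispatch, I/O) on top of the per-event CPU time. The per-event time is derived from the Brachy figure quoted earlier (16650 s for 20 M events); the 2 s per-task cost is purely hypothetical.

```python
import math

T_EVENT_S = 16650 / 20_000_000   # per-event CPU time from the Brachy run quoted earlier
T_TASK_S = 2.0                   # hypothetical fixed cost per task (dispatch, I/O, ...)

def overhead_fraction(n_events: int, events_per_task: int) -> float:
    """Extra time of the task-split run relative to a plain sequential run."""
    n_tasks = math.ceil(n_events / events_per_task)
    return n_tasks * T_TASK_S / (n_events * T_EVENT_S)

for ept in (1_000, 10_000, 100_000, 1_000_000):
    print(ept, round(overhead_fraction(20_000_000, ept), 4))
```

In this model the relative overhead falls as tasks get larger, which pulls in the opposite direction from load balancing: fewer, larger tasks cost less overhead but leave workers more unevenly loaded.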
Preliminary results. Running on a dedicated cluster: – divide the total number of events into tasks – dispatch the tasks to several workers. Plot: execution time of the Brachy application as a function of the number of CPUs used, without merging the output files of the simulations.
Preliminary results, running on a dedicated cluster. Plot (preliminary): efficiency as a function of the number of CPUs, N. The efficiency is higher when the job is split into a larger number of tasks.
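The efficiency formula on this slide did not survive extraction. A common definition, assumed here, is the parallel efficiency ε(N) = T_1 / (N · T_N), where T_1 is the sequential execution time and T_N the wall-clock time on N CPUs; ε = 1 means perfect linear speedup. The timings in the example are made-up round numbers, not results from this study.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """How many times faster the parallel run is than the sequential one."""
    return t_seq / t_par

def efficiency(t_seq: float, t_par: float, n_cpus: int) -> float:
    """Parallel efficiency S(N) / N; 1.0 means perfect linear speedup."""
    return speedup(t_seq, t_par) / n_cpus

# Illustrative timings only (not measurements): 100 s sequential, 50 s on 4 CPUs
print(speedup(100.0, 50.0))        # 2.0
print(efficiency(100.0, 50.0, 4))  # 0.5
```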
Performance optimization: why is the efficiency higher with a higher number of tasks? The execution time grows when, near the end of the job, one worker is still finishing its last task while the others are already idle. Splitting the job into more tasks balances the execution times of the workers better.
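The load-balancing argument can be checked with a small simulation. This sketch assumes dynamic dispatch (a free worker pulls the next task, as in a master-worker setup), equal-sized tasks, and a hypothetical cluster in which one of four workers runs at half speed; per-task overhead is ignored.

```python
import heapq

def makespan(total_work: float, n_tasks: int, speeds: list[float]) -> float:
    """Wall-clock time to finish a job split into n_tasks equal tasks.

    Dynamic dispatch: whenever a worker becomes free it pulls the next task.
    A task of size w takes w / speed on a worker of the given speed.
    """
    task_size = total_work / n_tasks
    # heap of (time at which the worker becomes free, worker id)
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(free_at)      # earliest-free worker takes the task
        end = t + task_size / speeds[w]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, w))
    return finish

speeds = [1.0, 1.0, 1.0, 0.5]      # hypothetical: one worker is half as fast
print(makespan(1000, 4, speeds))   # 500.0: one task per worker, the slow one dominates
print(makespan(1000, 40, speeds))  # 300.0: fast workers absorb more of the tasks
```

With 40 tasks the job finishes close to the ideal bound of 1000 / 3.5 ≈ 286 time units, because no single slow task is left running while the other workers wait.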
Work in progress: – optimization of the method used to merge the output files of the tasks: at present the merging introduces a significant overhead; we found problems adding histograms with PI; A. Pfeiffer and L. Moneta are helping with this last issue – last step: running on the GRID – refine the results