Development of the distributed computing system for the MPD at the NICA collider, analytical estimations
Mathematical Modeling and Computational Physics 2013 (MMCP'2013)
Gertsenberger K. V.
Joint Institute for Nuclear Research, Dubna
NICA scheme
Multipurpose Detector (MPD)
The MPDRoot software is developed for event simulation, reconstruction and physics analysis of heavy-ion collisions registered by the MPD at the NICA collider.
Prerequisites of the NICA cluster
- high interaction rate (up to 6 kHz)
- high particle multiplicity: about 1000 charged particles in a central collision at NICA energies
- reconstruction of one event currently takes tens of seconds in MPDRoot, so 1M events take months
- large data stream from the MPD: 100k events ~ 5 TB, […] k events ~ 5 PB/year
- unified interface for parallel processing and storage of the event data
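The figures on this slide can be turned into a quick estimate. The sketch below assumes decimal units (1 TB = 10^6 MB) and takes 10 s as the per-event reconstruction time, the low end of "tens of seconds"; both are assumptions made for illustration, not values stated by the author.

```python
# Back-of-the-envelope estimates from the figures on this slide.
SECONDS_PER_DAY = 24 * 60 * 60

# 100k events ~ 5 TB (assumption: decimal units, 1 TB = 10**6 MB).
events_sample = 100_000
sample_size_mb = 5 * 10**6
mb_per_event = sample_size_mb / events_sample

# "Tens of seconds" per event; 10 s is an optimistic assumption.
seconds_per_event = 10
n_events = 1_000_000
sequential_days = n_events * seconds_per_event / SECONDS_PER_DAY

print(f"average event size: {mb_per_event:.0f} MB")
print(f"sequential reconstruction of 1M events: {sequential_days:.0f} days")
```

Even with the optimistic per-event time, sequential reconstruction of 1M events takes several months, which is the motivation for parallel processing.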
Development of the NICA cluster
2 main lines of development:
- development of data storage for the experiment
- organization of parallel processing of the MPD events
Development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm.
Current NICA cluster in LHEP for MPD
Distributed file system GlusterFS
- aggregates the existing file systems into a common distributed file system
- automatic replication runs as a background process
- a background self-checking service restores corrupted files after a hardware or software failure
- implemented at the application layer and runs in user space
Data storage on the NICA cluster
Development of the distributed computing system
NICA cluster: concurrent data processing on cluster nodes
- PROOF server: parallel data processing in a ROOT macro on parallel architectures
- MPD-scheduler: scheduling system that distributes tasks to parallelize data processing on cluster nodes
Parallel data processing with PROOF
- PROOF (Parallel ROOT Facility) is part of the ROOT software, so no additional installation is required
- PROOF uses data-independent parallelism, which relies on the lack of correlation between MPD events; this gives good scalability
- Parallelization for three parallel architectures:
  1. PROOF-Lite parallelizes the data processing on one multiprocessor/multicore machine
  2. PROOF parallelizes processing on a heterogeneous computing cluster
  3. Parallel data processing in GRID
- Transparency: the same program code can execute both sequentially and concurrently
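Data-independent parallelism and transparency can be illustrated outside ROOT. The sketch below is not PROOF code: `reconstruct` and `process` are hypothetical stand-ins, and a Python thread pool replaces PROOF workers. It only shows that when each event is processed independently, the same per-event code yields the same result whether it runs sequentially or concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Hypothetical per-event work: the result depends only on this event."""
    return sum(hit * hit for hit in event)


def process(events, workers=1):
    """Run the same per-event code sequentially (workers=1) or concurrently."""
    if workers == 1:
        return [reconstruct(e) for e in events]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so merging the results is deterministic.
        return list(pool.map(reconstruct, events))


events = [[i, i + 1, i + 2] for i in range(8)]  # toy "events"
sequential = process(events)
parallel = process(events, workers=4)
print(sequential == parallel)
```

Python threads do not speed up CPU-bound work, so this demonstrates the structure of the approach rather than its performance.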
Using PROOF in MPDRoot
The last parameter of the reconstruction macro is run_type (default: "local").

Speedup on the user's multicore machine:
  $ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, "proof")'
  parallel processing of 1000 events, with the thread count equal to the number of logical processors
  $ root 'reco.C("evetest.root", "mpddst.root", 0, 500, "proof:workers=3")'
  parallel processing of 500 events with 3 concurrent threads

Speedup on the NICA cluster:
  $ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, …
  parallel processing of 1000 events on all cluster nodes of the PoD farm
  $ root 'reco.C("eve", "mpddst", 0, 500, …
  parallel processing of 500 events on the PoD cluster with 10 workers
Speedup of the reconstruction on a 4-core machine
PROOF on the NICA cluster (diagram)
- Proof On Demand cluster: a PROOF master server and PROOF slave nodes, labelled (8), (16), (24) and (32)
- *.root files are stored on GlusterFS
- Example: $ root 'reco.C("evetest.root", "mpddst.root", 0, 3, … (event count: 3)
- events №0, №1 and №2 of evetest.root are processed on the cluster and the result is written to mpddst.root
Speedup of the reconstruction on the NICA cluster
MPD-scheduler
- Developed in C++ with support for ROOT classes.
- Uses the Sun Grid Engine (SGE) scheduling system (qsub command) for execution in cluster mode.
- SGE combines the cluster machines of the LHEP farm into a pool of worker nodes with 78 logical processors.
- A job for distributed execution on the NICA cluster is described in an XML file and passed to MPD-scheduler:
  $ mpd-scheduler my_job.xml
Job description
- The description starts and ends with an enclosing tag.
- One tag sets information about the macro to be executed by MPDRoot.
- One tag defines the files to be processed by that macro.
- One tag describes the run parameters and the allocated resources.
* mpd.jinr.ru: name of the server with the production database
Job execution on the NICA cluster (diagram)
- SGE batch system: a Sun Grid Engine server and Sun Grid Engine workers, labelled (8), (16), (24) and (32), each marked free or busy
- MPD-scheduler receives job_reco.xml and job_command.xml and submits work to SGE with qsub
- input files evetest1.root, evetest2.root, evetest3.root and output files mpddst1.root, mpddst2.root, mpddst3.root are *.root files on GlusterFS
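The idea of handing queued jobs to free workers can be sketched with a simple greedy rule. This is not the algorithm used by SGE or MPD-scheduler, which the slides do not describe; `schedule` is a hypothetical helper and the job durations are made up for illustration.

```python
import heapq


def schedule(job_durations, n_workers):
    """Greedy batch scheduling: each queued job goes to the worker that frees up first.

    Returns (assignment, makespan), where assignment[i] is the worker index of job i.
    """
    # Heap of (time at which the worker becomes free, worker index).
    workers = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(workers)
    assignment = []
    makespan = 0.0
    for duration in job_durations:
        free_at, w = heapq.heappop(workers)
        finish = free_at + duration
        assignment.append(w)
        makespan = max(makespan, finish)
        heapq.heappush(workers, (finish, w))
    return assignment, makespan


# Three input files and two free workers; durations (seconds) are made up.
assignment, makespan = schedule([30.0, 20.0, 10.0], n_workers=2)
print(assignment, makespan)
```

With one file per job, the total wall-clock time is bounded below by the longest single job, so splitting the input into similar-sized files keeps the workers evenly loaded.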
Speedup of one reconstruction on the NICA cluster
NICA cluster section on mpd.jinr.ru
Conclusions
- The distributed NICA cluster (128 cores) was deployed on the basis of the LHEP farm for the NICA/MPD experiment (Fairsoft, ROOT/PROOF, MPDRoot, Gluster, Torque, Maui).
- The data storage (10 TB) was organized with the distributed file system GlusterFS: /nica/mpd[1-8].
- A PROOF On Demand cluster was implemented to parallelize event data processing for the MPD experiment; PROOF support was added to the reconstruction macro.
- The MPD-scheduler system for distributed job execution was developed to run MPDRoot macros concurrently on the cluster.
- The web site mpd.jinr.ru, in the section Computing - NICA cluster, presents the manuals for the systems described above.
Analytical model for parallel processing on cluster
Speedup for a point (data-independent) algorithm of image processing, where
- P_node: count of logical processors
- n: data to process (bytes)
- B_D: speed of the data access (MB/s)
- T_1: "pure" time of the sequential processing (s)
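The speedup formula itself is not present in the text of this slide. As an illustration only, the sketch below uses one simple model that is consistent with the listed symbols: the pure processing time T_1 divides evenly among P_node processors, while the data access time n / B_D is not parallelised. This form is an assumption and may differ from the author's formula.

```python
def speedup(p_node, n_bytes, b_d_mb_s, t1_s):
    """Assumed model: S = (T_1 + n / B_D) / (T_1 / P_node + n / B_D)."""
    # Data access time in seconds (assumption: 1 MB = 10**6 bytes).
    access_s = n_bytes / (b_d_mb_s * 1e6)
    return (t1_s + access_s) / (t1_s / p_node + access_s)


# 2 MB of data, 100 MB/s access speed, 10 s of pure processing.
for p in (1, 10, 100):
    print(p, round(speedup(p, 2e6, 100, 10), 2))
```

Under this model the speedup cannot exceed (T_1 + n / B_D) / (n / B_D), so the data access speed sets the ceiling once the processor count is large.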
Prediction of the NICA computing power
How many logical processors are required to process N_TASK physics analysis tasks and one reconstruction in parallel within T_day days?
Given: n_1 = 2 MB, N_EVENT = […] events, T_PA = 5 s/event, T_REC = 10 s/event, B_D = 100 MB/s, T_day = 30 days
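The question on this slide can be turned into a small calculation. The sketch below is a hypothetical estimate, not the author's model: it assumes that CPU work divides evenly among processors, that every task reads n_1 per event, and that data access is not parallelised. The values N_EVENT = 1,000,000 and N_TASK = 10 are placeholders chosen for illustration, since the slide text does not give them.

```python
import math

SECONDS_PER_DAY = 86_400


def required_processors(n_event, n_task, t_pa, t_rec, n1_mb, b_d, t_day):
    """Smallest processor count that meets the deadline under the assumed model.

    Assumptions (not from the slide): CPU work divides evenly among processors,
    while data access (one read of n1_mb per event per task) is not parallelised.
    """
    cpu_s = n_event * (n_task * t_pa + t_rec)        # total pure processing time
    access_s = n_event * (n_task + 1) * n1_mb / b_d  # total data access time
    budget_s = t_day * SECONDS_PER_DAY - access_s    # time left for computation
    if budget_s <= 0:
        raise ValueError("data access alone exceeds the deadline")
    return math.ceil(cpu_s / budget_s)


# n_event and n_task are placeholder values; the rest come from the slide.
p = required_processors(n_event=1_000_000, n_task=10, t_pa=5, t_rec=10,
                        n1_mb=2, b_d=100, t_day=30)
print(p)
```

The result scales linearly with the event count as long as the data access time stays well below the deadline; beyond that point no number of processors is sufficient in this model.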