1 Mock Data Challenge for the MPD experiment on the NICA cluster. Potrebenikov Yu. K., Schinov B. G., Rogachevsky O. V., Gertsenberger K. V., Laboratory of High Energy Physics, JINR, for the MPD collaboration. The 7th International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2016), 5 July 2016

2 NICA accelerator complex. Commissioning: 2020

3 MPD and MpdRoot software. The MpdRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider. It is based on ROOT and FairRoot. The MpdRoot software is available on GitLab: https://git.jinr.ru/nica/mpdroot

4 Prerequisites of the parallel processing/storing
- high interaction rate (up to 7 kHz)
- high particle multiplicity: up to 1000 charged particles for a central collision at the NICA energy
- reconstruction of one event can currently take a long time in MpdRoot
- large data stream from the MPD: estimated at 5-10 PB of raw data per year; 100 million simulated events ≈ 5 PB
- MPD event data can be processed concurrently!
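The storage estimate on this slide implies an average size per simulated event. The short calculation below is only a sketch of that arithmetic; it assumes decimal units (1 PB = 10^15 bytes), which the slide does not specify, and the helper name `bytes_per_event` is purely illustrative.

```python
# Assumption: decimal units; the slide does not say whether PB means 10**15 or 2**50 bytes.
PB = 10**15
MB = 10**6

def bytes_per_event(total_bytes, n_events):
    """Average storage per event for a given total volume (illustrative helper)."""
    return total_bytes / n_events

# Figures from the slide: 100 million simulated events occupy about 5 PB.
sim_event_size = bytes_per_event(5 * PB, 100_000_000)
print(f"{sim_event_size / MB:.0f} MB per simulated event")  # prints "50 MB per simulated event"
```

At roughly 50 MB per simulated event, even modest productions reach the petabyte scale, which is the motivation for the distributed storage and processing described on the following slides.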

5 Possible components of distributed system
- data storage for the experiment
- parallel MPD event processing in a ROOT macro on parallel architectures
- batch system for MPD task distribution
- monitoring system to view the state of cluster nodes and tasks
- database with experimental and simulated data for offline processing
- user interfaces to manage databases and data processing

6 Current NICA cluster (prototype). B. G. Schinov

7 Current data storage on the NICA cluster. B. G. Schinov
GlusterFS distributed file system:
- free and open source
- aggregates existing file systems into a common distributed file system
- has no metadata server
- automatic replication works as a background process
- a background self-checking service restores corrupted files in case of hardware or software failure

8 The purposes of the Mock Data Challenge (MDC)
- provides large-scale simulation production
- exercises the full spectrum of experiment software and hardware, from simulation through to physics analysis
- stress-tests the distributed computing infrastructure of the experiment
- identifies potential issues before first data
- helps to estimate offline computing requirements
- checks data quality
- prepares a quick turnaround from data taking to publication

9 MPD DAQ data flow (plans)
On-line processing (DAQ):
- FLP: First Level Processor (data check, flow control, formatting)
- EvB: Event Builder (buffering, sorting, distribution)
- EvM: Event Monitor
- FT REC: Fast Event Reconstruction (Fast Tracking)
- HLT: High Level (L3) Trigger
- RQ: Raw Data Quality Check
- HIST: Online Histograms
- TDS: Transient Data Storage
- PDS: Permanent Data Storage
- DRE, LDC
- Database
Off-line processing (NICA cluster):
- Rootifier: ROOT format, mapping, alignment
- REC: Event Reconstruction
- PA: Physics Analysis
Adapted from "DAQ software in MPD experiment NICA", I. A. Filippov, I. V. Slepnev, NEC'2015 conference

10 MpdRoot offline processing
- Simulated data: event generators (UrQMD, QGSM, Pythia…) -> generator.data -> simulation (Geant3/4, Fluka…) -> evetest.root -> reconstruction -> mpddst.root (DST format) -> physical analysis
- Experimental data: DAQ -> Data Storage (raw data in MPD format, mpd_run.data) -> digitizer -> mpd_digits.root -> reconstruction -> mpddst.root (DST format) -> physical analysis

11 MpdRoot simulation chain
- Event generator (UrQMD, LAQGSM, Pythia…): simulates the physics process (quantum mechanics and probabilities)
- Simulation (Geant3, Geant4, Fluka…): simulates interaction with media and detector materials
- Digitization: translates interactions with detectors into clusters of signals
- Reconstruction: as for experimental data
- Analysis: as for experimental data
Simulation in MpdRoot: interaction of interest; geometry of the system; materials used; particles of interest; generation of test events of particles; interactions of particles with matter and EM fields; response of detectors; records of energies and tracks; analysis of the full simulation at whatever detail you like; visualization of the detector system and tracks.
Reconstruction: clustering; hit reconstruction in subdetectors; track reconstruction (searching for track candidates in the main tracker, track propagation using a Kalman filter, matching with other detectors); vertex finding; particle identification.
Analysis topics: phases of QCD matter at high baryon density; hydrodynamics and hadronic observables; femtoscopy, correlations and fluctuations; local P and CP violation in hot QCD matter; cumulative processes; polarization effects and spin physics; hypernuclei production in heavy-ion collisions; and many others…

12 Data storage levels

13 The Unified Database scheme. PostgreSQL implementation, with a pointer to the RAW storage level and a pointer to the SIM storage level.

14 The simulation storage. Web interface by Ivan Slepov. File location: NICA cluster. Event generator: UrQMD, QGSM …

15 Parallel MPD event processing for MDC
- PROOF server: parallel event data processing in ROOT macros on parallel architectures
- MPD-Scheduler: scheduling system for task distribution, to parallelize MPD data processing on the cluster nodes
- NICA cluster: concurrent data processing on cluster nodes

16 Parallel data processing with PROOF
- PROOF (Parallel ROOT Facility) is part of the ROOT software; no additional installation is needed
- PROOF uses data-independent parallelism, based on the lack of correlation between MPD events
- good scalability
- Parallelization for three parallel architectures:
  1. PROOF-Lite parallelizes data processing on one multiprocessor/multicore machine
  2. PROOF parallelizes processing on a heterogeneous computing cluster
  3. Parallel data processing in a GRID system
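The idea of data-independent parallelism can be illustrated outside ROOT. The sketch below is not PROOF code: it assumes a hypothetical `reconstruct_event` function and uses a thread pool from the Python standard library as a stand-in for PROOF workers, only to show that uncorrelated events can be mapped over a pool of workers without any communication between them.

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct_event(event):
    """Hypothetical stand-in for per-event reconstruction: sums the hit energies."""
    return {"id": event["id"], "energy": sum(event["hits"])}

def process_events(events, n_workers=4):
    # Events are uncorrelated, so any worker may take any event.
    # Executor.map() returns the results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(reconstruct_event, events))

# Toy input: eight events with three "hits" each (made-up numbers).
events = [{"id": i, "hits": [i, i + 1, i + 2]} for i in range(8)]
results = process_events(events)
print(results[3])  # {'id': 3, 'energy': 12}
```

Because no event depends on another, the same pattern applies whether the workers are cores of one machine or nodes of a cluster.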

17 The speedup of the reconstruction on 4 cores

18 PROOF on the NICA cluster (file splitting mode)
$ root 'reco.C("evetest.root", "mpddst.root", 0, 3, "proof:mpd@nc10.jinr.ru:21001")'
Proof On Demand cluster (64 cores): the PROOF master server distributes the requested events of evetest.root (event №0, event №1, event №2 for an event count of 3) among the PROOF slave nodes, and the results are written to mpddst.root. The *.root files are stored on GlusterFS.
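How a requested event range might be divided among workers can be sketched as follows. This is an illustrative even split, not the algorithm PROOF itself uses for distributing work; the function name `split_events` and the contiguous-chunk strategy are assumptions.

```python
def split_events(start_event, event_count, n_workers):
    """Split [start_event, start_event + event_count) into contiguous chunks.

    Returns a list of (first_event, chunk_size) pairs, one per worker that
    receives work. Illustrative only; not PROOF's real work distribution.
    """
    base, extra = divmod(event_count, n_workers)
    chunks = []
    first = start_event
    for worker in range(n_workers):
        # The first `extra` workers take one additional event each.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            continue  # more workers than events: this worker stays idle
        chunks.append((first, size))
        first += size
    return chunks

# The slide's example: start at event 0 and process 3 events on three workers.
print(split_events(0, 3, 3))  # [(0, 1), (1, 1), (2, 1)]
```

Each worker then processes its own chunk independently, and the partial outputs are combined into the final output file.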

19 The speedup on the NICA cluster

20 Parallel MPD event processing for MDC
- PROOF server: parallel data processing in ROOT macros on parallel architectures
- MPD-Scheduler: scheduling system for task distribution, to parallelize MPD data processing on the cluster nodes
- NICA cluster: concurrent data processing on cluster nodes

21 Batch system and MPD-Scheduler
- Sun Grid Engine (qsub command, as in PBS and Torque)
- SGE combines the machines of the NICA cluster (nc10-nc11, nc13-nc19) into a pool of worker nodes with 252 logical processors
- MPD-Scheduler is written in C++ with support for ROOT classes. GIT: nica_modules/mpd_scheduler
- MPD-Scheduler simplifies and parallelizes job execution without requiring knowledge of SGE or the qsub command, and can use the Unified Database!
- Jobs for multithreaded execution on a single multicore user machine and for distributed execution on the NICA cluster are described in an XML file and passed to MPD-Scheduler:
$ mpd-scheduler my_job.xml

22 Job description for MPD-Scheduler
- The description starts and ends with tag.
- Tag sets information about the ROOT macro being executed by MpdRoot: name (path), start_event, event_count, add_args…
- Tag defines the files to be processed by the macro above: input (path with regular), file_input, db_input, job_input, output, start_event, event_count, parallel_mode, merge…
- Tag describes run parameters and allocated resources for the job: mode ('global' for the NICA cluster, 'local' for a multicore machine), count, config, logs…
- Tag with argument line is used to run a non-ROOT command.
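A job description of this kind could be generated programmatically. The sketch below is only an illustration under stated assumptions: the attribute names (name, start_event, event_count, input, output, mode, count) come from the slide, but the element names `job`, `macro`, `file` and `run`, as well as all paths and values, are assumptions, because the tag names are not legible in this transcript. Consult the MPD-Scheduler documentation for the real schema.

```python
import xml.etree.ElementTree as ET

def build_job(macro_path, input_file, output_file, mode="local", count=4):
    """Build an XML job description as a string.

    Element names (job, macro, file, run) are ASSUMED; only the attribute
    names are taken from the slide.
    """
    job = ET.Element("job")
    # Macro to execute and the event range (values here are made up).
    ET.SubElement(job, "macro", name=macro_path, start_event="0", event_count="100")
    # Input and output files for the macro.
    ET.SubElement(job, "file", input=input_file, output=output_file)
    # Run parameters: 'local' (multicore machine) or 'global' (NICA cluster).
    ET.SubElement(job, "run", mode=mode, count=str(count))
    return ET.tostring(job, encoding="unicode")

# Hypothetical file names, following the evetest/mpddst naming used on the slides.
job_xml = build_job("reco.C", "evetest1.root", "mpddst1.root", mode="global", count=8)
print(job_xml)
```

The resulting text would be saved to a file such as my_job.xml and passed to the scheduler on the command line.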

23 MPD-Scheduler on the NICA cluster
SGE batch system (252 cores): MPD-Scheduler reads the job descriptions (job_reco.xml, job_command.xml) and submits tasks with qsub to the Sun Grid Engine server, which dispatches them to free SGE workers. In the example, evetest1.root, evetest2.root and evetest3.root are reconstructed into mpddst1.root, mpddst2.root and mpddst3.root. The *.root files are stored on GlusterFS.

24 Reconstruction of simulated files on the NICA cluster

25 MDC on the NICA cluster (jobs)
Job chain described in MPD-Scheduler XML: event generators (420 UrQMD input files, AuAu_7gev_*.f14.root) -> runMC.C (Geant3) -> evetest*.root -> reco.C -> mpddst*.root -> femtoAna.C
*nc13.jinr.ru is the name of the server with the Unified Database

26 MDC on the NICA cluster (run)
Event generators -> runMC.C -> reco.C -> femtoAna.C: successful! The data quality system is not ready yet, so the resulting data were checked manually.

27 MDC on the NICA cluster (time)

28 «Computing» section on mpd.jinr.ru

29 Conclusions
- The MpdRoot environment was deployed on the distributed NICA cluster for MPD data processing: FairSoft, ROOT/PROOF, MpdRoot, MPD-Scheduler...
- The Unified Database was developed on PostgreSQL and deployed on the NICA cluster for offline data processing.
- A PROOF On Demand cluster was set up to parallelize event data processing for the MPD experiment.
- A batch system based on the Sun Grid Engine is used on the NICA cluster to accelerate the processing of MPD tasks. The MPD-Scheduler software was developed to automate running MpdRoot macros concurrently.
- The Mock Data Challenge used the simulation-analysis chains to test the software and computing infrastructure. All of the steps were completed successfully.
- The site mpd.jinr.ru presents detailed information in the 'Computing' section.

30 MPD site: mpd.jinr.ru; email: gertsen@jinr.ru

32 User multicore machine. Multithreaded parallelization of time-consuming tasks in MpdRoot: TOF matching, TPC tracking. Intel Core i7-920, 2.67 GHz.