Development of the distributed computing system for the MPD at the NICA collider: analytical estimations
Mathematical Modeling and Computational Physics 2013 (MMCP'2013)
Gertsenberger K. V., Joint Institute for Nuclear Research, Dubna

NICA scheme

Multipurpose Detector (MPD)
The MPDRoot software is developed for event simulation, reconstruction and physical analysis of the heavy-ion collisions registered by the MPD at the NICA collider.

Prerequisites of the NICA cluster
- high interaction rate (up to 6 kHz)
- high particle multiplicity: about 1000 charged particles per central collision at the NICA energy
- one event reconstruction currently takes tens of seconds in MPDRoot, so 1M events take months
- large data stream from the MPD: 100k events ~ 5 TB, about 5 PB per year (see the estimate below)
- unified interface for parallel processing and storing of the event data
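A quick arithmetic check of the data volumes quoted above; the per-event size is derived here rather than stated on the slide:

5~\mathrm{TB} / 10^{5}~\mathrm{events} \approx 50~\mathrm{MB/event}, \qquad 5~\mathrm{PB/year} \,/\, 50~\mathrm{MB/event} \approx 10^{8}~\mathrm{events/year}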

Development of the NICA cluster
Two main lines of development:
- data storage development for the experiment
- organization of parallel processing of the MPD events
Goal: development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm.

Current NICA cluster in LHEP for MPD

Distributed file system GlusterFS
- aggregates the existing file systems into a common distributed file system
- automatic replication works as a background process
- a background self-checking service restores corrupted files in case of hardware or software failure
- implemented at the application level and runs in user space
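As an illustration of the GlusterFS setup described above, a replicated volume could be created and mounted roughly as follows; the host names, brick paths and volume name are hypothetical placeholders, not the actual NICA cluster configuration:

$ gluster volume create mpdvol replica 2 transport tcp nc1:/storage/brick1 nc2:/storage/brick1
$ gluster volume start mpdvol
$ mount -t glusterfs nc1:/mpdvol /nica/mpd1    # clients see one common distributed file system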

Data storage on the NICA cluster

Development of the distributed computing system
NICA cluster: concurrent data processing on the cluster nodes, provided by two systems:
- PROOF server: parallel data processing in a ROOT macro on the parallel architectures
- MPD-scheduler: scheduling system for task distribution to parallelize data processing on the cluster nodes

Parallel data processing with PROOF
- PROOF (Parallel ROOT Facility) is part of the ROOT software, so no additional installation is needed
- PROOF uses data-independent parallelism, based on the absence of correlations between MPD events
- good scalability
- parallelization for three parallel architectures:
  1. PROOF-Lite parallelizes the data processing on one multiprocessor/multicore machine
  2. PROOF parallelizes the processing on a heterogeneous computing cluster
  3. parallel data processing in GRID
- transparency: the same program code can execute both sequentially and concurrently

Using PROOF in MPDRoot
The last parameter of the reconstruction macro is run_type (default "local").
Speedup on a user multicore machine:
$ root reco.C("evetest.root", "mpddst.root", 0, 1000, "proof")
  parallel processing of 1000 events with the thread count equal to the logical processor count
$ root reco.C("evetest.root", "mpddst.root", 0, 500, "proof:workers=3")
  parallel processing of 500 events with 3 concurrent threads
Speedup on the NICA cluster:
$ root reco.C("evetest.root", "mpddst.root", 0, 1000, …)
  parallel processing of 1000 events on all cluster nodes of the PoD farm
$ root reco.C("eve", "mpddst", 0, 500, …)
  parallel processing of 500 events on the PoD cluster with 10 workers

Speedup of the reconstruction on a 4-core machine

PROOF on the NICA cluster (scheme): a Proof On Demand cluster with a PROOF master server and slave nodes on the cluster machines; the workers read the *.root input files from GlusterFS, so that a call such as reco.C("evetest.root", "mpddst.root", 0, 3, …) spreads events №0–№2 of evetest.root over the workers and produces mpddst.root.
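The Proof On Demand (PoD) farm in the scheme is brought up from the user side with the standard PoD commands; the sketch below is illustrative, with ten workers and a placeholder <rms> for the resource management system plug-in, not the exact NICA cluster setup:

$ pod-server start         # start the PoD server (PROOF master) on the user machine
$ pod-submit -r <rms> -n 10    # request 10 PROOF workers through the local batch system
$ pod-info -n              # how many workers are already online
$ pod-info -c              # print the PROOF connection string for this PoD session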

Speedup of the reconstruction on the NICA cluster

MPD-scheduler
- developed in C++ with support for ROOT classes
- uses the Sun Grid Engine scheduling system (qsub command) for execution in cluster mode
- SGE combines the cluster machines of the LHEP farm into a pool of worker nodes with 78 logical processors
- a job for distributed execution on the NICA cluster is described and passed to MPD-scheduler as an XML file:
$ mpd-scheduler my_job.xml
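Since MPD-scheduler submits the distributed jobs through Sun Grid Engine, they can be followed afterwards with the usual SGE commands; a brief, generic illustration (the job id is a placeholder):

$ qstat -u $USER    # list the SGE jobs of the current user
$ qstat -f          # show all queues, their slots and load
$ qdel <job_id>     # remove a submitted job from the queue if necessary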

Job description
The job description is an XML file that starts and ends with an enclosing tag. One tag sets the information about the macro to be executed by MPDRoot, another defines the files to be processed by that macro, and a third describes the run parameters and the allocated resources.
* mpd.jinr.ru – server name with the production database

Job execution on the NICA cluster (scheme): MPD-scheduler reads the job description (job_reco.xml, job_command.xml) and submits the jobs to the SGE batch system via qsub; the Sun Grid Engine server distributes the evetest*.root input files stored in GlusterFS to free worker nodes, which produce the mpddst*.root output files.

Speedup of one reconstruction on the NICA cluster

NICA cluster section on mpd.jinr.ru

Conclusions
- The distributed NICA cluster (128 cores) was deployed on the basis of the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Torque, Maui).
- The data storage (10 TB) was organized with the distributed file system GlusterFS: /nica/mpd[1-8].
- A PROOF On Demand cluster was implemented to parallelize event data processing for the MPD experiment; PROOF support was added to the reconstruction macro.
- The MPD-scheduler system for distributed job execution was developed to run MPDRoot macros concurrently on the cluster.
- The web site mpd.jinr.ru (section Computing – NICA cluster) presents the manuals for the systems described above.


Analytical model for parallel processing on the cluster
Speedup for a point (data-independent) algorithm, as in image processing, with the parameters:
P_node – count of logical processors,
n – amount of data to process (bytes),
B_D – speed of the data access (MB/s),
T_1 – "pure" time of the sequential processing (s)
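A minimal sketch of a speedup model with these parameters, assuming the computation scales with P_node while the n bytes of data are read at speed B_D without being parallelized (an assumed form, not necessarily the exact model used on the slide):

S(P_{node}) \;=\; \frac{T_1 + n/B_D}{T_1/P_{node} + n/B_D}

In this form the speedup saturates at 1 + T_1 B_D / n for large P_node, i.e. data access eventually becomes the limiting factor.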

Prediction of the NICA computing power
How many logical processors are required to process N_TASK physical analysis tasks and one reconstruction within T_day days in parallel?
Example parameters: n_1 = 2 MB, N_EVENT = … events, T_PA = 5 s/event, T_REC = 10 s/event, B_D = 100 MB/s, T_day = 30 days
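One rough form such an estimate could take, assuming that N_TASK analysis passes (T_PA per event) and one reconstruction (T_REC per event) over N_EVENT events, together with non-parallelized data access of n_1 bytes per event and per pass at speed B_D, have to fit into T_day days of wall-clock time (an assumed form, not the exact expression from the slide):

P_{node} \;\gtrsim\; \frac{N_{EVENT}\,(N_{TASK}\,T_{PA} + T_{REC})}{86\,400\,T_{day} \;-\; n_1\,N_{EVENT}\,(N_{TASK}+1)/B_D}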