Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam
MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.)
Distributed parallel framework
Analysis framework: ROOBASF
– Extended from BASF (Belle’s framework)
– Controls the analysis workflow
– For MPI distributed-memory systems *
– With a Python interface *
– ROOT embedded *
For use in:
– Belle II (high-energy physics)
– Hyper Suprime-Cam (astrophysics)
* Newly added features
Table of contents
Motivation
– Hyper Suprime-Cam & Belle II
Distributed parallel framework
– MPI & Python
Test pipeline
Summary
MOTIVATION
Hyper Suprime-Cam (HSC) & Belle II
Hyper Suprime-Cam (HSC)
– Next-generation camera aimed at dark-energy studies, mounted at the prime focus of the Subaru Telescope.
– Data rate: 2 GB/shot, 10 times that of the current camera.
Belle II
– Next-generation B factory at SuperKEKB, a new high-luminosity e⁻e⁺ collider at KEK.
– Data rate: 600 MB/s, more than 40 times that of the current Belle detector.
An efficient, distributed parallel analysis system is necessary.
Analyses on HSC images
Chip-by-chip correction
– 116 CCD sensors cover the focal plane.
– Easily data-parallelized: chips are assigned to processes one by one.
– Pedestal correction, gain correction.
Mosaicking
– Determine chip positions by matching celestial objects, then superpose the chips.
– Parallelization is not trivial: processes must exchange object positions, pixel information, etc.
– Processes need to communicate.
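The chip-by-chip stage needs no communication, which is why it parallelizes easily. A minimal sketch of that idea, assuming a round-robin assignment of chips to processes and a single scalar pedestal and gain per chip (both simplifications; `assign_chips` and `correct_chip` are hypothetical names, not ROOBASF code). With as many processes as chips, the assignment reduces to one chip per process.

```python
def assign_chips(n_chips, n_procs):
    """Round-robin assignment: chip i goes to process i % n_procs.

    With n_procs == n_chips this is exactly one chip per process."""
    return {p: list(range(p, n_chips, n_procs)) for p in range(n_procs)}


def correct_chip(raw, pedestal, gain):
    """Subtract a per-chip pedestal, then scale by a per-chip gain.

    Scalar pedestal/gain and multiplication by the gain are assumptions."""
    return [[(pixel - pedestal) * gain for pixel in row] for row in raw]


# Each process would call correct_chip only on its own chips: no messages needed.
chips_of = assign_chips(116, 3)
print({p: len(c) for p, c in chips_of.items()})  # {0: 39, 1: 39, 2: 38}
print(correct_chip([[110, 120]], pedestal=100, gain=2.0))  # [[20.0, 40.0]]
```

Mosaicking does not fit this pattern: matching objects across chip borders needs data from neighbouring chips, so processes must exchange messages.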
Use case in Belle II
– ROOT-based data format.
– The DAQ cluster needs cooperation between processes.
Existing framework
BASF: the framework of the Belle experiment
– Used successfully for 10 years.
– Involved in nearly every part of the experiment: data acquisition, simulation, users’ analysis.
– Software pipeline architecture: enables a modular structure of analysis paths, with flexible, dynamic module linking.
– Event-by-event parallel analysis.
Issues to be improved:
– Large data rate: distributed parallelization with inter-process communication.
– ROOT support / object-oriented data flow.
[Figure: analysis modules linked into a path]
Upgrade BASF for Belle II, and also for HSC.
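The software pipeline architecture can be illustrated with a toy path: modules are linked in order and each event flows through them, stopping early if a module rejects it. This is a hypothetical sketch of the concept only; the `Path` class, the `event()` return convention, and the example modules are assumptions, not the BASF API.

```python
class Path:
    """A sequential path: modules run in order on every event."""

    def __init__(self, name):
        self.name = name
        self.modules = []

    def add(self, module):
        self.modules.append(module)
        return self  # allow chaining

    def process(self, event):
        """Return True if every module accepted the event."""
        for module in self.modules:
            if not module.event(event):
                return False  # a module rejected the event: skip the rest of the path
        return True


class Scale:
    """Example module: multiply the event's value by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def event(self, ev):
        ev["value"] *= self.factor
        return True


class Threshold:
    """Example module: accept only events whose value reaches a limit."""

    def __init__(self, limit):
        self.limit = limit

    def event(self, ev):
        return ev["value"] >= self.limit


path = Path("main").add(Scale(2)).add(Threshold(10))
events = [{"value": v} for v in (3, 5, 8)]
accepted = [ev["value"] for ev in events if path.process(ev)]
print(accepted)  # [10, 16]
```

Because each event is processed independently, several copies of such a path can run on different events at once, which is the event-by-event parallelism BASF already offered.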
DISTRIBUTED PARALLEL FRAMEWORK
Parallel framework (ROOBASF)
– Controls analysis paths, like BASF in Belle.
– Data parallel, with inter-process communication.
– Program parallel.
– Python user interface.
– ROOT utilization.
[Figure: a path of analysis modules distributed over processes 1 to 4]
Parallelization
ROOBASF uses the Message Passing Interface (MPI)
– The de facto standard for distributed parallel computing.
– Expected to run in a variety of environments.
Analysis modules use MPI to perform data-parallel algorithms.
– Each pipeline stage is given an MPI group (communicator).
– Within the given group, modules perform parallel processing just like stand-alone MPI programs.
[Figure: process group 1 and process group 2]
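One standard way to give each pipeline stage its own communicator is MPI_Comm_split (in mpi4py, `MPI.COMM_WORLD.Split(color, key)`): ranks passing the same color end up in the same new communicator. The slides do not say which call ROOBASF uses, so the sketch below only models the split semantics in pure Python, assuming a hypothetical layout where consecutive blocks of world ranks form the stages.

```python
from collections import defaultdict


def comm_split(world_size, color_of, key_of=lambda rank: rank):
    """Pure-Python model of MPI_Comm_split.

    Ranks with the same color end up in one group; inside a group the new
    rank is the position after sorting by (key, old rank)."""
    groups = defaultdict(list)
    for rank in range(world_size):
        groups[color_of(rank)].append(rank)
    return {color: sorted(ranks, key=lambda r: (key_of(r), r))
            for color, ranks in groups.items()}


def stage_of(rank, stage_sizes):
    """Hypothetical layout: consecutive blocks of ranks form the pipeline stages."""
    start = 0
    for stage, size in enumerate(stage_sizes):
        if rank < start + size:
            return stage
        start += size
    raise ValueError(f"rank {rank} is not assigned to any stage")


groups = comm_split(5, lambda r: stage_of(r, [3, 2]))
print(groups)  # {0: [0, 1, 2], 1: [3, 4]}
# World rank 3 becomes rank 0 of the stage-1 communicator.
print(groups[1].index(3))  # 0
```

Inside its group, a module sees ranks 0..size-1 as if it were a stand-alone MPI program, which is what lets existing data-parallel code run unchanged within a stage.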
Two layers of analysis paths
Sequential paths
– A sequence of analysis modules, with conditional branches.
– All executed in one process.
Parallel paths
– A sequence of processes and conditional branches; each process executes a sequential path (program parallelization).
– Multiple copies run simultaneously (data parallelization).
[Figure: analysis modules, conditional branches, and processes]
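The two kinds of parallelism in a parallel path can be sketched with threads and queues standing in for MPI processes and messages (an assumption made so the example runs in one interpreter; CPython threads do not give real CPU speedup). Chaining stages gives program parallelism; running several copies of a stage on a shared input queue gives data parallelism. `run_stage` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

STOP = object()  # sentinel telling one worker copy to finish


def run_stage(func, inbox, outbox, copies):
    """Start `copies` threads running the same function (data parallelism)."""
    def worker():
        while True:
            item = inbox.get()
            if item is STOP:
                break
            outbox.put(func(item))
    threads = [threading.Thread(target=worker) for _ in range(copies)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items, stages):
    """Chain stages (program parallelism); each stage is (func, copies)."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    stage_threads = [run_stage(func, inbox, outbox, copies)
                     for (func, copies), inbox, outbox
                     in zip(stages, queues, queues[1:])]
    for item in items:
        queues[0].put(item)
    # Shut down stage by stage: one STOP per copy, queued after all real items.
    for (_, copies), inbox, threads in zip(stages, queues, stage_threads):
        for _ in range(copies):
            inbox.put(STOP)
        for t in threads:
            t.join()
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


# Stage 1: 3 copies squaring; stage 2: 2 copies adding one. Output order may vary.
print(sorted(run_pipeline(range(6), [(lambda x: x * x, 3), (lambda x: x + 1, 2)])))
# [1, 2, 5, 10, 17, 26]
```

Because copies of a stage finish in any order, results must be sorted (or tagged) when event order matters downstream.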
Data flow
Events
– Event or image data to be analyzed.
Broadcast messages
– Experiment parameters, observation parameters, etc.
– Must be sent to all modules.
– Must not change order with events.
[Figure: after a conditional branch, a broadcast could overtake an event; the broadcast is suspended until it has arrived from all branches]
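The rule "suspend the broadcast until it arrives from all branches" can be sketched as a merge point where the branches rejoin. This is a hypothetical design, not the ROOBASF implementation: it assumes each branch delivers its messages in FIFO order, forwards events immediately, and also queues anything a branch sends after its copy of the broadcast, so no later event can slip ahead of the broadcast downstream.

```python
from collections import deque


class BroadcastMerger:
    """Hypothetical merge point where the branches of a conditional branch rejoin."""

    def __init__(self, n_branches):
        self.n = n_branches
        self.arrived = set()  # branches whose copy of the broadcast is in
        self.pending = None   # payload of the broadcast being held
        self.held = [deque() for _ in range(n_branches)]  # messages queued behind it
        self.out = []         # messages forwarded downstream, in order

    def receive(self, branch, kind, payload):
        if branch in self.arrived:
            # This branch is already past the broadcast: wait for the others.
            self.held[branch].append((kind, payload))
        else:
            self._accept(branch, kind, payload)

    def _accept(self, branch, kind, payload):
        if kind == "event":
            self.out.append(("event", payload))
            return
        # Otherwise kind == "bcast".
        self.pending = payload
        self.arrived.add(branch)
        if len(self.arrived) == self.n:
            self.out.append(("bcast", self.pending))
            self.arrived.clear()
            self.pending = None
            # Release queued messages, branch by branch, in arrival order.
            for b in range(self.n):
                while self.held[b] and b not in self.arrived:
                    self._accept(b, *self.held[b].popleft())


m = BroadcastMerger(2)
m.receive(0, "event", 1)
m.receive(0, "bcast", "params")
m.receive(0, "event", 3)  # held: branch 1 has not delivered the broadcast yet
m.receive(1, "event", 2)
m.receive(1, "bcast", "params")
print(m.out)
# [('event', 1), ('event', 2), ('bcast', 'params'), ('event', 3)]
```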
Utilization of Python
Analysis paths are described in the Python language.
– Modules can also be written inline in the script.
Modules can be developed quickly in Python; a module that turns out to be CPU-costly can then be rewritten in C++.
→ Efficient development of analysis modules.
Implemented with the boost.python library.
– Python scripts can call native code.
– Native code can call Python scripts.
– A unique feature of boost.python, absent from SWIG.
[Figure: a Python script (path description, analysis code) and ROOBASF (native, C++ etc.) calling each other's analysis code]
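The direction "native code calls Python" can be shown without a C++ build. The sketch below deliberately swaps in Python's standard `ctypes` module instead of boost.python (so it is not how ROOBASF does it): the C library's `qsort` repeatedly calls back into a Python comparator. It assumes a POSIX system where the C library can be loaded.

```python
import ctypes
import ctypes.util

# Load the C library. If find_library() returns None, CDLL(None) falls back to
# the symbols already loaded in the process (libc included on Linux/macOS).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C type of the comparator qsort expects: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]


def py_compare(a, b):
    """Python comparator; native qsort calls back into it for every comparison."""
    return a[0] - b[0]


def native_sort(values):
    """Sort ints with C's qsort, using a Python function as the comparator."""
    array = (ctypes.c_int * len(values))(*values)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_compare))
    return list(array)


print(native_sort([5, 1, 7, 33, 99]))  # [1, 5, 7, 33, 99]
```

In a framework, the same pattern lets a native event loop invoke a module written in Python; with boost.python the callback would instead be a C++ virtual function overridden in a Python subclass.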
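Python script
Create an instance of the ROOBASF framework; dlopen() “Astr1Chip.so”, link the plugin code, and set its parameter; define a module in Python; then create a sequential path “main”.

    import boostpbasf as basf

    # Create an instance of the ROOBASF framework
    f = basf.CFrame()

    # dlopen() "Astr1Chip.so", link the plugin code, and set its parameter
    f.Plug_Module("Astr1Chip").SetParam("config", "matching.scamp")

    # Define a module in Python
    class Load(basf.CModule):
        def __init__(self, namefmt):
            basf.CModule.__init__(self)
            self.namefmt = namefmt  # file-name pattern, e.g. "/data/img%03d.fits"
            self.count = 0          # index of the next image to load

        def event(self, status, ev, comm):
            if status == 0:
                # Use the instance attributes; bare namefmt/count are undefined here
                ev.SetFile(self.namefmt % self.count)
                # (……)

    load = Load("/data/img%03d.fits")

    # Create a sequential path "main": Load (Python) followed by Astr1Chip.so (native)
    f.Seq_Add("main", load)
    f.Seq_Add("main", "Astr1Chip")

[Figure: the “main” path inside ROOBASF (native): Load (Python), then Astr1Chip.so (native)]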
TEST PIPELINE
Pipeline for the test
Data-parallel analysis path (for on-line monitoring) that:
– Performs pedestal/gain correction.
– Checks data quality.
– Performs 1-chip astrometry.
– Includes tiny modules written in Python: error detector, time watch, etc.
[Figure: ROOBASF running three parallel copies of the module chain OSS, FLAT, AGP, STAT, SEXT, ASTR on CCD images, grouped as correction, data-quality check, and 1-chip astrometry (multi-threaded)]
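A "time watch" is the kind of tiny module that is convenient to write in Python. The sketch below is hypothetical: it mirrors the `event(self, status, ev, comm)` signature of the `Load` module but leaves out the `basf.CModule` base class and any framework calls so that it runs on its own.

```python
import time


class TimeWatch:
    """Hypothetical "time watch" module: records wall-clock time between events."""

    def __init__(self):
        self.last = None      # time stamp of the previous event
        self.intervals = []   # seconds between consecutive events

    def event(self, status, ev, comm):
        now = time.perf_counter()
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def mean_interval(self):
        """Average time per event in seconds, or None before two events were seen."""
        if not self.intervals:
            return None
        return sum(self.intervals) / len(self.intervals)


watch = TimeWatch()
for _ in range(3):
    watch.event(0, None, None)
print(len(watch.intervals), watch.mean_interval())
```

Placed at the end of a path, the mean interval approximates the analysis time per image seen by that process copy.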
Test environment
Only 3 PCs
– x64, 4 cores each.
– Linked by Gigabit Ethernet.
Number of processes
– 1, 3×1, 3×2, 3×3.
Speedup is not expected to scale linearly up to the 4 cores per CPU, because some modules are already multi-threaded.
[Figure: three 4-core nodes, each with a local HDD holding input and output images, programs shared over NFS; configurations of 1, 3×1, 3×2, and 3×3 processes, each process running threads]
Parallelization efficiency
[Figure: speedup (inverse of the analysis time per image, in s) for each process configuration, compared with the ideal speedup; processes run with threads]
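The plotted quantities follow from the usual definitions: speedup S(N) = T(1) / T(N), where T(N) is the analysis time per image with N processes, and efficiency E(N) = S(N) / N, with the ideal speedup being S(N) = N. A small helper for computing both from measured times (the measurements themselves are not reproduced here; the 3×1, 3×2, and 3×3 configurations correspond to N = 3, 6, 9):

```python
def speedup_and_efficiency(time_per_image):
    """Map process count N -> (speedup, efficiency).

    time_per_image: {N: measured seconds per image}, must include N = 1.
    speedup(N) = T(1) / T(N); the ideal speedup equals N.
    efficiency(N) = speedup(N) / N; 1.0 means ideal scaling."""
    t1 = time_per_image[1]
    result = {}
    for n, t in sorted(time_per_image.items()):
        speedup = t1 / t
        result[n] = (speedup, speedup / n)
    return result
```

With multi-threaded modules, the single-process baseline already uses several cores, so efficiencies below 1.0 are expected even without communication overhead.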
SUMMARY
Summary
Analysis framework: ROOBASF
– Distributed memory (MPI)
– Python scripting
– ROOT I/O
We built a parallel analysis path for astronomical images.
Feasibility for Belle II is yet to be confirmed.