Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE.

Presentation transcript:

Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.)

Distributed parallel framework
Analysis framework: ROOBASF
– Extended from BASF (Belle's framework)
– Controls the analysis workflow
– Runs on MPI distributed-memory systems *
– Has a Python interface *
– Embeds ROOT *
Intended for:
– Belle II (high energy physics)
– Hyper Suprime-Cam (astrophysics)
* Newly added features

Table of contents
Motivation
– Hyper Suprime-Cam & Belle II
Distributed parallel framework
– MPI & Python
Test pipeline
Summary

MOTIVATION

Hyper Suprime-Cam (HSC) & Belle II
Hyper Suprime-Cam (HSC)
– Next-generation camera aimed at studying dark energy.
– Mounted at the prime focus of the Subaru Telescope.
– Data rate: 2 GB/shot, 10 times larger than the current camera's.
Belle II
– Next-generation B factory.
– Uses Super KEKB, the new high-luminosity e⁻-e⁺ collider at KEK.
– Data rate: 600 MB/sec, more than 40 times larger than the current Belle detector's.
An efficient, distributed parallel analysis system is necessary.

Analyses on HSC images
Chip-by-chip correction
– 116 CCD sensors cover the focal plane.
– Easily data-parallelized by assigning chips to processes one by one.
– Steps: pedestal correction, gain correction.
"Mosaicking"
– Determine positions by matching celestial objects, then superpose the chips.
– Parallelization is not trivial: processes must exchange object position information, pixel information, etc.
Processes need communication.
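The one-by-one assignment of chips to processes can be sketched as follows. This is a minimal illustration only: the round-robin scheme, the function names `chips_for_rank` and `correct_chip`, and the toy correction formula are assumptions, not taken from the slides; only the figure of 116 CCD sensors comes from the talk.

```python
N_CHIPS = 116  # number of CCD sensors quoted on the slide


def chips_for_rank(rank, n_procs, n_chips=N_CHIPS):
    """Return the chip indices handled by process `rank` (round-robin, assumed scheme)."""
    return list(range(rank, n_chips, n_procs))


def correct_chip(pixels, pedestal, gain):
    """Toy per-chip correction (assumed form): subtract the pedestal, divide by the gain."""
    return [(p - pedestal) / gain for p in pixels]


if __name__ == "__main__":
    # With 4 processes, rank 0 handles chips 0, 4, 8, ...
    print(chips_for_rank(0, 4)[:3])
    print(correct_chip([110, 120], pedestal=100, gain=2.0))
```

Because each chip is corrected independently, this stage needs no communication between processes, in contrast to the mosaicking step.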

Use case in Belle II
– ROOT-based data format.
– The DAQ cluster needs cooperation.

Existing framework
BASF: the framework for the Belle experiment
– Successfully used for 10 years.
– Involved in nearly all parts of the experiment: data acquisition, simulation, users' analysis.
– Software pipeline architecture: enables a modular structure of analysis paths, with flexible and dynamic module linking.
– Event-by-event parallel analysis.
Issues to be improved:
– Large data rate: distributed parallelization with inter-process communication.
– ROOT support / object-oriented data flow.
[Diagram: analysis modules chained into a path.]
Upgrade BASF for Belle II and also for HSC.

DISTRIBUTED PARALLEL FRAMEWORK

Parallel framework (ROOBASF)
– Controls analysis paths, like BASF in Belle.
– Data parallel, with inter-process communication.
– Program parallel.
– Python user interface.
– ROOT utilization.
[Diagram: a path of analysis modules distributed over processes (Process 1 to Process 4).]

Parallelization
ROOBASF uses the Message Passing Interface (MPI)
– De facto standard for distributed parallel computing.
– Expected to run in various environments.
Analysis modules use MPI to perform data-parallel algorithms.
– Each pipeline stage is given an MPI group (communicator).
– Within the given group, modules perform parallel processing just like stand-alone MPI programs.
[Diagram: Process group 1, Process group 2.]
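One way to picture "one group per pipeline stage" is a mapping from global ranks to a stage index and a rank within that stage. The sketch below is pure Python and does not call MPI; `assign_stage_groups` and the contiguous layout are assumptions for illustration. In an actual MPI program the stage index could serve as the `color` argument of `MPI_Comm_split`; how ROOBASF builds its groups is not stated in the slides.

```python
def assign_stage_groups(n_ranks, stage_sizes):
    """Map each global rank to (stage index, rank within that stage's group).

    Ranks are handed out contiguously, stage by stage (an assumed layout).
    """
    if sum(stage_sizes) != n_ranks:
        raise ValueError("stage sizes must add up to the number of ranks")
    mapping = {}
    rank = 0
    for stage, size in enumerate(stage_sizes):
        for local_rank in range(size):
            mapping[rank] = (stage, local_rank)
            rank += 1
    return mapping


if __name__ == "__main__":
    # Five processes: two in stage 0, three in stage 1.
    for rank, (stage, local_rank) in assign_stage_groups(5, [2, 3]).items():
        print(f"global rank {rank}: stage {stage}, local rank {local_rank}")
```

A module running inside a stage then only needs its local rank and group size, which is what lets it behave like a stand-alone MPI program.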

Two layers of analysis paths
Sequential paths
– Sequence of analysis modules.
– Conditional branches.
→ All executed in one process.
Parallel paths
– Sequence of processes and conditional branches.
– Each of the processes executes a "sequential path": program parallelization.
– Multiple copies run simultaneously: data parallelization.
[Diagram: analysis modules, conditional branch, processes.]
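A sequential path with a conditional branch can be modeled in a few lines. This is a hypothetical model, not the ROOBASF API: the `Branch` class, `run_path`, and the rule that a taken branch diverts the event away from the rest of the current path are all assumptions made for the sketch.

```python
class Branch:
    """Conditional branch: if `predicate(event)` holds, divert the event to `path`."""

    def __init__(self, predicate, path):
        self.predicate = predicate
        self.path = path


def run_path(path, event):
    """Run the modules of a sequential path in order, inside one process."""
    for step in path:
        if isinstance(step, Branch):
            if step.predicate(event):
                # Assumed semantics: a taken branch replaces the rest of this path.
                return run_path(step.path, event)
        else:
            step(event)
    return event


def calibrate(ev):
    ev["trace"].append("calibrate")


def analyze(ev):
    ev["trace"].append("analyze")


def reject(ev):
    ev["trace"].append("reject")


main_path = [calibrate, Branch(lambda ev: ev["bad"], [reject]), analyze]

if __name__ == "__main__":
    print(run_path(main_path, {"bad": False, "trace": []})["trace"])
    print(run_path(main_path, {"bad": True, "trace": []})["trace"])
```

A parallel path would then place such sequential paths in separate processes and run several copies side by side.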

Data flow
Events
– Event or image data to be analyzed.
Broadcast messages
– Experiment parameters, observation parameters, etc.
– Have to be sent to all modules.
– Must not change order with respect to events.
[Diagram: a broadcast could overtake an event at a conditional branch; the broadcast is suspended until it arrives from all branches.]
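The rule of suspending a broadcast until it has arrived from all branches can be sketched as a merge point with one FIFO queue per branch. The class `BranchMerger` and its `push` method are hypothetical, and the sketch also assumes that each branch delivers its items in order; it is not the ROOBASF implementation.

```python
from collections import deque


class BranchMerger:
    """Merge point after a conditional branch (illustrative model only)."""

    def __init__(self, n_branches):
        self.queues = [deque() for _ in range(n_branches)]
        self.output = []  # items forwarded downstream, in order

    def push(self, branch, kind, payload):
        """Receive an item ("event" or "bcast") from one branch."""
        self.queues[branch].append((kind, payload))
        self._drain()

    def _drain(self):
        progressed = True
        while progressed:
            progressed = False
            # Forward events that are not queued behind a suspended broadcast.
            for q in self.queues:
                while q and q[0][0] == "event":
                    self.output.append(q.popleft())
                    progressed = True
            # Release a broadcast only once it heads every branch queue.
            if all(q and q[0][0] == "bcast" for q in self.queues):
                item = self.queues[0][0]
                if any(q[0] != item for q in self.queues):
                    raise RuntimeError("branches disagree on the next broadcast")
                for q in self.queues:
                    q.popleft()
                self.output.append(item)
                progressed = True


if __name__ == "__main__":
    m = BranchMerger(2)
    m.push(0, "event", "e1")
    m.push(0, "bcast", "b1")   # suspended: branch 1 has not delivered b1 yet
    m.push(0, "event", "e3")   # held behind the suspended broadcast
    m.push(1, "event", "e2")
    m.push(1, "bcast", "b1")   # b1 is now released, followed by e3
    print(m.output)
```

Events sent before the broadcast on any branch are always forwarded before it, and events sent after it are forwarded after it, so the broadcast never changes order with events.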

Utilization of Python
Analysis paths are described in the Python language.
– Modules can also be written inline in the script.
Modules can be quickly developed in Python; if they prove CPU-costly, they can then be rewritten in C++.
→ Efficient development of analysis modules.
Implemented with the boost.python library.
– Python scripts can call native code.
– Native code can call Python scripts. A unique feature of boost.python, absent from SWIG.
[Diagram: ROOBASF and analysis code on the native (C++ etc.) side; path description and analysis code in the Python script; calls go between the two sides.]

Python script

    import boostpbasf as basf

    # Create an instance of the ROOBASF framework.
    f = basf.CFrame()

    # dlopen() "Astr1Chip.so", link the plugin code, and set its parameter.
    f.Plug_Module("Astr1Chip").SetParam("config", "matching.scamp")

    # Define a Python module.
    class Load(basf.CModule):
        def __init__(self, namefmt):
            basf.CModule.__init__(self)
            self.namefmt = namefmt
            self.count = 0

        def event(self, status, ev, comm):
            if status == 0:
                ev.SetFile(self.namefmt % self.count)
            # (……)

    # Create a sequential path "main".
    load = Load("/data/img%03d.fits")
    f.Seq_Add("main", load)
    f.Seq_Add("main", "Astr1Chip")

[Diagram: the "main" path in ROOBASF (native) chains the Python module Load and the native plugin Astr1Chip.so.]

TEST PIPELINE

Pipeline for the test
Data-parallel analysis path (for on-line monitoring):
– Performs pedestal/gain correction
– Checks data quality
– Performs 1-chip astrometry
– Tiny modules in Python: error detector, time watch, etc.
[Diagram: ROOBASF runs three parallel copies of a module chain (labels fused in the transcript as "OSSFLATAGPSTATSEXTASTR") covering CCD image correction, data quality check, and 1-chip astrometry (multi-threaded).]

Test environment
3 PCs only
– x64, 4 cores each
– Linked by Gigabit Ethernet
Number of processes
– 1, 3x1, 3x2, 3x3
Speedup is not expected to scale linearly (even though each CPU has 4 cores) because some modules are multi-threaded.
[Diagram: three PCs, each with a 4-core CPU and an HDD holding input and output images; one HDD also holds the programs, and the diagram carries an NFS label. Configurations shown: 1 process, 3x1 processes, 3x2 processes, 3x3 processes; legend: process with threads.]

Parallelization efficiency
[Plot: speedup, measured as inverse analysis time per image (1/sec), compared with the ideal speedup; legend: process with threads.]
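For reference, speedup and parallel efficiency are conventionally computed from the analysis time per image as shown below. The function names are illustrative and the timings in the usage example are placeholders chosen for the arithmetic, not measurements from this test.

```python
def speedup(t_single, t_parallel):
    """Speedup of a parallel run relative to the single-process run."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, n_procs):
    """Parallel efficiency: achieved speedup divided by the ideal speedup n_procs."""
    return speedup(t_single, t_parallel) / n_procs


if __name__ == "__main__":
    # Placeholder timings in seconds per image (hypothetical, not from the slides).
    t1, t6 = 12.0, 4.0
    print(speedup(t1, t6))        # 3.0
    print(efficiency(t1, t6, 6))  # 0.5
```

An efficiency below 1 for the 3x2 and 3x3 configurations would be consistent with the multi-threaded modules already occupying several cores per process.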

SUMMARY

Summary
Analysis framework: ROOBASF
– Distributed memory (MPI)
– Python script
– ROOT I/O
We built a parallel analysis path for astronomical images.
Feasibility in Belle II is yet to be confirmed.