ALFA: Next generation concurrent framework for ALICE and FAIR experiments. Mohammad Al-Turany, GSI-ExpSys/CERN-PH.


This Talk
- Introduction: GSI/FAIR research and experiments
- Computing at GSI/FAIR: Green Cube, FairRoot
- ALICE-FAIR (ALFA): general concept, project with CERN openlab


Research Areas at GSI
- Plasma physics: hot dense plasma, ion-plasma interaction
- Materials research: ion-solid interactions, structuring of materials with ion beams
- Accelerator technology: linear accelerator, synchrotrons and storage rings
- Biophysics and radiation medicine: radiobiological effect of ions, cancer therapy with ion beams
- Nuclear physics: nuclear reactions, superheavy elements, hot dense nuclear matter
- Atomic physics: atomic reactions, precision spectroscopy of highly charged ions

FAIR - Facility for Antiproton and Ion Research
[Diagram: the existing facility (UNILAC, SIS 18, ESR) and the future facility (SIS 100/300, Super FRS, CR, RESR, NESR, HESR); scale 100 m]
Added value:
- Higher beam intensity
- Higher beam energy
- Anti-matter beams and experiments
- Unique beam quality by beam cooling measures
- Parallel operation
Cost and user community:
- Construction cost > 1 billion Euro
- Scientific users: approx. … per year
Funding (construction):
- 65 % Federal Republic
- 10 % State of Hesse
- 25 % international partners

Experiments at FAIR/GSI
- Heavy-ion physics: CBM
- Hadron physics: PANDA
- Nuclear structure and astrophysics (NUSTAR): Super-FRS, HISPEC/DESPEC, MATS, LASPEC, R3B, ILIMA, AIC, ELISe, EXL
- Atomic physics, plasma physics and applied physics (APPA): SPARC, FLAIR, HEDgeHOB, WDM, BIOMAT
Each collaboration may involve a few dozen or a few hundred scientists. Some experiment set-ups are small, others will cost ca. 80 million euros.

The FAIR Data Center (Green Cube)
- Space for … racks (2.2 m)
- 4 MW cooling (baseline), max. cooling power 12 MW
- Can be used for any commercial IT
- PUE < 1.07
- Baseline building cost 12 M€

Green Cube
- Construction started in Dec. …
- On schedule for start of operation in Q4/2015

FairRoot Framework

The FairRoot framework over time:
- Start testing the VMC concept for CBM
- First release of CbmRoot
- MPD (NICA) also starts using FairRoot
- ASYEOS joined (ASYEOSRoot)
- GEM-TPC separated from the PANDA branch (FOPIRoot)
- PANDA decided to join -> FairRoot: same base package for different experiments
- R3B joined
- EIC (Electron Ion Collider, BNL): EICRoot
- SOFIA (Studies On Fission with Aladin)
- ENSAR-ROOT: collection of modules used by structural nuclear physics experiments
- SHiP (Search for Hidden Particles)

ALICE and FAIR: Why?
Two projects, the same requirements:
- Massive data volume reduction (1 TByte/s input)
- Data reduction by (partial) online reconstruction
- Online reconstruction and event selection
- Much tighter coupling between online and offline reconstruction software
Let's work together.

ALFA
A set of tools and libraries that contains:
- Transport layer (FairMQ, based on ZeroMQ and nanomsg)
- Configuration tools
- Management and monitoring tools
- …
A data-flow based model (message-queue based multi-processing).
Provides unified access to configuration parameters and databases.
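As a rough illustration of what the FairMQ transport layer builds on, below is a minimal sender/receiver pair written directly against ZeroMQ (this is not the FairMQ API; the endpoint, port and payload are made up for the sketch):

// Two separate processes exchanging one message over ZeroMQ PUSH/PULL.
// Build with: g++ demo.cpp -lzmq
#include <zmq.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    void* ctx = zmq_ctx_new();
    if (argc > 1 && std::strcmp(argv[1], "sender") == 0) {
        // "Sampler"-like process: binds and pushes data downstream.
        void* out = zmq_socket(ctx, ZMQ_PUSH);
        zmq_bind(out, "tcp://*:5555");
        const char payload[] = "event data";
        zmq_send(out, payload, sizeof(payload), 0);
        zmq_close(out);
    } else {
        // "Processor"-like process: connects and pulls data.
        void* in = zmq_socket(ctx, ZMQ_PULL);
        zmq_connect(in, "tcp://localhost:5555");
        char buf[256];
        int n = zmq_recv(in, buf, sizeof(buf), 0);
        if (n >= 0) std::printf("received %d bytes: %s\n", n, buf);
        zmq_close(in);
    }
    zmq_ctx_destroy(ctx);
    return 0;
}

Running one instance with the argument "sender" and one without gives two independent processes that exchange the message over TCP, with no shared memory and no locks.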

Scalability through multi-processing with message queues?
- No locking: each process runs at full speed.
- Each process assumes limited communication with, and reliance on, other processes.
- Easier to scale horizontally to meet computing and throughput demands (by starting new instances) than applications that rely exclusively on multiple threads, which can only scale vertically.

Correct balance between reliability and performance
Multi-process concept with message queues for data exchange:
- Each "task" is a separate process, which:
  - can be multithreaded, SIMDized, etc.
  - can run on different hardware (CPU, GPU, Xeon Phi, etc.)
  - can be written in any supported language (bindings for 30+ languages)
- Different topologies of tasks can be adapted to the problem itself and to the hardware capabilities.
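A sketch of one such "task" as a stand-alone pipeline stage, again using plain ZeroMQ rather than the actual ALFA device classes (the endpoints and buffer size are illustrative): it pulls a message from the upstream stage, processes it, and pushes the result downstream. Scaling horizontally then means simply starting more instances of this executable; a PUSH socket load-balances across all connected pullers, and because the instances share no state, no locking is needed.

// One pipeline stage of a multi-process topology.
#include <zmq.h>

int main() {
    void* ctx = zmq_ctx_new();

    void* in  = zmq_socket(ctx, ZMQ_PULL);   // data from the previous stage
    void* out = zmq_socket(ctx, ZMQ_PUSH);   // results to the next stage
    zmq_connect(in,  "tcp://localhost:5555");
    zmq_connect(out, "tcp://localhost:5556");

    char buf[65536];
    while (true) {
        int n = zmq_recv(in, buf, sizeof(buf), 0);
        if (n < 0) break;
        // ... process the message here; internally this process is free to
        // use threads, SIMD, or offload work to an accelerator ...
        zmq_send(out, buf, n, 0);            // forward the (processed) data
    }

    zmq_close(in);
    zmq_close(out);
    zmq_ctx_destroy(ctx);
    return 0;
}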

ALFA uses FairMQ to connect the different pieces together.
[Diagram: devices connected through FairMQ, running on top of ZeroMQ or nanomsg]

Co-Processors in ALFA
Up to now:
- Offload portions of code to the co-processors
- GPUs, FPGAs and Xeon Phi support this mode of operation
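The offload mode of operation can be sketched generically with standard OpenMP 4.0 target directives (one possible mechanism among several; the slide does not prescribe a specific offload API, and the kernel below is a toy example, not reconstruction code):

// Offload a data-parallel loop to an attached accelerator.
// Build with an OpenMP-capable compiler, e.g. g++ -fopenmp demo.cpp
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    float* px = x.data();
    float* py = y.data();

    // Copy the inputs to the co-processor, run the loop there, copy y back.
    #pragma omp target map(to: px[0:n]) map(tofrom: py[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        py[i] = 3.0f * px[i] + py[i];
    }

    std::printf("y[0] = %f\n", py[0]);
    return 0;
}

On a machine without an attached accelerator the target region simply runs on the host, which keeps such code portable.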

GPUs in ALFA (Ludovico Bianchi)

Xeon Phi (Aram Santogidis)

Xeon Phi & FairMQ (Aram Santogidis)
- Work is ongoing on a transport mechanism on top of SCIF that can be integrated into FairMQ (ZeroMQ and nanomsg).
- Direct support of SCIF in FairMQ will improve the data transport performance from/to the Xeon Phi tremendously.
- The solution developed for Knights Corner will later be ported to Knights Landing and Omni-Path.

Future Plans
- Port compute-intensive code to the Xeon Phi and use it via traditional offloading (the message handling will be done exclusively on the host).
- As soon as direct support of SCIF in FairMQ becomes available, the prospects of using messages directly for communication with the accelerator cards will be investigated.

Backup

Integrating ALICE-HLT code into ALFA (Matthias Richter)

Results (Aram Santogidis)

Heterogeneous Platforms: Message format
The framework does not impose any format on messages. It supports different serialization standards:
- BOOST C++ serialization
- Google's protocol buffers
- ROOT
- User defined
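For the BOOST (or user-defined) option, a minimal sketch of turning an object into a byte payload that could be handed to the transport, and back, might look as follows (the Hit struct and its fields are hypothetical, purely for illustration):

// Serialize an object to a string payload and restore it on the other side.
// Build with: g++ demo.cpp -lboost_serialization
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <sstream>
#include <string>
#include <iostream>

struct Hit {
    int   detectorId;
    float x, y, z;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & detectorId & x & y & z;
    }
};

int main() {
    // Sender side: turn the object into bytes for the message queue.
    Hit hit = {42, 1.f, 2.f, 3.f};
    std::ostringstream os;
    {
        boost::archive::text_oarchive oa(os);
        oa << hit;
    }
    std::string payload = os.str();   // would be handed to the transport

    // Receiver side: rebuild the object from the received bytes.
    Hit received;
    std::istringstream is(payload);
    {
        boost::archive::text_iarchive ia(is);
        ia >> received;
    }
    std::cout << "detectorId = " << received.detectorId << "\n";
    return 0;
}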

How to deploy ALFA on a laptop, a few PCs, or a cluster?
DDS: Dynamic Deployment System
- Users describe the desired tasks and their dependencies using topology files; the system takes this so-called "topology file" as its input.
- Users are provided with a web GUI to create the topology (it can also be created manually).

DDS: Basic concepts
- Implements a single-responsibility-principle command-line tool-set and APIs
- Treats users' tasks as black boxes
- Doesn't depend on an RMS (provides deployment via SSH when no RMS is present)
- Supports workers behind firewalls
- Doesn't require pre-installation on worker nodes (WNs)
- Deploys private facilities on demand with isolated sandboxes
- Provides a key-value property propagation service for tasks
- Provides rule-based execution of tasks

Tests of ALFA devices via DDS

Devices (user tasks)                      | Startup time* | Propagated key-value properties
2721 (1360 FLP + 1360 EPN + 1 Sampler)    | 17 s          | ~6x10^6
5441 (2720 FLP + 2720 EPN + 1 Sampler)    | 58 s          | ~23x10^6
10081 (5040 FLP + 5040 EPN + 1 Sampler)   | 207 s         | ~77x10^6

FLP = First Level Processor, EPN = Event Processing Node.
N-to-N connections:
- Each EPN needs to connect to all FLPs (time frame building)
- Each FLP needs to connect to all EPNs (heartbeat signal)
Throughout the tests only one DDS commander server was used. In the future we will support multiple distributed servers. Our current short-term target is to manage 100K devices and keep the startup time of the whole deployment within … sec.
* Startup time: the time it took DDS to distribute the user tasks and propagate all needed properties, plus the time it took the devices to bind/connect and enter the RUN state.

ALFA and FairRoot
[Diagram: the software stack. Libraries and tools: ROOT, Geant3, Geant4, Geant4_VMC, VGM, CMake (building, configuration, testing), ZeroMQ, DDS, BOOST, Protocol Buffers. FairRoot layer: Runtime DB, Module, Detector, Magnetic Field, Event Generator, MC Application, FairMQ. Experiment frameworks on top of FairRoot/ALFA: CbmRoot, PandaRoot, AsyEosRoot, R3BRoot, SofiaRoot, MPDRoot, FopiRoot, EICRoot, ShipRoot, AliRoot6 (O2)????]

Data Center State of the Art (Volker Lindenstruth, May 22, 2012, copyright Goethe Uni, all rights reserved, www.compeng.de; source: EYP Mission Critical Facilities Inc., New York)
Power Usage Effectiveness: PUE = Total Facility Power / IT Equipment Power (typical PUE = 1.6 to 3.5).
Example power breakdown for PUE = 2:
- IT equipment: 50%
- Back cooling: 25%
- Data center cooling: 12%
- Electrical power, UPS: 10%
- Light, etc.: 3%
Note: 12 ct/kWh -> 1 W = 1.05 €/a.
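Spelling out the note's arithmetic (assuming 8760 hours per year and the quoted 12 ct/kWh):

\[
\mathrm{PUE} = \frac{\text{total facility power}}{\text{IT equipment power}},
\qquad
1\,\mathrm{W} \times 8760\,\mathrm{h/a} = 8.76\,\mathrm{kWh/a},
\qquad
8.76\,\mathrm{kWh/a} \times 0.12\,\mathrm{EUR/kWh} \approx 1.05\,\mathrm{EUR/a}.
\]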