ALFA - a common concurrency framework for ALICE and FAIR experiments

Slides:



Advertisements
Similar presentations
MicroKernel Pattern Presented by Sahibzada Sami ud din Kashif Khurshid.
Advertisements

COM vs. CORBA.
CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Resource Containers: A new Facility for Resource Management in Server Systems G. Banga, P. Druschel,
Introduction CSCI 444/544 Operating Systems Fall 2008.
23/04/2008VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA 1 First thoughts for KM3Net on-shore data storage and distribution Facilities VLV.
Technical Architectures
ALFA: The new ALICE-FAIR software framework
Software Frameworks for Acquisition and Control European PhD – 2009 Horácio Fernandes.
CHEP03 - UCSD - March 24th-28th 2003 T. M. Steinbeck, V. Lindenstruth, H. Tilsner, for the Alice Collaboration Timm Morten Steinbeck, Computer Science.
The new The new MONARC Simulation Framework Iosif Legrand  California Institute of Technology.
The Architecture of Transaction Processing Systems
Status and roadmap of the AlFa Framework Mohammad Al-Turany GSI-IT/CERN-PH-AIP.
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
FairRoot framework Mohammad Al-Turany (GSI-Scientific Computing)
.NET, and Service Gateways Group members: Andre Tran, Priyanka Gangishetty, Irena Mao, Wileen Chiu.
Flexible data transport for the online reconstruction of FAIR experiments Mohammad Al-Turany Dennis Klein Alexey Rybalchenko (GSI-IT) 5/17/13M. Al-Turany,
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over the Internet. Cloud is the metaphor for.
©Ian Sommerville 2006Software Engineering, 8th edition. Chapter 12 Slide 1 Distributed Systems Architectures.
SensIT PI Meeting, January 15-17, Self-Organizing Sensor Networks: Efficient Distributed Mechanisms Alvin S. Lim Computer Science and Software Engineering.
1 Chapter Client-Server Interaction. 2 Functionality  Transport layer and layers below  Basic communication  Reliability  Application layer.
Institute of Computer and Communication Network Engineering OFC/NFOEC, 6-10 March 2011, Los Angeles, CA Lessons Learned From Implementing a Path Computation.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
DISTRIBUTED COMPUTING
Boosting Event Building Performance Using Infiniband FDR for CMS Upgrade Andrew Forrest – CERN (PH/CMD) Technology and Instrumentation in Particle Physics.
CS 390- Unix Programming Environment CS 390 Unix Programming Environment Topics to be covered: Distributed Computing Fundamentals.
Integrating HPC into the ATLAS Distributed Computing environment Doug Benjamin Duke University.
BLU-ICE and the Distributed Control System Constraints for Software Development Strategies Timothy M. McPhillips Stanford Synchrotron Radiation Laboratory.
April 2000Dr Milan Simic1 Network Operating Systems Windows NT.
ATCA based LLRF system design review DESY Control servers for ATCA based LLRF system Piotr Pucyk - DESY, Warsaw University of Technology Jaroslaw.
Types of Operating Systems
Middleware for FIs Apeego House 4B, Tardeo Rd. Mumbai Tel: Fax:
Advanced Computer Networks Topic 2: Characterization of Distributed Systems.
© 2004 Mercury Computer Systems, Inc. FPGAs & Software Components Graham Bardouleau & Jim Kulp Mercury Computer Systems, Inc. High Performance Embedded.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”
PARALLEL COMPUTING overview What is Parallel Computing? Traditionally, software has been written for serial computation: To be run on a single computer.
Operating System 2 Overview. OPERATING SYSTEM OBJECTIVES AND FUNCTIONS.
Management of the LHCb DAQ Network Guoming Liu * †, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
INFORMATION SYSTEM-SOFTWARE Topic: OPERATING SYSTEM CONCEPTS.
CS533 - Concepts of Operating Systems 1 The Mach System Presented by Catherine Vilhauer.
University of Toronto at Scarborough © Kersti Wain-Bantin CSCC40 system architecture 1 after designing to meet functional requirements, design the system.
11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.
Types of Operating Systems 1 Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
Abstract A Structured Approach for Modular Design: A Plug and Play Middleware for Sensory Modules, Actuation Platforms, Task Descriptions and Implementations.
Chapter 13 – I/O Systems (Pgs ). Devices  Two conflicting properties A. Growing uniformity in interfaces (both h/w and s/w): e.g., USB, TWAIN.
CSI 3125, Preliminaries, page 1 SERVLET. CSI 3125, Preliminaries, page 2 SERVLET A servlet is a server-side software program, written in Java code, that.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
Data Communications and Networks Chapter 9 – Distributed Systems ICT-BVF8.1- Data Communications and Network Trainer: Dr. Abbes Sebihi.
Tool Integration with Data and Computation Grid “Grid Wizard 2”
ALFA - a common concurrency framework for ALICE and FAIR experiments Mohammad Al-Turany GSI-IT/CERN-PH.
Lesson 1 1 LESSON 1 l Background information l Introduction to Java Introduction and a Taste of Java.
CWG13: Ideas and discussion about the online part of the prototype P. Hristov, 11/04/2014.
CLIENT SERVER COMPUTING. We have 2 types of n/w architectures – client server and peer to peer. In P2P, each system has equal capabilities and responsibilities.
Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.
Mitglied der Helmholtz-Gemeinschaft FairMQ with FPGAs and GPUs Simone Esch –
Flexible data transport for online reconstruction M. Al-Turany Dennis Klein A. Rybalchenko 12/05/12 M. Al-Turany, Panda Collaboration Meeting, Goa 1.
Flexible data transport for the online reconstruction in FairRoot Mohammad Al-Turany Dennis Klein Anar Manafov Alexey Rybalchenko 6/25/13M. Al-Turany,
ALFA - a common concurrency framework for ALICE and FAIR experiments Mohammad Al-Turany GSI-IT/CERN-PH.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Fermilab Scientific Computing Division Fermi National Accelerator Laboratory, Batavia, Illinois, USA. Off-the-Shelf Hardware and Software DAQ Performance.
Data Transport for Online & Offline Processing
Grid Computing.
#01 Client/Server Computing
Lecture Topics: 11/1 General Operating System Concepts Processes
Chapter 1 Introduction.
MPJ: A Java-based Parallel Computing System
Prof. Leonardo Mostarda University of Camerino
Overview of Workflows: Why Use Them?
#01 Client/Server Computing
Presentation transcript:

ALFA - a common concurrency framework for ALICE and FAIR experiments 4/20/2017 ALFA - a common concurrency framework  for ALICE and FAIR experiments Mohammad Al-Turany GSI-IT/CERN-PH M. Al-Turany

M. Al-Turany, Annual Concurrency Forum meeting This Talk Introduction Transport layer Deployment system Serializing message data Summary (Example) April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting ALFA Common layer for parallel processing. Common algorithms for data processing. Common treatment of conditions database. Common deployment and monitoring infrastructure. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting ALFA Will rely on a data-flow based model (Message Queues). It will contain Transport layer (based on: ZeroMQ, NanoMSG) Configuration tools Management and monitoring tools Provide unified access to configuration parameters and databases. It will include support for a heterogeneous and distributed computing system. Incorporate common data processing components April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting Design constrains Highly flexible: different data paths should be modeled. Adaptive: Sub-systems are continuously under development and improvement Should work for simulated and real data: developing and debugging the algorithms It should support all possible hardware where the algorithms could run (CPU, GPU, FPGA) It has to scale to any size! With minimum or ideally no effort. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Before we start? Do we intend to separate online and offline? NO A message queue based system would: Decouple producers from consumers. Spread the work to be done over several processes and machines. We can manage/upgrade/move around programs (processes) independently of each other. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Multi-processing vs. Multi-threading Different processes are insulated from each other by the OS, an error in one process cannot bring down another process. Inter-process communication can be used across network Error in one thread can bring down all the threads in the process. Inter-thread communication is fast April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

ALFA will use both: Multi-processing and Multi-threading April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Correct balance between reliability and performance Multi-process concept with message queues for data exchange Each "Task" is a separate process, which can be also multithreaded, and the data exchange between the different tasks is done via messages. Different topologies of tasks that can be adapted to the problem itself, and the hardware capabilities. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting This Talk Introduction Transport layer Deployment system Serializing message data Summary (Example) April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

What is available on the market and in the community? RabbitMQ: One of the leading implementation of the AMQP protocol. It implements a broker architecture Advanced scenarios like routing, load balancing or persistent message queuing are supported in just a few lines of code. Less scalable and “slower” because the central node adds latency and message envelopes are quite big. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

What is available on the market and in the community? ZeroMQ A very lightweight messaging system specially designed for high throughput/low latency scenarios Zmq supports many advanced messaging scenarios but contrary to RabbitMQ, you’ll have combine the various pieces of the framework yourself (e.g : sockets and devices). April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

What is available on the market and in the community? ActiveMQ is in the middle ground. Like Zmq, it can be deployed with both broker and P2P topologies. Like RabbitMQ, it’s easier to implement advanced scenarios but usually at the cost of raw performance April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

ALFA will use ZMQ to connect different pieces together BSD sockets API Bindings for 30+ languages Lockless and Fast Automatic re-connection Multiplexed I/O April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting ZeroMQ is a New Library April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Can be easily adapted by regular programmers! April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Works across all types of networks Network April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting Fastest way to make multithreaded apps that scale linearly to any number of cores April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

nanomsg is under development by the original author of ZeroMQ Pluggable Transports: ZeroMQ has no formal API for adding new transports (Infiniband, WebSeockets, etc). nanomsg defines such API, which simplifies implementation of new transports. Zero-Copy: Better zero-copy support with RDMA and shared memory, which will improve transfer rates for larger data for inter-process communication. Simpler interface: simplifies some zeromq concepts and API, for example, it no longer needs Context class. Numerous other improvements, described here: http://nanomsg.org/documentation-zeromq.html FairRoot is independent from the transport library Modular/Pluggable/Switchable transport libraries. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

ALFA has an abstract transport layer April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting Current Status The Framework delivers some components which can be connected to each other in order to construct a processing pipeline(s). All components share a common base called Device Devices are grouped by three categories: Source: Data Readers (Simulated, raw) Message-based Processor: Sink, Splitter, Merger, Buffer, Proxy Content-based Processor: Processor April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting This Talk Introduction Transport layer Deployment system Serializing message data Summary (Example) April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

How to deploy ALFA on a laptop, few PCs or a cluster? We need to utilize any RMS (Resource Management system) We should also run with out RMS Support different topologies and process dependencies April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

STORM is a very attractive option but no native support for C++ ! 4/20/2017 STORM is a very attractive option but no native support for C++ ! http://storm.incubator.apache.org/about/multi-language.html April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting  M. Al-Turany

How to deploy ALFA on a laptop, few PCs or a cluster? We have to develop a dynamic deployment system (DDS): An independent set of utilities and interfaces, which provide a dynamic distribution of different user processes by any given topology on any RMS. April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

The Dynamic Deployment System (DDS) Should: Deploy task or set of tasks Use (utilize) any RMS (Slurm, Grid Engine, … ), Secure execution of nodes (watchdog), Support different topologies and task dependencies Support a central log engine …. See Talk by Anar Manafov on Alice Offline week (March 2014) https://indico.cern.ch/event/305441/ April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting This Talk Introduction Transport layer Deployment system Serializing message data Summary (Example) April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting 4/20/2017 Protocol buffers Google Protocol Buffers support is now implemented Example in Tutorial 3 in FairRoot. To use protobuf, run cmake as follows: cmake -DUSE_PROTOBUF=1 April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting  M. Al-Turany

M. Al-Turany, Annual Concurrency Forum meeting Boost serialization Code portability - depend only on ANSI C++ facilities. Code economy - exploit features of C++ such as RTTI, templates, and multiple inheritance, etc. where appropriate to make code shorter and simpler to use. Independent versioning for each class definition. That is, when a class definition changed, older files can still be imported to the new version of the class. Deep pointer save and restore. That is, save and restore of pointers saves and restores the data pointed to. …. http://www.boost.org/doc/libs/1_55_0/libs/serialization/doc/index.html This is used already by the CBM Online group in Frankfurt and to we need it to exchange data with them! April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Protobuf, Boost or Manual serialization? we are generic in the tasks but intrusive in the data classes (digi, hit, timestamp) Manual and Protobuf we are generic in the class but intrusive in the tasks (need to fill/access payloads from class with set/get x, y, z etc ). Manual method is still the fastest, protobuf  is 20% slower and boost is 30% slower. preliminary April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting Summary The AlFA project is starting The Communication layer is ready to use Different data serialization methods are available DDS and Topology parser are under development April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting Backup and Discussion April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Integrating the existing software: ROOT Files, Lmd Files, Remote event server, … Root (Event loop) ZeroMQ FairRootManager FairRunAna FairTasks Init() Re-Init() Exec() Finish() FairMQProcessorTask Init() Re-Init() Exec() Finish() April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting FairRoot: Example 3 4 -Tracking stations with a dipole field Simulation: 10k event: 300 Protons/ev Digitization Reconstruction: Hit/Cluster Finder April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory TClonesArray 100 s 263 MByte Throughput ~ 1000 ev/s Total Memory 263 MByte April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory 171 s 6 * 263 MByte TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray Throughput ~ 3500 ev/s Total Memory: 1578 MByte Wall time: 171 s Total Event: 60k events April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory 300 s 8 * 263 MByte TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray TClonesArray Wall time: 300 s Total Event: 80k events Throughput ~ 2660 ev/s Total Memory: 2104 MByte April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

Before we continue: sampler Process that read from ROOT files and send each entry as a massege Proxy Bind on Input and Output sink Get payloads, save/convert to ROOT Objects (TClonesArrays) then to files processor Device class that contains the FairTask April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory 17.1 s 77 Mb 26.1 s 211 Mb 4x 2x 17.3 s 168 Mb processor 2x sampler processor sink Proxy Proxy processor sampler sink 8.5 s 113 Mb 7,1 s 176 Mb Push processor Push Throughput ~ 7400 ev/s Total Memory 1355 MByte Wall time: 26.1 s Total Event: 20k events April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

2 x 2.4 Xeon Quad core Intel Xeon 16 GB Memory 33 s 86 Mb 36.4 s 228 Mb 22 s 151 Mb 4x 3x 4x sampler processor sink Push Push sampler processor Proxy Proxy sink sampler processor 12.8 s 105 Mb 15.6 s 175Mb sink sampler processor Throughput ~ 10990 ev/s Total Memory: 1308 MByte Wall time: 36.4 s Total Event: 40k events Gigabit Ethernet April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

M. Al-Turany, Annual Concurrency Forum meeting In Summary: Throughput ~ 1000 ev/s Total Memory 263 MByte embarrassingly parallel 6-processes Throughput ~ 3500 ev/s Total Memory: 1578 MByte embarrassingly parallel 8-processes Throughput ~ 2660 ev/s Total Memory: 2104 MByte Concurrent 4 processes 2 file readers 2 file sinks Throughput ~ 7400 ev/s Total Memory 1355 MByte Concurrent 4 processes Separated reader Throughput ~ 10990 ev/s Total Memory: 1308 MByte April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting 

The built-in core ØMQ patterns are: Request-reply, which connects a set of clients to a set of services. (remote procedure call and task distribution pattern) Publish-subscribe, which connects a set of publishers to a set of subscribers. (data distribution pattern) Pipeline, which connects nodes in a fan-out / fan-in pattern that can have multiple steps, and loops. (Parallel task distribution and collection pattern) Exclusive pair, which connect two sockets exclusively April 1, 2014 M. Al-Turany, Annual Concurrency Forum meeting