1 February 17, 2015 Software Framework Development P. Hristov for CWG13

2 February 17, 2015 General news and development

3 February 17, 2015 Doxygen documentation M. Al-Turany

4 February 17, 2015 CDash/CTest is running M. Al-Turany

5 February 17, 2015 Modular build option and preparation for Geant4 multi-threading
–The modular build does not require the environment variables SIMPATH and FAIRROOTPATH
–It is triggered with ALICEO2_MODULAR_BUILD
–Migration of the MC-related classes for multi-threading
I. Hrivnacova
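As a rough illustration, the modular build could be enabled by passing the flag at CMake configuration time; only the option name ALICEO2_MODULAR_BUILD comes from the slide, the paths and remaining options below are placeholders:

    # hypothetical invocation; only the ALICEO2_MODULAR_BUILD flag is taken from the slide
    cmake -DALICEO2_MODULAR_BUILD=ON \
          -DCMAKE_INSTALL_PREFIX=/opt/alice/o2 \
          /path/to/AliceO2
    make -j8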

6 February 17, 2015 OCDB in AliceO2: O2CDB
–Import STEER/CDB/ from AliRoot as the starting point of the O2CDB
–Apply the naming conventions to it
–Introduce a FairMQ server and client (see the sketch below)
R. Grosso
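A minimal sketch of the request/reply pattern such a condition-database server and client could follow, written here with plain ZeroMQ rather than the actual FairMQ device classes; the object path and the payload handling are placeholders, not the real O2CDB interface:

    // cdb_reply_sketch.cpp — illustrative only, not the actual O2CDB/FairMQ code
    #include <zmq.h>
    #include <cstdio>
    #include <cstring>

    int main() {
      void* ctx = zmq_ctx_new();
      void* rep = zmq_socket(ctx, ZMQ_REP);       // server side: answers object requests
      zmq_bind(rep, "tcp://*:5555");              // a client would use ZMQ_REQ and zmq_connect
      while (true) {
        char path[256] = {0};
        zmq_recv(rep, path, sizeof(path) - 1, 0); // client sends an object path, e.g. "TPC/Calib/Example"
        // here the real server would look the object up in the CDB storage
        const char* payload = "serialized-object-placeholder";
        zmq_send(rep, payload, std::strlen(payload), 0);
        std::printf("served %s\n", path);
      }
      zmq_ctx_destroy(ctx);
      return 0;
    }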

7 February 17, 2015 Magnetic Field
–Separate the Chebyshev classes from the field library into the MathUtil module (they can be used for other applications)
–Remove obsolete files (mfchebKGI_meas.root)
–mfchebKGI_sym.root is rewritten with the parameterization stored in the new AliceO2::Field::MagneticWrapperChebyshev objects, so no #pragma rules are needed
R. Shahoyan
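A quick way to check the rewritten file is to open it in ROOT and list its keys; only the file name and the class AliceO2::Field::MagneticWrapperChebyshev come from the slide, the key name used below is a placeholder:

    // inspect_field_sketch.C — run with: root -l inspect_field_sketch.C
    // requires the AliceO2 field library to be loaded so the class dictionary is known
    void inspect_field_sketch() {
      TFile* f = TFile::Open("mfchebKGI_sym.root");
      if (!f || f->IsZombie()) { printf("cannot open mfchebKGI_sym.root\n"); return; }
      f->ls();                          // list the stored keys; the wrapper's key name is not given on the slide
      TObject* wrap = f->Get("MagField");  // hypothetical key name, for illustration only
      if (wrap) wrap->Print();          // any TObject-derived class supports Print()
      f->Close();
    }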

8 February 17, 2015 TPC detector in AliceO2
–Empty implementation of the detector with all the classes needed to communicate with the framework; a library is also created
–This is the first step needed to port the TPC code from AliRoot to AliceO2 (see the skeleton sketch below)
M. Al-Turany
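A minimal sketch of what such an empty detector class typically looks like, assuming the standard FairDetector interface; the class name and members are illustrative, not necessarily those used in AliceO2:

    // TpcDetectorSketch.h — illustrative skeleton of an "empty" FairDetector implementation
    #include "FairDetector.h"
    #include "TClonesArray.h"

    class TpcDetectorSketch : public FairDetector {
     public:
      TpcDetectorSketch() : FairDetector("TPC", kTRUE, 0), fHits(nullptr) {}
      virtual ~TpcDetectorSketch() { delete fHits; }

      virtual void   ConstructGeometry() {}                                    // geometry to be added later
      virtual Bool_t ProcessHits(FairVolume* /*vol*/ = 0) { return kTRUE; }    // no hits created yet
      virtual void   Register() {}                                             // would register the hit branch
      virtual TClonesArray* GetCollection(Int_t /*iColl*/) const { return fHits; }
      virtual void   Reset() {}

     private:
      TClonesArray* fHits;                 // hit container, unused in the empty implementation
      ClassDef(TpcDetectorSketch, 1);      // ROOT dictionary hook, generated by the build
    };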

9 February 17, 2015 Groundwork for the Hough Transform implementation
–First step needed to implement the Hough Transform algorithm for AliceO2
–The runHough executable takes an event number as an argument (e.g. runHough 032) and, for the given event, loads all clusters from the corresponding data files
–More documentation in Doxygen
C. Kouzinopoulos
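For reference, a generic sketch of the straight-line Hough transform that such an implementation builds on: each cluster (x, y) votes for the (theta, r) bins it is compatible with, and peaks in the accumulator correspond to track candidates. This is a textbook illustration, not the AliceO2 code:

    // hough_sketch.cpp — generic line Hough transform over a set of cluster positions
    #include <cstdio>
    #include <cmath>
    #include <utility>
    #include <vector>

    int main() {
      std::vector<std::pair<float, float>> clusters = {{1, 1}, {2, 2}, {3, 3}, {4, 4.1f}};
      const int nTheta = 180, nR = 200;
      const float rMax = 10.f;
      const float kPi = 3.14159265f;
      std::vector<int> accumulator(nTheta * nR, 0);

      for (const auto& c : clusters) {
        for (int t = 0; t < nTheta; ++t) {
          float theta = t * kPi / nTheta;
          float r = c.first * std::cos(theta) + c.second * std::sin(theta);  // line: r = x cos(theta) + y sin(theta)
          int rBin = static_cast<int>((r + rMax) / (2 * rMax) * nR);
          if (rBin >= 0 && rBin < nR) ++accumulator[t * nR + rBin];
        }
      }

      // the bin with the most votes is the best line candidate
      int best = 0;
      for (int i = 1; i < (int)accumulator.size(); ++i)
        if (accumulator[i] > accumulator[best]) best = i;
      std::printf("best bin: theta index %d, r index %d, votes %d\n",
                  best / nR, best % nR, accumulator[best]);
      return 0;
    }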

10 February 17, 2015 DDS – Dynamic Deployment System
Current stable release: DDS v0.8 (2015-02-16, http://dds.gsi.de/download.html)
Some highlights of this release:
–fully functional key-value property propagation feature,
–the start-up time of DDS agents has been dramatically reduced due to several improvements in the ssh plug-in,
–the ssh plug-in supports multiple agents per host,
–added stop (restart) of task execution at runtime,
–added several functional tests to monitor stability, scalability and performance of DDS,
–many internal speed and reliability improvements.
A. Manafov, A. Lebedev

11 February 17, 2015 DDS transport protocol
This release delivers several major improvements in the DDS protocol. For example, the transport learned to accumulate commands before sending, instead of sending them one by one (see the sketch below), which has significantly improved key-value propagation performance. In tests, one DDS commander server propagated more than 1.5M key-value properties in less than 30 s.
A. Manafov, A. Lebedev
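A minimal sketch of the accumulate-then-send idea in generic C++ (not the DDS protocol code): commands are appended to a buffer and flushed as one network write once a count threshold is reached.

    // batch_sender_sketch.cpp — generic command accumulation before sending
    #include <cstdio>
    #include <string>
    #include <vector>

    class BatchSender {
     public:
      explicit BatchSender(std::size_t maxBatch) : fMaxBatch(maxBatch) {}

      void Queue(const std::string& cmd) {
        fPending.push_back(cmd);
        if (fPending.size() >= fMaxBatch) Flush();           // send many commands in one go
      }

      void Flush() {
        if (fPending.empty()) return;
        std::string frame;
        for (const auto& c : fPending) frame += c + '\n';     // pack the commands into one frame
        std::printf("sending 1 frame with %zu commands (%zu bytes)\n",
                    fPending.size(), frame.size());           // stand-in for the real network write
        fPending.clear();
      }

     private:
      std::size_t fMaxBatch;
      std::vector<std::string> fPending;
    };

    int main() {
      BatchSender sender(100);
      for (int i = 0; i < 250; ++i) sender.Queue("set key_" + std::to_string(i));
      sender.Flush();  // push out the remainder
      return 0;
    }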

12 February 17, 2015 Current DDS status
Current stable release: DDS v0.8 (2015-02-16, http://dds.gsi.de/download.html)
Home site: http://dds.gsi.de
User's Manual: http://dds.gsi.de/documentation.html
Continuous integration: http://demac012.gsi.de:22001/waterfall
Source code:
https://github.com/FairRootGroup/DDS
https://github.com/FairRootGroup/DDS-user-manual
https://github.com/FairRootGroup/DDS-web-site
https://github.com/FairRootGroup/DDS-topology-editor
A. Manafov, A. Lebedev

13 February 17, 2015 Prototype development, tests, and results

14 February 17, 2015 Payload for the tests (FairMQ multipart message): a header (timeframe ID, FLP ID) plus data of configurable size (see the sketch below)
Test performance & scalability of the FLP2EPN devices. Example test setup:
–A sync sampler publishes timeframe IDs at a configurable rate.
–FLPs generate dummy data of configurable size and distribute it to the available EPNs (availability is ensured with heartbeats).
–Upon collecting sub-timeframes from all FLPs, the EPN sends a confirmation with the timeframe ID back to the sampler to measure the round-trip time.
–EPNs can also measure the intervals between receives from the same FLP (used to see the effect of traffic shaping).
–The devices can switch between test mode (as described above) and default mode, where the FLPs receive data instead of generating it (as demonstrated in the ALICE HLT scenario by Matthias Richter).
Deployment and execution with DDS: tests with DDS showed fast deployment and execution of over 1280 processes on the cluster.
Tests on the HLT development cluster (thanks!): 42 nodes [12 x Intel Xeon E5520 (16 cores), 30 x AMD Opteron 6172 (24 cores); 30 nodes with GPU].
A. Rybalchenko
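A minimal sketch of what the test payload could look like as a two-part message: part 1 is a small fixed-size header carrying the timeframe and FLP IDs named on the slide, part 2 is the configurable dummy data. The field types and sizes are assumptions for illustration:

    // subtimeframe_header_sketch.h — illustrative header for the test payload (part 1 of the multipart message)
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct SubTimeframeHeader {
      uint64_t timeframeId;  // ID published by the sync sampler
      uint32_t flpId;        // which FLP produced this sub-timeframe
    };

    // part 2: dummy data of configurable size, filled by the FLP in test mode
    inline std::vector<char> MakeDummyData(std::size_t sizeBytes) {
      return std::vector<char>(sizeBytes, 0);
    }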

15 February 17, 2015 IP over InfiniBand: performance difference due to CPU architecture
–Intel Xeon E5520 reaches ~2.4-2.6 GByte/s without data loss (here 2 GByte/s was configured for the initial test).
–AMD Opteron 6172 reaches only ~1.6-1.7 GByte/s without data loss.
–For a fair scalability comparison, the maximum data rate per FLP node is set to 1.6 GByte/s for all FLP nodes.
A. Rybalchenko

16 February 17, 2015 Scalability: n FLPs to 1 EPN
[Plots: throughput on the FLPs and on the EPNs for topologies 2x1, 4x1, 8x1, 16x1, 32x1; monitoring output]
–Fixed rate per FLP node: (1.6 GByte/s) / #FLPs
–2 FLP devices per node and 3 EPN devices per node (resulted in best performance)
–No data loss!
A. Rybalchenko

17 February 17, 2015 Scalability: 5 FLPs to n EPNs
[Plots: throughput on the EPNs for topologies 5x35, 5x30, 5x25, 5x20; monitoring output]
–Fixed rate per FLP node: ~1 GByte/s
–2 FLP devices per node and 3 EPN devices per node (resulted in best performance)
–No data loss!
A. Rybalchenko

18 February 17, 2015 Scalability: n FLPs to n EPNs
[Plots: throughput on the EPNs for topologies 10x10, 15x15, 20x20]
–Fixed rate per FLP node: 1.6 GByte/s
–2 FLP devices per node and 3 EPN devices per node (resulted in best performance)
–No data loss!
A. Rybalchenko

19 February 17, 2015 Traffic shaping on the FLPs
–Traffic shaping is implemented on the FLPs to allow a more balanced and predictable traffic flow.
–By "staggering" the sending of the data on the FLPs, simultaneous transfers from many FLPs to the same EPN can be prevented (see the sketch below).
–The buffer size for the staggering is the product of a configurable device priority and the message size.
–The effect is demonstrated by measuring, on an EPN device, the time interval between receives from the same FLP in the histograms (no staggering vs. with staggering).
–Topology: 18 FLPs -> 27 EPNs (9 and 9 nodes), HLT dev cluster, 2.2 GByte/s throughput per EPN, over 2 hours run time.
A. Rybalchenko
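A minimal sketch of the staggering idea in generic C++ (not the actual FLP device code): each FLP holds back outgoing messages until it has filled a buffer of priority x messageSize bytes, so FLPs with different priorities start their transfers at different times.

    // stagger_sketch.cpp — delay each FLP's first sends until a buffer of
    // (priority * messageSize) bytes has accumulated, offsetting FLPs in time
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    class StaggeredSender {
     public:
      StaggeredSender(std::size_t priority, std::size_t messageSize)
          : fThreshold(priority * messageSize), fMessageSize(messageSize) {}

      // hand a message to the sender; returns the messages that go on the wire now
      std::vector<std::string> Offer(std::string msg) {
        std::vector<std::string> out;
        if (!fStarted) {
          fHeld.push_back(std::move(msg));
          if (fHeld.size() * fMessageSize >= fThreshold) {  // stagger buffer is full
            fStarted = true;
            out.swap(fHeld);  // release everything, shifted in time w.r.t. other FLPs
          }
        } else {
          out.push_back(std::move(msg));  // steady state: send immediately
        }
        return out;
      }

     private:
      std::size_t fThreshold;
      std::size_t fMessageSize;
      bool fStarted = false;
      std::vector<std::string> fHeld;
    };

    int main() {
      StaggeredSender flpA(1, 3), flpB(3, 3);  // hypothetical FLPs, 3-byte "messages"
      for (int i = 0; i < 6; ++i)
        std::printf("step %d: FLP A sends %zu msg(s), FLP B sends %zu msg(s)\n", i,
                    flpA.Offer("abc").size(), flpB.Offer("abc").size());
      return 0;
    }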

20 February 17, 2015 AliceO2 prototype tests
Current setup includes:
–aliceHLTWrapper: interfacing different HLT components
  FilePublisher: publishing of binary cluster data blocks
  BlockFilter: relay of cluster blocks before the FLP device
  TPCCATracker: track finding (CPU and/or GPU)
  TPCCAGlobalMerger: track merging and fitting
  FileWriter: dump of the resulting track data to disk
–testFLP distributed
–testEPN distributed
Code: the AliceO2 dev branch includes the HLT wrapper and the FLP-EPN code
M. Richter

21 February 17, 2015 Prototype setup
–36 data publishers on 36 nodes (one for each TPC slice)
–36 FLP devices -> 28 EPN devices
–28 tracking devices -> 28 track merger devices
–1 data sink
Test data: 1285 events of run 167808 (2011), minimum bias PbPb; low-multiplicity events with ~10 reconstructed tracks and high-multiplicity events with up to 4000 tracks
M. Richter

22 February 17, 2015 Integration of HLT TPC Tracking
Data structures in the HLT TPC tracking: all internal data structures represent the same 7 cluster parameters, but with different precision:
–HWCF cluster: 24 Byte
–Cluster bitstream: 10 Byte
–Transformed cluster: 44 Byte
TPC CA tracker (track finder) on the GPU nodes
M. Richter
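As a rough back-of-the-envelope check (an inference, not stated on the slide): 7 parameters stored as 4-byte floats would take 28 bytes, so the 10-byte bitstream averages roughly 11 bits per parameter, while the 24- and 44-byte formats presumably carry full precision and, in the 44-byte case, additional fields.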

23 February 17, 2015 Processing Topology Performance
–Publishing data at 100 Hz, average sample size 16 MByte (HWCF clusters)
–The topology is processing an aggregated rate of 1.6 GByte/s
[Plots: cluster publishers, tracker, merger]
–Turnaround time for the publishers: 7 ms
–Turnaround time for the tracker/merger: 220 ms (28 EPN branches)
M. Richter
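As a consistency check on these numbers (an inference from the slide, not stated on it): 100 Hz x 16 MByte = 1.6 GByte/s, matching the quoted aggregate rate, and with 28 EPN branches each taking about 220 ms per sample the topology can absorb roughly 28 / 0.22 s, i.e. about 127 samples/s, so it keeps up with the 100 Hz input.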

24 February 17, 2015 Some internal plots, 10 Hz processing setup
–The topology is not saturated at 10 Hz, which is why the input data rate shows this linear trend; this has to be compared at a higher sample rate, but for now the network disk was the limit.
–The turnaround time of 2.5 s matches 10 Hz with 25 processing branches (25 branches / 2.5 s = 10 Hz); it is shown over time.
–A slight increase of the turnaround time with bigger data samples is observed.
M. Richter

25 February 17, 2015 Plans
–Provide performance figures for the TDR
–TPC simulation
–Native device for the Hough transform (TPC)
–Raw data format
–Emulator to create “time frames” from the old raw data
–ITS reconstruction device

