Matthias Richter, Sebastian Kalcher, Jochen Thäder & Timm M. Steinbeck


1 AliRoot - Pub/Sub Framework Analysis Component Interface
Matthias Richter, Sebastian Kalcher, Jochen Thäder & Timm M. Steinbeck

2 Overview
The system consists of three main parts:
- A C++ shared library with a component handler class
  - Compiled as part of AliRoot and callable directly from it
- A number of C++ shared libraries with the actual reconstruction components themselves
  - Compiled as part of AliRoot and directly callable from it
- A C wrapper API that provides access to the component handler and the reconstruction components
  - Contained in the component handler shared library
  - Called by the Pub/Sub wrapper component
  - Makes Pub/Sub and AliRoot compiler independent: binary compatibility, no recompile of the reconstruction code

3 Overview
(Diagram: the component handler shared library sits between AliRoot, the Pub/Sub framework, and the HLT TPC shared library. The HLT TPC shared library registers its global Clusterfinder and Tracker objects (C++ classes) with the component handler C++ class, which loads the library. AliRoot loads the component library, initializes and gets the components, and calls Initialize/ProcessEvent on them directly. The Pub/Sub framework's wrapper processing component loads the component library, gets and initializes a component, and processes events through the component handler C wrapper functions.)

4 Components
- Components have to implement a set of abstract functions:
  - Return ID (string)
  - Return set of required input data types
  - Return produced output data type(s)
  - Process one event
- Close to what is needed for Pub/Sub components, but simplified
- One global instance of each component has to be present in the shared component library
  - Automagic registration with the global component handler object

5 AliRoot
- AliRoot code accesses classes in the component handler shared library
- It obtains component objects from the handler class
- It accesses the component objects directly
(Diagram: the component handler C++ class loads the HLT TPC shared library; the global Clusterfinder and Tracker objects register themselves with the handler; AliRoot loads the component library, initializes the components, gets them from the handler, and calls Initialize/ProcessEvent on the Clusterfinder and Tracker C++ classes directly.)

6 Publisher/Subscriber
- The Pub/Sub framework uses ONE wrapper component
- It accesses the handler and the components via the C wrapper API
- It can call multiple components in different libraries
- One component per wrapper instance
(Diagram: the wrapper processing component in the Pub/Sub framework loads the component library, gets and initializes a component, and processes events through the component handler C wrapper functions; the handler C++ class loads the HLT TPC shared library, whose global Clusterfinder and Tracker objects register themselves with it.)

7 Publisher/Subscriber
Initialization sequence (sequence diagram; participants: AliRootWrapperSubscriber, C Wrapper, AliHLTComponentHandler, AliHLTComponent)

8 Publisher/Subscriber
Processing sequence (sequence diagram; participants: AliRootWrapperSubscriber, C Wrapper, AliHLTComponentHandler, AliHLTComponent)

9 Publisher/Subscriber
Termination sequence (sequence diagram; participants: AliRootWrapperSubscriber, C Wrapper, AliHLTComponentHandler, AliHLTComponent)

10 Current Status
- Basic implementation done:
  - Base library with ComponentHandler and Component base class implemented
  - Pub/Sub wrapper component done and working
  - HLT TPC reconstruction code ported and working
  - Basic AliRoot HLT configuration scheme implemented
- Ongoing work on integration of the ComponentHandler into the data processing scheme of AliRoot

11 ClusterFinder Benchmarks
- pp events, 14 TeV, 0.5 T
- Number of events: 1200
- Iterations: 100
- TestBench: SimpleComponentWrapper
- Test nodes:
  - HD cluster nodes e304, e307 (PIII, 733 MHz)
  - HD cluster nodes e106, e107 (PIII, 800 MHz)
  - HD gateway node alfa (PIII, 1.0 GHz)
  - HD cluster node eh001 (Opteron, 1.6 GHz)
  - CERN cluster node eh000 (Opteron, 1.8 GHz)

12 Cluster Distribution

13 Signal Distribution

14 File Size Distribution

15 Total Distributions

16 Padrows & Pads per Patch

17 Basic Results

Patch     # Cluster   # Signals   Filesize [Byte]
0         60          4313        12892
1         61          5683        17525
2         44          3985        12233
3         35          4264        13437
4         29          4210        13384
5         23          4149        13312
Average   42          4434        13797
(averages are per patch and event)

18 Timing Results

Times per patch and event [ms]:

CPU               Patch 0   Patch 1   Patch 2   Patch 3   Patch 4   Patch 5   Average
PIII 733 MHz      6.57      8.82      6.14      6.67      6.61      6.54      6.90
PIII 800 MHz      6.04      8.10      5.64      6.12      6.06      6.01      6.33
PIII 1.0 GHz      4.95      6.65      4.51      4.90      4.87      4.81      5.11
Opteron 1.8 GHz   3.96      5.32      3.66      3.98      3.94      3.99      4.13
Opteron 1.6 GHz   -         3.92      2.73      2.96      2.93      2.90      3.06
Xeon IV 3.2 GHz   2.11      2.79      1.98      2.14      2.13      2.21      -

19 Timing Results

20 Timing Results
- Memory streaming benchmarks:
  - 1.6 GHz Opteron system: ca. 4.3 GB/s
  - 1.8 GHz Opteron system: ca. 3 GB/s
- This explains the performance drop of the 1.8 GHz system compared to the 1.6 GHz system
- The cause of the memory performance difference itself is unknown and currently being investigated
  - Maybe related to the NUMA parameter (cf. slide 23)

21 Tracker Timing Results
Slice tracker, average times per slice:
- Opteron 1.8 GHz (dual MP, dual core):
  - 1 process: ca. 3.6 ms/slice, independent of the CPU
  - 2 processes, different chips: ca. factor 1
  - 2 processes, same chip, different cores: ca. factor 1.75
  - 4 processes, all cores: ca. factor 1.83
- Xeon 3.2 GHz (dual MP, HyperThreading; mapping to CPUs unknown for more than 1 process):
  - 1 process: ca. ms/slice
  - 2 processes: ca. factor 2 slower
  - 3 processes: ca. factor 3.5 slower

22 Timing Results – Opteron Memory
Floating point/memory microbenchmarks:
- CPU loop: no effect with multiple processes
- Linear memory read: almost no effect
- Random memory read: runtime factors 1.33, 1.01, 1.43
- Linear memory read and write: runtime factors 1.57, 1.12, 2.31
- Random memory read and write: runtime factors 1.91, 1.92, 2.78
- Linear memory write: runtime factors 1.71, 1.72, 3.48
- Random memory write: runtime factors 1.97, 1.90, 3.76
Runtime factors are for two processes on the same chip, two processes on different chips, and four processes on all cores, relative to a single process.

23 Timing Results – Opteron Memory
Floating point/memory microbenchmarks:
- The memory results, in particular the memory writes, are a likely explanation for the tracker behaviour
- Tasks:
  - Examine system memory parameters (e.g. BIOS, Linux kernel)
    - One critical parameter found: kernel NUMA awareness was not activated
  - Re-evaluate/optimize the tracker code with respect to memory writes
    - A likely problem has already been found: conformal mapping uses a large memory array with widely (quasi-)randomly distributed write and read accesses
- Lesson for system procurement: if possible, evaluate systems/architectures with respect to pure performance AND scalability

24 Price Comparison
- Opteron 1.8 GHz:
  - Single core: ca. 180,- €
  - Dual core: ca. 350,- €
- Xeon 3.2 GHz:
  - Single core: ca. 330,- €
- Mainboard prices comparable, ca. 350-450,- € for dual-MP-, dual-core-capable boards
- For Opterons, per-core prices for full systems (assumption: 1 GB memory per core):
  - 1.8 GHz single/dual core, dual MP: ca. 800/600,- €
  - 2.4 GHz single/dual core, dual MP: ca. 1000/880,- €
  - 2.4 GHz single/dual core, quad MP: ca. 1700/1250,- €

