Presentation is loading. Please wait.

Presentation is loading. Please wait.

Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October.

Similar presentations


Presentation on theme: "Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October."— Presentation transcript:

1 Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October 18-22, 2010

2 21 October 2010, CHEP-2010Ivan Kisel, GSI2/11 Tracking Challenge in CBM (FAIR/GSI, Germany) Fixed-target heavy-ion experiment 10 7 Au+Au collisions/s 1000 charged particles/collision Non-homogeneous magnetic field Double-sided strip detectors (85% combinatorial space points) Track reconstruction in STS/MVD and displaced vertex search are required in the first trigger level. Reconstruction packages: track findingCellular Automaton (CA) track fittingKalman Filter (KF) vertexingKF Particle

3 21 October 2010, CHEP-2010Ivan Kisel, GSI3/11 Many-Core CPU/GPU Architectures 2x4 cores 512 cores 1+8 cores 32 cores Intel/AMD CPU NVIDIA GPU Intel MICAIBM Cell Future systems are heterogeneous

4 21 October 2010, CHEP-2010Ivan Kisel, GSI4/11 Kalman Filter (KF) based Track Fit Track fit: Estimation of the track parameters at one or more hits along the track – Kalman-Filter (KF) Detector layers Hits  (r, C) r – Track parameters C – Precision Initialising Prediction Correction Precision 1 2 3 r = { x, y, z, p x, p y, p z } Position, direction and momentum State vector Nowadays the Kalman Filter is used in almost all HEP experiments Kalman Filter: 1.Start with an arbitrary initialization. 2.Add one hit after another. 3.Improve the state vector. 4.Get the optimal parameters after the last hit. KF as a recursive least squares method KF Block-diagram 1 2 3

5 21 October 2010, CHEP-2010Ivan Kisel, GSI5/11 Performance of the KF Track Fit on CPU/GPU Systems scalar double single -> 24 8 16 32 1.00 10.00 0.10 0.01 Scalability on different CPU architectures – speed-up 100 Data Stream Parallelism (10x) Task Level Parallelism (100x) 2xCell SPE (16 ) Woodcrest ( 2 ) Clovertown ( 4 ) Dunnington ( 6 ) SIMD Cores and Threads Time/Track,  s Threads Cores Threads SIMD Real-time performance on different Intel CPU platforms Real-time performance on NVIDIA GPU graphic cards Scalabilty CPU GPU The Kalman Filter algorithm performs at ns level

6 21 October 2010, CHEP-2010Ivan Kisel, GSI6/11 Scalability of the KF Track Fit on 2xNehalem = 8 Cores Aiming to develop a scheduler, we investigate scalability and CPU load. Given n threads each filled with 10 m tracks, run them on specific n logical cores with 1 thread per 1 logical core. Each physical core of Nehalem has 2 HW threads (read/exe), which are considered by the operating system as logical cores. For small groups of tracks the overhead becomes significant, while large groups of tracks use CPU more efficient. Measure tracks throughput rather than time per track. Core 1 Core 2

7 21 October 2010, CHEP-2010Ivan Kisel, GSI7/11 Cellular Automaton (CA) as Track Finder Track finding: Wich hits in detector belong to the same track? – Cellular Automaton (CA) 0. Hits1. Segments 1 2 3 4 2. Counters3. Track Candidates4. Tracks Cellular Automaton: local w.r.t. data intrinsically parallel extremely simple very fast Perfect for many-core CPU/GPU ! Detector layers Hits 4. Tracks (CBM) 0. Hits (CBM) 1000 Hits 1000 Tracks Cellular Automaton: 1.Build short track segments. 2.Connect according to the track model, estimate a possible position on a track. 3.Tree structures appear, collect segments into track candidates. 4.Select the best track candidates.

8 21 October 2010, CHEP-2010Ivan Kisel, GSI8/11 CBM Cellular Automaton Track Finder 770 Tracks Top view Front view Efficiency Scalability Highly efficient reconstruction of 150 central collisions per second Intel X5550, 2x4 cores at 2.67 GHz

9 21 October 2010, CHEP-2010Ivan Kisel, GSI9/11 CA Track Finder: First 4D Version The beam will have no bunch structure, therefore 4D measurements (x, y, z, t). 100 mbias events have been divided into groups, each of them has been processed as a single event. Each hit had an integer index (equal to the event number, later will represent the time measurement t ). Each tracklet was constructed from hits with the same t. The same efficiency. Slight increase of the processing time with larger time slices. Reconstruction of time slices rather then events will be needed. Ready for more realistic 4D simulations Events/groupSpeed, ev/s 162 267 462 561 1054 2047

10 21 October 2010, CHEP-2010Ivan Kisel, GSI10/11 Consolidate Efforts: International Tracking Workshop 45 participants from Austria, China, Germany, India, Italy, Norway, Russia, Switzerland, UK and USA Follow-up Workshop: 10-11 March, 2011, CERN Sverre Jarp

11 21 October 2010, CHEP-2010Ivan Kisel, GSI11/11Summary CBM event reconstruction is at the ms level CBM event reconstruction is at the ms level Strong many-core scalability Strong many-core scalability Data parallelism vs. task parallelism Data parallelism vs. task parallelism Use SIMD unit, multi-threading, many-cores Use SIMD unit, multi-threading, many-cores Parallelization is now becoming a standard in the CBM experiment Parallelization is now becoming a standard in the CBM experiment


Download ppt "Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October."

Similar presentations


Ads by Google