Online Event Selection in the CBM Experiment I. Kisel GSI, Darmstadt, Germany RT'09, Beijing May 12, 2009.

Slides:

Advertisements

Similar presentations

DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM I. Kisel (for CBM Collaboration) I. Kisel (for CBM Collaboration)

Advertisements

Larrabee Eric Jogerst Cortlandt Schoonover Francis Tan.

Instructor Notes We describe motivation for talking about underlying device architecture because device architecture is often avoided in conventional.

L1 Event Reconstruction in the STS I. Kisel GSI / KIP CBM Collaboration Meeting Dubna, October 16, 2008.

FSOSS Dr. Chris Szalwinski Professor School of Information and Communication Technology Seneca College, Toronto, Canada GPU Research Capabilities.

Challenge the future Delft University of Technology Evaluating Multi-Core Processors for Data-Intensive Kernels Alexander van Amesfoort Delft.

A many-core GPU architecture.. Price, performance, and evolution.

ALICE HLT High Speed Tracking and Vertexing Real-Time 2010 Conference Lisboa, May 25, 2010 Sergey Gorbunov 1,2 1 Frankfurt Institute for Advanced Studies,

Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.

Porting Reconstruction Algorithms to the Cell Broadband Engine S. Gorbunov 1, U. Kebschull 1,I. Kisel 1,2, V. Lindenstruth 1, W.F.J. Müller 2 S. Gorbunov.

K-Ary Search on Modern Processors Fakultät Informatik, Institut Systemarchitektur, Professur Datenbanken Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner.

Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.

Evaluation of Multi-core Architectures for Image Processing Algorithms Masters Thesis Presentation by Trupti Patil July 22, 2009.

CA+KF Track Reconstruction in the STS I. Kisel GSI / KIP CBM Collaboration Meeting GSI, February 28, 2008.

KIP TRACKING IN MAGNETIC FIELD BASED ON THE CELLULAR AUTOMATON METHOD TRACKING IN MAGNETIC FIELD BASED ON THE CELLULAR AUTOMATON METHOD Ivan Kisel KIP,

Event Reconstruction in STS I. Kisel GSI CBM-RF-JINR Meeting Dubna, May 21, 2009.

Online Track Reconstruction in the CBM Experiment I. Kisel, I. Kulakov, I. Rostovtseva, M. Zyzak (for the CBM Collaboration) I. Kisel, I. Kulakov, I. Rostovtseva,

Many-Core Scalability of the Online Event Reconstruction in the CBM Experiment Ivan Kisel GSI, Germany (for the CBM Collaboration) CHEP-2010 Taipei, October.

Helmholtz International Center for CBM – Online Reconstruction and Event Selection Open Charm Event Selection – Driving Force for FEE and DAQ Open charm:

Elastic Neural Net for standalone RICH ring finding Sergey Gorbunov Ivan Kisel DESY, Zeuthen KIP, Uni-Heidelberg DESY, Zeuthen KIP, Uni-Heidelberg KIP.

MS Thesis Defense “IMPROVING GPU PERFORMANCE BY REGROUPING CPU-MEMORY DATA” by Deepthi Gummadi CoE EECS Department April 21, 2014.

Status of the L1 STS Tracking I. Kisel GSI / KIP CBM Collaboration Meeting GSI, March 12, 2009.

Use of GPUs in ALICE (and elsewhere) Thorsten Kollegger TDOC-PG | CERN |

© 2007 SET Associates Corporation SAR Processing Performance on Cell Processor and Xeon Mark Backues, SET Corporation Uttam Majumder, AFRL/RYAS.

Speed-up of the ring recognition algorithm Semeon Lebedev GSI, Darmstadt, Germany and LIT JINR, Dubna, Russia Gennady Ososkov LIT JINR, Dubna, Russia.

VTU – IISc Workshop Compiler, Architecture and HPC Research in Heterogeneous Multi-Core Era R. Govindarajan CSA & SERC, IISc

Ooo Performance simulation studies of a realistic model of the CBM Silicon Tracking System Silicon Tracking for CBM Reconstructed URQMD event: central.

Fast reconstruction of tracks in the inner tracker of the CBM experiment Ivan Kisel (for the CBM Collaboration) Kirchhoff Institute of Physics University.

Introducing collaboration members – Korea University (KU) ALICE TPC online tracking algorithm on a GPU Computing Platforms – GPU Computing Platforms Joohyung.

Fast track reconstruction in the muon system and transition radiation detector of the CBM experiment at FAIR Andrey Lebedev 1,2 Claudia Höhne 3 Ivan Kisel.

Status of Reconstruction in CBM

Standalone FLES Package for Event Reconstruction and Selection in CBM DPG Mainz, 21 March 2012 I. Kisel 1,2, I. Kulakov 1, M. Zyzak 1 (for the CBM.

V.Petracek TU Prague, UNI Heidelberg GSI Detection of D +/- hadronic 3-body decays in the CBM experiment ● D +/- K  B. R. 

1 Open charm simulations ( D +, D 0,  + c ) Sts geometry: 2MAPS +6strip (Strasbourg geo) or 2M2H4S (D+ and D - at 25AGeV); TOOLS: signal (D +  K - 

First Level Event Selection Package of the CBM Experiment S. Gorbunov, I. Kisel, I. Kulakov, I. Rostovtseva, I. Vassiliev (for the CBM Collaboration (for.

Performance simulations with a realistic model of the CBM Silicon Tracking System Silicon tracking for CBM Number of integration components Ladders106.

– Self-Triggered Front- End Electronics and Challenges for Data Acquisition and Event Selection CBM  Study of Super-Dense Baryonic Matter with.

Methods for fast reconstruction of events Ivan Kisel Kirchhoff-Institut für Physik, Uni-Heidelberg FutureDAQ Workshop, München March 25-26, 2004 KIP.

Fast Tracking of Strip and MAPS Detectors Joachim Gläß Computer Engineering, University of Mannheim Target application is trigger  1. do it fast  2.

Global Tracking for CBM Andrey Lebedev 1,2 Ivan Kisel 1 Gennady Ososkov 2 1 GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt, Germany 2 Laboratory.

Cellular Automaton Method for Track Finding (HERA-B, LHCb, CBM) Ivan Kisel Kirchhoff-Institut für Physik, Uni-Heidelberg Second FutureDAQ Workshop, GSI.

Muon detection in the CBM experiment at FAIR Andrey Lebedev 1,2 Claudia Höhne 1 Ivan Kisel 1 Anna Kiseleva 3 Gennady Ososkov 2 1 GSI Helmholtzzentrum für.

CA+KF Track Reconstruction in the STS S. Gorbunov and I. Kisel GSI/KIP/LIT CBM Collaboration Meeting Dresden, September 26, 2007.

A parallel High Level Trigger benchmark (using multithreading and/or SSE)‏ Håvard Bjerke.

Kalman Filter based Track Fit running on Cell S. Gorbunov 1,2, U. Kebschull 2, I. Kisel 2,3, V. Lindenstruth 2 and W.F.J. Müller 1 1 Gesellschaft für Schwerionenforschung.

1/13 Future computing for particle physics, June 2011, Edinburgh A GPU-based Kalman filter for ATLAS Level 2 Trigger Dmitry Emeliyanov Particle Physics.

Parallelization of the SIMD Kalman Filter for Track Fitting R. Gabriel Esteves Ashok Thirumurthi Xin Zhou Michael D. McCool Anwar Ghuloum Rama Malladi.

GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.

Fast parallel tracking algorithm for the muon detector of the CBM experiment at FAIR Andrey Lebedev 1,2 Claudia Höhne 1 Ivan Kisel 1 Gennady Ososkov 2.

My Coordinates Office EM G.27 contact time:

S. Pardi Frascati, 2012 March GPGPU Evaluation – First experiences in Napoli Silvio Pardi.

Parallel Implementation of the KFParticle Vertexing Package for the CBM and ALICE Experiments Ivan Kisel 1,2,3, Igor Kulakov 1,4, Maksym Zyzak 1,4 1 –

LIT participation LIT participation Ivanov V.V. Laboratory of Information Technologies Meeting on proposal of the setup preparation for external beams.

Track Reconstruction in MUCH and TRD Andrey Lebedev 1,2 Gennady Ososkov 2 1 Gesellschaft für Schwerionenforschung, Darmstadt, Germany 2 Laboratory of Information.

KFParticle Ivan Kisel 1,2, Igor Kulakov 1,3, Iourii Vassiliev 1,2, Maksym Zyzak 1,3 1 – Goethe-Universität Frankfurt, Frankfurt am Main, Germany 2 – GSI.

The Present and Future of Parallelism on GPUs

M. Bellato INFN Padova and U. Marconi INFN Bologna

Y. Fisyak1, I. Kisel2, I. Kulakov2, J. Lauret1, M. Zyzak2

New Track Seeding Techniques at the CMS experiment

Kalman filter tracking library

GPU Computing Jan Just Keijser Nikhef Jamboree, Utrecht

Geant4 MT Performance Soon Yung Jun (Fermilab)

Fast Parallel Event Reconstruction

Parallel Computing Lecture

ALICE HLT tracking running on GPU

TPC reconstruction in the HLT

Multicore and GPU Programming

6- General Purpose GPU Programming

CSE 502: Computer Architecture

Multicore and GPU Programming

Presentation transcript:

Online Event Selection in the CBM Experiment I. Kisel GSI, Darmstadt, Germany RT'09, Beijing May 12, 2009

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany2/12 Tracking Challenge in CBM (FAIR/GSI, Germany) * Future fixed-target heavy-ion experiment * 10 7 Au+Au collisions/sec * ~ 1000 charged particles/collision * Non-homogeneous magnetic field * Double-sided strip detectors (85% fake space points) Track reconstruction in STS/MVD and displaced vertex search required in the first trigger level

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany3/12 Open Charm Event Selection D  (c  = 312  m): D +  K -  +  + (9.5%) D 0 (c  = 123  m): D 0  K -  + (3.8%) D 0  K -  +  +  - (7.5%) D  s (c  = 150  m): D + s  K + K -  + (5.3%)  + c (c  = 60  m):  + c  pK -  + (5.0%) No simple trigger primitive, like high p t, available to tag events of interest. The only selective signature is the detection of the decay vertex. K-K-K-K- ++ First level event selection will be done in a processor farm fed with data from the event building network

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany4/12 Many-core HPC Heterogeneous systems of many cores Heterogeneous systems of many cores Uniform approach to all CPU/GPU families Uniform approach to all CPU/GPU families Similar programming languages (CUDA, Ct, OpenCL) Similar programming languages (CUDA, Ct, OpenCL) Parallelization of the algorithm (vectors, multi-threads, many-cores) Parallelization of the algorithm (vectors, multi-threads, many-cores) On-line event selection On-line event selection Mathematical and computational optimization Mathematical and computational optimization Optimization of the detector Optimization of the detectorCores HW Threads SIMD width ? OpenCL ? Gaming STI: Cell STI: CellGaming GP CPU Intel: Larrabee Intel: Larrabee GP CPU Intel: Larrabee Intel: Larrabee CPU Intel: XX-cores Intel: XX-coresCPU FPGA Xilinx: Virtex Xilinx: VirtexFPGA CPU/GPU AMD: Fusion AMD: FusionCPU/GPU GP GPU Nvidia: Tesla Nvidia: Tesla GP GPU Nvidia: Tesla Nvidia: Tesla

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany5/12 Cellular Automaton Track Finder 1 Create triplets 2 Collect tracks Three types of tracks and three stages of track finding: fast primary tracks / all primary tracks / all tracks Hits : 30000/ 9000/ 5000 Triplets: 20000/10000/ 3000 Tracks: 500/ 200/ 10 Cellular Automaton Kalman Filter FPGA GPU CPU ?

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany6/12 Core of Reconstruction: Kalman Filter based Track Fit detectors measurements  (r, C) track parameters and errors Initial estimates for and Prediction step State estimate Error covariance Filtering step  2 x  2 y …  2 y …  2 z  2 z  2 px  2 px …  2 py …  2 py  2 pz  2 pz C =C =C =C = error of x r = { x, y, z, p x, p y, p z } position momentum State vector Covariancematrix

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany7/12 Code (Part of the Kalman Filter) inline void AddMaterial( TrackV &track, Station &st, Fvec_t &qp0 ) { cnst mass2 = *0.1396; cnst mass2 = *0.1396; Fvec_t tx = track.T[2]; Fvec_t tx = track.T[2]; Fvec_t ty = track.T[3]; Fvec_t ty = track.T[3]; Fvec_t txtx = tx*tx; Fvec_t txtx = tx*tx; Fvec_t tyty = ty*ty; Fvec_t tyty = ty*ty; Fvec_t txtx1 = txtx + ONE; Fvec_t txtx1 = txtx + ONE; Fvec_t h = txtx + tyty; Fvec_t h = txtx + tyty; Fvec_t t = sqrt(txtx1 + tyty); Fvec_t t = sqrt(txtx1 + tyty); Fvec_t h2 = h*h; Fvec_t h2 = h*h; Fvec_t qp0t = qp0*t; Fvec_t qp0t = qp0*t; cnst c1=0.0136, c2=c1*0.038, c3=c2*0.5, c4=-c3/2.0, c5=c3/3.0, c6=-c3/4.0; cnst c1=0.0136, c2=c1*0.038, c3=c2*0.5, c4=-c3/2.0, c5=c3/3.0, c6=-c3/4.0; Fvec_t s0 = (c1+c2*st.logRadThick + c3*h + h2*(c4 + c5*h +c6*h2) )*qp0t; Fvec_t s0 = (c1+c2*st.logRadThick + c3*h + h2*(c4 + c5*h +c6*h2) )*qp0t; Fvec_t a = (ONE+mass2*qp0*qp0t)*st.RadThick*s0*s0; Fvec_t a = (ONE+mass2*qp0*qp0t)*st.RadThick*s0*s0; CovV &C = track.C; CovV &C = track.C; C.C22 += txtx1*a; C.C22 += txtx1*a; C.C32 += tx*ty*a; C.C33 += (ONE+tyty)*a; C.C32 += tx*ty*a; C.C33 += (ONE+tyty)*a;} Use headers to overload +, -, *, / operators --> the source code is unchanged !

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany8/12 Header (Intel’s SSE) typedef F32vec4 Fvec_t; typedef F32vec4 Fvec_t; /* Arithmetic Operators */ /* Arithmetic Operators */ friend F32vec4 operator +(const F32vec4 &a, const F32vec4 &b) { return _mm_add_ps(a,b); } friend F32vec4 operator +(const F32vec4 &a, const F32vec4 &b) { return _mm_add_ps(a,b); } friend F32vec4 operator -(const F32vec4 &a, const F32vec4 &b) { return _mm_sub_ps(a,b); } friend F32vec4 operator -(const F32vec4 &a, const F32vec4 &b) { return _mm_sub_ps(a,b); } friend F32vec4 operator *(const F32vec4 &a, const F32vec4 &b) { return _mm_mul_ps(a,b); } friend F32vec4 operator *(const F32vec4 &a, const F32vec4 &b) { return _mm_mul_ps(a,b); } friend F32vec4 operator /(const F32vec4 &a, const F32vec4 &b) { return _mm_div_ps(a,b); } friend F32vec4 operator /(const F32vec4 &a, const F32vec4 &b) { return _mm_div_ps(a,b); } /* Functions */ /* Functions */ friend F32vec4 min( const F32vec4 &a, const F32vec4 &b ){ return _mm_min_ps(a, b); } friend F32vec4 min( const F32vec4 &a, const F32vec4 &b ){ return _mm_min_ps(a, b); } friend F32vec4 max( const F32vec4 &a, const F32vec4 &b ){ return _mm_max_ps(a, b); } friend F32vec4 max( const F32vec4 &a, const F32vec4 &b ){ return _mm_max_ps(a, b); } /* Square Root */ /* Square Root */ friend F32vec4 sqrt ( const F32vec4 &a ){ return _mm_sqrt_ps (a); } friend F32vec4 sqrt ( const F32vec4 &a ){ return _mm_sqrt_ps (a); } /* Absolute value */ /* Absolute value */ friend F32vec4 fabs( const F32vec4 &a){ return _mm_and_ps(a, _f32vec4_abs_mask); } friend F32vec4 fabs( const F32vec4 &a){ return _mm_and_ps(a, _f32vec4_abs_mask); } /* Logical */ /* Logical */ friend F32vec4 operator&( const F32vec4 &a, const F32vec4 &b ){ // mask returned friend F32vec4 operator&( const F32vec4 &a, const F32vec4 &b ){ // mask returned return _mm_and_ps(a, b); return _mm_and_ps(a, b); } friend F32vec4 operator|( const F32vec4 &a, const F32vec4 &b ){ // mask returned friend F32vec4 operator|( const F32vec4 &a, const F32vec4 &b ){ // mask returned return _mm_or_ps(a, b); return _mm_or_ps(a, b); } friend F32vec4 operator^( const F32vec4 &a, const F32vec4 &b ){ // mask returned friend F32vec4 operator^( const F32vec4 &a, const F32vec4 &b ){ // mask returned return _mm_xor_ps(a, b); return _mm_xor_ps(a, b); } friend F32vec4 operator!( const F32vec4 &a ){ // mask returned friend F32vec4 operator!( const F32vec4 &a ){ // mask returned return _mm_xor_ps(a, _f32vec4_true); return _mm_xor_ps(a, _f32vec4_true); } friend F32vec4 operator||( const F32vec4 &a, const F32vec4 &b ){ // mask returned friend F32vec4 operator||( const F32vec4 &a, const F32vec4 &b ){ // mask returned return _mm_or_ps(a, b); return _mm_or_ps(a, b); } /* Comparison */ /* Comparison */ friend F32vec4 operator<( const F32vec4 &a, const F32vec4 &b ){ // mask returned friend F32vec4 operator<( const F32vec4 &a, const F32vec4 &b ){ // mask returned return _mm_cmplt_ps(a, b); return _mm_cmplt_ps(a, b); } SIMD instructions

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany9/12 Kalman Filter Track Fit on Intel Xeon, AMD Opteron and Cell Motivated, but not restricted by Cell ! 2 Intel Xeon Processors with Hyper-Threading enabled and 512 kB cache at 2.66 GHz; 2 Dual Core AMD Opteron Processors 265 with 1024 kB cache at 1.8 GHz; 2 Cell Broadband Engines with 256 kB local store at 2.4G Hz. Intel P4 Cell Comp. Phys. Comm. 178 (2008) faster!

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany10/12 Performance of the KF Track Fit on CPU/GPU Systems CBM Progress Report, 2008 Speed-up 3.7 on the Xeon 5140 (Woodcrest) at 2.4 GHz using icc 9.1 Real-time performance on different Intel CPU platforms Real-time performance on NVIDIA for a single track Cores HW Threads SIMD width scalar double single -> xCell SPE ( 16 ) Woodcrest ( 2 ) Clovertown ( 4 ) Dunnington ( 6 ) Real-time performance (  s) on different CPU architectures – speed-up 100 with 32 threads

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany11/12 Cellular Automaton Track Finder: Pentium faster!

May 12, 2009, RT'09, BeijingIvan Kisel, GSI, Germany12/12Summary A standalone package for online event selection is ready for investigation A standalone package for online event selection is ready for investigation Cellular Automaton track finder takes 5 ms per minimum bias event Cellular Automaton track finder takes 5 ms per minimum bias event Kalman Filter track fit is a benchmark for modern CPU/GPU architectures Kalman Filter track fit is a benchmark for modern CPU/GPU architectures SIMDized multi-threaded KF track fit takes 0.1  s/track on Intel Core i7 SIMDized multi-threaded KF track fit takes 0.1  s/track on Intel Core i7 Throughput of 2.2·10 7 tracks /sec is reached on NVIDIA GTX 280 Throughput of 2.2·10 7 tracks /sec is reached on NVIDIA GTX 280 ? OpenCL ? Gaming STI: Cell STI: CellGaming GP CPU Intel: Larrabee Intel: Larrabee GP CPU Intel: Larrabee Intel: Larrabee ? CPU Intel: XX-cores Intel: XX-coresCPU FPGA Xilinx: Virtex Xilinx: VirtexFPGA ? CPU/GPU AMD: Fusion AMD: FusionCPU/GPU ? GP GPU Nvidia: Tesla Nvidia: Tesla GP GPU Nvidia: Tesla Nvidia: Tesla