Helmholtz International Center for CBM – Online Reconstruction and Event Selection

Open Charm Event Selection – Driving Force for FEE and DAQ

Open charm:
- D± (cτ = 312 μm): D+ → K− π+ π+ (9.5%)
- D0 (cτ = 123 μm): D0 → K− π+ (3.8%), D0 → K− π+ π+ π− (7.5%)
- Ds± (cτ = 150 μm): Ds+ → K+ K− π+ (5.3%)
- Λc+ (cτ = 60 μm): Λc+ → p K− π+ (5.0%)

No simple, single-track-level trigger primitive, such as high p_t, is available to tag events of interest. The only selective signature is the detection of the decay vertex. Track reconstruction in the STS/MVD and a displaced vertex search are therefore required at the first trigger level. Such a complex trigger is not feasible within the latency limits of conventional Front-End Electronics, typically 4 μsec at LHC.

Work without L1 trigger:
- Use self-triggered Front-End Electronics
- Use timestamps to organize and correlate data
- Ship all hits to subsequent data buffer and processing stages

High-Speed DAQ and Event Building

Typical parameters (for 10^7 int/sec and 1% occupancy):
- 100 kHz channel hit rate
- 600 kByte/sec per channel data flow

First level event selection, which replaces the L1 trigger in a conventional system, is done in a processor farm fed with data from the event building network.

Very efficient tracking algorithms are essential for the feasibility of the open charm event selection:
- Up to 10^9 tracks/sec in the Silicon tracker
- Co-develop Silicon tracker layout and tracking algorithm for best overall performance
- Develop algorithms which exploit the full potential of modern processors

First step:
- Use 'Single Instruction Multiple Data' (SIMD) instructions. They are essential for the high performance of many multi-media applications (e.g. video codecs), but are rarely used in data analysis.
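The idea of using timestamps to organize and correlate data from self-triggered electronics can be illustrated with a small sketch. It is not the CBM event builder: the `Hit` record, the function name `group_hits_by_time`, and the gap-based grouping rule with its `max_gap_ns` parameter are all hypothetical choices made only for illustration. Hits arrive unordered and without a trigger, are sorted by timestamp, and are collected into one candidate group as long as consecutive hits lie within the chosen time gap.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit record from a self-triggered front-end channel."""
    time_ns: int   # timestamp attached by the front-end
    channel: int   # channel that fired


def group_hits_by_time(hits: List[Hit], max_gap_ns: int) -> List[List[Hit]]:
    """Group hits whose timestamps are close together (assumed rule).

    A hit joins the current group if it follows the previous hit
    within max_gap_ns; otherwise it opens a new group.
    """
    ordered = sorted(hits, key=lambda h: h.time_ns)
    groups: List[List[Hit]] = []
    for hit in ordered:
        if groups and hit.time_ns - groups[-1][-1].time_ns <= max_gap_ns:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


# Unordered stream of hits, as shipped by free-running channels.
hits = [Hit(1000, 7), Hit(205, 3), Hit(200, 1), Hit(1010, 2), Hit(210, 9)]
groups = group_hits_by_time(hits, max_gap_ns=20)
```

With this input the five hits fall into two groups, one around 200 ns and one around 1000 ns. At high interaction rates, hits from different collisions can overlap in time, so a fixed gap is only a starting point for such a scheme.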
High Speed Tracking Algorithms

Best results were obtained with a Cellular Automaton based track finder with integrated Kalman filter track fit:
- allows usage of double-sided strip detectors even at high track densities
- highly optimized code
  - field approximated by polynomials
  - compact, cache-efficient data
  - most calculations SIMDized
  - fast on standard PCs
  - well adapted to next-generation many-core and wide-SIMD processors
  - already ported to the IBM Cell processor
- very fast when only hard quasi-primary tracks are reconstructed, as needed in the online first level event selection of open charm candidates
- supports reconstruction of soft tracks down to 100 MeV/c, as needed in the offline analysis

Source: I. Kisel, KIP, Heidelberg and GSI, Darmstadt

Data flow out of the Front-End Electronics at 10^7 int/sec will be about 1 TByte/sec.

[Figure: processing sub-farm fed via FPGA and PCs; candidate platforms: gaming (STI: Cell), GP-GPU (Nvidia: Tesla), GP-CPU (Intel: Larrabee), CPU/GPU (AMD: Fusion), others (??)]
[Figure: Cell: heterogeneous multi-core]
[Figure: Optimization steps for the track fit routine]
[Figure: Performance on different platforms (Intel P4, Cell; lxg1411, eh102, blade11bc4)]
[Figure: CPU time for track reconstruction and fit]
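To make the Kalman filter track fit mentioned above more concrete, the following is a minimal scalar sketch under strong simplifying assumptions: a straight-line track in one projection, no magnetic field, no material effects, and position measurements with a common resolution `sigma`. The function name `fit_straight_track` and all parameters are hypothetical. The fit described on the poster is SIMDized and uses a polynomial field approximation, none of which is reproduced here.

```python
from typing import Sequence, Tuple


def fit_straight_track(zs: Sequence[float], ms: Sequence[float],
                       sigma: float, big: float = 1.0e6) -> Tuple[float, float]:
    """Kalman-filter fit of a straight line to hits ms measured at planes zs.

    State is (x, tx): position and slope at the current plane.
    Returns the state at the last plane.
    """
    x, tx = 0.0, 0.0                  # uninformative starting state
    c00, c01, c11 = big, 0.0, big     # symmetric 2x2 covariance, large = no prior
    r = sigma * sigma                 # measurement variance
    z_prev = zs[0]
    for z, m in zip(zs, ms):
        dz = z - z_prev
        # Prediction: propagate state and covariance to the next plane.
        x += tx * dz
        c00 += 2.0 * dz * c01 + dz * dz * c11
        c01 += dz * c11
        # Update: weigh the predicted position against the measured one.
        s = c00 + r
        k0, k1 = c00 / s, c01 / s     # Kalman gain for x and tx
        res = m - x
        x += k0 * res
        tx += k1 * res
        c00, c01, c11 = c00 - k0 * c00, c01 - k0 * c01, c11 - k1 * c01
        z_prev = z
    return x, tx


# Noise-free hits on the line x = 1 + 0.5 * z at five detector planes.
x_last, slope = fit_straight_track([0.0, 1.0, 2.0, 3.0, 4.0],
                                   [1.0, 1.5, 2.0, 2.5, 3.0], sigma=0.1)
```

For noise-free hits the fit recovers the line parameters at the last plane to good approximation. Because each track is fitted with the same sequence of arithmetic operations, this kind of loop is a natural candidate for processing several tracks at once in SIMD lanes.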
[Figure: typical Au+Au collision]

Concept of SIMD instructions: process a short vector per cycle

R&D Roadmap

Detailed simulation and co-optimization of the tracking system and the analysis algorithms:
- alternate sensor types (single-sided sensors)
- alternate module layouts

Detailed studies of event selection algorithms:
- open charm selector covering all relevant channels (D0, D±, Ds, Λc)
- design of multi-level event selection

Mathematical and computational optimization of all algorithms

Determine best platform (programmable logic vs. processor) for the different processing steps:
- Hit/Cluster finding
- Tracklet finding
- Tracking/Vertexing

Go beyond SIMDization (from scalars to vectors):
- Address MIMDization (multi-threads, multi-cores and many-core systems)
- Exploit the numerical throughput of dedicated-purpose processors like GPUs (Graphics Processors)
- Be ready for the emerging heterogeneous many-core systems
- Re-design algorithms to run efficiently on all CPU/GPU architectures
- Investigate new languages for the performance-critical core of algorithms, like Ct or CUDA

[Figure labels: GPU: controller plus many ALUs; CPU: SIMD, multi-core]
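The step "from scalars to vectors" can be illustrated by the data layout it requires. The sketch below only emulates the idea in plain Python and gains no speed: track parameters are packed into fixed-width vectors, and one lane-wise operation advances several tracks at once. The vector width `LANES = 4` and the helper names `pack`, `vec_fma` and `extrapolate` are assumptions for illustration; real SIMD code would use compiler intrinsics or a vector library instead.

```python
from typing import List, Sequence

LANES = 4  # assumed vector width, e.g. four single-precision values per register


def pack(values: Sequence[float], lanes: int = LANES,
         pad: float = 0.0) -> List[List[float]]:
    """Split a flat array into fixed-width vectors, padding the last one."""
    padded = list(values) + [pad] * (-len(values) % lanes)
    return [padded[i:i + lanes] for i in range(0, len(padded), lanes)]


def vec_fma(a: Sequence[float], b: Sequence[float], s: float) -> List[float]:
    """Lane-wise a + b * s; stands in for one SIMD multiply-add."""
    return [ai + bi * s for ai, bi in zip(a, b)]


def extrapolate(xs: Sequence[float], txs: Sequence[float],
                dz: float) -> List[float]:
    """Move all tracks by dz along z, one vector of LANES tracks per step."""
    out: List[float] = []
    for xv, tv in zip(pack(xs), pack(txs)):
        out.extend(vec_fma(xv, tv, dz))   # one iteration = one vector operation
    return out[:len(xs)]                  # drop the padding lanes


# Five tracks with positions xs and slopes txs, extrapolated by dz = 2.
new_xs = extrapolate([0.0, 1.0, 2.0, 3.0, 4.0],
                     [1.0, 0.5, 0.0, -0.5, -1.0], dz=2.0)
```

Five tracks occupy two vectors here, with three padding lanes in the second. Keeping positions and slopes in separate arrays (structure of arrays) is what allows each vector to be loaded contiguously. The same layout carries over to wider SIMD units and to GPU threads.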