New Track Seeding Techniques at the CMS experiment
Felice Pantaleo, CERN (felice@cern.ch)

Overview
- Motivations
- Heterogeneous Computing
- Track seeding on GPUs during Run-3
- Run-2 Track Seeding: Online, Offline
- Conclusion

CMS High-Level Trigger in 2016
- Today the CMS online farm consists of ~22k Intel Xeon cores
- The current approach: one event per logical core
- Pixel Tracks are not reconstructed for all the events at the HLT, only where needed for full track reconstruction and particle flow (e.g. jets, tau)
- This will be even more difficult at higher pile-up:
  - combinatorial time in pixel seeding, O(m!) in the worst case
  - more memory per event

CMS and LHC Upgrade Schedule
[figure: LHC/CMS upgrade timeline, marking the pixel detector upgrade]

Terminology - CMS Silicon Tracker
- Online: pixel-only tracks are used for fast tracking and vertexing
- Offline: pixel tracks are used as seeds for the Kalman filter in the strip detector

Phase 1 Pixel detector
The already complex online and offline track reconstruction will have to deal not only with a much more crowded environment but also with data coming from a more complex detector.

Tracking at HLT
Pixel hits are used for pixel tracks, vertices, and seeding.
HLT iterative tracking:

  Iteration name | Phase0 Seeds | Phase1 Seeds | Target Tracks
  Pixel Tracks   | triplets     | quadruplets  | -
  Iter0          | triplets     | quadruplets  | Prompt, high pT
  Iter1          | triplets     | quadruplets  | Prompt, low pT
  Iter2          | doublets     | doublets     | High pT, recovery

Pixel Tracks
- The combinatorial complexity of pixel track seeding is driven by track density and could easily become the bottleneck of the High-Level Trigger and offline reconstruction execution times
- The CMS HLT farm and its offline computing infrastructure can no longer rely on the exponential growth in clock frequency once guaranteed by the manufacturers
- Hardware and algorithmic solutions have been studied

Heterogeneous Computing

CPU vs GPU architectures - CPU
- Large caches (turn slow memory accesses into quick cache accesses)
- Powerful ALUs
- Low bandwidth to memory (tens of GB/s)
In CMS:
- One event per thread
- We rested on our laurels, thanks to the independence of events
- HLT/Reco limited by I/O
- Memory footprint is an issue

CPU vs GPU architectures - GPU
- Many Streaming Multiprocessors execute kernels (i.e. functions) using hundreds of threads concurrently
- High bandwidth to memory (up to 1 TB/s)
- Number of threads in flight keeps increasing
In CMS:
- Cannot make threads work on different events due to the SIMT architecture
- Handle heterogeneity to assign each job to the best-matched device
- Long-term solution: unroll the combinatorics and offload it to many threads
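
As an illustration of the last point, a minimal CUDA sketch of what "unrolling the combinatorics" means: instead of one event per CPU thread, every candidate hit pair of a single event gets its own GPU thread. All names here (buildDoublets, maxDz, the crude z-window cut) are hypothetical simplifications, not CMSSW code.

```cuda
#include <cuda_runtime.h>

// One thread per candidate (inner hit, outer hit) pair of a single event.
// The output buffer is assumed to be large enough for all accepted pairs.
__global__ void buildDoublets(const float* innerZ, int nInner,
                              const float* outerZ, int nOuter,
                              float maxDz, int2* pairsOut, int* nPairsOut) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // inner hit index
  int j = blockIdx.y * blockDim.y + threadIdx.y;  // outer hit index
  if (i >= nInner || j >= nOuter) return;
  if (fabsf(outerZ[j] - innerZ[i]) < maxDz) {     // crude compatibility cut
    int slot = atomicAdd(nPairsOut, 1);           // append to output list
    pairsOut[slot] = make_int2(i, j);
  }
}
```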

Track seeding on GPUs during Run-3

From RAW to Tracks during Run 3
Profit from the end-of-year upgrade of the Pixel detector to redesign the seeding code:
- Exploiting the information coming from the 4th layer would improve efficiency, b-tagging, and IP resolution
- Trigger avg latency should stay within 220 ms
- Reproducibility of the results (bit-by-bit equivalence CPU-GPU)
- Integration in the CMS software framework
Ingredients:
- Massive parallelism within the event
- Independence from thread ordering in algorithms
- Avoid useless data transfers and transformations
- Simple data formats optimized for parallel memory access (see the SoA sketch below)
Result: a GPU-based application that takes RAW data and gives Tracks as result
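
To illustrate the last ingredient, a minimal sketch of a structure-of-arrays (SoA) hit container; HitsSoA and shiftToBeamspot are hypothetical names, not the actual CMSSW classes. With one array per field, consecutive threads read consecutive addresses, so memory accesses coalesce, unlike with an array-of-structs layout.

```cuda
#include <cuda_runtime.h>

constexpr int kMaxHits = 1 << 16;

// Structure of arrays: one contiguous array per hit attribute.
struct HitsSoA {
  float x[kMaxHits];      // global x coordinate of each hit
  float y[kMaxHits];      // global y coordinate
  float z[kMaxHits];      // global z coordinate
  int   layer[kMaxHits];  // detector layer index
  int   nHits;            // number of valid entries
};

// Each thread transforms one hit: thread i touches element i of each
// array, so the warp's loads and stores are coalesced.
__global__ void shiftToBeamspot(HitsSoA* hits, float bsX, float bsY) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < hits->nHits) {
    hits->x[i] -= bsX;
    hits->y[i] -= bsY;
  }
}
```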

Algorithm Stack
Input (size linear with PU)
→ Raw to Digi
→ Hits (Pixel Clusterizer)
→ Hit Pairs
→ Ntuplets (Cellular Automaton)
→ Output (size ~linear with PU)

Cellular Automaton (CA)
- The CA is a track seeding algorithm designed for parallel architectures
- It requires a list of layers and their pairings: a graph of all the possible connections between layers is created
- Doublets (aka Cells) are created for each pair of layers (compatible with a region hypothesis)
- Fast computation of the compatibility between two connected cells
- No knowledge of the world outside adjacent neighboring cells is required, making it easy to parallelize

CAGraph of seeding layers
[figure: seeding layers and their interconnections]
Hit doublets for each layer pair can be computed independently by sets of threads; a device-side sketch of the layer-pair list follows.
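
A sketch of how such a graph can be stored on the device. LayerPair and the index values below are placeholders for illustration, not the actual Phase-1 CMSSW configuration (which uses 13 pairs over the 4 barrel layers and the forward disks).

```cuda
#include <cuda_runtime.h>

// (inner, outer) indices into the list of seeding layers.
struct LayerPair { int inner; int outer; };

// Flat list of layer pairings; doublet building for different pairs is
// independent, so each pair can be handled by its own set of thread blocks.
__constant__ LayerPair kLayerPairs[13] = {
    {0, 1}, {1, 2}, {2, 3},  // e.g. BPix1-BPix2, BPix2-BPix3, BPix3-BPix4
    {0, 4}, {1, 4},          // barrel to first forward disk (example entries)
    // ... remaining pairs of the seeding-layer graph (placeholders)
};
```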

CA: R-z plane compatibility
- The compatibility between two cells is checked only if they share one hit, e.g. cells AB and BC share hit B
- In the R-z plane the requirement is the alignment of the two cells: there is a maximum allowed angle ϑ between them, which depends on the minimum value of the momentum range that we would like to explore
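
A possible form of this alignment test (hypothetical helper, simplified with respect to the CMSSW implementation): the angle between the two cell directions in the R-z plane is compared against a cut thetaCut derived from the minimum pT of interest.

```cuda
#include <math.h>

struct RZHit { float r; float z; };  // hit coordinates in the R-z plane

// Cells AB and BC share hit B; they are compatible if the bending angle
// between the segments (b-a) and (c-b) stays below thetaCut.
__host__ __device__ bool areAlignedRZ(RZHit a, RZHit b, RZHit c,
                                      float thetaCut) {
  // Cross and dot products of the two cell directions in the R-z plane.
  float cross = (b.r - a.r) * (c.z - b.z) - (b.z - a.z) * (c.r - b.r);
  float dot   = (b.r - a.r) * (c.r - b.r) + (b.z - a.z) * (c.z - b.z);
  // |atan2(cross, dot)| is the angle between the two cells.
  return fabsf(atan2f(cross, dot)) < thetaCut;
}
```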

CA: x-y plane compatibility
- In the transverse plane, the circle passing through the hits forming the two cells is checked for intersection with the beamspot region: the two circles intersect if the distance between their centers d(C,C') satisfies r' - r < d(C,C') < r' + r
- Since it is an out-in propagation, a tolerance is added to the beamspot radius (shown in red in the slide figure)
- One could also require a minimum transverse momentum by rejecting low values of r'
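
A sketch of this transverse-plane test (hypothetical helper names, not the CMSSW code): the circle through the three hits of the two cells is built with the standard circumcenter formula and intersected with the enlarged beamspot circle of radius r.

```cuda
#include <math.h>

struct XYHit { float x; float y; };  // hit coordinates in the x-y plane

// a, b, c: the three hits of two cells sharing one hit. (bsX, bsY): the
// beamspot center; r: beamspot radius plus tolerance (shrinking r' cuts
// low-pT candidates, since pT is proportional to the circle radius r').
__host__ __device__ bool intersectsBeamspot(XYHit a, XYHit b, XYHit c,
                                            float bsX, float bsY, float r) {
  // Circumcenter (cx, cy) of the three hits, standard closed form.
  float d = 2.f * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
  if (fabsf(d) < 1e-7f) return false;  // collinear hits: infinite radius
  float a2 = a.x * a.x + a.y * a.y;
  float b2 = b.x * b.x + b.y * b.y;
  float c2 = c.x * c.x + c.y * c.y;
  float cx = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
  float cy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
  float rPrime = sqrtf((a.x - cx) * (a.x - cx) + (a.y - cy) * (a.y - cy));
  // Distance between the track-circle center C' and the beamspot center C.
  float dist = sqrtf((cx - bsX) * (cx - bsX) + (cy - bsY) * (cy - bsY));
  // Two circles intersect iff |r' - r| < d(C, C') < r' + r.
  return dist > fabsf(rPrime - r) && dist < rPrime + r;
}
```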

Cells Connection
- blockIdx.x and threadIdx.x = cell id in a LayerPair
- blockIdx.y = LayerPairIndex in [0, 13)
- Each cell asks its innermost hit for the cells to check compatibility with
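
A sketch of this kernel's launch geometry and inner loop. The types are simplified and hypothetical, and isCompatible is a stub standing in for the R-z and x-y cuts of the previous slides; the real CMSSW kernel differs in detail.

```cuda
#include <cuda_runtime.h>

struct Cell { int innerHit; int outerHit; int state; };

// For each hit, the list of cells whose OUTER hit is that hit, i.e. the
// candidate inner neighbors of any cell that starts on it.
struct CellRange { const int* begin; const int* end; };

// Placeholder: apply areAlignedRZ and intersectsBeamspot here.
__device__ bool isCompatible(const Cell& inner, const Cell& outer) {
  return true;  // stub for the geometric cuts of the previous slides
}

// blockIdx.y = layer pair in [0, 13);
// blockIdx.x * blockDim.x + threadIdx.x = cell id within that layer pair.
__global__ void connectCells(Cell* const* cellsByPair,
                             const int* nCellsByPair, Cell* allCells,
                             const CellRange* cellsEndingOnHit) {
  int pair = blockIdx.y;
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= nCellsByPair[pair]) return;

  Cell& outer = cellsByPair[pair][idx];
  // Ask the innermost hit for the cells ending on it: those are the only
  // candidates, since neighbors must share exactly one hit.
  CellRange cands = cellsEndingOnHit[outer.innerHit];
  for (const int* p = cands.begin; p != cands.end; ++p) {
    Cell& inner = allCells[*p];
    if (isCompatible(inner, outer)) {
      inner.state = 0;  // both members of a neighbor pair start at state 0
      outer.state = 0;
      // a real implementation would also record the adjacency here
    }
  }
}
```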

Evolution
- If two cells satisfy all the compatibility requirements they are said to be neighbors and their state is set to 0
- In the evolution stage, a cell's state increases in discrete generations if it has an outer neighbor with the same state
- At the end of the evolution stage, the state of a cell encodes the length of the chain extending outward from it
- If one is interested in quadruplets, there will surely be one starting from a state-2 cell; pentuplets from a state-3 cell, etc.
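
A sketch of one evolution generation (hypothetical types, not the CMSSW kernels), split into a check pass and an update pass so that every cell reads the previous-generation states before any state changes; this two-pass structure also gives the independence from thread ordering listed among the ingredients earlier.

```cuda
#include <cuda_runtime.h>

struct CACell {
  int state;                  // 0 once the cell has a neighbor
  bool willGrow;              // scratch flag for the two-pass update
  int nOuterNeighbors;
  const int* outerNeighbors;  // indices into the global cell array
};

// Pass 1: decide which cells grow, reading only the old states.
__global__ void caCheck(CACell* cells, int nCells) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= nCells) return;
  CACell& c = cells[i];
  c.willGrow = false;
  for (int k = 0; k < c.nOuterNeighbors; ++k) {
    if (cells[c.outerNeighbors[k]].state == c.state) {
      c.willGrow = true;  // an outer neighbor has the same state
      break;
    }
  }
}

// Pass 2: apply the increments. The host launches caCheck + caUpdate once
// per generation; two generations are enough for quadruplets, since the
// innermost cell of a 3-cell chain then reaches state 2.
__global__ void caUpdate(CACell* cells, int nCells) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < nCells && cells[i].willGrow) ++cells[i].state;
}
```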

[figure: CA evolution example, cell states at generations T=0, T=1, T=2]

Quadruplets finding
- blockIdx.x and threadIdx.x = cell id in an outermost LayerPair
- blockIdx.y = LayerPairIndex in ExternalLayerPairs
- Each cell on an outermost layer pair performs a DFS of depth 4 following its inner neighbors
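
A sketch of this DFS with an explicit stack (simplified, hypothetical types; the state-based pruning reflects the guarantee from the evolution stage described above, and the real CMSSW kernel differs in detail).

```cuda
#include <cuda_runtime.h>

struct CACell {
  int state;                  // set by the evolution stage
  int nInnerNeighbors;
  const int* innerNeighbors;  // indices into the global cell array
};

// Stub: append the four hits of the three chained cells to the output.
__device__ void emitQuadruplet(const int path[3]) { /* omitted */ }

// blockIdx.y = LayerPairIndex in ExternalLayerPairs;
// blockIdx.x * blockDim.x + threadIdx.x = cell id within that pair.
__global__ void findQuadruplets(const CACell* cells,
                                const int* const* cellsByExternalPair,
                                const int* nCellsByExternalPair) {
  int pair = blockIdx.y;
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= nCellsByExternalPair[pair]) return;

  // Explicit-stack DFS over 3 chained cells (= 4 hits), outermost first.
  int path[3];    // cell ids along the current path
  int cursor[3];  // next inner neighbor to try at each depth
  path[0] = cellsByExternalPair[pair][idx];
  cursor[0] = 0;
  int depth = 0;
  while (depth >= 0) {
    if (depth == 2) {  // three cells collected: emit and backtrack
      emitQuadruplet(path);
      --depth;
      continue;
    }
    const CACell& c = cells[path[depth]];
    if (cursor[depth] < c.nInnerNeighbors) {
      int next = c.innerNeighbors[cursor[depth]++];
      // Evolution guarantees a cell of state s heads an outward chain of
      // s further cells, so an inner neighbor needs state >= depth + 1
      // for a quadruplet to remain possible.
      if (cells[next].state >= depth + 1) {
        ++depth;
        path[depth] = next;
        cursor[depth] = 0;
      }
    } else {
      --depth;  // exhausted this cell's inner neighbors
    }
  }
}
```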

Triplet Propagation
- Propagate the 1-2-3 triplet to the 4th layer and search for compatible hits
- A natural continuation of the current approach, which builds triplets from pairs

Simulated Physics Performance - PixelTracks
- CA tuned to have the same efficiency as Triplet Propagation
- Efficiency significantly larger than 2016, especially in the forward region (|η| > 1.5)

Simulated Physics Performance - PixelTracks
- Fake rate up to 40% lower than Triplet Propagation
- Two orders of magnitude lower than 2016 tracking, thanks to the higher purity of quadruplets with respect to triplets

Timing
The physics performance and timing justified the integration of the Cellular Automaton, in its sequential implementation, at the HLT already in 2017.
Hardware: Intel Core i7-4771 @ 3.5 GHz, NVIDIA GTX 1080

  Algorithm              | Time per event [ms]
  2016 Pixel Tracks      | 29.3 ± 13.1
  Triplet Propagation    | 72.1 ± 25.7
  GPU Cellular Automaton | 1.2 ± 0.9
  CPU Cellular Automaton | 14 ± 6.2

Cellular Automaton @ Run-2 Offline Track Seeding

CA in offline tracking
The performance of the sequential Cellular Automaton at the HLT justified its integration also in the 2017 offline iterative tracking.

CA Physics performance vs 2016
- Reconstruction efficiency increased, especially in the forward region
- Fake rate significantly reduced in the entire pseudorapidity range

CA Physics performance vs conventional
- Overall track reconstruction efficiencies are very similar
- Fake rate lower by up to about 40% in the transition from the barrel to the forward region (|η| > 1.5)

CA Physics performance vs PU
- Efficiency slightly affected by the increasing pile-up: more fake tracks pass the track selection and their hits get removed from the pool of available hits
- Fake rate has a ~quadratic dependence on PU

Timing vs PU
- CA track seeding time is at the same level as the 2016 seeding
- More robust, with a smaller complexity vs PU than the 2016 track seeding, despite the increased number of layer combinations involved in the seeding phase
- In pattern recognition, the CA seeding brings no additional gain
- ~20% faster track reconstruction with respect to 2016 tracking at avg PU 70

Conclusion
- Pixel Track seeding algorithms have been redesigned with high-throughput parallel architectures in mind
- Improvements in performance come even when running sequentially: factors at the HLT, tens of percent in the offline, depending on the fraction of the code that uses the new algorithms
- CA algorithms with additional graph knowledge are very powerful: by adding more Graph Theory sugar, they can steal some work from track building and become more flexible
- The GPU and CPU algorithms run in CMSSW and produce the same bit-by-bit result, making the transition to GPUs at the HLT during Run 3 smoother
- Running Pixel Tracking at the CMS HLT for every event would become cheap at PU ~ 50-70
- Integration in the CMS High-Level Trigger farm is under study

Backup felice@cern.ch

Impact parameter resolution
The 2017 detector shows better performance than 2016 over the whole η spectrum.

Transverse momentum resolution
The performance of 2016 and 2017 is comparable.