L1 track trigger and pattern recognition applications Giovanni Punzi University & INFN-Pisa Thanks to: A. Annovi, F. Palla IFD2015 Torino, December 15-17,


1 L1 track trigger and pattern recognition applications Giovanni Punzi University & INFN-Pisa Thanks to: A. Annovi, F. Palla IFD2015 Torino, December 15-17, 2015

2 A summary view of Data Processing in HEP [S. Cittolin, Phil. Trans. R. Soc. A 2012 370]
[Figure: experiment generations versus time, from Tev-I and Tev-II to LHC]

3 A summary view of Data Processing in HEP [S. Cittolin, Phil. Trans. R. Soc. A 2012 370]
- This is NOT the full rate! There is a >10^2 reduction by Level-1 pre-selection.
- This is going to be more of a challenge in the future.

4 HEP Trends, and Level-1
- Level-1 is traditionally based on simple quantities that can be calculated fast, giving an easy and cheap way to reduce the rate ("the HIGH-PT paradigm": look only at the hardest events).
- Future: more complex physics, "precision physics", and more events at the same time → there is no longer an easily extracted, smaller portion of the event data that can be used to reduce the data for more detailed processing later.
  - Examples: LHCb has signal events at every collision; CMS needs to reduce data from the tracker even just to read it out.
  - In the future, all SM physics will look like "low-Pt physics". At FCC, the rate of top events will be 3 kHz.
- → Will need to actually process all data from each crossing, with larger event size.
- → The LHCb upgrade stays at L~10^33 and is based on HLT only.
- → CMS and ATLAS are coping with HL by developing real-time tracking systems intended to operate at Level 1. This is tough!

5 Track reconstruction by pattern-matching
- The fastest approach to tracking used up to now is direct parallel matching to a bank of stored templates, which solves the hard combinatorial problem.
- It uses custom ASICs implementing a content-addressable memory (Associative Memory, [NIM A278 (1989) 436-440]). An all-INFN product!
- The first large system to use this method was CDF at the Tevatron: the SVT was capable of quality tracking in ~10 µs at 30 kHz, and was used for flavor-physics applications (hadronic HF trigger).
- The same approach, with updated hardware, continues in the current FTK for ATLAS L2 and in the planned Phase 2 upgrade (joint CMS-ATLAS effort in CSN1: RD_PHASE2).

6 Track reconstruction by pattern-matching using "Associative Memory"
- A pattern is a sequence of hits in the different layers, represented by coordinates. A particle trajectory is a specific sequence of hits.
- Hits are read out sequentially and compared in parallel to a set of pre-calculated "track patterns" (the PATTERN BANK): NO combinatorics.
- Based on a custom ASIC. Matched patterns are queued to output.
- Track parameters are found in a 2nd step by linearized fitting (FPGA, DSP…). This step is more sequential, but quite fast if enough AM cells are used in the first stage.
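
The matching step described on this slide can be illustrated with a small software sketch. It is only a toy model, not the AM chip logic: the dictionary index stands in for the parallel comparison done in hardware, and the names `build_bank`, `match_event`, and the `min_layers` option are assumptions introduced here.

```python
from collections import defaultdict


def build_bank(patterns):
    """Index the bank: (layer, coarse address) -> ids of patterns containing it."""
    index = defaultdict(set)
    for pid, pattern in enumerate(patterns):
        for layer, address in enumerate(pattern):
            index[(layer, address)].add(pid)
    return index


def match_event(patterns, index, hits, min_layers=None):
    """Feed hits one at a time; return ids of patterns matched on enough layers.

    hits is a sequence of (layer, coarse address) pairs. Each hit is looked up
    once, so the cost grows with the number of hits, not with the number of
    hit combinations.
    """
    n_layers = len(patterns[0])
    if min_layers is None:
        min_layers = n_layers  # require a match on every layer by default
    matched_layers = defaultdict(set)  # pattern id -> layers seen so far
    for layer, address in hits:
        for pid in index.get((layer, address), ()):
            matched_layers[pid].add(layer)
    return sorted(pid for pid, layers in matched_layers.items()
                  if len(layers) >= min_layers)
```

Allowing `min_layers` below the number of layers is one simple way to tolerate a missing hit; whether and how a given AM system does this is not stated in the slides.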

7 AM chips status and prospects (F. Palla, INFN Pisa)
State of the art and current R&D:
- INFN/IN2P3 65 nm AM05 (3k patterns) in hand.
- Variable-resolution AM pattern matching: equivalent to a 5x increase at ~10% area cost [A. Annovi et al., doi:10.1109/ANIMMA.2011.6172856].
- AM06 (128k patterns), early 2016: 8 input and 1 output serial lines @ 2 Gbps, 100 MHz. Sufficient to test latency and projections.
- Started R&D for 28 nm: 0.5M patterns, 200+ MHz speed.
- Also, first prototypes (4k patterns, 130 nm) of 3D AM chips from Fermilab (VIPRAM) [JINST 10 (2015) 02, C02029].

8 AMchip 28nm: AM07a (Alberto Annovi, Dec 4, 2015)
- First 28 nm AMchip (AM07a) submitted July. Received and ready for tests.
- Area 0.6 mm^2, a fraction of an MPW (6 mm^2).
- Purpose: test the performance of a first AM cell at 28 nm; validate HW vs simulation; design the next AM cell version with known silicon results.
- "A XOR-based Associative Memory Block in 28 nm CMOS for Interdisciplinary Applications", submitted to IEEE ICECS.

9 Goals of L1 Track Triggers in Phase 2
Keep selectivity on basic objects (leptons, jets, taus, b-jets, MET). This is made difficult by the increased level of pileup events (average ~140):
o Huge rate of µ from heavy flavors → need better pT resolution from the tracker
o Prompt electrons at L1 need to be separated from the huge number of γ → tracker tracks
o High-ET jets from (many) different primary vertices → need jet-vertex association
o Photon isolation in calorimeters compromised by large pileup → use tracks

10 ATLAS phase-II Track Trigger [CERN-LHCC-2015-020, LHCC-G-166]
TARGET: move L2 tracking (FTK) earlier in the trigger selection.
Regional tracking:
- Input event rate 1 MHz (L1 input), 30 µs max latency
- pT > 4 GeV, |η| < 4
- Example usage: electrons, muons, hadronic taus
Full scan tracking:
- Input event rate 100 kHz (after L1)
- pT > 1 GeV, |η| < 4
- Example usage: jets, MET, hadronic taus
Detector: 4 pixel layers and 5*2 strip layers (ITk layout being optimized).
Rough occupancy numbers (challenging!):
- O(80k) hits/event in pixel layers
- O(25k) hits/event in strip layers
AM patterns needed: 10^9 (FTK) → 10^10 (Phase-2)

11 Associative Memory Tracking Processor for 1st stage processing
- ATCA FTK Data Formatter → AMTP (http://www-ppd.fnal.gov/ATCA/)
- Pattern recognition mezzanine (PRM): 16 AM chips (x16), 1 FPGA, RAM(s)
- Future AM chip (the FTK AM05 is shown in the figure): https://indico.cern.ch/event/299180/session/11/contribution/38/material/poster/0.pdf

12 ATLAS Processing model
- AMTP: receives ITk inputs; distributes ITk hits to other AMTPs and to PRMs
- PRM: AM pattern recognition and 1st step track fit
- SSB: track distribution
- TFM: 2nd step track fit, with ITk inputs from the 2nd stage layers
- ATCA full-mesh data exchange: AMTP-AMTP, AMTP-SSB, SSB-SSB
- AMTP-AMTP and SSB-SSB exchange to/from other shelves
- Figure labels: 1st stage tracks; output tracks

13 L1 with CMS Upgraded Tracker (F. Palla, INFN Pisa)
- Layout: Pixel and Outer Tracker; better resolution, less material
- Outer Tracker modules: PS (Pixel-Strip) pT modules and 2S (Strip-Strip) pT modules
- 7004 PS modules (60% in the barrel), 8344 2S modules (50% in the barrel)
Level-1 trigger/readout scheme:
- L1 latency 10 µs (20 µs as an option)
- L1A rate ≤ 1 MHz
- HLT rate ≤ 10 kHz

14 The Importance of Geometry
E.g. CMS 2S (strip-strip) sensor modules:
- 10 cm x 10 cm module, with 100 µm x 5 cm long strips on both sensors
- Readout by 8 chips (CBC) on each side (CBC2 architecture)
- Require a coincidence between the two sensor planes (a stub)
- The local pT threshold saves big on data transport
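
The stub coincidence can be sketched in a few lines. This is a simplified illustration and not the CBC logic: it assumes that a high-pT track bends little between the two closely spaced sensors, so a cut on the strip offset acts as a local pT threshold. The function name `find_stubs` and the `window` parameter (in strip units) are hypothetical.

```python
def find_stubs(inner_strips, outer_strips, window):
    """Pair hits on the two sensors of one module.

    A pair is kept as a stub when the strip offset (the "bend") between the
    outer and inner hit is at most `window` strips. Low-pT tracks curve more
    in the magnetic field, give a larger offset, and are dropped locally.
    Returns a list of (inner strip, bend) tuples.
    """
    stubs = []
    for s_in in sorted(inner_strips):
        for s_out in sorted(outer_strips):
            bend = s_out - s_in
            if abs(bend) <= window:
                stubs.append((s_in, bend))
    return stubs
```

Only the accepted stubs need to leave the module, which is where the saving on data transport comes from.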

15 Data organization and dispatch (F. Palla, INFN Pisa)
- Subdivide the tracker into trigger towers. CMS: 8 (r-ϕ) x 6 (r-z) trigger sectors (some 10% overlapping).
- Each sector has ~200 stubs on average, with tails up to ~500 stubs/event in 140 pileup events + ttbar (to be compared with ~2000 for ATLAS Phase 1).
- About 600 Gb/s per trigger tower.
- Send data to track-finding processors in full-mesh ATCA shelves, capable of a "40G" full-mesh backplane on 14 slots = 7.2 Tb/s.
- Several options are being investigated; all include time-multiplexed data transfer from a set of receiving processor boards to the pattern recognition and track finding engines.
- Keep latency < 5 µs, including pattern recognition and track fitting.
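
The numbers on this slide can be cross-checked with simple arithmetic. The sketch below assumes that the "40G" full mesh means one 40 Gb/s link in each direction between every pair of the 14 slots, which is one reading that reproduces the quoted ~7.2 Tb/s. The 4M patterns per tower figure is taken from the demonstrator and dimensioning slides of this talk; the variable names are illustrative.

```python
# Trigger towers: 8 sectors in r-phi times 6 in r-z.
towers = 8 * 6  # 48

# Full-mesh backplane: every slot has a link to every other slot.
slots = 14
link_gbps = 40
directed_links = slots * (slots - 1)            # 182 one-way links
mesh_tbps = directed_links * link_gbps / 1000   # 7.28 Tb/s, quoted as 7.2 Tb/s

# Aggregate input bandwidth at about 600 Gb/s per tower.
total_input_tbps = towers * 600 / 1000          # 28.8 Tb/s

# Total pattern bank at about 4M patterns per tower.
total_patterns = towers * 4_000_000             # 192M, quoted as "Total 200M"
```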

16 Time multiplexing approach
- Ten processors send data to the target processor blade in a round-robin scheme.
- Each blade will have a few mezzanines to handle multiple events.
- Track finding does not need to wait for the last stub to start.
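
A minimal sketch of the round-robin idea follows. It assumes that every source processor holds a fragment of each event and that all fragments of a given event go to the same target, chosen by the event number modulo the number of targets. The function name `dispatch` and the number of targets are assumptions; the slide only specifies ten sources.

```python
def dispatch(events, n_targets):
    """Time-multiplex events over n_targets processor blades.

    events: list of (event_id, fragments), where fragments maps a source
            processor id to the list of stubs it holds for that event.
    Returns one list per target with (event_id, merged stubs) entries.
    """
    targets = [[] for _ in range(n_targets)]
    for event_id, fragments in events:
        target = event_id % n_targets  # round-robin choice of target
        merged = [stub
                  for source in sorted(fragments)
                  for stub in fragments[source]]
        targets[target].append((event_id, merged))
    return targets
```

Each target then sees complete events at 1/n_targets of the input event rate, which is what buys processing time per event.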

17 CMS trigger tower (demonstrator)
- Pulsar 2b (FNAL): Pattern Recognition Board & Data Emulation; Tracker Data Emulation; L1 Track Trigger Tower
- 16 AM chip FMC (INFN) with FTK AM chips: AM05 (2k patterns), AM06 (128k patterns)
- 1 AM chip + 1 AM-FPGA FMC (FNAL): ProtoVIPRAM02 (16k patterns)
- Final system: 4M patterns/tower, total 200M

18 R&D: New approach to pattern-matching: "Retina Algorithm"
- First work in this direction in year 2000 [L. Ristori, "An Artificial retina for Fast Track Finding", NIM A453 (2000) 425-429]. The name is historical: today we believe most of this processing actually happens in the primary visual cortex areas.
- Divide the parameter space into cells, each performing a weighted sum of hit signals.
- A valid track appears as a cluster of cell responses; its parameters can then be determined by interpolation of nearby cells.
- Mathematically related to the "Hough transform" [P.V.C. Hough, Conf.Proc. C590914 (1959) 554], but the actual point is the architectural implementation.
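
The two steps above (weighted sum per cell, then interpolation around the maximum) can be sketched for a toy 2D case. Everything specific here is an assumption made for illustration: straight-line tracks x = m*z + q, a Gaussian weight of width `sigma`, and a crude weighted mean over the 3x3 neighbourhood of the best cell. It runs serially, whereas the point of the real architecture is that all cells work in parallel.

```python
import math


def retina_response(hits, slopes, intercepts, sigma):
    """Response grid: grid[i][j] belongs to the cell (slopes[i], intercepts[j]).

    hits is a list of (z, x) points. Each cell sums a Gaussian weight of the
    distance between every hit and the cell's own track x = m*z + q.
    """
    grid = []
    for m in slopes:
        row = []
        for q in intercepts:
            response = 0.0
            for z, x in hits:
                d = x - (m * z + q)
                response += math.exp(-d * d / (2.0 * sigma * sigma))
            row.append(response)
        grid.append(row)
    return grid


def find_track(grid, slopes, intercepts):
    """Locate the strongest cell and interpolate over its 3x3 neighbourhood."""
    cells = [(i, j) for i in range(len(slopes)) for j in range(len(intercepts))]
    i0, j0 = max(cells, key=lambda ij: grid[ij[0]][ij[1]])
    wsum = msum = qsum = 0.0
    for i in range(max(0, i0 - 1), min(len(slopes), i0 + 2)):
        for j in range(max(0, j0 - 1), min(len(intercepts), j0 + 2)):
            w = grid[i][j]
            wsum += w
            msum += w * slopes[i]
            qsum += w * intercepts[j]
    return msum / wsum, qsum / wsum
```

Because the response is analog, a track with one missing hit still produces a clear (slightly lower) maximum, in contrast to a yes/no template match.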

19 Motivation and advantages of retina approach
CSN5 "RETINA" project: https://web2.infn.it/RETINA
- Inspired by the neural structure of natural vision (receptive fields).
- Still pattern-matching, but with important differences from AM:
  - Hit processing in AM cells still happens serially, while in the visual system only relevant data reaches a cell. This allows processing power to be spread over a network, and is faster.
  - The AM has "rigid templates" with a yes/no response, while the brain works by interpolation of analog responses. This saves internal storage and makes it easier to deal with missing hits.
- The goal is to approach the performance of natural vision (~25 clock cycles/image) and to be able to do HL-LHC tracking at the full 40 MHz speed, with no time-multiplexing or data simplification. This is particularly important for "Low-Pt" physics at HL, but not only there.

20 Architecture of RETINA processor
- Tracking layers, on a separate trigger-DAQ path
- Switching network: a custom switching network delivers hits to the appropriate cells; data are organized by cell coordinates
- Cellular engines: blocks of cellular processors
- Fitter: track finding and parameter determination
- Output: to DAQ

21 Hit delivery via programmable switch logic
o Hits must be delivered only to the cells that need them (there can be more than one)
o The switch network "knows" where to deliver hits
o All information is embedded in the network via distributed LUTs
Data processing happens while data is being moved, not afterwards.
Large internal bandwidth needs (Tb/s) → helped by the progress of telecom technology (fast FPGAs).
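
The LUT-based delivery can be illustrated with a one-dimensional toy model. It assumes that each cell has a receptive field of half-width `radius` around its centre and that hit positions are already binned; the real network distributes this table over many switch nodes, which is not modelled here. The names `build_lut` and `deliver` are hypothetical.

```python
def build_lut(cell_centers, hit_bins, radius):
    """Precompute, for each hit bin, the ids of the cells that need it."""
    lut = {}
    for b in hit_bins:
        lut[b] = [cid for cid, center in enumerate(cell_centers)
                  if abs(b - center) <= radius]
    return lut


def deliver(hits, lut):
    """Route each hit to every cell listed for its bin; returns cell -> hits."""
    inbox = {}
    for h in hits:
        for cid in lut.get(h, []):
            inbox.setdefault(cid, []).append(h)
    return inbox
```

A hit falling where two receptive fields overlap is copied to both cells, matching the remark that there can be more than one destination.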

22 Cellular computing engine working principle
Each node:
- Performs the calculation of weights for a hit into a cell
- Handles time-skew between events (there can be several)
In the second stage:
- Deals with surrounding cells → local clustering → possibly local fitting
- Queues results to output
All the above happens in a pipeline without stops (data-flow) → full exploitation of available gates, large throughput.

23 Timing simulation on current FPGAs (ALTERA Stratix)
[Timing diagram: hit sequence from Start (accumulators filling) to EndEvent (accumulators output). A = time between hit delivery and accumulator update; B = time between end of sequence and accumulator output.]
- Processing time depends only on the # of hits in the event; results are always available after a fixed number of cycles. It turns out ~20 clock cycles are sufficient.
- Requires 1-5 kLE of logic per cell → O(10^3) cells per average FPGA.
- A typical tracking system (tower) may need just O(10^5) cells, so a tracker can be built with O(100) of today's midsize FPGAs for L=10^33.
- Newer FPGAs (and ASICs) can further improve RETINA.
- Simulation showed high-quality tracking at L=2*10^33 of the full event, in a forward detector with low B field [CERN-LHCb-PUB-2014-02].
→ A promising R&D for the future.

24 Summary Future HEP experiments will increasingly depend on large processing power at the earliest possible stage A key enabler of progress will be the continuing development of technology of real-time reconstruction by special-purpose processors for Level-1 pattern recognition My personal (SciFi) view of future (see my past WN talks): detector-embedded reconstruction (in 4D !)

25 CMS System dimensioning (F. Palla, INFN Pisa)
Pattern matching (optimization ongoing):
- Fixed super-strip size (32 strips each) for all layers: ~4M patterns/tower
- Unique roads fired per trigger: ≲50 at PU 200 and 3 GeV threshold
- Efficiency (µ, electrons): ~99%
- Purity of stubs after AM filtering: ~60%
- Further ~30% gain from stub pT info
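
The super-strip coarsening behind these numbers can be sketched as follows. The grouping of 32 strips per super-strip is from the slide; the integer-division encoding and the names `superstrip` and `road` are assumptions made for illustration.

```python
STRIPS_PER_SUPERSTRIP = 32  # fixed size for all layers, as on the slide


def superstrip(strip):
    """Coarse address of a strip: the index of the 32-strip group it is in."""
    return strip // STRIPS_PER_SUPERSTRIP


def road(stub_strips):
    """Coarse pattern (road) for one stub per layer, given their strip numbers."""
    return tuple(superstrip(s) for s in stub_strips)
```

Tracks whose stubs fall in the same super-strips on every layer share one road, so the super-strip width sets the trade-off between pattern-bank size and the number of stub combinations left for the track fit.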

