An online silicon detector tracker for the ATLAS upgrade


An online silicon detector tracker for the ATLAS upgrade
- FTK reconstructs charged particle trajectories in the silicon tracker (Pixel & SCT) at the “level 1.5” trigger stage.
- This is an extremely difficult task: 100 kHz event rate, ~70 pile-up events at top luminosity.

The Tracker: the most computing-demanding reconstruction in ATLAS
[Figure: the ATLAS silicon tracker layers (Pix0, Pix1, Pix2, SCTs) around a hard-scattering vertex.]

FTK: full event tracking in 2 steps
- Find coarse roads first: each hit is reduced to the coarse Super Strip (SS) containing it, and the SSIDs are pattern-matched in the Associative Memory (AM).
- From the matched roads, extract the full-resolution hits (Data Organizer, DO), combine the hits to form tracks, and calculate the χ² and track parameters (Track Fitter, TF). Track fitting uses the full resolution of the detector.
[Dataflow: SSIDs → pattern matching (AM) → roads → DO → roads + hits → TF → track parameters (d, pT, φ, η, z).]
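A toy end-to-end model of this two-step flow is sketched below. It is pure illustration: the geometry, AM bank contents, SSID definition and χ² cut are invented stand-ins, not FTK's; the firmware blocks it mimics are detailed on the following slides.

```python
# Toy end-to-end model of the FTK 2-step flow (pure illustration:
# geometry, AM bank contents, SSID definition and chi^2 cut are
# invented stand-ins, not FTK's).
from itertools import product

NLAYERS = 4            # toy detector; FTK uses 11 silicon coordinates
CHI2_CUT = 3.0         # invented acceptance cut

# Full-resolution hits per layer (here: a position along the layer).
hits_per_layer = [[1.02, 7.91], [2.13], [3.05, 3.11], [4.22]]
ssid = lambda h: int(h)          # toy Super Strip = integer part of the hit

# AM bank: each stored pattern is one SSID per layer.
am_bank = [(1, 2, 3, 4), (7, 6, 5, 4)]

# Step 1: coarse pattern matching on SSIDs (the AM's job).
event_ssids = [{ssid(h) for h in layer} for layer in hits_per_layer]
roads = [p for p in am_bank
         if all(p[i] in event_ssids[i] for i in range(NLAYERS))]

# Step 2: full-resolution hits per road (DO), all hit combinations
# (combiner), one fit per combination (TF), cut on chi^2.
tracks = []
for road in roads:
    in_road = [[h for h in hits_per_layer[i] if ssid(h) == road[i]]
               for i in range(NLAYERS)]
    for combo in product(*in_road):
        slope = (combo[-1] - combo[0]) / (NLAYERS - 1)   # stand-in "fit"
        chi2 = sum((combo[i] - (combo[0] + slope * i)) ** 2
                   for i in range(NLAYERS))
        if chi2 < CHI2_CUT:
            tracks.append((combo, chi2))

print(roads)    # [(1, 2, 3, 4)] -- only the first pattern matches
print(tracks)   # 2 hit combinations in the road, both pass the toy cut
```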

FTK architecture
- Input: ~380 ROLs @ 2 Gb/s, ~100 GB/s in total.
- Internal distribution: 8 × 128 S-links @ 6 Gb/s, ~800 GB/s.
- A single AMB alone carries 750 S-links, ~200 GB/s; all the AMBs together move ~25 TB/s.

AM chip working principle
- The elementary cell is 1 comparator between the bus and 1 stored 16-bit word of a pattern; a matched layer sets a HIT flip-flop, and a pattern fires on a majority coincidence of its layers.
- AM chip consumption: ~2.5 W for 128 kpatterns.
- AM chip computing power: every 10 ns each pattern performs 4 × 32-bit comparisons, so 128 kpatterns × 4 ≈ 500 k comparisons, i.e. 500 k × 100 M/s = 50 × 10⁶ MIPS per chip. That is comparisons only; on top of it come 128 k coincidences with majority logic per chip, plus the read-out logic.
- 8000 AM chips give 4 × 10¹¹ MIPS.
- AM memory accesses: 128 k × 4 × 32 bits × 100 M/s = 1.6 × 10¹⁵ bit accesses/s per chip.
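A minimal software model of one pattern's match logic follows. The sequential code stands in for what every pattern does in silicon, in parallel, on each 10 ns bus cycle; the layer count and majority threshold are illustrative choices.

```python
# Software model of one AM pattern's match logic (sequential stand-in
# for the fully parallel silicon comparators; layer count and majority
# threshold are illustrative).
def am_match(pattern, bus_words, majority):
    """pattern: the stored 16-bit word for each layer; bus_words: the
    set of words seen on each layer's bus during the event. A layer's
    HIT flip-flop is set on equality; the pattern fires when at least
    `majority` layers have matched."""
    layer_hit = [stored in bus_words[i] for i, stored in enumerate(pattern)]
    return sum(layer_hit) >= majority

# A 4-layer pattern with 3-out-of-4 majority logic:
print(am_match((0xA1B2, 0x0003, 0x1F00, 0xBEEF),
               [{0xA1B2}, {0x0003, 0x0004}, set(), {0xBEEF}],
               majority=3))   # True: 3 of the 4 layers matched
```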

Recent developments of the Associative Memory chip
AMchip04:
- Package: PQ208
- Parallel I/O interface
- 8k patterns
- Convoluted routing on the LAMB
More performance is needed from the new AMchip06:
- Increase the number of pads
- Use a BGA package for more pins
- Simplify the LAMB routing
Idea: use serial links to transmit the data.

Associative Memory Chip - Family 05
We bought an IP core to provide the chip with serialisers and deserialisers.
- MiniAMChip05: QFN64 package, 3.7 mm² die, sits on the MiniLAMB-SLP board; status: under test.
- AMChip05: 23 × 23 mm BGA package, 12 mm² die, sits on the LAMB-SLP board; status: submitted.

MiniLAMB-SLP & LAMB-SLP
- The final LAMB-SLP board will be ready shortly. One LAMB-SLP board carries 16 AM chips, for a total of ~2 M patterns, and its routing is simplified.
- A prototype of the LAMB-SLP board (the MiniLAMB-SLP) has been built with 4 MiniAM chips; this board is important to confirm our serial-link approach.

AMB-SLP board
The AMB-SLP (Serial Link Processor) board design:
Interface:
- 12 input buses @ 24 Gbps
- 16 output buses @ 32 Gbps
Three FPGAs:
- 1 for the input data distribution (Artix-7)
- 1 for the output data distribution (Artix-7)
- 1 for the data control logic (Spartan-6)
Only serial standards are used for data distribution to and from the AM chips.

Test Stand
Complete test with: the AMB-SLP board, a MiniLAMB-SLP board, a VME crate, a TDAQ CPU, and a strobe.
- We performed a serial-link test with a PRBS generator, using an IBERT core in Xilinx ISE.

Serial Links
Both the input and output serial links were characterized for signal integrity. The AM chip serial links run @ 2 Gb/s.
- Input path: FPGA → fanout → AM chip, through intermediate buffers.
- Output path: AM chip → FPGA, through an intermediate repeater.

Results
- Measurements performed: jitter analysis, BER, eye diagram.
- PRBS data were sent through the link and verified with a PRBS checker: BER < 10⁻¹⁴.
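For reference, a PRBS link test works as follows: the receiver runs the same linear-feedback shift register as the transmitter and counts mismatching bits. Below is a minimal software model using the short PRBS-7 sequence; the polynomial x⁷ + x⁶ + 1 is an illustrative choice, and hardware IBERT runs typically use longer sequences such as PRBS-31.

```python
# Minimal model of a PRBS-7 (x^7 + x^6 + 1) generator and checker.
# A hardware BER test works the same way in principle: the receiver
# runs an LFSR identical to the transmitter's and counts mismatches.
def prbs7(seed=0x7F):
    state = seed & 0x7F          # 7-bit LFSR state, must be non-zero
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at x^7 and x^6
        state = ((state << 1) | bit) & 0x7F
        yield bit

def bit_errors(received_bits, seed=0x7F):
    ref = prbs7(seed)
    return sum(rx != next(ref) for rx in received_bits)

# Self-check: a clean channel gives 0 errors over one 127-bit period.
clean = [b for _, b in zip(range(127), prbs7())]
assert bit_errors(clean) == 0
```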

Future evolution: integration of many FTK functions in a small form factor
- The main goal is to integrate the FTK system in a more compact form.
- The first step will be to connect an AM chip, an FPGA and a RAM on a prototype board.
- In the future these devices could be merged into a single package (AMSiP).
- The AMSiP would be the building block of the new processing unit, to be assembled on an ATCA board with new mezzanines.

Project goals: flexibility
The main target is the L1 trigger of the ATLAS detector (for the Phase-II upgrade), but during this development phase we will keep the FTK detector layout, i.e. 5 SCT and 3 pixel layers.
The final architecture should be flexible, and the resulting system easy to reprogram to target other applications (to be studied in the IAPP project), such as:
- Machine vision for embedded systems: low-power edge detection, object detection
- Medical imaging applications
- Coprocessors for various high-performance computing applications

FPGA firmware - overview
- The full-resolution hits are stored in a smart DB, while the SSID of each hit is sent to the AM.
- The AM performs the pattern recognition and outputs an ID value for each detected road (RoadID).
- The RoadIDs are decoded into per-layer SSIDs using an external RAM.
- The database retrieves all the hits for each SSID of the detected roads.
- The combiner unit computes all possible combinations of those hits to form candidate tracks.
- A very fast full-resolution fit is done for each candidate track; fits are accepted or rejected according to the χ² value.

Data Organizer

Data Organizer
- The hits are stored in the hitmem in the order in which they arrive.
- The HLP keeps track of the address of the first hit of each SSID.
- The nextmem holds, for each hit address, the location of the next hit with the same SSID.
- The hits are cloned into two dual-port memories, so reading can use 4 memory ports.
- Apart from a maximum number of hits per event, there are no other restrictions on the input.
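In effect this is one linked list per SSID threaded through the hit memory. Below is a minimal software sketch of the write/read logic the slide describes; the names hitmem/HLP/nextmem follow the slide, while the `last`-hit table is an implementation helper the slide does not name.

```python
# Sketch of the Data Organizer write/read logic described above.
class DataOrganizer:
    def __init__(self):
        self.hitmem = []     # full-resolution hits, in arrival order
        self.hlp = {}        # SSID -> address of its first hit
        self.nextmem = []    # hit address -> address of next same-SSID hit
        self.last = {}       # SSID -> address of its newest hit (helper)

    def write(self, ssid, hit):
        addr = len(self.hitmem)
        self.hitmem.append(hit)
        self.nextmem.append(None)
        if ssid in self.last:                 # extend this SSID's chain
            self.nextmem[self.last[ssid]] = addr
        else:                                 # first hit of this SSID
            self.hlp[ssid] = addr
        self.last[ssid] = addr

    def read(self, ssid):                     # walk the SSID's chain
        addr = self.hlp.get(ssid)
        while addr is not None:
            yield self.hitmem[addr]
            addr = self.nextmem[addr]

do = DataOrganizer()
for ssid, hit in [(5, "hit a"), (9, "hit b"), (5, "hit c")]:
    do.write(ssid, hit)
print(list(do.read(5)))   # ['hit a', 'hit c']
```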

Track Fitting

Track Fitting
- The track helix parameters and the χ² can be extracted from equations that are linear in the local silicon hit coordinates.
- The resolution of this linear fit is close to that of the full helical fit within a narrow region (a “sector”) of the detector; a sector consists of a single silicon module in each detector layer.
- The p_i are the helix parameters and the χ² components; the x_j are the hit coordinates in the silicon layers; the a_ij and b_i are constants stored from full simulation or from real data tracks.
- 5 SCT + 3 pixel × 2 = 11 hit coordinates, so the fit maps an 11-dimensional coordinate space onto a 5-dimensional track surface.
- Using FPGA DSPs, very good performance can be achieved.
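Written out in the slide's notation, this reads as below; the index split is a reconstruction, since with 11 coordinates and 5 helix parameters there are 11 - 5 = 6 χ² components.

```latex
p_i = \sum_{j=1}^{11} a_{ij}\, x_j + b_i , \qquad i = 1, \dots, 11,
\qquad\qquad
\chi^2 = \sum_{i=6}^{11} p_i^{2}
```

Here p_1 … p_5 are the five helix parameters and p_6 … p_11 are the residual-like χ² components.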

Track Fitting - Combiner
- The combiner forms every candidate track by picking one hit per layer inside the road; each candidate is fitted, the best fit is selected, and the others are discarded.
[Figure: visualization of the combiner function on an example road.]
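A minimal sketch of that enumeration and selection for one road; the hit values and the stand-in χ² are invented, whereas the real TF evaluates the linear equations of the previous slide.

```python
# Sketch of combiner + best-fit selection for one road (hit values and
# the stand-in chi^2 are invented).
from itertools import product

road_hits = [[1.0], [2.0, 2.4], [3.1], [4.0, 4.6]]   # hits per layer
chi2 = lambda c: sum((c[i] - (c[0] + i)) ** 2 for i in range(len(c)))

candidates = list(product(*road_hits))   # 1*2*1*2 = 4 hit combinations
best = min(candidates, key=chi2)
print(best)   # the one kept track: (1.0, 2.0, 3.1, 4.0); the rest are dropped
```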

Track Fitting - Core
- A very fast FPGA implementation of the fitter was developed.
- All multiplications are executed in parallel, giving 1 fit per clock cycle.
- Using dedicated DSP resources, the fitter runs at 550 MHz.
- 4 such fitters run in parallel in the device.

FPGA implementation
- The main components of the design are already implemented.
- Placement on the device was performed to estimate the truly achievable clock rates and the power dissipation.
- The target device is a Kintex-7 XC7K325T-FFG900-3.
To achieve high clock rates, careful RTL coding and advanced design techniques must be followed:
- Pipelining of the control signals and memory buses
- Careful fan-out control for many signals
- Manual CLB resource instantiation
- Manual floorplanning of the key components
- Utilization of dedicated routing wherever possible
- Local clock buffers for the 550 MHz areas

FPGA implementation (device floorplan view)
[Figure: device floorplan highlighting the Track Fitter instances and the local clock routing for the Track Fitters.]

Device utilization & power
- The 15.5 W power estimate is an absolute worst-case figure.
- Simple improvements to the design are expected to reduce it by 15%.
- Migrating the design to a newer Xilinx UltraScale device is expected to bring an additional 10-15% reduction.

Speed and latency (all rates are per second)
- @550 MHz: 450-1800 Mhits per layer
- @450 MHz: 2200 Mfits; 50 MRoads
- External RAM: 576 Mbit RLDRAM3, 57.6 Gb/s bandwidth, 10 ns tRC

Speed and minimum latency
- The figures represent latency from the last incoming hit to the first output track: stage latencies of ~10 ns, ~40 ns, ~50 ns and ~70 ns, summing to ~0.17 µs, for a total system latency (from last hit to first computed parameters) of <0.3 µs.
- Speed and latency are data dependent; further system simulation is needed for precise figures.
- There are ideas to further improve performance, if needed.

Sample event processing time
- A typical event has 500 hits/layer, and the AM is assumed to produce 50 roads.
- 4 layers are assumed to have 1 hit/road and the other 4 to have 2 hits/road each, i.e. 2⁴ = 16 hit combinations per road.
Event processing time with the current AMChip:
- 5 µs (SSIDs into the AM)
- 1 µs (roads out of the AM)
- >0.36 µs (processing time for all the roads, which overlaps with the road read-out)
- 0.17 µs (latency through RAM, DO, combiner and TF)
- Total: 6.17 µs, close to the 6 µs limit for L1 tracking.
Event processing time after a reasonable AMChip upgrade:
- 2.5 µs (SSIDs into the AM)
- 0.25 µs (roads out of the AM)
- <0.36 µs (processing time for all the roads)
- Total: ~3 µs, half the L1 limit, leaving lots of headroom for bigger events.
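The 0.36 µs figure follows directly from the stated rates; a quick check, with the numbers taken from this slide and the speed slide:

```python
# Quick check of the 0.36 us figure from the rates quoted on the slides.
roads = 50
combos_per_road = 1**4 * 2**4     # 4 layers with 1 hit, 4 with 2 -> 16
fits = roads * combos_per_road    # 800 fits in this event
fit_rate = 2200e6                 # fits/s, from the speed slide
print(f"{fits / fit_rate * 1e6:.2f} us")   # -> 0.36 us
```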

Conclusions
- The main functions are already implemented in the FPGA device.
- The speed and latency figures of the system are promising.
- Simulation data for the L1 tracker is essential for proper testing and evaluation of the system speed.
- The versatility of the design allows quick modifications to upgrade performance in case the speed is not sufficient.

Thank you!