Future evolution of the Fast TracKer (FTK) processing unit C. Gentsos, Aristotle University of Thessaloniki FTK 324318 FP7-PEOPLE-2012-IAPP FTK executive.


Future evolution of the Fast TracKer (FTK) processing unit. C. Gentsos, Aristotle University of Thessaloniki. FTK FP7-PEOPLE-2012-IAPP, FTK executive board, 21/7/2014

Presentation Overview
- Key Fast TracKer (FTK) components
- Goal of my work
- FPGA firmware
- FPGA device utilization and power
- System latency and processing speed
- Progress

Project goals: integration of many FTK functions in a small form factor
- The main goal is to integrate the FTK system in a more compact form
- The first step will be to connect an AMChip, an FPGA and a RAM on a prototype board
- In the future the devices could be merged into a single package (AMSiP)
- The AMSiP will be the building block of the new processing unit, to be assembled in an ATCA board with new mezzanines

Project goals: flexibility
- The main target will be the ATLAS detector L1 trigger (for the Phase II upgrade), but for this development phase we will keep the FTK detector layout, that is, 5 silicon strip detector layers and 3 pixel detector layers
- The latency requirements are very demanding, leaving just 8 μs for the L1 tracking
- The final architecture should be flexible, and the resulting system easy to reprogram to target other applications (to be studied in the IAPP project), such as:
  - Machine vision for embedded systems: low-power edge detection, object detection
  - Medical imaging applications
  - Coprocessor for various High Performance Computing applications

FPGA firmware - overview
- Full-resolution hits are stored in a smart DB, while the SuperStrip ID of each hit is sent to the AM
- The AM performs pattern recognition and returns the ID value for each road (RoadID)
- The RoadIDs are decoded to SuperStrip IDs for each layer, using an external RAM
- The database retrieves all hits for each SuperStrip ID of the detected roads
- The Combiner unit computes all possible combinations of the hits to form tracks
- A very fast full-resolution fit is done for each possible track; fits are accepted or rejected according to their χ² value
[Figure: firmware block diagram, divided into a write part and a read part]

Data Organizer

- Hits are stored sequentially in the hitmem, regardless of the order in which they arrive
- Hits are cloned in two dual-port memories, so reading uses 4 memory ports
- The only restriction on the input is a maximum number of hits per event
- The nextmem holds, for each hit address, the location of the next hit with the same SuperStrip ID
- The HLP keeps track of the address of the first hit of each SuperStrip ID
[Figure: Data Organizer memories, write part and read part]
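The hitmem/nextmem/HLP arrangement described above behaves like a set of linked lists sharing one sequential memory. The following is a minimal software sketch of that idea, not the firmware itself. The class name, the `NULL` sentinel and the method names are hypothetical, and the `lastmem` table is an assumption used here to append in constant time (a later slide mentions a lastmem that the improved design removes).

```python
NULL = -1  # hypothetical sentinel meaning "no hit stored"


class DataOrganizer:
    """Software model of the hit storage: hits kept in arrival order,
    chained per SuperStrip ID (SSID) through a next-pointer memory."""

    def __init__(self, max_hits, n_superstrips):
        self.max_hits = max_hits                 # only restriction on the input
        self.hitmem = []                         # hits, in arrival order
        self.nextmem = []                        # next hit address with same SSID
        self.hlp = [NULL] * n_superstrips        # first hit address per SSID
        self.lastmem = [NULL] * n_superstrips    # last hit address per SSID (assumed)

    def write(self, ssid, hit):
        if len(self.hitmem) >= self.max_hits:
            raise OverflowError("maximum number of hits per event exceeded")
        addr = len(self.hitmem)                  # sequential write address
        self.hitmem.append(hit)
        self.nextmem.append(NULL)
        if self.hlp[ssid] == NULL:
            self.hlp[ssid] = addr                # first hit of this SSID
        else:
            self.nextmem[self.lastmem[ssid]] = addr  # link from previous hit
        self.lastmem[ssid] = addr

    def read(self, ssid):
        """Follow the chain and return all hits of one SSID."""
        hits = []
        addr = self.hlp[ssid]
        while addr != NULL:
            hits.append(self.hitmem[addr])
            addr = self.nextmem[addr]
        return hits


do = DataOrganizer(max_hits=8, n_superstrips=4)
for ssid, hit in [(2, "a"), (0, "b"), (2, "c"), (3, "d"), (2, "e")]:
    do.write(ssid, hit)
```

Reading SuperStrip 2 in this toy event walks the chain a, c, e even though other hits arrived in between, which is the property that lets the input arrive in any order.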

Data Organizer – latest improvements
- The HLP width is increased to 320 bits, giving access to ranges of 32 memory locations
- In this way we can check for data on groups of 8 SuperStrips in parallel for DC (don't-care) bits
- At the same time, the BRAM arrangement is more compact, requiring 10% fewer resources
[Figure: Data Organizer memories, write part and read part]

Data Organizer – latest improvements
- The HLP function also changes: it now keeps the location of the last hit in the hitmem, eliminating the need for a lastmem
- The freed-up BRAMs are put to use to make the reading rate data-independent
[Figure: Data Organizer memories, write part and read part]

Track Fitting

Track Fitting
- Track helix parameters and χ² can be extracted from linear equations in the local silicon hit coordinates
- The resolution of the linear fit is close to that of the full helical fit in a narrow region (sector) of the detector
- The p_i are the helix parameters and χ² components
- The x_j are the hit coordinates in the silicon layers
- The a_ij and b_i are stored constants from full simulation or real data tracks
- The range of the linear fit is a "sector", which consists of a single silicon module in each detector layer
- Using FPGA DSPs, very good performance can be achieved
- 5 SCT + 3 pixel × 2 = 11 coordinates: an 11-coordinate space is mapped to a 5-dimensional surface
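The slide's equation did not survive in the transcript. From the symbol definitions it has the following linear form, which is a reconstruction rather than a copy of the slide. The upper limit of 11 follows from the coordinate count above (5 SCT + 3 pixel × 2).

```latex
p_i = \sum_{j=1}^{11} a_{ij}\, x_j + b_i
```

Each output p_i is therefore one scalar product plus a constant, which is why the fit maps well onto FPGA DSP blocks.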

Track Fitting - Combiner
- Visualization of the combiner function on an example road
- The best fit is selected; the others are discarded
[Figure: hit combinations within an example road]
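The combiner's job can be illustrated in software as a Cartesian product over the hits found in each layer of a road, taking one hit per layer. This is only a sketch of the combinatorics, assuming each road delivers a list of hits per layer; the function name `combine` and the hit labels are hypothetical.

```python
from itertools import product


def combine(road_hits):
    """Return every candidate track of a road: one hit chosen from each layer.

    road_hits is a list with one entry per layer, each entry being the
    list of hits found in that layer's SuperStrip.
    """
    return list(product(*road_hits))


# Toy road with 4 layers: two layers hold a single hit, two hold two hits.
road = [["a0"], ["b0", "b1"], ["c0"], ["d0", "d1"]]
candidates = combine(road)

# The sample event used later in the talk: 4 layers with 1 hit per road
# and 4 layers with 2 hits per road.
sample_road = [[0]] * 4 + [[0, 1]] * 4
n_sample_fits = len(combine(sample_road))
```

The number of candidates is the product of the per-layer hit counts, so the toy road yields 4 candidate tracks and the sample road yields 16.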

Track Fitting: Scalar Product Calculation Pipeline
- A very fast FPGA implementation was developed for the fitter
- All multiplications are executed in parallel, giving 1 fit per clock
- Using dedicated DSP resources, the fitter runs at 550 MHz
- 4 such fitters run in parallel in the device
[Figure: DSP-based scalar product pipeline]
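What each fitter computes per clock can be sketched in software as a set of scalar products between stored constants and the hit coordinates. The sketch below assumes that the first `n_params` outputs are helix parameters, that the remaining outputs are χ² components, and that χ² is the sum of their squares. The function name and the small toy constants are hypothetical and do not come from the talk.

```python
def linear_fit(x, a, b, n_params):
    """Evaluate p_i = sum_j a[i][j] * x[j] + b[i] for every output i.

    Returns (helix_parameters, chi2), where chi2 is taken to be the sum of
    the squared chi-square components (an assumption of this sketch).
    """
    p = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(a, b)]
    params, chi_components = p[:n_params], p[n_params:]
    chi2 = sum(c * c for c in chi_components)
    return params, chi2


# Toy constants: 3 coordinates, 2 "parameters", 1 chi-square component.
a = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [1.0, -2.0, 1.0]]   # vanishes when the three coordinates are collinear
b = [0.5, -1.0, 0.0]

good_params, good_chi2 = linear_fit([1.0, 2.0, 3.0], a, b, n_params=2)
bad_params, bad_chi2 = linear_fit([1.0, 2.0, 4.0], a, b, n_params=2)
```

In hardware all the multiplications of one fit run in parallel, so with 4 fitters at 550 MHz and 1 fit per clock the peak rate is 4 × 550 = 2200 Mfits/s, which matches the figure quoted on the speed slide.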

FPGA implementation
- The main components of the design have already been implemented
- The design has been placed on the device to estimate the clock rates that can actually be achieved and the power dissipation
- The target device is a 28 nm Xilinx 7-series XC7K325T-900ffg-3
- The clock frequencies we are targeting are close to the actual limits of the device
- To achieve such clock rates, many coding guidelines and advanced design techniques must be followed

FPGA implementation (device floorplanning view)
[Figure: device floorplan showing the Track Fitter instances and the local clock routing for the Track Fitters]

Device utilization & power
- The power estimate of 15.5 W is the absolute worst-case figure
- Simple power optimization moves and migration to a newer 20 nm device family are expected to reduce this by 30% or more

Speed and latency
[Figure labels: 100 MHz input bus; 50 MHz output bus; MHz DDR]

Speed and latency
[Figure labels, units per second: 225 MHits per layer; 50 MRoads; 2200 Mfits; Mhits per layer; MBit RLDRAM3, 57.6 Gb/s, 10 ns tRC]
- Total speed will be data dependent; further system simulation is needed for precise figures

Speed and minimum latency
[Figure labels: ~40 ns, ~10 ns, ~50 ns, ~70 ns]
- Figures represent latency from last incoming hit to first output track
- Minimum system latency (from last hit to first computed parameters): <0.3 μs
- There are ideas to further improve performance, if needed

Sample event processing time
- In a typical event there are 500 hits/layer
- 50 roads are assumed to be produced by the AM
- 4 layers are assumed to have 1 hit/road, the remaining 4 to have 2 hits/road each
Event processing time with the current AMChip (100 MHz input)
- 5 μs (SSIDs to AM)
- 1 μs (roads from AM) > 0.36 μs (processing time for all the roads)
- 0.17 μs (latency from RAM, DO, Combiner, TF)
- Total: 6.17 μs, which is less than the 8 μs considered to be the limit for L1 tracking
Event processing time after a reasonable AMChip upgrade (200 MHz input)
- 2.5 μs (SSIDs to AM)
- 0.25 μs (roads from AM) < 0.36 μs (processing time for all the roads)
- 0.17 μs (latency from RAM, DO, Combiner, TF)
- Total: 3 μs, which is less than half the L1 limit, leaving lots of headroom for bigger events
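The two totals can be reproduced with a short calculation. The sketch rests on an assumption: road readout and road processing overlap, so only the longer of the two is counted. Both quoted totals are consistent with that reading. The road output rates are also inferred rather than stated on this slide: 50 MHz from the "50 MHz output bus" label, and 200 MHz for the upgraded chip from the quoted 0.25 μs for 50 roads. The function and parameter names are hypothetical.

```python
def event_time_us(hits_per_layer, n_roads, fits_per_road,
                  am_input_mhz, road_output_mhz,
                  fit_rate_mfits=2200.0, pipeline_latency_us=0.17):
    """Estimate the event processing time in microseconds.

    A count divided by a rate in MHz (or Mfits/s) gives microseconds.
    """
    ssid_time = hits_per_layer / am_input_mhz          # SSIDs to AM
    road_time = n_roads / road_output_mhz              # roads from AM
    fit_time = n_roads * fits_per_road / fit_rate_mfits  # fitting all roads
    # Road readout and fitting are assumed to overlap: count the longer one.
    return ssid_time + max(road_time, fit_time) + pipeline_latency_us


# 4 layers with 1 hit/road and 4 layers with 2 hits/road.
fits_per_road = 1 ** 4 * 2 ** 4

current = event_time_us(500, 50, fits_per_road,
                        am_input_mhz=100.0, road_output_mhz=50.0)
upgraded = event_time_us(500, 50, fits_per_road,
                         am_input_mhz=200.0, road_output_mhz=200.0)
```

With 16 fits per road, the 50 roads need 800 fits, which take about 0.36 μs at 2200 Mfits/s. That is hidden behind the 1 μs road readout in the current case and becomes the dominant term after the upgrade.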

Progress
- The Data Organizer and Track Fitter are already implemented in the FPGA device
- Once the latest Data Organizer improvements are complete, the Combiner and the external memory interface will be implemented
- The speed and latency figures of the system are promising
- After design completion, testing on the prototype board will follow

Thank you!

Backup
- Backup slides; many more to be added

FPGA implementation
- The Data Organizers are more spread out across the device

FPGA implementation
- A shortlist of design techniques necessary for such a high-speed implementation:
  - Pipelining of control signals and memory buses
  - Careful fan-out control for many signals
  - Manual device resource instantiation
  - Manual floorplanning of key components
  - Utilization of dedicated routing wherever possible
  - Local clock buffers for 550 MHz areas

FPGA firmware
[Figure: more detailed overview of the design, showing the serial transceivers and more details of the DO]

DO function

Latency, processing speed
- Writing 1k hits to the DO takes 4.5 μs
- Forwarding 100 roads' worth of hits to the Track Fitter takes another ~ μs (data dependent)
- Because of its high operating frequency, the TF itself does not add any noticeable latency; its maximum processing speed would be close to 2.4 Gfits/s if it were utilized 100% of the event time
- There are ideas on how to improve all those numbers if needed
- Simulations will be done to show whether the current performance is sufficient

Latency, processing speed
- There are ideas on how to improve all those numbers if needed
- Just migrating to an UltraScale device can increase the performance, by exploiting even more parallelism and higher clock rates
- Simulations have to be done to show whether the current performance is sufficient, but the figures seem good for now