Published by Theodora French. Modified over 8 years ago.
1
Producing FPGA Firmware - 1
U. Wisconsin, February 19, 2009

Calorimeter Algorithm Firmware
Calorimeter Trigger Upgrade Firmware

Michael Schulte, Katherine Compton, Tony Gregerson, Ben Buchli, and Amin Farmahini-Farahani
U. Wisconsin - Madison
February 19, 2009

In collaboration with Wesley Smith, Sridhara Dasu, Michail Bachtis, Kevin Flood, Tom Gorski, David Hinkemeyer, Shuvra Bhattacharyya, William Plishker, George Zaki, Nimish Sane, and Soujanya Kedilaya
2
Introduction

- Motivation and Goals
- Design Platform and Methodology
- Preliminary Designs and Results
  - Input RocketIO and Input Buffering
  - Particle Cluster Finder
  - Cluster Overlap Filter
- Planned Implementation on the Calorimeter Trigger Prototype
- Planned Tools and Techniques
3
Motivation and Goals

- The upgraded Calorimeter Trigger will require new algorithms
- Modern FPGAs provide efficient platforms for these algorithms
- Implement the Calorimeter Trigger using
  - A unified design platform
  - Unified design and test methodologies
  - Techniques that facilitate future upgrades
- Start by implementing a baseline design for the new algorithms
4
Initial Design Platform

- Xilinx Virtex-5 devices contain
  - Virtex-5 Slices (4 LUTs and 4 flip-flops)
  - DSP48E Slices (multiplier, adder, and accumulator)
  - Block RAM (36 Kbits)
  - RocketIO Transceivers
    - GTP transfers up to 3.75 Gbps
    - GTX transfers up to 6.50 Gbps
- Initial designs synthesized for Xilinx Virtex-5 LX110T and TX240T FPGAs

FPGA     Virtex-5 Slices   DSP48E Slices   Block RAM (Kbits)   RocketIO Transceivers
LX110T   17,280            64              5,328               16 GTP
TX240T   37,440            96              11,664              48 GTX
5
Initial Design Methodology

- Designs start with the algorithms
  - Physicists and engineers collaborate
  - Evaluate algorithm/implementation tradeoffs
- Designs specified using VHDL, Verilog, and Xilinx Core Generator
- Designs implemented and tested using
  - Xilinx ISE v10.1
  - ModelSim Xilinx Edition v6.3
- Gather results for input RocketIO, input buffering, particle cluster finder, and cluster overlap filter
6
RocketIO and Buffering

- Our initial design on TX240T FPGAs uses Xilinx's Aurora protocol for RocketIO inputs
- Each GTX Dual Tile de-serializes 2 x 8 x 16 = 256 bits every 25 ns
- 16 16-bit registers store data for 15 towers for 25 ns

[Figure: GTX Dual Tile input path. Two serial RocketIO tower inputs each fill 8 16-bit registers; these feed 16 16-bit registers that hold ECAL/HCAL Et[0] through Et[14] and the ECAL finegrain bits, which form the cluster inputs to the Particle Cluster Finder. Stage labels: RocketIO, Input Buffers, Cluster Input. Clock labels: Ref. Clock (640 MHz), Ref. Clock/2 (320 MHz), Ref. Clock/16 (40 MHz).]
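The bit budget on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative Python written for this transcript, not part of the firmware. The constant names are invented here, and the 17-bit tower width is taken from the Particle Cluster Finder input description.

```python
# Bit budget for one GTX Dual Tile per 25 ns bunch crossing.
LINKS_PER_DUAL_TILE = 2   # two serial RocketIO inputs per dual tile
WORDS_PER_LINK = 8        # 8 16-bit registers per link
BITS_PER_WORD = 16
CROSSING_NS = 25

# Total bits de-serialized by one dual tile in one crossing.
bits_per_crossing = LINKS_PER_DUAL_TILE * WORDS_PER_LINK * BITS_PER_WORD

# Payload actually needed: 15 towers of 17 bits each.
tower_payload_bits = 15 * 17

# Bits per nanosecond is numerically equal to Gbit/s.
payload_gbps_per_link = WORDS_PER_LINK * BITS_PER_WORD / CROSSING_NS

print(bits_per_crossing, tower_payload_bits, payload_gbps_per_link)
```

Under these numbers the 15 towers occupy 255 of the 256 available bits, and the per-link payload rate (before any line-coding overhead) stays below the 6.50 Gbps GTX limit quoted on the platform slide.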
7
RocketIO and Buffering

- Each pair of RocketIO links provides 17-bit input data for 15 towers every 25 ns
  - A 10 x 10 grid requires 14 RocketIO links
  - A 17 x 17 grid requires 40 RocketIO links

Virtex-5 Resource Utilization for RocketIO and Input Buffering on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
RocketIO Links    29%            83%
Virtex-5 Slices   3%             8%
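The link counts and the RocketIO utilization figures follow from the 15-towers-per-pair rate and the 48 GTX transceivers on the TX240T. The sketch below reproduces them in Python. It assumes links are allocated in whole pairs and that a partly filled pair still counts as two links; the function names are made up for illustration.

```python
import math

TOWERS_PER_LINK_PAIR = 15   # from the slide: one pair of links carries 15 towers
GTX_LINKS_ON_TX240T = 48    # from the Initial Design Platform table


def rocketio_links(grid_side: int) -> int:
    """Links needed for a grid_side x grid_side tower grid (assumes whole pairs)."""
    towers = grid_side * grid_side
    return 2 * math.ceil(towers / TOWERS_PER_LINK_PAIR)


def link_utilization_percent(grid_side: int) -> int:
    """Share of the TX240T's GTX links used, rounded to a whole percent."""
    return round(100 * rocketio_links(grid_side) / GTX_LINKS_ON_TX240T)


for side in (10, 17):
    print(side, rocketio_links(side), link_utilization_percent(side))
```

For the two grid sizes on the slide this gives 14 and 40 links, or 29% and 83% of the available GTX links.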
8
Particle Cluster Finder

- Process data in 2x2 clusters of towers
- Inputs: 17 bits per tower (4 x 17 bits)
  - 8 ECAL Et bits
  - 8 HCAL Et bits
  - 1 ECAL finegrain bit
- Algorithm is applied on overlapping clusters
  - Step of one tower
- Identify if cluster contains "useful" particle energy
  - Eliminate some noise
- Detect particle type

[Figure: Particle Cluster Finder dataflow. A 2x2 tower cluster (4 x 17 bits, 17 bits per tower) feeds a Threshold stage and a Pattern Comparator (threshold pattern decision), a Finegrain OR check (1 bit), the EPIM (1 bit), and the Tower Energy Sums (4 x 9 = 36 bits). On a pattern match the 38-bit result is output; otherwise Zero (38 bits) is output.]
9
Algorithm

- Input tower data
- Apply threshold
  - Boolean result, single bit per tower
- Compare Boolean tower pattern to stored patterns
  - No match: output 38 zeros
  - Match: output 38 bits
    - OR of the finegrain bits
    - e/γ compatibility bit
    - Energy sums for the 4 towers (4 x 9 bits, E+H)

[Figure: the same Particle Cluster Finder dataflow diagram as on the previous slide.]
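The steps above can be sketched in software as follows. This is a minimal Python model, not the VHDL/Verilog design. Several details are assumptions made only for illustration: the threshold is applied to each tower's E+H sum, the threshold value and the set of accepted patterns are placeholders, the bit positions inside the 38-bit word are invented, and the e/γ bit is passed in rather than computed.

```python
from typing import NamedTuple, Sequence


class Tower(NamedTuple):
    ecal: int       # 8-bit ECAL Et
    hcal: int       # 8-bit HCAL Et
    finegrain: int  # 1-bit ECAL finegrain flag


# Hypothetical settings; the slides give neither the threshold nor the patterns.
TOWER_THRESHOLD = 2
ALLOWED_PATTERNS = frozenset(range(1, 16))  # placeholder: any non-empty pattern


def cluster_word(towers: Sequence[Tower], egamma_bit: int) -> int:
    """Return the 38-bit output word for one 2x2 cluster (0 if no pattern match)."""
    assert len(towers) == 4
    sums = [t.ecal + t.hcal for t in towers]  # 9-bit E+H sum per tower

    # One Boolean per tower: does its energy pass the threshold?
    pattern = 0
    for i, s in enumerate(sums):
        if s > TOWER_THRESHOLD:
            pattern |= 1 << i

    if pattern not in ALLOWED_PATTERNS:
        return 0  # no match: 38 zeros

    finegrain_or = int(any(t.finegrain for t in towers))
    word = 0
    for i, s in enumerate(sums):
        word |= s << (9 * i)          # bits 0..35: four 9-bit energy sums
    word |= finegrain_or << 36        # bit 36: OR of the finegrain bits (assumed position)
    word |= (egamma_bit & 1) << 37    # bit 37: e/gamma compatibility bit (assumed position)
    return word
```

The word width works out because each E+H sum of two 8-bit values fits in 9 bits, so four sums plus the two flag bits give 4 x 9 + 2 = 38 bits.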
10
Electron/Photon Identification

- The electron/photon identification module (EPIM)
  - Is the most complex module in the particle cluster finder
  - Currently sets the e/γ compatibility bit if [condition not captured in this transcript]
- Various implementations were investigated
  - Multiplier based: can easily change Egamma_Threshold
  - Static tables: reconfigure FPGA to change EPIM algorithm
  - Dynamic tables: change EPIM algorithm by reloading table
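The difference between the multiplier-based and table-based options can be illustrated with a small Python sketch. The condition the EPIM actually evaluates is not recoverable from this transcript, so `placeholder_predicate` below is a stand-in chosen only to have something to tabulate; it should not be read as the slide's formula. The function names and the table indexing are likewise hypothetical.

```python
def placeholder_predicate(ecal: int, hcal: int, egamma_threshold: int) -> int:
    """HYPOTHETICAL condition using one multiply; not the EPIM's real formula."""
    return int(hcal * egamma_threshold <= ecal)


def build_table(egamma_threshold: int) -> bytes:
    """Precompute the predicate for every pair of 8-bit (ecal, hcal) inputs."""
    return bytes(
        placeholder_predicate(e, h, egamma_threshold)
        for e in range(256)
        for h in range(256)
    )


def lookup(table: bytes, ecal: int, hcal: int) -> int:
    """Table-based evaluation: a single indexed read, no arithmetic."""
    return table[(ecal << 8) | hcal]


# "Dynamic table" style update: change behaviour by loading a new table.
table = build_table(egamma_threshold=4)
print(lookup(table, 40, 10))
table = build_table(egamma_threshold=8)
print(lookup(table, 40, 10))
```

In this model the multiplier form changes behaviour by changing one parameter, while the table form changes behaviour by replacing the table contents, which mirrors the tradeoff listed on the slide between changing Egamma_Threshold and reloading or reconfiguring a table.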
11
Cluster Particle Finder

Resource Usage for a Single EPIM on TX240T FPGA

Category         Type              Primary Resource   Slice Register Usage   Slice LUT Usage   BRAM Usage   DSP Usage
Multipliers      DSP-Based         DSP Block          0.02%                                    ---          1.0%
Multipliers      LUT-Based         Logic Slice        0.08%                  0.04%             ---          ---
Multipliers      Hybrid            DSP Block          0.01%                                    ---          4.2%
Static Tables    LUT Tree          Logic Slice        0.01%                  14.6%             ---          ---
Static Tables    Distributed ROM   Logic Slice        0.01%                  0.12%             ---          ---
Dynamic Tables   Full              BRAM               0.01%                  0.02%             9.8%         ---
Dynamic Tables   Partial           BRAM               0.01%                  0.02%             0.62%        ---
12
Cluster Particle Finder

Frequencies and maximum grid sizes for Particle Cluster Finder on TX240T FPGA

Category         Type              Max Freq. (MHz)   Actual Freq. (MHz)   Max EPIMs   Max Grid (w/o I/O)   Max Grid (w/ I/O)
Multipliers      DSP-Based         370               200                  96          22x22                17x17
Multipliers      LUT-Based         440               200                  1060        73x73                17x17
Multipliers      Hybrid            320               200                  24          11x11
Static Tables    LUT Tree          88                80                   5           4x4
Static Tables    Distributed ROM   270               200                  630         57x57                17x17
Dynamic Tables   Full              390               200                  10          8x8
Dynamic Tables   Partial           450               200                  162         29x29                17x17
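The slide does not say how the maximum grid sizes were derived. One reading that reproduces every "Max Grid (w/o I/O)" entry is sketched below in Python, and it is an assumption rather than the authors' stated method: each EPIM is reused (actual frequency / 40 MHz) times per 25 ns crossing, and an n x n tower grid scanned with a step of one tower contains (n - 1)^2 overlapping 2x2 clusters.

```python
import math

BUNCH_CROSSING_MHZ = 40  # new tower data arrives every 25 ns


def max_grid_side(max_epims: int, actual_freq_mhz: int) -> int:
    """Largest n such that (n - 1)^2 clusters fit in one crossing (assumed model)."""
    reuse_per_crossing = actual_freq_mhz // BUNCH_CROSSING_MHZ
    clusters_per_crossing = max_epims * reuse_per_crossing
    return math.isqrt(clusters_per_crossing) + 1


rows = {
    "DSP-Based": (96, 200),
    "LUT-Based": (1060, 200),
    "Hybrid": (24, 200),
    "LUT Tree": (5, 80),
    "Distributed ROM": (630, 200),
    "Full": (10, 200),
    "Partial": (162, 200),
}
for name, (epims, freq) in rows.items():
    side = max_grid_side(epims, freq)
    print(f"{name}: {side}x{side}")
```

Under this model the larger designs are limited by I/O rather than logic, which is consistent with the 17x17 cap in the last column and with the 40 RocketIO links a 17 x 17 grid requires.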
13
Particle Cluster Finder

- Synthesized for a 200 MHz clock (5 ns cycle time)
- Latency of nine cycles (45 ns @ 200 MHz)

Resource utilization for Particle Cluster Finder with Partial Dynamic Tables on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
Virtex-5 Slices   12%            39%
BRAMs             19%            53%
14
Cluster Overlap Filter

- Applied on clusters produced by the Particle Cluster Finder
- Ensure that a tower only "belongs" to a single cluster
- Input: 9 clusters
  - A central cluster
  - The 8 neighboring clusters
- Determine to which cluster each tower should belong
  - Keep towers in clusters with the most energy
  - Prune towers from other clusters

[Figure: a central cluster surrounded by its eight neighbor clusters (NE, E, SE, S, SW, W, NW, N), 38 bits per input. Legend: central cluster, neighbor cluster, pruned tower, cluster origin (holds all cluster info).]
15
Algorithm

- For each "central" cluster,
  - Consider each neighbor
  - If central Et < neighbor Et, neighbor cluster is "stronger"
    - Remove overlapping towers from central cluster
  - Otherwise central cluster is "stronger"
    - Remove overlapping towers from neighbor
  - If no towers removed from central cluster, set its "central" bit
- Next apply threshold to cluster energy
- Output: 14 bits
  - 11 bits of cluster energy, 1 finegrain bit, 1 e/γ bit, 1 central bit

[Figure: the same central/neighbor cluster diagram as on the previous slide.]
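The pruning rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the firmware. Clusters are identified by the (dx, dy) offset of their origin relative to the central cluster, with no mapping to compass labels assumed. The comparison uses the strict "<" from this slide. Only the central cluster's towers are pruned here, since each neighbor is handled when it takes its own turn as the central cluster. A cluster that fails the final threshold is assumed to report zero energy. The names `filter_central` and `FilteredCluster` are invented for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple

Offset = Tuple[int, int]

# Tower positions inside a 2x2 cluster, relative to the cluster origin.
CELLS: List[Offset] = [(0, 0), (1, 0), (0, 1), (1, 1)]


class FilteredCluster(NamedTuple):
    energy: int     # fits in 11 bits: sum of up to four 9-bit tower energies
    finegrain: int
    egamma: int
    central: int    # 1 if no tower was removed from the central cluster


def filter_central(
    central: List[int],
    neighbors: Dict[Offset, List[int]],
    finegrain: int,
    egamma: int,
    cluster_threshold: int,
) -> FilteredCluster:
    """Prune central-cluster towers that overlap a stronger neighbor."""
    central_et = sum(central)
    kept = list(central)
    towers_removed = False
    for (dx, dy), neighbor in neighbors.items():
        if central_et < sum(neighbor):  # neighbor is "stronger"
            for i, (x, y) in enumerate(CELLS):
                # Central tower (x, y) also lies inside the neighbor cluster.
                if (x - dx, y - dy) in CELLS:
                    kept[i] = 0
                    towers_removed = True
    energy = sum(kept)
    if energy <= cluster_threshold:  # assumed: failing clusters report zero
        energy = 0
    return FilteredCluster(energy, finegrain, egamma, int(not towers_removed))
```

For example, a central cluster of four towers at 10 each next to a neighbor at offset (1, 0) with four towers at 20 each loses its two right-hand towers, so its energy drops from 40 to 20 and its central bit is cleared.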
16
Cluster Overlap Filter Design

[Figure: block diagram of the Cluster Overlap Filter. Each of the eight neighbor clusters (NE, E, SE, S, SW, W, NW, N) and the central cluster supplies a tower bit sequence (4 x 9 bits: E1, E2, E3, E4) to an Energy Adder that produces an 11-bit sum. Eight comparators produce 1 bit each: Central < NE, Central <= E, Central <= SE, Central <= S, Central <= SW, Central < W, Central < NW, Central < N. A final Energy Adder (E1+E2+E3+E4) feeds the Cluster Threshold (E > X?). Outputs: energy (11 bits), finegrain and e/γ (2 bits), central (1 bit).]
17
Cluster Overlap Filter Results

- Synthesized for a 200 MHz clock (cycle time of 5 ns)
- Latency of five cycles (25 ns @ 200 MHz)
  - Operates in parallel with EPIM
- No DSP48E or Block RAM resources needed

Virtex-5 Slice Utilization for Cluster Overlap Filter

FPGA     10 x 10 Grid   17 x 17 Grid
LX110T   18%            58%
TX240T   8%             27%
18
Latency Estimates

- Estimated latencies are given in the table below
  - Clock rate of 200 MHz (cycle time of 5 ns)
  - Cluster Overlap Filter operated in parallel with part of Particle Cluster Finder

Estimated Latencies on TX240T FPGAs

Component                            Latency (cycles)   Latency (ns)
Input RocketIO                       10                 50
Input Buffers                        5                  25
Particle Finder and Overlap Filter   9                  45
Total Estimated Latency              24                 120
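As a quick consistency check, the total in the table is the sum of the component latencies at 5 ns per cycle. The short Python sketch below uses only the numbers on this slide; the variable names are illustrative.

```python
CYCLE_NS = 5  # 200 MHz clock

latency_cycles = {
    "Input RocketIO": 10,
    "Input Buffers": 5,
    "Particle Finder and Overlap Filter": 9,
}

# Per-component latency in nanoseconds, then the overall totals.
latency_ns = {name: cycles * CYCLE_NS for name, cycles in latency_cycles.items()}
total_cycles = sum(latency_cycles.values())
total_ns = total_cycles * CYCLE_NS

print(latency_ns, total_cycles, total_ns)
```

The combined finder and filter entry equals the nine-cycle Particle Cluster Finder latency on its own, which matches the statement that the five-cycle overlap filter runs in parallel with part of the finder.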
19
Overall Resource Estimates

- Estimated resources are given in the table below
  - Includes input RocketIO, input buffers, particle finder, and overlap filter
- Additional grid sizes and FPGA devices should be considered

Overall Resource Utilization on TX240T FPGA

Resource          10 x 10 Grid   17 x 17 Grid
RocketIO Links    29%            83%
Virtex-5 Slices   23%            74%
Block RAMs        19%            53%
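The slice totals agree with the per-component tables on the earlier slides. The Python sketch below adds up the TX240T slice percentages reported for each block; it takes the Particle Cluster Finder figures from the Partial Dynamic Tables variant, and the dictionary keys are just labels chosen here.

```python
# (10 x 10 grid, 17 x 17 grid) Virtex-5 slice utilization on the TX240T, in percent.
slice_percent = {
    "RocketIO and input buffering": (3, 8),
    "Particle Cluster Finder (partial dynamic tables)": (12, 39),
    "Cluster Overlap Filter": (8, 27),
}

total_10x10 = sum(small for small, _ in slice_percent.values())
total_17x17 = sum(large for _, large in slice_percent.values())

print(total_10x10, total_17x17)
```

The Block RAM row needs no such sum, because the Particle Cluster Finder is the only block reported to use Block RAM.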
20
Calorimeter Trigger Prototype

- Implement the rest of the Calorimeter Trigger
  - Particle Isolation and Particle ID
  - Jet Reconstruction
  - Particle Sorter
  - MET, HT, MHT Calculation
- Perform more in-depth testing and analysis of the designs
- Enhance the initial designs
- Prototype the Calorimeter Trigger designs
21
New Tools and Techniques

- We are working with U. of Maryland researchers to investigate new tools and techniques to design, test, and upgrade the CMS firmware
  - Dataflow languages DIF and OpenDF
  - Tools and techniques for
    - Unit testing and automated testing
    - Efficient designs with multiple FPGAs
    - Generating FPGA firmware and simulator code from a single high-level specification
  - Web-based repositories and version tracking
  - Consistent (automated) documentation practices
22
Conclusions

- The preliminary firmware for the Calorimeter Trigger Upgrade has been developed
  - Initial results look promising
  - Additional designs are planned for this spring and summer
- Still need to work on
  - Making the designs more easily upgradable
  - Experimenting with new algorithms
  - Helping to establish a unified platform plus unified design and test methodologies
  - New tools and techniques to facilitate future firmware development and upgrades