Tiny Triplet Finder Jinyuan Wu, Z. Shi Dec. 2003.



Hits, Hit Data & Triplets
Hit data come out of the detector planes in random order. Hit data from three planes generated by the same particle track are grouped together to form triplets.

Triplet Finding
Three data items, one from each of Plane A, Plane B and Plane C, must satisfy the condition x_A + x_C = 2 x_B. A total of N^3 combinations must be checked (e.g. 5 x 5 x 5 = 125). A software implementation needs three nested loops. A hardware implementation may need a large amount of silicon resources without careful planning.
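As a minimal sketch of the software approach described above, the following Python function checks every combination with three nested loops. The function name and the integer hit coordinates are assumptions made for illustration; they are not taken from the slides.

```python
def find_triplets_brute_force(hits_a, hits_b, hits_c):
    """Check all N^3 combinations against x_A + x_C = 2 * x_B.

    Returns the list of matching triplets and the number of
    combinations that were checked.
    """
    triplets = []
    checked = 0
    for x_a in hits_a:              # loop over Plane A hits
        for x_b in hits_b:          # loop over Plane B hits
            for x_c in hits_c:      # loop over Plane C hits
                checked += 1
                if x_a + x_c == 2 * x_b:
                    triplets.append((x_a, x_b, x_c))
    return triplets, checked


if __name__ == "__main__":
    # Five hypothetical hits per plane: 5 x 5 x 5 = 125 checks.
    found, n_checked = find_triplets_brute_force(
        [1, 5, 9, 20, 33], [3, 6, 12, 25, 40], [5, 7, 15, 30, 47]
    )
    print(n_checked, found)
```

The number of checks grows as N^3 regardless of how many real triplets exist, which is the cost the Tiny Triplet Finder is meant to avoid.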

Tiny Triplet Finder (Animation)
Planes: x1, x2, x3. Only x1 and x3 are implemented; the plane x2 is shown only as a guide to the eye.
(1) Fill in hits to the x1 and x3 planes.
(2) Cycle through the x2 hits. Line matched.

Tiny Triplet Finder
[Block diagram: Hash Sorter A, Hash Sorter C, Bit Array/Shifter A, Bit Array/Shifter C, Bit-wise Logic Block]

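TTF Operations Phase I: Filling Bit Arrays
For any hit, fill a corresponding logic cell in the bit array/shifters. Note the flipped bit order between the physical planes and the bit arrays: since x_A + x_C = 2 x_B, we have x_A = -x_C + constant, where the constant is 2 x_B for a given center plane hit.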

TTF Operations Phase II: Making Match
For any center plane hit, logically shift the bit array and perform a bit-wise AND in this range. A triplet is found.
[Figure: physical planes and bit array/shifters]

TTF Operations Phase II: Making Match (Next Hit)
Loop to the next center plane hit and logically shift the bit array. A triplet is found through the bit-wise AND. Fake triplets may exist; they will be cut out in the later stages.

TTF Operations Phase II: Making Match (More…)
Loop to the next center plane hit and logically shift the bit array. A triplet is found through the bit-wise AND.

TTF Operations Phase II: Making Match (and More…)
Loop to the next center plane hit and logically shift the bit array. A triplet is found through the bit-wise AND.

TTF Operations Phase II: Making Match (Last Hit)
Loop to the next center plane hit and logically shift the bit array. A triplet is found through the bit-wise AND.
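The two phases can be sketched in software using Python integers as bit arrays. This is only an illustration under stated assumptions: the 64-bin width, the function names, and the use of one bin per integer coordinate are chosen here and are not taken from the firmware. Array C is stored in flipped bit order, so that for a center plane hit x_B a single shift by (N_BINS - 1 - 2 x_B) lines up every x_C = 2 x_B - x_A with its partner x_A.

```python
N_BINS = 64                      # assumed bit array width
MASK = (1 << N_BINS) - 1


def fill_bit_arrays(hits_a, hits_c):
    """Phase I: set one bit per hit; Plane C uses flipped bit order."""
    array_a = 0
    array_c = 0
    for x in hits_a:
        array_a |= 1 << x
    for x in hits_c:
        array_c |= 1 << (N_BINS - 1 - x)     # flipped index
    return array_a, array_c


def match_center_hit(array_a, array_c, x_b):
    """Phase II: shift array C for one center plane hit, then AND."""
    shift = N_BINS - 1 - 2 * x_b
    if shift >= 0:
        aligned_c = array_c >> shift
    else:
        aligned_c = (array_c << -shift) & MASK
    matched = array_a & aligned_c            # bit-wise AND
    triplets = []
    x_a = 0
    while matched:
        if matched & 1:
            triplets.append((x_a, x_b, 2 * x_b - x_a))
        matched >>= 1
        x_a += 1
    return triplets


def find_triplets(hits_a, hits_b, hits_c):
    """Fill once, then loop over center plane hits only."""
    array_a, array_c = fill_bit_arrays(hits_a, hits_c)
    result = []
    for x_b in hits_b:
        result.extend(match_center_hit(array_a, array_c, x_b))
    return result


if __name__ == "__main__":
    print(find_triplets([10, 20], [25], [30, 40]))
```

Only one loop over the center plane hits remains; the other two loops of the brute-force approach are replaced by the shift and the bit-wise AND.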

Multiple Hits & Triplets: Keep Them All
Each bin may be filled with more than one hit. Hit data are kept in hash sorters, which allow multiple hits per bin. There may be more than one match in each bit-wise AND operation. All matches are sent out, one by one, to the later stages for the fine cut and arbitration processes.
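A rough software analogue of keeping all hits is a dictionary that maps each bin to the list of hits that fell into it. In the sketch below, the bin width, the function names, and the loop over occupied bins (which stands in for the bit-wise AND) are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import product

BIN_WIDTH = 4   # hypothetical number of fine position units per bin


def hash_sort(hits):
    """Group hits by bin number, keeping every hit in the bin."""
    bins = defaultdict(list)
    for x in hits:
        bins[x // BIN_WIDTH].append(x)
    return bins


def triplet_candidates(hits_a, hits_b, hits_c):
    """Emit every hit combination from matching bins, one by one.

    Matching is done at bin level (bin_a + bin_c == 2 * bin_b), so some
    candidates are fake and must be removed by a later fine cut.
    """
    sorted_a = hash_sort(hits_a)
    sorted_c = hash_sort(hits_c)
    candidates = []
    for x_b in hits_b:
        bin_b = x_b // BIN_WIDTH
        for bin_a, group_a in sorted_a.items():
            bin_c = 2 * bin_b - bin_a
            if bin_c in sorted_c:
                for x_a, x_c in product(group_a, sorted_c[bin_c]):
                    candidates.append((x_a, x_b, x_c))
    return candidates


if __name__ == "__main__":
    # Two hits share a bin in Plane A and two share a bin in Plane C.
    print(triplet_candidates([8, 9], [17], [24, 26]))
```

With two hits per matching bin in Planes A and C, four candidates are sent on, and the fine cut decides which of them satisfy the condition at full resolution.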

Boundary Issues: Beyond Just Bit-wise AND
When a track hits near the boundary of a bin, a simple bit-wise AND may miss the triplet. Bit-wise OR-AND logic covers the boundary. The logic cells in most of today's FPGAs have 4 inputs, so the OR-AND bit-wise logic does not increase resource usage.
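One way to picture the OR-AND logic is to OR each bit of the aligned array C with its neighbors before the AND with array A. Which neighbors are ORed in the actual design is not stated in the slides; the choice of both adjacent bins below is an assumption, picked because it gives exactly four inputs per output bit (one from A, three from C).

```python
N_BINS = 64
MASK = (1 << N_BINS) - 1


def and_match(array_a, aligned_c):
    """Simple bit-wise AND: both hits must fall in the same bin."""
    return array_a & aligned_c


def or_and_match(array_a, aligned_c):
    """OR-AND: accept a C hit in the same bin or in an adjacent bin."""
    widened_c = (aligned_c | (aligned_c << 1) | (aligned_c >> 1)) & MASK
    return array_a & widened_c


if __name__ == "__main__":
    a = 1 << 10          # hit in bin 10 of array A
    c = 1 << 11          # partner hit landed just across the bin boundary
    print(and_match(a, c) != 0, or_and_match(a, c) != 0)
```

The wider acceptance also lets in more fake triplets, which is consistent with leaving the final decision to the later fine cut.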

Tiny Triplet Finder: Block Diagram
[Block diagram. Blocks: FIFO Interface Hit Feeder A, FIFO Interface Hit Feeder B, FIFO Interface Hit Feeder C, Hash2blk (two instances), Bit64A, Bit64C, BitLogic, Shift Reg. (6 steps), Cut and Arbitration. Resource labels: 172 LUT FF, 92 LUT FF, 192 LUT 111 FF, 77 LUT 66 FF, 77 LUT 66 FF.]

Schematics

Simulation
(1) Filling hits on Planes A and C.
(2) Looping over hits on Plane B.
Annotations: Pipelined internal operations. Triplets are grouped together. This combination is a fake triplet. The extra combination halts the pipeline for proper operation.

Logic Cell Usage
Both the 64-bit and 128-bit TTF designs fit comfortably in a $100 FPGA. A simple 64-bit Hough transform design is shown for scale. A $1200 FPGA is shown for scale.

Pentlet Finding: Beyond Just Bit-wise AND
Planes: A, B, C, D and E. Use 4 bit arrays. There are 3 constraints in total. More constraints help eliminate fake tracks. It is possible to use bit-wise majority logic (such as 3-out-of-4) to accommodate detector inefficiency.
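Bit-wise 3-out-of-4 majority logic can be written directly on integers used as bit arrays. The sketch assumes the four arrays have already been shifted into alignment for the current center plane hit; the function and argument names are illustrative.

```python
def majority_3_of_4(a, b, c, d):
    """Return a bit array whose bit i is set when at least three of the
    four input arrays have bit i set."""
    return (a & b & c) | (a & b & d) | (a & c & d) | (b & c & d)


if __name__ == "__main__":
    # Bit 3 is set in all four arrays, bit 2 in three, bit 1 in two.
    result = majority_3_of_4(0b1111, 0b1110, 0b1100, 0b1000)
    print(bin(result))
```

A track that is missed by one inefficient plane still produces a match, while a plain four-way AND would drop it.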

Comparison of Baseline and Tiny Triplet Finder: Data Flow
[Figure: two data-flow diagrams. Both take inputs from the bend and non-bend views of Stations N-1, N and N+1 through FIFOs. Baseline stages: AM: Long Doublet, AM: Triplet, AM: Short Doublet (three instances). Tiny Triplet Finder stages: Hash Sorter (two instances), FTF: Triplet, Hash Sorter: Short Doublet (three instances).]

The End Thanks

Short Doublet Stage: 1 of 3 Identical Ones
[Block diagram. Blocks: Triplet Feeder, FIFO Interface Hit Feeder A', Hash2blk, Shift Reg. (2 steps), Cut, Arbitration and Combine. Resource label: 77 LUT 66 FF. Data flow: Triplets In, Station N-1 Non-bend FIFO, Hash Sorter: Short Doublet, Triplets Out. Data word fields: EP, StnID, HitMap, y1y, y2y and y3y (8+3 bits each), x1x (8+3 bits), x1y, x2y and x3y (7 bits each), H. Cut labels: y1(400u), x1(400u).]

Hash Sorter: Virtex-II Implementation
[Schematic. Components: RAM 512x18, RAM 256x36, counters CB8RE and CB10E, comparator (COMP, A EQ B), FD registers. Signals: K(8:0), DI(27:0), DIQ(27:0), DOB(27:0), IdN(7:0), IdP(7:0), IdQ(7:0), IdK(7:0), IdKQ(7:0), EV(9:0), EV(8:0), EVeq, PUSH, POP, PUSHQ, POPQ, REPOP, WT, RDY, MD, DUMPD, SR, CLK.]

Hash Sorter: Cyclone Implementation
[Schematic. Components: RAM 256x16, RAM 128x36, counters CB7RE and CB9E, comparator (COMP, A EQ B), FD registers. Signals: K(7:0), DI(28:0), DIQ(28:0), DOB(28:0), IdN(6:0), IdP(6:0), IdQ(6:0), IdK(6:0), IdKQ(6:0), EV(8:0), EV(7:0), EVeq, PUSH, POP, PUSHQ, POPQ, REPOP, WT, RDY, MD, DUMPD, SR, CLK.]

Bit Array/Shifter: Xilinx Implementation
Each Bit Array/Shifter has 64 bins. Each bin is a 16-bit RAM (look-up table) with ports D, Q, WE and A(3:0). The RAM operates like a D flip-flop. Each xc2v1000 FPGA has 10,240 such RAMs.

Bit Array/Shifter: Filling the Hits
Each hit is written into 8 to 16 different RAMs with different addresses. A single clock cycle (not 8 to 16) is used to fill a hit.

Bit Array/Shifter: Shift Reading
With given addresses of the RAMs, hit patterns appear at the outputs of the arrays. Changing the addresses shifts the hit patterns. One array shifts in units of 1 bin, the other in units of 8 bins. The relative shift of the two patterns covers 128 bins. One clock cycle is used for a reading, regardless of the distance of the relative shift.
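The combined storage and shifting can be modeled in software as 64 small RAMs with 16 one-bit entries each. The sketch below is a behavioral model only, written under assumptions: the class name, the address-to-bin mapping, and the handling of array edges are illustrative choices, and the exact mapping used in the firmware is not given in the slides, so the number of writes per hit in this model need not match the 8 to 16 quoted above. In hardware all writes for one hit happen in the same clock cycle; the Python loop is sequential only because it is software.

```python
N_BINS = 64     # bins per Bit Array/Shifter
N_ADDR = 16     # entries per 16-bit RAM, selected by A(3:0)


class BitArrayShifter:
    """Behavioral model: reading at address `addr` returns the hit
    pattern shifted by `addr * unit` bins toward lower bin numbers."""

    def __init__(self, unit):
        self.unit = unit                      # 1-bin or 8-bin shift unit
        self.ram = [[0] * N_ADDR for _ in range(N_BINS)]

    def fill(self, hit_bin):
        """Write one hit into several RAMs, one address per RAM.
        Returns the number of RAMs written."""
        writes = 0
        for addr in range(N_ADDR):
            target = hit_bin - addr * self.unit
            if 0 <= target < N_BINS:
                self.ram[target][addr] = 1
                writes += 1
        return writes

    def read(self, addr):
        """Apply the same address to all RAMs and return the outputs."""
        return [cell[addr] for cell in self.ram]


if __name__ == "__main__":
    fine = BitArrayShifter(unit=1)
    fine.fill(20)
    print(fine.read(0).index(1), fine.read(5).index(1))

    coarse = BitArrayShifter(unit=8)
    coarse.fill(40)
    print(coarse.read(0).index(1), coarse.read(3).index(1))
```

Because the shifted copies are stored at fill time, a read at any address costs the same single step whatever the shift distance, which mirrors the single-clock-cycle reading described above.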

Bit Array/Shifter Comments: Xilinx or Not
Writing and reading operations each need only a single clock cycle. Storage and shifter are combined, which is a big resource saving. A maximum of 16 clock cycles is needed for resetting.
Non-Xilinx implementation: an additional 64 x 8 = 512 LEs may be needed, but resetting needs only one clock cycle.

Challenge on Triplet Finding (1): The Facts in "Road" Language
A triplet has two parameters. With two hits in two planes, the two parameters can be determined. The third hit in the third plane provides the first constraint. Assume each parameter is sliced into 64 bins. (That's not too many.) There are 64 x 64 = 4096 "roads", i.e., possible track configurations. Any hit can belong to 64 possible roads. Two hits in two planes have one common road. Three hits must have one common road to be in a triplet.

Challenge on Triplet Finding (2): A Possible Implementation Using CAM
It is possible to use (static) content-addressable memory (CAM) to implement triplet finding. Hit patterns are pre-stored in the CAM. When a hit pattern matches a pre-stored one, the address represents the road. Each of the 3 planes has 64 possible hit locations (64 x 64 = 4096 roads). Assuming one road generates one hit pattern, a CAM array with 64 x 3 = 192 input bits and 4096 words is needed. It can be done with 128 small CAMs (32-bit, 32-word). The EPC20K1000E has 160 (80%). Oops: boundary issues have not been considered. (More patterns.)
[Figure: Plane A, Plane B and Plane C each feed a CAM with ports D, A, WE and Match; the matched address gives the road.]

Challenge on Triplet Finding (3): A Possible Implementation Using Hough Transformation
Book a 2-D (64 x 64) histogram. Each bin of the histogram represents a road. Each hit from plane A, C or B increments the counts of the 64 bins in a row, column or diagonal. The bins with count 3 are valid triplet roads. Each histogram bin is a counter plus selection logic. Assume each bin can be implemented with 4 Altera logic elements (LEs); 4096 bins then need 16,384 LEs, and the EPC20K1000E has 38,400 (16,384/38,400 = 43%). However, finding the bins with count 3 and encoding them is not trivial.
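The histogram approach can be sketched in software as follows. The sketch assumes that a road is labeled by its (Plane A bin, Plane C bin) pair and that a Plane B hit marks the diagonal a + c = 2b clipped to the array; with this labeling the diagonal can hold fewer than 64 bins, so the layout is an illustration rather than the one on the slide. Hits in the same plane are assumed to occupy distinct bins.

```python
N = 64   # bins per parameter, giving N x N roads


def hough_triplet_roads(hits_a, hits_b, hits_c):
    """Fill a 2-D histogram of roads and return the bins with count 3."""
    hist = [[0] * N for _ in range(N)]
    for a in set(hits_a):            # a Plane A hit marks a whole row
        for c in range(N):
            hist[a][c] += 1
    for c in set(hits_c):            # a Plane C hit marks a whole column
        for a in range(N):
            hist[a][c] += 1
    for b in set(hits_b):            # a Plane B hit marks a diagonal
        for a in range(N):
            c = 2 * b - a
            if 0 <= c < N:
                hist[a][c] += 1
    # Scanning all 4096 bins for count 3 is the non-trivial step in hardware.
    return [(a, c) for a in range(N) for c in range(N) if hist[a][c] == 3]


if __name__ == "__main__":
    print(hough_triplet_roads([10, 20], [25], [30, 40]))
```

Every hit touches on the order of 64 counters and the final scan touches all 4096, which illustrates why this approach costs more logic than the bit array/shifter scheme.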

Silicon Usage (Estimate)
The Tiny Triplet Finder (TTF) silicon usage is acceptable. The hash sorter silicon usage is acceptable. The Baseline design, if not implemented automatically, could be reduced in size considerably.

[Schematic: two RAM blocks (ports D, A, O, WE, EN, DO), counters CC8RLE and CB8SE, and FD registers; signals NBin, DI, DOB, IdNext.]