Advanced Topics on FPGA Applications Screen B Wu, Jinyuan Fermilab IEEE NSS 2007 Refresher Course Supplemental Materials Oct, 2007.

Presentation transcript:


Oct. 2007, Wu Jinyuan, Fermilab IEEE NSS Refresher Course, Supplemental Materials

Outline

Digital Design with FPGAs (This 45 min. Course)
- Logic Element in a Nutshell
- Variations of the Registered Adders
- Tricks of Using RAM
- RAM based histograms
- Topics on Multipliers
- Curved Track Fitter

Advanced Topics on FPGA Applications (Included as Supplemental Materials)
- Doublet Finding, Hash Sorter
- Triplet Finding, Tiny Triplet Finder (TTF)
- Options of Sequence Control, Recursive Structure, etc.

Doublet Matching
[Figure: detector planes 1, 2 and 3 along z, each with x and y views (x1a, x1b, y1a, y1b, ..., x3a, x3b, y3a, y3b). Matching conditions shown: 2*y1 = y2, 3*y1 = y3.]

Example of Evaluating the Key Number
Matching condition: 3*y1 = y3.
Key number for a y1 hit: K = 3*y1/8. Key number for a y3 hit: K = y3/8.
[Diagram: y1 passes through a *3 multiplier to produce K; y3 produces K directly.]
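The key-number evaluation can be illustrated with a minimal sketch. It assumes integer hit coordinates and a bin width of 8, as in K = 3*y1/8 and K = y3/8; the function names are hypothetical and not taken from the slides.

```python
def key_plane1(y1: int) -> int:
    """Key for a y1 hit: project to plane 3 (multiply by 3), then bin by 8."""
    return (3 * y1) // 8


def key_plane3(y3: int) -> int:
    """Key for a y3 hit: bin by 8 directly."""
    return y3 // 8


# A pair that satisfies 3*y1 == y3 exactly always lands in the same bin.
y1, y3 = 21, 63
same_bin = key_plane1(y1) == key_plane3(y3)
```

Hits from the same track then share a key, so only hits with equal (or, with measurement errors, neighboring) keys need to be compared.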

Linked List Structure of Hash Sorter
[Block diagram: DIN, DOUT, key K, Index RAM, Pointer RAM, DATA RAM.]
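A software model may help in reading the block diagram. The sketch below assumes one plausible division of roles: the Index RAM holds, for each key K, the address of the newest item with that key; the Pointer RAM links each item to the previous item with the same key; the DATA RAM holds the items. The class and method names are hypothetical, and the firmware may differ in detail.

```python
class HashSorter:
    """Software model of a linked-list hash sorter (assumed RAM roles)."""

    NIL = -1  # marks an empty key or the end of a list

    def __init__(self, n_keys: int, depth: int):
        self.index_ram = [self.NIL] * n_keys    # key K -> address of newest item
        self.pointer_ram = [self.NIL] * depth   # address -> older item with same K
        self.data_ram = [None] * depth          # address -> stored item
        self.next_free = 0                      # next unused DATA RAM address

    def store(self, key: int, item) -> None:
        """Write one item; one Index, Pointer and DATA RAM access each."""
        if self.next_free >= len(self.data_ram):
            raise OverflowError("DATA RAM is full")
        addr = self.next_free
        self.next_free += 1
        self.data_ram[addr] = item
        self.pointer_ram[addr] = self.index_ram[key]  # link to previous head
        self.index_ram[key] = addr                    # new head of the list

    def fetch(self, key: int) -> list:
        """Read out every item stored under this key, newest first."""
        items = []
        addr = self.index_ram[key]
        while addr != self.NIL:
            items.append(self.data_ram[addr])
            addr = self.pointer_ram[addr]
        return items


sorter = HashSorter(n_keys=16, depth=8)
sorter.store(7, 63)
sorter.store(7, 60)
sorter.store(1, 10)
group = sorter.fetch(7)
```

With the items grouped by key, a doublet search only walks the short list under the matching key instead of scanning every stored hit.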

Histogram with Fast Reset
[Block diagram: inputs K and DV; two RAMs, each with ports D, Q, WA, WE and RA; registers (D, Q); a +1 incrementer; a constant 0; a comparator (==); signals RC, CE and RESET.]
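One common way to obtain a fast reset, sketched below, is to keep a tag beside each bin and compare it with a reset counter: a bin whose tag does not match is read as 0. This is an interpretation of the diagram labels (two RAMs, ==, RC, 0, +1), not a confirmed description of the design; all names are hypothetical.

```python
class FastResetHistogram:
    """Histogram whose reset takes one step instead of clearing every bin."""

    def __init__(self, n_bins: int):
        self.count_ram = [0] * n_bins  # bin contents
        self.tag_ram = [0] * n_bins    # reset-counter value at last write
        self.rc = 0                    # reset counter (assumed meaning of RC)

    def reset(self) -> None:
        # All bins become stale at once because their tags no longer match.
        # In hardware RC has finite width, so the RAM must be cleared
        # before RC wraps around; this model ignores that.
        self.rc += 1

    def read(self, k: int) -> int:
        # A stale bin is treated as 0 (the constant-0 input in the diagram).
        return self.count_ram[k] if self.tag_ram[k] == self.rc else 0

    def fill(self, k: int) -> None:
        # Read-modify-write: increment the valid count and refresh the tag.
        self.count_ram[k] = self.read(k) + 1
        self.tag_ram[k] = self.rc


hist = FastResetHistogram(4)
hist.fill(2)
hist.fill(2)
before_reset = hist.read(2)
hist.reset()
after_reset = hist.read(2)
```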

An Example of Track Recognition: Hits

An Example of Track Recognition: Doublets
Hits are paired together as doublets. Ghost doublets may exist.

An Example of Track Recognition: Histogram
Two track parameters can be calculated for each doublet. A 2-D histogram is booked. Doublets from the same track are entered into the same bin, since they have the same track parameters. Sometimes they are stored in clusters.
[Figure: 2-D histogram of the two track parameters (one axis labeled c0); one entry is marked: This is a “ghost”.]

An Example of Track Recognition: Tracks
All doublets from a track are contained in a cluster.

Simulation Results
[Figure: An event with 200 tracks.]
It still works at 1000 tracks/event.

Example: Finding “Soft Jets”
A simulated event with 200 tracks. Flat distributions. Min. R = 55 cm.
16 soft tracks are added. They are grouped in 2 small initial angle regions, i.e., 2 “soft jets”.
[Two figures, captioned: Can you see the “soft jets”? and: Can you see the “soft jets” now?]


[Figure: detector planes 1, 2 and 3 along z, with axes x, y, u and v; each plane has u and v views (u1a, u1b, v1a, v1b, ..., u3a, u3b, v3a, v3b).]

Triplet Finding
[Figure: Plane A, Plane B, Plane C.]
Three data items must satisfy the condition: x_A + x_C = 2 x_B.
A total of n^3 combinations must be checked (e.g. 5x5x5 = 125).
Three layers of loops are needed if the process is implemented in software.
Without careful planning, large silicon resources may be needed: O(N^2).
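The three-loop software search can be written down directly. The sketch below assumes integer hit coordinates; the second function is a software shortcut using a set lookup, shown only for comparison. It is not the Tiny Triplet Finder firmware, and both function names are hypothetical.

```python
def triplets_brute_force(plane_a, plane_b, plane_c):
    """Check all n^3 combinations against x_A + x_C == 2 * x_B."""
    found = []
    for xa in plane_a:
        for xb in plane_b:
            for xc in plane_c:
                if xa + xc == 2 * xb:
                    found.append((xa, xb, xc))
    return found


def triplets_with_lookup(plane_a, plane_b, plane_c):
    """Two loops plus a set lookup for the plane-B hit (distinct hits assumed)."""
    b_hits = set(plane_b)
    found = []
    for xa in plane_a:
        for xc in plane_c:
            total = xa + xc
            # x_B must equal (x_A + x_C) / 2, so the sum has to be even.
            if total % 2 == 0 and total // 2 in b_hits:
                found.append((xa, total // 2, xc))
    return found


hits_a, hits_b, hits_c = [1, 4, 9], [3, 5, 8], [5, 6, 7]
brute = triplets_brute_force(hits_a, hits_b, hits_c)
quick = triplets_with_lookup(hits_a, hits_b, hits_c)
```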

Block Diagram, Step 1

Block Diagram, Step 2

Circular Tracks from Collision Point on Cylindrical Detectors
For a given hit on layer 3, the coincidence between a layer 2 hit and a layer 1 hit satisfying the coincidence map signifies a valid circular track.
A track segment has 2 free parameters, i.e., a triplet.
The coincidence map is invariant under rotation.
[Figure: coincidence map with axes (φ1 - φ3)+64 and (φ2 - φ3)+64.]

Logarithmic Shifter
[Figure: shifter stages S1, S2, S4.]
# of bits: N
Shift distance: L
# of stages: log2 L
Total LE usage: N*log2 L
A shift of X bits of the bit pattern is done in one clock cycle rather than X cycles.
The logarithmic shifter is also known as a “barrel shifter”, but the term “logarithmic” better reflects the nature of the implementation, resource usage and propagation delay.
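The stage structure can be modeled in a few lines. In this sketch each stage either passes the pattern through or shifts it by a fixed power of two (1, 2, 4, ...), selected by one bit of the shift distance. The function name and default widths are assumptions for illustration.

```python
def log_shift_left(pattern: int, distance: int,
                   n_bits: int = 64, n_stages: int = 6) -> int:
    """Shift left by `distance` using n_stages fixed-distance stages."""
    mask = (1 << n_bits) - 1  # keep the pattern n_bits wide
    for stage in range(n_stages):
        # Stage 0 shifts by 1 (S1), stage 1 by 2 (S2), stage 2 by 4 (S4), ...
        if (distance >> stage) & 1:
            pattern = (pattern << (1 << stage)) & mask
    return pattern


# Three stages cover any distance from 0 to 7.
shifted = log_shift_left(0b1011, 5, n_bits=16, n_stages=3)
```

In firmware all stages are combinational, so the whole shift settles within one clock cycle; the Python loop only models the data path.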

Logic Cell Usage
Both 64- and 128-bit TTF designs fit a $100 FPGA comfortably.
A simple 64-bit Hough transform design is shown for scale.
A $1200 FPGA is shown for scale.
[Chart: TTF64, TTF128, $100, $1200, Hough64.]

Complex Triplet Finding Problems
[Figure: detector planes with views u1, v1, u2, v2, u3, v3, u4, v4, y5, x5.]


FPGA Process Sequencing Options

Option | Program Type | Program Length (CLK cycles) | Reprogram | Resource Usage
Finite State Machine (FSM) | Fixed Wired | 10 | Hard | Small
Enclosed Loop Micro-Sequencer (ELMS) | Memory Stored Program | | Easy | Small
Microprocessor (MP) | Memory Stored Program | >1000 | Easy | Large

The Between Counter
[Block diagram: a counter with ports SLOAD, D[], SCLR and Q[]; signals N, M-1 and T; a comparator (==) with inputs A[] and B[]. The Between Counter addresses a ROM holding PC0: instr0 through PCD: instrD, which outputs the Control Signals.]
Count sequences shown: 0,1,2,3,4,5,6,7,8,9,A; 5,6,7,8,9,A; 5,6,7,8,9,A; 5,6,7,8,9,A,B,C,D,E,F…
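The count sequence on this slide can be reproduced with a small model. The sketch assumes the counter reloads a start value when it reaches an end value, for a fixed number of passes, and otherwise increments; the parameter names (`start`, `end`, `passes`) are hypothetical and are not mapped onto the N, M-1 and T labels of the diagram.

```python
def between_counter(start: int, end: int, passes: int, n_steps: int) -> list:
    """Return n_steps program-counter values, looping `passes` times over start..end."""
    pc = 0
    remaining = passes - 1  # reloads still to perform
    trace = []
    for _ in range(n_steps):
        trace.append(pc)
        if pc == end and remaining > 0:
            pc = start      # synchronous load: jump back to the loop start
            remaining -= 1
        else:
            pc += 1         # normal counting
    return trace


# Loop body at 5..A executed four times, then execution continues at B.
trace = between_counter(start=0x5, end=0xA, passes=4, n_steps=34)
```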

ELMS – Detailed Block Diagram
[Block diagram; output: User Control Signals.]

        FOR   BckA1 EndA1 #n
        LD    R2, #addr_a
        LD    R3, #addr_X
        LD    R7, #0
BckA1   LD    R4, (R2)
        INC   R2
        LD    R5, (R3)
        INC   R3
        MUL   R6, R4, R5
EndA1   ADD   R7, R7, R6
        LD    R8, R7

The Stack supports nested loops, up to 128 layers.

What’s Good About ELMS: FOR Loops at Machine Code Level
The looping sequence is known in this example before entering the loop.
A regular microprocessor treats the sequence as unknown.
ELMS supports FOR loops with pre-defined iterations at machine code level.
Execution time is saved, and micro-complexities (branch penalty, pipeline bubble, etc.) associated with conditional branches are avoided.

Microprocessor (Conditional Branch):

        LD    R1, #n
        LD    R2, #addr_a
        LD    R3, #addr_X
        LD    R7, #0
BckA1   LD    R4, (R2)
        INC   R2
        LD    R5, (R3)
        INC   R3
        MUL   R6, R4, R5
EndA1   ADD   R7, R7, R6
        DEC   R1
        BRNZ  BckA1

The ELMS:

        FOR   BckA1 EndA1 #n
        LD    R2, #addr_a
        LD    R3, #addr_X
        LD    R7, #0
BckA1   LD    R4, (R2)
        INC   R2
        LD    R5, (R3)
        INC   R3
        MUL   R6, R4, R5
EndA1   ADD   R7, R7, R6

[Annotation in figure: 25%.]
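Counting instructions in the two listings gives a rough sense of the saving. The sketch assumes one instruction per clock cycle and no branch penalty, which is a simplification; under that assumption the 25% figure appears consistent with removing the 2 loop-control instructions (DEC, BRNZ) from the 8 executed per pass. The function names are hypothetical.

```python
def microprocessor_instructions(n: int) -> int:
    """4 setup instructions, then 8 per pass (6 work + DEC + BRNZ)."""
    return 4 + 8 * n


def elms_instructions(n: int) -> int:
    """FOR plus 3 setup instructions, then 6 per pass (no loop-control code)."""
    return 4 + 6 * n


# Saving inside the loop body, independent of n: 2 of 8 instructions.
loop_saving = 1 - 6 / 8

n = 100
total_saving = 1 - elms_instructions(n) / microprocessor_instructions(n)
```

A real microprocessor would typically also pay a branch penalty on BRNZ, so the actual saving could be larger than this count suggests.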


The Problem: 3-Phase 60 Hz AC
Rectification noise from power supplies using 3-phase 60 Hz AC is picked up by the input cable lying in the accelerator tunnel.
[Figures: Time Domain; Frequency Domain. ADC: 21 µs/sample.]

Filtering Results
Noise above 360 Hz, the dominating portion, is filtered out by both filter functions.
The CIC sum is a lot smoother than the sliding sum.
But small signals are still buried under ripples of 60 and 180 Hz.
[Plot traces: Sliding Sum, CIC Sum, Signals.]

Recursive Implementation of CIC Sum
The non-recursive implementation (x[n] weighted by *h1, *h2, ..., *h[K] and summed into y[n]) needs 248 memory fetches, 248 multiplications, 248 additions, and more ops for longer sum lengths.
The CIC sum constructed as a sliding sum of sliding sums, s[n] = s[n-1] + x[n] - x[n-K] and y[n] = y[n-1] + s[n] - s[n-K], needs 2 memory fetches, 0 multiplications and 4 add/sub ops for any sum length.
The re-formulated CIC sum, u[n] = u[n-1] + x[n] - 2x[n-K] + x[n-2K] and y[n] = y[n-1] + u[n], uses the raw data buffer rather than a separate buffer.
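The equivalence of the two forms can be checked numerically. The sketch below assumes zero initial history (all samples before the first are 0) and integer samples; it compares a direct sliding sum of sliding sums with a recursive form that reads only x[n], x[n-K] and x[n-2K] from the raw data buffer. The function names are hypothetical.

```python
def cic_direct(x, K):
    """Sliding sum (length K) of sliding sums (length K), computed from scratch."""
    def sample(i):
        return x[i] if i >= 0 else 0           # zero history before the data starts

    def sliding(i):
        return sum(sample(i - j) for j in range(K))

    return [sum(sliding(n - j) for j in range(K)) for n in range(len(x))]


def cic_recursive(x, K):
    """Same output with 2 buffer fetches and 4 add/sub ops per sample."""
    u = 0  # running value of s[n] - s[n-K]
    y = 0  # running CIC sum
    out = []
    for n in range(len(x)):
        x_k = x[n - K] if n >= K else 0            # fetch 1
        x_2k = x[n - 2 * K] if n >= 2 * K else 0   # fetch 2
        u += x[n] - 2 * x_k + x_2k  # the factor 2 is a shift, not a multiplier
        y += u
        out.append(y)
    return out


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
direct = cic_direct(data, 3)
recursive = cic_recursive(data, 3)
```

The per-sample cost of the recursive form does not depend on K, which is the point of the slide.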

Exponential Sequence Generator
[Diagram: register with SET, D and Q ports and a feedback path.]
if (CO==1) {Q = Q - Q/32;}
This is also an example of a recursive structure.
It is IIR, but it is stable.
Decaying exponential components are used to stabilize other recursive structures.
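The update Q = Q - Q/32 is easy to simulate. This sketch assumes an integer register, CO equal to 1 on every step, and the division implemented as a right shift by 5 bits; the function name is hypothetical.

```python
def exponential_sequence(q0: int, n_steps: int, shift: int = 5) -> list:
    """Return n_steps values of Q under Q = Q - (Q >> shift)."""
    q = q0
    out = []
    for _ in range(n_steps):
        out.append(q)
        q = q - (q >> shift)  # with shift == 5 this is Q - Q/32, truncated
    return out


first_values = exponential_sequence(1024, 3)
long_run = exponential_sequence(1024, 500)
```

Each step multiplies Q by roughly 31/32, so the sequence decays exponentially. With integer truncation the decay stops once Q falls below 32, because Q/32 is then 0.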

The End
Thanks