9/11/01 CDRZ Fitter1 Z Fitter Algorithm and Implementation Masahiro Morii, Harvard U. Requirements, I/O Algorithm Implementation Resources & Latency.

Presentation transcript:

Slide 1: Z Fitter Algorithm and Implementation
Masahiro Morii, Harvard U. (CDR, 9/11/01)
- Requirements, I/O
- Algorithm
- Implementation
- Resources & Latency

Slide 2: Executive Summary
- Z Fitter measures the track's z0, pT, tan λ
- Algorithm demonstrated in C++ emulation
  - LUTs reduce real-time computation to a minimum
- Resources and timing evaluated
  - Uses 4% of CLBs in XC2V4000
  - Can process 3 seeds/CLK4 in pipeline
  - Latency < 1 CLK4 for each seed
- FPGA implementation ready to start

Slide 3: Functionality
- Fit seed tracks from the Finder to a helix
  - A seed track = a set of 10 TSF segments
- Measure z0, pT, tan λ → Decision Module
[Figure: segments, fitted track and z0]

Slide 4: I/O
- Inputs from Seed Finder
  - TSF segments, hit map
  - Curvature ρ (or 1/pT), tan λ
  - FPGA internal bus
- Outputs to Decision Module
  - Fit results: z0, z0 error, ρ, tan λ
  - Hit map
  - Sent over 10 traces at 45 MHz

Slide 5: Inputs
- TSF segments
  - 10-bit φ and 4-bit error
  - φ is relative to the seed segment → only 9 segments/seed are needed
  - Error is not used by the Fitter
- Hit map
  - Which layers had a segment → 10 bits
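The hit map can be handled with plain bit operations. The sketch below assumes, as the emulation code later in this talk does, that bit i of the map flags layer i and that the six lowest bits are the ones used in the z0 fit. The helper names `hasSegment` and `countSegments` are hypothetical and do not appear in the original code.

```cpp
// Hypothetical helper: true if bit `layer` of the hit map is set.
constexpr bool hasSegment(int hitmap, int layer) {
    return (hitmap & (1 << layer)) != 0;
}

// Hypothetical helper: number of set bits among the lowest `nLayers` bits.
constexpr int countSegments(int hitmap, int nLayers) {
    int n = 0;
    for (int i = 0; i < nLayers; ++i) {
        if (hasSegment(hitmap, i)) ++n;
    }
    return n;
}
```

For the 10-bit pattern `0x2B5` (binary 10 1011 0101), `countSegments(0x2B5, 10)` is 6, and after masking with 63, as the emulation does before the z0 fit, `countSegments(0x2B5 & 63, 6)` is 4.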

Slide 6: Inputs
- Curvature ρ
  - Fitter needs a first guess of ρ from the Finder
  - 6-bit resolution
- tan λ
  - Not used by the Fitter
  - Finder provides 6-bit resolution

Slide 7: Outputs
- Fit results (table below)
- Hit map
  - Passed through from input

  Quantity    Unit    Resolution             Limits
  z0          cm      8 bits                 ±127 cm
  z0 error    cm      4 bits                 0 to 15 cm
  ρ           cm      [not recovered] bits   |pT| > 145 MeV/c
  tan λ               [not recovered] bits   |tan λ| < 3.97
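One way to honor the ±127 cm limit of the 8-bit z0 output is to saturate the fitted value before it is sent out. This is only an illustrative sketch: the emulation code in this talk does not show any saturation step, and `saturateZ0` and `kZ0Invalid` are hypothetical names. The value -128 is taken from the emulation, which assigns it to z0 when the hit pattern cannot be fitted.

```cpp
// Value the emulation assigns to z0 for an unfittable hit pattern.
constexpr int kZ0Invalid = -128;

// Hypothetical helper: clamp z0 (in cm) to the quoted +-127 cm range,
// so that -128 stays free to mark a failed fit.
constexpr int saturateZ0(int z0cm) {
    return z0cm > 127 ? 127 : (z0cm < -127 ? -127 : z0cm);
}
```

With this choice a valid result can never collide with the failure flag: `saturateZ0(-300)` is -127, not -128.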

Slide 8: Algorithm
- Step 1: r-φ fit
  - Ignore stereo information
  - 6 measurements of φ at different r
  - Find seed φ and ρ that minimize χ² in r-φ
- Step 2: z0 fit
  - Use stereo information
  - 6 measurements of z at different r
  - Find z0 and tan λ that minimize χ² in r-z

Slide 9: r-φ Fit
- Merge stereo layers → virtual axial layers
  - 3 U+V pairs plus 3 axial → 6 r-φ measurements
- Subtract shift due to curvature using input ρ
  - Residuals are due to errors in ρ and φ(seed)
[Figure: one label ending in "= 0"; its symbol was not recovered]

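Slide 10: r-φ Fit
- φ residual: [equation not recovered]
- Calculate and minimize: [equation not recovered]
  - Terms labelled "Error of ρ" and "Error of seed φ"
- [Equation of the form "… = … + …"; symbols not recovered]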
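As a hedged sketch, the emulation code shown later in this talk is consistent with a weighted linear least-squares fit for a seed-angle correction Δφ and a curvature correction Δρ. The notation below is an assumption read off the lookup-table names in that code (`wr2`, `wr2dpdr`, `sumr2`, `sumr2dpdr`, `sumr2dpdr2`, `denomrp`), not taken from the slide: w_i is a per-layer weight, r_i the layer radius, and D_i = (∂φ/∂ρ)_i the sensitivity of φ in layer i to the curvature.

```latex
\chi^2 = \sum_i w_i r_i^2 \left( \phi_i - \Delta\phi - D_i\,\Delta\rho \right)^2

S_1 = \sum_i w_i r_i^2, \quad
S_D = \sum_i w_i r_i^2 D_i, \quad
S_{DD} = \sum_i w_i r_i^2 D_i^2, \quad
S_\phi = \sum_i w_i r_i^2 \phi_i, \quad
S_{\phi D} = \sum_i w_i r_i^2 D_i \phi_i

\Delta\phi = \frac{S_\phi S_{DD} - S_{\phi D} S_D}{S_1 S_{DD} - S_D^2},
\qquad
\Delta\rho = \frac{S_{\phi D} S_1 - S_\phi S_D}{S_1 S_{DD} - S_D^2}
```

Only S_φ and S_φD depend on the measured φ_i, so everything else can be tabulated per hit pattern and curvature bin. The code forms the Δρ numerator in the opposite order, so its sign convention for ρ, or for the tabulated derivative, presumably differs from the one assumed here.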

Slide 11: z0 Fit
- Go back to 6 stereo layers
  - Apply corrections for φ and ρ: [equation not recovered]
  - Subtract shift due to curvature using fitted ρ
- Residuals are due to stereo angles

Slide 12: z0 Fit
- φ residual: [equation not recovered]
- 6 z_i make a straight line in the d-z plane
[Figure: stereo angle; straight line in the d-z plane with intercept z0]

Slide 13: z0 Fit
- Minimize: [equation not recovered]
- Assume error: [expression not recovered]
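A straight-line fit in the d-z plane that matches the structure of the emulation code can be sketched as follows. The per-layer error σ_i and the sum names are assumptions based on the code's table names (`sigma2z`, `dsigma2z`, `sums2`, `sumds2`, `sumd2s2`, `denomzt`); the slide's own expression may differ in notation.

```latex
z_i = z_0 + d_i \tan\lambda,
\qquad
\chi^2 = \sum_i \frac{\left( z_i - z_0 - d_i \tan\lambda \right)^2}{\sigma_i^2}

S_1 = \sum_i \frac{1}{\sigma_i^2}, \quad
S_d = \sum_i \frac{d_i}{\sigma_i^2}, \quad
S_{dd} = \sum_i \frac{d_i^2}{\sigma_i^2}, \quad
S_z = \sum_i \frac{z_i}{\sigma_i^2}, \quad
S_{zd} = \sum_i \frac{z_i d_i}{\sigma_i^2}

z_0 = \frac{S_z S_{dd} - S_{zd} S_d}{S_1 S_{dd} - S_d^2},
\qquad
\tan\lambda = \frac{S_{zd} S_1 - S_z S_d}{S_1 S_{dd} - S_d^2}
```

The two numerators have the same form as the `z01 - z02` and `td1 - td2` differences in the code, with the common denominator supplied as a tabulated reciprocal.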

Slide 14: Implementation
- Biggest concern: speed
  - Pre-compute as much as possible
  - Most computation packed in LUTs
  - Only additions and multiplications at run time
- First step: software emulation
  - Bit-wise emulation of what the hardware will do
  - Validate the algorithm
  - Study and optimize the performance
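The idea of moving work into lookup tables can be illustrated with a division. A minimal sketch, assuming a 16-bit scale factor like the `>>16` shifts in the emulation: the reciprocal of a denominator is computed offline and stored, so the run-time logic needs only a multiplication and a bit shift. The names `kShift`, `reciprocalEntry` and `divideByLut` are hypothetical.

```cpp
#include <cstdint>

// Number of fractional bits kept in the tabulated reciprocal (assumed value).
constexpr int kShift = 16;

// Offline step: reciprocal of a positive denominator, scaled by 2^kShift
// and rounded to the nearest integer. This is what a LUT entry would hold.
constexpr std::int32_t reciprocalEntry(std::int32_t denom) {
    return ((std::int32_t{1} << kShift) + denom / 2) / denom;
}

// Run-time step: one multiplication and one shift stand in for the division.
// Intended for non-negative numerators; the result can differ from exact
// integer division by rounding in the tabulated reciprocal.
constexpr std::int32_t divideByLut(std::int32_t numerator, std::int32_t lutEntry) {
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(numerator) * lutEntry) >> kShift);
}
```

For instance, `reciprocalEntry(10)` is 6554 and `divideByLut(1000, 6554)` is 100, the same as 1000 / 10. In hardware the table index would be a hit pattern or curvature bin rather than the denominator itself.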

Slide 15: Software Emulation

    bool L1DczNIFitter::zFitter(const L1DczNIFtable* table, int hitmap,
                                const int* segphi, int rhoin,
                                int &z0, int &z0err, int &rhoout, int &dipout) const
    {
      // Unfittable hit pattern: set every output to its failure value.
      if (!table->fitok(hitmap)) {
        z0 = -128; z0err = 15; rhoout = -128; dipout = -128;
        return false;
      }
      int phi[9];
      int rh = table->rh5(rhoin);
      int hitax = table->hitax(hitmap);
      int sumr2phi = 0;
      int sumr2phidpdr = 0;
      int i;
      // r-phi loop: convert units, add the twistzero offset,
      // apply the curvature correction, accumulate the two sums.
      for (i = 0; i < 9; i++) {
        phi[i] = (segphi[i]*table->phiconv(i))>>13;
        phi[i] += table->twistzero(i);
        if (rhoin >= 0) phi[i] += table->curvcorr(i,rh);
        else phi[i] -= table->curvcorr(i,rh);
        if (table->useax(hitax,i)) {
          sumr2phi += table->wr2(i)*phi[i];
          sumr2phidpdr += table->wr2dpdr(i,rh)*phi[i];
        }
      }
      // Corrections to the seed phi and to the curvature.
      int dPhi1 = (sumr2phi*table->sumr2dpdr2(hitax,rh))>>13;
      int dPhi2 = (sumr2phidpdr*table->sumr2dpdr(hitax,rh))>>13;
      int dPhi3 = (dPhi1-dPhi2)*table->denomrp(hitax,rh);
      int dPhi = dPhi3>>16;
      int dRho1 = (sumr2phi*table->sumr2dpdr(hitax,rh))>>14;
      int dRho2 = (sumr2phidpdr*table->sumr2(hitax))>>14;
      int dRho3 = (dRho1-dRho2)*table->denomrp(hitax,rh);
      int dRho = dRho3>>16;
      rhoout = (rhoin<<2) + dRho;
      // z0 fit: keep only the six lowest hit-map bits from here on.
      hitmap &= 63;
      rh = table->rh3(rhoout);
      int sumzs2 = 0;
      int sumzds2 = 0;
      for (i = 0; i < 6; i++) {
        phi[i] += -dPhi + ((table->dphidrho(i,rh)*dRho)>>6);
        int z = (table->rstereo(i)*phi[i])>>8;
        if (hitmap & (1<<i)) {
          sumzs2 += z*table->sigma2z(i);
          sumzds2 += z*table->dsigma2z(i,rh);
        }
      }
      int z01 = (sumzs2*table->sumd2s2(hitmap,rh))>>6;
      int z02 = (sumzds2*table->sumds2(hitmap,rh))>>6;
      z0 = ((z01-z02)*table->denomzt(hitmap,rh))>>16;
      int td1 = (sumzds2*table->sums2(hitmap))>>1;
      int td2 = (sumzs2*table->sumds2(hitmap,rh))>>1;
      dipout = ((td1-td2)*table->denomzt(hitmap,rh))>>16;
      z0err = table->z0err(hitmap,rh);
      return true;
    }

The whole code (58 lines of C++) is made of LUTs, additions, multiplications and bit shifts.

Slide 16: Engineering Constraints
- FPGA: Xilinx Virtex-II
  - XC2V4000 chosen for the Seed Track Finder
  - Allow much smaller resources than the Finder
  - As few CLBs as possible
- Latency: as short as possible
  - Ideally < 1 CLK4
  - FPGA runs at 180 MHz → 48 ticks/CLK4
  - Most logic operations take 1 tick
  - 18-bit multiplication takes 2 ticks

Slide 17: Seed Counting
- 12 seeds/module/CLK4 → 4 Engines
  - Each Finder/Fitter pair processes 3 seeds
  - Fitter receives a new seed every 1/3 CLK4
- Fitting takes ~3/4 CLK4 for each seed → pipeline processing
[Diagram: Finder → Fitter (~3/4 CLK4, a seed arriving every 1/3 CLK4) → Decision Module]
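The need for pipelining follows from simple arithmetic, sketched below. The tick counts assume the 48 ticks/CLK4 quoted for a 180 MHz clock, so 1/3 CLK4 is 16 ticks and 3/4 CLK4 is 36 ticks; the function names are hypothetical.

```cpp
// Ceiling division for positive integers.
constexpr int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
}

// Number of engines needed to share the seeds of one module.
constexpr int enginesNeeded(int seedsPerModule, int seedsPerEngine) {
    return ceilDiv(seedsPerModule, seedsPerEngine);
}

// Number of seeds inside one Fitter at the same time when a new seed
// arrives every `intervalTicks` and each fit takes `latencyTicks`.
constexpr int seedsInFlight(int latencyTicks, int intervalTicks) {
    return ceilDiv(latencyTicks, intervalTicks);
}
```

Here `enginesNeeded(12, 3)` is 4 and `seedsInFlight(36, 16)` is 3, so a Fitter that handled one seed at a time could not keep up with the Finder.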

Slide 18: Data Flow
[Data-flow diagram with two blocks: r-φ fit and z0 fit]

Slide 19: r-φ Fit Block
[Block diagram. Labels: Input 9 segments; 3 segs/pipeline; Unit conversion; Stereo cancellation; Curvature subtraction; Accumulate; Carry stereo segments to z0 Fit]

Slide 20: Accumulator
- Accumulate 2 quantities from 3 sources arriving every 2 ticks
[Diagram labels: Pipeline 1, Pipeline 2, Pipeline 3]

Slide 21: φ and ρ Calculation
- 6 multiplications and 2 additions in 5 ticks

Slide 22: z0 Fit Block
[Block diagram. Labels: φ and ρ from r-φ Fit; 6 stereo segments; 3 segs/pipeline; φ, ρ correction; φ → z conversion; Accumulate; Done!]

Slide 23: Resources
- Dominated by the LUTs

Slide 24: Timing
[Timing diagram. Labels: Input arrives; r-φ pipeline; φ and ρ calculated; DPM; z0 pipeline; z0 and tan λ calculated; 37 ticks; Next seed arrives. Time is in CLK180 ticks.]

Slide 25: Timing and Latency
- Separation between two input seeds: 15 ticks
  - OK to process 3 seeds/CLK4 at 180 MHz
  - 2 seeds/CLK4 if 120 MHz (same as Finder)
- Latency for z0 and tan λ = 37 ticks
  - ~3/4 CLK4 at 180 MHz
  - Output will add ~5 ticks → still < 1 CLK4
  - If 120 MHz, ~1.3 CLK4
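The numbers on this slide can be cross-checked with a few lines of arithmetic. In the sketch below the CLK4 frequency of 3.75 MHz is not quoted in the talk; it is derived from the statement that 180 MHz corresponds to 48 ticks/CLK4. All names are hypothetical.

```cpp
// CLK4 frequency in kHz, derived from 180 MHz / 48 ticks (an inferred value).
constexpr int kClk4Khz = 3750;

// Clock ticks available in one CLK4 period for a given FPGA clock.
constexpr int ticksPerClk4(int clockMhz) {
    return clockMhz * 1000 / kClk4Khz;
}

// Seeds that fit in one CLK4 period given the minimum seed separation.
constexpr int seedsPerClk4(int clockMhz, int separationTicks) {
    return ticksPerClk4(clockMhz) / separationTicks;
}

// Latency expressed in units of CLK4 periods.
constexpr double latencyInClk4(int clockMhz, int latencyTicks) {
    return static_cast<double>(latencyTicks) / ticksPerClk4(clockMhz);
}
```

With a 15-tick separation, `seedsPerClk4(180, 15)` is 3 and `seedsPerClk4(120, 15)` is 2. The total latency of 37 + 5 = 42 ticks gives `latencyInClk4(180, 42)` = 0.875 and `latencyInClk4(120, 42)` = 1.3125, in line with the "< 1 CLK4" and "~1.3 CLK4" figures.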

Slide 26: Executive Summary
- Z Fitter measures the track's z0, pT, tan λ
- Algorithm demonstrated in C++ emulation
  - LUTs reduce real-time computation to a minimum
- Resources and timing evaluated
  - Uses 4% of CLBs in XC2V4000
  - Can process 3 seeds/CLK4 in pipeline
  - Latency < 1 CLK4 for each seed
- FPGA implementation ready to start