TELL40 VELO time ordering Pablo Vázquez, Jan Buytaert, Karol Hennessy, Marco Gersabeck, Pablo Rodríguez P. Vazquez (U. Santiago)112/12/2013.

Slides:



Advertisements
Similar presentations
IO Interfaces and Bus Standards. Interface circuits Consists of the cktry required to connect an i/o device to a computer. On one side we have data bus.
Advertisements

What are FPGA Power Management HDL Coding Techniques Xilinx Training.
Lecture 4. Topics covered in last lecture Multistage Switching (Clos Network) Architecture of Clos Network Routing in Clos Network Blocking Rearranging.
Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
NetFPGA Project: 4-Port Layer 2/3 Switch Ankur Singla Gene Juknevicius
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks Kapre et. al RC Reading Group – 3/29/2006 Presenter: Ilya Tabakh.
1 RAMP 100K Core Breakout Assorted RAMPants RAMP Retreat, UC San Diego June 14, M.
t Popularity of the Internet t Provides universal interconnection between individual groups that use different hardware suited for their needs t Based.
1 RAMP Infrastructure Krste Asanovic UC Berkeley RAMP Tutorial, ISCA/FCRC, San Diego June 10, 2007.
1 Tell10 Guido Haefeli, Lausanne Electronics upgrade meeting 10.February 2011.
LAV firmware status Francesco Gonnella Mauro Raggi 23 rd May 2012 TDAQ Working Group Meeting.
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
Students: Oleg Korenev Eugene Reznik Supervisor: Rolf Hilgendorf
EstiNet Network Simulator & Emulator 2014/06/ 尉遲仲涵.
High Performance Embedded Computing © 2007 Elsevier Lecture 16: Interconnection Networks Embedded Computing Systems Mikko Lipasti, adapted from M. Schulte.
High Speed Digital Design Project SpaceWire Router Student: Asaf Bercovich Instructor: Mony Orbach Semester: Winter 2009/ Semester Project Date:
Introduction to Interconnection Networks. Introduction to Interconnection network Digital systems(DS) are pervasive in modern society. Digital computers.
Wireless Sensor Monitoring Group Members: Daniel Eke (COMPE) Brian Reilly (ECE) Steven Shih (ECE) Sponsored by:
Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010.
ATM SWITCHING. SWITCHING A Switch is a network element that transfer packet from Input port to output port. A Switch is a network element that transfer.
High-Level Interconnect Architectures for FPGAs An investigation into network-based interconnect systems for existing and future FPGA architectures Nick.
System Arch 2008 (Fire Tom Wada) /10/9 Field Programmable Gate Array.
11/01/2006Wilco Vink / Martin van Beuzekom / L. Wiggers L0 ECS Workshop Pile-Up System.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
Author: Haoyu Song, Fang Hao, Murali Kodialam, T.V. Lakshman Publisher: IEEE INFOCOM 2009 Presenter: Chin-Chung Pan Date: 2009/12/09.
Department of Communication Engineering, NCTU 1 Unit 5 Programmable Logic and Storage Devices – RAMs and FPGAs.
FPGA (Field Programmable Gate Array): CLBs, Slices, and LUTs Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side.
Tools - Implementation Options - Chapter15 slide 1 FPGA Tools Course Implementation Options.
Univ. of TehranAdv. topics in Computer Network1 Advanced topics in Computer Networks University of Tehran Dept. of EE and Computer Engineering By: Dr.
2/2/2009 Marina Artuso LHCb Electronics Upgrade Meeting1 Front-end FPGAs in the LHCb upgrade The issues What is known Work plan.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
First ideas for the Argontube electronics Shaper, simulations Block Diagram for analog path Delta Code Data Reduction Bus system, Controller Max.
Management of the LHCb DAQ Network Guoming Liu * †, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
Peter JansweijerATLAS week: February 24, 2004Slide 1 Preparatory Design Studies MROD-X Use Xilinx Virtex II Pro –Rocket IO –Power PC –Port the current.
Performed by:Yulia Turovski Lior Bar Lev Instructor: Mony Orbach המעבדה למערכות ספרתיות מהירות High speed digital systems laboratory הטכניון - מכון טכנולוגי.
Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.
FPGA firmware of DC5 FEE. Outline List of issue Data loss issue Command error issue (DCM to FEM) Command lost issue (PC with USB connection to GANDALF)
The ATLAS Global Trigger Processor U. Schäfer Phase-2 Upgrade Uli Schäfer 1.
01/04/09A. Salamon – TDAQ WG - CERN1 LKr calorimeter L0 trigger V. Bonaiuto, L. Cesaroni, A. Fucci, A. Salamon, G. Salina, F. Sargeni.
CERN, 18 december 2003Coincidence Matrix ASIC PRR Coincidence ASIC modifications E.Petrolo, R.Vari, S.Veneziano INFN-Rome.
Edge Detection. 256x256 Byte image UART interface PC FPGA 1 Byte every a few hundred cycles of FPGA Sobel circuit Edge and direction.
Multi-objective Topology Synthesis and FPGA Prototyping Framework of Application Specific Network-on-Chip m Akram Ben Ahmed Xinyu LI, Omar Hammami.
JRA-1 Meeting, Jan 25th 2007 A. Cotta Ramusino, INFN Ferrara 1 EUDRB: A VME-64x based DAQ card for MAPS sensors. STATUS REPORT.
® Virtex-E Extended Memory Technical Overview and Applications.
Peter JansweijerATLAS week: February 24, 2004Slide 1 Preparatory Design Studies MROD-X Use Xilinx Virtex II Pro –RocketIO –PowerPC –Port the current MROD-In.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
1 Level 1 Pre Processor and Interface L1PPI Guido Haefeli L1 Review 14. June 2002.
Integration and commissioning of the Pile-Up VETO.
Virtual-Channel Flow Control William J. Dally
ATLAS RoI Builder + CDF ● Brief reminder of ATLAS Level 2 ● Opportunities for CDF (what fits - what doesn't) ● Timescales (more of what fits and what doesn't)
LHCb upgrade Workshop, Oxford, Xavier Gremaud (EPFL, Switzerland)
Proposal for a “Switchless” Level-1 Trigger Architecture Jinyuan Wu, Mike Wang June 2004.
Providing QoS in IP Networks
AMC40 firmware Data format discussions 17/10/2013Guillaume Vouters1.
Artur BarczykRT2003, High Rate Event Building with Gigabit Ethernet Introduction Transport protocols Methods to enhance link utilisation Test.
An FFT for Wireless Protocols Dr. J. Greg Nash Centar ( HAWAI'I INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES Mobile.
CEC 220 Digital Circuit Design Decoders, Encoders, & ROM Wed, February 19 CEC 220 Digital Circuit Design Slide 1 of 18.
29/05/09A. Salamon – TDAQ WG - CERN1 LKr calorimeter L0 trigger V. Bonaiuto, L. Cesaroni, A. Fucci, A. Salamon, G. Salina, F. Sargeni.
TELL1 A common data acquisition board for LHCb
Head-to-Head Xilinx Virtex-II Pro Altera Stratix 1.5v 130nm copper
SLP1 design Christos Gentsos 9/4/2014.
Pablo Abad, Pablo Prieto, Valentin Puente, Jose-Angel Gregorio
ADDRESSING Before you can send a message, you must know the destination address. It is extremely important to understand that each computer has several.
Field Programmable Gate Array
Centar ( Global Signal Processing Expo
Programmable Logic- How do they do that?
Improving Memory System Performance for Soft Vector Processors
TELL1 A common data acquisition board for LHCb
Multiprocessors and Multi-computers
Presentation transcript:

TELL40 VELO time ordering Pablo Vázquez, Jan Buytaert, Karol Hennessy, Marco Gersabeck, Pablo Rodríguez P. Vazquez (U. Santiago)112/12/2013

Time ordering Packet(34b)=ASIC(4b)+SuperpixelAddress(13b)+BCID(9b)+Hitpattern(8b) Data packets arrive to the TELL40 disordered in time Time ordering = grouping of packets by 9bit BCID value Grouping is done in 2 steps: – Router  based on 4-bits of BCID (MSB or LSB, see later) – Memory storage  remaining 5-bits BCID 12/12/2013P. Vazquez (U. Santiago)2  Goal of this talk Decoding routing memory Data processing “Standard LHCb” “VELO”

4-bit BCID VELO packet router Non blocking scheme using internal memory (FIFO’s) Different number of input links considered: – MHz (320) for a half (full) module as described in the TDR – MHz for a full module, when reducing optical links 2 architectures under study: – “Cascade” of 1-bit router elements. Each element routes packets based on 1-bit – “crossbar” : single stage N x 2 M ports routes packets based on all bits Use of altera library megafunctions: lpm_scfifo, lpm_compare... 12/12/2013P. Vazquez (U. Santiago)3

1-bit router elements 12/12/2013P. Vazquez (U. Santiago)4 Data 0 w0 Out 0 Out 1 r0 r1 If (BID[i]=0) Write FlipFlop0 else Write FlipFlop1 Data 0 Data 1 w0 w1 If (BID[i]=0) Write FIFO_00 else Write FIFO_01 Out 0 Out 1 r0 r1 If (BID[i]=0) Write FIFO_10 else Write FIFO_11 More Full? More Full? 2x2 router element uses 4 fifos while 1x1 router element doesn’t use fifos

10x16 router with “1-bit router elements” 12/12/2013P. Vazquez (U. Santiago)5 xxxx xxx1 xxx0 2x2 port 1-bit router xx11 xx01 1x2 port 1-bit router

FPGA resources and max speed Several configurations compiled for the actual device: altera stratix V 5SGXEA7N3F45C2 Router can 320 MHz with 1-5% of resources with a cross-sectional bandwidth higher than required (2G pakets/s peak for full module) 12/12/2013P. Vazquez (U. Santiago)6 In x Out links16x16 (TDR)10x16 (Optimized) Fifo depth (words)512 Logic utilization (in ALMs)3,582 ( 2 % )2,722 ( 1 % ) Total registers Total block memory bits58,920 ( < 1 % )1,274,940 ( 2% ) M20K blocks128 ( 5 % )84 ( 3 % ) Fmax 85ºC (MHz) Bandwidth links x 320MHz (G packet/s)

Simulation Generate simulated input data using the known distribution of the event size and latency (see next slides) Aim: the dependency of packet loss vs fifo depth and routing strategy (MSB vs LSB bits) – Aim: data packet loss << 0.1%? Every 34-bit FIFO is allocated on a M20K memory block => optimal is 512 x 40bit. Could be extended We will start simulating the MHz case 12/12/2013P. Vazquez (U. Santiago)7

Event size distribution The 20 inputs of a full module are below 40% of peak packet rate capacity 12/12/2013P. Vazquez (U. Santiago)8 Data from Martin van Beuzekom At the input of MHz Links 1-8mean 0.37 pkt/clk Links 9-12 mean 0.31 pkt/clk Link 13,14mean 0.25 pkt/clk Link 15,16mean 0.30 pkt/clk Link 17,18mean 0.26 pkt/clk Link 19,20mean 0.16 pkt/clk

Simulation latency Packets arrive at the tell40 with variable latency: Routing with the MSB bits generates data rate peaks 10 times higher than routing with LSB bits – increases packet loss probability (but maybe still acceptable), – but is nicer for MEP: consecutive events are stored in the same memory – Maybe better suited for data- processing (idle time between peaks) 12/12/2013P. Vazquez (U. Santiago)9

Summary TELL40 VELO router block has been investigated Compilation of designs based on 1-bit router elements shows that with 1-5% of FPGA resources, the router can handle easily the required bandwidth Simulations with self-generated data will show the packet loss (with respect to fifo depth and MSB/LSB routing strategy) 12/12/2013P. Vazquez (U. Santiago)10

backup 12/12/2013P. Vazquez (U. Santiago)11

10x16 router 1 stage 12/12/2013P. Vazquez (U. Santiago)12 Data 0 Data 9 w0 w9 Out 0 Out 15 r0 r15