Modular Design Techniques for the FPX

Overview
  Motivation
  RAD Logic Resources
  RAD Infrastructure Modules
    Reconfiguration Control
    SRAM Interface
    Control Cell Processor
  RAD Module Interface
  Top Level RAD Design
    Pins and layout overview
    Module instantiation

Motivation for Modular Design
  Definitions
    Modules: entities that perform network data processing
      FPX applications: packet classification, compression, etc.
    Infrastructure: all other entities necessary for system functionality
      Memory interfaces, control cell processor, reconfiguration control, etc.
  Assume most applications do not need all available logic and memory resources
    Higher performance and flexibility are achievable via multiple modules
  Standard module interface
    Ensures module interoperability
    Reduces design redundancy
    Shortens module design cycle

Dynamic Hardware Plugins (DHP)
  Programmable router with software and reconfigurable hardware packet processing
  Hardware plugins
    Static interfaces for I/O and off-chip memory
    User-defined on-chip memory
  Infrastructure
    IOC: slotted ring interface
    Application Controller: reconfiguration control
    Memory Interfaces: SRAM/SDRAM interfaces
  Applications
    Position independent
    Dynamically loadable
  Prototype with WUGS/SPC/FPX
    Partially reconfigure RAD FPGA for new applications

RAD FPGA Logic Resources
  Virtex 1000E-7 FPGA
  4 global clock trees
    (2) 100MHz clocks from FPX board
  Globally accessible IOBs
    Versa-Ring routing
    3 flops for tri-state bussing
  64 x 96 CLB array
    2 flops/LUTs per Slice
    2 Slices per CLB
    Total = 24,576 flops/LUTs
  96 Block SelectRAMs
    4096 bits per block
    6 columns of 16 blocks
    6 columns of dedicated interconnect
    Total = 393,216 bits
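The two totals on this slide follow directly from the array dimensions. A minimal sketch of the arithmetic, with constant names chosen here for illustration only:

```python
# Figures are taken from the slide; the names are illustrative, not from any tool.
CLB_ROWS, CLB_COLS = 64, 96
SLICES_PER_CLB = 2
FLOP_LUT_PAIRS_PER_SLICE = 2
BLOCK_RAM_COLUMNS, BLOCK_RAMS_PER_COLUMN = 6, 16
BITS_PER_BLOCK_RAM = 4096


def logic_totals():
    """Return (CLB count, flop/LUT pairs, total Block SelectRAM bits)."""
    clbs = CLB_ROWS * CLB_COLS
    flop_lut_pairs = clbs * SLICES_PER_CLB * FLOP_LUT_PAIRS_PER_SLICE
    block_ram_bits = BLOCK_RAM_COLUMNS * BLOCK_RAMS_PER_COLUMN * BITS_PER_BLOCK_RAM
    return clbs, flop_lut_pairs, block_ram_bits


print(logic_totals())  # (6144, 24576, 393216)
```

A module designer can use the same numbers to estimate what fraction of the device a single module may occupy.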

Reconfiguration Control Module
  Partial reconfiguration controller for RAD FPGA
  Executes reconfiguration handshake with NID FPGA and RAD modules
  Module interface
    Localized synchronous reset
    Enable
    Ready

SRAM Interface Module
  Interface to off-chip ZBT SRAM
  Abstracts modules from device-specific timing
  Independent interface for each module
  Arbitrates requests and issues grant to winning module
  Modules retain access by holding request high after receiving grant
  Modules responsible for preventing starvation
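The request/grant behaviour described on this slide can be illustrated with a toy cycle-level model. This is a sketch under assumptions, not the FPX arbiter: the class name, the fixed-priority choice among competing requesters, and re-granting in the same cycle as a release are all invented here.

```python
class SramArbiter:
    """Toy arbiter: one grant at a time; the holder keeps it while requesting."""

    def __init__(self, n_modules):
        self.n_modules = n_modules
        self.owner = None  # index of the module currently granted, or None

    def step(self, requests):
        """Advance one clock. `requests` holds one bool per module.

        Returns the grant vector for this cycle.
        """
        # The holder retains access only while its request stays high.
        if self.owner is not None and not requests[self.owner]:
            self.owner = None
        # With no holder, pick a winner. Lowest index wins (an assumption).
        if self.owner is None:
            for index, requesting in enumerate(requests):
                if requesting:
                    self.owner = index
                    break
        return [index == self.owner for index in range(self.n_modules)]


arbiter = SramArbiter(2)
print(arbiter.step([True, True]))   # module 0 wins
print(arbiter.step([True, True]))   # module 0 retains access
print(arbiter.step([False, True]))  # module 0 releases, module 1 is granted
```

The second step shows why modules are responsible for preventing starvation: as long as module 0 holds its request high, module 1 never receives a grant.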

Control Cell Processor
  Captures control cells for off-chip memory transactions
    SRAM read/write
    SDRAM read/write
      Not yet implemented
  Checks for correct
    HEC
    VPI = 0x000
    VCI = 0x0023 (35)
      Modifiable register
    ModuleID = 0x00
  OpCodes
    Even OpCodes for command cells
    Response OpCode = 1 + OpCode
    OpCodes 0x00 to 0x0F reserved for common operations
  Updates CRC for response cells
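The OpCode convention above is simple enough to express as a few helper functions. The sketch below assumes the VPI, VCI, and OpCode have already been extracted from the cell; the function names are hypothetical, and the HEC and CRC checks are omitted because the slide does not give the field layout.

```python
CONTROL_VPI = 0x000
CONTROL_VCI = 0x0023      # 35 decimal; the slide notes this is a modifiable register
COMMON_OPCODE_MAX = 0x0F  # OpCodes 0x00 to 0x0F are reserved for common operations


def is_control_cell(vpi, vci):
    """True if the cell is addressed to the control cell processor."""
    return vpi == CONTROL_VPI and vci == CONTROL_VCI


def is_command(opcode):
    """Command cells carry even OpCodes."""
    return opcode % 2 == 0


def response_opcode(opcode):
    """Response OpCode = 1 + OpCode, defined only for command OpCodes."""
    if not is_command(opcode):
        raise ValueError("only even OpCodes are command cells")
    return opcode + 1


def is_reserved_common(opcode):
    """True if the OpCode falls in the range reserved for common operations."""
    return 0x00 <= opcode <= COMMON_OPCODE_MAX


print(is_control_cell(0x000, 35), hex(response_opcode(0x10)))  # True 0x11
```

Because responses are always odd, a module can tell a command from a response by inspecting the low bit of the OpCode alone.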

RAD Module Interface
  Cell I/O and Flow Control
    32-bit wide UTOPIA-style interface w/ unique timing
  Off-chip Memory Access
    Arbitrated access to SRAM and SDRAM via standard interface
  Control (clock, reset, and reconfiguration control)

Control Interface
  100MHz global clock (CLK)
    All I/O signals should be synchronous to CLK
  Synchronous reset (RESET_L)
    Asserted low for 1 clock cycle
  Reconfiguration handshake (ENABLE_L, READY_L)
    ENABLE_L is driven low at reset
    Module must pull READY_L high after reset, prior to accepting cells, in order to prevent reconfiguration during operation
    ENABLE_L is driven high prior to reconfiguration
    Module stops accepting cells, flushes internal pipelines, and asserts READY_L (low) for at least one clock cycle
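The reconfiguration handshake reads naturally as a small state machine on the module side. The following is a behavioural sketch only: the class name and the three-cycle flush depth are assumptions, and real modules would implement this in HDL with their own pipeline depth.

```python
class ModuleHandshake:
    """Toy model of a module's side of the ENABLE_L / READY_L handshake."""

    FLUSH_CYCLES = 3  # hypothetical internal pipeline depth

    def __init__(self):
        self.ready_l = 1        # active low: 1 means "not ready for reconfiguration"
        self.accepting = False  # whether the module accepts cells
        self.flush_left = self.FLUSH_CYCLES

    def clock(self, reset_l, enable_l):
        """Advance one clock with the given active-low inputs; return READY_L."""
        if reset_l == 0:
            # Synchronous reset: hold READY_L high, accept nothing yet.
            self.ready_l = 1
            self.accepting = False
            self.flush_left = self.FLUSH_CYCLES
        elif enable_l == 0:
            # Enabled: READY_L stays high while cells are being processed.
            self.ready_l = 1
            self.accepting = True
            self.flush_left = self.FLUSH_CYCLES
        else:
            # ENABLE_L high: stop accepting cells, flush, then assert READY_L low.
            self.accepting = False
            if self.flush_left > 0:
                self.flush_left -= 1
                self.ready_l = 1
            else:
                self.ready_l = 0
        return self.ready_l


m = ModuleHandshake()
trace = [m.clock(0, 0), m.clock(1, 0)] + [m.clock(1, 1) for _ in range(4)]
print(trace)  # [1, 1, 1, 1, 1, 0]
```

READY_L goes low only after the flush cycles have elapsed, which is the property that keeps the reconfiguration controller from overwriting a module that still holds cells.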

Cell Input Interface
  Start of Cell (SOC_MOD_IN)
    Signals the first word of the ATM cell
  32-bit wide data path (D_MOD_IN)
    ATM cells transferred as (14) 32-bit words
    First word arrives with SOC_MOD_IN
    Remaining 13 words arrive on subsequent clock cycles
  Transmit Cell Available (TCA_MOD_IN)
    Signals module’s ability to accept a cell
    Must be valid 6 clock cycles prior to the last cycle of the current cell transfer
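A cell transfer on this interface is 14 consecutive clock cycles, with the start-of-cell flag high on the first cycle only. The sketch below models that sequence; the function names are hypothetical, and the TCA deadline calculation assumes cycles are numbered from 0 at the start-of-cell word, which the slide does not state.

```python
WORDS_PER_CELL = 14  # ATM cells are transferred as 14 32-bit words


def cell_transfer(words):
    """Yield one (soc, data) pair per clock cycle for a single cell."""
    if len(words) != WORDS_PER_CELL:
        raise ValueError("a cell is exactly 14 words")
    for cycle, word in enumerate(words):
        soc = 1 if cycle == 0 else 0  # start of cell accompanies the first word only
        yield soc, word & 0xFFFFFFFF


def tca_deadline_cycle(lead_cycles=6):
    """Latest cycle (0-based) by which TCA must be valid for the next cell."""
    last_cycle = WORDS_PER_CELL - 1
    return last_cycle - lead_cycles


beats = list(cell_transfer(list(range(14))))
print(beats[0], beats[13], tca_deadline_cycle())  # (1, 0) (0, 13) 7
```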

Cell Output Interface
  Start of Cell (SOC_OUT_MOD)
    Signals the first word of the ATM cell
  32-bit wide data path (D_OUT_MOD)
    ATM cells transferred as (14) 32-bit words
    First word sent with SOC_OUT_MOD
    Remaining 13 words sent on subsequent clock cycles
  Transmit Cell Available (TCA_OUT_MOD)
    Signals output’s ability to accept a cell
    Modules must sample TCA_OUT_MOD no sooner than 3 clock cycles prior to asserting SOC_OUT_MOD

SRAM Interface
  Arbitration handshake
    SRAM_REQ requests and holds memory access
    SRAM_GR grants access and initiates access termination
    Module may retain memory access for duration of transaction set
    If grant is de-asserted, module must complete current transaction and release memory
    Module is responsible for preventing starvation
  Reads
    Hold SRAM_RW high, issue address
    Data appears inside module 6 clock cycles later
  Writes
    Assert SRAM_RW low, issue address and data
    Data will be written 5 clock cycles later
  IMPORTANT: HOLD SRAM_RW HIGH TO PREVENT OVERWRITING VALID MEMORY DATA
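The read and write latencies above can be exercised with a small behavioural model. This is a sketch, not a model of the ZBT device or of the FPX interface logic: the class name is invented, arbitration is left out, and a read is assumed to return whatever the memory holds on the cycle the data is delivered.

```python
READ_LATENCY = 6   # data appears inside the module 6 clock cycles after issue
WRITE_LATENCY = 5  # data is written 5 clock cycles after issue


class SramModel:
    """Toy latency model of the SRAM interface as seen from inside a module."""

    def __init__(self):
        self.mem = {}
        self.cycle = 0
        self.pending = []  # (due_cycle, rw, addr, data)

    def clock(self, rw=1, addr=None, data=None):
        """Advance one cycle. rw=1 is a read, rw=0 a write; addr=None issues nothing.

        rw defaults to 1, mirroring the rule to hold SRAM_RW high when idle.
        Returns read data delivered this cycle, or None.
        """
        if addr is not None:
            latency = READ_LATENCY if rw == 1 else WRITE_LATENCY
            self.pending.append((self.cycle + latency, rw, addr, data))
        read_data = None
        remaining = []
        for due, op_rw, op_addr, op_data in self.pending:
            if due != self.cycle:
                remaining.append((due, op_rw, op_addr, op_data))
            elif op_rw == 1:
                read_data = self.mem.get(op_addr)  # read completes
            else:
                self.mem[op_addr] = op_data        # write commits
        self.pending = remaining
        self.cycle += 1
        return read_data


sram = SramModel()
sram.clock(rw=0, addr=4, data=0xAB)          # write issued on cycle 0
for _ in range(5):
    sram.clock()                             # idle cycles 1 to 5; write commits on 5
results = [sram.clock(rw=1, addr=4)] + [sram.clock() for _ in range(6)]
print(results)  # six None entries, then 171 (0xAB)
```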

SRAM Interface Timing
  All I/O signals must be flopped at module boundary to ensure timing constraints are met
  Timing diagrams take reference point from inside module and assume boundary flops

RAD Pin Mappings
  RAD FPGA (chip view)
  Ingress Path (LC)
    Input: SOC_LC_NID, D_LC_NID, TCAFF_LC_RAD
    Output: SOC_LC_RAD, D_LC_RAD, TCAFF_LC_NID
  Egress Path (SW)
    Input: SOC_SW_NID, D_SW_NID, TCAFF_SW_RAD
    Output: SOC_SW_RAD, D_SW_RAD, TCAFF_SW_NID
  SRAM Interfaces: SRAM1, SRAM2
  SDRAM Interfaces: SDRAM1, SDRAM2

Design Issues & Recommendations
  Keep routing delays in mind during initial design phase, use conservative estimates
  Conform to the Module Interface Specification
    Use provided infrastructure
    Flop all module I/O signals
    Position independent modules
    Use synchronous reset
    Perform cell I/O simulations
  Experiment with synthesis and PAR options
    Over-constrain timing delays
    Significant deviations in timing results occur with various options, including hierarchy ungrouping and routing algorithms
  Share experience and wisdom with other developers

Example RAD Design: IP Router using Fast IP Lookup

Overview
  FPX file tree
  Design Overview
    Fast IP Lookup Module Overview
    Use of Infrastructure Modules
    Top-level RAD Design
  Design Flow (UNIX, Exemplar, Xilinx)
    Module design and functional simulation (ModelSim)
    Top-level design and functional simulation (ModelSim)
    Synthesis (Exemplar Leonardo & Spectrum)
    Place and Route (Xilinx Alliance Series)
      Constraint passing caveats
      Floorplanning to meet timing
    Backannotated Gate-level Simulation (ModelSim)

FPX File Tree
  Provided directories in all CAPS
    Distinguishes original (sub)directories from those added by Kits members
  Create subdirectory for new module designs under MODULES
    Perform local simulation and synthesis
  Create subdirectory for new top-level builds under TOP
    Instantiate modules and necessary infrastructure
    Perform system-level simulation, top-level synthesis

Design Overview
  [Block diagram. Labels: RAD FPGA, NID FPGA, LC, SW, SRAM1, SRAM1 Interface (Request, Grant), SRAM2, IP Lookup Engine, counter, On-Chip Cell Store, Extract IP Headers, Remap VCIs for IP packets, Packet Reassembler, Control Cell Processor]

Fast IP Lookup Module Overview

Top-level RAD Design with FIPL Module

End of Presentation

IP Lookup Design Constraints
  Maximum WUGS line rate = 1.2 Gb/s
    Minimum packet length = 1 cell
    Lookup period < 323ns
  Access to one 256K x 36 SRAM (Micron ZBT)
    Minimum memory latency = 4 clock cycles
    Memory accesses per lookup (IPv4, worst case) = 11
  Single worst case lookup:
    (memory accesses) x (clock cycles/access) x (Tclk) = tlookup
    11 x 4 x 10ns = 440ns
  Must use parallel engines and pipeline memory accesses to achieve desired performance
  Reality check: FPGA routing delays comprise ~50% to 80% of total signal delay
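The worst-case figure and the resulting need for parallelism can be checked with a short calculation. The function names below are illustrative, and the overlap factor is only a lower bound derived from the two numbers on the slide; it says nothing about how the lookups are actually overlapped.

```python
import math


def worst_case_lookup_ns(accesses=11, cycles_per_access=4, t_clk_ns=10):
    """tlookup = (memory accesses) x (clock cycles/access) x (Tclk)."""
    return accesses * cycles_per_access * t_clk_ns


def min_lookups_in_flight(lookup_ns, period_ns=323):
    """Lower bound on lookups that must overlap to finish one per period."""
    return math.ceil(lookup_ns / period_ns)


t_lookup = worst_case_lookup_ns()
print(t_lookup, min_lookups_in_flight(t_lookup))  # 440 2
```

A single sequential lookup at 440ns exceeds the 323ns budget, so at least two lookups must be in progress at once, whether through parallel engines, pipelined memory accesses, or both.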

IP Lookup Design Techniques
  Design (VHDL)
    Simulate design/algorithm with C program
    Identify constraints
    Design with conservative delay estimates
      Flops for Cell I/O
      Allow one clock cycle for next address calculation
  Simulation (Mentor Graphics ModelSim)
    Experimental data structure written to memory from input file via "fake" control cell processor
    Used "fake" NID model with file I/O to pass cells in and out
  Synthesis (Exemplar)
    Targeted 9ns clock period
  Place and Route (Xilinx Alliance Series)
    Used constraint file with pin mappings
    Weighted delay vs. area
    Used DFS routing algorithm vs. KPATHS

IP Lookup Status and Changes
  Initial design simulates, synthesizes, and PARs
    Timing reports specify maximum clock frequency of 58MHz; need ~2x speedup
  Experimenting with floorplanning
    Maintain hierarchy through synthesis
    Hand-place data path CLBs
  Redesign pipeline
    Add flops to SRAM interface signals
    Increases memory latency to 6 clock cycles
    Achieve 1.2Gb/s lookups with two engines
  Create position independent module
  Perform final gate-level simulation with robust test vectors and sample data structures

Dynamic Hardware Plugins (DHP)
  Application for partial FPGA reconfiguration
  Ingress/Egress plugin modules
    Modules are position independent plugins
    Multiplexed daisy chain enables plugin permutations
  Dynamic reconfiguration
    Plugins are dynamically loaded into running device
    Plugins may be bypassed during reconfiguration
  Central control block
    Cell routing, flow control
    Memory mgmt.
    Plugin reconfiguration control
  [Block diagram. Labels: NID FPGA Interface (Cell I/O), DHP Control, SDRAM Interface, SRAM Interface, DHP Module, BlockRAM, Ingress Path, Egress Path]
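The multiplexed daisy chain can be pictured as a list of plugin slots that a cell visits in a configurable order, skipping any slot that is being reconfigured. The sketch below is a software analogy only: the function names, the dictionary cell representation, and the two example plugins are all hypothetical.

```python
def run_chain(cell, plugins, order, bypassed=frozenset()):
    """Pass a cell through plugin slots in `order`, skipping bypassed slots.

    A plugin returns the (possibly modified) cell, or None if it absorbs it.
    """
    for slot in order:
        if slot in bypassed:
            continue  # slot is under reconfiguration; the cell passes straight by
        cell = plugins[slot](cell)
        if cell is None:
            return None
    return cell


def remap_vci(cell):
    """Example plugin: shift the VCI by one."""
    return {**cell, "vci": cell["vci"] + 1}


def tag_flow(cell):
    """Example plugin: record the VCI seen at this point as a flow tag."""
    return {**cell, "tag": cell["vci"]}


plugins = [remap_vci, tag_flow]
print(run_chain({"vci": 50}, plugins, order=[0, 1]))                # {'vci': 51, 'tag': 51}
print(run_chain({"vci": 50}, plugins, order=[1, 0]))                # {'vci': 51, 'tag': 50}
print(run_chain({"vci": 50}, plugins, order=[0, 1], bypassed={0}))  # {'vci': 50, 'tag': 50}
```

The first two calls show why the permutation matters: the same plugins produce different tags depending on the order in which a cell visits them.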

IP Lookup as a DHP Module
  Ingress module
  Cell I/O
    Process all IP data flows passing through switch port
    Watch for control cell updates to root node pointer
  Requires access to SRAM
    Tree bitmap data structure stored in off-chip SRAM
  Implements Cell Store, IP Address FIFO, and Output VCI FIFO in Block SelectRAM
  [Block diagram. Labels: NID FPGA Interface (Cell I/O), DHP Control, SDRAM Interface, SRAM Interface, DHP Module, BlockRAM, Ingress Path, Egress Path, Fast IP Lookup Engine, IP Wrapper, Extract IP Address, Remap VCIs, Cell Store, Cells IN, Cells OUT]

Challenges
  DHP Module control
    Cell routing to correct permutation of plugin modules
    Flow classification and tagging of cells
  Flow control
    Asynchronous (non-flywheel) cell I/O interfaces
    Plugins may arbitrarily delay cells
    Plugins may inject more traffic than they absorb and vice versa
  Implementing and maintaining static DHP Module interfaces
    Signal route locks for plugin module interface
    Signal route locks for memory and control signals
    Reservation of logic and routing resources
  Memory resource arbitration
    Sharing off-chip memory resources between a dynamic set of applications
    Maintaining flow state between plugins