Field-programmable Port Extender (FPX) January 2001 Workshop


Field-programmable Port Extender (FPX) January 2001 Workshop. John Lockwood, Washington University Applied Research Lab. Supported by: NSF ANI-0096052 and Xilinx Corp. http://www.arl.wustl.edu/arl/projects/fpx/workshop_0101/agenda.html Abstract: By incorporating Field Programmable Gate Arrays (FPGAs) at the edge of a network switch, it is possible to implement hardware-based modules that have both the flexibility to be reconfigured and the performance to process packets at the full rate of the Internet backbone. The Field Programmable Port Extender (FPX) has been built to augment existing Gigabit switch systems with reprogrammable hardware modules. The FPX system includes a high-speed network interface, multiple banks of memory, and FPGA logic. The FPX allows hardware modules to be dynamically installed into the router by using partial reprogramming of an FPGA. Applications have been developed for the FPX that include IP packet routing, queuing, and data compression.

Technologies for Implementing Networks. Microprocessors: fully reprogrammable; silicon resources dictated by the CPU vendor; mostly sequential processing. Custom hardware: highly concurrent processing; silicon resources optimized for the application; static functionality. Reprogrammable hardware: highly concurrent processing; silicon resources optimized for the application; fully reprogrammable.

Integrating FPGAs into an Internet Router. The FPX processes packets at the edge of a switch, between the fiber-optic line card and the backplane switch fabric. The FPX sees both flows: ingress (input to the switch) and egress (exit from the switch). The throughput of the packet-processing system must equal or exceed the maximum link speed of the fiber-optic card; the current FPX handles link rates up to OC-48. FPX modules are distributed across each port of a switch. IP packets (over ATM) enter and depart via the line card, and packet fragments are processed by the modules.

Hardware Device The FPX is implemented on a 12-layer circuit board

Architecture of the FPX. The FPX is implemented with two FPGAs: the RAD and the NID. The Reprogrammable Application Device (RAD) is a large Xilinx FPGA that contains the user-defined modules: it attaches to external SRAM and SDRAM, is reprogrammable over the network, and provides two user-defined module interfaces. The RAD holds multiple modules of application-specific functionality. The NID provides Utopia interfaces between the switch and line card, and contains the logic to interface with the switch and line card, route traffic flows to the RAD, and reprogram the modules on the RAD.

Infrastructure Services

Routing Traffic Flows Between Modules. Traffic flows are routed by VC among the switch, the line card, RAD.Switch, and RAD.LineCard. NID functions: check packets for errors; process commands for control, status, and reprogramming; implement per-flow forwarding. The NID provides a non-blocking, multi-port switch for forwarding data to the appropriate module as flows enter the system.

Typical Flow Configurations. Several configurations of flow routing are possible: Default Flow Action (Bypass); Egress Processing (Per-flow Output Queueing); Ingress Processing (IP Routing); Full RAD Processing (Packet Routing and Reassembly); Full Loopback Testing (System Test); and Partial Loopback Testing (Egress Flow Processing Test). By default, the FPX forwards all flows directly between the ingress and egress ports. To process packets only on the egress path, flows are routed from the switch to a RAD module, then from the RAD module to the line card. To process packets on the ingress path, flows are routed from the line card to a RAD module, then from the RAD module to the switch. Using both modules, packets can be processed on both the ingress and egress paths. Modules can also be chained, allowing packets to be processed by multiple modules as they pass through the FPX.

Reprogramming Logic. The NID programs itself at boot from EPROM. The switch controller writes the RAD configuration memory on the NID: the bitfile for the RAD arrives over the network via control cells. The switch controller then issues a full or partial reconfigure command, and the NID reads the RAD configuration memory to program the RAD, performing complete or partial reprogramming of the RAD.

Software Services for Controlling the FPX. Methods of communication: fpx_control, Telnet, Web interface / CGI, basic_send, and user applications. Software plug-ins: concepts, functionality, and emulation (nid_listener, rad_listener). (Diagram: remote manager applications — basic Telnet read, Web send, FIP access CGI, basic send — communicate with the software controller fpx_control, which reaches each NID and RAD across the Washington University Gigabit Switch over dedicated VCIs, e.g. VCI 76 (NID) and VCI 100 (RAD), or VCI 115 (NID) and VCI 123 (RAD), on an OC-3 link carrying up to 32 VCIs.)

Pictorial view of fpx_control interfaced with hardware {0-7}.{0/1}

Combination Router: Hardware and Software. Implement link-speed operations in hardware; implement higher-level functions in software; migrate critical-path functionality into hardware.

FPX Hardware

FPX SRAM. Provides low latency for fast table lookups. Zero Bus Turnaround (ZBT) allows back-to-back read/write operations every 10 ns. Dual, independent memories with 36-bit wide buses. The SRAM memories are pipelined so that they can be fully utilized; with ZBT memory, an access has four cycles of latency. The SRAM is well suited for fast memory lookups.

FPX SDRAM. Dual, independent SDRAM memories: 64-bit wide at 100 MHz; 64 MB per module, 128 MB total [expandable]; burst-based transactions [1-8 word transfers]; latency of 14 cycles to read/write an 8-word burst. SDRAM provides cost-effective storage of data. A 64-bit wide, pipelined module allows full-throughput queuing of traffic to and from the RAD.


Development of FPX Applications

FPX Interfaces Provides Well defined Interface Utopia-like 32-bit fast data interface Flow control allows back-pressure Flow Routing Arbitrary permutations of packet flows through ports Dynamically Reprogrammable Other modules continue to operate even while new module is being reprogrammed Memory Access Shared access to SRAM and SDRAM Request/Grant protocol

Network Module Interface. Data interface: D_MOD_IN[31:0], D_MOD_OUT[31:0], SOC_MOD_IN, SOC_MOD_OUT, TCA_MOD_IN, TCA_MOD_OUT. SRAM interface: SRAM_REQ, SRAM_GR, SRAM_D_IN[35:0], SRAM_D_OUT[35:0], SRAM_ADDR[17:0], SRAM_RW. SDRAM interface: SDRAM_REQ, SDRAM_GR, SDRAM_DATA[63:0] (in and out), plus address and read/write controls. Module interface: CLK, RESET_L, ENABLE_L, READY_L.

Reprogrammable Application Device (RAD) Spatial Re-use of FPGA Resources Modules implemented using FPGA logic Module logic can be individually reprogrammed Shared Access to off-chip resources Memory Interfaces to SRAM and SDRAM Common Datapath to send and receive data

Combining Modules within the Chip. Modules fit together at static I/O interfaces. Partial reprogramming of the FPGA is used to install and remove modules; modules can be added and removed while other modules continue to process packets. Statically-configured 'long lines' provide chip-wide routing, and modules can be chained. (Diagram: intra-chip module switching among FPX modules sharing SRAM and SDRAM, with module loading and unloading over the FPGA's long lines.)

SDRAM Controller Interface Implements Burst Read/Writes to SDRAM Provides refresh signals to SDRAM Asserts RAS / CAS signals for address Provides standard Interface to Application SDRAM Interface SDRAM_DATA SDRAM_BL SDRAM_ADDR SDRAM_EN SDRAM_RD_WR SDRAM_RQ(1 to n) SDRAM_GR(1 to n) OP_FIN(1 to n) Access Bus Arbitration Signals

On-Chip Sharing of SDRAM. Implements on-chip and off-chip tri-state buses; shared wire resources are used on-chip. Arbitrates among multiple modules, allowing multiple modules to share one SDRAM. Module(i) connects to the SDRAM interface over the SDRAM access bus using SDRAM_RQ(i), SDRAM_GR(i), and OP_FIN(i).

Applications for the FPX

Pattern Matching Use Hardware to detect a pattern in data Modify packet based on match Pipeline operation to maximize throughput

“Hello, World” Module Function

Logical Implementation Append “WORLD” to payload VCI Match New Cell

Source: Concurrent VHDL Statements

BData_Out_process: process (clkin)
begin
  -- buffer signal assignments:
  if clkin'event and clkin = '1' then
    d_sw_rad <= BData_Out;     -- (Data_Out = d_sw_rad)
    BData_in <= d_sw_nid;      -- (Data_In  = d_sw_nid)
    BSOC_In  <= soc_sw_nid;    -- (SOC_In   = soc_sw_nid)
    BSOC_Out <= BSOC_In;
    BTCA_In  <= tcaff_sw_nid;  -- (TCA_In   = tcaff_sw_nid)
    BTCA_Out <= BTCA_In;
    ...
    counter <= nx_counter;     -- next state assignments
    state   <= nx_state;       -- next state assignments:

Manifest of Files in HelloTestbench.tar
http://www.arl.wustl.edu/arl/projects/fpx/workshop_0101/HelloTestbench.tar
Contains:
README.txt: general information
Makefile: build and compile programs
TESTCELL.DAT: cells written into the simulation (hex)
CELLSOUT.DAT: data written out from the simulation
Hex.txt: hex/ASCII table
fake_NID_in.vhd: utilities to save cells to a file
fake_NID_out.vhd: utility to read cells from a file
top.vhd: top-level design
helloworld.vhd: top-level helloworld design
pins.ucf: pin mapping for the RAD FPGA

TestBench configuration TESTCELL.DAT top NID_Out HelloWorld soc Data tcaff Clk Reset NID_In soc Data tcaff CELLSOUT.DAT

Post-Synthesis Signal Timing Start_of_cell (SOC): Buffered across Edge flops data_in : VCI=5, Payload=“HELLOEEO…” data_out : “HELLO WORLD.”

Higher-Level Application Wrappers

The wrapper concept

AAL5 Encapsulation. The payload is packed into cells, and padding may be added. A 64-bit trailer sits at the end of the last cell; the trailer contains a CRC-32. The last-cell indication is the final bit of the PTI field.

HelloBob module HelloBob/MODULES/HelloBob/vhdl/module.vhdl

Applications : IP Lookup Algorithm

Fast IP Lookup Algorithm. Function: search for the best-matching prefix using a trie algorithm. Contributors: Will Eatherton, Zubin Dittia, Jon Turner, David Taylor, David Wilke. (Slide shows an example table of prefixes — *, 01*, 10*, 110*, 0001*, 1011*, 00110*, 01011* — with their next hops.)

Hardware Implementation in the FPX. Within the RAD FPGA: VCIs are remapped for IP packets, IP headers are extracted, and the IP lookup engine issues request/grant accesses through the SRAM1 interface; an on-chip cell store, packet reassembler, and control cell processor complete the datapath, with SRAM2 also attached. The NID FPGA connects the RAD to the line card (LC) and switch (SW).

Fast IP Lookup (FIPL) Application. In software, the FIPL memory manager accepts commands such as "Route add 141.142.5.0/24 8" and "Route delete 141.142.0.0/16" and updates the lookup structures in external SRAM via control cells. In hardware, the FIPL fast IP lookup engine in the FPGA performs Lookup(X.Y.Z.W) and returns the next hop.

Conclusions

Conclusions (1). Reprogrammable hardware enables fine-grain, concurrent processing of networking functions and provides a software-upgradable "sea of functions". The networking module contains a well-defined interface for implementing network functions in hardware — data, SRAM, SDRAM, and module interfaces surrounding the module logic — and includes SRAM and SDRAM for table storage and queuing.

Conclusions (2). The Field Programmable Port Extender (FPX) provides network-accessible hardware via the Reprogrammable Application Device. Deployed modules implement fast processing on data flows, and the network allows arbitrary topologies of distributed systems. Project website: http://www.arl.wustl.edu/arl/projects/fpx/

FPX Workshop Agenda: Times and Location. Thursday, Jan 11, 2001 — 8am: breakfast, 5th floor Jolley Atrium; 9am-noon: Session I, Sever 201 Lab; lunch; 1pm-5pm: Session II. Friday, Jan 12, 2001 — 8am: breakfast, 5th floor Jolley Atrium; 9am-noon: Session III, Sever 201 Lab; lunch; 1pm-5pm: Session IV. On-line agenda: http://www.arl.wustl.edu/arl/projects/fpx/workshop_0101/agenda.html

End of Presentation

Implementing DHP Modules in Virtex1000E. Virtex 1000E logic resources: globally accessible IOBs; a 64 x 96 CLB array with 4 flops/LUTs per CLB; 96 Block SelectRAMs of 4096 bits each, in 6 columns of 16 blocks with 6 columns of dedicated interconnect. DHP modules occupy a 64 x 12 CLB array (768 CLBs, 3072 flops); double DHP modules occupy a 64 x 24 CLB array (1536 CLBs, 6144 flops); each module gets 16 BRAMs (8 KB). The chip supports 3 DHP modules per path (ingress and egress), with 1 SRAM interface and 1 SDRAM interface per path. (Floorplan: IOB ring, VersaRing, CLB columns, and BRAM columns with BRAM interconnect.)

FPGA: Design Flow. Application groups develop a RAD module in VHDL and verify it with logical simulation. The design is then synthesized into LUT functions (VHDL to EDIF, via Spectrum), placed and routed into the CLB array by the Xilinx backend (EDIF to BIT), and the circuit timing is verified against the 100 MHz target. Finally, the Xilinx bitfile is downloaded to the FPX FPGA.

Hello, World – Silicon Layout View


Results: Performance. Operating frequency: 119 MHz (8.4 ns critical path), well within the 10 ns period of the RAD's clock; targeted to the RAD's V1000E-FG680-7. Maximum packet processing rate: 7.1 million packets per second — (100 MHz)/(14 clocks/cell) — and the circuit handles back-to-back packets. Slice utilization: 0.4% (49/12,288 slices), less than half of one percent of chip resources. The search technique can be adapted for other types of data matching and modification: regular expressions, parsing image content, and more.

Analysis of Pipelined FIPL Operations. Each lookup step proceeds through the stages: generate address; latch the address into the SRAM; SRAM read (D <= M[A]); latch data into the FPGA; compute. Throughput is optimized by interleaving memory accesses across parallel lookup units on the FPGA: operating 5 parallel lookups gives t_pipelined_lookup = 550 ns / 5 = 110 ns, for a throughput of 9.1 million packets per second.

Hello, World Entity RAD NID