1 Field-programmable Port Extender (FPX) January 2001 Workshop
John Lockwood Washington University Applied Research Lab Supported by: NSF ANI and Xilinx Corp. Abstract By incorporating Field Programmable Gate Arrays (FPGAs) at the edge of a network switch, it is possible to implement hardware-based modules that have both the flexibility to be reconfigured and the performance to process packets at the full rate of the Internet backbone. The Field Programmable Port Extender (FPX) has been built to augment an existing Gigabit switch system with reprogrammable hardware modules. The FPX system includes a high-speed network interface, multiple banks of memory, and FPGA logic. The FPX allows hardware modules to be dynamically installed into the router by using partial reprogramming of an FPGA. Applications have been developed for the FPX that include IP packet routing, queuing, and data compression.

2 Technologies for Implementing Networks
Microprocessors: fully reprogrammable; silicon resources dictated by the CPU vendor; mostly sequential processing.
Custom Hardware: highly concurrent processing; silicon resources optimized for the application; static functionality.
Reprogrammable Hardware: highly concurrent processing; silicon resources optimized for the application; fully reprogrammable.

3 Integrating FPGAs into an Internet Router
The FPX processes packets at the edge of a switch, between the fiber-optic line card and the backplane switch fabric. The FPX sees flows in both the ingress direction (input to the switch) and the egress direction (exit from the switch). The throughput of the packet processing system must be the same or greater than the maximum link speed of the fiber-optic card; the current FPX handles link rates up to OC-48. FPX modules are distributed across each port of a switch: IP packets (over ATM) enter and depart the line card, and packet fragments are processed by modules.

4 Hardware Device The FPX is implemented on a 12-layer circuit board

5

6 Architecture of the FPX
RAD: large Xilinx FPGA; attaches to SRAM and SDRAM; reprogrammable over the network; provides two user-defined module interfaces. NID: provides Utopia interfaces between the switch and line card; forwards cells to the RAD; programs the RAD. The FPX is implemented with two FPGAs: the RAD and the NID. The Reprogrammable Application Device (RAD) contains the user-defined modules; external SRAM and SDRAM connect to the RAD, and the RAD contains the multiple modules of application-specific functionality. The NID contains the logic to interface with the switch and line card, route traffic flows to the RAD, and reprogram the modules on the RAD.

7 Infrastructure Services

8 Routing Traffic Flows Between Modules
Traffic flows are routed among the Switch, Line Card, RAD.Switch, and RAD.Linecard ports on a per-VC basis. NID functions: check packets for errors; process commands (control, status, & reprogramming); implement per-flow forwarding. The NID provides a non-blocking, multi-port switch that forwards data to the appropriate module as flows enter the system from the Switch and LineCard, with error checking (EC) and a control cell processor (ccp).
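Conceptually, the NID's per-flow forwarding reduces to a small VCI-indexed table naming the destination port for each flow. The sketch below shows only that concept; the 8-bit table index, the 2-bit port encoding, and the write port driven by the control cell processor are hypothetical.

-- Hypothetical per-flow forwarding table: a VCI-indexed lookup that names
-- the NID output port for each flow.  Encodings and widths are assumptions.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity nid_flow_table is
  port (
    clk      : in  std_logic;
    -- lookup: which port should this VCI be forwarded to?
    vci      : in  std_logic_vector(7 downto 0);   -- table depth is an assumption
    out_port : out std_logic_vector(1 downto 0);   -- 00=Switch, 01=LineCard,
                                                   -- 10=RAD.Switch, 11=RAD.Linecard
    -- update path driven by the control cell processor (ccp)
    wr_en    : in  std_logic;
    wr_vci   : in  std_logic_vector(7 downto 0);
    wr_port  : in  std_logic_vector(1 downto 0)
  );
end entity;

architecture rtl of nid_flow_table is
  type fwd_table_t is array (0 to 255) of std_logic_vector(1 downto 0);
  signal fwd_table : fwd_table_t := (others => "00");  -- default: bypass to Switch
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if wr_en = '1' then
        fwd_table(to_integer(unsigned(wr_vci))) <= wr_port;
      end if;
      out_port <= fwd_table(to_integer(unsigned(vci)));
    end if;
  end process;
end architecture rtl;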

9 Typical Flow Configurations
The slide shows six configurations of the VC paths among the Switch, LineCard, EC, ccp, and RAD ports: Default Flow Action (Bypass); Egress Processing (Per-flow Output Queueing); Ingress Processing (IP Routing); Full RAD Processing (Packet Routing and Reassembly); Full Loopback Testing (System Test); and Partial Loopback Testing (Egress Flow Processing Test). Several configurations of flow routing are possible. By default, the FPX forwards all flows directly between the ingress and egress ports. To process packets only on the egress path, flows are routed from the switch to a RAD module, then from the RAD module to the line card. To process packets on the ingress path, flows are routed from the line card to a RAD module, then from the RAD module to the switch. Using both modules, packets can be processed on both the ingress and egress paths. Modules can also be chained, allowing packets to be processed by multiple modules as they pass through the FPX.

10 Reprogramming Logic NID programs at boot from EPROM
The Switch Controller writes the RAD configuration memory on the NID; the bitfile for the RAD is transmitted over the network via control cells. The Switch Controller then issues a {Full/Partial} reconfigure command, and the NID reads the RAD configuration memory to program the RAD, performing a complete or partial reprogramming of the RAD.

11 Software Services for Controlling the FPX
Fip Memory. Methods of communication: fpx_control, Telnet, Web Interface / CGI, basic_send, and user applications. Software plug-ins: concepts, functionality emulation. The control diagram shows the NID_listener, RAD_listener, and Remote Manager applications (basic Telnet read, WEB send, Fip access via CGI, basic send) in front of the fpx_control software controller, which reaches FPX ports 0.0 through 7.1 of the Washington University Gigabit Switch over an OC-3 link (up to 32 VCIs), using VCI 76 (NID) and VCI 100 (RAD), and VCI 115 (NID) and VCI 123 (RAD).

12 Pictorial view of fpx_control interfaced with hardware
FPX ports are addressed as {0-7}.{0/1}.

13 Combination Router Hardware and Software
Implement link-speed operations in hardware. Implement higher-level functions in software. Migrate functionality on the critical path.

14

15 FPX Hardware

16

17 FPX SRAM Provides low latency for fast table lookups
Zero Bus Turnaround (ZBT) allows back-to-back read/write operations every 10 ns. Dual, independent memories with a 36-bit wide bus. The SRAM memories are pipelined so that they can be fully utilized; with Zero Bus Turnaround (ZBT) memory, a memory access has four cycles of latency. The SRAM is well suited for fast memory lookups.
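To make the pipelining concrete, here is a minimal sketch that streams back-to-back reads: a new address is presented every 10 ns cycle, and a four-deep shift register marks when the returned word (four cycles later, per the latency above) is valid on SRAM_D_IN. The address source, the read polarity of SRAM_RW, and the omission of the request/grant handshake are simplifying assumptions.

-- Illustrative only: back-to-back SRAM reads with an assumed four-cycle
-- read latency.  A new address is issued every cycle; a 4-deep valid
-- shift register tracks when sram_d_in carries each result.
library ieee;
use ieee.std_logic_1164.all;

entity sram_read_pipe is
  port (
    clk        : in  std_logic;
    reset_l    : in  std_logic;
    rd_addr    : in  std_logic_vector(17 downto 0);  -- address to look up
    rd_valid   : in  std_logic;                      -- issue a read this cycle
    sram_addr  : out std_logic_vector(17 downto 0);
    sram_rw    : out std_logic;                      -- '1' = read (assumed)
    sram_d_in  : in  std_logic_vector(35 downto 0);
    rd_data    : out std_logic_vector(35 downto 0);
    rd_data_en : out std_logic
  );
end entity;

architecture rtl of sram_read_pipe is
  signal inflight : std_logic_vector(3 downto 0) := (others => '0');
begin
  sram_addr <= rd_addr;
  sram_rw   <= '1';

  process (clk)
  begin
    if rising_edge(clk) then
      if reset_l = '0' then
        inflight   <= (others => '0');
        rd_data_en <= '0';
      else
        -- track reads issued in the last four cycles
        inflight   <= inflight(2 downto 0) & rd_valid;
        -- data for the read issued four cycles ago is on sram_d_in now
        rd_data    <= sram_d_in;
        rd_data_en <= inflight(3);
      end if;
    end if;
  end process;
end architecture rtl;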

18 FPX SDRAM Dual, independent SDRAM memories 64-bit wide, 100 MHz
64 MByte per module: 128 MByte total [expandable]. Burst-based transactions [1-8 word transfers]. Latency of 14 cycles to read or write an 8-word burst. SDRAM provides cost-effective storage of data; a 64-bit wide, pipelined module allows full-throughput queuing of traffic to and from the RAD.
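A rough back-of-the-envelope check of these figures (an illustration based on the numbers above, not a measured result):

  8 words x 64 bits               = 512 bits per burst
  14 cycles at 100 MHz            = 140 ns per burst
  512 bits / 140 ns               ~ 3.7 Gb/s per SDRAM module
  two independent modules         ~ 7.3 Gb/s aggregate
  OC-48 queuing (write + read)    ~ 2 x 2.5 Gb/s = 5 Gb/s, within the aggregate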

19 Hardware Device The FPX is implemented on a 12-layer circuit board

20 Development of FPX Applications

21 FPX Interfaces
Provides: a well-defined interface (Utopia-like 32-bit fast data interface; flow control allows back-pressure); flow routing (arbitrary permutations of packet flows through ports); dynamic reprogrammability (other modules continue to operate even while a new module is being reprogrammed); and memory access (shared access to SRAM and SDRAM via a request/grant protocol).

22 Network Module Interface
Module control interface: CLK, RESET_L, ENABLE_L, READY_L.
Data interface: D_MOD_IN[31:0], SOC_MOD_IN, TCA_MOD_IN, D_MOD_OUT[31:0], SOC_MOD_OUT, TCA_MOD_OUT.
SRAM interface: SRAM_REQ, SRAM_GR, SRAM_ADDR[17:0], SRAM_RW, SRAM_D_IN[35:0], SRAM_D_OUT[35:0].
SDRAM interface: SDRAM_REQ, SDRAM_GR, SDRAM_DATA[63:0].
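Gathered into a VHDL entity, the interface above might be declared roughly as follows. The port directions and active-low polarities are assumptions, and the SDRAM data bus is split into in/out halves here purely for illustration (on the RAD it is a shared bus).

-- Hypothetical port declaration for a RAD module, assembled from the
-- signal list above.  Directions and polarities of the control signals
-- are assumptions.
library ieee;
use ieee.std_logic_1164.all;

entity fpx_module is
  port (
    -- Module control interface
    clk            : in  std_logic;
    reset_l        : in  std_logic;
    enable_l       : in  std_logic;
    ready_l        : out std_logic;
    -- Utopia-like data interface
    d_mod_in       : in  std_logic_vector(31 downto 0);
    soc_mod_in     : in  std_logic;
    tca_mod_in     : in  std_logic;   -- back-pressure from downstream
    d_mod_out      : out std_logic_vector(31 downto 0);
    soc_mod_out    : out std_logic;
    tca_mod_out    : out std_logic;   -- back-pressure to upstream
    -- SRAM interface (request/grant)
    sram_req       : out std_logic;
    sram_gr        : in  std_logic;
    sram_addr      : out std_logic_vector(17 downto 0);
    sram_rw        : out std_logic;
    sram_d_out     : out std_logic_vector(35 downto 0);
    sram_d_in      : in  std_logic_vector(35 downto 0);
    -- SDRAM interface (request/grant, shared data bus)
    sdram_req      : out std_logic;
    sdram_gr       : in  std_logic;
    sdram_data_in  : in  std_logic_vector(63 downto 0);
    sdram_data_out : out std_logic_vector(63 downto 0)
  );
end entity fpx_module;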

23 Reprogrammable Application Device (RAD)
Spatial Re-use of FPGA Resources Modules implemented using FPGA logic Module logic can be individually reprogrammed Shared Access to off-chip resources Memory Interfaces to SRAM and SDRAM Common Datapath to send and receive data

24 Combining Modules within the Chip
Modules fit together at static I/O interfaces. Partial reprogramming of the FPGA is used to install and remove modules; modules can be added and removed while other modules continue to process packets. Statically-configured ‘Long Lines’ provide chip-wide routing, supporting intrachip module switching, and modules can be chained. The figure shows FPX modules within the FPGA alongside their SRAM and SDRAM connections and the module loading/unloading path.

25 SDRAM Controller Interface
Implements burst reads and writes to SDRAM. Provides refresh signals to SDRAM. Asserts RAS/CAS signals for addressing. Provides a standard interface to the application. SDRAM interface signals: SDRAM_DATA, SDRAM_BL, SDRAM_ADDR, SDRAM_EN, SDRAM_RD_WR. Access-bus arbitration signals: SDRAM_RQ(1 to n), SDRAM_GR(1 to n), OP_FIN(1 to n).
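A minimal sketch of how a module might drive the request/grant handshake for one burst write, assuming the controller raises SDRAM_GR when the bus is granted and that OP_FIN marks the end of the burst; the address width, the write polarity of SDRAM_RD_WR, and the exact handshake timing are assumptions.

-- Illustrative requester: ask for the access bus, then hold the burst
-- state until the controller reports the operation finished.
library ieee;
use ieee.std_logic_1164.all;

entity sdram_burst_writer is
  port (
    clk, reset_l : in  std_logic;
    start        : in  std_logic;                      -- request one 8-word burst
    wr_addr      : in  std_logic_vector(21 downto 0);  -- width is an assumption
    wr_word      : in  std_logic_vector(63 downto 0);  -- next word of the burst
    sdram_rq     : out std_logic;
    sdram_gr     : in  std_logic;
    op_fin       : in  std_logic;
    sdram_addr   : out std_logic_vector(21 downto 0);
    sdram_rd_wr  : out std_logic;                      -- '0' = write (assumed)
    sdram_data   : out std_logic_vector(63 downto 0)
  );
end entity;

architecture rtl of sdram_burst_writer is
  type state_t is (idle, request, burst);
  signal state : state_t := idle;
begin
  sdram_rq    <= '1' when state = request else '0';
  sdram_addr  <= wr_addr;
  sdram_rd_wr <= '0';
  sdram_data  <= wr_word;

  process (clk)
  begin
    if rising_edge(clk) then
      if reset_l = '0' then
        state <= idle;
      else
        case state is
          when idle    => if start = '1'    then state <= request; end if;
          when request => if sdram_gr = '1' then state <= burst;   end if;
          when burst   => if op_fin = '1'   then state <= idle;    end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;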

26 On-Chip sharing of SDRAM
Implements on-chip and off-chip tri-state buses; shared wire resources are used on-chip. Arbitrates among multiple modules, allowing multiple modules to share one SDRAM. Each Module(i) connects to the SDRAM Interface over the SDRAM access bus using SDRAM_RQ(i), SDRAM_GR(i), and OP_FIN(i).
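The arbitration itself can be as simple as the fixed-priority sketch below. This is an illustration of the request/grant idea for two modules, not the FPX controller's actual arbiter; a round-robin policy would avoid starving the lower-priority module.

-- Illustrative fixed-priority arbiter for two requesters sharing one SDRAM.
library ieee;
use ieee.std_logic_1164.all;

entity sdram_arbiter is
  port (
    clk, reset_l : in  std_logic;
    sdram_rq     : in  std_logic_vector(1 to 2);  -- requests from two modules
    op_fin       : in  std_logic_vector(1 to 2);  -- burst-complete from each module
    sdram_gr     : out std_logic_vector(1 to 2)
  );
end entity;

architecture rtl of sdram_arbiter is
  signal grant : std_logic_vector(1 to 2) := (others => '0');
begin
  sdram_gr <= grant;

  process (clk)
  begin
    if rising_edge(clk) then
      if reset_l = '0' then
        grant <= (others => '0');
      elsif grant = "00" then
        -- bus idle: module 1 has fixed priority over module 2
        if    sdram_rq(1) = '1' then grant <= "10";
        elsif sdram_rq(2) = '1' then grant <= "01";
        end if;
      -- hold the grant until the owning module finishes its burst
      elsif (grant(1) = '1' and op_fin(1) = '1') or
            (grant(2) = '1' and op_fin(2) = '1') then
        grant <= (others => '0');
      end if;
    end if;
  end process;
end architecture rtl;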

27 Applications for the FPX

28 Pattern Matching Use Hardware to detect a pattern in data
Modify packet based on match Pipeline operation to maximize throughput
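As a toy illustration of the match-and-modify idea (not the module distributed with the FPX), the sketch below compares each 32-bit word of the cell stream against a fixed pattern and substitutes a replacement word one pipeline stage later. The pattern, the replacement value, and the single-word match granularity are all assumptions.

-- Illustrative match-and-modify stage: one pipeline register on the data path.
library ieee;
use ieee.std_logic_1164.all;

entity match_and_modify is
  port (
    clk      : in  std_logic;
    soc_in   : in  std_logic;
    data_in  : in  std_logic_vector(31 downto 0);
    soc_out  : out std_logic;
    data_out : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of match_and_modify is
  -- Hypothetical pattern and replacement: "HELL" -> "WORL" in ASCII.
  constant PATTERN : std_logic_vector(31 downto 0) := x"48454C4C";
  constant REPLACE : std_logic_vector(31 downto 0) := x"574F524C";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- one-cycle pipeline stage: compare and (optionally) rewrite the word
      if data_in = PATTERN then
        data_out <= REPLACE;
      else
        data_out <= data_in;
      end if;
      soc_out <= soc_in;   -- keep start-of-cell aligned with the data
    end if;
  end process;
end architecture rtl;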

29 “Hello, World” Module Function

30 Logical Implementation
On a VCI match, append “WORLD” to the payload to form the new cell.

31 Source: Concurrent VHDL Statements
BData_Out_process: process (clkin)
begin
  -- buffer signal assignments:
  if clkin'event and clkin = '1' then
    d_sw_rad <= BData_Out;      -- (Data_Out = d_sw_rad)
    BData_in <= d_sw_nid;       -- (Data_In  = d_sw_nid)
    BSOC_In  <= soc_sw_nid;     -- (SOC_In   = soc_sw_nid)
    BSOC_Out <= BSOC_In;
    BTCA_In  <= tcaff_sw_nid;   -- (TCA_In   = tcaff_sw_nid)
    BTCA_Out <= BTCA_In;
    ...
    counter  <= nx_counter;     -- next state assignments
    state    <= nx_state;       -- next state assignments

32 Manifest of Files in HelloTestbench.tar
Contains:
README.txt: General information
Makefile: Build and compile programs
TESTCELL.DAT: Cells written into simulation (hex)
CELLSOUT.DAT: Data written out from simulation
Hex.txt: HEX/ASCII table
fake_NID_in.vhd: Utility to save cells to file
fake_NID_out.vhd: Utility to read cells from file
top.vhd: Top-level design
helloworld.vhd: Top-level helloworld design
pins.ucf: Pin mapping for RAD FPGA

33 TestBench configuration
TESTCELL.DAT feeds the fake NID_Out, which drives soc, Data, and tcaff into the HelloWorld module inside top (clocked by Clk, with Reset); the module's soc, Data, and tcaff outputs go to the fake NID_In, which writes CELLSOUT.DAT.

34 Post-Synthesis Signal Timing
Start_of_cell (SOC): buffered across edge flops.
data_in: VCI=5, Payload=“HELLOEEO…”
data_out: “HELLO WORLD.”

35 Higher-Level Application Wrappers

36 The wrapper concept

37 AAL5 Encapsulation Payload is packed in cells Padding may be added
A 64-bit trailer is placed at the end of the last cell. The trailer contains a CRC-32. The last cell is indicated by a bit in the cell header (the last bit of the PTI field).
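A small sketch of the frame-boundary logic such a wrapper needs: test the low-order PTI bit in the cell header to find the last cell, then pull the length and CRC-32 fields out of the 64-bit trailer. The bit positions assume the header and trailer are presented as single 32-bit and 64-bit words; the actual wrapper's word layout may differ.

-- Illustrative AAL5 field extraction; word layouts are assumptions.
library ieee;
use ieee.std_logic_1164.all;

entity aal5_trailer_fields is
  port (
    -- first header word of a cell, assumed laid out as
    -- GFC(31:28) VPI(27:20) VCI(19:4) PTI(3:1) CLP(0)
    header_word : in  std_logic_vector(31 downto 0);
    -- last 64 bits of the final cell, assumed laid out as
    -- UU(63:56) CPI(55:48) Length(47:32) CRC-32(31:0)
    trailer     : in  std_logic_vector(63 downto 0);
    last_cell   : out std_logic;
    pdu_length  : out std_logic_vector(15 downto 0);
    crc32       : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of aal5_trailer_fields is
begin
  -- the low-order PTI bit marks the last cell of an AAL5 PDU
  last_cell  <= header_word(1);
  pdu_length <= trailer(47 downto 32);
  crc32      <= trailer(31 downto 0);
end architecture rtl;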

38 HelloBob module HelloBob/MODULES/HelloBob/vhdl/module.vhdl

39 Applications : IP Lookup Algorithm

40 Fast IP Lookup Algorithm
Function: search for the best matching prefix using a trie algorithm. Contributors: Will Eatherton, Zubin Dittia, Jon Turner, David Taylor, David Wilke. Example prefix table (Prefix -> Next Hop): * -> 4, 01* -> 7, 10* -> 2, 110* -> 9, 0001* -> 1, 1011* -> 5, 00110* -> 3, 01011* -> 1.
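To make the table concrete, the behavioral sketch below performs a longest ("best") matching prefix lookup over it by linear scan. This is only an illustration of the lookup semantics on a 6-bit address slice; the FIPL engine itself walks a compressed trie held in off-chip SRAM, and the prefix-to-next-hop pairing reconstructed above should be treated as illustrative.

-- Behavioral illustration of best-matching-prefix on the example table.
library ieee;
use ieee.std_logic_1164.all;

entity bmp_lookup_example is
  port (
    addr_bits : in  std_logic_vector(5 downto 0);  -- leading address bits
    next_hop  : out integer range 0 to 15
  );
end entity;

architecture behavioral of bmp_lookup_example is
  type route_t is record
    prefix : std_logic_vector(5 downto 0);  -- prefix bits, left-aligned
    plen   : natural;                       -- number of significant bits
    hop    : integer range 0 to 15;
  end record;
  type route_table_t is array (natural range <>) of route_t;
  -- pairing of prefixes and next hops reconstructed from the slide
  constant ROUTES : route_table_t := (
    ("000000", 0, 4),    -- *
    ("010000", 2, 7),    -- 01*
    ("100000", 2, 2),    -- 10*
    ("110000", 3, 9),    -- 110*
    ("000100", 4, 1),    -- 0001*
    ("101100", 4, 5),    -- 1011*
    ("001100", 5, 3),    -- 00110*
    ("010110", 5, 1) );  -- 01011*
begin
  process (addr_bits)
    variable best_len : integer;
    variable best_hop : integer range 0 to 15;
    variable match    : boolean;
  begin
    best_len := -1;
    best_hop := 0;
    for i in ROUTES'range loop
      -- a route matches when all of its significant bits equal the address
      match := true;
      for b in 0 to ROUTES(i).plen - 1 loop
        if addr_bits(5 - b) /= ROUTES(i).prefix(5 - b) then
          match := false;
        end if;
      end loop;
      -- keep the longest (most specific) matching prefix
      if match and ROUTES(i).plen > best_len then
        best_len := ROUTES(i).plen;
        best_hop := ROUTES(i).hop;
      end if;
    end loop;
    next_hop <= best_hop;
  end process;
end architecture behavioral;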

41 Hardware Implementation in the FPX
Block diagram of the RAD FPGA design, which includes: an SRAM1 interface with request/grant access; blocks to remap VCIs for IP packets and to extract IP headers; the IP Lookup Engine; a counter; an on-chip cell store; SRAM2; a packet reassembler; and a control cell processor. The NID FPGA connects the design to the line card (LC) and switch (SW).

42 Fast IP Lookup (FIPL) Application
Commands such as "Route add /24 8" and "Route delete /16" go to the FIPL Memory Manager software, which sends control cells that update the data structure in the external SRAM; the FIPL (Fast IP Lookup) hardware in the FPGA then answers Lookup(X.Y.Z.W) queries with the next hop.

43 Conclusions

44 Conclusions (1) Reprogrammable Hardware Networking Module
Enables fine-grain, concurrent processing; provides a "sea of functions"; software upgradable. The networking module contains a well-defined interface for implementing network functions in hardware (data interface, SRAM interface, SDRAM interface, and module interface around the module logic) and includes SRAM and SDRAM for table storage and queuing.

45 Conclusions (2) Field Programmable Port Extender (FPX)
Network-accessible Hardware Reprogrammable Application Device Module Deployment Modules implement fast processing on data flow Network allows Arbitrary Topologies of distributed systems Project Website

46 FPX Workshop Agenda: Times and Location
Thursday, Jan 11, 2001: 8am Breakfast (5th floor Jolley Atrium); 9am-Noon Session I (Sever 201 Lab); Lunch; 1pm-5pm Session II. Friday, Jan 12, 2001: 8am Breakfast (5th floor Jolley Atrium); 9am-Noon Session III (Sever 201 Lab); Lunch; 1pm-5pm Session IV. On-line Agenda:

47 End of Presentation

48 Implementing DHP Modules in Virtex1000E
Virtex 1000E logic resources: globally accessible IOBs; 64 x 96 CLB array; 4 flops/LUTs per CLB; 96 Block SelectRAMs (4096 bits per block, 6 columns of 16 blocks, 6 columns of dedicated interconnect). DHP Modules: 64 x 12 CLB array (768 CLBs, 3072 flops). Double DHP Modules: 64 x 24 CLB array (1536 CLBs, 6144 flops). 16 BRAMs (8 KB) per module. Layout regions: IOB ring, VersaRing, CLB columns, BRAMs, BRAM interconnect. Per path (ingress and egress): 3 DHP modules, 1 SRAM interface, 1 SDRAM interface.

49 FPGA: Design Flow Application groups develop RAD module
Application groups develop a RAD module: compile the architecture, synthesize into LUT functions, route and place into the CLB array, and verify circuit timing to 100 MHz. Design flow: VHDL design and logical simulation, synthesis (Spectrum) to EDIF, Xilinx backend place-and-route to a BIT file, timing verification, and download of the bit file to the FPX FPGA.

50 Hello, World – Silicon Layout View

51 Post-Synthesis Signal Timing
Start_of_cell (SOC): buffered across edge flops.
data_in: VCI=5, Payload=“HELLOEEO…”
data_out: “HELLO WORLD.”

52 Results: Performance Operating Frequency: 119 MHz.
8.4 ns critical path, well within the 10 ns period of the RAD's clock. Targeted to the RAD's V1000E-FG680-7. Maximum packet processing rate: 7.1 million packets per second ((100 MHz)/(14 clocks/cell)); the circuit handles back-to-back packets. Slice utilization: 0.4% (49/12,288 slices), less than one half of one percent of chip resources. The search technique can be adapted for other types of data matching and modification: regular expressions, parsing image content, …

53 Analysis of Pipelined FIPL Operations
Each lookup is pipelined through the stages: generate address; latch ADDR into the SRAM; SRAM read (D <= M[A]); latch data into the FPGA; compute. Plotting time (in cycles) against space (parallel lookup units on the FPGA) shows that throughput is optimized by interleaving memory accesses: operating 5 parallel lookups gives t_pipelined_lookup = 550 ns / 5 = 110 ns, for a throughput of 9.1 million packets per second.

54 Hello, World Entity RAD NID

