NetFPGA Project: 4-Port Layer 2/3 Switch Ankur Singla Gene Juknevicius

NetFPGA Project: 4-Port Layer 2/3 Switch Ankur Singla (asingla@stanford.edu) Gene Juknevicius (genej@stanford.edu)

Agenda
- NetFPGA Development Board
- Project Introduction
- Design Analysis
  - Bandwidth Analysis
  - Top Level Architecture
  - Data Path Design Overview
  - Control Path Design Overview
  - Verification and Synthesis Update
- Conclusion

NetFPGA Development Board

Project Introduction
- 4 Port Layer-2/3 Output Queued Switch Design
- Ethernet (Layer-2), IPv4, ICMP, and ARP
- Programmable Routing Tables – Longest Prefix Match, Exact Match
- Register support for Switch Fwd On/Off, Statistics, Queue Status, etc.
- Layer-2 Broadcast, and limited Layer-3 Multicast support
- Limited support for Access Control
- Highly Modular Design for future expandability

Bandwidth Analysis
- Available Data Bandwidth
  - Memory bandwidth: 32 bits * 25 MHz = 800 Mbits/sec
  - CFPGA to Ingress FIFO/Control Block bandwidth: 32 bits * 25 MHz / 4 = 200 Mbits/sec
  - Packet Queue to Egress bandwidth: 32 bits * 25 MHz / 4 = 200 Mbits/sec
- Packet Processing Requirements
  - 4 ports operating at 10 Mbits/sec => 40 Mbits/sec
  - Minimum size packet 64 Byte => 512 bits
  - 512 bits / 40 Mbits/sec = 12.8 us
  - Internal clock is 25 MHz
  - 12.8 us * 25 MHz = 320 clocks to process one packet
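The arithmetic on this slide can be re-derived in a few lines. This is only a sketch that restates the slide's figures; the constant names are ours, and the divisor of 4 is taken from the slide as given, since the slide does not say what it represents.

```python
# Re-derive the bandwidth and cycle budget quoted on the slide.
CLOCK_HZ = 25_000_000        # internal clock: 25 MHz
BUS_WIDTH_BITS = 32          # data path width
NUM_PORTS = 4
PORT_RATE_BPS = 10_000_000   # each port runs at 10 Mbits/sec
MIN_PACKET_BITS = 64 * 8     # minimum-size packet: 64 bytes
SHARE_DIVISOR = 4            # the "/ 4" on the slide (meaning not stated there)

memory_bw_bps = BUS_WIDTH_BITS * CLOCK_HZ              # 800 Mbits/sec
shared_link_bw_bps = memory_bw_bps // SHARE_DIVISOR    # 200 Mbits/sec

aggregate_bps = NUM_PORTS * PORT_RATE_BPS              # 40 Mbits/sec
# Clocks available per minimum-size packet at the aggregate line rate.
cycles_per_packet = MIN_PACKET_BITS * CLOCK_HZ // aggregate_bps
# Equivalent packet rate the engine has to sustain.
packets_per_second = aggregate_bps // MIN_PACKET_BITS

print(memory_bw_bps, shared_link_bw_bps, cycles_per_packet, packets_per_second)
```

The last figure, 78,125 packets per second, matches the 78Kpps requirement quoted for the Packet Processing Engine.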

Top Level Architecture

Data Flow Diagram
- Output Queued Shared Memory Switch
- Round Robin Scheduling
- Packet Processing Engine provides L2/L3 functionality
- Coarse Pipelined Arch. at the Block Level

Master Arbiter
- Round Robin Scheduling of service to Each Input and Output
- Interfaces Rest of the Design with Control FPGA
- Co-ordinates activities of all high level blocks
- Maintains Queue Status for each Output
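Round robin scheduling across four ports can be sketched in software as below. This is an illustrative model, not the project's RTL; the class name, the `grant` method and the boolean request vector are assumptions made for the example.

```python
class RoundRobinArbiter:
    """Software model of a round-robin arbiter (hypothetical interface)."""

    def __init__(self, num_ports=4):
        self.num_ports = num_ports
        # Start so that port 0 is the first one considered.
        self.last_granted = num_ports - 1

    def grant(self, requests):
        """requests[i] is True if port i wants service.

        Returns the granted port index, or None if nobody is requesting.
        The search starts just after the most recently granted port, so a
        busy port cannot starve the others.
        """
        for offset in range(1, self.num_ports + 1):
            port = (self.last_granted + offset) % self.num_ports
            if requests[port]:
                self.last_granted = port
                return port
        return None


arbiter = RoundRobinArbiter(4)
# Ports 1 and 3 keep requesting; grants alternate between them.
print([arbiter.grant([False, True, False, True]) for _ in range(4)])
```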

Ingress FIFO Control Block
- Interfaces three blocks
  - Control FPGA
  - Forwarding Engine
  - Packet Buffer Controller
- Dual Packet Memories for coarse pipelining
- Responsible for Packet Replication for Broadcast

Packet Processing Engine Overview
- Goals
  - Features – L3/L2/ICMP/ARP Processing
  - Performance Requirements – 78Kpps
  - Fit within 60% of Single User FPGA
  - Block Modularity / Scalability
  - Verification / Design Ease
- Actual
  - Support for all required features + L2 broadcast, L3 multicast, LPM, Statistics and Policing (coarse access control)
  - Performance Achieved – 234Kpps (worst case 69Kpps for ICMP echo requests of 1500 bytes)
  - Requires only 12% of Single UFPGA resources
  - Highly Modular Design for design/verification/scalability ease

Pkt Processing Engine Block Diagram
[Block diagram. Labels: From CFPGA, Packet Memory0, Packet Memory1, Forwarding Master State Machine, First Level Parsing, ARP Processing, L3 Processing, ICMP Processing, L2 Processing, Native Packet, Statistics and Policing, To Packet Buffer]

Forwarding Master State Machine
- Responsible for controlling individual processing blocks
- Request/Grant Scheme for future expandability
- Initiates a Request for Packet to Ingress FIFO and then assigns to responsible agents based on packet contents
- Replication of MSM to provide more throughput
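Assigning a packet to an agent based on its contents might look like the sketch below. The slides do not give the exact parsing rules, so the function name, the agent labels and the assumption of untagged Ethernet II frames with a 20-byte IPv4 header are ours. The ICMP condition (destination is one of the router's addresses and the protocol is ICMP) and the fallback to L2 follow the L3 and L2 engine slides.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_ICMP = 1

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_PROTOCOL_OFFSET = 9
IPV4_DEST_OFFSET = 16


def classify(frame: bytes, router_ips: set) -> str:
    """Return which agent should handle the frame: ARP, ICMP, L3 or L2.

    router_ips is a set of 4-byte addresses owned by the router
    (hypothetical representation chosen for this example).
    """
    if len(frame) < ETH_HEADER_LEN:
        return "L2"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= ETH_HEADER_LEN + 20:
        ip = frame[ETH_HEADER_LEN:]
        protocol = ip[IPV4_PROTOCOL_OFFSET]
        dest_ip = ip[IPV4_DEST_OFFSET:IPV4_DEST_OFFSET + 4]
        if protocol == IPPROTO_ICMP and dest_ip in router_ips:
            return "ICMP"
        return "L3"
    # Anything not recognised falls back to plain L2 switching.
    return "L2"
```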

L3 Processing Engine
- Parsing of the L3 Information: Src/Dest Addr, Protocol Type, Checksum, Length, TTL
- Longest Prefix Match Engine
  - Mask Bits to represent the prefix. Lookup Key is Dest Addr
  - Associated Info Table (AIT) Indexed using the entry hit
  - AIT provides Destination Port Map, Destination L2 Addr, Statistics Bucket Index
  - Request/Done scheme to allow for expandability (e.g. future m-way Trie implementation project)
- ICMP Support Engine Request (if Dest Addr is Router's IP Address + Protocol Type is ICMP)
- Total 85 cycles for Packet Processing with 80% of the cycles spent on Table Lookup
  - If using 4-way trie, total processing time can be reduced to less than 30 cycles.
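A longest prefix match with per-entry mask bits, followed by an AIT lookup indexed by the entry that hit, can be modelled as below. This is a sketch under our own assumptions: the linear scan, the tuple layout and the field names are illustrative and are not taken from the project's design files.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class AssociatedInfo(NamedTuple):
    """One AIT row (field names are hypothetical)."""
    port_map: int        # bit i set => send out of port i
    dest_l2_addr: str    # next-hop MAC address
    stats_bucket: int


def make_entry(cidr: str) -> Tuple[int, int]:
    """Turn 'a.b.c.d/len' into a (value, mask) pair of 32-bit integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.netmask)


def lpm_lookup(table: List[Tuple[int, int]], dest_addr: str) -> Optional[int]:
    """Return the index of the longest matching entry, or None on a miss.

    For contiguous prefix masks, a numerically larger mask means a
    longer prefix, so comparing masks picks the most specific match.
    """
    key = int(ipaddress.ip_address(dest_addr))
    best_index, best_mask = None, -1
    for index, (value, mask) in enumerate(table):
        if key & mask == value and mask > best_mask:
            best_index, best_mask = index, mask
    return best_index


table = [make_entry("10.0.0.0/8"), make_entry("10.1.0.0/16")]
ait = [
    AssociatedInfo(0b0001, "00:00:00:00:00:01", 0),
    AssociatedInfo(0b0010, "00:00:00:00:00:02", 1),
]
hit = lpm_lookup(table, "10.1.2.3")
print(hit, ait[hit] if hit is not None else None)
```

A scan like this costs time proportional to the table size, which is consistent with the slide's remark that most cycles go to table lookup and that a trie could shorten it.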

L2 Processing Engine
- If there are any processing problems with ARP, ICMP, and/or L3, then L2 switching is done
- Exact Match Engine
  - Re-use of the LPM match engine but with Mask Bits set to all 1's.
  - Associated Info Table (AIT) Indexed using the entry hit
  - AIT provides Destination Port Map, and Statistics Bucket Index
  - Request/Done scheme to allow for expandability (e.g. future Hash implementation project)
- Learning Engine removed because of Switch/Router Hardware Verification problems (HP Switch bug)
- Total 76 cycles for Packet Processing with over 80% of the cycles spent on Table Lookup
  - If using Hashing Function, total processing time can be reduced to less than 20 cycles.
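The difference between an exact match done by scanning masked entries and one done through a hash can be illustrated as follows. This is a rough software analogy only: the probe counts are not the hardware's cycle counts, a Python dict stands in for whatever hash scheme a future project might use, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MAC_MASK_ALL_ONES = (1 << 48) - 1   # exact match: every mask bit set


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def exact_match_scan(table: List[Tuple[int, int]],
                     dest_mac: str) -> Tuple[Optional[int], int]:
    """Scan (value, mask) entries in order.

    Returns (entry index or None, number of entries examined).
    """
    key = mac_to_int(dest_mac)
    for probes, (value, mask) in enumerate(table, start=1):
        if key & mask == value:
            return probes - 1, probes
    return None, len(table)


def build_hash_index(table: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each stored MAC value straight to its entry index."""
    return {value: index for index, (value, _mask) in enumerate(table)}


macs = ["00:11:22:33:44:01", "00:11:22:33:44:02", "00:11:22:33:44:03"]
table = [(mac_to_int(m), MAC_MASK_ALL_ONES) for m in macs]
hash_index = build_hash_index(table)

print(exact_match_scan(table, macs[2]))        # scan examines every entry before it
print(hash_index.get(mac_to_int(macs[2])))     # hash goes to the entry directly
```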

Packet Buffer Interface
- Interfaces with Master Arbiter and Forwarding Engine
- Output Queued Switch
  - Statically Assigned Single Queue per port
- Off-chip ZBT SRAM on NetFPGA board

Control Block
- Typical Register Rd/Wr Functionality
  - Status Register
  - Control Register (forwarding disable, reset)
  - Router's IP Addresses (port 1-4)
  - Queue Size Registers
  - Statistics Registers
  - Layer-2 Table Programming Registers
  - Layer-3 Table Programming Registers

Verification
- Three Levels of Verification Performed
- Simulations:
  - Module Level – to verify the module design intent and bus functional model
  - System Level – using the NetFPGA verification environment for packet level simulations
- Hardware Verification
  - Ported System Level tests to create tcpdump files for NetFPGA traffic server
  - Very good success on Hardware with all System Level tests passing.
  - Only one modification required (reset generation) after Hardware Porting
- Demo - Greg can provide lab access to anyone interested

Synthesis Overview
- Design was ported to Altera EP20K400 Device
- Logic Elements Utilized – 5833 (35% of Total LEs)
- RAM ESBs Used – 46848 (21% of Total ESBs)
- Max Design Clock Frequency ~ 31MHz
- No Timing Violations

Design Block Name             Flip-flops (Actual)  Ram bits (Actual)  Gates (Actual)
Main Arbiter                  71                   0                  1500
Memory Controller             109                  0                  2000
Control Block                 608                  0                  5000
Ingress FIFO Controller       60                   64000              1200
Switching and Routing Engine  925                  14000              14000
Total                         1773                 78000              23700

Conclusion
- Easy to achieve “required” performance in an OQ Shared Memory Switch in NetFPGA
- Modularity of the design allows more interesting and challenging future projects
- Design/Verification Environment was essential to meet schedule
- NetFPGA is an excellent design exploration platform
