Published by Bruno Brooks. Modified over 6 years ago.
1
Building Gigabit-rate Routers with the NetFPGA: NICTA Tutorial at UNSW
Presented by: John W. Lockwood, Jad Naous, Glen Gibb (Stanford University)
Hosted by: Lavy Libman (NICTA) and Philip Allen (UNSW)
February 6, 2008, 9am-5pm
Lab 343A, Electrical Engineering Building (G17), Kensington Campus, University of New South Wales, Sydney, Australia
2
What is the NetFPGA?
PC with NetFPGA:
  Networking software running on a standard PC (CPU, memory, PCI)
  A hardware accelerator built with a Field Programmable Gate Array driving Gigabit network links (NetFPGA board: FPGA, memory, 1GE ports)
3
Introduction
Who uses the NetFPGA
  Teachers
  Students
  Researchers
How they use the NetFPGA
  To run the Router Kit
  To build modular reference designs
    IPv4 router
    4-port NIC
    Ethernet switch, …
  To create new systems
4
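Running the Router Kit
Usage #1: user-space development, 4x1GE line-rate forwarding
[Figure: user-space routing protocols (OSPF, BGP, "My Protocol") maintain the kernel routing table on the host PC (CPU, memory, PCI); the table is mirrored ("Mirror") into the IPv4 router on the NetFPGA (forwarding table, packet buffer, FPGA, memory, 1GE ports).]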
5
Building Modular Router Modules
Usage #2
Host PC (CPU, memory, PCI): NetFPGA driver, PW-OSPF, Java GUI front panel (extensible)
Verilog EDA tools (Xilinx, Mentor, etc.): Design, Simulate, Synthesize, Download
FPGA: Verilog modules interconnected by FIFO interfaces
[Figure: datapath blocks In Q Mgmt, IP Lookup, L2 Parse, L3, Out Q, plus "My Block", with memory and four 1GE ports.]
6
Creating new systems
Usage #3
Host PC (CPU, memory, PCI): NetFPGA driver
Verilog EDA tools (Xilinx, Mentor, etc.): Design, Simulate, Synthesize, Download
FPGA: "My Design" (1GE MAC is soft/replaceable)
[Figure: FPGA with memory and four 1GE ports.]
7
Tutorial Outline
Background
  Basics of an IP Router
  The NetFPGA Platform
The Stanford Base Reference Router
  Demo 1: Reference Router running on the NetFPGA
  Inside the NetFPGA hardware
  Breakneck introduction to Verilog
  Exercise 1: Build your own Reference Router
The Enhanced Reference Router
  Motivation: Understanding buffer size requirements in a router
  Demo 2: Observing and controlling the queue size
  Using NetFPGA for research and teaching
  Exercise 2: Enhancing the Reference Router
The Life of a Packet Through the NetFPGA
8
Basic Operation of an IP Router
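[Figure: example network with a forwarding table (columns: Destination, Next Hop) mapping destinations D, E, F to next-hop routers such as R3 and R5.]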
9
What does a router do?
[Figure: hosts A, B, C, D, E, F connected through routers R1-R5, with a forwarding table (Destination, Next Hop) and the layout of an IP packet.]
IPv4 header (rows of 32 bits, 20 bytes before options):
  Ver, HLen, T.Service, Total Packet Length
  Fragment ID, Flags, Fragment Offset
  TTL, Protocol, Header Checksum
  Source Address
  Destination Address
  Options (if any)
Data
10
What does a router do?
[Figure: hosts A, B, C, D, E, F connected through routers R1, R2, R3, R4, R5.]
11
Basic Components of an IP Router
Software (control plane):
  Management & CLI
  Routing Protocols
  Routing Table
Hardware (datapath, per-packet processing):
  Forwarding Table
  Switching
12
Per-packet processing in an IP Router
1. Accept packet arriving on an incoming link.
2. Look up packet destination address in the forwarding table, to identify outgoing port(s).
3. Manipulate IP header: e.g., decrement TTL, update header checksum.
4. Buffer packet in the output queue.
5. Transmit packet onto outgoing link.
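As a rough illustration of the header-manipulation step, the sketch below decrements the TTL and recomputes the header checksum of a 20-byte IPv4 header. It is a Python software model, not the NetFPGA Verilog; the function names (`ipv4_checksum`, `forward_header`, `make_header`) and the sample addresses are assumptions made for this example.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src: str, dst: str, ttl: int = 64) -> bytes:
    """Build a minimal option-free IPv4 header (hypothetical test helper)."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, ttl, 17, 0,
        ipaddress.ip_address(src).packed,
        ipaddress.ip_address(dst).packed,
    ))
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


def forward_header(header: bytes) -> bytes:
    """Decrement TTL (byte 8) and rewrite the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL expired; packet needs exception handling")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"  # checksum field is zero while summing
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

A header with a correct checksum sums to zero under `ipv4_checksum`, which gives a quick way to check the result of `forward_header`.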
13
Generic Datapath Architecture
Header Processing:
  Lookup IP Address (Forwarding Table: IP Address → Next Hop)
  Update Header
Queue Packet (Buffer Memory)
[Figure: each packet (Hdr + Data) passes through header processing and is then queued in buffer memory.]
14
CIDR and Longest Prefix Matches
The IP address space is broken into line segments.
Each line segment is described by a prefix.
A prefix is of the form x/y where x indicates the prefix of all addresses in the line segment, and y indicates the length of the prefix in bits (which fixes the length of the segment).
e.g. The prefix 128.9/16 represents the line segment containing addresses in the range: …
[Figure: number line from 0 to 2^32-1 showing the segments 65/8, 128.9/16 (length 2^16), and 142.12/19.]
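To make the "line segment" picture concrete, the following Python sketch expands a shorthand prefix such as 128.9/16 and reports the first address, last address, and number of addresses it covers. The helper names `expand_prefix` and `prefix_range` are assumptions for this example; only the standard `ipaddress` module is used.

```python
import ipaddress


def expand_prefix(short: str) -> ipaddress.IPv4Network:
    """Turn slide shorthand like '128.9/16' into a full network object."""
    addr, length = short.split("/")
    octets = addr.split(".")
    octets += ["0"] * (4 - len(octets))  # pad missing octets with zeros
    return ipaddress.ip_network(".".join(octets) + "/" + length)


def prefix_range(short: str):
    """Return (first address, last address, segment length) for a prefix."""
    net = expand_prefix(short)
    return str(net[0]), str(net[-1]), net.num_addresses


first, last, size = prefix_range("128.9/16")
print(first, last, size)
```

For a prefix of length y the segment holds 2^(32-y) addresses, so a /16 covers 2^16 of them.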
15
Classless Interdomain Routing (CIDR)
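Most specific route = “longest matching prefix”
[Figure: number line from 0 to 2^32-1 showing 128.9/16 together with more specific /20 and /24 prefixes; the prefix addresses were lost in extraction.]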
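A minimal Python sketch of longest-prefix matching by linear search is shown here. The table entries and next-hop names are hypothetical (the addresses on the original slide did not survive extraction), and `longest_prefix_match` is a name chosen for this example.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    best_net, best_hop = None, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical forwarding table: nested prefixes of increasing length.
TABLE = [
    ("128.9.0.0/16", "R1"),
    ("128.9.16.0/20", "R2"),
    ("128.9.19.0/24", "R3"),
]

print(longest_prefix_match(TABLE, "128.9.19.7"))
```

An address that falls inside several nested prefixes is routed by the longest one; an address outside every prefix returns `None`.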
16
Techniques for LPM in hardware
Linear search
Direct lookup
  Currently requires too much memory
  Updating a prefix leads to many changes
Tries
  Deterministic lookup time
  Easily pipelined
  But requires multiple memories/references
TCAM (Ternary CAM)
  Simple and widely used
  But low-density, high-power
  Gradually being replaced by new algorithms
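The trie option can be sketched in software as a one-bit-per-level binary trie. This Python model only illustrates the idea (one node visit per address bit, hence the deterministic lookup time); it is not how the NetFPGA reference router implements its lookup, and all class, function, and next-hop names here are assumptions.

```python
import ipaddress


class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.next_hop = None          # set only where a prefix ends


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for b in bits:
            i = int(b)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.next_hop = next_hop

    def lookup(self, address: str):
        """Walk at most 32 levels, remembering the last prefix seen."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node, best = self.root, self.root.next_hop
        for b in bits:
            node = node.children[int(b)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best
```

Each level of the walk is one memory reference, which matches the slide's caveat that tries need multiple memories or references per lookup.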
17
An IP Router on NetFPGA
Software (Linux user-level processes):
  Management & CLI
  Routing Protocols
  Exception Processing
  Routing Table
Hardware (Verilog on NetFPGA PCI board):
  Forwarding Table
  Switching
18
NetFPGA Router
Function
  4 Gigabit Ethernet ports
  Fully programmable FPGA hardware
  Low cost
Open-source FPGA hardware
  Verilog base design
Open-source software
  Drivers in C and C++
19
NetFPGA Platform
Major Components
Interfaces
  4 Gigabit Ethernet Ports
  PCI Host Interface
Memories
  36 Mbits Static RAM
  512 Mbits DDR2 Dynamic RAM
FPGA Resources
  Block RAMs
  Configurable Logic Blocks (CLBs)
  Memory Mapped Registers
20
NetFPGA System
User Space: CAD Tools, Monitor Software, Web & Video Server, Browser & Video Client
Linux Kernel: Packet Forwarding Table, four VI blocks
PCI: NetFPGA Router Hardware with four GE ports (nf2c0 .. 3)
PCI-e: NIC with two GE ports (eth1 .. 2)
21
NetFPGA Hardware
22
NetFPGA System Implementation
NetFPGA Blocks
  Virtex-2 Pro FPGA
  4.5MB ZBT SRAM
  64MB DDR2 DRAM
  PCI Host Interface
  4 Gigabit Ethernet ports
Intranet Test Ports
  Dual or Quad Gigabit Ethernet on PCI-e
Internet
  Gigabit Ethernet on Motherboard
Processor
  Dual-Core CPU
Operating System
  Linux CentOS 4.4
23
NetFPGA Lab Setup
[Figure: each lab PC (CPU x2) runs the Client, Server, CAD Tools, and NetFPGA Control SW, with a dual NIC on PCI-e (eth1 .. 2) and a Net-FPGA on PCI acting as Internet Router Hardware.]
Interface connections:
  Eth1: Local host
  Eth2: Server
  Nf2c0: Adjacent
  Nf2c1: Adjacent
  Nf2c2: Local Host
  Nf2c3: Adj. Server
24
NetFPGA Hardware Set for Demo #1
Server delivers streaming HD video through a chain of NetFPGA Routers
[Figure: a Video Server PC (CPU x2, NIC on PCI-e, Net-FPGA on PCI) connects over GE links through a chain of Net-FPGA Internet Router Hardware boards to a Video Display PC; CAD Tools run on the host.]
26
Topology of NetFPGA Routers
Demo 1
[Figure: topology of NetFPGA routers connecting the Video Server to the HD Display.]
27
Setup for the Reference Router
Demo 1
Each NetFPGA card has four ports
  Port 2 connected to Client / Server
  Ports 0 and 3 connected to adjacent NetFPGA cards
[Figure: Video Server and Video Client attached to a chain of NetFPGA cards.]
28
Demo 1: Logical Topology
[Figure: logical topology of the NetFPGA routers between the Video Server and the Video Client. Links are labeled with address suffixes from .1.1/.1.2 through .30.1/.30.2, and the shortest path is highlighted.]
Presenter notes: Explain the reason why we chose this topology. Explain how we will run the video. Will it be projected on the screen or will we ask users to do it themselves?
29
Working IP Router
Demo 1 Objectives
  Become familiar with the Stanford Reference Router
  Observe PW-OSPF re-routing traffic around a failure
30
Streaming Video through the NetFPGA
Demo 1
Video server
  Source files: /var/www/html/video
  Network URL :
Video client
  Windows Media Player
  Linux mplayer
Video traffic
  MPEG2 HDTV (35 Mbps)
  MPEG2 TV (9 Mbps)
  DVI (3 Mbps)
  WMF (1.7 Mbps)
31
Step 1 – Observe the Routing Tables
Demo 1
The router is already configured and running on your machines
The routing table has converged to the routing decisions with the minimum number of hops
Next, break a link …
32
Step 2 - Dynamic Re-routing
Demo 1
Break the link between video server and video client
Routers re-route traffic around the broken link and video continues playing
[Figure: the Demo 1 logical topology (links labeled .1.1/.1.2 through .30.1/.30.2) with the broken link and the new path.]
34
Integrated Circuit Technology
Full-custom Design
  Complementary Metal Oxide Semiconductor (CMOS)
Semi-custom ASIC Design
  Gate array
  Standard cell
Programmable Logic Device
  Programmable Array Logic
  Field Programmable Gate Arrays
Processors
35
Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
  Also called Function Generators (FGs)
  Capacity is limited only by number of inputs, not complexity
  Delay through the LUT is constant
[Figure: combinatorial logic with inputs A, B, C, D and output Z, shown next to its truth table. Diagram from: Xilinx, Inc.]
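The point that LUT capacity depends on the number of inputs rather than on the complexity of the function can be modeled in a few lines of Python. This is a software analogy only; `make_lut` and `lut_read` are names invented for the sketch.

```python
def make_lut(func, n_inputs=4):
    """Precompute func for every input combination (the LUT contents)."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate by indexing: one table read regardless of the function."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


# Two very different 4-input functions occupy the same 16 entries.
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(and4), len(xor4))
```

Because evaluation is a single indexed read in both cases, the model also reflects why the delay through a LUT is constant.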
36
Xilinx CLB Structure
Each slice has four outputs
  Two registered outputs, two non-registered outputs
  Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
Carry logic runs vertically
  Signals run upwards
  Two independent carry chains per CLB
[Figure: Slice 0 with two LUTs, carry logic, and two flip-flops (D, Q, CE, PRE, CLR). Diagram from: Xilinx, Inc. (courtesy Jeff Weintraub)]
Notes: The major parts of a slice include two look-up tables (LUTs), two sequential elements, and carry logic. The LUTs are known as the F LUT and the G LUT. The sequential elements can be programmed to be either registers or latches. The next several slides cover the LUT, carry logic, and flip-flops in detail.
37
Field Programmable Gate Arrays
CLB
  Primitive element of FPGA
Routing Module
  Global routing
  Local interconnect
Macro Blocks
  Block Memories
  Microprocessor
I/O Block
38
NetFPGA Block Diagram
39
Details of NetFPGA
Fits into standard PCI slot
  Standard bus: 32 bits, 33 MHz
Provides interfaces for processing network packets
  4 Gigabit Ethernet ports
Allows hardware-accelerated processing
  Implemented with Field Programmable Gate Array (FPGA) logic
41
Hardware Description Languages
Concurrent
  By default, Verilog statements are evaluated concurrently
Express fine-grain parallelism
  Allows gate-level parallelism
Provides precise description
  Eliminates ambiguity about operation
Synthesizable
  Generates hardware from description
42
Verilog Data Types
reg [7:0] A;              // 8-bit register, MSB to LSB
                          // (Preferred bit order for NetFPGA)
reg [0:15] B;             // 16-bit register, LSB to MSB
B = {A[7:0],A[0:7]};      // Assignment of bits
                          // (note: a part-select must follow the declared bit order,
                          //  so A[0:7] is not accepted for a reg declared [7:0])
reg [31:0] Mem [0:1023];  // 1K Word Memory
integer Count;            // simple signed 32-bit integer
integer K[1:64];          // an array of 64 integers
time Start, Stop;         // Two 64-bit time variables
From: CSCI 320 Computer Architecture Handbook on Verilog HDL, by Dr. Daniel C. Hyde
43
Signal Multiplexers
Two input multiplexer (using if / else)
reg y;
always @(a or b or select)
  if (select) y = a;
  else y = b;
Two input multiplexer (using ternary operator ?:)
wire t = (select ? a : b);
From:
44
Larger Multiplexers
Three input multiplexer
reg s;
always @(a or b or c or select2)
begin
  case (select2)
    2'b00: s = a;
    2'b01: s = b;
    default: s = c;
  endcase
end
45
Synchronous Storage Elements
Values change at times governed by clock
  Clock: input to circuit
  Clock event: for example, the rising edge
  Flip/Flop: transfers value from Din to Dout on clock event
[Figure: timing diagram of Clock, Din, and Dout over clock transitions t=0, t=1, t=2. Din takes the values A, B, C; Dout starts at S0 and then takes the values A, B on successive clock events.]
46
Finite State Machines
47
Synthesizable Verilog : Delay Flip/Flops
D-type flip flop
reg q;
always @(posedge clk)
  q <= d;
D-type flip flop with data enable
reg q;
always @(posedge clk)
  if (enable) q <= d;
From:
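For readers new to clocked logic, the behavior of the two flip-flops can be mimicked in Python, with each call to `posedge` standing for one rising clock edge. This is a behavioral analogy under that assumption, not a simulation of the Verilog; the class and method names are made up for the sketch.

```python
class DFlipFlop:
    """Software stand-in for a D flip-flop with an optional data enable."""

    def __init__(self, initial=0):
        self.q = initial

    def posedge(self, d, enable=1):
        """One rising clock edge: capture d only when enable is set."""
        if enable:
            self.q = d
        return self.q


ff = DFlipFlop()
outputs = [ff.posedge(d, en) for d, en in [(1, 1), (0, 0), (0, 1), (1, 0)]]
print(outputs)
```

Between calls the stored value `q` does not change, just as a flip-flop holds its output between clock events.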
49
Reference Router Pipeline
Exercise 1
Five stages
  Input
  Input Arbitration
  Routing Decision and packet modification
  Output Queuing
  Output
Packet-based module interface
Pluggable design
[Figure: pipeline of MAC RxQ and CPU RxQ, Input Arbiter, Output Port Lookup, Output Queues, MAC TxQ and CPU TxQ.]
50
Make your own router
Exercise 1
Objectives:
  Learn how to build hardware
  Run the software
  Explore router architecture
Execution
  Start synthesis
  Rerun the GUI with the new hardware
  Test connectivity and statistics with pings
  Explore pipeline in the details page
  Explore detailed statistics in the details page
51
Step 1 - Build the hardware
Exercise 1
Start terminal, cd to “NF2/projects/tutorial_router/synth”
Start synthesis with “make”
52
Step 2 - Run Homemade Router
Exercise 1
cd to “NF2/projects/tutorial_router/sw”
Type “tutorial_router_gui.pl” to use the just-built router hardware
The same interface should start again
53
Step 4 - Connectivity and Statistics
Exercise 1
Ping any addresses x.y where x is from 1-20 and y is 1 or 2
Open the statistics tab in the Quickstart window to see some statistics
Explore more statistics in modules under the details tab
54
Step 5 - Explore Router Architecture
Exercise 1
Click the Details tab of the Quickstart window
This is the reference router pipeline: a canonical, simple-to-understand, modular router pipeline
55
Step 6 - Explore Output Queues
Exercise 1
Click on the Output Queues module in the Details tab
The page gives configuration details and statistics
57
Buffer Requirements in a Router
Buffer size matters:
  Small queues reduce delay
  Large buffers are expensive
Theoretical tools predict requirements
  Queuing theory
  Large deviation theory
  Mean field theory
Yet, there is no direct answer.
  Flows have a closed-loop nature
  The question arises whether the focus should be on the equilibrium state or the transient state.
Speaker notes: Having said that, one might think the buffer sizing problem must be very well understood. After all, we are equipped with tools like queueing theory, large deviations theory, and mean field theory, which are focused on solving exactly this type of problem. You would think this is simply a matter of understanding the random process that describes the queue occupancy over time. Unfortunately, this is not the case. The closed-loop nature of the flows, and the fact that flows react to the state of the system, make it necessary to use control-theoretic tools, but those tools emphasize the equilibrium state of the system and fail to describe transient delays.
58
Rule-of-thumb
[Figure: Source, Router, Destination; the bottleneck link has capacity C and the two-way propagation delay is 2T.]
Universally applied rule-of-thumb:
  A router needs a buffer size B = 2T × C
  2T is the two-way propagation delay (or just 250ms)
  C is capacity of bottleneck link
Context
  Mandated in backbone and edge routers.
  Appears in RFPs and IETF architectural guidelines.
  Already known by inventors of TCP [Van Jacobson, 1988]
  Has major consequences for router design
Speaker notes: So if the problem is not easy, what do people do in practice? Buffer sizes in today’s Internet routers are set based on a rule-of-thumb which says that if we want the core routers to have 100% utilization, the buffer size should be greater than or equal to 2TxC. Here 2T is the two-way propagation delay of packets going through the router, and C is the capacity of the target link. This rule is mandated in backbone and edge routers, and appears in RFPs and IETF architectural guidelines. It has been known almost since the time TCP was invented. Note that if the capacity of the network is increased, then based on this rule we need to increase the buffer size linearly with capacity. We don’t expect the propagation delay to change much over time, but we expect the capacity to grow very rapidly. Therefore, this rule can have major consequences in router design, and that’s exactly why today’s routers have so much buffering, as I showed you a few moments ago.
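As a worked example of the rule-of-thumb, the Python sketch below evaluates 2T × C for the slide's 250 ms delay. The 10 Gb/s link speed is an example value chosen here, and the function name is an assumption; the result is left in bits and bytes because converting to packets would require assuming a packet size.

```python
def rule_of_thumb_bits(two_way_delay_s: float, capacity_bps: float) -> float:
    """Buffer size B = 2T x C, in bits."""
    return two_way_delay_s * capacity_bps


bits = rule_of_thumb_bits(0.25, 10_000_000_000)  # 2T = 250 ms, C = 10 Gb/s
print(bits, "bits =", bits / 8 / 1e6, "MB")
```

Since B scales linearly with C, doubling the link capacity doubles the buffer the rule asks for.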
59
The Story So Far
# packets at 10Gb/s, by buffer sizing rule:
  Rule 1 (rule-of-thumb, 2T×C): 1,000,000. Assume: 100% utilization
  Rule 2 (2T×C / sqrt(N)): 10,000. Assume: Large number of desynchronized flows; 100% utilization
  Rule 3 (O(log(W))): 20. Assume: Large number of desynchronized flows; <100% utilization
Speaker notes: After this relatively long introduction, let me give an overview of the rest of my presentation. I'll talk about three different rules for sizing router buffers. The first rule is the rule-of-thumb which I just described. As I mentioned, this rule is based on the assumption that we want to have 100% link utilization at the core links. The second rule is a more recent result proposed by Appenzeller, Keslassy, and McKeown which basically challenges the original rule-of-thumb. Based on this rule, if we have N flows going through the router, we can reduce the buffer size by a factor of sqrt(N). The underlying assumption is that we have a large number of flows, and the flows are desynchronized. Finally, the third rule, which I’ll talk about today, says that if we are willing to sacrifice a very small amount of throughput, i.e. if having a throughput less than 100% is acceptable, we might be able to reduce the buffer sizes significantly to just O(log(W)) packets. Here W is the maximum congestion window size. If we apply each of these rules to a 10Gb/s link, we will need to buffer 1,000,000 packets based on the first rule, about 10,000 packets based on the second one, and only 20 packets based on the third rule. For the rest of this presentation I’ll show you the intuition behind each of these rules, and will provide some evidence that validates the rule. Let’s start with the rule-of-thumb.
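The relationship between the first two figures can be checked with a short Python calculation. The flow count N = 10,000 is an assumption inferred from the ratio of the slide's own numbers (1,000,000 to 10,000 packets); it is not stated in the talk, and the function name is invented for this sketch.

```python
import math


def small_buffer_packets(rule_of_thumb_packets: float, n_flows: int) -> float:
    """Second rule: divide the rule-of-thumb buffer by sqrt(N)."""
    return rule_of_thumb_packets / math.sqrt(n_flows)


# N = 10,000 desynchronized flows is an assumed value (see lead-in).
print(small_buffer_packets(1_000_000, 10_000))
```

The third rule's O(log(W)) bound has an unspecified constant, so the figure of 20 packets is not reproduced here.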
60
Using NetFPGA to explore buffer size
Need to reduce buffer size and measure occupancy
Alas, not possible in commercial routers
So, we will use NetFPGA instead
Objective: Use NetFPGA to understand how large a buffer we need for a single TCP flow.
61
Why 2TxC for a single TCP Flow?
Rule for adjusting W
  If an ACK is received: W ← W + 1/W
  If a packet is lost: W ← W/2
Only W packets may be outstanding
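The two update rules can be traced with a few lines of Python. This sketch applies them to a list of events and makes no attempt to model timing, slow start, or retransmission; the function name `aimd` and the event labels are assumptions for the example.

```python
def aimd(window: float, events):
    """Apply the window update rules to a sequence of 'ack'/'loss' events."""
    trace = [window]
    for event in events:
        if event == "ack":
            window += 1.0 / window   # additive increase: W <- W + 1/W
        elif event == "loss":
            window = window / 2.0    # multiplicative decrease: W <- W/2
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append(window)
    return trace


# One window's worth of ACKs grows W by roughly one packet; a loss halves it.
print(aimd(4.0, ["ack"] * 4 + ["loss"]))
```

Repeating the grow-then-halve cycle produces the sawtooth that the demo later makes visible in the queue occupancy.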
62
Time Evolution of a Single TCP Flow
[Figure: time evolution of a single TCP flow through a router. Buffer is 2T*C]
[Figure: time evolution of a single TCP flow through a router. Buffer is < 2T*C]
Speaker notes: Here is a simulation of a single TCP flow with a buffer size equal to the bandwidth-delay product. As you can see, the congestion window follows a sawtooth shape, and varies between 140 and 280. On the bottom graph we can see the queue occupancy. When the congestion window is halved, the buffer occupancy becomes zero, and the two curves change similarly. Note that the pipe is full at all times, so the link utilization remains at 100%. On the other hand, when the buffer size is less than the bandwidth-delay product, we can see that when the congestion window is halved, the queue occupancy goes to zero and remains at zero for a while before the congestion window has increased enough to fill up the pipe again. During this time, we see a reduction in link utilization.
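The argument in the notes can be reproduced with a simplified model, sketched below in Python. It assumes the window peaks when both the pipe and the buffer are full (W_max = 2T×C + B) and that the sending rate right after a loss is proportional to the halved window; the 140-packet bandwidth-delay product is inferred from the 140 to 280 window range quoted in the notes. The function name is hypothetical.

```python
def after_halving(bdp_packets: float, buffer_packets: float):
    """Return (queue occupancy, link utilization) just after W is halved."""
    w_max = bdp_packets + buffer_packets      # pipe and buffer both full
    w_after = w_max / 2.0                     # multiplicative decrease
    queue = max(0.0, w_after - bdp_packets)   # packets beyond what the pipe holds
    utilization = min(1.0, w_after / bdp_packets)
    return queue, utilization


print(after_halving(140, 140))  # buffer equal to 2T x C
print(after_halving(140, 70))   # buffer smaller than 2T x C
```

With B = 2T×C the halved window still equals the bandwidth-delay product, so the queue just empties while the link stays busy; with a smaller buffer the window drops below it and the link idles until the window grows back.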
63
NetFPGA Hardware Set for Demo #2
Server delivers streaming HD video to adjacent client
[Figure: Video Server PC (CPU x2, NIC on PCI-e) and Video Client PC (CPU x2, NIC on PCI-e, Net-FPGA on PCI as Internet Router Hardware) connected by GE links.]
65
Setup for the Demo 2
Each NetFPGA card has four ports
  Port 2 connected to Local Host
  Port 3 connected to adjacent Server
[Figure: Adjacent Server and Local Host attached to the NetFPGA.]
66
Topology for Second Demonstration
Routers connected in a point-to-point topology
  Port 3 connects to local host
  Port 1 connects to adjacent neighbor
  Ports 0 and 2 unused
[Figure: point-to-point topology with links labeled by address suffixes from .1.1/.1.2 through .29.1/.29.2.]
67
Enhanced Router
Demo 2
Objectives
  Observe router with new modules
  New modules: rate limiting, delay, event capture
Execution
  Run event capture router
  Look at routing tables
  Explore details pane
  Start TCP transfer, look at queue occupancy
  Change rate/delay, look at queue occupancy
68
Step 1 - Run Pre-made Enhanced Router
Demo 2 Start terminal and cd to “NF2/projects/tutorial_router/sw/” Type “./tut_adv_router_gui.pl” A familiar GUI should start
69
Step 3 - Explore Enhanced Router
Demo 2
Click on the Details tab
A pipeline similar to the one seen previously appears, with some additions
70
Enhanced Router Pipeline
Demo 2
Two modules added
  Event Capture to capture output queue events (writes, reads, drops)
  Rate Limiter to create a bottleneck
[Figure: pipeline of MAC RxQ and CPU RxQ, Input Arbiter, Output Port Lookup, Output Queues, Rate Limiter, MAC TxQ and CPU TxQ, with Event Capture attached.]
71
Step 4 - Decrease the Link Rate
Demo 2
To create a bottleneck and show the TCP “sawtooth”, the link rate is decreased.
In the Details tab click the “Rate Limit” module
Check Enable
Set link rate to 1.953Mbps
72
Step 5 – Decrease Queue Size
Demo 2
Go back to the Details Panel and click on “Output Queues”.
Select the “Output Queue 2” tab.
Set the output queue size (in packets) slider to 16
73
Step 6 - Start Event Capture
Demo 2 Click on the Event Capture module under the Details tab This should start the configuration page
74
Step 7 - Configure Event Capture
Demo 2
Check “Send to local host” to receive events on the local host
Check “Monitor Queue 2” to monitor the output queue of MAC port 1
Check “Enable Capture” to start event capture
75
Step 8 - Start TCP Transfer
Demo 2
We will use iperf to run a large TCP transfer and look at queue evolution
Start a terminal and cd to “NF2/projects/tutorial_router/sw”
Type “iperf.sh”
76
Step 9 - Look at Event Capture Results
Demo 2 Click on the Event Capture module under the details tab. The sawtooth pattern should now be visible.
77
Queue Occupancy Charts
79
NetFPGA in the Classroom
Stanford CS344: “Build an Internet Router”
  Courseware will be available later in 2007
  Students work in teams of three (2 software, 1 hardware)
  Design and implement hardware and software in 8 weeks
    Software: CLI, PW-OSPF
  Show interoperability with other groups
  Add new features in remaining two weeks
    Firewall, NAT, DRR, Packet capture, Data generator, …
80
Networked FPGAs in Research
RCP: Congestion control
  New module for parsing and overwriting new packet
  New software to calculate explicit rates
Packet Monitoring (ICSI)
  Network Shunt
Deep Packet Inspection (FPX)
  TCP/IP Flow Reconstruction
  Regular Expression Matching
  Bloom Filters
Ethane: Network security
  New switch (“managed flow-table”) deployed
Buffer Sizing
  Reduce buffer size and measure effect on network performance.
  Need a way to set buffer size, and measure buffer occupancy.
82
Enhance Your Router
Exercise 2
Objectives
  Add new modules to datapath
  Synthesize and test router
Execution
  Open user_data_path.v, uncomment delay/rate/event capture modules
  Synthesize
  After synthesis, test the new system.
83
An aside: xemacs Tips
We will be modifying the Verilog source code
Slides show xemacs, but vim also available.
xemacs:
  To undo, use ctrl+shift+'-'
  To cancel a multi-keystroke command, just type ctrl+g
  To select lines, hold shift and press the arrow keys.
  To comment some selected lines, type ctrl+c+c
  To uncomment a commented block, move the cursor to one of the lines inside the commented block and type ctrl+c+u
  To save, type ctrl+x+s
  To search, type ctrl+s search_pattern
84
Step 1 - Open the Source
Exercise 2
We will modify the Verilog source code to add event capture, rate limiter, and delay modules
We will simply comment and uncomment existing code
Open terminal
Type “xemacs NF2/projects/tutorial_router/src/user_data_path.v”
85
Step 2 - Add wires
Exercise 2
Now we need to add wires to connect the new modules
Search for “new wires” (ctrl+s new wires) then press Enter
Uncomment the wires (ctrl+c+u)
86
Step 3 - Connect Event Capture
Exercise 2 Search for opl_output (ctrl+s opl_output) then press Enter Comment the four lines above (up, shift + up + up + up + up, ctrl+c+c) Uncomment the block below to connect the outputs (ctrl+s opl_out, ctrl+c+u)
87
Step 4 - Add the Event Capture Module
Exercise 2 Search for evt_capture_top (ctrl+s evt_capture_top) then press Enter Uncomment the block (ctrl+c+u)
88
Step 5 - Connect the Output Queue to the Rate Limiter
Exercise 2 Search for port_outputs (ctrl+s ports_outputs, Enter) Comment the 4 lines above (select the four lines by using shift+arrow keys, then type ctrl+c+c) Uncomment the commented block by scrolling down into the block and typing ctrl+c+u
89
Step 6 - Add Rate Limiter
Exercise 2
Scroll down until you reach the next “Excluded” block
Uncomment the block containing the rate limiter instantiations (scroll into the block and type ctrl+c+u)
Save (ctrl+x+s)
90
Step 7 - Build the hardware
Exercise 2 Start terminal, cd to “NF2/projects/tutorial_router/synth” Start synthesis with “make”
92
Full System Components
[Figure: Software (PW-OSPF, Java GUI) runs above the Driver, which exposes nf2c0, nf2c1, nf2c2, nf2c3 and ioctl. Across the PCI Bus, the NetFPGA provides DMA and Registers (nf2_reg_grp), four CPU RxQ/CPU TxQ pairs, the user data path, and four MAC TxQ/MAC RxQ pairs connected to Ethernet.]
93
Life of a Packet through the hardware
[Figure: an IP packet traveling between hosts x and y through port0 and port2 of the NetFPGA.]
94
Router Stages Again
[Figure: pipeline of MAC RxQ and CPU RxQ, Input Arbiter, Output Port Lookup, Output Queues, MAC TxQ and CPU TxQ.]
95
Inter-module Communication
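Using “Module Headers”:
  Each word carries a Ctrl Word (8 bits) and a Data Word (64 bits)
  Module headers contain information such as packet length, input port, output port, …
[Figure: a packet as a column of (ctrl, data) words. Module headers come first (ctrl x for a Module Hdr, … , ctrl y for the Last Module Hdr), followed by the Eth Hdr, IP Hdr, …, with ctrl 0x10 on the last word of the packet.]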
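One way to picture this framing is as a list of (ctrl, data) pairs, as in the Python sketch below. Several details are assumptions made for illustration: that ordinary packet words carry ctrl 0, that the last word is marked with the slide's example value 0x10, and the helper name `frame_packet`. The actual encoding of the ctrl bits and of the header data words in the NetFPGA design may differ.

```python
LAST_WORD_CTRL = 0x10   # example value shown on the slide for the last word
DATA_MASK = (1 << 64) - 1


def frame_packet(module_headers, payload: bytes):
    """Return a list of (ctrl, data) words: module headers, then the packet.

    module_headers: iterable of (ctrl, data) pairs with nonzero ctrl values.
    payload: raw packet bytes, split into 64-bit words (zero padded).
    """
    words = [(ctrl & 0xFF, data & DATA_MASK) for ctrl, data in module_headers]
    padded = payload + b"\x00" * (-len(payload) % 8)
    chunks = [padded[i:i + 8] for i in range(0, len(padded), 8)]
    for i, chunk in enumerate(chunks):
        is_last = i == len(chunks) - 1
        ctrl = LAST_WORD_CTRL if is_last else 0x00  # assumed: data words use ctrl 0
        words.append((ctrl, int.from_bytes(chunk, "big")))
    return words


# Hypothetical header: ctrl 0xff with an opaque data word, then 20 payload bytes.
stream = frame_packet([(0xFF, 0x0014)], bytes(range(20)))
print([hex(ctrl) for ctrl, _ in stream])
```

A downstream module can then tell headers, payload, and end of packet apart by looking only at the ctrl value of each word.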
96
Inter-module Communication
[Figure: Module i connects to Module i+1 through the signals data, ctrl, wr, and rdy.]
97
MAC Rx Queue
Packet arriving at the MAC Rx Queue:
  Eth Hdr: Dst MAC = port 0, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 64, Csum: 0x3ab4
  Data
98
Rx Queue
Packet leaving the Rx Queue, with a module header added:
  Module header (ctrl 0xff): Pkt length, input port = 0
  Eth Hdr: Dst MAC = port 0, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 64, Csum: 0x3ab4
  Data
99
Input Arbiter
[Figure: Rx Q 0, Rx Q 1, …, Rx Q 7 each present packets to the Input Arbiter.]
100
Output Port Lookup
Packet entering Output Port Lookup:
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 64, Csum: 0x3ab4
  Data
101
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum
3- Lookup next hop IP & output port (LPM)
4- Lookup next hop MAC address (ARP)
5- Add output port module header
6- Modify MAC Dst and Src addresses
7- Decrement TTL and update checksum
Packet leaving Output Port Lookup:
  Module header (ctrl 0x04): output port = 4
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP (was Dst MAC = 0, Src MAC = x)
  IP Hdr: IP Dst: , TTL: 63, Csum: 0x3ac2 (was TTL: 64, Csum: 0x3ab4)
  Data
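The decision logic in these steps can be sketched in Python as follows. This is a software model with several assumptions that the slides do not confirm: a MAC mismatch or bad checksum is treated as a drop, a route or ARP miss is handed to the CPU, TTL values of 1 or less are handed to the CPU, and the checksum is represented by a boolean flag rather than recomputed. All names, addresses, and table contents are hypothetical.

```python
import ipaddress


def lpm(routes, dst_ip):
    """Longest-prefix match over (prefix, next_hop_ip, out_port) entries."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(p), hop, port)
               for p, hop, port in routes
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    _, hop, port = max(matches, key=lambda m: m[0].prefixlen)
    return hop, port


def output_port_lookup(pkt, port_macs, routes, arp):
    """Return (action, packet) where action is 'forward', 'to_cpu', or 'drop'."""
    if pkt["dst_mac"] != port_macs[pkt["in_port"]]:
        return "drop", None                      # step 1 (assumed: drop)
    if not pkt["checksum_ok"]:
        return "drop", None                      # step 2 (assumed: drop)
    if pkt["ttl"] <= 1:
        return "to_cpu", None                    # step 2: TTL exception
    route = lpm(routes, pkt["dst_ip"])           # step 3
    if route is None:
        return "to_cpu", None                    # assumed: let software decide
    next_hop_ip, out_port = route
    next_hop_mac = arp.get(next_hop_ip)          # step 4
    if next_hop_mac is None:
        return "to_cpu", None                    # assumed: software resolves ARP
    out = dict(pkt,
               out_port=out_port,                # step 5
               dst_mac=next_hop_mac,             # step 6
               src_mac=port_macs[out_port],
               ttl=pkt["ttl"] - 1)               # step 7 (checksum update omitted)
    return "forward", out
```

A packet that passes every check comes back with its MAC addresses rewritten, its TTL reduced by one, and an output port attached; anything unusual is diverted before the packet is modified.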
102
Output Queues
[Figure: output queues OQ0 through OQ7; the packet is stored in OQ4.]
103
MAC Tx Queue
Packet arriving at the MAC Tx Queue:
  Module header (ctrl 0x04): output port = 4
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 63, Csum: 0x3ac2 (previously TTL: 64, Csum: 0x3ab4)
  Data
104
MAC Tx Queue
Packet in the MAC Tx Queue:
  Module header (ctrl 0x04): output port = 4
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 63, Csum: 0x3ac2 (previously TTL: 64, Csum: 0x3ab4)
  Data
105
Exception Packet
Example: TTL = 0 or TTL = 1
Packet has to be sent to the CPU, which will generate an ICMP packet as a response
Difference starts at the Output Port Lookup stage
106
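Exception Packet Path
[Figure: path of the exception packet from Ethernet through the MAC, the user data path, and the CPU queues (CPU RxQ/TxQ) on the NetFPGA, then via DMA across the PCI Bus to the Driver (nf2c0, nf2c1, nf2c2, nf2c3, ioctl) and the Software (PW-OSPF, Java GUI). Registers are accessed through nf2_reg_grp.]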
107
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum: EXCEPTION!
3- Add output port module
Packet leaving Output Port Lookup:
  Module header (ctrl 0x04): output port = 1
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 1, Csum: 0x3ab4
  Data
108
Output Queues
[Figure: output queues OQ0, OQ1, OQ2, …, OQ7; the packet is stored in OQ1.]
109
CPU Tx Queue
Packet arriving at the CPU Tx Queue:
  Module header (ctrl 0x04): output port = 1
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 1, Csum: 0x3ab4 (originally TTL: 64, Csum: 0x3ab4)
  Data
110
CPU Tx Queue
Packet in the CPU Tx Queue:
  Module header (ctrl 0x04): output port = 1
  Module header (ctrl 0xff): Pkt length, input port = 0
  EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
  IP Hdr: IP Dst: , TTL: 1, Csum: 0x3ab4
  Data
111
ICMP Packet
The ICMP packet arrives at the CPU Rx Queue from the PCI Bus.
It follows the same path as a packet from the MAC until the Output Port Lookup.
The OPL module, seeing that the packet is from CPU Rx Queue 1, sets the output port directly to 0.
The packet then continues on the same path as the non-exception packet to the Output Queues and then MAC Tx Queue 0.
112
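ICMP Packet Path
[Figure: path of the ICMP packet from the Software (PW-OSPF, Java GUI) through the Driver (nf2c0, nf2c1, nf2c2, nf2c3, ioctl), via DMA across the PCI Bus into the NetFPGA CPU RxQ, then through the user data path and out through the MAC to Ethernet. Registers are accessed through nf2_reg_grp.]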
113
NetFPGA-Host Interaction
Linux driver interfaces with hardware
  Packet interface via standard Linux network stack
  Register reads/writes via ioctl system call (with convenience wrapper functions)
    readReg(nf2device *dev, int address, unsigned *rd_data)
    writeReg(nf2device *dev, int address, unsigned *wr_data)
    e.g.: readReg(&nf2, OQ_NUM_PKTS_STORED_0, &val);
114
NetFPGA-Host Interaction
NetFPGA to host packet transfer
1. Packet arrives; forwarding table sends it to the CPU queue
2. Interrupt notifies driver of packet arrival
3. Driver sets up and initiates DMA transfer
[Figure: NetFPGA and host connected by the PCI Bus.]
115
NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont)
4. NetFPGA transfers packet via DMA
5. Interrupt signals completion of DMA
6. Driver passes packet to network stack
[Figure: NetFPGA and host connected by the PCI Bus.]
116
NetFPGA-Host Interaction
Host to NetFPGA packet transfers
1. Software sends packet via network sockets. Packet delivered to driver.
2. Driver sets up and initiates DMA transfer
3. Interrupt signals completion of DMA
[Figure: NetFPGA and host connected by the PCI Bus.]
117
NetFPGA-Host Interaction
Register access
1. Software makes ioctl call on network socket. ioctl passed to driver.
2. Driver performs PCI memory read/write
[Figure: NetFPGA and host connected by the PCI Bus.]
118
NetFPGA-Host Interaction
Packet transfers shown using DMA interface
Alternative: use programmed IO to transfer packets via register reads/writes
  Slower, but eliminates the need to deal with network sockets
119
Step 8 – Perfect the Router
Exercise 2
If interested, go back to “Demo 2: Step 1” after synthesis is done and redo the steps with your own router.
You can also change the bandwidth and queue size settings to see how that affects the queue occupancy evolution.
To run your router:
1- cd NF2/projects/tutorial_router/sw
2- type “./tut_adv_router_gui.pl --use_bin ../../../bitfiles/tutorial_router.bit”
120
We’re done! Congratulations!
121
Acknowledgements NetFPGA Team : January 2007
Jianying Luo, Glen Gibb, Nick McKeown, Greg Watson, Jim Weaver, Jad Naous, Ramanan Raghuraman, Paul Hartke, John Lockwood
122
Acknowledgements
Support for the NetFPGA project has been provided by the following companies and institutions
Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in this material do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project.
123
Reference on the Web NetFPGA homepage