1
NetFPGA Spring Camp Day 1
Presented by: Andrew W. Moore with Marek Michalski, Neelakandan Manihatty-Bojan, Gianni Antichi, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman; aided by Yury Audzevich, Dimosthenis Pediaditakis
Poznan University of Technology, May 20 – 24, 2013
2
Tutorial Outline
Background: Introduction; The NetFPGA Platform
The Base Reference Router: Motivation: Basic IP review; Example: Reference Router running on the NetFPGA
Infrastructure: Tree; Build System; Scripts
The Life of a Packet Through the NetFPGA: Hardware Datapath; Interface to software: Exceptions and Host I/O
Implementation: Module Template; Write Crypto NIC using a static key
Simulation and Debug: Write and Run Simulations for Crypto NIC
Concluding Remarks
3
Section I: Motivation
4
NetFPGA = Networked FPGA
A line-rate, flexible, open networking platform for teaching and research
5
NetFPGA consists of…
Four elements:
NetFPGA board (NetFPGA 1G Board, NetFPGA 10G Board)
Tools + reference designs
Contributed projects
Community
6
NetFPGA-10G A major upgrade over the 1Gb/s predecessor
State-of-the-art technology
7
10 Gigabit Ethernet 4 SFP+ Cages AEL2005 PHY 10G Support 1G Support
Direct Attach Copper 10GBASE-R Optical Fiber 1G Support 1000BASE-T Copper 1000BASE-X Optical Fiber
8
Others
QDRII-SRAM, 27MB: storing routing tables, counters and statistics
RLDRAM-II, 288MB: packet buffering
PCI Express x8: PC interface
Expansion Slot
9
Xilinx Virtex 5 TX240T Optimized for ultra high-bandwidth applications
48 GTX Transceivers 4 hard Tri-mode Ethernet MACs 1 hard PCI Express Endpoint
10
NetFPGA Board Comparison
NetFPGA 1G | NetFPGA 10G
4 x 1Gbps Ethernet Ports | 4 x 10Gbps Ethernet Ports
4.5 MB ZBT SRAM, 64 MB DDR2 SDRAM | 27 MB QDRII-SRAM, 288 MB RLDRAM-II
PCI | PCI Express x8
Virtex II-Pro 50 | Virtex 5 TX240T
11
Beyond Hardware NetFPGA-10G Board Xilinx EDK based IDE
Reference designs with ARM AXI4 Software (embedded and PC) Public Repository Public Wiki GitHub, User Community MicroBlaze SW PC SW Xilinx EDK Reference Designs AXI4 IPs
12
NetFPGA board Networking Software running on a standard PC
PC with NetFPGA Networking Software running on a standard PC CPU Memory PCIe A hardware accelerator built with a Field Programmable Gate Array driving 10 Gigabit network links FPGA Memory 10GE
13
Tools + Reference Designs
Compile designs Verify designs Interact with hardware Reference designs: Router (HW) Switch (HW) Network Interface Card (HW) Router Kit (SW) SCONE (SW)
14
Contributed Projects Project Contributor OpenFlow switch
Stanford University IPv4 Router Pisa & Cambridge Uni. RLDRAM I/F Cambridge University ARP Reply Encap & DCAP Learning CAM Switch More projects:
15
Community Wiki Documentation Encourage users to contribute Forums
User’s Guide “so you just got your first NetFPGA” Developer’s Guide “so you want to build a …” Encourage users to contribute Forums Support by users for users Active community - 10s-100s of posts/week Incubating the 10G mailing-list into a forum
16
International Community
Over 1,000 users, using 2,000 cards at 150 universities in 40 countries
17
NetFPGA’s Defining Characteristics
Line-Rate
Processes back-to-back packets
Without dropping packets
At full rate of 10 Gigabit Ethernet links
Operating on packet headers: for switching, routing, and firewall rules
And packet payloads: for content processing and intrusion prevention
Open-source Hardware
Similar to open-source software
Full source code available
BSD-style license for 1G and LGPL 2.1 for 10G
But harder, because:
Hardware modules must meet timing (Verilog & VHDL)
Components have more complex interfaces
Hardware designers need high confidence in specification of modules
18
Test-Driven Design Regression tests Example: Internet Router
Have repeatable results Define the supported features Provide clear expectation on functionality Example: Internet Router Drops packets with bad IP checksum Performs Longest Prefix Matching on destination address Forwards IPv4 packets of length bytes Generates ICMP message for packets with TTL <= 1 Defines how packets with IP options or non IPv4 … and dozens more … Every feature is defined by a regression test
19
Who, How, Why Who uses the NetFPGA? How do they use the NetFPGA?
Teachers Students Researchers How do they use the NetFPGA? To run the Router Kit To build modular reference designs IPv4 router 4-port NIC Ethernet switch, … Why do they use the NetFPGA? To measure performance of Internet systems To prototype new networking systems
20
Spring Camp Objectives
Overall picture of NetFPGA How reference designs work How you can work on a project NetFPGA Design Flow Directory Structure, library modules and projects How to utilize contributed projects Interface/Registers How to verify a design (Simulation and Regression Tests) Things to do when you get stuck AND… You can build your own projects!
21
Section II: Network review
22
Internet Protocol (IP)
Data to be transmitted: Data … Data IP Hdr Data IP Hdr Data IP Hdr IP packets: … Ethernet Frames: Eth Hdr Data IP Hdr Eth Hdr Data IP Hdr Eth Hdr Data IP Hdr
23
Internet Protocol (IP)
IP packet: IP Hdr (20 bytes) followed by Data
Header layout, 32 bits per row (bit ruler in the figure: 1, 4, 16, 32):
Ver | HLen | T.Service | Total Packet Length
Fragment ID | Flags | Fragment Offset
TTL | Protocol | Header Checksum
Source Address
Destination Address
Options (if any)
Data
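The header layout above can be made concrete with a short Python sketch that unpacks the fixed 20-byte part of an IPv4 header. This is only an illustration: the function name, the dictionary keys and the sample header bytes are assumptions made for this example and are not part of the NetFPGA code base.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (fields are in network byte order)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_hlen, tos, total_len, frag_id, flags_frag,
     ttl, proto, csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,                 # Ver: upper 4 bits
        "hlen_bytes": (ver_hlen & 0x0F) * 4,      # HLen is counted in 32-bit words
        "tos": tos,                               # T.Service
        "total_length": total_len,                # Total Packet Length
        "fragment_id": frag_id,
        "flags": flags_frag >> 13,                # upper 3 bits
        "fragment_offset": flags_frag & 0x1FFF,   # lower 13 bits
        "ttl": ttl,
        "protocol": proto,
        "header_checksum": csum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical sample header: TTL 64, protocol 1, checksum field left at zero.
sample = bytes.fromhex("450000540000400040010000c0a80001c0a80002")
fields = parse_ipv4_header(sample)
```

With the sample bytes, `fields["ttl"]` is 64 and `fields["hlen_bytes"]` is 20, matching the "20 bytes" annotation on the slide for a header without options.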
24
Basic operation of an IP router
D E F R5 D R5 F R3 E D Next Hop Destination
25
Basic operation of an IP router
D E F R5
26
Forwarding tables
IP address: 32 bits wide → ~4 billion unique addresses
Table columns: Entry, Destination, Port
Naïve approach: one entry per address (entries 1, 2, …, 2^32: ~4 billion entries)
Improved approach: group entries to reduce table size (entries 1, 2, …, 50)
27
IP addresses as a line Your computer My computer Stanford Berkeley
Asia North America 2^32-1 All IP addresses Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default)
28
Longest Prefix Match (LPM)
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Matching entries: Stanford North America Everywhere Universities Continents Planet Most specific Data To: Stanford
29
Longest Prefix Match (LPM)
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Universities Continents Matching entries: North America Everywhere Planet Most specific Data To: Canada
30
Implementing Longest Prefix Match
Entry Destination Port 1 2 3 4 5 Stanford Berkeley North America Asia Everywhere (default) Searching Most specific FOUND Least specific
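The search order on this slide (most specific first, stop at the first match) can be sketched in a few lines of Python. The prefixes and port numbers below are hypothetical stand-ins for the "Stanford / Berkeley / North America / Asia / Everywhere" entries, and the linear scan is only a software illustration of the idea, not a description of how the reference router implements LPM in hardware.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, output port).
ENTRIES = [
    ("10.1.0.0/16", 1),   # stands in for one university
    ("10.2.0.0/16", 2),   # stands in for another university
    ("10.0.0.0/8", 3),    # stands in for a continent
    ("20.0.0.0/8", 4),    # stands in for another continent
    ("0.0.0.0/0", 5),     # everywhere (default)
]

def build_table(entries):
    """Parse prefixes and sort them from most specific to least specific."""
    nets = [(ipaddress.ip_network(prefix), port) for prefix, port in entries]
    return sorted(nets, key=lambda entry: entry[0].prefixlen, reverse=True)

def lookup(table, address):
    """Return the port of the first (longest) prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    for net, port in table:
        if ip in net:
            return port
    return None  # unreachable while a default route is present

TABLE = build_table(ENTRIES)
```

Here `lookup(TABLE, "10.1.2.3")` matches three entries (the /16, the /8 and the default) and returns the port of the /16, while an address covered only by the default route returns port 5.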
31
Basic components of an IP router
Management & CLI Routing Protocols Software Control Plane Routing Table Data Plane per-packet processing Forwarding Table Switching Queuing Hardware
32
IP router components in NetFPGA
Linux SCONE Management & CLI Management & CLI Routing Protocols OR Software Routing Protocols Routing Table Routing Table Router Kit Output Port Lookup Input Arbiter Output Queues Hardware Forwarding Table Switching Queuing
33
Section III: Example I
34
Operational IPv4 router
Java GUI Routing Table Protocols Management & CLI SCONE Software Control Plane Switching Forwarding Table Queuing Reference router Data Plane per-packet processing Hardware
35
Streaming video
36
Streaming video NetFPGA running reference router PC & NetFPGA
(NetFPGA in PC)
37
Video streaming over shortest path
Streaming video Video streaming over shortest path Video client Video server
38
Streaming video Video client Video server
39
Observing the routing tables
Columns: Subnet address Subnet mask Next hop IP Output ports
40
Example 1 (called exercise in the file store)
41
Review NetFPGA as IPv4 router: Reference hardware + SCONE software
Routing protocol discovers topology Demo: Ring topology Traffic flows over shortest path Broken link: automatically route around failure
42
Section III: Example II
43
Buffers in Routers Internal Contention Pipelining Congestion Rx Tx Rx
44
Buffer Sizing Story
45
Using NetFPGA to explore buffer size
Need to reduce buffer size and measure occupancy Alas, not possible in commercial routers So, we will use the NetFPGA instead Objective: Use the NetFPGA to understand how large a buffer we need for a single TCP flow.
46
Reference Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup TxQ Output Queues Five stages Input interfaces Input arbitration Routing decision and packet modification Output queuing Output interfaces Packet-based module interface Pluggable design
47
Extending the Reference Pipeline
MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ MAC RxQ CPU RxQ Input Arbiter Output Port Lookup Event Capture Output Queues MAC TxQ CPU TxQ Rate Limiter MAC TxQ CPU TxQ MAC TxQ CPU TxQ MAC TxQ CPU TxQ
48
Enhanced Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup TxQ Output Queues Rate Limiter Event Capture Two modules added Event Capture to capture output queue events (writes, reads, drops) Rate Limiter to create a bottleneck
49
extended reference router
Topology for Exercise 2 Recall: NetFPGA host PC is life-support: power & control So: The host PC may physically route its traffic through the local NetFPGA NetFPGA running extended reference router nf2c2 nf2c1 eth2 eth1 PC & NetFPGA (NetFPGA in PC) Iperf Client Iperf Server
50
Example 2
51
Review NetFPGA as flexible platform:
Reference hardware + SCONE software new modules: event capture and rate-limiting Example 2: Client Router Server topology Observed router with new modules Started tcp transfer, look at queue occupancy Observed queue change in response to TCP ARQ
52
Section IV: Infrastructure
53
Infrastructure Tree structure NetFPGA package contents
Reusable Verilog modules Verification infrastructure Build infrastructure Utilities Software libraries
54
NetFPGA package contents
Projects: HW: router, switch, NIC, buffer sizing router SW: router kit, SCONE Reusable Verilog modules Verification infrastructure: simulate full board with PCI-express + physical interfaces run tests against hardware test data generation libraries (eg. packets) Build infrastructure Utilities: register I/O, packaging, … Software libraries
55
Tree Structure (1) projects (including reference designs)
NetFPGA-10G projects (including reference designs) contrib-projects (contributed user projects) lib (custom and reference IP Cores and software libraries) tools (scripts for running simulations etc.) docs (design documentations and user-guides)
56
Tree Structure (2) hw (hardware logic as IP cores)
lib hw (hardware logic as IP cores) std (reference cores) contrib (contributed cores) xilinx (Xilinx provided cores) sw (core specific software drivers/libraries) std (reference libraries) contrib (contributed libraries) xilinx (Xilinx provided libraries)
57
Tree Structure (3) bitfiles (FPGA executables) hw (EDK based project)
projects/reference_nic bitfiles (FPGA executables) hw (EDK based project) data (contains user constraint files) nf10 (contains files used to run various tools) pcores (contains local hardware peripherals) sw embedded (contains code for microblaze) host (contains code for host communication etc.)
58
Reusable logic (IP cores)
Category IP Core(s) I/O interfaces Ethernet MAC (1G/10G) MDIO PCI Express UART GPIO Output queues BRAM based Output port lookup OpenFlow Switch NIC 1G to 10G ports Learning switch (CAM based) NIC Memory interfaces SRAM Flash Miscellaneous PBS* to AXIS bridge FIFOs RBS** to AXI-LITE bridge * PBS – packet bus stream in NF1G ** RBS – register bus stream in NF1G
59
What is a PCore? A brief detour
60
What’s pcore? IP Core? P Core? Pcore? pcore? - all the same.
“Processor Core” in EDK HDL (Verilog/VHDL) + metadata files ≈ module Examples: 10G MAC SRAM Controller NIC Output port lookup
61
HDL (Verilog) All NetFPGA-10G pcores are AXI-compliant
AXI = Advanced eXtensible Interface Used in ARM-based embedded systems Standard interface to bridge pcores together AXI4/AXI4-Lite: Control and status interface AXI4-Stream: Data path interface Xilinx IPs and tool chains are migrating to AXI
62
Pcore metadata files Help EDK put pcores together
PAO: Peripheral Analyze Order
Files that need synthesizing
MPD: Microprocessor Peripheral Definition
Defines interface of the pcore (e.g. # of AXI’s)
Consult Xilinx UG642 for details
63
Verification Infrastructure (1)
Simulation and Debugging built on industry standard Xilinx “iSim” simulator and “Scapy” Python scripts for stimuli construction and verification
64
Verification Infrastructure (2)
iSim: a Hardware Description Language (HDL) simulator; performs functional and timing simulations for embedded, VHDL, Verilog and mixed designs
Scapy: a powerful interactive packet manipulation library for creating “test data”; provides primitives for many standard packet formats; allows addition of custom formats
65
Build Infrastructure (1)
Based on the design flow of: “Xilinx Embedded Development Kit” “ARM’s AXI4 interface specification”
66
Build Infrastructure (2)
Build/Synthesis collection of shared hardware peripherals cores (pcores) stitched together with “AXI4”, “Lite” and “Stream” buses bitfile generation and verification using Xilinx synthesis and implementation tools XST Translate, MAP, PAR BitGen
67
Build Infrastructure (3)
Register system collates and generates addresses for all the registers and memories in a project uses Xilinx EDK – libgen utility to generate “include” files for software applications libgen
68
Inter-Module Communication
Using AXI-4 Stream (Packets are moved as Stream) Module i Module i+1 TDATA TUSER TSTRB TLAST TVALID TREADY
69
AXI4-Stream
AXI4-Stream signal | Description
TDATA | Data stream
TSTRB | Strobe signal (i.e. byte enable)
TVALID | Valid indication
TREADY | Flow control indication
TLAST | End of packet/burst indication
TUSER | Out of band metadata
70
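Packet Format
Figure: per-word values of TLAST, TUSER, TSTRB and TDATA for one packet.
TDATA carries Eth Hdr, IP Hdr, … and finally the last word.
TSTRB is 0xFF…F on full words and 0x0…1F on the last word.
TLAST is 1 on the last word.
TUSER is shown as X on some words.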
71
TUSER
Position | Content
[15:0] | length of the packet in bytes
[23:16] | source port: one-hot encoded
[31:24] | destination port: one-hot encoded
[127:32] | 6 user defined slots, 16 bits each
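As a sketch of the TUSER layout listed above, the following Python helpers pack and unpack the 128-bit value as an integer. The helper names are invented for this illustration, and mapping "port index i" to bit i of the one-hot field is an assumption; check the reference design for the actual port-to-bit assignment.

```python
def one_hot(index: int) -> int:
    """One-hot encode a port index (assumption: index i maps to bit i)."""
    if not 0 <= index < 8:
        raise ValueError("an 8-bit one-hot field holds indices 0..7")
    return 1 << index

def pack_tuser(length: int, src_port: int, dst_port: int, user_slots=(0,) * 6) -> int:
    """Build the 128-bit TUSER word from its fields (ports are one-hot masks)."""
    if len(user_slots) != 6:
        raise ValueError("TUSER has exactly 6 user defined slots")
    value = length & 0xFFFF                    # [15:0]   packet length in bytes
    value |= (src_port & 0xFF) << 16           # [23:16]  source port, one-hot
    value |= (dst_port & 0xFF) << 24           # [31:24]  destination port, one-hot
    for i, slot in enumerate(user_slots):      # [127:32] six 16-bit user slots
        value |= (slot & 0xFFFF) << (32 + 16 * i)
    return value

def unpack_tuser(value: int) -> dict:
    """Split a 128-bit TUSER word back into its fields."""
    return {
        "length": value & 0xFFFF,
        "src_port": (value >> 16) & 0xFF,
        "dst_port": (value >> 24) & 0xFF,
        "user_slots": [(value >> (32 + 16 * i)) & 0xFFFF for i in range(6)],
    }

tuser = pack_tuser(64, one_hot(0), one_hot(2))
```

For a 64-byte packet from bit 0 to bit 2, `tuser` is `0x04010040`: length 0x0040 in the low 16 bits, source mask 0x01 and destination mask 0x04 above it.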
72
TVALID/TREADY Signal timing
No waiting! Assert TREADY/TVALID whenever appropriate TVALID should not depend on TREADY TVALID The behavior of TREADY varies between modules output queues – when data is available (only when pkt is outbound) OPL – when input fifo is not full (almost always) TREADY
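The handshake rule can be illustrated with a tiny cycle-by-cycle model in Python: a word moves only in a cycle where both TVALID and TREADY are high, and the sender decides TVALID without looking at TREADY. The function and its arguments are hypothetical names for this sketch; it models behaviour only and is not NetFPGA code.

```python
def simulate(words, tready_pattern):
    """Return (cycle, word) pairs for every cycle in which a transfer happens.

    words          -- words the upstream module wants to send, in order
    tready_pattern -- TREADY value (0 or 1) driven by the downstream module per cycle
    """
    pending = list(words)
    transfers = []
    for cycle, tready in enumerate(tready_pattern):
        tvalid = bool(pending)       # TVALID depends only on having data, not on TREADY
        if tvalid and tready:        # transfer occurs when both are high
            transfers.append((cycle, pending.pop(0)))
    return transfers

trace = simulate(["w0", "w1", "w2"], [1, 0, 1, 1, 1])
```

In this trace the downstream module stalls in cycle 1, so `w1` is held with TVALID still asserted and goes across in cycle 2.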
73
Byte ordering
In compliance with AXI, NF 10G has a specific byte ordering:
the 1st byte of the packet is at TDATA[7:0]
the 2nd byte of the packet is at TDATA[15:8]
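A small Python sketch of this ordering: the first packet byte lands in the least significant byte lane, which is what a little-endian conversion produces. The helper name and the 32-byte (256-bit) default width are assumptions for the example; the TSTRB value follows the byte-enable idea from the AXI4-Stream slide, with one bit per valid byte.

```python
def bytes_to_tdata(chunk: bytes, width_bytes: int = 32):
    """Map up to width_bytes packet bytes onto (TDATA, TSTRB) as integers."""
    if not 0 < len(chunk) <= width_bytes:
        raise ValueError("chunk must hold between 1 and width_bytes bytes")
    tdata = int.from_bytes(chunk, "little")   # byte 0 -> TDATA[7:0], byte 1 -> TDATA[15:8]
    tstrb = (1 << len(chunk)) - 1             # one enable bit per valid byte lane
    return tdata, tstrb

tdata, tstrb = bytes_to_tdata(bytes([0xAA, 0xBB, 0xCC]))
```

Here `tdata & 0xFF` is `0xAA` and `(tdata >> 8) & 0xFF` is `0xBB`; a final word holding five valid bytes gets a strobe of `0x1F`, the same pattern as the last word on the Packet Format slide.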
74
Section V: Life of a Packet
75
Reference Router Pipeline
MAC RxQ CPU Input Arbiter Output Port Lookup Output Queues Five stages Input Input arbitration Routing decision and packet modification Output queuing Output Packet-based module interface Pluggable design
76
Full System Components
Software PCIe Bus NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE
77
Life of a Packet through the Hardware
port0 port2 y x IP packet
78
Dst MAC = port 0, Ethertype = IP
MAC Rx Queue MAC Rx Queue IP Hdr: IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Data
79
Rx Queue Rx Queue Payload IP Hdr:
IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Payload Length, Src port, Dst port, User defined TUSER TDATA
80
Input Arbiter Rx Q 4 Input Arbiter Pkt … Rx Q 1 Pkt Rx Q 0 Pkt
81
Output Port Lookup Output Port Lookup Payload IP Hdr:
IP Dst: , TTL: 64, Csum:0x3ab4 Eth Hdr: Dst MAC = port 0, Ethertype = IP Payload Length, Src port, Dst port, User defined
82
Output Port Lookup
1- Check input port matches Dst MAC
2- Check TTL, checksum
3- Lookup next hop IP & output port (LPM)
4- Lookup next hop MAC address (ARP)
5- Update output port in TUSER
6- Modify MAC Dst and Src addresses
7- Decrement TTL and update checksum
Packet after the lookup:
TUSER: Length, Src port, Dst port, User defined
TDATA: Eth Hdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP; IP Hdr: IP Dst: , TTL: 63, Csum: 0x3ac2; Payload
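Steps 2 and 7 (check the checksum, then decrement TTL and update the checksum) can be sketched in Python. This version simply recomputes the checksum over the whole header, which is an assumption made for clarity; hardware typically applies an incremental update instead, and nothing here is taken from the reference router's Verilog. The sample header bytes are hypothetical.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of all 16-bit header words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_ok(header: bytes) -> bool:
    """A valid header, summed including its checksum field, yields 0."""
    return ipv4_checksum(header) == 0

def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of the header with TTL - 1 and a freshly computed checksum."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL <= 1: exception packet, hand it to the CPU")
    hdr[8] -= 1                              # TTL is byte 8 of the header
    hdr[10:12] = b"\x00\x00"                 # zero the checksum field before summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)

# Hypothetical 20-byte header with TTL 64; fill in its checksum first.
base = bytearray.fromhex("450000540000400040010000c0a80001c0a80002")
struct.pack_into("!H", base, 10, ipv4_checksum(bytes(base)))
sample = bytes(base)
forwarded = decrement_ttl(sample)
```

Both `sample` and `forwarded` pass `header_ok`, and a header arriving with TTL 0 or 1 raises instead, mirroring the exception path in which the packet is sent to the CPU.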
83
Output Queues Output Queues OQ0 OQ2 Pkt OQ4
84
MAC Tx Queue MAC Tx Queue Payload IP Hdr:
IP Dst: , TTL: 63, Csum:0x3ac2 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP Payload Length, Src port, Dst port, User defined
85
Length, Src port, Dst port, User defined
MAC Tx Queue MAC Tx Queue Length, Src port, Dst port, User defined IP Hdr: IP Dst: , TTL: 63, Csum:0x3ac2 Eth Hdr: Dst MAC= nextHop , Src MAC = port 4, Ethertype = IP Payload
86
Exception Packets Example: TTL = 0 or TTL = 1
Packet has to be sent to the CPU Host generates an ICMP packet response Difference starts at the Output Port Lookup
87
Exception Packet Path Software PCI Bus PW-OSPF Java GUI Driver DMA
NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE
88
Output Port Lookup Output Port Lookup
1- Check input port matches Dst MAC Output Port Lookup 2- Check TTL, checksum – EXCEPTION! IP Hdr: IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload Length, Src port, Dst port, User defined 3- Add output port module
89
Output Queues Output Queues OQ0 OQ1 OQ2 Pkt OQ4
90
CPU Tx Queue CPU Tx Queue Payload IP Hdr:
IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload Length, Src port, Dst port, User defined
91
Length, Src port, Dst port, User defined
CPU Tx Queue CPU Tx Queue Length, Src port, Dst port, User defined IP Hdr: IP Dst: , TTL: 1, Csum:0x3ab4 Eth Hdr: Dst MAC= 0 , Src MAC = port x, Ethertype = IP Payload
92
ICMP Packet Packet arrives at the CPU Rx Queue from the PCI-express Bus Same path as a packet from the MAC until it reaches the Output Port Lookup (OPL) The OPL module sees the packet is from the CPU Rx Queue 1 and sets the output port directly to 0 The packet continues on the same path as the non-exception packet to the Output Queues and then MAC Tx queue 0
93
ICMP Packet Path Software PCI Bus PW-OSPF Java GUI Driver DMA SCONE
NetFPGA PW-OSPF Java GUI Driver AXI Lite user data path DMA Registers nf0 nf1 nf2 nf3 ioctl MAC TxQ RxQ Ethernet CPU SCONE
94
NetFPGA-Host Interaction
Linux driver interfaces with hardware
Packet interface via standard Linux network stack
Register reads/writes via ioctl system call with wrapper functions:
rdaxi(int address, unsigned *rd_data);
wraxi(int address, unsigned *wr_data);
eg: rdaxi(0x7d , &val);
Named registers are flaky
95
NetFPGA-Host Interaction
NetFPGA to host packet transfer 1. Packet arrives – forwarding table sends to CPU queue 2. Interrupt notifies driver of packet arrival 3. Driver sets up and initiates DMA transfer PCI Bus
96
NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont.) 4. NetFPGA transfers packet via DMA 5. Interrupt signals completion of DMA PCI Bus 6. Driver passes packet to network stack
97
NetFPGA-Host Interaction
Host to NetFPGA packet transfers
1. Software sends packet via network sockets; packet delivered to driver
2. Driver sets up and initiates DMA transfer
3. Interrupt signals completion of DMA
Figure label: PCI Bus
98
NetFPGA-Host Interaction
Register access
1. Software makes ioctl call on network socket; ioctl passed to driver
2. Driver performs PCIe memory read/write
Figure label: PCI Bus
99
Section V: Contributed Projects
100
Running the Router Kit User-space development, 4x10GE line-rate forwarding
Usage #1 OSPF BGP My Protocol user kernel Routing Table CPU Memory PCI-Express “Mirror” FPGA Memory IPv4 Router Fwding Table Packet Buffer 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE 10GbE
101
Enhancing Modular Reference Designs
Usage #2 Verilog, System Verilog, VHDL, Bluespec…. NetFPGA Driver PW-OSPF EDA Tools (Xilinx, Mentor, etc.) Design Simulate Synthesize Download CPU Memory Java GUI Front Panel (Extensible) PCI-Express In Q Mgmt IP Lookup L2 Parse L3 Out Q Verilog modules interconnected by FIFO interfaces 10GbE 10GbE FPGA 10GbE 10GbE 10GbE My Block 10GbE Memory 10GbE 10GbE
102
(10GE MAC is soft/replaceable)
Creating new systems Usage #3 Verilog, System Verilog, VHDL, Bluespec…. NetFPGA Driver My Design (10GE MAC is soft/replaceable) EDA Tools (Xilinx, Mentor, etc.) Design Simulate Synthesize Download CPU Memory PCI-Express 10GbE 10GbE FPGA 10GbE 10GbE 10GbE 10GbE Memory 10GbE 10GbE
103
NetThreads, NetThreads-RE, NetTM
Geoff Salmon Monia Ghobadi Yashar Ganjali Martin Labrecque Gregory Steffan ECE Dept. CS Dept. U. of Toronto Efficient multithreaded design Parallel threads deliver performance System Features System is easy to program in C Time to results is very short
104
Soft Processors in FPGAs
Ethernet MAC DDR controller Soft processors: processors in the FPGA fabric User uploads program to soft processor Easier to program software than hardware in the FPGA Could be customized at the instruction level
105
Example 1G Contributed Projects
Contributor OpenFlow switch Stanford University Packet generator NetFlow Probe Brno University NetThreads University of Toronto zFilter (Sp)router Ericsson Traffic Monitor University of Catania DFA UMass Lowell
106
10G Contributions Alongside reference projects we have… Project
Contributor Bluespec switch MIT/SRI International Traffic Monitor University of Pisa NF1G legacy on NF10G Uni Pisa & Uni Cambridge Simple/better DMA core Stanford RAMcloud project Your ideas You? ….
107
Future for NetFPGA 10G Non-reference Projects we know about
High precision capture card 4 10GbE ports, GPS input, 8ns resolution. Traffic generator 4 port 10GbE real-traffic generator OpenFlow native port (Verilog) to OVS
108
How might we use NetFPGA?
Well I’m not sure about you but here is a list I created:
Build an accurate, fast, line-rate NetDummy/nistnet element
A flexible home-grown monitoring card
Evaluate new packet classifiers (and application classifiers, and other neat network apps….)
Prototype a full line-rate next-generation Ethernet-type
Trying any of Jon Crowcroft’s ideas (Sourceless IP routing for example)
Demonstrate the wonders of Metarouting in a different implementation (dedicated hardware)
Provable hardware (using a C# implementation and kiwi with NetFPGA as target h/w)
Hardware supporting Virtual Routers
Check that some brave new idea actually works, e.g. Rate Control Protocol (RCP), Multipath TCP
toolkit for hardware hashing
MOOSE implementation
IP address anonymization
SSL decoding “bump in the wire”
Xen specialist nic
computational co-processor
Distributed computational co-processor
IPv6 anything
IPv6 – IPv4 gateway (6in4, 4in6, 6over4, 4over6, ….)
Netflow v9 reference
PSAMP reference
IPFIX reference
Different driver/buffer interfaces (e.g. PF_RING) or “escalators” (from gridprobe) for faster network monitors
Firewall reference
GPS packet-timestamp things
High-Speed Host Bus Adapter reference implementations: Infiniband, iSCSI, Myrinet, Fibre Channel
Smart Disk adapter (presuming a direct-disk interface)
Software Defined Radio (SDR) directly on the FPGA (probably UWB only)
Routing accelerator: hardware route-reflector, Internet exchange route accelerator
Hardware channel bonding reference implementation
TCP sanitizer
Other protocol sanitizer (applications… UDP DCCP, etc.)
Full and complete Crypto NIC
IPSec endpoint/ VPN appliance
VLAN reference implementation
metarouting implementation
virtual <pick-something>
intelligent proxy
application embargo-er
Layer-4 gateway
h/w gateway for VoIP/SIP/skype
h/w gateway for video conference spaces
security pattern/rules matching
Anti-spoof traceback implementations (e.g. BBN stuff)
IPtv multicast controller
Intelligent IP-enabled device controller (e.g. IP cameras or IP powermeters)
DES breaker
platform for flexible NIC API evaluations
snmp statistics reference implementation
sflow (hp) reference implementation
trajectory sampling (reference implementation)
implementation of zeroconf/netconf configuration language for routers
h/w openflow and (simple) NOX controller in one…
Network RAID (multicast TCP with redundancy)
inline compression
hardware accelerator for TOR
load-balancer
openflow with (netflow, ACL, ….)
reference NAT device
active measurement kit
network discovery tool
passive performance measurement
active sender control (e.g. performance feedback fed to endpoints for control)
Prototype platform for NON-Ethernet or near-Ethernet MACs: Optical LAN (no buffers)
109
How might YOU use NetFPGA?
110
Section VI: Example Project
111
Project: Cryptographic NIC
Implement a network interface card (NIC) that encrypts upon transmission and decrypts upon reception
112
Cryptography
XOR function
XOR written as: ^ ⊻ ⨁
XOR is associative: (A ^ B) ^ C = A ^ (B ^ C)
A B | A ^ B
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
XORing a value with itself always yields 0
113
Cryptography (cont.) Simple cryptography: Explanation:
Generate a secret key
Encrypt the message by XORing the message and key
Decrypt the ciphertext by XORing with the key
Explanation:
(M ^ K) ^ K = M ^ (K ^ K)   [associativity]
            = M ^ 0         [A ^ A = 0]
            = M
114
Cryptography (cont.) Example: Message: 00111011 Key: 10110001
Message ^ Key ^ Key:
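The example can be checked directly in Python using the message and key given on this slide; the variable names are only for illustration.

```python
message = 0b00111011
key = 0b10110001

ciphertext = message ^ key      # encrypt: Message ^ Key
recovered = ciphertext ^ key    # decrypt: Message ^ Key ^ Key

print(f"{ciphertext:08b}")      # 10001010
print(f"{recovered:08b}")       # 00111011
```

XORing with the key a second time cancels it out, so the recovered value equals the original message.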
115
Cryptography (cont.) ⨁ Idea: Implement simple cryptography using XOR
32-bit key
Encrypt every word in payload with key
Figure: Header | Payload ⨁ Key
Note: XORing with a one-time pad of the same length as the message is secure/uncrackable. See:
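A software model of this idea, useful as a reference when checking simulation output, might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name, the key value, treating only the first 14 bytes (an Ethernet header) as the untouched header, and applying the key bytes in big-endian order. The actual crypto module may align the key with the data path differently.

```python
KEY = 0x55AAFF33  # hypothetical static 32-bit key

def xor_payload(frame: bytes, key: int = KEY, header_len: int = 14) -> bytes:
    """XOR the payload with the repeating 4-byte key; the header is left unchanged."""
    key_bytes = key.to_bytes(4, "big")            # assumed byte order of the key
    header, payload = frame[:header_len], frame[header_len:]
    mixed = bytes(b ^ key_bytes[i % 4] for i, b in enumerate(payload))
    return header + mixed

frame = bytes(range(14)) + b"hello world!"        # made-up header + payload
encrypted = xor_payload(frame)
decrypted = xor_payload(encrypted)                # the same operation decrypts
```

Because XOR with the same key is its own inverse, one function serves for both the transmit (encrypt) and receive (decrypt) directions.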
116
Section VII: Implementation
117
Embedded Development Kit (1)
A development environment for building “embedded systems” NetFPGA-10G is largely built upon the “EDK”
118
Embedded Development Kit (2)
Xilinx EDK suite is composed of: Platform Studio (XPS), a top level integrated design tool for “hardware” synthesis , implementation and bitstream generation Software Development Kit (SDK), a development environment for “software application” running on embedded processors like Microblaze
119
Xilinx Platform Studio (1)
An XPS project consists of following: system.xmp top level XPS project file system.mhs microprocessor hardware specification file defines the embedded processor system, and includes bus architecture, peripherals, processor, and connectivity etc. system.ucf user constraint file defines constraints such as timing, area, IO placement etc.
120
Xilinx Platform Studio (2)
To invoke XPS design tool, run:
xps <project_root>/hw/system.xmp
This will open the project in the XPS graphical user interface
For the crypto_nic project, open a new terminal and run:
cd ~/NetFPGA-10G-live/projects/crypto_nic
source /opt/Xilinx/13.4/ISE_DS/settings64.sh
xps hw/system.xmp
121
XPS Design Tool (1) Bus Assembly System Assembly IP Catalog
122
XPS Design Tool (2) IP Catalog: contains categorized list of all available peripheral cores Bus Assembly: shows connectivity of various modules over AXI bus System Assembly: provides a complete view of instantiated cores Cores can be added and deleted, in this view
123
XPS Design Tool (3) Port view: provides a listing of internal and external ports for each peripheral core
124
XPS Design Tool (4) Address view: defines the base and high address values for peripheral cores connected to the AXI4 or AXI-Lite bus. These values can be changed manually here.
125
Getting started with a new project (1)
Projects: each design is represented by a project
Location: NetFPGA-10G-live/projects/<proj_name>, e.g. ~/NetFPGA-10G-live/projects/crypto_nic
Consists of: Verilog source, simulation tests, hardware tests, libraries, optional software
126
Getting started with a new project (2)
Normally: copy an existing project as the starting point Today: pre-created project Missing from pre-created project: Verilog files (with crypto implementation) Simulation tests Hardware tests Custom software
127
Getting started with a new project (3)
Typically implement functionality in one or more modules under the top wrapper
Crypto: module to encrypt and decrypt packets
Datapath diagram: MAC RxQ (x4) and CPU RxQ -> Input Arbiter -> Output Port Lookup -> Crypto -> Output Queues -> MAC RxQ (x4) and CPU RxQ
128
Getting started with a new project (4)
Shared modules are included from netfpga/lib/verilog
Generic modules that are re-used in multiple projects
Specify shared modules in the project's mhs file
Local src modules override shared modules
crypto_nic: Local: crypto.v; Shared: everything else
129
Getting started with a new project (5)
Create crypto.v from module_template.v:
Rename project project_template to crypto
Rename the local module_template.v to crypto.v
Change the module name inside crypto.v (first non-comment line of the file)
Add the crypto module to the mhs file
Update the mpd file
Update the pao file
130
Running XPS cd ~/NetFPGA-10G-live/projects/crypto_nic xps hw/system.xmp
131
Adding The Crypto Module
Double click the Crypto module in IP catalog It’s under “Project Local Pcores” Select 256 bit data width and click OK
132
Connect Crypto module
In “Bus Interfaces”, click Crypto's S_AXIS and select the Output Port Lookup's M_AXIS
Click the Output Queues' S_AXIS and select Crypto's M_AXIS
133
Connect Clock and Reset
Click “Ports” Tab Clock -> clock_generator_0::CLKOUT1 Reset -> reset_0::Peripheral_aresetn
134
Implementing the Crypto Module (1)
What do we want to encrypt?
IP payload only
Plaintext IP header allows routing
Content is hidden
Encrypt bytes 35 onward
Bytes 1-14: Ethernet header
Bytes 15-34: IPv4 header (assume no options)
Remember AXI byte ordering
For simplicity, assume all packets are IPv4 without options
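The byte ranges above translate directly into slice offsets. The sketch below is a Python illustration under the slide's stated assumption (IPv4 without options); the function name `split_frame` is hypothetical, and the frame is treated as a flat byte string, so AXI byte ordering is not modelled.

```python
ETH_HDR_LEN = 14                              # bytes 1-14
IPV4_HDR_LEN = 20                             # bytes 15-34, no options
PAYLOAD_OFFSET = ETH_HDR_LEN + IPV4_HDR_LEN   # payload starts at byte 35 (index 34)


def split_frame(frame: bytes):
    """Split a frame into Ethernet header, IPv4 header and IP payload."""
    if len(frame) < PAYLOAD_OFFSET:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    return (frame[:ETH_HDR_LEN],
            frame[ETH_HDR_LEN:PAYLOAD_OFFSET],
            frame[PAYLOAD_OFFSET:])


# Test frame in which each byte's value equals its 1-based position.
frame = bytes(range(1, 61))
eth, ip, payload = split_frame(frame)
print(eth[0], eth[-1], ip[0], ip[-1], payload[0])  # 1 14 15 34 35
```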
135
Implementing the Crypto Module (2)
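State machine (draw on next page):
Module headers on each packet
Datapath is 256 bits (32 bytes) wide
34 / 32 is not an integer! The 34 header bytes do not end on a word boundary, so the second word carries both header and payload
Inside the crypto module (diagram): S_AXIS -> input FIFO (in_fifo_empty, in_fifo_rd_en) -> M_AXIS, with data/ctrl, valid and ready signals, plus Registers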
136
Example packet
Each word on the bus carries TUSER (128 bits) and TDATA (256 bits)
First word, no encryption: ETH HDR, IP Hdr
Second word, partial encryption: IP Hdr, IP Payload
Remaining words, full encryption: IP Payload
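Which of the three cases applies to a given TDATA word follows from the 34-byte header and the 32-byte word size. The Python sketch below illustrates that arithmetic; the function names are hypothetical and the constants come from the slides' IPv4-without-options assumption.

```python
BEAT_BYTES = 32     # 256-bit TDATA
HEADER_BYTES = 34   # 14-byte Ethernet header + 20-byte IPv4 header


def plaintext_bytes_in_beat(beat_index: int) -> int:
    """Number of leading header bytes in this beat that must stay unencrypted."""
    start = beat_index * BEAT_BYTES
    return max(0, min(BEAT_BYTES, HEADER_BYTES - start))


def classify_beat(beat_index: int) -> str:
    n = plaintext_bytes_in_beat(beat_index)
    if n == BEAT_BYTES:
        return "no encryption"
    if n == 0:
        return "full encryption"
    return "partial encryption"


for i in range(4):
    print(i, plaintext_bytes_in_beat(i), classify_beat(i))
```

Only the second word is mixed: its first two bytes are the end of the IPv4 header and the remaining 30 bytes are payload.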
137
Crypto Module State Diagram
Hint: We suggest 3 states Detect Packet’s Header
138
Implementing the Crypto Module (3)
Implement your state machine inside crypto.v
Use a static key initially
Suggested sequence of steps:
1. Create a static key value. Constants can be declared in the module with localparam: localparam MY_EXAMPLE = 32'h ;
2. Implement your state machine without modifying the packet
3. Update your state machine to modify the packet by XORing the key and the payload. Use eight copies of the key to create a 256-bit value to XOR with data words
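Before writing the Verilog, it can help to model the intended behaviour in software. The following Python sketch is one possible three-state model, not the workshop's reference solution: the state names, the key value and the big-endian key byte order are all assumptions, and AXI byte ordering is not modelled.

```python
BEAT_BYTES = 32     # one 256-bit TDATA word
HEADER_BYTES = 34   # Ethernet (14) + IPv4 without options (20)
KEY32 = 0x12345678  # hypothetical static key


def make_key256(key32: int) -> bytes:
    """Eight copies of the 32-bit key form one 256-bit XOR value."""
    return key32.to_bytes(4, "big") * 8


def crypt_packet(beats, key32: int = KEY32):
    """Process one packet, given as a list of up-to-32-byte beats."""
    key256 = make_key256(key32)
    state = "FIRST_WORD"  # hypothetical state names
    out = []
    for beat in beats:
        if state == "FIRST_WORD":       # all header: pass through
            first_encrypted = BEAT_BYTES
            state = "SECOND_WORD"
        elif state == "SECOND_WORD":    # 2 header bytes, then payload
            first_encrypted = HEADER_BYTES - BEAT_BYTES
            state = "PAYLOAD"
        else:                           # payload only
            first_encrypted = 0
        out.append(bytes(b ^ key256[i] if i >= first_encrypted else b
                         for i, b in enumerate(beat)))
    return out


packet = [bytes(range(0, 32)), bytes(range(32, 64)),
          bytes(range(64, 96)), bytes(range(96, 106))]
encrypted = crypt_packet(packet)
print(crypt_packet(encrypted) == packet)  # True
```

In hardware the state would return to its initial value on the last word of each packet; here that is implicit because the function is called once per packet.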
139
module_template.v (1) Module port declaration
module module_template
#(
   parameter C_FAMILY = "virtex5",
   parameter C_M_AXIS_DATA_WIDTH=256,
   parameter C_S_AXIS_DATA_WIDTH=256,
   parameter C_M_AXIS_TUSER_WIDTH=128,
   parameter C_S_AXIS_TUSER_WIDTH=128,
   ...
)
(
   // Signals
   // Local assignments
   // Registers
140
module_template.v (2)
// Modules
fallthrough_small_fifo
#(
   .WIDTH(C_S_AXIS_DATA_WIDTH+C_S_AXIS_TUSER_WIDTH+(C_S_AXIS_DATA_WIDTH / 8)+1),
   .MAX_DEPTH_BITS(2)
)
input_fifo
(
   .din (),          // Data in
   .wr_en (),        // Write enable
   .rd_en (),        // Read the next word
   .dout (),         // Data out
   .full (),         // FIFO full indication
   .nearly_full (),  // Almost full indication
   .empty (),        // FIFO empty indication
   .reset (),        // Reset
   .clk ()           // Clock
);
Packet data is dumped into a FIFO. This allows some “decoupling” between input and output.
141
module_template.v (3) Registers section.
// -- AXILITE IPIF
axi_lite_ipif_1bar #(. . .) axi_lite_ipif_inst ( );
// -- IPIF REGS
ipif_regs #(. . .) ipif_regs_inst
// Registers logic
Ignore the registers section for now; we'll explore it later.
142
module_template.v (4)
// Logic
always @(*) begin
   // Default values
   out_wr_int = 0;
   in_fifo_rd_en = 0;
   if (!in_fifo_empty && out_rdy) begin
      out_wr_int = 1;
      in_fifo_rd_en = 1;
   end
end
Combinational logic to read data from the FIFO. (Data is output to the output ports.) You'll want to add your state machine in this section.
143
More Verilog: Assignments 1
Continuous assignments
appear outside processes (always @ blocks):
assign foo = baz & bar;
targets must be declared as wires
always “happening” (i.e., are concurrent)
144
More Verilog: Assignments 2
Non-blocking assignments
appear inside processes (always @ blocks)
use only in sequential (clocked) processes:
always @(posedge clk) begin
   a <= b;
   b <= a;
end
occur in the next delta ('moment' in simulation time)
targets must be declared as regs
never clock any process other than with a clock!
145
More Verilog: Assignments 3
Blocking assignments
appear inside processes (always @ blocks)
use only in combinatorial processes (combinatorial processes are much like continuous assignments):
always @(*) begin
   a = b;
   b = a;
end
occur one after the other (as in sequential languages like C)
targets must be declared as regs, even though they are not registers
never use in sequential (clocked) processes!
146
More Verilog: Assignments 3 (cont.)
Blocking assignments
appear inside processes (always @ blocks)
use only in combinatorial processes (combinatorial processes are much like continuous assignments):
always @(*) begin
   tmp = a;
   a = b;
   b = tmp;
end
occur one after the other (as in sequential languages like C)
targets must be declared as regs, even though they are not registers
never use in sequential (clocked) processes!
unlike non-blocking, a swap needs a temporary signal
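The difference between the two assignment kinds can be mimicked in ordinary software. The Python sketch below is only an analogy for the evaluation order, not a Verilog simulator; the function names are hypothetical.

```python
def nonblocking_swap(a, b):
    """Models 'a <= b; b <= a;': every right-hand side is sampled before any update."""
    next_a, next_b = b, a
    return next_a, next_b


def blocking_without_tmp(a, b):
    """Models 'a = b; b = a;': the second statement sees the already updated a."""
    a = b
    b = a
    return a, b


def blocking_with_tmp(a, b):
    """Models 'tmp = a; a = b; b = tmp;': the temporary preserves the old a."""
    tmp = a
    a = b
    b = tmp
    return a, b


print(nonblocking_swap(1, 2))      # (2, 1)
print(blocking_without_tmp(1, 2))  # (2, 2)
print(blocking_with_tmp(1, 2))     # (2, 1)
```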
147
(hints) Never assign one signal from two processes:
always @(posedge clk) begin
   foo <= bar;
end
always @(posedge clk) begin
   foo <= quux;
end
148
(hints) In combinatorial processes:
take great care to assign in all possible cases
always @(*) begin
   if (cond) begin
      foo = bar;
   end
end
(foo is not assigned when cond is false, so a latch is inferred; latches, as opposed to flip-flops, are bad for timing closure)
149
(hints) In combinatorial processes:
take great care to assign in all possible cases
always @(*) begin
   if (cond) begin
      foo = bar;
   end else begin
      foo = quux;
   end
end
150
(hints) In combinatorial processes:
(or assign a default)
always @(*) begin
   foo = quux;
   if (cond) begin
      foo = bar;
   end
end
151
Section VIII: Simulation and Debug
152
Testing: Simulation
Simulation allows testing without requiring the lengthy synthesis process
The NetFPGA simulation environment allows you to:
Send/receive packets (physical ports and CPU)
Read/write registers
Verify results
Simulations run in iSim
153
Testing: Simulation We will simulate the “crypto_nic” design under the “10G simulation framework” We will show you how to create simple packets using scapy transmit and reconcile packets sent over 10G Ethernet and PCIe interfaces
154
Testing: Simulation(2)
Run a simulation to verify changes:
cd ~/NetFPGA-10G-live/projects/crypto_nic/hw
make sim
or
make simgui
make simgui opens iSim (more on this later)
Now we can implement the crypto functionality
155
cd ~/NetFPGA-10G-live/projects/crypto_nic/hw vim make_pkts.py
This will create eight simple IP packets for injecting into the system from each port including CPU
156
The destination in this code snippet is the DMA
2 Using the created packets, we create the stimulus (input to the simulator) and the expected output from the simulator.
NB: line 97 deciphers the packets from the device.
To test without a key, edit ~/NetFPGA-10G-live/projects/crypto_nic/hw/Makefile and set the argument of -k to 0x0
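The relationship between stimulus, key and expected output can be sketched without the simulation framework. The Python code below is a standard-library illustration only: it is not the workshop's make_pkts.py (which uses scapy), and the addresses, the protocol number, the function names and the alignment of the key to the 32-byte bus words are all assumptions.

```python
import struct

HEADER_LEN = 34  # Ethernet (14) + IPv4 without options (20)


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement checksum over the 16-bit words of the header."""
    total = sum(struct.unpack(f">{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_frame(index: int, payload: bytes) -> bytes:
    """Build a simple Ethernet + IPv4 frame (hypothetical addresses)."""
    eth = bytes.fromhex("ffffffffffff") + bytes([0x02, 0, 0, 0, 0, index]) + b"\x08\x00"
    ip = struct.pack(">BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), index, 0,
                     64, 253, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    ip = ip[:10] + struct.pack(">H", ipv4_checksum(ip)) + ip[12:]
    return eth + ip + payload


def expected_output(frame: bytes, key: int) -> bytes:
    """Headers unchanged; payload XORed with the key, aligned to the bus words."""
    key_bytes = struct.pack(">I", key)
    body = bytes(b ^ key_bytes[(HEADER_LEN + i) % 4]
                 for i, b in enumerate(frame[HEADER_LEN:]))
    return frame[:HEADER_LEN] + body


stimulus = [make_frame(i, bytes(range(i, i + 40))) for i in range(8)]
no_key = [expected_output(f, 0x0) for f in stimulus]
print(no_key == stimulus)  # True: with a zero key the packets pass through unchanged
```

This is why setting the key to 0x0 is a useful first test: any mismatch then points at the datapath rather than at the encryption.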
157
The destinations in this code snippet are the ports
3 Using the created packets, we create the stimulus (input to the simulator) and the expected output from the simulator.
NB: line 97 deciphers the packets from the device.
To test without a key, edit ~/NetFPGA-10G-live/projects/crypto_nic/hw/Makefile and set the argument of -k to 0x0
158
The results are in… 4 As expected, two packets are received from PCIe on each 10G interface. A total of 32 packets, eight from each 10G interface, are received on the PCIe.
159
Running simulation in iSim (1)
To invoke iSim for design simulation, run:
cd <project_root>/hw/
make simgui
This will open the simulation in the iSim graphical user interface
160
Running simulation in iSim (2)
Objects panel Waveform window Instance panel Tcl console
161
Running simulation in iSim (3)
Instance panel: displays the process and instance hierarchy
Objects panel: displays simulation objects associated with the instance selected in the instance panel
Waveform window: displays the wave configuration, consisting of signals and busses
Tcl console: displays simulator-generated messages and can execute Tcl commands
162
Simulation gone wild
When running make sim:
1 source /opt/Xilinx/13.4/ISE_DS/settings64.sh
2 … XPS% make: *** No rule to make target `___xps/simgen.opt', needed …..
   Go fix the address in ….hw/system.mhs
3 ERROR:EDK – for point-2-point interface connector 'nf10_crypto_0_m_axis'…..
   3a Add the crypto module to the MHS file (edit system.mhs OR use the GUI)
   3b Check the connectivity of the module (to each of the OPL and Output Queue modules)
4 ….Cannot find MPD for weird core name ….
5 ERROR:Simulator:778 – Static elaboration of top level Verilog design unit(s) in library work failed
   Edit the pao file (~/NetFPGA-10G-live/projects/crypto_nic/hw/pcores/nf10_crypto_v1_00_a/data/nf10_crypto_v2_1_0.pao)
   For simulation: uncomment the first four lines (39-42) and comment out the last line (43)
   For synthesis: uncomment lines 39, 40 and 43, and comment out lines 41 and 42
6 If sim finishes and reports that each test passes 2 packets but all tests FAIL, your static key differs between your code and your Makefile (argument to make_pkts.py); check the log
163
Simple Verilog error Lots of errors
When the file is empty…
164
Section IX: Conclusion
165
Acknowledgments
NetFPGA Team at Stanford University (Past and Present): Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan, Eric Lo
NetFPGA Team at University of Cambridge (Past and Present): Andrew Moore, David Miller, Muhammad Shahbaz, Martin Zadnik, Matthew Grosvenor, Gianni Antichi, Neelakandan Manihatty-Bojan, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman
All Community members (including but not limited to): Paul Rodman, Kumar Sanghvi, Wojciech A. Koszek, Yashar Ganjali, Martin Labrecque, Jeff Shafer, Eric Keller, Tatsuya Yabe, Bilal Anwer, Kees Vissers, Michaela Blott, Shep Siegel
166
Acknowledgement Support for the NetFPGA-10G project has been provided by the following companies and institutions Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of any sponsors supporting this project.
167
Thanks to our Sponsors:
Support for the NetFPGA project has been provided by the following companies and institutions Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project. This effort is also sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA C This material is approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.