Control and forwarding plane separation on an open-source router


Linux Kongress, 2010-09-23, Nürnberg
Robert Olsson, Uppsala University
Olof Hagsand, KTH

More than 10 years in production at Uppsala University (diagram): two routers, UU-1 and UU-2, in front of the internal UU network (links L-green and L-red), with full Internet routing via EBGP/IBGP towards the Stockholm ISP/SUNET (AS1653) and local BGP peering in Uppsala (AS 2834), IPv4/IPv6, OSPF internally, plus a DMZ. Hardware: 2 x XEON 5630 on a TYAN 7025 board with 4 x 10G ixgbe SFP+ (LR/SR). Now at 10G towards the ISP, SFP+ 850 nm / 1310 nm.

Motivation
Separate the control plane from the forwarding plane, à la IETF ForCES.
Control plane: sshd, BGP, stats, etc. on CPU core 0.
Forwarding plane: bulk forwarding on core 1, ..., core N.
This gives robustness of service against overload, DoS attacks, etc.
Enabled by: multi-core CPUs, NIC hardware classifiers, fast buses (QPI / PCI-E gen 2).
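A minimal sketch of the interrupt-affinity side, not the exact commands from the talk; the per-queue IRQ numbers are read from /proc/interrupts, and the IRQ number 98 below is a placeholder:

grep eth0 /proc/interrupts          # find the per-queue IRQ numbers of eth0
echo 1 > /proc/irq/98/smp_affinity  # bitmask 0x1 = pin this queue's IRQ to core 0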

Control-plane separation on a multi-core router (diagram): incoming traffic passes a classifier; control traffic goes to the Control Element CE (core 0), while forwarding traffic is spread over the Forwarding Elements FE1 (core 1), FE2 (core 2), ..., FEN (core N), which produce the outgoing traffic.

High-end hardware: 2 x XEON E5630, TYAN S7025 motherboard, Intel 82599 NIC.

Block hardware structure (diagram): two quad-core CPUs, each with three DDR3 channels, interconnected by QPI and connected via QPI to two Tylersburg IOHs; the IOHs provide PCI-E Gen.2 x16, x8 and x4 slots plus ESI to further I/O devices.

High-end hardware / latency (figure)

Hardware - NIC: Intel 10G board, 82599 chipset with SFP+. Open chip specs, thanks Intel!

Classification in the Intel 82599
The classification in the Intel 82599 consists of several steps, each of which is programmable:
- RSS (receive-side scaling): hashing of packet headers for load balancing
- N-tuple filters: explicit packet-header matches
- Flow director: implicit matching of individual flows
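A small sketch of inspecting the three stages with ethtool (the -u and -S forms also appear later in these slides; eth0 is a placeholder):

ethtool -n eth0 rx-flow-hash tcp4   # which header fields RSS hashes for TCP/IPv4
ethtool -u eth0                     # list programmed n-tuple filters
ethtool -S eth0 | grep fdir         # flow-director activity counters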

Routing daemons
Packet forwarding is done in the Linux kernel; the routing protocols run in user-space daemons.
Currently tested: versions of Quagga.
BGP and OSPF, both IPv4 and IPv6.
Cisco-like CLI/API.
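A minimal sketch of what such a Quagga configuration might look like, using the AS numbers from the deployment slide; the neighbor address and the config path are placeholders:

cat > /etc/quagga/bgpd.conf <<'EOF'
router bgp 1653
 neighbor 192.0.2.1 remote-as 2834
 neighbor 192.0.2.1 description local-peering-uppsala
EOF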

Experiment 1: flow separation, external source
- Bulk forwarding of data from source to sink at 10 Gb/s, with mixed flow and packet lengths.
- Netperf TCP transactions emulate control data from a separate host.
- Study the latency of the TCP transactions.
Topology: source -> router -> sink for the bulk data, with the TCP transactions sent from the separate host to the router.

N-tuple or Flow director

ethtool -K eth0 ntuple on
ethtool -U eth0 flow-type tcp4 src-ip 0x0a0a0a01 src-ip-mask 0xFFFFFFFF \
        dst-ip 0 dst-ip-mask 0 src-port 0 src-port-mask 0 \
        dst-port 0 dst-port-mask 0 vlan 0 vlan-mask 0 \
        user-def 0 user-def-mask 0 action 0
ethtool -u eth0

N-tuple filtering is supported by the Sun niu and Intel ixgbe drivers. Actions are: 1) queue 2) drop.
But we were lazy and instead patched ixgbe so that ssh and BGP use CPU 0.
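For comparison, a hedged sketch of an n-tuple rule that would steer incoming BGP (TCP port 179) to queue 0 without any driver patch, following the mask convention of the command above; eth0 is a placeholder:

ethtool -U eth0 flow-type tcp4 src-ip 0 src-ip-mask 0 dst-ip 0 dst-ip-mask 0 \
        src-port 0 src-port-mask 0 dst-port 179 dst-port-mask 0xFFFF \
        vlan 0 vlan-mask 0 user-def 0 user-def-mask 0 action 0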

N-tuple or Flow director
Even lazier... we found that the flow director is implicitly programmed by outgoing flows, so incoming and outgoing packets of a flow use the same queue. So if we set CPU affinity for BGP, sshd, etc., we can avoid the N-tuple filters altogether.
Example: taskset -c 0 /usr/bin/sshd
Neat....
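The same trick for the routing daemons, as a sketch; the daemon paths are the usual Quagga defaults and may differ on a given system:

taskset -c 0 /usr/sbin/bgpd -d
taskset -c 0 /usr/sbin/ospfd -d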

RSS is still using CPU0
So core 0 gets both our "selected" control traffic and the bulk traffic from RSS.
We just want RSS to use the "other" CPUs.

Patching RSS
Just a one-liner...

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 1b1419c..08bbd85 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -2379,10 +2379,10 @@ static void ixgbe_configure_rx(struct ixgbe_adapter *adapter)
     mrqc = ixgbe_setup_mrqc(adapter);
     if (adapter->flags & IXGBE_FLAG_RSS_ENABLED) {
-        /* Fill out redirection table */
-        for (i = 0, j = 0; i < 128; i++, j++) {
+        /* Fill out redirection table but skip index 0 */
+        for (i = 0, j = 1; i < 128; i++, j++) {
             if (j == adapter->ring_feature[RING_F_RSS].indices)
-                j = 0;
+                j = 1;
             /* reta = 4-byte sliding window of
              * 0x00..(indices-1)(indices-1)00..etc. */
             reta = (reta << 8) | (j * 0x11);

Patching RSS - result

CPU core            1       2       3       4       5       6       7
Number of packets   196830  200860  186922  191866  186876  190106  190412

No traffic to CPU core 0; RSS still gives fairness between the other cores.

Transaction performance
netperf TCP_RR. On the "router": taskset -c 0 netserver
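A hedged sketch of the client side, run from the separate control host; the host name "router", the 60 s duration and the 64-byte request/response size are placeholders:

netperf -H router -t TCP_RR -l 60 -- -r 64,64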

Don't let forwarded packets program the flow director
A new one-liner patch....

@@ -5555,6 +5555,11 @@ static void ixgbe_atr(struct ixgbe_adapter *adapter, struct sk_buff *skb,
     u32 src_ipv4_addr, dst_ipv4_addr;
     u8 l4type = 0;

+    if (!skb->sk) {
+        /* ignore nonlocal traffic */
+        return;
+    }
+
     /* check if we're UDP or TCP */
     if (iph->protocol == IPPROTO_TCP) {
         th = tcp_hdr(skb);

Instrumenting the flow director

ethtool -S eth0 | grep fdir
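A small convenience for watching the counters change during a test run, assuming watch(1) is available:

watch -n 1 "ethtool -S eth0 | grep fdir"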

Flow-director stats/1

fdir_maxlen: 0
fdir_maxhash: 0
fdir_free: 8191
fdir_coll: 0
fdir_match: 195
fdir_miss: 573632813      <--- bulk forwarded data from RSS
fdir_ustat_add: 1         <--- old ssh session
fdir_ustat_remove: 0
fdir_fstat_add: 6
fdir_fstat_remove: 0

ustat → user stats, fstat → failed stats

Flow-director stats/2

fdir_maxhash: 0
fdir_free: 8190
fdir_coll: 0
fdir_match: 196
fdir_miss: 630653401
fdir_ustat_add: 2         <--- new ssh session
fdir_ustat_remove: 0
fdir_fstat_add: 6
fdir_fstat_remove: 0

Flow-director stats/3

fdir_maxlen: 0
fdir_maxhash: 0
fdir_free: 8190
fdir_coll: 0
fdir_match: 206           <--- ssh packets are matched
fdir_miss: 645067311
fdir_ustat_add: 2
fdir_ustat_remove: 0
fdir_fstat_add: 6
fdir_fstat_remove: 0

Flow-director stats/4

fdir_maxlen: 0
fdir_maxhash: 0
fdir_free: 32768          <--- now increased to 32k
fdir_coll: 0
fdir_match: 0
fdir_miss: 196502463
fdir_ustat_add: 0
fdir_ustat_remove: 0
fdir_fstat_add: 0
fdir_fstat_remove: 0

Flow-director stats/5

fdir_maxlen: 0
fdir_maxhash: 0
fdir_free: 32764
fdir_coll: 0
fdir_match: 948           <--- netperf TCP_RR
fdir_miss: 529004675
fdir_ustat_add: 4
fdir_ustat_remove: 0
fdir_fstat_add: 44
fdir_fstat_remove: 0

Transaction latency using flow separation

Experiment 1 results
- The baseline (no background traffic) gives about 30000 transactions per second.
- With background traffic spread by RSS over all cores, transaction latency increases, reducing throughput to ~5000 transactions per second.
- With the RSS patch (don't forward traffic on core 0), transaction latency returns to (almost) the baseline.
In all cases the control traffic is bound to core 0.

Experiment 2: flow separation, in-line traffic
- In-line control traffic within the bulk data (on the same incoming interface).
- Study the latency of the TCP transactions.
- Work in progress.
Topology: source -> router -> sink, with the TCP transactions carried in-line with the bulk data.

Results, in-line (graphs): transaction latency without/with the RSS patch, for a flow mix and for 64-byte packets, with a zoom-in on the 64-byte case.

Classifier small-packet problem
It seems we drop a lot of packets before they are classified.
DCB (Data Center Bridging) has many features for prioritizing different types of traffic, but only for IEEE 802.1Q.
VMDq2 was suggested by Peter Waskiewicz Jr at Intel.

Experiment 3: transmit limits
Investigate hardware limits by transmitting as much as possible from all cores simultaneously. (Block diagram as before: two quad-core CPUs with DDR3 and QPI, two Tylersburg IOHs with PCI-E Gen.2 x16/x8/x4 slots and ESI to further I/O devices.)

pktgen/setup

Interface  eth0  eth1  eth2  eth3  eth4  eth5  eth6  eth7  eth8
CPU core   1     2     3     4     5     6     7     12    13

(The table also lists the memory node per interface; eth4 and eth5 sit on the x4 slot.)
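A hedged sketch of the pktgen configuration for one interface/core pair from the table above (eth0 driven by the pktgen thread on core 1); the destination address and MAC are placeholders, and the exact flags vary a bit between kernel versions:

modprobe pktgen
echo "add_device eth0"           > /proc/net/pktgen/kpktgend_1   # bind eth0 to the core-1 thread
echo "count 0"                   > /proc/net/pktgen/eth0         # 0 = run until stopped
echo "pkt_size 1500"             > /proc/net/pktgen/eth0
echo "dst 10.0.0.2"              > /proc/net/pktgen/eth0
echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0
echo "start"                     > /proc/net/pktgen/pgctrl       # start all pktgen threads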

Setup (diagram): the ten interfaces eth0-eth9 are spread over the two Tylersburg IOHs, which connect via QPI to the two quad-core CPUs and their memory nodes 0 and 1.

TX w. 10 * 10g ports 93Gb/s “Optimal”

Conclusions
We have shown traffic separation in a high-end multi-core PC with classifier NICs, by assigning one CPU core to control and the other cores to forwarding. Our method:
- Interrupt affinity to bind control traffic to core 0.
- Modified RSS to spread forwarding traffic over all cores except core 0.
- Slightly modified the flow-director implementation so that only local (control) traffic populates the flow-director table.
There are remaining issues with packet drops in in-line separation.
We have shown 93 Gb/s simplex transmission bandwidth on a fully equipped PC platform.

That's all Questions?

Rwanda example

Lagos next

Low-Power Development
Some ideas. Power consumption, SuperMicro X7SPA @ 16.5 V with picoPSU:

Watt    Test
-------------------
 1.98   Power-Off
13.53   Idle
14.35   1 core
15.51   2 cores
15.84   3 cores
16.50   4 cores

Routing performance: about 500,000 packets/sec in an optimal setup.
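As a back-of-the-envelope figure, the four-core numbers work out to roughly 16.5 W / 500,000 packets/s ≈ 33 µJ per forwarded packet.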

Example: herjulf.se, 14 W, powered by a 55 Ah battery, running Bifrost from USB plus a low-power disk.

Running on battery

SuperCapacitors

DOM - Optical Monitoring
Optical modules can support optical link monitoring: RX/TX power, temperatures, alarms, etc.
Newly added support in Bifrost/Linux.

DOM

ethtool -D eth3

Int-Calbr:  Avr RX-Power:  RATE_SELECT:
Wavelength: 1310 nm
Temp:       25.5 C
Vcc:        3.28 V
Tx-Bias:    20.5 mA
TX-pwr:     -3.4 dBm (0.46 mW)
RX-pwr:     -15.9 dBm (0.03 mW)
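For reference, and as an assumption about later tooling rather than something from the talk: mainline ethtool eventually exposes similar module diagnostics via the module EEPROM dump, e.g.

ethtool -m eth3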