Control
Fred Kuhns
Applied Research Laboratory
Department of Computer Science and Engineering
Washington University in St. Louis
Fred Kuhns - 10/11/2015

Virtual Networking – Basic Concepts
Substrate links interconnect adjacent substrate routers.
Meta-links interconnect adjacent meta-routers.
– Defined within the context of a substrate link.
Each substrate router hosts one or more meta-router instances.
Substrate links may be tunneled within existing networks: IP, MPLS, etc.

Adding a Node
Install new substrate router
Create substrate links between peers
Instantiate meta-router(s)
Define meta-links between meta nodes (routers or hosts)

System Components
General-purpose processing engines (PE/GP).
– Shared: PlanetLab VM environment.
    Local PlanetLab node manager configures and manages VMs
    – vserver, vnet may change to support substrate functions
    Implement substrate functions in kernel
    – rate control, mux/demux, substrate header processing
– Dedicated: no local substrate functions
    May choose to implement substrate header processing and rate control.
    Substrate uses VLANs to ensure isolation (VLAN == MRid)
    Can use 802.1Q priorities to isolate traffic further.
NP blades (PE/NP).
– Shared: user supplies parse and header formatting code.
– Dedicated: user has full access to and control over the hardware device
General Meta-Processing Engine (MPE) notes:
– Use loopback to enforce rate limits between dedicated MPEs
– Legacy node modeled as dedicated MPE; use loopback blade to remove/add substrate headers.
Substrate links: interconnect substrate nodes
– Meta-links are defined within their context.
– Assume an external entity configures end-to-end meta-nets and meta-links
– Substrate links are configured outside of the node manager’s context

Switch
Switch blade specs:
– Promentum™ ATCA-2210
– 20-port 10GE fabric switch
    14 10GE links to user slots
    4 10GE links for external connections (up/cross links) on front panel
– 24-port 1GE Base switch
    14 1GE links to user slots
    1GE link to redundant switch blade
    1 10GE and 4 1GE links for external connections (up/cross links) on front panel
– Wire-speed L2 and L3 switching
– 4K IEEE 802.1Q VLANs
– Etc.
Traversing the switch:
– Switching is based on the Ethernet destination address
– Isolation is based on VLAN.
    One VLAN is assigned to each MetaNet present on a substrate router.
    All switch traffic for a MetaNet is required to use its assigned VLAN.
– Frames from a MetaNet are only transmitted to a port that is allowed to receive the specified VLAN.
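
The switch-traversal rules above can be illustrated with a toy model. This is only a sketch of the idea (forward on destination address, restricted to ports that carry the MetaNet's VLAN); the class and method names are invented for illustration and do not correspond to any ATCA-2210 management interface.

```python
from dataclasses import dataclass, field


@dataclass
class FabricSwitch:
    """Toy model of VLAN-isolated L2 switching (hypothetical, not a vendor API)."""
    # VLAN id -> set of ports allowed to carry that VLAN
    vlan_ports: dict = field(default_factory=dict)
    # (VLAN id, destination MAC) -> port where that address is attached
    mac_table: dict = field(default_factory=dict)

    def enable_vlan(self, vlan, port):
        self.vlan_ports.setdefault(vlan, set()).add(port)

    def learn(self, vlan, mac, port):
        self.mac_table[(vlan, mac)] = port

    def forward(self, in_port, vlan, dst_mac):
        """Return the set of output ports for one frame."""
        allowed = self.vlan_ports.get(vlan, set())
        if in_port not in allowed:
            return set()                      # sender is not in this VLAN: drop
        out = self.mac_table.get((vlan, dst_mac))
        if out is not None:
            # Known destination: deliver only if the port carries the VLAN.
            return {out} if out in allowed and out != in_port else set()
        return allowed - {in_port}            # unknown: flood inside the VLAN only


switch = FabricSwitch()
for port in (1, 2):
    switch.enable_vlan(5, port)               # one MetaNet uses VLAN 5 on ports 1, 2
switch.enable_vlan(7, 3)                      # another MetaNet uses VLAN 7 on port 3
switch.learn(5, "00:00:00:00:00:02", 2)
```

With this setup, a frame entering port 1 on VLAN 5 reaches port 2, while the same frame injected on port 3 is dropped because port 3 is not a member of VLAN 5.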

Packet Processing
Key features
– 16 32-bit 1.4 GHz micro-engines
    peak instruction rate >20 GIPS
    8 hw contexts per processor
    support >50 i/byte (input & output) pipeline connections for streaming
– four QDR SRAM interfaces and three RDRAM interfaces
– high IO bandwidth (up to 20G)
– XScale control processor
– encryption/decryption engine
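
As a quick check of the peak instruction rate quoted above, the arithmetic can be written out. The one-instruction-per-cycle figure is an assumption made here for the calculation; the slide only gives the micro-engine count, the clock rate and the ">20 GIPS" claim.

```python
MICRO_ENGINES = 16      # from the slide
CLOCK_GHZ = 1.4         # from the slide

# Assumption: each micro-engine issues at most one instruction per cycle.
peak_gips = MICRO_ENGINES * CLOCK_GHZ
print(f"peak instruction rate: {peak_gips:.1f} GIPS")
```

Under that assumption the product is 22.4 GIPS, consistent with the ">20 GIPS" figure.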

System Architecture
General-purpose blades.
– shared blades run Plab OS
    no change to current apps
– also support dedicated blades
– use separate blade server to preserve ATCA slots for NPs
NP blades.
– support dedicated PEs
    control from vserver on PE/GP
– shared PE options
    shared NP for fast path
    shared NP with plugins
10 GE fabric switch
– VLANs used to isolate meta-routers
– uplinks for connecting to multiple chassis
Good ratio of PEs to LC: 3:1
[Figure labels: 10 GE Switch; Line Card; Switch Blade; PE/GP; PE/NP; up to 10 1GE interfaces; compute blade with disk; Radisys 7010; Radisys 7010 with RTM; 1 GE for control; 10 Gb/s for data]

Block Diagram of a Meta-Router
Control/management uses the Base channel (Control Net: IPv4)
Meta-Interfaces (MI): MIs are connected to meta-links
Meta-Processing Engines (MPE):
- virtual machine, COTS PC, NPU, FPGA
- PEs differ in ease of “programming” and performance
- MR may use one or more PEs, possibly of different types
MPEs are interconnected in the data plane by a meta-switch.
Packet includes Meta-Router and Meta-PE identifier
Some substrate-detected errors or events are reported to the Meta-Router “control” MPE.
The first Meta-Processing Engine (MPE) assigned to Meta-Network MNet k is called MPE k1
[Figure labels: Meta-Router; MPE k1 (0 1 2); MPE k2 (3 4 5); Meta Switch; MPE k3 (control); data path; rate labels 1G, .5G, 2G, 1G, .5G, 3G, .1G, 3G, .1G]

System Block Diagram
[Figure labels: Base Ethernet Switch (1Gbps, control); Fabric Ethernet Switch (10Gbps, data path); PE/NP (NPU-A, NPU-B, xscale); PE/GP (GP CPU, GbE interface, PCI, 2x1GE); LC (NPU-A, NPU-B, xscale, TCAM) with RTM (10 x 1GbE); Loopback (map VLAN X to VLAN Y); Shelf manager; I2C (IPMI); Node Server; Node Manager; user login accounts]

Top-Level View (exported) of the Node
Node Server (user login accounts), Node Manager, Substrate Control
Exported Node Resource List (Processing engines, Substrate Links):
PE/GP: (control, IPaddr) (platform, x86) (type, dedicated) …
PE/NP: (control, IPaddr) (platform, IXP2800) (type, IXP_DEDICATED) …
PE/GP: (control, IPaddr) (platform, x86) (type, linux_vserver) …
PE/NP: (control, IPaddr) (platform, IXP2800) (type, IXP_SHARED) …
… … …
S-Link: (type, p2p) (peer, XXX) (BW, XXGbps) …
S-Link: (type, p2p) (peer, _Desc_) (BW, XGbps) …

Substrate: Enabling an MR
Callouts on the figure:
– Define Meta-Interface mappings
– Enable VLAN k on fabric switch ports
– Update shared MPEs for MI and inter-MPE traffic
– Update host with local Net gateway
– Allocate data-plane MPEs
– Allocate control-plane MPE (required)
– Enable control over Base switch (IP-based)
– Use loopback to define interfaces internal to the system node.
[Figure labels: LC (line card); 10GbE (fabric); Substrate; Host (located within node); loopback; VLAN k; local Meta-Router MR 1 for MNet k; PE; MPE k1, MPE k2 (MNet k Data Plane); MPE k3 (MNet k Control and Management Plane); MNet k meta-interfaces MI 0, MI 1, MI 2, MI 3, MI 4]

Meta-Router Block Diagram
Meta-interfaces are rate controlled
– Each MR:MI pair is assigned its own rate-controlled queue
Line card: map received packet to MR and MI
– Lookup table: map to MR:MI
– Lookup table: map to (Port, Meta Link) pair
Meta-net control and management functions (configure, stats, routing etc.) communicate with the MR over the separate Base switch.
[Figure labels: Line Card; Fabric Switch; Shared PE; Dedicated PE; Shared PE/NP; Shared PE/GP; MR 1 to MR 5; MR 5:MI 1; VMM (“VM” manager); meta-net5 control; App-level service; Base switch (control); Node M.; Node Server; Internet. Open questions marked on the figure: VMM? ‘slice’/MN VMs?]
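
The two lookup directions named on this slide can be sketched as a pair of dictionaries plus one queue per MR:MI pair. This is a minimal illustration under assumed names (`LineCard`, `bind`, `receive`, `enqueue` are all hypothetical); it omits the rate control itself and says nothing about how the real line-card tables are laid out.

```python
from collections import deque


class LineCard:
    """Hypothetical sketch of the line-card lookup tables."""

    def __init__(self):
        self.ingress = {}   # (port, meta_link) -> (mr, mi)
        self.egress = {}    # (mr, mi) -> (port, meta_link)
        self.queues = {}    # (mr, mi) -> queue of packets awaiting transmission

    def bind(self, mr, mi, port, meta_link):
        """Install both lookup directions and a dedicated queue for MR:MI."""
        self.ingress[(port, meta_link)] = (mr, mi)
        self.egress[(mr, mi)] = (port, meta_link)
        self.queues[(mr, mi)] = deque()

    def receive(self, port, meta_link, payload):
        """Map a received packet to its MR and MI; None if nothing is bound."""
        key = self.ingress.get((port, meta_link))
        return None if key is None else (*key, payload)

    def enqueue(self, mr, mi, payload):
        """Queue an outgoing packet; return the (port, meta_link) it will use."""
        self.queues[(mr, mi)].append(payload)
        return self.egress[(mr, mi)]


lc = LineCard()
lc.bind(mr=5, mi=1, port=0, meta_link=42)   # example values are arbitrary
```

Keeping one queue per MR:MI pair is what lets each meta-interface be rate controlled independently of the others sharing the same physical port.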

Partitioning the Control Plane
Substrate manager
– Initialization: discover system HW components and capabilities (blades, links etc.)
– Hides low-level implementation details
– Interacts with shelf manager for resetting boards or detecting failures.
Node manager
– Initialization: request system resource list
– Operational: allocate resources to meta-networks (slice authorities?)
– Request substrate to reset MPEs
Substrate assumptions:
– All MNets (slices) with a locally defined meta-router/service (sliver) have a control process to which the substrate can send exception packets and event notifications.
    Communication:
    – out-of-band uses Base interface and internal IP addresses
    – in-band uses data plane and MPE id.
    Notifications:
    – ARP errors, improperly formatted frame, interface down/up, etc.
– If a meta-link is a pass-through link then the node manager is responsible for handling meta-net level error/event notifications, for example when a link goes down.

Initialization: Substrate Resource Discovery
Creates list of devices and their Ethernet addresses
– Network Processor (NP) blades:
    Type: network-processor, Arch: ixp2800, Memory: 768MB (DRAM), Disk: 0, Rate: 5Gbps
– General Processor (GP) blades:
    Type: linux-vserver, Arch: X, Memory: X, Disk: X, Rate: X
– Line Card blades: not exposed to node manager, used to implement meta-interfaces
    another entity creates substrate links to interconnect peer substrate nodes.
    create table mapping line card blades, physical links and Ethernet addresses.
Internal representation (field lists were lost from the source text):
– Substrate device ID:
– If device has a local control daemon:
– Type = Processing Engine (NP/GP): ???
– Type = Line Card: ???
– Substrate Links: …
    Meta-Link list: …

Initialization: Exported Resource Model
List of available elements
– Attributes of interest?
    Platform: IXP2800, PowerPC, ARM, x86; Memory: DRAM/SRAM; Disk: XGB; Bandwidth: 5Gbps; VM_Type: linux-vserver, IXP_Shared, IXP_Dedicated, GP_Dedicated; Special: TCAM
– Network processor: NP-Shared, NP-Dedicated
– General purpose: GP-Shared (linux-vserver), GP-Dedicated
– Each element is assigned an IP address for control (internal control LAN)
List of available substrate links:
– Access networks (expect Ethernet LAN interface): substrate link is multi-access
    Attributes: Access: multi-access, Available Bandwidth, Legacy protocol(s) (e.g. IP), Link protocol (e.g. Ethernet), Substrate ARP implementation.
– Core interface: assume point-to-point, bandwidth controlled
    Attributes: Access: Substrate; Bandwidth, Legacy protocol?
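
One way to picture the exported resource model is as a list of attribute records that the node manager can filter. The sketch below is an assumption-laden illustration: the class names, field names, `find_elements` helper and the control-LAN addresses are all invented here, and only the attribute vocabulary (platform, VM type, bandwidth, TCAM, access type) comes from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """A processing element as exported to the node manager (hypothetical)."""
    platform: str                     # e.g. "IXP2800", "x86"
    vm_type: str                      # e.g. "linux-vserver", "IXP_Shared"
    control_ip: str                   # address on the internal control LAN
    bandwidth_gbps: Optional[float] = None
    special: tuple = ()               # e.g. ("TCAM",)


@dataclass(frozen=True)
class SubstrateLink:
    """A substrate link as exported to the node manager (hypothetical)."""
    access: str                       # "multi-access" or "substrate"
    bandwidth_gbps: float
    link_protocol: Optional[str] = None


def find_elements(elements, platform=None, vm_type=None):
    """Return the elements matching every attribute that was specified."""
    return [
        e for e in elements
        if (platform is None or e.platform == platform)
        and (vm_type is None or e.vm_type == vm_type)
    ]


# Example inventory; the addresses are placeholders, not real assignments.
elements = [
    Element("IXP2800", "IXP_Shared", "10.0.0.1", 5.0, ("TCAM",)),
    Element("IXP2800", "IXP_Dedicated", "10.0.0.2", 5.0),
    Element("x86", "linux-vserver", "10.0.0.3"),
]
links = [SubstrateLink("multi-access", 1.0, "Ethernet"), SubstrateLink("substrate", 10.0)]
```

A node manager looking for a shared network-processor slot would call `find_elements(elements, platform="IXP2800", vm_type="IXP_Shared")`.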

Instantiate a router: Register MNet
Substrate assumptions:
– All MNets (slices) with a locally defined meta-router/service (sliver) will have defined a control process to which the substrate can send exception packets and event notifications.
    Communication: out-of-band uses Base interface and internal IP addresses, in-band uses data plane. ???
    Notifications: ARP errors, improperly formatted frame, interface down/up, etc.
– If a meta-link is a pass-through link then the node manager is responsible for handling error/event notifications.
Node manager actions:
– Request binding of MNid k to allocated device (use SDid from initialization)
    Substrate enables VLAN k on applicable ports of the fabric switch
– Allocate hardware resources (see following discussion for different scenarios)
– If control module is already instantiated then notify it of the MR location (IP address of control interface).
– If creating control entity then register it with any line cards with meta-router interfaces (for exception traffic). ???

Instantiate a router: Register Meta-Router (MR)
Define MR-specific Meta-Processing Engines (MPE):
– Register MR ID MRid k with substrate
    substrate allocates VLAN k and binds it to MRid k
– Request Meta-Processing Engines
    shared or dedicated, NP or GP; if shared then relative allocation (rspec)
    – shared: implies internal implementation has support for substrate functions
    – dedicated w/substrate: user implements substrate functions.
    – dedicated no/substrate: implies substrate will remove any substrate headers from data packets before delivering to MPE. For legacy systems.
    indicate if this MPE is to receive control events from substrate (Control_MPE).
    substrate returns MPE id (MPid) and control IP address (MPip) for each allocated MPE
    substrate internally records Ethernet address of MPE and enables VLAN on applicable port
    substrate assumes that any MPE may send data traffic to any other MPE
    – MPE specifies target MPE rather than MI when sending packet.
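
The registration exchange described on this slide might look roughly like the following. It is a sketch only: the `Substrate` class, its method names, the mode strings and the 192.168.0.x control addresses are assumptions made for illustration. The VLAN-equals-MRid rule follows the "VLAN == MRid" note on the System Components slide.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Mpe:
    mpid: int        # MPE id returned to the caller
    mpip: str        # control IP address returned to the caller
    kind: str        # "NP" or "GP"
    mode: str        # "shared", "dedicated-substrate" or "dedicated-no-substrate"
    control: bool    # True if this MPE receives substrate control events


@dataclass
class Substrate:
    """Hypothetical in-memory model of MR registration."""
    vlan_of_mr: dict = field(default_factory=dict)   # MRid -> VLAN id
    mpes_of_mr: dict = field(default_factory=dict)   # MRid -> list of Mpe
    _next_mpid: count = field(default_factory=lambda: count(1))

    def register_mr(self, mrid):
        """Bind a VLAN to the meta-router; here the VLAN id is the MRid itself."""
        if mrid in self.vlan_of_mr:
            raise ValueError(f"MR {mrid} already registered")
        self.vlan_of_mr[mrid] = mrid
        self.mpes_of_mr[mrid] = []
        return self.vlan_of_mr[mrid]

    def request_mpe(self, mrid, kind, mode, control=False):
        """Allocate one MPE for a registered MR and return (MPid, MPip)."""
        if mrid not in self.vlan_of_mr:
            raise KeyError(f"MR {mrid} is not registered")
        mpid = next(self._next_mpid)
        mpe = Mpe(mpid, f"192.168.0.{mpid}", kind, mode, control)  # placeholder IP
        self.mpes_of_mr[mrid].append(mpe)
        return mpe.mpid, mpe.mpip


substrate = Substrate()
vlan = substrate.register_mr(7)
ctrl = substrate.request_mpe(7, "GP", "shared", control=True)
data = substrate.request_mpe(7, "NP", "dedicated-substrate")
```

Recording the Ethernet address and enabling the VLAN on the switch port are left out; in this model they would happen inside `request_mpe`.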

Instantiate a router: Register Meta-Router (MR)
Create meta-interfaces (with BW constraints)
– create meta-interfaces associated with external substrate links
    request that meta-interface id (MIid) be bound to substrate link x (SLx).
    – we need to work out the details of how an SL is specified
    We need to work out the details of who assigns inbound versus outbound meta-link identifiers (when they are used). If this is the downstream node then some entity (node manager?) reports the outgoing label. This node assigns the inbound label.
    multi-access substrate/meta link: node manager or meta-router control entity must configure meta-interface for ARP. Set local meta-address and send destination address with output data packet.
    substrate updates tables to bind MI to “receiving” MPE (i.e. where substrate sends received packets)
– create meta-interfaces for delivery to internal devices (for example, legacy PlanetLab nodes)
    create meta-interface associated with an MPE (i.e. the end system)

Line Cards: Assumptions
Initially use a simplified model
– Core interfaces have point-to-point substrate links which correspond (physically or logically) to physical links.
– LAN interfaces only support legacy IP traffic

Scenarios
Shared PE/NP: send request to device controller on the XScale
– Allocate memory for MR Control Block
– Allocate microengine and load MR code for Parser and Header Formatter
– Allocate meta-interfaces (output queues) and assign bandwidth constraints
Dedicated PE/NP
– Notify device control daemon that it will be a dedicated device. May require loading/booting a different image?
Shared GP
– use existing/new PlanetLab framework
Dedicated GP
– legacy PlanetLab node
– other

IPv4
Create the default IPv4 Meta-Router, initially in the non-forwarding state.
– Register MetaNet: output Meta-Net ID = MNid
– Instantiate IPv4 router: output Meta-Router ID = MRid
Add interfaces for legacy IPv4 traffic:
– Substrate supports defining a default protocol handler (Meta-Router) for non-substrate traffic.
– for protocol=IPv4, send to IPv4 meta-router (specify the corresponding MPE).
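
The default protocol handler can be pictured as a small table consulted for frames that carry no substrate header. In the sketch below, keying the table on EtherType is an assumption (the slide only says "protocol=IPv4"), and the class, method names and MRid/MPid values are hypothetical.

```python
ETHERTYPE_IPV4 = 0x0800   # standard EtherType value for IPv4


class DefaultHandlers:
    """Hypothetical table of default handlers for non-substrate traffic."""

    def __init__(self):
        self.handlers = {}   # EtherType -> (MRid, MPid) of the handling MPE

    def set_default_handler(self, ethertype, mrid, mpid):
        self.handlers[ethertype] = (mrid, mpid)

    def target_for_legacy_frame(self, ethertype):
        """Return the (MRid, MPid) to deliver to, or None if no handler is set."""
        return self.handlers.get(ethertype)


defaults = DefaultHandlers()
# Example ids only: legacy IPv4 goes to MPE 3 of the IPv4 meta-router (MRid 1).
defaults.set_default_handler(ETHERTYPE_IPV4, mrid=1, mpid=3)
```

Frames whose protocol has no registered handler return `None`; what the substrate does with them is not specified in the slides.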

General Control/Management
Meta-routers use the Base channel to send requests to the control entity on associated MPE devices
Node manager sends requests to central substrate manager (xml-rpc?)
– requests to configure, start/stop and tear down meta-routers (MPEs and MIs).
Substrate enforces isolation and policies, and monitors meta-router sending rates.
– Rate exceeded error: if an MPE violates its rate limits then its interface is disabled and the control MPE is notified (over the Base channel).
Shared NP
– XScale daemon
– requests: start/stop forwarding; allocate shared memory for table; get/set statistic counters; set/alter MR control lock; add/remove lookup table entries.
– Lookup entries can be added to send data packets to the control MPE; the packet header may contain a tag to indicate the reason the packet was sent
– mechanism for allocating space for MR-specific code segments.
Dedicated NP
– MPE controls XScale. When the XScale boots, a control daemon is told to load a specific image containing user code.
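
The "rate exceeded" behaviour can be sketched with a token bucket. The slides do not say how sending rates are measured, so the token bucket here is a stand-in technique chosen for illustration; the class name, parameters and the notification callback are likewise hypothetical.

```python
class RateLimitedInterface:
    """Token-bucket sketch: disable the interface on the first violation."""

    def __init__(self, rate_bps, burst_bytes, notify_control):
        self.rate_bps = rate_bps              # sustained rate in bits per second
        self.burst_bytes = burst_bytes        # bucket depth in bytes
        self.tokens = float(burst_bytes)      # bucket starts full
        self.last = 0.0                       # time of the previous send attempt
        self.enabled = True
        self.notify_control = notify_control  # stands in for a Base-channel message

    def send(self, now, nbytes):
        """Return True if the packet may be sent at time `now` (seconds)."""
        if not self.enabled:
            return False
        elapsed = now - self.last
        self.last = now
        refill = elapsed * self.rate_bps / 8  # bytes earned since last attempt
        self.tokens = min(self.burst_bytes, self.tokens + refill)
        if nbytes > self.tokens:
            self.enabled = False              # violation: disable the interface
            self.notify_control("rate exceeded")
            return False
        self.tokens -= nbytes
        return True


events = []
iface = RateLimitedInterface(rate_bps=8_000, burst_bytes=1_500,
                             notify_control=events.append)
results = [iface.send(0.0, 1_000), iface.send(0.1, 1_000), iface.send(5.0, 100)]
```

In this run the first packet fits in the initial burst, the second arrives before enough tokens have accumulated and disables the interface, and the third is refused because the interface stays disabled until re-enabled by some control action.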

ARP for Access Networks
The substrate offers an ARP service to meta-routers
Meta-router responsibilities:
– before enabling an interface, must register the meta-network address associated with the meta-interface
– send destination (next-hop) meta-net address with packets (part of substrate internal header). Substrate will use ARP with this value.
– if the meta-router wants to use a multicast or broadcast address then it must also supply the link-layer destination address. So the substrate must also export the link-layer type.
Substrate responsibilities:
– all substrate nodes on an access network must agree on meta-net identifiers (MLIs)
– issue ARP requests/responses using supplied meta-net addresses and meta-net id (MLI).
– maintain ARP table and time out entries according to relevant RFCs.
– ARP Failed error: if ARP fails for a supplied address then the substrate must send the packet (or packet context) to the control MPE of the meta-router.
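
The substrate side of this ARP service can be sketched as a table keyed on (MLI, meta-net address) with entry timeouts and a failure path to the control MPE. All names are hypothetical, the timeout value is arbitrary, and the sketch treats a missing or expired entry as an immediate failure rather than issuing a request and waiting for a reply.

```python
class SubstrateArp:
    """Hypothetical ARP table kept by the substrate for one access network."""

    def __init__(self, timeout_s, to_control_mpe):
        self.timeout_s = timeout_s
        self.to_control_mpe = to_control_mpe   # delivers failures to the control MPE
        self.table = {}                        # (mli, meta_addr) -> (link_addr, expiry)

    def learn(self, mli, meta_addr, link_addr, now):
        """Record a resolved address, e.g. from an ARP response."""
        self.table[(mli, meta_addr)] = (link_addr, now + self.timeout_s)

    def resolve(self, mli, next_hop, packet, now):
        """Return the link-layer address, or hand the packet to the control MPE."""
        entry = self.table.get((mli, next_hop))
        if entry is not None and entry[1] > now:
            return entry[0]
        self.table.pop((mli, next_hop), None)  # drop the expired entry, if any
        # Simplification: no ARP request is sent; report failure straight away.
        self.to_control_mpe(("ARP failed", mli, next_hop, packet))
        return None


failed = []
arp = SubstrateArp(timeout_s=60, to_control_mpe=failed.append)
arp.learn(mli=9, meta_addr="node-a", link_addr="00:00:00:00:00:01", now=0)
hit = arp.resolve(9, "node-a", b"payload", now=10)
miss = arp.resolve(9, "node-a", b"payload", now=61)
```

Keying on the MLI as well as the address keeps the tables of different meta-networks separate, which is why all substrate nodes on the access network need to agree on the MLI values.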