VINI: Virtual Network Infrastructure
Nick Feamster (Georgia Tech); Andy Bavier, Mark Huang, Larry Peterson, Jennifer Rexford (Princeton University)
VINI Overview
VINI:
– Runs real routing software
– Exposes realistic network conditions
– Gives control over network events
– Carries traffic on behalf of real users
– Is shared among many experiments
[Figure: a spectrum running from simulation, emulation, and small-scale experiments to live deployment, with VINI filling the gap between them]
Goal: bridge the gap between lab experiments and live experiments at scale.
Goal: Control and Realism
– Control
  – Reproduce results
  – Methodically change or relax constraints
– Realism
  – Long-running services attract real users
  – Connectivity to the real Internet
  – Forward high traffic volumes (Gb/s)
  – Handle unexpected events
Realism (live network) vs. control (lab):
– Topology: actual network vs. arbitrary, emulated topology
– Traffic: real clients and servers vs. synthetic traffic or traces
– Network events: observed in the operational network vs. injected faults and anomalies
Overview
– VINI characteristics
  – Fixed, shared infrastructure
  – Flexible network topology
  – Expose/inject network events
  – External connectivity and routing adjacencies
– PL-VINI: prototype on PlanetLab
– Preliminary experiments
– Ongoing work
Fixed Infrastructure
Shared Infrastructure
Arbitrary Virtual Topologies
Exposing and Injecting Failures
Carry Traffic for Real End Users
[Figure: server (s) and client (c)]
Participate in Internet Routing
[Figure: server (s), client (c), and BGP sessions]
PL-VINI: Prototype on PlanetLab
– First experiment: Internet In A Slice
  – XORP open-source routing protocol suite (NSDI 2005)
  – Click modular router (TOCS 2000, SOSP 1999)
– Goal: clarify the issues that VINI must address
  – Unmodified routing software on a virtual topology
  – Forwarding packets at line speed
  – Illusion of dedicated hardware
  – Injection of faults and other events
PL-VINI: Prototype on PlanetLab
– PlanetLab: testbed for planetary-scale services
– Simultaneous experiments run in separate VMs
  – Each experiment has root access in its own VM and can customize it
– CPU and network capacity can be reserved per VM
[Figure: a PlanetLab node, with a Virtual Machine Monitor (VMM, "Linux++") hosting the Node Manager, Local Admin, and VM 1 through VM n]
XORP: Control Plane
– Protocols: BGP, OSPF, RIP, PIM-SM, IGMP/MLD
– Goal: run real routing protocols on virtual network topologies
[Figure: XORP (routing protocols)]
User-Mode Linux: Environment
– Interface to the network
– PlanetLab limitation: a slice cannot create new interfaces
– Run routing software in a UML environment
– Create virtual network interfaces in UML
[Figure: XORP (routing protocols) running in UML with virtual interfaces eth0 through eth3]
Click: Data Plane
– Performance
  – Avoid UML overhead
  – Move to kernel, FPGA
– Interfaces map to tunnels
  – Click UDP tunnels correspond to UML network interfaces
– Filters
  – Fail a link by blocking packets at the tunnel
[Figure: XORP (routing protocols) in UML with interfaces eth0 through eth3 on the control path; the Click packet forwarding engine with UmlSwitch element, tunnel table, and filters on the data path]
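The filter idea can be modeled in a few lines. This is a minimal Python sketch, not Click configuration: it assumes each UML interface maps to exactly one UDP tunnel endpoint, and the class `TunnelTable`, its methods, the peer address, and UDP port 4001 are all hypothetical. Failing a link installs a filter that drops every packet headed into that tunnel, so the routing software inside UML simply sees the link go silent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (remote tunnel IP, remote UDP port)


@dataclass
class TunnelTable:
    """Hypothetical model of a Click tunnel table with per-tunnel filters."""

    tunnels: Dict[str, Endpoint] = field(default_factory=dict)
    failed: Set[str] = field(default_factory=set)

    def add_tunnel(self, iface: str, remote_ip: str, remote_port: int) -> None:
        # Each UML interface (eth1, eth2, ...) corresponds to one UDP tunnel.
        self.tunnels[iface] = (remote_ip, remote_port)

    def fail_link(self, iface: str) -> None:
        # Install a filter that blocks all packets on this tunnel.
        self.failed.add(iface)

    def restore_link(self, iface: str) -> None:
        self.failed.discard(iface)

    def send(self, iface: str, payload: bytes) -> Optional[Tuple[Endpoint, bytes]]:
        """Return (endpoint, payload) to transmit, or None if the filter drops it.

        Only the outbound direction is filtered in this sketch.
        """
        if iface in self.failed:
            return None
        return self.tunnels[iface], payload


table = TunnelTable()
table.add_tunnel("eth1", "198.51.100.7", 4001)
table.fail_link("eth1")
print(table.send("eth1", b"hello"))  # None: the virtual link is "down"
table.restore_link("eth1")
print(table.send("eth1", b"hello"))  # delivered to the tunnel peer again
```

Because failure is just a filter on the tunnel, no process in UML has to be modified or killed, which is what lets the experimenter inject link failures into unmodified routing software.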
Intra-domain Route Changes
[Figure: server (s) and client (c)]
Ping During Link Failure
[Figure: ping results over time, annotated "link down", "routes converging", and "link up"]
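The convergence visible in the ping trace comes from the link-state routing protocol (OSPF in Internet In A Slice) recomputing shortest paths after a failure. The sketch below is plain Dijkstra on a hypothetical three-node topology (`A`, `B`, `C`, with made-up link costs), not XORP's OSPF implementation; it shows how failing one link changes the first hop toward a destination.

```python
import heapq
from typing import Dict, Optional, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link cost}


def shortest_paths(graph: Graph, src: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Dijkstra's algorithm: map each reachable node to (cost, first hop from src)."""
    dist = {src: 0}
    first_hop: Dict[str, Optional[str]] = {src: None}
    heap = [(0, src)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                # Neighbors of the source are their own first hop.
                first_hop[nbr] = nbr if node == src else first_hop[node]
                heapq.heappush(heap, (nd, nbr))
    return {n: (dist[n], first_hop[n]) for n in dist}


def fail_link(graph: Graph, a: str, b: str) -> Graph:
    """Return a copy of the graph with the a-b link removed in both directions."""
    g = {n: dict(nbrs) for n, nbrs in graph.items()}
    g[a].pop(b, None)
    g[b].pop(a, None)
    return g


topo = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"A": 5, "B": 1}}
print(shortest_paths(topo, "A")["C"])                       # (2, 'B')
print(shortest_paths(fail_link(topo, "A", "B"), "A")["C"])  # (5, 'C')
```

In the live experiment, the window between "link down" and the new route being installed is where pings are lost; the sketch only captures the end state of that recomputation, not protocol timers.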
Close-Up of TCP Transfer
[Figure: a TCP transfer annotated with slow start and the retransmission of a lost packet]
PL-VINI enables a user-space virtual network to behave like a real network on PlanetLab.
Challenge: Attracting Real Users
– We could have run these experiments on Emulab
– Goal: operate our own virtual network
  – Carrying traffic for actual users
  – So that we can tinker with the routing protocols
– The hard part: attracting real users
Conclusion
– VINI: controlled, realistic experimentation
– Installing VINI nodes in NLR and Abilene
– Download and run Internet In A Slice
TCP Throughput
[Figure: TCP throughput over time, annotated "link down" and "link up", with a zoomed-in view]
Ongoing Work
– Improving realism
  – Exposing network failures and changes in the underlying topology
  – Participating in routing with neighboring networks
– Improving control
  – Better isolation
  – Experiment specification
Resource Isolation
– Issue: forwarding packets in user space
  – PlanetLab sees heavy use
  – CPU load affects virtual network performance
– Property: what it depends on; solution
  – Throughput: depends on the CPU % received; PlanetLab provides CPU reservations
  – Latency: depends on CPU scheduling delay; PL-VINI boosts the priority of the packet-forwarding process
Performance Is Poor
– User-space Click forwards only about 200 Mb/s
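To compare this bit rate with the packet rates (kpps) used for kernel forwarding results, it helps to convert between the two. This sketch assumes fixed packet sizes of 64 and 1500 bytes (our choice, not figures from the measurements) and ignores Ethernet framing overhead.

```python
def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Packet rate needed to carry a bit rate, ignoring link-layer framing overhead."""
    return bits_per_second / (packet_bytes * 8)


def per_packet_budget_us(bits_per_second: float, packet_bytes: int) -> float:
    """Average time available to process one packet, in microseconds."""
    return 1e6 / packets_per_second(bits_per_second, packet_bytes)


for size in (64, 1500):
    pps = packets_per_second(200e6, size)
    budget = per_packet_budget_us(200e6, size)
    print(f"{size}-byte packets: {pps / 1e3:.1f} kpps, {budget:.2f} us per packet")
```

At 200 Mb/s this is about 16.7 kpps (60 µs per packet) for 1500-byte packets, or 390.6 kpps (2.56 µs per packet) for 64-byte packets, which shows how little per-packet time a user-space forwarder gets once it must share the CPU.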
VINI should use Xen
Experimental Results
– Is a VINI feasible?
  – Click in user space: 200 Mb/s forwarded
  – Latency and jitter of IIAS on PL-VINI are comparable to those of the underlying network
(Speaker notes: Say something about running on just PlanetLab? Don't spend much time talking about CPU scheduling.)
Low Latency for Everyone?
– PL-VINI provided IIAS with low latency by giving it high CPU scheduling priority
Internet In A Slice
– XORP: runs OSPF and configures the FIB
– Click: holds the FIB and tunnels, and injects faults
– OpenVPN and NAT: connect clients and servers
[Figure: clients (C) and servers (S) attached to the IIAS overlay]
PL-VINI / IIAS Router
– Blue: topology
  – Virtual net devices
  – Tunnels
– Red: routing and forwarding
  – Data traffic does not enter UML
– Green: enter and exit the IIAS overlay
[Figure: XORP in UML with interfaces eth0 through eth3 on the control path; Click with UmlSwitch element, FIB, encapsulation table, and tap0 on the data path]
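The red path (routing and forwarding) amounts to two lookups per packet: a longest-prefix match in the FIB that XORP populates, then a lookup in the encapsulation table that maps the chosen virtual interface to a tunnel peer. This is a Python model under stated assumptions, not Click configuration; the class name, prefixes, peer addresses, and UDP ports are hypothetical, and a linear scan stands in for a faster lookup structure.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


class VirtualNodeFib:
    """Hypothetical per-virtual-node forwarding state: a FIB plus an encapsulation table."""

    def __init__(self) -> None:
        self.routes: List[Tuple[ipaddress.IPv4Network, str]] = []
        self.encap: Dict[str, Tuple[str, int]] = {}  # iface -> (tunnel peer IP, UDP port)

    def add_route(self, prefix: str, iface: str) -> None:
        # In IIAS, entries like these would be installed by XORP after OSPF converges.
        self.routes.append((ipaddress.IPv4Network(prefix), iface))

    def add_tunnel(self, iface: str, peer_ip: str, port: int) -> None:
        self.encap[iface] = (peer_ip, port)

    def lookup(self, dst: str) -> Optional[str]:
        """Longest-prefix match: the most specific matching route wins."""
        addr = ipaddress.IPv4Address(dst)
        matches = [(net.prefixlen, iface) for net, iface in self.routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0])[1]

    def forward(self, dst: str) -> Optional[Tuple[str, Tuple[str, int]]]:
        """Return (outgoing virtual interface, tunnel endpoint), or None if unroutable."""
        iface = self.lookup(dst)
        if iface is None:
            return None
        return iface, self.encap[iface]


fib = VirtualNodeFib()
fib.add_route("10.0.0.0/8", "eth1")
fib.add_route("10.1.0.0/16", "eth2")
fib.add_tunnel("eth1", "198.51.100.1", 4001)
fib.add_tunnel("eth2", "198.51.100.2", 4002)
print(fib.forward("10.1.2.3"))  # ('eth2', ('198.51.100.2', 4002))
```

Keeping both tables in Click is what lets data traffic bypass UML entirely; only routing-protocol messages need to reach XORP.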
PL-VINI Summary (requirement: mechanism)
– Flexible network topology
  – Virtual point-to-point connectivity: tunnels in Click
  – Unique interfaces per experiment: virtual network devices in UML
  – Exposure of topology changes: upcalls of layer-3 alarms
– Flexible routing and forwarding
  – Per-node forwarding table: separate Click per virtual node
  – Per-node routing process: separate XORP per virtual node
– Connectivity to external hosts
  – End hosts can direct traffic through VINI: connect to an OpenVPN server
  – Return traffic flows through VINI: NAT in Click on the egress node
– Support for simultaneous experiments
  – Isolation between experiments: PlanetLab VMs and network isolation; CPU reservations and priorities
  – Distinct external routing adjacencies: BGP multiplexer for external sessions
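The NAT row explains why replies come back through VINI: the egress node rewrites a client's source address to its own, so the Internet sends responses to the egress node. A minimal port-translating NAT sketch, assuming one public address per egress node and a hypothetical starting port of 40000; the class `EgressNat` and the client addresses are illustrative, and the sketch ignores protocol type, timeouts, and port exhaustion.

```python
from typing import Dict, Optional, Tuple

Flow = Tuple[str, int]  # (IP address, port)


class EgressNat:
    """Hypothetical port-translating NAT on the IIAS egress node."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound_map: Dict[Flow, int] = {}  # client flow -> public port
        self.inbound_map: Dict[int, Flow] = {}   # public port -> client flow

    def translate_out(self, src_ip: str, src_port: int) -> Flow:
        """Rewrite a client's source to the egress node's public address and a mapped port."""
        flow = (src_ip, src_port)
        if flow not in self.outbound_map:
            self.outbound_map[flow] = self.next_port
            self.inbound_map[self.next_port] = flow
            self.next_port += 1
        return self.public_ip, self.outbound_map[flow]

    def translate_in(self, dst_port: int) -> Optional[Flow]:
        """Map a reply arriving at the public port back to the client; None if unknown."""
        return self.inbound_map.get(dst_port)


nat = EgressNat("192.0.2.10")
public = nat.translate_out("10.8.0.2", 51000)  # hypothetical client behind OpenVPN
print(public)                                  # ('192.0.2.10', 40000)
print(nat.translate_in(public[1]))             # ('10.8.0.2', 51000)
```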
PL-VINI / IIAS Router
– XORP: control plane
– UML: environment
  – Virtual interfaces
– Click: data plane
  – Performance: avoid UML overhead; move to kernel, FPGA
  – Interfaces map to tunnels
  – Fail a link
[Figure: XORP (routing protocols) in UML with interfaces eth0 through eth3 on the control path; the Click packet forwarding engine with UmlSwitch element and tunnel table on the data path]
Trellis
– Same abstractions as PL-VINI
  – Virtual hosts and links
  – Push performance, ease of use
– Full network-stack virtualization
– Run XORP, Quagga in a slice
  – Support data plane in kernel
– Approach native Linux kernel performance (15x PL-VINI)
– Be an early adopter of new Linux virtualization work
[Figure: a Trellis virtual host (application in user space; kernel FIB and virtual NICs) connected through bridges, shapers, and EGRE tunnels in the Trellis substrate]
Virtual Hosts
– Use container-based virtualization
  – Xen, VMware: poor scalability, performance
– Option #1: Linux Vserver
  – Containers without network virtualization
  – PlanetLab slices share a single IP address and port space
– Option #2: OpenVZ
  – Mature container-based approach
  – Roughly equivalent to Vserver
  – Has full network virtualization
Network Containers for Linux
– Create multiple copies of the TCP/IP stack
– Per network container:
  – Kernel IPv4 and IPv6 routing tables
  – Physical or virtual interfaces
  – iptables, traffic shaping, sysctl.net variables
– Trellis: marry Vserver + NetNS
  – Be an early adopter of the new interfaces
  – Otherwise stay close to PlanetLab
Virtual Links: EGRE Tunnels
– Virtual Ethernet links
– Make minimal assumptions about the physical network between Trellis nodes
– Trellis: tunnel Ethernet over GRE over IP
  – Already a standard, but with no Linux implementation at the time
– Other approaches
  – VLANs, MPLS, other network circuits or tunnels
  – These fit into our framework
[Figure: a Trellis virtual host (application; kernel FIB and virtual NICs) connected to EGRE tunnels in the Trellis substrate]
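Ethernet over GRE prepends a GRE header whose protocol type is 0x6558 (Transparent Ethernet Bridging) to the entire Ethernet frame, and the result travels inside an IPv4 packet with protocol number 47. A Python sketch of only the GRE layer, assuming the optional key field (RFC 2890) could be used to tell virtual links apart; that use of the key is our assumption, not a description of the Trellis module, and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

GRE_PROTO_TEB = 0x6558    # GRE protocol type: Transparent Ethernet Bridging
GRE_FLAG_CSUM = 0x8000    # C bit (RFC 2784)
GRE_FLAG_KEY = 0x2000     # K bit (RFC 2890)
GRE_FLAG_SEQ = 0x1000     # S bit (RFC 2890)
GRE_VERSION_MASK = 0x0007


def egre_encapsulate(frame: bytes, key: Optional[int] = None) -> bytes:
    """Prepend a GRE header to a full Ethernet frame.

    The outer IPv4 header (protocol 47) is not built here.
    """
    if key is None:
        return struct.pack("!HH", 0, GRE_PROTO_TEB) + frame
    return struct.pack("!HHI", GRE_FLAG_KEY, GRE_PROTO_TEB, key) + frame


def egre_decapsulate(packet: bytes) -> Tuple[bytes, Optional[int]]:
    """Strip the GRE header and return (Ethernet frame, key or None)."""
    flags, proto = struct.unpack_from("!HH", packet, 0)
    if proto != GRE_PROTO_TEB:
        raise ValueError(f"not an Ethernet-over-GRE packet: 0x{proto:04x}")
    if flags & (GRE_FLAG_CSUM | GRE_FLAG_SEQ | GRE_VERSION_MASK):
        raise ValueError("checksum, sequence numbers, and nonzero versions not handled here")
    if flags & GRE_FLAG_KEY:
        (key,) = struct.unpack_from("!I", packet, 4)
        return packet[8:], key
    return packet[4:], None


frame = bytes(14) + b"payload"          # placeholder Ethernet header plus payload
pkt = egre_encapsulate(frame, key=7)
print(pkt[:8].hex())                    # GRE header with key
print(egre_decapsulate(pkt)[1])         # 7
```

Because the tunnel carries whole Ethernet frames, the virtual host sees an ordinary Ethernet link and the physical path between Trellis nodes only needs to carry IP.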
Tunnel Termination
– Where should the EGRE tunnel interface live?
– Inside the container: better performance
– Outside the container: more flexibility
  – Transparently change the implementation
  – Process and shape traffic between the container and the tunnel
  – User cannot manipulate the tunnel or shapers
– Trellis: terminate the tunnel outside the container
Glue: Bridging
– How to connect virtual hosts to tunnels?
  – Connecting two Ethernet interfaces
– Linux software bridge
  – Ethernet bridge semantics; can create point-to-multipoint (P2M) links
  – Relatively poor performance
– Common case: point-to-point (P2P) links
– Trellis
  – Use the Linux bridge for P2M links
  – Create a new shortbridge for P2P links
Glue: Bridging
– How to connect virtual hosts to EGRE tunnels?
  – Two Ethernet interfaces
– Linux software bridge
  – Ethernet bridge semantics
  – Supports P2M links
  – Relatively poor performance
– Common case: P2P links
– Trellis
  – Use the Linux bridge for P2M links
  – New, optimized shortbridge module for P2P links
[Figure: a Trellis virtual host connected through bridge*, shaper, and EGRE tunnel in the Trellis substrate]
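The performance argument for a shortbridge is that a point-to-point link needs none of a bridge's per-frame bookkeeping. The Python model below contrasts a simplified learning bridge (MAC learning plus flooding) with a two-port pass-through; it is an illustration, not the Linux bridge or the Trellis shortbridge module, and the class names, port names, and MAC addresses are hypothetical.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Simplified Ethernet bridge: supports point-to-multipoint links."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = list(ports)
        self.mac_table: Dict[str, str] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out is None:
            # Broadcast or unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]


class ShortBridge:
    """Point-to-point glue: exactly two ports, no MAC table, no flooding decision."""

    def __init__(self, port_a: str, port_b: str) -> None:
        self.peer = {port_a: port_b, port_b: port_a}

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        return [self.peer[in_port]]


br = LearningBridge(["vnic0", "egre1", "egre2"])
print(br.receive("vnic0", "02:00:00:00:00:01", "02:00:00:00:00:02"))  # flood
print(br.receive("egre1", "02:00:00:00:00:02", "02:00:00:00:00:01"))  # learned
sb = ShortBridge("vnic0", "egre1")
print(sb.receive("vnic0", "02:00:00:00:00:01", "02:00:00:00:00:02"))  # ['egre1']
```

The bridge must update and consult its table on every frame; the shortbridge's output is fixed by its input port, which is why it suits the common P2P case.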
IPv4 Packet Forwarding
– Trellis reaches 2/3 of native performance, 10x faster than PL-VINI
[Figure: forwarding rate (kpps)]
Virtualized Data Plane in Hardware
– Software provides flexibility, but poor performance and often inadequate isolation
– Idea: forward packets exclusively in hardware
  – Platform: OpenVZ over NetFPGA
  – Challenge: share common functions while isolating functions that are specific to each virtual network
Accelerating the Data Plane
– Virtual environments in OpenVZ
– Interface to NetFPGA based on the Stanford reference router
Control Plane
– Virtual environments
  – Virtualize the control plane by running multiple virtual environments on the host (same as in Trellis)
  – Routing table updates pass through a security daemon
  – Root user updates the VMAC-VE table
– Hardware access control
  – VMAC-VE table/VE-ID controls access to hardware
– Control register
  – Used to multiplex each VE to the appropriate hardware
Virtual Forwarding Table Mapping
Share Common Functions
– Common functions
  – Packet decoding
  – Calculating checksums
  – Decrementing TTLs
  – Input arbitration
– VE-specific functions
  – FIB
  – IP lookup table
  – ARP table
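The split between common and VE-specific functions can be modeled in software. A Python sketch, assuming a 20-byte IPv4 header with no options and a dictionary standing in for the VMAC-VE table; the function names, MAC addresses, prefixes, and port numbers are hypothetical, and the real design implements these stages in NetFPGA hardware. One TTL/checksum stage serves every VE, while the route lookup consults only the FIB of the VE selected by the destination virtual MAC.

```python
import ipaddress
import struct
from typing import Dict, Optional


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071); 0 for a valid header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def shared_ttl_stage(header: bytearray) -> bool:
    """Common function: decrement TTL and refresh the checksum. False means drop."""
    if header[8] <= 1:
        return False
    header[8] -= 1
    header[10:12] = b"\x00\x00"
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return True


def forward(vmac_to_ve: Dict[str, int], ve_fibs: Dict[int, Dict[str, int]],
            dst_mac: str, header: bytearray) -> Optional[int]:
    """Return the output port for a 20-byte IPv4 header, or None to drop."""
    ve = vmac_to_ve.get(dst_mac)  # access control: only registered VMACs reach a VE
    if ve is None or not shared_ttl_stage(header):
        return None
    dst = ipaddress.IPv4Address(bytes(header[16:20]))
    best_len, best_port = -1, None
    for prefix, port in ve_fibs.get(ve, {}).items():  # VE-specific: this VE's FIB only
        net = ipaddress.IPv4Network(prefix)
        if dst in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                            bytes([192, 0, 2, 1]), bytes([10, 1, 2, 3])))
hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
vmac_to_ve = {"02:00:00:00:00:01": 1}
ve_fibs = {1: {"10.0.0.0/8": 2, "10.1.0.0/16": 3}}
print(forward(vmac_to_ve, ve_fibs, "02:00:00:00:00:01", hdr))  # 3 (most specific route)
```

Only the tables differ per VE, so the logic for decoding, checksums, and TTL handling is instantiated once; this is the kind of sharing the efficiency numbers refer to.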
Forwarding Performance
Efficiency
– 53K logic cells
– 202 units of block RAM
– Sharing common elements saves up to 75% over independent physical routers
Conclusion
– Virtualization allows physical hardware to be shared among many virtual networks
– Tradeoffs: sharing, performance, and isolation
– Two approaches
  – Trellis: kernel-level packet forwarding (10x packet-forwarding rate improvement vs. PL-VINI)
  – NetFPGA-based forwarding for virtual networks (same forwarding rate as a NetFPGA-based router, with 75% improvement in hardware resource utilization)
Accessing Services in the Cloud
[Figure: a cloud data center router connects an interactive service and a bulk-transfer service to the Internet through ISP1 and ISP2, exchanging routing updates and packets]
– Hosted services have different requirements; a single route may be
  – Too slow for the interactive service, or
  – Too costly for bulk transfer!
Cloud Routing Today
– Multiple upstream ISPs
  – Amazon EC2 has at least 58 routing peers in its Virginia data center
– The data center router picks one route to a destination for all hosted services
  – Packets from all hosted applications use the same path
Route Control: Cloudless Solution
– Obtain connectivity to upstream ISPs
  – Physical connectivity
  – Contracts and routing sessions
– Obtain Internet number resources (IP address blocks, AS numbers) from the authorities
– Expensive and time-consuming!
Routing with Transit Portal (TP)
[Figure: the interactive and bulk-transfer services in the cloud data center each run their own virtual router (A and B), which exchange routes and packets with ISP1 and ISP2 through the Transit Portal]
– Full Internet route control for hosted cloud services!
Outline
– Motivation and overview
– Connecting to the Transit Portal
– Advanced Transit Portal applications
– Scaling the Transit Portal
– Future work and summary
Connecting to the TP
– Separate Internet router for each service
  – Virtual or physical routers
– Links between the service router and the TP
  – Each link emulates a connection to an upstream ISP
– Routing sessions to upstream ISPs
  – TP exposes the standard BGP route-control interface
Basic Internet Routing with TP
– Cloud client with two upstream ISPs
  – ISP 1 is preferred
– ISP 1 exhibits excessive jitter
– Cloud client reroutes through ISP 2
[Figure: an interactive cloud service's virtual BGP router holds BGP sessions through the Transit Portal to ISP 1 and ISP 2; traffic moves from ISP 1 to ISP 2]
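Because TP exposes a standard BGP interface, the client can steer its outbound traffic with ordinary BGP policy, and local preference is one common knob. The sketch models only the first two steps of the BGP decision process (highest local preference, then shortest AS path); real routers apply further tie-breakers. The prefix (from the 203.0.113.0/24 documentation block), the AS numbers (from the documentation range), and the preference values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str                    # upstream ISP whose session supplied the route
    as_path: Tuple[int, ...]
    local_pref: int = 100       # a common default for local preference


def best_route(routes: Iterable[Route]) -> Route:
    """First two BGP tie-breakers only: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


isp1 = Route("203.0.113.0/24", "ISP 1", (64500, 64510), local_pref=200)
isp2 = Route("203.0.113.0/24", "ISP 2", (64501, 64502, 64510))
print(best_route([isp1, isp2]).via)  # ISP 1 (preferred)

# ISP 1 shows excessive jitter: the client lowers its preference, and BGP picks ISP 2.
isp1 = replace(isp1, local_pref=50)
print(best_route([isp1, isp2]).via)  # ISP 2
```

The client changes only its own policy; TP keeps relaying both ISPs' routes, which is what makes the reroute possible without renegotiating any upstream session.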
Current TP Deployment
– Server with custom routing software
  – 4 GB RAM, 2x 2.66 GHz Xeon cores
– Three active sites with upstream ISPs
  – Atlanta, Madison, and Princeton
– A number of active experiments
  – BGP poisoning (University of Washington)
  – IP anycast (Princeton University)
  – Advanced Networking class (Georgia Tech)
TP Applications: Fast DNS
– Internet services require fast name resolution
– IP anycast for name resolution
  – DNS servers share the same IP address
  – The IP address is announced to ISPs in multiple locations
  – Internet routing converges to the closest server
– Available only to large organizations
TP Applications: Fast DNS
[Figure: a name service announces anycast routes via the Transit Portal at sites in Asia and North America, reaching ISP1 through ISP4]
– TP allows hosted applications to use IP anycast
TP Applications: Service Migration
– Internet services run in geographically diverse data centers
– Operators migrate Internet users' connections
– Two conventional methods
  – DNS name re-mapping: slow
  – Virtual machine migration with local re-routing: requires a globally routed network
TP Applications: Service Migration
[Figure: an active game service reachable from the Internet via tunneled sessions through Transit Portal sites in North America and Asia, connected to ISP1 through ISP4]
Scaling the Transit Portal
– Scale to dozens of sessions to ISPs and hundreds of sessions to hosted services
– At the same time
  – Present each client with sessions that appear to be direct connections to an ISP
  – Prevent clients from abusing Internet routing protocols
Conventional BGP Routing
– A conventional BGP router
  – Receives routing updates from peers
  – Propagates routing updates about one path only
  – Selects one path to forward packets
– Scalable, but not transparent or flexible
[Figure: a BGP router receives updates from ISP1 and ISP2 and passes a single path to a client BGP router, which sends packets]
Scaling BGP Memory Use
– Store and propagate all BGP routes from ISPs
  – Separate routing tables
– Reduce memory consumption
  – Single routing process with shared data structures
  – Reduces memory use from 90 MB/ISP to 60 MB/ISP
[Figure: a virtual router with routing tables 1 and 2 for ISP1 and ISP2, serving bulk-transfer and interactive services]
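A single process with separate per-ISP tables can save memory by storing identical data once. The sketch shows one such technique, interning identical BGP attribute sets so that many prefixes point to one shared copy; whether TP uses exactly this scheme is an assumption, and the class and method names, prefixes, AS numbers, and next hops are hypothetical.

```python
from typing import Dict, List, Tuple

Attrs = Tuple[Tuple[int, ...], str]  # (AS path, next hop)


class SharedRoutingProcess:
    """One routing process: a separate table per ISP, but one shared attribute pool."""

    def __init__(self) -> None:
        self._attr_pool: Dict[Attrs, Attrs] = {}
        self.tables: Dict[str, Dict[str, Attrs]] = {}

    def _intern(self, attrs: Attrs) -> Attrs:
        # Reuse the stored copy when an identical attribute set already exists.
        return self._attr_pool.setdefault(attrs, attrs)

    def update(self, isp: str, prefix: str, as_path: List[int], next_hop: str) -> None:
        attrs = self._intern((tuple(as_path), next_hop))
        self.tables.setdefault(isp, {})[prefix] = attrs

    def distinct_attribute_sets(self) -> int:
        return len(self._attr_pool)


proc = SharedRoutingProcess()
proc.update("ISP1", "203.0.113.0/24", [64500, 64510], "192.0.2.1")
proc.update("ISP1", "198.51.100.0/24", [64500, 64510], "192.0.2.1")
proc.update("ISP2", "203.0.113.0/24", [64501, 64510], "192.0.2.2")
print(proc.distinct_attribute_sets())  # 2: the two ISP1 prefixes share one copy
```

Each ISP still has its own table, so every route from every ISP can be propagated to clients, but the per-route cost drops to a reference to shared data.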
Scaling BGP CPU Use
– Hundreds of routing sessions to clients
  – High CPU load
– Schedule and send routing updates in bundles
  – Reduces CPU use from 18% to 6% for 500 client sessions
[Figure: a virtual router with routing tables 1 and 2 for ISP1 and ISP2, serving bulk-transfer and interactive services]
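Bundling reduces CPU load because per-session work happens once per interval instead of once per update, and superseded updates for the same prefix are never sent. A minimal sketch, assuming a timer periodically calls `flush()`; routes are plain strings, and `UpdateBundler`, the session names, and the prefixes are hypothetical, not TP's actual scheduler.

```python
from typing import Dict, Iterable, Optional

Bundle = Dict[str, Optional[str]]  # prefix -> latest route (None = withdrawal)


class UpdateBundler:
    """Queue per-prefix updates and send them to all client sessions in one bundle."""

    def __init__(self, sessions: Iterable[str]) -> None:
        self.sessions = list(sessions)
        self.pending: Bundle = {}

    def enqueue(self, prefix: str, route: Optional[str]) -> None:
        # A newer update for the same prefix replaces the queued one.
        self.pending[prefix] = route

    def flush(self) -> Dict[str, Bundle]:
        """Called on a timer: return one bundle per session and clear the queue."""
        if not self.pending:
            return {}
        bundle = dict(self.pending)
        self.pending.clear()
        return {session: bundle for session in self.sessions}


bundler = UpdateBundler(["client1", "client2", "client3"])
bundler.enqueue("203.0.113.0/24", "via ISP1")
bundler.enqueue("203.0.113.0/24", "via ISP2")  # supersedes the previous update
bundler.enqueue("198.51.100.0/24", None)       # withdrawal
sent = bundler.flush()
print(len(sent), sent["client1"])
```

Three queued updates become one bundle per session, and the first update for 203.0.113.0/24 never reaches any client; with hundreds of sessions, that reduction is multiplied by the session count.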
Scaling Forwarding Memory for TP
– Connecting clients
  – Tunneling and VLANs
– Curbing memory usage
  – Separate virtual routing tables with a default route to the upstream ISP
  – Forwarding-table memory drops from 50 MB/ISP to about 0.1 MB/ISP
[Figure: a virtual BGP router with forwarding tables 1 and 2 for ISP1 and ISP2, serving bulk-transfer and interactive services]
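Since each client link emulates a connection to one upstream ISP, the TP can choose a forwarding table by ingress link, and that table needs only a default route toward the ISP rather than a full copy of its routes. A sketch under those assumptions; the class name, link names, and next-hop addresses are hypothetical.

```python
import ipaddress
from typing import Dict


class PerIspForwarding:
    """Hypothetical TP forwarding state: one tiny table per upstream ISP."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[ipaddress.IPv4Network, str]] = {}
        self.client_link_isp: Dict[str, str] = {}  # tunnel or VLAN -> ISP it emulates

    def add_isp(self, isp: str, upstream_next_hop: str) -> None:
        # Only a default route: the ISP's full routing table is not copied here.
        self.tables[isp] = {ipaddress.IPv4Network("0.0.0.0/0"): upstream_next_hop}

    def attach_client_link(self, link: str, isp: str) -> None:
        self.client_link_isp[link] = isp

    def next_hop(self, ingress_link: str, dst: str) -> str:
        """Pick the table by ingress link, then longest-prefix match within it."""
        table = self.tables[self.client_link_isp[ingress_link]]
        addr = ipaddress.IPv4Address(dst)
        net = max((n for n in table if addr in n), key=lambda n: n.prefixlen)
        return table[net]


fw = PerIspForwarding()
fw.add_isp("ISP1", "192.0.2.1")
fw.add_isp("ISP2", "198.51.100.1")
fw.attach_client_link("vlan10", "ISP1")
fw.attach_client_link("vlan20", "ISP2")
print(fw.next_hop("vlan10", "203.0.113.5"))  # 192.0.2.1
print(fw.next_hop("vlan20", "203.0.113.5"))  # 198.51.100.1
```

The route choice is still made by the client's own BGP router; the TP only has to honor it by sending each link's traffic to the matching upstream, which a one-entry table does.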