Rethinking Routers in the Age of Virtualization
Jennifer Rexford, Princeton University
Traditional View of a Router
A big, physical device…
– Processors
– Multiple links
– Switching fabric
… that directs Internet traffic
– Connects to other routers
– Computes routes
– Forwards packets
Times Are Changing
Backbone Links are Virtual
Flexible underlying transport network
– Layer-3 links are multi-hop paths at layer 2
[Figure: a layer-3 link realized as a layer-2 path among Chicago, New York, and Washington D.C.]
Routing Separate From Forwarding
Separation of functionality
– Control plane: computes paths
– Forwarding plane: forwards packets
[Figure: processor (control plane) above line cards and switching fabric (data plane)]
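The control/forwarding split can be sketched in a few lines: the control plane occasionally computes routes over the topology and installs next hops into a forwarding table, while the data plane does nothing but table lookups per packet. The toy topology, node names, and link costs below are illustrative, not from the talk.

```python
# Minimal sketch of the control-plane / data-plane split.
import heapq

def compute_routes(graph, source):
    """Control plane: Dijkstra shortest paths -> {dest: next_hop}."""
    dist, next_hop, pq = {source: 0}, {}, [(0, source, None)]
    while pq:
        d, node, first_hop = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue
        if node != source:
            next_hop[node] = first_hop
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(pq, (nd, nbr, first_hop if first_hop else nbr))
    return next_hop

def forward(fib, dest):
    """Data plane: a single table lookup, no route computation."""
    return fib.get(dest)

graph = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1}, "C": {"A": 4, "B": 1}}
fib = compute_routes(graph, "A")   # control plane: runs occasionally
print(forward(fib, "C"))           # data plane: runs per packet -> "B"
```

Because the data plane touches only the precomputed table, the control plane can be moved, restarted, or replicated without stopping packet forwarding, which is the property the rest of the talk builds on.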
Multiple Virtual Routers
Multiple virtual routers on the same physical one
– Virtual Private Networks (VPNs)
– Router consolidation for smaller footprint
[Figure: several control planes and data planes sharing one switching fabric]
Capitalizing on Virtualization
Simplify network management
– Hide planned changes in the physical topology
Improve router reliability
– Survive bugs in complex routing software
Deploy new value-added services
– Customized protocols in virtual networks
Enable new network business models
– Separate service providers from the infrastructure
What should the router “hypervisor” look like?
VROOM: Virtual Routers On the Move With Yi Wang, Eric Keller, Brian Biskeborn, and Kobus van der Merwe
The Two Notions of “Router”
IP-layer logical functionality, and physical equipment
[Figure: logical (IP-layer) topology mapped onto physical equipment]
Tight Coupling of Physical & Logical
Root of many network-management challenges (and “point solutions”)
[Figure: one-to-one mapping between logical and physical nodes]
VROOM: Breaking the Coupling
Re-mapping a logical node to another physical node
– VROOM enables this re-mapping of logical to physical through virtual router migration
[Figure: logical node re-mapped to a different physical node]
Case 1: Planned Maintenance
NO reconfiguration of VRs, NO reconvergence
[Figure: VR-1 migrates from physical node A to physical node B while A undergoes maintenance]
Case 2: Service Deployment/Evolution
Move a (logical) router to more powerful hardware
– VROOM guarantees seamless service to existing customers during the migration
Case 3: Power Savings
Electricity bills run to hundreds of millions of dollars per year
– Contract and expand the physical network according to the traffic volume
Virtual Router Migration: Challenges
1. Migrate an entire virtual router instance
– All control-plane processes & data-plane states
2. Minimize disruption
– Data plane: millions of packets/sec on a 10 Gbps link
– Control plane: less strict (with routing-message retransmission)
3. Migrate the links
VROOM Architecture
– Dynamic Interface Binding
– Data-Plane Hypervisor
VROOM’s Migration Process
Key idea: separate the migration of the control and data planes
1. Migrate the control plane
2. Clone the data plane
3. Migrate the links
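The three-step order can be sketched as below, with the key invariant that the old data plane keeps forwarding until the new one is fully populated, so packets are never dropped. The class and attribute names are illustrative, not from the VROOM implementation.

```python
# Sketch of VROOM's three-step virtual-router migration.
class Node:
    """A physical router that can host a control plane and a FIB."""
    def __init__(self, name):
        self.name, self.control_plane, self.fib = name, None, None

class Link:
    def __init__(self):
        self.endpoint = None
    def attach(self, node):
        self.endpoint = node

def migrate(routes, old, new, links):
    # Step 1: migrate the control plane; old's data plane still forwards.
    new.control_plane, old.control_plane = old.control_plane, None
    # Step 2: clone the data plane by repopulation (reinstall each route
    # into the new FIB rather than copying data-plane state wholesale).
    new.fib = dict(routes)
    # "Double data planes": both FIBs can now forward traffic.
    # Step 3: migrate the links one at a time (asynchronously).
    for link in links:
        link.attach(new)
    old.fib = None          # finally, retire the old data plane

old, new = Node("A"), Node("B")
old.control_plane, old.fib = "quagga-vr1", {"10.0.0.0/8": "if1"}
links = [Link(), Link()]
migrate(old.fib, old, new, links)
```

Doing step 2 by repopulation rather than raw state copy is what lets the migration cross heterogeneous data planes, as the cloning slide below explains.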
Control-Plane Migration
Leverage virtual server migration techniques
– Router image: binaries, configuration files, etc.
– Memory: 1st stage, iterative pre-copy; 2nd stage, stall-and-copy (while the control plane is “frozen”)
[Figure: control plane moves from physical router A to physical router B; the data plane remains on A]
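The two-stage memory copy (iterative pre-copy, then stall-and-copy) can be sketched as follows. The page store and the dirty-page callback are simulated here; this is not OpenVZ’s actual migration code.

```python
# Sketch of two-stage live memory migration for the control plane.
def live_migrate_memory(memory, get_dirty_pages, max_rounds=5, stall_threshold=4):
    copied = {}
    dirty = set(memory)                      # round 0: everything is "dirty"
    rounds = 0
    while rounds < max_rounds and len(dirty) > stall_threshold:
        for page in dirty:                   # stage 1: pre-copy while the
            copied[page] = memory[page]      # control plane keeps running
        dirty = get_dirty_pages()            # pages rewritten during the round
        rounds += 1
    for page in dirty:                       # stage 2: stall-and-copy the
        copied[page] = memory[page]          # remainder, control plane frozen
    return copied                            # destination has a full image

memory = {f"page{i}": i for i in range(20)}
# Simulated dirtying: each pre-copy round leaves fewer dirty pages.
dirty_rounds = iter([{"page0", "page1", "page2", "page3", "page4"},
                     {"page0", "page1"}])
image = live_migrate_memory(memory, lambda: next(dirty_rounds))
```

The loop exits to stall-and-copy once the dirty set is small (or after a bounded number of rounds), which keeps the frozen interval, and hence the control-plane downtime, short.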
Data-Plane Cloning
Clone the data plane by repopulation
– Enable migration across different data planes
– Avoid copying duplicate information
[Figure: the control plane on router B repopulates a new data plane on B while the old data plane on A still forwards]
Remote Control Plane
Data-plane cloning takes time
– Installing 250k routes takes over 20 seconds
Control & old data planes need to be kept “online”
– Solution: redirect routing messages through tunnels
[Figure: routing messages arriving at router A are tunneled to the migrated control plane on router B]
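The tunnel redirection can be sketched as below: while the new data plane repopulates, routing messages still arrive at the old physical router and are relayed through a tunnel to the migrated control plane, which processes them as if they were local. The class and prefix values are illustrative.

```python
# Sketch of the "remote control plane" during data-plane cloning.
from collections import deque

class Tunnel:
    """Stands in for the tunnel between physical routers A and B."""
    def __init__(self):
        self.buf = deque()
    def send(self, msg):
        self.buf.append(msg)
    def recv(self):
        return self.buf.popleft()

def old_router_redirect(tunnel, incoming):
    """Router A: hand arriving control traffic to the tunnel, unparsed."""
    for msg in incoming:
        tunnel.send(msg)

def remote_control_plane(tunnel, count, rib):
    """Router B: consume tunneled updates and keep the RIB current."""
    for _ in range(count):
        prefix, next_hop = tunnel.recv()
        rib[prefix] = next_hop
    return rib

tunnel = Tunnel()
old_router_redirect(tunnel, [("10.0.0.0/8", "peer1"), ("192.168.0.0/16", "peer2")])
rib = remote_control_plane(tunnel, 2, {})
```

Because neighbors keep exchanging messages with the (remote) control plane throughout, routing adjacencies never observe the migration.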
Double Data Planes
At the end of data-plane cloning, both data planes are ready to forward traffic
Asynchronous Link Migration
With the double data planes, links can be migrated independently
[Figure: links to neighbors A and B move to the new data plane one at a time]
Prototype Implementation
Virtualized operating system
– OpenVZ, supports VM migration
Routing protocols
– Quagga software suite
Packet forwarding
– NetFPGA hardware
Router hypervisor
– Our extensions for repopulating the data plane, remote control plane, double data planes, …
Experimental Results
Data plane: NetFPGA
– No packet loss or extra delay
Control plane: Quagga routing software
– All routing-protocol adjacencies stay up
Core router migration (intradomain only)
– Injected an unplanned link failure at another router
– At most one retransmission of an OSPF message
Edge router migration (intra- and interdomain)
– Control-plane downtime: 3.56 seconds
– Within reasonable keep-alive timer intervals
Conclusions on VROOM
Useful network-management primitive
– Breaks the tight coupling between physical and logical
– Simplifies management, enables new applications
Evaluation of prototype
– No disruption in packet forwarding
– No noticeable disruption in routing protocols
Ongoing work
– Migration scheduling as an optimization problem
– Extensions to the hypervisor for other applications
VERB: Virtually Eliminating Router Bugs With Eric Keller, Minlan Yu, and Matt Caesar
Router Bugs Are Important
Routing software is complicated
– Leads to programming errors (aka “bugs”)
– Recent string of high-profile outages
Bugs differ from traditional failures
– Byzantine failures: they don’t simply crash the router
– Violate protocol semantics and cause cascading outages
The problem is getting worse
– Software is getting more complicated
– Other causes of outages are becoming less common
– Vendors are allowing third-party software
Exploit Software and Data Diversity
Many sources of diversity
– Diverse code (Quagga, XORP, BIRD)
– Diverse protocols (OSPF and IS-IS)
– Diverse environment (timing, ordering, memory)
Reasonable overhead
– Extra processor blade for hardware reliability
– Multi-core processors, separate route servers, …
Special properties of routing software
– Clear interfaces to the data plane and other routers
– Limited dependence on past history
Handling Bugs at Run Time
Diverse replication
– Run multiple control planes in parallel
– Vote on routing messages and forwarding-table updates
[Figure: hypervisor with an UPDATE VOTER and a FIB VOTER; a REPLICA MANAGER runs three protocol-daemon replicas, each with its own RIB, above a shared forwarding table (FIB) and interfaces]
Replicating Incoming Routing Messages
No need for protocol parsing – operates at the socket level
[Figure: an incoming update is duplicated to all three protocol-daemon replicas]
Voting: Updates to Forwarding Table
Transparent by intercepting calls to “Netlink”
[Figure: the FIB VOTER compares the replicas’ candidate FIB updates before installing one in the shared FIB]
Voting: Control-Plane Messages
Transparent by intercepting socket system calls
[Figure: the UPDATE VOTER compares outgoing routing messages from the replicas before sending one]
Simple Voting and Recovery
Tolerate transient periods of disagreement
– During routing-protocol convergence (tens of seconds)
Several different voting mechanisms
– Master-slave vs. wait-for-consensus
Small, trusted software component
– No parsing; treats data as opaque strings
– Just 514 lines of code in our implementation
Recovery
– Kill the faulty instance and invoke a new one
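The two voting mechanisms named above can be sketched as follows, applied to candidate FIB updates from diverse replicas. Like the voter described in the talk, the sketch does no protocol parsing and treats each candidate as an opaque string; the prefixes and quorum size are illustrative.

```python
# Sketch of master-slave vs. wait-for-consensus voting over opaque updates.
from collections import Counter

def wait_for_consensus(candidates, quorum):
    """Output a value only when at least `quorum` replicas agree on it."""
    value, count = Counter(candidates).most_common(1)[0]
    return value if count >= quorum else None

def master_slave(candidates, master=0):
    """Trust the master replica; slaves act as hot standbys for recovery."""
    return candidates[master]

# One buggy replica (the third) disagrees; majority voting masks the bug.
updates = ["10.0.0.0/8 -> if1", "10.0.0.0/8 -> if1", "10.0.0.0/8 -> if2"]
chosen = wait_for_consensus(updates, quorum=2)
```

Master-slave answers immediately but trusts one replica; wait-for-consensus masks any single buggy replica at the cost of waiting for agreement, which is acceptable because transient disagreement during convergence is tolerated anyway.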
Conclusion on the Bug-Tolerant Router
Routing software bugs are serious
– Cause outages, misbehavior, and vulnerabilities
– Violate protocol semantics, so not handled by traditional failure detection and recovery
Software and data diversity
– Effective, with reasonable overhead
Design and prototype of a bug-tolerant router
– Works with Quagga, XORP, and BIRD software
– Low overhead and a small trusted code base
Conclusions for the Talk
Router virtualization is exciting
– Enables a wide variety of new networking techniques
– … for network management & service deployment
– … and even rethinking the Internet architecture
Fascinating space of open questions
– Other possible applications of router virtualization?
– What is the right interface to router hardware?
– What is the right programming environment for customized protocols on virtual networks?