Virtual ROuters On the Move (VROOM): Live Router Migration as a Network-Management Primitive
Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford

Virtual ROuters On the Move (VROOM)
Key idea
– Routers should be free to roam around
Useful for many different applications
– Simplify network maintenance
– Simplify service deployment and evolution
– Reduce power consumption
– …
Feasible in practice
– No performance impact on data traffic
– No visible impact on control-plane protocols

The Two Notions of “Router”
The IP-layer logical functionality, and the physical equipment
(Figure: the logical (IP layer) view vs. the physical view)

The Tight Coupling of Physical & Logical
Root of many network-management challenges (and “point solutions”)
(Figure: the logical (IP layer) view vs. the physical view)

VROOM: Breaking the Coupling
Re-mapping the logical node to another physical node
VROOM enables this re-mapping of logical to physical through virtual router migration.

Case 1: Planned Maintenance
NO reconfiguration of VRs, NO reconvergence
(Figure: VR-1 migrates between physical routers A and B)

Case 2: Service Deployment & Evolution
Move a (logical) router to more powerful hardware
VROOM guarantees seamless service to existing customers during the migration

Case 3: Power Savings
Hundreds of millions of dollars per year in electricity bills
Contract and expand the physical network according to the traffic volume

Virtual Router Migration: the Challenges
1. Migrate an entire virtual router instance
– All control-plane & data-plane processes / states
2. Minimize disruption
– Data plane: millions of packets per second on a 10 Gbps link
– Control plane: less strict (routing messages are retransmitted)
3. Link migration

VROOM Architecture
– Dynamic interface binding
– Data-plane hypervisor

VROOM’s Migration Process
Key idea: separate the migration of the control and data planes
1. Migrate the control plane
2. Clone the data plane
3. Migrate the links

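The three steps in order, as a minimal runnable sketch. The dictionaries and the `migrate_virtual_router` function are illustrative stand-ins introduced here, not interfaces from the VROOM prototype.

```python
# A minimal, runnable sketch of VROOM's three-step migration sequence.
# The data structures and function names are illustrative stand-ins.

def migrate_virtual_router(vr, src, dst):
    """Migrate virtual router `vr` from physical node `src` to `dst`."""
    # Step 1: migrate the control plane (routing processes, config, memory).
    dst["control_plane"] = src.pop("control_plane")
    print(f"control plane of {vr} moved {src['name']} -> {dst['name']}")

    # Step 2: clone the data plane by repopulation: the migrated control
    # plane re-installs its routes into the destination FIB while the old
    # FIB on the source keeps forwarding (double data planes).
    dst["fib"] = dict(src["fib"])
    print(f"data plane cloned: {len(dst['fib'])} routes installed on {dst['name']}")

    # Step 3: migrate the links one at a time; with both data planes alive,
    # each link can be switched independently (asynchronously).
    for link in list(src["links"]):
        src["links"].remove(link)
        dst["links"].append(link)
        print(f"link {link} re-homed to {dst['name']}")

    # Only now is the old data plane released.
    src["fib"].clear()

if __name__ == "__main__":
    a = {"name": "A", "control_plane": "quagga-state",
         "fib": {"10.0.0.0/8": "if0"}, "links": ["A-n0", "A-n1"]}
    b = {"name": "B", "fib": {}, "links": []}
    migrate_virtual_router("VR-1", a, b)
```
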
Control-Plane Migration
Leverage virtual server migration techniques
– Router image: binaries, configuration files, etc.

Control-Plane Migration
Leverage virtual server migration techniques
– Router image
– Memory
  – 1st stage: iterative pre-copy
  – 2nd stage: stall-and-copy (when the control plane is “frozen”)

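Below is a toy model of this two-stage memory transfer (iterative pre-copy, then stall-and-copy); the page counts, dirty rates, and thresholds are made-up numbers purely for illustration, and the real mechanism comes from standard virtual server migration.

```python
# Toy model of two-stage memory migration: iterative pre-copy while the
# control plane keeps running, then a brief stall-and-copy for the remainder.
import random

def migrate_memory(pages, dirty_rate=0.2, max_rounds=5, stop_threshold=8):
    dirty = set(pages)          # initially every page must be sent
    copied = 0

    # Stage 1: iterative pre-copy: copy dirty pages while the control plane
    # keeps running, so pages copied earlier may be dirtied again.
    for rnd in range(max_rounds):
        copied += len(dirty)
        # While we were copying, the running control plane dirtied some pages.
        dirty = {p for p in pages if random.random() < dirty_rate}
        print(f"pre-copy round {rnd}: {len(dirty)} pages re-dirtied")
        if len(dirty) <= stop_threshold:
            break

    # Stage 2: stall-and-copy: freeze the control plane briefly and copy
    # whatever is still dirty, so downtime depends only on this remainder.
    downtime_pages = len(dirty)
    copied += downtime_pages
    print(f"stall-and-copy: {downtime_pages} pages copied while frozen "
          f"({copied} page copies total)")

if __name__ == "__main__":
    migrate_memory(pages=list(range(1000)))
```
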
Data-Plane Cloning
Clone the data plane by repopulation
– Enables migration across different data planes
– Eliminates synchronization issues between the control & data planes
(Figure: physical routers A and B, with the control plane (CP), the old data plane (DP-old), and the new data plane (DP-new))

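A minimal sketch of cloning by repopulation: the control plane pushes every RIB entry through a data-plane-neutral install call rather than copying FIB state, which is what lets the old and new data planes be different technologies. The function names (`repopulate_fib`, `linux_style_install`) are hypothetical.

```python
# Illustrative sketch of data-plane cloning by repopulation: instead of
# copying FIB state from the old data plane, the migrated control plane
# re-installs each RIB entry through a data-plane-neutral interface.

def repopulate_fib(rib, install_entry):
    """Push every route in the RIB into a new FIB via `install_entry`."""
    new_fib = {}
    for prefix, next_hop in rib.items():
        install_entry(new_fib, prefix, next_hop)   # works for SD or HD backends
    return new_fib

def linux_style_install(fib, prefix, next_hop):
    # Stand-in for installing a kernel route (e.g., via netlink).
    fib[prefix] = next_hop

if __name__ == "__main__":
    rib = {"10.0.0.0/8": "eth1", "192.168.1.0/24": "eth2"}
    fib_new = repopulate_fib(rib, linux_style_install)
    print(f"{len(fib_new)} routes repopulated into the new FIB")
```
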
Remote Control Plane
Data-plane cloning takes time
– Installing 250k routes takes over 20 seconds*
The control & old data planes need to be kept “online”
Solution: redirect routing messages through tunnels
* P. Francois et al., “Achieving sub-second IGP convergence in large IP networks,” ACM SIGCOMM CCR, vol. 35, no. 3, 2005.
(Figure: physical routers A and B, with CP, DP-old, and DP-new)

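A toy sketch of this redirection idea, assuming a queue as a stand-in for the tunnel between the old and new physical routers; the function names are mine, not the prototype's.

```python
# Toy sketch of the "remote control plane": while the new FIB is being
# populated, routing messages that still arrive at the old physical router
# are redirected through a tunnel to the control plane now on the new router.
from queue import Queue

tunnel = Queue()   # stand-in for an IP tunnel between old and new routers

def old_router_receive(routing_msg):
    # The old data plane keeps forwarding, but control traffic is redirected.
    tunnel.put(routing_msg)

def remote_control_plane_poll():
    # The migrated control plane processes messages as if it were still local,
    # so neighbors never see a session drop.
    while not tunnel.empty():
        msg = tunnel.get()
        print(f"control plane processed: {msg} (received via tunnel)")

if __name__ == "__main__":
    old_router_receive("OSPF LSA from neighbor n1")
    old_router_receive("BGP UPDATE from peer p2")
    remote_control_plane_poll()
```
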
Double Data Planes
At the end of data-plane cloning, both data planes are ready to forward traffic
(Figure: the control plane (CP) with both DP-old and DP-new)

Asynchronous Link Migration
With the double data planes, links can be migrated independently
(Figure: routers A and B, with CP, DP-old, and DP-new)

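A small illustration of why asynchronous link migration is safe once both data planes hold the same routes: each packet is forwarded by whichever router its ingress link is currently attached to. The link and prefix names are invented examples.

```python
# Minimal sketch of asynchronous link migration: with both data planes
# populated, each link can be switched from router A to router B on its own
# schedule, and forwarding stays correct throughout.

fib = {"10.0.0.0/8": "to-n2", "172.16.0.0/12": "to-n3"}   # same routes on A and B
link_location = {"link-n0": "A", "link-n1": "A"}          # where each link terminates

def forward(ingress_link, prefix):
    node = link_location[ingress_link]          # A's DP-old or B's DP-new
    print(f"packet for {prefix} forwarded by {node} via {fib[prefix]}")

if __name__ == "__main__":
    forward("link-n0", "10.0.0.0/8")            # both links still on A
    link_location["link-n0"] = "B"              # migrate one link first...
    forward("link-n0", "10.0.0.0/8")            # ...traffic keeps flowing
    forward("link-n1", "172.16.0.0/12")         # the other link follows later
```
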
Prototype Implementation
Control plane: OpenVZ + Quagga
Data plane: two prototypes
– Software-based data plane (SD): Linux kernel
– Hardware-based data plane (HD): NetFPGA
Why two prototypes?
– To validate the data-plane hypervisor design (e.g., migration between SD and HD)

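A hypothetical sketch of what a data-plane hypervisor interface could look like: one narrow route-install API with interchangeable software (kernel-like) and hardware (NetFPGA-like) backends, so the same repopulation code drives either. The class names are mine, not the prototype's.

```python
# Sketch of the data-plane hypervisor idea: the control plane talks to one
# narrow interface, and either a software or a hardware backend implements it,
# which is what makes migration between SD and HD data planes possible.
from abc import ABC, abstractmethod

class DataPlane(ABC):
    @abstractmethod
    def install_route(self, prefix: str, next_hop: str) -> None: ...
    @abstractmethod
    def remove_route(self, prefix: str) -> None: ...

class SoftwareDataPlane(DataPlane):
    def __init__(self): self.kernel_fib = {}
    def install_route(self, prefix, next_hop): self.kernel_fib[prefix] = next_hop
    def remove_route(self, prefix): self.kernel_fib.pop(prefix, None)

class HardwareDataPlane(DataPlane):
    def __init__(self): self.tcam = {}
    def install_route(self, prefix, next_hop): self.tcam[prefix] = next_hop
    def remove_route(self, prefix): self.tcam.pop(prefix, None)

def repopulate(rib: dict, dp: DataPlane) -> None:
    # The same repopulation code works regardless of the backend.
    for prefix, next_hop in rib.items():
        dp.install_route(prefix, next_hop)

if __name__ == "__main__":
    rib = {"10.0.0.0/8": "port0", "192.168.0.0/16": "port1"}
    for backend in (SoftwareDataPlane(), HardwareDataPlane()):
        repopulate(rib, backend)
        print(type(backend).__name__, "populated")
```
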
Evaluation
– Performance of individual migration steps
– Impact on data traffic
– Impact on routing protocols
Experiments on Emulab

Impact on Data Traffic
The diamond testbed
(Figure: nodes n0, n1, n2, n3 and the virtual router VR)

Impact on Data Traffic
SD router with separate migration bandwidth
– Slight delay increase due to CPU contention
HD router with separate migration bandwidth
– No delay increase or packet loss

Impact on Routing Protocols
The Abilene-topology testbed

Core Router Migration: OSPF Only
Introduce an LSA by flapping link VR2-VR3
– Miss at most one LSA
– Get a retransmission 5 seconds later (the default LSA retransmission timer)
– Can use a smaller LSA retransmission interval (e.g., 1 second)

Edge Router Migration: OSPF + BGP
Average control-plane downtime: 3.56 seconds
– Performance lower bound
OSPF and BGP adjacencies stay up
Default timer values:
– OSPF hello interval: 10 seconds
– BGP keep-alive interval: 60 seconds

Where To Migrate
Physical constraints
– Latency (e.g., NYC to Washington D.C.: 2 msec)
– Link capacity: enough remaining capacity for the extra traffic
– Platform compatibility: routers from different vendors
– Router capability (e.g., number of access control lists (ACLs) supported)
The constraints simplify the placement problem

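A sketch of how these constraints could be applied as a simple feasibility filter before any optimization; all thresholds and candidate attributes are made-up examples.

```python
# Illustrative filter over candidate physical routers using the constraints
# on this slide (latency, spare capacity, platform, capability).

def feasible_targets(candidates, max_latency_ms, extra_traffic_gbps,
                     required_platform, min_acls):
    ok = []
    for c in candidates:
        if c["latency_ms"] > max_latency_ms:        # e.g., NYC to D.C. is ~2 ms
            continue
        if c["spare_capacity_gbps"] < extra_traffic_gbps:
            continue
        if c["platform"] != required_platform:      # cross-vendor needs more machinery
            continue
        if c["acl_limit"] < min_acls:
            continue
        ok.append(c["name"])
    return ok

if __name__ == "__main__":
    candidates = [
        {"name": "B", "latency_ms": 2, "spare_capacity_gbps": 12,
         "platform": "vendor-X", "acl_limit": 4000},
        {"name": "C", "latency_ms": 9, "spare_capacity_gbps": 40,
         "platform": "vendor-Y", "acl_limit": 1000},
    ]
    print(feasible_targets(candidates, max_latency_ms=5, extra_traffic_gbps=10,
                           required_platform="vendor-X", min_acls=2000))
```
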
Conclusions & Future Work
VROOM: a useful network-management primitive
– Separates the tight coupling between the physical and the logical
– Simplifies network management, enables new applications
– No data-plane or control-plane disruption
Future work
– Migration scheduling as an optimization problem
– Other applications of router migration: handling unplanned failures, traffic engineering

Thanks! Questions & Comments?
yiwang@cs.princeton.edu

Packet-aware Access Network
Pseudo-wires (virtual circuits) from CE to PE
– P/G-MSS: Packet-aware/Gateway Multi-Service Switch
– MSE: Multi-Service Edge

Events During Migration
Network failure during migration
– The old VR image is not deleted until the migration is confirmed successful
Routing messages arrive during the migration of the control plane
– BGP: TCP retransmission
– OSPF: LSA retransmission

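A small sketch of the failure-handling rule above: commit (delete the old VR image) only after the migration is confirmed, otherwise fall back to the intact old image. The callback structure is illustrative, not the prototype's logic.

```python
# Hedged sketch of migration with rollback: the old VR image survives until
# the migration is confirmed, so a mid-migration failure is not fatal.

def migrate_with_rollback(run_migration, confirm_success, delete_old_image,
                          revert_to_old_image):
    try:
        run_migration()
        if confirm_success():
            delete_old_image()        # commit: old image no longer needed
            return "migrated"
    except Exception as err:
        print(f"migration step failed: {err}")
    revert_to_old_image()             # roll back: old VR image still intact
    return "rolled back"

if __name__ == "__main__":
    print(migrate_with_rollback(
        run_migration=lambda: None,
        confirm_success=lambda: False,          # pretend confirmation failed
        delete_old_image=lambda: print("old image deleted"),
        revert_to_old_image=lambda: print("continuing on the old router"),
    ))
```
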
44
3.Migrate links affixed to the virtual routers Enabled by: programmable transport networks – Long-haul links are reconfigurable Layer 3 point-to-point links are multi-hop at layer 1/2 Flexible Transport Networks 44 Chicago New York Washington D.C. : Multi-service optical switch (e.g., Ciena CoreDirector) Programmable Transport Network
Requirements & Enabling Technologies
3. Migrate links affixed to the virtual routers
Enabled by: programmable transport networks
– Long-haul links are reconfigurable
– Layer-3 point-to-point links are multi-hop at layer 1/2
(Figure: programmable transport network spanning Chicago, New York, and Washington D.C.; nodes are multi-service optical switches, e.g., Ciena CoreDirector)

Requirements & Enabling Technologies
4. Enable edge router migration
Enabled by: packet-aware access networks
– Access links are becoming inherently virtualized
– Customers connect to provider edge (PE) routers via pseudo-wires (virtual circuits)
– Physical interfaces on PE routers can be shared by multiple customers
(Figure: dedicated physical interface per customer vs. shared physical interface)

Link Migration in Transport Networks
With programmable transport networks, long-haul links are reconfigurable
– IP-layer point-to-point links are multi-hop at the transport layer
VROOM leverages this capability in a new way to enable link migration

Link Migration in Flexible Transport Networks
2. With packet-aware transport networks
– Logical links share the same physical port
– Packet-aware access network (pseudo-wires)
– Packet-aware IP transport network (tunnels)

The Out-of-the-Box OpenVZ Approach
Packets are forwarded inside each VE
When a VE is being migrated, packets are dropped

Putting It All Together: Realizing Migration
1. The migration program notifies shadowd of the completion of the control-plane migration
2. shadowd requests zebra to resend all the routes, and pushes them down to virtd
3. virtd installs routes into the new FIB, while continuing to update the old FIB
4. virtd notifies the migration program to start link migration after it finishes populating the new FIB
5. After link migration is completed, the migration program notifies virtd to stop updating the old FIB

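A runnable toy of steps 2-5: routes are resent and virtd installs them into the new FIB while mirroring updates into the old FIB, until link migration completes and the old FIB is abandoned. shadowd, zebra, and virtd are the daemons named above, but their interfaces here are simplified stand-ins.

```python
# Toy model of the double-FIB update logic during migration.

class Virtd:
    def __init__(self):
        self.old_fib, self.new_fib = {}, {}
        self.update_old = True                     # mirror updates during migration

    def install(self, prefix, next_hop):
        self.new_fib[prefix] = next_hop            # step 3: populate the new FIB...
        if self.update_old:
            self.old_fib[prefix] = next_hop        # ...while still updating the old one

    def stop_updating_old_fib(self):
        self.update_old = False                    # step 5: after links have moved

def shadowd_resend_routes(zebra_rib, virtd):
    for prefix, next_hop in zebra_rib.items():     # step 2: zebra resends all routes
        virtd.install(prefix, next_hop)

if __name__ == "__main__":
    virtd = Virtd()
    shadowd_resend_routes({"10.0.0.0/8": "if1", "192.0.2.0/24": "if2"}, virtd)
    # step 4: new FIB populated -> start link migration (not modeled here)
    virtd.stop_updating_old_fib()
    virtd.install("198.51.100.0/24", "if3")        # later updates go only to the new FIB
    print(len(virtd.new_fib), "routes in new FIB;", len(virtd.old_fib), "in old FIB")
```
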
Power Consumption of Routers
Vendor    Model    Power (watts)
Cisco     CRS-1    10,920
Cisco     12416     4,212
Cisco     7613      4,000
Juniper   T1600     9,100
Juniper   T640      6,500
Juniper   M320      3,150
A synthetic large tier-1 ISP backbone
– 50 POPs (Points of Presence)
– 20 major POPs, each with 6 backbone routers, 6 peering routers, and 30 access routers
– 30 smaller POPs, each with 6 access routers

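A back-of-the-envelope sketch using the figures above. The POP and router counts come from the slide; the mapping of each router role to a wattage from the table is purely an assumption for illustration.

```python
# Back-of-the-envelope aggregate for the synthetic tier-1 backbone above.
# Role-to-wattage mapping below is an illustrative assumption, not from the deck.

WATTS = {"backbone": 9_100,   # assume a T1600-class router
         "peering": 6_500,    # assume a T640-class router
         "access":  3_150}    # assume an M320-class router

major_pop = {"backbone": 6, "peering": 6, "access": 30}   # 20 such POPs
small_pop = {"access": 6}                                  # 30 such POPs

def pop_watts(pop):
    return sum(count * WATTS[role] for role, count in pop.items())

total_routers = 20 * sum(major_pop.values()) + 30 * sum(small_pop.values())
total_watts = 20 * pop_watts(major_pop) + 30 * pop_watts(small_pop)

print(f"{total_routers} routers, ~{total_watts / 1e6:.1f} MW aggregate draw")
# Contracting the network off-peak (migrating VRs onto fewer physical routers
# and powering the rest down) shrinks this figure directly.
```
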
Future Work
– Algorithms that solve the constrained optimization problems
– A control-plane hypervisor to enable cross-vendor migration

Performance of Migration Steps
Memory copy time, with different numbers of routes (dump file sizes)

Performance of Migration Steps
FIB population time
– Grows linearly with the number of route entries
– Installing a FIB entry into NetFPGA: 7.4 microseconds
– Installing a FIB entry into the Linux kernel: 1.94 milliseconds
FIB update time: time for virtd to install entries into the FIB
Total time: FIB update time + time for shadowd to send routes to virtd

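The per-entry install times above imply linear scaling of FIB update time with table size; the route counts in this quick calculation are illustrative examples, not measurements from the evaluation.

```python
# Quick use of the per-entry install times on this slide; the route counts
# below are illustrative examples only.

NETFPGA_PER_ENTRY_S = 7.4e-6      # 7.4 microseconds per FIB entry (hardware)
KERNEL_PER_ENTRY_S  = 1.94e-3     # 1.94 milliseconds per FIB entry (software)

for routes in (15_000, 250_000):
    print(f"{routes:>7} routes: "
          f"NetFPGA ~{routes * NETFPGA_PER_ENTRY_S:6.2f} s, "
          f"Linux kernel ~{routes * KERNEL_PER_ENTRY_S:7.1f} s (FIB update time only)")
```
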
The Importance of Separate Migration Bandwidth
The dumbbell testbed, with 250k routes in the RIB

Separate Migration Bandwidth is Important
(Figures: throughput of the migration traffic; delay increase of the data traffic; loss rate of the data traffic)
