Virtual ROuters On the Move (VROOM): Live Router Migration as a Network-Management Primitive Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford

Virtual ROuters On the Move (VROOM) Key idea Routers should be free to roam around Useful for many different applications Simplify network maintenance Simplify service deployment and evolution Reduce power consumption … Feasible in practice No performance impact on data traffic No visible impact on control-plane protocols The key idea of VROOM is that routers should be free to roam around, instead of being permanently attached to a specific piece of hardware. In this talk, I’ll first show that VROOM is useful for many different network management applications, such as simplifying network maintenance, simplifying service deployment and evolution. In fact, it can also save us power. I will then show through our prototype implementation and evaluation that VROOM is actually feasible in practice, with no performance impact on data traffic and no visible impact on routing protocols.

The Two Notions of “Router”: the IP-layer logical functionality, and the physical equipment. Here is the basic idea of VROOM: virtual router instances running on top of physical routers form the logical topology of a network. The physical routers only provide shared hardware resources and the necessary virtualization support. It is the virtual routers that run routing protocols and forward actual traffic.

The Tight Coupling of Physical & Logical: the root of many network-management challenges (and “point solutions”). The two are basically the same thing in people’s minds: when we think of a node in a topology, we also have the physical box in mind, and vice versa.

VROOM: Breaking the Coupling. Re-mapping the logical node to another physical node: VROOM enables this re-mapping of logical to physical through virtual router migration. The mapping can change while keeping the IP-layer logical topology and configuration intact. Each virtual router has its own routing protocol instances and its own forwarding tables, and a physical router can support multiple virtual router instances through virtualization.
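
To make the decoupling concrete, here is a minimal Python sketch (purely illustrative; the class and method names are hypothetical, not taken from the VROOM prototype) of a substrate whose logical-to-physical mapping can change while each virtual router keeps its own configuration, routing state, and forwarding table:

```python
# Illustrative sketch of the logical/physical split; all names here
# are hypothetical, not from the VROOM prototype.
from dataclasses import dataclass, field

@dataclass
class VirtualRouter:
    name: str
    config: dict = field(default_factory=dict)  # IP-layer configuration
    rib: dict = field(default_factory=dict)     # its own routing-protocol state
    fib: dict = field(default_factory=dict)     # its own forwarding table

class Substrate:
    """Physical routers provide shared hardware; the mapping can change."""
    def __init__(self):
        self.placement = {}  # virtual-router name -> physical node

    def place(self, vr, node):
        self.placement[vr.name] = node

    def migrate(self, vr, new_node):
        # Only the mapping changes: the logical topology, configuration,
        # and routing state of the virtual router stay intact.
        self.placement[vr.name] = new_node

s = Substrate()
vr1 = VirtualRouter("VR-1")
s.place(vr1, "physical-A")
s.migrate(vr1, "physical-B")
print(s.placement)  # {'VR-1': 'physical-B'}
```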

Case 1: Planned Maintenance. NO reconfiguration of VRs, NO reconvergence. (Animation: virtual router VR-1 migrates from physical node A to physical node B before A is taken down for maintenance.)

Case 2: Service Deployment & Evolution. Move a (logical) router to more powerful hardware. In ISPs, as a service grows, it may need to be moved to a more powerful router; today, this process usually involves a period of downtime.

Case 2: Service Deployment & Evolution VROOM guarantees seamless service to existing customers during the migration

Case 3: Power Savings. Electricity bills run to hundreds of millions of dollars per year.

Case 3: Power Savings. Contract and expand the physical network according to the traffic volume.

Virtual Router Migration: the Challenges. (1) Migrate an entire virtual router instance: all control-plane and data-plane processes and state. (2) Minimize disruption: the data plane forwards millions of packets per second on a 10 Gbps link; the control plane is less strict, thanks to routing-message retransmission. (3) Link migration: all the links associated with the virtual router must be migrated as well. To do this seamlessly, we leverage the fact that in ISPs a point-to-point link at the IP layer is usually a multi-hop path in the underlying transport network. Modern transport networks can dynamically set up a new optical path and switch the old path over virtually instantaneously, which lets us realize link migration by reconfiguring the optical transport network.

VROOM Architecture: a data-plane hypervisor with dynamic interface binding.

VROOM’s Migration Process Key idea: separate the migration of control and data planes Migrate the control plane Clone the data plane Migrate the links
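
A rough sketch of how the three steps compose (the functions are hypothetical stubs standing in for the mechanisms on the next slides, not the prototype's API):

```python
# Hypothetical orchestration of VROOM's three migration steps; the
# helpers are stubs for the mechanisms detailed on later slides.

def migrate_control_plane(vr, src, dst):
    print(f"1. live-migrating {vr}'s control plane: {src} -> {dst}")

def clone_data_plane(vr, dst):
    print(f"2. repopulating a fresh FIB for {vr} on {dst}")

def migrate_link(link, dst):
    print(f"3. re-routing {link}'s transport path toward {dst}")

def migrate_virtual_router(vr, links, src, dst):
    migrate_control_plane(vr, src, dst)   # control plane moves first
    clone_data_plane(vr, dst)             # "double data planes" now exist
    for link in links:                    # links migrate asynchronously
        migrate_link(link, dst)

migrate_virtual_router("VR-1", ["link-to-A", "link-to-B"],
                       "router-A", "router-B")
```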

Control-Plane Migration. Leverage virtual server migration techniques: the router image (binaries, configuration files, etc.).

Control-Plane Migration. Leverage virtual server migration techniques: the router image and memory. 1st stage: iterative pre-copy; 2nd stage: stall-and-copy (when the control plane is “frozen”). During the first iteration, all pages are transferred from A to B; subsequent iterations copy only those pages dirtied during the previous transfer phase.
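
The two stages can be modeled with a few lines of Python (a toy simulation under assumed parameters; real systems track dirtied pages in the hypervisor or kernel rather than sampling randomly):

```python
import random

# Toy model of iterative pre-copy followed by stall-and-copy.
# num_pages and dirty_rate are made-up parameters for illustration.
def migrate_memory(num_pages=1000, dirty_rate=0.05, max_rounds=5):
    to_send = set(range(num_pages))            # round 1: every page
    for rnd in range(1, max_rounds + 1):
        sent = len(to_send)
        # Pages dirtied while the previous round was in flight:
        to_send = {p for p in range(num_pages) if random.random() < dirty_rate}
        print(f"round {rnd}: sent {sent} pages, {len(to_send)} dirtied")
        if len(to_send) < 16:                  # dirty set small enough
            break
    # Stall-and-copy: the control plane is briefly "frozen" while the
    # remaining dirty pages are transferred.
    print(f"stall-and-copy: sending final {len(to_send)} pages")

migrate_memory()
```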

Control-Plane Migration. Leverage virtual server migration techniques: the router image and memory. [Figure: the control plane (CP) moves from physical router A to physical router B; the data plane (DP) remains on A for now.]

Data-Plane Cloning. Clone the data plane by repopulation: this enables migration across different data planes and eliminates the synchronization issue between the control and data planes. [Figure: DP-old on physical router A; the CP and DP-new on physical router B.]
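
A minimal sketch of cloning-by-repopulation (hypothetical dict-based RIB and FIB; the point is that the new FIB is re-derived from the control plane's routes rather than copied from the old FIB, so the two data planes may be entirely different implementations):

```python
# The RIB and FIBs are plain dicts here (prefix -> next hop); in the
# prototype these would be the Quagga RIB and kernel/NetFPGA FIBs.

def clone_by_repopulation(rib, new_fib):
    for prefix, next_hop in rib.items():
        new_fib[prefix] = next_hop   # each entry re-derived from the RIB

rib = {"10.0.0.0/8": "if0", "192.168.1.0/24": "if1"}
new_fib = {}                         # may live on entirely different hardware
clone_by_repopulation(rib, new_fib)
print(new_fib)
```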

Remote Control Plane. Data-plane cloning takes time: installing 250k routes takes over 20 seconds*. The control plane and the old data plane therefore need to be kept “online”. Solution: redirect routing messages through tunnels. [Figure: DP-old on physical router A; the CP and DP-new on physical router B.]
*: P. Francois et al., “Achieving sub-second IGP convergence in large IP networks,” ACM SIGCOMM CCR, no. 3, 2005.

Double Data Planes. At the end of data-plane cloning, both data planes are ready to forward traffic.

Asynchronous Link Migration. The double data planes simplify link migration because they enable the links to be migrated independently: for example, we can first set up links from the new physical node to the two adjacent nodes (A and B), and then migrate the traffic in each direction separately.

Prototype Implementation Control plane: OpenVZ + Quagga Data plane: two prototypes Software-based data plane (SD): Linux kernel Hardware-based data plane (HD): NetFPGA Why two prototypes? To validate the data-plane hypervisor design (e.g., migration between SD and HD)

Evaluation Performance of individual migration steps Impact on data traffic Impact on routing protocols Experiments on Emulab

Impact on Data Traffic: the diamond testbed. [Figure: the virtual router VR and nodes n0, n1, n2, n3.]

Impact on Data Traffic. SD router with separate migration bandwidth: slight delay increase due to CPU contention. HD router with separate migration bandwidth: no delay increase or packet loss.

Impact on Routing Protocols The Abilene-topology testbed

Core Router Migration: OSPF Only. Introduce an LSA by flapping link VR2-VR3. At most one LSA is missed, and it is retransmitted 5 seconds later (the default LSA retransmission timer); a smaller LSA retransmission interval (e.g., 1 second) can be used.

Edge Router Migration: OSPF + BGP. Average control-plane downtime: 3.56 seconds (a performance lower bound). OSPF and BGP adjacencies stay up with default timer values (OSPF hello interval: 10 seconds; BGP keep-alive interval: 60 seconds).

Where To Migrate. Physical constraints: latency (e.g., NYC to Washington D.C.: 2 msec) and link capacity (enough remaining capacity for the extra traffic). Platform compatibility: routers from different vendors. Router capability: e.g., the number of access control lists (ACLs) supported. These constraints actually simplify the placement problem, as the filtering sketch below suggests.
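
A candidate-filtering sketch under assumed field names and thresholds (none of these values come from the talk):

```python
# Field names and thresholds are invented for illustration.

def feasible_targets(candidates, vr):
    ok = []
    for node in candidates:
        if node["latency_ms"] > vr["max_latency_ms"]:
            continue   # physical constraint: added latency
        if node["spare_gbps"] < vr["traffic_gbps"]:
            continue   # enough remaining capacity for the extra traffic
        if node["platform"] != vr["platform"]:
            continue   # platform compatibility across vendors
        if node["max_acls"] < vr["acls_needed"]:
            continue   # router capability, e.g. ACLs supported
        ok.append(node["name"])
    return ok

vr = {"max_latency_ms": 2, "traffic_gbps": 4,
      "platform": "vendor-X", "acls_needed": 100}
candidates = [
    {"name": "DC", "latency_ms": 2, "spare_gbps": 10,
     "platform": "vendor-X", "max_acls": 500},
    {"name": "LA", "latency_ms": 30, "spare_gbps": 40,
     "platform": "vendor-X", "max_acls": 500},
]
print(feasible_targets(candidates, vr))   # ['DC']; LA fails the latency check
```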

Conclusions & Future Work. VROOM is a useful network-management primitive: it breaks the tight coupling between the physical and the logical, simplifies network management, and enables new applications, with no data-plane or control-plane disruption. Future work: migration scheduling as an optimization problem, and other applications of router migration such as handling unplanned failures and traffic engineering.

Thanks! Questions & Comments? yiwang@cs.princeton.edu

Packet-aware Access Network. Pseudo-wires (virtual circuits) run from the customer edge (CE) to the provider edge (PE). P/G-MSS: Packet-aware/Gateway Multi-Service Switch; MSE: Multi-Service Edge.

Events During Migration. Network failure during migration: the old VR image is not deleted until the migration is confirmed successful. Routing messages arriving during the migration of the control plane: BGP recovers via TCP retransmission, OSPF via LSA retransmission.

Requirements & Enabling Technologies. Migrate the links affixed to the virtual routers, enabled by programmable transport networks: long-haul links are reconfigurable, and layer-3 point-to-point links are multi-hop at layer 1/2. [Figure: New York, Chicago, and Washington D.C. connected through a programmable transport network of multi-service optical switches (e.g., Ciena CoreDirector).]

Requirements & Enabling Technologies. Enable edge router migration, enabled by packet-aware access networks: access links are becoming inherently virtualized, and customers connect to provider edge (PE) routers via pseudo-wires (virtual circuits), so a physical interface on a PE router can be shared by multiple customers rather than dedicating a physical interface per customer.

Link Migration in Transport Networks. With programmable transport networks, long-haul links are reconfigurable: IP-layer point-to-point links are multi-hop at the transport layer. VROOM leverages this capability in a new way to enable link migration.

Link Migration in Flexible Transport Networks. With packet-aware transport networks, logical links can share the same physical port: a packet-aware access network (pseudo-wires) and a packet-aware IP transport network (tunnels).

The Out-of-the-box OpenVZ Approach. Packets are forwarded inside each VE, so while a VE is being migrated, packets are dropped.

Putting It All Together: Realizing Migration. 1. The migration program notifies shadowd about the completion of the control-plane migration.

Putting It All Together: Realizing Migration. 2. shadowd requests that zebra resend all the routes, and pushes them down to virtd.

Putting It All Together: Realizing Migration. 3. virtd installs the routes into the new FIB, while continuing to update the old FIB.

Putting It All Together: Realizing Migration. 4. virtd notifies the migration program to start link migration after it finishes populating the new FIB. 5. After link migration is completed, the migration program notifies virtd to stop updating the old FIB. The handoff order is sketched below.
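
A compressed sketch of steps 1 through 5 (the classes are hypothetical stand-ins for the prototype's shadowd/virtd daemons, not their real interfaces):

```python
# Hypothetical stand-ins for the prototype's daemons: shadowd (shadow
# routing daemon) and virtd (the virtualization/FIB daemon).

class Virtd:
    def __init__(self):
        self.old_fib, self.new_fib = {}, {}

    def install(self, prefix, next_hop):
        # Step 3: install into the new FIB while still updating the old one.
        self.new_fib[prefix] = next_hop
        self.old_fib[prefix] = next_hop

    def stop_updating_old_fib(self):
        # Step 5: after link migration completes, the old FIB is retired.
        self.old_fib = None

def realize_migration(zebra_routes):
    virtd = Virtd()
    # Step 1: the migration program learns that control-plane migration
    # is done and notifies shadowd (implicit here).
    # Step 2: shadowd asks zebra to resend all routes, pushing each one
    # down to virtd.
    for prefix, next_hop in zebra_routes.items():
        virtd.install(prefix, next_hop)
    # Step 4: new FIB fully populated -> link migration starts (not shown).
    virtd.stop_updating_old_fib()
    return virtd.new_fib

print(realize_migration({"10.0.0.0/8": "eth1", "192.168.0.0/16": "eth2"}))
```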

Power Consumption of Routers.

  Vendor   Model   Power (watts)
  Cisco    CRS-1   10,920
  Cisco    12416    4,212
  Cisco    7613     4,000
  Juniper  T1600    9,100
  Juniper  T640     6,500
  Juniper  M320     3,150

A synthetic large tier-1 ISP backbone: 50 POPs (Points of Presence). Each of the 20 major POPs has 6 backbone routers, 6 peering routers, and 30 access routers; each of the 30 smaller POPs has 6 access routers.
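
A back-of-the-envelope check on this topology (the router counts come from the slide; the per-router wattage is an illustrative mid-range pick from the table, not a figure from the talk):

```python
major_pops, small_pops = 20, 30
routers_per_major = 6 + 6 + 30     # backbone + peering + access routers
routers_per_small = 6
total = major_pops * routers_per_major + small_pops * routers_per_small
print(total, "routers")            # 1020 routers in the synthetic backbone

# Illustrative assumption: ~4 kW per router (mid-range of the table).
print(total * 4_000 / 1e6, "MW of continuous draw")   # 4.08 MW
```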

Future Work. Algorithms that solve the constrained optimization problems; a control-plane hypervisor to enable cross-vendor migration.

Performance of Migration Steps. Memory copy time, with different numbers of routes (dump file sizes).

Performance of Migration Steps. FIB population time grows linearly with the number of route entries: installing a FIB entry into the NetFPGA takes 7.4 microseconds, while installing a FIB entry into the Linux kernel takes 1.94 milliseconds. FIB update time is the time for virtd to install entries into the FIB; total time is the FIB update time plus the time for shadowd to send the routes to virtd.
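
If the per-entry costs really do scale linearly, as the slide states, the implied totals for a 250k-route FIB are easy to check (simple arithmetic from the figures above, not measurements):

```python
routes = 250_000
print(routes * 7.4e-6, "seconds on the NetFPGA")        # ~1.85 s
print(routes * 1.94e-3, "seconds in the Linux kernel")  # ~485 s
```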

The Importance of Separate Migration Bandwidth. The dumbbell testbed, with 250k routes in the RIB.

Separate Migration Bandwidth is Important. Throughput of the migration traffic.

Separate Migration Bandwidth is Important. Delay increase of the data traffic.

Separate Migration Bandwidth is Important. Loss rate of the data traffic.