VROOM: Virtual ROuters On the Move Aditya Akella Based on slides from Yi Wang.

Presentation transcript:

VROOM: Virtual ROuters On the Move Aditya Akella Based on slides from Yi Wang

Virtual ROuters On the Move (VROOM)
Key idea
– Routers should be free to roam around
Useful for many different applications
– Simplify network maintenance
– Simplify service deployment and evolution
– Reduce power consumption
– …
Feasible in practice
– No performance impact on data traffic
– No visible impact on routing protocols

VROOM: The Basic Idea
Virtual routers (VRs) form the logical topology
[Figure legend: physical router, virtual router, logical link]

VROOM: The Basic Idea
VR migration does not affect the logical topology
[Figure legend: physical router, virtual router, logical link]

Outline
Why is VROOM a good idea?
What are the challenges?
– Or is it just technically trivial?
How does VROOM work?
– The migration process
Is VROOM practical?
– Prototype system
– Performance evaluation
Where to migrate?
– The scheduling problem
Still have questions? Feel free to ask!

The Coupling of Logical and Physical
Today, the physical and logical configurations of a router are tightly coupled
Physical changes break protocol adjacencies and disrupt traffic
Logical configuration as a tool to reduce the disruption
– E.g., the “cost-out/cost-in” of IGP link weights
– Cannot eliminate the disruption
– Account for over 73% of network maintenance events

VROOM Separates the Logical and Physical
Make a logical router instance migratable among physical nodes
All logical configurations/states remain the same before/after the migration
– IP addresses remain the same
– Routing protocol configurations remain the same
– Routing-protocol adjacencies stay up
  No protocol (BGP/IGP) reconvergence
– Network topology stays intact
  No disruption to data traffic

Case 1: Planned Maintenance
Today’s best practice: “cost-out/cost-in”
– Router reconfiguration & protocol reconvergence
VROOM
– NO reconfiguration of VRs, NO reconvergence
[Figure labels: PR-A, PR-B, VR-1]


Case 2: Service Deployment & Evolution
Deploy a new service in a controlled “test network” first
[Figure labels: Production network, Test network, CE]

Case 2: Service Deployment & Evolution
Roll out the service to the production network after it matures
VROOM guarantees seamless service to existing customers during the roll-out and later evolution
[Figure labels: Production network, Test network]

Case 3: Power Savings
Routers consume a large amount of power
– Millions of routers in the U.S.
– Electricity bill: hundreds of millions of dollars per year
(Source: National Technical Information Service, Department of Commerce. Figures for 2005 & 2010 are projections.)

Case 3: Power Savings
Observation: the diurnal traffic pattern
Idea: contract and expand the physical network according to the traffic demand

Case 3: Power Savings
Dynamically contract & expand the physical network over a day: 3 PM

Case 3: Power Savings
Dynamically contract & expand the physical network over a day: 9 PM

Case 3: Power Savings
Dynamically contract & expand the physical network over a day: 4 AM

Virtual Router Migration: the Challenges
Migrate an entire virtual router instance
– All control-plane & data-plane processes / states
Minimize disruption
– Data plane: up to millions of packets per second
– Control plane: less stringent (with routing message retransmission)
Migrate links

Outline
Why is VROOM a good idea?
What are the challenges?
How does VROOM work?
– The migration enablers
– The migration process
  What should be migrated?
  How? (in order to minimize disruption)
Is VROOM practical?
Where to migrate?

VROOM Architecture
Three enablers that make VR migration possible
– Router virtualization
– Control and data plane separation
– Dynamic interface binding

A Naive Migration Process
1. Freeze the virtual router
2. Copy states
3. Restart
4. Migrate links
Practically unacceptable
– Packet forwarding should not stop during migration

VROOM’s Migration Process
Key idea: separate the migration of control and data plane
– No data-plane interruption
– Low control-plane interruption
1. Control-plane migration
2. Data-plane cloning
3. Link migration

Control-Plane Migration
Two things to be copied
– Router image
  Binaries, configuration files, etc.
– Memory
  1st stage: pre-copy
  2nd stage: stall-and-copy (when the control plane is “frozen”)
[Timeline figure, t1 to t4: (1) router-image copy; (2) memory copy, split into pre-copy and stall-and-copy]
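
The two memory-copy stages above can be illustrated with a toy sketch. It assumes memory is a dictionary of pages and that the set of pages dirtied during pre-copy is known in advance; `migrate_memory` and its parameters are hypothetical names, not part of VROOM or OpenVZ.

```python
def migrate_memory(source_pages, dirtied_during_precopy):
    """Toy two-stage memory copy: pre-copy, then stall-and-copy.

    source_pages: dict page_id -> contents at the start of pre-copy.
    dirtied_during_precopy: dict page_id -> new contents written by the
        still-running control plane while pre-copy is in progress.
    Returns (destination_pages, pages_copied_while_frozen).
    """
    # Stage 1 (pre-copy): copy every page while the control plane keeps running.
    destination = dict(source_pages)

    # The running control plane modifies some pages behind the copy's back.
    source_pages.update(dirtied_during_precopy)

    # Stage 2 (stall-and-copy): freeze the control plane and re-send only
    # the pages that changed, so the frozen period stays short.
    frozen_copy = {p: source_pages[p] for p in dirtied_during_precopy}
    destination.update(frozen_copy)
    return destination, len(frozen_copy)


pages = {i: f"v0-{i}" for i in range(1000)}
dirty = {3: "v1-3", 42: "v1-42"}
dest, stalled = migrate_memory(pages, dirty)
print(stalled)          # pages copied during the frozen period
print(dest == pages)    # destination matches the source after both stages
```

Only the dirtied pages are transferred while the control plane is frozen, which is why the frozen period can be much shorter than the full copy.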

Data-Plane Cloning
Clone the data plane by repopulation
– Copying the data plane states is wasteful, and could be hard
– Instead, repopulate the new data plane using the migrated control plane
– The old data plane continues working during migration
[Timeline figure, t1 to t5: (1) router-image copy; (2) memory copy; (3) data-plane cloning]

Remote Control Plane
The migrated control plane plays two roles
– Act as a “remote control plane” for the old data plane
– Populate the new data plane
[Timeline figure, t1 to t5: (1) router-image copy; (2) memory copy; (3) data-plane cloning. Labels: old node, new node, control plane, remote control plane]

Keep the Control Plane “Online”
Data-plane cloning takes time
– Around 110 µs per FIB entry update (for a high-end router) *
– Installing 250k routes could take over 20 seconds
The control plane needs connectivity during this period
– Redirect the routing messages through tunnels
*: P. Francois et al., Achieving sub-second IGP convergence in large IP networks, ACM SIGCOMM CCR, no. 3, 2005.
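
The "over 20 seconds" figure follows from the per-entry update time quoted on the slide. A minimal check, assuming entries are installed one at a time with no batching (the function name is illustrative):

```python
PER_ENTRY_UPDATE_S = 110e-6   # ~110 microseconds per FIB entry (from the slide)
NUM_ROUTES = 250_000          # 250k routes (from the slide)


def fib_population_time(num_routes, per_entry_s):
    """Time to install num_routes entries sequentially, in seconds."""
    return num_routes * per_entry_s


total = fib_population_time(NUM_ROUTES, PER_ENTRY_UPDATE_S)
print(f"{total:.1f} s")   # 27.5 s
```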

Double Data Planes
At the end of data-plane cloning, two data planes are ready to forward traffic (i.e., “double data planes”)
[Timeline figure, t0 to t6: (0) tunnel setup; (1) router-image copy; (2) memory copy; (3) data-plane cloning; (4) asynchronous link migration. Labels: old node, new node, control plane, remote control plane, data plane, double data plane]

Asynchronous Link Migration
With the double data planes, each link can be migrated independently
– Eliminate the need for a synchronization system
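
Why double data planes remove the need for synchronization can be shown with a toy simulation. It assumes both data planes hold identical FIBs once cloning has finished; `DataPlane` and `migrate_links_async` are hypothetical names for this sketch only.

```python
import random


class DataPlane:
    def __init__(self, name, fib):
        self.name = name
        self.fib = dict(fib)          # prefix -> next hop

    def forward(self, prefix):
        return self.fib[prefix]


def migrate_links_async(links, old_dp, new_dp, probe_prefix, seed=0):
    """Move links one at a time in arbitrary order; probe after each move.

    Returns (link, serving data plane name, next hop) tuples observed
    across all intermediate states.
    """
    attached = {link: old_dp for link in links}   # every link starts on the old node
    order = list(links)
    random.Random(seed).shuffle(order)            # no coordination between links
    observations = []
    for link in order:
        attached[link] = new_dp                   # this link alone switches over
        for name, dp in attached.items():
            observations.append((name, dp.name, dp.forward(probe_prefix)))
    return observations


fib = {"10.0.0.0/8": "VR-2"}
old_dp, new_dp = DataPlane("old", fib), DataPlane("new", fib)
obs = migrate_links_async(["link-a", "link-b", "link-c"], old_dp, new_dp, "10.0.0.0/8")
print(all(next_hop == "VR-2" for _, _, next_hop in obs))
```

In every intermediate state, a packet arriving on any link gets the same forwarding decision regardless of which node currently terminates that link, so the order of link moves does not matter.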

Outline
Why is VROOM a good idea?
What are the challenges?
How does VROOM work?
Is VROOM practical?
– Prototype system
– Performance evaluation
Where to migrate?

Prototype Implementation
PC + OpenVZ
OpenVZ: OS-level virtualization
– Lighter-weight
– Supports live migration
Two prototypes
– Software-based data plane (SD): Linux kernel
– Hardware-based data plane (HD): NetFPGA
NetFPGA: 4-port gigabit Ethernet PCI card with an FPGA
Why two prototypes?
– To validate the data-plane hypervisor design (e.g., migration between SD and HD)

The Out-of-box OpenVZ Approach
Packets are forwarded inside each VE
When a VE is being migrated, packets are dropped

Control and Data Plane Separation
Move the FIBs out of the VEs
shadowd in each VE, “pushing down” route updates
virtd in VE0, as the “data-plane hypervisor”

Dynamic Interface Binding
bindd provides two types of bindings:
– Map substrate interfaces to the right FIB
– Map substrate interfaces to the right virtual interfaces
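
The two bindings can be pictured as two lookup tables keyed by substrate interface. The sketch below is only a mental model: `BindingTable`, `bind`, and `rebind` are hypothetical names, and nothing here reflects how bindd is actually implemented.

```python
class BindingTable:
    """Toy model of the two bindings the slide attributes to bindd."""

    def __init__(self):
        self.iface_to_fib = {}      # substrate interface -> FIB identifier
        self.iface_to_virtual = {}  # substrate interface -> (VR, virtual interface)

    def bind(self, substrate_if, fib_id, vr, virtual_if):
        self.iface_to_fib[substrate_if] = fib_id
        self.iface_to_virtual[substrate_if] = (vr, virtual_if)

    def rebind(self, old_substrate_if, new_substrate_if):
        """Move both bindings to another substrate interface (assumed operation)."""
        self.iface_to_fib[new_substrate_if] = self.iface_to_fib.pop(old_substrate_if)
        self.iface_to_virtual[new_substrate_if] = self.iface_to_virtual.pop(old_substrate_if)


table = BindingTable()
table.bind("eth0", fib_id="fib-vr1", vr="VR-1", virtual_if="veth0")
table.rebind("eth0", "eth3")
print(table.iface_to_fib)
print(table.iface_to_virtual)
```

Because the virtual router only sees `veth0`, the substrate interface underneath can change without the router's configuration changing.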

Putting It All Together: Realizing Migration
1. The migration program notifies shadowd about the completion of the control-plane migration
2. shadowd requests zebra to resend all the routes, and pushes them down to virtd
3. virtd installs the routes into the new FIB, while continuing to update the old FIB
4. virtd notifies the migration program to start link migration after it finishes populating the new FIB
5. After link migration is completed, the migration program notifies virtd to stop updating the old FIB
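
The five steps above can be traced with a toy model. It borrows the daemon names from the slides, but the classes, methods, and the dictionary standing in for zebra's routes are all assumptions made for illustration, not the prototype's code.

```python
class Virtd:
    """Toy data-plane hypervisor holding the old and the new FIB."""

    def __init__(self, old_fib):
        self.old_fib = dict(old_fib)
        self.new_fib = {}
        self.update_old = True       # step 3: keep the old FIB current
        self.cloning_done = False

    def install(self, prefix, next_hop):
        self.new_fib[prefix] = next_hop
        if self.update_old:
            self.old_fib[prefix] = next_hop

    def finish_cloning(self):        # step 4: new FIB populated, links may move
        self.cloning_done = True

    def stop_updating_old(self):     # step 5: link migration completed
        self.update_old = False


class Shadowd:
    """Toy stand-in for shadowd: replays the routes and pushes them to virtd."""

    def __init__(self, rib, virtd):
        self.rib = rib               # routes the routing daemon would resend
        self.virtd = virtd

    def on_control_plane_migrated(self):   # steps 1 and 2
        for prefix, next_hop in self.rib.items():
            self.virtd.install(prefix, next_hop)
        self.virtd.finish_cloning()

    def push_update(self, prefix, next_hop):
        self.rib[prefix] = next_hop
        self.virtd.install(prefix, next_hop)


rib = {"10.0.0.0/8": "if0", "192.168.0.0/16": "if1"}
virtd = Virtd(old_fib=rib)
shadowd = Shadowd(dict(rib), virtd)
shadowd.on_control_plane_migrated()
shadowd.push_update("172.16.0.0/12", "if1")   # arrives before links have moved
virtd.stop_updating_old()
shadowd.push_update("10.0.0.0/8", "if1")      # arrives after link migration
print(virtd.new_fib == shadowd.rib)           # True
print(virtd.old_fib["10.0.0.0/8"])            # if0
```

The update that arrives before link migration reaches both FIBs, while the one that arrives afterwards reaches only the new FIB.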

Evaluation
Answer three questions
– Performance of individual migration steps?
– Impact on data traffic?
– Impact on routing protocol?
Experiments on Emulab

Performance of Migration Steps
Memory copy time
– With different numbers of routes (dump file sizes)

Performance of Migration Steps
FIB population time
– Grows linearly w.r.t. the number of route entries
– Installing a FIB entry into NetFPGA: 7.4 microseconds
– Installing a FIB entry into Linux kernel: 1.94 milliseconds
FIB update time: time for virtd to install entries to FIB
Total time: FIB update time + time for shadowd to send routes to virtd

Data Plane Impact
The diamond testbed
64-byte UDP packets, round-trip traffic

Data Plane Impact
HD router with separate migration bandwidth
– No delay increase or packet loss
SD router with separate migration bandwidth
– Up to 3.7% delay increase at 5k packets/s
– Less than 0.4% delay increase at 25k packets/s
[Figure: SD, 5k packets/s]

The Importance of Separate Migration Bandwidth
The dumbbell testbed
250k routes in the RIB

Separate Migration Bandwidth is Important
Throughput of the migration traffic

Separate Migration Bandwidth is Important
Delay increase of the data traffic

Separate Migration Bandwidth is Important
Loss rate of the data traffic

Control Plane Impact
The Abilene testbed
Assume a backbone running MPLS
VR5 configured as
– Core router (running OSPF only)
– Edge router (running OSPF + BGP)

Core Router Migration
No events during migration
– Average control plane downtime: seconds ( seconds in 10 runs)
– Support 1-second OSPF hello-interval (with 4-second dead-interval)
– Miss at most one hello message

Core Router Migration
Events happen during migration
– Introducing events (LSA) by flapping link VR2-VR3
– Miss at most one LSA
– Get retransmission 5 seconds later (the default LSA retransmission-interval)
– Can use smaller LSA retransmission-interval (e.g., 1 second)

Edge Router Migration
255k BGP routes + OSPF
Dump file size grows from 3.2MB to 76.0MB
Average control plane downtime: seconds ( seconds in 10 runs)
Support 2-second OSPF hello-interval (with 8-second dead-interval)
BGP sessions stay up
In practice, ISPs often use the default values
– 10-second hello-interval
– 40-second dead-interval
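
The hello/dead-interval pairs quoted on the core and edge router slides bound how long the control plane may be frozen before a neighbor tears down the adjacency. A rough sketch, assuming the last hello left just under one hello-interval before the freeze and the next one is sent immediately on resume (the function name is illustrative):

```python
def max_tolerable_downtime(hello_interval_s, dead_interval_s):
    """Longest freeze that still keeps the silence under the dead-interval.

    Worst-case silence seen by a neighbor is downtime + hello_interval,
    so the downtime must stay below dead_interval - hello_interval.
    """
    return dead_interval_s - hello_interval_s


# (hello, dead) pairs taken from the slides: core test, edge test, ISP defaults.
for hello, dead in [(1, 4), (2, 8), (10, 40)]:
    limit = max_tolerable_downtime(hello, dead)
    print(f"hello={hello}s dead={dead}s -> downtime must stay under {limit}s")
```

With the default 10-second/40-second timers the margin is far larger than with the aggressive timers used in the tests.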

Outline
Why is VROOM a good idea?
What are the challenges?
How does VROOM work?
Is VROOM practical?
Where to migrate?

Deciding Where To Migrate
Physical constraints
– Latency
  E.g., NYC to Washington D.C.: 2 msec
– Link capacity
  Enough remaining capacity for extra traffic
– Platform compatibility
  Routers from different vendors
– Router capability
  E.g., number of access control lists (ACLs) supported
Good news: these constraints limit the search space

Two Optimization Problems
For planned maintenance/service deployment
– Minimize path stretch
– With constraints on link capacity, platform compatibility, router capability, etc.
For power savings
– Maximize power savings
  With different regional electricity prices
– With constraints on path stretch, link capacity, etc.
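
For a single virtual router, the first problem reduces to filtering candidate physical routers by the constraints and picking the one with the smallest path stretch. The sketch below is a simplified stand-in for the talk's formulation, not the formulation itself; `Candidate`, its fields, and all sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    path_stretch_ms: float       # extra latency if the VR moves here
    spare_capacity_gbps: float
    platform: str
    max_acls: int


def choose_target(candidates, needed_gbps, platform, needed_acls):
    """Return the feasible candidate with minimum path stretch, or None."""
    feasible = [
        c for c in candidates
        if c.spare_capacity_gbps >= needed_gbps
        and c.platform == platform
        and c.max_acls >= needed_acls
    ]
    return min(feasible, key=lambda c: c.path_stretch_ms, default=None)


candidates = [
    Candidate("PR-B", 2.0, 10.0, "vendor-x", 500),
    Candidate("PR-C", 0.5, 1.0, "vendor-x", 500),    # too little spare capacity
    Candidate("PR-D", 1.0, 10.0, "vendor-y", 500),   # incompatible platform
]
best = choose_target(candidates, needed_gbps=5.0, platform="vendor-x", needed_acls=200)
print(best.name)   # PR-B
```

The constraints prune two of the three candidates here, which is the sense in which they limit the search space.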

Conclusions
VROOM offers a useful network-management primitive
– Separates the tight coupling between physical and logical
– Simplifies network management, enables new applications
Live router migration with minimal disruption
– Data-plane hypervisor enables
  Data-plane cloning
  Remote control plane
  Double data plane and asynchronous link migration
– No data-plane disruption
– No visible control-plane disruption