Migrating and Grafting Routers to Accommodate Change Eric Keller Princeton University Jennifer Rexford, Jacobus van der Merwe, Yi Wang, and Brian Biskeborn.



Abstract: The complexity of network management is widely recognized as one of the biggest challenges facing the Internet today. Network operators are under tremendous pressure to make their networks highly reliable to avoid service disruptions. Yet, operators often need to change the network to upgrade faulty equipment, deploy new services, and install new routers. Unfortunately, changes cause disruptions, forcing a trade-off between the benefit of the change and the disruption it will cause. We argue that many network-management problems stem from the same root causes: the need to maintain consistency between the physical and logical configuration of the routers, and the static coupling of router state and functionality to specific router instances. Hence, we propose two new network-management primitives where (i) (virtual) routers are allowed to freely move from one physical router to another, and (ii) parts of a router can be seamlessly removed from one router and merged into another without any disruption. In addition to simplifying existing network-management tasks like planned maintenance and service deployment, these primitives can also help tackle emerging challenges such as reducing energy consumption and can even be applied to traffic management. In this talk I will present the design and implementation of our modified router to incorporate these two primitives.

Dealing with Change
Networks need to be highly reliable
– To avoid service disruptions
Operators need to deal with change
– Install, maintain, upgrade, or decommission equipment
– Deploy new services
But… change causes disruption
– Forcing a tradeoff
Migration and Grafting
– Enabling operators to make changes
– With no (minimal) disruption

Shutting Down a Router (today)
How a route is propagated
[Figure: topology of routers A through G. E's /8 route propagates with paths (E), (D, E), (C, D, E), (F, G, D, E), and (A, C, D, E).]

Shutting Down a Router (today)
Neighbors detect router down
Choose new best route (if available)
Send out updates
[Figure: same topology after the shutdown; the /8 route now uses path (A, F, G, D, E).]
Downtime best case: settle on new path (seconds)
Downtime worst case: wait for router to be up (minutes)
Both cases: lots of updates propagated

Moving a Link (today)
Reconfigure D, E
Remove link
[Figure: topology of routers A through G.]

Moving a Link (today)
No route to E
Withdraw
[Figure: topology of routers A through G.]

Moving a Link (today)
Add link
Configure E, G
[Figure: E's /8 route re-announced with paths (E) and (G, E).]
Downtime best case: settle on new path (seconds)
Downtime worst case: wait for link to be up (minutes)
Both cases: lots of updates propagated

Tradeoff
Benefit of the change vs. amount of disruption

Planned Maintenance
Shut down router to…
* Replace power supply
* Upgrade to new model
Unavoidable, so operators will do it

Power Savings
Shut down router to…
* Save power during times of lower traffic
Not done today because of the disruption

Customer Requests a Feature
Network has mixture of routers from different vendors
* Rehome customer to router with needed feature
Unavoidable (customer requested), so operators will do it

Traffic Management
Typical traffic engineering:
* Adjust routing protocol parameters based on traffic
[Figure: congested link]

Traffic Management
Instead…
* Rehome customer to change traffic matrix
Not done today because of the disruption

Why is Change so Hard?
Root cause is the monolithic view of a router (hardware, software, and links as one entity)
– Revisit the design to make dealing with change easier
Goals:
Routing and forwarding should not be disrupted
– Data packets are not dropped
– Routing protocol adjacencies do not go down
– All route announcements are received
Change should be transparent
– Neighboring routers/operators should not be involved
– Redesign the routers, not the protocols

Network Management Primitives
Virtual router migration
– To break the routing software free from the physical device it is running on
Router grafting
– To break the links/sessions free from the routing software instance currently handling them

VROOM: Virtual Routers on the Move [SIGCOMM 2008]

The Two Notions of “Router”
The IP-layer logical functionality, and the physical equipment
[Figure: logical (IP layer) view and physical view]

The Tight Coupling of Physical & Logical
Root of many network-management challenges (and “point solutions”)
[Figure: logical (IP layer) view and physical view]

VROOM: Breaking the Coupling
Re-mapping the logical node to another physical node
[Figure: logical (IP layer) view and physical view]
VROOM enables this re-mapping of logical to physical through virtual router migration.

Enabling Technology: Virtualization
Routers becoming virtual
[Figure: control plane, data plane, switching fabric]

Case 1: Planned Maintenance
NO reconfiguration of VRs, NO reconvergence
[Figure: physical routers A and B, virtual router VR-1]


Case 2: Power Savings
Electricity bills: hundreds of millions of dollars per year

Case 2: Power Savings
Contract and expand the physical network according to the traffic volume


Virtual Router Migration: the Challenges
1. Migrate an entire virtual router instance
– All control plane & data plane processes / states
2. Minimize disruption
– Data plane: millions of packets/second on a 10Gbps link
– Control plane: less strict (with routing message retransmission)
3. Link migration

VROOM Architecture
Dynamic Interface Binding
Data-Plane Hypervisor

VROOM’s Migration Process
Key idea: separate the migration of control and data planes
1. Migrate the control plane
2. Clone the data plane
3. Migrate the links
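The ordering of the three steps can be sketched in a few lines of Python. This is only an illustrative model, not the VROOM implementation: the `VirtualRouter` class, the `migrate` function, and the final removal of the old data plane are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualRouter:
    routes: dict                 # prefix -> next hop, owned by the control plane
    control_plane_host: str      # physical router running the control plane
    data_planes: dict = field(default_factory=dict)  # physical router -> forwarding table
    links: dict = field(default_factory=dict)        # link name -> physical router


def migrate(vr, old, new):
    """Move vr from physical router `old` to `new`; return the ordered step log."""
    log = []

    # Step 1: migrate the control plane. The old data plane keeps forwarding.
    vr.control_plane_host = new
    log.append("control plane migrated")

    # Step 2: clone the data plane by repopulating it from the control plane's routes.
    vr.data_planes[new] = dict(vr.routes)
    log.append("data plane cloned")

    # Step 3: both data planes can forward now, so links move one at a time.
    for link in sorted(vr.links):
        if vr.links[link] == old:
            vr.links[link] = new
            log.append(f"link {link} migrated")

    # Assumption: once no link uses the old data plane, it can be removed.
    del vr.data_planes[old]
    log.append("old data plane removed")
    return log
```

Because the data plane is rebuilt from the routes rather than copied, the same sequence works even when the two physical routers use different data-plane implementations.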

Control-Plane Migration
Leverage virtual server migration techniques
Router image
– Binaries, configuration files, running processes, etc.
[Figure: control plane (CP) and data plane (DP) on physical routers A and B]

Data-Plane Cloning
Clone the data plane by repopulation
– Enables traffic to be forwarded during migration
– Enables migration across different data planes
[Figure: CP, DP-old, and DP-new on physical routers A and B]

Remote Control Plane
Data-plane cloning takes time
– Installing 250k routes takes over 20 seconds*
The control plane and the old data plane need to be kept “online”
Solution: redirect routing messages through tunnels
[Figure: CP, DP-old, and DP-new on physical routers A and B]
*: P. Francois et al., Achieving sub-second IGP convergence in large IP networks, ACM SIGCOMM CCR, no. 3, 2005.

Double Data Planes
At the end of data-plane cloning, both data planes are ready to forward traffic
[Figure: CP, DP-old, DP-new]

Asynchronous Link Migration
With the double data planes, links can be migrated independently
[Figure: CP, DP-old, and DP-new on routers A and B]

Prototype: Quagga + OpenVZ
[Figure: old router and new router]

Evaluation
Performance of individual migration steps
Impact on data traffic
Impact on routing protocols
Experiments on Emulab

Impact on Data Traffic
The diamond testbed
[Figure: nodes n0, n1, n2, n3 and the virtual router VR]
No delay increase or packet loss

Impact on Routing Protocols
The Abilene-topology testbed

Edge Router Migration: OSPF + BGP
Average control-plane downtime: 3.56 seconds
OSPF and BGP adjacencies stay up
At most 1 missed advertisement retransmitted
Default timer values
– OSPF hello interval: 10 seconds
– OSPF RouterDeadInterval: 4x hello interval
– OSPF retransmission interval: 5 seconds
– BGP keep-alive interval: 60 seconds
– BGP hold time interval: 3x keep-alive interval
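A quick arithmetic check shows why the adjacencies stay up with these defaults. The snippet below is a rough sketch: the timer values and the 3.56-second downtime come from the slide, while the helper names and the "outage must end before the timer expires" model are simplifying assumptions.

```python
# Default timer values from the slide, in seconds.
OSPF_HELLO = 10
OSPF_ROUTER_DEAD = 4 * OSPF_HELLO        # 40 s
OSPF_RETRANSMIT = 5
BGP_KEEPALIVE = 60
BGP_HOLD = 3 * BGP_KEEPALIVE             # 180 s
CONTROL_PLANE_DOWNTIME = 3.56            # measured average from the slide


def adjacency_survives(downtime, expiry):
    """Simplified model: the adjacency stays up if the outage ends before the timer expires."""
    return downtime < expiry


def margin(downtime, expiry):
    """How many times longer the timer is than the outage."""
    return expiry / downtime
```

Under this model the OSPF dead interval is roughly 11 times the measured downtime and the BGP hold time roughly 50 times, and the downtime is also shorter than a single OSPF retransmission interval.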

VROOM Summary
Simple abstraction
No modifications to router software (other than virtualization)
No impact on data traffic
No visible impact on routing protocols

Router Grafting [NSDI 2010]

Recall: Moving a single session (today)
1) Reconfigure old router, remove old link
2) Add new link, configure new router
3) Establish new BGP session (exchange routes)
[Figure: logical (IP layer) and physical views; delete peer, add peer, BGP updates]
Downtime (minutes)

Router Grafting: Breaking up the router
[Figure: logical (IP layer) and physical views; send state, move link]
Router Grafting enables breaking apart a router (splitting/merging).

Grafting Needs Router Modification
Goals (in addition to being transparent and causing no disruption):
Minimal code changes
– Increase likelihood of adoption by vendors
Interoperability (vendors, models, versions)
– Increase usefulness
– Means we can’t do memory copying (need an export format independent of implementation)

Challenge: Protocol Layers
[Figure: two BGP/TCP/IP stacks joined by a physical link. IP: send packets. TCP: reliable stream. BGP: exchange routes. Each side runs Configure neighbor(…).]


Link and IP
Links use Programmable Transport Network
IP address has local meaning only
– Moves with session


TCP
Keeping it completely transparent
– Sequence numbers
– Packet input queue (packets that were not read)
– Packet output queue (packets that were not ack’d yet)
[Figure: application calling send() and recv(); OS exchanging TCP(data, seq, …) and ack]
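The three pieces of TCP state listed above can be pictured as a small record that is exported on the old router and imported on the new one. The sketch below is hypothetical: `TcpSessionState`, `export_state`, and `import_state` are illustrative names, JSON is an arbitrary choice of implementation-independent format, and none of this reflects the SockMi interface used in the prototype.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TcpSessionState:
    send_seq: int        # next sequence number to send
    recv_seq: int        # next sequence number expected from the peer
    input_queue: list    # hex-encoded segments received but not yet read by the application
    output_queue: list   # hex-encoded segments sent but not yet acknowledged


def export_state(state):
    """Serialize the session state on the migrate-from router."""
    return json.dumps(asdict(state), sort_keys=True)


def import_state(blob):
    """Rebuild the session state on the migrate-to router."""
    return TcpSessionState(**json.loads(blob))
```

If the imported state matches the exported state exactly, the remote end-point sees the same sequence numbers and pending data as before, which is what keeps the move transparent.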


BGP: Not just state transfer
[Figure: migrating a session among AS100, AS200, AS300, and AS400]
Need to re-run decision processes

BGP: What (not) to Migrate
Requirements
– Want data packets to be delivered
– Want routing adjacencies to remain up
Need
– Configuration
– Routing information
Do not need
– State machine
– Statistics
– Timers

BGP: Configuration
Router sessions configured via command line (file)
– Policies, details about neighbor
– Stored in internal data structures
Extract relevant commands
– Apply to new router
– Translated if necessary
Need to modify software
– Start ‘inactive’ (waiting for migrate in)
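Extracting the relevant commands can be as simple as filtering the configuration for the lines that mention the migrating peer. The sketch below assumes a Quagga-style text configuration with `neighbor <address> ...` lines; the helper name `extract_neighbor_commands`, the sample addresses, and the AS numbers are made up for illustration, and any translation to another vendor's syntax is out of scope.

```python
def extract_neighbor_commands(config_text, peer):
    """Return the 'neighbor <peer> ...' lines from a BGP configuration."""
    commands = []
    for line in config_text.splitlines():
        tokens = line.split()
        # Compare whole tokens so 192.0.2.1 does not also match 192.0.2.10.
        if len(tokens) >= 2 and tokens[0] == "neighbor" and tokens[1] == peer:
            commands.append(" ".join(tokens))
    return commands


SAMPLE_CONFIG = """\
router bgp 65000
 neighbor 192.0.2.1 remote-as 65001
 neighbor 192.0.2.1 route-map IMPORT in
 neighbor 192.0.2.10 remote-as 65003
 neighbor 198.51.100.7 remote-as 65002
"""
```

The returned commands would then be applied to the new router, which starts the session in the ‘inactive’ state until the rest of the state arrives.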

BGP: Route Information
Routes from neighbor
– Needed so neighbor doesn’t need to re-announce
– B has different routes than A
– Need to rerun decision process
[Figure: the neighbor’s routes move from A to B; B stores them as RIB-in and propagates them if best]
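Why the decision process must be rerun on B can be seen with a toy model. The sketch below is an assumption-laden simplification: it picks the best route by shortest AS path only (real BGP applies several more criteria), and `best_routes` and `graft_in` are hypothetical names, not functions from Quagga.

```python
def best_routes(rib_in):
    """rib_in maps neighbor -> {prefix: AS path tuple}.

    Returns prefix -> (neighbor, AS path), preferring the shortest AS path.
    Ties go to the alphabetically first neighbor.
    """
    best = {}
    for neighbor in sorted(rib_in):
        for prefix, path in rib_in[neighbor].items():
            current = best.get(prefix)
            if current is None or len(path) < len(current[1]):
                best[prefix] = (neighbor, path)
    return best


def graft_in(rib_in, neighbor, routes):
    """Add the migrated neighbor's routes to B's RIB-in and rerun the decision."""
    merged = {n: dict(r) for n, r in rib_in.items()}  # leave the input untouched
    merged[neighbor] = dict(routes)
    return merged, best_routes(merged)
```

B already holds its own candidate routes, so the imported routes win for some prefixes and lose for others; only the winners need to be propagated.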

BGP: Route Information
Routes to neighbor
– A’s best routes sent to neighbor
– After migration, topology changes
– Need to diff what A sent with what B would have sent
[Figure: A’s RIB-out for the neighbor moves to B; B propagates its own best routes where it would have sent a different route]
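The diff itself is a straightforward comparison of two prefix-to-route tables. The sketch below is illustrative only: `diff_rib_out` is a hypothetical helper, and routes are reduced to AS-path tuples.

```python
def diff_rib_out(sent_by_a, would_send_by_b):
    """Compare what A already sent the neighbor with what B would send.

    Both arguments map prefix -> AS path tuple.
    Returns (announce, withdraw): routes B must announce because they are new
    or different, and prefixes B must withdraw because it has no route for them.
    """
    announce = {
        prefix: path
        for prefix, path in would_send_by_b.items()
        if sent_by_a.get(prefix) != path
    }
    withdraw = sorted(prefix for prefix in sent_by_a if prefix not in would_send_by_b)
    return announce, withdraw
```

When A and B would have chosen the same routes, both results are empty and the neighbor receives no updates at all.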

BGP: Special Case - Cluster Router
[Figure: blades and line cards A, B, C, D connected by a switching fabric]
* Links “migrated” internally
* Topology doesn’t change (no need to run decision process)

Prototype
Added grafting into Quagga
– RIB and decision process well separated
Graft daemon to control process
SockMi for TCP migration
[Figure: migrate-from and migrate-to routers, each running modified Quagga, a graft daemon, and SockMi.ko on a Linux kernel; click-based link migration (click.ko) on a Linux kernel; remote end-point router running Quagga on a Linux kernel]

Evaluation
Impact on data traffic
Impact on routing protocols
Overhead on rest of the network

Impact on Routing Protocols
CPU utilization affected by time to complete
– Includes export, transmit, import, lookup, and decision
– 6.8 s between routers
– 4.4 s between blades
– Further optimizations possible
Protocols affected by unresponsiveness
– Set old router to “inactive”, migrate link, migrate TCP, set new router to “active”
– A few milliseconds

Overhead on rest of network
How much communication/work on other routers?
– Function of how routers are configured
– e.g., would A and B choose the same route? (analysis is ongoing work)
– Expected case: only minimal communication needed
[Figure: routers A and B; updates sent as a result of migration]

Router Grafting Summary
Enables moving a single link/session with…
– Minimal code change
– No impact on data traffic
– No visible impact on routing protocol adjacencies
– Minimal overhead on rest of network

Migrating and Grafting Together
Router Grafting can do everything VROOM can
– By migrating each link individually
But VROOM is more efficient when…
– Want to move all sessions
– Moving between compatible routers (same virtualization technology)
– Want to preserve “router” semantics
VROOM requires no code changes
– Can run a grafting router inside of a virtual machine (e.g., VROOM + Grafting)
– Each useful for different tasks

Conclusion
To enable change without disruption
– Need to revisit monolithic view of a router
Decouple the software from the hardware
– VROOM
Decouple the links from the router software
– Router Grafting
Future Work: Hosted Virtual Networks
– Decouple who runs the routing software from who owns/maintains the routing equipment

Questions?