Separating Routing From Routers Jennifer Rexford Princeton University

Today’s IP Routers
Management plane
– Construct network-wide view
– Configure the routers
Control plane
– Track topology changes
– Compute routes and install forwarding tables
Data plane
– Forward, filter, buffer, mark, and rate-limit packets
– Collect traffic statistics
(Figure: routers each running OSPF and BGP, with the FIB, configuration, and monitoring all controlled by the vendor)

Death to the Control Plane!
Faster pace of innovation
– Remove dependence on vendors and the IETF
Simpler management systems
– No need to “invert” control-plane operations
Easier interoperability between vendors
– Compatibility necessary only in “wire” protocols
Simpler, cheaper routers
– Little or no software on the routers

We Can Remove the Control Plane!
Control software can run elsewhere
– The control plane is just software anyway
State and computation are reasonable
– E.g., 300K prefixes, a few million changes/day
System overheads can be amortized
– Mostly redundant data across routers
Easier access to other information
– Layer-2 risks, host measurements, business goals, …
Some control could move to end hosts

Outline
4D architecture
– Decision, dissemination, discovery, & data planes
Routing Control Platform (RCP)
– Interdomain routing without routers
Simple failure resilient routing
– Intradomain routing without routers
Conclusion and ongoing work

Clean-Slate 4D Architecture

Three Goals of 4D Architecture
Network-level objectives
– Configure the network, not the routers
– E.g., minimize the maximum link utilization
– E.g., connectivity under all layer-two failures
Network-wide views
– Complete visibility to drive decision-making
– Traffic matrix, network topology, equipment
Direct control
– Direct, sole control over data-plane configuration
– Packet forwarding, filtering, marking, buffering…

The Four Planes
Decision: all management and control
Dissemination: communication to/from the routers
Discovery: topology and traffic monitoring
Data: packet handling
(Figure: the four planes stacked above the routers)

Practical Challenges
Scalability
– Decision elements responsible for many routers
Response time
– Delays between decision elements and routers
Reliability
– Must have multiple decision elements and failover
Security
– Network vulnerable to attacks on decision elements
Interoperability
– Legacy routers and neighboring domains

Routing Control Platform (RCP)
Separating Interdomain Routing From Routers

RCP: Routing Control Platform
Compute interdomain routes for the routers
– Input: BGP-learned routes
– Output: forwarding-table entries for each router
Backwards compatibility with legacy routers
– RCP speaks to routers using the BGP protocol
– Routers still run an intradomain routing protocol
(Figure: RCP controlling the routers of an Autonomous System)

Scalable Implementation
Eliminate redundancy
– Store a single copy of each BGP-learned route
Accelerate lookups
– Maintain indices to identify affected routers
Avoid recomputation
– Compute routes once for a group of related routers
Handle only BGP routing
– Leave intradomain routing to the routers
An extensible, scalable, “smart” route reflector
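The redundancy-elimination idea above can be sketched in a few lines: keep one shared copy of each BGP-learned route, plus an index from prefix to the routers whose decisions it can affect. This is an illustrative sketch, not the RCP implementation; all class and field names are hypothetical.

```python
class RouteStore:
    """Sketch of an RCP-style store: one copy per BGP-learned route,
    plus an index from prefix to the routers that route affects."""

    def __init__(self):
        self.routes = {}     # (prefix, neighbor) -> route attributes (shared copy)
        self.by_prefix = {}  # prefix -> set of routers needing recomputation

    def learn(self, prefix, neighbor, attrs, affected_routers):
        # Store the route once, no matter how many routers use it
        self.routes[(prefix, neighbor)] = attrs
        self.by_prefix.setdefault(prefix, set()).update(affected_routers)

    def routers_to_update(self, prefix):
        # On a route change, only recompute for routers the prefix affects
        return self.by_prefix.get(prefix, set())

store = RouteStore()
store.learn("10.0.0.0/8", "peerA", {"as_path": [7018, 701]}, {"r1", "r2"})
store.learn("10.0.0.0/8", "peerB", {"as_path": [3356]}, {"r2", "r3"})
```

A change to `10.0.0.0/8` then triggers recomputation only for routers r1, r2, and r3, rather than for every router in the AS.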

Runs on a Single High-End PC
Home-grown implementation on top of Linux
– Experiments on a 3.2 GHz P4 with 4 GB memory
Computing routes for all AT&T routers
– Grouping routers in the same point-of-presence
Replaying all routing-protocol messages
– BGP and OSPF logs, for 203,000 IP prefixes
Experimental results
– Memory footprint: 2.5 GB
– Processing time: msec

Reliability
Simple replication
– Single PC can serve as an RCP
– So, just run multiple such PCs
Run each replica independently
– Separate BGP update feeds and router sessions
No need for replica consistency protocol
– Routing converges to a single stable solution
– All replicas reach the same conclusion
Robust to network partitions
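The reason no consistency protocol is needed is that route selection is a deterministic function of the input routes, so independent replicas fed the same routes reach the same answer. A minimal sketch of such a deterministic, BGP-style ranking (the attributes and values are illustrative, not the full BGP decision process):

```python
def best_route(candidates):
    """Deterministic BGP-style selection: highest local-pref first,
    then shortest AS path, then lowest router ID as the tie-break.
    Any replica given the same candidates picks the same route."""
    return min(
        candidates,
        key=lambda r: (-r["local_pref"], len(r["as_path"]), r["router_id"]),
    )

routes = [
    {"local_pref": 100, "as_path": [1, 2], "router_id": "10.0.0.2"},
    {"local_pref": 100, "as_path": [3],    "router_id": "10.0.0.9"},
]

# Two independent replicas converge on the same answer,
# regardless of the order updates arrived in:
assert best_route(routes) == best_route(list(reversed(routes)))
```

Because every tie is broken deterministically, replicas never need to coordinate: they only need the same inputs.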

Example Service: DoS Blackholing
Filtering attack traffic
– Measurement system detects an attack
– Identify entry point and victim of attack
– Drop offending traffic at the entry point
(Figure: RCP installs a null route at the entry point of the DoS attack)

Example Service: Maintenance Dry-out
Planned maintenance on an edge router
– Drain traffic off of an edge router
– Before bringing it down for maintenance
(Figure: destination d reachable via egress 1 and egress 2; RCP tells routers to use egress 2 while egress 1 is drained)

Example Service: Egress Selection
Customer-controlled egress selection
– Multiple ways to reach the same destination
– Giving customers control over the decision
(Figure: customer sites reaching data center 1 and data center 2 via egress 1 and egress 2; instead of hot-potato routing, RCP directs traffic to egress 1)

Example Service: Better BGP Security
Enhanced interdomain routing security
– Anomaly detection to detect bogus routes
– Prefer “familiar” routes over unfamiliar
(Figure: a suspicious route to destination d appears at egress 1; RCP keeps using the familiar route via egress 2)
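The "prefer familiar routes" heuristic can be sketched with a simple history counter: origins seen many times in the past outrank ones that suddenly appear. This is an illustrative toy, not the anomaly detector the slide refers to; the AS numbers are examples.

```python
from collections import Counter

history = Counter()  # (prefix, origin AS) -> times this origin was seen

def record(prefix, origin_as):
    """Update history each time a route for the prefix is observed."""
    history[(prefix, origin_as)] += 1

def prefer_familiar(prefix, candidate_origins):
    """Rank candidate origins so that frequently-seen (familiar) ones
    beat rarely-seen, possibly bogus ones."""
    return max(candidate_origins, key=lambda o: history[(prefix, o)])

# A long-standing legitimate origin vs. a suddenly-appearing one
for _ in range(30):
    record("10.0.0.0/8", 7018)
record("10.0.0.0/8", 666)

chosen = prefer_familiar("10.0.0.0/8", [666, 7018])
```

Here the RCP would keep steering traffic toward the familiar origin (AS 7018) even though a competing announcement exists.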

Example Service: Customized Routes
Different routes have different properties
– Security, performance, cost, stay in U.S., …
Different preferences for different customers
– Offer customized route selection as a service
(Figure: different routes selected for a bank, a VoIP provider, and a school)

Simple Failure Resilient Routing
Removing Intradomain Routing From Routers

Refactoring Intradomain Routing
Traffic management
– Avoid congestion
– React quickly to failures
Why not use MPLS fast reroute?
– Complicated software
– State and signaling for backup paths
– Suboptimal routing after failure
Alternative
– Multipath routing with flexible splitting
– Configuration that is robust to failures

Semi-Static Multipath Routing
Minimal data-plane functionality
– Static forwarding tables
– Splitting traffic over multiple paths
– Path failure detection (e.g., BFD)
– No control-plane communication among routers
Offline control by decision elements
– Knows topology, traffic demands, and potential layer-two failures
– Computes paths and splitting ratios
– No real-time interaction with the network
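One common way to split traffic over multiple fixed paths in proportion to static ratios, while keeping each flow on a single path, is flow hashing. A minimal sketch (the slides do not specify the splitting mechanism, so this is one plausible realization):

```python
import hashlib

def pick_path(flow_id, paths, weights):
    """Hash a flow onto one of several fixed paths in proportion to
    static splitting weights. The hash is deterministic, so all
    packets of a flow take the same path (no reordering)."""
    h = int(hashlib.sha256(flow_id.encode()).hexdigest(), 16)
    point = (h % 10_000) / 10_000 * sum(weights)
    acc = 0.0
    for path, weight in zip(paths, weights):
        acc += weight
        if point < acc:
            return path
    return paths[-1]  # guard against floating-point rounding

# Same flow always maps to the same path
path = pick_path("flow-42", ["p1", "p2", "p3"], [0.4, 0.4, 0.2])
assert path == pick_path("flow-42", ["p1", "p2", "p3"], [0.4, 0.4, 0.2])
```

Over many flows, roughly 40% land on p1, 40% on p2, and 20% on p3, matching the configured splitting ratios without any per-packet state.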

Simple Failure Resilient Routing
Paths: MPLS LSPs or OpenFlow rules
(Figure: inputs — topology design, list of shared risks, traffic demands; output — fixed paths from s to t with splitting ratios)

Link Failure on One of the Paths
(Figure: a link cut on one of the fixed paths from s to t; the paths and splitting ratios are unchanged)

Path-Level Failure Detection
(Figure: source s probes its fixed paths and detects the cut link)

New Path-Level Splitting Fractions
(Figure: s shifts its splitting ratios onto the surviving paths)

Goal #1: Failure-Resilient Paths
Working path needed per failure scenario
– Including shared risks at layer 2
Example of failure states: {e1}, {e2}, {e3}, {e4}, {e5}, {e1, e2}, {e1, e5}
(Figure: routers R1 and R2 connected by links e1 through e5)
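The resilience requirement — at least one working path in every anticipated failure state — is easy to check once paths are expressed as sets of links. A small sketch using the failure states listed above (the paths themselves are made up for illustration):

```python
def surviving_paths(paths, failed_links):
    """Return the paths (given as lists of links) that avoid
    every link in the failure state."""
    return [p for p in paths if not (set(p) & set(failed_links))]

# Three hypothetical fixed paths, as lists of link names
paths = [["e1", "e3"], ["e2", "e4"], ["e5"]]

# The failure states from the slide, including shared-risk pairs
failure_states = [{"e1"}, {"e2"}, {"e3"}, {"e4"}, {"e5"},
                  {"e1", "e2"}, {"e1", "e5"}]

# Resilience check: every failure state leaves at least one working path
assert all(surviving_paths(paths, s) for s in failure_states)
```

The offline decision element would run exactly this kind of check while choosing paths, rejecting any path set that leaves some failure state with no survivor.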

Goal #2: Minimize Link Loads
Splitting parameters that minimize congestion
– Considering all possible failure scenarios
With links indexed by e, failure states indexed by s, link utilization u_e^s, and congestion cost Φ(u_e^s), minimize the aggregate congestion cost weighted over all failures:
min ∑_s w_s ∑_e Φ(u_e^s), while routing all traffic
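The objective is straightforward to evaluate for a candidate configuration. A sketch that computes ∑_s w_s ∑_e Φ(u_e^s) for given utilizations; the penalty function and the numbers are illustrative (the slides do not specify Φ):

```python
def congestion_cost(states, phi):
    """Aggregate congestion cost: sum over failure states s (weight w_s)
    of the sum over links e of phi(u_e^s)."""
    return sum(w * sum(phi(u) for u in link_utils)
               for w, link_utils in states)

# An increasing penalty that grows sharply near full utilization,
# in the spirit of classic traffic-engineering cost functions
phi = lambda u: u if u < 0.9 else u + 10 * (u - 0.9)

# (weight, per-link utilizations) for each failure state:
# the no-failure state dominates; a rarer failure pushes one link to 0.95
states = [(0.8, [0.5, 0.6]),
          (0.2, [0.95, 0.3])]

cost = congestion_cost(states, phi)  # 0.8*1.1 + 0.2*(1.45 + 0.3) = 1.23
```

The optimizer searches over paths and splitting ratios to drive this weighted cost down, so that even post-failure utilizations stay out of the steep region of Φ.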

A Range of Solutions
Overly simple solutions do not work well
Diminishing returns for adding functionality
(Figure: congestion vs. router capabilities — suboptimal solutions at one end, non-scalable solutions at the other, and a “good performance and practical?” sweet spot in between)

Sweet Spot: State-Dependent Splitting
Edge router functionality
– Observe which paths failed
– Use custom splitting ratios for this scenario

Failure   Splitting ratios (p1, p2, p3)
(none)    0.4, 0.4, 0.2
p2        0.6, 0, 0.4
…         …

Configuration: at most 2^#paths entries
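The configuration above is just a lookup table keyed by the set of failed paths. A sketch of how an edge router might consult it, with a simple renormalizing fallback for scenarios the table omits (the fallback rule is my illustration, not something the slides specify):

```python
# Splitting table keyed by the set of failed paths; at most
# 2**len(paths) entries. Ratios match the slide's example rows.
splitting = {
    frozenset():       (0.4, 0.4, 0.2),   # no failure
    frozenset({"p2"}): (0.6, 0.0, 0.4),   # p2 down
}

PATHS = ("p1", "p2", "p3")

def ratios_for(failed):
    """Look up custom splitting ratios for the observed failure state.
    If the state is not listed, renormalize the no-failure ratios
    over the surviving paths (illustrative fallback)."""
    key = frozenset(failed)
    if key in splitting:
        return splitting[key]
    base = splitting[frozenset()]
    total = sum(r for p, r in zip(PATHS, base) if p not in key)
    return tuple(0.0 if p in key else r / total
                 for p, r in zip(PATHS, base))
```

For example, `ratios_for({"p2"})` returns the configured `(0.6, 0.0, 0.4)`, while an unlisted state like `{"p1", "p2"}` falls back to sending everything on p3.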

Performance Evaluation
AT&T backbone network
– Network topology and layer-2 shared risks
– 954 failure scenarios, with up to 20 links
– Measured traffic demands from Netflow data
Experimental results
– Congestion level is indistinguishable from optimal
– Mean delay (~31 msec) close to OSPF (~28.5 msec)
– Most router pairs needed 5 paths, and at most 10
Computation time
– To optimize both the paths and splitting parameters
– Less than two hours for the AT&T backbone

Deployment Scenarios
Move functionality to end hosts
– Path failure detection and traffic splitting
– Even simpler routers, and finer-grain splitting
Dynamic adaptation to traffic
– Adjust splitting parameters to traffic demands
– E.g., monitor link load to detect congestion
– Allows faster responses to shifts in load
Using OpenFlow and NOX platforms
– OpenFlow: path splitting based on flow rules
– NOX: compute the paths and splitting fractions

Returning to 4D Challenges
Scalability
– Amortize state and computation, or pre-compute routes
Response time
– Decision element near the routers, or make routing static
Reliability
– Simple replication of the decision elements
Security
– Perimeter filters; only routers talk to decision elements
Interoperability
– Use BGP, MPLS, or OpenFlow as the dissemination plane

Conclusions
Today’s routers
– Too complicated
– Too difficult to manage
– Too hard to change
Dumb routers, smart decision elements
– Routers forward packets & collect measurements
– … at the behest of the decision elements
Proofs of concept
– Routing Control Platform for interdomain routing
– Failure-resilient routing for intradomain routing

Ongoing Work: OpenFlow and NOX
Enterprise network monitoring
– Adaptive monitoring of flow aggregates
– IDS, anomaly detection, traffic engineering, …
Server load balancing in data centers
– Multiple servers in a single data center
– Sticky load balancing through switch flow rules
Wide-area replicated services
– Multiple servers in multiple data centers
– Directing client traffic to the service instance