1
Separating Routing From Routers
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
2
Today’s IP Routers
Management plane
–Construct network-wide view
–Configure the routers
Control plane
–Track topology changes
–Compute routes and install forwarding tables
Data plane
–Forward, filter, buffer, mark, and rate-limit packets
–Collect traffic statistics
[Figure: routers running OSPF and BGP control software above a FIB, with configuration and monitoring interfaces; the control plane is controlled by the vendor]
3
Death to the Control Plane!
Faster pace of innovation
–Remove dependence on vendors and the IETF
Simpler management systems
–No need to “invert” control-plane operations
Easier interoperability between vendors
–Compatibility necessary only in “wire” protocols
Simpler, cheaper routers
–Little or no software on the routers
4
We Can Remove the Control Plane!
Control software can run elsewhere
–The control plane is just software anyway
State and computation are reasonable
–E.g., 300K prefixes, a few million changes/day
System overheads can be amortized
–Mostly redundant data across routers
Easier access to other information
–Layer-2 risks, host measurements, business goals, …
Some control could move to end hosts
5
Outline
4D architecture
–Decision, dissemination, discovery, & data planes
Routing Control Platform (RCP)
–Interdomain routing without routers
Simple failure-resilient routing
–Intradomain routing without routers
Conclusion and ongoing work
6
Clean-Slate 4D Architecture
7
Three Goals of the 4D Architecture
Network-level objectives
–Configure the network, not the routers
–E.g., minimize the maximum link utilization
–E.g., connectivity under all layer-two failures
Network-wide views
–Complete visibility to drive decision-making
–Traffic matrix, network topology, equipment
Direct control
–Direct, sole control over data-plane configuration
–Packet forwarding, filtering, marking, buffering, …
8
The Four Planes
Decision: all management and control
Dissemination: communication to/from the routers
Discovery: topology and traffic monitoring
Data: packet handling
[Figure: the four planes layered above the routers]
9
Practical Challenges
Scalability
–Decision elements responsible for many routers
Response time
–Delays between decision elements and routers
Reliability
–Must have multiple decision elements and failover
Security
–Network vulnerable to attacks on decision elements
Interoperability
–Legacy routers and neighboring domains
10
Routing Control Platform (RCP)
Separating Interdomain Routing From Routers
11
RCP: Routing Control Platform
Compute interdomain routes for the routers
–Input: BGP-learned routes
–Output: forwarding-table entries for each router
Backwards compatibility with legacy routers
–RCP speaks to the routers using the BGP protocol
–Routers still run an intradomain routing protocol
[Figure: an RCP computing BGP routes on behalf of the routers in an Autonomous System]
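Concretely, the per-router computation is just the standard BGP decision process applied on each router’s behalf. A minimal sketch, assuming a simple route representation; the class and function names are illustrative, not RCP’s actual code:

```python
# Hypothetical sketch of per-router route selection in the spirit of RCP.
# The ranking follows the standard BGP decision process; the names below
# are illustrative and do not come from the actual RCP implementation.
from dataclasses import dataclass

@dataclass
class BgpRoute:
    prefix: str
    local_pref: int
    as_path: tuple   # sequence of AS numbers
    med: int
    igp_cost: int    # this router's IGP cost to the route's egress point

def best_route(candidates):
    """Higher local-pref wins, then shorter AS path, then lower MED,
    and finally lower IGP cost to the egress ("hot potato")."""
    return min(candidates,
               key=lambda r: (-r.local_pref, len(r.as_path), r.med, r.igp_cost))

def compute_fib(routes_by_prefix):
    """For one router: prefix -> best route, given candidates per prefix."""
    return {p: best_route(c) for p, c in routes_by_prefix.items()}
```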
12
Scalable Implementation
Eliminate redundancy
–Store a single copy of each BGP-learned route
Accelerate lookups
–Maintain indices to identify affected routers
Avoid recomputation
–Compute routes once for a group of related routers
Handle only BGP routing
–Leave intradomain routing to the routers
An extensible, scalable, “smart” route reflector
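A minimal sketch of what “single copy plus indices” could look like in code; the RouteStore layout is an assumption for illustration, not RCP’s actual data structures:

```python
# Assumed design sketch: one copy of each BGP-learned route, plus indices
# that map a prefix to the routers whose decisions depend on it, so an
# update triggers recomputation only where needed.
from collections import defaultdict

class RouteStore:
    def __init__(self):
        self.routes = {}                    # (prefix, neighbor) -> route, one copy each
        self.by_prefix = defaultdict(set)   # prefix -> set of (prefix, neighbor) keys
        self.interested = defaultdict(set)  # prefix -> routers to recompute on change

    def update(self, prefix, neighbor, route):
        key = (prefix, neighbor)
        self.routes[key] = route
        self.by_prefix[prefix].add(key)
        # Only the routers indexed under this prefix need new decisions.
        return self.interested[prefix]

    def withdraw(self, prefix, neighbor):
        key = (prefix, neighbor)
        self.routes.pop(key, None)
        self.by_prefix[prefix].discard(key)
        return self.interested[prefix]
```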
13
Runs on a Single High-End PC
Home-grown implementation on top of Linux
–Experiments on a 3.2 GHz Pentium 4 with 4 GB of memory
Computing routes for all AT&T routers
–Grouping routers in the same point-of-presence
Replaying all routing-protocol messages
–BGP and OSPF logs, for 203,000 IP prefixes
Experimental results
–Memory footprint: 2.5 GB
–Processing time: 0.1-20 msec
14
Reliability
Simple replication
–A single PC can serve as an RCP
–So, just run multiple such PCs
Run each replica independently
–Separate BGP update feeds and router sessions
No need for a replica-consistency protocol
–Routing converges to a single stable solution
–All replicas reach the same conclusion
Robust to network partitions
15
Example Service: DoS Blackholing
Filtering attack traffic
–Measurement system detects an attack
–Identify the entry point and victim of the attack
–Drop the offending traffic at the entry point
[Figure: RCP installs a null route at the entry point of a DoS attack]
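As a sketch of the mechanics (assuming a toy BgpUpdate record and a send_to_router callback, neither of which comes from the talk), the controller can win the entry router’s route selection with a high-local-pref route whose next hop is discarded:

```python
# Hypothetical sketch: build a BGP-style null-route advertisement that an
# RCP-like controller could push to the attack's entry router. A real
# deployment would speak actual BGP over its session to that router.
from dataclasses import dataclass

DISCARD_NEXT_HOP = "192.0.2.1"   # next hop statically routed to a discard interface

@dataclass
class BgpUpdate:
    prefix: str        # victim prefix under attack, e.g. "203.0.113.0/24"
    next_hop: str
    local_pref: int

def blackhole(victim_prefix, entry_router, send_to_router):
    # High local-pref so the null route wins the BGP decision process on
    # the entry router; traffic to the victim is then dropped at the edge.
    update = BgpUpdate(prefix=victim_prefix,
                       next_hop=DISCARD_NEXT_HOP,
                       local_pref=500)
    send_to_router(entry_router, update)
```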
16
Example Service: Maintenance Dry-Out
Planned maintenance on an edge router
–Drain traffic off of the edge router
–Before bringing it down for maintenance
[Figure: for destination d, RCP tells the routers to use egress 2 rather than the egress router scheduled for maintenance (egress 1)]
17
Example Service: Egress Selection
Customer-controlled egress selection
–Multiple ways to reach the same destination
–Giving customers control over the decision
[Figure: customer sites can reach data center 1 or data center 2 via egress 1 or egress 2; hot-potato routing would pick for them, but RCP can direct their traffic to use egress 1]
18
Example Service: Better BGP Security
Enhanced interdomain routing security
–Anomaly detection to detect bogus routes
–Prefer “familiar” routes over unfamiliar ones
[Figure: a suspicious route to destination d arrives via egress 1; RCP directs the routers to keep using the familiar route via egress 2]
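One plausible way (an assumption for illustration, not the talk’s exact algorithm) to encode “prefer familiar routes” is to track previously seen prefix-to-origin-AS pairings and rank candidates so unfamiliar originations lose:

```python
# Assumed heuristic sketch: remember which (prefix, origin AS) pairings
# have been seen before, and prefer routes whose origination is familiar.
import time

class FamiliarityTracker:
    def __init__(self):
        self.seen = {}   # (prefix, origin_as) -> last time this pairing was seen

    def record(self, prefix, origin_as):
        self.seen[(prefix, origin_as)] = time.time()

    def familiar(self, prefix, origin_as):
        return (prefix, origin_as) in self.seen

def pick_route(tracker, prefix, candidates):
    """candidates: routes with an .as_path tuple (origin AS is the last hop).
    Familiar originations win; ties fall back to shorter AS paths."""
    return min(candidates,
               key=lambda r: (not tracker.familiar(prefix, r.as_path[-1]),
                              len(r.as_path)))
```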
19
Example Service: Customized Routes
Different routes have different properties
–Security, performance, cost, staying in the U.S., …
Different preferences for different customers
–Offer customized route selection as a service
[Figure: a bank, a VoIP provider, and a school each receive routes selected to match their own preferences]
20
Simple Failure-Resilient Routing
Removing Intradomain Routing From Routers
21
Refactoring Intradomain Routing
Traffic management
–Avoid congestion
–React quickly to failures
Why not use MPLS fast reroute?
–Complicated software
–State and signaling for backup paths
–Suboptimal routing after the failure
Alternative
–Multipath routing with flexible splitting
–Configuration that is robust to failures
22
Semi-Static Multipath Routing
Minimal data-plane functionality
–Static forwarding tables
–Splitting traffic over multiple paths (see the sketch below)
–Path-failure detection (e.g., BFD)
–No control-plane communication among routers
Offline control by decision elements
–Knows the topology, traffic demands, and potential layer-two failures
–Computes paths and splitting ratios
–No real-time interaction with the network
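A minimal sketch of flow-level splitting under static ratios, assuming hash-based path selection (a common mechanism, not necessarily the one used in the talk); hashing keeps all packets of a flow on the same path:

```python
# Assumed mechanism sketch: map each flow to one of several preconfigured
# paths in proportion to static splitting ratios, using a flow hash.
import hashlib

def pick_path(flow_key, paths, ratios):
    """flow_key: e.g. (src_ip, dst_ip, src_port, dst_port, proto).
    paths: list of path identifiers; ratios: matching weights summing to 1."""
    digest = hashlib.sha256(repr(flow_key).encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    cumulative = 0.0
    for path, ratio in zip(paths, ratios):
        cumulative += ratio
        if point < cumulative:
            return path
    return paths[-1]   # guard against floating-point rounding

# Example: three paths with ratios 0.25 / 0.25 / 0.5.
path = pick_path(("10.0.0.1", "10.0.1.9", 51812, 443, "tcp"),
                 ["p1", "p2", "p3"], [0.25, 0.25, 0.5])
```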
23
Simple Failure-Resilient Routing
[Figure: the decision elements take the topology design, the list of shared risks, and the traffic demands as input, then install fixed paths from s to t with static splitting ratios (e.g., 0.25 and 0.5); the paths are realized as MPLS LSPs or OpenFlow rules]
24
Link Failure on One of the Paths
[Figure: a link cut breaks one of the fixed paths from s to t, while the static splitting ratios (0.25 and 0.5) remain in place]
25
Path-Level Failure Detection
[Figure: s probes its fixed paths and detects that the path crossing the cut link has failed]
26
New Path-Level Splitting Fractions
[Figure: s switches to new splitting fractions (e.g., 0.5 and 0), moving traffic off the failed path onto the surviving ones]
27
Goal #1: Failure-Resilient Paths
A working path is needed in every failure scenario
–Including shared risks at layer 2
Example failure states: {e1}, {e2}, {e3}, {e4}, {e5}, {e1, e2}, {e1, e5}
[Figure: routers R1 and R2 connected through links e1-e5]
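Checking this goal is straightforward once paths are expressed as sets of links; a small sketch, with helper names invented for illustration:

```python
# Sketch of verifying Goal #1: for every anticipated failure state, at
# least one configured path between the router pair must survive.
def surviving_paths(paths, failed_links):
    """paths: list of paths, each a list of link names; failed_links: set."""
    return [p for p in paths if not (set(p) & failed_links)]

def resilient(paths, failure_states):
    return all(surviving_paths(paths, set(state)) for state in failure_states)

# Example in the spirit of the slide: links e1..e5, states of up to two links.
paths = [["e1", "e3"], ["e2", "e4"], ["e5"]]
states = [{"e1"}, {"e2"}, {"e3"}, {"e4"}, {"e5"}, {"e1", "e2"}, {"e1", "e5"}]
print(resilient(paths, states))   # True if every state leaves a working path
```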
28
Goal #2: Minimize Link Loads
Splitting parameters that minimize congestion
–Considering all possible failure scenarios
With links indexed by e and failure states indexed by s, let u_e^s be the utilization of link e under failure state s, Φ(u_e^s) its congestion cost, and w_s the weight of state s. Minimize the aggregate congestion cost, weighted over all failures:
  min ∑_s w_s ∑_e Φ(u_e^s), while routing all traffic
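Evaluating this objective for one candidate configuration might look as follows; the piecewise-linear penalty is in the style of common traffic-engineering cost functions, not necessarily the exact Φ used in this work:

```python
# Sketch: compute sum_s w_s * sum_e Phi(u_e^s) for a given configuration.
# An optimizer over the splitting ratios would minimize this quantity.
def phi(u):
    # Convex piecewise-linear penalty that grows sharply near utilization 1.
    if u < 0.6:  return u
    if u < 0.9:  return 3 * u - 1.2
    if u < 1.0:  return 10 * u - 7.5
    return 70 * u - 67.5

def aggregate_cost(link_utilization, weights):
    """link_utilization[s][e] -> utilization of link e in failure state s;
    weights[s] -> weight of failure state s."""
    return sum(w * sum(phi(u) for u in link_utilization[s].values())
               for s, w in weights.items())

# Example: two failure states over two links.
util = {"no_failure": {"e1": 0.4, "e2": 0.5},
        "e1_down":    {"e1": 0.0, "e2": 0.95}}
print(aggregate_cost(util, {"no_failure": 0.9, "e1_down": 0.1}))
```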
29
A Range of Solutions
Overly simple solutions do not work well
Diminishing returns for adding functionality
[Plot: congestion versus router capabilities; overly simple schemes give a suboptimal solution, overly elaborate ones are not scalable, and the sweet spot combines good performance with practicality]
30
Sweet Spot: State-Dependent Splitting
Edge router functionality
–Observe which paths have failed
–Use custom splitting ratios for that scenario
Configuration (at most 2^#paths entries):
  Failure   Splitting Ratios (p1, p2, p3)
  –         0.4, 0.4, 0.2
  p2        0.6, 0, 0.4
  …         …
[Figure: the edge router splits traffic over paths p1, p2, p3: 0.4/0.4/0.2 normally, 0.6/0/0.4 when p2 fails]
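The lookup itself is trivial, which is the point: all the intelligence is precomputed offline. A sketch with an assumed table layout:

```python
# Assumed data-layout sketch of state-dependent splitting at an edge
# router: decision elements precompute a table mapping each observable
# failure state to custom splitting ratios; the router just looks up the
# entry for whichever set of paths its probes currently report as down.
splitting_table = {
    frozenset():       {"p1": 0.4, "p2": 0.4, "p3": 0.2},   # no failures
    frozenset({"p2"}): {"p1": 0.6, "p2": 0.0, "p3": 0.4},   # p2 is down
    # ... at most 2^(#paths) entries, one per observable failure state
}

def current_ratios(failed_paths):
    """failed_paths: set of path IDs whose probes report failure."""
    state = frozenset(failed_paths)
    if state in splitting_table:
        return splitting_table[state]
    # Fallback for an unanticipated state: renormalize the no-failure
    # ratios over whichever paths are still up.
    base = splitting_table[frozenset()]
    alive = {p: r for p, r in base.items() if p not in state}
    total = sum(alive.values())
    return {p: r / total for p, r in alive.items()}

print(current_ratios({"p2"}))   # {'p1': 0.6, 'p2': 0.0, 'p3': 0.4}
```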
31
Performance Evaluation
AT&T backbone network
–Network topology and layer-2 shared risks
–954 failure scenarios, involving up to 20 links each
–Measured traffic demands from Netflow data
Experimental results
–Congestion level is indistinguishable from optimal
–Mean delay (~31 msec) close to OSPF’s (~28.5 msec)
–Most router pairs needed 5 paths, and at most 10
Computation time
–To optimize both the paths and the splitting parameters
–Less than two hours for the AT&T backbone
32
Deployment Scenarios
Move functionality to the end hosts
–Path-failure detection and traffic splitting
–Even simpler routers, and finer-grain splitting
Dynamic adaptation to traffic
–Adjust splitting parameters to the traffic demands
–E.g., monitor link load to detect congestion
–Allows faster responses to shifts in load
Using the OpenFlow and NOX platforms (see the sketch below)
–OpenFlow: path splitting based on flow rules
–NOX: compute the paths and splitting fractions
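Abstractly, the splitting fractions map onto OpenFlow “select” groups with weighted buckets; the sketch below uses plain dictionaries rather than any specific controller API, so the field names are illustrative:

```python
# Abstract sketch (not a real controller API): an OpenFlow-style "select"
# group with one weighted bucket per precomputed path, plus a flow rule
# steering a destination's traffic into the group. A NOX-style controller
# would compute these offline and install them on the switches.
def make_group(group_id, paths):
    """paths: list of (output_port, label, weight) per precomputed path.
    OpenFlow select groups use integer weights, so scale the fractions."""
    return {
        "group_id": group_id,
        "type": "select",
        "buckets": [
            {"weight": int(round(weight * 100)),
             "actions": [("push_label", label), ("output", port)]}
            for port, label, weight in paths
        ],
    }

def make_flow_rule(dst_prefix, group_id):
    return {"match": {"ipv4_dst": dst_prefix},
            "actions": [("group", group_id)]}

# Splitting ratios from the earlier slide: 0.4 / 0.4 / 0.2 across p1..p3.
group = make_group(1, [(1, "p1", 0.4), (2, "p2", 0.4), (3, "p3", 0.2)])
rule = make_flow_rule("10.1.0.0/16", group_id=1)
```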
33
Returning to the 4D Challenges
Scalability
–Amortize state and computation, or pre-compute routes
Response time
–Decision element near the routers, or make routing static
Reliability
–Simple replication of the decision elements
Security
–Perimeter filters; only routers talk to decision elements
Interoperability
–Use BGP, MPLS, or OpenFlow as the dissemination plane
34
Conclusions
Today’s routers
–Too complicated
–Too difficult to manage
–Too hard to change
Dumb routers, smart decision elements
–Routers forward packets & collect measurements
–… at the behest of the decision elements
Proofs of concept
–Routing Control Platform for interdomain routing
–Failure-resilient routing for intradomain routing
35
Ongoing Work: OpenFlow and NOX
Enterprise network monitoring
–Adaptive monitoring of flow aggregates
–IDS, anomaly detection, traffic engineering, …
Server load balancing in data centers
–Multiple servers in a single data center
–Sticky load balancing through switch flow rules
Wide-area replicated services
–Multiple servers in multiple data centers
–Directing client traffic to an appropriate service instance