A Routing Control Platform for Managing IP Networks
Jennifer Rexford, Princeton University
Outline
Revisiting the control plane
–Complexity of today’s control plane
–Principles for a redesign
Routing Control Platform
–Deployability
–Scalability
–Reliability
Example applications
–DDoS blackholing, planned maintenance, and customized egress selection
Conclusions and future work
Internet Architecture
Smart hosts and a dumb network
–Network delivers packets to hosts
–Services implemented on hosts
–Keep most state at the edges
[Figure: edge networks connected through an IP network]
But how should we partition function vertically?
Today: Inside a Single Network
Data plane
–Packet handling by routers
–Forwarding, filtering, queuing
Management plane
–Figure out what is happening in the network
–Decide how to change it
Control plane
–Multiple routing processes on each router
–Each router with a different configuration program
–Many control knobs: link weights, access lists, policy
[Figure: management tools (shell scripts, traffic engineering, databases, planning tools) reach the routers through OSPF, SNMP, NetFlow, modems, and configs; each router runs OSPF and BGP processes configured with link metrics, routing policies, and packet filters, and populates its FIB]
No State in the Network? Yeah, Right…
Dynamic state
–Routing tables
–Forwarding tables
Configuration state
–Access control lists
–Link weights
–Routing policies
Hard-wired state
–Default values of timers
–Path-computation algorithms
Lots of state, updated in a distributed, uncoordinated way
How Did We Get in This Mess?
Initial IP architecture
–Bundled packet handling and control logic
–Distributed the functions across routers
–Didn’t fully anticipate the need for management
Rapid growth in features
–Sudden popularity and growth of the Internet
–Increasing demands for new functionality
–Incremental extensions to protocols & routers
Challenges of distributed algorithms
–Some tasks are hard to do in a distributed fashion
What Does the Network Operator Want?
Network-wide views
–Network topology (e.g., routers, links)
–Mapping to lower-level equipment
–Traffic matrix
Network-level objectives
–Load balancing
–Survivability
–Reachability
–Security
Direct control
–Explicit configuration of data-plane mechanisms
What Architecture Would Achieve This?
Decision plane
–Responsible for all decision logic and state
–Operates on network-wide view and objectives
–Directly controls the behavior of the data plane
Discovery plane
–Responsible for providing the network-wide view
–Topology discovery, traffic measurement, etc.
Data plane
–Queues, filters, and forwards data packets
–Accepts direct instruction from the decision plane
[Figure: the decision plane takes the place of today’s management plane, and the discovery plane takes the place of today’s control plane]
Advantages of the New Approach
Lower management complexity
–Complete, network-wide view
–Direct control over the routers
–Single specification of policies and objectives
Simpler routers
–Much less control-plane software
–Much less configuration state
Enabling innovation
–New algorithms for selecting paths within an AS
–New approaches to inter-AS routing
Example: Improving ISP Routing
[Figure: ISP network of border routers and internal routers]
1. Provide internal reachability (IGP)
2. Learn routes to external destinations (eBGP)
3. Distribute externally learned routes internally (iBGP)
4. Select closest egress (IGP)
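Step 4 ("select closest egress") can be illustrated with a short sketch: run Dijkstra over the IGP link weights from one router, then pick the egress with the smallest path cost. The graph format, the names `igp_distances` and `closest_egress`, and breaking ties by egress name are illustrative assumptions, not part of the slides.

```python
import heapq

def igp_distances(graph, source):
    """Dijkstra over an IGP graph given as {router: {neighbor: link_weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def closest_egress(graph, router, egresses):
    """Among egresses with equally good BGP routes, pick the one with the
    lowest IGP path cost; ties broken by name (a hypothetical rule)."""
    dist = igp_distances(graph, router)
    reachable = [e for e in egresses if e in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda e: (dist[e], e))

# Hypothetical topology: internal router R, border routers A, B, C.
EXAMPLE_IGP = {
    "R": {"A": 4, "B": 3, "C": 7},
    "A": {"R": 4},
    "B": {"R": 3, "C": 1},
    "C": {"B": 1, "R": 7},
}
```

Here R reaches C more cheaply through B (3 + 1) than over the direct link (7), so B is R's closest egress.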
Is the New Architecture Feasible?
Deployability: any way from here to there?
–Must be compatible with today’s routers
–Must provide incentives for deployment
Speed: can it run fast enough?
–Must respond quickly to network events
–Needs to be as fast as a router
Reliability: avoid a single point of failure?
–Must be replicated to tolerate failure
–Replicas must behave consistently
Deployability: Don’t Change the Message Format
Border Gateway Protocol
–Interdomain routing protocol for the Internet
–Widely implemented and used in networks
Three main aspects of BGP
–Protocol: standard messages sent between routers
–Decision logic: multi-step route selection process
–Policy: configuration options that influence routing
The key point:
–Although decision logic and policies are complex…
–…the protocol and message format are simple
Idea: use BGP messages to tell each router how to forward
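The "multi-step route selection process" can be sketched as a single sort key in the usual order of BGP's decision steps. This is a simplified sketch: the `Route` fields are hypothetical, and real BGP compares MED only among routes from the same neighboring AS, which a total-order sort key cannot express.

```python
import ipaddress
from dataclasses import dataclass

# Lower rank is preferred: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

@dataclass(frozen=True)
class Route:
    egress: str        # border router that learned the route (hypothetical field)
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    ebgp: bool         # learned over eBGP at the deciding router
    igp_cost: int      # IGP cost from the deciding router to the egress
    router_id: str     # BGP router ID of the advertising peer

def decision_key(r: Route):
    """Simplified BGP ranking; smaller keys are preferred."""
    return (
        -r.local_pref,                            # 1. highest local preference
        len(r.as_path),                           # 2. shortest AS path
        ORIGIN_RANK[r.origin],                    # 3. lowest origin type
        r.med,                                    # 4. lowest MED (simplified)
        0 if r.ebgp else 1,                       # 5. eBGP over iBGP
        r.igp_cost,                               # 6. closest egress (hot potato)
        int(ipaddress.IPv4Address(r.router_id)),  # 7. lowest router ID
    )

def best_route(routes):
    return min(routes, key=decision_key)
```

Step 6 is where the IGP enters the decision, which is why the RCP needs an IGP feed in addition to BGP routes.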
Phase 1: Flexible Path Selection in One AS
Before: conventional use of BGP in backbone network
After: RCP learns routes and sends answers to routers
[Figure: backbone routers with iBGP and eBGP sessions, before and after adding the RCP]
Phase 2: AS-Wide Path Selection and Export
Before: RCP gets “best” iBGP routes (and IGP feed)
After: RCP gets all eBGP routes from neighbors
[Figure: iBGP and eBGP sessions between routers, the RCP, and neighboring networks]
Phase 3: Direct Communication Between RCPs
Before: RCP gets all eBGP routes from neighbors
After: ASes exchange routes via RCP
[Figure: AS 1, AS 2, and AS 3 each run an RCP; routers keep their physical peering and iBGP sessions, while the RCPs exchange routes via an inter-AS protocol]
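RCP Architecture
[Figure: the Routing Control Platform (RCP) has three components. The IGP Viewer receives IGP link-state advertisements and supplies a path cost matrix to the Route Control Server (RCS). The BGP Engine receives BGP updates, passes the available BGP routes to the RCS, and sends the selected BGP routes back out as BGP updates.]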
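The data flow between the three components can be sketched as two functions: the RCS turns the IGP Viewer's path cost matrix and the BGP Engine's available routes into per-router selections, and the BGP Engine sends updates only for selections that changed. The type aliases, the closest-egress rule, and the update tuples are hypothetical simplifications.

```python
from typing import Dict, Set, Tuple

PathCosts = Dict[Tuple[str, str], int]   # (router, egress) -> IGP cost, from the IGP Viewer
Available = Dict[str, Set[str]]          # prefix -> egresses with a route, from the BGP Engine
Selected = Dict[Tuple[str, str], str]    # (router, prefix) -> egress chosen by the RCS

def rcs_select(routers, costs: PathCosts, available: Available) -> Selected:
    """Pick the closest reachable egress for every (router, prefix) pair."""
    selected = {}
    for prefix, egresses in available.items():
        for router in routers:
            candidates = [e for e in egresses if (router, e) in costs]
            if candidates:
                selected[(router, prefix)] = min(
                    candidates, key=lambda e: (costs[(router, e)], e))
    return selected

def bgp_updates(old: Selected, new: Selected):
    """Announce new or changed selections; withdraw vanished ones."""
    updates = []
    for (router, prefix), egress in sorted(new.items()):
        if old.get((router, prefix)) != egress:
            updates.append(("announce", router, prefix, egress))
    for router, prefix in sorted(old.keys() - new.keys()):
        updates.append(("withdraw", router, prefix))
    return updates
```

Diffing against the previous selection keeps the BGP Engine from re-sending unchanged routes after each recomputation.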
Challenges and Contributions
Reliability
–Problem: single point of failure
–Contribution: simple replication of RCP components
Consistency
–Problem: inconsistent decisions by replicas
–Contribution: consistency without an inter-replica protocol
Scalability
–Problem: storing all routes increases CPU/memory usage
–Contribution: can support a large ISP on one computer
Building this system is feasible
Consistency: One RCP, One Partition
Solution: assign all routers along the shortest IGP path the same exit router
–Ensures forwarding loops don’t arise
[Figure: RCP 1 instructs routers “Use egress A” or “Use egress B”; egress routers A and B]
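Why a consistent assignment matters can be shown by following hop-by-hop forwarding: each router sends a packet to its IGP next hop toward its own assigned egress. The line topology A–X–Y–B, the tables, and the function name below are hypothetical.

```python
def forwarding_path(start, assigned_egress, next_hop):
    """Follow hop-by-hop forwarding from `start`. Each router forwards to its
    IGP next hop toward *its own* assigned egress; an egress assigned to
    itself delivers the packet out of the AS. Raises ValueError on a loop."""
    path, seen, node = [start], {start}, start
    while assigned_egress[node] != node:
        node = next_hop[(node, assigned_egress[node])]
        if node in seen:
            raise ValueError(f"forwarding loop: {path + [node]}")
        path.append(node)
        seen.add(node)
    return path

# Hypothetical line topology A - X - Y - B, with egress routers A and B.
NEXT_HOP = {
    ("X", "A"): "A", ("X", "B"): "Y",
    ("Y", "A"): "X", ("Y", "B"): "B",
}
# Every router on Y's shortest path to A (that is, X) also uses egress A.
CONSISTENT = {"A": "A", "B": "B", "X": "A", "Y": "A"}
# X points toward B while Y points toward A: packets bounce between them.
INCONSISTENT = {"A": "A", "B": "B", "X": "B", "Y": "A"}
```

With `CONSISTENT`, a packet from Y travels Y, X, A; with `INCONSISTENT`, a packet from X loops between X and Y.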
Consistency: One RCP, Many Partitions
Solution: only use state from a router’s partition in assigning its routes
–Ensures next hop is reachable
[Figure: RCP 1 connected to an IGP split into Partition 1 and Partition 2]
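Restricting each router to egresses in its own partition can be sketched with a breadth-first search over the IGP graph. The adjacency-set format and function names are hypothetical.

```python
from collections import deque

def partitions(graph):
    """Label each router with its IGP partition (connected component).
    `graph` maps each router to the set of its IGP neighbors."""
    component = {}
    for start in graph:
        if start in component:
            continue
        component[start] = start  # the first router found names the partition
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in component:
                    component[nbr] = start
                    queue.append(nbr)
    return component

def usable_egresses(router, egresses, component):
    """Only egresses in the router's own partition are reachable next hops."""
    return sorted(e for e in egresses if component.get(e) == component[router])
```

An egress in another partition may still hold a good BGP route, but assigning it would give the router an unreachable next hop.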
Consistency: Many RCPs, Many Partitions
Solution: RCPs receive the same IGP/BGP state from each partition they can reach
–IGP provides complete visibility and connectivity
–An RCS only acts on a partition if it has complete state for it
[Figure: replicas RCP 1 and RCP 2 attached to an IGP split into Partitions 1, 2, and 3]
No consistency protocol is needed to guarantee consistency in steady state
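The argument can be modeled in a few lines. Each replica acts only on partitions for which it has complete state, and it computes routes with the same deterministic function, so replicas that act on the same partition reach the same decision without talking to each other. This is a simplified model under stated assumptions: "complete state" is represented as having a feed from every router in the partition, and `example_decision` is a hypothetical stand-in for the RCS route computation.

```python
def example_decision(pid, members):
    """Hypothetical deterministic stand-in for the RCS computation."""
    return sorted(members)[0]

def replica_assignments(partition_members, feeds, decide):
    """partition_members: partition id -> set of routers in it.
    feeds: routers this replica receives IGP/BGP state from.
    Act only on partitions whose state is complete at this replica."""
    out = {}
    for pid, members in partition_members.items():
        if members <= feeds:  # complete state for this partition
            out[pid] = decide(pid, frozenset(members))
    return out

PARTS = {1: {"r1", "r2"}, 2: {"r3"}, 3: {"r4", "r5"}}
```

A replica that cannot reach partition 3 simply leaves it to the replicas that can, and on every shared partition the outputs are identical because the inputs are identical.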
RCS Scalability
Eliminate redundancy
–Store only a single copy of each BGP route
Accelerate lookup
–Quickly find routers whose routes changed
Avoid recomputation
–Compute routes once for groups of routers
–Don’t recompute if relative ranking of egress routers unchanged
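Scalability: RCS Data Structures
[Figure: three linked structures.
–Global route table: for each prefix, the BGP routes learned from egress routers eg1, eg2, eg3 (stores copies of routes); fed by BGP updates from egress routers
–RIB-Out shadow tables: for each prefix and each router (rtr1, rtr2, rtr3), a pointer to the route that router currently uses; changes produce BGP updates to routers
–Egress lists: for each router (rtr1, rtr2), egresses eg1, eg2, eg3, each pointing to the routes that use that egress; fed by IGP updates]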
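The "single copy of each BGP route" idea can be sketched as follows. The global table owns one object per (prefix, egress), and each router's RIB-Out entry stores a reference to that object rather than a copy. Comparing references also shows cheaply whether a router's route changed. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class BGPRoute:
    prefix: str
    egress: str
    as_path: tuple

@dataclass
class RCSTables:
    # Global route table: prefix -> {egress: route}; one stored copy per route.
    global_routes: dict = field(default_factory=dict)
    # RIB-Out shadow table: prefix -> {router: route currently used}.
    rib_out: dict = field(default_factory=dict)

    def learn(self, route):
        self.global_routes.setdefault(route.prefix, {})[route.egress] = route

    def assign(self, router, prefix, egress):
        """Point `router` at the shared route object; return True if the
        router's route changed, i.e. a BGP update must be sent to it."""
        route = self.global_routes[prefix][egress]
        previous = self.rib_out.setdefault(prefix, {}).get(router)
        self.rib_out[prefix][router] = route  # a reference, not a copy
        return previous is not route
```

Memory then grows with the number of distinct routes rather than with routes times routers.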
Example of Egress List Operation
[Figure, shown in several animation steps: router D’s egress list ranks egress routers A, B, and C; the figure carries cost labels 4, 3, and 7, and successive steps change IGP link weights (to 2; then 5 and 5; then 1 and 1), after which D’s egress list is updated]
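The "don't recompute if relative ranking of egress routers unchanged" rule can be sketched as follows. After an IGP change, re-sort the router's egress list and revisit its BGP decisions only if the order changed. The cost values and tie-breaking by name are illustrative assumptions.

```python
def egress_list(costs):
    """Rank egresses by IGP path cost (ties broken by name, an assumption)."""
    return [e for e, _ in sorted(costs.items(), key=lambda item: (item[1], item[0]))]

def on_igp_change(old_costs, new_costs):
    """Return (new egress list, whether this router's routes need recomputing).
    Only the ranking matters to route selection, so a cost change that keeps
    the same order requires no recomputation."""
    old_list, new_list = egress_list(old_costs), egress_list(new_costs)
    return new_list, old_list != new_list

OLD = {"A": 20, "B": 10, "C": 30}  # illustrative IGP path costs from router D
```

Raising A's cost from 20 to 25 leaves the order B, A, C intact, so nothing is recomputed; dropping it to 5 moves A to the front and triggers recomputation.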
Scalability: Standard Computing Platform
Implementation platform
–3.2 GHz Pentium-4
–8 GB memory
–Linux kernel
Workload
–Routing/topology changes in AT&T’s network
RCP performance
–Memory usage: less than 2 GB
–Speed, BGP changes: less than 40 msec
–Speed, topology changes: seconds
System is able to keep up…
Application: DDoS Blackholing
Blackholing of denial-of-service attacks
–Preconfigure a “null” route on each router
–Identify address of victim (from DoS system)
–RCP assigns a null route for the destination
[Figure: an attack on the victim is detected by traffic analysis; the RCP tells routers over iBGP “Use null route for /32”]
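The three blackholing steps map onto a small sketch. The victim's address becomes a /32, and the RCP assigns every router a route whose next hop is the preconfigured null route. This resembles the operator practice of remotely triggered blackholing. The discard next-hop address (a TEST-NET-1 address) and the `Assignment` type are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical next hop that every router is preconfigured to map to a
# null (discard) route; 192.0.2.1 is a documentation address.
NULL_NEXT_HOP = ipaddress.IPv4Address("192.0.2.1")

@dataclass(frozen=True)
class Assignment:
    router: str
    prefix: ipaddress.IPv4Network
    next_hop: ipaddress.IPv4Address

def blackhole(victim, routers):
    """Assign every router a /32 route for the victim via the null next hop."""
    prefix = ipaddress.IPv4Network(f"{victim}/32")
    return [Assignment(r, prefix, NULL_NEXT_HOP) for r in routers]
```

Because the /32 is more specific than any route covering the victim, longest-prefix match sends the victim's traffic to the null route.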
Application: Maintenance Dry-Out
Dry-out of traffic before maintenance
–Plan to take a router temporarily out of service
–RCP assigns routes via new egress in advance
[Figure: router r is about to undergo maintenance; before, traffic to destination d exits via r; after, the RCP tells routers over iBGP “Use route via s for d”]
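Dry-out can be sketched as a reassignment pass run before maintenance begins. Every router currently exiting via the router being taken down is moved to its next-closest remaining egress. The table formats and function name are hypothetical.

```python
def dry_out(assignments, candidates, cost, router_down):
    """assignments: (router, prefix) -> current egress
    candidates: prefix -> egresses with a route for that prefix
    cost: (router, egress) -> IGP path cost
    Return the reassignments to push before `router_down` goes offline."""
    changes = {}
    for (router, prefix), egress in assignments.items():
        if egress != router_down or router == router_down:
            continue
        remaining = [e for e in candidates[prefix] if e != router_down]
        if remaining:
            changes[(router, prefix)] = min(
                remaining, key=lambda e: (cost[(router, e)], e))
    return changes
```

Because the new routes are installed before r is shut down, traffic moves to s without waiting for routing to reconverge after a failure.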
Application: Customized Egress Selection
Customer-controlled selection of egress points
–Customer with two data centers and many sites
–Customer wants to control the load balancing
–RCP customization, not simply closest egress
[Figure: the RCP tells routers serving Site #1 “Use route via r for d” and routers serving Site #2 “Use route via s for d”]
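Customer-controlled egress selection can be sketched as an override table checked before the default closest-egress rule. Keying the overrides by (router, prefix), where the router serves a given customer site, is an assumption, as are the names.

```python
def select_egress(router, prefix, candidates, cost, overrides):
    """Use the customer-specified egress when configured and available;
    otherwise fall back to the closest egress by IGP cost."""
    chosen = overrides.get((router, prefix))
    if chosen in candidates:
        return chosen
    return min(candidates, key=lambda e: (cost[(router, e)], e))

# Hypothetical setup: site1_rtr serves Site #1, site2_rtr serves Site #2.
COST = {("site1_rtr", "r"): 5, ("site1_rtr", "s"): 1,
        ("site2_rtr", "r"): 1, ("site2_rtr", "s"): 5}
OVERRIDES = {("site1_rtr", "d"): "r", ("site2_rtr", "d"): "s"}
```

Here each site is steered to the egress the customer picked, even though it is the farther one, which is the case closest-egress routing cannot express.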
Conclusion
Managing IP networks is too hard
–IP architecture not designed for management
–Complex, distributed operation of routers
Reducing complexity is the key
–Network-wide views/objectives and direct control
–Removing control logic and state from the routers
New architecture is feasible
–RCP is deployable, scalable, and reliable
–RCP solves important operations problems
Future Work
Optimization
–Real-time adaptation and offline planning
–Designing the boundary to support optimization
Security
–Identifying unstable and suspicious BGP routes
–Incrementally deploying a more secure protocol
Policy
–High-level specification of routing policies
–Quantifying reductions in configuration complexity