Consensus Routing: The Internet as a Distributed System
John P. John, Ethan Katz-Bassett, Arvind Krishnamurthy, and Thomas Anderson
Presented by John P. John
Modified by Moonyoung Chung

Contents
Introduction
Motivation and Goals
Consensus Routing
  – Stable Mode
  – Transient Mode
Evaluation
Conclusions
NSDI '08

Internet Routing
A goal of the Internet is global reachability
But BGP fails to achieve this goal
  – Physical paths exist, but BGP paths do not
  – 10-15% of BGP updates cause loops and blackholes
  – 90% of all packet losses on the Internet are due to loops

BGP
Opaque policy routing
  – Preferred routes are visible to neighbors
  – Underlying policies are not visible and remain under local control
Mechanism:
  – Autonomous Systems (ASes) send their preferred path to neighbors
  – If an AS receives a new path, it starts using it right away
  – It forwards the path to its neighbors after some delay
  – The path eventually propagates to all ASes

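
The propagation mechanism above can be sketched in a few lines. This is only an illustration under loud assumptions: the 5-AS topology is hypothetical, the "policy" is simply shortest AS path (real BGP policies are opaque and local), and the announcement delay is not modeled.

```python
from collections import deque


def propagate(neighbors, destination):
    """Return each AS's chosen AS path to `destination`.

    Assumption: every AS prefers the shortest AS path it has heard so far.
    """
    best = {destination: [destination]}
    queue = deque([destination])
    while queue:
        sender = queue.popleft()
        for receiver in neighbors[sender]:
            if receiver in best[sender]:
                continue  # an AS rejects any path that already contains itself
            candidate = [receiver] + best[sender]
            current = best.get(receiver)
            if current is None or len(candidate) < len(current):
                best[receiver] = candidate  # start using the new path right away
                queue.append(receiver)      # then announce it to the neighbors
    return best


# Hypothetical topology (adjacency lists), destination AS 5.
neighbors = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [1, 4]}
paths = propagate(neighbors, 5)
# paths[4] is [4, 5] and paths[3] is [3, 4, 5]
```

Each AS here acts on a new path as soon as it arrives, which is exactly the behavior that the later slides identify as the source of transient inconsistency.
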
Example
[Figure: example AS topology with destination AS5; ASes are annotated with their paths to AS5, e.g. 4-5 and 1-5.]

BGP link failure
[Figure: example AS topology with destination AS5; ASes are annotated with their paths to AS5, e.g. 4-5 and 1-5.]
Link 4-5 fails
AS4 withdraws the path from its upstream ASes

BGP link failure
[Figure: example AS topology with destination AS5; ASes are annotated with their paths to AS5, e.g. 4-5 and 1-5.]
AS2 and AS3 pick their next-best paths
A routing loop is formed!

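
A minimal sketch of how such a loop shows up in the forwarding state. The next-hop tables below are hypothetical (the slide's figure labels are not fully recoverable); they assume AS2 and AS3 each fall back to a path through the other after AS4 withdraws its route.

```python
def trace(next_hop, source, destination):
    """Follow next hops from `source`; return (path, outcome).

    outcome is "delivered", "loop", or "blackhole".
    """
    path = [source]
    seen = {source}
    current = source
    while current != destination:
        nxt = next_hop.get(current)
        if nxt is None:
            return path, "blackhole"  # no route installed at this AS
        path.append(nxt)
        if nxt in seen:
            return path, "loop"       # the packet revisits an AS
        seen.add(nxt)
        current = nxt
    return path, "delivered"


# Hypothetical next hops toward AS 5 before and after link 4-5 fails.
before = {1: 5, 2: 4, 3: 4, 4: 5}
after = {1: 5, 2: 3, 3: 2, 4: None}

result_before = trace(before, 2, 5)  # ([2, 4, 5], "delivered")
result_after = trace(after, 2, 5)    # ([2, 3, 2], "loop")
```
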
BGP policy change
[Figure: example AS topology with destination AS5 and an additional AS6; ASes are annotated with their paths to AS5.]
AS4 wants all traffic destined for AS5 to come through AS6
AS4 withdraws the path from AS2 and AS3

BGP policy change
[Figure: example AS topology with destination AS5 and an additional AS6; ASes are annotated with their paths to AS5.]
AS2 and AS3 pick their next-best paths
A routing loop is formed!

Lack of Consistency
The underlying cause of all these problems is inconsistent global state
  – Link failures
  – Traffic engineering
  – Scheduled maintenance
  – Links coming up
Protocol behavior is complex and unpredictable
There is no indicator of when the system has converged to a consistent state

Motivation and Goal
Goal:
  – Networks that have high availability
Insight:
  – Consistency is the key

Consensus Routing
Lesson from distributed system design:
  – Decouple safety and liveness
Safety: forwarding tables are always consistent and policy compliant, based on a consistent view of global state
Liveness: the routing system adapts to failures quickly and maintains high availability

Safety: Stable Mode
Problem: inconsistent state
Solution:
  – Apply updates only after they have reached all dependent ASes
  – Apply updates synchronously across ASes

Stable Mode
Consistent view of global state
  – Stable Forwarding Table (SFT) at the k-th epoch
1. Update log
2. Distributed snapshot
3. Frontier computation
4. SFT computation
5. View change

Update log
ASes compute and forward routes as before, but do not apply them to the forwarding table

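
The update-log idea can be sketched as follows. The `Update` and `Router` types, their fields, and the example prefix are hypothetical names chosen for illustration; the paper's actual data structures may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    update_id: int   # hypothetical unique identifier for the update
    prefix: str      # destination prefix the update is about
    as_path: tuple   # announced AS path; empty tuple means withdrawal


@dataclass
class Router:
    forwarding_table: dict = field(default_factory=dict)
    update_log: list = field(default_factory=list)

    def receive(self, update):
        """Record the update for a later epoch; leave forwarding untouched."""
        self.update_log.append(update)


router = Router()
router.receive(Update(1, "10.0.0.0/8", (4, 5)))
# The route is logged, but router.forwarding_table is still empty.
```
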
Distributed Snapshot
Periodically, a distributed snapshot is taken
Some node(s) call for the (k+1)-th distributed snapshot
Updates in transit or still being processed are marked incomplete
Steps so far:
1. Run BGP, but don’t apply the updates

Frontier Computation: Aggregation
Frontier: the most recent complete update at each AS
ASes send a snapshot report to the consolidators, containing:
  – the saved sequence of updates
  – the set of incomplete updates
Steps so far:
1. Run BGP, but don’t apply the updates
2. Distributed snapshot

Frontier Computation: Consensus
Consolidators run a consensus algorithm to agree on the set of incomplete updates
Steps so far:
1. Run BGP, but don’t apply the updates
2. Distributed snapshot
3. Send info to consolidators

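
To make the outcome of this step concrete, here is a deliberately simplified stand-in. It is not a fault-tolerant consensus protocol: it assumes every consolidator is reachable and honest, and it assumes that taking the union of the proposals is acceptable because treating an update as incomplete only defers it to a later epoch. The consolidator names and update identifiers are hypothetical.

```python
def agree_on_incomplete(proposals):
    """Give every consolidator the same agreed set of incomplete update ids.

    proposals maps a consolidator name to the set of update ids that the
    ASes reporting to it marked as incomplete.
    """
    agreed = frozenset().union(*proposals.values())
    return {name: agreed for name in proposals}


proposals = {
    "consolidator-a": {3, 7},
    "consolidator-b": {7, 9},
    "consolidator-c": set(),
}
decided = agree_on_incomplete(proposals)
# Every consolidator now holds frozenset({3, 7, 9}).
```

The property that matters for the later steps is agreement: all consolidators, and then all ASes, end up with the same incomplete set.
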
Frontier Computation: Flood
Consolidators flood the incomplete set to all the ASes
Steps so far:
1. Run BGP, but don’t apply the updates
2. Distributed snapshot
3. Send info to consolidators
4. Consensus

SFT Computation & View Change
Apply completed updates
Versioning, garbage collection
Details and proof of consistency are in the paper
Steps so far:
1. Run BGP, but don’t apply the updates
2. Distributed snapshot
3. Send info to consolidators
4. Consensus
5. Flood

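
A rough sketch of "apply completed updates", assuming each AS replays its own log and skips any update whose identifier is in the flooded incomplete set. The tuple layout, prefixes, and identifiers are hypothetical, and dependencies between updates, versioning, and garbage collection are left out; the paper has the full rules.

```python
def compute_sft(update_log, incomplete):
    """Build a stable forwarding table from the complete updates only.

    update_log is a list of (update_id, prefix, as_path) in arrival order;
    an empty as_path is treated as a withdrawal.
    """
    sft = {}
    for update_id, prefix, as_path in update_log:
        if update_id in incomplete:
            continue                # deferred to a later epoch
        if as_path:
            sft[prefix] = as_path   # the latest complete announcement wins
        else:
            sft.pop(prefix, None)   # a complete withdrawal removes the route
    return sft


log = [
    (1, "10.0.0.0/8", (4, 5)),
    (2, "10.0.0.0/8", (3, 4, 5)),   # still in transit somewhere
    (3, "192.0.2.0/24", (1, 5)),
]
sft = compute_sft(log, incomplete={2})
# sft == {"10.0.0.0/8": (4, 5), "192.0.2.0/24": (1, 5)}
```

Because every AS filters with the same incomplete set, they all switch to tables derived from the same set of updates.
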
Mechanism
Other details in the paper:
  – Transition between epochs
  – Slow/unresponsive ASes
  – Failed ASes
  – Reintegration of failed ASes
  – Provable safety and liveness properties

Transient Mode: Liveness
Problem: upon link failure, an AS must wait until the new path reaches everyone
Solution: dynamically re-route around the failed link using existing techniques
  – Pre-computed backup paths
  – Deflection
  – Detour routing

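
The switch between stable and transient forwarding might look roughly like this. The table layout and the single pre-computed backup next hop per destination are assumptions made for illustration; deflection, backtracking, and detour routing would each replace the fallback branch with their own logic.

```python
def next_hop(current, destination, sft, failed_links, backup):
    """Pick a next hop at AS `current`; return (hop, mode).

    sft and backup map an AS to {destination: next hop}; failed_links is a
    set of frozenset({a, b}) pairs for links known to be down.
    """
    primary = sft.get(current, {}).get(destination)
    if primary is not None and frozenset((current, primary)) not in failed_links:
        return primary, "stable"
    alternate = backup.get(current, {}).get(destination)
    if alternate is not None and frozenset((current, alternate)) not in failed_links:
        return alternate, "transient"  # route around the failed link
    return None, "unreachable"


# Hypothetical tables: AS 4 normally reaches AS 5 directly, with AS 6 as backup.
sft = {4: {5: 5}}
backup = {4: {5: 6}}
failed = {frozenset((4, 5))}

normal = next_hop(4, 5, sft, set(), backup)      # (5, "stable")
rerouted = next_hop(4, 5, sft, failed, backup)   # (6, "transient")
```
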
Routing Deflection
[Figure: topology with source S, destination D, and a node labeled 3.]
Deflect the packet to a neighbor so that it traverses a different route

Backtracking
[Figure: topology with source S and destination D, with the backtracking step marked.]

Detour Routing
[Figure: topology with source S, destination D, a node labeled 3, and a Tier 1 AS B reached through a tunnel.]
B is responsible for forwarding packets

Backup routes
Pre-computed failover paths
  – e.g. R-BGP, a scheme for pre-computing backup routes to each destination

BGP
[Figure: connectivity (from completely unreachable to global reachability) versus time. Marked events: link failure (or other BGP event); BGP converges to alternate path.]

Consensus Routing
[Figure: connectivity (from completely unreachable to global reachability) versus time. Marked events: link failure (or other BGP event); switch to transient routing; snapshot.]

Evaluation
Questions answered in the talk:
  – How does consensus routing affect connectivity?
  – What is the traffic overhead?
Methodology
  – Extensive simulations on realistic Internet-scale topologies
  – An implemented XORP prototype
  – Experiments on PlanetLab

Methodology
Fail each access link of each multi-homed stub AS
See what fraction of ASes are temporarily disconnected until convergence
Topology: 23,390 ASes, 46,095 links, 9,100 multi-homed stub ASes

Connectivity
Consensus routing maintains complete connectivity in over 99% of the cases
BGP maintains complete connectivity in < 40% of the failure cases

Overhead
The entire update is not sent, only the identifiers of the updates
[Figure: overhead.]

Conclusions
BGP’s transient problems are due to inconsistent global state
Consensus routing enables consistent routing state with opaque policies
  – Key technique: separation of safety and liveness
We can have an Internet that has high availability!