Improved BGP Convergence via Ghost Flushing. Yehuda Afek, Anat Bremler-Barr, Shemer Schwarz. The Interdisciplinary Center Herzliya
Problem: BGP Convergence [Labovitz, Ahuja, Bose, Jahanian]. BGP may take up to 15 minutes to converge. Here: reduce the worst case from minutes to seconds, in a practical way.
Problem: BGP Convergence [Labovitz, Ahuja, Bose, Jahanian]
Event        Time (seconds, minRouteAdver = 30)
E-Down       30n      (n ≈ 10,000; up to 15 minutes)
E-Up         30d      (d ≈ 30; d = diameter)
E-Longer     2 30 l   (l = path length)
E-Shorter    30d
Here: E-Down = l time units (unit = link delay); E-Longer = 30d
Agenda: BGP overview; the BGP convergence problem; ghost buster rule; ghost flushing rule; simulation results.
BGP protocol: a distance (path) vector protocol. Each router receives AS-paths from its neighbors, chooses the best (shortest) one, and eliminates routing loops using the AS-path. Two kinds of messages: announcements and withdrawals.
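The selection step described on this slide can be sketched as follows. This is a minimal illustration, not real BGP: the function name `best_path`, the dictionary layout, and the lexicographic tie-break are assumptions made for the example, and real BGP applies policy before path length.

```python
from typing import Dict, List, Optional


def best_path(my_as: int,
              offers: Dict[int, Optional[List[int]]]) -> Optional[List[int]]:
    """Pick the shortest loop-free AS-path among the neighbors' offers.

    offers maps a neighbor AS number to the AS-path it last announced,
    or to None if that neighbor has withdrawn its route.
    """
    candidates = [
        path for path in offers.values()
        # Loop elimination: discard any path that already contains us.
        if path is not None and my_as not in path
    ]
    if not candidates:
        return None  # no usable route to the destination
    # Shortest path wins; ties are broken lexicographically (an assumption).
    return min(candidates, key=lambda p: (len(p), p))
```

For instance, AS 1 offered `[2, 0]`, `[3, 2, 0]`, and `[4, 1, 0]` would discard the last path (it contains AS 1) and choose `[2, 0]`.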
Problem: Ghost information. One ghost (old information) makes many, and in the network it continues recursively. (Figure, t=0: the route to dst is withdrawn.)
Problem: Ghost information. (Figure, t=1: announcements "2 0" and "1 0" are sent.)
Problem: Ghost information. (Figure, t=2: a withdrawal is sent; entries shown as {} are empty.)
MinRouteAdver Effect. MinRouteAdver: wait 30 seconds before sending an announcement again. It applies to announcements only, not to withdrawals. Motivation: reduce the number of messages.
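A minimal sketch of such a timer, assuming it is kept per peer (as a later slide says is common). The class and method names are hypothetical and do not come from any BGP implementation.

```python
class MinRouteAdverTimer:
    """Rate-limits announcements per peer; withdrawals are never delayed."""

    def __init__(self, interval: float = 30.0) -> None:
        self.interval = interval
        self.last_sent = {}  # peer -> time of the last announcement

    def may_announce(self, peer: str, now: float) -> bool:
        """True if the interval has elapsed since the last announcement."""
        last = self.last_sent.get(peer)
        return last is None or now - last >= self.interval

    def record_announcement(self, peer: str, now: float) -> None:
        """Remember when an announcement was sent to this peer."""
        self.last_sent[peer] = now
```

A router would call `may_announce` before each announcement and skip the check entirely for withdrawals.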
Problem: Ghost information. minRouteAdver: wait 30 seconds before sending the next announcement (BGP). (Figure: animation steps from t=3 to t=31; at t=31 the announcement "2 1 0" is sent.)
Problem: MinRouteAdver Effect. MinRouteAdver delays the elimination of ghost information. (Figure, t=31: announcement "2 1 0".)
E_Down convergence
Protocol    Messages    Time
BGP         nE          30n
In the clique (size 4) example the scenario ends after 62 seconds (≈ 30(n-2)).
Without MinRouteAdver: an avalanche of messages, O(n!). Nodes explore all possible paths of length 1, 2, … (Figure, t=0: the route to dst is withdrawn.)
Without MinRouteAdver. (Figure, t=0.1: announcements "2 0" and "1 0".)
Without MinRouteAdver. (Figure, t=0.2: announcements "2 0" and "3 0".)
Without MinRouteAdver: nodes explore all possible paths of length 2, 3, … (Figure, t=0.3: announcements "3 0" and "4 0".)
Without MinRouteAdver. (Figure, t=0.4: announcements "4 0".)
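To get a feel for the factorial growth, the sketch below counts the candidate paths one node could try in a clique of n nodes: every ordering of every subset of the other n-1 nodes, followed by dst. This is only a counting illustration under that assumption. It is not the exact message count from the slides, and `count_candidate_paths` is a name made up for the example.

```python
from itertools import permutations


def count_candidate_paths(n: int) -> int:
    """Count simple paths from one clique node to dst through the others.

    The clique has n nodes; a path visits k distinct other nodes
    (0 <= k <= n-1) in some order and then reaches dst.
    """
    others = range(1, n)  # the n-1 other clique nodes
    return sum(1 for k in range(n) for _ in permutations(others, k))
```

Under this counting, `count_candidate_paths(4)` gives 16 and `count_candidate_paths(5)` gives 65.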
E_Down convergence (h = one link delay)
Protocol                     Messages    Time
BGP with MinRouteAdver       nE          30n
BGP without MinRouteAdver    n!E         hn
Related Work. Introducing the problem: [Labovitz, Ahuja, Bose, Jahanian] (real-life evidence), [Labovitz, Wattenhofer, Venkatachary, Ahuja] (theoretical analysis). Experimental analysis: [Griffin, Premore]. Solution work on counting to infinity: adding states [Garcia-Luna-Aceves], EIGRP-like; route poisoning with hold-down [Cisco:Rutgers], IGRP-like; route consistency [Pei, Zhao, Wang, Massey, Mankin, Wu, Zhang].
Ghost flushing rule: if the AS-path to dst becomes longer and the announcement cannot be sent (due to the minRouteAdver rule), then send a withdrawal. Motivation: flush the ghost information as soon as possible.
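The rule can be sketched as a per-peer decision made whenever the route to dst changes. The function name, its arguments, and the string return values are illustrative assumptions. The only departure from standard BGP behavior is the flushing branch.

```python
from typing import List, Optional


def on_route_change(old_path: Optional[List[int]],
                    new_path: Optional[List[int]],
                    timer_running: bool) -> str:
    """Decide what to send to a peer after the route to dst changes.

    Returns "announce", "withdraw", or "wait".
    """
    if new_path is None:
        return "withdraw"   # ordinary withdrawal, never delayed
    if not timer_running:
        return "announce"   # minRouteAdver allows an announcement now
    if old_path is not None and len(new_path) > len(old_path):
        return "withdraw"   # ghost flushing: path got longer, timer blocks us
    return "wait"           # otherwise hold the announcement until the timer fires
```

When the timer later expires, the router announces its current path as usual. The flushing withdrawal only removes the stale, shorter path from the peer in the meantime.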
Ghost flushing example. (Figure, t=0: the route to dst is withdrawn.)
Ghost flushing example. (Figure, t=1: announcements "2 0" and "1 0" are sent.)
Ghost flushing example. (Figure, t=2.) Nodes 2, 3, and 4 send a "flushing" withdrawal, since their AS-path has changed and the minRouteAdver timer has not elapsed.
Ghost flushing example. (Figure, t=2.) Longer AS-path and minRouteAdver timer still running: send a "flushing" withdrawal.
Ghost flushing example. (Figure, t=3: a withdrawal is sent; the route to dst is empty.)
Analysis: convergence time of the ghost flushing rule, E_down. In each time unit (= h, the maximum link delay), ghost information is erased to a distance greater by one. After k time units, ghost AS-paths of length < k have disappeared. The longest ghost AS-path has length n (in theory). Hence the worst-case convergence time is nh.
E_Down convergence (h = one link delay)
Protocol                     Messages    Time
BGP with MinRouteAdver       nE          30n
BGP without MinRouteAdver    n!E         hn
Ghost flushing               2Ehn/30     hn
Ghost Buster Rule. The convergence time is better than expected! Explanation: minRouteAdver blocks the propagation of ghost information, while the flushing withdrawal "eats" the ghost information. Bad (wrong) news propagates slowly.
Analysis: ghost buster rule. Add to the ghost flushing rule: a router sends an announcement only after delta time. MinRouteAdver acts similarly to delta: the common implementation keeps MinRouteAdver per peer, and the timer is almost always on (lots of BGP announcements!).
Analysis: convergence time of the ghost buster rule. Every delta+h time units, the length of the maximum ghost AS-path increases by one. Every h time units, the length of the minimum ghost AS-path increases by one. After the failure, the length of the maximum ghost AS-path is d (diameter). The ghost information disappears at the time t when the two lengths meet: d + t/(delta+h) = t/h. Hence t = kdh/(k-1), where k = (delta+h)/h is the rate of the algorithm; for large k this is approximately dh.
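For completeness, the algebra behind the closed form can be written out as below. It uses only the slide's symbols (d, h, delta, k) and adds no assumption beyond delta > 0.

```latex
\begin{align*}
d + \frac{t}{\delta + h} &= \frac{t}{h} \\
d &= t\left(\frac{1}{h} - \frac{1}{\delta + h}\right)
   = \frac{t\,\delta}{h\,(\delta + h)} \\
t &= \frac{d\,h\,(\delta + h)}{\delta}
   = \frac{k\,d\,h}{k - 1},
   \qquad k = \frac{\delta + h}{h},\quad \delta = (k - 1)\,h
\end{align*}
```

Since k/(k-1) tends to 1 as delta grows relative to h, the convergence time approaches dh.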
E_Down convergence (h = one link delay, d = diameter, k = (delta+h)/h)
Protocol                            Messages            Time
BGP with MinRouteAdver              nE                  30n
BGP without MinRouteAdver           n!E                 hn
Ghost flushing                      2Ehn/30             hn
Ghost flushing with ghost buster    2Ehkd/(30(k-1))     kdh/(k-1) ≈ dh
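The table entries can be evaluated for concrete parameters with a small helper such as the one below. It is a sketch under assumptions: the slides do not define E, so it is taken here to be the number of links, the constant 30 is exposed as the hypothetical parameter `mra`, and the function name is made up. The expressions are worst-case bounds, so they are better suited to comparing rows than to predicting exact counts.

```python
from math import factorial
from typing import Dict, Tuple


def e_down_bounds(n: int, E: int, h: float, d: int, delta: float,
                  mra: float = 30.0) -> Dict[str, Tuple[float, float]]:
    """Return (messages, time) per protocol variant, as in the E_Down table.

    n: number of ASes, E: number of links (assumed meaning), h: link delay,
    d: diameter, delta: ghost buster delay, mra: MinRouteAdver interval.
    """
    k = (delta + h) / h  # rate of the ghost buster algorithm
    return {
        "BGP with MinRouteAdver": (n * E, mra * n),
        "BGP without MinRouteAdver": (factorial(n) * E, h * n),
        "Ghost flushing": (2 * E * h * n / mra, h * n),
        "Ghost flushing with ghost buster": (
            2 * E * h * k * d / (mra * (k - 1)),
            k * d * h / (k - 1),
        ),
    }
```

For example, with n=4, E=6, h=1, d=1, and delta=30, the time bound drops from 120 seconds for BGP with MinRouteAdver to 4 seconds with ghost flushing.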
The effect on E_longer. BGP convergence time is dominated by: 1. the time until ghost information vanishes; 2. the time until the backup path propagates. Ghost flushing helps the first factor.
The effect on E_longer. Original BGP may err: because of MinRouteAdver, a peer stores a wrong AS-path, so BGP may send packets in the wrong direction. Ghost flushing sends a withdrawal to the peer, which may by chance have an alternative path.
Simulation: BGP code. Shortest-path metric. Link delay between 0.2 and 2 seconds. MinRouteAdver chosen randomly between 0 and 30 seconds.
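One way to draw such parameters is sketched below. The slide gives only the ranges, so the uniform distribution, the per-link and per-peer granularity, and the function name `sample_parameters` are all assumptions.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Tuple


def sample_parameters(links: Iterable[Hashable],
                      peers: Iterable[Hashable],
                      seed: Optional[int] = None
                      ) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Draw a delay for each link and a MinRouteAdver value for each peer."""
    rng = random.Random(seed)  # seeded for reproducible runs
    # Link delay between 0.2 and 2 seconds (uniform is an assumption).
    link_delay = {link: rng.uniform(0.2, 2.0) for link in links}
    # MinRouteAdver between 0 and 30 seconds (uniform is an assumption).
    min_route_adver = {peer: rng.uniform(0.0, 30.0) for peer in peers}
    return link_delay, min_route_adver
```

Passing the same `seed` reproduces the same draw, which helps when comparing BGP and ghost flushing on identical conditions.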
Simulation: Clique E-down
Simulation: ISP topology
Example: Core Internet (ASes). (Figure labels: Ghost Flushing, BGP, In-degree, Out-degree.)
E_longer: Convergence Time
E_longer: ISP Topology
Conclusion. Convergence time is reduced from minutes to seconds. The rules do not hurt in other cases. Ghost flushing: no change to BGP messages. Ghost buster: a new counting-to-infinity solution. BGP is very sensitive to minor modifications.