Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hot Potatoes Heat Up BGP Routing Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman Shaikh, and.

Similar presentations


Presentation on theme: "Hot Potatoes Heat Up BGP Routing Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman Shaikh, and."— Presentation transcript:

1 Hot Potatoes Heat Up BGP Routing Jennifer Rexford AT&T Labs—Research http://www.research.att.com/~jrex Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin

2 2 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on network practices  Conclusions and future directions

3 3 Autonomous Systems 1 2 3 4 5 6 7 Client Web server AS path: 6, 5, 4, 3, 2, 1

4 4 Interdomain Routing (BGP)  Border Gateway Protocol (BGP) IP prefix: block of destination IP addresses AS path: sequence of ASes along the path  Policy configuration by the operator Path selection: which of the paths to use? Path export: which neighbors to tell? 1 23 12.34.158.5 “I can reach 12.34.158.0/24” “I can reach 12.34.158.0/24 via AS 1”

5 5 Intradomain Routing (IGP)  Interior Gateway Protocol (OSPF and IS-IS) Shortest path routing based on link weights Routers flood link-state information to each other Routers compute “next hop” to reach other routers  Weights configuration by the operator Simple heuristics: link capacity or physical distance Traffic engineering: tuning link weights to the traffic 3 2 2 1 1 3 1 4 5 3

6 6 Two-Level Internet Routing  Hierarchical routing Intra-domain Metric based Inter-domain Reachability and policy  Design principles Scalability Isolation Simplicity of reasoning AS 1 AS 2 AS 3 intra-domain routing (IGP) inter-domain routing (BGP) Autonomous system (AS) = network with unified administrative routing policy (ex. AT&T, Sprint, UCSD)

7 7 packet to dst Motivation dst X ISP network Y 10 9 11 Z Hot-potato routing = ISPs policy of routing to closest exit point when there is more than one route to destination Consequences:  Routers CPU overload  Transient forwarding instability  Traffic shift  Inter-domain routing changes ISP network failure planned maintenance traffic engineering Routes to thousands of destinations switch exit point!!!

8 8 BGP Decision Process  Ignore if exit point unreachable  Highest local preference  Lowest AS path length  Lowest origin type  Lowest MED (with same next hop AS)  Lowest IGP cost to next hop  Lowest router ID of BGP speaker Hot potato

9 9 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on network practices  Conclusions and future directions

10 10 Why is This So Darn Hard?  Noisy signals Single event can cause multiple IGP messages Large amount of background BGP updates Multiple messages for single BGP routing change  Protocol implementation Routing protocols provide limited information High complexity of BGP due to configurable policies Many vendor-specific details, such as timers  Monitoring limitations Cannot collect data from every vantage point Delays in delivering data to the collection machine Time synchronization across multiple collectors

11 11 Our Approach  Collect measurement of both protocols BGP monitor and OSPF monitor  Correlate the two streams of data Match BGP updates with OSPF events  Analyze the protocol interaction X Y Z M AT&T backbone OSPF messages BGP updates

12 12 Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z M dst Example 1: BGP update caused by OSPF BGP: announcement: dst, X OSPF: CHG: X, 8 9 10 dst, Ydst, X 8

13 13 M Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X BGP: announcement: dst, X

14 14 M Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X 8 BGP: announcement: dst, X OSPF: CHG: X, 8

15 15 Heuristic for Matching Classify BGP updates by possible OSPF causes Transform stream of OSPF messages into routing changes link failure refresh weight change chg cost del chg cost Match BGP updates with OSPF events that happen close in time Stream of OSPF messages Stream of BGP updates time

16 16 Pre-processing OSPF LSAs  Transform OSPF messages into routing changes from a router’s perspective M X Y Z 1 1 1 2 1 2 2 10 LSA weight change, 10 10 LSA weight change, 10 X 5 Y 4 CHG Y, 7 X 5 Y 7 LSA delete LSA add, 1 DEL X Y 7 ADD X, 5 X 5 Y 7 OSPF routing changes: 2 1

17 17 Classifying BGP Updates BGP update from Z Announcement of dst, X Withdrawal of dst, Y Replacement of route to dst different route through Y ADD X? DEL Y? ADD X? CHG X or CHG Y? Y X X Y Z dst M

18 18 Classifying BGP Updates route via X is better route via X is worse routes are equally good ADD X? DEL Y? ADD X? CHG X, CHG Y? Y X X Y Z dst M

19 19 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on network practices  Conclusions and future directions

20 20 Time Lag OSPF-triggered BGP updates for June 25 th, 2003 Cumulative %BGP updates time BGP – time OSPF (seconds) ~15% of OSPF-triggered BGP updates in a day most OSPF-triggered BGP updates lag from 10 to 60 seconds

21 21 Results for June 2003  High variability according to location and day Impact on external BGP measurements and customers  One LSA can have a big impact locationminmaxdays > 10% close to peers0%3.76%0 between peers0%25.87%5 locationno impactprefixes impacted close to peers97.53%less than 1% between peers97.17%55%

22 22 BGP Updates Over Prefixes Cumulative %BGP updates % prefixes Non-OSPF triggered All OSPF-triggered OSPF-triggered BGP updates affects ~50% of prefixes uniformly prefixes with only one exit point Contrast with non-OSPF triggered BGP updates

23 23 Operational Implications  Forwarding plane convergence Accuracy of active measurements  Router proximity to exit points Likelihood of hot-potato routing changes  Cost in/out of links during maintenance Avoid triggering BGP routing changes  More complexity with route reflectors Longer delays and more BGP messages

24 24 Forwarding Convergence R1R1 R2R2 dst 10 100 10 111 R 2 starts using R1 to reach dst Scan process runs in R 2 R 1 ’s scan process can take up to 60 seconds to run Packets to dst may be caught in a loop for 60 seconds!

25 25 Measurement Accuracy  Measurements of customer experience Probe machines have just one exit point! R1R1 R2R2 dst 10 100 111 loop to reach dst W1W1 W2W2

26 26 What to do?  Increase estimate for forwarding convergence For destinations/customers with multiple exit points  Extensions to measurement infrastructure Multiple access links for a probe machine Multiple probe machines with same address  Better BGP implementation on the router Decrease scan timer (maybe too much overhead?) Event-driven IGP/BGP implementation

27 27 Avoid Equal-distance Exits Z 10 X Y Z 1000 1 X Y dst Small changes will make Z switch exit points to dst More robust to intra-domain routing changes

28 28 Careful Cost in/out Links Z X Y 5 5 10 dst 4 100 Traffic is more predictable Faster convergence Less impact on neighbors

29 29 iBGP Route Reflectors 20 10 8 9 X Y Z W dst dst Y, 18 dst W, 20 11 dst Y, 21 dst W, 20 Announcement X dst X,19 dst W, 20 Scalability trade-off: Less BGP state vs. Number of BGP updates from Z and longer convergence delay

30 30 Ongoing Work  Reduction of false matches Compare with conservative analysis (lower bound) Cluster BGP updates and IGP LSAs in time  Black-box testing of the routers Scan timer and its effects (forwarding loops) Vendor interactions (with Cisco)  Impact of the IGP-triggered BGP updates Changes in the flow of traffic Externally visible BGP routing changes  Modeling the protocol interaction Understand impact of router location

31 31 Future Directions  Improving isolation (cooling those potatoes!) Operational practices: preventing interactions Protocol design: stronger decoupling Network design: internal topology/architecture  Extending our monitoring architecture Data from multiple vantage points Real time correlation of data streams Automatic generation of alarms/reports  Better route monitoring Router support for special monitoring sessions Protocol extensions to help in troubleshooting Diagnose problems, and read the router’s mind!

32 32 Exporting Routing Instability Z X Y Z X Y dst Announcement No change => no announcement


Download ppt "Hot Potatoes Heat Up BGP Routing Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman Shaikh, and."

Similar presentations


Ads by Google