Download presentation
Presentation is loading. Please wait.
1
Hot Potatoes Heat Up BGP Routing Jennifer Rexford AT&T Labs—Research http://www.research.att.com/~jrex Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin
2
2 Outline Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams Performance evaluation Characterization on AT&T’s network Implications on network practices Conclusions and future directions
3
3 Autonomous Systems 1 2 3 4 5 6 7 Client Web server AS path: 6, 5, 4, 3, 2, 1
4
4 Interdomain Routing (BGP) Border Gateway Protocol (BGP) IP prefix: block of destination IP addresses AS path: sequence of ASes along the path Policy configuration by the operator Path selection: which of the paths to use? Path export: which neighbors to tell? 1 23 12.34.158.5 “I can reach 12.34.158.0/24” “I can reach 12.34.158.0/24 via AS 1”
5
5 Intradomain Routing (IGP) Interior Gateway Protocol (OSPF and IS-IS) Shortest path routing based on link weights Routers flood link-state information to each other Routers compute “next hop” to reach other routers Weights configuration by the operator Simple heuristics: link capacity or physical distance Traffic engineering: tuning link weights to the traffic 3 2 2 1 1 3 1 4 5 3
6
6 Two-Level Internet Routing Hierarchical routing Intra-domain Metric based Inter-domain Reachability and policy Design principles Scalability Isolation Simplicity of reasoning AS 1 AS 2 AS 3 intra-domain routing (IGP) inter-domain routing (BGP) Autonomous system (AS) = network with unified administrative routing policy (ex. AT&T, Sprint, UCSD)
7
7 packet to dst Motivation dst X ISP network Y 10 9 11 Z Hot-potato routing = ISPs policy of routing to closest exit point when there is more than one route to destination Consequences: Routers CPU overload Transient forwarding instability Traffic shift Inter-domain routing changes ISP network failure planned maintenance traffic engineering Routes to thousands of destinations switch exit point!!!
8
8 BGP Decision Process Ignore if exit point unreachable Highest local preference Lowest AS path length Lowest origin type Lowest MED (with same next hop AS) Lowest IGP cost to next hop Lowest router ID of BGP speaker Hot potato
9
9 Outline Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams Performance evaluation Characterization on AT&T’s network Implications on network practices Conclusions and future directions
10
10 Why is This So Darn Hard? Noisy signals Single event can cause multiple IGP messages Large amount of background BGP updates Multiple messages for single BGP routing change Protocol implementation Routing protocols provide limited information High complexity of BGP due to configurable policies Many vendor-specific details, such as timers Monitoring limitations Cannot collect data from every vantage point Delays in delivering data to the collection machine Time synchronization across multiple collectors
11
11 Our Approach Collect measurement of both protocols BGP monitor and OSPF monitor Correlate the two streams of data Match BGP updates with OSPF events Analyze the protocol interaction X Y Z M AT&T backbone OSPF messages BGP updates
12
12 Challenges Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z M dst Example 1: BGP update caused by OSPF BGP: announcement: dst, X OSPF: CHG: X, 8 9 10 dst, Ydst, X 8
13
13 M Challenges Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X BGP: announcement: dst, X
14
14 M Challenges Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X 8 BGP: announcement: dst, X OSPF: CHG: X, 8
15
15 Heuristic for Matching Classify BGP updates by possible OSPF causes Transform stream of OSPF messages into routing changes link failure refresh weight change chg cost del chg cost Match BGP updates with OSPF events that happen close in time Stream of OSPF messages Stream of BGP updates time
16
16 Pre-processing OSPF LSAs Transform OSPF messages into routing changes from a router’s perspective M X Y Z 1 1 1 2 1 2 2 10 LSA weight change, 10 10 LSA weight change, 10 X 5 Y 4 CHG Y, 7 X 5 Y 7 LSA delete LSA add, 1 DEL X Y 7 ADD X, 5 X 5 Y 7 OSPF routing changes: 2 1
17
17 Classifying BGP Updates BGP update from Z Announcement of dst, X Withdrawal of dst, Y Replacement of route to dst different route through Y ADD X? DEL Y? ADD X? CHG X or CHG Y? Y X X Y Z dst M
18
18 Classifying BGP Updates route via X is better route via X is worse routes are equally good ADD X? DEL Y? ADD X? CHG X, CHG Y? Y X X Y Z dst M
19
19 Outline Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams Performance evaluation Characterization on AT&T’s network Implications on network practices Conclusions and future directions
20
20 Time Lag OSPF-triggered BGP updates for June 25 th, 2003 Cumulative %BGP updates time BGP – time OSPF (seconds) ~15% of OSPF-triggered BGP updates in a day most OSPF-triggered BGP updates lag from 10 to 60 seconds
21
21 Results for June 2003 High variability according to location and day Impact on external BGP measurements and customers One LSA can have a big impact locationminmaxdays > 10% close to peers0%3.76%0 between peers0%25.87%5 locationno impactprefixes impacted close to peers97.53%less than 1% between peers97.17%55%
22
22 BGP Updates Over Prefixes Cumulative %BGP updates % prefixes Non-OSPF triggered All OSPF-triggered OSPF-triggered BGP updates affects ~50% of prefixes uniformly prefixes with only one exit point Contrast with non-OSPF triggered BGP updates
23
23 Operational Implications Forwarding plane convergence Accuracy of active measurements Router proximity to exit points Likelihood of hot-potato routing changes Cost in/out of links during maintenance Avoid triggering BGP routing changes More complexity with route reflectors Longer delays and more BGP messages
24
24 Forwarding Convergence R1R1 R2R2 dst 10 100 10 111 R 2 starts using R1 to reach dst Scan process runs in R 2 R 1 ’s scan process can take up to 60 seconds to run Packets to dst may be caught in a loop for 60 seconds!
25
25 Measurement Accuracy Measurements of customer experience Probe machines have just one exit point! R1R1 R2R2 dst 10 100 111 loop to reach dst W1W1 W2W2
26
26 What to do? Increase estimate for forwarding convergence For destinations/customers with multiple exit points Extensions to measurement infrastructure Multiple access links for a probe machine Multiple probe machines with same address Better BGP implementation on the router Decrease scan timer (maybe too much overhead?) Event-driven IGP/BGP implementation
27
27 Avoid Equal-distance Exits Z 10 X Y Z 1000 1 X Y dst Small changes will make Z switch exit points to dst More robust to intra-domain routing changes
28
28 Careful Cost in/out Links Z X Y 5 5 10 dst 4 100 Traffic is more predictable Faster convergence Less impact on neighbors
29
29 iBGP Route Reflectors 20 10 8 9 X Y Z W dst dst Y, 18 dst W, 20 11 dst Y, 21 dst W, 20 Announcement X dst X,19 dst W, 20 Scalability trade-off: Less BGP state vs. Number of BGP updates from Z and longer convergence delay
30
30 Ongoing Work Reduction of false matches Compare with conservative analysis (lower bound) Cluster BGP updates and IGP LSAs in time Black-box testing of the routers Scan timer and its effects (forwarding loops) Vendor interactions (with Cisco) Impact of the IGP-triggered BGP updates Changes in the flow of traffic Externally visible BGP routing changes Modeling the protocol interaction Understand impact of router location
31
31 Future Directions Improving isolation (cooling those potatoes!) Operational practices: preventing interactions Protocol design: stronger decoupling Network design: internal topology/architecture Extending our monitoring architecture Data from multiple vantage points Real time correlation of data streams Automatic generation of alarms/reports Better route monitoring Router support for special monitoring sessions Protocol extensions to help in troubleshooting Diagnose problems, and read the router’s mind!
32
32 Exporting Routing Instability Z X Y Z X Y dst Announcement No change => no announcement
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.