Presentation is loading. Please wait.

Presentation is loading. Please wait.

Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman.

Similar presentations


Presentation on theme: "Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman."— Presentation transcript:

1 Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research http://www.research.att.com/~jrex Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin

2 2 Network “Operations” Research  Understand key operations tasks Traffic engineering Planned maintenance Problem troubleshooting  Develop sound models and methods Problem the operator is trying to solve Underlying protocols and mechanisms Control knobs the operator to tune  Incorporate network measurements Characterize system behavior Detect and diagnose problems Input into “what if” models

3 3 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on operational practices  Conclusions and future directions

4 4 Autonomous Systems 1 2 3 4 5 6 7 Client Web server AS path: 6, 5, 4, 3, 2, 1 Multiple links Middle of path

5 5 Interdomain Routing (BGP)  Border Gateway Protocol (BGP) IP prefix: block of destination IP addresses AS path: sequence of ASes along the path  Policy configuration by the operator Path selection: which of the paths to use? Path export: which neighbors to tell? 1 23 12.34.158.5 “I can reach 12.34.158.0/24” “I can reach 12.34.158.0/24 via AS 1”

6 6 Intradomain Routing (IGP)  Interior Gateway Protocol (OSPF and IS-IS) Shortest path routing based on link weights Routers flood link-state information to each other Routers compute “next hop” to reach other routers  Link weights configured by the operator Simple heuristics: link capacity or physical distance Traffic engineering: tuning link weights to the traffic 3 2 2 1 1 3 1 4 5 3 Path cost: 2+1+5

7 7 Two-Level Internet Routing  Hierarchical routing Intra-domain Metric based Inter-domain Reachability and policy  Design principles Scalability Isolation Simplicity of reasoning AS 1 AS 2 AS 3 intra-domain routing (IGP) inter-domain routing (BGP) Autonomous system (AS) = network with unified administrative routing policy (ex. AT&T, Sprint, UCSD)

8 8 packet to dst Coupling: Hot-Potato Routing dst X ISP network Y 10 9 11 Z Hot-potato routing = ISPs policy of routing to closest exit point when there is more than one route to destination Consequences:  Routers CPU overload  Transient forwarding instability  Traffic shift  Inter-domain routing changes ISP network failure planned maintenance traffic engineering Routes to thousands of destinations switch exit point!!!

9 9 BGP Decision Process  Ignore if exit point unreachable  Highest local preference  Lowest AS path length  Lowest origin type  Lowest MED (with same next hop AS)  Lowest IGP cost to next hop  Lowest router ID of BGP speaker Hot potato

10 10 Hot-Potato Routing Model  Cost vector for Z: c X =10, c W =8, and c Y =9  Egress set for dst: {X, Y}  Best route for Z: through Y, which is closest X Y 10 9 9 Z dst 8 W Hot-potato change: change in cost vector causes change in best route

11 11 The Big Picture Hot-potato routing changes (interdomain changes caused by intradomain changes)

12 12 Why Care about Hot Potatoes?  Understanding of Internet routing Frequency of hot-potato routing changes Influence on end-to-end performance  Operational practices Knowing when hot-potato changes happen Avoiding unnecessary hot-potato changes Analyzing externally-caused BGP updates  Distributed root cause analysis Each AS can tell what BGP updates it caused Someone should know why each change happened

13 13 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on network practices  Conclusions and future directions

14 14 Why is This So Darn Hard?  Noisy signals Multiple IGP messages per event Multiple BGP messages per change Large background of BGP updates  Routing protocols Protocols don’t divulge their reasons Highly-configurable BGP policies Vendor-specific details (e.g., timers)  Monitoring limitations Limited number of vantage points Delivering the measurement data Time synchronization across collectors

15 15 Our Approach  Measure both protocols BGP and OSPF monitors  Correlate the two streams Match BGP updates with OSPF events  Analyze the interaction X Y Z M AT&T backbone OSPF messages BGP updates

16 16 Heuristic for Matching Classify BGP updates by possible OSPF causes Transform stream of OSPF messages into routing changes link failure refresh weight change chg cost del chg cost Match BGP updates with OSPF events that happen close in time Stream of OSPF messages Stream of BGP updates time

17 17 Pre-processing OSPF LSAs  Transform OSPF messages into routing changes from a router’s perspective M X Y Z 1 1 1 2 1 2 2 10 LSA weight change, 10 10 LSA weight change, 10 X 5 Y 4 CHG Y, 7 X 5 Y 7 LSA delete LSA add, 1 DEL X Y 7 ADD X, 5 X 5 Y 7 OSPF routing changes: 2 1

18 18 Classifying BGP Updates BGP update from Z Announcement of dst, X Withdrawal of dst, Y Replacement of route to dst different route through Y See next slide! Y X X Y Z dst M = “Can’t be caused by a cost change”

19 19 Classifying BGP Updates route via X is better route via X is worse routes are equally good CHG X, CHG Y? Y X X Y Z dst M

20 20 The Role of Time  IGP link-state advertisements Multiple LSAs from a single physical event Group into single cost vector change  BGP update messages Multiple BGP updates during convergence Group into single BGP routing change  Matching IGP to BGP Avoid matching unrelated IGP and BGP changes Match related changes that are close in time Characterize the measurement data to determine the right windows 10 sec 70 sec 180 sec

21 21 Outline  Internet routing Interdomain and intradomain routing Coupling due to hot-potato routing  Measuring hot-potato routing Measuring the two routing protocols Correlating the two data streams  Performance evaluation Characterization on AT&T’s network Implications on network practices  Conclusions and future directions

22 22 Summary Results (June 2003)  High variability in % of BGP updates  One LSA can have a big impact locationminmaxdays > 10% close to peers0%3.76%0 between peers0%25.87%5 locationno impactprefixes impacted close to peers97.53%less than 1% between peers97.17%55%

23 23 Delay for BGP Routing Change  Router vendor scan timer BGP table scan every 60 seconds OSPF changes arrive uniformly in interval  Internal BGP hierarchy (route reflectors) Wait for another router to change best route Introduces a wait for a second BGP scan  Transmitting many BGP messages Latency for transferring the data We have seen delays of up to 180 seconds!

24 24 Delay for BGP Change (1 st Prefix) Cisco BGP scan timer Two BGP scans

25 25 iBGP Route Reflectors 20 10 8 9 X Y Z W dst dst Y, 18 dst W, 20 11 dst Y, 21 dst W, 20 Announcement X dst X,19 dst W, 20 Scalability trade-off: Less BGP state vs. Number of BGP updates from Z and longer convergence delay

26 26 Transferring Multiple Prefixes

27 27 BGP Updates Over Prefixes Cumulative %BGP updates % prefixes Non-OSPF triggered All OSPF-triggered OSPF-triggered BGP updates affects ~50% of prefixes uniformly prefixes with only one exit point Contrast with non-OSPF triggered BGP updates

28 28 Operational Implications  Forwarding plane convergence Accuracy of active measurements  Router proximity to exit points Likelihood of hot-potato routing changes  Cost in/out of links during maintenance Avoid triggering BGP routing changes

29 29 Forwarding Convergence R1R1 R2R2 dst 10 100 10 111 R 2 starts using R1 to reach dst Scan process runs in R 2 R 1 ’s scan process can take up to 60 seconds to run Packets to dst may be caught in a loop for 60 seconds!

30 30 Measurement Accuracy  Measurements of customer experience Probe machines have just one exit point! R1R1 R2R2 dst 10 100 111 loop to reach dst W1W1 W2W2

31 31 Avoid Equal-distance Exits Z 10 X Y Z 1000 1 X Y dst Small changes will make Z switch exit points to dst More robust to intra-domain routing changes

32 32 Careful Cost in/out Links Z X Y 5 5 10 dst 4 100 Traffic is more predictable Faster convergence Less impact on neighbors

33 33 Ongoing Work  Black-box testing of the routers Scan timer and its effects (forwarding loops) Vendor interactions (with Cisco)  Impact of the IGP-triggered BGP updates Changes in the flow of traffic Forwarding loops during convergence Externally visible BGP routing changes  Improving isolation (cooling those potatoes!) Operational practices: preventing interactions Protocol design: weaken the IGP/BGP coupling Network design: internal topology/architecture

34 34 Thanks!  Any questions?

35 35 Exporting Routing Instability Z X Y Z X Y dst Announcement No change => no announcement

36 36 What to do?  Increase estimate for forwarding convergence For destinations/customers with multiple exit points  Extensions to measurement infrastructure Multiple access links for a probe machine Multiple probe machines with same address  Better BGP implementation on the router Decrease scan timer (maybe too much overhead?) Event-driven IGP/BGP implementation

37 37 Time Lag OSPF-triggered BGP updates for June 25 th, 2003 Cumulative %BGP updates time BGP – time OSPF (seconds) ~15% of OSPF-triggered BGP updates in a day most OSPF-triggered BGP updates lag from 10 to 60 seconds

38 38 Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z M dst Example 1: BGP update caused by OSPF BGP: announcement: dst, X OSPF: CHG: X, 8 9 10 dst, Ydst, X 8

39 39 M Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X BGP: announcement: dst, X

40 40 M Challenges  Lack of information on routing messages Routing protocols are designed to determine a path between two hosts, but not to give reason X Y Z dst Example 2: BGP update NOT caused by OSPF 9 10 dst, Ydst, X 8 BGP: announcement: dst, X OSPF: CHG: X, 8


Download ppt "Dynamics of Hot-Potato Routing in IP Networks Jennifer Rexford AT&T Labs—Research Joint work with Renata Teixeira, Aman."

Similar presentations


Ads by Google