Detection of Routing Loops and Analysis of Its Causes Sue Moon Dept. of Computer Science KAIST Joint work with Urs Hengartner, Ashwin Sridharan, Richard.


1 Detection of Routing Loops and Analysis of Its Causes Sue Moon Dept. of Computer Science KAIST Joint work with Urs Hengartner, Ashwin Sridharan, Richard Mortier, Christophe Diot

2 2 Link Utilization on an Internet backbone link: a routing loop causes an increase of 25%!

3 3 Overview
- Routing protocols have a large impact on the performance of the network
  - How do we detect routing loops?
  - How often do loops occur?
  - How do they impact loss and delay?
- Analyze causes of loops
  - What causes them?

4 4 Possible Causes of Routing Loops
- Persistent routing loops
  - E.g., due to misconfiguration.
  - Loops can last hours if undetected.
- Transient routing loops
  - Routing state is dynamic.
  - Inconsistencies in routing state can cause loops.
  - Inconsistencies should disappear within seconds/minutes.
  - Expectation: Loops last seconds/minutes.
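A minimal sketch of how inconsistent routing state can produce a loop. It assumes a hypothetical three-router chain (the names R1, R2, R3, the destination "D", the `fibs` dictionaries, and the `forward` helper are all illustrative and do not reproduce the topology drawn on the slides). Once R2 has switched its next hop but R1 has not, the packet bounces between them until its TTL runs out.

```python
def forward(fibs, start, dest, ttl):
    """Follow next hops from `start` toward `dest`, decrementing TTL per hop.

    `fibs` maps router -> {destination: next hop}.
    Returns (delivered, path), where path lists every node visited.
    """
    path = [start]
    node = start
    while node != dest:
        ttl -= 1                 # each router decrements TTL before forwarding
        if ttl <= 0:
            return False, path   # TTL expired: the packet is dropped here
        node = fibs[node][dest]
        path.append(node)
    return True, path


# Consistent state: R1 -> R2 -> R3 -> D.
consistent = {"R1": {"D": "R2"}, "R2": {"D": "R3"}, "R3": {"D": "D"}}

# Hypothetical inconsistency: R2 has already rerouted via R1,
# but R1 still forwards to R2.
stale = {"R1": {"D": "R2"}, "R2": {"D": "R1"}, "R3": {"D": "D"}}

ok, good_path = forward(consistent, "R1", "D", ttl=8)
looped, loop_path = forward(stale, "R1", "D", ttl=8)
```

With the stale tables the packet alternates between R1 and R2 and is never delivered; each pass through the two-router loop costs two TTL units.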

5 5 How Can a Transient Routing Loop Occur? (figure: routers R1, R2, R3)

6 6 Detection of “Loops” in Packet Traces
- Detect replicas in a packet trace
  - Packets with exactly the same header except for TTL and CRC
  - TTL difference: 2 or larger
- Set of replicas = Packet Loop
- Set of packet loops associated with a routing event = Routing Loop
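The replica rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Packet` record, the idea that `header` already has TTL and checksum masked out, and the requirement that every consecutive pair of replicas drops by at least the minimum TTL delta are all choices made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float      # capture timestamp in seconds
    header: bytes  # header bytes with TTL and checksum masked out (assumed done upstream)
    ttl: int


def find_packet_loops(packets, min_ttl_delta=2):
    """Group packets by masked header and keep groups that look like loops.

    A group counts as a packet loop when it has at least two replicas and
    the TTL falls by at least `min_ttl_delta` between consecutive replicas.
    """
    groups = defaultdict(list)
    for p in packets:
        groups[p.header].append(p)

    loops = []
    for replicas in groups.values():
        replicas.sort(key=lambda p: p.ts)
        pairs = list(zip(replicas, replicas[1:]))
        if pairs and all(a.ttl - b.ttl >= min_ttl_delta for a, b in pairs):
            loops.append(replicas)
    return loops


trace = [
    Packet(0.0000, b"A", 60), Packet(0.0010, b"A", 58), Packet(0.0020, b"A", 56),
    Packet(0.0005, b"B", 60),                            # seen once: not a loop
    Packet(0.5000, b"C", 60), Packet(0.6000, b"C", 59),  # TTL delta 1: rejected
]
loops = find_packet_loops(trace)
```

Grouping the resulting packet loops into routing loops (by routing event) needs routing data and is not attempted here.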

7 7 Traces
- Backbone traces
  - NYC and SJ links from Nov. 8th, 2001
  - NYC links from Oct. 9th, 2002

8 8 Packet Traces
Columns: Trace, Length (hours), Avg BW (Mbps), Total Packets (10^6), Looped Packets
Backbone 1  241504.839%
Backbone 2  7.52431 6770.118%
Backbone 3  112.2201.687%
Backbone 4  1110713500.026%
On average, loops do not affect much traffic, but loops occur in bursts and can affect up to 25% of packets!

9 9 Observations about Packet Loops  General Observations  Loop size: # of nodes involved in packet loop  Number of replicas in packet loop  Properties of packet loops  Packet types  Duration  Of packet loops in packets

10 10 Loop Size Loop size: value by which TTL field in packet loops gets decremented. Figure 2

11 11 Packet Loop Length How often does a packet show up before it expires? Figure 3
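As a rough upper bound on that count, here is a small sketch. It assumes the loop persists until the packet expires, that the monitored link sees every pass, and that a packet on the wire always carries a TTL of at least 1; `max_observations` is a name made up for this example.

```python
def max_observations(first_ttl, loop_size):
    """Upper bound on how many times one packet is seen on the monitored link.

    The packet is seen with TTL first_ttl, first_ttl - loop_size, ... for as
    long as the value stays at or above 1.
    """
    if first_ttl < 1 or loop_size < 1:
        raise ValueError("first_ttl and loop_size must be positive")
    return (first_ttl - 1) // loop_size + 1


two_hop = max_observations(60, 2)   # TTLs 60, 58, ..., 2
short = max_observations(5, 2)      # TTLs 5, 3, 1
```

Observed counts can be lower than this bound when a transient loop resolves before the TTL is exhausted.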

12 12 Traffic Types
- Different types of Internet traffic.
- Routers are oblivious to type of traffic.
- Expectation: Traffic types of packet loop streams are distributed similarly to traffic types of overall traffic.

13 13 Traffic Types (Backbone 2)
- By protocol
  - TCP: 10% (93%)
  - UDP: 16% (6%)
  - ICMP: 77% (0.3%)
- TCP Flags
  - SYN: 51% (5%)
  - ACK: 73% (97%)
  - RST: 13% (1.5%)
  - FIN: 8% (4%)

14 14 Reasons for Increases
- TCP SYN traffic.
  - TCP is connection oriented.
  - End point tries to open connection, sends SYN packet.
  - SYN packet loops and expires, no other packets are sent.
- UDP traffic.
  - UDP is connectionless, no feedback from receiver.
  - Sending application is oblivious of loop.
- ICMP traffic.
  - Caused by traceroute/ping applications.
  - People are exploring the loop.
Observations confirm presence of loops!

15 15 Out-Of-Order Delivery

16 16 Causes of Packet Loops: BGP (figure: a customer, AS 1, AS 2, and nodes A, B, C, D)

17 17 Matching BGP Updates
- Any advertisement of the longest prefix?
- Temporal vicinity of 2 minutes to packet loops?
- Change in next hop or AS path?
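The three checks above could be combined roughly as follows. This is a sketch under assumptions: the `Update` record and `match_bgp_updates` are hypothetical names, the longest matching prefix is taken only from the updates supplied (a real analysis would consult the full routing table), and the addresses are documentation examples.

```python
import ipaddress
from typing import NamedTuple, Tuple


class Update(NamedTuple):
    ts: float                 # time the update was logged, in seconds
    prefix: str               # advertised prefix, e.g. "198.51.100.0/24"
    next_hop: str
    as_path: Tuple[int, ...]


def match_bgp_updates(loop_dst, loop_ts, updates, window=120.0):
    """Return updates for the longest prefix covering `loop_dst` that change
    next hop or AS path and fall within `window` seconds of the loop."""
    dst = ipaddress.ip_address(loop_dst)
    covering = {u.prefix for u in updates if dst in ipaddress.ip_network(u.prefix)}
    if not covering:
        return []
    longest = max(covering, key=lambda p: ipaddress.ip_network(p).prefixlen)

    matches, previous = [], None
    for u in sorted((u for u in updates if u.prefix == longest), key=lambda u: u.ts):
        changed = previous is not None and (
            u.next_hop != previous.next_hop or u.as_path != previous.as_path
        )
        if changed and abs(u.ts - loop_ts) <= window:
            matches.append(u)
        previous = u
    return matches


updates = [
    Update(0.0, "198.51.0.0/16", "10.0.0.1", (1, 2)),
    Update(100.0, "198.51.100.0/24", "10.0.0.1", (1, 3)),
    Update(1000.0, "198.51.100.0/24", "10.0.0.2", (2, 3)),  # change near the loop
    Update(5000.0, "198.51.100.0/24", "10.0.0.1", (1, 3)),  # change, but too late
]
hits = match_bgp_updates("198.51.100.7", 1050.0, updates)
```

The 120-second window corresponds to the 2-minute vicinity named on the slide; everything else is illustrative.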

18 18 Causes of Loops: IS-IS (figure: topology of routers R1 to R5 with link weights)

19 19 Time-Line at Nodes R2 and R3
R2: Failure Detection, LSP generation, Shortest Path Computation, LSP Flooding, FIB Update
R3: LSP Arrival, Shortest Path Computation, FIB Update

20 20 Matching ISIS Updates
- Upon receipt of an LSP, compute the shortest path from the observation node to the egress router
- If the forwarding path changed and the change is within temporal vicinity of a loop
  - See if the observation node lies on the shortest path before or after the change
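A minimal sketch of the path comparison, using Dijkstra's algorithm on a link-state graph before and after an LSP is applied. The graph layout, the helper names `shortest_path` and `isis_match`, and the 120-second window are assumptions for illustration; the slides do not state the window used for IS-IS.

```python
import heapq


def shortest_path(graph, src, dst):
    """Dijkstra over graph[node] = {neighbor: cost}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def isis_match(before, after, observer, egress, lsp_ts, loop_ts, window=120.0):
    """True if the LSP changed the observer-to-egress path near the loop time."""
    old = shortest_path(before, observer, egress)
    new = shortest_path(after, observer, egress)
    return old != new and abs(lsp_ts - loop_ts) <= window


# Hypothetical topology: the R2-R3 link is withdrawn by the LSP.
before = {"R1": {"R2": 1, "R3": 4}, "R2": {"R1": 1, "R3": 1}, "R3": {"R1": 4, "R2": 1}}
after = {"R1": {"R2": 1, "R3": 4}, "R2": {"R1": 1}, "R3": {"R1": 4}}
```

Checking whether the observation node lies on a path computed from some other ingress router would reuse `shortest_path` with a different `src` and a membership test on the result.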

21 21 BGP Update Matches
Trace    % transient   % persistent (BGP)   % persistent (no BGP)   Total
NYC-20   40.1          0                    50.8                    90.8
NYC-21   80.2          0                    7.5                     87.9
NYC-23   3.3           0                    0
NYC-22   18.8          0                    80.6                    99.4
NYC-24   70.0          0                    0
NYC-25   43.7          15.5                 0                       59.2

22 22 Factors Behind Varying Success
- Persistent Loops
  - Events occurred before trace collection
- BGP changes external to Sprint
  - Comparison with RouteViews updates: increase in matches
- Geographical distribution of loop destinations
  - Measurement PoP not involved in route changes
  - Avg # of ASes traversed: longest for NYC-23

23 23 Conclusions
- Loops can be detected and analyzed
- Loops are not uncommon
- Most are due to BGP updates
- BGP changes farther away from the observation point may not be identified

24 BACKUP SLIDE

25 25 CDF of Number of Replicas

26 26 CDF of Inter-Replica Spacing Time

27 27 Packet Types of All Traffic

28 28 Packet Types of Loops

29 29 Destination Addresses of Loops Backbone 1 Regional 2

30 30 CDF of Replica Stream Duration in Time

31 31 CDF of Routing Loop Duration in Time

32 32 Overview
- Types and causes behind routing loops
  - Transient - part of normal routing protocol operation
  - Persistent - “long-lasting”, manual intervention required
- Detection of routing loops in packet traces
  - Detection algorithm
  - Observations about the routing loops
- Analysis of performance impact
  - Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary

33 33 Fraction of Packets in Loops Backbone 1 Backbone 4

34 34 Construction of a Typical End-To-End Path 10 hops in the Backbone DSL/LAN/Cable/Phone Regional to Backbone

35 35 Estimate of End-to-End Loss
- Assume:
  - No loss on the access link due to routing loops
  - Losses are independent across links
- Estimate:
  - $L_r$: from Regional traces
  - $L_b$: from Backbone traces, except Backbone 4
  - $1 - (1 - L_r)^2 (1 - L_b)^{10} = 0.003 \sim 0.025$
- Implications on SLA??
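The estimate can be evaluated directly. In the sketch below the function name and the two per-link loss values are hypothetical placeholders, chosen only to show the arithmetic; the slides do not list the measured $L_r$ and $L_b$. The hop counts (2 regional, 10 backbone) follow the formula on the slide.

```python
def end_to_end_loss(l_r, l_b, regional_hops=2, backbone_hops=10):
    """Loss over a path of independent links: 1 - (1 - l_r)^2 (1 - l_b)^10."""
    return 1.0 - (1.0 - l_r) ** regional_hops * (1.0 - l_b) ** backbone_hops


# Placeholder per-link loss rates due to loops (NOT values from the traces).
l_r_example = 0.001
l_b_example = 0.0001

loss = end_to_end_loss(l_r_example, l_b_example)

# For small loss rates the result is close to the sum of per-link losses.
approx = 2 * l_r_example + 10 * l_b_example
```

The linear approximation makes clear why the backbone term matters despite its small per-link value: it is multiplied by the number of backbone hops.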

36 36 Delay Due to Routing Loops

37 37 Out-Of-Order Delivery

38 38 Causes of Loop

39 39 Overview
- Types and causes behind routing loops
  - Transient - part of normal routing protocol operation
  - Persistent - “long-lasting”, manual intervention required
- Detection of routing loops in packet traces
  - Detection algorithm
  - Observations about the routing loops
- Analysis of performance impact
  - Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary & Future Work

40 40 To Detect a Loop On-line
- Focus on persistent loops
- Questions:
  - More focus on persistent loops
  - How much traffic is affected? -> alarm
  - What prefix is affected? -> warning

41 41 On-Line Detection Algorithm
- How many packets to a /24 get looped? 100 -> WARNING
- How many looped packets / million? 5%
- How long (in millions) did it last? 10 millions -> ALARM
- By the time an alarm is raised, warnings have already been raised and help debug the system
- Fixed memory and computation complexity
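One possible reading of these thresholds, sketched as a streaming detector. The interpretation is an assumption: packets are counted in windows of one million, a /24 that collects 100 looped packets within a window triggers a warning, and 10 consecutive windows with at least 5% looped packets trigger an alarm. The class and its parameter names are hypothetical, and the per-window counter is bounded by the window size rather than strictly fixed in size.

```python
import ipaddress
from collections import Counter


class OnlineLoopDetector:
    def __init__(self, window=1_000_000, warn_packets=100,
                 alarm_fraction=0.05, alarm_windows=10):
        self.window = window
        self.warn_packets = warn_packets
        self.alarm_fraction = alarm_fraction
        self.alarm_windows = alarm_windows
        self.seen = 0                # packets in the current window
        self.looped = 0              # looped packets in the current window
        self.per_prefix = Counter()  # looped packets per /24 in the current window
        self.bad_windows = 0         # consecutive windows above the alarm fraction

    def observe(self, dst, looped):
        """Feed one packet; returns a list of (level, prefix) events."""
        events = []
        self.seen += 1
        if looped:
            self.looped += 1
            prefix = str(ipaddress.ip_network(f"{dst}/24", strict=False))
            self.per_prefix[prefix] += 1
            if self.per_prefix[prefix] == self.warn_packets:
                events.append(("WARNING", prefix))
        if self.seen == self.window:
            if self.looped / self.window >= self.alarm_fraction:
                self.bad_windows += 1
                if self.bad_windows == self.alarm_windows:
                    events.append(("ALARM", None))
            else:
                self.bad_windows = 0
            # Reset per-window state so memory does not grow across windows.
            self.seen = 0
            self.looped = 0
            self.per_prefix.clear()
        return events


# Scaled-down thresholds so the behavior is easy to follow.
det = OnlineLoopDetector(window=10, warn_packets=3, alarm_fraction=0.5, alarm_windows=2)
events = []
for i in range(20):
    events += det.observe("192.0.2.%d" % (i + 1), True)
```

In this run the warnings for the affected /24 arrive before the alarm, which matches the point on the slide that warnings are already available for debugging when an alarm fires.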

42 42 Validation of On-Line Algorithm

43 43 Summary
- Impact of routing on performance has been analyzed in terms of loss and delay.
  - Per-link loss varies greatly.
  - Excluding “outliers”, end-to-end loss of 0.3% is unavoidable.
  - For a small number of packets that escape the loops, 50 ~ 500 msec delay is added on average.
- On-line detection algorithm
  - In conjunction with routing protocol monitoring, it will help detect and fix persistent loops.

44 44 Future Work
- More work needed to determine causes behind routing loops
  - Correlate with BGP/IS-IS updates
    - Address hijacking
    - Wrong aggregation
    - Origin misconfiguration
    - Export misconfiguration
- Integration with existing monitoring tools

45 Backup Slides

46 46 Superbowl Sunday, 2/3/2002

47 47 Superbowl Sunday, 2/3/2002

48 48 What Next?
- Alarms and warnings
  - How to extract just enough info to be useful
  - How to relate it with BGP/IS-IS update info
  - How to integrate with management/monitoring infrastructure

