Measurement of Routing Switch-over Time with Redundant Link
Masaki Hirabaru (NICT), Teruo Nakai (KDDI), Yoshitaka Hattori (KDDI), Motohiro Ishii (QIC), and Yasuichi Kitamura (NICT)
August 25, 2005
NOC-Network Engineering Session, Advanced Network Conference in Taipei
Motivations
Research networks are becoming complicated in the AP region.
Situation: many redundant (back-up) links with dynamic routing (instead of SONET or L2 redundancy)
Evaluation: how fast they can switch over in case of outages
Improvement: how we can minimize service disruption time
Two cases:
- APII Fukuoka-Tokyo path over JGN II (Japan domestic part)
- JP-US path over JGN II / TransPAC2 (at Tokyo)
Routing Switchover from Link A to B
[Figure: timeline of a test packet train. Labels: Failure (Link A), Resume (Link B), packet reordering at resume; Service Disruption spans Failure Detection and Routing Transient.]
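The measurement idea on this slide can be sketched in a few lines: send a numbered packet train at a fixed rate, then look at the receiver log for the longest silence, the missing sequence numbers, and late arrivals. This is a minimal illustration, not the tool used for the measurements; the `Packet` record and `analyze_train` function are hypothetical names, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # sender-assigned sequence number (hypothetical field)
    recv_ms: float  # receiver timestamp in milliseconds (hypothetical field)


def analyze_train(packets):
    """Summarize a received packet train given in arrival order.

    Returns (largest_gap_ms, lost_count, reordered_count).
    """
    largest_gap_ms = 0.0
    reordered = 0
    highest_seq = -1
    prev_recv = None
    for p in packets:
        if prev_recv is not None:
            # Longest silence between consecutive arrivals.
            largest_gap_ms = max(largest_gap_ms, p.recv_ms - prev_recv)
        prev_recv = p.recv_ms
        if p.seq < highest_seq:
            # Arrived after a packet with a higher sequence number.
            reordered += 1
        else:
            highest_seq = p.seq
    seen = {p.seq for p in packets}
    # Sequence numbers inside the observed range that never arrived.
    lost = (max(seen) - min(seen) + 1) - len(seen) if seen else 0
    return largest_gap_ms, lost, reordered


# Made-up example: 10 ms spacing, packets 3-6 lost, packet 7 arrives after 8.
sample = [Packet(0, 0), Packet(1, 10), Packet(2, 20),
          Packet(8, 80), Packet(7, 81), Packet(9, 90)]
print(analyze_train(sample))
```

The largest gap approximates the service disruption time, with a resolution no better than the sending interval.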
Case 1: APII Seoul-Tokyo Connection
[Map labels: Korea: KOREN (Seoul XP, Daejon, Taegu, Kwangju, Busan); Busan, Fukuoka, 2.5G SONET; Japan: Genkai XP Fukuoka, Tokyo XP, 1,000km; APII/JGNII Southern Route (10G) and Northern Route (1G); JGNII, TransPAC2, 10G]
JGN II: L2 service, no fault detection! => L3 user-side detection
APII Fukuoka-Tokyo Configuration over JGN II
[Figure labels: Kanazawa, Tokyo, tpr4, apii-juniper, Osaka, Okayama, Fukuoka; outage marks (x) on 4/22 and 5/16(18); Northern Route (VLAN 1G), Southern Route (VLAN 10G); OSPF metric 90, OSPF metric 9; legend: Switch (L2), Router (L3)]
iBGP peering with loopback address
Detecting Switchover Period #1 (APII)
810 packets / sec = 10 Mbps (MTU 1500B UDP), ~ 1ms resolution
Fukuoka -> Tokyo:
  time 2005/04/22 02:07:46-02:08:23 JST gap 37.1s loss
  time 2005/04/22 04:02:38-04:03:15 JST gap 36.8s loss
Tokyo -> Fukuoka: (unsuccessful measurement)
[Figures: Tokyo -> Fukuoka Northern Route traffic graph; Tokyo -> Fukuoka Southern Route traffic graph; two maintenance outages]
OSPF Hello Intervals
Default: hello interval 10 secs, dead interval 40 secs
Proposal: hello interval 1 sec (minimum), dead interval 4 secs
OSPF adjacency will be lost while these parameters are inconsistent among peers.
Another option is BFD, available in JUNOS 7.X.
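A rough way to see why the timers matter: when the link fails silently (no L2 fault signal), the neighbor is declared down only when the dead timer, restarted by the last Hello received, expires. Since the failure can fall anywhere between two Hellos, detection takes between (dead - hello) and dead seconds. The sketch below assumes this simplified model and ignores SPF computation and forwarding-table updates; `detection_window` is a hypothetical helper name.

```python
def detection_window(hello_s, dead_s):
    """Return (min, max) seconds until a silent failure is detected.

    Simplified model: the dead timer restarts at each received Hello,
    and the failure happens at some point within one Hello interval.
    """
    if dead_s <= hello_s:
        raise ValueError("dead interval must exceed hello interval")
    return (dead_s - hello_s, dead_s)


for name, hello, dead in [("default", 10, 40), ("proposal", 1, 4)]:
    low, high = detection_window(hello, dead)
    print(f"{name}: detection within {low}-{high} s")
```

Under this model the default timers give 30-40 s and the proposed timers give 3-4 s of detection time, before any routing transient is added.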
Detecting Switchover Period #2 (APII)
1000 packets / sec = 1 Mbps (58B UDP), ~ 1ms resolution
Fukuoka -> Tokyo:
  time 2005/05/16 02:17:16-02:17:21 JST gap 5.6s loss 5637
Tokyo -> Fukuoka:
  time 2005/05/16 02:17:16-02:17:20 JST gap 4.1s loss
[Figures: Tokyo -> Fukuoka Northern Route traffic graph; Tokyo -> Fukuoka Southern Route traffic graph]
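The loss count and the gap are two views of the same outage: at a constant sending rate, lost packets divided by the rate gives the disruption time, and the reciprocal of the rate gives the timing resolution. A small sketch, with hypothetical helper names, using the Fukuoka -> Tokyo figures from this slide:

```python
def disruption_seconds(lost_packets, packets_per_sec):
    """Disruption time implied by a loss count at a constant sending rate."""
    return lost_packets / packets_per_sec


def resolution_ms(packets_per_sec):
    """Spacing between test packets, i.e. the measurement resolution."""
    return 1000.0 / packets_per_sec


# 5637 packets lost at 1000 packets/sec, as reported on the slide.
print(f"{disruption_seconds(5637, 1000):.3f} s")
# Resolution at the three sending rates used in this deck.
for rate in (810, 1000, 100):
    print(f"{rate} pps: {resolution_ms(rate):.2f} ms")
```

The 5637 lost packets correspond to about 5.6 s, which agrees with the reported gap.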
Case 2: JP-US Connection over TransPAC2 / JGN II
[Figure: Tokyo and Michigan connected via TransPAC2 and JGN II; arrows labeled "packets" and "routes"]
Packets are sent at Michigan.
Routes are injected at Tokyo.
Tokyo - Michigan L3 Configuration
[Figure labels: AS22388; losang, snvang, dnvrng, kscyng, iplsng, chinng; transpac-chi; transpac-la; AS11537; AS; tpr, tpr; AS22335; AS237; AS7660; ge-2-3-0x; ge-1-1-0x; v-bin-arbl; ms; /16; 56ms; 5ms; Ann Arbor - Chicago]
JGN II Int’l and TransPAC2:
- SONET OC-192 unprotected
- L2 fault detection provided
Detecting Switchover Period #3 (TransPAC2)
100 packets / sec = 100 Kbps (58B UDP), ~ 10ms resolution
[Figures: LA - Tokyo TransPAC2 traffic graph; timeline with Failure (TransPAC2) and Resume (JGN II)]
7/ ms 8/ ms
5-10 packets reordered within 5-10 ms
* Announce /24 into both JGN II and TransPAC2
Detecting Switchover Period #4 (JGN II)
100 packets / sec = 100 Kbps (58B UDP), ~ 10ms resolution
[Figure: timeline with Failure (JGN II) and Resume (TransPAC2)]
7/ ms 8/03 837ms
No reorders
BGP down (Tokyo): 45 ms (7/29), 1773 ms (8/03)
* Announce /24 into both JGN II and TransPAC2
Detecting Switchover Period #5 (TransPAC2)
100 packets / sec = 100 Kbps (58B UDP), ~ 10ms resolution
[Figure: timeline with Failure (TransPAC2) and Resume (JGN II); routes /24 and /25]
8/ ms
No packet reorders
* Announce /24 for JGN II and /25 for TransPAC2
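Announcing a /24 on one link and a /25 on the other relies on longest-prefix matching: the more specific route wins while it exists, and traffic falls back to the covering /24 when it is withdrawn. The sketch below illustrates that rule with Python's standard `ipaddress` module. The prefixes are documentation addresses (192.0.2.0/24), not the ones used in the measurement, and `best_route` is a hypothetical helper.

```python
import ipaddress


def best_route(routes, dst):
    """Pick the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical prefixes standing in for the announced /24 and /25.
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "JGN II",
    ipaddress.ip_network("192.0.2.0/25"): "TransPAC2",
}

print(best_route(routes, "192.0.2.10"))   # covered by both; /25 is preferred
print(best_route(routes, "192.0.2.200"))  # covered only by the /24

# Withdrawing the /25 (e.g. on link failure) leaves the /24 as fallback.
del routes[ipaddress.ip_network("192.0.2.0/25")]
print(best_route(routes, "192.0.2.10"))
```

Only destinations inside the /25 follow the more specific route, so the test target has to sit in that half of the /24.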
Artificial Route Change
1) Keep announcing a /24 route from Tokyo
2) Announce the /25 route to another link, then
3) Withdraw the /25 route
[Figure: Tokyo, Chicago, Michigan; A (announce) at Time A with Packet (before) and Packet (after) switching at Time S; W (withdraw) at Time W with Packet (before) and Packet (after) switching at Time S]
Time AD = Time S - Time A
Time WD = Time S - Time W
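The two delays can be computed directly from timestamps: take the time of the announce or withdraw, find the first test packet that arrives over the newly selected path, and subtract. The sketch below assumes the sender and receiver clocks are synchronized and that each received packet can be labeled with the path it took; how the path was identified in the actual measurement is not stated here, so the `path` label, the function name, and the sample values are all hypothetical.

```python
def switch_delay_ms(event_ms, observations, new_path):
    """Delay from a route event to the first packet seen on the new path.

    observations: list of (recv_ms, path_label) in arrival order.
    Returns None if no packet on new_path arrives after the event.
    """
    for recv_ms, path in observations:
        if recv_ms >= event_ms and path == new_path:
            return recv_ms - event_ms  # Time S - Time A (or Time W)
    return None


# Made-up receiver log: packets every 10 ms, path changes at t = 200 ms.
log = [(t, "old") for t in range(0, 200, 10)] + \
      [(t, "new") for t in range(200, 300, 10)]

# Route event issued at t = 50 ms.
print(switch_delay_ms(50, log, "new"))
```

With a 10 ms packet spacing the result is accurate to about one sending interval.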
A-1: Flap TransPAC2
[Figure: JGN II and TransPAC2; A and W events; routes /24 and /25]
Time AD 196 ms, Time WD 193 ms
A-2: Flap JGN II
[Figure: JGN II and TransPAC2; A and W events; routes /24 and /25]
Time AD 2106 ms, Time WD 1922 ms
Route Propagation Delays (1)
Delays from Tokyo to Oregon Route Views
: 6 secs (via LA)
: 5 secs (via LA)
: 23 secs (via CHI)
: 36 secs (via CHI)
No BGP updates available in Abilene?
Route Propagation Delays (2) - Tokyo to Seoul: BGP Peering Topology -
[Figure labels: busan, daejon, seoul, fukuoka, tokyo4, tokyo5, koganei; generator (A / W), 2-hour intervals; monitor; AS7660, AS, /24; KOREN, AS17579, AS2907, AS11537, AS2523, AS9270; JGNII, TransPAC2, 10G]
Acknowledgement to JaeHwa
Route Propagation Delays (2) - Results -
Announce delays (unit: milliseconds)
Time: 2005/08/20 20:00: /08/21 00:00: /08/21 04:00: /08/21 08:00: /08/21 12:00: /08/21 16:00:00
Location: Koganei, Tokyo (5/4), Fukuoka, Busan, Seoul
Tokyo (5/4) 7 / / 5256 / / / / 335
Withdraw delays (unit: milliseconds)
Time: 2005/08/20 22:00: /08/21 02:00: /08/21 06:00: /08/21 10:00: /08/21 14:00: /08/21 18:00:00
Location: Koganei, Tokyo (5/4), Fukuoka, Busan, Seoul
Tokyo (5/4) 6 / / / / / / 1295
Summary
- Avoid Ethernet multiple-access devices (L2 switches)
  If that is not possible, decrease Hello (heartbeat) intervals, or use BFD
- Inject alternate routes with longer prefixes
- Loop-free alternates and fast reroute (future work)
- Fast route propagation has not been well considered
- Need overseas test points
  continuous 0.1 Mbps traffic (10 ms resolution)
- Millisecond-order event timestamps (BGP updates, link failure, etc.)
- Less restriction for IP options and ICMP
- Global IP addresses for routers
- Harmful route dampening
- Route exits controlled by BGP Community
- Abilene / StarLight Router Proxy is helpful to check the routes.