1
March 22, 2002
Simple Protocols, Complex Behavior (Simple Components, Complex Systems)
Lixia Zhang
UCLA Computer Science Department
2
- Work under DARPA's "Fault Tolerant Networking" program
- Joint research project
  - UCLA: Lan Wang, Dan Pei
  - USC/ISI: Dan Massey, Allison Mankin
  - NCSU: Xiaoliang Zhao
  - UC Davis: Felix Wu
- Data analysis by Lan Wang, Dan Pei, Xiaoliang Zhao
3
Building a More Resilient Internet
- Goal: build defenses against attacks N times worse than 9/11 or Code Red
- How:
  - Identify the fundamental pieces of the infrastructure (the current project focuses on the routing infrastructure)
  - Assess how well each of them can currently resist faults/attacks
  - Build more and stronger fences to protect each
- One specific case study
  - BGP: assess how well it stands up to network attacks and failures today, what works, and what needs to be improved
4
BGP Resilience during the Code Red Attack
- The Renesys report, "Global BGP routing instability during the Code Red attacks", showed
  - a correlation between the large traffic spikes caused by the attack and BGP routing message spikes on 9/18/01
  - evidence of possibly large-scale BGP route changes?
- Exactly how well or poorly did BGP behave as a routing protocol, and why?
  - Is BGP indeed in trouble when facing virus attacks?
  - What insight about routing protocol design can be inferred from the BGP data collected on 9/18?
5
Data and Methodology
- BGP update messages collected at RIPE NCC from 9/10/01 to 9/30/01 (the same data set used in the Renesys report)
  - 3 US peers: AT&T, Verio, Global Crossing
  - 8 peers in Europe
  - 1 peer in Japan
- Methodology (an illustrative classification sketch follows below)
  - Categorize BGP advertisements
  - Infer the causes of each class of BGP advertisements
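To make the categorization concrete, here is a minimal Python sketch of how updates from a monitored peer might be sorted into the classes used later in the talk (new announcement, duplicate, implicit withdraw, explicit withdrawal). The Update record, its field names, and the tiny example are illustrative assumptions, not the project's actual analysis code; table exchanges caused by session restarts are handled separately (see the session-restart slide).

```python
# Illustrative sketch only: sort each BGP update into the categories used in
# this analysis, based on the last route seen for the same (peer, prefix).
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Update:
    peer: str                            # monitoring peer that sent the update
    prefix: str                          # e.g. "66.133.177.0/24"
    as_path: Optional[Tuple[str, ...]]   # None means an explicit withdrawal
    attrs: Tuple = ()                    # other path attributes (MED, communities, ...)

def classify(update, rib):
    """Classify one update against the last route seen for (peer, prefix)."""
    key = (update.peer, update.prefix)
    previous = rib.get(key)

    if update.as_path is None:                       # explicit withdrawal
        rib.pop(key, None)
        return "withdrawal"
    if previous is None:                             # nothing announced before
        rib[key] = update
        return "new announcement"

    rib[key] = update
    if (previous.as_path, previous.attrs) == (update.as_path, update.attrs):
        return "duplicate"                           # nothing actually changed
    return "implicit withdraw"                       # replaces the previous route

# Tiny usage example
rib = {}
updates = [
    Update("peer1", "66.133.177.0/24", ("1103", "3549")),
    Update("peer1", "66.133.177.0/24", ("1103", "3549")),
    Update("peer1", "66.133.177.0/24", ("1103", "8297", "6453", "3549")),
    Update("peer1", "66.133.177.0/24", None),
]
print([classify(u, rib) for u in updates])
# -> ['new announcement', 'duplicate', 'implicit withdraw', 'withdrawal']
```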
7
BGP Message Classification (1)
8
What Our Analysis Shows
- A substantial percentage of the BGP messages during the worm attack were not about route changes
  - BGP initial table exchanges: 40.2% on 9/18/2001
  - Duplicate advertisements: 5% on 9/18/2001
- BGP updates that might indicate route changes:
  - Implicit withdraws: 37.6% on 9/18
- BGP updates that indicate route changes:
  - New announcements: 8.8% on 9/18
  - Withdraws: 8.3% on 9/18 (roughly the same as on normal days)
9
What Our Analysis Shows
[Chart: the 9/18 breakdown from the previous slide (40.2% table exchanges, 37.6% implicit withdraws, 8.8% new announcements, 8.3% withdraws); a substantial percentage of the BGP messages during the worm attack were not about route changes.]
10
A Closer Look at the Changes
- (40.2%) BGP table exchanges: caused by BGP session restarts
- (37.6%) Implicit withdraws
  - ~25% have an unchanged ASPATH attribute; most of these would not be propagated further by the receiver (e.g., changes in the MED attribute); possible cause: internal network dynamics
  - slow convergence
  - topology change
- (~17%) Explicit route announcements and withdraws
  - Reachability and/or route changes
- (~5%) Duplicate announcements
11
How to Infer the Causes?
- BGP session restarts lead to BGP table exchanges (40.2% of total BGP announcements on 9/18)
- All the monitored BGP sessions restarted during the worm attack period (32 restarts in total)
- Synchronized session restarts occurred on other days as well; this is largely an artifact of the monitoring point (a rough heuristic sketch for separating such traffic follows below)
  - CERN peering with the RIPE NCC collector: 550,000 announcements due to session restarts
  - CERN peering with RRC04: 0
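Separating table-exchange traffic from genuine route changes requires knowing when the monitored sessions reset. As a rough illustration only (the time window, data shapes, and function name are assumptions, not the method used in the study), one could attribute announcements that arrive shortly after a reset to the full-table exchange:

```python
# Hypothetical heuristic: announcements that arrive within a short window
# after a BGP session reset are counted as table-exchange traffic rather
# than genuine route changes.  The 120-second window is an arbitrary
# assumption, not a value taken from the study.
RESTART_WINDOW_SECONDS = 120

def split_table_exchange(announcements, reset_times, window=RESTART_WINDOW_SECONDS):
    """announcements: list of (timestamp, peer, prefix) tuples;
    reset_times: dict mapping peer to a list of session-reset timestamps."""
    table_exchange, possible_route_changes = [], []
    for ts, peer, prefix in announcements:
        resets = reset_times.get(peer, [])
        if any(0 <= ts - t < window for t in resets):
            table_exchange.append((ts, peer, prefix))
        else:
            possible_route_changes.append((ts, peer, prefix))
    return table_exchange, possible_route_changes
```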
12
Implicit Withdraws (37.6% on 9/18)
- Type 1: same ASPATH as the previous announcement (25%)
- Type 2: otherwise, i.e., the ASPATH changed (see the small sketch below)
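A minimal sketch of that split, reusing the illustrative Update records from the classification sketch earlier (an assumption for illustration, not the project's code):

```python
# Hypothetical helper for the Type 1 / Type 2 split of implicit withdraws:
# Type 1 keeps the ASPATH of the previous announcement (only other attributes,
# such as MED, changed); Type 2 announces a different ASPATH.
def implicit_withdraw_type(previous, update):
    """Both arguments are announcements for the same (peer, prefix)."""
    return "Type 1" if previous.as_path == update.as_path else "Type 2"
```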
13
Type 1 Implicit Withdraws
- Two of the US peers have a high percentage of Type 1 implicit withdraws.
14
Type 2 Implicit Withdraws (28.9% on 9/18)
- The European peers show more Type 2 implicit withdraws than the US peers.
- Cause: largely slow convergence?
  - More quantitative analysis is coming.
15
There were indeed certain reachability changes, and they mostly came from the edges

Announcements                     Withdraws
Prefix              Count         Prefix              Count
200.16.216.0/24     13272         192.189.62.0/24      3825
192.189.62.0/24      7113         200.16.216.0/24      3253
64.65.240.0/24       4967         64.65.240.0/24       2987
208.192.27.0/24      4667         207.45.205.0/24      2539
207.45.205.0/24      3652         207.45.223.0/24      2538
207.45.223.0/24      3650         208.192.27.0/24      1908
204.1.252.0/24       3034         207.45.195.0/24      1591
209.170.44.0/24      3028         144.170.89.0/24      1436
195.251.224.0/21     2953         144.170.76.0/24      1432
199.33.32.0/19       2891         144.170.236.0/24     1402
144.170.89.0/24      2922         144.170.86.0/24      1368
144.170.76.0/24      2874         196.12.172.0/24      1134
144.170.236.0/24     2874         203.102.83.0/24      1083
144.170.86.0/24      2813         195.251.224.0/21     1015
195.251.236.0/23     2731         207.105.168.0/24      936
193.92.176.0/22      2726         195.251.236.0/23      928
16
Evidence: One Example of Slow Convergence
09/18/2001 14:04:23  AS3549 originated prefix 66.133.177.0/24
09/18/2001 14:04:37  AS1103 announced aspath 1103 3549
09/18/2001 14:05:10  AS3549 withdrew 66.133.177.0/24
09/18/2001 14:05:36  AS1103 announced aspath 1103 8297 6453 3549
09/18/2001 14:06:34  AS1103 announced aspath 1103 8297 6453 1239 3549
09/18/2001 14:07:02  AS1103 sent a withdrawal for 66.133.177.0/24
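The signature here is path exploration: after AS3549 withdraws the prefix, the observation point keeps receiving progressively longer alternate ASPATHs until the withdrawal finally arrives. A toy check for that signature, under an assumed per-prefix event format (not the project's tooling):

```python
# Hypothetical check for the path-exploration signature of slow convergence:
# a burst of announcements whose ASPATHs never get shorter, ending in a
# withdrawal, as in the 66.133.177.0/24 timeline above.
def looks_like_path_exploration(events):
    """events: per-prefix sequence of ('A', aspath_tuple) or ('W', None)."""
    if not events or events[-1][0] != "W":
        return False
    lengths = [len(path) for kind, path in events if kind == "A"]
    return len(lengths) >= 2 and all(a <= b for a, b in zip(lengths, lengths[1:]))

# The sequence seen from the AS1103 peer in the example above:
timeline = [
    ("A", ("1103", "3549")),
    ("A", ("1103", "8297", "6453", "3549")),
    ("A", ("1103", "8297", "6453", "1239", "3549")),
    ("W", None),
]
print(looks_like_path_exploration(timeline))  # True
```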
17
How to Infer the Causes (2)
Duplicate advertisements, one extreme example: AT&T
- About 30% of the total advertisements from AT&T were duplicates (a small sketch of how such a ratio could be computed follows below).
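For illustration only, a per-peer duplicate ratio like the AT&T figure could be computed roughly as follows; the tuple layout and function name are assumptions, not the study's code:

```python
# Hypothetical per-peer duplicate ratio: an announcement counts as a duplicate
# when it carries exactly the same ASPATH and attributes as the previous
# announcement from the same peer for the same prefix.
from collections import defaultdict

def duplicate_ratio_per_peer(announcements):
    """announcements: iterable of (peer, prefix, as_path, attrs) tuples."""
    last_seen = {}
    duplicates = defaultdict(int)
    totals = defaultdict(int)
    for peer, prefix, as_path, attrs in announcements:
        totals[peer] += 1
        if last_seen.get((peer, prefix)) == (as_path, attrs):
            duplicates[peer] += 1
        last_seen[(peer, prefix)] = (as_path, attrs)
    return {peer: duplicates[peer] / totals[peer] for peer in totals}
```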
18
What Insights Does This Give Us?
- BGP as a routing protocol stood up well during the worm attack
  - It also stood up well during other major topological incidents, such as cable cuts, the Baltimore tunnel fire, and even the 9/11 event.
19
What Insights Does This Give Us (II)?
- BGP design needs improvement for unforeseen future faults/attacks
  - BGP peering must not only work well on good days, but also behave well on rainy days
  - BGP should keep local changes local in order to be a more resilient global routing protocol
  - A BGP fast-convergence solution should be deployed to remove the "amplifier" effect of slow route convergence under stressful conditions ("Improving BGP Convergence Through Consistency Assertions", D. Pei et al., INFOCOM 2002)
- BGP implementation tradeoffs must be made in view of the performance of the system as a whole
20
Other Lessons Learned
- What's interesting vs. what's the challenge
  - Be careful of measurement artifacts
  - Raw data alone does not necessarily tell what is going on, let alone why
- Be careful to distinguish between the properties of a protocol and the behavior due to a particular implementation and/or configuration
  - E.g., high duplicate updates from one service provider, uncontrolled flapping from another
- Challenge: understand the dynamics, i.e., what the data means for the protocol in action
21
What to Carry Away
- Large systems: intrinsically complex external behavior
- Large systems: potentially large numbers of unexpected events and couplings
- Research challenge: building large-scale systems that are resilient to unexpected failures
22
Questions?