Network Resilience: Exploring Cascading Failures Vishal Misra Columbia University in the City of New York Joint work with Ed Coffman, Zihui Ge and Don.


1 Network Resilience: Exploring Cascading Failures Vishal Misra Columbia University in the City of New York Joint work with Ed Coffman, Zihui Ge and Don Towsley (Umass-Amherst)

2 Prologue
On Tuesday, September 18, simultaneous with the onset of the propagation phase of the Nimda worm, we observed a BGP storm. This one came on faster, rode the trend higher, and then, just as mysteriously, turned itself off, though much more slowly. Over a period of roughly two hours, starting at about 13:00 GMT (9am EDT), aggregate BGP announcement rates exponentially ramped up by a factor of 25, from 400 per minute to 10,000 per minute, with sustained "gusts" to more than 200,000 per minute. The advertisement rate then decayed gradually over many days, reaching pre-Nimda levels by September 24th.
Similar events were observed on July 19th, the day Code Red spread.
Source: http://www.renesys.com/projects/bgp_instability

3 Conjecture
o The viruses started random IP port scanning
o Most of these random IP addresses were not in the cached entries of the routing table, causing...
o frequent cache misses, and...
o in the case of invalid IP addresses, generation of ICMP (router error) messages
o Both of the above led to router CPU overload, causing routers to crash
o Router failure led to withdrawal announcements by the peers, generating a high level of advertisement traffic
o When the router came back up, it required a full state update from its peers, creating a large spike in the load of the peers that provided the state dump
o Once the restarted router obtained all the dumps, it dumped its full state to all its peers, creating another spike in the load
o Frequent full state dumps led to more CPU overload, leading to more crashes, and the propagation of the cycle...
Cascading Failures?

4 Outline
o Background
o Modeling interactions
o A Fluid model
  v Phase transitions
o A Birth-Death model
  v More phase transitions
o Insights
o Future work

5 Studies in Cascading Failures
o Cascading failures studied extensively in Power Networks (Zaborsky et al.)
o Coupling in Power Networks between nodes well understood: e.g. differential equations describe voltage-phasor-load relationships
o Coupling in data networks: routing, traffic engineering, policy routing, DNS... difficult to model!

6 Modeling interactions
o We model coupling at BGP level
o Study the interaction of a clique of BGP routers
o Model three different kinds of phenomena: router crash, router repair and full state updates
o System essentially forms a mutual aid collective

7 Clique of routers
o Routers form a fully connected graph
o All routers are peers of each other
o At the AS level, BGP routers form a clique of the order of 540 nodes

8 A fluid model for interactions
o We consider a clique of $N$ nodes
o Study the process of nodes that are down, $D$
o $k_s$: rate at which a single up node brings up down nodes
o $k_l$: rate at which full state updates bring down up nodes
o Typically, expect $k_s \gg k_l$

9 Drift equations
o $\lambda(t)$ = number of arrivals (nodes going down) in $[0,t)$: $d\lambda(t) = (N-D)\,D\,k_l\,dt$
o $\mu(t)$ = number of departures (nodes coming back up) in $[0,t)$: $d\mu(t) = D\,\frac{N-D}{D}\,k_s\,dt = (N-D)\,k_s\,dt$
o Now, consider the drift in down nodes $D$: $dD(t) = d\lambda(t) - d\mu(t)$

10 Dynamics of D
o System shows a phase transition
o If $D(0) > k_s/k_l$, $D(t) \to N$ (every node ends up down); else $D(t) \to 0$

11 Phase transitions
[Figure: $N = 100$, $k_s/k_l = 20$]
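The threshold behaviour can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the drift $dD/dt = (N-D)(k_l D - k_s)$, i.e. nodes go down at rate $(N-D)\,D\,k_l$ and come back at rate $(N-D)\,k_s$, which is the form that yields the threshold $k_s/k_l$ quoted on the slides. The function name, the individual values $k_s = 20$, $k_l = 1$ (only their ratio is given), the step size and the time horizon are all illustrative assumptions.

```python
def simulate_down_nodes(d0, n=100, k_s=20.0, k_l=1.0, dt=1e-4, t_end=5.0):
    """Forward-Euler integration of dD/dt = (N - D) * (k_l * D - k_s).

    d0 is the initial number of down nodes D(0). Only the ratio
    k_s / k_l = 20 comes from the slides; the individual rates,
    dt and t_end are assumed for illustration.
    """
    d = float(d0)
    for _ in range(int(t_end / dt)):
        drift = (n - d) * (k_l * d - k_s)
        d += drift * dt
        # D counts down nodes, so keep it inside [0, N].
        d = min(max(d, 0.0), float(n))
    return d


if __name__ == "__main__":
    # Start on either side of the threshold k_s / k_l = 20.
    for d0 in (15, 19, 21, 25):
        print(d0, "->", round(simulate_down_nodes(d0), 3))
```

Starting points below the threshold should drain to $D = 0$, and starting points above it should run away to $D = N$; the threshold itself is an unstable equilibrium.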

12 Properties of phase transition
o Threshold is an absolute quantity rather than a fraction
o Cliques with "powerful" (i.e., $k_s/k_l$ high) nodes do not exhibit cascading failures
o Smaller cliques more resistant to phase transitions

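13 A Birth-Death model
o Again consider a clique of $N$ nodes
o The system state $i$ is the number of down nodes
o Transition rates are state dependent
[Diagram: birth-death chain on states 0, 1, ..., i, i+1, ..., N-1, N, with rate $\lambda_i$ from $i$ to $i+1$ and rate $\mu_i$ from $i$ to $i-1$]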

14 Transient model
o Since $\mu_N = 0$, state $N$ is an absorbing state
o System ends up in $N$ with probability 1
o Perform transient analysis, compute mean time to absorption, $W_i$, starting from state $i$
o $W_i$ is a good indicator of the stability of the system; a low value indicates a propensity to collapse to state $N$ (where all nodes are down)
o Physically, interpret $W_i$ as the ability of the system to recover if it ends up in state $i$ through some exogenous process (e.g. attacks)

15 Solution for W_i
[Equation: recurrence for $W_i$, with two boundary conditions]

16 Solution (cont.)
[Equations: two further relations] Yield a way to compute $W_i$
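The formulas on these two slides are not available in the transcript. For orientation, the standard first-step analysis of a birth-death chain gives one way to compute $W_i$; it may differ in form from what the authors presented. It assumes $\lambda_i$ is the rate from $i$ to $i+1$, $\mu_i$ the rate from $i$ to $i-1$, no transition below state 0, and $W_N = 0$. The helper quantity $T_i$ is introduced here and is not from the slides.

```latex
% Assumptions: \lambda_i = rate i -> i+1, \mu_i = rate i -> i-1, \mu_0 = 0, W_N = 0.
\begin{align*}
W_i &= \frac{1}{\lambda_i+\mu_i}
      + \frac{\lambda_i}{\lambda_i+\mu_i}\,W_{i+1}
      + \frac{\mu_i}{\lambda_i+\mu_i}\,W_{i-1}, \qquad 0 < i < N,\\
W_0 &= \frac{1}{\lambda_0} + W_1, \qquad W_N = 0.
\end{align*}
% With T_i = W_i - W_{i+1} (mean time to first reach i+1 from i),
% multiplying the first line by (\lambda_i + \mu_i) and regrouping gives:
\begin{align*}
T_0 = \frac{1}{\lambda_0}, \qquad
T_i = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}\,T_{i-1}, \qquad
W_i = \sum_{j=i}^{N-1} T_j .
\end{align*}
```

Every term in the recursion for $T_i$ is positive, so it can be evaluated without cancellation even when $W_i$ spans many orders of magnitude.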

17 Modeling transition rates
o $\lambda_i = (N-i)\,i\,k_l + k_a$
o $k_a$ = ambient traffic load, $k_l$ similar to fluid model
o $\mu_i = (N-i)\,k_s$
o $k_s$ similar to fluid model
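With these rates, the mean time to absorption can be computed directly. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the textbook birth-death recursion $T_0 = 1/\lambda_0$, $T_i = 1/\lambda_i + (\mu_i/\lambda_i)\,T_{i-1}$, $W_i = \sum_{j \ge i} T_j$, where $T_i$ is the mean time to first reach state $i+1$ from $i$. The function name is hypothetical, and the value of $k_a$ in the usage lines is an assumed placeholder because the slides do not state it. Exact rational arithmetic is used since $W_i$ can be astronomically large.

```python
from fractions import Fraction


def mean_time_to_absorption(n, k_s, k_l, k_a):
    """Return [W_0, ..., W_N], the mean time to reach state N (all nodes down).

    State i is the number of down nodes. Rates follow the slide:
    lambda_i = (N - i) * i * k_l + k_a and mu_i = (N - i) * k_s.
    k_a must be positive, otherwise state 0 could never be left.
    """
    k_s, k_l, k_a = Fraction(k_s), Fraction(k_l), Fraction(k_a)
    lam = [(n - i) * i * k_l + k_a for i in range(n)]  # rate i -> i+1
    mu = [(n - i) * k_s for i in range(n)]             # rate i -> i-1

    # hop[i] = mean time to first reach i+1 from i.
    # mu[0] is never used: there is no state below 0.
    hop = []
    for i in range(n):
        prev = hop[i - 1] if i > 0 else Fraction(0)
        hop.append(1 / lam[i] + (mu[i] / lam[i]) * prev)

    # W_i is the sum of the remaining upward hops; W_N = 0.
    w = [Fraction(0)] * (n + 1)
    for i in range(n - 1, -1, -1):
        w[i] = w[i + 1] + hop[i]
    return w


if __name__ == "__main__":
    # k_a = 0.01 is an assumed value; the slides give only N, k_s and k_l.
    for n in (20, 100, 200):
        w = mean_time_to_absorption(n, 1, "0.01", "0.01")
        print(n, f"W_0 = {float(w[0]):.3e}", f"W_(N-1) = {float(w[n - 1]):.3e}")
```

Passing the rates as strings or integers keeps the `Fraction` values exact; plotting $\log_{10} W_i$ against $i$ for several $N$ is one way to look for the transition the following slides describe.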

18 The mean time to absorption
[Figure: $N = 20$, $k_s = 1$, $k_l = 0.01$]
System stable, mean time to absorption of the order $10^{26}$, even if only one node is up

19 A larger clique
[Figure: $N = 100$, $k_s = 1$, $k_l = 0.01$]
System still stable, mean time to absorption of the order $10^{48}$, if only one node is up

20 The appearance of phase transitions
[Figure: $N = 200$, $k_s = 1$, $k_l = 0.01$]
Mean time to absorption goes down from $10^{47}$ to about 0 in a matter of a few states

21 Dependence on service rate/load
Transition point shifts right as the ratio $k_s/k_l$ goes up

22 Dependence on clique size Transition point remains roughly the same, relative stability goes down as N goes up

23 Early conclusions
o Cascading failures possible in mutual support systems like a BGP clique
o Presence of phase transitions depends strongly on system parameters
o Clique size an important threshold, larger cliques more likely to undergo cascading failures

24 Future work
o Refine model, plug in numbers for parameters
o Look at different topologies
o Do more detailed modeling of single router (fixed point solutions)

