Network Resilience: Exploring Cascading Failures Vishal Misra Columbia University in the City of New York Joint work with Ed Coffman, Zihui Ge and Don.

Presentation transcript:

Network Resilience: Exploring Cascading Failures Vishal Misra Columbia University in the City of New York Joint work with Ed Coffman, Zihui Ge and Don Towsley (Umass-Amherst)

Prologue
On Tuesday, September 18, simultaneous with the onset of the propagation phase of the Nimda worm, we observed a BGP storm. This one came on faster, rode the trend higher, and then, just as mysteriously, turned itself off, though much more slowly. Over a period of roughly two hours, starting at about 13:00 GMT (9am EDT), aggregate BGP announcement rates exponentially ramped up by a factor of 25, from 400 per minute to 10,000 per minute, with sustained "gusts" to more than 200,000 per minute. The advertisement rate then decayed gradually over many days, reaching pre-Nimda levels by September 24th. Similar events were observed on July 19th, the day Code Red spread.

Conjecture
o The viruses started random IP port scanning.
o Most of these random IP addresses were not in the cached entries of the routing table, causing frequent cache misses and, in the case of invalid IP addresses, generation of ICMP (router error) messages.
o Both of the above led to router CPU overload, causing routers to crash.
o Router failure led to withdrawal announcements by the peers, generating a high level of advertisement traffic.
o When a router came back up, it required a full state update from its peers, creating a large spike in the load of the peers that provided the state dump.
o Once the restarted router obtained all the dumps, it dumped its full state to all its peers, creating another spike in the load.
o Frequent full state dumps led to more CPU overload, leading to more crashes and the propagation of the cycle.
Cascading failures?

Outline
o Background
o Modeling interactions
o A fluid model
  - Phase transitions
o A birth-death model
  - More phase transitions
o Insights
o Future work

Studies in Cascading Failures
o Cascading failures have been studied extensively in power networks (Zaborsky et al.)
o Coupling between nodes in power networks is well understood: e.g. differential equations describe voltage-phasor-load relationships
o Coupling in data networks (routing, traffic engineering, policy routing, DNS, …) is difficult to model!

Modeling interactions
o We model coupling at the BGP level
o Study the interaction of a clique of BGP routers
o Model three different kinds of phenomena: router crash, router repair and full state updates
o System essentially forms a mutual aid collective

Clique of routers
o Routers form a fully connected graph
o All routers are peers of each other
o At the AS level, BGP routers form a clique of the order of 540 nodes

A fluid model for interactions
o We consider a clique of $N$ nodes
o Study the process of nodes that are down, $D$
o $k_s$: rate at which a single up node brings up down nodes
o $k_l$: rate at which full state updates bring down up nodes
o Typically, expect $k_s \gg k_l$

Drift equations
o $\lambda(t)$ = number of arrivals (nodes going down) in $[0,t)$: $d\lambda(t) = (N-D)\,D\,k_l\,dt$
o $\mu(t)$ = number of departures (nodes coming back up) in $[0,t)$: $d\mu(t) = D\,\frac{N-D}{D}\,k_s\,dt = (N-D)\,k_s\,dt$
o Now consider the drift in down nodes $D$: $dD(t) = d\lambda(t) - d\mu(t) = (N-D)\,(k_l D - k_s)\,dt$

Dynamics of D
System shows a phase transition: if $D(0) > k_s/k_l$, then $D(t) \to N$ (all nodes go down); else $D(t) \to 0$ (all nodes recover).

Phase transitions
[Plot: $N = 100$, $k_s/k_l = 20$]
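The threshold behaviour can be reproduced numerically. The following is a minimal sketch, not the authors' code: it assumes the drift $dD/dt = (N-D)(k_l D - k_s)$, which is the form consistent with the stated threshold $k_s/k_l$, and integrates it with a plain forward-Euler step. Only the ratio $k_s/k_l = 20$ and $N = 100$ come from the slide; the absolute values `k_s=20.0`, `k_l=1.0`, the step size and the time horizon are illustrative assumptions.

```python
def simulate_down_nodes(d0, n=100, k_s=20.0, k_l=1.0, dt=1e-4, t_end=5.0):
    """Forward-Euler integration of dD/dt = (N - D) * (k_l * D - k_s).

    d0 is the initial number of down nodes. The absolute values of k_s and
    k_l are assumptions; only their ratio (20) is taken from the slide.
    Returns the number of down nodes at time t_end.
    """
    d = float(d0)
    for _ in range(int(t_end / dt)):
        d += (n - d) * (k_l * d - k_s) * dt
        # Keep D inside its physical range [0, N].
        d = min(max(d, 0.0), float(n))
    return d


if __name__ == "__main__":
    for start in (15, 25):
        print(start, "->", simulate_down_nodes(start))
```

Starting just below the threshold (15 down nodes) the system recovers, while starting just above it (25 down nodes) it collapses to all nodes down. Because the threshold is $k_s/k_l$ and does not scale with $N$, it is an absolute count rather than a fraction of the clique.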

Properties of phase transition
o Threshold is an absolute quantity rather than a fraction
o Cliques with “powerful” (i.e., high $k_s/k_l$) nodes do not exhibit cascading failures
o Smaller cliques are more resistant to phase transitions

A Birth-Death model
o Again consider a clique of $N$ nodes
o The system state $i$ is the number of down nodes
o Transition rates are state dependent
[State diagram: chain of states $0, 1, \dots, i, i+1, \dots, N-1, N$ with transition rates $\lambda_i$ and $\mu_i$]

Transient model
o Since $\mu_N = 0$, state $N$ is an absorbing state
o System ends up in $N$ with probability 1
o Perform transient analysis: compute the mean time to absorption, $W_i$, starting from state $i$
o $W_i$ is a good indicator of the stability of the system: a low value indicates a propensity to collapse to state $N$ (where all nodes are down)
o Physically, interpret $W_i$ as the ability of the system to recover if it ends up in state $i$ through some exogenous process (e.g. attacks)

Solution for $W_i$
[Recurrence for $W_i$ with two boundary conditions, which together yield a way to compute $W_i$; the equations are not preserved in the transcript]

Modeling transition rates
o $\lambda_i = (N-i)\,i\,k_l + k_a$, where $k_a$ = ambient traffic load and $k_l$ is as in the fluid model
o $\mu_i = (N-i)\,k_s$, where $k_s$ is as in the fluid model

The mean time to absorption
[Plot: $N=20$, $k_s=1$, $k_l=0.01$]
System stable: mean time to absorption is of the order of $10^{26}$, even if only one node is up

A larger clique
[Plot: $N=100$, $k_s=1$, $k_l=0.01$]
System still stable: mean time to absorption is of the order of $10^{48}$ if only one node is up

The appearance of phase transitions
[Plot: $N=200$, $k_s=1$, $k_l=0.01$]
Mean time to absorption goes down from $10^{47}$ to about 0 within a few states

Dependence on service rate/load Transition point shifts right as ratio goes up

Dependence on clique size Transition point remains roughly the same, relative stability goes down as N goes up

Early conclusions
o Cascading failures are possible in mutual support systems like a BGP clique
o Presence of phase transitions depends strongly on system parameters
o Clique size is an important threshold: larger cliques are more likely to undergo cascading failures

Future work
o Refine the model, plug in numbers for parameters
o Look at different topologies
o Do more detailed modeling of a single router (fixed point solutions)