Published by Gillian Weaver. Modified over 9 years ago.
1
Inference, monitoring and recovery of large scale networks
CSE Department, Penn State University
Institute for Networking and Security Research
Faculty: Thomas La Porta
Post-Doc: Simone Silvestri
Ph.D. Students: Srikar Tati, Brett Holbert, Michael Lin
2
Problems and challenges in large scale networks
Inference, monitoring and recovery of large scale networks, INSR Industry Day 2014
Research problems:
- Inferencing
- Monitoring
- Recovery
Challenges:
- Large scale
- Partial information
- Interdependent networks
- Constraints (time, cost, ...)
This research is sponsored by:
- Defense Threat Reduction Agency (DTRA)
- Army Research Lab and UK Ministry of Defence - ITA Program
[Figure: Internet router-level topology (Merlin tool)]
3
Inferencing: motivation
The lack of global knowledge of the Internet topology:
- Hinders network diagnostics (losses, failures, bottlenecks)
- Inflates IP path lengths
- Reduces the accuracy of models
- Encourages overlay networks to ignore the underlay
Network operators rarely publish their topologies
Current inference approaches rely on tools such as Traceroute:
- Traceroute provides only partial information
- The network is only partially observable
- Previous approaches fail or perform poorly
Our problem: infer the routing topology in the presence of partial information
4
Inferencing: our approach - iTop
iTop algorithm:
- Fills unobservable parts of the network with virtual links/routers
- Analyzes the traces to determine properties of the real topology
- Iteratively merges links to infer the real network
[Figure: ground truth topology, virtual topology, trace analysis and merging algorithm (iTop), inferred topology]
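The first step above (filling unobservable parts with virtual routers) can be illustrated with a minimal sketch. It is not the iTop implementation: the trace format (a list of hop names, with `None` for a router that did not answer) and the function name `build_virtual_topology` are assumptions made for illustration only.

```python
def build_virtual_topology(traces):
    """Build an undirected edge set from traceroute-like traces.

    Assumption: each trace is a list of hop names, and None marks a
    hop that did not respond. Every unanswered hop becomes its own
    fresh virtual router, so the result can over-estimate the real
    topology until virtual elements are merged.
    """
    edges = set()
    counter = 0
    for trace in traces:
        hops = []
        for hop in trace:
            if hop is None:
                hop = f"v{counter}"  # hypothetical naming for virtual routers
                counter += 1
            hops.append(hop)
        # Consecutive hops of a trace are joined by a link.
        for a, b in zip(hops, hops[1:]):
            edges.add(frozenset((a, b)))
    return edges


# Two traces cross the same silent router between A and C.
traces = [["A", None, "C"], ["A", None, "C"]]
virtual_edges = build_virtual_topology(traces)
```

In this toy input the virtual topology has four links and two virtual routers, while the real network may contain only one silent router and two links. Closing that gap is what a merging phase is for.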
5
Inferencing: our approach - Results
We compare our approach to state-of-the-art inferencing approaches:
- X. Jin, W.-P. Yiu, S.-H. Chan, and Y. Wang, “Network topology inference based on end-to-end measurements,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 2182–2195, 2006.
- B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology inference in the presence of anonymous routers,” IEEE Infocom, 2003.
We consider realistic networks
We also show how iTop improves the performance of failure diagnosis algorithms in the presence of partial information
6
Monitoring: motivation (1)
Accurate knowledge of the internal network state enables:
- Performance diagnosis
- Resource allocation
- Efficient routing
- Congestion control
Monitoring large scale networks may incur high overhead
Network tomography:
- Infers the internal network state from end-to-end measurements
- Solves a linear system
- Enables efficient monitoring by probing only a basis of the system
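The "probe only a basis" idea can be sketched as follows, assuming additive link metrics (for example delay), so that each probing path is a 0/1 row over the links and a path measurement is the sum of the metrics of its links. The function name `select_basis` and the toy paths are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction


def select_basis(paths, num_links):
    """Return indices of a maximal linearly independent subset of paths.

    Each path is a set of link indices; its routing-matrix row holds a 1
    for every link it traverses. Exact arithmetic avoids rounding issues.
    """
    basis = []   # (pivot column, normalized reduced row)
    chosen = []
    for idx, path in enumerate(paths):
        row = [Fraction(1 if link in path else 0) for link in range(num_links)]
        # Eliminate the new row against the rows kept so far.
        for pivot, brow in basis:
            if row[pivot] != 0:
                factor = row[pivot]
                row = [a - factor * b for a, b in zip(row, brow)]
        pivot = next((j for j, v in enumerate(row) if v != 0), None)
        if pivot is not None:  # a nonzero remainder means the path is independent
            lead = row[pivot]
            basis.append((pivot, [v / lead for v in row]))
            chosen.append(idx)
    return chosen


# Four candidate paths over four links; path 2 is the sum of paths 0 and 1.
paths = [{0, 1}, {2, 3}, {0, 1, 2, 3}, {0, 2}]
to_probe = select_basis(paths, num_links=4)
```

Here path 2 does not need to be probed: under the additive assumption its measurement equals the sum of the measurements of paths 0 and 1.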
7
Monitoring: motivation (2)
Failures are common events in modern networks
Failures can significantly affect the performance of network tomography
Probing incurs a cost, and often only a maximum budget is available
Our problem: select a set of probing paths to maximize the performance of network tomography under failures with a limited budget
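To make the problem statement concrete, the sketch below scores a set of probing paths by the expected rank of the paths that survive random link failures. Using expected rank as the performance measure, and assuming independent link failures, are assumptions of this example; the slide does not name the measure. The enumeration is exponential in the number of links, so it only suits toy inputs.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of a list of 0/1 rows, by exact Gaussian elimination."""
    basis = []  # (pivot column, normalized reduced row)
    for raw in rows:
        row = [Fraction(v) for v in raw]
        for pivot, brow in basis:
            if row[pivot] != 0:
                factor = row[pivot]
                row = [a - factor * b for a, b in zip(row, brow)]
        pivot = next((j for j, v in enumerate(row) if v != 0), None)
        if pivot is not None:
            lead = row[pivot]
            basis.append((pivot, [v / lead for v in row]))
    return len(basis)


def expected_rank(paths, fail_prob):
    """Exact expected rank of the surviving paths.

    paths: list of sets of link indices.
    fail_prob: per-link failure probabilities (assumed independent).
    A path survives only if every link on it is up.
    """
    n = len(fail_prob)
    total = Fraction(0)
    for state in product([True, False], repeat=n):  # True means the link is up
        prob = Fraction(1)
        for up, p in zip(state, fail_prob):
            prob *= (1 - p) if up else p
        alive = [[1 if link in path else 0 for link in range(n)]
                 for path in paths if all(state[link] for link in path)]
        total += prob * rank(alive)
    return total


half = Fraction(1, 2)
score = expected_rank([{0}, {0, 1}], [half, half])
```

With two links that each fail with probability 1/2, both paths survive with probability 1/4 (rank 2) and only the first survives with probability 1/4 (rank 1), giving an expected rank of 3/4.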
8
Monitoring: our approach
We translate the problem into the maximization of a submodular function under a budget constraint
We propose the algorithm RoMe:
- Makes use of recent advances in submodular maximization theory
- Has an approximation factor of (1-1/e)/2
- Is optimal under the additional constraint of linear independence
- Assumes knowledge of the failure distribution
We consider the case of an unknown failure distribution
We propose the algorithm LSR (Learning with Submodular Rewards):
- Reinforcement learning approach
- Learns path availabilities
- Performance guarantees
[Figure: LSR steps: init, update path availabilities, select paths, collect measurements]
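As a rough illustration of budgeted submodular maximization, the sketch below runs two greedy passes, one ranking candidates by marginal gain per unit cost and one by marginal gain alone, and keeps the better result. This is a generic textbook-style heuristic, not RoMe, and the approximation factor quoted on the slide is not claimed for it. The names `budgeted_greedy` and `select_paths`, and the link-coverage objective in the example, are assumptions.

```python
def budgeted_greedy(items, cost, value, budget, by_ratio):
    """Greedy selection under a budget; costs are assumed positive.

    value(list_of_items) is a set function, assumed monotone submodular.
    by_ratio=True ranks candidates by marginal gain per unit cost,
    by_ratio=False by marginal gain alone.
    """
    chosen, spent = [], 0
    remaining = set(items)
    while remaining:
        base = value(chosen)
        best, best_score = None, 0
        for item in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + cost[item] > budget:
                continue
            gain = value(chosen + [item]) - base
            score = gain / cost[item] if by_ratio else gain
            if score > best_score:
                best, best_score = item, score
        if best is None:  # nothing affordable improves the objective
            break
        chosen.append(best)
        spent += cost[best]
        remaining.remove(best)
    return chosen


def select_paths(items, cost, value, budget):
    """Return the better of the two greedy variants."""
    a = budgeted_greedy(items, cost, value, budget, by_ratio=True)
    b = budgeted_greedy(items, cost, value, budget, by_ratio=False)
    return a if value(a) >= value(b) else b


# Toy objective: number of distinct links covered (coverage is submodular).
links = {"small": {0}, "big": {1, 2, 3, 4}}
cost = {"small": 1, "big": 5}


def coverage(selection):
    covered = set()
    for name in selection:
        covered |= links[name]
    return len(covered)


picked = select_paths(list(links), cost, coverage, budget=5)
```

The example shows why both passes are kept: the ratio-based pass takes the cheap path first and can no longer afford the large one, whereas the gain-based pass selects the large path directly.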
9
Monitoring: results
We compare our approach to state-of-the-art path selection algorithms:
- Y. Chen, D. Bindel, H. Song, and R. H. Katz, “An algebraic approach to practical and scalable overlay network monitoring,” ACM SIGCOMM Comp. Com. Rev., 2004.
We consider realistic topologies and failure models
10
Recovery: motivation
Modern networks are highly interdependent:
- The Internet and the smart grid
- Water supply, transportation, fuel and power stations are coupled together
Interdependent networks are extremely sensitive to failures:
- Failures may create performance degradation
- Degradation can also propagate in the surviving network
Example: the electrical blackout that occurred in Italy in September 2003
11
Recovery: research problems (1)
Recovery algorithms for overlay networks:
- Two networks sharing the same infrastructure
- Failures occur in the underlay network and affect the overlay
- Models an emergency urban communication network after a weapon of mass destruction attack
We aim to restore the functionality of the overlay network by repairing the underlay
Objectives & constraints:
- Bandwidth
- Time
- Cost
- Utility
12
Recovery: research problems (2)
Models for temporal propagation of failures:
- Two general interdependent networks
- Failures propagate over time (backup batteries/generators, local solar plant supply)
Given the initial failure, our model will:
- Estimate the probability that an element fails at a given time
- Estimate the expected time at which an element fails
- Estimate the expected number of failed elements at a given time
This information will be used to design recovery strategies
These models will be mapped to and validated with real interdependent networks
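The three estimates listed above can be illustrated with a toy Monte Carlo simulation. The propagation rule used here is purely hypothetical and is not the authors' model: an element fails a backup-supply delay after the earliest failure among the elements it depends on. The names `failure_times`, `estimate` and `sample_backup` are assumptions as well.

```python
import random


def failure_times(depends_on, initial_failures, backup):
    """Failure time of each element; float('inf') if it never fails.

    Hypothetical rule: an element fails backup[v] time units after the
    earliest failure among its suppliers (backup values assumed >= 0).
    """
    times = {v: float("inf") for v in depends_on}
    for v in initial_failures:
        times[v] = 0.0
    changed = True
    while changed:  # relax until no failure time can be lowered further
        changed = False
        for v, suppliers in depends_on.items():
            for u in suppliers:
                t = times[u] + backup[v]
                if t < times[v]:
                    times[v] = t
                    changed = True
    return times


def estimate(depends_on, initial_failures, sample_backup, horizon,
             runs=1000, seed=0):
    """Monte Carlo estimate of per-element failure probability by `horizon`
    and of the expected number of failed elements at that time."""
    rng = random.Random(seed)
    failed = {v: 0 for v in depends_on}
    for _ in range(runs):
        backup = {v: sample_backup(v, rng) for v in depends_on}
        times = failure_times(depends_on, initial_failures, backup)
        for v, t in times.items():
            if t <= horizon:
                failed[v] += 1
    prob = {v: count / runs for v, count in failed.items()}
    return prob, sum(prob.values())


# Toy chain: a router depends on the grid, a SCADA node on the router.
depends_on = {"grid": [], "router": ["grid"], "scada": ["router"]}
prob, expected_failed = estimate(
    depends_on, {"grid"},
    sample_backup=lambda v, rng: 2.0,  # fixed 2-hour backup, for illustration
    horizon=3.0, runs=10)
```

Replacing the fixed delay with a draw from `rng` (for example `rng.expovariate(0.5)`) turns the same loop into an estimate under random backup durations.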
13
Recovery: research problems (3)
Improve network robustness:
- Re-design existing networks
- Design new networks less prone to cascading effects
Models and recovery strategies for performance degradation over time
Partial knowledge
Partial control
Multiple interdependent networks
14
Thank you! Any questions?