Minimizing Probing Cost for Detecting Interface Failures: Algorithms and Scalability Analysis Hung Nguyen (Univ. of Adelaide, Australia) Renata Teixeira.

Minimizing Probing Cost for Detecting Interface Failures: Algorithms and Scalability Analysis
Hung Nguyen (Univ. of Adelaide, Australia), Renata Teixeira (UPMC, France), Patrick Thiran (EPFL, Switzerland), Christophe Diot (Thomson, France)

The Internet is great, but problems happen
[Figure: path from the UoA network through Net1, Net2 and Net3]
How to automatically detect and identify problems?
◦ Is my connection OK?
◦ Is the server up?
◦ Is the problem in one of the networks along the path?

Current alarms are not enough
Network equipment already raises many alarms
◦ SNMP traps
◦ Anomaly detection systems
But alarms may not reflect users’ experience
◦ Hard to map users’ complaints to alarms
◦ A problem may not raise an alarm
[Figure: example network with nodes A, B, C, D; C wrongly filters packets to /24]

Active monitoring system to detect faults
Network admins often resort to active measurements
◦ Active monitoring servers inside their network
◦ Subscription to a third-party monitoring service, e.g., Keynote or RIPE TTM
Challenge: we cannot continuously overload the network or end users’ machines to detect faults, which are rare events

Problem definition
[Figure: monitors M1, M2 probe target hosts T1, T2, T3 through a subscriber network with nodes A, B, C, D]
Goal: detect failures of any of the interfaces in the subscriber’s network with minimum probing overhead

Simple solution: coverage problem
[Figure: same example topology, monitors M1, M2, targets T1, T2, T3, nodes A, B, C, D]
Instead of probing all paths, select the minimum set of paths that covers all interfaces in the subscriber’s network
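Choosing a minimum set of covering paths is an instance of set cover, which is NP-hard, so a common heuristic is the greedy algorithm (a ln(n)-approximation). The sketch below is that textbook greedy heuristic, not necessarily what the authors used; the path names and the interfaces each path crosses are hypothetical, invented to mirror the M1/M2 to T1/T2/T3 example.

```python
# Greedy approximation to the coverage problem: pick paths until every
# interface of the subscriber network lies on at least one chosen path.


def greedy_path_cover(paths, interfaces):
    """paths: dict mapping a path name to the set of interfaces it crosses."""
    uncovered = set(interfaces)
    chosen = []
    while uncovered:
        # Path covering the most still-uncovered interfaces (first one wins ties).
        best = max(paths, key=lambda p: len(paths[p] & uncovered))
        gain = paths[best] & uncovered
        if not gain:
            raise ValueError(f"no path crosses {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical paths (interface sequences invented for illustration).
example_paths = {
    "M1-T1": {"A", "C"},
    "M1-T2": {"A", "B"},
    "M2-T3": {"B", "D"},
    "M2-T1": {"C", "D"},
    "M1-T3": {"A", "B", "D"},
}

if __name__ == "__main__":
    print(greedy_path_cover(example_paths, {"A", "B", "C", "D"}))
    # ['M1-T3', 'M1-T1']
```

Two probed paths suffice here instead of five; greedy does not guarantee the minimum in general.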

Coverage solution doesn’t detect all types of failures
Detects full-stop failures
◦ Failures that affect all packets that traverse the faulty interface
   ◦ E.g., interface or router crashes, fiber cuts, bugs
But not path-specific failures
◦ Failures that affect only a subset of the paths that cross the faulty interface
   ◦ E.g., router misconfigurations
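A small illustration of the distinction, assuming a failure is detected only when at least one probed path is among the paths it actually breaks. The topology is hypothetical: the probed set covers every interface, so a full-stop failure of A is caught, but a path-specific failure at A that breaks only an unprobed path is missed.

```python
# A failure is seen only if some probed path is among the affected paths.


def detected(probed_paths, affected_paths):
    return bool(set(probed_paths) & set(affected_paths))


# Hypothetical paths and the interfaces they cross (invented for illustration).
paths = {
    "M1-T1": {"A", "C"},
    "M1-T2": {"A", "B"},
    "M2-T3": {"B", "D"},
    "M2-T1": {"C", "D"},
    "M1-T3": {"A", "B", "D"},
}
probed = ["M1-T3", "M1-T1"]  # together these cover A, B, C and D

# Full-stop failure of A: every path through A is affected.
full_stop_A = [p for p, ifaces in paths.items() if "A" in ifaces]
# Path-specific failure at A (e.g., a misconfigured filter): only M1-T2 breaks.
path_specific_A = ["M1-T2"]

if __name__ == "__main__":
    print(detected(probed, full_stop_A))      # True
    print(detected(probed, path_specific_A))  # False
```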

New formulation of failure detection problem
Simultaneously select the frequency to probe each path
◦ Lower-frequency per-path probing can achieve high-frequency probing of each interface
[Figure: example topology annotated "1 every 9 mins" (per path) and "1 every 3 mins" (per interface)]
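One way to write the frequency-selection problem as a linear program, as a hedged sketch rather than the authors' exact formulation. The symbols are ours: P is the set of candidate paths, I the subscriber's interfaces, f_p the probing rate of path p, d_f the shortest full-stop failure that must be caught, and d_ps the shortest path-specific failure that must be caught. We assume monitors can space the probes crossing each interface evenly, and (as one simple reading) that a path-specific failure may affect any single path, so every path needs its own minimum rate.

```latex
\begin{aligned}
\min_{f} \quad & \sum_{p \in P} f_p \\
\text{s.t.} \quad & \sum_{p \in P \,:\, i \in p} f_p \;\ge\; \frac{1}{d_f} && \forall i \in I \quad \text{(full-stop failures)} \\
& f_p \;\ge\; \frac{1}{d_{ps}} && \forall p \in P \quad \text{(path-specific failures)}
\end{aligned}
```

Every constraint is linear in the rates f_p, so a generic LP solver finds the optimum. Under this reading, a short d_ps makes the per-path constraints dominate the cost, while a long d_ps leaves the per-interface constraints dominant.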

Properties of solution
Probe minimization for failure detection is no longer NP-hard
◦ Can find the optimal solution using linear programming
Needs synchronization among monitors
◦ Monitors need to collaborate to probe an interface
Alternative probabilistic solution with Poisson probes to avoid synchronization overhead
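The arithmetic behind the "1 every 9 mins" / "1 every 3 mins" example, and why dropping synchronization costs extra probes, can be sketched as follows. Assumptions (ours, not the slides'): three paths share the interface and each is probed once every 9 minutes; in the synchronized case monitors agree on evenly spaced start offsets; in the unsynchronized case each path is probed by an independent Poisson process. The function names are hypothetical, and the authors' probabilistic scheme may differ in detail.

```python
import math


def interface_rate(path_rates):
    # Every probe on a path that crosses the interface also probes it,
    # so the interface's probing rate is the sum of those path rates.
    return sum(path_rates)


def staggered_offsets(period, k):
    # Synchronized option: k monitors probe k paths with the same period and
    # agree on evenly spaced start offsets, so the shared interface sees a
    # probe every period / k time units.
    return [j * period / k for j in range(k)]


def poisson_miss_probability(total_rate, duration):
    # Unsynchronized option: independent Poisson probes on each path merge
    # into one Poisson stream at the interface (rates add); the chance that
    # no probe lands during a failure lasting `duration` is exp(-rate * duration).
    return math.exp(-total_rate * duration)


def poisson_rate_for(miss_probability, duration):
    # Smallest total rate whose miss probability is at most miss_probability.
    return math.log(1 / miss_probability) / duration


if __name__ == "__main__":
    rate = interface_rate([1 / 9] * 3)        # three paths, 1 probe / 9 min each
    print(1 / rate)                           # ~3.0 min between interface probes
    print(staggered_offsets(9, 3))            # [0.0, 3.0, 6.0]
    print(poisson_miss_probability(rate, 3))  # ~0.37 for a 3-min failure
    print(poisson_rate_for(0.05, 3))          # ~1.0 probe/min for a 5% miss rate
```

In this toy setting the staggered schedule catches every failure lasting 3 minutes, while Poisson probing at the same total rate misses about 37% of them and needs roughly three times the rate to bring the miss probability down to 5%.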

Scaling law of probing cost
Probing cost (number of probes sent per second) scales almost linearly with the size of the subscriber’s network
◦ In our inferred Internet graphs
For a random power-law graph, probing cost is a linear function of the number of nodes (n)
◦ Bounded by the isometric path number of a graph, i(G)
For other graphs:
   Graph       i(G)
   Cycle       2n/(n+1)
   Complete    n/2
   Hypercube   n/log n
   Grid        n/2
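The slides do not show how i(G) is obtained. As a hedged illustration of why low-diameter graphs need many paths: an isometric path is a shortest path, so it contains at most diam(G)+1 vertices, and any set of isometric paths covering all n vertices must contain at least ⌈n/(diam(G)+1)⌉ paths. The sketch computes that counting lower bound with breadth-first search; it is not the authors' analysis, and it assumes a connected, unweighted graph given as adjacency lists (the graph builders are our own helpers).

```python
import math
from collections import deque


def eccentricity(adj, src):
    # Breadth-first search: largest hop distance from src (unweighted graph).
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    # Assumes the graph is connected.
    return max(eccentricity(adj, s) for s in adj)


def isometric_path_lower_bound(adj):
    # A shortest path has at most diameter + 1 vertices, so covering all
    # n vertices needs at least ceil(n / (diameter + 1)) such paths.
    return math.ceil(len(adj) / (diameter(adj) + 1))


def cycle(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def hypercube(d):
    # Vertices are d-bit integers; neighbors differ in exactly one bit.
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(1 << d)}


if __name__ == "__main__":
    print(isometric_path_lower_bound(cycle(7)))      # 2
    print(isometric_path_lower_bound(complete(6)))   # 3
    print(isometric_path_lower_bound(hypercube(4)))  # 4
```

For the complete graph the bound is ⌈n/2⌉, since every shortest path there has only two vertices.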

Evaluation
Paths obtained using traceroutes
◦ From 750 PlanetLab nodes to 3,000 DNS servers
◦ From 12 RON nodes to 60,000 targets
Subscriber networks are probed ASes
◦ Map IPs to ASes using Mao et al.’s technique
◦ 1,366 ASes in PlanetLab
◦ 6,517 ASes in RON
Compute probing costs while varying parameters
◦ Set of paths, failure durations, subscriber’s network
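A hypothetical sketch of how one subscriber network's problem instance could be cut out of traceroute data: keep, for each monitor-target path, only the hops that fall inside the chosen AS. The `ip_to_as` dictionary is a stand-in for an IP-to-AS mapping; it is NOT Mao et al.'s technique, and the function name, IPs and AS numbers (documentation ranges) are illustrative assumptions.

```python
# Hypothetical: restrict traceroute paths to the interfaces of one AS.
# ip_to_as is a plain dict standing in for a real IP-to-AS mapping.


def subscriber_instance(traces, ip_to_as, asn):
    """traces: {(monitor, target): [hop IP, ...]}.
    Returns (paths, interfaces): for each path that enters the AS, the set of
    hop IPs inside the AS, plus the union of all such interfaces."""
    paths = {}
    for key, hops in traces.items():
        inside = frozenset(ip for ip in hops if ip_to_as.get(ip) == asn)
        if inside:  # paths that never enter the AS cannot probe it
            paths[key] = inside
    interfaces = frozenset().union(*paths.values())
    return paths, interfaces


# Toy data with documentation-range IPs and AS numbers.
traces = {
    ("M1", "T1"): ["192.0.2.1", "198.51.100.1", "198.51.100.2", "203.0.113.9"],
    ("M1", "T2"): ["192.0.2.1", "198.51.100.1", "203.0.113.7"],
    ("M2", "T3"): ["192.0.2.5", "203.0.113.8"],
}
ip_to_as = {
    "192.0.2.1": 64500, "192.0.2.5": 64500,
    "198.51.100.1": 64501, "198.51.100.2": 64501,
    "203.0.113.7": 64502, "203.0.113.8": 64502, "203.0.113.9": 64502,
}

if __name__ == "__main__":
    paths, interfaces = subscriber_instance(traces, ip_to_as, 64501)
    print(sorted(paths))       # [('M1', 'T1'), ('M1', 'T2')]
    print(sorted(interfaces))  # ['198.51.100.1', '198.51.100.2']
```

The resulting `paths` and `interfaces` are exactly the inputs a coverage or rate-selection step needs; repeating this per AS yields one instance per subscriber network.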

Probing costs varying size of subscriber network in PlanetLab
[Figure: path-specific failure duration = 1000 s; full-stop failure duration = 1 s]

Summary
Practical formulation of failure detection problem
◦ Incorporates both full-stop and path-specific failures
Solution minimizes probing cost
◦ Using linear programming
Inferred Internet graphs are among the most expensive to probe
◦ Probing cost scales almost linearly with network size
Next step
◦ Deploy a system based on these probing techniques

Probing costs
[Figure: path-specific failure duration = 2 s; full-stop failure duration = 1 s]

Varying failure durations
[Figure: full-stop failure duration = 10 s; regions labeled "Path-specific failures dominate the cost" and "Full-stop failures dominate the cost"]

Probing costs varying size of subscriber network in RON
[Figure: path-specific failure duration = 1000 s; full-stop failure duration = 1 s]