An Algebraic Approach to Practical and Scalable Overlay Network Monitoring University of California at Berkeley David Bindel, Hanhee Song, and Randy H.


An Algebraic Approach to Practical and Scalable Overlay Network Monitoring
Yan Chen, Northwestern University
David Bindel, Hanhee Song, and Randy H. Katz, University of California at Berkeley
ACM SIGCOMM 2004

Motivation
Infrastructure ossification has led to a surge of overlay and P2P applications
Such applications are flexible in their choice of paths and targets, so they can benefit from E2E distance monitoring
  – Overlay routing/location
  – VPN management/provisioning
  – Service redirection/placement
  – …
Requirements for an E2E monitoring system
  – Scalable & efficient: small amount of probing traffic
  – Accurate: captures congestion/failures
  – Adaptive: nodes join/leave, topology changes
  – Robust: tolerates measurement errors
  – Balanced measurement load

Related Work
General metrics: RON (n^2 measurements)
Latency estimation
  – Link-level measurement: min set cover (Ozmutlu et al.), similar approach for giving bounds on other metrics (Tang & McKinley)
  – Clustering-based: IDMaps, Internet Isobar, etc.
  – Coordinate-based: GNP, Virtual Landmarks, Vivaldi, etc.
Network tomography
  – Focuses on inferring the characteristics of physical links rather than E2E paths
  – Limited measurements -> under-constrained system, unidentifiable links

Problem Formulation
Given an overlay of n end hosts and O(n^2) paths, how can we select a minimal subset of paths to monitor so that the loss rates/latencies of all other paths can be inferred?
Assumptions:
  – Topology is measurable
  – Only E2E paths can be measured, not individual links

Outline
An algebraic approach framework
Algorithms for a fixed set of overlay nodes
Scalability analysis
Adaptive dynamic algorithms
Measurement load balancing
Handling topology measurement errors
Simulations and Internet experiments

Our Approach
Select a basis set of k paths that fully describes all O(n^2) paths (k ≪ O(n^2))
Monitor the loss rates of the k paths, and infer the loss rates of all other paths
Applicable to any additive metric, such as latency
[Figure labels: End hosts; Overlay Network Operation Center; topology; measurements]

Modeling of Path Space
Path loss rate p, link loss rate l
[Figure: example topology with nodes A, B, C, D and path p_1]
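The slide's formula did not survive extraction. A minimal sketch of the usual loss model follows, assuming links drop packets independently. The symbols v_j (1 if the path crosses link j, else 0), x_j, and s (number of links) are assumptions for illustration, not taken from the slide.

```latex
1 - p = \prod_{j=1}^{s} (1 - l_j)^{v_j}
\quad\Longrightarrow\quad
\log(1 - p) = \sum_{j=1}^{s} v_j \log(1 - l_j) = v^{T} x,
\qquad x_j = \log(1 - l_j).
```

Taking logarithms turns the multiplicative loss model into an additive one, which is consistent with the deck's remark that the approach applies to any additive metric such as latency.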

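Putting All Paths Together
In total r = O(n^2) paths and s links, s ≪ r
[Equation: the per-path equations stacked into one matrix system; entries not recoverable from the transcript]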
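To make the stacked system concrete, here is a small sketch that builds a path matrix G for a toy overlay and checks that the additive form b = G x reproduces the multiplicative path loss. The topology, the link loss values, and all variable names are hypothetical, and independent link losses are assumed.

```python
import numpy as np

# Hypothetical toy overlay: 3 links and 3 end-to-end paths.
# Each path is listed as the link indices it traverses (assumed topology).
paths = [[0, 1], [2], [0, 1, 2]]
num_links = 3

# Path matrix G: G[i, j] = 1 if path i traverses link j, else 0.
G = np.zeros((len(paths), num_links))
for i, links in enumerate(paths):
    G[i, links] = 1.0

# Assumed link loss rates l_j and transformed unknowns x_j = log(1 - l_j).
link_loss = np.array([0.01, 0.05, 0.10])
x = np.log(1.0 - link_loss)

# Additive form: b = G x, where b_i = log(1 - p_i).
b = G @ x
path_loss = 1.0 - np.exp(b)

# Cross-check against the multiplicative form 1 - p = prod(1 - l_j).
direct = np.array([1.0 - np.prod(1.0 - link_loss[links]) for links in paths])
```

In practice only b is observable (from end-to-end probes), while x is unknown; the sketch fixes x only to verify that the two forms agree.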

Sample Path Matrix
x_1 - x_2 unknown => cannot compute x_1, x_2
To separate identifiable vs. unidentifiable components: x = x_G + x_N
All E2E paths (rows of G) are orthogonal to x_N, i.e., G x_N = 0
[Figure: topology with nodes A, B, C, D and paths b_1, b_2, b_3; virtualization into virtual links 1 and 2]
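The split x = x_G + x_N can be illustrated numerically. The sketch below assumes a three-link example in which paths b_1, b_2, b_3 cover links {1, 2}, {3}, and {1, 2, 3}; this matrix is a guess consistent with the (1,1,0) and (1,-1,0) directions shown in the backup slide, not a matrix confirmed by the transcript.

```python
import numpy as np

# Assumed path matrix: rows are paths b1, b2, b3; columns are links 1..3.
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Hypothetical link-level unknowns x_j = log(1 - l_j).
x = np.log(1.0 - np.array([0.01, 0.05, 0.10]))

# pinv(G) @ G is the orthogonal projector onto the row (path) space of G.
P_row = np.linalg.pinv(G) @ G
x_G = P_row @ x   # identifiable component
x_N = x - x_G     # unidentifiable component, lies in the null space of G

rank = np.linalg.matrix_rank(G)
```

Because G x_N = 0, every path measurement depends only on x_G: links 1 and 2 always appear together, so only their sum x_1 + x_2 is determined, and x_N points along (1, -1, 0).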

Intuition through Topology Virtualization
Virtual links: minimal path segments whose loss rates can be uniquely identified
Can fully describe all paths
x_G is composed of virtual links
[Figure: real links (solid) and all of the overlay paths (dotted) traversing them, each example virtualized into virtual links with its Rank(G); individual labels not recoverable]

Algorithms
Select k = rank(G) linearly independent paths to monitor (one time)
  – Use QR decomposition
  – Leverage sparse matrix: time O(rk^2) and memory O(k^2)
  – E.g., 79 seconds for n = 300 (r = 44850) and k = 2541
Compute the loss rates of other paths (continuously)
  – Time O(k^2) and memory O(k^2)
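The two steps above can be sketched as follows. This is a simplified stand-in, not the authors' implementation: it uses a dense Gram-Schmidt scan instead of a sparse QR decomposition, and the function names, tolerance, and toy matrix are assumptions.

```python
import numpy as np

def select_basis_paths(G, tol=1e-10):
    """Scan the rows of G in order and keep those independent of the rows kept so far."""
    selected = []
    Q = np.zeros((0, G.shape[1]))  # orthonormal basis for the span of kept rows
    for i, row in enumerate(G):
        residual = row - Q.T @ (Q @ row)  # part of the row outside the current span
        norm = np.linalg.norm(residual)
        if norm > tol * max(1.0, np.linalg.norm(row)):
            selected.append(i)
            Q = np.vstack([Q, residual / norm])
    return selected

def infer_all_paths(G, selected, b_measured):
    """Infer b = G x for every path from measurements on the selected paths only."""
    G_bar = G[selected]
    # Minimum-norm least-squares solution, i.e. the identifiable component x_G.
    x_G, *_ = np.linalg.lstsq(G_bar, b_measured, rcond=None)
    return G @ x_G

# Toy example (assumed): the third path is the sum of the first two.
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
x_true = np.log(1.0 - np.array([0.01, 0.05, 0.10]))
b = G @ x_true

selected = select_basis_paths(G)
inferred = infer_all_paths(G, selected, b[selected])
```

Only the two selected paths are "measured" here, yet the inferred vector matches b for all three paths. Loss rates follow as 1 - exp(b_i).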

How many measurements are saved? Is k ≪ O(n^2)?
For a power-law Internet topology
When the majority of end hosts are on the overlay
  – k = O(n) (with proof)
When a small portion of end hosts are on the overlay
  – If the Internet were a pure hierarchical structure (tree): k = O(n)
  – If the Internet had no hierarchy at all (worst case, clique): k = O(n^2)
  – The Internet has a moderate hierarchical structure [TGJ+02]: for reasonably large n (e.g., 100), k = O(n log n)

Linear Regression Tests of the Hypothesis
BRITE router-level topologies
  – Barabasi-Albert, Waxman, hierarchical models
Mercator real topology
Most fit best with O(n); the hierarchical ones fit best with O(n log n)
[Figures: BRITE 20K-node hierarchical topology; Mercator 284K-node real router topology]

Topology Changes
Basic building block: add/remove one path
  – Incremental changes: O(k^2) time (O(n^2 k^2) for re-scan)
  – Add path: check linear dependency with old basis set, [remainder lost in extraction]
  – Delete path p: hard when [condition lost in extraction]
  – Intuitively, two steps
Add/remove end hosts, routing changes
Routing is relatively stable on the order of a day => incremental detection
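The "add path" check can be sketched as a least-squares test: a new path needs monitoring only if its row is not a linear combination of the rows already in the basis. The helper names and tolerance below are hypothetical, and this dense version does not reproduce the O(k^2) incremental update mentioned on the slide; path deletion is not covered.

```python
import numpy as np

def is_dependent(G_bar, v, tol=1e-9):
    """Return True if row v lies in the span of the rows of G_bar."""
    if G_bar.shape[0] == 0:
        return not np.any(v)
    # Solve G_bar^T c = v in the least-squares sense and inspect the residual.
    coeffs, *_ = np.linalg.lstsq(G_bar.T, v, rcond=None)
    residual = np.linalg.norm(G_bar.T @ coeffs - v)
    return residual <= tol * max(1.0, np.linalg.norm(v))

def add_path(G_bar, v):
    """Return (updated basis matrix, whether the new path must be monitored)."""
    v = np.asarray(v, dtype=float)
    if is_dependent(G_bar, v):
        return G_bar, False
    return np.vstack([G_bar, v]), True

# Usage on the assumed three-link example.
G_bar = np.zeros((0, 3))
G_bar, added_b1 = add_path(G_bar, [1, 1, 0])
G_bar, added_b2 = add_path(G_bar, [0, 0, 1])
G_bar, added_b3 = add_path(G_bar, [1, 1, 1])  # sum of the first two rows
```

A path that turns out to be dependent can be inferred from the existing basis, so adding it causes no extra probing.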

Topology Change Example
[Figure: topology with nodes A, B, C, D and paths b_1, b_2, b_3; virtualization into virtual links 1 and 2]

Other Practical Issues
Measurement load balancing
  – Randomly reorder the paths in G before scanning them for selection of [symbol lost in extraction]
  – Has no effect on the loss rate estimation accuracy
Tolerance of topology measurement errors
  – Care about path loss rates rather than those of any interior links
  – Router aliases => let it be: assign similar loss rates to the same links
  – Path (segments) without topology info => add virtual links to bypass
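The load-balancing idea amounts to shuffling the scan order so that no end host is systematically favored during basis selection. In the sketch below, the function names are hypothetical, and counting a monitored path against both of its endpoints is an assumption about how load is defined.

```python
import random
from collections import Counter

def randomized_scan_order(paths, seed=None):
    """Return the paths in a random order; the input list is left untouched."""
    order = list(paths)
    random.Random(seed).shuffle(order)
    return order

def measurement_load(selected_paths):
    """Count how many monitored paths touch each end host (assumed load definition)."""
    load = Counter()
    for src, dst in selected_paths:
        load[src] += 1
        load[dst] += 1
    return load

# Usage: all directed paths among 4 hypothetical hosts.
paths = [(a, b) for a in range(4) for b in range(4) if a != b]
order = randomized_scan_order(paths, seed=0)
load = measurement_load(paths)
```

The shuffled order would then be fed to the basis-selection scan. Since any maximal independent set of paths spans the same path space, the choice of order changes who measures, not what can be inferred.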

Evaluation
Extensive simulations
  – See paper
Experiments on PlanetLab
  – 51 hosts, each from a different organization
  – 51 × 50 = 2,550 paths
  – Simultaneous loss rate measurement
      300 trials, 300 msec each
      In each trial, send a 40-byte UDP packet to every other host
  – Topology measurement (traceroute)
  – 100 experiments in peak hours of North America
Areas and domains (# of hosts)
  – US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
  – International (11)
      Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
      Asia (2): Taiwan 1, Hong Kong 1
      Canada 2
      Australia 1

PlanetLab Experiment Results
Loss rate distribution
  – Loss rate [0, 0.05): 95.9% of paths
  – Lossy paths [0.05, 1.0]: 4.1% of paths, of which
      [0.05, 0.1)   15.2%
      [0.1, 0.3)    31.0%
      [0.3, 0.5)    23.9%
      [0.5, 1.0)     4.3%
      1.0           25.6%
On average k = 872 out of 2550
Metrics
  – Absolute error |p – p'|: average [value missing] for all paths, [value missing] for lossy paths
  – Relative error [BDPT02]: average 1.1 for all paths, and 1.7 for lossy paths
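The two accuracy metrics can be sketched as below. The absolute error is as written on the slide. For the relative error, the sketch assumes the error-factor form max(p_eps / p'_eps, p'_eps / p_eps) with p_eps = max(eps, p); both that form and the default eps = 0.001 are assumptions, since the slide only cites [BDPT02].

```python
def absolute_error(p, p_est):
    """Absolute difference between the real and inferred loss rates."""
    return abs(p - p_est)

def relative_error_factor(p, p_est, eps=0.001):
    """Assumed error-factor form: ratio of the larger to the smaller clamped loss rate."""
    p_eps = max(eps, p)          # clamp so very small loss rates do not blow up the ratio
    p_est_eps = max(eps, p_est)
    return max(p_eps / p_est_eps, p_est_eps / p_eps)
```

Under this definition a perfect estimate scores 1.0, and an estimate that is off by a factor of two in either direction scores 2.0, which matches the scale of the averages quoted above (1.1 and 1.7).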

More Experiment Results
Running time
  – Setup (path selection): 0.75 seconds
  – Update (for all 2550 paths): 0.16 seconds
  – More results on topology change adaptation: see paper
Robustness
  – Out of 14 sets of pair-wise traceroute …
  – On average 245 out of 2550 paths have no or incomplete routing information
  – No router aliases resolved
Conclusion: robust against topology measurement errors

Results for Measurement Load Balancing
Simulation on an overlay of 300 end hosts, average load 8.5
With balancing: Gaussian-like load distribution
Without: heavily skewed, with the max almost 20 times the average

Conclusions
A tomography-based overlay network monitoring system
  – Given n end hosts, characterize O(n^2) paths with a basis set of O(n log n) paths
  – Selectively monitor the basis set for their loss rates, then infer the loss rates of all other paths
  – Adaptive to topology changes
  – Balanced measurement load
  – Topology measurement error tolerance
Both simulation and PlanetLab experiments show promising results
Built an adaptive overlay streaming media system on top of it

Backup Slides

Other Practical Issues
Tolerance of topology measurement errors
  – Care about path loss rates rather than those of any interior links
  – Poor router alias resolution => assign similar loss rates to the same links
  – Unidentifiable routers => add virtual links to bypass
Measurement load balancing on end hosts
  – Randomly order the paths for scan and selection of [symbol lost in extraction]

Modeling of Path Space
Path loss rate p, link loss rate l
Put all r = O(n^2) paths together
s links in total
[Figure: example topology with nodes A, B, C, D and path p_1]

Sample Path Matrix
x_1 - x_2 unknown => cannot compute x_1, x_2
The set of such vectors forms the null space
To separate identifiable vs. unidentifiable components: x = x_G + x_N
All E2E paths are in the path space, i.e., G x_N = 0
[Figure: topology with nodes A, B, C, D and paths b_1, b_2, b_3; axes x_1, x_2, x_3 with the path/row space (measured) along (1,1,0) and the null space (unmeasured) along (1,-1,0)]

Intuition through Topology Virtualization
Virtual links: minimal path segments whose loss rates can be uniquely identified
Can fully describe all paths
x_G is composed of virtual links
All E2E paths are in the path space, i.e., G x_N = 0
[Figure: topology with nodes A, B, C, D and paths b_1, b_2, b_3; axes x_1, x_2, x_3 with the path/row space (measured) along (1,1,0) and the null space (unmeasured) along (1,-1,0); virtualization into virtual links 1 and 2]

Algorithms
Select k = rank(G) linearly independent paths to monitor
  – Use rank-revealing decomposition
  – Leverage sparse matrix: time O(rk^2) and memory O(k^2)
  – E.g., 10 minutes for n = 350 (r = 61075) and k = 2958
Compute the loss rates of other paths
  – Time O(k^2) and memory O(k^2)

Practical Issues
Tolerance of topology measurement errors
  – Care about path loss rates rather than those of any interior links
  – Poor router alias resolution => assign similar loss rates to the same links
  – Unidentifiable routers => add virtual links to bypass
Measurement load balancing on end hosts
  – Randomly order the paths for scan and selection of [symbol lost in extraction]
Topology changes
  – Efficient algorithms for incremental update of [symbol lost in extraction] when adding/removing end hosts and on routing changes

More Experiment Results
Measurement load balancing
  – Putting load values of each node in 10 equally spaced bins
  – [Figures: with load balancing; without load balancing]
Running time
  – Setup (path selection): 0.75 seconds
  – Update (for all 2550 paths): 0.16 seconds
  – More results on topology change adaptation: see paper

Work in Progress
Provide it as a continuous service on PlanetLab
Network diagnostics: which links or path segments are down
Iterative methods for better speed and scalability

Evaluation
Simulation
  – Topology
      BRITE: Barabasi-Albert, Waxman, hierarchical: 1K – 20K nodes
      Real topology from Mercator: 284K nodes
  – Fraction of end hosts on the overlay: [value missing]%
  – Loss rate distribution (90% of links are good)
      Good link: 0-1% loss rate; bad link: 5-10% loss rate
      Good link: 0-1% loss rate; bad link: 1-100% loss rate
  – Loss model
      Bernoulli: independent packet drops
      Gilbert: bursty packet drops
  – Path loss rate simulated via transmission of 10K packets
Experiments on PlanetLab

Evaluation
Extensive simulations
Experiments on PlanetLab
  – 51 hosts, each from a different organization
  – 51 × 50 = 2,550 paths
  – On average k = 872
Results highlight
  – Avg real loss rate: [value missing]
  – Absolute error mean: [value missing], [value missing]% < [value missing]
  – Relative error mean: [value missing], [value missing]% < 2.0
  – On average 248 out of 2550 paths have no or incomplete routing information
  – No router aliases resolved
Areas and domains (# of hosts)
  – US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
  – International (11)
      Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
      Asia (2): Taiwan 1, Hong Kong 1
      Canada 2
      Australia 1

Sensitivity Test of Sending Frequency Big jump for # of lossy paths when the sending rate is over 12.8 Mbps

PlanetLab Experiment Results
Loss rate distribution
  – Loss rate [0, 0.05): 95.9% of paths
  – Lossy paths [0.05, 1.0]: 4.1% of paths, of which
      [0.05, 0.1)   15.2%
      [0.1, 0.3)    31.0%
      [0.3, 0.5)    23.9%
      [0.5, 1.0)     4.3%
      1.0           25.6%
On average k = 872 out of 2550
Metrics
  – Absolute error |p – p'|: average [value missing] for all paths, [value missing] for lossy paths
  – Relative error [BDPT02]
  – Lossy path inference: coverage and false positive ratio

Accuracy Results for One Experiment
95% of absolute errors < [value missing]
95% of relative errors < 2.1

Accuracy Results for All Experiments
For each experiment, take its 95th-percentile absolute and relative errors
Most have absolute error < [value missing] and relative error < 2.0

Lossy Path Inference Accuracy 90 out of 100 runs have coverage over 85% and false positive less than 10% Many caused by the 5% threshold boundary effects
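Coverage and false positive ratio can be computed as sketched below. The definitions are assumptions (coverage as the fraction of truly lossy paths that are inferred lossy, false positive ratio as the fraction of inferred-lossy paths that are not truly lossy), as are the function name and the sample values; the 0.05 threshold comes from the slides.

```python
def lossy_path_metrics(true_loss, inferred_loss, threshold=0.05):
    """Return (coverage, false_positive_ratio) for lossy-path classification."""
    truly_lossy = {i for i, p in enumerate(true_loss) if p >= threshold}
    inferred_lossy = {i for i, p in enumerate(inferred_loss) if p >= threshold}

    hits = truly_lossy & inferred_lossy
    # Conventions for empty sets (assumed): nothing to cover => full coverage,
    # nothing flagged => no false positives.
    coverage = len(hits) / len(truly_lossy) if truly_lossy else 1.0
    false_positive = (
        len(inferred_lossy - truly_lossy) / len(inferred_lossy)
        if inferred_lossy else 0.0
    )
    return coverage, false_positive

# Hypothetical values: the last path (0.04 real vs 0.051 inferred) is counted as a
# false positive even though the estimate is close, a threshold boundary effect.
coverage, false_positive = lossy_path_metrics(
    [0.00, 0.06, 0.20, 0.04],
    [0.00, 0.07, 0.01, 0.051],
)
```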

Performance Improvement with Overlay
With single-node relay
Loss rate improvement
  – Among 10,980 lossy paths:
  – 5,705 paths (52.0%) have loss rate reduced by 0.05 or more
  – 3,084 paths (28.1%) change from lossy to non-lossy
Throughput improvement
  – Estimated with [formula lost in extraction]
  – 60,320 paths (24%) with non-zero loss rate, throughput computable
  – Among them, 32,939 (54.6%) paths have throughput improved, 13,734 (22.8%) paths have throughput doubled or more
Implications: use overlay paths to bypass congestion or failures
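A single-node relay decision can be sketched as follows, assuming losses on the two relay segments are independent so that their delivery probabilities multiply. The function names, the dictionary layout, and the sample loss rates are hypothetical; the throughput estimate is left out because its formula is missing from the transcript.

```python
def relay_loss(p_src_relay, p_relay_dst):
    """Loss rate of a two-segment path, assuming independent segment losses."""
    return 1.0 - (1.0 - p_src_relay) * (1.0 - p_relay_dst)

def best_single_relay(loss, src, dst):
    """Return (relay or None, loss rate) for the best direct or one-relay path.

    `loss` maps directed (from, to) host pairs to loss rates.
    """
    nodes = {a for a, _ in loss} | {b for _, b in loss}
    best_via, best_loss = None, loss[(src, dst)]
    for relay in sorted(nodes - {src, dst}):
        if (src, relay) in loss and (relay, dst) in loss:
            candidate = relay_loss(loss[(src, relay)], loss[(relay, dst)])
            if candidate < best_loss:
                best_via, best_loss = relay, candidate
    return best_via, best_loss

# Usage with hypothetical loss rates: the direct path A->B is lossy, A->C->B is not.
loss = {("A", "B"): 0.30, ("A", "C"): 0.01, ("C", "B"): 0.02}
via, relayed_loss = best_single_relay(loss, "A", "B")
```

Here the relay path through C brings the loss rate from 0.30 down to about 0.03, the kind of lossy-to-non-lossy change the slide counts.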

Adaptive Overlay Streaming Media
Implemented with Winamp client and SHOUTcast server
Congestion introduced with a Packet Shaper
Skip-free playback: server buffering and rewinding
Total adaptation time < 4 seconds
[Figure labels: UC Berkeley; UC San Diego; Stanford; HP Labs; X]

Adaptive Streaming Media Architecture

Conclusions
A tomography-based overlay network monitoring system
  – Given n end hosts, characterize O(n^2) paths with a basis set of O(n log n) paths
  – Selectively monitor O(n log n) paths to compute the loss rates of the basis set, then infer the loss rates of all other paths
Both simulation and real Internet experiments are promising
Built an adaptive overlay streaming media system on top of the monitoring services
  – Bypass congestion/failures for smooth playback within seconds