1 Internet Networking and Application Troubleshooting Yao Zhao EECS Department Northwestern University.

Presentation transcript:

1 Internet Networking and Application Troubleshooting Yao Zhao EECS Department Northwestern University

2 Outline Motivation Dissertation Overview Application Layer Troubleshooting –Rake Network Layer Troubleshooting –VScope, Lend, FAD and SPA Conclusions and Future Work

3 Motivation “When something breaks in the Internet, the Internet's very decentralized structure makes it hard to figure out what went wrong and even harder to assign responsibility.” - “Looking Over the Fence at Networks: A Neighbor's View of Networking Research”, by Committees on Research Horizons in networking, National Research Council, 2001.

4 Troubleshooting Philosophy Entity Oriented Troubleshooting –Monitor each entity separately E.g. Router packet drop rates, queue size and other SNMP counters E.g. Machine CPU load, I/O intensity, network utilization and other performance counters –Potential problems Not all entities can be monitored Inferring entity performance from the counters may be challenging

5 Troubleshooting Philosophy Entity Oriented Troubleshooting Task Based Troubleshooting –Use task performance to infer entity performance E.g. Infer link-level loss rates from Internet path loss rates –Advantages Works with limited monitoring points (e.g. end hosts) Focuses on target performance directly

6 Thesis Statement We design troubleshooting systems that monitor and diagnose Internet distributed systems at both the network layer and the application layer, using the task-based troubleshooting philosophy.

7 Publications Conference Papers –Y. Zhao, Y. Chen, S. Ratnasamy, Load balanced and Efficient Hierarchical Data-Centric Storage in Sensor Networks, in the Proc. of SECON 2008 –Y. Gao, Y. Zhao, R. Schweller, S. Venkataraman, Y. Chen, D. Song, and M. Kao, Detecting Stealthy Spreaders Using Online Outdegree Histograms, in the Proc. of IWQoS, 2007 –Y. Zhao and Y. Chen, A Suite of Schemes for User-level Network Diagnosis without Infrastructure, in the Proc. of IEEE INFOCOM, 2007 –P. Narayana, R. Chen, Y. Zhao, Y. Chen, Z. Fu, and H. Zhou, Automatic Vulnerability Checking of IEEE WiMAX Protocols through TLA+, in Proc. of NPSec, 2006 –Y. Zhao, Y. Chen, and D. Bindel, Towards Unbiased End-to-End Network Diagnosis, in Proc. of ACM SIGCOMM 2006 –Y. Zhao, Q. Zhang, B. Li, Y. Chen and W. Zhu, Hop ID based Routing in Mobile Ad Hoc Networks, in Proceedings of ICNP, 2005

8 Publications Journal Papers –Y. Zhao and Y. Chen, FAD and SPA: A Suite of Schemes for User-level Network Diagnosis without Infrastructure, submitted to Computer Networks Journal; the last review feedback is minor revision –Y. Zhao, Y. Chen, and D. Bindel, Towards Unbiased End-to-End Network Diagnosis, submitted to IEEE/ACM Transactions on Networking; the last review feedback is minor revision –Y. Zhao, Y. Chen, B. Li, Q. Zhang, Hop ID: A Virtual Coordinate based Routing for Sparse Mobile Ad Hoc Networks, in IEEE Transactions on Mobile Computing, Vol. 6, No. 9, pp , September, 2007 Patents –E. C. Gillum, Q. Ke, Y. Xie, F. Yu and Y. Zhao, Graph Based Bot-User Detection, being filed through Microsoft Corporation, MS docket number –J. Wang, Y. Chen, D. Pei, Y. Zhao, and Z. Zhu, Towards Efficient Large-Scale Network Monitoring and Diagnosis Under Operational Constraints, being filed through AT&T, docket number

9 Outline Motivation Dissertation Overview Application Layer Troubleshooting –Rake Network Layer Troubleshooting –VScope, Lend, FAD and SPA Conclusions and Future Work

10 Internet Troubleshooting Diagnosis Model Data Link Network Transport Application Monitoring

11 Components in Network Troubleshooting Model –Defines the extrinsic observations and the intrinsic faults, as well as the relationship between them –Define and instantiate the model Monitoring –Collect the observations Diagnosis –Identify the faulty location and find the root cause

12 Thesis Research Topics Diagnosis Model Data Link Network Transport Application Monitoring Lend, FAD and SPA VScope Rake

13 Outline Motivation Dissertation Overview Application Layer Troubleshooting –Rake Network Layer Troubleshooting –VScope, Lend, FAD and SPA Conclusions and Future Work

14 Rake: Semantic Assisted Large Distributed System Diagnosis Motivation Related Work Rake Evaluation Conclusions

15 Motivation Large distributed systems involve hundreds or thousands of nodes –E.g. search system, CDN Host-based monitoring cannot infer the performance or detect bugs –Hard to translate OS-level info (such as CPU load) into application performance –Application log may not be enough Task-based approach is adopted in many diagnosis systems –WAP5, Magpie, Sherlock

16 Task-based Approaches The Critical Problem – Message Linking –Link the messages in a task together into a path or tree

17 Example of Message Linking in Search System [Figure: messages labeled URL, Search keyword, and Doc ID]

18 Task-based Approaches The Critical Problem – Message Linking –Link the messages in a task together into a path or tree Black-box approaches –Do not need to instrument the application or to understand its internal structure or semantics –Use time correlation to link messages Project 5, WAP5, Sherlock White-box approaches –Extract application-level data, which requires instrumenting the application and possibly understanding its source code –Insert a unique ID into the messages in a task X-Trace, Pinpoint

19 Problems of White-box and Black-box White-box –Invasive due to source code modification Black-box –Relies on time correlation –Accuracy affected by cross traffic

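20 Related Work Axes: Invasiveness (non-invasive to invasive: network sniffing, interposition, app or OS logs, source code modification) and Application Knowledge (black-box, grey-box, white-box) Black-box –Project 5, Sherlock (network sniffing) –WAP5 (interposition) –Footprint (app or OS logs) Grey-box –Rake (network sniffing) –Magpie (app or OS logs) White-box –X-Trace, Pinpoint (source code modification)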

21 Rake Key Observations –Generally there is no unique ID linking the messages associated with the same request –Polymorphic IDs exist in different stages of the request Semantic Assisted –Use the semantics of the system to identify polymorphic IDs and link messages

22 Message Linking Example [Figure: messages labeled URL, Search keyword, and Doc ID]

23 Questions on Semantics What Are the Necessary Semantics? –In the worst case, re-implement the application How Does Rake Use the Semantics? –The naïve design is to implement Rake for each application with specific application semantics How Efficient Is Rake with Semantics? –Can message linking be accurate? –What is the computational complexity of Rake?

24 Necessary Semantics Intra-node linking –The system semantics Inter-node linking –The protocol semantics [Figure: nodes P, Q, R, S]

25 Utilize Semantics in Rake Implementing a different Rake for each application is time consuming –Lesson learnt from implementing two versions of Rake, for CoralCDN and IRC Design Rake to take general semantics –A unified infrastructure –Provide a simple language for users to supply semantics

26 Example of Rake Language (IRC) Signature: TCP 6667 Link_ID: regular expression PRIVMSG\s+(.*) Follow_ID: same as Link_ID No Return ID [Figure: nodes P, Q, R, S; Link_ID, Follow_ID = Query_ID = Response_ID]
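As a rough illustration of how the regular expression on this slide could pull an ID out of an IRC message, here is a minimal Python sketch. The helper name `extract_privmsg_id` and the sample message are assumptions made for illustration only; Rake itself takes such rules through its own XML-based language, whose exact syntax is not preserved in this transcript.

```python
import re

# Pattern quoted on the slide; the capture group is treated as the message ID.
PRIVMSG_RE = re.compile(r"PRIVMSG\s+(.*)")

def extract_privmsg_id(payload):
    """Return the text captured after PRIVMSG, or None if the payload
    is not a PRIVMSG (hypothetical helper, not part of Rake)."""
    match = PRIVMSG_RE.search(payload)
    if match is None:
        return None
    # Strip the trailing CR that IRC lines normally carry.
    return match.group(1).strip()

if __name__ == "__main__":
    print(extract_privmsg_id(":alice!a@host PRIVMSG #chan :hello world\r\n"))
    print(extract_privmsg_id("PING :irc.example.org\r\n"))
```

Because the same captured text appears both in the message a server receives and in the copies it relays, using it as both Link_ID and Follow_ID is enough to chain the relayed messages together.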

27 Signature Signature to Classify Messages –Example: TCP 6667 Formats of Signatures –Socket information: protocol, port –Expression for TCP/IP header, e.g. udp[10] & 128 == 0 –Regular expression –User defined function

28 Link_ID and Follow_ID Follow_IDs –The IDs that will appear in the messages triggered by this message –One message may have multiple Follow_IDs if it triggers multiple messages Link_ID –The ID of the current message –Matched against Follow_IDs previously seen Linking of Link_ID and Follow_ID –Mainly for intra-node message linking
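The matching rule described on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions (exact string equality between IDs, no timeouts, one node); the class name `IntraNodeLinker` and the message names are hypothetical and do not come from Rake.

```python
from collections import defaultdict

class IntraNodeLinker:
    """Link messages seen on one node by matching each message's Link_ID
    against Follow_IDs announced by earlier messages (illustrative only)."""

    def __init__(self):
        self.pending = defaultdict(list)  # Follow_ID -> messages that announced it
        self.edges = []                   # (triggering message, triggered message)

    def observe(self, name, link_id=None, follow_ids=()):
        # First, try to attach this message to an earlier one.
        if link_id is not None:
            for parent in self.pending.get(link_id, []):
                self.edges.append((parent, name))
        # Then remember the IDs that messages triggered by this one will carry.
        for follow_id in follow_ids:
            self.pending[follow_id].append(name)

if __name__ == "__main__":
    linker = IntraNodeLinker()
    linker.observe("http_request", follow_ids=["key42"])
    linker.observe("lookup", link_id="key42", follow_ids=["key42"])
    linker.observe("unrelated", link_id="key99")
    print(linker.edges)
```

Checking the Link_ID before registering the Follow_IDs keeps a message from being linked to itself when both IDs are the same.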

29 Query_ID and Response_ID Query_IDs –For communication in query/response style, e.g. RPC calls and DNS query/response –The IDs that will appear in the response messages to this message Response_ID –The ID of the current message, matched against Query_IDs previously seen –By default requires the query and response to use the same socket Linking of Query_ID and Response_ID –Mainly for inter-node message linking
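A minimal Python sketch of query/response matching follows, assuming that a "socket" can be represented by any hashable value (here a tuple of endpoints) and that IDs are compared for exact equality. The class and method names are hypothetical, chosen only to mirror the slide's terminology.

```python
class QueryResponseLinker:
    """Pair a response with an earlier query when both used the same socket
    and the Response_ID equals the stored Query_ID (illustrative only)."""

    def __init__(self):
        self.open_queries = {}  # (socket, Query_ID) -> query message name
        self.pairs = []         # (query message, response message)

    def on_query(self, socket, query_id, name):
        self.open_queries[(socket, query_id)] = name

    def on_response(self, socket, response_id, name):
        # pop() so that one query is matched by at most one response.
        query = self.open_queries.pop((socket, response_id), None)
        if query is not None:
            self.pairs.append((query, name))
        return query

if __name__ == "__main__":
    sock = ("10.0.0.1", 40000, "10.0.0.2", 53)
    linker = QueryResponseLinker()
    linker.on_query(sock, "example.com", "q1")
    print(linker.on_response(sock, "example.com", "r1"))
    print(linker.on_response(sock, "example.com", "r2"))
```

Keying on the socket as well as the ID reflects the slide's default that the query and its response share a socket, which avoids confusing identical IDs on unrelated connections.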

30 Complicated Semantics The process of generating IDs may be complicated –XML or regular expressions are not good at complex computations –So let users provide their own functions Users provide shared/dynamic libraries Specify the functions for IDs in XML Implementation uses Libtool to load user defined functions at runtime

31 Example for DNS Signature: UDP 53, udp[10] & 128 == 0 User function: dns.so Link_ID: extract the queried host
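To make the DNS example concrete: in pcap-style filter notation, `udp[10]` is the byte at offset 10 from the start of the UDP header, which is the first flags byte of the DNS header (8 bytes of UDP header plus the 2-byte DNS ID). Its top bit is the QR flag, so `udp[10] & 128 == 0` selects queries. The Python sketch below checks that signature and extracts the queried host from the question section. It is only an illustration of what a user function such as `dns.so` might compute; all function names are hypothetical, and name compression is deliberately not handled.

```python
import struct

UDP_HEADER_LEN = 8    # bytes in a UDP header
DNS_HEADER_LEN = 12   # bytes in a DNS header

def build_dns_over_udp(host, flags=0x0100, src_port=40000, dst_port=53):
    """Build a UDP segment carrying a one-question DNS message (demo helper)."""
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in host.split(".")
    ) + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    dns = struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0) + question
    udp = struct.pack("!HHHH", src_port, dst_port, UDP_HEADER_LEN + len(dns), 0)
    return udp + dns

def matches_dns_query_signature(segment):
    """UDP port 53 and QR bit clear, i.e. udp[10] & 128 == 0."""
    if len(segment) < UDP_HEADER_LEN + DNS_HEADER_LEN:
        return False
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return 53 in (src_port, dst_port) and (segment[10] & 128) == 0

def extract_queried_host(segment):
    """Read the first question name as a dotted host string."""
    labels = []
    pos = UDP_HEADER_LEN + DNS_HEADER_LEN
    while pos < len(segment):
        length = segment[pos]
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(segment[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)

if __name__ == "__main__":
    query = build_dns_over_udp("www.example.com")
    print(matches_dns_query_signature(query), extract_queried_host(query))
```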

32 Accuracy Analysis One-to-one ID Transformation –Examples In search, URL -> Keywords -> Canonical format In CoralCDN, URL -> SHA-1 hash value –Ideally no error if requests are distinct Request ambiguity –Search keywords Microsoft search data Less than 1% of messages duplicated within 1 s –Web URL Two real HTTP traces Less than 1% of messages duplicated within 1 s –Chat messages No duplication with timestamps

33 Potential Applications Search –Verified by a Microsoft guy CDN –CoralCDN is studied and evaluated Chat System –IRC is tested Distributed File System –Hadoop DFS is tested

34 Evaluation Application –CoralCDN –Deployed on PlanetLab Experiment –Employ PlanetLab hosts as web clients –Retrieve URLs from real traces at different frequencies Metrics –Linking accuracy (false positives, false negatives) –Diagnosis ability Compared Approach –WAP5

35 CoralCDN Task Tree

36 Message Linking Accuracy Rake Linking Accuracy is 100% for CoralCDN –SHA-1 hash provides an almost one-to-one URL to HashID mapping –The cache mechanism If the same URL is received twice, the 2nd request is blocked until the first one has retrieved the webpage Use Rake Linking as Ground Truth to Evaluate WAP5

37 Message Linking Accuracy (1) The higher the request rate, the lower the accuracy of WAP5.

38 Message Linking Accuracy (1) The higher the request rate, the lower the accuracy of WAP5.

39 Diagnosis Ability Controlled Experiments –Inject junk CPU-intensive processes –Calculate the packet processing time using WAP5 and Rake Rake identifies the slow machine, while WAP5 fails to.

40 Discussion Implementation Experience –How hard is it for users to provide semantics? CoralCDN – 1 week source code study DNS – a couple of hours Hadoop DFS – 1 week source code study Inter-process Communication Encryption –Dynamic library interposition

41 Conclusions of Rake Feasibility –Rake works for many popular applications in different categories Ease of use –Rake allows users to write semantics via XML –Necessary semantics are easy to obtain, in our experience Accuracy –Much more accurate than black-box approaches and probably matches white-box approaches

42 Outline Motivation Dissertation Overview Application Layer Troubleshooting –Rake Network Layer Troubleshooting –VScope, Lend, FAD and SPA Conclusions and Future Work

43 Network Layer Troubleshooting LEND [Sigcomm06] –Tomography diagnosis with the fewest statistical assumptions FAD & SPA [Infocom07] –On-demand loss rate diagnosis without infrastructure VScope [Patent] –Experimental design for ISP VPN network monitoring and diagnosis

44 LEND Motivation –Use end-to-end measurement to infer link-level properties with the measurement infrastructure Problem Formulation –Given end-to-end measurements, what is the finest granularity of link properties that we can achieve under the basic assumptions? [Figure labels: Basic assumptions; More and stronger statistical assumptions; Virtual link; Diagnosis granularity?; Better accuracy]

45 LEND Contributions –Define the minimal identifiable unit under basic assumptions (MILS) –Prove that only E2E paths are MILS with a directed graph topology (e.g., the Internet) –Propose the good path algorithm (incorporating measurement path properties) for finer MILS [Figure labels: Basic assumptions; More and stronger statistical assumptions; Virtual link; Diagnosis granularity?; Better accuracy]

46 FAD & SPA Motivation –How do end users, with no special privileges, identify packet loss inside the network with one or two computers? Conclusions –We proposed three user-level loss rate diagnosis approaches –The combo of our approaches and Tulip [SOSP03] is much better than any single approach

47 VScope Challenges in ISP Network Monitoring and Diagnosis –Operational constraints on monitors and links A monitor can measure only a certain number of paths at a time Measurement traffic through a link cannot exceed a threshold Path and monitor selection constraints –Real-time diagnosis Conclusions –Propose a multi-round monitoring scheme for the monitor selection problem under operational constraints –Propose a continuous monitoring and diagnosis algorithm to quickly diagnose faulty links

48 Outline Motivation Dissertation Overview Application Layer Troubleshooting –Rake Network Layer Troubleshooting –VScope, Lend, FAD and SPA Conclusions and Future Work

49 Conclusions and Future Work Demonstrate Task-based Troubleshooting Is Promising –Network layer troubleshooting VScope, LEND, FAD and SPA –Application layer troubleshooting Rake Future Work –Make Rake a tool ready to publish –Extend Rake in diagnosis Timeline for Thesis Writing –From present to Feb. 1

50 Q & A? Thanks!


52 Backup

53 Monitor Setup Phase Single-round Monitoring –Measure all the target paths simultaneously –The basic approach, adopted by most monitoring experimental design papers Multi-round Monitoring –Measure the target paths in different time periods (rounds) Tradeoff between time and link/node constraints –Multi-round monitoring is necessary and efficient for two reasons Existence of operational constraints Star-like topology

54 Single-Round Monitor Selection Pure Greedy Algorithm –Select monitors one by one, each time selecting the monitor that can measure the most uncovered links under the constraints Calculating the gain of adding a new monitor is a variant of the Maximum k-Coverage problem –Simple and locally optimal Greedy Assisted Linear Programming based algorithm
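The pure greedy step can be sketched as follows. This is a simplified illustration, not the VScope algorithm itself: it assumes the set of links each candidate monitor can cover under the operational constraints has already been computed (the `coverage` mapping below), which sidesteps the Maximum k-Coverage subproblem mentioned on the slide. All names are hypothetical.

```python
def greedy_monitor_selection(coverage, target_links):
    """Pick monitors one at a time, always taking the one that covers the
    most still-uncovered links. Returns (chosen monitors, links left uncovered)."""
    uncovered = set(target_links)
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(coverage), key=lambda m: len(coverage[m] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # no remaining candidate can cover anything new
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

if __name__ == "__main__":
    coverage = {
        "m1": {"a", "b", "c"},
        "m2": {"c", "d"},
        "m3": {"d", "e"},
        "m4": {"a"},
    }
    print(greedy_monitor_selection(coverage, {"a", "b", "c", "d", "e"}))
```

Each choice is the best available at that step, which is what "locally optimal" means here; the overall monitor set is not guaranteed to be the smallest possible.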

55 Greedy Assisted Linear Programming based algorithm Formulate Integer Linear Programming First –ILP is an NP-hard problem Relaxation to Linear Programming –Change every {0,1}-variable to a continuous variable between 0 and 1 Random Rounding –Solve the linear program in polynomial time –Round the solutions within [0, 1] back to {0,1}-integers with certain probabilities
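The rounding step alone can be sketched in Python as below. The sketch assumes the LP relaxation has already been solved elsewhere and that each fractional value is used directly as the probability of rounding to 1, which is one common choice; the slide does not say which probabilities the authors used. The function name and the monitor names are hypothetical.

```python
import random

def randomized_round(fractional, rng=None):
    """Round each LP value x in [0, 1] to 1 with probability x, else to 0."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so x = 0 never rounds up and x = 1 always does.
    return {name: 1 if rng.random() < value else 0
            for name, value in fractional.items()}

if __name__ == "__main__":
    lp_solution = {"m1": 1.0, "m2": 0.3, "m3": 0.0}
    print(randomized_round(lp_solution, random.Random(7)))
```

A rounded solution may violate coverage or operational constraints, so in practice the rounding is typically repeated or repaired until a feasible selection is found.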

56 Multi-round Monitor Selection Star-like Topology and Operational Constraints Make Single-round Monitor Selection Inefficient –Multi-round monitoring vs. reducing measurement frequency Algorithms for Multi-round Monitor Selection –Multiply the constraints by the number of rounds and run single-round monitor selection –Schedule the paths to measure in different rounds Greedy scheduling Random scheduling Linear programming based scheduling

57 Path Measurement Scheduling Greedy algorithm –Minimize link utilization in every step Random algorithm –Randomly schedule paths independently –Run random algorithm multiple times to get the best one Linear Programming based algorithm with random rounding
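One way to read the greedy option is sketched below: paths are placed one at a time into the round where the busiest link on that path ends up least loaded. This is an interpretation for illustration, assuming each path is a non-empty list of link names and every path adds one unit of load per link; the slide does not spell out the exact utilization metric, and the function name is hypothetical.

```python
def greedy_schedule(paths, num_rounds):
    """Assign each path to the round that minimizes the resulting peak load
    on the path's own links. Returns one list of path names per round."""
    loads = [dict() for _ in range(num_rounds)]   # per round: link -> load
    schedule = [[] for _ in range(num_rounds)]
    for name, links in paths.items():
        def peak_after(r):
            # Peak load on this path's links if it were added to round r.
            return max(loads[r].get(link, 0) + 1 for link in links)
        best = min(range(num_rounds), key=peak_after)  # ties go to the earlier round
        schedule[best].append(name)
        for link in links:
            loads[best][link] = loads[best].get(link, 0) + 1
    return schedule

if __name__ == "__main__":
    paths = {"p1": ["a"], "p2": ["a"], "p3": ["b"]}
    print(greedy_schedule(paths, 2))
```

The random alternative on the slide would instead draw a round for each path independently, repeat that several times, and keep the assignment with the lowest peak load.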

58 Monitoring and Diagnosis Path Monitoring and Faulty Path Discovery Faulty Link Diagnosis –Select and measure paths that favor the diagnosis of the potentially faulty links

59 Background and Related Work Network Layer Diagnosis –Linear algebraic model –Monitoring experimental design –Diagnosis algorithms Application Layer Diagnosis –Sherlock: enterprise network service diagnosis

60 Linear Algebraic Model Path loss rate p_i, link loss rate l_j [Figure: example topology with nodes A, B, C, D, paths p_1 and p_2, and path matrix G] Usually an underconstrained system
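The formula relating p_i and l_j is not preserved in this transcript. The form below is the one commonly used in loss tomography and is given here only as a likely reading, under the assumption that losses on different links are independent; G is taken to be the 0/1 path matrix with G_{ij} = 1 when path i traverses link j, and r, s denote the numbers of paths and links.

```latex
% Path i succeeds only if every link on it succeeds (independent losses assumed):
1 - p_i \;=\; \prod_{j=1}^{s} (1 - l_j)^{G_{ij}}, \qquad i = 1, \dots, r

% Taking logarithms turns the product into a linear system:
\sum_{j=1}^{s} G_{ij}\, x_j \;=\; b_i,
\qquad x_j = \log(1 - l_j), \quad b_i = \log(1 - p_i)

% In matrix form, with G \in \{0,1\}^{r \times s}:
G\,x = b
```

The system is underconstrained whenever rank(G) < s, which is why individual link loss rates generally cannot be recovered from path measurements alone.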

61 Monitoring Experimental Design Monitor Placement Problem –Select the fewest monitors that can measure a set of paths covering all the links [Infocom03] Path Selection Problem –Selection of the basis of the path matrix [Sigcomm04] –SVD based path selection [Infocom05] –Bayesian experimental design [Sigmetrics06]

62 Network Layer Diagnosis Internet Tomography –Temporal correlation based algorithms Unbiased if multicast is supported –Statistical algorithms Introduce additional statistical assumptions or optimization goals

63 Publications Papers –Y. Zhao, Y. Chen, S. Ratnasamy, Load balanced and Efficient Hierarchical Data-Centric Storage in Sensor Networks, in the Proc. of SECON 2008 –Y. Gao, Y. Zhao, R. Schweller, S. Venkataraman, Y. Chen, D. Song, and M. Kao, Detecting Stealthy Spreaders Using Online Outdegree Histograms, in the Proc. of IWQoS, 2007 –Y. Zhao and Y. Chen, A Suite of Schemes for User-level Network Diagnosis without Infrastructure, in the Proc. of IEEE INFOCOM, 2007 –P. Narayana, R. Chen, Y. Zhao, Y. Chen, Z. Fu, and H. Zhou, Automatic Vulnerability Checking of IEEE WiMAX Protocols through TLA+, in Proc. of NPSec, 2006 –Y. Zhao, Y. Chen, and D. Bindel, Towards Unbiased End-to-End Network Diagnosis, in Proc. of ACM SIGCOMM 2006 –Y. Zhao, Q. Zhang, B. Li, Y. Chen and W. Zhu, Hop ID based Routing in Mobile Ad Hoc Networks, in Proceedings of ICNP, 2005 Patents –E. C. Gillum, Q. Ke, Y. Xie, F. Yu and Y. Zhao, Graph Based Bot-User Detection, being filed through Microsoft Corporation, MS docket number –J. Wang, Y. Chen, D. Pei, Y. Zhao, and Z. Zhu, Towards Efficient Large-Scale Network Monitoring and Diagnosis Under Operational Constraints, being filed through AT&T, docket number

64 Problem Definition (2) Monitor Setup Phase –From the monitor candidates, select a minimal number of monitors that, in the measurement phase, can measure a path set covering all links in the network under the given measurement constraints –NP-hard even without considering constraints Monitoring and Fault Diagnosis Phase –When faulty paths are discovered in the path monitoring phase, how can we quickly select paths, under the operational constraints, to be further measured so that the faulty link(s) can be accurately identified?

65 VScope Motivation Two Important Services Provided by ISP –Internet access service –VPN service Monitoring and Diagnosis on ISP Networks –Ensure Service Level Agreement (SLA) –Help Network Operations