©NEC Laboratories America 1 Huadong Liu (U. of Tennessee) Hui Zhang, Rauf Izmailov, Guofei Jiang, Xiaoqiao Meng (NEC Labs America) Presented by: Hui Zhang.


1 Real-time Application Monitoring and Diagnosis for Service Hosting Platforms of Black Boxes
Huadong Liu (U. of Tennessee); Hui Zhang, Rauf Izmailov, Guofei Jiang, Xiaoqiao Meng (NEC Labs America)
Presented by: Hui Zhang

2 Outline
- Motivation
- SRAMD architecture
- Application component dependency discovery
- Evaluation
- Conclusions

3 Motivation
- Service hosting systems
  - Web farms, service-oriented utility computing networks, Peer-to-Peer service composition based computing grids, …
- Service management
  - Fault diagnosis, capacity planning, performance analysis, impact analysis, etc.
- Challenges
  - Application components are usually delivered as black boxes without sufficient instrumentation
  - The huge amount of logging information in large-scale systems makes real-time monitoring and debugging unrealistic with a centralized approach
[Figure labels: App. 1, App. 2, App. 3, App. 4]

4 An intuition of the SRAMD
Art source: www.pictureMOSAICs.com

5 Scalable Real-time Application Monitoring and Diagnosis
- SRAMD: an extensible tool that is
  - easy to deploy,
  - scalable, and
  - able to effectively profile the intricate dependency relationships among interacting application components seen as black boxes.
- Our approach
  - uses low-level packet traces instead of high-level event traces to get insight into application components
  - has end-system instrumentation for close observation of the correlation between application performance and local resource utilization, and for enabling a rich set of queries for diagnosis
  - understands the overall system/application behavior and performance by aggregating and correlating summarizations from distributed components

6 SRAMD in Operation
- An application-level passive resource monitor with active summarization
- An extensible framework for application topology discovery, capacity planning and performance debugging
[Figure labels: application X, application Y, application Z, hosting server]

7 The SRAMD Controller
- Collector
  - passively collects summarization data from distributed monitors through UDP.
- Aggregator
  - retrieves and validates the information blocks available in the repository, and organizes them into per-application groups.
- Diagnosis
  - generates probing requests to related monitors, with operator interaction, to get detailed information about application components and to isolate possible bottlenecks for performance debugging.
- Visualizer
  - constructs in-memory DOT files [DOT] using outputs from the aggregator and calls Grappa [Grappa] to visualize application topologies enriched with component traffic statistics and causal probabilities.
[Figure labels: Collector, Aggregator, Visualizer, Diagnosis]
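The Visualizer step can be illustrated with a minimal sketch that turns aggregated edges into DOT text. The function name `topology_to_dot`, the edge tuple layout, and the label format are assumptions made for illustration; the slides do not describe SRAMD's actual data structures or how it invokes Grappa.

```python
def topology_to_dot(edges, graph_name="app_topology"):
    """Render (src, dst, requests, probability) edges as a DOT digraph.

    The tuple layout is hypothetical: `requests` stands for a traffic
    statistic and `probability` for a causal probability on that edge.
    """
    lines = [f"digraph {graph_name} {{"]
    for src, dst, requests, probability in edges:
        label = f"n={requests}, p={probability:.2f}"
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative values only, not measurements from the presentation.
edges = [("W1", "A1", 120, 1.0), ("A1", "D1", 90, 0.75)]
dot_text = topology_to_dot(edges)
print(dot_text)
```

The resulting text can be handed to any DOT-aware viewer; keeping it in memory rather than writing a file mirrors the "in-memory DOT files" wording on the slide.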

8 The SRAMD Controller snapshot

9 The SRAMD Monitor
- Periodically probes for CPU, memory and disk usage of every registered application component.
- Passively captures network traffic and associates captured packets with registered application components.
- Actively calculates useful local application statistics and dependencies from packet traces.
- Temporarily performs diagnosis tasks on demand to assist performance diagnosis and debugging.
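A minimal sketch of the packet-to-component association step, assuming each registered component is identified by a local port and each captured packet is reduced to (source port, destination port, size). The registry layout, the names `associate_packets`, `web` and `db`, and the port numbers are hypothetical; the real monitor's matching logic is not shown in the slides.

```python
from collections import defaultdict


def associate_packets(registry, packets):
    """Attribute packets to registered components by local port.

    registry: {local_port: component_id} (hypothetical layout)
    packets:  iterable of (src_port, dst_port, size_in_bytes)
    Returns {component_id: {"packets": count, "bytes": total}}.
    """
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_port, dst_port, size in packets:
        # A packet belongs to a component if either endpoint port is registered.
        for port in {src_port, dst_port}:
            component = registry.get(port)
            if component is not None:
                stats[component]["packets"] += 1
                stats[component]["bytes"] += size
    return dict(stats)


registry = {8080: "web", 1527: "db"}
packets = [
    (51000, 8080, 1500),  # request to the web component
    (8080, 51000, 400),   # response from the web component
    (40000, 1527, 200),   # request to the db component
    (40001, 9999, 64),    # unregistered port, ignored
]
summary = associate_packets(registry, packets)
print(summary)
```

Packets that match no registered port are dropped from the summary, which keeps the per-component statistics small enough to report periodically.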

10 Application component dependency discovery
- Given two application components A and B in the system, we want to discover the following real-time dependency relationships between A and B during a time interval:
  - Are the input requests of one component caused by the other (directly or indirectly)? If so, in what percentage?
[Figure: time line of requests r1, r2, r3 among components A, B, C, D, E]

11 Dealing with transient connections
- Local Dependency Discovery (LDD)
  - Find the IDs of the peer application components that local ones talked to in the last report interval. Every SRAMD monitor sends a list of (LocalPort, AppCompID) pairs to the monitor at every hosting server on which the communicating application components are running.
  - Count the number of requests (including nested requests) between application components and calculate the probability of their causal dependency.
  - Although requests may appear to be nested by accident, if the same nesting relationship appears with high probability, the nesting very likely represents a causal dependency between application components.
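The nesting count can be sketched as follows. This is one plausible reading of LDD, not the authors' stated algorithm: it assumes each request is recorded as a (peer, start, end) interval at the local component, treats an outgoing request as nested when its interval lies inside an incoming one, and defines the causal probability as the fraction of incoming requests from a peer that contain at least one nested request to a given downstream peer. The function name and the sample timings are hypothetical.

```python
from collections import defaultdict


def causal_probabilities(incoming, outgoing):
    """Estimate causal probabilities from request nesting.

    incoming, outgoing: lists of (peer_id, start_time, end_time).
    Returns {(in_peer, out_peer): probability}.
    """
    totals = defaultdict(int)  # incoming requests per upstream peer
    nested = defaultdict(int)  # incoming requests containing a nested call
    for in_peer, in_start, in_end in incoming:
        totals[in_peer] += 1
        downstream = set()
        for out_peer, out_start, out_end in outgoing:
            # Nested: the outgoing request starts and ends inside the incoming one.
            if in_start <= out_start and out_end <= in_end:
                downstream.add(out_peer)
        for out_peer in downstream:
            nested[(in_peer, out_peer)] += 1
    return {pair: count / totals[pair[0]] for pair, count in nested.items()}


# Illustrative trace as seen at one application server.
incoming = [("W1", 0, 10), ("W1", 20, 30), ("W1", 40, 50), ("W1", 60, 70)]
outgoing = [("D1", 2, 8), ("D1", 22, 25), ("D1", 41, 45), ("D2", 61, 65)]
probabilities = causal_probabilities(incoming, outgoing)
print(probabilities)
```

With more concurrent clients, unrelated requests overlap more often, so accidental nesting raises the estimates for non-causal pairs; a threshold on the probability is one way to separate the two cases.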

12 Dealing with persistent connections and connectionless communications
- Traffic Regulation based Component Dependency Discovery (TRCDD)
  - Divert socket based traffic regulation. Under investigation.
[Figure labels: A->B, B->C]

13 Evaluation: SRAMD overhead (1)
- Experiment setup
[Figure labels: Sender (thrulay), Receiver (thrulayd), SRAM Controller, SRAM Monitor, UDP packets over gigabit Ethernet, Intel 2.8GHz SMP]

14 Evaluation: SRAMD overhead (2)
- CPU overhead of the SRAMD monitor with bulk UDP traffic using different packet sending rates and packet sizes

15 Evaluation: SRAMD overhead (3)
- CPU overhead of packet-application matching and sniffing
  - Data rate 100 Mb/s and packet size 1500 bytes.

Association Probability | 1/250k | 0.01 | 0.02 | 0.03 | 0.04
CPU Overhead (%)        | 4.22   | 5.53 | 6.15 | 7.12 | 7.86

16 Evaluation: LDD algorithm (1)
- Experiment setup
[Figure: logic view and physical view. Labels: Clients C (httperf); Web Servers W1, W2 (tinyproxy); Application Servers I 1, I 2 (A1, A2); Database Servers D1, D2 (derby)]

17 Evaluation: LDD algorithm (2)
- Causal probability as observed on application server A1 with different numbers of concurrent clients

18 Conclusions and Future Work
- An unobtrusive application-level monitoring and diagnosis tool that does not make any assumptions about the traced applications.
- Two schemes to infer dependency relationships of application components in different scenarios.
- An initial assessment of the quality and overhead of application-level packet tracing and an evaluation of the statistical dependency discovery scheme.
- Possible extensions
  - A kernel module to obtain per-application disk read/write statistics
  - Application of data mining techniques to packet traces

19 Thanks! Questions?

20 Backup slides

21 Calculate Response Time from Traces

