Download presentation
Presentation is loading. Please wait.
Published byRalf Hampton Modified over 9 years ago
1
©NEC Laboratories America 1 Huadong Liu (U. of Tennessee) Hui Zhang, Rauf Izmailov, Guofei Jiang, Xiaoqiao Meng (NEC Labs America) Presented by: Hui Zhang Real-time Application Monitoring and Diagnosis for Service Hosting Platforms of Black Boxes
2
©NEC Laboratories America 2 outline Motivation SRAMD architecture Application component dependency discovery Evaluation Conclusions
3
©NEC Laboratories America 3 Motivation Service hosting systems Web farms, service-oriented utility computing networks, Peer-to-Peer service composition based computing grids, … Service management Fault diagnosis, capacity planning, performance analysis, impact analysis, etc. Challenges Application components are usually delivered as black-boxes w/o sufficient instrumentation The huge amount of logging information in large-scale systems makes real-time monitoring and debugging unrealistic with a centralized approach App. 1 App. 3 App. 2 App. 4
4
©NEC Laboratories America 4 An intuition of the SRAMD Art Source: www.pictureMOSAICs.com
5
©NEC Laboratories America 5 Scalable Real-time Application Monitoring and Diagnosis SRAMD: an extensible tool that is easy to deploy scalable, and able to effectively profile the intricate dependency relationships among interacting application components seen as black boxes. Our approach uses low level packet traces instead of high level event traces to get insight into application components Has end-system instrumentation for close observation on the correlation between application performance and local resource utilization, and for enabling a rich set of queries for diagnosis understands the overall system/application behavior and performance by aggregating and correlating summarizations from distributed components
6
©NEC Laboratories America 6 SRAMD in Operation application X application Y application Z hosting server An application level passive resource monitor with active summarization An extensible framework for application topology discovery, capacity planning and performance debugging
7
©NEC Laboratories America 7 The SRAMD Controller Aggregator retrieves, validates information blocks available in the repository, and organizes them into per-application groups. Collector Aggregator Visualizer Diagnosis Diagnosis generates probing requests to related monitors with operator interaction to get detailed information about application components and to isolate possible bottlenecks for performance debugging. Visualizer constructs in-memory DOT files [DOT] using outputs from the aggregator and calls the Grappa [Grappa] to visualize application topologies enriched with component traffic statistics and causal probabilities. collector passively collects summarization data from distributed monitors through UDP.
8
©NEC Laboratories America 8 The SRAMD Controller snapshot
9
©NEC Laboratories America 9 The SRAMD Monitor Periodically probe for CPU, memory and disk usage of every registered application component. Passively capture network traffic and associate captured packets to registered application components, Actively calculate useful local application statistics and dependencies from packet traces Temporarily perform diagnosis tasks on-demand to assist performance diagnosis and debugging.
10
©NEC Laboratories America 10 Application component dependency discovery Given two application components A and B in the system, we want to discover the following real-time dependency relationships between A and B during a time interval: are the input requests of one components caused by another one (directly or indirectly)? and in what percentage if yes? ABCDE time line C B DE A C B DE A C B DE r1r1 r2r2 r3r3 A
11
©NEC Laboratories America 11 Dealing with transient connections Local Dependency Discovery (LDD) Find IDs of peer application components that local ones talked to in the last report interval. Every SRAMD monitor sends a list of (LocalPort, AppCompID) to the monitor at every hosting server that the communicating application components are running on. Count the number of requests (including nesting requests) between application components and calculate the probability of their causal dependency. Although requests appear to be nested by accident, if the same nesting relationship appears with a high probability, it is highly possible that the nesting represents a causal dependency of application components.
12
©NEC Laboratories America 12 Dealing with persistent connections and connectionless communications Traffic Regulation based Component Dependency Discovery (TRCDD) Divert socket based traffic regulation. Under investigation. A->B B->C
13
©NEC Laboratories America 13 Evaluation: SRAMD overhead (1) Experiment setup Receiver thrulayd Sender thrulay SRAM Controller UDP Packets over giga ethernet Intel 2.8GHz SMP SRAM Monitor
14
©NEC Laboratories America 14 Evaluation: SRAMD overhead (2) CPU overhead of the SRAMD monitor with bulk UDP traffic using different packet sending rates and packet sizes
15
©NEC Laboratories America 15 CPU overhead of packet-application matching and sniffing data rate 100Mb/s and packet size 1500 Bytes. Evaluation: SRAMD overhead (3) Association Probability 1/250k0.010.020.030.04 CPU Overhead %4.225.536.157.127.86
16
©NEC Laboratories America 16 Evaluation: LDD algorithm (1) Experiment setup Clients ( httperf ) Web Server ( tinyproxy ) Web Server ( tinyproxy ) Database Server ( derby ) Database Server ( derby ) Application Server I 1 Application Server I 2 W1W1 W2W2 A1A1 A2A2 D1D1 D2D2 C Application Server I 1 Application Server I 2 Logic view physical view
17
©NEC Laboratories America 17 Evaluation: LDD algorithm (2) Causal probability as observed on application server A1 with different number of concurrent clients
18
©NEC Laboratories America 18 Conclusions and Future Work An unobtrusive application-level monitoring and diagnosis tool that does not make any assumptions about the traced applications. Two schemes to infer dependency relationships of application components in different scenarios. An initial assessment of the quality and overhead of application-level packet tracing and an evaluation of the statistical dependency discovery scheme. Possible extensions A kernel module to obtain per-application disk read / write statistics Application of data mining techniques to packet traces
19
©NEC Laboratories America 19 Thanks! Questions?
20
©NEC Laboratories America 20 Backup slides
21
©NEC Laboratories America 21 Calculate Response Time from Traces
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.