Systems Support for End-to-End Performance Management Sandip Agarwala PhD Advisor: Karsten Schwan College of Computing Georgia Tech.


1 Systems Support for End-to-End Performance Management
Sandip Agarwala
PhD Advisor: Karsten Schwan
College of Computing, Georgia Tech

2 Complexity, complexity, complexity…
Source: Gartner (December 2005)

3 Reasons for Complexity
Application diversity
Interdependencies
Heterogeneous components
–Too many different technologies and platforms
Too few “hints” from the system to the administrators
–Legacy issues; application-specific solutions
Insufficient information about the system to drive self-management → Lack of Automation

4 Online System Management
[Diagram labels: Control, Execute, Monitor, Analyze]
Workload Scheduling
Capacity and SLA management
Design evaluation and tuning
Bottleneck detection
Resource provisioning, accounting, etc.
Proposed Approach: Service Path

5 Service Path
[Diagram: Internet, Proxy Server, Front-end Web Servers, Middle-tier Servlet Server, Application Logic (EJBs, etc.), Back-end Database]
System abstractions that describe the dynamic dependencies between the different distributed application components
Service Class: Application-level request class, e.g. SLA class

6 Service Path Characteristics
End-to-End analysis
Online
Non-intrusive
Application-generic

7 Outline
Background
Motivation
Service path
–Discovery with E2EProf
–Refinement with SysProf
–Automated SLA Enforcement
Related Work
Future Plans

8 E2EProf
Black-box approach
Correlate per-edge time series signals
Monitor network packet traces (source, destination, timestamps)
Model traces as per-edge time series signals or density functions
[Diagram: nodes A, X, B, C, D; time series D1 and D2 for edges (A → B) and (B → C), each plotted against time]

9 Basic Approach
Compute cross-correlation (D1, D2)
(A → B) with (B → C): Spike → Causality; Spike’s position → Delay (delay at B)
(A → B) with (B → D): No spike
[Diagram: nodes A, X, B, C, D]
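
The cross-correlation step on this slide can be illustrated with a minimal sketch. It is not the E2EProf implementation: the function names, the fixed-width binning, and the `spike_factor` threshold used to decide whether a spike exists are all assumptions made for illustration. Packet timestamps on two edges are binned into time series, correlated over a range of lags, and the position of a dominant spike is read as the delay.

```python
def to_time_series(timestamps, bin_width, num_bins):
    """Count packets per time bin (a simple per-edge density function)."""
    series = [0] * num_bins
    for t in timestamps:
        idx = int(t / bin_width)
        if 0 <= idx < num_bins:
            series[idx] += 1
    return series


def cross_correlation(x, y, max_lag):
    """Return c[d] = sum over t of x[t] * y[t + d], for d = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * y[t + d] for t in range(n - d))
            for d in range(max_lag + 1)]


def estimate_delay(x, y, max_lag, spike_factor=3.0):
    """Return the lag of the dominant spike, or None if no lag stands out.

    spike_factor is an assumed heuristic: the peak must exceed the mean
    correlation value by this factor to count as a spike.
    """
    c = cross_correlation(x, y, max_lag)
    mean = sum(c) / len(c)
    peak_lag = max(range(len(c)), key=lambda d: c[d])
    if mean == 0 or c[peak_lag] < spike_factor * mean:
        return None
    return peak_lag


# Hypothetical traces: every packet on (A -> B) is followed 3 time units
# later by a packet on (B -> C).
ab = to_time_series([1, 7, 12, 20, 31], bin_width=1.0, num_bins=40)
bc = to_time_series([4, 10, 15, 23, 34], bin_width=1.0, num_bins=40)
print(estimate_delay(ab, bc, max_lag=10))  # prints 3
```

A pair of edges with no causal relation would be expected to produce a flat correlation series, for which this sketch returns None.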

10 Evaluation with 4-tier RUBiS (http://rubis.objectweb.org/)
[Diagram: Clients, Apache Web Server, Tomcat Server 1, Tomcat Server 2, EJB Server 1, EJB Server 2, MySQL Server; request types: comment, bidding; labels: CPU bound, I/O bound]

11 Service Path Detection in RUBiS
[Figure panels: Static server assignment; Round-robin load balancer. Annotations: Highest delay node; Highest delay nodes]

12 Change detection in RUBiS
[Figure annotation: Injected Delay]

13 Delta Air Lines’ Application
Revenue Pipeline
Total Traffic: 1.34 million / day (56k / hour)
[Diagram labels: TACS IN & TACS OUT; XIN & XOUT; APEX IN & APEX OUT; Error/Warning (Tivoli) Logs]

14 Delta Air Lines’ Application
[Figure: Latency (sec) vs. Time of the day; annotation: Huge request burst]
[Diagram: Client requests, TACS, servers S1, S2, S3, S7, S8]

15 Outline
Background
Motivation
Service path
–Discovery with E2EProf
–Refinement with SysProf
–Automated SLA Enforcement
Related Work
Future Plans

16 Beyond dependency and latency…
[Diagram: nodes C1, C2, S1, S2, S3, S4, S5, S6]
Solution: Zoom into the service path with SysProf
No application hints or instrumentation
Monitor resource usage on a per-class basis

17 SysProf Methodology
[Diagram: applications A1, A2, AN in user space; kernel components: Scheduler, System Call, Network Stack, FS/VM/etc., eth driver, BDD; paths: From client, To client]
Instrumentation points: Init CID; Context Switches; Net softirq; system call parameters, PID, App functions; Disk I/O
Track request context
–Work done for processing a request class
–May span user-level or kernel-level
–Executes in more than one context (e.g. processes, threads, softirqs)
–Happens in a system-visible event (e.g. system calls)

18 Class ID Propagation
[Diagram: Front-Tier, Middle-Tier, End-Tier, each split into User and Kernel; labels: From client, To client, Init CID, Packet → CID, Process → CID, Msg → CID, Inherits CID]
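
The propagation idea on this slide can be sketched as a small user-level simulation. This is only an illustration under stated assumptions, not SysProf's kernel mechanism: the `Process` and `Tracker` classes, the `classify` callback, the port-based class rule, and the class names "gold" and "basic" are all hypothetical. A class ID (CID) is assigned when a client packet arrives at the first tier, attached to outgoing messages, inherited by the receiving process, and used as the key for resource accounting.

```python
from collections import defaultdict


class Process:
    """A process (or thread) whose current work belongs to one request class."""

    def __init__(self, name):
        self.name = name
        self.cid = None  # class ID of the request being served, if any


class Tracker:
    """Hypothetical stand-in for the kernel-level bookkeeping."""

    def __init__(self, classify):
        self.classify = classify        # maps a client packet to a CID
        self.cpu_us = defaultdict(int)  # CPU microseconds charged per CID

    def from_client(self, process, packet):
        """Init CID: the first tier classifies the incoming client packet."""
        process.cid = self.classify(packet)

    def send(self, sender, payload):
        """Msg -> CID: an outgoing message carries the sender's CID."""
        return {"cid": sender.cid, "payload": payload}

    def receive(self, receiver, message):
        """Inherits CID: the receiving process takes over the message's CID."""
        receiver.cid = message["cid"]

    def charge_cpu(self, process, microseconds):
        """Account CPU time to the class the process is currently serving."""
        self.cpu_us[process.cid] += microseconds


# Assumed classification rule: requests on port 8443 belong to class "gold".
tracker = Tracker(classify=lambda p: "gold" if p["port"] == 8443 else "basic")
web, db = Process("web"), Process("db")

tracker.from_client(web, {"port": 8443})  # front tier: Init CID
tracker.charge_cpu(web, 200)
msg = tracker.send(web, "query")          # message tagged with the CID
tracker.receive(db, msg)                  # next tier inherits the CID
tracker.charge_cpu(db, 500)
print(dict(tracker.cpu_us))  # {'gold': 700}
```

Because the CID travels with the message, the work done in the later tier is charged to the same class without any hint from the application.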

19 Application of SysProf
Resource Accounting
Utility Billing
Bottleneck detection
Capacity Estimation
Root-Cause Analysis
Black-Box SLA management

20 Resource-Aware Adaptive Control
[Diagram: Class 1, Class 2, Class 3; Front-end Controller + Scheduler; Tomcat Server 1, Tomcat Server 2, EJB Server 1, EJB Server 2, MySQL Server]
Cluster workloads contending for the same resources
Separate Queue/Controller for each cluster

21 Resource-Aware Adaptive Control
Capacity = 80 req/s per server
[Figure panels: With SysProf; No SysProf]

22 Summary
Service Path
–System abstractions to represent dependencies and request path
E2EProf and Pathmap
–Dependency and latency analysis
SysProf
–Service-based resource analysis
Aid human operator and automate end-to-end performance management

23 Thank You! Questions? Email: sandip@cc.gatech.edu

24 Extra Slides

25 Pathmap Optimizations
[Diagram stages: Packet timestamp trace; Time-series signal or Density Function; Cross-correlation series; time axis with window W]
Labels: Bursty traffic; Sliding window (W); Run-length compression; Upper-bound on latency
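
Two of the optimizations named on this slide, run-length compression and an upper bound on latency, can be sketched as follows. This is an assumed illustration rather than the Pathmap code: the function names and the `[value, run_length]` encoding are hypothetical, and the sliding window is not shown. The idea is that bursty traffic yields series that are mostly zeros, so long zero runs can be stored and skipped cheaply, and that bounding the latency limits how many lags need to be evaluated.

```python
def run_length_encode(series):
    """Compress a series into [value, run_length] pairs."""
    runs = []
    for v in series:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def nonzero_entries(runs):
    """Yield (index, value) for nonzero samples; each zero run is skipped in one step."""
    idx = 0
    for value, length in runs:
        if value != 0:
            for k in range(length):
                yield idx + k, value
        idx += length


def bounded_cross_correlation(x_runs, y_runs, max_lag):
    """c[d] = sum over t of x[t] * y[t + d], for 0 <= d <= max_lag.

    max_lag plays the role of the upper bound on latency; only nonzero
    samples of x and y are touched.
    """
    y_nonzero = dict(nonzero_entries(y_runs))
    c = [0] * (max_lag + 1)
    for t, xv in nonzero_entries(x_runs):
        for d in range(max_lag + 1):
            yv = y_nonzero.get(t + d)
            if yv:
                c[d] += xv * yv
    return c


# Hypothetical sparse series: y repeats x two samples later.
x = run_length_encode([0, 1, 0, 0, 0, 2, 0, 0])
y = run_length_encode([0, 0, 0, 1, 0, 0, 0, 2])
print(x)                                   # [[0, 1], [1, 1], [0, 3], [2, 1], [0, 2]]
print(bounded_cross_correlation(x, y, 3))  # [0, 0, 5, 0]
```

The cost of the correlation in this sketch grows with the number of nonzero samples times the lag bound, not with the full length of the series.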

