Systems Support for End-to-End Performance Management Sandip Agarwala PhD Advisor: Karsten Schwan College of Computing Georgia Tech.


Complexity, complexity, complexity… (Source: Gartner, December 2005)

Reasons for Complexity
Application diversity
Interdependencies
Heterogeneous components
  – Too many different technologies and platforms
Too few “hints” from the system to the administrators
  – Legacy issues; application-specific solutions
Insufficient information about the system to drive self-management
Lack of automation

Online System Management
[Figure: management loop with Monitor, Analyze, Control, and Execute stages]
Workload scheduling
Capacity and SLA management
Design evaluation and tuning
Bottleneck detection
Resource provisioning, accounting, etc.
Proposed approach: service path

Service Path
[Figure: multi-tier system from the Internet through a proxy server to front-end web servers, a middle-tier servlet server, application logic (EJBs, etc.), and a back-end database]
Service path: system abstractions that describe the dynamic dependencies between the different components of a distributed application
Service class: an application-level request class, e.g., an SLA class

Service Path Characteristics
End-to-end analysis
Online
Non-intrusive
Application-generic

Outline
Background
Motivation
Service path
  – Discovery with E2EProf
  – Refinement with SysProf
  – Automated SLA enforcement
Related work
Future plans

E2EProf time (A  B) (B  C) time D1D1 D2D2 Black-box approach Correlate per-edge time series signals Monitor network packet traces ( source, destination, timestamps ) Model traces as per-edge time series signals or density functions A X B C D

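Basic Approach
[Figure: nodes A, X, B, C, D with a delay at B; the cross-correlation of (A → B) with (B → C) shows a spike, while that of (A → B) with (B → D) shows no spike]
Compute the cross-correlation of D1 and D2
  – Spike → causality
  – Spike’s position → delay (at B)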
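The spike test can be sketched in Python as follows. The sketch assumes two equal-length per-edge signals, subtracts each signal's mean, computes the cross-correlation for lags 0 to `max_lag`, and reports a spike only if the peak exceeds the mean of the correlation series by `k` standard deviations. That threshold rule and the names `cross_correlation` and `detect_delay` are assumptions for illustration, not E2EProf's published detection criterion.

```python
from statistics import mean, pstdev


def cross_correlation(d1, d2, max_lag):
    """Mean-centered cross-correlation: corr[lag] = sum_t a[t] * b[t + lag]."""
    if len(d1) != len(d2) or not 0 <= max_lag < len(d1):
        raise ValueError("signals must have equal length and max_lag < length")
    m1, m2 = mean(d1), mean(d2)
    a = [x - m1 for x in d1]
    b = [y - m2 for y in d2]
    n = len(a)
    return [sum(a[t] * b[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def detect_delay(d1, d2, max_lag, k=2.0):
    """Return the lag of a correlation spike (the delay), or None if no spike.

    Assumed rule: the peak must exceed mean + k * stdev of the correlation series.
    """
    corr = cross_correlation(d1, d2, max_lag)
    peak = max(range(len(corr)), key=corr.__getitem__)
    mu, sigma = mean(corr), pstdev(corr)
    if sigma == 0 or corr[peak] < mu + k * sigma:
        return None  # no spike: no evidence that d2 follows d1
    return peak  # spike position = delay, in bins
```

When every burst on the second edge trails the first by 3 bins, `detect_delay` returns 3; when the second signal carries no variation, it returns None.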

Evaluation with 4-tier RUBiS¹
[Figure: clients → Apache Web Server → Tomcat Servers 1 and 2 → EJB Servers 1 and 2 → MySQL Server; request types “comment” and “bidding”, annotated as CPU-bound and I/O-bound]
¹ http://rubis.objectweb.org/

Service Path Detection in RUBiS
[Figure: detected service paths with a static server assignment (highest-delay node marked) and with a round-robin load balancer (highest-delay nodes marked)]

Change Detection in RUBiS
[Figure: detected change after an injected delay]

Revenue Pipeline: Delta Air Lines’ Application
Total traffic: 1.34 million per day (about 56k per hour)
[Figure: pipeline components TACS IN & TACS OUT, XIN & XOUT, APEX IN & APEX OUT, plus error/warning (Tivoli) logs]

Delta Air Lines’ Application: TACS
[Figure: latency (sec) vs. time of day for client requests through TACS servers S1, S2, S3, S7, and S8; annotation: huge request burst]

Outline
Background
Motivation
Service path
  – Discovery with E2EProf
  – Refinement with SysProf
  – Automated SLA enforcement
Related work
Future plans

Beyond dependency and latency…
[Figure: clients C1 and C2 with servers S1 through S6 along the service path]
Solution: zoom into the service path with SysProf
  – No application hints or instrumentation
  – Monitor resource usage on a per-class basis

SysProf Methodology
[Figure: instrumentation points across the user/kernel stack (applications A1…AN, system calls, FS/VM/etc., scheduler, network stack, eth driver, BDD) for requests from and to the client: CID initialization, context switches, net softirqs, system call parameters, PID, application functions, disk I/O]
Track request context
  – Work done for processing a request class
  – May span user level and kernel level
  – Executes in more than one context (e.g., processes, threads, softirqs)
  – Occurs at system-visible events (e.g., system calls)
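Per-class accounting from context switches can be illustrated with a simplified Python sketch. It covers only CPU time, assumes a hypothetical trace of (timestamp, pid) switch-in events, and assumes a fixed mapping from process to class ID (CID); a real monitor would update that mapping as CIDs propagate and would also charge softirq, system call, and disk I/O work. The name `cpu_time_per_class` is illustrative.

```python
from collections import defaultdict


def cpu_time_per_class(switch_events, pid_to_cid, t_end):
    """Attribute CPU time to request classes from a context-switch trace.

    switch_events: time-ordered (timestamp, pid) pairs, each meaning "pid was
    switched onto the CPU at timestamp" (a hypothetical event format).
    pid_to_cid: class ID (CID) currently associated with each process.
    t_end: end of the trace; the last process is charged until then.
    """
    usage = defaultdict(int)
    # Pair each switch-in with the next one; the gap is the CPU time consumed.
    boundaries = switch_events[1:] + [(t_end, None)]
    for (ts, pid), (next_ts, _) in zip(switch_events, boundaries):
        cid = pid_to_cid.get(pid, "unclassified")  # e.g., work not tied to any request
        usage[cid] += next_ts - ts
    return dict(usage)
```

For the trace [(0, 10), (5, 20), (12, 10), (15, 30)] ending at 20, with pid 10 serving "browse" and pid 20 serving "bid", the sketch charges 8 units to "browse", 7 to "bid", and 5 to "unclassified".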

Class ID Propagation Init CID Process  CID From client To client Msg  CID Packet  CID Inherits CID Front-Tier Middle-TierEnd-Tier User Kernel

Applications of SysProf
Resource accounting
Utility billing
Bottleneck detection
Capacity estimation
Root-cause analysis
Black-box SLA management

Resource-Aware Adaptive Control
[Figure: request classes 1, 2, and 3 served by Tomcat Servers 1 and 2, EJB Servers 1 and 2, and a MySQL Server, behind a front-end controller + scheduler]
Cluster workloads contending for the same resources
Separate queue/controller for each cluster
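The clustering-plus-queues idea can be sketched in Python. Grouping classes by the single resource each one uses most is a simple stand-in for the clustering step, chosen here for illustration; the per-cluster feedback controller is omitted, and the names `cluster_by_dominant_resource` and `FrontEndScheduler` are hypothetical.

```python
from collections import defaultdict, deque


def cluster_by_dominant_resource(class_usage):
    """Group request classes by the resource each one uses most.

    class_usage: {class_name: {resource_name: utilization}}, e.g. per-class
    measurements of the kind SysProf reports.
    """
    clusters = defaultdict(list)
    for cls, usage in class_usage.items():
        dominant = max(usage, key=usage.get)  # resource this class stresses most
        clusters[dominant].append(cls)
    return dict(clusters)


class FrontEndScheduler:
    """One FIFO queue per cluster, served round-robin at the front end."""

    def __init__(self, clusters):
        self.cluster_of = {cls: res for res, members in clusters.items() for cls in members}
        self.queues = {res: deque() for res in clusters}
        self.order = deque(self.queues)  # round-robin order over cluster queues

    def submit(self, request_class, request):
        self.queues[self.cluster_of[request_class]].append(request)

    def next_request(self):
        # Visit each cluster queue at most once, in rotating order; None if all are empty.
        for _ in range(len(self.order)):
            res = self.order[0]
            self.order.rotate(-1)
            if self.queues[res]:
                return self.queues[res].popleft()
        return None
```

Serving the cluster queues in turn keeps a backlog on one saturated resource (say, database disk) from delaying classes that mostly need CPU.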

Resource-Aware Adaptive Control
[Figure: results with SysProf vs. without SysProf; capacity = 80 req/s per server]

Summary
Service path
  – System abstractions to represent dependencies and request paths
E2EProf and Pathmap
  – Dependency and latency analysis
SysProf
  – Service-based resource analysis
Aid the human operator and automate end-to-end performance management

Thank You! Questions? Email: sandip@cc.gatech.edu

Extra Slides

Pathmap Optimizations
[Figure: packet timestamp trace → time-series signal or density function → cross-correlation series, plotted against time]
Bursty traffic: sliding window (W)
Run-length compression
Upper bound on latency: W
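Two of these optimizations can be sketched together in Python: run-length compression of a sparse per-edge signal, and a cross-correlation that visits only nonzero bins and only lags up to W. Reading W as the bound on the lags searched, using raw (uncentered) correlation, and the names `run_length_encode`, `nonzero_bins`, and `windowed_correlation` are all assumptions for illustration, not Pathmap's implementation.

```python
def run_length_encode(signal):
    """Compress a per-edge signal into (value, run_length) pairs.

    Bursty traffic leaves long runs of empty bins, which collapse to one pair.
    """
    runs = []
    for x in signal:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return [(value, length) for value, length in runs]


def nonzero_bins(runs):
    """Expand runs into (bin_index, value) pairs, skipping zero-valued runs."""
    t, bins = 0, []
    for value, length in runs:
        if value:
            bins.extend((t + i, value) for i in range(length))
        t += length
    return bins


def windowed_correlation(runs1, runs2, w):
    """Raw cross-correlation sum_t d1[t] * d2[t + lag] for lags 0..w.

    Only nonzero bins are visited, and lags beyond the window w (the assumed
    upper bound on latency) are never computed.
    """
    d2 = dict(nonzero_bins(runs2))
    corr = [0] * (w + 1)
    for t, v in nonzero_bins(runs1):
        for lag in range(w + 1):
            corr[lag] += v * d2.get(t + lag, 0)
    return corr
```

The cost then grows with the number of nonzero bins times W rather than with the full trace length times the full lag range.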
