Monitoring Latency-Sensitive Enterprise Applications on the Cloud
Shankar Narayanan, Ashiwan Sivakumar

Enterprise Applications (EA)
Stock Trader Benchmark Application
- Front End (FE)
- Business Service (BS)
- Configuration Service (CS)
- Order Processing Service (OS)
- Database (DB)

EA as Services
[Figure: Users, FE, BS, OS, DB, load balancers, service endpoints]

EA Characteristics
EA property  | Relevant cloud characteristic
Scalability  | Dynamic deployment sizes
Availability | Geo-redundancy
Economics    | Pay-as-you-use
Elasticity   | Decoupled services
Low latency  | Deploy closer to user groups
Utilization  | Load balancing
- Notice the dynamic and distributed nature of cloud deployments.
- Reducing user-observed latency is the goal: monitor this!

Performance Variation: Time Series and CDF of DB Latency
- Data snapshot worth 4 hours across both days

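
The CDF view of latency mentioned above can be reproduced from raw samples. The following is a minimal sketch, assuming latencies are available as a list of millisecond values; the sample numbers are made up for illustration and are not the measurements from the slide.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F where F(x) is the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one latency sample")

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical DB latencies in milliseconds (illustrative values only).
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 95]
F = empirical_cdf(latencies_ms)
print(F(20))  # fraction of requests served within 20 ms
```

Reading the CDF at a few thresholds shows how heavy the tail is, which a time series alone can hide.
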
Monitoring Framework: Design Goals
- Resilience: less sensitive to cloud variability
- Scalability: capable of scaling with component instances
- Portability: easy to integrate with applications
- Flexibility: multiple levels of measurement
  - User-level latency
  - Component-level isolation
- Efficiency: fast and accurate measurements

Why is Monitoring Hard
- Dynamic environment: the number of components changes
- Distributed deployment: needs a collection framework
- Variable request path: different choice of components
- Existing monitoring tools
  - Do not support service-oriented architectures
  - Too detailed
  - Not scalable
- Remember: user-observed latency is our goal
- Abstract away unnecessary details!

Measuring End-points: Existing Tools
[Figure: Users, FE, BS, DB; messages: HTTP Request, HTTP Response, SOAP Response, MySQL Replies]
- Aggregate!

Measurement Model
[Figure: timing diagram across components C_i, C_{i+1}, C_{i+2}, with timestamps T_{i-1,i}, T_{i,i+1}, T_{i+1,i+2}, T_{i,i+2} and their primed variants]
- CL_i = component latency of the i-th component
- LL_{i,i+1} = link latency across components i and i+1
- N = number of components C_i communicates with
- n_j = number of calls made by C_i to each of the j components

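
The slide's formula did not survive, so the following is only one plausible reading of the definitions above, not the authors' model. It assumes a component's latency is the time it spends on a request minus the time it waits on downstream calls, that those calls are sequential (non-overlapping), and that link latency is measured as a round trip. The names `Call`, `Span`, `component_latency`, and `link_latency` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One call from C_i to a downstream component (caller's clock, ms)."""
    sent: float      # when the caller issued the call
    received: float  # when the caller got the reply


@dataclass
class Span:
    """Time one component spent handling one user request (its own clock, ms)."""
    start: float
    end: float
    calls: list = field(default_factory=list)  # downstream Call objects


def component_latency(span):
    """Assumed CL_i: time in the component minus time waiting on downstream calls."""
    waiting = sum(c.received - c.sent for c in span.calls)
    return (span.end - span.start) - waiting


def link_latency(call, callee_span):
    """Assumed round-trip LL: caller-observed duration minus callee processing time."""
    return (call.received - call.sent) - (callee_span.end - callee_span.start)


# Illustrative numbers only: C_i runs for 100 ms and makes two downstream calls.
first = Call(sent=10, received=40)
second = Call(sent=50, received=70)
c_i = Span(start=0, end=100, calls=[first, second])
c_next = Span(start=15, end=35)  # callee's view of the first call

print(component_latency(c_i))       # 50
print(link_latency(first, c_next))  # 10
```

Because each duration is taken on a single machine's clock, this formulation does not require synchronized clocks across components.
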
Monitoring Framework Architecture
[Figure: instrumented application components, each with a local log server and local raw log storage; notification queue; global collector; aggregated log]

Outline
- Monitoring tool
  - Collection framework
  - Instrumentation framework

The Collection Framework
- Each component writes to local storage
- Front-end sends a "done" message to the local queue
- Queues: decouple producer and consumer entities
- Storage: persistence, no limit on size
- Both: scalable, robust
- Question: why is this the right model? When in doubt, measure!

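
The flow described above (components log locally, the front-end signals completion through a queue, a collector aggregates) can be sketched in a few lines. This is an in-process illustration only: `queue.Queue` and a dictionary stand in for the cloud queue and storage services, and all function names are hypothetical.

```python
import queue
from collections import defaultdict

local_storage = defaultdict(list)  # stand-in for per-component local storage
done_queue = queue.Queue()         # stand-in for the notification queue


def write_log(request_id, component, latency_ms):
    """A component records its latency for one user request."""
    local_storage[request_id].append((component, latency_ms))


def front_end_done(request_id):
    """The front-end announces that a user request has finished."""
    done_queue.put(request_id)


def collect():
    """Drain the queue and build one aggregated record per finished request."""
    aggregated = {}
    while True:
        try:
            request_id = done_queue.get_nowait()
        except queue.Empty:
            break
        aggregated[request_id] = dict(local_storage.pop(request_id, []))
    return aggregated


# Illustrative request: two components log, then the front-end signals "done".
write_log("r1", "FE", 5)
write_log("r1", "BS", 20)
front_end_done("r1")
records = collect()
print(records)  # {'r1': {'FE': 5, 'BS': 20}}
```

The queue carries only small completion messages, while the bulky log data stays in storage until the collector asks for it.
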
Alternative Model
- All components write to the queue
- Collection framework de-queues
- Forms a P2P network to collate the data

Experiments on Azure and EC2
- Experiments evaluating performance of storage and queues
- Real cloud deployments (Microsoft Azure, Amazon AWS)
- Extensive measurements from all data centers
  - US (East/West/North/South)
  - Europe (West/Central)
  - Asia (East/South East)

Performance of Storage and Queues
[Figure: latency plots for Microsoft Azure and Amazon AWS; panel labels: Write Q, Read Q, Write Q, Write Store]
- Measurements made in all 12 datacenter regions (Azure and AWS)
- Experiment length: 24 to 26 hours
- Approx. 100,000 requests to storage
- 16,000 requests to the queues

Outline
- Monitoring tool
  - Collection framework
  - Instrumentation framework

Instrumentation Framework: Goals
- Minimize coding effort and intervention
- Measure latency at the granularity of a user request
- Automate instrumentation as much as possible
- Generate minimal measurement parameters

Comparison of Existing Tools

Instrumentation Framework
[Figure: Original Application Component + Aspects -> Instrumented Application Component]
Inputs to the aspects:
- Specification for the application endpoints (X-trace: log events)
- Measurement metric specification (X-trace: meta-data)
- Log format specifications

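
The idea of wrapping application endpoints with measurement code, without editing their bodies, can be illustrated with a Python decorator. This is only an analogy to the aspect-based approach named above, not the framework's implementation: the decorator `measured`, the `event_log` list, and the endpoint `get_quote` are all hypothetical.

```python
import functools
import time

event_log = []  # hypothetical in-memory log; a real deployment would write to storage


def measured(endpoint, clock=time.perf_counter):
    """Wrap an endpoint so each call logs its latency, keyed by request id."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(request_id, *args, **kwargs):
            start = clock()
            try:
                return func(request_id, *args, **kwargs)
            finally:
                # Runs on both normal return and exception, like "after" advice.
                event_log.append({
                    "request_id": request_id,
                    "endpoint": endpoint,
                    "latency_s": clock() - start,
                })
        return wrapper
    return decorator


@measured("BS.get_quote")
def get_quote(request_id, symbol):
    """Hypothetical business-service endpoint; its body needs no changes."""
    return {"symbol": symbol, "price": 100.0}


quote = get_quote("req-1", "IBM")
print(event_log[0]["endpoint"], event_log[0]["request_id"])
```

The request id travels as an explicit parameter here, in the same spirit as passing a trace object across component calls.
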
Experiment Set-up
- Deployed two similar benchmark applications
  - DayTrader: Amazon AWS
  - StockTrader: Windows Azure (prior work)
- Deployed the collection framework on AWS and Azure
- User sessions and request patterns from the DaCapo benchmark suite
- Instrumentation
  - Automated using aspects: DayTrader (AWS)
  - Custom coded: DayTrader and StockTrader

Aggregation Benefit: DayTrader
Storage writes per user request, without and with aggregation:
User request type | Without: FE | Without: BS | With: FE | With: BS
Login             | 3           | 5           | 1        | 1
Portfolio         | 1           | 0           | 1        | 1
Update profile    | 4           | 5           | 1        | 1
Home              | 2           | 2           | 1        | 1
Buy               | 1           | 7           | 1        | 1
Sell              | 1           | 8           | 1        | 1
Account           | 3           | 3           | 1        | 1
- Total user sessions: 20, 1 every 10 seconds
- Results shown for a random user from DaCapo
- 78% of writes reduced in the above case
- Transactions benefit

Aggregation Benefit: MedRec Application Suite
Storage writes, without and with aggregation:
Application   | Without: FE | Without: BS | With: FE | With: BS
MedRec App    | 4           | 8           | 1        | 1
Physician App | 8           | 15          | 1        | 1
Admin App     | 2           | 5           | 1        | 1
- Storage writes reduced by at least 50% from FE, 80% from BS

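
The "at least 50% from FE, 80% from BS" claim can be checked with simple arithmetic on the MedRec table. The sketch below assumes the table is read as FE and BS write counts of 4/8, 8/15, and 2/5 without aggregation and 1/1 with aggregation; the variable and function names are illustrative.

```python
# (writes without aggregation, writes with aggregation) per component,
# as read from the MedRec table above.
writes = {
    "MedRec App":    {"FE": (4, 1), "BS": (8, 1)},
    "Physician App": {"FE": (8, 1), "BS": (15, 1)},
    "Admin App":     {"FE": (2, 1), "BS": (5, 1)},
}


def reduction_pct(before, after):
    """Percentage of storage writes eliminated by aggregation."""
    return 100.0 * (before - after) / before


min_fe = min(reduction_pct(*app["FE"]) for app in writes.values())
min_bs = min(reduction_pct(*app["BS"]) for app in writes.values())

print(min_fe)  # smallest FE reduction across the three applications
print(min_bs)  # smallest BS reduction across the three applications
```

Under this reading, the Admin App sets both lower bounds because it has the fewest writes to begin with.
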
Instrumentation Benefit
Lines of code (# of files) per category:
Category    | Handcrafted | X-Trace with Aspect
same        | 15250 (88)  | 15250 (92)
modified    | 593 (74)    | 465 (70)
added       | 878 (0)     | 166 (2)
automatable | 0 (0)       | 166 (2)
- FE component code: automatable using aspects with X-trace
- Cross-component calls: X-trace object passed as parameter
- New lines of code reduced by ~80%
- SLOC reduced by ~20%
- Aspects can be automated

Future Work
- Scaling the framework
  - Application scale to framework scale ratio
  - Per datacenter? Per VM?
  - Varies per cloud provider?
- Impact of these design decisions on the sensitivity of the framework

Conclusions
- Architectural benefits
  - Generic across application, number of components, access patterns
  - Scalable: decoupled entities
- Aggregation benefits
  - N writes to storage become one write
  - Log server offloads work from the application
- Instrumentation benefits
  - Easy to integrate with the application
  - New lines of code reduced by ~80%
  - SLOC reduced by ~20%

Q & A

Backup Slides

Azure Blob Read and Write Latency
- Blob read-write at least msec

Azure Queue Read and Write Latency
- Queue reads are costly; queue writes are comparable to blob writes

SQL Azure Performance Issue
- Snapshot (6 days)