Published by Dominick Walters. Modified over 9 years ago.
1
System Tomography: Gradient-based models of multitier systems
Kaustubh Joshi (AT&T Labs Research)
Collaborators: Shuyi Chen (University of Illinois), Matti Hiltunen (AT&T Labs Research), Rick Schlichting (AT&T Labs Research), William Sanders (University of Illinois)
© 2008 AT&T Intellectual Property. All rights reserved. AT&T and the AT&T logo are trademarks of AT&T Intellectual Property.
2
System Tomography????
– Performability is not just for safety-critical systems
  – Enterprise systems: ERM, CRM, web services, e-commerce
  – Consumers: banking, shopping, portals, communications
– Predicting the impact of topology (application–environment)
– Predicting the impact of growth (application–workload)
– Predicting the impact of outsourcing (application–application)
– Queuing models, Petri nets. But…
  – Organic growth and constant evolution
  – Organizational challenges
  – Cost and time to market
  – Complex hidden factors (firewalls, load balancers, caches, proxies, content accelerators, network file systems)
3
System Tomography!!!!
– Use online measurements
  – Minimal application instrumentation
  – No isolated profiling environment
  – Inject small runtime perturbations
– Construct simple predictive models, e.g., gradients
  – Simple: linear, limited domain
  – Cheap enough to allow online retraining
– Automatically generate models with predictive capability
  – And do so for black-box applications
4
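Gradients: Definition and Use
– Defined for end-to-end “metrics” $m$ and “knobs” $\mathbf{k}$ as the partial derivative of the end-to-end metric $m_t$ for transaction $t$ with respect to the knob vector $\mathbf{k} = (k_1, \ldots, k_n)$: $\nabla_{\mathbf{k}} m_t = \left(\partial m_t/\partial k_1, \ldots, \partial m_t/\partial k_n\right)$
– Used to predict the end-to-end metric in a new configuration as a function of the change in the knobs: $m_t(\mathbf{k} + \Delta\mathbf{k}) \approx m_t(\mathbf{k}) + \nabla_{\mathbf{k}} m_t \cdot \Delta\mathbf{k}$
– A simple linear model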
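The linear prediction on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' tool. The function name `predict_metric` and the example numbers are hypothetical. Like any first-order model, it is only trustworthy for small knob changes, which matches the "limited domain" caveat.

```python
from typing import Sequence


def predict_metric(m0: float, gradient: Sequence[float], delta_k: Sequence[float]) -> float:
    """First-order prediction m_t(k + dk) ~= m_t(k) + grad . dk.

    m0: metric measured at the current knob setting (e.g. response time in ms)
    gradient: partial derivatives dm_t/dk_i, one per knob
    delta_k: proposed change to each knob, in that knob's own units
    """
    if len(gradient) != len(delta_k):
        raise ValueError("gradient and knob change must have the same length")
    return m0 + sum(g * d for g, d in zip(gradient, delta_k))


# Hypothetical example: 200 ms today; gradients of 2.0 and 0.5 ms per ms of
# added latency on two links; the new configuration adds 10 ms and 4 ms.
print(predict_metric(200.0, [2.0, 0.5], [10.0, 4.0]))  # 222.0
```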
5
Gradient Measurement
6
Gradient Measurement
– Inject perturbations and measure their effect on response time
– Noise is a problem in production environments
  – But noise is often not periodic
  – Frequency-domain analysis using Fourier transforms
– Time domain: 95% conf. int. of 3.17 msec; a change of 63.4 msec is needed for up to 10% error
– Frequency domain: 95% conf. int. of 0.07 msec; a change of 1.4 msec is needed for up to 10% error
– More than an order of magnitude improvement in sensitivity
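The frequency-domain idea can be sketched as follows. The knob is perturbed periodically, for example as a square-wave delay. The gradient is then read off as the ratio of the response-time and knob components at the perturbation frequency. This is an illustration under stated assumptions and not necessarily the authors' exact procedure. It assumes a perturbation period known in samples and a trace length that is a multiple of it. It also takes the real (in-phase) part of the ratio, which assumes the response follows the knob with little lag. All names are hypothetical.

```python
import cmath
import math
import random


def dft_bin(samples, period):
    """Complex amplitude of `samples` at one cycle per `period` samples.

    A single DFT bin; len(samples) should be a multiple of `period` so the
    perturbation frequency falls exactly on a bin.
    """
    n = len(samples)
    mean = sum(samples) / n
    acc = sum((x - mean) * cmath.exp(-2j * math.pi * i / period)
              for i, x in enumerate(samples))
    return 2 * acc / n


def frequency_domain_gradient(knob, response, period):
    """Estimate d(response)/d(knob) from the components at the perturbation frequency.

    Noise that is not periodic at this frequency contributes comparatively
    little to either bin.
    """
    return (dft_bin(response, period) / dft_bin(knob, period)).real


# Synthetic trace: a 0/5 ms square-wave delay with a 20-sample period,
# true gradient 3.0, plus slow drift and Gaussian noise.
random.seed(1)
knob = [5.0 if (i // 10) % 2 == 0 else 0.0 for i in range(2000)]
response = [100 + 0.01 * i + 3.0 * k + random.gauss(0, 5) for i, k in enumerate(knob)]
print(round(frequency_domain_gradient(knob, response, period=20), 2))
```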
7
Gradient Measurement
8
Measurement Tool
[Figure: central coordinator with daemons on the Apache, TomcatA, TomcatB, and MySQL servers; logs; gradient calculation; steps numbered 1–4]
9
Link Gradients
– Definition: rate of change of end-to-end response time with respect to change in network link latency
– Perturbation mechanisms
  – Inject packet delay using Linux ipqueue, TUN/TAP
  – In-network ARP/route redirection
– Applications
  – Determining the impact of deployment decisions
  – Application CDNs
  – Estimating the impact of network changes on applications
  – Optimizing placement of shared components
  – Runtime server migration depending on workload mix and user load
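As an illustration of using link gradients for deployment decisions, here is a minimal sketch. It assumes one gradient per (transaction type, link) pair and a known workload mix. Per-transaction predictions are combined under that mix to give a mean response time. The transaction names, link names, and numbers are all hypothetical.

```python
def predict_per_transaction(base_rt, gradients, latency_delta):
    """Predicted response time per transaction type after link-latency changes.

    base_rt: {transaction: measured mean response time, ms}
    gradients: {transaction: {link: d(response time)/d(link latency)}}
    latency_delta: {link: change in link latency, ms}
    """
    return {
        txn: rt + sum(gradients.get(txn, {}).get(link, 0.0) * d
                      for link, d in latency_delta.items())
        for txn, rt in base_rt.items()
    }


def mix_weighted_mean(per_txn, mix):
    """Mean response time under a workload mix {transaction: relative weight}."""
    total = sum(mix.values())
    return sum(per_txn[txn] * w for txn, w in mix.items()) / total


# Hypothetical what-if: moving the database to another site adds 5 ms
# to the web-server <-> database link.
base = {"browse": 120.0, "buy": 300.0}
grads = {"browse": {"ws-db": 4.0}, "buy": {"ws-db": 10.0, "client-ws": 2.0}}
after = predict_per_transaction(base, grads, {"ws-db": 5.0})
print(after)  # {'browse': 140.0, 'buy': 350.0}
print(mix_weighted_mean(after, {"browse": 0.8, "buy": 0.2}))
```

Because the gradients differ by transaction type, the same move can be cheap under a browse-heavy mix and expensive under a buy-heavy one. That is the kind of mix-dependent decision the runtime-migration bullet refers to.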
10
And the predictions match…
11
Frequency Gradients
– Definition: rate of change of end-to-end response time with respect to change in the CPU frequency of servers
– Perturbation mechanisms
  – Use DVFS: change processor p-states
– Uses
  – Energy conservation
  – Performance-aware CPU scaling
  – Machine upgrades
– Problems
  – Nonlinearity is much more severe
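On Linux, one common way to change p-states from user space is the cpufreq sysfs interface with the `userspace` governor. The helper below is a sketch under that assumption, not the authors' perturbation code. It needs root and a driver that exposes `scaling_available_frequencies` and the `userspace` governor (for example `acpi-cpufreq`; `intel_pstate` in active mode does not). Frequencies are given in kHz, as the sysfs files expect. The function name is hypothetical. The `sysfs_root` parameter exists only so the helper can be exercised against a scratch directory.

```python
from pathlib import Path


def set_cpu_frequency(cpu, freq_khz, sysfs_root="/sys/devices/system/cpu"):
    """Pin one CPU to a fixed frequency via cpufreq's userspace governor.

    Frequencies are in kHz, as in the cpufreq sysfs files.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    available = (base / "scaling_available_frequencies").read_text().split()
    if str(freq_khz) not in available:
        raise ValueError(f"{freq_khz} kHz not in {available}")
    (base / "scaling_governor").write_text("userspace")
    (base / "scaling_setspeed").write_text(str(freq_khz))


# Alternating between two p-states over time gives a periodic frequency
# perturbation, e.g. (requires root):
# set_cpu_frequency(0, 1800000); ...; set_cpu_frequency(0, 2400000)
```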
12
Nonlinearity via Basis Functions
– Recast the gradient using nonlinear “basis functions” $\phi_i$
– The response time is linear in the basis functions, i.e., $m_t \approx a + \sum_i b_i\,\phi_i(\mathbf{k})$, instead of being linear in the knobs $\mathbf{k}$ themselves
– The nonlinearity is primarily due to queuing effects (M/G/1 PS queue)
– Other basis functions can also be used
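One basis consistent with the M/G/1 PS remark can be derived as follows. The PS mean response time is $R = S/(1-\rho)$. Assume each request needs a fixed number of CPU cycles $c$, so that $S = c/f$ and $\rho = \lambda c/f$. Then $R = c/(f - \lambda c)$, which is linear in $\phi(f) = 1/(f - \lambda c)$. The sketch below estimates $\lambda c$ from the measured utilization $U_0$ at the base frequency $f_0$ as $U_0 f_0$. It then fits $R \approx a + b\,\phi(f)$ by least squares. This particular basis is an illustration derived under those assumptions and is not necessarily the basis used in the talk. Names and numbers are hypothetical, and the "measurements" are generated from the model itself.

```python
def ps_basis(freq, base_freq, base_util):
    """phi(f) = 1 / (f - lambda*c) for an M/G/1-PS server whose per-request
    CPU demand is a fixed number of cycles; lambda*c is estimated as U0 * f0."""
    load = base_util * base_freq  # cycles demanded per unit time
    if freq <= load:
        raise ValueError("queue would be unstable at this frequency")
    return 1.0 / (freq - load)


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Synthetic data: base 2.0 GHz at 50% utilization, small downward perturbations.
freqs = [2.0, 1.8, 1.6]
resp = [0.02 / (f - 1.0) for f in freqs]  # seconds
phis = [ps_basis(f, 2.0, 0.5) for f in freqs]
a, b = fit_linear(phis, resp)
pred = a + b * ps_basis(1.2, 2.0, 0.5)  # extrapolate to 1.2 GHz
print(pred)
```

With these synthetic numbers the basis model extrapolates to about 0.1 s at 1.2 GHz. A straight-line gradient taken from the 2.0 and 1.8 GHz points (-0.025 s per GHz) would predict only about 0.04 s. That gap is the kind of severe nonlinearity the frequency-gradient slide warns about.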
13
Prediction Accuracy
14
VM Gradients
– Definition: rate at which end-to-end response time changes with respect to the fraction of CPU allocated to individual node VMs
– Perturbation mechanisms
  – Xen hypervisor scheduler parameters: cap VM CPU usage
– Uses
  – Cloud: resource sharing using statistical multiplexing
  – Performance-aware server consolidation
  – Impact of adding/removing servers
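A minimal sketch of how a VM gradient might be measured. It applies CPU caps slightly above and below the current allocation and takes a central difference of the measured response times. It assumes Xen's credit scheduler, where `xl sched-credit -d DOMAIN -c CAP` sets a cap in percent of one physical CPU and a cap of 0 means uncapped. The helper only builds the argument list and does not run it. The function names and the toy response-time model are hypothetical.

```python
def xl_cap_command(domain, cap_percent):
    """argv that caps a Xen domain under the credit scheduler.

    The cap is in percent of one physical CPU; 0 means "no cap", so it is rejected here.
    """
    if cap_percent <= 0:
        raise ValueError("cap must be positive (0 would remove the cap)")
    return ["xl", "sched-credit", "-d", domain, "-c", str(cap_percent)]


def central_difference(measure, x, delta):
    """Estimate d(measure)/dx at x from runs at x + delta and x - delta."""
    return (measure(x + delta) - measure(x - delta)) / (2 * delta)


# In practice `measure` would apply the cap (e.g. subprocess.run(xl_cap_command(...),
# check=True)), wait for the system to settle, and return the mean response time.
# Here a toy model stands in for the real system: rt(cap) = 50 + 2000 / cap (ms).
def toy_rt(cap):
    return 50 + 2000 / cap


print(xl_cap_command("web1", 40))
print(central_difference(toy_rt, 50, 5))  # about -0.81 (the analytic value is -0.8)
```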
15
Linearity with respect to Basis Function
16
In Conclusion
– We have a tool for gradient computation
  – Works for link and frequency gradients
  – Validation of VM and capacity gradients is ongoing
– Future directions
  – Additional gradients: bandwidth, loss (VoIP)
  – Use the models with a policy generation framework to generate black-box application management capability
17
Extra Slides
18
Future Capacity Gradients
– When basis functions aren’t enough
  – Unpredictable nonlinearity
– Problem: when, and at which resource, will a system first hit a bottleneck?
  – i.e., compute gradients at future workloads
– Gradient: rate at which throughput changes with respect to change in resource capacity at a different (higher) workload
– Applications
  – Planning for upgrades
  – Detecting current bottlenecks
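Throughput gradients for different resources come in different units (req/s per GHz, req/s per Mbit/s, and so on). One way to compare them is to normalize each into an elasticity, the fractional throughput gain per fractional capacity gain. The resource with the largest elasticity at the projected workload is then the first bottleneck. This normalization, the threshold, and all names and numbers are assumptions made for the sketch, not part of the original method.

```python
def first_bottleneck(throughput, resources, min_elasticity=0.05):
    """Pick the resource whose extra capacity would raise throughput most.

    throughput: throughput at the projected (higher) workload, e.g. req/s
    resources: {name: (gradient, capacity)} where gradient = d(throughput)/d(capacity)
               measured at that workload
    Returns None if no resource's elasticity reaches min_elasticity.
    """
    elasticity = {name: g * cap / throughput for name, (g, cap) in resources.items()}
    best = max(elasticity, key=elasticity.get)
    return best if elasticity[best] >= min_elasticity else None


# Hypothetical gradients at a projected 400 req/s.
print(first_bottleneck(400.0, {
    "db_cpu": (150.0, 2.0),      # req/s per GHz, 2.0 GHz installed
    "app_cpu": (10.0, 2.0),
    "wan_link": (0.01, 1000.0),  # req/s per Mbit/s, 1000 Mbit/s installed
}))  # db_cpu
```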
19
Amp Modulation: changing operating point
20
Workload Spikes
– Buffer requests to preserve the mean request rate
– Produce short (few-millisecond) workload spikes
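One simple reading of this slide can be sketched in Python. Requests are held in a buffer and released together at fixed window boundaries. No request is added or dropped, so the long-run mean rate is unchanged, but arrivals are compressed into short bursts. This is an illustrative assumption about the mechanism, and `bunch_into_spikes` is a hypothetical name.

```python
import math


def bunch_into_spikes(arrivals_ms, window_ms):
    """Release each buffered request at the next window boundary.

    No request is dropped or added, so the long-run mean request rate is
    preserved; arrivals are compressed into bursts at multiples of window_ms,
    and each request is delayed by less than one window.
    """
    return [math.ceil(t / window_ms) * window_ms for t in arrivals_ms]


print(bunch_into_spikes([0.5, 3.2, 9.9, 10.0, 14.1], 10))  # [10, 10, 10, 10, 20]
```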
21
Delay Injection
– Currently host-based
  – Using iptables to construct a redirecting “firewall”
  – Using a virtual network TUN/TAP device
– Completely in-network injection is possible
  – Using ARP-poisoning-based redirection
  – In router rules
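At the core of a host-based injector is a delay line: each redirected packet is held for a configurable extra delay before it is forwarded. The sketch below shows only that queueing logic, with time passed in explicitly. Reading from and writing to a TUN/TAP device and installing the iptables redirect are left out. The class and method names are hypothetical.

```python
import collections


class DelayLine:
    """FIFO that holds each packet for an extra delay before it may be forwarded.

    A user-space forwarder reading from a TUN device could call push() for each
    packet it reads and write out whatever pop_ready() returns. Packets leave in
    arrival order even if `delay` is changed mid-stream (e.g. a square-wave
    perturbation), so the injector never reorders a flow; a packet queued behind
    a longer-delayed one simply waits a little longer than its own delay.
    """

    def __init__(self, delay):
        self.delay = delay
        self._queue = collections.deque()

    def push(self, packet, now):
        self._queue.append((now + self.delay, packet))

    def pop_ready(self, now):
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(self._queue.popleft()[1])
        return ready


line = DelayLine(delay=10)  # time units: ms
line.push("pkt-a", now=0)
line.push("pkt-b", now=2)
print(line.pop_ready(now=5))   # []
print(line.pop_ready(now=10))  # ['pkt-a']
print(line.pop_ready(now=12))  # ['pkt-b']
```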
22
Link Gradients
[Figure: incoming transactions (browse, buy, sell, search) served by a WS server at Site 1 and DB servers at Sites 2 and 3; scenarios: upgrade a network link, move a server to another site]
23
Why CPU Gradients?
– Energy consumption of IT activity is an increasingly serious issue
  – Server farms: in 2006, enterprise data centers accounted for 1.5% of total US electricity consumption (61 billion kWh), and it’s growing…
  – 60% of it is consumed by low-cost, commodity “volume servers”
  – Multi-tier services are a major tenant
– CPU frequency scaling can save power
  – But applications have responsiveness SLAs
  – Scaling at different nodes can affect the system differently
  – The relationship among scaling, response time, and energy savings is complex
Oops!