Published by Maria Blake. Modified over 8 years ago.
1
Coordinated Performance and Power Management
Yefu Wang
2
ECE692 2009
Power/Performance Problems in Datacenters
Power-related problems
– Power/thermal control (capping)
– Power optimization
Performance-related problems
– Performance control
– Performance optimization
Problem scale
– Datacenter level
– Cluster level
– Server level
– Application level
3
Co-Con: Coordinated Control of Power and Application Performance for Virtualized Server Clusters
Xiaorui Wang and Yefu Wang
Department of EECS, University of Tennessee, Knoxville
4
Power and Performance Control
Most prior work on power/performance control controls one and optimizes the other
– Power control: power capping to avoid power overload or thermal failures caused by increasingly high server density
– Performance control: provide guarantees for Service-Level Agreements
Performance-oriented: [Chase’01], [Chen’05], [Elnozahy’02], [Sharma’03], [Wang’08], etc.
– Performance-oriented controller: takes the performance measurement and performance target; control decision minimizes power
– May violate power constraint
Power-oriented: [Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc.
– Power-oriented controller: takes the power measurement and power target; control decision maximizes performance
– Performance is not guaranteed
5
Coordinated Control of Power and Performance
Diagram components:
– Power Controller [HPCA’08], driven by the power budget
– Performance Controllers and Performance Monitors for VM1, VM2, VM3, VM4, driven by the performance requirements
– Cluster-level CPU Resource Coordinator, which sets the CPU allocation
6
Response Time Controller
PID (Proportional-Integral-Derivative) controller
– System modeling
– Controller design
– Controller analysis
Control loop: the response time controller compares the measured response time of the VM with the response time set point and adjusts the VM's CPU allocation
Example: measured 750 ms, set point 700 ms, error 50 ms; CPU allocation is increased by 2.4%
Disturbances: workload variation, frequency variation
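The control loop on this slide can be sketched as a discrete PID update. This is a minimal illustration, not the controller from the paper: the class name `PIDController` and the three gains are assumptions, chosen only so that the first step reproduces the slide's example (50 ms error, 2.4% increase).

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Discrete PID controller. The gains are illustrative assumptions,
    not values taken from the slides or the paper."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, set_point_ms: float, measured_ms: float) -> float:
        """Return the change in CPU allocation (percent) for one control period."""
        # A positive error means the VM is slower than the set point,
        # so it should receive more CPU.
        error = measured_ms - set_point_ms
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains: 0.03*50 + 0.01*50 + 0.008*50 = 2.4 on the first step.
controller = PIDController(kp=0.03, ki=0.01, kd=0.008)
delta = controller.update(set_point_ms=700.0, measured_ms=750.0)
print(f"error = 50 ms, change CPU allocation by {delta:+.1f}%")
```

In a real deployment the gains would come from the controller design step (pole placement on the identified model), not from matching a single data point.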
7
Response Time Model
Response time model
System identification
– Model orders
– Parameters
Model Orders and Error:
127.90  71.70  68.08
105.59  71.62  71.09
 99.32  71.09  67.99
8
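System Identification in Practice
Operating point
– Linearize the system model locally
White noise
– Generate a white-noise input
Least squares method
– Given the measured inputs and outputs, find the model parameters that make the model best fit the measured data

    # Drive the CPU allocation with white noise and log the response time.
    # allocate(), get_response_time(), log_sample() and $step come from the
    # testbed scripts and are not shown on the slide.
    open(my $noise, '<', 'white_noise.log')
        or die "Cannot open white_noise.log: $!";
    while (my $p = <$noise>) {           # assumed: $p is one noise sample per line
        chomp $p;
        my $rand = int(40 + 10 * $p);    # perturb around the operating point
        my $cpu  = 180 - 40 - 40 - $rand;
        allocate($cpu);
        my $t = get_response_time();
        log_sample($cpu, $t);            # renamed: log() is Perl's built-in logarithm
        sleep $step;
    }
    close($noise);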
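The least-squares step can be illustrated on a deliberately simple model. The sketch below assumes a first-order difference equation, resp[k] = a * resp[k-1] + b * cpu[k-1], with both signals expressed as deviations from the operating point; the model order actually chosen in the paper may differ, and the function name `fit_first_order` and the synthetic data are hypothetical.

```python
import random


def fit_first_order(cpu, resp):
    """Least-squares fit of resp[k] = a * resp[k-1] + b * cpu[k-1].

    Solves the 2x2 normal equations (X^T X) theta = X^T y by Cramer's rule.
    """
    s_tt = s_tc = s_cc = s_ty = s_cy = 0.0
    for k in range(1, len(resp)):
        t_prev, c_prev, y = resp[k - 1], cpu[k - 1], resp[k]
        s_tt += t_prev * t_prev
        s_tc += t_prev * c_prev
        s_cc += c_prev * c_prev
        s_ty += t_prev * y
        s_cy += c_prev * y
    det = s_tt * s_cc - s_tc * s_tc
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough to identify it")
    a = (s_ty * s_cc - s_cy * s_tc) / det
    b = (s_cy * s_tt - s_ty * s_tc) / det
    return a, b


# Synthetic experiment: white-noise CPU deviations driving a known model.
rng = random.Random(0)
true_a, true_b = 0.5, -2.0          # hypothetical "true" parameters
cpu = [rng.uniform(-5.0, 5.0) for _ in range(200)]
resp = [0.0]
for k in range(1, len(cpu)):
    resp.append(true_a * resp[k - 1] + true_b * cpu[k - 1])

a_hat, b_hat = fit_first_order(cpu, resp)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

White noise is used as the input because it excites the system at all frequencies, which keeps the normal equations well conditioned.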
9
Controller Design
PID controller
– Proportional
– Integral
– Derivative
Design: pole placement
Diagram labels: response time set point, error, CPU allocation, VM, response time
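Pole placement can be shown on a reduced case. The sketch below is an assumption-laden simplification, not the design from the paper: it uses a first-order plant G(z) = b / (z - a) and a PI controller C(z) = kp + ki * z / (z - 1) instead of a full PID, and it assumes real desired poles. The closed-loop characteristic polynomial is z^2 + (b(kp + ki) - a - 1) z + (a - b kp), and the gains are chosen so that it equals (z - p1)(z - p2).

```python
import cmath


def place_poles_pi(a, b, p1, p2):
    """Return (kp, ki) that put the closed-loop poles at real values p1 and p2.

    Plant (assumed): y(k) = a*y(k-1) + b*u(k-1).
    Controller (assumed): C(z) = kp + ki*z/(z - 1).
    """
    kp = (a - p1 * p2) / b                    # match the constant coefficient
    ki = (a + 1.0 - (p1 + p2)) / b - kp       # match the z coefficient
    return kp, ki


def closed_loop_poles(a, b, kp, ki):
    """Roots of z^2 + c1*z + c0 for the closed loop described above."""
    c1 = b * (kp + ki) - a - 1.0
    c0 = a - b * kp
    root = cmath.sqrt(c1 * c1 - 4.0 * c0)
    return (-c1 + root) / 2.0, (-c1 - root) / 2.0


# Hypothetical plant: more CPU lowers the response time, hence b < 0.
kp, ki = place_poles_pi(a=0.5, b=-2.0, p1=0.2, p2=0.3)
poles = closed_loop_poles(0.5, -2.0, kp, ki)
print(f"kp = {kp:.3f}, ki = {ki:.3f}, poles = {[round(p.real, 3) for p in poles]}")
```

Poles closer to the origin give a faster response; poles inside the unit circle are required for stability.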
10
Coordination
Coordination of the two control loops
– When the power control loop acts, the CPU frequency changes (1 GHz to 3 GHz)
– When the CPU frequency changes, the response time model changes
– Does the response time control loop still work?
Stability range:
Settling time < 24 s
The control period of the power control loop is selected to be longer than the settling time of the response time control loop.
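The question of whether the response time loop still works after a frequency change can be explored numerically. The sketch below is illustrative only: it assumes a first-order plant y(k) = a*y(k-1) + b*u(k-1) under a PI controller, and it assumes that a frequency change shows up as a scaling of the plant gain b. The parameter values are hypothetical, and the stability range stated on the slide is not reproduced here.

```python
import cmath


def is_stable(a, b, kp, ki, gain_scale):
    """True if both closed-loop poles lie strictly inside the unit circle
    when the plant gain is b * gain_scale."""
    b_eff = gain_scale * b
    c1 = b_eff * (kp + ki) - a - 1.0
    c0 = a - b_eff * kp
    root = cmath.sqrt(c1 * c1 - 4.0 * c0)
    poles = ((-c1 + root) / 2.0, (-c1 - root) / 2.0)
    return all(abs(p) < 1.0 for p in poles)


# Hypothetical nominal model and gains (designed for gain_scale = 1.0).
A, B = 0.5, -2.0
KP, KI = -0.22, -0.28

# Sweep the gain scale from 0.1 to 4.0 and keep the values that stay stable.
stable_scales = [s / 10.0 for s in range(1, 41) if is_stable(A, B, KP, KI, s / 10.0)]
print(f"stable for plant-gain scale from {stable_scales[0]} to {stable_scales[-1]}")
```

If every model reachable by the power loop falls inside this range, the two loops can be designed independently, provided the power loop's control period exceeds the settling time of the response time loop.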
11
System Implementation
Servers
– 2 Intel servers
– 2 AMD servers
– Storage server (NFS)
VMs
– 512 MB RAM, 10 GB storage via NFS, 2 VCPUs
– Xen 3.1 with Credit scheduler
– CPU allocation: cap in the Credit scheduler
Workload
– PHP + Apache benchmark
Diagram labels: Server1, Server2, Server4, Storage (NFS)
12
Response Time Control
Workload increase on VM2: the response time of VM2 is controlled to 700 ms by increasing its CPU resource allocation.
13
Response Time Control
Figure annotations: change CPU frequency (twice), change workload
– Set point: 700 ms, standard deviation: 51
– Set point: 700 ms, standard deviation: 57
14
Coordination: Power Budget Reduction
Compare with baseline: power control only
– Co-Con: power and response time guarantee
– Baseline (power control only): violation of performance requirements
Power control only: [Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc.
Performance control only: power budget violation, undesired server shutdown
15
Conclusion
Co-Con: coordinated control of power and application performance
– Simultaneous control of power and performance
  Cluster-level power budget guarantee for server racks
  Application-level performance guarantee
– Effective control despite workload/CPU frequency variations
16
No “Power” Struggles: Coordinated Multi-level Power Management for the Data Center
Ramya Raghavendra*, Parthasarathy Ranganathan†, Vanish Talwar†, Zhikui Wang†, Xiaoyun Zhu†
*University of California, Santa Barbara
†HP Labs, Palo Alto
17
The Problem
Diagram labels:
– Objectives: average power, peak thermal power, peak electrical power, performance
– Levels: CPU, server, enclosure, rack
– Controllers: OS-wlm, OS-gwlm, SIM, Vmotion, VM-res.all, LSF
– VM, heterogeneity
– Local optima versus global optima
CHAOS!! (“Power” Struggle)
18
Research Questions
Co-ordination design
– How to ensure correctness, stability, efficiency?
– How to make local decisions with incomplete global info?
– How to build in support for dynamism?
Implications of co-ordination
– Can we simplify or consolidate controllers?
– Do we revisit policies and mechanisms of the controllers?
– How sensitive is the design to apps and systems considered?
19
A “Representative” Subset of Problems
– Overlap in objective functions
– Overlap in actuators
– Different time constants
– Different problem formulations
20
Solution in This Paper
First unified architecture for data center power management
– Interfaces and information exchange between loops
– Leverages feedback control theory
– Evaluation on real-world traces: significant savings
Insights on design trade-offs
– Architectural alternatives for various objective functions
– Implementation alternatives (time constants and hw/sw)
– Mechanisms (p-states, VMs) and policies (pre-emptive, fair-share, …)
21
System Models
Power model:
Performance model:
22
Unified and Extensible Architecture
23
Coordination
– SM (server manager): exposes an API to EM and GM to change its power budget
– EC (efficiency controller): exposes an API to SM to change r_ref
– EM (enclosure manager): exposes an API to GM to change its power budget
– VMC (VM controller): uses "real utilization"; uses power budgets as constraints
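The API relationships listed on this slide can be sketched as plain objects. Everything below is hypothetical scaffolding: the class and method names, the rule that the server manager enforces the minimum of the budgets it has been given, the fixed step used to raise r_ref when over budget, and the equal split of an enclosure budget are all assumptions made for illustration, not details taken from the slides.

```python
class EfficiencyController:
    """EC: tracks a utilization reference r_ref that the SM may change."""

    def __init__(self, r_ref):
        self.r_ref = r_ref

    def set_r_ref(self, r_ref):          # API exposed to the SM
        self.r_ref = r_ref


class ServerManager:
    """SM: caps server power; EM and GM may tighten its budget."""

    def __init__(self, local_budget_w, ec):
        self.budgets = {"local": local_budget_w}
        self.ec = ec

    def set_power_budget(self, source, watts):   # API exposed to EM and GM
        self.budgets[source] = watts

    def effective_budget(self):
        # Assumption: the most restrictive budget wins.
        return min(self.budgets.values())

    def control_step(self, measured_power_w, step=0.05):
        # Assumption: when over budget, raise r_ref so the EC lowers frequency.
        if measured_power_w > self.effective_budget():
            self.ec.set_r_ref(min(1.0, self.ec.r_ref + step))


class EnclosureManager:
    """EM: owns an enclosure budget; the GM may change it."""

    def __init__(self, budget_w, servers):
        self.budget_w = budget_w
        self.servers = servers

    def set_power_budget(self, watts):            # API exposed to the GM
        self.budget_w = watts

    def distribute(self):
        # Assumption: equal split across the servers in the enclosure.
        share = self.budget_w / len(self.servers)
        for sm in self.servers:
            sm.set_power_budget("EM", share)


ec = EfficiencyController(r_ref=0.75)
sm = ServerManager(local_budget_w=300.0, ec=ec)
em = EnclosureManager(budget_w=500.0, servers=[sm, ServerManager(300.0, EfficiencyController(0.75))])
em.distribute()                      # each server receives 250 W from the EM
sm.set_power_budget("GM", 280.0)     # a group-level budget for the same server
sm.control_step(measured_power_w=260.0)
print(sm.effective_budget(), round(ec.r_ref, 2))
```

Routing every budget change through a small setter is what lets each controller keep its own control law while still respecting the levels above it.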
24
Implementation
Not implemented in a hardware testbed
– Requires many servers
– Requires DVFS support
– Each controller must be individually configured
– Requires real-world applications
Simulation
– Trace-driven simulation
– Power/performance models from real hardware
25
Results: Benefits from Coordination
Compared with a baseline without control
26
VM Migration vs. Local Power Control
Coordinated solution provides the most power savings
27
Guaranteeing Stability (1)
This paper provides a stability guarantee for EC and SM
– Server-level performance and power control
Stability of EC
– Assumptions
  CPU frequency is continuous
  Frequency demand of workloads is constant
  CPU utilization is defined as
– Control law
– Stability proof
  Since, this paper proves
28
Guaranteeing Stability (2)
Stability of SM
– Assumptions
  The settling time of EC is shorter than the control period of SM
  Power consumption can be modeled as
– Controller
– Closed-loop system
– System is stable
29
Conclusions
Coordination architecture for five individual solutions
Simulations based on close to 200 server traces from real-world enterprise deployments
Compared with the non-coordinated solution
– Fewer constraint violations
– More power efficient
30
Critiques of Co-Con
Average response time is not an ideal performance metric
– Can be extended to 90th-percentile response time
The response time monitor is not perfectly implemented
Only the CPU resource is considered
– Extension to IO, network, etc.
Evaluation is based on simple workloads
– A simple PHP script
– Single tier
– No IO/database operations
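The first critique can be made concrete with a small calculation. The sketch below uses made-up response times and a nearest-rank percentile (one of several common percentile definitions); the function name `percentile_nearest_rank` is hypothetical. It shows a workload whose average meets a 700 ms set point even though one request in five is much slower.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up measurements: eight fast requests and two slow ones.
response_times_ms = [500] * 8 + [1500] * 2
mean_ms = sum(response_times_ms) / len(response_times_ms)
p90_ms = percentile_nearest_rank(response_times_ms, 90)
print(f"mean = {mean_ms:.0f} ms, 90th percentile = {p90_ms} ms")
```

A controller that tracks the 90th percentile would react to the slow requests, whereas one that tracks the mean would consider this workload on target.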
31
Critiques of No “Power” Struggles
Controllers are highly coupled
Performance model is oversimplified
Coordination between VMC and EC is oversimplified
– How can CPU be allocated to VMs?
– How will DVFS affect the performance of multiple VMs?
– How about heterogeneous servers?
Lack of implementation on real hardware
32
Comparison of Two Papers
Aspect | Co-Con | No “Power” Struggles
Performance metric | Response time | Percentage of work done
Number of levels | 3 | 5
Coordination | Two control loops are designed independently with coordination analysis | Control loops are coupled with APIs
Evaluation | Testbed | Simulation
Power-aware VM consolidation | No | Yes
Stability proof | Time domain + z-domain | Time domain
33
Q&A
Acknowledgments: Some slides are adapted from the slides of Vanish Talwar
34
Backup Slides
35
Cluster-level CPU Resource Coordinator
36
Response Times and CPU Allocation of the VMs Under Different CPU Frequencies
37
Response Times and CPU Allocation of the VMs Under Different Workloads
38
VMC in No “Power” Struggles
39
Controllers in No “Power” Struggles