Server-level Power Control. Ming Chen.

Presentation transcript:

1 Server-level Power Control Ming Chen

2 Motivations (1) Clusters of hundreds or even thousands of servers; They occupy one room of a building or even a whole building; Servers are racked in cabinets at high density; Cabinets are arranged in rows and columns to fill a whole room.

3 Motivations (2) (From the Spring 2005 Data Center User’s Group Conference, "The Adaptive Data Center: Managing Dynamic Technologies".) Power and energy consumption have become key concerns in data centers; Solutions: − Peak power management to decrease the cost of cooling systems and power delivery systems; − Power-efficient design to improve performance per watt.

4 Outline Power management for CPU; Server-level Power Control (paper 1); Formal Control Techniques for Power-Performance Management (paper 2); Comparison between the two papers.

5 Why CPU Power Management? The CPU is the most used actuator in power management; The CPU accounts for the majority of a server's total power consumption: − More than 60% of the total power consumption. Well-documented interfaces to adjust power scaling: − P-states; − T-states.

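6 CPU Power Knob (1): P-states DVFS (dynamic voltage and frequency scaling); Vendor implementations: PowerNow!, SpeedStep, Cool’n’Quiet. [Figure: plot with axes labeled f and p]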

7 CPU Power Knob (2): T-states [Figure: plot with axes labeled duty cycle and p]

8 Server-level Power Control Charles Lefurgy, Xiaorui Wang and Malcolm Ware IBM Research, Austin University of Tennessee, Knoxville

9 Motivations Workload varies a lot; The worst cases of power consumption are very rare; Yet the cooling system and power delivery system are over-provisioned for them. Goal: − Manage peak power to avoid unnecessarily over-provisioned cooling systems and power delivery systems.

10 Control Options Open-loop: − No measurement of power; − Choose a fixed speed for a given power budget; − Based on the most power-hungry workload. Ad-hoc control: − Measure power and compare it with the power budget; − Raise or lower the performance state by one level based on the comparison.
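The ad-hoc option can be sketched as a single step function. This is a minimal illustration, not the controller evaluated in the paper: the function name, the level range 0 to 7, and the convention that a higher level means more throttling are all assumptions made for the example.

```python
def adhoc_step(level, measured_w, budget_w, min_level=0, max_level=7):
    """One ad-hoc control step: move exactly one throttling level per period.

    Assumption (hypothetical convention): a higher level throttles more,
    so it lowers both power and performance.
    """
    if measured_w > budget_w:
        level += 1   # over budget: throttle one level harder
    elif measured_w < budget_w:
        level -= 1   # under budget: give back one level of performance
    # Keep the result inside the range of levels the hardware offers.
    return max(min_level, min(max_level, level))
```

Because the step size ignores how far the measurement is from the budget, a controller of this kind tends to keep stepping up and down around the budget instead of settling on it.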

11 Contributions The first paper that manages the peak power of a single server with a closed-loop control system; A feedback controller based on control theory; Detailed derivation and analysis of the stability and accuracy; Empirical results in a physical hardware system; Better application performance than previous methods

12 Platform IBM BladeCenter HS20 blade server with Intel Xeon processors; Power constraint: 250 W; The power supply must not be overloaded for more than 1 second.

13 System Modeling(1) Power changes almost immediately (within 1 ms) when the performance state changes; The model is obtained by curve fitting; Which A to choose?

14 System Modeling(2)

15 Controller Design(1) First-order delta-sigma modulator: − Map a series of discrete throttling levels to the floating-point output of the controller; − For example: 6.2 is discretized as 6, 6, 6, 6, 7, 6, 6, 6, 6, 7; Controller: P controller; Plant:
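The 6.2 example can be reproduced with a small error-feedback loop. This is a generic first-order modulator written for illustration; the slides do not give the exact formulation used in the paper, so the function name and the choice to round down and carry the remainder are assumptions. `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction
import math

def delta_sigma(target, steps):
    """Emit integer levels whose running average tracks a fractional target."""
    carry = Fraction(0)              # quantization error carried to the next step
    levels = []
    for _ in range(steps):
        total = Fraction(target) + carry
        level = math.floor(total)    # largest integer level not above the total
        carry = total - level        # remember the part that was left out
        levels.append(level)
    return levels

print(delta_sigma(Fraction("6.2"), 10))   # [6, 6, 6, 6, 7, 6, 6, 6, 6, 7]
```

The ten emitted levels sum to 62, so their average is exactly the requested 6.2.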

16 Controller Design(2) Different workloads on the same server have different slopes; The same workload on different servers has different slopes. Slope variation: Minimal prototype: Real model:

17 Performance Analysis Stability: 0 < g < 2; Steady-state error; Settling time.
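The stability bound can be checked numerically. The plant and controller equations did not survive in this transcript, so the sketch below assumes a common form: the plant is p(k+1) = p(k) + A·d(k), the P controller computes d(k) = (P_s − p(k)) / A_model, and g = A / A_model is the ratio of the real slope to the modeled one. Under that assumption the error obeys e(k+1) = (1 − g)·e(k), which shrinks only when |1 − g| < 1, that is, 0 < g < 2. The set point, the starting power, and the function name are hypothetical.

```python
def simulate(g, set_point=250.0, p0=200.0, steps=60, a_model=1.0):
    """Run the assumed closed loop and return the final power in watts."""
    a_real = g * a_model             # real slope differs from the model by g
    p = p0
    for _ in range(steps):
        d = (set_point - p) / a_model    # P controller designed on the model
        p = p + a_real * d               # plant responds with its real slope
    return p

for g in (0.5, 1.0, 1.9, 2.5):
    # The residual error shrinks toward zero only for 0 < g < 2.
    print(g, abs(simulate(g) - 250.0))
```

With g = 1 the model is exact and the loop reaches the set point in one period; for g between 1 and 2 the power overshoots and oscillates but still converges; beyond 2 the oscillation grows.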

18 System Architecture Power Monitor − Hardware that measures power at 1000 samples/second; − Firmware in the service processor averages the power measurements; Controller − Computes the ideal throttling level; Actuator − Maps the floating-point throttling level to discrete levels and writes the CPU register to throttle the clock.

19 Comparison with Ad-hoc controller(1)

20 Comparison with Ad-hoc controller(2) Set points range from 180 W to 260 W in 1 W increments; P4MAX is used; The average of three runs is plotted; The P controller has a precision of 0.1 W; The safety margin of the ad-hoc controller is 6.1 W.

21 Comparison of Three Controllers Open-loop set point – P4MAX without violation of the power budget; P controller set point – the power budget reduced by the 2% measurement error; Ad-hoc controller set point – 6.1 W lower than the P controller set point.

22 Application Performance

23 Conclusion A control-theoretic peak power management solution for servers is presented; Better control performance and application performance than two baselines; Stability, settling time and zero steady state error are analyzed based on control theory.

24 Critiques Peak power management vs. performance per watt; Clock throttling + DVFS: what is the solution? High-precision measurement hardware is required, which is not available to everyone.

25 Formal Control Techniques for Power-Performance Management Qiang Wu, Philo Juang, Margaret Martonosi, Li-Shiuan Peh, Douglas W. Clark Princeton University

26 Background: MCD (multiple clock domains) Each function block operates with an independent clock; Advantages: − less clock distribution; − less clock skew; − less power consumption; − DVFS flexibility. Queue structures are used between domains for efficiency. [Figure: four clock domains (Ifetch/Decode, INT exec, FP exec, Ld/St exec) with frequencies f1, f2, f3, f4]

27 Basic Idea Adapt frequency to workload changes; capability > demand: capability is wasted; capability < demand: degraded performance; Queue occupancy – gives clues about capability and demand; – serves as a feedback signal to control the domain frequency. [Figure: DVFS controller, labeled with queue occupancy q and frequency f]

28 System Modeling(1) [Figure: queue q between two clock domains, labeled with arrival rate λ, service rate μ, clock domain demand, and clock domain frequencies f1 and f2]

29 System Modeling(1) [Figure: same diagram as slide 28]

30 System Modeling(2) [Figure: the queue diagram from System Modeling(1)] λ_t and μ_t: independent and stationary random processes; Each control period T includes N sampling periods Δt; q'_k is the controlled variable.

31 System Linearization f is the manipulated variable, but it enters the model nonlinearly; It is generally hard to design an effective controller for a nonlinear system; Fortunately, the nonlinear part of this system can be separated.

32 Controller Design PI controller – Proportional gain (K_p) – Integral gain (K_i)
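A discrete PI controller driven by queue occupancy might look like the sketch below. It is illustrative only: the slides give neither the gains nor the exact control law, so the class name, the normalized frequency range, the sign convention (a queue fuller than the reference occupancy `q_ref` raises the frequency), and all numeric values are assumptions. Anti-windup handling is left out for brevity.

```python
class PIFrequencyController:
    """Positional PI controller mapping queue occupancy to a domain frequency."""

    def __init__(self, kp, ki, q_ref, f_min, f_max, f_init):
        self.kp = kp            # proportional gain K_p
        self.ki = ki            # integral gain K_i
        self.q_ref = q_ref      # reference queue occupancy
        self.f_min = f_min
        self.f_max = f_max
        self.f_init = f_init    # frequency used when the error is zero
        self.integral = 0.0     # running sum of past errors

    def update(self, q):
        error = q - self.q_ref          # positive when the queue is too full
        self.integral += error
        f = self.f_init + self.kp * error + self.ki * self.integral
        # Clamp to the frequency range the domain supports.
        return min(self.f_max, max(self.f_min, f))


ctrl = PIFrequencyController(kp=0.05, ki=0.01, q_ref=4, f_min=0.25, f_max=1.0, f_init=0.5)
print(ctrl.update(8))   # queue above the reference, so the frequency goes up
```

The proportional term reacts to the current occupancy error, while the integral term keeps adjusting until the average occupancy matches the reference.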

33 Energy-Performance Tradeoff How aggressively to save energy, or to preserve performance? A simple lever – the q_ref position: – Increase q_ref – more aggressive in saving energy; – Decrease q_ref – value performance more. Software/hardware cooperation: – Software – makes overall tradeoff decisions; – Hardware – implements the details of speed adaptation.

34 Experiments(1) – Illustrative Example Benchmark: Epic_Decode. [Figure: panels showing frequency settings and queue entries]

35 Experiments Results Simulator: SimpleScalar + Wattch power estimation extension + MCD processor extension; Benchmarks: 18 benchmarks.

36 Extension for CMPs (1) Using task queues; Dependency among parallel application threads: – Parallel sections require all threads to finish before moving on. Two valid assumptions: − The tile with the highest queue occupancy is on the critical path. − The tile on the critical path should run at full speed. What is the solution?

37 Extension for CMPs (2) – Dist_PID q_ref is the performance lever; Each tile estimates its q_target; The tiles exchange their q_target values; The tile with the highest q_target is identified as being on the critical path; The other tiles set their q_ref to the highest q_target.
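The coordination step can be sketched as a pure function over the exchanged values. This is a simplified illustration rather than the Dist_PID implementation: the function name and the dictionary representation are assumptions, and the exchange between tiles is modeled as if every tile already had all values.

```python
def coordinate_q_ref(q_targets):
    """Pick the critical tile and the q_ref the other tiles should adopt.

    q_targets maps a tile id to that tile's locally estimated q_target.
    Returns (critical_tile, q_refs), where q_refs covers every other tile.
    """
    # The tile with the highest q_target is treated as the critical path.
    critical = max(q_targets, key=q_targets.get)
    highest = q_targets[critical]
    # All remaining tiles use the highest q_target as their reference.
    q_refs = {tile: highest for tile in q_targets if tile != critical}
    return critical, q_refs


critical, q_refs = coordinate_q_ref({"t0": 3.0, "t1": 7.5, "t2": 5.0})
print(critical, q_refs)   # t1 {'t0': 7.5, 't2': 7.5}
```

Raising q_ref on the non-critical tiles lets them slow down and save energy, while the critical tile is left to run at full speed.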

38 Experiment for Dist_PID Simulator: modified Xtrem (a validated SimpleScalar ARM simulator); Dist_PID achieves a lower energy-delay product (EDP) than Local_PID, i.e., a better energy-performance tradeoff.

39 Conclusion A control-based solution for power-performance tradeoffs of MCD processors and CMPs is presented; An analytical queue model between different MCDs is analyzed; Based on the PI controller for MCDs, a Dist_PID is introduced for CMPs; Simulation results are provided to verify the performance of the controllers.

40 Critiques What are the effects of λ on the stability or accuracy of the controller? The simulation results are not convincing enough; Dist_PID is compared only with Local_PID. How about other solutions for CMPs? What is the overhead or delay of exchanging q_target in Dist_PID?

41 Comparison between the two papers
Aspect | Server-level Power Control | Control for Power-Performance
Control target | CPU (both papers) |
Control level | System-level | Component-level
Model | Curve-fitting | Analytical
Controller | P controller | PI controller
Goals | Peak power management | Power-performance tradeoffs
Experiments | Physical testbed | Simulations

42 Thanks!