Optimal Power Allocation in Server Farms

Presentation transcript:

Optimal Power Allocation in Server Farms. Anshul Gandhi (Carnegie Mellon Univ.), Mor Harchol-Balter (Carnegie Mellon Univ.), Rajarshi Das (IBM, T.J. Watson), Charles Lefurgy (IBM, Austin).

U.S. Data Center Energy Consumption. [Figure: energy consumption per year in billions of kWh, with labeled values 12 billion kWh, 50 billion kWh and 120 billion kWh, and a cost label of $8.4 billion.] Source: EPA report to Congress on Server and Data Center Energy Efficiency, 2007.

Let's start by looking at why power is important. The graph here illustrates the total energy consumption of U.S. data centers per year. As we can see, energy consumption increased by more than a factor of 4 between the years 2000 and 2006, and is projected to go up by almost a factor of 10 by the year 2011. In terms of money, that would amount to almost 7.4 billion dollars. Now that's a lot of money!

Goal: Get the best performance from the power, P, that we have.

Thus, we focus our attention on data centers. A data center is made up of many racks of servers. Each rack looks something like this. We shall refer to a rack of servers as a server farm. Each server farm is limited to a fixed power consumption of P. Our goal is to get the best performance out of the rack, given the fixed power limit P.

Goal: How to split P to minimize mean response time? The right answer can improve performance by up to 5X. Constraint: P ≥ P1 + P2 + P3.

In this talk, we will look at mean response time as our performance metric. The response time of a job is defined as the time from when the job enters the system until it departs. A natural question is how to split the power P among the servers in the server farm, giving them, say, P1, P2 and P3. Our constraint, therefore, is that P should be greater than or equal to the sum of P1, P2 and P3. As we will show, the right answer can improve server mean response time by up to a factor of 5.

Note that we are assuming a fixed power limit P, and we want to minimize the mean response time. Thus, we don't care whether the total power consumption is equal to P or less than P. When a data center is set up, it is typically provisioned for a maximum power consumption. The cooling requirements, the circuit breakers and the alternate power supplies are all built with this power limit in mind. So in this talk, we simply assume we have a fixed power limit P.

Power Efficient Load Balancer. Input: total power P; speed scaling (frequency = server speed); workload; arrival rate; open vs. closed; max speed, min speed. Output: P1, P2, P3 and q1, q2, q3. [Figures: frequency (GHz) vs. power (Watts) curves.]

So here is our server farm. What we will do is create a power efficient load balancer that will magically output the power distribution, P1, P2 and P3, and also the load distribution, q1, q2 and q3. Here q1 represents the fraction of incoming load sent to server 1. Similarly, we have q2 and q3 for servers 2 and 3.

As its input, the power efficient load balancer takes a long list of factors. The first is the total power P. The second is the speed scaling technology in a server. By speed scaling, we mean the mechanism by which a server can reduce its power consumption by running at a lower clock frequency. Depending on the technology used, the power-to-frequency mapping for a server can vary. In these graphs, we have power consumed on the x-axis and server frequency on the y-axis. When we say frequency, we mean the speed of the server. Also, we will assume that we have a homogeneous server farm. Thus, all the servers will have the same speed scaling technology.

Of course, the workload running on the server farm can affect the way in which you will split power among the servers. This is because the workload can change the power-to-frequency relationship even for a given speed scaling technology. There are also other important factors, such as the arrival rate of the workload, and whether we have an open loop or a closed loop workload configuration. By open loop, we mean a server farm in which the arrivals are external to the system. This is similar to a web server. By closed loop, we mean a server farm where the number of jobs in the farm is fixed. Think of this as a ** There are various other factors, such as the maximum server speed and the minimum server speed.

We will take all these factors into account when building our power efficient load balancer.

Outline. Experimental Setup. Power → Speed: how power affects server speed for a single server. Speed → Response time: how response time of server farm depends on individual server speeds. Optimal power allocation: theorems and experiments.

Here's the outline for the rest of the talk. We'll start by describing our experimental setup. Then we'll move on to understanding how power affects the server speed for a single server. Obviously, as we allocate more power to a server, it will run faster. But exactly how fast it will run depends on many factors. The power-to-speed relationship depends on the scaling technology and has many parameters. Also, when we say power, we mean system power, and not just the processor power. So it is not obvious how speed scaling affects the total power of the system. Thus, this understanding of the power-to-speed relationship is an important contribution of our work.

Then we'll look at how the performance depends on the individual server speeds of all the servers in the server farm. Finally, we'll look at both theoretical and experimental results for optimal power allocation in server farms. Here, we'll show you how to find the optimal power allocation for your server farm, as a function of all the inputs that we mentioned before. So let's get started with our experimental setup.

Experimental Setup. IBM BladeCenter HS21: rack with 7 blade servers. Blade: Intel Xeon 5000 series, 3 GHz, quad core, 4 GB RAM. Scaling tech.: DFS, DVFS, DVFS+DFS. Workload: CPU bound (LINPACK, DAXPY); memory bound (STREAM); other (WebBench, GZIP, BZIP2).

As before, here's our server farm setting. The rack of servers that we use is an IBM BladeCenter HS21, with 7 blades. Each blade is an Intel Xeon 5000 series server with a 3 GHz quad core CPU and 4 GB of memory. Each blade is equipped with 3 different speed scaling technologies. We'll talk about these in detail in the next few slides. Finally, we experiment with a range of workloads. These include CPU bound workloads such as Intel's LINPACK and the DAXPY workload, memory bound workloads such as STREAM, and various other workloads. However, for this talk, we'll only show results for CPU bound LINPACK and memory bound STREAM.

Outline. Experimental Setup. Power → Speed: how power affects server speed for a single server. Speed → Response time: how response time of server farm depends on individual server speeds. Optimal power allocation: theorems and experiments.

Next, I'll talk about how power affects server speed within a single server.

Our Experimental Results: how power affects server speed for a single server. DFS: Dynamic Frequency Scaling, "linear". P = system power, NOT processor power. [Figure: frequency in GHz (server speed) vs. power in Watts for DFS.]

The first scaling technology we consider is DFS, Dynamic Frequency Scaling. Here, we lower the power consumption of the server by reducing its clock frequency directly. As you can see from the graph, as power decreases, server frequency also decreases. The graph here is for CPU bound LINPACK. We analytically track this curve using a simple linear fit. Here s is the server speed or frequency, and P is the power coming into the server. The constants are as follows: Pmin is the lowest power in the graph, which is 180 watts. Smin is the lowest speed in the graph, which is around 1.2 GHz. Finally, alpha is the slope of the linear fit. Though it is widely believed in the literature that power and frequency have a cubic relationship, we find that a linear fit works well for DFS. This is because we are considering system power, and not just the processor power, which theoretically has a cubic dependence on frequency.
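The linear fit itself did not survive in this transcript. Going only by the constants defined in the narration (Pmin, Smin and the slope alpha), it presumably has the form below; the exact notation used on the slide is an assumption.

```latex
s = s_{\min} + \alpha \, (P - P_{\min})
```

With the values quoted for DFS under LINPACK (P_min = 180 watts, s_min around 1.2 GHz), this reads s ≈ 1.2 + α (P − 180), with s in GHz and P in watts. The value of α is not stated in the transcript.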

Our Experimental Results: how power affects server speed for a single server. Our measurements. [Figure: two rows of three panels (DFS, DVFS, DVFS+DFS), each plotting frequency (GHz) against power (Watts). One row is "LINPACK", CPU bound; the other is "STREAM", memory bound.]

The next speed scaling technology we consider is DVFS, which is shown here in blue. In DVFS, we lower the server frequency and the voltage together. This leads to greater savings in power than DFS. Finally, we consider a mixture of DVFS and DFS, shown here in black. Note that DVFS+DFS looks more like a cubic than DFS and DVFS do. So far, all these curves only deal with the CPU bound LINPACK workload. Recall that we are looking at system power and not just the processor power. We'll now see how these graphs change when we use a memory bound workload, STREAM.

If you notice, the curves for STREAM are mostly cubic, even for DFS and DVFS. This is because of the following: at extremely low server frequencies, the bottleneck for STREAM is the CPU. So every watt of power added to the system at such low frequencies goes into improving the CPU clock speed. After a point, the bottleneck for STREAM becomes the memory subsystem. So every watt of power added to the system at high frequencies is used up by the memory subsystem, and the improvement in CPU frequency is minimal.

Outline. Experimental Setup. Power → Speed: how power affects server speed for a single server. Speed → Response time: how response time of server farm depends on individual server speeds. Optimal power allocation: theorems and experiments.

Now that we have some understanding of how power relates to server speed within a single server, let's look at how the individual server speeds affect the response times of jobs in the server farm. This is non-trivial. To see how non-trivial, we have a pop quiz.

Pop Quiz (high arrival rate). 1. Given P = 720W and DVFS, which allocation is better: PowMin (180|180|180|180) or PowMax (240|240|240|0)? 2. Given P = 720W and DFS, which allocation is better: PowMin (180|180|180|180) or PowMax (240|240|240|0)? [Figures: DVFS results and DFS results, each showing response time (sec) for 180 x 4 vs. 240 x 3.]

Before we do so, let's try to answer some questions about power allocation. First, assume you have a total power of 720 watts, and your servers are all using DVFS, which is the blue line in the graph. Which power allocation do you think would be better for minimizing mean response time? Your first choice is having 4 slow servers at 180 watts each. We'll call this choice PowMin. Or would you rather have 3 fast servers at 240 watts each, which we'll call PowMax? How about the same question, except this time your servers are all using DFS, the red line in the graph? PowMax is ten times better than PowMin!

If you look at the graph on the bottom left, for DFS, you'll see that PowMin corresponds to a very low server frequency of around 1 GHz. So 4 servers would mean a total of 4 GHz of speed. Whereas for DVFS, PowMin corresponds to almost 2.5 GHz, which is quite good. So 4 servers would mean a total of 10 GHz of speed. So you might think this is the reason behind the results. Well, all these results were for high arrival rates. What happens when we look at low arrival rates?

Pop Quiz (low arrival rate). 1. Given P = 720W and DVFS, which allocation is better: PowMin (180|180|180|180) or PowMax (240|240|240|0)? 2. Given P = 720W and DFS, which allocation is better: PowMin (180|180|180|180) or PowMax (240|240|240|0)? [Figures: DVFS results and DFS results, each showing response time (sec) for 180 x 4 vs. 240 x 3.]

How about the case where we use DVFS, the blue line? Surprisingly, PowMin is now worse than PowMax. So we see a reversal of results for DVFS. How about DFS? Well, the results are the same as before for DFS! If you found the results of the pop quiz surprising, we'll now explain them and show you how a simple theoretical model allows us to predict all these results.

Abstract Model of Server Farm. Each server: Processor Sharing. Poisson arrivals with rate λ jobs/sec. [Diagram: the power efficient load balancer splits power P into P1, P2, P3 and the load into fractions q1, q2, q3, across servers with speeds s1, s2, s3.]

To understand the effects of server speeds and arrival rate on the mean response time of the system, we need to build an abstract model of our farm. Here's our familiar server farm, with power P coming into the system. The load balancer splits this power into individual server powers. Using queueing theory, we model each server as a queueing system. Each server does processor sharing among its jobs. This means that if we have n jobs at a server, they each receive one nth of the server speed. This is similar to running jobs on a UNIX time sharing machine. Now the power at each server corresponds to some server speed. We denote these speeds by s1, s2 and s3. Of course, we have some workload coming into the server farm. We model the arrival stream as a Poisson arrival process, with some rate of lambda jobs per second. Our load balancer will also output the fraction of load going into each server.

Response Time for Server Farm. Mean Resp. Time: E[T] = q1/(s1 − λq1) + q2/(s2 − λq2) + q3/(s3 − λq3), non-linear in si and qi. If λ is low: PowMin results in poor utilization of some servers, so PowMax is preferred. If λ is high: all servers are well utilized, and the choice of PowMin vs. PowMax depends on the scaling tech. (Note: recall PowMin and PowMax; generic definition.)

Using queueing theory, we can show that the mean response time of this system is as follows: q1 divided by s1 minus lambda times q1, plus q2 divided by s2 minus lambda times q2, and so on. Observe that the mean response time is non-linear in server speeds and arrival rate. Thus, we can't simply look at the sum of server speeds in PowMin and PowMax, in the quiz, and find the optimal power allocation. How about the arrival rate, lambda? If lambda is really low, we'll have very few jobs in our server farm at any moment in time, right? Thus, using PowMin, which leads to many slow servers, will result in poor utilization of some servers. This is why PowMax was preferred in the quiz for low arrival rates. When lambda is high, all servers are well utilized. So finding the optimal power split is not straightforward. It will depend on other factors, such as the scaling technology used by the servers.
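The mean response time expression narrated above can be evaluated directly to see the pop quiz reversal. The sketch below is illustrative only: it assumes unit-size jobs (so a speed in GHz is treated as a service rate in jobs/sec), an equal load split across the servers that are on, and speeds of roughly 1 GHz (DFS) and 2.5 GHz (DVFS) at 180 watts and 3 GHz at 240 watts, read loosely off the narration. The function names are hypothetical.

```python
def mean_response_time(speeds, fractions, arrival_rate):
    """Return sum_i q_i / (s_i - lambda * q_i) over the servers that receive load.

    Assumes unit-size jobs, so each speed s_i is a service rate in jobs/sec.
    """
    total = 0.0
    for s, q in zip(speeds, fractions):
        if q == 0:
            continue  # a server that receives no load contributes nothing
        if s <= arrival_rate * q:
            raise ValueError("overloaded server: need s_i > lambda * q_i")
        total += q / (s - arrival_rate * q)
    return total


def equal_split(n_servers, speed, arrival_rate):
    """Mean response time with n identical servers and load split evenly (an assumption)."""
    fractions = [1.0 / n_servers] * n_servers
    return mean_response_time([speed] * n_servers, fractions, arrival_rate)


# DVFS (assumed speeds): PowMin = 4 servers at 2.5, PowMax = 3 servers at 3.0.
for lam in (1.0, 8.0):
    print("DVFS", lam, "PowMin:", equal_split(4, 2.5, lam), "PowMax:", equal_split(3, 3.0, lam))

# DFS (assumed speeds): PowMin = 4 servers at 1.0, PowMax = 3 servers at 3.0.
for lam in (1.0, 3.0):
    print("DFS", lam, "PowMin:", equal_split(4, 1.0, lam), "PowMax:", equal_split(3, 3.0, lam))
```

Under these assumed numbers, PowMax gives the lower value for DVFS at the low arrival rate and PowMin at the high one, while PowMax is lower for DFS at both rates, which matches the qualitative pattern described in the quiz. The magnitudes are not meant to reproduce the measured results.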

Outline. Experimental Setup. Power → Speed: how power affects server speed for a single server. Speed → Response time: how response time of server farm depends on individual server speeds. Optimal power allocation: theorems and experiments.

Now that we have some intuition for how power relates to speed, and how speed and arrival rate relate to mean response time, let us look at our results for optimal power allocation.

Power Allocation Choices. PowMin: e.g. P = 720W, PowMin = 4 x 180. PowMax: e.g. P = 720W, PowMax = 3 x 240. PowMed: e.g. P = 720W, PowMed = 3 x 210. [Figure: frequency (GHz) vs. power for DFS, DVFS and DVFS+DFS, with 180, 210 and 240 marked on the power axis.]

We'll first formally define certain power allocation choices. The graph here shows DFS, DVFS and DVFS+DFS for CPU bound LINPACK. We define three parameters for these graphs: Pmin is the lowest power consumption, which is 180 watts. Pmax is the highest power consumption, which is 240 watts. Pknee is the power consumption at the knee of DVFS+DFS. In our case, this is 210 watts.

We define PowMin as the power allocation where we run P/Pmin servers at power Pmin each, and all other servers are off. For example, if we had a total power of 720 watts, PowMin would mean running 4 servers at 180 watts each. Likewise, we define PowMax as running P/Pmax servers at Pmax watts each. This would mean 3 servers at 240 watts each in our example. Finally, we define PowMed as running P/Pknee servers at Pknee watts each. Note that in certain cases, such as the last example, we don't have enough power left to turn on additional servers. Throughout the rest of the talk, we will choose the total power P to be a near exact multiple of Pmin, Pmax and Pknee, so we don't have wastage.
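The three allocation choices amount to a small piece of arithmetic, sketched below. The helper name `allocation` is hypothetical, and rounding the server count down is an assumption inferred from the remark that leftover power may be too small to turn on another server.

```python
def allocation(total_power, per_server_power):
    """Return (servers switched on, watts left unused) for one allocation choice.

    Assumes the server count is rounded down, so leftover power stays unused.
    """
    n_on = int(total_power // per_server_power)
    unused = total_power - n_on * per_server_power
    return n_on, unused


# Values quoted in the talk for CPU bound LINPACK.
P_MIN, P_KNEE, P_MAX = 180, 210, 240

for name, per_server in (("PowMin", P_MIN), ("PowMed", P_KNEE), ("PowMax", P_MAX)):
    n_on, unused = allocation(720, per_server)
    print(f"{name}: {n_on} x {per_server} W, {unused} W unused")
```

For P = 720 watts this gives 4 x 180 and 3 x 240 with nothing left over, and 3 x 210 with 90 watts unused, which is the wastage case mentioned for PowMed.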

Power Allocation Theorems. INPUTS (system parameters): speed scaling technology (linear steep, linear flat, cubic); workload type; Pmin, Pmax; arrival rate (2 regimes: λ < λ0, λ ≥ λ0); open vs. closed workload configuration. THEOREMS. OUTPUT (optimal power allocation): PowMin, PowMax or PowMed.

The optimal power allocation, as we have seen, depends on various system parameters, such as the speed scaling technology, which can be variants of linear, or cubic. It also depends on the workload type, since that affects the power-to-speed relationship. Pmin and Pmax are also important, since they define our allocation choices. They can depend on the speed scaling technology. Clearly, the arrival rate is important. Our theorems show that it matters whether you are above a certain threshold arrival rate, which we derive, or below it. Another important factor is whether we have an open workload configuration or a closed workload configuration. For this talk, we'll only look at open configurations. Theorems and experimental results for closed configurations can be found in our paper. We have come up with theorems that take all these factors into account and output the optimal power allocation. We have found that the optimal power allocation is one of PowMin, PowMax and PowMed only, and no other allocation.

Power Allocation Results: Outline. CPU bound "LINPACK": DFS, DVFS, DVFS+DFS. Memory bound "STREAM".

The results will be presented as follows: we'll first look at the CPU bound workload LINPACK. We'll look at DFS, DVFS and DVFS+DFS. Then we'll look at the results for memory bound STREAM.

Power Allocation Results. CPU bound "LINPACK": DFS. [Figure: frequency (GHz) vs. power (Watts) for DFS.]

Alright, so our first result deals with DFS, which is the red curve here. Our first theorem tells us that if the speed scaling is linear and steep, then you always want to use PowMax. Alpha here is the slope of the speed scaling. Since DFS is steep, we predict PowMax to be optimal for mean response time. Let's see what our experiments tell us. The y-axis in this graph is the mean response time, so lower is better. The total power value P in this experiment was 720 watts. Thus PowMin would mean 4 servers at 180 watts and PowMax would mean 3 servers at 240 watts. As you can see, PowMax, the grey curve, is below PowMin, the green curve, for all arrival rates. Thus, we have correctly predicted the optimal power allocation in the case of DFS. Also notice that the difference in mean response time is as much as a factor of 5 for high arrival rates.

Power Allocation Results. CPU bound "LINPACK": DVFS. [Figure: frequency (GHz) vs. power (Watts) for DVFS.]

Next, we consider DVFS, the blue curve. Our second theorem tells us that if the speed scaling is linear and flat, such as DVFS, then the optimal allocation depends on the arrival rate. If the arrival rate is low, then PowMax is optimal. However, if the arrival rate is high, then PowMin is optimal. Experimentally, we find that this is exactly the case. Note how the green line, PowMin, produces lower response times than the grey line, PowMax, at high arrival rates. Again, our theorems have correctly predicted the optimal power allocation.

Power Allocation Results. CPU bound "LINPACK": DVFS+DFS. [Figure: frequency (GHz) vs. power (Watts) for DVFS+DFS.]

Finally, we look at DVFS+DFS, which is the downward concave curve here. Our third theorem tells us that if the speed scaling is cubic, like DVFS+DFS, then the optimal allocation depends upon the arrival rate. If the arrival rate is low, then PowMax is optimal. If the arrival rate is high, then PowMed is optimal. That is exactly what we see in our experiments. Note how the brown line, PowMed, achieves lower mean response times than the grey line, PowMax, at high arrival rates. For the sake of completeness, we also show PowMin, here in green. Note that PowMin is not good at all in this case.

Power Allocation Results. Memory bound "STREAM": DFS, DVFS, DVFS+DFS. [Figure: three panels (DFS, DVFS, DVFS+DFS), each plotting mean response time (sec) against arrival rate (jobs/sec).]

For STREAM, recall that all the speed scaling mechanisms had a cubic relationship. For a cubic, we expect PowMax to be optimal at low arrival rates and PowMed to be optimal at high arrival rates. For DFS and DVFS, this is exactly what we observe in our experiments, although the difference is very small. However, for DVFS+DFS, we find that PowMax, the grey line, is optimal throughout the range of arrival rates. This is because the threshold value above which PowMed is optimal is greater than the arrival rates we have here.

Conclusions: How to allocate power optimally. Speed scaling linear and steep: PowMax at low arrival rate, PowMax at high arrival rate. Speed scaling linear and flat: PowMax at low arrival rate, PowMin at high arrival rate. Speed scaling cubic: PowMax at low arrival rate, PowMed at high arrival rate.

I would like to conclude this talk with a pictorial description of our optimal power allocation algorithm. This is quite easy to follow.
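The decision tree on this slide can be written as a small lookup, sketched here. The function name, the string labels for the scaling shapes and the `threshold` parameter are hypothetical; the threshold stands in for the arrival rate λ0 that the theorems derive, whose value is not given in the talk.

```python
# (low arrival rate choice, high arrival rate choice) for each speed scaling shape,
# as listed on the conclusions slide.
DECISION_TABLE = {
    "linear_steep": ("PowMax", "PowMax"),
    "linear_flat": ("PowMax", "PowMin"),
    "cubic": ("PowMax", "PowMed"),
}


def optimal_allocation(scaling_shape, arrival_rate, threshold):
    """Pick PowMin, PowMed or PowMax for an open workload configuration.

    threshold is a caller-supplied stand-in for lambda_0; arrival rates at or
    above it are treated as the high regime.
    """
    low_choice, high_choice = DECISION_TABLE[scaling_shape]
    return low_choice if arrival_rate < threshold else high_choice


print(optimal_allocation("linear_flat", arrival_rate=8.0, threshold=6.0))
```

The table only covers open configurations, since the talk defers closed configurations to the paper.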