Thermal Aware Server Provisioning (TASP) and Workload Distribution (TAWD) for Internet Data Centers (IDCs) Zahra Abbasi, Georgios Varsamopoulos and Sandeep.

1 Thermal Aware Server Provisioning (TASP) and Workload Distribution (TAWD) for Internet Data Centers (IDCs)
Zahra Abbasi, Georgios Varsamopoulos and Sandeep Gupta
Impact Laboratory, School of Computing, Informatics and Decision Systems Engineering, Arizona State University
http://impact.asu.edu/
Funded in part by NSF CNS grants #0834797 and #0855277, and by Intel Corp.

2 Introduction and motivation
The magnitude of data center energy consumption
- Internet users worldwide grew 400% from 2000 to 2009 [http://www.internetworldstats.com/stats.htm]
- Data center energy consumption grew 20-30% annually in 2006 and 2007 [Uptime Institute research]
Addressing energy saving for Internet data centers
- Thermal awareness to reduce energy consumption
[Figure: Projected electricity use of data centers, 2007 to 2011, showing historical energy use and the future energy use projection under the current efficiency trend. Source: EPA]
[Figure: Typical data center energy end use. Source: Department of Energy]

3 The BlueTool project http://impact.asu.edu/BlueTool/

4 Talk outline
- Why thermal awareness for data centers?
- Opportunities for energy saving in IDCs
- TASP and TAWD
- Modeling IDCs:
  - Software: two-tier architecture
  - Hardware: performance and power consumption
- Heuristics for TASP and TAWD
- Simulation model and results
- Experimental validation of TASP and TAWD

5 Typical layout of a data center
- Rack outlet temperature: T_out
- Rack inlet temperature: T_in
- Computer Room Air Conditioner (CRAC) supply temperature: T_sup
- Redline temperature: T_red (of T_in)
- Heat recirculation
[Figure: data center layout annotated with T_out, T_in and T_sup; temperatures in Fahrenheit. Source: Uptime Institute research]

6 Possible ways to save energy in IDCs
Resizing the active server set
- Dynamically changing the number of active servers according to long-term traffic fluctuations (over a couple of hours) [Chase et al. SOSP ’01], [Chen et al. NSDI ’08]
DVFS
- Adapting CPU speed to the incoming workload during fine time slots [Ranganathan et al. ISCA ’06]
Virtualization
- Consolidating applications onto a few physical machines with respect to their performance requirements [Kusic et al. CCJ ’09]
Thermal awareness
- Considering the servers’ thermal impact on the cooling system. A server’s thermal impact is tightly related to heat recirculation.

7 Why thermal awareness?
PUE (Power Usage Effectiveness): a metric of data center energy efficiency
  $PUE = \dfrac{\text{Total Facility Power \{cooling, IT equipment, lighting, other\}}}{\text{IT Equipment Power}}$
- A large PUE value indicates a large cooling energy
Current state of PUE: EPA estimated values of PUE for 2011 (2007 report)
  Scenario               PUE
  Current Trends         1.9
  Improved Operations    1.7
  Best Practices         1.3
  State-of-the-Art       1.2

8 Improving cooling energy by thermal aware task placement [Moore et al. ATEC ’05], [Tang et al. T-PDS ’08]
- Task placement determines the IDC’s thermal profile
- Servers thermally interfere with each other through recirculated heat
- Heat recirculation is uneven and creates hot spots
- The CRAC must supply sufficient cooling to keep hot spots under the redline temperature
- Thermal aware task placement can reduce heat recirculation and hot spots, and so improve cooling efficiency

9 Thermal aware task placement
Servers thermally interfere with each other through recirculated heat
Linear model for the heat recirculation [Tang et al. T-PDS ’08]:
  $T_{in} = T_{sup} + D \times P$
  where $T_{in}$ holds the inlet temperatures, $T_{sup}$ the supplied air temperatures, $D$ is the heat distribution matrix and $P$ the power vector
[Figure: recirculation among nodes N1, N2, N3 and the AC, with coefficients d_11, d_12, d_13, d_21, d_31 and temperatures T_sup, T_AC,in and T_in]
Coefficient of Performance of the CRAC [Source: HP]:
  $CoP = \dfrac{\text{heat removed}}{\text{work required to remove the heat}}$
  $T_{sup}$: CRAC supply temperature
Cooling power:
  $P_{cooling} = P_{computing} / CoP(T_{sup}) = P_{computing} / CoP(T_{red} - \max_i (D_i P))$
  The term $\max_i (D_i P)$ is directly affected by task placement
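The linear recirculation model and the cooling-power expression on slide 9 can be sketched numerically as follows. This is only an illustration under stated assumptions: the quadratic CoP curve is the one reported for an HP chilled-water CRAC in Moore et al. (the slide cites HP but does not give the formula), temperatures are taken in degrees Celsius, and the matrix `D`, the power vector `P` and the function names are hypothetical.

```python
def cop(t_sup):
    """Assumed CoP curve (Moore et al., HP CRAC unit); t_sup in degrees Celsius."""
    return 0.0068 * t_sup ** 2 + 0.0008 * t_sup + 0.458


def inlet_rise(D, P):
    """Per-server inlet temperature rise D x P caused by recirculated heat."""
    return [sum(d * p for d, p in zip(row, P)) for row in D]


def cooling_power(D, P, t_red):
    """Return (cooling power, supply temperature) for a given power vector.

    The CRAC supply temperature is set as high as possible while keeping the
    hottest inlet at the redline: T_sup = T_red - max_i (D_i P).
    """
    t_sup = t_red - max(inlet_rise(D, P))
    return sum(P) / cop(t_sup), t_sup


# Hypothetical two-server example: D in kelvin per watt, P in watts.
D = [[0.01, 0.02],
     [0.00, 0.01]]
P = [200.0, 100.0]
p_cool, t_sup = cooling_power(D, P, t_red=25.0)
```

Moving load away from the server that contributes most to the hottest inlet lowers `max(inlet_rise(D, P))`, which raises the allowable supply temperature and therefore the CoP.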

10 Measuring thermal efficiency: LRH
Thermal efficiency: least contribution to the heat recirculation
LRH: a metric of the thermal efficiency of a server [Tang et al. T-PDS ’08]
- Based on a two-layer rank calculation
  - Rank the servers as recipients of heat recirculation
  - Rank the servers as contributors of heat recirculation
- LRH weight of S = Σ_recipients (recipient value × amount of heat from S to recipient)
Example: LRH ranking of servers A and B
[Figure: the direction and amount of heat recirculation from servers A and B; the LRH rank of server B is worse than that of A]

11 Measuring thermal efficiency: LRH
Thermal efficiency: least contribution to the heat recirculation
LRH: a metric of the thermal efficiency of a server [Tang et al. T-PDS ’08]
- Based on a two-layer rank calculation
  - Rank the servers as recipients of heat recirculation
  - Rank the servers as contributors of heat recirculation
- LRH weight of S = Σ_recipients (recipient value × amount of heat from S to recipient)
Example: LRH ranking of servers A, B and C with respect to their heat recipients
              Incoming heat to the recipients of heat recirculation
              Low        Medium     High
  Server A:   low        low        medium
  Server B:   medium     medium     high
  Server C:   high       medium     high
  (entries: contribution to the heat recirculation; low means better LRH rank)
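A minimal sketch of the LRH weight formula above, under assumptions the slides do not spell out: the heat distribution matrix `D` is indexed so that `D[i][s]` is the heat arriving at recipient `i` from server `s`, and the "recipient value" of a server is taken to be the total recirculated heat it receives. The function name and the example matrix are hypothetical.

```python
def lrh_weights(D):
    """LRH weight per server: sum over recipients of value x heat sent.

    Layer 1: rank servers as recipients (total incoming heat).
    Layer 2: weigh each contributor by where its heat lands.
    """
    n = len(D)
    recipient_value = [sum(D[i]) for i in range(n)]
    return [
        sum(recipient_value[i] * D[i][s] for i in range(n))
        for s in range(n)
    ]


# Hypothetical two-server room: server 1 sends more heat to an
# already heat-loaded recipient, so its LRH weight is larger (worse).
D = [[0.0, 0.2],
     [0.1, 0.0]]
weights = lrh_weights(D)
ranking = sorted(range(len(D)), key=lambda s: weights[s])  # best first
```

A lower weight means the server contributes less to recirculation at already-hot recipients, which matches the "low means better LRH rank" note on the slide.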

12 TASP and TAWD
TASP
- Saving energy by choosing the active server set according to the servers’ computing power efficiency (Joules/MIPS) AND thermal efficiency (e.g. LRH)
- TASP runs at long time intervals (a couple of hours) called epochs
TAWD
- Saving more energy by skewing the workload toward thermally efficient and computing-power-efficient servers in fine time slots
Constraints
- Maintain performance [response time]
- Prevent redlining of servers

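13 Two-tier architecture for IDCs
[Figure: incoming traffic of λ requests/sec reaches a load dispatcher in front of Server 1, Server 2, Server 3, …, Server N-1, Server N. The TASP tier (epochs) exercises on/off control over the servers; the TAWD tier (slots) supplies the per-server rates {λ_i} to the dispatcher, with ∑ λ_i = λ. Inputs to the tiers: heat recirculation contribution, computing capabilities of the machines, and computing power efficiency. In the example, servers that are switched off receive no load (λ_1 = 0, λ_3 = 0, λ_{N-1} = 0) while the active servers receive λ_2, …, λ_N. Legend: traffic flow, parameters, control data.]
[Figure: HTTP requests over time, 1998 FIFA World Cup. x-axis: time index (every 5 seconds); y-axis: number of requests]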

14 TASP and TAWD: Problem statements
TASP:
- Input: data center server set S with N servers, epoch length (T), history of arrival rate
- Find: active server set Ŝ ⊆ S, where |Ŝ| = n ≤ N
- Objective: minimizing total energy
- Constraint: performance requirements [i.e. response time]
TAWD:
- Input: active server set Ŝ, L fine time slots of length t (T = Lt), history of arrival rate
- Find: for each time slot m (1 ≤ m ≤ L), the workload distribution among the active servers: λ_im ∀ s_i ∈ Ŝ
- Objective: minimizing total energy
- Constraint: performance requirements [i.e. response time]

15 TASP analytical formulation: Prerequisites
Energy consumption modeling
- Assumption: dynamic CRAC temperature setting
- $E_{total} = E_{computing} + E_{cooling}$, with $E_{cooling} = E_{computing} / CoP(T_{sup})$
Performance modeling
- Observed relationship between CPU utilization, arrival rate and turnaround time
- Model:
[Figure: measurements on dual-CPU dual-core E7520-chipset “Sossaman” Xeon LV systems; x-axis: arrival rate (requests/sec); marked points: λ_max and u_thresh]

16 TASP analytical formulation: Prerequisites
Power consumption modeling
- Linear relationship between power consumption and utilization
[Figure: power versus utilization, rising linearly from the idle power ω at utilization 0 to the maximum power ω + α at utilization 1]
Workload prediction
- Request_arrival_peak = Request_arrival_avg + Request_arrival_std_dev
- Kalman filtering
- Λ_peak
- Active server set size overestimation factor (>1)
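The two prerequisites on slide 16 can be illustrated with a short sketch. Several things here are assumptions rather than the authors' method: a plain sample mean and population standard deviation stand in for the Kalman filter, the overestimation factor is applied as a simple multiplier on the predicted peak, and the function and parameter names are hypothetical.

```python
import statistics


def server_power(utilization, idle_w, alpha_w):
    """Linear power model: idle power (omega) plus alpha times utilization in [0, 1]."""
    return idle_w + alpha_w * utilization


def predicted_peak(arrival_history, overestimation=1.0):
    """Peak arrival rate = average + standard deviation of past rates.

    Assumption: the sample mean replaces the Kalman-filtered average, and the
    overestimation factor scales the resulting peak.
    """
    avg = statistics.mean(arrival_history)
    std = statistics.pstdev(arrival_history)
    return overestimation * (avg + std)


# Hypothetical request rates (requests/sec) observed in earlier epochs.
history = [100.0, 120.0, 140.0]
peak = predicted_peak(history, overestimation=1.2)
```

A larger overestimation factor provisions more headroom against misprediction at the cost of keeping more servers active.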

17 Formulating TASP: Optimization problem
Unknown variable
- How many servers are required?
- Which servers among all servers should be chosen as the active server set?
- Defined as a binary vector x (e.g. x: 100101…..); each element determines whether a server is chosen or not
Objective: minimizing total energy consumption, made up of a computing power term and a heat recirculation term
Constraints:
- Meet the capacity requirement
- x is a binary vector

18 Heuristic approaches for TASP
MinMax approximation
- Numerical approximation: SQP (Sequential Quadratic Programming)
- Ŝ determined by discretizing the real-valued solution
- High time complexity (QP: $O(n^5)$)
sLRH (scaled LRH)
- Ranking servers based on heat recirculation and computing performance:
  $sLRH_i = \dfrac{\text{least recirculated heat } (LRH_i)}{\text{computing efficiency (MIPS)}}$
- Ŝ = the highest-ranking servers sufficient for the peak request arrival
CP-sLRH (Computing Power efficiency and sLRH)
- Servers are first ranked according to their computing power efficiency (J/MIPS) and then according to sLRH
- Ŝ determined as in sLRH
[Figure: heat recirculation of two server series, Series1 and Series2; computing power efficiency: Series1 > Series2]
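The sLRH and CP-sLRH rankings can be sketched as a sort followed by a greedy cut-off. This is a hedged illustration, not the authors' implementation: each server is assumed to have a known maximum sustainable arrival rate (`max_rate`, in the spirit of λ_max on slide 15), lower sLRH and lower J/MIPS are treated as better, and all field names and numeric values are hypothetical (the LRH coefficients merely echo the A to D example on slide 29).

```python
def slrh(server):
    """Scaled LRH: recirculated-heat weight per unit of computing capability."""
    return server["lrh"] / server["mips"]


def cp_slrh_key(server):
    """CP-sLRH: rank by computing power efficiency first, then by sLRH."""
    return (server["joules_per_mips"], slrh(server))


def provision(servers, peak_rate, key=slrh):
    """Pick the best-ranked servers until their capacity covers the peak."""
    active, capacity = [], 0.0
    for server in sorted(servers, key=key):
        if capacity >= peak_rate:
            break
        active.append(server["name"])
        capacity += server["max_rate"]
    if capacity < peak_rate:
        raise ValueError("not enough capacity for the predicted peak")
    return active


SERVERS = [
    {"name": "A", "lrh": 0.69, "mips": 1000.0, "joules_per_mips": 0.2, "max_rate": 100.0},
    {"name": "B", "lrh": 0.36, "mips": 1000.0, "joules_per_mips": 0.2, "max_rate": 100.0},
    {"name": "C", "lrh": 0.00, "mips": 1000.0, "joules_per_mips": 0.3, "max_rate": 100.0},
    {"name": "D", "lrh": 0.00, "mips": 1000.0, "joules_per_mips": 0.2, "max_rate": 100.0},
]

by_slrh = provision(SERVERS, peak_rate=150.0)
by_cp_slrh = provision(SERVERS, peak_rate=150.0, key=cp_slrh_key)
```

In this toy data, sLRH alone selects the two servers with no recirculation contribution, while CP-sLRH passes over the less power-efficient server C and accepts some recirculation from B instead.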

19 Heuristic solution for TAWD
- Ranking servers based on CP-sLRH
- Giving the maximum affordable workload to the highest-ranking servers
- Skewing the workload toward thermally efficient servers, rather than using the performance-oriented Load Balancing (LB) solution
[Figure: per-server utilization under LB versus under TAWD]
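The greedy skewing described on slide 19 can be sketched next to plain load balancing. This is a minimal illustration under assumptions: the active servers are already sorted best-first by CP-sLRH, each has a hypothetical per-slot capacity that represents its "maximum affordable workload", and the function names are invented for the example.

```python
def tawd_distribute(ranked_capacities, total_rate):
    """Fill the best-ranked servers to capacity before using the next one."""
    shares, remaining = [], total_rate
    for capacity in ranked_capacities:
        share = min(capacity, remaining)
        shares.append(share)
        remaining -= share
    if remaining > 0:
        raise ValueError("active server set cannot carry the slot's workload")
    return shares


def lb_distribute(ranked_capacities, total_rate):
    """Baseline load balancing: equal share for every active server."""
    n = len(ranked_capacities)
    return [total_rate / n] * n


# Three active servers, best CP-sLRH rank first, 100 requests/sec each.
capacities = [100, 100, 100]
skewed = tawd_distribute(capacities, 150)
balanced = lb_distribute(capacities, 150)
```

Both schemes serve the same total workload in the slot; TAWD concentrates it on the servers whose heat is least likely to raise the hottest inlet temperature.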

20 Evaluation
Baseline algorithms
- TASP versus CPSP (Computing Power efficient Server Provisioning)
- TAWD versus LB
Simulation
- Heterogeneous data center
- Model of the ASU HPCI data center (PUE > 1.3)
  - Heat recirculation obtained using CFD
  - 50 computing nodes (1000 cores)
Experimental validation
- Using carton boxed Sossaman systems
[Figure: ASU HPCI data center]

21 Validating thermal awareness
TASP validation:
- NoSP: all servers are equally utilized (25%)
- TASP (sLRH): the two thermally efficient servers are utilized at 50% and the other two machines are turned off
- CPSP (thermally oblivious): the two thermally inefficient servers are utilized at 50% and the other two are turned off
TAWD validation:
- LB: all servers are equally utilized, with their utilization fluctuating over fine time slots (30 seconds)
- TAWD: the workload is skewed toward thermally efficient servers in fine time slots, such that the total workload at any moment equals that of the LB scenario

22 Simulation: Workload model
- SPECweb2009 benchmark suite (e-commerce)
- Synthesizing SPECweb2009 + 1998 FIFA World Cup traces
[Figure: HTTP requests over time, 1998 FIFA World Cup]
[Figure: Request arrival rate over time of SPECweb2009; epoch-level peaks are obtained from the 1998 FIFA World Cup traces]

23 Simulation results: TASP versus CPSP
[Figure: Energy saving with respect to CPSP for different TASP schemes over different values for]

24 Simulation results: TASP-TAWD versus TASP-LB
[Figure: Energy saving of TAWD with respect to LB]

25 Conclusion
Extra energy saving through the choice of servers
- The choice of TASP approach does not affect QoS violations
- The TASP MinMax approach yields the maximum energy saving
- CP-sLRH, the low-complexity heuristic approach, can be used for large-scale data centers
More energy saving by combining TASP with TAWD
- TAWD improved cooling energy by 3%

26 Future work
- BlueTool project (ongoing): http://impact.asu.edu/BlueTool/wiki/index.php/Main_Page
- Enhancing thermal modeling
  - Considering the dynamic behavior of cooling systems
- Virtualization, Internet multi-tier applications

27 THANKS
Questions?

28 References
[Moore et al. ATEC ’05] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making scheduling "cool": temperature-aware workload placement in data centers,” in ATEC ’05: Proceedings of the annual conference on USENIX Annual Technical Conference.
[Tang et al. T-PDS ’08] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach,” IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 11, pp. 1458–1472, 2008.
[Chase et al. SOSP ’01] J. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle, “Managing energy and server resources in hosting centers,” in SOSP ’01: Proceedings of the eighteenth ACM symposium on Operating systems principles. New York, NY, USA: ACM, 2001, pp. 103–116.
[Chen et al. NSDI ’08] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, “Managing server energy and operational costs in hosting centers,” SIGMETRICS Performance Evaluation Review, vol. 33, no. 1, pp. 303–314, 2005.
[Ranganathan et al. ISCA ’06] P. Ranganathan, P. Leech, D. Irwin, and J. Chase, “Ensemble-level power management for dense blade servers,” in ISCA ’06: 33rd International Symposium on Computer Architecture, 2006, pp. 66–77.
[Kusic et al. CCJ ’09] D. Kusic, J. O. Kephart, J. E. Hanson, N. Kandasamy, and G. Jiang, “Power and performance management of virtualized computing environments via lookahead control,” Cluster Computing, vol. 12, pp. 1–15, 2009.

29 Experimental setup to validate TASP and TAWD
[Figure: four servers A, B, C and D with their heat recirculation coefficients]
LRH weights:
- $LRH_A = 0.69 P^2$
- $LRH_B = 0.36 P^2$
- $LRH_C = 0$
- $LRH_D = 0$

30 Experimental results
TASP validation:
- NoSP: all servers are equally utilized (25%)
- TASP (sLRH): the two thermally efficient servers are utilized at 50% and the other two machines are turned off
- CPSP (thermally oblivious): the two thermally inefficient servers are utilized at 50% and the other two are turned off
TAWD validation:
- LB: all servers are equally utilized, with their utilization fluctuating over fine time slots (30 seconds)
- TAWD: the workload is skewed toward thermally efficient servers in fine time slots, such that the total workload at any moment equals that of the LB scenario

31 System model and assumptions
Heterogeneous data center
- Different computing efficiency (MIPS)
- Different computing power efficiency (Joules/MIPS)
- The solutions can also be applied to homogeneous data centers
Heat recirculation in the data center room
- Computing racks are organized in hot aisles and cold aisles
- Heat recirculation among computing nodes
- Different thermal efficiency

32 Energy consumption model of the data center
$E_{total}$ = computing power + cooling power
Heat recirculation model:
  $T_{in} = T_{sup} + D \times P$
  where $T_{in}$ holds the inlet temperatures, $T_{sup}$ the supplied air temperatures, $D$ is the heat distribution matrix and $P$ the power vector
[Figure: Coefficient of Performance curve (source: HP)]
Improving cooling energy by minimizing the maximum of the servers’ inlet temperatures
[1] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, “Making scheduling "cool": temperature-aware workload placement in data centers,” in ATEC ’05: Proceedings of the annual conference on USENIX Annual Technical Conference. Berkeley, CA, USA: USENIX Association, 2005, pp. 5–5.
[2] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach,” IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 11, pp. 1458–1472, 2008.

33 Prerequisites of the analytical formulation for TASP
Energy consumption modeling
- Total energy = computing power + cooling power
Computing power consumption modeling
- Linear model with respect to utilization, starting from the idle power
Performance modeling (response time)
- Imposing a cap on the CPU utilization
Workload modeling
- Kalman filtering to predict the average traffic
- Workload arrival during fine time slots

34 Formulating TASP: Optimization problem
Unknown variable
- How many servers are required?
- Which servers among all servers should be chosen as the active server set?
- Defined as a binary vector x (e.g. x: 100101…..); each element determines whether a server is chosen or not
Objective: minimizing total energy consumption, made up of a computing power term and a heat recirculation term
Constraints:
- Meet the capacity requirement
- x is a binary vector

35 Formulating TAWD
Unknown variable:
- The workload distribution weights
Objective: minimizing total energy consumption during a slot
Constraints:
- Performance constraint
- Capacity constraint
Solution: using a heuristic approach (CP-sLRH)
- Ranking servers based on CP-sLRH
- Giving the maximum affordable workload to the highest-ranking servers

36 Evaluation
Baseline algorithms
- TASP with respect to CPSP
- TAWD with respect to LB
Evaluation methods
- Experiments (small scale)
- Simulation

37 Simulation results: The performance of various TASP approaches
Saving energy over time: MinMax and CP-sLRH always surpass CPSP; sLRH may perform worse than CPSP when the active server set becomes large
[Figure: Energy consumption of the thermal aware server provisioning scenarios over time (intervals in epochs). MinMax always does better than CPSP.]
[Figure: Request arrival rate over time of SPECweb2009, where epoch-level peaks are obtained from the 1998 FIFA World Cup traces]

38 Simulation results: Energy saving with respect to the overestimation factor (ϒ)
- The more overestimation, the less energy saving
- The more overestimation, the fewer QoS violations
- The smaller the active server set, the larger the saving
[Figure: Energy saving with respect to CPSP for different TASP schemes over ϒ. Note that higher utilization yields higher savings.]
[Figure: CPU utilization violations with respect to ϒ over time. Violations for ϒ = 1 are much higher than for the other values.]
[Figure: Energy saving of MinMax over different active server set sizes for ϒ = 6.]

39 Simulation results: The performance of various TASP approaches
[Figure: Total energy consumption for each server provisioning scenario. The energy-saving percentages are with respect to CPSP.]

40 Simulation results: Performance of TAWD
Saving energy by skewing the workload toward thermally efficient servers
[Figure: Average utilization of each server in the data center (over one week), sorted with respect to LRH. The effect of TAWD’s load skewing on the utilization is clearly visible.]

41 System model and assumptions: Software assumptions
Virtualized data center
- All systems are capable of running any web application
Internet traffic
- Short transaction-based traffic
- Short-term and long-term variation
http://www.internetworldstats.com/stats.htm

42 TASP Algorithm

43 TAWD Algorithm

44 How effective are TASP and TAWD?
Simulation setup follows the physical layout of the ASU HPCI data center and combines the SPECweb traffic profile with the 1998 FIFA World Cup web trace
Evaluating TASP compared to CPSP (Computing Power based Server Provisioning)
- Energy savings from 4.5% to 8.4%, depending on the TASP scenario and on the overestimation of the active server set size
Evaluating TAWD compared to LB (Load Balancing)
- 1% more energy saved by combining TASP and TAWD

