Sensor-Based Fast Thermal Evaluation Model For Energy Efficient High-Performance Datacenters Q. Tang, T. Mukherjee, Sandeep K. S. Gupta Department of Computer.

1 Sensor-Based Fast Thermal Evaluation Model For Energy Efficient High-Performance Datacenters Q. Tang, T. Mukherjee, Sandeep K. S. Gupta Department of Computer Sc. & Engg. Arizona State University & Phil Cayton, Intel Corp.

2 Heating problem in Data Center Power densities are increasing exponentially along with Moore's Law. Current cooling solutions at various levels: chip/component level, server/board level, rack level, data center level

3 Two steps of reducing heating effects Design and deployment stage (Civil & Mechanical Engineering Approach): increasing air conditioner capacity; designing an optimized layout to facilitate air circulation. Operation stage (Computer Science Approach): for example, dynamically assigning tasks to avoid overheated servers and to achieve thermal balancing; assigning tasks to servers that consume less energy

4 Thermal Management of Datacenter Motivation and significance Compute Intensive Applications (Online Gaming, Computer Movie Animation, Data Mining) requiring increased utilization of Data Center Maximizing computing capacity is a demanding requirement New blade servers can be packed more densely Energy cost is rising dramatically Goal Improving thermal performance Lowering hardware failure rate Reducing energy cost

5 Typical layout of a datacenter Rack outlet temperature T_out, rack inlet temperature T_in, air conditioner supply temperature T_s

6 Schematic View of Thermal Management

7 Thermal-Aware Scheduling versus Datacenter Energy Cost

8 Thermal Scheduling: Problem Statement We present results of thermal-aware scheduling to improve the energy efficiency of (blade server based) datacenters. Given a total task C, how should it be divided among N server nodes so that the computing task finishes with minimal total energy cost?

9 Energy Conservation Inlet airflow: a mixture of supplied cold air and recirculated hot air. Outlet airflow. Server power consumption P_i: depends on the amount of computing task

10 Thermal Management Different task assignments result in different power consumption distributions. Different power consumption distributions result in different temperature distributions. Different temperature distributions result in different total energy costs

11 Example Inlet temperature distribution without cooling (redline threshold 25 °C). With cooling, inlet temperatures are lowered below the redline threshold. Different scheduling results in different inlet temperature distributions (Scheduling 1 vs. Scheduling 2) and in different demand for cooling load/energy

12 Total Energy Cost of Datacenter Total energy cost = computing energy cost + cooling energy cost. Cooling must keep the maximal inlet temperature below the redline temperature of devices (25 °C). Coefficient Of Performance (COP): COP = (amount of heat removed) / (energy consumed by the cooling device)
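The COP definition on slide 12 can be turned into a small worked calculation. This is a minimal sketch, not the authors' cost model: it assumes that all computing power ends up as heat the cooling device must remove, so cooling power is the computing power divided by COP. The function name, the chassis powers, and the COP value of 2.5 are hypothetical.

```python
def total_energy_cost(computing_power_w, cop):
    """Return computing power plus cooling power, in watts.

    Assumption: every watt of computing power becomes heat that the
    cooling device removes, so cooling power = heat removed / COP.
    """
    if cop <= 0:
        raise ValueError("COP must be positive")
    total_computing = sum(computing_power_w)
    cooling = total_computing / cop  # energy consumed by the cooling device
    return total_computing + cooling


# Hypothetical example: four chassis and an assumed COP of 2.5.
powers = [1000.0, 1500.0, 1200.0, 1300.0]
print(total_energy_cost(powers, cop=2.5))
```

With these numbers the computing power is 5000 W and the cooling power is 2000 W. A higher COP for the same computing load lowers only the cooling term.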

13 Observation Even with the same computing power dissipation, different temperature distributions may demand different cooling loads, resulting in different total energy costs. We can manipulate task scheduling to achieve the best temperature distribution and consequently minimize total energy cost

14 Naive Scheduling Algorithm

15 Uniform Outlet Profile Why naive: based on observation and intuition; no mathematical formalization. Uniform Outlet Profile (UOP): assigning tasks so as to achieve a uniform outlet temperature distribution; assigning more tasks to nodes with low inlet temperature (water-filling process). Figure labels: Tc, temperature rise due to power consumption, inlet temperature
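The water-filling idea behind UOP can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: it assumes each node's outlet temperature is its inlet temperature plus a rise proportional to its power, with one hypothetical constant `rise_per_watt` shared by all nodes, and it ignores per-node power limits.

```python
def uniform_outlet_profile(inlet_temps, total_power, rise_per_watt):
    """Split total_power so that all loaded nodes reach one outlet temperature.

    Assumption: T_out_i = T_in_i + rise_per_watt * P_i for every node i.
    Returns (common outlet level, list of per-node powers).
    """
    # Fill the coolest inlets first, as in classic water filling.
    order = sorted(range(len(inlet_temps)), key=lambda i: inlet_temps[i])
    level = None
    running_sum = 0.0
    for m, i in enumerate(order, start=1):
        running_sum += inlet_temps[i]
        candidate = (rise_per_watt * total_power + running_sum) / m
        # Stop once the next node's inlet already sits at or above the level.
        if m > 1 and candidate <= inlet_temps[i]:
            break
        level = candidate
    powers = [max(0.0, (level - t) / rise_per_watt) for t in inlet_temps]
    return level, powers


# Hypothetical inlet temperatures (deg C), 1000 W to place, 0.005 deg C per watt.
level, powers = uniform_outlet_profile([18.0, 20.0, 24.0], 1000.0, 0.005)
print(level, powers)
```

In this example the common outlet level is 21.5 °C, the two coolest nodes receive roughly 700 W and 300 W, and the node whose inlet is already at 24 °C receives nothing.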

16 Uniform Task Uniform Task (UT) Assigning all chassis the same amount of tasks (power consumptions) All nodes experience the same power consumption and temperature rise

17 Minimum Computing Energy Minimum computing energy (cooling inlet) Assigning tasks in a way to keep the number of active (power on) chassis as small as possible
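A minimal sketch of the "as few active chassis as possible" policy: fill one chassis to capacity before powering on the next. The function name, the uniform per-chassis capacity, and the optional `preference` ordering (for example, coolest inlet first) are assumptions made for illustration; the slides do not specify them.

```python
def minimum_computing_energy(total_tasks, capacity_per_chassis, num_chassis,
                             preference=None):
    """Pack tasks into as few chassis as possible.

    preference: optional order in which chassis are filled (hypothetical
    parameter); defaults to chassis 0, 1, 2, ...
    Returns a list with the number of tasks assigned to each chassis.
    """
    order = range(num_chassis) if preference is None else preference
    assignment = [0] * num_chassis
    remaining = total_tasks
    for i in order:
        if remaining == 0:
            break
        assignment[i] = min(capacity_per_chassis, remaining)
        remaining -= assignment[i]
    if remaining > 0:
        raise ValueError("total_tasks exceeds the capacity of all chassis")
    return assignment


# Hypothetical example: 25 tasks, 10 tasks per chassis, 5 chassis.
print(minimum_computing_energy(25, 10, 5))  # → [10, 10, 5, 0, 0]
```

Chassis left at zero tasks can stay powered off, which is what keeps computing energy low. The policy itself takes no account of where the resulting heat is recirculated.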

18 Abstract Heat Flow Model & Cross Interference Coefficients

19 Abstract Heat Flow Model Observation: airflow patterns are stable (confirmed through CFD simulation). Hypothesis: the amount of recirculated heat is stable and can be characterized. Define a_ij as the percentage of recirculated heat from node i to node j

20 Cross Interference among Server Nodes Cross Interference Coefficients (CIC): define a_ij as the percentage of recirculated heat from node i to node j. The coefficients a_ij together form the Cross Interference Matrix. Correlations among power consumption (utilization rate), temperature, and cross interference

21 Fast Thermal Evaluation Use a profiling process to calculate cross interference coefficients. Temperature prediction for a configuration of the distributed system: numerical simulation (hours) vs. fast thermal evaluation (real time). Thermal performance evaluation
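Once the cross interference coefficients are known, temperatures can be predicted from an energy balance instead of a CFD run. The sketch below is one possible reading of the abstract heat flow model, not the authors' code. It assumes that `a[i][j]` is the fraction of node i's outlet heat that reaches node j's inlet, that supplied cold air makes up the rest of each inlet airflow, and that `k[i]` (air density times flow rate times specific heat, in W/°C) is known for each node. All names are hypothetical, and the profiling step that measures `a` is not shown.

```python
def predict_temperatures(a, powers, k, t_sup, iterations=200):
    """Predict inlet and outlet temperatures from a cross interference matrix.

    Energy balance per node i (assumed steady state):
        h_i = P_i + sum_j a[j][i] * h_j,  where h_i = k[i] * (T_out_i - t_sup)
    The fixed-point iteration converges when each node recirculates less
    than 100% of its outlet heat (every row of a sums to less than 1).
    """
    n = len(powers)
    h = list(powers)  # excess heat carried by each outlet airflow, in watts
    for _ in range(iterations):
        h = [powers[i] + sum(a[j][i] * h[j] for j in range(n))
             for i in range(n)]
    t_out = [t_sup + h[i] / k[i] for i in range(n)]
    t_in = [t_out[i] - powers[i] / k[i] for i in range(n)]
    return t_in, t_out


# Hypothetical two-node example with 15 deg C supply air.
a = [[0.0, 0.1],
     [0.2, 0.0]]
t_in, t_out = predict_temperatures(a, [1000.0, 1000.0], [100.0, 100.0], 15.0)
print(t_in, t_out)
```

In this example node 0 receives 20% of node 1's outlet heat and ends up with the warmer inlet (about 17.2 °C against about 16.2 °C). With no recirculation, every inlet temperature equals the supply temperature.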

22 Recirculation Minimized Scheduling: XInt

23 Formalizing optimization problem To minimize cooling energy cost, we only need to minimize the maximal inlet temperature. The optimization problem, formalized on the abstract heat flow model, can be converted into an LP, ILP, or other linear or nonlinear problem according to different models and policies
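The "minimize the maximal inlet temperature" objective can be illustrated with a tiny exhaustive search. This stands in for the LP/ILP formulations the slide mentions and is not the XInt algorithm itself. It assumes inlet temperature rises are linear in the power vector, with a hypothetical matrix `d` whose entry `d[i][j]` is the rise at inlet i per watt consumed at node j, and that each node is either idle or fully busy.

```python
from itertools import combinations


def best_active_set(d, num_active, idle_power, busy_power):
    """Choose which nodes to load so that the peak inlet rise is smallest.

    Assumption: inlet rise at node i = sum_j d[i][j] * P_j.
    Brute force over all subsets, so only suitable for very small examples.
    Returns (peak inlet temperature rise, tuple of active node indices).
    """
    n = len(d)
    best = None
    for active in combinations(range(n), num_active):
        p = [busy_power if i in active else idle_power for i in range(n)]
        peak = max(sum(d[i][j] * p[j] for j in range(n)) for i in range(n))
        if best is None or peak < best[0]:
            best = (peak, active)
    return best


# Hypothetical 3-node matrix, in deg C per watt.
d = [[0.000, 0.002, 0.001],
     [0.001, 0.000, 0.004],
     [0.001, 0.003, 0.000]]
print(best_active_set(d, num_active=1, idle_power=0.0, busy_power=1000.0))
```

Here loading node 0 gives the lowest peak inlet rise (about 1 °C), because its exhaust heats the other inlets least. A real deployment would replace the enumeration with a proper solver.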

24 Simulation Results

25 Simulation Environment Two-row datacenter. Ten standard 42U racks. Each rack has five Dell 1855 blade servers. CFD simulation is used to evaluate temperature distribution (Flovent from Flomerics)

26 DataCenter model Node 1 Node 2 Node 5 Node 50 Node 25 Node 30

27 Cross Interference Coefficients Confirmed with datacenter reality Strong interference to neighboring nodes

28 Fast Thermal Evaluation Results Provides fast and accurate temperature prediction Practical for online real-time thermal management

29 Simulation Results: Cooling Cost

30 Simulation Results: Analysis & Summary XInt consistently outperforms all other scheduling algorithms. Compared with MinHR, XInt is more practical: task-oriented scheduling vs. power-oriented scheduling; online, real-time. XInt is mathematically formalized

31 Future Work Integrating with cluster management software platforms (Moab, Torque, etc.). Considering task priorities and time constraints

32 Questions ?

33 Related Work ConSil vs. Fast Thermal Evaluation: deduction vs. prediction; current vs. future, which is more important for proactive and preventive thermal management? MinHR vs. XInt: both characterize recirculation at similar granularities; aggregated effects vs. point to point; offline vs. online; power oriented vs. task oriented

34 Supply Heat Index (SHI) Roughly characterizes recirculation. Cannot differentiate cases with the same SHI but different temperature distributions

