Published by Claribel Pierce. Modified over 9 years ago.
1
Sensor-Based Fast Thermal Evaluation Model For Energy Efficient High-Performance Datacenters Q. Tang, T. Mukherjee, Sandeep K. S. Gupta Department of Computer Sc. & Engg. Arizona State University & Phil Cayton, Intel Corp.
2
Heating Problem in Data Centers. Power densities are increasing exponentially, in line with Moore's Law. Current cooling solutions operate at several levels: chip/component level, server/board level, rack level, and data center level.
3
Two steps of reducing heating effects. Design and deployment stage (civil and mechanical engineering approach): increasing air conditioner capacity; designing an optimized layout to facilitate air circulation. Operation stage (computer science approach): for example, dynamically assigning tasks to avoid overheated servers and achieve thermal balancing; assigning tasks to servers that consume less energy.
4
Thermal Management of Datacenter. Motivation and significance: compute-intensive applications (online gaming, computer movie animation, data mining) require increased utilization of data centers; maximizing computing capacity is a demanding requirement; new blade servers can be packed more densely; energy cost is rising dramatically. Goals: improving thermal performance; lowering the hardware failure rate; reducing energy cost.
5
Typical layout of a datacenter. Figure labels: rack outlet temperature $T_{out}$, rack inlet temperature $T_{in}$, air conditioner supply temperature $T_s$.
6
Schematic View of Thermal Management
7
Thermal-Aware Scheduling versus Datacenter Energy Cost
8
Thermal Scheduling: Problem Statement. We present results of thermal-aware scheduling to improve the energy efficiency of (blade-server-based) datacenters. Given a total task C, how should it be divided among N server nodes so that the computation finishes with minimal total energy cost?
9
Energy Conservation. Inlet airflow: a mixture of supplied cold air and recirculated hot air. Outlet airflow. Server power consumption $P_i$: depends on the amount of computing task.
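The energy balance named on this slide can be written out as a worked equation. This is a sketch under the usual steady-state assumption that all power drawn by a server leaves as heat in its outlet air. The symbols $\rho$ (air density), $f_i$ (volumetric airflow through node $i$), and $c_p$ (specific heat of air) are assumptions that do not appear on the slide.

```latex
% Heat carried out by the air = heat carried in + power dissipated by node i
\rho f_i c_p \, T_{out}^{i} = \rho f_i c_p \, T_{in}^{i} + P_i
\quad\Longrightarrow\quad
T_{out}^{i} = T_{in}^{i} + \frac{P_i}{\rho f_i c_p}
```

Under these assumptions, a node's outlet temperature rises linearly with its power consumption, which is why the task assignment shapes the temperature distribution.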
10
Thermal Management. Different task assignments result in different power consumption distributions. Different power consumption distributions result in different temperature distributions. Different temperature distributions result in different total energy costs.
11
Example (figure). Inlet temperature distributions for two schedules, Scheduling 1 and Scheduling 2, shown without cooling and with cooling applied until the inlet temperatures fall below the 25 °C redline threshold. Different schedules result in different inlet temperature distributions and therefore in different demands for cooling load and energy.
12
Total Energy Cost of Datacenter. Total energy cost = computing energy cost + cooling energy cost. Cooling must keep the maximal inlet temperature below the redline temperature of the devices (25 °C). Coefficient of Performance (COP): $\mathrm{COP} = \dfrac{\text{amount of heat removed}}{\text{energy consumed by the cooling device}}$
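The COP definition on this slide implies that the cooling device consumes (heat removed) / COP. A minimal sketch of the resulting total, assuming every watt drawn by the servers becomes heat that the cooling system must remove (an assumption, not stated on the slide). The function names and the sample numbers are hypothetical.

```python
from typing import Sequence


def cooling_power(heat_removed_w: float, cop: float) -> float:
    """Power drawn by the cooling device to remove `heat_removed_w` watts of heat."""
    if cop <= 0:
        raise ValueError("COP must be positive")
    # Rearranged COP definition: COP = heat removed / energy consumed by cooling.
    return heat_removed_w / cop


def total_power(server_powers_w: Sequence[float], cop: float) -> float:
    """Computing power plus the cooling power needed to remove it."""
    computing = sum(server_powers_w)
    # Assumption: all computing power is dissipated as heat to be removed.
    return computing + cooling_power(computing, cop)


if __name__ == "__main__":
    # Hypothetical numbers: three chassis and a COP of 2.5.
    print(total_power([300.0, 450.0, 250.0], cop=2.5))
```

For a fixed computing load, the total falls as the COP rises, so the cooling term is the part of the cost that scheduling can influence through the temperature distribution.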
13
Observation. Even with the same computing power dissipation, different temperature distributions may demand different cooling loads, resulting in different total energy costs. We can manipulate task scheduling to achieve the best temperature distribution and consequently minimize the total energy cost.
14
Naive Scheduling Algorithm
15
Uniform Outlet Profile. Why naive: based on observation and intuition; no mathematical formalization. Uniform Outlet Profile (UOP): assign tasks so as to achieve a uniform outlet temperature distribution; assign more tasks to nodes with low inlet temperature (a water-filling process). Figure labels: Tc; temperature rise due to power consumption; inlet temperature.
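The water-filling idea can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes each node's outlet temperature equals its inlet temperature plus a fixed rise per watt (the hypothetical parameter `rise_per_watt`), and it ignores node capacity limits and any effect of the assignment on the inlet temperatures themselves.

```python
def uniform_outlet_assignment(inlet_temps, total_power, rise_per_watt):
    """Water-filling: raise the coolest nodes to one common outlet temperature.

    Returns (level, powers), where `level` is the common outlet temperature and
    powers[i] is the power assigned to node i (zero for nodes whose inlet
    temperature is already at or above the level).
    """
    if not inlet_temps:
        raise ValueError("at least one node is required")
    temps = sorted(inlet_temps)
    level = temps[0]
    for k in range(1, len(temps) + 1):
        # Common level if only the k coolest nodes receive work.
        level = (total_power * rise_per_watt + sum(temps[:k])) / k
        # Stop once the level does not reach the next-coolest node's inlet.
        if k == len(temps) or level <= temps[k]:
            break
    powers = [max(0.0, (level - t) / rise_per_watt) for t in inlet_temps]
    return level, powers


if __name__ == "__main__":
    # Hypothetical inlet temperatures (deg C), 400 W of work, 0.01 deg C per watt.
    level, powers = uniform_outlet_assignment([20.0, 22.0, 26.0], 400.0, 0.01)
    print(level, powers)
```

Nodes with cooler inlets receive more power, and a node whose inlet already exceeds the common level receives none.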
16
Uniform Task (UT). Assign all chassis the same amount of tasks (power consumption). All nodes experience the same power consumption and temperature rise.
17
Minimum Computing Energy (cooling inlet). Assign tasks so as to keep the number of active (powered-on) chassis as small as possible.
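The Uniform Task and Minimum Computing Energy baselines can be contrasted in a few lines. This is a minimal sketch that assumes identical chassis and a hypothetical per-chassis `capacity` in task units. The slides do not say in which order chassis are filled, so the sketch simply fills them in index order.

```python
def uniform_task(total_tasks, n_chassis):
    """UT: give every chassis the same share of the work."""
    return [total_tasks / n_chassis] * n_chassis


def minimum_computing_energy(total_tasks, n_chassis, capacity):
    """Fill chassis one at a time so that as few as possible are powered on."""
    if total_tasks > n_chassis * capacity:
        raise ValueError("total tasks exceed the combined capacity")
    assignment = []
    remaining = total_tasks
    for _ in range(n_chassis):
        load = min(capacity, remaining)  # fill this chassis before using the next
        assignment.append(load)
        remaining -= load
    return assignment


if __name__ == "__main__":
    print(uniform_task(12, 4))
    print(minimum_computing_energy(12, 4, capacity=5))
```

With these hypothetical numbers, UT keeps all four chassis active at equal load, while the packing policy leaves one chassis idle.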
18
Abstract Heat Flow Model & Cross Interference Coefficients
19
Abstract Heat Flow Model. Observation: airflow patterns are stable (confirmed through CFD simulation). Hypothesis: the amount of recirculated heat is stable and can be characterized. Define $a_{ij}$ as the percentage of recirculated heat from node i to node j.
20
Cross Interference among Server Nodes. Cross Interference Coefficients (CIC): define $a_{ij}$ as the percentage of recirculated heat from node i to node j. The cross interference coefficients form the Cross Interference Matrix. Correlations among power consumption (utilization rate), temperature, and cross interference.
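One way the cross interference coefficients could be used to predict inlet temperatures is sketched below. It rests on assumptions not spelled out on the slides: `a[i][j]` is the fraction of node i's outlet heat that reaches node j's inlet, `k[i]` is node i's airflow heat capacity rate in W/K (air density times flow rate times specific heat), the remaining inlet air comes from the supply at temperature `t_sup`, and the steady state is found by simple fixed-point iteration. The function and parameter names are hypothetical.

```python
def predict_inlet_temps(a, k, power, t_sup, iterations=500):
    """Iterate the heat balance until inlet temperatures settle.

    a[j][i] : fraction of node j's outlet heat recirculated to node i's inlet
    k[i]    : airflow heat capacity rate of node i in W/K (assumed known)
    power[i]: power dissipated by node i in W
    t_sup   : supply air temperature
    """
    n = len(power)
    t_in = [t_sup] * n
    t_out = [t_sup] * n
    for _ in range(iterations):
        for i in range(n):
            # Heat capacity rate and heat arriving at inlet i from all outlets.
            recirc_rate = sum(a[j][i] * k[j] for j in range(n))
            recirc_heat = sum(a[j][i] * k[j] * t_out[j] for j in range(n))
            # The rest of the inlet air is supply air (requires k[i] >= recirc_rate).
            t_in[i] = (recirc_heat + (k[i] - recirc_rate) * t_sup) / k[i]
        # Each node warms its air by power / heat capacity rate.
        t_out = [t_in[i] + power[i] / k[i] for i in range(n)]
    return t_in


if __name__ == "__main__":
    # Hypothetical two-node example with weak mutual recirculation.
    a = [[0.0, 0.2], [0.1, 0.0]]
    print(predict_inlet_temps(a, k=[100.0, 100.0], power=[500.0, 1000.0], t_sup=15.0))
```

With no recirculation every inlet stays at the supply temperature. Any nonzero coefficient raises the inlet temperature of the receiving node in proportion to the sender's power.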
21
Fast Thermal Evaluation. Use a profiling process to calculate the cross interference coefficients. Temperature prediction for a configuration of the distributed system: numerical simulation takes hours, whereas fast thermal evaluation runs in real time. Either prediction feeds the thermal performance evaluation.
22
Recirculation Minimized Scheduling: XInt
23
Formalizing the optimization problem. To minimize cooling energy cost, we only need to minimize the maximal inlet temperature. The optimization problem, formalized on the abstract heat flow model, can be converted into an LP, an ILP, or another linear or nonlinear problem, depending on the models and policies.
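The min-max objective can be illustrated on a toy instance. This sketch is not the LP/ILP formulation mentioned on the slide: it uses exhaustive search, which is practical only for a handful of nodes. It also assumes a hypothetical linear model in which the inlet temperature rise of node i is the sum over j of `d[i][j] * power[j]`, and that a node's power is proportional to its task count.

```python
from itertools import product


def peak_inlet_rise(d, powers):
    """Largest inlet temperature rise over all nodes under the linear model."""
    n = len(powers)
    return max(sum(d[i][j] * powers[j] for j in range(n)) for i in range(n))


def best_assignment(d, total_tasks, capacity, watts_per_task):
    """Try every integer split of the tasks and keep the one with the lowest peak."""
    n = len(d)
    best = None
    for tasks in product(range(capacity + 1), repeat=n):
        if sum(tasks) != total_tasks:
            continue  # not a valid split of the total task
        rise = peak_inlet_rise(d, [watts_per_task * c for c in tasks])
        if best is None or rise < best[0]:
            best = (rise, tasks)
    return best


if __name__ == "__main__":
    # Hypothetical matrix: heat from node 1 raises the inlet temperature of node 0.
    d = [[0.0, 0.01], [0.0, 0.0]]
    print(best_assignment(d, total_tasks=4, capacity=4, watts_per_task=100.0))
```

In this toy case the search places all work on the node whose exhaust does not recirculate, which is the behavior the minimized-recirculation objective is meant to produce.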
24
Simulation Results
25
Simulation Environment. Two-row datacenter with ten standard 42U racks. Each rack has five Dell 1855 blade servers. CFD simulation (Flovent from Flomerics) is used to evaluate the temperature distribution.
26
Datacenter model (figure). Labeled nodes: Node 1, Node 2, Node 5, Node 25, Node 30, Node 50.
27
Cross Interference Coefficients Confirmed with datacenter reality Strong interference to neighboring nodes
28
Fast Thermal Evaluation Results Provides fast and accurate temperature prediction Practical for online real-time thermal management
29
Simulation Results: Cooling Cost
30
Simulation Results: Analysis & Summary. XInt consistently outperforms all other scheduling algorithms. Compared with MinHR, XInt is more practical: task-oriented rather than power-oriented scheduling; online and real-time. XInt is mathematically formalized.
31
Future Work. Integrating with cluster management software platforms (Moab, Torque, etc.). Considering task priorities and time constraints.
32
Questions ?
33
Related Work. ConSil vs. Fast Thermal Evaluation: deduction vs. prediction; current vs. future, which is more important for proactive and preventive thermal management. MinHR vs. XInt: both characterize recirculation at similar granularities; aggregated effects vs. point-to-point; offline vs. online; power-oriented vs. task-oriented.
34
Supply Heat Index (SHI). Roughly characterizes recirculation. Cannot differentiate between different temperature distributions that have the same SHI.
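The limitation noted on this slide can be shown with a small example. The slide does not define SHI, so the sketch assumes the commonly cited form: the sum of inlet temperature rises above the supply temperature divided by the sum of outlet temperature rises above it. The temperatures are hypothetical.

```python
def supply_heat_index(inlet_temps, outlet_temps, t_sup):
    """SHI under the assumed definition: inlet rise over outlet rise, both summed."""
    inlet_rise = sum(t - t_sup for t in inlet_temps)
    outlet_rise = sum(t - t_sup for t in outlet_temps)
    return inlet_rise / outlet_rise


if __name__ == "__main__":
    t_sup = 15.0
    outlets = [30.0, 30.0]
    even = [20.0, 20.0]    # recirculated heat spread evenly over both inlets
    skewed = [17.0, 23.0]  # same total inlet rise, concentrated on one node
    print(supply_heat_index(even, outlets, t_sup), max(even))
    print(supply_heat_index(skewed, outlets, t_sup), max(skewed))
```

Both cases give the same index, yet the skewed case has a hotter peak inlet and would therefore need more cooling to stay below the redline.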