Published by Morris Tyler. Modified over 9 years ago.
1
Software Architecture for Dynamic Thermal Management in Datacenters
Tridib Mukherjee, Graduate Research Assistant
IMPACT Lab (www.impact.asu.edu)
Department of Comp. Sc. & Engg., Arizona State University
2
Outline
- Motivation
- Dynamic Thermal Management in Datacenters
- Thermal-aware task scheduling
- Software Architecture
- Conclusions and Future work
3
Motivation
- Computing clusters are increasingly deployed in current datacenters, which are limited by power and thermal capacity.
  - High server density achieves higher computation capability but leads to high heat density.
  - Reliability and longevity of overheated servers are affected; system downtime may increase.
- Rising cost for datacenters
  - Large-scale datacenters can run into millions of dollars; cooling cost comprises almost half of this.
  - The current trend of overcooling based on worst-case thermal characteristics leads to high utility costs.
- A dynamic thermal-aware control platform is necessary for online thermal evaluation that can achieve a tradeoff between these extremes.
4
Thermal Management of Datacenter
Motivation and significance
- Compute-intensive applications (online gaming, computer movie animation, data mining) require increased utilization of the datacenter.
  - Maximizing computing capacity is a demanding requirement.
- New blade servers can be packed more densely.
- Energy cost is rising dramatically.
Goal
- Improving thermal performance
- Lowering hardware failure rate
- Reducing energy cost
5
Typical layout of a datacenter
[Figure: datacenter layout, labeled with rack outlet temperature T_out, rack inlet temperature T_in, and air conditioner supply temperature T_s]
6
Schematic View of Thermal Management
7
Research Issues of Thermal Management in Datacenter
[Diagram labels: Abstract Heat Flow Model; Power & Load Characterization; Modeling; Thermal Performance; Multiscale & Multimodal Info Analysis; Thermal Performance Evaluation; Cost Optimization; Scheduler; Other Impact Factors; Understanding; Control]
8
Task Scheduling and Thermal Distribution Correlation
Reaction chain: Task Assignment -> Power Consumption Distribution -> Temperature Distribution -> Energy Cost
Scheduling requirements
- Real-time measurement
- Online lightweight temperature prediction
- Thermal awareness in the scheduling decisions
[Diagram labels: Task Assignment; Power Consumption Distribution; Inlet temperature distribution without cooling; 25 °C; Cooling lowered; Inlet temperature lowered; Below redline threshold; Demand for cooling load/energy]
9
Thermal-aware Scheduling Techniques
- Uniform Task distribution (UT): assign all chassis the same amount of tasks (power consumption).
- Uniform Outlet Profile (UOP): assign tasks so as to balance outlet temperatures across chassis (uniform distribution).
- Minimum Computing Energy (coolest inlet) (MCE): assign tasks so as to keep the number of active (powered-on) chassis as small as possible.
- Recirculation Minimized Scheduling (XInt): use a profiling process to calculate cross-interference coefficients.
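To make two of the policies above concrete, here is a minimal Python sketch of UT and of an MCE-style "coolest inlet first" assignment. It is an illustration under stated assumptions, not the scheduler used in the talk: the function names, the per-chassis `capacity` parameter, and the dictionary-based data layout are all hypothetical, and tasks are assumed to be identical in power draw.

```python
from typing import Dict, List


def uniform_task(chassis: List[str], num_tasks: int) -> Dict[str, int]:
    """UT: spread identical tasks as evenly as possible over all chassis."""
    base, extra = divmod(num_tasks, len(chassis))
    # The first `extra` chassis each take one leftover task.
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(chassis)}


def coolest_inlet_first(
    inlet_temp_c: Dict[str, float], num_tasks: int, capacity: int
) -> Dict[str, int]:
    """MCE-style: fill chassis one at a time, coolest inlet first,
    so that as few chassis as possible end up active.

    `capacity` (tasks per chassis) is an assumed parameter.
    """
    if num_tasks > capacity * len(inlet_temp_c):
        raise ValueError("not enough capacity for all tasks")
    assignment = {c: 0 for c in inlet_temp_c}
    remaining = num_tasks
    # Visit chassis in order of increasing inlet temperature.
    for c in sorted(inlet_temp_c, key=inlet_temp_c.get):
        if remaining == 0:
            break
        take = min(capacity, remaining)
        assignment[c] = take
        remaining -= take
    return assignment
```

With three chassis and seven tasks, `uniform_task` gives every chassis two or three tasks, whereas `coolest_inlet_first` leaves the warmest chassis idle whenever the cooler ones have enough capacity. UOP and XInt need a thermal model (outlet temperatures and cross-interference coefficients) and are not sketched here.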
10
Total Energy Cost Comparisons
11
System Model & Cluster Set-up
Saguaro Cluster is the main cluster maintained by the High Performance Computing Initiative at ASU.
- 4 racks, 5 chassis per rack, 10 dual-processors per chassis
12
Cluster Management S/W Infrastructure
- We used the Moab scheduler for job allocation in this cluster.
  - Easy to use
  - Provides a good graphical interface in the form of Moab Cluster Manager (MCM)
  - Job re-allocation is allowed based on priority
  - Makes use of the underlying resource management software (such as Torque) and enforces the scheduling policies (such as fair-share) selected from the GUI
- Thermal awareness is integrated into the Moab scheduler.
  - Priority is set as a function of temperature, utilization, etc.
- PHP-based datacenter visualization.
[Diagram labels: Moab Cluster Management GUI; Moab Server; Resource Management (Torque); Data Center]
13
Chassis Level Sensor Data Collection
- An SNMP-based script periodically queries sensors and updates the server database.
- A PHP script periodically accesses the database to present the thermal history on the webpage.
Sensor placement at each chassis*
- 11 outlet temperature sensors at the back of the chassis
- 3 housing temperature sensors at the middle of the chassis
* There is only one inlet sensor at the front of the chassis.
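The collection step described on this slide (query the sensors periodically, store the readings in a database) could look roughly like the Python sketch below. It is an assumption-laden illustration: the slides do not show the actual script, so the table layout, the function names, and the use of SQLite are hypothetical, and the SNMP query itself is replaced by a caller-supplied `read_sensors` function rather than a real SNMP library call.

```python
import sqlite3
import time
from typing import Callable, Dict, Iterable, Optional

# Hypothetical reader type: chassis id -> {sensor name: temperature in Celsius}.
SensorReader = Callable[[str], Dict[str, float]]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (assumed) readings table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(ts REAL, chassis TEXT, sensor TEXT, temp_c REAL)"
    )


def poll_once(
    conn: sqlite3.Connection,
    chassis_ids: Iterable[str],
    read_sensors: SensorReader,
    now: Optional[float] = None,
) -> int:
    """Query every chassis once and store one row per sensor reading."""
    ts = time.time() if now is None else now
    rows = [
        (ts, chassis, sensor, temp)
        for chassis in chassis_ids
        for sensor, temp in read_sensors(chassis).items()
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def fake_reader(chassis_id: str) -> Dict[str, float]:
    """Stand-in for the SNMP query; returns fixed illustrative values."""
    return {"inlet": 20.0, "outlet_1": 31.5, "housing_1": 27.0}
```

A periodic collector would call `poll_once` in a loop with `time.sleep` between iterations, and a visualization layer would read the thermal history back from the same table.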
14
Visualization and Scheduler Integration
- Temperature data is included as a Generic Metric (GMETRIC) in Moab.
- Node priority is set based on Moab GMETRIC data.
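As a rough illustration of setting node priority from temperature data, the Python sketch below ranks nodes by a weighted combination of thermal headroom and utilization. This is not Moab configuration and not the priority function used in the talk, which the slides do not give; the formula, the weights, and the 25 °C redline default are assumptions chosen only to show the idea.

```python
from typing import Dict, List, Tuple


def node_priority(
    temp_c: float,
    utilization: float,
    redline_c: float = 25.0,  # assumed redline threshold
    w_temp: float = 1.0,      # assumed weight on thermal headroom
    w_util: float = 10.0,     # assumed weight on utilization (0.0 to 1.0)
) -> float:
    """Higher priority for cooler, less utilized nodes."""
    headroom = redline_c - temp_c
    return w_temp * headroom - w_util * utilization


def rank_nodes(metrics: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order node names from most to least preferred.

    `metrics` maps node name -> (temperature in Celsius, utilization).
    """
    return sorted(
        metrics,
        key=lambda name: node_priority(*metrics[name]),
        reverse=True,
    )
```

In this sketch a cool, lightly loaded node is preferred over a cool, busy one, which in turn is preferred over a node running close to the redline.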
15
Putting it all together: Software Architecture
[Diagram labels: Presentation; Scheduling; Control; Datacenter Servers; Access data from the chassis level sensors]
16
Modularized Implementation of Thermal Awareness in Task Scheduling
17
Conclusions
- Proposed architecture
  - enables dynamic on-line thermal management during datacenter operation
  - provides visualization of thermal distribution
- Implemented in a fully operational ASU datacenter.
- Prototype development and demonstration at the Research @ Intel day.
18
Questions?