Thermal-aware Issues in Computers IMPACT Lab. Part A Overview of Thermal-related Technologies.


Thermal-aware Issues in Computers IMPACT Lab

Part A Overview of Thermal-related Technologies

Importance of thermal management
► Cooling cost is very high:
  • Providing cool air: costs as much power as is consumed in computation
  • Bringing the cooling medium (air/liquid) to the circuitry: new density requires $2/Watt of material/equipment for ICs of 40+ Watts
► Excessive heat accelerates material degradation
► Power density will only increase in the future

Thermal management at various levels
► Physical dimension
  • At IC level
  • At chassis/case level
  • At room level
► Software dimension
  • Firmware level
  • Operating system level
  • Middleware level
  • Application level
Sources: Intel, Apple, Berkeley Lab

At integrated circuit level
► Issues
  • Higher temperature → increased power leakage
  • Increased power leakage → higher temperature
  • Heat density: hot spots
► Applied solutions
  • Dynamic Voltage Scaling
  • Dynamic Frequency Scaling
  • Clock gating (“pause” mode)
► Research solutions
  • Redundant circuitry
    ► Redundant “cores” [Chapparro 2004]
    ► Redundant pipelines [Lim 2002]
    ► Switch from one circuit to the other, either regularly or when the temperature exceeds set levels

At chassis/case level
► Issues
  • Fan capacity at low RPMs is not enough for the generated heat
  • Fan noise level at high RPMs is too high
► Solutions
  • Dynamic fan speed
  • CPU load balancing
  • Activity adjustments
    ► Dynamic memory bandwidth scaling [Apple TN2156]
    ► Dynamic FSB frequency scaling
Layout forces air to flow in a linear fashion
Sources: Intel, Apple
Terms: inlets, outlets

At room level
► Solutions
  • Pause execution of tasks
  • Turn machines off
► Performance impacts
  • Degraded performance
Source: Elibo, Hong Kong
Terms: hot aisle, cold aisle, raised floors, CRAC/HVAC

A typical data center
Source: Siemens
Terms: hot aisle, cold aisle, raised floors, CRAC/HVAC

CRAC & thermal maps: knowing where the hot spots are
► Purpose
  • Know the air temperature at any 3-D point
  • Adjust CRAC operation
  • Adjust computer operation
► Obtained by
  • Strategically placed sensors
  • On-board sensors
► Predicted by
  • Thorough testing
  • CFD simulations

Thermal issues in dense computer rooms (data centers, computer clusters, data warehouses)
► Heat recirculation
  • Hot air from the equipment outlets is fed back to the equipment inlets
► Hot spots
  • An effect of heat recirculation
  • Areas in the data center with alarmingly high temperature
► Impact
  • Cooling has to be set very low to keep all inlet temperatures within the safe operating range
Courtesy: Intel Labs
Terms: heat recirculation, hot spots, inlet temperatures, outlet temperatures, redline temperature, peak temperature

Thermal management solutions
[Figure: solutions plotted on two axes]
► Physical dimension: IC, case/chassis, room
► Software dimension: firmware, O/S, application (middleware)
► Solutions shown
  • Dynamic voltage scaling
  • Dynamic frequency scaling
  • Circuitry redundancy
  • Fan speed scaling
  • CPU load balancing
  • Thermal-aware JVM
  • Data center job scheduling

Part B Reducing Heat Recirculation (at room level)

Reducing heat recirculation (1)
► Heat recirculation is the only reason for increased inlet temperatures
  • Without recirculation, the inlet temperatures would equal the supplied air temperature
► The peak inlet temperature defines the CRAC operational temperature
[Figure panels: inlet temperature distribution without cooling; inlet temperature distribution with cooling; 25 °C marked]
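One way to read the second point: if each inlet's rise over the supplied air is caused by recirculation alone, the CRAC supply temperature has to be low enough that the hottest inlet stays at or below the redline temperature. A minimal sketch in Python, assuming the rises do not depend on the supply temperature (a simplification) and using made-up numbers; `required_supply_temp` is a hypothetical helper, not part of any CRAC interface.

```python
def required_supply_temp(inlet_rises_c, redline_c):
    """Highest supply temperature (°C) that keeps every inlet at or below redline.

    inlet_rises_c: per-inlet temperature rise over the supplied air caused by
                   recirculation (assumed independent of the supply temperature).
    redline_c:     maximum safe inlet temperature.
    """
    if not inlet_rises_c:
        raise ValueError("need at least one inlet")
    # The peak inlet is the binding constraint: T_supply + max(rise) <= redline.
    return redline_c - max(inlet_rises_c)


# Illustrative values only: three inlets and a redline of 25 °C.
rises = [2.0, 5.5, 9.0]
print(required_supply_temp(rises, redline_c=25.0))  # 16.0
```

Lowering the largest rise, rather than the average, is what lets the CRAC run warmer in this simplified picture.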

Reducing heat recirculation (2)
► First things first
  • Find the causes of it
  • Find ways to predict it
► What is causing it
  1. The air flow from the CRAC is not adequate to feed all inlets
  2. Imperfect layout
► Usually 1. and 2. are not adjustable once the equipment is bought and in place
  • Find other ways to reduce it

Reducing heat recirculation (3)
► Other ways to reduce it
  • Find which equipment contributes most to heat recirculation
  • Mitigate heat recirculation by throttling activity at the main contributors
    (contributor = equipment unit that is generating heat)
    (throttling activity = changing the jobs or how they are executed)
► How to know how much recirculated heat each equipment unit contributes?
  • But first: how to know how much heat each equipment unit generates? (i.e. its power profile)

Reducing heat recirculation (general plan of action)
  1. Assess the effect of a task on the equipment (CPU, memory, I/O)
  2. Assess the heat generated by the equipment from the task
  3. Assess how much of that heat is recirculated
  4. Assess the inlet temperatures given the heat recirculation
► If we had a mechanism like this
  • we could predict the effects of a running (or potentially running) job, and
  • decide about its fate according to its effects
Terms: task profile, power profile, thermal map prediction
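The four assessment steps can be chained into a single prediction function. The sketch below is illustrative only: it assumes the linear power profile P = a U + b that the slides report later, and it assumes recirculation can be summarized by a matrix `recirc` of per-watt inlet temperature rises. That matrix, the function name, and all numbers are hypothetical, not taken from the slides.

```python
def predict_inlet_temps(utilizations, a, b, recirc, supply_temp_c):
    """Chain the four assessment steps for one candidate task assignment.

    utilizations:  CPU utilization per machine in [0, 1] (the task profile).
    a, b:          coefficients of a linear power profile P = a*U + b, in watts.
    recirc:        recirc[i][j] = inlet temperature rise (°C) at machine i per
                   watt dissipated by machine j (hypothetical coefficients).
    supply_temp_c: temperature of the air supplied by the CRAC.
    """
    # Steps 1-2: task profile -> power drawn, which ends up as heat.
    powers = [a * u + b for u in utilizations]
    # Steps 3-4: recirculated heat -> inlet temperature of every machine.
    return [
        supply_temp_c + sum(r_ij * p_j for r_ij, p_j in zip(row, powers))
        for row in recirc
    ]


# Two machines; machine 0 is busy, machine 1 is idle (made-up coefficients).
temps = predict_inlet_temps(
    utilizations=[1.0, 0.0],
    a=100.0,
    b=100.0,
    recirc=[[0.0, 0.01], [0.02, 0.0]],
    supply_temp_c=15.0,
)
print(temps)
```

With such a predictor, the effect of a job can be evaluated before it is started, by calling the function once per candidate assignment.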

Task profiling (1)
► Task profiling
  • Assess how much CPU utilization, memory activity, disk I/O, network traffic, etc. the application generates
► Task profiling can be done
  • Offline, by code analyzers, or
  • Online, by test runs
► Dirty (and convenient) fact about HPC (high-performance computing):
  • Incoming jobs have highly predictable profiles

Power profiling
► Power profiling
  • Assess how much heat is generated by each component (i.e. CPU, memory, disk I/O, network, etc.)
  • Assess how much power is consumed by each component (i.e. CPU, memory, disk I/O, network, etc.)
► Power profiling is usually performed offline

Example results of power profiling
► Power consumption is mainly affected by the CPU utilization
► Power consumption is linear in the CPU utilization U:
  P = a U + b
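Given offline measurements of utilization and power, the coefficients a and b can be estimated with an ordinary least-squares fit. The sketch below is one possible way to do that; the function name and the sample measurements are invented for illustration and are not results from the slides.

```python
def fit_power_profile(utilizations, powers):
    """Ordinary least-squares fit of P = a*U + b; returns (a, b)."""
    n = len(utilizations)
    if n != len(powers) or n < 2:
        raise ValueError("need at least two matching (U, P) samples")
    mean_u = sum(utilizations) / n
    mean_p = sum(powers) / n
    var_u = sum((u - mean_u) ** 2 for u in utilizations)
    if var_u == 0:
        raise ValueError("utilization samples must not all be equal")
    cov_up = sum((u - mean_u) * (p - mean_p)
                 for u, p in zip(utilizations, powers))
    a = cov_up / var_u        # watts per unit of utilization
    b = mean_p - a * mean_u   # power drawn at zero utilization, in watts
    return a, b


# Made-up measurements: utilization as a fraction, power in watts.
a, b = fit_power_profile([0.0, 0.5, 1.0], [150, 200, 250])
print((a, b))  # (100.0, 150.0)
```

Here b plays the role of idle power and a the extra power drawn at full utilization.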

A simple thermal model
[Figure labels: from A/C; to A/C; power consumed; from other machines; to other machines]

Effect of CPU utilization on outlet temperature
► Task profiling
  • Assess how much CPU utilization the application generates
► Outlet temperature is a function of utilization plus the inlet temperature:
  T_outlet = f(U) + T_inlet
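The slides do not give the form of f(U). One plausible choice, shown here purely as an assumption, is a steady-state energy balance: all power drawn by the machine becomes heat carried away by the air passing through it, so f(U) = (a U + b) / (ρ · c_p · airflow). The air properties are approximate textbook values; the function name and the airflow figure are hypothetical.

```python
AIR_DENSITY = 1.19        # kg/m^3, approximate value near room temperature
AIR_SPECIFIC_HEAT = 1005  # J/(kg*K), approximate


def outlet_temp(utilization, inlet_temp_c, a, b, airflow_m3_s):
    """Outlet temperature (°C) under an assumed steady-state energy balance.

    utilization:  CPU utilization in [0, 1].
    a, b:         linear power profile P = a*U + b, in watts.
    airflow_m3_s: volumetric air flow through the chassis, in m^3/s.
    """
    if airflow_m3_s <= 0:
        raise ValueError("airflow must be positive")
    power_w = a * utilization + b
    # f(U): temperature rise of the air that absorbs power_w watts of heat.
    rise_c = power_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * airflow_m3_s)
    return inlet_temp_c + rise_c


# Illustrative numbers: 150 W idle, 250 W at full load, 0.05 m^3/s of air.
for u in (0.0, 0.5, 1.0):
    print(u, round(outlet_temp(u, 20.0, a=100.0, b=150.0, airflow_m3_s=0.05), 2))
```

Under this assumption the outlet temperature rises linearly with utilization, which matches the additive form T_outlet = f(U) + T_inlet.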

Assessing recirculation for the given computational tasks
► Assessing recirculation
  • Obtain the thermal map for the given task assignment
    ► Compare with offline measurements
► But we don’t need to know the temperature at every point in the air
  • Only at the inlets and the outlets
Courtesy: Intel Labs
[Figure labels: nodes N1, N2, N3, N4, N5]

Recirculation coefficients
► Purpose
  • Know the air temperature at any 3-D point
  • Adjust CRAC operation
  • Adjust computer operation
► Obtained by
  • Strategically placed sensors
  • On-board sensors
► Predicted by
  • Thorough testing
  • CFD simulations

How scheduling impacts cooling cost
[Figure panels: inlet temperature distribution without cooling; inlet temperature distribution with cooling; 25 °C marked; Scheduling 1 vs. Scheduling 2]
► Different schedulings place different demands on cooling capacity

Part C Integrated Thermal-aware Management

Functional model of scheduling
► Tasks arrive at the data center
► Scheduler figures out the best placement
  • The placement that has minimal impact on peak inlet temperatures
► Scheduler assigns the task accordingly
[Figure labels: Tasks → Scheduler → Task]
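A placement rule of this kind can be sketched as a greedy search: try the new task on each free machine, predict the resulting peak inlet temperature, and keep the best candidate. This is an illustrative sketch, not the scheduler described in the slides; it assumes a hypothetical matrix `recirc` of per-watt inlet temperature rises, and every name and number below is invented.

```python
def peak_inlet_temp(powers, recirc, supply_temp_c):
    """Peak inlet temperature for a given per-machine power draw (watts)."""
    return max(
        supply_temp_c + sum(r * p for r, p in zip(row, powers))
        for row in recirc
    )


def place_task(powers, task_power_w, recirc, supply_temp_c, free):
    """Return the free machine whose use gives the lowest peak inlet temperature.

    powers:       current power draw per machine, in watts.
    task_power_w: extra power the new task is expected to draw.
    recirc:       recirc[i][j] = inlet rise (°C) at machine i per watt at
                  machine j (hypothetical coefficients).
    free:         indices of machines that can accept the task.
    Returns None when no machine is free.
    """
    best, best_peak = None, float("inf")
    for m in free:
        trial = list(powers)
        trial[m] += task_power_w  # tentatively run the task on machine m
        peak = peak_inlet_temp(trial, recirc, supply_temp_c)
        if peak < best_peak:
            best, best_peak = m, peak
    return best


# Machine 1's heat reaches machine 0's inlet more strongly than the reverse,
# so the task is better placed on machine 0.
recirc = [[0.0, 0.02], [0.01, 0.0]]
print(place_task([100.0, 100.0], 100.0, recirc, 15.0, free=[0, 1]))  # 0
```

A greedy rule like this evaluates one task at a time; it does not look for the best joint placement of several tasks.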

Architectural view
[Figure label: Scheduler (SLURM)]

Part D Potential Term Projects

Scheduling algorithms
► Current work assumes incoming jobs that
  • Are identical (same profile)
  • Are long-running
► Enhance the scheduling algorithm to work with
  • Heterogeneous data centers
  • Asynchronous job arrival
  • Jobs with non-identical execution times

Scheduler programming
► Enhance existing job management software (Moab, SLURM, etc.) to support
  • Gathering thermal data
  • Assigning jobs according to policy