Thermal-aware Task Placement in Data Centers
Qinghui Tang, Sandeep K. S. Gupta, Georgios Varsamopoulos
IMPACT Lab, Arizona State University

Growth Trends in Data Centers
► Power density increases
  • Circuit density increases by a factor of 3 every 2 years
  • Energy efficiency increases by a factor of 2 every 2 years
  • Effective power density therefore increases by a factor of 1.5 every 2 years (3 / 2) [Kenneth Brill: The Invisible Crisis in the Data Center]
► Maintenance/TCO rising
  • Data center TCO doubles every three years
  • By 2009, the three-year cost of electricity will exceed the purchase cost of the server
  • Virtualization/consolidation is a one-time, short-term solution [Uptime Institute]
► Thermal management accounts for an increasing portion of expenses
  • Thermal-aware solutions are becoming prominent
  • Increasing need for thermal awareness

Related Work (extended domain)
► Physical dimension: IC, case/chassis, room
► Software dimension: firmware, O/S, application (middleware)
► Techniques: dynamic voltage scaling, dynamic frequency scaling, circuitry redundancy, fan speed scaling, CPU load balancing, thermal-aware VM, thermal-aware data center job scheduling

Thermal issues in dense computer rooms (e.g., data centers, computer clusters, data warehouses)
► Heat recirculation
  • Hot air from the equipment air outlets is fed back into the equipment air inlets
► Hot spots
  • An effect of heat recirculation: areas in the data center with alarmingly high temperatures
► Consequence
  • Cooling has to be set very low to keep all inlet temperatures within the safe operating range
Courtesy: Intel Labs

Conceptual overview of thermal-aware task placement
► Task placement determines the temperature distribution
► The temperature distribution determines the peak equipment air inlet temperature
► The peak air inlet temperature determines the upper bound on the CRAC temperature setting
► The CRAC temperature setting determines its efficiency (Coefficient of Performance)
Bottom line: there is a task placement that maximizes cooling efficiency. Find it! The lower the peak inlet temperature, the higher the CRAC temperature can be set and the higher the CRAC efficiency.
[Figure: Coefficient of Performance (source: HP)]
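
The chain on this slide can be made concrete with a small sketch. It rests on several assumptions. The first is a quadratic CoP curve, CoP(T) = 0.0068T² + 0.0008T + 0.458, where T is the CRAC supply temperature in °C. This curve is often quoted in the thermal-aware scheduling literature for an HP Labs chilled-water CRAC. The 25 °C redline and the 100 kW load are also hypothetical. The sketch further assumes that every inlet temperature moves one-for-one with the supply temperature. Under that assumption, the highest safe supply setting is the redline minus the hottest inlet's rise above supply.

```python
"""Sketch: how the peak inlet temperature caps CRAC efficiency.

Assumptions (not from the slides): the quadratic CoP curve below, a
hypothetical 25 C redline, and a hypothetical 100 kW IT load.
"""


def cop(t_sup_c: float) -> float:
    """Assumed CRAC Coefficient of Performance at supply temperature t_sup_c (deg C)."""
    return 0.0068 * t_sup_c ** 2 + 0.0008 * t_sup_c + 0.458


def max_supply_temp(t_redline_c: float, peak_inlet_rise_c: float) -> float:
    """Highest supply temperature that keeps the hottest inlet at the redline.

    peak_inlet_rise_c is how far the hottest inlet sits above the supply
    temperature; this is the quantity that task placement influences.
    """
    return t_redline_c - peak_inlet_rise_c


def cooling_power_w(it_power_w: float, t_sup_c: float) -> float:
    """Power the CRAC draws to remove it_power_w of heat: P_cool = Q / CoP."""
    return it_power_w / cop(t_sup_c)


if __name__ == "__main__":
    REDLINE_C = 25.0    # hypothetical maximum allowed inlet temperature
    LOAD_W = 100_000.0  # hypothetical IT load
    for rise in (10.0, 5.0):  # a poor placement and a better one
        t_sup = max_supply_temp(REDLINE_C, rise)
        print(f"peak rise {rise:4.1f} C -> supply {t_sup:4.1f} C, "
              f"CoP {cop(t_sup):.2f}, cooling {cooling_power_w(LOAD_W, t_sup) / 1000:.1f} kW")
```

With these assumed numbers, cutting the peak inlet rise from 10 °C to 5 °C raises the CoP from 2.00 to about 3.19. Cooling power falls from 50.0 kW to about 31.3 kW.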

Prerequisites for thermal management
► Task profiling
  • CPU utilization, I/O activity, etc.
► Equipment power profiling
  • CPU consumption, disk consumption, etc.
► Heat recirculation modeling
► Task management technologies
► Need for a comprehensive research framework

Thermal management research framework
► Thermal-aware job scheduling: an on-line job scheduling algorithm that minimizes the peak air inlet temperature, thus minimizing the cost of cooling
► Thermal models: enable on-line, real-time thermal-aware job scheduling
  • Fast (analytical, not CFD-based)
  • Non-invasive (machine learning)
► Characterization: characterize the power consumption of a given workload (CPU, memory, disk, etc.) on given equipment
► Model the thermal impact of multicore systems
Sandeep Gupta, Qinghui Tang, Tridib Mukherjee, Michael Jonas, Georgios Varsamopoulos

Task Profiling measurements at ASU HPC Data Center (one chassis)

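Power Model and Profiling
► Power consumption is mainly affected by CPU utilization
► Power consumption is linear in CPU utilization: $P = aU + b$, where $U$ is the CPU utilization, $a$ and $b$ are constants for a given piece of equipment, and $b$ is the idle power ($U = 0$)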
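
The constants $a$ and $b$ have to come from profiling measurements. Ordinary least squares is one straightforward way to fit them from (utilization, power) samples. The sketch below does this; the sample values are hypothetical placeholders, not the ASU measurements.

```python
"""Sketch: fitting the linear power model P = a*U + b by ordinary least squares.

The samples in __main__ are hypothetical, not measured data.
"""
from typing import Sequence, Tuple


def fit_power_model(util: Sequence[float], power_w: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) minimizing the squared error of power_w ~ a*util + b."""
    n = len(util)
    if n != len(power_w) or n < 2:
        raise ValueError("need at least two (utilization, power) pairs")
    mean_u = sum(util) / n
    mean_p = sum(power_w) / n
    s_uu = sum((u - mean_u) ** 2 for u in util)
    if s_uu == 0:
        raise ValueError("utilization samples must not all be equal")
    s_up = sum((u - mean_u) * (p - mean_p) for u, p in zip(util, power_w))
    a = s_up / s_uu          # slope: extra watts per unit of utilization
    b = mean_p - a * mean_u  # intercept: idle power at U = 0
    return a, b


def predict_power(a: float, b: float, u: float) -> float:
    """Power predicted by the fitted model at utilization u."""
    return a * u + b


if __name__ == "__main__":
    # Hypothetical chassis samples: utilization in [0, 1], power in watts.
    samples_u = [0.0, 0.25, 0.5, 0.75, 1.0]
    samples_p = [2000.0, 2400.0, 2800.0, 3200.0, 3600.0]
    a, b = fit_power_model(samples_u, samples_p)
    print(f"a = {a:.1f} W, b = {b:.1f} W")  # a = 1600.0 W, b = 2000.0 W
```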

Linear Thermal Model
► Heat recirculation coefficients
  • Analytical
  • Matrix-based
► Properties of the model
  • Granularity at air inlets (discrete/simplified)
  • Assumes steady air flow
$\vec{T}_{in} = \vec{T}_{sup} + D\,\vec{P}$, where $\vec{T}_{in}$ is the vector of inlet temperatures, $\vec{T}_{sup}$ the supplied air temperatures, $D$ the heat distribution matrix, and $\vec{P}$ the power vector
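
Once $D$ is known, evaluating the model takes a single matrix-vector product, O(n²) for n nodes. That is why the model can run on-line. A minimal sketch follows; the 3-node $D$ and the power vector are hypothetical, and a real $D$ would come from CFD runs or measurements of the specific room.

```python
"""Sketch: evaluating the linear thermal model T_in = T_sup + D @ P.

D and P in __main__ are hypothetical values for illustration only.
"""
import numpy as np


def inlet_temperatures(t_sup_c: float, d: np.ndarray, p_w: np.ndarray) -> np.ndarray:
    """Inlet temperature (deg C) of every node: supply air plus recirculated heat."""
    d = np.asarray(d, dtype=float)
    p_w = np.asarray(p_w, dtype=float)
    if d.shape != (p_w.size, p_w.size):
        raise ValueError("D must be n x n for n nodes")
    return t_sup_c + d @ p_w


if __name__ == "__main__":
    # Hypothetical 3-node heat distribution matrix, in deg C per watt.
    D = np.array([
        [1.0e-4, 2.0e-4, 0.5e-4],
        [0.5e-4, 1.0e-4, 1.0e-4],
        [3.0e-4, 1.5e-4, 1.0e-4],
    ])
    P = np.array([3000.0, 2500.0, 2000.0])  # watts per node
    t_in = inlet_temperatures(18.0, D, P)
    print(t_in, "peak:", t_in.max())
```

The scheduler only ever needs the maximum of this vector, so comparing candidate placements reduces to comparing `t_in.max()` values.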

Benefit: fast thermal evaluation
► CFD approach: give workload → run CFD simulation (days) → extract temperatures
► Linear model: give workload → compute $\vec{T}_{in} = \vec{T}_{sup} + D\,\vec{P}$ (seconds) → yields temperatures
Courtesy: Flomerics

Thermal-aware Task Placement Problem
► Given an incoming task, find a partitioning of the task and a placement of its subtasks that minimizes the (increase of the) peak inlet temperature
► With $P = aU + b$ on every node, the model becomes $\vec{T}_{in} = \vec{T}_{sup} + D\,(a\vec{U} + b\vec{1})$, where $\vec{U}$ is the utilization vector and $\vec{1}$ is the all-ones vector
► XInt algorithm: approximate solution (genetic algorithm)
  • Take a feasible solution and perform mutations for a given number of iterations
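
The mutation idea can be illustrated with a single-solution loop. It starts from a feasible placement, moves one core's worth of work at random, and keeps the move only if the peak inlet temperature does not rise. This is closer to a (1+1) evolutionary strategy than to a full genetic algorithm, and it is not the authors' XInt code. The setup is hypothetical: n identical nodes with `cores` cores each, a task needing `demand` cores, and $U_i$ = used cores / `cores`.

```python
"""Sketch: mutation-based search for a placement with a low peak inlet temperature.

Hypothetical setup; a simplified stand-in in the spirit of the slide, not XInt.
"""
import random

import numpy as np


def peak_inlet(alloc, d, a, b, cores, t_sup):
    """Peak of T_in = T_sup + D (a U + b) for a per-node core allocation."""
    u = np.asarray(alloc, dtype=float) / cores  # utilization vector U
    p = a * u + b  # power vector; idle nodes still draw b (assumed powered on)
    return float(np.max(t_sup + d @ p))


def mutate_search(d, demand, cores, a, b, t_sup, iters=2000, seed=0):
    """Start feasible, move one core at random, keep non-worsening moves."""
    n = d.shape[0]
    if demand > n * cores:
        raise ValueError("task does not fit in the data center")
    rng = random.Random(seed)
    alloc, left = [0] * n, demand
    for i in range(n):  # feasible start: fill nodes in index order
        alloc[i] = min(cores, left)
        left -= alloc[i]
    best = peak_inlet(alloc, d, a, b, cores, t_sup)
    for _ in range(iters):
        src, dst = rng.randrange(n), rng.randrange(n)
        if src == dst or alloc[src] == 0 or alloc[dst] == cores:
            continue  # this mutation would be infeasible
        alloc[src] -= 1
        alloc[dst] += 1
        cand = peak_inlet(alloc, d, a, b, cores, t_sup)
        if cand <= best:
            best = cand  # keep the mutation
        else:
            alloc[src] += 1  # revert it
            alloc[dst] -= 1
    return alloc, best


if __name__ == "__main__":
    gen = np.random.default_rng(1)
    D = gen.uniform(0.0, 2e-4, size=(6, 6))  # hypothetical heat distribution matrix
    alloc, peak = mutate_search(D, demand=30, cores=8, a=200.0, b=100.0, t_sup=18.0)
    print(alloc, round(peak, 3))
```

Accepting equal-peak moves (`<=`) lets the search drift across plateaus, which matters because many placements share the same hottest node.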

Contrasted scheduling approaches
► Uniform Outlet Profile (UOP)
  • Assign tasks in a way that tries to achieve a uniform outlet temperature distribution
  • Assign more tasks to nodes with low inlet temperatures (water-filling process)
► Minimum computing energy
  • Assign tasks in a way that keeps the number of active (powered-on) chassis as small as possible
  • Servers with the coolest inlet temperature first
► Uniform Task (UT)
  • Assign all chassis the same amount of work (and hence the same power consumption)
  • All nodes experience the same power consumption and temperature rise
[Figure: inlet and outlet temperature profiles]
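
Two of these baselines are simple enough to sketch directly. Both are simplified readings under a hypothetical setup, not the implementations used in the study. The setup assumes n identical nodes with `cores` cores each, a task needing `demand` cores, and known current inlet temperatures. UOP is omitted because its water-filling step depends on outlet-temperature modeling details the slides do not give.

```python
"""Sketch of two baseline placements: Uniform Task and coolest-inlet-first.

Hypothetical setup; simplified readings of the baselines, not the study's code.
"""
from typing import List, Sequence


def uniform_task(n: int, cores: int, demand: int) -> List[int]:
    """UT: spread the work as evenly as possible over all n nodes."""
    if demand > n * cores:
        raise ValueError("task does not fit")
    base, extra = divmod(demand, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def coolest_inlet_first(inlet_c: Sequence[float], cores: int, demand: int) -> List[int]:
    """Minimum computing energy: fill whole nodes in order of increasing inlet
    temperature, so as few nodes as possible become active."""
    n = len(inlet_c)
    if demand > n * cores:
        raise ValueError("task does not fit")
    alloc = [0] * n
    left = demand
    for i in sorted(range(n), key=lambda k: inlet_c[k]):
        take = min(cores, left)
        alloc[i] = take
        left -= take
        if left == 0:
            break
    return alloc


if __name__ == "__main__":
    inlets = [21.5, 19.0, 24.0, 20.0]  # hypothetical current inlet temperatures
    print(uniform_task(4, 8, 18))              # [5, 5, 4, 4]
    print(coolest_inlet_first(inlets, 8, 18))  # [2, 8, 0, 8]
```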

Simulated Environment
► Used Flomerics FloVENT
► Simulated a small-scale data center
► Physical dimensions: 9.6 m × 8.4 m × 3.6 m
► Two rows of industry-standard 42U racks
► CRAC supply at 8 m³/s
► 10 racks, each equipped with 5 chassis
► 1000 processors in the data center
  • 232 kW at full utilization
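
A few back-of-the-envelope numbers follow from the slide's figures. The air density (1.2 kg/m³) and specific heat (1005 J/(kg·K)) are assumed values, not from the slide. The bulk temperature rise is the heat balance Q = ρ c_p f ΔT solved for ΔT. It assumes the supplied air absorbs the full load once, with no recirculation.

```python
"""Back-of-the-envelope numbers for the simulated room.

From the slide: 10 racks x 5 chassis, 1000 processors, 232 kW at full
utilization, CRAC supply of 8 m^3/s. Assumed: air properties below.
"""
RACKS, CHASSIS_PER_RACK, PROCESSORS = 10, 5, 1000
FULL_LOAD_W = 232_000.0
CRAC_FLOW_M3_S = 8.0
RHO_AIR = 1.2    # kg/m^3, assumed
CP_AIR = 1005.0  # J/(kg K), assumed

chassis = RACKS * CHASSIS_PER_RACK         # 50 chassis
procs_per_chassis = PROCESSORS // chassis  # 20 processors per chassis
watts_per_proc = FULL_LOAD_W / PROCESSORS  # 232 W per processor
watts_per_chassis = FULL_LOAD_W / chassis  # 4640 W per chassis
# Q = rho * cp * flow * dT  ->  dT = Q / (rho * cp * flow)
bulk_rise_k = FULL_LOAD_W / (RHO_AIR * CP_AIR * CRAC_FLOW_M3_S)

print(chassis, procs_per_chassis, watts_per_proc, watts_per_chassis)  # 50 20 232.0 4640.0
print(f"bulk air temperature rise ~ {bulk_rise_k:.1f} K")  # ~ 24.0 K
```

Under these assumptions, the supplied air would warm by about 24 K at full load before any recirculation is taken into account.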

Performance Results
► XInt outperforms the other algorithms
► Data centers almost never run at 100% utilization
  • Plenty of room for benefits!

Power Vector Distribution
[Figure: power vector distribution per scheme, with key]
► XInt contradicts the “rule of thumb” of placing load at the bottom of the racks

Supply Heat Index (SHI)
► Supply Heat Index
  • Metric developed by HP Labs
  • Quantifies the overall heat recirculation of the data center
► XInt consistently has the lowest SHI
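
The slide does not give the SHI formula. A common definition from HP Labs is the ratio of the enthalpy rise of the air before it reaches the equipment inlets to the rise up to the outlets. If every node is assumed to have the same airflow, the mass-flow weights cancel, leaving SHI = Σ(T_in − T_sup) / Σ(T_out − T_sup). A value of 0 means no recirculation, and values closer to 1 mean more exhaust heat reaches the inlets. The sketch below uses this unweighted form as an assumption.

```python
"""Sketch: simplified, unweighted Supply Heat Index.

Assumption: every node has the same airflow, so
SHI = sum(T_in - T_sup) / sum(T_out - T_sup).
"""
from typing import Sequence


def supply_heat_index(t_in: Sequence[float], t_out: Sequence[float], t_sup: float) -> float:
    """0 = no recirculation; larger values = more exhaust heat reaching inlets."""
    if len(t_in) != len(t_out) or not t_in:
        raise ValueError("need matching, non-empty inlet/outlet readings")
    rise_in = sum(t - t_sup for t in t_in)    # heat picked up before the inlets
    rise_out = sum(t - t_sup for t in t_out)  # total rise up to the outlets
    if rise_out <= 0:
        raise ValueError("outlets must be warmer than the supply air")
    return rise_in / rise_out


if __name__ == "__main__":
    # Hypothetical readings for three nodes, supply air at 18 C.
    print(supply_heat_index([19.0, 20.0, 21.0], [29.0, 30.0, 31.0], 18.0))
```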

Conclusions
► Thermal-aware task placement can significantly reduce heat recirculation
  • XInt's benefit is greatest at around 50% CPU utilization
  • Not much can be done at 100% utilization
► Cooling savings can exceed 30% (compared with the other schemes)
► Cost of operation drops by 15% (assuming an initial 1:1 ratio of computing to cooling)
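
The 15% figure follows from the 30% cooling saving and the 1:1 ratio. If computing and cooling each make up half of the operating cost, a 30% cut in cooling saves 0.5 × 0.30 = 15% of the total. This assumes cost is proportional to power. A worked check:

```python
"""Worked check of the cost arithmetic in the conclusions.

Assumed from the slide: 30% cooling saving, 1:1 cooling-to-computing ratio,
and operating cost proportional to power.
"""


def total_cost_reduction(cooling_saving: float, cooling_to_computing: float = 1.0) -> float:
    """Fraction of total (computing + cooling) cost saved."""
    cooling_share = cooling_to_computing / (1.0 + cooling_to_computing)
    return cooling_share * cooling_saving


if __name__ == "__main__":
    print(f"{total_cost_reduction(0.30):.0%}")  # 15%
```

The same function shows why the benefit grows when cooling is a larger share of the bill. For example, at a 1:2 computing-to-cooling ratio, pass `cooling_to_computing=2.0`.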

Related Work in Progress
► Relaxing simplifying assumptions
  • Equipment heterogeneity [INFOCOM 2008]
  • Stochastic task arrival
► Thermal maps through machine learning
  • Automated, non-invasive, cost-effective [GreenCom 2007]
► Implementations
  • Thermal-aware Moab scheduler
  • Thermal-aware SLURM
  • SiCortex product thermal management

Algorithm Assumptions
► HPC model in mind
  • Long-running jobs (all finish times are the same: infinity)
► One-time arrival (all start times are the same)
► Utilization homogeneity (same utilization throughout a task's length)
► Non-preemptive, non-movable tasks
► Data center equipment homogeneity
  • Power consumption
  • Computational capability
► Cooling is self-controlled

Thank You ► Questions? ► Comments? ► Suggestions?

Additional Slides

Functional model of scheduling
► Tasks arrive at the data center
► The scheduler figures out the best placement
  • The placement that has minimal impact on peak inlet temperatures
► The scheduler assigns the task accordingly
[Figure: tasks arriving at the scheduler]
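
The arrive-evaluate-assign loop can be sketched as a small scheduler class. The placement rule here is a hypothetical greedy heuristic, not the authors' XInt. A task is a number of cores, and each core goes to the node whose use raises the predicted peak inlet temperature least under $\vec{T}_{in} = \vec{T}_{sup} + D(a\vec{U} + b)$. The matrix $D$, `cores`, `a`, `b`, and `t_sup` are all assumed inputs.

```python
"""Sketch: on-line thermal-aware dispatch with a greedy, core-by-core rule.

Hypothetical heuristic for illustration; not the authors' XInt algorithm.
"""
import numpy as np


class ThermalAwareScheduler:
    def __init__(self, d, cores, a, b, t_sup):
        self.d = np.asarray(d, dtype=float)  # heat distribution matrix D
        self.cores = cores                   # cores per node (homogeneous)
        self.a, self.b, self.t_sup = a, b, t_sup
        self.used = np.zeros(self.d.shape[0], dtype=int)  # busy cores per node

    def peak_inlet(self, used):
        """Peak inlet temperature for a given per-node core usage."""
        p = self.a * used / self.cores + self.b  # P = aU + b
        return float(np.max(self.t_sup + self.d @ p))

    def submit(self, demand):
        """Place a task needing `demand` cores; return cores added per node."""
        if demand > int((self.cores - self.used).sum()):
            raise RuntimeError("not enough free cores")
        placed = np.zeros_like(self.used)
        for _ in range(demand):
            best_node, best_peak = -1, float("inf")
            for i in np.flatnonzero(self.used < self.cores):
                trial = self.used.copy()
                trial[i] += 1
                peak = self.peak_inlet(trial)
                if peak < best_peak:  # ties keep the lower-index node
                    best_node, best_peak = int(i), peak
            self.used[best_node] += 1
            placed[best_node] += 1
        return placed


if __name__ == "__main__":
    D = np.array([[2e-4, 1e-4], [0.5e-4, 0.5e-4]])  # hypothetical
    sched = ThermalAwareScheduler(D, cores=4, a=200.0, b=100.0, t_sup=18.0)
    print(sched.submit(3))
```

Greedy placement is cheap enough to run per arrival, but it can be myopic. A search such as the mutation approach on the task-placement slide can refine its result.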

Architectural View
[Figure: components are the scheduler (Moab, SLURM), machine learning, the thermal model, and monitoring processes; interactions are labeled dispatch, create/update, provide, report, and control]

A simple thermal model
► Basic idea:
  • We don't need an extensive CFD model
  • We only need to know the effect of recirculation at specific points
► Express recirculation as “coefficients”
[Figure: nodes N1 to N5 (courtesy: Intel Labs)]

Recirculation coefficients: a fast thermal model
► Reduce/simplify the “thermal map” concept to the points of interest: equipment air inlets
► Can be computed from CFD models/simulations
► Matrix $A$ of recirculation coefficients: $a_{ij}$ is the portion of heat exhausted from node $i$ that goes directly to node $j$
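
One way to turn the coefficients $a_{ij}$ into the heat distribution matrix $D$ of $\vec{T}_{in} = \vec{T}_{sup} + D\vec{P}$ is an energy balance. The derivation rests on assumptions not stated on the slides. Node $i$ moves air at a fixed flow rate $f_i$, and $K = \mathrm{diag}(\rho c_p f_i)$. The coefficient $a_{ij}$ is the fraction of node $i$'s exhaust air that reaches node $j$'s inlet, and the rest of each inlet's air is supply air. Each node heats its air, $K\vec{T}_{out} = K\vec{T}_{in} + \vec{P}$, and each inlet mixes exhaust with supply air, $K\vec{T}_{in} = A^{T}K\vec{T}_{out} + (K - A^{T}K)\vec{1}\,T_{sup}$. Together these give $D = (K - A^{T}K)^{-1}A^{T}$, which equals $(K - A^{T}K)^{-1} - K^{-1}$.

```python
"""Sketch: heat distribution matrix D from recirculation coefficients A.

Assumptions: a[i, j] is the fraction of node i's exhaust air entering node
j's inlet, each node has a fixed airflow, and K = diag(rho * cp * flow).
"""
import numpy as np

RHO_AIR = 1.2    # kg/m^3, assumed
CP_AIR = 1005.0  # J/(kg K), assumed


def heat_distribution_matrix(a: np.ndarray, flow_m3_s: np.ndarray) -> np.ndarray:
    """D = (K - A^T K)^{-1} A^T, solved without forming an explicit inverse."""
    a = np.asarray(a, dtype=float)
    k = np.diag(RHO_AIR * CP_AIR * np.asarray(flow_m3_s, dtype=float))
    return np.linalg.solve(k - a.T @ k, a.T)


def inlet_temperatures(t_sup: float, d: np.ndarray, p_w: np.ndarray) -> np.ndarray:
    """T_in = T_sup + D P."""
    return t_sup + d @ np.asarray(p_w, dtype=float)


if __name__ == "__main__":
    # Hypothetical coefficients for 3 nodes (row sums < 1: some exhaust returns to the CRAC).
    A = np.array([[0.0, 0.2, 0.1], [0.05, 0.0, 0.1], [0.1, 0.1, 0.0]])
    flow = np.array([0.5, 0.5, 0.6])  # m^3/s per node, hypothetical
    D = heat_distribution_matrix(A, flow)
    print(inlet_temperatures(18.0, D, np.array([3000.0, 2000.0, 2500.0])))
```

Using `np.linalg.solve` instead of inverting $K - A^{T}K$ is both cheaper and numerically safer. Note that $K - A^{T}K = (I - A^{T})K$ is invertible whenever every row of $A$ sums to less than 1.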

Opportunities & Challenges
► Data centers don't run at full utilization
  • A job can be allocated to one of several CPUs
  • Each CPU has a different thermal impact
► Need for fast thermal evaluation
► Temporal and spatial heterogeneity of data centers
  • In equipment
  • In workload
Thermal issues
► Heat recirculation
  • Increases when equipment density exceeds the planned cooling capacity
► Hot spots
  • Effect of heat recirculation
► Impact: cooling has to be set low enough to keep all inlet temperatures within the safe operating range
Data Center Thermal Management: increasing need for thermal awareness
► Power density increases
  • Circuit density increases by a factor of 3 every 2 years
  • Energy efficiency increases by a factor of 2 every 2 years
  • Effective power density increases by a factor of 1.5 every 2 years [Kenneth Brill: The Invisible Crisis in the Data Center]
► Maintenance/TCO rising
  • Data center TCO doubles every three years
  • By 2009, the three-year cost of electricity will exceed the purchase cost of the server
  • Virtualization/consolidation is a one-time, short-term solution
► Thermal management accounts for an increasing portion of expenses
  • Thermal-aware solutions are becoming prominent
Thermal-aware solutions at various levels
► Physical dimension: IC, case/chassis, room
► Software dimension: firmware, O/S, application (middleware)
► Techniques: dynamic voltage scaling, dynamic frequency scaling, circuitry redundancy, fan speed scaling, CPU load balancing, thermal-aware VM, data center job scheduling
A dynamic thermal-aware control platform is necessary for online thermal evaluation.
[Figure: yearly computation and cooling costs ($1M, $10M, $100M) without and with thermal-aware management]

Scheduling Impacts Cooling Setting
[Figure: inlet temperature distributions without cooling and with cooling (a 25 °C level is marked), for Scheduling 1 and Scheduling 2]
► Different schedules place different demands on cooling capacity

Results (1)
► Recirculation coefficients
  • Consistent with data center observations
  • Large values are observed along the diagonal
  • Strong recirculation among neighboring servers, and between bottom and top servers