Cooling-Aware and Thermal-Aware Workload Placement for Green HPC Data Centers
Sandeep K. S. Gupta (co-authors: Ayan Banerjee, Tridib Mukherjee, George Varsamopoulos)
School of Computing, Informatics and Decision Systems Engineering, Arizona State University
Sandeep Gupta, IEEE Senior Member, School of Computing & Informatics
Use-inspired, human-centric research in distributed cyber-physical systems:
- Pervasive health monitoring
- Criticality-aware systems
- Thermal management for data centers
- Intelligent container / ID assurance
- Mobile ad hoc networks
Best Paper Award: "Security Solutions for Pervasive HealthCare," ICISIP 2006
Book: Fundamentals of Mobile and Pervasive Computing, McGraw-Hill, Dec. 2004
Area Editor (IEEE TPDS, WINET); TPC Chair: BodyNets (http://www.bodynets.org); TPC Co-Chair: GreenCom'07 (http://impact.asu.edu/greencom)
Email: Sandeep.Gupta@asu.edu; IMPACT Lab URL: http://impact.asu.edu
IMPACT: Current research thrusts
Challenges: traffic congestion, energy scarcity, climate change, medical cost, ...
Smart infrastructure: distributed CPS (cyber-physical embedded systems of systems)
- Criticality (context)-awareness to enhance the dependability (security, safety, reliability) of CPS
- Unifying framework to improve our understanding of how to develop (energy-)efficient, sustainable, assured CPS
- Model-based design and development to harness complexity (simultaneously ensuring safety, security, efficiency, etc.) as well as cost
- Enhanced usability and interoperability to reduce manageability overhead and enhance ...
IMPACT Lab members and collaborators
Faculty: Sandeep K. S. Gupta (Professor)
Postdocs: Georgios Varsamopoulos, Tridib Mukherjee
Students: Zahra Abbasi (CSE PhD), Ayan Banerjee (CSE PhD), Michael Jonas (CSE PhD), Sailesh Kandula (CSE MS), Su Kim (CSE PhD)
Collaborators from: Microsoft Embedded Innovation Center (Aachen), FDA, University of Florence, Intel Corp., Texas Instruments, U. Penn
Introduction and motivation
The magnitude of data center energy consumption:
- Internet users worldwide grew 400% from 2000 to 2009 [http://www.internetworldstats.com/stats.htm]
- Data center energy consumption grew 20-30% annually in 2006 and 2007 [Uptime Institute research]
Addressing energy savings for Internet/HPC data centers: thermal and cooling awareness to improve energy consumption.
[Figures: projected electricity use of data centers, 2007 to 2011 (historical energy use and future projection under the current efficiency trend) [Source: EPA]; typical data center energy end use [Source: Department of Energy]]
The BlueTool project http://impact.asu.edu/BlueTool/
Overview of problem and results
Can we save energy by coordinating job scheduling and cooling? How much?
Results and contributions:
- SP-EIR: an energy-inefficiency metric for spatial scheduling
  - higher SP-EIR → worse energy performance of a schedule
  - lower heat recirculation → lower SP-EIR
  - higher thermostat setting → lower SP-EIR (because of the CoP)
- HTS: a spatial scheduling algorithm that heuristically maximizes the thermostat setting
- Evaluation of HTS combined with FCFS or EDF: EDF-HTS saves 15% over EDF-LRH
Outline of talk
Background: thermal awareness and cooling awareness
System model: physical assumptions, job model, cooling model; heat recirculation and thermostat; dependency between jobs and cooling
Problem definition: how to schedule jobs so as to minimize the need for low cooling temperatures
SP-EIR and HTS
Simulation-based comparison of combinations of FCFS and EDF with LRH and HTS
Ongoing work: energy-proportional computing and its savings
Job scheduling and energy awareness
Most energy-aware approaches are power-aware (e.g., DVFS schemes).
Thermal awareness: knowing the heat recirculation.
Cooling awareness: knowing the cooling performance.
Why cooling awareness?
- Cooling, along with PDUs, is responsible for PUE > 1
- Optimizing for cooling can save additional energy: about 15% for the simulated data center
Taxonomy of job scheduling schemes:
- energy-oblivious (performance-oriented)
- energy-aware
  - power-aware (thermally oblivious)
  - thermal-aware (aware of heat effects)
    - cooling-oblivious
    - cooling-aware (aware of the cooling model)
System model (1)
Physical layout: cold-aisle/hot-aisle configuration; Tred is the red-line inlet temperature.
Jobs: each job comes with a deadline; performance heterogeneity (fast and slow machines).
CRAC (cooling equipment):
- Two cooling power modes: low (preferred for energy efficiency) and high
- Two (programmable) thermostat set points: low→high and high→low
- Mode-switching delay tsw
- Coefficient of performance (CoP) depends on the current mode
- Supply air temperature: Tsup = Tsen − Tdiff_mode, where Tsen is the sensed input air temperature
- Epoch: the interval between two consecutive thermostat triggers
Computing equipment: linear power consumption model P = aU + b, where U is CPU utilization and Pidle = b (see the sketch below).
Challenge: set the low→high set point as high as possible.
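To make the computing and cooling models concrete, here is a minimal Python sketch of the linear power model P = aU + b and a two-mode CRAC whose CoP depends on the active mode. All parameter values (a, b, and the per-mode CoP figures) are hypothetical illustrations, not measurements from the paper.

```python
# Toy sketch of the slide's models: linear server power P = a*U + b and a
# two-mode CRAC whose coefficient of performance (CoP) depends on the mode.

def server_power(u, a=150.0, b=100.0):
    """Server power (W) for CPU utilization u in [0, 1]; P_idle = b."""
    return a * u + b

# Hypothetical CoP per cooling mode; higher CoP = more efficient cooling.
COP = {"low": 3.5, "high": 2.0}

def cooling_power(heat_watts, mode):
    """CRAC power (W) needed to extract `heat_watts` of heat in `mode`."""
    return heat_watts / COP[mode]

if __name__ == "__main__":
    utilizations = [0.2, 0.5, 1.0]
    it_power = sum(server_power(u) for u in utilizations)
    for mode in ("low", "high"):
        total = it_power + cooling_power(it_power, mode)
        print(f"mode={mode}: IT={it_power:.0f} W, total={total:.0f} W")
```

The low mode is preferred precisely because its higher CoP makes the cooling term smaller for the same extracted heat.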
System model (2)
Models assumed:
- Cooling distribution matrix F (diagonal): f_ii is the portion of cool air going to equipment i
- Heat recirculation matrix D: d_ij is the portion of heat going from equipment i to equipment j
[Figure: three pieces of equipment and a CRAC, with cool-air fractions f1, f2, f3 and recirculation coefficients d_ij among the equipment, subject to Tin ≤ Tred]
Inlet temperature constraint: Tin(t) = F Tsup(t) + D P(t) ≤ Tred
With the vector power model P = aU + ω (ω the idle-power vector), this yields: Tsup(t) ≤ F^-1 [ Tred − D(aU(t) + ω) ]
Tsup(t) has to be dynamically adjusted in accordance with U(t) to meet the Tred constraint.
The highest thermostat setting, maxTsen, can be derived as: maxTsen = Tsup + Tdiff_mode − [temperature increase due to the mode-switching delay]
Selecting/scheduling a different set of servers (i.e., changing a and ω) changes the requirements on Tsup.
A numerical sketch of this bound follows below.
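A minimal sketch of the thermostat bound, assuming a hypothetical 3-server F and D and the linear power model from the previous sketch. Since F is diagonal, the constraint Tin = F Tsup + D P ≤ Tred can be solved elementwise for the highest uniform supply temperature; the D entries below are oriented so that row i collects the heat arriving at server i.

```python
# Toy sketch: highest feasible supply temperature under Tin = F*Tsup + D*P <= Tred.
# F, D, a, omega, and T_red are all hypothetical values for illustration.
import numpy as np

F = np.diag([0.9, 0.8, 0.85])          # portion of cool air reaching each server
D = np.array([[0.000, 0.005, 0.010],   # D[i][j]: fraction of server j's heat
              [0.005, 0.000, 0.008],   # arriving at server i's inlet
              [0.012, 0.006, 0.000]])
a, omega = 150.0, 100.0                # per-server linear power model P = a*U + omega
T_red = 25.0                           # red-line inlet temperature (deg C)

def max_supply_temp(U):
    """Highest uniform Tsup that keeps every inlet at or below T_red."""
    P = a * np.asarray(U) + omega
    bounds = np.linalg.inv(F) @ (T_red - D @ P)  # elementwise since F is diagonal
    return bounds.min()

print(max_supply_temp([1.0, 1.0, 1.0]))  # full blast: Tsup must be lowest
print(max_supply_temp([0.1, 0.1, 0.1]))  # near idle: Tsup can be higher
```

This is exactly the dependency the slide describes: lowering U(t), or placing load on servers that recirculate less heat, relaxes the bound on Tsup.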
Problem definition and HTS
Given a data center and its running jobs, for a given new job, find:
- a spatial schedule (i.e., a server assignment) for that job, and
- thermostat settings for the CRAC,
that minimize the energy consumption while meeting the deadlines.
Algorithm HTS (Highest Thermostat Setting):
- Spatial-scheduling-only algorithm (i.e., server assignment)
- Finds a spatial schedule that maximizes the low→high thermostat setting
- Assigns a ranking grade to each server: Rank(server j) = Tred − [temperature rise at j caused by all servers at full blast]
- Assigns the job to the available servers with the highest rank values (see the sketch below)
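A sketch of the HTS ranking step, reusing the hypothetical D matrix and power model from the earlier sketches; the real algorithm operates on the data center's measured D, but the ranking logic follows the slide's definition.

```python
# Toy sketch of HTS: Rank(j) = T_red - (temperature rise at j with all servers
# at full blast); the job goes to the available servers with the highest ranks.
import numpy as np

D = np.array([[0.000, 0.005, 0.010],   # hypothetical recirculation matrix,
              [0.005, 0.000, 0.008],   # D[i][j]: heat from server j reaching i
              [0.012, 0.006, 0.000]])
a, omega, T_red = 150.0, 100.0, 25.0

def hts_rank():
    P_full = np.full(D.shape[0], a * 1.0 + omega)  # all servers at full blast
    rise = D @ P_full                              # inlet temperature rise at each server
    return T_red - rise

def hts_place(job_size, available):
    """Pick `job_size` servers from `available` with the highest HTS ranks."""
    ranks = hts_rank()
    ordered = sorted(available, key=lambda j: ranks[j], reverse=True)
    return ordered[:job_size]

print(hts_place(job_size=2, available=[0, 1, 2]))
```

Servers that receive little recirculated heat get high ranks; filling those first leaves the most headroom for a high thermostat setting.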
SP-EIR: an energy inefficiency metric
SP-EIR(alg, job set J) = E_alg(J) / E_opt(J), i.e., the energy consumed under algorithm alg divided by the energy of the optimal schedule for J.
Computing max SP-EIR over all possible job sets is challenging; it is akin to the competitive ratio in the performance domain.
One (naive) upper bound on SP-EIR: E_alg(100% utilization) / E_opt(idle).
- Note that the naive upper bound is independent of the algorithm (at 100% utilization there is only one possible schedule); it depends solely on the data center's thermal and cooling behavior.
- For the simulated data center, the upper bound is 1.69.
Here we "measure" SP-EIR using simulation (a toy sketch follows below); the theoretical analysis is left as a challenge for theoreticians.
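As an illustration only, the toy sketch below "measures" SP-EIR for a single placement decision by brute-forcing E_opt over all placements of a k-server job. It reuses the hypothetical F, D, and power model from the earlier sketches; power stands in for energy (jobs of equal length), and the CRAC's mode choice is reduced to a single hypothetical Tsup threshold.

```python
# Toy SP-EIR measurement: energy of a given placement over the brute-forced
# optimum. All matrices, CoPs, and the mode-switch threshold are hypothetical.
from itertools import combinations
import numpy as np

F = np.diag([0.9, 0.8, 0.85])
D = np.array([[0.000, 0.005, 0.010],
              [0.005, 0.000, 0.008],
              [0.012, 0.006, 0.000]])
a, omega, T_red = 150.0, 100.0, 25.0
COP_LOW, COP_HIGH = 3.5, 2.0   # hypothetical per-mode CoPs
T_SWITCH = 24.3                # hypothetical: Tsup >= this keeps the low mode

def total_power(active):
    """IT power plus cooling power for a set of fully utilized servers."""
    U = np.array([1.0 if j in active else 0.0 for j in range(3)])
    P = a * U + omega                                   # idle servers stay on
    t_sup = (np.linalg.inv(F) @ (T_red - D @ P)).min()  # highest feasible Tsup
    cop = COP_LOW if t_sup >= T_SWITCH else COP_HIGH
    return P.sum() * (1.0 + 1.0 / cop)

def sp_eir(placement, k):
    """Energy of the given placement over the brute-forced optimum."""
    e_opt = min(total_power(set(c)) for c in combinations(range(3), k))
    return total_power(set(placement)) / e_opt

print(sp_eir([0, 1], k=2))  # > 1: this placement forces the high cooling mode
```

Even in this toy, a placement that pushes the required Tsup below the mode-switch threshold pays the high-mode CoP penalty, which is what inflates its SP-EIR.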
Simulation setup
F and D were derived from a CFD model of the ASU HPCI data center (9.6 m × 8.4 m × 3.6 m room; 30 Dell 1955 chassis and 20 Dell 1855 chassis).
P was derived from power measurements of the computing equipment.
Variations of the spatial scheduling algorithms:
- cooling-oblivious (e.g., LRH): statically use the thermostat setting for 100% data center utilization
- m (e.g., LRHm): statically use the maximum thermostat setting for the given job trace
- d (e.g., LRHd): dynamically adjust the thermostat setting to match Tred
Simulated workloads: 5%, 40%, and 80% overall data center utilization.
Measuring thermal efficiency: LRH
Thermal efficiency: the least contribution to heat recirculation.
LRH: a metric of the thermal efficiency of a server [Tang et al., IEEE T-PDS '08], based on a two-layer rank calculation:
- Rank the servers as recipients of heat recirculation
- Rank the servers as contributors of heat recirculation
LRH weight of server S = Σ_recipients (recipient value × amount of heat from S to that recipient) (see the sketch below)
[Figure: the direction and amount of heat recirculation for two example servers A and B; the LRH rank of server B is worse than that of A]
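A sketch of one plausible reading of the two-layer LRH weight (the cited paper's exact recipient values may differ): here the recipient value of a server is taken as the total heat fraction it receives, and the weight of S sums recipient values scaled by the heat S sends to each recipient. The D matrix is the same hypothetical one as before.

```python
# Toy sketch of a two-layer LRH-style weight. Lower weight = thermally better
# server (it contributes less heat to the worst-placed recipients).
import numpy as np

D = np.array([[0.000, 0.005, 0.010],   # D[i][j]: fraction of server j's heat
              [0.005, 0.000, 0.008],   # arriving at server i's inlet
              [0.012, 0.006, 0.000]])

recipient_value = D.sum(axis=1)      # layer 1: how much heat each server receives
lrh_weight = D.T @ recipient_value   # layer 2: each server's weight as a contributor

for s, w in enumerate(lrh_weight):
    print(f"server {s}: LRH weight = {w:.6f}")
# An LRH scheduler fills servers in increasing order of this weight.
```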
SP-EIR as measured
Reference algorithm used as the optimal: minimize the product DP.
Observations:
- HTS always has the lowest SP-EIR in the simulations.
- Enhancing any algorithm with cooling awareness reduces its SP-EIR.
- MTDP has a lower SP-EIR than LRH although it is thermally oblivious: power-aware workload consolidation (MTDP) has a greater saving effect than thermal-aware job scheduling (LRH).
- Enhancing LRH with cooling awareness can bring its SP-EIR below MTDP's.
EDF-HTS: results on energy savings (%) with respect to other schemes

Idle servers kept on:
                     LRH (cooling-obl.)  LRHm    LRHd    HTS (dynamic)
FCFS-backfill  5%    12.41               -       -       10.65
FCFS-backfill  40%   5.70                -       -       3.27
FCFS-backfill  80%   3.30                -       -       0.85
EDF            5%    3.70                3.32    0.87    0.00 (reference point)
EDF            40%   1.85                1.49    0.83    0.00
EDF            80%   1.40                1.31    0.73    0.00

Idle servers turned off:
                     LRH (cooling-obl.)  LRHm    LRHd    MTDP    MTDPd   HTS (dynamic)
FCFS-backfill  5%    23.78               -       -       -       -       21.30
FCFS-backfill  40%   21.50               -       -       -       -       17.22
FCFS-backfill  80%   15.80               -       -       -       -       10.81
EDF            5%    12.53               12.30   5.17    8.17    -       0.00 (reference point)
EDF            40%   16.00               15.56   10.84   8.73    5.73    0.00
EDF            80%   9.03                8.86    0.66    3.47    0.47    0.00
Conclusions
Cooling awareness:
- Advantages: additional energy savings over thermal-aware (but cooling-oblivious) schemes; savings of up to 23% in the simulations
- Disadvantages: requires good knowledge of the heat recirculation pattern and of the performance of the cooling units
- Holistic management approaches that can configure the cooling units over the network can be cooling-aware
SP-EIR:
- SP-EIR depends on the given algorithm, job set, and data center
- The upper bound for any algorithm depends on the thermal and power characteristics of the data center
Implications of thermal awareness
First direction: introduce thermal awareness beyond just scheduling, into data center management:
- Thermal-aware power management
- Thermal-aware cooling management
- Cooling awareness enables the above
- "Model-driven Co-ordinated Management of Data Centers," Computer Networks, special issue on Managing Emerging Computing Environments, under review
Second direction: investigate the effect of technological trends on the savings from management:
- e.g., "Trends and Effects of Energy Proportionality on Server Provisioning in Internet Data Centers," HiPC 2010
Energy proportionality metrics
Energy-proportional computing: consume power in proportion to utilization (the ideal linear curve).
Metrics (both are computed in the sketch below):
- IPR, the idle-to-peak power ratio: IPR = Pidle / Ppeak
- LDR, the linear deviation ratio: LDR = max_u (P(u) − Linear(u)) / Linear(u), i.e., the ratio of the maximum offset of the power curve from the straight line between Pidle and Ppeak, over the value of that line at the point of maximum offset
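A minimal sketch of both metrics on a hypothetical measured power curve; the LDR is taken here as the signed relative deviation at the point of maximum absolute offset, so its sign distinguishes the cases discussed on the next slide.

```python
# Toy sketch of IPR and LDR for a hypothetical measured power curve P(u).
# Linear(u) interpolates between P_idle and P_peak.
import numpy as np

u = np.linspace(0.0, 1.0, 11)                 # utilization levels
p = np.array([180, 200, 215, 228, 240, 250,   # hypothetical measured
              259, 267, 274, 280, 285.0])     # power (W) at each level

def ipr(p):
    """Idle-to-peak power ratio: P_idle / P_peak."""
    return p[0] / p[-1]

def ldr(u, p):
    """Linear deviation ratio: signed max relative offset from the line."""
    linear = p[0] + (p[-1] - p[0]) * u        # straight line P_idle -> P_peak
    dev = (p - linear) / linear
    return dev[np.argmax(np.abs(dev))]        # signed, at max |offset|

print(f"IPR = {ipr(p):.2f}, LDR = {ldr(u, p):+.3f}")
```

For this concave sample curve the LDR comes out positive (power sits above the line at mid utilizations); a convex curve would yield a negative LDR.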
Historical trends of energy proportionality
Source data from SPECpower_ssj2008 published results: http://www.spec.org/power_ssj2008/results/
Discussion on diverging LDR
Negative LDR: the optimal performance-to-power ratio occurs below full utilization; ideal for stand-alone systems that are under-utilized.
Positive LDR: minimal energy increase for a considerable performance increase near full utilization; ideal for use in consolidation.
Near-zero LDR: energy efficiency is independent of the utilization level; the performance-to-power ratio is almost independent of the workload.
[Figures: three P-vs-U power curves illustrating the negative, positive, and near-zero LDR cases]
Conclusions
Energy proportionality will have different effects on energy savings, depending on the shape of the power curve:
- IPR → 0: the energy savings from power management (server provisioning) are expected to be minimal
- LDR >> 0: maximum energy efficiency may not be at 100% utilization; systems can be optimally efficient at lower utilizations