Towards reducing total energy consumption while constraining core temperatures Osman Sarood and Laxmikant Kale Parallel Programming Lab (PPL) University.

Presentation transcript:

Towards reducing total energy consumption while constraining core temperatures Osman Sarood and Laxmikant Kale Parallel Programming Lab (PPL) University of Illinois Urbana Champaign

Why Energy?
Data centers consumed 2% of the US energy budget in 2006: 59 billion kWh, costing $4.1 billion
The 3-year cost of powering and cooling servers exceeds the cost of purchasing the server hardware

Cooling Energy
Cooling accounts for 40-50% of total cost
Most data centers face hot spots, which force lower temperatures in machine rooms
Data center managers can save*:
– 4% (7%) for every degree F (C) increase
– 50% going from 68F (20C) to 80F (26.6C)
Earlier work:
– Reduced cooling energy by up to 63% with an 11% penalty in execution time
– Constrained core temperatures below 44C
*according to Mark Monroe of Sun Microsystems

Machine Energy
Accounts for 50%-60% of total cost
Earlier work: limited machine energy savings
Is it possible to reduce execution time penalty and machine energy while constraining core temperatures?

Core Temperatures
All results are from a quad-core machine with a Sandy Bridge Core i7-2600 processor

Frequency, Time and Power for NPB-FT
Reducing frequency lowers core power and temperature with little impact on execution time

Temperature Control - Naïve Scheme
Monitor temperature periodically
– Above threshold: decrease frequency
– Below threshold: increase frequency
Use DVFS to change processor voltage/frequency at runtime
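The threshold rule on this slide can be sketched as a small control loop. This is only an illustration, not the authors' implementation: the function names, the frequency table, the sample temperatures, and the choice to move one level per check are all assumptions, and the actual temperature reads and DVFS writes are left out.

```python
from typing import List, Sequence


def naive_step(temp_c: float, threshold_c: float, level: int, n_levels: int) -> int:
    """Return the next frequency level index (0 = lowest frequency).

    Assumption: the frequency moves by one level per temperature check.
    """
    if temp_c > threshold_c:
        return max(level - 1, 0)             # too hot: step down, stop at the lowest level
    if temp_c < threshold_c:
        return min(level + 1, n_levels - 1)  # cool enough: step up, stop at the highest level
    return level                             # exactly at threshold: leave unchanged


def run_naive(temps_c: Sequence[float], threshold_c: float,
              freqs_mhz: Sequence[int]) -> List[int]:
    """Apply the rule to a series of periodic temperature samples."""
    level = len(freqs_mhz) - 1  # start at the maximum frequency
    chosen = []
    for temp in temps_c:
        level = naive_step(temp, threshold_c, level, len(freqs_mhz))
        chosen.append(freqs_mhz[level])
    return chosen


# Hypothetical frequency table (MHz) and temperature samples (C).
FREQS_MHZ = [1600, 2000, 2400, 2800, 3400]
print(run_naive([50, 56, 58, 53, 54], 54, FREQS_MHZ))
```

The rule treats the whole application uniformly, so every part of an iteration is slowed down whenever the core runs hot.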

Naïve Scheme – NPB-FT
Energy savings, but the timing penalty is high!

Can we do something to reduce the execution time penalty?

Execution Blocks
Divide each iteration into execution blocks (EBs)
– Different sections based on sensitivity to frequency
– Manually done using HW performance counters
Profile each EB for different frequency levels
– Wall clock time (system clock)
– Core power consumption (fast on-chip MSRs)
[Figure: one iteration divided into EB 1, EB 2 and EB 3]
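A minimal sketch of the per-EB profiling step, assuming a caller-supplied energy reader and frequency setter. The names `profile_eb`, `profile_all`, `read_energy_j` and `set_freq` are hypothetical; on real hardware the reader would wrap the on-chip energy counters mentioned above, which is not shown here, and counter wraparound is ignored.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# (EB name, frequency in MHz) -> (wall time in seconds, average power in watts)
Profile = Dict[Tuple[str, int], Tuple[float, float]]


def profile_eb(run_eb: Callable[[], None],
               read_energy_j: Callable[[], float],
               clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Run one EB once and return (wall_time_s, avg_power_w).

    read_energy_j is a hypothetical callable returning a cumulative
    energy reading in joules.
    """
    e0, t0 = read_energy_j(), clock()
    run_eb()
    t1, e1 = clock(), read_energy_j()
    elapsed = t1 - t0
    power = (e1 - e0) / elapsed if elapsed > 0 else 0.0
    return elapsed, power


def profile_all(ebs: Dict[str, Callable[[], None]],
                freqs_mhz: Sequence[int],
                set_freq: Callable[[int], None],
                read_energy_j: Callable[[], float],
                clock: Callable[[], float] = time.perf_counter) -> Profile:
    """Profile every EB at every frequency level."""
    table: Profile = {}
    for freq in freqs_mhz:
        set_freq(freq)  # hypothetical DVFS hook supplied by the caller
        for name, run_eb in ebs.items():
            table[(name, freq)] = profile_eb(run_eb, read_energy_j, clock)
    return table
```

Passing the clock and energy reader as parameters keeps the sketch testable without access to hardware counters.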

Execution Blocks (EBs) (NPB-IS)
EB1 is much more sensitive to frequency than EB2 while drawing the same power
EB2 wastes a lot of energy while running at max frequency!

EBTuner
Profile each EB for all frequency values
– Can be completed in milliseconds using energy MSRs of Sandy Bridge
Temperature > Threshold
– Pick the EB that results in minimum timing penalty
– Change its frequency down one notch
Temperature < Threshold
– Pick the EB that results in maximum time reduction
– Change its frequency up one notch
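One way to read the selection rule on this slide is sketched below. It is an illustration under assumptions, not the authors' code: `ebtuner_step`, the per-EB level dictionary, and the profiled time table (indexed by frequency level, 0 = lowest) are hypothetical names and layouts, and the sample timings are made up.

```python
from typing import Dict, List


def ebtuner_step(temp_c: float, threshold_c: float,
                 levels: Dict[str, int],
                 time_s: Dict[str, List[float]]) -> Dict[str, int]:
    """Return new per-EB frequency levels after one temperature check.

    levels[eb] is the current frequency level of an EB (0 = lowest).
    time_s[eb][k] is the profiled wall time of that EB at level k.
    """
    new_levels = dict(levels)
    if temp_c > threshold_c:
        # Candidates that can still go down one notch.
        cands = [eb for eb, lvl in levels.items() if lvl > 0]
        if cands:
            # Minimum timing penalty: smallest increase in time when slowed.
            eb = min(cands, key=lambda e: time_s[e][levels[e] - 1] - time_s[e][levels[e]])
            new_levels[eb] -= 1
    elif temp_c < threshold_c:
        # Candidates that can still go up one notch.
        cands = [eb for eb, lvl in levels.items() if lvl < len(time_s[eb]) - 1]
        if cands:
            # Maximum time reduction: largest decrease in time when sped up.
            eb = max(cands, key=lambda e: time_s[e][levels[e]] - time_s[e][levels[e] + 1])
            new_levels[eb] += 1
    return new_levels


# Made-up profile: EB1 is frequency-sensitive, EB2 is not.
TIME_S = {"EB1": [4.0, 3.0, 2.0], "EB2": [1.2, 1.1, 1.0]}
print(ebtuner_step(60, 54, {"EB1": 2, "EB2": 2}, TIME_S))  # EB2 is slowed first
```

With this made-up profile the frequency-insensitive block absorbs the slowdown, which is the behaviour the slide contrasts with the Naïve scheme.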

Problem formulation
Minimize:
subject to:
Heuristic for best EB:
– Difference in power after reducing frequency
– Difference in time after reducing frequency
Maximize ratio for temperature > T_max
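The heuristic named on this slide can be written out as follows. This is one plausible reading only: the slide's own expressions are not preserved in the transcript, so every symbol other than T_max is an assumption. Here f_i is the current frequency of EB i, f_i^- is the next lower frequency level, P_i is the profiled power and t_i the profiled time of EB i.

```latex
\[
  i^{*} \;=\; \arg\max_{i}\; \frac{\Delta P_i}{\Delta t_i}
  \quad \text{when } T > T_{\max},
  \qquad
  \Delta P_i = P_i(f_i) - P_i(f_i^{-}),
  \quad
  \Delta t_i = t_i(f_i^{-}) - t_i(f_i).
\]
```

Under this reading, the EB chosen for a one-notch slowdown is the one that gives up the most power per unit of added execution time.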

EBTuner: Framework
[Diagram: an iteration of EB 1 and EB 2 with Profiler, Frequency Control, Temp Check and EBTuner components]
Profiler: very fast using MSR energy counters
Frequency Control: runs each EB at the specified frequency
Temp Check: determines if the temperature crosses threshold
EBTuner: specifies new frequency for each EB

Evaluation
On a single quad-core machine
Metrics
– Ability to constrain core temperature
– Timing penalty
– Reduction in energy consumption

Temperature Control
Using EBTuner - Temperature Threshold 54C

Timing penalty
Increase in execution time compared to runs with no temperature control and all cores working at maximum possible frequency
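Written as a formula, the definition above might look like the following. The symbols are assumptions introduced for illustration: t_ctrl is the execution time with temperature control enabled and t_base is the time with no temperature control and all cores at maximum frequency. Expressing it as a percentage matches how the penalty is quoted elsewhere in the slides.

```latex
\[
  \text{timing penalty}\,(\%) \;=\; 100 \times \frac{t_{\mathrm{ctrl}} - t_{\mathrm{base}}}{t_{\mathrm{base}}}
\]
```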

Steady State Frequencies
Using EBTuner - Temperature Threshold 54C
[Table: rows MFLOPS, L1-L2 (MB/sec), L2-L3 (MB/sec) and Penalty (%) for NPB-LU and NPB-FT; surviving values: 66 ~ 400MHz]

Variance in sensitivity to frequency for different parts of an application Correlation

EBTuner vs. Naïve (NPB-IS)
EBTuner increases CPU utilization

Reduction in machine energy

Energy Time tradeoff
[Plot annotations: 0.19, 0.55, 0.53]

Summary and Future Work
Our scheme is consistently better than the Naïve version in terms of reducing timing penalty and machine energy consumption
EBTuner was able to reduce machine energy by 17% with <1% timing penalty while constraining core temperature below 60C
Future work: combine this work with earlier work that saves cooling energy consumption

Thank You
Questions?