Akhil Langer, Harshit Dokania, Laxmikant Kale, Udatta Palekar*
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
*Department of Business Administration, University of Illinois at Urbana-Champaign
29th May 2015
The Eleventh Workshop on High-Performance, Power-Aware Computing (HPPAC), Hyderabad, India

Major Challenge to Achieve Exascale
Exascale in 20 MW!
[Figure: power consumption of Top500 systems]

Data Center Power
How is the power demand of a data center calculated? Using Thermal Design Power (TDP)! However, TDP is rarely reached in practice.
Constraining CPU/memory power: Intel Sandy Bridge provides the Running Average Power Limit (RAPL) interface to measure and set CPU/memory power.

Constraining CPU/Memory Power
Intel Sandy Bridge provides the Running Average Power Limit (RAPL) interface to measure and set CPU/memory power. Capping is achieved using a combination of P-states and clock throttling:
- Performance states (P-states) correspond to the processor's voltage and frequency, e.g. P0 – 3 GHz, P1 – 2.66 GHz, P2 – 2.33 GHz, P3 – 2 GHz
- Clock throttling – the processor is forced to be idle
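For concreteness (an addition, not from the slides): on Linux, the RAPL counters and limits are exposed through the powercap sysfs tree. A minimal sketch, assuming the package domain sits at intel-rapl:0 and the cap write runs as root:

```cpp
// Minimal sketch (illustrative, not the tool used in this work): sample the
// RAPL package energy counter and set a CPU power cap via Linux powercap.
#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

static const std::string kDomain = "/sys/class/powercap/intel-rapl/intel-rapl:0/";

long long readCounter(const std::string& path) {
  std::ifstream f(path);
  long long v = -1;
  f >> v;  // energy_uj reports cumulative energy in microjoules
  return v;
}

int main() {
  // Sample the energy counter twice to estimate average package power.
  long long e0 = readCounter(kDomain + "energy_uj");
  std::this_thread::sleep_for(std::chrono::seconds(1));
  long long e1 = readCounter(kDomain + "energy_uj");
  std::cout << "avg package power: " << (e1 - e0) / 1e6 << " W\n";

  // Set the long-term (constraint 0) limit to 43 W; the file takes microwatts.
  std::ofstream cap(kDomain + "constraint_0_power_limit_uw");
  if (cap) cap << 43 * 1000000;
  return 0;
}
```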

Constraining CPU/Memory Power
Solution to the data center power problem:
- Constrain the power consumption of nodes
- Overprovisioning – use more nodes than a conventional data center for the same power budget
Intel Sandy Bridge provides the RAPL interface to measure and set CPU/memory power.

Application Performance with Power
Performance of LULESH at different configurations (n × p_c, p_m), e.g. (20×32, 10) and (12×44, 18), where p_c is the CPU power cap and p_m is the memory power cap.
Application performance does not improve proportionately with an increase in power cap. Instead, run on a larger number of nodes, each capped at a lower power level.
[CLUSTER 13] Optimizing Power Allocation to CPU and Memory Subsystems in Overprovisioned HPC Systems. Sarood et al.

PARM: Power Aware Resource Manager
Data center capabilities:
- Power capping ability
- Overprovisioning
Goal: maximizing data center performance under a strict power budget.

PARM architecture:
- Triggers: a job arrives, or a job ends/terminates
- Scheduler: schedule jobs (LP), update the queue
- Execution framework: launch jobs, shrink/expand running jobs, ensure the power cap
- Profiler: a strong-scaling power-aware model populates the job characteristics database
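To make the scheduling step concrete, here is a self-contained toy in the spirit of that loop: pick a (nodes, CPU cap) configuration for each queued job under a cluster-wide power budget. PARM itself solves an LP with moldability and malleability; this greedy pass, the 38 W base node power, and all profile numbers are invented for illustration.

```cpp
// Toy version of the scheduling decision: greedily give each queued job the
// feasible profiled configuration with the highest speedup. Illustrative only.
#include <cstdio>
#include <vector>

struct Config { int nodes; int capW; double speedup; };  // from the profiler DB
struct Job    { const char* name; std::vector<Config> configs; };

int main() {
  const double budgetW = 2500, baseNodeW = 38;  // assumed non-CPU power per node
  std::vector<Job> queue = {
      {"AMR",    {{8, 55, 1.00}, {12, 55, 1.29}, {16, 43, 1.30}}},
      {"LeanMD", {{8, 55, 1.00}, {12, 46, 1.25}}}};
  double usedW = 0;
  for (const auto& job : queue) {
    const Config* best = nullptr;
    for (const auto& c : job.configs) {
      double p = c.nodes * (c.capW + baseNodeW);  // node power = CPU cap + base
      if (usedW + p <= budgetW && (!best || c.speedup > best->speedup)) best = &c;
    }
    if (best) {
      usedW += best->nodes * (best->capW + baseNodeW);
      std::printf("%s -> %d nodes @ %d W CPU cap\n", job.name, best->nodes, best->capW);
    }
  }
  return 0;
}
```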

PARM: Power Aware Resource Manager – Performance Results
- noMM: without malleability and moldability
- noSE: with moldability but no malleability (no shrink/expand)
- wSE: with moldability and malleability
1.7x improvement in throughput.
Benchmarks: LULESH, AMR, LeanMD, Jacobi and Wave2D on a 38-node Intel Sandy Bridge cluster with a 3000 W power budget.
[SC 14] Maximizing Throughput of Overprovisioned Data Center Under a Strict Power Budget. Sarood et al.

Energy Consumption Analysis
Although power is the critical constraint, high energy consumption can lead to excessive electricity costs: 20 MW at $0.07/kWh is roughly USD 1M per month.
In the future, users may be charged in energy units instead of core-hours!
Selecting the right configuration is important for a desirable energy-vs-time tradeoff.
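The monthly figure checks out, assuming a 720-hour (30-day) month:

$$20\,000\ \mathrm{kW} \times 720\ \mathrm{h} \times \$0.07/\mathrm{kWh} \approx \$1{,}008{,}000 \approx \mathrm{USD\ 1M\ per\ month}$$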

Computational Testbed
- 38-node Dell PowerEdge R620 cluster
- Each node is an Intel Xeon (Sandy Bridge) server with 6 physical cores running at 2 GHz, 2-way SMT, and 16 GB of RAM
- RAPL is used for power capping and measurement
- CPU power caps: 31, 34, 37, 40, 43, 46, 49, 52, 55 W. What happens when the CPU power cap is below 30 W?
- TDP value of a node = 168 W

Applications
- Wave – finite difference scheme over a 2D mesh
- LULESH – shock hydrodynamics application
- Adaptive Mesh Refinement (AMR) – oct-tree based structured adaptive mesh refinement
- LeanMD – molecular dynamics simulation based on the Lennard-Jones potential

Impact of Power Capping on Performance and CPU Frequency
[Figure]

Terminology
- Configuration (n, p), where n is the number of nodes and p is the CPU power cap; n ∈ {4, 8, 12, 16}, p ∈ {31, 34, 37, 40, 43, 46, 49, 52, 55} W
- Operation settings:
  - Conventional Data Center (CDC): nodes allocated TDP power
  - Performance-optimized Overprovisioned Data Center (pODC)
  - Energy- and time-optimized Overprovisioned Data Center (iODC)

Results
Power budget = 1450 W, application: AMR
- Only 8 nodes can be powered in the CDC
- pODC with configuration (16, 43) gives 30% better performance but also 22% more energy consumption
- iODC with configuration (12, 55) gives 29% better performance with just 4% more energy consumption
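The CDC node count follows directly from the 168 W per-node TDP on the testbed slide:

$$n_{\mathrm{CDC}} = \left\lfloor \frac{1450\ \mathrm{W}}{168\ \mathrm{W/node}} \right\rfloor = \lfloor 8.6 \rfloor = 8\ \text{nodes}$$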

Results
Power budget = 1200 W, application: LeanMD
- pODC at (12, 55)
- iODC at (12, 46) gives 7.7% energy savings with only a 1.4% penalty in execution time

Results
Power budget = 1500 W, application: LULESH
- pODC at (16, 43)
- iODC at (12, 52) gives 15.3% energy savings with only a 2.8% penalty in execution time

Results
Power budget = 1550 W, application: Wave
- pODC at (16, 46)
- iODC at (12, 55) gives 12% energy savings with only a 6% increase in execution time

Results
Note: the configuration choice is currently limited to the profiled samples; better configurations could be obtained with performance modeling that predicts performance and energy for any configuration, as sketched below.
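One way such a model could work (a sketch under an assumed strong-scaling form, not the authors' fitted model): predict t(n, p) = W / (n · f(p)) from the profiled samples, then scan arbitrary configurations, including cap values that were never sampled:

```cpp
// Illustrative model-based configuration search. The model form and every
// constant here are assumptions, not fitted values from this work.
#include <cstdio>
#include <cmath>

// Assumed effective per-node speed under a CPU cap p (watts): speed rises
// with the cap and saturates near the uncapped speed.
double f(double p) { return 1.0 - std::exp(-p / 25.0); }

int main() {
  const double W = 4000;       // total work (arbitrary units)
  const double baseW = 38;     // assumed non-CPU power per node (W)
  const double tBest = W / (16 * f(55));  // fastest config in the profiled range
  double bestE = 1e30; int bestN = 0, bestP = 0;
  for (int n = 4; n <= 16; n += 4) {
    for (int p = 31; p <= 55; ++p) {     // caps between sampled points too
      double t = W / (n * f(p));         // predicted execution time
      double e = n * (p + baseW) * t;    // predicted energy
      // Minimize energy while staying within 10% of the best achievable time.
      if (t <= 1.1 * tBest && e < bestE) { bestE = e; bestN = n; bestP = p; }
    }
  }
  std::printf("model picks (n=%d, p=%dW), predicted energy %.0f\n",
              bestN, bestP, bestE);
  return 0;
}
```

With these assumed constants the scan settles on (16 nodes, 42 W), a cap that lies between the sampled 40 W and 43 W levels, which is exactly the kind of configuration that profiling alone cannot reach.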

Future Work
- Automate the selection of configurations for the iODC using performance modeling and energy-vs-time tradeoff metrics
- Incorporate CPU temperature and data center cooling energy consumption into the analysis

Takeaways
- Overprovisioned data centers can deliver significant performance improvements under a strict power budget
- However, energy consumption can be excessive in a purely performance-optimized overprovisioned data center
- Intelligent configuration selection can give significant energy savings with minimal impact on performance

Publications
[PMAM 15] Energy-efficient Computing for HPC Workloads on Heterogeneous Many-core Chips. Langer et al.
[SC 14] Maximizing Throughput of Overprovisioned Data Center Under a Strict Power Budget. Sarood et al.
[TOPC 14] Power Management of Extreme-scale Networks with On/Off Links in Runtime Systems. Totoni et al.
[SC 14] Using an Adaptive Runtime System to Reconfigure the Cache Hierarchy. Totoni et al.
[SC 13] A Cool Way of Improving the Reliability of HPC Machines. Sarood et al.
[CLUSTER 13] Optimizing Power Allocation to CPU and Memory Subsystems in Overprovisioned HPC Systems. Sarood et al.
[CLUSTER 13] Thermal Aware Automated Load Balancing for HPC Applications. Menon et al.
[IEEE TC 12] Cool Load Balancing for High Performance Computing Data Centers. Sarood et al.
[SC 12] A Cool Load Balancer for Parallel Applications. Sarood et al.
[CLUSTER 12] Meta-Balancer: Automated Load Balancing Invocation Based on Application Characteristics. Menon et al.
