Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.

Slides:



Advertisements
Similar presentations
International Symposium on Low Power Electronics and Design Qing Xie, Mohammad Javad Dousti, and Massoud Pedram University of Southern California ISLPED.
Advertisements

Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
Introduction and Background  Power: A Critical Dimension for Embedded Systems  Dynamic power dominates; static /leakage power increases faster  Common.
1 MemScale: Active Low-Power Modes for Main Memory Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini Rutgers University.
International Symposium on Low Power Electronics and Design Dynamic Workload Characterization for Power Efficient Scheduling on CMP Systems 1 Gaurav Dhiman,
Improving Transmission Asset Utilization through Advanced Mathematics and Computing 1 Henry Huang, Ruisheng Diao, Shuangshuang Jin, Yuri Makarov Pacific.
Reducing Read Latency of Phase Change Memory via Early Read and Turbo Read Feb 9 th 2015 HPCA-21 San Francisco, USA Prashant Nair - Georgia Tech Chiachen.
ECE 510 Brendan Crowley Paper Review October 31, 2006.
Promoting Energy Efficiency In Buildings in Developing countries.
Energy Model for Multiprocess Applications Texas Tech University.
Energy Audit- a small introduction A presentation by Pune Power Development Pvt. Ltd.
Slide 1 U.Va. Department of Computer Science LAVA Architecture-Level Power Modeling N. Kim, T. Austin, T. Mudge, and D. Grunwald. “Challenges for Architectural.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
University of Karlsruhe, System Architecture Group Balancing Power Consumption in Multiprocessor Systems Andreas Merkel Frank Bellosa System Architecture.
Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer Sciences Department University of Wisconsin, Madison,
National Renewable Energy Laboratory Innovation for Our Energy Future * NREL July 5, 2011 Tradeoffs and Synergies between CSP and PV at High Grid Penetration.
Authors: Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn [Systems Technology Lab, Intel Corporation] Source: 2007 ACM/IEEE conference on Supercomputing.
1 VLSI and Computer Architecture Trends ECE 25 Fall 2012.
Spherical Code Construction Vojin Šenk Ivan Stanojević Mladen Kovačević University of Novi.
Computational Sprinting on a Hardware/Software Testbed Arun Raghavan *, Laurel Emurian *, Lei Shao #, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Low-Power Wireless Sensor Networks
Integrating Fine-Grained Application Adaptation with Global Adaptation for Saving Energy Vibhore Vardhan, Daniel G. Sachs, Wanghong Yuan, Albert F. Harris,
University of Michigan Electrical Engineering and Computer Science 1 Dynamic Acceleration of Multithreaded Program Critical Paths in Near-Threshold Systems.
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
CSE 691: Energy-Efficient Computing Lecture 7 SMARTS: custom-made systems Anshul Gandhi 1307, CS building
Towards reducing total energy consumption while constraining core temperatures Osman Sarood and Laxmikant Kale Parallel Programming Lab (PPL) University.
Temperature Aware Load Balancing For Parallel Applications Osman Sarood Parallel Programming Lab (PPL) University of Illinois Urbana Champaign.
Energy Savings with DVFS Reduction in CPU power Extra system power.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
1 Tuning Garbage Collection in an Embedded Java Environment G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin Microsystems Design Lab The.
80-Tile Teraflop Network-On- Chip 1. Contents Overview of the chip Architecture ▫Computational Core ▫Mesh Network Router ▫Power save features Performance.
CISC 879 : Advanced Parallel Programming Vaibhav Naidu Dept. of Computer & Information Sciences University of Delaware Importance of Single-core in Multicore.
SIMPLE: Stable Increased Throughput Multi-hop Link Efficient Protocol For WBANs Qaisar Nadeem Department of Electrical Engineering Comsats Institute of.
Towards Dynamic Green-Sizing for Database Servers Mustafa Korkmaz, Alexey Karyakin, Martin Karsten, Kenneth Salem University of Waterloo.
Embedded System Lab. 김해천 The TURBO Diaries: Application-controlled Frequency Scaling Explained.
VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.
Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications.
Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning Gaurav DhimanTajana Simunic Rosing Department of Computer Science and.
Issue Logic and Power/Performance Tradeoffs Edwin Olson Andrew Menard December 5, 2000.
Understanding Performance, Power and Energy Behavior in Asymmetric Processors Nagesh B Lakshminarayana Hyesoon Kim School of Computer Science Georgia Institute.
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
Thermal-aware Phase-based Tuning of Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing This work was supported.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
1 Thermal Management of Datacenter Qinghui Tang. 2 Preliminaries What is data center What is thermal management Why does Intel Care Why Computer Science.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
Computational Sprinting Arun Raghavan *, Yixin Luo +, Anuj Chandawalla +, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K. Martin.
University of Michigan Electrical Engineering and Computer Science 1 Embracing Heterogeneity with Dynamic Core Boosting Hyoun Kyu Cho and Scott Mahlke.
Ensieea Rizwani An energy-efficient management mechanism for large-scale server clusters By: Zhenghua Xue, Dong, Ma, Fan, Mei 1.
-1- UC San Diego / VLSI CAD Laboratory Optimal Reliability-Constrained Overdrive Frequency Selection in Multicore Systems Andrew B. Kahng and Siddhartha.
CS203 – Advanced Computer Architecture
1 Hardware Reliability Margining for the Dark Silicon Era Liangzhen Lai and Puneet Gupta Department of Electrical Engineering University of California,
The CRISP Performance Model for Dynamic Voltage and Frequency Scaling in a GPGPU Rajib Nath, Dean Tullsen 1 Micro 2015.
Overview Motivation (Kevin) Thermal issues (Kevin)
Crusoe Processor Seminar Guide: By: - Prof. H. S. Kulkarni Ashish.
CS203 – Advanced Computer Architecture
Seth Pugsley, Jeffrey Jestes,
Adaptive Cache Partitioning on a Composite Core
Green cloud computing 2 Cs 595 Lecture 15.
Intel’s Core i7 Processor
Lecture 2: Performance Today’s topics: Technology wrap-up
Adaptive Single-Chip Multiprocessing
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Goldratt Research Labs
The University of Adelaide, School of Computer Science
Chih-Hsun Chou Daniel Wong Laxmi N. Bhuyan
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K. Martin * University of Pennsylvania, Computer and Information Science * University of Michigan, Electrical Eng. and Computer Science + University of Michigan, Mechanical Engineering #

Overview 2 Computational Sprinting What is Computational Sprinting? Summary of feasibility study [HPCA’12] Simulation results: 10x responsiveness improvements in short bursts Same dynamic energy consumption Preliminary results with sprinting on prototype-proxy Characterize real thermal/energy/performance behavior Sprinting can improve energy efficiency due to race to halt

Computational Sprinting and Dark Silicon Problem: Dark Silicon/Utilization Wall Cannot use all transistors on chip all the time Cooling constraints limit mobile systems One approach: Use few transistors for long durations Specialized functional units [Accelerators, GreenDroid] Targeted towards sustained compute, e.g. media playback Our approach: Use many transistors for short durations Computational Sprinting by activating many “dark cores” Unsustainable power for short, intense bursts of computation Responsiveness for bursty/interactive applications 3 Computational Sprinting

Sprinting v/s Sustainable Operation 4 Computational Sprinting power time T max temperature time

Why Not Use All Cores? 5 Computational Sprinting T max temperature power time

Why Not Use All Cores? 6 Computational Sprinting temperature T max power N cores time Effect of thermal capacitance

Parallel Computational Sprinting 7 Computational Sprinting T max temperature power time

Why Not Use All Cores? 8 Computational Sprinting temperature T max power N cores time

Parallel Computational Sprinting 9 Computational Sprinting temperature T max power N cores 1 core time

Parallel Computational Sprinting 10 Computational Sprinting temperature T max power N cores 1 core time

Parallel Computational Sprinting 11 Computational Sprinting power temperature T max N cores 1 core time State of the art: Turbo Boost 2.0 exceeds sustainable power with DVFS (~25% for 25s) Our goal: 10x, ~1s

Sprinting Challenges and Opportunities Thermal challenges How to extend sprint duration and intensity? Latent heat from phase change material close to the die Electrical challenges How to supply peak currents? Ultracapacitor/battery hybrid How to ensure power stability? Ramped activation (~100μs) Architectural challenges How to control sprints? Thermal resource management How do applications benefit from sprinting? 10x responsiveness improvements; small energy overhead 12 Computational Sprinting Die PCM

Moving Beyond Simulation: Can we make a real system sprint? 13

14 Characterize real system energy/performance How might sprinting behave on real system?

Multiple Cores: Energy and Performance 15 Computational Sprinting 1 core2 cores4 cores normalized energy 1 core2 cores4 cores normalized speedup 1 core2 cores4 cores power Power increases with core count But < 2x for 4 cores Why? background power Energy consumption improves due to early completion Race to halt 4x 2x

frequency DVFS: Energy and Performance 16 Computational Sprinting normalized speedup power Power increases by ~3x for 2x speedup frequency normalized energy frequency FrequencyVoltage 1.6 GHz0.95V 3.2 GHz1.25V Min frequency is not always min energy 2x 3x

Energy/Performance/Power Tradeoffs 17 Computational Sprinting normalized speedup normalized energy normalized speedup power 4 cores, 1.6 GHz 4 cores, 3.2 GHz 1 core, 1.6 GHz 1 core, 3.2 GHz 1 core, 1.6 GHz 1 core, 3.2 GHz 4 cores, 1.6 GHz 4 cores, 3.2 GHz What if this is the maximum sustainable power? Sprinting enables higher responsiveness and greater energy efficiency Max speedup Min energy Min power

Emulating Sprinting with Limited Energy Budget Emulate effect of being TDP constrained Not currently physically limiting cooling constraints Estimate thermal capacity for sprinting based on energy Sprint operation: Execute with all cores at maximum frequency Monitor energy consumption via MSR Terminate sprinting when energy capacity is exceeded Migrate threads to single core Shutdown additional cores Lower frequency to minimum 18 Computational Sprinting

Effect of Thermal Capacity 19 Computational Sprinting normalized energy normalized speedup Thermal Capacity (J) Speedup sensitive to amount of computation within sprint Longer sprints enable greater energy saving Energy overhead from extremely short sprints Caveat: Mobile system background power characteristics likely differ

Conclusions and Work in Progress 20 Computational Sprinting Sprinting utilizes dark silicon for energy improvement Race to halt Lower system idle power  more energy saved Work in Progress: constrain cooling system Approximate thermal characteristics of mobile systems Correlate temperature with measured energy Use presented results to guide sprint implementation

21 Computational Sprinting