Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay.



(2) Technology Scaling
- 30% scaling down in dimensions → doubles transistor density
- Power per transistor: V_dd scaling → lower power
- Transistor delay = C_gate · V_dd / I_SAT: C_gate, V_dd scaling → lower delay
[Figure labels: GATE, SOURCE, DRAIN, BODY, t_ox, L]

(3) Moore’s Law
[Figure source: wikipedia.org]
- Performance scaled with number of transistors
- Dennard scaling*: power scaled with feature size
Goal: Sustain Performance Scaling
*R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid-State Circuits, vol. SC-9, no. 5, pp , Oct

(4) Parallelism and Power
[Figures: IBM Power5 (source: IBM); AMD Trinity (source: forwardthinking.pcmag.com)]
- How much of the chip area is devoted to compute?
- Run many cores slower. Why does this reduce power?

(5) The Power Wall
Power per transistor scales with frequency but also scales with V_dd
- Lower V_dd can be compensated for with increased pipelining to keep throughput constant
- Power per transistor is not the same as power per area → power density is the problem!
- Multiple units can be run at lower frequencies to keep throughput constant, while saving power

(6) What is the Problem?
[Figure: based on scaling using Pentium-class cores. Mukhopadhyay and Yalamanchili (2009)]
- While Moore’s Law continues, scaling phenomena have changed
- Power densities are increasing with each generation

(7) ITRS Roadmap for Logic Devices
From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008

Power Management Basics Lecture notes S. Yalamanchili and S. Mukhopadhyay

(9) What are my Options?
1. Better technology
   - Manufacturing
   - Better devices (FinFET)
   - New devices: non-CMOS? This is the future
2. Be more efficient: activity management
   - Clock gating: dynamic energy/power
   - Power gating: static energy/power
   - Power state management: both
3. Improved architecture
   - Simpler pipelines
4. Parallelism
Not this course

(10) Activity Management
Clock Gating
- Turn off clock to a block of logic
- Eliminate unnecessary transitions/activity
- Clock distribution power
Power Gating
- Turn off power to a block of logic, e.g., core
- No leakage
[Figure labels: Combinational Logic, clk, cond, input; Core 0, Core 1, V_dd, power gate transistor]

(11) Multiple Voltage Frequency Domains
Intel Sandy Bridge Processor
- Cores and ring in one DVFS domain
- Graphics unit in another DVFS domain
- Cores and portion of cache can be gated off
From E. Rotem et al., HotChips 2011

(12) Processor Power States
Performance States (P-states)
- Operate at different voltages/frequencies
  - Recall delay-voltage relationship
- Lower voltage → lower leakage
- Lower frequency → lower power (not the same as energy!)
- Lower frequency → longer execution time
Idle States (C-states)
- Sleep states
- Differ in how much state is saved
SW or HW managed transitions between states!

(13) Example of P-states
- Software Managed Power States
- Changing Power States is not free
AMD Trinity A APU: 100W TDP
[Table columns: CPU P-state, Voltage (V), Freq (MHz). Rows: HW Only (Boost): Pb, Pb; SW-Visible: P, P, P, P, P]

(14) Example of P-states From:

(15) Management Knobs
Each core can be in any one of multiple states
- How do I decide what state to set each core to?
  - Who decides? HW? SW?
- How do I decide when I can turn off a core?
- What am I saving?
  - Static energy or dynamic energy?

(16) Power Management
Software controlled power management
- Optimize power and/or energy
- Orchestrated by the operating system or application libraries
- Industry standard interfaces for power management
  - Advanced Configuration and Power Interface (ACPI)
Hardware power management
- Optimized power/energy
- Failsafe operation, e.g., protect against thermal emergencies

(17) Power Management 3.0
- Convert thermal headroom to higher performance through boost
- Performance and energy efficiency depend on effective utilization of power and thermal headroom
[Figure labels: Die Temperature, Time, Thermal Headroom, Max Die Temp; Performance (Instructions/cycle), Time, HW Boost states, SW visible states; CPU DVFS-state: HW Only (Boost) Pb0, Pb1; SW-Visible P0, P1, P, Pmin]

(18) Boosting
Intel Sandy Bridge
- Exploit package physics
  - Temperature changes on the order of milliseconds
- Use the thermal headroom
[Figure labels: Max Power, TDP Power; low power: build up thermal credits; turbo boost region; 10s of seconds]
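How long a boost can last under a given amount of thermal headroom can be sketched with a first-order thermal RC model. This is an illustrative assumption, not the controller used in Sandy Bridge: the function name, the thermal resistance, the time constant, and all temperatures and powers below are hypothetical values chosen only to show the shape of the calculation.

```python
import math


def boost_duration(t_start_c, t_max_c, t_ambient_c, boost_power_w,
                   r_th_c_per_w, tau_s):
    """Seconds until the die reaches t_max_c while dissipating boost_power_w.

    Assumed model (hypothetical): T(t) = T_ss + (T_start - T_ss) * exp(-t / tau),
    with steady-state temperature T_ss = T_ambient + P * R_th.
    """
    t_steady = t_ambient_c + boost_power_w * r_th_c_per_w
    if t_steady <= t_max_c:
        return math.inf          # boost power is sustainable indefinitely
    if t_start_c >= t_max_c:
        return 0.0               # no thermal headroom left
    return tau_s * math.log((t_steady - t_start_c) / (t_steady - t_max_c))


# Hypothetical package: 40 C ambient, 0.5 C/W, 10 s time constant, 95 C limit.
cool_start = boost_duration(60.0, 95.0, 40.0, 130.0, 0.5, 10.0)
warm_start = boost_duration(85.0, 95.0, 40.0, 130.0, 0.5, 10.0)
sustainable = boost_duration(85.0, 95.0, 40.0, 100.0, 0.5, 10.0)

print(f"boost from a cool die: {cool_start:.2f} s")
print(f"boost from a warm die: {warm_start:.2f} s")
print(f"boost at sustainable power: {sustainable}")
```

Under these assumptions a die that has spent time at low power starts cooler and can hold the boost longer, which is one way to read "build up thermal credits".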

(19) Power Gating
Intel Sandy Bridge Processor
- Turn off components that are not being used
  - Lose all state information
- Costs of powering down
- Costs of powering up
- Smart shutdown
  - Models to guide decisions
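One simple model for guiding a shutdown decision is an energy break-even test: gating pays off only if the leakage energy saved while idle exceeds the energy spent powering down and back up. The sketch below assumes that model; the function names and the numbers are hypothetical and are not taken from the lecture.

```python
def break_even_time(transition_energy_j, leakage_power_w):
    """Idle time (s) at which saved leakage energy equals the transition cost."""
    return transition_energy_j / leakage_power_w


def should_power_gate(expected_idle_s, transition_energy_j, leakage_power_w):
    """True if the expected idle period is longer than the break-even time."""
    return expected_idle_s > break_even_time(transition_energy_j, leakage_power_w)


# Hypothetical core: 2 uJ to power down and restore state, 0.5 W of leakage.
TRANSITION_ENERGY_J = 2e-6
LEAKAGE_POWER_W = 0.5

print(break_even_time(TRANSITION_ENERGY_J, LEAKAGE_POWER_W))          # break-even in seconds
print(should_power_gate(1e-6, TRANSITION_ENERGY_J, LEAKAGE_POWER_W))  # short idle period
print(should_power_gate(1e-4, TRANSITION_ENERGY_J, LEAKAGE_POWER_W))  # long idle period
```

A real policy would also have to account for wake-up latency and for how well the idle period can be predicted; this sketch covers only the energy side.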

(20) Parallelism
Concurrency + lower frequency → greater energy efficiency
Example: 4X #cores, 0.75X voltage, 0.5X frequency → 1X power, 2X in performance
[Figure: several core/cache pairs]
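The example can be checked with a short calculation. The sketch below assumes that dynamic power scales as (number of cores) × V² × f and that throughput scales perfectly with core count and frequency; both are simplifying assumptions, and the function names are hypothetical.

```python
def relative_dynamic_power(core_ratio, voltage_ratio, freq_ratio):
    """Relative dynamic power under the assumed model P ~ N * V^2 * f."""
    return core_ratio * voltage_ratio ** 2 * freq_ratio


def relative_throughput(core_ratio, freq_ratio):
    """Relative throughput, assuming perfect parallel scaling across cores."""
    return core_ratio * freq_ratio


power = relative_dynamic_power(core_ratio=4, voltage_ratio=0.75, freq_ratio=0.5)
perf = relative_throughput(core_ratio=4, freq_ratio=0.5)

print(f"relative power: {power:.3f}")        # 1.125, roughly the 1X on the slide
print(f"relative performance: {perf:.1f}")   # 2.0
```

The quadratic dependence on voltage is what lets four slower cores deliver twice the throughput for about the same power.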

(21) Simplify Core Design
[Figures: AMD Bulldozer core; ARM A7 core (arm.com)]
- Support for branch prediction, schedulers, etc. consumes more energy per instruction
- Can fit many more simpler cores on a die

(22) Metrics
Power efficiency
- MIPS/watt
- Ops/watt
Energy efficiency
- Joules/instruction
- Joules/op
Composite
- Energy-delay product
- Energy-delay^2
Why are these useful?
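A small sketch can show why the composite metrics matter: the operating point with the lowest energy is not necessarily the one with the best energy-delay product. The two operating points below are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def energy_delay_metrics(power_w, time_s):
    """Return energy (J), energy-delay product (J*s), and energy-delay^2 (J*s^2)."""
    energy_j = power_w * time_s
    return {
        "energy_j": energy_j,
        "edp": energy_j * time_s,
        "ed2p": energy_j * time_s ** 2,
    }


# Hypothetical: the same task run at a fast/high-power and a slow/low-power setting.
fast = energy_delay_metrics(power_w=40.0, time_s=1.0)
slow = energy_delay_metrics(power_w=15.0, time_s=2.0)

print("fast:", fast)   # energy 40 J, EDP 40 J*s, ED^2P 40 J*s^2
print("slow:", slow)   # energy 30 J, EDP 60 J*s, ED^2P 120 J*s^2
```

Here the slow setting uses less energy, but the fast setting wins on both composite metrics because they penalize the longer execution time.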

Modeling Lecture notes S. Yalamanchili and S. Mukhopadhyay

(24) Microarchitectural Level Models
How can we study power consumption without building circuits? → Models
- Models are available at multiple levels of abstraction.
- We are interested in microarchitectural models

(25) Processor Microarchitecture
[Figure: processor block diagram. Components: Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, L1 Data Cache, Data TLB, L2 Data Cache, NoC Router, On-Chip Network. Stages: Fetch, Decode, Execute/Writeback, Memory, Network]

(26) Energy/Power Calculation
How do we calculate energy or power dissipation for a given microarchitecture?
Energy/Power varies between:
- Different ISAs: ARM vs Intel x86
- Different microarchitectures: in-order vs out-of-order
- Different applications: memory-bound vs compute-bound
- Different technologies: 90nm vs 22nm technology
- Different operating conditions: frequency, temperature

(27) Architecture Activity (1)
[Figure: processor microarchitecture block diagram, as on slide 25]
Activity 1: Instruction Fetch
  icache.read++; fbuffer.write++;
- Collect activity counts of each architecture component (through simulation or measurement).
- The list of components differs between microarchitectures.
- Activity counts at each component differ between applications.

(28) Architecture Activity (2)
[Figure: processor microarchitecture block diagram, as on slide 25]
Activity 2: Instruction Decode
  fbuffer.read++; idecoder.logic++;
- Read/write accesses to caches, buffers, etc.
- Logical accesses to logic blocks such as decoder, ALUs, etc.
- Tradeoff between differentiating more access types (accuracy) and simulation speed (complexity).

(29) Power and Architecture Activity
For example, at the n-th clock cycle, the collected counters are:
Data cache:
- read = 20, write = 12
- per-read energy = 0.5 nJ; per-write energy = 0.6 nJ
- Read energy = read × per-read energy = 10 nJ
- Write energy = write × per-write energy = 7.2 nJ
- Total activity energy = read + write energies = 17.2 nJ
- If n = 50th clock cycle and clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW
*Note: n / clock_freq = n clock periods in seconds; power = time average of energy
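The arithmetic of this example can be reproduced in a few lines. The counter values and per-access energies come from the slide; the function and variable names are hypothetical.

```python
def activity_energy(counts, per_access_energy_j):
    """Total activity energy (J): sum over access types of count * per-access energy."""
    return sum(counts[kind] * per_access_energy_j[kind] for kind in counts)


def average_power(energy_j, cycles, clock_hz):
    """Average power (W) = energy / elapsed time, where elapsed time = cycles / clock_hz."""
    return energy_j * clock_hz / cycles


# Data cache counters collected over n = 50 cycles at 2 GHz (values from the slide).
dcache_counts = {"read": 20, "write": 12}
dcache_energy_j = {"read": 0.5e-9, "write": 0.6e-9}

energy = activity_energy(dcache_counts, dcache_energy_j)
power = average_power(energy, cycles=50, clock_hz=2e9)

print(f"total activity energy: {energy * 1e9:.1f} nJ")   # 17.2 nJ
print(f"total activity power: {power * 1e3:.0f} mW")     # 688 mW
```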

(30) Things to consider (1)
1. How do we calculate per-read/write energies?
- Per-access energies can be estimated from circuit-level designs and analyses.
- There are various open-source tools for this.
[Figure: Architecture Specification + Technology Parameters → Circuit-level Estimation Tool → Estimation Results: Area, Energy, Timing, etc.]

(31) Things to consider (2)
2. Is per-access energy always the same?
- Per-access energy in fact depends on:
  - how many bits are switching
  - how they are switching (0 → 1 or 1 → 0)
- It is reasonable to assume constant per-access energy in long-term observation (e.g., n = 1M clock cycles); the number of switching bits is averaged (e.g., 50% of bits are switching).
- Most architecture simulators do not capture bit-level details due to simulation complexity.

(32) Things to consider (3)
3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?
- Energy (or power) is largely comprised of dynamic and static dissipations.
- Dynamic (or switching) energy refers to energy dissipation due to switching activities.
- Static (or leakage) energy is dissipation to keep the electronic system turned on.
- In this case, the register file has no dynamic energy dissipation but consumes static energy.
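The split between dynamic and static energy can be sketched by extending the per-access model with a leakage term. This assumes static energy equals leakage power multiplied by elapsed time; the function name and the 5 mW leakage figure are hypothetical.

```python
def component_energy(reads, writes, read_energy_j, write_energy_j,
                     leakage_power_w, cycles, clock_hz):
    """Return (dynamic_j, static_j) for one component over an observation window."""
    dynamic_j = reads * read_energy_j + writes * write_energy_j
    elapsed_s = cycles / clock_hz
    static_j = leakage_power_w * elapsed_s   # assumed: leakage is constant while powered
    return dynamic_j, static_j


# An idle register file: no accesses for 50 cycles at 2 GHz, 5 mW leakage (hypothetical).
dynamic_j, static_j = component_energy(
    reads=0, writes=0, read_energy_j=0.5e-9, write_energy_j=0.6e-9,
    leakage_power_w=5e-3, cycles=50, clock_hz=2e9,
)

print(f"dynamic energy: {dynamic_j} J")
print(f"static energy: {static_j:.3e} J")
```

With zero accesses the dynamic term vanishes, while the static term keeps growing with time for as long as the block stays powered.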

Thermal Issues Lecture notes S. Yalamanchili and S. Mukhopadhyay

(34) Thermal Issues
Heat can cause damage to the chip
- Need failsafe operation
Thermal fields change the physical characteristics
- Leakage current and therefore power increases
- Delay increases
- Device degradation becomes worse
Cooling solution determines the permitted power dissipation

(35) Thermal Design Power (TDP)
This is the maximum power at which the part is designed to operate
- Dictates the design of the cooling system
  - Max temperature → T_jmax
- Typically fixed by worst case workload
Parts are typically operating below the TDP
Opportunities for turbo mode?
[Figure: AMD Trinity APU]

(36) Heat Sink Limits on Performance
Thermal design power (TDP)
- Determines the cooling solution & package limits
Performance depends on effective utilization of this thermal headroom
Convert thermal headroom to higher performance through boosting
[Figure labels: Instructions/cycle, Time, Thermal Headroom, Max Die Temp, HW Boost states, SW visible states, Boost power, TDP Power, Workload, Temp, Power]

(37) Trinity TDP Source:

(38) Issues
- Cooling chips is now an issue for computer architects!
- Co-design the cooling system and the processor
- Some very “cool” new technologies
  - E.g., microfluidics!

(39) Electrical and Fluidic I/Os
Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)
Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

(40) Fabrication Examples
- Electrical and fluidic microbumps, fluidic vias and fine wires
- Micropin-fins (150 µm diameter and 225 µm diameter) and vias
Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

(41) Conclusions
- Power/energy is the leading driver of modern architecture design
- Power and energy management is key to scalability
- Need integrated power/energy, performance, thermal management in fielded systems
- What about energy/power efficient algorithms?

(42) Study Guide
- Explain the difference between energy dissipation and power dissipation
- Distinguish between static power dissipation and dynamic power dissipation
- Explain dynamic voltage frequency scaling
  - What are power states?
  - Why is this an advantage?
  - What is the impact of DVFS on i) energy, ii) execution time, and iii) power?
- Distinguish between clock gating and power gating

(43) Study Guide (cont.)
- Define thermal design power (TDP)
- Name two schemes for preventing the chip from exceeding TDP. Explain how they achieve this goal
- What does boosting achieve?
- What is the difference between C-states and P-states?
- Name one power management technique that will save static power
- How does using many slower, simpler cores improve power efficiency?

(44) Study Guide (cont.)
- How is thermal design power (TDP) calculated?
- When using boost algorithms, what determines the duration of the high frequency operation?
- How does a power virus work?
- Describe how throttling works
- Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments

(45) Glossary
- Boosting
- C-states
- Dynamic Power and Energy
- Power Gating
- P-states
- Static Power and Energy
- Time constant
- Thermal Design Point
- Throttling