Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils.

Slides:



Advertisements
Similar presentations
ENERGY-EFFICIENT ALGORITHMS INTRODUCTION TO DETERMINISTIC ONLINE POWER-DOWN ALGORITHMS Len Matsuyama CS 695.
Advertisements

International Symposium on Low Power Electronics and Design Qing Xie, Mohammad Javad Dousti, and Massoud Pedram University of Southern California ISLPED.
Intel Multi-Core Technology. New Energy Efficiency by Parallel Processing – Multi cores in a single package – Second generation high k + metal gate 32nm.
Computer Abstractions and Technology
GIGABYTE UNITED INC. 技嘉 945 系列全固態電容主機板 GIGABYTE 945 Series All-Solid Capacitor Motherboards GIGABYTE 945 Series All-Solid Capacitor Motherboards.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
Power Reduction Techniques For Microprocessor Systems
Thermal-Scheduling For Ultra Low Power Mobile Microprocessor May, Thermal-Scheduling For Ultra Low Power Mobile Microprocessor George Cai 1 Chee.
Topic 3: Sensor Networks and RFIDs Part 2 Instructor: Randall Berry Northwestern University MITP 491: Selected Topics.
Keeping Hot Chips Cool Thermal Management for Green Computing Yang Ge Professor Qinru Qiu.
DC-DC Power Supply Design for the Computing Industry Presented by Ronil Patel 12/03/03 EE Power Electronics SJSU © 2003.
Institute of Digital and Computer Systems 1 Fabio Garzia / Finding Peak Performance in a Process23/06/2015 Chapter 5 Finding Peak Performance in a Process.
Low-power computer architecture
Temperature-Aware Design Presented by Mehul Shah 4/29/04.
Power-Aware Computing 101 CS 771 – Optimizing Compilers Fall 2005 – Lecture 22.
Power Delivery Challenges for High Performance Low Voltage Microprocessors Tanay Karnik Microprocessor Research Labs Intel Corporation November 9, 2001.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
1 The Problem of Power Consumption in Servers L. Minas and B. Ellison Intel-Lab In Dr. Dobb’s Journal, May 2009 Prepared and presented by Yan Cai Fall.
Energy Model for Multiprocess Applications Texas Tech University.
Energy and Power Lecture notes S. Yalamanchili and S. Mukhopadhyay.
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Acer TravelMate P253 PRODUCT BRIEF Fast, efficient and reliable computing.
Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay.
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
Green IT and Data Centers Darshan R. Kapadia Gregor von Laszewski 1.
Computational Sprinting on a Hardware/Software Testbed Arun Raghavan *, Laurel Emurian *, Lei Shao #, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas.
Low Power Techniques in Processor Design
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Low-Power Wireless Sensor Networks
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
Last Time Performance Analysis It’s all relative
CSE 691: Energy-Efficient Computing Lecture 7 SMARTS: custom-made systems Anshul Gandhi 1307, CS building
Basics of Energy & Power Dissipation Lecture notes S. Yalamanchili, S. Mukhopadhyay. A. Chowdhary.
MS108 Computer System I Lecture 2 Metrics Prof. Xiaoyao Liang 2014/2/28 1.
Energy Savings with DVFS Reduction in CPU power Extra system power.
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization Jia Wang, Shiyan Hu Department of Electrical and Computer Engineering.
Thermal-aware Issues in Computers IMPACT Lab. Part A Overview of Thermal-related Technologies.
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.
Evolution of Microprocessors Microprocessor A microprocessor incorporates most of all the functions of a computer’s central processing unit on a single.
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
© Digital Integrated Circuits 2nd Inverter Digital Integrated Circuits A Design Perspective The Inverter Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
MULTICORE PROCESSOR TECHNOLOGY.  Introduction  history  Why multi-core ?  What do you mean by multicore?  Multi core architecture  Comparison of.
1 November 11, 2015 GAME CHANGING DESIGN Shlomit Weiss – VP Platform Engineering Group, Intel corp. 1.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
Computational Sprinting Arun Raghavan *, Yixin Luo +, Anuj Chandawalla +, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K. Martin.
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
FaridehShiran Department of Electronics Carleton University, Ottawa, ON, Canada SmartReflex Power and Performance Management Technologies.
CS203 – Advanced Computer Architecture
Characterizing Processors for Energy and Performance Management Harshit Goyal and Vishwani D. Agrawal Department of Electrical and Computer Engineering,
1 Hardware Reliability Margining for the Dark Silicon Era Liangzhen Lai and Puneet Gupta Department of Electrical Engineering University of California,
Battery Models. EMF and voltage What is EMF? – Electro Motive Force What is the difference between EMF and battery voltage? – The battery has internal.
1 Aphirak Jansang Thiranun Dumrongson
Computer Structure 2015 – Advanced Topics 1 Computer Structure Power and Power Management Computer Structure Power and Power Management Lecturer: Aharon.
New Rules: Scaling Performance for Extreme Scale Computing
Overview Motivation (Kevin) Thermal issues (Kevin)
Crusoe Processor Seminar Guide: By: - Prof. H. S. Kulkarni Ashish.
Computer Structure Power and Power Management
CS203 – Advanced Computer Architecture
Accelerated Processing Units
Anshul Gandhi 347, CS building
Green cloud computing 2 Cs 595 Lecture 15.
Basics of Energy & Power Dissipation
Temperature aware architecture of 14nm Broadwell chip-sets
Hui Chen, Shinan Wang and Weisong Shi Wayne State University
CSE 591: Energy-Efficient Computing Lecture 18 SPEED: power
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils

Processor Power Components The power consumed by a processor consists of Dynamic power: power for toggling transistors and lines from 01 or 10 αCV2f : α – activity, C – capacitance, V – voltage, f – frequency Leakage power: leakage of transistors under voltage function of: Z – total size of all transistors, V – voltage, t – temperature Peak power must not exceed the thermal constrains Power generates heat Heat must be dissipated to keep transistors within allowed temperature Peak power determines peak frequency (and thus peak performance) Also affects form factor, cooling solution cost, and acoustic noise Average power Determines battery life (for mobile devices), electricity bill, air-condition bill Average power = Total Energy / Total time Including low-activity and idle-time (~90% idle time for client)

Performance per Watt In small form-factor devices thermal budget limits performance Old target: get max performance New target: get max performance at a given power envelope Performance per Watt Increasing f also requires increasing V (~linearly) Dynamic Power = αCV2f = Kf3  X% performance costs ~3X% power A power efficient feature – better than 1:3 performance : power Otherwise it is better to just increase frequency (and voltage) Vmin is the minimal operation voltage Once at Vmin, reducing frequency no longer reduces voltage At this point a feature is power efficient only if it is 1:1 performance : power Active energy efficiency tradeoff Energyactive = Poweractive × Timeactive  Poweractive / Perfactive Energy efficient feature: 1:1 performance : power

Platform Power Processor average power is <10% of the platform Display (panel + inverter) 33% CPU 10% Power Supply MCH 9% Misc. 8% GFX HDD CLK 5% ICH 3% DVD 2% LAN Fan

Managing Power Typical CPU usage varies over time Bursts of high utilization & long idle periods (~90% of time in client) Optimize power and energy consumption High power when high performance is needed Low power at low activity or idle Enhanced Intel SpeedStep® Technology Multi voltage/frequency operating points OS changes frequency to meet performance needs and minimize power Referred to as processor Performance states = P-States OS notifies CPU when no tasks are ready for execution CPU enters sleep state, called C-state Using MWAIT instruction, with C-state level as an argument Tradeoff between power and latency Deeper sleep  more power savings  longer to wake

P-states Operation frequncies are called P-states = Performance states P0 is the highest frequency P1,2,3… are lower frequencies Pn is the min Vcc point = Energy efficient point DVFS = Dynamic Voltage and Frequency Scaling Power = CV2f ; f = KV  Power ~ f3 Program execution time ~ 1/f E = P×t  E ~ f2  Pn is the most energy efficient point Going up/down the cubic curve of power High cost to achieve frequency large power savings for some small frequency reduction P0 P1 Pn Freq Power P2

C-States: C0 C0: CPU active state Active Core Power Local Clocks and Logic Clock Distribution Leakage

C-States: C1 C0: CPU active state C1: Halt state: Active Core Power Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops Clock Distribution Leakage

C-States: C3 C0: CPU active state C1: Halt state: C3 state: Active Core Power C0: CPU active state C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches Leakage

C-States: C6 Core power goes to ~0 C0: CPU active state Active Core Power C0: CPU active state C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches C6 state: Processor saves architectural state Turn off power gate, eliminating leakage Leakage Core power goes to ~0

Putting it all together CPU running at max power and frequency Periodically enters C1 C0 P0 Power [W] C1 Time

Putting it all together Going into idle period Gradually enters deeper C states Controlled by OS C0 P0 Power [W] C2 C3 C4 C1 Time

Putting it all together Tracking CPU utilization history OS identifies low activity Switches CPU to lower P state C0 P0 C0 P1 Power [W] C2 C3 C4 C1 Time

Putting it all together CPU enters Idle state again C0 P0 C0 P1 Power [W] C2 C2 C3 C3 C4 C4 C1 Time

Putting it all together Further lowering the P state DVD play runs at lowest P state C2 C3 C4 C0 P1 P2 C1 P0 Power [W] Time

Voltage and Frequency Domains Two Independent Variable Power Planes CPU cores, ring and LLC Embedded power gates – each core can be turned off individually Cache power gating – turn off portions or all cache at deeper sleep states Graphics processor Can be varied or turned off when not active Shared frequency for all IA32 cores and ring Independent frequency for PG Fixed Programmable power plane for System Agent Optimize SA power consumption System On Chip functionality and PCU logic Periphery: DDR, PCIe, Display VCC Core (Gated) (ungated) VCC SA VCC Graphics VCC Periphery Embedded power gates

Turbo Mode P1 is guaranteed frequency CPU and GFX simultaneous heavy load at worst case conditions Actual power has high dynamic range P0 is max possible frequency – the Turbo frequency P1-P0 has significant frequency range (GHz) Single thread or lightly loaded applications GFX <>CPU balancing OS treats P0 as any other P-state Requesting is when it needs more performance P1 to P0 range is fully H/W controlled Frequency transitions handled completely in HW PCU keeps silicon within existing operating limits Systems designed to same specs, with or without Turbo Mode Pn is the energy efficient state Lower than Pn is controlled by Thermal-State “Turbo” H/W Control OS Visible States T-state & Throttle P1 Pn P0 1C frequency LFM

Zero power for inactive cores Turbo Mode Power Gating Zero power for inactive cores Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 Workload Lightly Threaded OEM adheres to thermal design guidelines Works with Enhanced Intel Speedstep to increase energy efficiency Capability extends further in Sandy Bridge 18 18 18

Zero power for inactive cores Turbo Mode Turbo Mode Use thermal budget of inactive core to increase frequency of active cores Power Gating Zero power for inactive cores Workload Lightly Threaded Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 OEM adheres to thermal design guidelines Works with Enhanced Intel Speedstep to increase energy efficiency Capability extends further in Sandy Bridge 19 19 19

Zero power for inactive cores Turbo Mode Turbo Mode Use thermal budget of inactive core to increase frequency of active cores Power Gating Zero power for inactive cores Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Workload Lightly Threaded OEM adheres to thermal design guidelines Works with Enhanced Intel Speedstep to increase energy efficiency Capability extends further in Sandy Bridge 20 20 20

Turbo Mode No Turbo Core 3 Core 0 Core 2 Turbo Mode Increase frequency within thermal headroom No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 Active cores running workloads < TDP Core 1 Core 3 Frequency (F) Frequency (F) Core 0 Core 2 OEM adheres to thermal design guidelines Works with Enhanced Intel Speedstep to increase energy efficiency Capability extends further in Sandy Bridge 21 21 21

Turbo Mode No Turbo Core 2 Core 0 Core 3 Power Gating Zero power for inactive cores Turbo Mode Increase frequency within thermal headroom Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 2 Workload Lightly Threaded And active cores < TDP Core 0 Core 1 Core 3 22 22 22

Classic model response More realistic response to power changes Thermal Capacitance Classic Model Steady-State Thermal Resistance Design guide for steady state Temperature Time Classic model response Temperature Time More realistic response to power changes New Model Steady-State Thermal Resistance AND Dynamic Thermal Capacitance Temperature rises as energy is delivered to thermal solution Thermal solution response is calculated at real-time Foil taken from IDF 2011

Intel® Turbo Boost Technology 2.0 Time Power Sleep or Low power Turbo Boost 2.0 “TDP” C0/P0 (Turbo) After idle periods, the system accumulates “energy budget” and can accommodate high power/performance for a few seconds In Steady State conditions the power stabilizes on TDP P > TDP: Responsiveness Sustain power Buildup thermal budget during idle periods Use accumulated energy budget to enhance user experience Foil taken from IDF 2011

Core and Graphic Power Budgeting Sandy Bridge Next Gen Turbo Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control Full package power specifications available for sharing Power budget can shift between Cores and Graphics Core Power [W] Heavy Graphics workload Heavy CPU Total package power Sandy Bridge Next Gen Turbo for short periods Sum of max power Realistic concurrent max power Specification Core Power Applications Specification Graphics Power Graphics Power [W] Foil taken from IDF 2011