Computer Structure 2015 – Advanced Topics 1 Computer Structure Power and Power Management Computer Structure Power and Power Management Lecturer: Aharon.

Slides:



Advertisements
Similar presentations
Chapter 9 Contributed by Alex Turek
Advertisements

Computer Structure Power Management Lihu Rappoport and Adi Yoaz Thanks to Efi Rotem for many of the foils.
Computer Abstractions and Technology
Power Reduction Techniques For Microprocessor Systems
Thermal-Scheduling For Ultra Low Power Mobile Microprocessor May, Thermal-Scheduling For Ultra Low Power Mobile Microprocessor George Cai 1 Chee.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Energy in sensor nets. Where does the power go Components: –Battery -> DC-DC converter –CPU + Memory + Flash –Sensors + ADC –DAC + Audio speakers –Display.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Institute of Digital and Computer Systems 1 Fabio Garzia / Finding Peak Performance in a Process23/06/2015 Chapter 5 Finding Peak Performance in a Process.
Energy Efficient Web Server Cluster Andrew Krioukov, Sara Alspaugh, Laura Keys, David Culler, Randy Katz.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Lecture 7: Power.
Power-Aware Computing 101 CS 771 – Optimizing Compilers Fall 2005 – Lecture 22.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
Energy and Power Lecture notes S. Yalamanchili and S. Mukhopadhyay.
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay.
6.893: Advanced VLSI Computer Architecture, September 28, 2000, Lecture 4, Slide 1. © Krste Asanovic Krste Asanovic
Folklore Confirmed: Compiling for Speed = Compiling for Energy Tomofumi Yuki INRIA, Rennes Sanjay Rajopadhye Colorado State University 1.
Low Power Techniques in Processor Design
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Low-Power Wireless Sensor Networks
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
Last Time Performance Analysis It’s all relative
Basics of Energy & Power Dissipation Lecture notes S. Yalamanchili, S. Mukhopadhyay. A. Chowdhary.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
1 CS/EE 6810: Computer Architecture Class format:  Most lectures on YouTube *BEFORE* class  Use class time for discussions, clarifications, problem-solving,
MS108 Computer System I Lecture 2 Metrics Prof. Xiaoyao Liang 2014/2/28 1.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Energy Savings with DVFS Reduction in CPU power Extra system power.
Computational Sprinting on a Real System: Preliminary Results Arun Raghavan *, Marios Papaefthymiou +, Kevin P. Pipe +#, Thomas F. Wenisch +, Milo M. K.
Embedded System Lab. 김해천 The TURBO Diaries: Application-controlled Frequency Scaling Explained.
1 Latest Generations of Multi Core Processors
Basics of Energy & Power Dissipation
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
1 November 11, 2015 GAME CHANGING DESIGN Shlomit Weiss – VP Platform Engineering Group, Intel corp. 1.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
An Integrated GPU Power and Performance Model (ISCA’10, June 19–23, 2010, Saint-Malo, France. International Symposium on Computer Architecture)
Ensieea Rizwani An energy-efficient management mechanism for large-scale server clusters By: Zhenghua Xue, Dong, Ma, Fan, Mei 1.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
E-MOS: Efficient Energy Management Policies in Operating Systems
1 Dual-V cc SRAM Class presentation for Advanced VLSIPresenter:A.Sammak Adopted from: M. Khellah,A 4.2GHz 0.3mm 2 256kb Dual-V CC SRAM Building Block in.
CS203 – Advanced Computer Architecture
1 Aphirak Jansang Thiranun Dumrongson
Evaluation of Advanced Power Management for ClassCloud based on DRBL Rider Grid Technology Division National Center for High-Performance Computing Research.
Power Management. Outline Why manage power? Power management in CPU cores Power management system wide Ways for embedded programmers to be power conscious.
Overview Motivation (Kevin) Thermal issues (Kevin)
Crusoe Processor Seminar Guide: By: - Prof. H. S. Kulkarni Ashish.
Computer Structure Power and Power Management
CS203 – Advanced Computer Architecture
Lecture 2: Performance Today’s topics:
Temperature and Power Management
Multiprocessing.
Lecture 3: MIPS Instruction Set
Green cloud computing 2 Cs 595 Lecture 15.
Computer Structure Multi-Threading
Morgan Kaufmann Publishers
Intel Atom Architecture – Next Generation Computing
Intel’s Core i7 Processor
Lecture 2: Performance Today’s topics: Technology wrap-up
Energy Efficient Scheduling in IoT Networks
Overheads for Computers as Components 2nd ed.
Lecture 3: MIPS Instruction Set
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

Computer Structure 2015 – Advanced Topics 1 Computer Structure Power and Power Management Computer Structure Power and Power Management Lecturer: Aharon Kupershtok Created by Lihu Rappoport

Computer Structure 2015 – Advanced Topics 2 Thermal Design Point TDP power –Maximum amount of power the thermal solution of the platform is required to dissipate –Calculated as the rolling average of 5sec power of the highest real existing applications –Not including Viruses  Some virus like application are ignored as well TDP power impacts –Guaranteed frequency –Affects form factor –cooling solution cost –acoustic noise Fan CPU How much power can this fan can dissipate?

Computer Structure 2015 – Advanced Topics 3 Average Power Average Power –Average power = Total Energy / Total time –Including low-activity and idle-time (~90% idle time for client) Average Power determines –Battery life – for mobile devices –Electricity bill –Air-condition bill –Air pollution 1 web search Same energy as an 11W light bulb for 1 hour Emits 7gr CO 2 4B web searches daily Battery Life Continuous Web surfing over wireless For servers

Computer Structure 2015 – Advanced Topics 4 Platform Power Processor average power is <10% of the platform Display (panel + inverter) 33% CPU 10% Power Supply 10% MCH 9% Misc. 8% GFX 8% HDD 8% CLK 5% ICH 3% DVD 2% LAN 2% Fan 2%

Computer Structure 2015 – Advanced Topics 5 CPU Power Components Dynamic power: P dyn = CV 2 f –C – total electrical capacitance charged/discharged per cycle  The sum of all the capacities of all transistors and wires which are charged/discharged (toggle from 0  1 or from 1  0) per cycle Transistors which maintain their value do not spend dynamic power  Application dependent E.g., an application with no floating point calculation will not toggle the transistors in the floating point execution unit  Typically, a bigger CPU has a bigger Cdyn –V – voltage –f – frequency: increasing f also requires increasing V ~linearly  CV 2 f ~ f 3  X% frequency costs ~3X% power Leakage power –Leakage of transistors under voltage, which is a function of  Z – total size of all transistors, V – voltage, t – temperature

Computer Structure 2015 – Advanced Topics 6 Performance per Watt In small form-factor devices thermal budget limits performance –Old target: get max performance –New target: get max performance at a given power envelope  Performance per Watt Increasing f also requires increasing V (~linearly) –Dynamic Power = αCV 2 f = Kf 3  X% performance costs ~3X% power –A power efficient feature – better than 1:3 performance : power  Otherwise it is better to just increase frequency (and voltage) Vmin is the minimal operation voltage –Once at Vmin, reducing frequency no longer reduces voltage –At this point a feature is power efficient only if it is 1:1 performance : power Active energy efficiency tradeoff –Energy active = Power active × Time active  Power active / Perf active –Energy efficient feature: 1:1 performance : power

Computer Structure 2015 – Advanced Topics 7 Managing Power Typical CPU usage varies over time –Bursts of high utilization & long idle periods (~90% of time in client) Optimize power and energy consumption –High power when high performance is needed –Low power at low activity or idle Enhanced Intel SpeedStep® Technology –Multi voltage/frequency operating points –OS changes frequency to meet performance needs and minimize power –Referred to as processor Performance states = P-States OS notifies CPU when no tasks are ready for execution –CPU enters sleep state, called C-state –Using MWAIT instruction, with C-state level as an argument –Tradeoff between power and latency  Deeper sleep  more power savings  longer to wake

Computer Structure 2015 – Advanced Topics 8 P-states Operation frequencies are called P-states = Performance states –P0 is the highest frequency –P1,2,3… are lower frequencies –Pn is the min Vcc point = Energy efficient point DVFS = Dynamic Voltage and Frequency Scaling –Power = CV 2 f ; f = KV  Power ~ f 3 –Program execution time ~ 1/f –E = P×t  E ~ f 2  Pn is the most energy efficient point –Going up/down the cubic curve of power  High cost to achieve frequency  large power savings for some small frequency reduction P0 P1 Pn Freq Power P2

Computer Structure 2015 – Advanced Topics 9 C-States: C0 C0: CPU active state Leakage Clock Distribution Local Clocks and Logic Active Core Power

Computer Structure 2015 – Advanced Topics 10 C-States: C1 C0: CPU active state C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops Leakage Clock Distribution Active Core Power

Computer Structure 2015 – Advanced Topics 11 C-States: C3 C0: CPU active state C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches Leakage Active Core Power

Computer Structure 2015 – Advanced Topics 12 C-States: C6 C0: CPU active state C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches C6 state: Processor saves architectural state Turn off power gate, eliminating leakage Leakage Core power goes to ~0 Active Core Power

Computer Structure 2015 – Advanced Topics 13 Putting it all together CPU running at max power and frequency Periodically enters C1 Power [W] C1 C0 P0 Time

Computer Structure 2015 – Advanced Topics 14 Putting it all together Going into idle period –Gradually enters deeper C states –Controlled by OS Time Power [W] C2 C3 C4 C1 C0 P0

Computer Structure 2015 – Advanced Topics 15 Putting it all together Tracking CPU utilization history –OS identifies low activity –Switches CPU to lower P state Time Power [W] C2 C3 C4 C0 P1 C1 C0 P0

Computer Structure 2015 – Advanced Topics 16 Putting it all together CPU enters Idle state again Time Power [W] C2 C3 C4 C0 P1 C2 C3 C4 C1 C0 P0

Computer Structure 2015 – Advanced Topics 17 Putting it all together Further lowering the P state DVD play runs at lowest P state Time Power [W] C2 C3 C4 C0 P1 C0 P2 C2 C3 C4 C1 C0 P0

Computer Structure 2015 – Advanced Topics 18 Voltage and Frequency Domains Two Independent Variable Power Planes –CPU cores, ring and LLC  Embedded power gates – each core can be turned off individually  Cache power gating – turn off portions or all cache at deeper sleep states –Graphics processor  Can be varied or turned off when not active Shared frequency for all IA32 cores and ring Independent frequency for PG Fixed Programmable power plane for System Agent –Optimize SA power consumption –System On Chip functionality and PCU logic –Periphery: DDR, PCIe, Display VCC Core (Gated) (Gated) (Gated) (Gated) (ungated) VCC SA VCC Graphics VCC Periphery Embedded power gates

Computer Structure 2015 – Advanced Topics 19 Turbo Mode P1 is guaranteed frequency –CPU and GFX simultaneous heavy load at worst case conditions –Actual power has high dynamic range P0 is max possible frequency – the Turbo frequency –P1-P0 has significant frequency range (GHz)  Single thread or lightly loaded applications  GFX <>CPU balancing –OS treats P0 as any other P-state  Requesting is when it needs more performance –P1 to P0 range is fully H/W controlled  Frequency transitions handled completely in HW  PCU keeps silicon within existing operating limits –Systems designed to same specs, with or without Turbo Mode Pn is the energy efficient state –Lower than Pn is controlled by Thermal-State “Turbo”H/WControl OSVisibleStates OSControl T-state & Throttle P1 Pn P0 1C frequency LFM

Computer Structure 2015 – Advanced Topics 20 Turbo Mode Frequency (F) No Turbo Core 0Core 1Core 2Core 3 Core 2Core 3Core 0Core 1 Power Gating Zero power for inactive cores Workload Lightly Threaded

Computer Structure 2015 – Advanced Topics 21 Turbo Mode Workload Lightly Threaded Frequency (F) No Turbo Core 0Core 1Core 2Core 3 Turbo Mode Use thermal budget of inactive core to increase frequency of active cores Core 0Core 1 Power Gating Zero power for inactive cores

Computer Structure 2015 – Advanced Topics 22 Turbo Mode Frequency (F) No Turbo Core 0Core 1Core 2Core 3 Workload Lightly Threaded Core 0Core 1 Power Gating Zero power for inactive cores Turbo Mode Use thermal budget of inactive core to increase frequency of active cores

Computer Structure 2015 – Advanced Topics 23 Turbo Mode Active cores running workloads < TDP Frequency (F) No Turbo Core 0Core 1Core 2Core 3 Core 2 Core 3 Core 0 Core 1 Core 2Core 3Core 0Core 1 Turbo Mode Increase frequency within thermal headroom

Computer Structure 2015 – Advanced Topics 24 Turbo Mode Frequency (F) No Turbo Core 0Core 1Core 2Core 3 Workload Lightly Threaded And active cores < TDP Core 2 Core 3 Core 1 Core 0 Turbo Mode Increase frequency within thermal headroom Power Gating Zero power for inactive cores

Computer Structure 2015 – Advanced Topics 25 Thermal Capacitance Classic Model Steady-State Thermal Resistance Design guide for steady state Classic Model Steady-State Thermal Resistance Design guide for steady state Temperature Time Classic model response Temperature rises as energy is delivered to thermal solution Thermal solution response is calculated at real-time Temperature rises as energy is delivered to thermal solution Thermal solution response is calculated at real-time Temperature Time More realistic response to power changes New Model Steady-State Thermal Resistance AND Dynamic Thermal Capacitance Foil taken from IDF 2011

Computer Structure 2015 – Advanced Topics 26 Time Power Sleep or Low power Turbo Boost 2.0 “TDP” C0/P0 (Turbo) After idle periods, the system accumulates “energy budget” and can accommodate high power/performance for a few seconds In Steady State conditions the power stabilizes on TDP P > TDP: Responsiveness Sustain power Buildup thermal budget during idle periods Use accumulated energy budget to enhance user experience Intel® Turbo Boost Technology 2.0 Foil taken from IDF 2011

Computer Structure 2015 – Advanced Topics 27 Core and Graphic Power Budgeting Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control Full package power specifications available for sharing Power budget can shift between Cores and Graphics Core Power [W] Graphics Power [W] Total package power Realistic concurrent max power Sum of max power Heavy Graphics workload Heavy CPU workload Specification Core Power Specification Graphics Power Applications Sandy Bridge Next Gen Turbo for short periods Sandy Bridge Next Gen Turbo for short periods Foil taken from IDF 2011

Computer Structure 2015 – Advanced Topics 28 Example –Power/Performance Exam Question דרוש לתכנן מערכת Micro Server Multi Processor ויש להשוות בין שני סוגי מעבדים כך שניתן יהיה להריץ אפליקציות חמות (TDP) ביעילות תוך קבלת ביצועים (MP performance) הגבוהים ביותר במסגרת תקציב ההספק. מעטפת הספק של המערכת היא 60Watt כאשר שני שליש מתקציב ההספק הינו עבור המעבדים – Core’s (והשאר עבור ה – Uncore). יש להשוות בין שני מעבדים אפשריים לצורך בניית המערכת, מעבד גדול ומעבד קטן: הגדול גודלו ארבע מילימטר רבוע4mm 2 והקטן גודלו שני מילימטר רבוע.2mm 2 המעבד הגדול הוא מעבד4wide עם IPC=3והקטן הוא מעבד 3wide עם IPC=2. IPC זה מתקבל כאשר מריצים אפליקציות חמות (TDP) ללא טעויות בחיזוי הקפיצה. ההספק הסטטי Leakage Power הוא 0.25Watt למילימטר רבוע (ההספק הסטטי קבוע ולא משתנה עם המתח/תדר). הקיבול הדינאמי של כל מעבד נתון כפונקציה של ה – IPC של האפליקציה אותה הוא מריץ וערכו הוא: Cdyn=IPC×750pF

Computer Structure 2015 – Advanced Topics 29 Example –Power/Performance Exam Question (Cont’) להלן נתונה טבלה המראה את נקודות מתח ותדר אפשריות לעבודת המעבדים הגדול והקטן. תדר ב Ghz מעבד גדולתדר ב Ghz מעבד קטןמתח ב Volt’s 0.751Vmin= Vmin= Vmin= Vmax=1 יש לתכנן שתי מערכות אחת עם מעבדים גדולים ואחת עם מעבדים קטנים כך שניתן יהיה להריץ אפליקציות חמות (TDP) ביעילות תוך קבלת ביצועים הגבוהים ביותר במסגרת תקציב ההספק. עבור כל מערכת יש למצוא את מספר המעבדים הדרוש.

Computer Structure 2015 – Advanced Topics 30Solution Power = leakage + c×f×V 2 Leakage of the core big Core = 4×0.25W= 1W Leakage of the core small Core = 2×0.25W= 0.5W To maximize Total MP performance need to work at Vmin for TDP app Big Core: V= 0.7v f=1.35Ghz Small Core V= 0.7v f=1.45Ghz For each system need to calculate the per Core Power budget for most efficient operating point which is Vmin when Hot App is running, this will allow maximizing number of cores and overall performance: For Big Core System: P = C×V 2 ×f + 1  3×0.75×0.7 2 ×1.35+1=2.49W For Small Core System: P= C×V 2 ×f  2×0.75×0.7 2 × =1.57W Power budget for all cores=60×2/3=40W Number of Cores: Big 40/2.49 = 16 Small 40/1.57= 25 When building a system with big Cores putting16 big Cores will provide highest MP performance in the system power envelop When building a system with small Cores putting 25 small Cores will provide highest MP performance in the system power envelop

Computer Structure 2015 – Advanced Topics 31 MP Perf/Power Q&A MP Performance = #of cores×IPC×f For Big Core: f=1.35Ghz; IPC_tdp=3; #of cores=16 TDP MP Performance= 16×3×1.35= 64.8 ×10 9 Instructions per second For small Core: f=1.45Ghz; IPC_tdp=2; #of cores=25 TDP MP Performance= 25×2×1.45= 72.5 ×10 9 Instructions per second The system with the small cores provides the highest MP performance and takes less area (25×2=50mm 2 vs 16×4=64mm 2 ) so we recommend to build the system with the small cores provide highest MP performance in the system power envelop עבור כל מערכת חשבו את הביצועים (MP performance instructions per second) עבור אפליקציה חמה בנקודת Vmin. לאור התוצאות איזו מערכת הייתם ממליצים לייצר? זו עם המעבד הגדול או זו עם המעבד הקטן.