Download presentation
Presentation is loading. Please wait.
1
Computer Structure Power and Power Management
Lihu Rappoport and Adi Yoaz
2
Thermal Design Point TDP power TDP power impacts
Maximum amount of power the thermal solution of the platform is required to dissipate Calculated as the rolling average of 5sec power of the highest real existing applications Not including Viruses Some virus like application are ignored as well TDP power impacts Guaranteed frequency Affects form factor cooling solution cost acoustic noise Fan CPU How much power can this fan can dissipate?
3
Average Power Average Power Average Power determines
Average power = Total Energy / Total time Including low-activity and idle-time (~90% idle time for client) Average Power determines Battery life – for mobile devices Electricity bill Air-condition bill Air pollution Battery Life Continuous Web surfing over wireless For servers 1 web search Same energy as an 11W light bulb for 1 hour Emits 7gr CO2 4B web searches daily
4
Platform Power Processor average power is <10% of the platform
Display (panel + inverter) 33% CPU 10% Power Supply MCH 9% Misc. 8% GFX HDD CLK 5% ICH 3% DVD 2% LAN Fan
5
CPU Power Components Dynamic power: Pdyn = CV2f Leakage power
C – total electrical capacitance charged/discharged per cycle The sum of all the capacities of all transistors and wires which are charged/discharged (toggle from 01 or from 10) per cycle Transistors which maintain their value do not spend dynamic power Application dependent E.g., an application with no floating point calculation will not toggle the transistors in the floating point execution unit Typically, a bigger CPU has a bigger Cdyn V – voltage f – frequency: increasing f also requires increasing V ~linearly CV2f ~ f3 X% frequency costs ~3X% power Leakage power Leakage of transistors under voltage, which is a function of Z – total size of all transistors, V – voltage, t – temperature
6
Performance per Watt In small form-factor devices thermal budget limits performance Old target: get max performance New target: get max performance at a given power envelope Performance per Watt Increasing f also requires increasing V (~linearly) Dynamic Power = αCV2f = Kf3 X% performance costs ~3X% power A power efficient feature – better than 1:3 performance : power Otherwise it is better to just increase frequency (and voltage) Vmin is the minimal operation voltage Once at Vmin, reducing frequency no longer reduces voltage At this point a feature is power efficient only if it is 1:1 performance : power Active energy efficiency tradeoff Energyactive = Poweractive × Timeactive Poweractive / Perfactive Energy efficient feature: 1:1 performance : power
7
Example –Power/Performance Exam Question
יש לתכנן מערכת Multi Processor ויש להשוות בין שני סוגי מעבדים כך שניתן יהיה להריץ אפליקציות ביעילות תוך קבלת ביצועים (MP performance)מרביים במסגרת תקציב ההספק. מעטפת הספק של המערכת היא 60Watt, כאשר שני שליש מתקציב ההספק הינו עבור המעבדים – Core’s (והשאר עבור ה – Uncore). יש להשוות בין שני מעבדים אפשריים לצורך בניית המערכת, מעבד גדול ומעבד קטן: ההספק הסטטי Leakage Power הוא 0.25Watt למילימטר רבוע (ההספק הסטטי קבוע ולא משתנה עם המתח/תדר). הקיבול הדינאמי של כל מעבד נתון כפונקציה של ה – IPC של האפליקציה אותה הוא מריץ וערכו הוא: Cdyn=IPC×750pF מעבד קטן מעבד גדול 2mm2 4mm2 שטח 3 wide 4 wide רוחב 2 3 IPC
8
Example –Power/Performance Exam Question (Cont’)
בטבלה הבאה נתונות נקודות מתח ותדר אפשריות לעבודת המעבדים הגדול והקטן: תדר ב Ghz מעבד גדול תדר ב Ghz מעבד קטן מתח ב Volt’s 0.75 1 Vmin=0.7 1.25 1.35 1.45 1.75 0.8 2.25 2 0.85 2.5 0.9 3 0.95 3.5 2.75 Vmax=1 עבור כל מערכת יש למצוא את מספר המעבדים האופטימלי, וכן לחשב את הביצועים (MP performance instructions per second). לאור התוצאות איזו מערכת הייתם ממליצים לייצר? זו עם המעבד הגדול או זו עם המעבד הקטן.
9
Solution Power = c×V2×f + leakage Leakage = 0.25w/mm2 × A(mm2)
Leakage of the big Core = 0.25W/mm2 × 4mm2 = 1w Leakage of the small Core = 0.25W/mm2 × 2mm2 = 0.5w Cdyn=IPC×750pF Big Core: Cdyn = 3×0.75= 2.25 Small Core: Cdyn = 2×0.75= 1.5 Maximize total MP performance work at Vmin’s max freq.: Big Core: V= 0.7v f=1.35Ghz Small Core V= 0.7v f=1.45Ghz Per Core power for most efficient operating point: For Big Core System: P = 2.25×0.72×1.35+1=2.49w For Small Core System: P = 1.5 ×0.72×1.45+1=1.57w
10
Solution (cont.) Power budget for all cores= 60w×2/3=40w
Number of Cores = power for all cores / power per core Big Cores: w/2.49w = 16 Small Cores: 40w/1.57w= 25 MP Performance = #of cores × IPC × f (inst/sec) Big Cores: × 3 × 1.35GHz = 64.8×109 inst/sec Samll Cores: 25 × 2 × 1.45GHz = 72.5×109 inst/sec The small core system provides higher MP performance at the given system power envelop It also takes less area: 25×2=50mm2 vs 16×4=64mm2 this is the recommend system
11
Managing Power Typical CPU usage varies over time
Bursts of high utilization & long idle periods (~90% of time in client) Optimize power and energy consumption High power when high performance is needed Low power at low activity or idle Enhanced Intel SpeedStep® Technology Multi voltage/frequency operating points OS changes frequency to meet performance needs and minimize power Referred to as processor Performance states = P-States OS notifies CPU when no tasks are ready for execution CPU enters sleep state, called C-state Using MWAIT instruction, with C-state level as an argument Tradeoff between power and latency Deeper sleep more power savings longer to wake
12
P-states Operation frequencies are called P-states = Performance states P0 is the highest frequency P1,2,3… are lower frequencies Pn is the min Vcc point = Energy efficient point DVFS = Dynamic Voltage and Frequency Scaling Power = CV2f ; f = KV Power ~ f3 Program execution time ~ 1/f E = P×t E ~ f2 Pn is the most energy efficient point Going up/down the cubic curve of power High cost to achieve frequency large power savings for some small frequency reduction P0 P1 Pn Freq Power P2
13
C-States: C0 C0: CPU active state Active Core Power
Local Clocks and Logic Clock Distribution Leakage
14
C-States: C1 C0: CPU active state C1: Halt state: Stop core pipeline
Stop most core clocks No instructions are executed Caches respond to external snoops Active Core Power Clock Distribution Leakage
15
C-States: C3 C0: CPU active state C1: Halt state: C3 state:
Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches Active Core Power Leakage
16
C-States: C6 Core power goes to ~0 C0: CPU active state
C1: Halt state: Stop core pipeline Stop most core clocks No instructions are executed Caches respond to external snoops C3 state: Stop remaining core clocks Flush internal core caches C6 state: Processor saves architectural state Turn off power gate, eliminating leakage Active Core Power Leakage Core power goes to ~0
17
Putting it all together
CPU running at max power and frequency Periodically enters C1 C0 P0 Power [W] C1 Time
18
Putting it all together
Going into idle period Gradually enters deeper C states Controlled by OS C0 P0 Power [W] C2 C3 C4 C1 Time
19
Putting it all together
Tracking CPU utilization history OS identifies low activity Switches CPU to lower P state C0 P0 C0 P1 Power [W] C2 C3 C4 C1 Time
20
Putting it all together
CPU enters Idle state again C0 P0 C0 P1 Power [W] C2 C2 C3 C3 C4 C4 C1 Time
21
Putting it all together
Further lowering the P state DVD play runs at lowest P state C2 C3 C4 C0 P1 P2 C1 P0 Power [W] Time
22
Voltage and Frequency Domains
VCC Core (Gated) (ungated) VCC SA VCC Graphics VCC Periphery Embedded power gates Two Independent Variable Power Planes CPU cores, ring and LLC Embedded power gates – each core can be turned off individually Cache power gating – turn off portions or all cache at deeper sleep states Graphics processor Can be varied or turned off when not active Shared frequency for all IA32 cores and ring Independent frequency for PG Fixed Programmable power plane for System Agent Optimize SA power consumption System On Chip functionality and PCU logic Periphery: DDR, PCIe, Display
23
Turbo Mode P1 is guaranteed frequency
CPU and GFX simultaneous heavy load at worst case conditions Actual power has high dynamic range P0 is max possible frequency – the Turbo frequency P1-P0 has significant frequency range (GHz) Single thread or lightly loaded applications GFX <>CPU balancing OS treats P0 as any other P-state Requesting is when it needs more performance P1 to P0 range is fully H/W controlled Frequency transitions handled completely in HW PCU keeps silicon within existing operating limits Systems designed to same specs, with or without Turbo Mode Pn is the energy efficient state Lower than Pn is controlled by Thermal-State “Turbo” H/W Control OS Visible States T-state & Throttle P1 Pn P0 1C frequency LFM
24
Zero power for inactive cores
Turbo Mode Power Gating Zero power for inactive cores Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 Workload Lightly Threaded
25
Zero power for inactive cores
Turbo Mode Turbo Mode Use thermal budget of inactive core to increase frequency of active cores Power Gating Zero power for inactive cores Workload Lightly Threaded Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1
26
Zero power for inactive cores
Turbo Mode Turbo Mode Use thermal budget of inactive core to increase frequency of active cores Power Gating Zero power for inactive cores Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Workload Lightly Threaded
27
Turbo Mode No Turbo Core 3 Core 0 Core 2 Turbo Mode
Increase frequency within thermal headroom No Turbo Core 0 Core 1 Core 2 Core 3 Core 0 Core 1 Core 2 Core 3 Active cores running workloads < TDP Core 1 Core 3 Frequency (F) Frequency (F) Core 0 Core 2
28
Turbo Mode No Turbo Core 2 Core 0 Core 3 Power Gating
Zero power for inactive cores Turbo Mode Increase frequency within thermal headroom Frequency (F) No Turbo Core 0 Core 1 Core 2 Core 3 Core 2 Workload Lightly Threaded And active cores < TDP Core 0 Core 1 Core 3
29
Classic model response More realistic response to power changes
Thermal Capacitance Classic Model Steady-State Thermal Resistance Design guide for steady state Temperature Time Classic model response Temperature Time More realistic response to power changes New Model Steady-State Thermal Resistance AND Dynamic Thermal Capacitance Temperature rises as energy is delivered to thermal solution Thermal solution response is calculated at real-time Foil taken from IDF 2011
30
Intel® Turbo Boost Technology 2.0
Time Power Sleep or Low power Turbo Boost 2.0 “TDP” C0/P0 (Turbo) After idle periods, the system accumulates “energy budget” and can accommodate high power/performance for a few seconds In Steady State conditions the power stabilizes on TDP P > TDP: Responsiveness Sustain power Buildup thermal budget during idle periods Use accumulated energy budget to enhance user experience Foil taken from IDF 2011
31
Core and Graphic Power Budgeting Sandy Bridge Next Gen Turbo
Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control Full package power specifications available for sharing Power budget can shift between Cores and Graphics Core Power [W] Heavy Graphics workload Heavy CPU Total package power Sandy Bridge Next Gen Turbo for short periods Sum of max power Realistic concurrent max power Specification Core Power Applications Specification Graphics Power Graphics Power [W] Foil taken from IDF 2011
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.