Published by Rosaline Chandler. Modified over 9 years ago.
1
UC Regents Spring 2014 © UCB CS 152: L7: Power and Energy. 2014-2-11. John Lazzaro (not a prof - “John” is always OK). CS 152 Computer Architecture and Engineering. www-inst.eecs.berkeley.edu/~cs152/ TA: Eric Love. Lecture 7 -- Power and Energy
2
Today: Power and Energy. Metrics: Power and energy (intro). Short Break. Metrics: Power and energy (technique).
3
Power and Energy. Universal: Power and energy units are comparable across all of applied physics. So, we use automobiles to introduce terminology.
4
The Joule: Unit of energy. 1 J = 1 W s. A 1 gallon gas container holds 130 MJ of energy. The Watt: Unit of power. A rate of energy (J/s). 1 W = 1 J/s. A gas pump hose delivers 6 MW. 120 kW: The power delivered by a Tesla Supercharger. Tesla Model S has a 306 MJ battery (good for 265 miles).
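The unit relationships on this slide (1 W = 1 J/s) can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide. The charge time is an idealized assumption (constant 120 kW, no losses or charge tapering), not a Tesla specification, and the function and constant names are illustrative.

```python
# Figures quoted on the slide.
GAS_ENERGY_PER_GALLON_J = 130e6   # 130 MJ in one gallon of gasoline
GAS_PUMP_POWER_W = 6e6            # 6 MW through the pump hose
SUPERCHARGER_POWER_W = 120e3      # 120 kW Tesla Supercharger
MODEL_S_BATTERY_J = 306e6         # 306 MJ battery
MODEL_S_RANGE_MILES = 265


def seconds_to_transfer(energy_joules: float, power_watts: float) -> float:
    """Time = energy / power, because 1 W = 1 J/s."""
    return energy_joules / power_watts


# Time for the pump hose to deliver one gallon's worth of energy.
pump_seconds_per_gallon = seconds_to_transfer(
    GAS_ENERGY_PER_GALLON_J, GAS_PUMP_POWER_W
)

# Idealized full-charge time (assumption: constant 120 kW, no losses).
charge_minutes = seconds_to_transfer(MODEL_S_BATTERY_J, SUPERCHARGER_POWER_W) / 60

# Battery energy spent per mile of rated range.
mj_per_mile = MODEL_S_BATTERY_J / MODEL_S_RANGE_MILES / 1e6

print(f"Gas pump: {pump_seconds_per_gallon:.1f} s per gallon")
print(f"Supercharger: {charge_minutes:.1f} min for a full battery")
print(f"Model S: {mj_per_mile:.2f} MJ per mile")
```

Under these assumptions the pump moves energy about 50 times faster than the charger (6 MW vs. 120 kW), which is the comparison the slide sets up.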
5
Energy and Performance. Sad fact: Computers turn electrical energy into heat. Computation is a byproduct. Air or water carries the heat away, or the chip melts.
6
[Circuit: a 1 V source across a 1 Ohm resistor drives 1 A, dissipating 1 Watt = 1 Joule of heat energy per second.] The Watt: Unit of power. The amount of energy burned in the resistor in 1 second. The Joule: Unit of energy. Can also be expressed as Watt-Seconds. Burning 1 Watt for 100 seconds uses 100 Watt-Seconds of energy. 1 Joule heats 1 gram of water by 0.24 degrees C. This is how electric tea pots work... 20 W rating: Maximum power the resistor package is able to transfer to the air. Exceed the rating and the resistor burns.
7
Cooling an iPod nano... Like the resistor on the last slide, the iPod relies on passive transfer of heat from the case to the air. Why? Users don’t want fans in their pocket... To stay “cool to the touch” via passive cooling, the power budget is 5 W. If the iPod nano used 5 W all the time, its battery would last 15 minutes...
8
Powering an iPod nano (2005 edition). 1.2 W-hour battery: Can supply 1.2 watts of power for 1 hour. 1.2 W-hr / 5 W ≈ 15 minutes. Real specs for iPod nano: 14 hours for music, 4 hours for slide shows. 85 mW for music. 300 mW for slides. More W-hours require a bigger battery and thus a bigger “form factor” -- it wouldn’t be “nano” anymore :-).
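The battery-life numbers on this slide follow from dividing battery capacity by average power. A minimal sketch, using only the slide's figures (the helper names are illustrative, not from the lecture):

```python
BATTERY_WH = 1.2  # 2005 iPod nano battery capacity, from the slide


def battery_life_hours(battery_wh: float, load_watts: float) -> float:
    """Hours of runtime at a constant load."""
    return battery_wh / load_watts


def average_power_mw(battery_wh: float, runtime_hours: float) -> float:
    """Average power (in mW) implied by a rated runtime."""
    return battery_wh / runtime_hours * 1000.0


# Runtime if the device burned its full 5 W passive-cooling budget.
minutes_at_5w = battery_life_hours(BATTERY_WH, 5.0) * 60

# Average power implied by the rated runtimes.
music_mw = average_power_mw(BATTERY_WH, 14)   # 14 hours of music
slides_mw = average_power_mw(BATTERY_WH, 4)   # 4 hours of slide shows

print(f"At 5 W: {minutes_at_5w:.1f} minutes")
print(f"Music: {music_mw:.0f} mW, slides: {slides_mw:.0f} mW")
```

The results round to the slide's figures: roughly 15 minutes at 5 W, about 85 mW for music, and 300 mW for slide shows.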
9
Finding the (2005) iPod nano CPU... Two 80 MHz CPUs. One CPU used for audio, one for slides. Low-power ARM: roughly 1 mW per MHz, variable clock, sleep modes. 85 mW system power realistic... A close relative...
10
The CPU is only part of the power budget! “Amdahl’s Law for Power”: If our CPU took no power at all to run, that would only double battery life! [Chart: power breakdown of a 2004-era notebook running a full workload. Categories: CPU, LCD backlight, LCD, GPU, “other”.]
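“Amdahl’s Law for Power” can be written the same way as the usual speedup formula. The sketch below is an illustration, not the lecture's own code: it assumes the CPU accounts for half of total system power, which is the share implied by the statement that a zero-power CPU would only double battery life.

```python
def battery_life_gain(component_fraction: float, power_reduction: float) -> float:
    """Battery-life multiplier when one component's power drops by a factor.

    component_fraction: share of total power used by the component (0..1).
    power_reduction: factor by which that component's power is divided
                     (use float("inf") for a component that takes no power).
    """
    if not 0.0 <= component_fraction <= 1.0:
        raise ValueError("component_fraction must be between 0 and 1")
    if power_reduction <= 0:
        raise ValueError("power_reduction must be positive")
    remaining = (1.0 - component_fraction) + component_fraction / power_reduction
    if remaining == 0.0:
        return float("inf")
    return 1.0 / remaining


CPU_FRACTION = 0.5  # assumption implied by "would only double battery life"

gain_free_cpu = battery_life_gain(CPU_FRACTION, float("inf"))
gain_half_power_cpu = battery_life_gain(CPU_FRACTION, 2.0)

print(f"Zero-power CPU: {gain_free_cpu:.2f}x battery life")
print(f"CPU at half power: {gain_half_power_cpu:.2f}x battery life")
```

Even an impossible zero-power CPU is capped at 2x here, because the display, GPU, and other components still draw the other half.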
11
What’s happened since 2005? 2010 nano: 0.74 ounces (50% of 2005 nano). 0.39 W-hr battery (33% of 2005 nano). “Up to” 24 hours audio playback, a 70% improvement over the 2005 nano.
12
Processors and Energy
13
Moore’s Law. [Plot: transistors per chip over time, with marked points at 2 thousand, 1 million, and 2.6 billion.]
14
Main driver: device scaling... From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
15
1974: Dennard Scaling. If we scale the gate length by a factor, how should we scale other aspects of the transistor to get the “best” results? [Figure: a transistor not scaled vs. scaled by a factor of 5.]
16
Dennard Scaling. [Figure: a transistor not scaled vs. scaled by a factor of 5.] Things we do: scale dimensions, doping, Vdd. What we get: the square of the scaling factor as many transistors at the same power density, whose gates switch faster by the scaling factor! Power density scaling ended in 2003 (Pentium 4: 3.2 GHz, 82 W, 55M FETs). Why? We could no longer scale Vdd.
17
The Power Wall Why? We can no longer fully scale Vdd... because MOS transistor leakage current is no longer a negligible part of the power budget.
18
Switching Energy: Fundamental Physics. Every logic transition dissipates energy: $E_{0 \to 1} = \frac{1}{2} C V_{dd}^2$ and $E_{1 \to 0} = \frac{1}{2} C V_{dd}^2$. Strong result: Independent of technology. How can we limit switching energy? (1) Reduce # of clock transitions. But we have work to do... (2) Reduce Vdd. But lowering Vdd limits the clock speed... (3) Fewer circuits. But more transistors can do more work. (4) Reduce C per node. One reason why we scale processes.
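The switching-energy formula, one half of C times Vdd squared per transition, shows why Vdd is the strongest lever: energy falls with the square of the voltage but only linearly with capacitance. A minimal sketch follows. The 1 fF node capacitance and the 1.0 V supply are hypothetical values chosen for illustration, not numbers from the lecture.

```python
def transition_energy(c_farads: float, vdd_volts: float) -> float:
    """Energy dissipated by one 0->1 or 1->0 transition: (1/2) * C * Vdd^2."""
    return 0.5 * c_farads * vdd_volts ** 2


def dynamic_power(c_farads: float, vdd_volts: float,
                  transitions_per_second: float) -> float:
    """Average switching power for a node toggling at the given rate."""
    return transition_energy(c_farads, vdd_volts) * transitions_per_second


C_NODE = 1e-15  # hypothetical 1 fF node capacitance
VDD = 1.0       # hypothetical 1.0 V supply

e_base = transition_energy(C_NODE, VDD)
e_half_vdd = transition_energy(C_NODE, VDD / 2)   # option (2): reduce Vdd
e_half_c = transition_energy(C_NODE / 2, VDD)     # option (4): reduce C per node

print(f"Halving Vdd scales energy by {e_half_vdd / e_base:.2f}")
print(f"Halving C scales energy by {e_half_c / e_base:.2f}")
print(f"Power at 2e9 transitions/s: {dynamic_power(C_NODE, VDD, 2e9):.2e} W")
```

Options (1) and (3) from the list correspond to lowering `transitions_per_second` and the number of nodes, which scale power linearly.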
19
Scaling switching energy per gate... IC process scaling (“Moore’s Law”). Switching energy falls due to reducing V and C (length and width of Cs decrease, but plate distance gets smaller). Recent slope is more shallow because V is being scaled less aggressively. From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
20
Second Factor: Leakage Currents. Even when a logic gate isn’t switching, it burns power. Igate: Ideal capacitors have zero DC current. But modern transistor gates are a few atoms thick, and are not ideal. Isub: Even when this nFET is off, it passes an Ioff leakage current. We can engineer any Ioff we like, but a lower Ioff also results in a lower Ion, and thus a lower maximum clock speed. [Chart: Intel’s 2006 processor designs, leakage vs. switching power.] A lot of work was done to get a ratio this good... 50/50 is common. Source: Bill Holt, Intel, Hot Chips 17.
21
Engineering “On” Current at 25 nm... [Plot: Ids vs. Vg for an n-FET with terminals Vs, Vd, Vg. Marked values: Vdd = 0.7, Vt ≈ 0.25, Ion = 1.2 mA. On this linear scale, Ioff = 0 ???] We can increase Ion by raising Vdd and/or lowering Vt.
22
Plot on a “Log” Scale to See “Off” Current. [Plot: log-scale Ids vs. Vg for an n-FET with terminals Vs, Vd, Vg. Marked values: Vdd = 0.7, Vt ≈ 0.25, Ion = 1.2 mA, Ioff ≈ 10 nA.] We can decrease Ioff by raising Vt - but that lowers Ion.
23
Recall: Timing Lecture... Ioff? Ion? Ids = current through the n-FET. An “on” n-FET empties the bucket (Ion). Why on? Vdd >> Vt. An “off” n-FET leaks Ioff while the bucket fills. Why open? Gnd << Vt. [Plot values: Vdd = 0.7, Vt ≈ 0.25, Ion = 1.2 mA, Ioff ≈ 10 nA.]
24
Device engineers trade speed and power. We can reduce leakage (Pstandby) by raising Vt. We can increase speed by raising Vdd and lowering Vt. We can reduce $CV^2$ (Pactive) by lowering Vdd. From: “Silicon Device Scaling to the Sub-10-nm Regime”, Meikei Ieong, Bruce Doris, Jakub Kedzierski, Ken Rim, Min Yang.
25
Customize processes for product types... From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
26
Intel 22nm Process. Transistor channel is a raised fin. Gate controls channel from sides and top. Channel depth is fin width: 12-15nm for L=22nm. [Plot: Ids vs. Vgs.]
27
Clock rates have flattened out, but...
28
Performance: Put more transistors to work
29
Moore’s Law. [Plot: transistors per chip over time, with marked points at 2 thousand, 1 million, and 2.6 billion.] We still scale to get more transistors per unit area... but we use design techniques to reduce power.
30
Takeaway Abstractions From Part I...
31
Small circuits can go very fast in standard CMOS... This oscillator runs at 210 GHz in a 32 nm SOI CMOS logic process, and consumes 42 mW... But if we used these techniques for a CPU, our 150 W air-cooled power limit would limit our design to using about ten thousand transistors... but we are used to using 100s of millions!
32
The Power Wall Dennard scaling stopped working in 2004. Why? MOSFET off currents became non-negligible. We limit clock speed to prevent chip from melting.
33
Dynamic Power: 4 ways to reduce it... Every logic transition dissipates energy: $E_{0 \to 1} = \frac{1}{2} C V_{dd}^2$ and $E_{1 \to 0} = \frac{1}{2} C V_{dd}^2$. Strong result: Independent of technology. How can we limit switching energy? (1) Reduce # of clock transitions. But we have work to do... (2) Reduce Vdd. But lowering Vdd limits the clock speed... (3) Fewer circuits. But more transistors can do more work. (4) Reduce C per node. One reason why we scale processes.
34
Static power: We trade off speed for power. Even when a logic gate isn’t switching, it burns power. Igate: Ideal capacitors have zero DC current. But modern transistor gates are a few atoms thick, and are not ideal. Isub: Even when this nFET is off, it passes an Ioff leakage current. We can engineer any Ioff we like, but a lower Ioff also results in a lower Ion, and thus a lower maximum clock speed. [Chart: Intel’s 2006 processor designs, leakage vs. switching power.] A lot of work was done to get a ratio this good... 50/50 is common. Source: Bill Holt, Intel, Hot Chips 17.
35
Factor of 60 in leakage vs. 3.5 in speed. Chart shows 9 different NAND gates for an IC process, each with a different speed vs. static power tradeoff. (40, 45, 50) are transistor channel lengths (in nm).
36
FO4 (Delay, Power) vs Vdd -- 65nm. FO4 is the delay of one inverter driving four additional inverters. Power in this plot includes dynamic and static.
37
Six low-power design techniques: Power-down idle transistors. Parallelism and pipelining. Slow down non-critical paths. Clock gating. Thermal management. Data-dependent processing.
38
Trading Hardware for Power via Parallelism and Pipelining... Design Technique #1 (of 6)
39
Gate delay is roughly linear with Vdd. And so, we can transform this: Block processes stereo audio. 1/2 of clocks for “left”, 1/2 for “right”. $P \sim F \cdot V_{dd}^2$, so $P \sim 1 \cdot 1^2 = 1$. Into this: Top block processes “left”, bottom “right”. $P \sim \#\text{blks} \cdot F \cdot V_{dd}^2$, so $P \sim 2 \cdot \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{4}$ ($CV^2$ power only). This magic trick brought to you by Cory Hall...
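The parallelism trick on this slide can be reproduced as relative-power arithmetic. The sketch below assumes, as the slide does, that only CV^2 switching power counts and that halving the clock frequency lets Vdd be halved (gate delay roughly linear with Vdd). The function name is illustrative.

```python
def relative_dynamic_power(num_blocks: int, rel_freq: float, rel_vdd: float) -> float:
    """Relative switching power: P ~ #blocks * F * Vdd^2 (CV^2 power only)."""
    return num_blocks * rel_freq * rel_vdd ** 2


# One block handles both channels at full clock rate and full Vdd.
single_block = relative_dynamic_power(num_blocks=1, rel_freq=1.0, rel_vdd=1.0)

# Two blocks, one per channel, each at half the clock rate and half Vdd.
two_blocks = relative_dynamic_power(num_blocks=2, rel_freq=0.5, rel_vdd=0.5)

print(f"Single block: {single_block}")
print(f"Two parallel blocks: {two_blocks}")
```

Doubling the hardware still gives a net 4x reduction because the Vdd^2 term outweighs the extra block. Leakage and the overhead of splitting and merging the data streams are ignored here.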
40
[Figure: Simple, Pipelined, and Parallel versions of a design.] From: Chandrakasan & Brodersen (UCB, 1992).
41
Multiple Cores for Low Power. Trade hardware for power, on a large scale...
42
Cell: The PS3 chip (2006)
43
Cell (PS3 Chip): 1 CPU + 8 “SPUs”. [Die photo labels: PowerPC; 512 KB L2 Cache; 8 Synergistic Processing Units (SPUs).]
44
One Synergistic Processing Unit (SPU). 256 KB Local Store, 128 128-bit Registers. SPU issues 2 inst/cycle (in order) to 7 execution units. SPU fills Local Store using DMA to DRAM and network.
45
A “Shmoo” plot for a Cell SPU... The lower Vdd, the longer the maximum clock period, the slower the clock frequency. The lower Vdd, the less dynamic energy consumption: $E_{0 \to 1} = \frac{1}{2} C V_{dd}^2$ and $E_{1 \to 0} = \frac{1}{2} C V_{dd}^2$.
46
Clock speed alone doesn’t help E/op... $E_{0 \to 1} = \frac{1}{2} C V_{dd}^2$ and $E_{1 \to 0} = \frac{1}{2} C V_{dd}^2$. But, lowering clock frequency while keeping voltage constant spreads the same amount of work over a longer time, so the chip stays cooler...
47
Scaling V and f does lower energy/op. 7 W to reliably get 4.4 GHz performance. 47 C die temp. 1 W to get 2.2 GHz performance. 26 C die temp. If a program that needs a 4.4 GHz CPU can be recoded to use two 2.2 GHz CPUs... big win.
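The “big win” can be quantified from the two operating points quoted on this slide. The sketch below treats one clock cycle as one unit of work and assumes the program splits perfectly across two cores. Both are simplifying assumptions for illustration, not claims from the lecture.

```python
def energy_per_cycle_nj(power_watts: float, freq_ghz: float) -> float:
    """Energy per clock cycle in nanojoules (W / GHz = nJ per cycle)."""
    return power_watts / freq_ghz


# Operating points quoted on the slide.
FAST_POWER_W, FAST_FREQ_GHZ = 7.0, 4.4
SLOW_POWER_W, SLOW_FREQ_GHZ = 1.0, 2.2

fast_nj = energy_per_cycle_nj(FAST_POWER_W, FAST_FREQ_GHZ)
slow_nj = energy_per_cycle_nj(SLOW_POWER_W, SLOW_FREQ_GHZ)
energy_ratio = fast_nj / slow_nj

# Two slow cores deliver the same aggregate cycles per second as one fast core
# (assumption: perfectly parallel work).
two_slow_cores_w = 2 * SLOW_POWER_W

print(f"4.4 GHz: {fast_nj:.2f} nJ/cycle, 2.2 GHz: {slow_nj:.2f} nJ/cycle")
print(f"Energy per cycle ratio: {energy_ratio:.1f}x")
print(f"Power for equal throughput: {two_slow_cores_w:.0f} W vs {FAST_POWER_W:.0f} W")
```

Under these assumptions, two 2.2 GHz cores match the throughput of one 4.4 GHz core while using 2 W instead of 7 W.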
48
How iPod nano 2005 puts its 2 cores to use... Two 80 MHz CPUs. Was used in several nano generations, with one CPU doing audio decoding, the other doing photos, etc.
49
2013 MacBook Air Haswell CPU/GPU. Voltage range: 0.655 V to 1.041 V... 2.5x in $CV^2$ energy.
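The 2.5x figure follows from the square of the ratio of the two voltages, because switching energy scales with Vdd squared at fixed capacitance. A quick check using the slide's voltage range (the variable names are illustrative):

```python
VDD_MIN = 0.655  # volts, low end of the range quoted on the slide
VDD_MAX = 1.041  # volts, high end of the range quoted on the slide

# Switching energy scales with Vdd^2 for a fixed capacitance C.
cv2_energy_ratio = (VDD_MAX / VDD_MIN) ** 2

print(f"CV^2 energy ratio across the voltage range: {cv2_energy_ratio:.2f}x")
```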
50
Powering down idle circuits. Design Technique #2 (of 6)
51
Add “sleep” transistors to logic... Example: Floating point unit logic. When running fixed-point instructions, put logic “to sleep”. (+) When “asleep”, leakage power is dramatically reduced. (-) Presence of sleep transistors slows down the clock rate when the logic block is in use.
52
Intel example: Sleeping cache blocks. A tiny current supplied in “sleep” maintains SRAM state. From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
53
Intel Medfield
54
Switches 45 power “islands.” Fine-grained control of leakage power, to track user activity. “Race to idle” strategy -- finish tasks quickly, to get to power down.
55
Playing a game...
56
Watching a video...
57
Looking at phone screen, not doing anything...
58
Phone in your pocket, waiting for a call...
59
Slow down “slack paths”. Design Technique #3 (of 6)
60
Fact: Most logic on a chip is “too fast”. Most wires have hundreds of picoseconds to spare. [Figure label: the critical path.] From “The circuit and physical design of the POWER4 microprocessor”, IBM J Res and Dev, 46:1, Jan 2002, J.D. Warnock et al.
61
Use several supply voltages on a chip... Why use multi-Vdd? We can reduce dynamic power by using a lower Vdd for logic off the critical path. What if we can’t do a multi-Vdd design? In a multi-Vt process, we can reduce leakage power on the slow logic by using high-Vt transistors. From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.
62
From a chapter in a new book on ASIC design by Chinnery and Keutzer (UCB). Logical partition into 0.8 V and 1.0 V nets done manually to meet 350 MHz spec (90nm). Level-shifter insertion and placement done automatically. Dynamic power in 0.8 V section cut 50% below baseline. Leakage power in 1.0 V section cut 70% below baseline.
63
Gating clocks to save power. Design Technique #4 (of 6)
64
On a CPU, where does the power go? Half of the power goes to latches (Flip-Flops). Most of the time, the latches don’t change state. So (gasp) gated clocks are a big win. But, done with CAD tools in a disciplined way. From: Bose, Martonosi, Brooks: Sigmetrics-2001 Tutorial.
65
Synopsys Design Compiler can do this... “Up to 70% power savings at the block level, for applicable circuits.” Source: Synopsys data sheet.
66
Power Compiler also can do this... 10-20% push-button power savings, using techniques like this one.
67
Data-Dependent Processing. Design Technique #5 (of 6)
68
Example: Video Decode Transform. Most of the time, the inputs flip between small positive and negative integers. In 2’s complement, this wastes power because nearly every bit toggles: +1: 0b00001, -1: 0b11111.
69
Solution: Add bias value to all inputs. 30+% power reduction for a bias of 64. For this linear transform, correcting the output for the bias is trivial.
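Counting bit toggles between consecutive inputs shows why the bias helps. The sketch below is a toy model, not the lecture's measurement. The 16-bit word width and the alternating +1/-1 input sequence are assumptions made for illustration, and toggle count is only a rough proxy for switching power.

```python
def bit_flips(a: int, b: int, width: int) -> int:
    """Number of bits that differ between a and b in width-bit 2's complement."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def total_bit_flips(values, width: int, bias: int = 0) -> int:
    """Total bit toggles across consecutive values after adding a bias."""
    biased = [v + bias for v in values]
    return sum(bit_flips(x, y, width) for x, y in zip(biased, biased[1:]))


WIDTH = 16                  # hypothetical datapath width
samples = [1, -1, 1, -1]    # small values alternating in sign

flips_unbiased = total_bit_flips(samples, WIDTH)
flips_biased = total_bit_flips(samples, WIDTH, bias=64)

print(f"Toggles without bias: {flips_unbiased}")
print(f"Toggles with bias 64: {flips_biased}")
```

Without the bias, each sign change flips every bit except the lowest one, since -1 is all ones in 2's complement. With a bias of 64 the values stay positive (65 and 63), so the upper bits never toggle.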
70
Thermal Management. Design Technique #6 (of 6)
71
Keep chip cool to minimize leakage power. A recipe for thermal runaway.
72
IBM Power 4: How does die heat up? 4 dies on a multi-chip module. 2 CPUs per die.
73
115 Watts: Concentrated in “hot spots”. [Thermal map labels: fixed point units, hot spots, cache logic. Marked temperatures: 66.8 C == 152 F, 82 C == 179.6 F.]
74
Idea: Monitor temperature, servo clock speed. TDP = Thermal Design Point.
75
iPad Air, iPad Mini retina, and iPhone 5S all use the A7. Repeatedly running the same benchmark on three Apple products. The TDP of each form factor dictates how long it can run at “top speed”.
76
On Thursday: Time to market via chip verification. Have fun in section!