Power.

Slides:



Advertisements
Similar presentations
Topics Electrical properties of static combinational gates:
Advertisements

Keeping Hot Chips Cool Ruchir Puri, Leon Stok, Subhrajit Bhattacharya IBM T.J. Watson Research Center Yorktown Heights, NY Circuits R-US.
Power Reduction Techniques For Microprocessor Systems
Introduction to CMOS VLSI Design Lecture 19: Design for Skew David Harris Harvey Mudd College Spring 2004.
Clock Design Adopted from David Harris of Harvey Mudd College.
Fall 06, Sep 19, 21 ELEC / Lecture 6 1 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic.
Introduction to CMOS VLSI Design Lecture 18: Design for Low Power David Harris Harvey Mudd College Spring 2004.
EE42/100, Spring 2006Week 14a, Prof. White1 Week 14a Propagation delay of logic gates CMOS (complementary MOS) logic gates Pull-down and pull-up The basic.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 14: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.
© Digital Integrated Circuits 2nd Inverter CMOS Inverter: Digital Workhorse  Best Figures of Merit in CMOS Family  Noise Immunity  Performance  Power/Buffer.
8/18/05ELEC / Lecture 11 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Lecture 5 – Power Prof. Luke Theogarajan
Temperature-Aware Design Presented by Mehul Shah 4/29/04.
Lecture 7: Power.
Lecture 7: Power.
Power-Aware Computing 101 CS 771 – Optimizing Compilers Fall 2005 – Lecture 22.
Lecture 21, Slide 1EECS40, Fall 2004Prof. White Lecture #21 OUTLINE –Sequential logic circuits –Fan-out –Propagation delay –CMOS power consumption Reading:
Low Power Design of Integrated Systems Assoc. Prof. Dimitrios Soudris
Power, Energy and Delay Static CMOS is an attractive design style because of its good noise margins, ideal voltage transfer characteristics, full logic.
The CMOS Inverter Slides adapted from:
6.375 Complex Digital Systems Krste Asanovic March 7, 2007
Free Powerpoint Templates Page 1 Free Powerpoint Templates Low Power VLSI Design Dr Elwin Chandra Monie RMK Engineering College.
6.893: Advanced VLSI Computer Architecture, September 28, 2000, Lecture 4, Slide 1. © Krste Asanovic Krste Asanovic
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
ENGG 6090 Topic Review1 How to reduce the power dissipation? Switching Activity Switched Capacitance Voltage Scaling.
Low-Power CMOS Logic Circuit Topic Review 1 Part I: Overview (Shaw) Part II: (Vincent) Low-Power Design Through Voltage Scaling Estimation and Optimization.
17 Sep 2002Embedded Seminar2 Outline The Big Picture Who’s got the Power? What’s in the bag of tricks?
Low Power Techniques in Processor Design
Power Saving at Architectural Level Xiao Xing March 7, 2005.
1 VLSI Design SMD154 LOW-POWER DESIGN Magnus Eriksson & Simon Olsson.
Power Reduction for FPGA using Multiple Vdd/Vth
EE415 VLSI Design DYNAMIC LOGIC [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Lecture 2 1 Computer Elements Transistors (computing) –How can they be connected to do something useful? –How do we evaluate how fast a logic block is?
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
CAD for Physical Design of VLSI Circuits
Logic Synthesis for Low Power(CHAPTER 6) 6.1 Introduction 6.2 Power Estimation Techniques 6.3 Power Minimization Techniques 6.4 Summary.
Basics of Energy & Power Dissipation Lecture notes S. Yalamanchili, S. Mukhopadhyay. A. Chowdhary.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
The George Washington University School of Engineering and Applied Science Department of Electrical and Computer Engineering ECE122 – Lab 7 MOSFET Parameters.
Washington State University
Why Low Power Testing? 台大電子所 李建模.
Guy Lemieux, Mehdi Alimadadi, Samad Sheikhaei, Shahriar Mirabbasi University of British Columbia, Canada Patrick Palmer University of Cambridge, UK SoC.
Leakage reduction techniques Three major leakage current components 1. Gate leakage ; ~ Vdd 4 2. Subthreshold ; ~ Vdd 3 3. P/N junction.
경종민 Low-Power Design for Embedded Processor.
Basics of Energy & Power Dissipation
© Digital Integrated Circuits 2nd Inverter Digital Integrated Circuits A Design Perspective The Inverter Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
FPGA-Based System Design: Chapter 2 Copyright  2004 Prentice Hall PTR Topics n Logic gate delay. n Logic gate power consumption. n Driving large loads.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.
Modern VLSI Design 3e: Chapter 3 Copyright  1998, 2002 Prentice Hall PTR Topics n Electrical properties of static combinational gates: –transfer characteristics;
Class Report 林常仁 Low Power Design: System and Algorithm Levels.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
Seok-jae, Lee VLSI Signal Processing Lab. Korea University
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 19: March 28, 2012 Minimizing Energy.
ELEC Digital Logic Circuits Fall 2015 Delay and Power Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering.
CS203 – Advanced Computer Architecture
LOW POWER DESIGN METHODS
COE 360 Principles of VLSI Design Delay. 2 Definitions.
Power-Optimal Pipelining in Deep Submicron Technology
Smruti R. Sarangi IIT Delhi
CS203 – Advanced Computer Architecture
Temperature and Power Management
LOW POWER DESIGN METHODS V.ANANDI ASST.PROF,E&C MSRIT,BANGALORE.
Hot Chips, Slow Wires, Leaky Transistors
Basics of Energy & Power Dissipation
Architecture & Organization 1
Architecture & Organization 1
Lecture 7: Power.
Lecture 7: Power.
Presentation transcript:

Power

Pareto-Optimal Points Lab 2 Results Pareto-Optimal Points

Standard Projects Two basic design projects Processor variants (based on lab1&2 testrigs) Non-blocking caches and memory system Possible project ideas on web site Must hand in proposal before quiz on March 18th, including: Team members (2 or 3 per team) Description of project, including the architecture exploration you will attempt

Non-Standard Projects Must hand in proposal early by class on March 14th, describing: Team members (2 or 3) The chip you want to design The existing reference code you will use to build a test rig, and the test strategy you will use The architectural exploration you will attempt

Power Trends 1000 100 Power (Watts) 10 1 0.1 1970 1980 1990 2000 2010 1000W CPU? 1000 Pentium® 4 proc 100 Power (Watts) 10 Pentium® proc 386 1 8086 8080 0.1 1970 1980 1990 2000 2010 2020 [ Source: Intel ] CMOS originally used for very low-power circuitry such as wristwatches Now some CPUs have power dissipation >100W

Power Concerns Power dissipation is limiting factor in many systems battery weight and life for portable devices packaging and cooling costs for tethered systems case temperature for laptop/wearable computers fan noise not acceptable in some settings Internet data center, ~8,000 servers,~2MW 25% of running cost is in electricity supply for supplying power and running air-conditioning to remove heat Environmental concerns ~2005, 1 billion PCs, 100W each => 100 GW 100 GW = 40 Hoover Dams

On-Chip Power Distribution Supply pad G Routed power distribution on two stacked layers of metal (one for VDD, one for GND). OK for low-cost, low-power designs with few layers of metal. A V G B V V G V G Power Grid. Interconnected vertical and horizontal power bars. Common on most high-performance designs. Often well over half of total metal on upper thicker layers used for VDD/GND. V V G G V V G G V G V G Via V G V G V V Dedicated VDD/GND planes. Very expensive. Only used on Alpha 21264. Simplified circuit analysis. Dropped on subsequent Alphas. G G V V G G V G V G

Power Dissipation in CMOS CL Diode Leakage Current Subthreshold Leakage Current Short-Circuit Current CapacitorCharging Current Gate Leakage Current Primary Components: Capacitor charging, energy is 1/2 CV2 per transition the dominant source of power dissipation today Short-circuit current, PMOS & NMOS both on during transition kept to <10% of capacitor charging current by making edges fast Subthreshold leakage, transistors don’t turn off completely approaching 10-40% of active power in <180nm technologies Diode leakage from parasitic source and drain diodes usually negligible Gate leakage from electrons tunneling across gate oxide was negligible, increasing due to very thin gate oxides

Energy to Charge Capacitor VDD Isupply Vout CL During 0->1 transition, energy CLVDD2 removed from power supply After transition, 1/2 CLVDD2 stored in capacitor, the other 1/2 CLVDD2 was dissipated as heat in pullup resistance The 1/2 CLVDD2 energy stored in capacitor is dissipated in the pulldown resistance on next 1->0 transition

Power Formula Power = activity * frequency * (1/2 CVDD2 + VDDISC) + VDDISubthreshold + VDDIDiode + VDDIGate Activity is average number of transitions per clock cycle (clock has two)

Power  activity * 1/2 CV2 * frequency Switching Power Power  activity * 1/2 CV2 * frequency Reduce activity Reduce switched capacitance C Reduce supply voltage V Reduce frequency

Reducing Activity with Clock Gating Global Clock Gated Local Clock Enable D Q Latch (transparent on clock low) Clock Gating don’t clock flip-flop if not needed avoids transitioning downstream logic enable adds to control logic complexity Pentium-4 has hundreds of gated clock domains Clock Enable Latched Enable Gated Clock

Reducing Activity with Data Gating Avoid data toggling in unused unit by gating off inputs A Shifter 1 B Adder Shifter infrequently used Shift/Add Select Shifter Adder 1 A B Could use transparent latch instead of AND gate to reduce number of transitions, but would be bigger and slower.

Other Ways to Reduce Activity Bus Encodings choose encodings that minimize transitions on average (e.g., Gray code for address bus) compression schemes (move fewer bits) Freeze “Don’t Cares” If a signal is a don’t’ care, then freeze last dynamic value (using a latch) rather than always forcing to a fixed 1 or 0. E.g., 1, X, 1, 0, X, 0 ===> 1, X=1, 1, 0, X=0, 0 Remove Glitches balance logic paths to avoid glitches during settling

Reducing Switched Capacitance Reduce switched capacitance C Careful transistor sizing (small transistors off critical path) Tighter layout (good floorplanning) Segmented structures (avoid switching long nets) A B C Bus Shared bus driven by A or B when sending values to C Insert switch to isolate bus segment when B sending to C A B C

Reducing Frequency Doesn’t save energy, just reduces rate at which it is consumed (lower power, but must run longer) Get some saving in battery life from reduction in rate of discharge

Reducing Supply Voltage Quadratic savings in energy per transition (1/2 CVDD2) Circuit speed is reduced Must lower clock frequency to maintain correctness Delay rises sharply as supply voltage approaches threshold voltages [Horowitz]

Voltage Scaling for Reduced Energy Reducing supply voltage by 0.5 improves energy per transition by ~0.25 Performance is reduced – need to use slower clock Can regain performance with parallel architecture Alternatively, can trade surplus performance for lower energy by reducing supply voltage until “just enough” performance Dynamic Voltage Scaling

Parallel Architectures Reduce Energy at Constant Throughput 8-bit adder/comparator 40MHz at 5V, area = 530 km2 Base power Pref Two parallel interleaved adder/compare units 20MHz at 2.9V, area = 1,800 km2 (3.4x) Power = 0.36 Pref One pipelined adder/compare unit 40MHz at 2.9V, area = 690 km2 (1.3x) Power = 0.39 Pref Pipelined and parallel 20MHz at 2.0V, area = 1,961 km2 (3.7x) Power = 0.2 Pref Chandrakasan et. al. “Low-Power CMOS Digital Design”, IEEE JSSC 27(4), April 1992

“Just Enough” Performance t=deadline Time Frequency Run fast then stop Run slower and just meet deadline Save energy by reducing frequency and voltage to minimum necessary

Voltage Scaling on Transmeta Crusoe TM5400 Frequency (MHz) Relative Performance (%) Voltage (V) Relative Energy (%) Relative Power 700 100.0 1.65 600 85.7 1.60 94.0 80.6 500 71.4 1.50 82.6 59.0 400 57.1 1.40 72.0 41.4 300 42.9 1.25 57.4 24.6 200 28.6 1.10 44.4 12.7

Leakage Power Under ideal scaling, want to reduce threshold voltage as fast as supply voltage But subthreshold leakage is an exponential function of threshold voltage and temperature [ Butts, Micro 2000]

Rise in Leakage Power 250 250 120% 120% Active Power 200 200 100% 100% Active Leakage power 80% 80% 150 150 Power (Watts) 60% 60% 100 100 40% 40% 50 50 20% 20% 0% 0% 0.25m 0.25m 0.18m 0.18m 0.13m 0.13m 0.1m 0.1m 0.07m 0.07m Technology Technology [ Intel]

Design-Time Leakage Reduction Use slow, low-leakage transistors off critical path leakage proportional to device width, so use smallest devices off critical path leakage drops greatly with stacked devices (acts as drain voltage divider), so use more highly stacked gates off critical path leakage drops with increasing channel length, so slightly increase length off critical path dual VT - process engineers can provide two thresholds (at extra cost) use high VT off critical path (modern cell libraries often have multiple VT)

Critical Path Leakage Critical paths dominate leakage after applying design-time leakage reduction techniques Example: PowerPC 750 5% of transistor width is low Vt, but these account for >50% of total leakage Possible approach, run-time leakage reduction switch off critical path transistors when not needed - LVT transistors account for 5% of total transistor width, but 50% of total leakage power (PowerPC 750)

Run-Time Leakage Reduction Vbody > Vdd Gate Source Drain Body Body Biasing Vt increase by reverse-biased body effect Large transition time and wakeup latency due to well cap and resistance Power Gating Sleep transistor between supply and virtual supply lines Increased delay due to sleep transistor Sleep Vector Input vector which minimizes leakage Increased delay due to mux and active energy due to spurious toggles after applying sleep vector Sleep signal Virtual Vdd Vdd Logic cells

Power Reduction for Cell-Based Designs Minimize activity Use clock gating to avoid toggling flip-flops Partition designs so minimal number of components activated to perform each operation Floorplan units to reduce length of most active wires Use lowest voltage and slowest frequency necessary to reach target performance Use pipelined architectures to allow fewer gates to reach target performance (reduces leakage) After pipelining, use parallelism to further reduce needed frequency and voltage if possible Always use energy-delay plots to understand power tradeoffs

Constant Energy-Delay Product Energy versus Delay Energy A B C Constant Energy-Delay Product D Delay Can try to compress this 2D information into single number Energy*Delay product Energy*Delay2 – gives more weight to speed, mostly insensitive to supply voltage Many techniques can exchange energy for delay Single number (ED, ED2) often misleading for real designs usually want minimum energy for given delay or minimum delay for given power budget can’t scale all techniques across range of interest To fully compare alternatives, should plot E-D curve for each solution

Energy versus Delay A better B better Energy Architecture A Architecture B Delay (1/performance) Should always compare architectures at the same performance level or at the same energy Can always trade performance for energy using voltage/frequency scaling Other techniques can trade performance for energy consumption (e.g., less pipelining, fewer parallel execution units, smaller caches, etc)

Temperature Hot Spots Not just total power, but power density is a problem for modern high-performance chips Some parts of the chip get much hotter than others Transistors get slower when hotter Leakage gets exponentially worse (can get thermal runaway with positive feedback between temperature and leakage power) Chip reliability suffers Few good solutions as yet Better floorplanning to spread hot units across chip Activity migration, to move computation from hot units to cold units More expensive packaging (liquid cooling)

Itanium Temperature Plot Execution core 120oC Cache 70°C Integer & FP ALUs Temp (oC) [ Source: Intel ]