Why Power Matters Packaging costs Power supply rail design

Presentation transcript:

Why Power Matters
Packaging costs
Power supply rail design
Chip and system cooling costs
Noise immunity and system reliability
Battery life (in portable systems)
Environmental concerns
  Office equipment accounted for 5% of total US commercial energy usage in 1993
  Energy Star compliant systems

Why worry about power? -- Power Dissipation
Lead microprocessors' power continues to increase
[Figure: Power (Watts, log scale from 0.1 to 100) versus Year (1971 to 2000) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6.]
Power delivery and dissipation will be prohibitive
Source: Borkar, De (Intel)

Why worry about power? -- Chip Power Density
[Figure: Power Density (W/cm2, log scale from 1 to 10000) versus Year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6, with reference levels marked Hot Plate, Nuclear Reactor, Rocket Nozzle and Sun's Surface.]
...chips might become hot...
Source: Borkar, De (Intel)

Chip Power Density Distribution
[Figures: Power Map; On-Die Temperature.]
Power density is not uniformly distributed across the chip
Silicon is not a good heat conductor
Max junction temperature is determined by hot-spots
  Impact on packaging, w.r.t. cooling

Why worry about power? -- Standby Power
Year: 2002, 2005, 2008, 2011, 2014
Power supply Vdd (V): 1.5, 1.2, 0.9, 0.7, 0.6
Threshold VT (V): 0.4, 0.35, 0.3, 0.25
Drain leakage will increase as VT decreases to maintain noise margins and meet frequency demands, leading to excessive, battery-draining standby power consumption.
[Figure: Standby Power (0% to 50%) versus year (2000 to 2008), with data labels 8KW, 1.7KW, 400W, 88W and 12W.]
...and phones leaky!
Source: Borkar, De (Intel)

Power and Energy Figures of Merit
Power consumption in Watts
  determines battery life in hours
Peak power
  determines power ground wiring designs
  sets packaging limits
  impacts signal noise margin and reliability analysis
Energy efficiency in Joules
  energy is power accumulated over time (power is the rate at which energy is consumed)
  Energy = power * delay
  Joules = Watts * seconds
  lower energy number means less power to perform a computation at the same frequency
Notes: Power is the rate at which energy is delivered or exchanged; power dissipation is the rate at which energy is taken from the source (Vdd) and converted into heat (electrical energy is converted into heat energy during operation). It is hard to get a large current into a chip: 30 W at 3 V is 10 A.

Power versus Energy
[Figure: two plots of Watts versus time, each showing Approach 1 and Approach 2.]
Power is the height of the curve
  Lower power design could simply be slower
Energy is the area under the curve
  Two approaches require the same energy
Note: changing the operating frequency does not change the energy consumption!

PDP and EDP
Power-delay product: $PDP = P_{av} \, t_p = (C_L V_{DD}^2)/2$
  PDP is the average energy consumed per switching event (Watts * sec = Joule)
  lower power design could simply be a slower design
Energy-delay product: $EDP = PDP \cdot t_p = P_{av} \, t_p^2$
  EDP is the average energy consumed multiplied by the computation time required
  takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
  allows one to understand tradeoffs better
[Figure: plot with curves labeled energy-delay, energy and delay.]
Notes: PDP stands for the average energy consumed per switching event, so it is CL VDD^2 / 2. As each inverter cycle contains a 0->1 and a 1->0 transition, Eav is twice the PDP. For a given structure the PDP may be made arbitrarily low by reducing the supply voltage, but that comes at the expense of performance. EDP is the preferred metric since it takes performance into account. The optimum supply voltage can be derived (as in the book) as VDDopt = 3/2 VTE, where VTE = VT + VDSAT/2. This value of VDD optimizes performance and energy simultaneously. For technologies with VTs in the range of 0.5 V, the optimum supply is around 1 V, as shown in the plot (for our generic parameters of 0.43 V VTn and -0.4 V VTp it is 1.2 V). This analysis ignores standby power issues and microarchitecture optimization issues (e.g., pipelining).
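As a quick numerical illustration of the two metrics, the sketch below evaluates PDP and EDP for one operating point. The 6 fF load and 2.5 V supply are example values, and the 32.5 ps delay is a hypothetical number chosen only for illustration; the function names are not from the slides.

```python
def pdp(c_load, vdd):
    """Power-delay product: average energy per switching event, CL * VDD^2 / 2 (joules)."""
    return c_load * vdd ** 2 / 2


def edp(c_load, vdd, t_p):
    """Energy-delay product: PDP multiplied by the propagation delay (joule-seconds)."""
    return pdp(c_load, vdd) * t_p


C_LOAD = 6e-15   # farads, example load capacitance
VDD = 2.5        # volts, example supply
T_P = 32.5e-12   # seconds, hypothetical propagation delay

print(f"PDP = {pdp(C_LOAD, VDD):.3e} J")
print(f"EDP = {edp(C_LOAD, VDD, T_P):.3e} J*s")
```

Halving VDD in this sketch cuts PDP by a factor of four, but it says nothing about the delay penalty; that is the reason EDP is the more informative metric.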

Understanding Tradeoffs (for class handout)
Which design is the "best" (fastest, coolest, both)?
[Figure: designs a, b, c and d plotted as Energy versus 1/Delay, with arrows marking the better direction on each axis.]

Understanding Tradeoffs (for lecture)
Which design is the "best" (fastest, coolest, both)?
[Figure: designs a, b, c and d plotted as Energy versus 1/Delay, with arrows marking the better direction on each axis and the direction of lower EDP.]
Notes: Clearly a is "better" than c and c is "better" than b, but how about b and d? Or a and d? Constant EDPs are the straight lines in the graph.

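CMOS Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
The three terms of P are, in order: dynamic power, short-circuit power and leakage power.
Note: f0->1 represents the energy consuming transition.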

Dynamic Power Consumption
[Figure: CMOS inverter with supply Vdd, input Vin, output Vout and load CL, switching at f0->1.]
Energy/transition: $E = C_L V_{DD}^2 P_{0\to1}$
$P_{dyn} = \text{Energy/transition} \cdot f = C_L V_{DD}^2 P_{0\to1} f$
$P_{dyn} = C_{EFF} V_{DD}^2 f$ where $C_{EFF} = P_{0\to1} C_L$
Not a function of transistor sizes!
Data dependent - a function of switching activity!
Notes: Half of the energy is dissipated in the PMOS device; the remainder is stored on the load capacitor. Notice that this energy dissipation is independent of the size (and hence the resistance) of the PMOS device. During the high-to-low transition, this capacitor is discharged and the stored energy is dissipated in the NMOS transistor. P0->1 is the transition probability: the probability that the output is going to make the transition from 0 to 1 (the energy consuming transition). CEFF is the effective capacitance representing the average capacitance switched every clock cycle. Dynamic power accounts for 60 to 80% of the total power consumption of today's parts. Advances in technology result in ever higher values for f0->1 (as tp decreases). At the same time the total capacitance on the chip (CL) increases as more and more gates are placed on a single die. With CL = 6 fF, Edyn = 37.5 fJ (for a 2.5 V supply). If clocked at the maximum rate (T = 1/f = tpLH + tpHL = 2tp), then Pdyn = Edyn/(2tp) = 580 microWatts.
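The arithmetic in the notes can be reproduced with a few lines. This is a minimal sketch: the slides give CL = 6 fF and a 2.5 V supply, while the propagation delay of 32.5 ps is an assumption chosen here because it gives a result close to the quoted 580 microWatts.

```python
def energy_per_transition(c_load, vdd):
    """Energy drawn from the supply for one 0->1 output transition (joules)."""
    return c_load * vdd ** 2


def dynamic_power(c_load, vdd, p_01, f_clock):
    """Pdyn = CL * VDD^2 * P0->1 * f (watts)."""
    return c_load * vdd ** 2 * p_01 * f_clock


C_LOAD = 6e-15   # farads, from the notes
VDD = 2.5        # volts, from the notes
T_P = 32.5e-12   # seconds, assumed propagation delay (not stated on the slide)

e_dyn = energy_per_transition(C_LOAD, VDD)
# Clocked at the maximum rate: one 0->1 transition every period T = 2 * tp.
p_max = dynamic_power(C_LOAD, VDD, p_01=1.0, f_clock=1.0 / (2 * T_P))

print(f"Edyn = {e_dyn * 1e15:.1f} fJ")
print(f"Pdyn at maximum rate = {p_max * 1e6:.0f} uW")
```

Note that neither function takes a transistor size as an argument, which mirrors the point that dynamic power does not depend on device sizing for a fixed load.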

Lowering Dynamic Power
$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$
Capacitance: function of fan-out, wire length, transistor sizes
Supply voltage: has been dropping with successive generations
Activity factor: how often, on average, do wires switch?
Clock frequency: increasing...
Notes:
Lowering CL
  Improves performance as well
  Keep transistors minimum size (keeps intrinsic capacitance (gate and diffusion) small)
  Transistors should be sized only when CL is dominated by extrinsic capacitance (fanout and wires)
Reducing VDD
  Has a quadratic effect!
  But has a negative effect on performance, especially as VDD approaches 2VT
Reducing the switching activity, f0->1 = P0->1 * f
  A function of signal statistics and clock rate
  Impacted by logic and architecture design decisions
One alternative is to lower the supply voltage as much as possible and to compensate for the loss in performance by increasing the transistor sizes. Yet this causes the capacitance to increase. At a low enough supply voltage, the capacitance (for speed) may start to dominate the power equation, causing energy to increase with further drops in supply.

Short Circuit Power Consumption
[Figure: CMOS inverter with input Vin, output Vout, load CL and short-circuit current Isc.]
Finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching when both the NMOS and PMOS transistors are conducting.
Note: Accounts for 20 to 40% of power of today's technology.

Short Circuit Current Determinants
$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$
$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$
tsc is determined by the duration and slope of the input signal
Ipeak is determined by
  the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
  the ratio between input and output slopes (a strong dependence)
    which is itself a function of CL
Notes: These are direct path currents. tsc is the time during which both devices are conducting; the peak and duration of Isc both increase as the input slope decreases.

Impact of CL on Psc
[Figure: two inverters, each with input Vin, output Vout and load CL; left with a large capacitive load, right with a small capacitive load.]
Large capacitive load (Isc ≈ 0): output fall time significantly larger than input rise time.
Small capacitive load (Isc ≈ Imax): output fall time substantially smaller than the input rise time.
Notes: Left case: the input moves through the transient region before the output starts to change. As the source-drain voltage of the PMOS is approximately 0 during that period, the device shuts off without ever delivering any current, so Isc is close to zero. Right case: the drain-source voltage of the PMOS equals VDD for most of the transition period, giving maximum Isc.

Ipeak as a Function of CL
[Figure: Ipeak (A, x 10^-4) versus time (sec, x 10^-10) for CL = 20 fF, 100 fF and 500 fF, with a 500 psec input slope.]
When load capacitance is small, Ipeak is large.
Short circuit dissipation is minimized by matching the rise/fall times of the input and output signals - slope engineering.
Notes: Making the output rise/fall time too large slows down the circuit and can cause short-circuit currents in the fan-out gates! Slope engineering is important for BOTH speed and power consumption. However, it is not the optimum solution for a gate on its own (it just keeps the overall short-circuit current within bounds). For a single inverter, see next slide.

Psc as a Function of Rise/Fall Times
[Figure: normalized P versus tsin/tsout for VDD = 3.3 V, 2.5 V and 1.5 V; W/Lp = 1.125 µm/0.25 µm, W/Ln = 0.375 µm/0.25 µm, CL = 30 fF; normalized with respect to zero input rise-time dissipation.]
When load capacitance is small (tsin/tsout > 2 for VDD > 2 V) the power is dominated by Psc.
If VDD < VTn + |VTp| then Psc is eliminated since both devices are never on at the same time.
Notes: For large capacitance values, all the power dissipation is devoted to charging and discharging the load capacitance. When the rise/fall times of inputs and outputs are equalized, most power dissipation is associated with dynamic power and only a minor fraction (<10%) is devoted to Psc. At VDD = 2.5 V and VTs around 0.5 V, an input/output slope ratio of 2 is needed to cause a 10% degradation in power dissipation. Also notice that short-circuit current is reduced when we lower the supply voltage. In the extreme, when VDD < VTn + |VTp|, short-circuit current is completely eliminated. With threshold voltages scaling at a slower rate than the supply voltage, Psc is becoming less important in very deep submicron (VDSM) technologies.

Leakage (Static) Power Consumption
[Figure: CMOS inverter with supply VDD, output Vout and leakage current Ileakage, showing drain junction leakage, sub-threshold current and gate leakage.]
Sub-threshold current is the dominant factor.
All increase exponentially with temperature!
Notes: For the drain junction, leakage current per unit drain area typically ranges between 10 and 100 pA/µm^2 at room temperature (25 degrees C) for 0.25 micron CMOS (1 million gates, each with a drain area of 0.5 µm^2, gives 0.125 mW). However, values increase exponentially with increasing junction temperature: JS doubles for every 9 deg C! At 85 degrees C (a commonly imposed upper bound for junction temperatures) the leakage currents increase by a factor of 60 over room temperature. As temperature is a strong function of the dissipated heat and its removal mechanisms, limit the power dissipation and use chip packages that support efficient heat removal. We have to be more concerned about sub-threshold current. The closer the threshold voltage is to zero, the larger the leakage current at VGS = 0 V. To offset this effect, threshold voltages are not scaled as aggressively as supply voltages (narrowing noise margins). Unfortunately, scaling the supply voltage and not scaling the threshold hurts performance, especially as VDD approaches 2 VT.
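The 0.125 mW figure for junction leakage follows from multiplying out the quantities in the notes. The sketch below does that arithmetic, taking the upper end of the quoted current density (100 pA per square micron) and assuming a 2.5 V supply, which the slide does not state explicitly for this example.

```python
def junction_leakage_power(n_gates, drain_area_um2, js_pa_per_um2, vdd):
    """Static power from drain-junction leakage (watts).

    js_pa_per_um2 is the leakage current density in picoamps per square micron.
    """
    total_current_a = n_gates * drain_area_um2 * js_pa_per_um2 * 1e-12
    return total_current_a * vdd


# 1 million gates, 0.5 um^2 drain area each, 100 pA/um^2, assumed 2.5 V supply.
power_w = junction_leakage_power(1_000_000, 0.5, 100.0, 2.5)
print(f"Junction leakage power = {power_w * 1e3:.3f} mW")
```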

Leakage as a Function of VT
[Figure: leakage current on a log scale (axis ticks 10^-2, 10^-7, 10^-12) as a function of VT.]
Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
A 90 mV/decade VT roll-off means each 270 mV increase in VT gives 3 orders of magnitude reduction in leakage (but adversely affects performance).
Notes: The choice of VT represents a trade-off between performance and static power dissipation. Process technologies with sharper turn-off characteristics (like SOI, with slope factors closer to the ideal 60 mV/decade) will become more attractive. With sizable static power dissipation, it is essential that non-active modules are powered down (put in standby) by disconnecting the unit from the supply rails or by lowering the supply voltage. There are other leakage factors that we are ignoring here, including:
  Drain-Induced Barrier Lowering (I3)
  Gate-Induced Drain Leakage (I4)
  Punchthrough (I5)
  Narrow Width Effect (I6)
  Gate Oxide Tunneling (I7)
  Hot Carrier (I8)
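The relationship between subthreshold slope and leakage can be checked numerically. This is a sketch of the standard rule that leakage changes by one decade per S millivolts of threshold shift, with S = n * 60 mV at room temperature; the function names are chosen here for illustration.

```python
def subthreshold_slope_mv(n, ideal_mv_per_decade=60.0):
    """Subthreshold slope S in mV/decade for slope factor n (n = 1 is the ideal device)."""
    return n * ideal_mv_per_decade


def leakage_reduction_factor(delta_vt_mv, slope_mv_per_decade):
    """Factor by which subthreshold leakage drops when VT is raised by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)


slope = subthreshold_slope_mv(1.5)            # a typical device, n = 1.5
print(slope)                                  # mV/decade
print(leakage_reduction_factor(90.0, slope))  # one decade
print(leakage_reduction_factor(270.0, slope)) # three decades
```

With an ideal 60 mV/decade device the same 270 mV shift would buy 4.5 decades, which is why sharper turn-off characteristics are attractive.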

TSMC Processes Leakage and VT

                        CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
Vdd                     1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
Tox (effective)         42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
Lgate                   0.16 µm   0.16 µm   0.18 µm    0.13 µm   0.11 µm   0.08 µm
IDSat (n/p) (µA/µm)     600/260   500/180   320/130    780/360   860/370   920/400
Ioff (leakage) (pA/µm)  20        1.60      0.15       300       1,800     13,000
VTn                     0.42 V    0.63 V    0.73 V     0.40 V    0.29 V    0.25 V
FET Perf. (GHz)         30        22        14         43        52        80

Performance of various TSMC processes (G generic, LP low power, ULP ultra low power, HS high speed).
From MPR, June 2000, pp. 19 –

Exponential Increase in Leakage Currents
[Figure: Ileakage (nA/µm, log scale) versus Temp (C). From De, 1999.]
Notes: Note that the y axis is logarithmic; leakage doubles for every 10 degree increase in temperature. Will leakage power still be 6 orders of magnitude smaller than dynamic power (as it is today) for future technologies, even at room temperature? Example: the following configurations have the same performance: a 3 V VDD with a 0.7 V VT, and a 0.45 V VDD with a 0.1 V VT. The dynamic power consumption of the latter is, however, 45 times smaller.

Review: Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
The three terms of P are, in order:
  Dynamic power (~90% today and decreasing relatively)
  Short-circuit power (~8% today and decreasing absolutely)
  Leakage power (~2% today and increasing)
Note: f0->1 represents the energy consuming transition.

Power and Energy Design Space

          Constant Throughput/Latency                                            Variable Throughput/Latency
Energy    Design Time                                    Non-active Modules      Run Time
Active    Logic Design, Reduced Vdd, Sizing, Multi-Vdd   Clock Gating            DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage   + Multi-VT                                     Sleep Transistors,      + Variable VT
                                                         Variable VT

Notes: Columns are the enable time (when the techniques are implemented). Rows are the targeted dissipation source.

Dynamic Power as a Function of Device Size
[Figure: normalized energy (0.5 to 1.5) versus sizing factor f (1 to 7) for F = 1, 2, 5, 10 and 20. From Nikolic, UCB.]
Device sizing affects dynamic energy consumption
  gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1)
The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F's
  e.g., for F=20, fopt(energy) = 3.53 while fopt(performance) = 4.47
If energy is a concern, avoid oversizing beyond the optimal
Notes: Device sizing COMBINED with supply voltage reduction is a very effective way to reduce the energy consumption of a logic network. This is especially true for networks with large effective fanout (F), where energy reductions of a factor of 10 can be obtained (except when F=1, when minimum size should be used). Oversizing comes at a hefty price in energy. The optimal sizing factor for energy is smaller than the one for performance.

Dynamic Power Consumption is Data Dependent
Switching activity, P0->1, has two components
  A static component: function of the logic topology
  A dynamic component: function of the timing behavior (glitching)
Static transition probability: P0->1 = Pout=0 x Pout=1 = P0 x (1-P0)
2-input NOR gate truth table:
  A  B  Out
  0  0  1
  0  1  0
  1  0  0
  1  1  0
With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2:
  NOR static transition probability = 3/4 x 1/4 = 3/16
Notes: Assumes inputs of 0 and 1 are equally likely. The takeaway is that output probabilities are NOT uniform.
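The 3/16 result can be reproduced directly from the truth table. The sketch below enumerates every input combination, weights it by its probability (assuming independent inputs), sums the weights of the rows where the output is 1, and then forms P0 x P1. The function and variable names are illustrative, not from the slides.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, input_one_probs):
    """Return P0->1 = P(out=0) * P(out=1) for a gate with independent inputs.

    gate: function mapping input bits to an output bit.
    input_one_probs: probability that each input is 1.
    """
    p_out_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_one_probs)):
        weight = Fraction(1)
        for bit, p_one in zip(bits, input_one_probs):
            weight *= p_one if bit else 1 - p_one
        if gate(*bits):
            p_out_one += weight
    return (1 - p_out_one) * p_out_one


def nor2(a, b):
    return int(not (a or b))


half = Fraction(1, 2)
print(static_transition_probability(nor2, [half, half]))  # 3/16
```

Using exact fractions keeps the result in the same form as the slide; any other two-input gate can be passed in place of nor2.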

NOR Gate Transition Probabilities
Switching activity is a strong function of the input signal statistics
  PA and PB are the probabilities that inputs A and B are one
[Figure: 2-input NOR gate with inputs A and B and load CL; surface plot of P0->1 versus PA and PB, each ranging from 0 to 1.]
P0->1 = P0 x P1 = (1-(1-PA)(1-PB)) (1-PA)(1-PB)
Notes: Understanding the signal statistics and their impact on switching events can be used to significantly impact the power dissipation. Observe how the graph degrades into the simple inverter case when one of the input probabilities is set to 0.

Transition Probabilities for Some Basic Gates (for class handout)

        P0->1 = Pout=0 x Pout=1
NOR     (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB)
OR      (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB))
NAND    PAPB x (1 - PAPB)
AND     (1 - PAPB) x PAPB
XOR     (1 - (PA + PB - 2PAPB)) x (PA + PB - 2PAPB)

[Figure: input A (0.5) drives node X; X and input B (0.5) drive output Z.]
For X: P0->1 =
For Z: P0->1 =

Transition Probabilities for Some Basic Gates (for lecture)

        P0->1 = Pout=0 x Pout=1
NOR     (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB)
OR      (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB))
NAND    PAPB x (1 - PAPB)
AND     (1 - PAPB) x PAPB
XOR     (1 - (PA + PB - 2PAPB)) x (PA + PB - 2PAPB)

[Figure: input A (0.5) drives node X; X and input B (0.5) drive output Z.]
For X: P0->1 = P0 x P1 = (1-PA) PA = 0.5 x 0.5 = 0.25
For Z: P0->1 = P0 x P1 = (1-PXPB) PXPB = (1 - (0.5 x 0.5)) x (0.5 x 0.5) = 3/16
Notes: Ignoring signal statistics can result in substantial errors in energy/power estimation. One needs to look at the truth tables to understand the equations. Computation of the probabilities is straightforward: signal and transition probabilities are evaluated in an ordered fashion, progressing from input to output node. The approach has two major limitations: (1) it does not deal with circuits with feedback; (2) it assumes that the signal probabilities at the input of each gate are independent.
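The ordered input-to-output evaluation described in the notes can be sketched with the closed-form expressions from the table. In this sketch X is treated as an inverted copy of A and Z as the AND of X and B; the figure does not survive in the transcript, so that gate assignment is an assumption chosen to match the printed formulas.

```python
# Probability that each gate's output is 1, given independent input probabilities.
GATE_P_ONE = {
    "AND": lambda pa, pb: pa * pb,
    "NAND": lambda pa, pb: 1 - pa * pb,
    "OR": lambda pa, pb: 1 - (1 - pa) * (1 - pb),
    "NOR": lambda pa, pb: (1 - pa) * (1 - pb),
    "XOR": lambda pa, pb: pa + pb - 2 * pa * pb,
}


def transition_probability(p_out_one):
    """P0->1 = P(out=0) * P(out=1)."""
    return (1 - p_out_one) * p_out_one


p_a = p_b = 0.5

# Node X: assumed inverter on A, so P(X=1) = 1 - PA.
p_x = 1 - p_a
print(transition_probability(p_x))        # 0.25

# Node Z: AND of X and B, evaluated after X because it depends on P(X=1).
p_z = GATE_P_ONE["AND"](p_x, p_b)
print(transition_probability(p_z))        # 0.1875, i.e. 3/16
```

The signal probability P(out=1), not the transition probability, is what propagates to the next gate.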

Inter-signal Correlations (for class handout)
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
  reconvergent fan-out
[Figure: inputs A (0.5) and B (0.5) drive node X; X and B drive output Z, so B reaches Z along two paths (reconvergent fan-out).]
Notes: Assume PA = PB = 0.5. P(Z=1) = P(B=1) x P(X=1 | B=1). Have to use conditional probabilities.

Inter-signal Correlations (for lecture)
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
  reconvergent fan-out
[Figure: inputs A (0.5) and B (0.5) drive node X; X and B drive output Z, so B reaches Z along two paths (reconvergent fan-out).]
At X: P0->1 = (1-0.5)(1-0.5) x (1-(1-0.5)(1-0.5)) = 3/16, with P(X=1) = 3/4
At Z, treating X and B as independent: P0->1 = (1 - 3/4 x 0.5) x (3/4 x 0.5) = 15/64 ≈ 0.23
Notes: Even if the primary inputs are uncorrelated, the signals become correlated ("colored") as they propagate through the logic network. Assume PA = PB = 0.5. But notice that Z = (A or B) and B = AB or B = B, so the 0->1 probability should be (and is) 1/2 x 1/2 = 1/4! P(Z=1) = P(B=1) x P(X=1 | B=1). Have to use conditional probabilities.
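The effect of reconvergent fan-out can be made concrete by comparing an exact enumeration over the primary inputs with an estimate that treats X and B as independent. This sketch assumes X = A or B and Z = X and B, as stated in the notes; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def exact_p_z_one(p_a, p_b):
    """P(Z=1) by enumerating the primary inputs, so correlation is handled exactly."""
    total = Fraction(0)
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        x = a | b          # X = A or B
        z = x & b          # Z = X and B (B reconverges here)
        if z:
            total += weight
    return total


def independent_p_z_one(p_a, p_b):
    """P(Z=1) when X and B are (wrongly) treated as independent signals."""
    p_x = 1 - (1 - p_a) * (1 - p_b)
    return p_x * p_b


def transition_probability(p_one):
    return (1 - p_one) * p_one


half = Fraction(1, 2)
print(transition_probability(exact_p_z_one(half, half)))        # 1/4
print(transition_probability(independent_p_z_one(half, half)))  # 15/64
```

The exact value matches the observation that Z reduces to B; the independence estimate is off because B influences both inputs of the final gate.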

Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P0->1 = P0 x P1 = (1 - PAPB) x PAPB
[Figure: F = A·B·C·D with all input probabilities 0.5, implemented as a chain and as a tree.]
Chain: W = A·B (3/16), X = W·C (7/64), F = X·D (15/256)
Tree: Y = A·B ((1-0.25)*0.25 = 3/16), Z = C·D (3/16), F = Y·Z (15/256)
Chain implementation has a lower overall switching activity than the tree implementation for random inputs
Ignores glitching effects
Notes: Look at designing for speed (8-input AND gate). Which implementation is lower energy? Which is lower delay? So which is better overall? Also look at slide speed.19, Design Technique 3, when deciding which configuration consumes less power and has the best performance.
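The per-node activities in the two topologies can be summed to compare them. The sketch below does this for a 4-input AND with independent inputs of probability 0.5, ignoring glitching as the slide does; the node names follow the figure labels and the helper name is an assumption.

```python
from fractions import Fraction


def and_node(p_a, p_b):
    """Return (P(out=1), P0->1) for a 2-input AND with independent inputs."""
    p_one = p_a * p_b
    return p_one, (1 - p_one) * p_one


half = Fraction(1, 2)

# Chain: W = A.B, X = W.C, F = X.D
p_w, act_w = and_node(half, half)
p_x, act_x = and_node(p_w, half)
_, act_f_chain = and_node(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = A.B, Z = C.D, F = Y.Z
p_y, act_y = and_node(half, half)
p_z, act_z = and_node(half, half)
_, act_f_tree = and_node(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(act_w, act_x, act_f_chain, chain_total)  # 3/16 7/64 15/256 91/256
print(act_y, act_z, act_f_tree, tree_total)    # 3/16 3/16 15/256 111/256
```

The output node F has the same activity in both cases; the difference comes entirely from the internal nodes.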

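Input Ordering (for class handout)
[Figure: two orderings of a 3-input AND with signal probabilities A = 0.5, B = 0.2, C = 0.1. In one, X = A·B and F = X·C; in the other, X = B·C and F = X·A.]
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)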

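Input Ordering (for lecture)
[Figure: two orderings of a 3-input AND with signal probabilities A = 0.5, B = 0.2, C = 0.1. In one, X = A·B and F = X·C; in the other, X = B·C and F = X·A.]
With X = A·B: P0->1(X) = (1-0.5x0.2) x (0.5x0.2) = 0.09
With X = B·C: P0->1(X) = (1-0.2x0.1) x (0.2x0.1) = 0.0196
Activity at the output node, F, is equal in both cases
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)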
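The two orderings can be compared numerically. This is a small sketch assuming independent inputs and AND gates, with the signal probabilities from the figure; only the internal node X differs between the orderings.

```python
from fractions import Fraction


def and_node(p_a, p_b):
    """Return (P(out=1), P0->1) for a 2-input AND with independent inputs."""
    p_one = p_a * p_b
    return p_one, (1 - p_one) * p_one


P_A, P_B, P_C = Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)

# Ordering 1: the high-activity signal A enters first.
p_x1, act_x1 = and_node(P_A, P_B)
_, act_f1 = and_node(p_x1, P_C)

# Ordering 2: A is postponed to the last gate.
p_x2, act_x2 = and_node(P_B, P_C)
_, act_f2 = and_node(p_x2, P_A)

print(float(act_x1), float(act_x2))  # 0.09 0.0196
print(float(act_f1), float(act_f2))  # 0.0099 0.0099
```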

Glitching in Static CMOS Networks (for class handout)
Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
  glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: inputs A and B drive node X; X and C drive output Z. Input ABC changes from 101 to 000; waveforms for X and Z under a unit delay model.]

Glitching in Static CMOS Networks (for lecture)
Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
  glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: inputs A and B drive node X; X and C drive output Z. Input ABC changes from 101 to 000; waveforms for X and Z under a unit delay model.]
Notes: Assumes a unit delay model (gates have nonzero delay). The shaded area is a glitch (also known as a critical race or dynamic hazard).

Glitching in an RCA
[Figure: ripple-carry adder with carry input Cin and sum outputs S0 to S15; waveforms for Cin, S0, S1, S2, S5 and S10.]
Note: Due to ripple of carry from Cin to Add15.

Balanced Delay Paths to Reduce Glitching
Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
[Figure: two networks of gates F1, F2 and F3 with signal arrival times annotated.]
So equalize the lengths of timing paths through logic
Notes: If you can arrange it so that all the inputs change simultaneously, there is no glitching. Making the path lengths to the inputs of a gate approximately the same is usually sufficient to eliminate glitches (delay balancing).

Power and Energy Design Space

          Constant Throughput/Latency                                            Variable Throughput/Latency
Energy    Design Time                                    Non-active Modules      Run Time
Active    Logic Design, Reduced Vdd, Sizing, Multi-Vdd   Clock Gating            DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage   + Multi-VT                                     Sleep Transistors,      + Variable VT
                                                         Variable VT

Dynamic Power as a Function of VDD
Decreasing the VDD decreases dynamic energy consumption (quadratically)
But, increases gate delay (decreases performance)
[Figure: tp (normalized) versus VDD (V).]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
Notes: The figure shows the propagation delay of a CMOS inverter as a function of supply voltage (normalized with respect to the delay at a 2.5 V supply). While the delay is relatively insensitive to supply variations for higher values of VDD, a sharp increase can be observed starting around 2VT. This operating region should be avoided for high performance! Increasing VDD also raises reliability concerns (oxide breakdown, hot-electron effects) that enforce firm upper bounds on the supply voltage in deep submicron processes. Lowering VDD slows down the gate!

Multiple VDD Considerations
How many VDD? Two is becoming common
  Many chips already have two supplies (one for core and one for I/O)
When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
  If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
  The cross-coupled PMOS transistors do the level conversion
  The NMOS transistors operate on a reduced supply
Level converters are not needed for a step-down change in voltage
Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47)
[Figure: level converter with supplies VDDH and VDDL, input Vin and output Vout.]
Notes: The delay of the level converter is quite sensitive to transistor sizing and supply voltage fluctuations. For a low VDDL, the delay can become very long. (Since the NMOS transistors operate with a reduced drive (VDDL-VT), they have to be made larger to be able to overpower the feedback.)

Dual-Supply Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Clustered voltage-scaling
  Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
  Level conversion is done in the flipflops at the end of the paths
Notes: A number of studies have shown that for typical delay path distributions, adding more supplies (than two) yields only marginal additional savings. When using clustered voltage scaling, the dual-supply approach is more effective when large capacitances are concentrated towards the end of the logic block (such as in buffer chains).

Power and Energy Design Space

          Constant Throughput/Latency                                            Variable Throughput/Latency
Energy    Design Time                                    Non-active Modules      Run Time
Active    Logic Design, Reduced Vdd, Sizing, Multi-Vdd   Clock Gating            DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage   + Multi-VT                                     Sleep Transistors,      + Variable VT
                                                         Variable VT

Stack Effect
Leakage is a function of the circuit topology and the value of the inputs
$V_T = V_{T0} + \gamma\left(\sqrt{|-2\Phi_F + V_{SB}|} - \sqrt{|-2\Phi_F|}\right)$
where VT0 is the threshold voltage at VSB = 0; VSB is the source-bulk (substrate) voltage; γ is the body-effect coefficient
[Figure and table: two stacked transistors with inputs A and B, output Out and intermediate node voltage VX; the table lists VX and ISUB per input combination. Recovered entries: VT ln(1+n), VGS=VBS=-VX, VGS=VBS=0, VDD-VT, VSG=VSB=0.]
Leakage is least when A = B = 0
Leakage reduction due to stacked transistors is called the stack effect
Notes: Maximum leakage reduction occurs when all the transistors in the stack are off and the intermediate node voltages reach their steady state value. n is an empirical parameter, with n >= 1 and typically around 1.5. For an ideal transistor with the sharpest possible roll-off, n = 1 (where S = 60 mV/decade, which means that the subthreshold current drops by a factor of 10 for a reduction in VGS of 60 mV). Unfortunately, n is more like 1.5 for actual devices (so S = 90 mV/decade). The current roll-off is further decreased by a rise in the operating temperature.

Short Channel Factors and Stack Effect
In short-channel devices, the subthreshold leakage current depends on VGS, VBS and VDS. The VT of a short-channel device decreases with increasing VDS due to DIBL (drain-induced barrier lowering).
Typical values for DIBL are a 20 to 150 mV change in VT per volt change in VDS, so the stack effect is even more significant for short-channel devices.
VX reduces the drain-source voltage of the top nfet, increasing its VT and lowering its leakage
For our 0.25 micron technology, VX settles to ~100 mV in steady state, so VBS = -100 mV and VDS = VDD - 100 mV, which gives a leakage 20 times smaller than that of a device with VBS = 0 mV and VDS = VDD

Leakage as a Function of Design Time VT
Reducing the VT increases the sub-threshold leakage current (exponentially)
  90 mV reduction in VT increases leakage by an order of magnitude
But, reducing VT decreases gate delay (increases performance)
Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
  A careful assignment of VTs can reduce the leakage by as much as 80%
Notes: Most sub-0.25 micron CMOS technologies offer two types of n- and p-type transistors with thresholds differing by about 100 mV. The higher threshold device has a leakage current about one order of magnitude lower than the lower threshold device, at the expense of a ~30% reduction in active current (i.e., lower performance). Note that the use of multiple thresholds does not require level converters and can be done on a per-cell or per-transistor basis; clustering of the logic is not required (as it is with multiple VDD). It does incur some small area penalty. It also gives a small reduction in active power, due to the reduced gate-to-channel capacitance in the off state and a small reduction in signal swing on the internal nodes of a gate (VDD - VTH), partially offset by increased source and drain junction sidewall capacitance; it is only about 4%.

Dual-Thresholds Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Use lower threshold on timing-critical paths
  Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
  No level converters are needed

Variable VT (ABB) at Run Time
$V_T = V_{T0} + \gamma\left(\sqrt{|-2\Phi_F + V_{SB}|} - \sqrt{|-2\Phi_F|}\right)$
For an n-channel device, the substrate is normally tied to ground (VSB = 0)
A negative bias on the substrate (so that VSB > 0) causes VT to increase
Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
  Requires a dual well fab process
[Figure: VT (V) versus VSB (V).]
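The body-effect equation can be evaluated to see how much a substrate bias moves the threshold. This is a sketch only: VT0 = 0.43 V is the generic n-channel threshold quoted earlier in the deck, while the body-effect coefficient (0.4 V^0.5) and 2ΦF = -0.6 V are assumed illustrative values, not numbers taken from these slides.

```python
import math


def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, two_phi_f=-0.6):
    """VT = VT0 + gamma * (sqrt(|-2*PhiF + VSB|) - sqrt(|-2*PhiF|)).

    v_sb: source-bulk voltage in volts (0 when the substrate is tied to the source).
    gamma and two_phi_f are assumed example parameters.
    """
    return vt0 + gamma * (
        math.sqrt(abs(-two_phi_f + v_sb)) - math.sqrt(abs(-two_phi_f))
    )


for v_sb in (0.0, 0.5, 1.0, 2.5):
    print(f"VSB = {v_sb:.1f} V -> VT = {threshold_voltage(v_sb):.3f} V")
```

With these assumed parameters the threshold rises monotonically with VSB, which is the knob that adaptive body-biasing turns at run time to trade leakage against speed.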