CSE477 L12&13 Low Power. Irwin & Vijay, PSU, 2002
TKT-1527 Digital System Design Issues
Designing for Low Power
Mary Jane Irwin
[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]

Review: Designing Fast CMOS Gates
- Transistor sizing
- Progressive transistor sizing
  - the FET closest to the output is the smallest of the series FETs
- Transistor ordering
  - put the latest-arriving signal closest to the output
- Logic structure reordering
  - replace large fan-in gates with a network of smaller fan-in gates
- Logical effort
- Buffer (inverter) insertion
  - separate large fan-in from large $C_L$ with buffers
  - use buffers so that there are no more than four transmission gates (TGs) in series

Why Power Matters
- Packaging costs
- Power supply rail design
- Chip and system cooling costs
- Noise immunity and system reliability
- Battery life (in portable systems)
- Environmental concerns
  - Office equipment accounted for 5% of total US commercial energy usage in 1993
  - Energy Star compliant systems

Why worry about power? Power Dissipation
[Figure: power (Watts) versus year for lead microprocessors; labeled points include the P6 and Pentium®. Source: Borkar, De, Intel]
- Lead microprocessor power continues to increase
- Power delivery and dissipation will be prohibitive

Why worry about power? Chip Power Density
[Figure: power density (W/cm²) versus year for Pentium®-class processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface. Source: Borkar, De, Intel]
- …chips might become hot…

Chip Power Density Distribution
[Figure: two panels, Power Map and On-Die Temperature]
- Power density is not uniformly distributed across the chip
- Silicon is not a good heat conductor
- Max junction temperature is determined by hot-spots
  - Impact on packaging, w.r.t. cooling

Power and Energy Figures of Merit
- Power consumption in Watts
  - determines battery life in hours
- Peak power
  - determines power and ground wiring designs
  - sets packaging limits
  - impacts signal noise margin and reliability analysis
- Energy efficiency in Joules
  - energy is power integrated over time (power is the rate at which energy is consumed)
- Energy = power * delay
  - Joules = Watts * seconds
  - a lower energy number means less power is needed to perform a computation at the same frequency

Power versus Energy
[Figure: two plots of Watts versus time comparing Approach 1 and Approach 2]
- Power is the height of the curve
  - A lower power design could simply be slower
- Energy is the area under the curve
  - The two approaches require the same energy

PDP and EDP
- Power-delay product: $PDP = P_{av} \cdot t_p = (C_L V_{DD}^2)/2$
  - PDP is the average energy consumed per switching event (Watts * sec = Joule)
  - a lower power design could simply be a slower design
- Energy-delay product: $EDP = PDP \cdot t_p = P_{av} \cdot t_p^2$
  - EDP is the average energy consumed multiplied by the computation time required
  - takes into account that one can trade increased delay for lower energy/operation (e.g., via supply voltage scaling that increases delay, but decreases energy consumption)
  - allows one to understand tradeoffs better
[Figure: energy, delay, and energy-delay curves]

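The two figures of merit above can be compared numerically. The following is a minimal sketch, not part of the original slides: the 30 fF load and the two (supply voltage, delay) design points are hypothetical values chosen only to illustrate how a slower, lower-voltage design can come out ahead on both metrics.

```python
def pdp(c_load, vdd):
    """Power-delay product: average energy per switching event, in joules."""
    return c_load * vdd ** 2 / 2


def edp(c_load, vdd, t_p):
    """Energy-delay product in joule-seconds (PDP multiplied by the delay t_p)."""
    return pdp(c_load, vdd) * t_p


# Hypothetical design points: name -> (V_DD in volts, t_p in seconds).
designs = {"fast": (2.5, 100e-12), "slow": (1.5, 220e-12)}
C_LOAD = 30e-15  # assumed load capacitance in farads

for name, (vdd, t_p) in designs.items():
    print(name, "PDP =", pdp(C_LOAD, vdd), "J", "EDP =", edp(C_LOAD, vdd, t_p), "J*s")
```

Because PDP depends only on $C_L$ and $V_{DD}$, it cannot distinguish two designs at the same supply that differ in speed; EDP can.
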
Understanding Tradeoffs
[Figure: Energy versus 1/Delay with four design points a, b, c, d; arrows mark the "better" direction and the direction of lower EDP]
- Which design is the “best” (fastest, coolest, both)?

CMOS Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
- First term: dynamic power
- Second term: short-circuit power
- Third term: leakage power

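As an illustration of the power equation above, the sketch below evaluates its three terms separately. The function name, argument names, and the numbers in the usage lines are assumptions made for this example, not values from the slides.

```python
def cmos_power(c_load, vdd, p01, f_clock, t_sc, i_peak, i_leak):
    """Return the three components of CMOS power (watts) and their total.

    c_load: load capacitance (F), vdd: supply (V), p01: 0->1 transition
    probability, f_clock: clock frequency (Hz), t_sc: short-circuit
    duration (s), i_peak: peak short-circuit current (A), i_leak: leakage
    current (A).
    """
    f01 = p01 * f_clock                         # f_0->1 = P_0->1 * f_clock
    dynamic = c_load * vdd ** 2 * f01           # C_L * V_DD^2 * f_0->1
    short_circuit = t_sc * vdd * i_peak * f01   # t_sc * V_DD * I_peak * f_0->1
    leakage = vdd * i_leak                      # V_DD * I_leakage
    return {
        "dynamic": dynamic,
        "short_circuit": short_circuit,
        "leakage": leakage,
        "total": dynamic + short_circuit + leakage,
    }


# Hypothetical operating point, for illustration only.
result = cmos_power(c_load=30e-15, vdd=2.5, p01=0.25, f_clock=100e6,
                    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9)
for name, watts in result.items():
    print(f"{name}: {watts:.3e} W")
```
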
Dynamic Power Consumption
[Figure: CMOS inverter with input $V_{in}$, output $V_{out}$, supply $V_{dd}$, and load $C_L$ switching at $f_{0\to1}$]
- Energy/transition $= C_L \cdot V_{DD}^2 \cdot P_{0\to1}$
- $P_{dyn}$ = Energy/transition $\cdot f = C_L \cdot V_{DD}^2 \cdot P_{0\to1} \cdot f$
- $P_{dyn} = C_{EFF} \cdot V_{DD}^2 \cdot f$, where $C_{EFF} = P_{0\to1} C_L$
- Not a function of transistor sizes!
- Data dependent: a function of switching activity!

Lowering Dynamic Power
$P_{dyn} = C_L V_{DD}^2 P_{0\to1} f$
- Capacitance ($C_L$): function of fan-out, wire length, transistor sizes
- Supply voltage ($V_{DD}$): has been dropping with successive generations
- Activity factor ($P_{0\to1}$): how often, on average, do wires switch?
- Clock frequency ($f$): increasing…

Short Circuit Power Consumption
[Figure: CMOS inverter with input $V_{in}$, output $V_{out}$, load $C_L$, and short-circuit current $I_{sc}$]
- The finite slope of the input signal causes a direct current path between $V_{DD}$ and GND for a short period of time during switching, when both the NMOS and PMOS transistors are conducting.

Short Circuit Current Determinants
$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}$
$P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$
- $t_{sc}$ is determined by the duration and slope of the input signal
- $I_{peak}$ is determined by
  - the saturation current of the P and N transistors, which depends on their sizes, process technology, temperature, etc.
  - the ratio between input and output slopes (a strong dependence), which is itself a function of $C_L$

Impact of $C_L$ on $P_{sc}$
[Figure: two CMOS inverters, one with $I_{sc} \approx 0$ and one with $I_{sc} \approx I_{max}$]
- Large capacitive load ($I_{sc} \approx 0$): output fall time is significantly larger than the input rise time.
- Small capacitive load ($I_{sc} \approx I_{max}$): output fall time is substantially smaller than the input rise time.

$I_{peak}$ as a Function of $C_L$
[Figure: $I_{peak}$ (A) versus time (sec) for $C_L$ = 20 fF, 100 fF, and 500 fF, with a 500 psec input slope]
- When load capacitance is small, $I_{peak}$ is large.
- Short-circuit dissipation is minimized by matching the rise/fall times of the input and output signals (slope engineering).

$P_{sc}$ as a Function of Rise/Fall Times
[Figure: normalized power versus $t_{sin}/t_{sout}$ for $V_{DD}$ = 3.3 V, 2.5 V, and 1.5 V, normalized with respect to zero input rise-time dissipation; $(W/L)_p$ = … µm/0.25 µm, $(W/L)_n$ = … µm/0.25 µm, $C_L$ = 30 fF]
- When load capacitance is small ($t_{sin}/t_{sout} > 2$ for $V_{DD} > 2$ V), the power is dominated by $P_{sc}$
- If $V_{DD} < V_{Tn} + |V_{Tp}|$, then $P_{sc}$ is eliminated, since both devices are never on at the same time.

Leakage (Static) Power Consumption
[Figure: CMOS inverter at $V_{DD}$ with output $V_{out}$ and leakage current $I_{leakage}$, showing three leakage paths: drain junction leakage, sub-threshold current, and gate leakage]
- Sub-threshold current is the dominant factor.
- All increase exponentially with temperature!

Leakage as a Function of $V_T$
- Continued scaling of supply voltage and the subsequent scaling of threshold voltage will make subthreshold conduction a dominant component of power dissipation.
- With a subthreshold slope of roughly 85 to 90 mV/decade, each 255 to 270 mV increase in $V_T$ gives 3 orders of magnitude reduction in leakage (but adversely affects performance)

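The relationship between a $V_T$ increase and the resulting leakage reduction can be sketched as below, assuming only that subthreshold leakage drops by one decade for every `slope` volts of $V_T$ increase. The function name and default value are illustrative choices, not part of the slides.

```python
def leakage_reduction(delta_vt, slope=0.090):
    """Factor by which subthreshold leakage falls for a V_T increase of delta_vt volts.

    slope is the assumed subthreshold slope in volts per decade of current.
    """
    return 10 ** (delta_vt / slope)


# Three decades of reduction need three times the slope:
print(leakage_reduction(0.270, slope=0.090))
print(leakage_reduction(0.255, slope=0.085))
# The same relation run backwards: a 90 mV reduction in V_T costs one decade.
print(1 / leakage_reduction(-0.090, slope=0.090))
```
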
TSMC Processes Leakage and $V_T$ (from MPR, 2000)

                          CL018 G   CL018 LP  CL018 ULP  CL018 HS  CL015 HS  CL013 HS
V_dd                      1.8 V     1.8 V     1.8 V      2 V       1.5 V     1.2 V
T_ox (effective)          42 Å      42 Å      42 Å       42 Å      29 Å      24 Å
L_gate                    0.16 µm   0.16 µm   0.18 µm    0.13 µm   …         …
I_DSat (n/p) (µA/µm)      600/260   500/180   320/130    780/360   …         …
I_off (leakage) (…A/µm)   …         …         …          …         …         …
V_Tn                      0.42 V    0.63 V    0.73 V     0.40 V    …         …
FET Perf. (GHz)           …         …         …          …         …         …

Exponential Increase in Leakage Currents
[Figure: $I_{leakage}$ (nA/µm) versus temperature (°C). From De, 1999]

Review: Energy & Power Equations
$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$
$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$
$f_{0\to1} = P_{0\to1} \cdot f_{clock}$
- First term: dynamic power (~90% today and decreasing relatively)
- Second term: short-circuit power (~8% today and decreasing absolutely)
- Third term: leakage power (~2% today and increasing)

Power and Energy Design Space

           Constant Throughput/Latency                          Variable Throughput/Latency
Energy     Design Time           Non-active Modules             Run Time
Active     Logic Design          Clock Gating                   DFS, DVS (Dynamic Freq,
           Reduced V_dd                                         Voltage Scaling)
           Sizing
           Multi-V_dd
Leakage    + Multi-V_T           Sleep Transistors              + Variable V_T
                                 Multi-V_dd
                                 Variable V_T

Dynamic Power as a Function of Device Size
[Figure: normalized energy curves for F = 1, 2, 5, 10, and 20. From Nikolic, UCB]
- Device sizing affects dynamic energy consumption
  - gain is largest for networks with large overall effective fan-outs ($F = C_L/C_{g,1}$)
- The optimal gate sizing factor ($f$) for dynamic energy is smaller than the one for performance, especially for large $F$
  - e.g., for $F = 20$, $f_{opt}$(energy) = 3.53 while $f_{opt}$(performance) = 4.47
- If energy is a concern, avoid oversizing beyond the optimal $f$

Dynamic Power Consumption is Data Dependent
- Switching activity, $P_{0\to1}$, has two components
  - A static component: a function of the logic topology
  - A dynamic component: a function of the timing behavior (glitching)
- Static transition probability: $P_{0\to1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$
- Example: 2-input NOR gate with input signal probabilities $P_{A=1} = 1/2$ and $P_{B=1} = 1/2$

  A  B  Out
  0  0  1
  0  1  0
  1  0  0
  1  1  0

  - NOR static transition probability = 3/4 x 1/4 = 3/16

NOR Gate Transition Probabilities
[Figure: 2-input CMOS NOR gate with inputs A and B driving load $C_L$, and a plot of $P_{0\to1}$ over $P_A$ and $P_B$]
$P_{0\to1} = P_0 \times P_1 = (1 - (1 - P_A)(1 - P_B)) \, (1 - P_A)(1 - P_B)$
- Switching activity is a strong function of the input signal statistics
  - $P_A$ and $P_B$ are the probabilities that inputs A and B are one

Transition Probabilities for Some Basic Gates
$P_{0\to1} = P_{out=0} \times P_{out=1}$

  Gate   P_{0→1}
  NOR    $(1 - (1 - P_A)(1 - P_B)) \times (1 - P_A)(1 - P_B)$
  OR     $(1 - P_A)(1 - P_B) \times (1 - (1 - P_A)(1 - P_B))$
  NAND   $P_A P_B \times (1 - P_A P_B)$
  AND    $(1 - P_A P_B) \times P_A P_B$
  XOR    $(1 - (P_A + P_B - 2 P_A P_B)) \times (P_A + P_B - 2 P_A P_B)$

Example (figure): inputs A and B each have signal probability 0.5; node X depends on A, and output Z depends on X and B.
- For X: $P_{0\to1} = P_0 \times P_1 = (1 - P_A) P_A = 0.5 \times 0.5 = 0.25$
- For Z: $P_{0\to1} = P_0 \times P_1 = (1 - P_X P_B) P_X P_B = (1 - (0.5 \times 0.5)) \times (0.5 \times 0.5) = 3/16$

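The gate formulas above all follow the same pattern: compute the probability that the output is one, then multiply by its complement. A minimal sketch of that pattern is shown here, assuming independent inputs; the dictionary and function names are illustrative and not from the slides.

```python
# Probability that each 2-input gate's output is 1, given independent
# input signal probabilities pa = P(A=1) and pb = P(B=1).
P_ONE = {
    "NOR": lambda pa, pb: (1 - pa) * (1 - pb),
    "OR": lambda pa, pb: 1 - (1 - pa) * (1 - pb),
    "NAND": lambda pa, pb: 1 - pa * pb,
    "AND": lambda pa, pb: pa * pb,
    "XOR": lambda pa, pb: pa + pb - 2 * pa * pb,
}


def p01(gate, pa, pb):
    """Static 0->1 transition probability: P(out=0) * P(out=1)."""
    p1 = P_ONE[gate](pa, pb)
    return (1 - p1) * p1


for gate in P_ONE:
    print(gate, p01(gate, 0.5, 0.5))
```

With both inputs at 0.5, NOR, OR, NAND, and AND all give 3/16, while XOR gives 1/4, which is the largest value the product $P_0 \times P_1$ can take.
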
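Inter-signal Correlations
- Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
  - reconvergent fan-out
- Have to use conditional probabilities
- Example (figure): inputs A and B each have signal probability 0.5; X is driven by A and B, and Z is driven by X and B, so B reconverges at Z
  - For X: $P_{0\to1} = (1-0.5)(1-0.5) \times (1 - (1-0.5)(1-0.5)) = 3/16$
  - For Z, treating X and B as independent: $(1 - 0.75 \times 0.5) \times (0.75 \times 0.5) = 15/64$. Incorrect!
  - Correct: $P(Z=1) = P(B=1) \cdot P(X=1 \mid B=1)$
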
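The effect of reconvergent fan-out can be checked by brute-force enumeration. The sketch below assumes the circuit is X = A OR B followed by Z = X AND B; that topology is inferred from the 0.75 and 0.5 factors in the slide's arithmetic, not stated in the text, so treat it as an assumption.

```python
from itertools import product


def exact_p_z(p_a, p_b):
    """P(Z=1) by enumerating all input combinations (handles correlation)."""
    total = 0.0
    for a, b in product((0, 1), repeat=2):
        weight = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        x = a or b   # assumed first gate: X = A OR B
        z = x and b  # assumed second gate: Z = X AND B
        total += weight * z
    return total


def naive_p_z(p_a, p_b):
    """P(Z=1) if X and B are wrongly treated as independent signals."""
    p_x = 1 - (1 - p_a) * (1 - p_b)
    return p_x * p_b


def activity(p1):
    """Static 0->1 transition probability for an output with P(out=1) = p1."""
    return (1 - p1) * p1


print("naive :", naive_p_z(0.5, 0.5), activity(naive_p_z(0.5, 0.5)))
print("exact :", exact_p_z(0.5, 0.5), activity(exact_p_z(0.5, 0.5)))
```

Under this assumed topology Z is logically equal to B, so the exact signal probability is simply $P(B=1)$, which the independence assumption misses.
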
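Logic Restructuring
- Logic restructuring: changing the topology of a logic network to reduce transitions
- AND: $P_{0\to1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
- Example (figure): a four-input AND of A, B, C, D (each with signal probability 0.5) built as a chain and as a tree, with internal nodes W, X, Y, Z and output F
  - First-level gate: $(1 - 0.25) \times 0.25 = 3/16$
  - Chain: gate activities 3/16, 7/64, and 15/256 (output F)
  - Tree: gate activities 3/16, 3/16, and 15/256 (output F)
- The chain implementation has a lower overall switching activity than the tree implementation for random inputs
- Ignores glitching effects
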
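The chain-versus-tree comparison can be reproduced with exact fractions. This is a sketch under the slide's own assumptions (independent inputs, all with signal probability 0.5, glitching ignored); the function names are illustrative.

```python
from fractions import Fraction


def activity(p1):
    """Static 0->1 transition probability for a node with P(node=1) = p1."""
    return (1 - p1) * p1


def chain_activities(p):
    """Four-input AND built as a chain: ((A.B).C).D, all inputs with P(1) = p."""
    first = p * p
    second = first * p
    out = second * p
    return [activity(n) for n in (first, second, out)]


def tree_activities(p):
    """Four-input AND built as a tree: (A.B).(C.D), all inputs with P(1) = p."""
    left = p * p
    right = p * p
    out = left * right
    return [activity(n) for n in (left, right, out)]


half = Fraction(1, 2)
print("chain:", chain_activities(half), "total", sum(chain_activities(half)))
print("tree :", tree_activities(half), "total", sum(tree_activities(half)))
```

The totals are 91/256 for the chain and 111/256 for the tree, because the chain's intermediate node is already rarely one by the time the third input arrives.
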
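Input Ordering
[Figure: two orderings of a three-input AND. In the first, A and B drive node X, which is combined with C to produce F; in the second, B and C drive node X, which is combined with A to produce F]
- Activity at X with A and B first: $(1 - 0.5 \times 0.2) \times (0.5 \times 0.2) = 0.09$
- Activity at X with B and C first: $(1 - 0.2 \times 0.1) \times (0.2 \times 0.1) = 0.0196$
- Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
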
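The input-ordering comparison above is a two-line calculation. In the sketch below, the signal probabilities 0.5, 0.2, and 0.1 for A, B, and C are read off the slide's arithmetic rather than stated explicitly, so the mapping of values to input names is an assumption.

```python
def and_activity(p_a, p_b):
    """0->1 transition probability at the output of a 2-input AND gate."""
    p1 = p_a * p_b
    return (1 - p1) * p1


# Assumed signal probabilities, inferred from the factors in the slide.
p = {"A": 0.5, "B": 0.2, "C": 0.1}

x_ab_first = and_activity(p["A"], p["B"])  # high-activity input A enters early
x_bc_first = and_activity(p["B"], p["C"])  # A is postponed to the last gate

print("X activity, A and B first:", round(x_ab_first, 4))
print("X activity, B and C first:", round(x_bc_first, 4))
```

The output F has the same activity in both orderings, since it computes the same function; only the internal node X changes.
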
Glitching in Static CMOS Networks
[Figure: two-gate network with inputs A, B, C, internal node X, and output Z, with unit-delay waveforms for A, B, C, X, and Z]
- Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
  - glitch: a node exhibits multiple transitions in a single cycle before settling to the correct logic value

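A glitch of this kind can be reproduced with a small unit-delay simulation. The circuit and stimulus here are assumptions, since the figure did not survive: X = NOR(A, B), Z = NOR(X, C), with the inputs ABC switching from 101 to 000. Every gate is modelled with one unit of delay.

```python
def nor(a, b):
    """2-input NOR returning 0 or 1."""
    return int(not (a or b))


def simulate(inputs_before, inputs_after, steps=4):
    """Unit-delay simulation of the assumed circuit X = NOR(A, B), Z = NOR(X, C).

    Returns a list of (X, Z) pairs: the settled state before the input
    change, followed by one entry per unit of time after it.
    """
    a, b, c = inputs_before
    x = nor(a, b)
    z = nor(x, c)
    trace = [(x, z)]
    a, b, c = inputs_after
    for _ in range(steps):
        # Both gates evaluate using the node values from the previous step,
        # so Z sees the OLD value of X for one time unit.
        x, z = nor(a, b), nor(x, c)
        trace.append((x, z))
    return trace


trace = simulate((1, 0, 1), (0, 0, 0))
print("X:", [x for x, _ in trace])
print("Z:", [z for _, z in trace])
```

In this run Z starts at 0 and ends at 0, yet it pulses to 1 for one time unit because C falls before the new value of X has propagated. That extra pair of transitions dissipates energy without doing useful work.
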
Glitching in an RCA
[Figure: 16-bit ripple-carry adder with carry input Cin and sum outputs S0 through S15, and waveforms for Cin, S0, S1, S2, S3, S4, S5, S10, and S15]

Balanced Delay Paths to Reduce Glitching
[Figure: two implementations of a network with gates F1, F2, and F3]
- Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
- So equalize the lengths of timing paths through logic

Dynamic Power as a Function of $V_{DD}$
[Figure: normalized $t_p$ versus $V_{DD}$ (V)]
- Decreasing $V_{DD}$ decreases dynamic energy consumption (quadratically)
- But it increases gate delay (decreases performance)
- Determine the critical path(s) at design time and use high $V_{DD}$ for the transistors on those paths for speed. Use a lower $V_{DD}$ on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).

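The energy side of this tradeoff follows directly from the $V_{DD}^2$ term; the delay side needs a device model. The sketch below uses the alpha-power-law approximation $t_p \propto V_{DD}/(V_{DD} - V_T)^{\alpha}$ as an assumed delay model, with illustrative values for $V_T$ and $\alpha$; neither the model nor the numbers come from the slides.

```python
def energy_ratio(vdd_low, vdd_high):
    """Dynamic energy at vdd_low relative to vdd_high (quadratic in V_DD)."""
    return (vdd_low / vdd_high) ** 2


def delay_ratio(vdd_low, vdd_high, v_t=0.4, alpha=1.3):
    """Gate delay at vdd_low relative to vdd_high.

    Assumed model (alpha-power law): t_p is proportional to
    V_DD / (V_DD - V_T)**alpha. v_t and alpha are illustrative values.
    """
    def t_p(vdd):
        return vdd / (vdd - v_t) ** alpha

    return t_p(vdd_low) / t_p(vdd_high)


# Hypothetical dual-supply pair: V_DDH = 2.5 V, V_DDL = 1.8 V.
print("energy ratio:", energy_ratio(1.8, 2.5))
print("delay ratio :", delay_ratio(1.8, 2.5))
```

A gate moved to the lower supply in this example would use roughly half the dynamic energy while becoming slower, which is why the lower supply is reserved for paths with timing slack.
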
Multiple $V_{DD}$ Considerations
[Figure: level converter with supplies $V_{DDH}$ and $V_{DDL}$, input $V_{in}$, and output $V_{out}$]
- How many $V_{DD}$? Two is becoming common
  - Many chips already have two supplies (one for core and one for I/O)
- When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
  - If a gate supplied with $V_{DDL}$ drives a gate at $V_{DDH}$, the PMOS never turns off
    - The cross-coupled PMOS transistors do the level conversion
    - The NMOS transistors operate on a reduced supply
  - Level converters are not needed for a step-down change in voltage
  - Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flop (see Figure 11.47)

Dual-Supply Inside a Logic Block
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
- Clustered voltage-scaling
  - Each path starts with $V_{DDH}$ and switches to $V_{DDL}$ (gray logic gates in the figure) when delay slack is available
  - Level conversion is done in the flip-flops at the end of the paths

Leakage as a Function of Design Time $V_T$
- Reducing $V_T$ increases the sub-threshold leakage current (exponentially)
  - a 90 mV reduction in $V_T$ increases leakage by an order of magnitude
- But reducing $V_T$ decreases gate delay (increases performance)
- Determine the critical path(s) at design time and use low $V_T$ devices on the transistors on those paths for speed. Use a high $V_T$ on the other logic for leakage control.
  - A careful assignment of $V_T$’s can reduce the leakage by as much as 80%

Dual-Thresholds Inside a Logic Block
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
- Use lower threshold on timing-critical paths
  - Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
  - No level converters are needed

Variable $V_T$ (ABB) at Run Time
$V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$
[Figure: $V_T$ (V) versus $V_{SB}$ (V)]
- For an n-channel device, the substrate is normally tied to ground ($V_{SB} = 0$)
- A negative bias on the substrate (reverse body bias) causes $V_T$ to increase
- Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
  - Requires a dual well fab process

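The body-effect relation above can be evaluated directly. In this sketch the default parameter values ($V_{T0}$ = 0.43 V, $\gamma$ = 0.4 V^0.5, $\phi_F$ = -0.3 V) are illustrative assumptions for an n-channel device, not values given on the slide, and $V_{SB}$ is taken as the source-to-body voltage, so a substrate pulled below the source gives a positive $V_{SB}$.

```python
from math import sqrt


def threshold_voltage(v_sb, v_t0=0.43, gamma=0.4, phi_f=-0.3):
    """Threshold voltage (V) under body bias.

    v_sb: source-to-body voltage (V); v_t0: threshold at v_sb = 0 (V);
    gamma: body-effect coefficient (V**0.5); phi_f: Fermi potential (V).
    All default values are assumed for illustration.
    """
    return v_t0 + gamma * (sqrt(abs(-2 * phi_f + v_sb)) - sqrt(abs(-2 * phi_f)))


for v_sb in (0.0, 0.5, 1.0, 2.5):
    print(f"V_SB = {v_sb:.1f} V -> V_T = {threshold_voltage(v_sb):.3f} V")
```

Raising $V_T$ this way at run time reduces leakage in idle periods, at the cost of slower switching while the bias is applied.
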