August 9, 2006, Agrawal: VDAT'06 Tutorial II
Low-Power Electronics and Systems
Vishwani D. Agrawal, James J. Danaher Professor
Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849, USA
Contents
Introduction
Dynamic power
 – Short circuit power
 – Reduced supply voltage operation
 – Glitch elimination
Static (leakage) power reduction
Low power systems
 – State encoding
 – Processor and multi-core design
Books on low-power design
Introduction
Power Consumption of VLSI Chips: Why is it a concern?
ISSCC, Feb. 2001, Keynote
“Ten years from now, microprocessors will run at 10GHz to 30GHz and be capable of processing 1 trillion operations per second -- about the same number of calculations that the world's fastest supercomputer can perform now.
“Unfortunately, if nothing changes these chips will produce as much heat, for their proportional size, as a nuclear reactor....”
Patrick P. Gelsinger, Senior Vice President and General Manager, Digital Enterprise Group, INTEL CORP.
VLSI Chip Power Density
(figure: power density in W/cm² versus year for Intel processors, including the Pentium®, with reference levels marked for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface. Source: Intel)
Meaning of Low-Power Design
Design practices that aim to reduce power consumption by at least one order of magnitude; in practice a 50% reduction is often acceptable.
General considerations in low-power design
 – Algorithms and architectures
 – High-level and software techniques
 – Gate and circuit-level methods
 – Power estimation techniques
 – Test power
Topics in Low-Power
Power dissipation in CMOS circuits
Device technology
 – Low-power CMOS technologies
 – Energy recovery methods
Circuit and gate level methods
 – Logic synthesis
 – Dynamic power reduction techniques
 – Leakage power reduction
System level methods
 – Microprocessors
 – Arithmetic circuits
 – Low power memory technology
Test power
Power estimation methods and tools
Power in a CMOS Gate
(figure: CMOS gate between V_DD and Ground, drawing supply current i_DD(t))
Power Dissipation in CMOS Logic (0.25µ)
$P_{total}(0 \rightarrow 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$
Shares of the three terms, in order: 75%, 5%, 20%
(figure: CMOS gate with supply V_DD and load capacitance C_L)
Power and Energy
Instantaneous power (Watts): $P(t) = i_{DD}(t)\,V_{DD}$
Peak power (Watts): $P_{peak} = \max\{P(t)\}$
Average power (Watts): $P_{av} = \frac{1}{T}\int_0^T P(t)\,dt$
Energy (Joules): $E = \int_0^T P(t)\,dt$
Low-Power Design Techniques
Circuit and gate level methods
 – Reduced supply voltage
 – Adiabatic switching and charge recovery
 – Logic design for reduced activity
 – Reduced glitches
 – Transistor sizing
 – Pass-transistor logic
 – Pseudo-nMOS logic
 – Multi-threshold gates
Low-Power Design Techniques
Functional and architectural methods
 – Clock suppression
 – Clock frequency reduction
 – Supply voltage reduction
 – Power down
 – Algorithmic and software methods
Test Power
The power grid on a VLSI chip is designed for a certain current capacity during functional operation:
 – Average current → heat dissipation
 – Peak current → noise, ground bounce
Problem: Tests like scan or BIST are nonfunctional and may cause higher circuit activity than functional operation does; a functionally good chip can fail the test.
Power Estimation Methods
Spice: Accurate but expensive
Logic-level
 – Event-driven simulation
 – Statistical
 – Probabilistic
High-level: Hierarchical
Components of Power
Dynamic
 – Signal transitions: logic activity, glitches
 – Short-circuit
Static
 – Leakage
$P_{total} = P_{dyn} + P_{stat} = P_{tran} + P_{sc} + P_{stat}$
Power of a Transition: P_tran
(figure: CMOS inverter between V_DD and Ground with input v_i(t), output v_o(t), and load C_L; the conducting transistor is modeled as R_on, the other as R = large; charging current i_c(t))
Charging of a Capacitor
(figure: source V charging capacitor C through resistance R, switch closed at t = 0; current i(t), capacitor voltage v(t))
Charge on capacitor: $q(t) = C\,v(t)$
Current: $i(t) = dq(t)/dt = C\,dv(t)/dt$
$i(t) = C\,\frac{dv(t)}{dt} = \frac{V - v(t)}{R}$
$\frac{dv(t)}{dt} = \frac{V - v(t)}{RC}$
$\int \frac{dv(t)}{V - v(t)} = \int \frac{dt}{RC}$
$\ln[V - v(t)] = \frac{-t}{RC} + A$
Initial condition: $t = 0$, $v(t) = 0$ → $A = \ln V$
$v(t) = V\left[1 - \exp\left(\frac{-t}{RC}\right)\right]$
$i(t) = C\,\frac{dv(t)}{dt} = \frac{V}{R}\exp\left(\frac{-t}{RC}\right)$
Total Energy Per Charging Transition from Power Supply
$E_{trans} = \int_0^\infty V\,i(t)\,dt = \int_0^\infty \frac{V^2}{R}\exp\left(\frac{-t}{RC}\right)dt = CV^2$
Energy Dissipated per Transition in Resistance (R) of “On” Transistors
$R\int_0^\infty i^2(t)\,dt = R\,\frac{V^2}{R^2}\int_0^\infty \exp\left(\frac{-2t}{RC}\right)dt = \frac{1}{2}CV^2$
Energy Stored in Charged Capacitor
$\int_0^\infty v(t)\,i(t)\,dt = \int_0^\infty V\left[1 - \exp\left(\frac{-t}{RC}\right)\right]\frac{V}{R}\exp\left(\frac{-t}{RC}\right)dt = \frac{1}{2}CV^2$
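As a numerical cross-check of the three energy integrals above, the following minimal sketch integrates the supply, resistor, and capacitor energies for an RC charging transient. The function name and the component values used in the demo (15 fF, 1 kΩ, 2.5 V) are illustrative assumptions, not values tied to this derivation.

```python
import math

def charging_energies(C, V, R, steps=100_000, span_in_tau=40.0):
    """Midpoint-rule integration of the three energy integrals for an RC charge."""
    tau = R * C
    dt = span_in_tau * tau / steps
    e_supply = e_resistor = e_capacitor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        decay = math.exp(-t / tau)
        i = (V / R) * decay            # i(t) = (V/R) exp(-t/RC)
        v = V * (1.0 - decay)          # v(t) = V [1 - exp(-t/RC)]
        e_supply += V * i * dt         # energy drawn from the supply
        e_resistor += R * i * i * dt   # energy dissipated in R
        e_capacitor += v * i * dt      # energy stored on C
    return e_supply, e_resistor, e_capacitor

if __name__ == "__main__":
    C, V, R = 15e-15, 2.5, 1e3  # assumed example values
    es, er, ec = charging_energies(C, V, R)
    print(es / (C * V * V), er / (C * V * V), ec / (C * V * V))
```

The three ratios should come out close to 1, 1/2, and 1/2, independent of R: the resistance sets how fast the energy is dissipated, not how much.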
Transition Power
Gate output rising transition
 – Energy dissipated in pMOS transistor = $CV^2/2$
 – Energy stored in capacitor = $CV^2/2$
Gate output falling transition
 – Energy dissipated in nMOS transistor = $CV^2/2$
Energy dissipated per transition = $CV^2/2$
Power dissipation: $P_{trans} = E_{trans}\,\alpha f_{ck} = \alpha f_{ck} CV^2/2$, where $E_{trans}$ here denotes the energy dissipated per transition, $CV^2/2$
$\alpha$ = activity factor
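Short Circuit Current, i_sc(t)
(figure: inverter between V_DD and GND with input V_i(t) and output V_o(t); waveforms versus time (ns) of V_i(t), V_o(t), and i_sc(t). The short-circuit current flows between times t_B and t_E, while the input lies between V_Tn and V_DD - V_Tp, and peaks at I_scmaxf)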
Short-Circuit Energy per Transition
$E_{scf} = \int_{t_B}^{t_E} V_{DD}\,i_{sc}(t)\,dt = (t_E - t_B)\,I_{scmaxf}\,V_{DD}/2$
$E_{scf} = t_f\,(V_{DD} - |V_{Tp}| - V_{Tn})\,I_{scmaxf}/2$
$E_{scr} = t_r\,(V_{DD} - |V_{Tp}| - V_{Tn})\,I_{scmaxr}/2$
$E_{scf} = 0$ when $V_{DD} = |V_{Tp}| + V_{Tn}$
Short-Circuit Power and Voltage Scaling
Short-circuit power decreases and eventually becomes zero when V_DD is scaled down but the threshold voltages are not scaled down.
References:
 – M. A. Ortega and J. Figueras, “Short Circuit Power Modeling in Submicron CMOS,” PATMOS’96, Aug. 1996.
 – T. Sakurai and A. Newton, “Alpha-power Law MOSFET Model and Its Application to a CMOS Inverter,” IEEE J. Solid State Circuits, vol. 25, April 1990.
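P_sc and Output Capacitance
(figure: inverter between V_DD and Ground with load C_L, on-resistance R_on and off-resistance R = large; input v_i(t) with fall and rise times t_f and t_r, output v_o(t); supply current i_c(t) + i_sc(t), where the short-circuit component is $v_o(t)/R_{\uparrow}$)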
i_sc and Output Capacitance
$i_{sc}(t) = \frac{v_o(t)}{R_{\uparrow t_f}(t)} = \frac{V_{DD}\left[1 - \exp\left(\frac{-t}{R_{\downarrow t_f}(t)\,C}\right)\right]}{R_{\uparrow t_f}(t)}$
i_scmax and Output Capacitance
(figure: output voltage v_o(t) and current i versus time t over the input fall time t_f, for small C and large C, together with the curve $1/R_{\uparrow t_f}(t)$; the peak i_scmax is marked)
P_sc, Output Rise Times, Capacitance
For given input rise and fall times, short circuit power decreases as output capacitance increases.
Short circuit power increases with increase of input rise and fall times.
Short circuit power is reduced if output rise and fall times are smaller than the input rise and fall times.
Effects of Scaling Down
1-16% short-circuit power at 0.7 micron
4-37% at 0.35 micron
12-60% at 0.17 micron
Reference: S. R. Vemuru and N. Steinberg, “Short Circuit Power Dissipation Estimation for CMOS Logic Gates,” IEEE Trans. on Circuits and Systems I, vol. 41, Nov. 1994.
Summary: Short-Circuit Power
Short-circuit power is consumed by each transition (increases with input transition time).
Reduction requires that gate output transition should not be faster than the input transition (faster gates can consume more short-circuit power).
Increasing the output load capacitance reduces short-circuit power.
Scaling down of supply voltage with respect to threshold voltages reduces short-circuit power.
Dynamic Power
(figure: inverter between V_DD and Ground, both transistors modeled as resistances R, with input V_i, output V_o, load C_L, and short-circuit current i_sc)
Dynamic Power = $C_L V_{DD}^2/2 + P_{sc}$
Dynamic Power Reduction
Reduce power per transition
 – Reduced voltage operation: voltage scaling
 – Capacitance minimization: device sizing
Reduce number of transitions
 – Glitch elimination
CMOS Dynamic Power
$\text{Dynamic Power} = \sum_{\text{all gates } i} 0.5\,\alpha_i f_{clk} C_{Li} V_{DD}^2 \approx 0.5\,\alpha f_{clk} C_L V_{DD}^2 \approx \alpha_{01} f_{clk} C_L V_{DD}^2$
where
$\alpha$: average gate activity factor
$\alpha_{01} = 0.5\alpha$: average 0→1 transitions
$f_{clk}$: clock frequency
$C_L$: total load capacitance
$V_{DD}$: supply voltage
Example: 0.25μm CMOS Chip
f = 500MHz
Average capacitance = 15fF/gate
V_DD = 2.5V
10^6 gates
$\text{Power} = \alpha_{01} f C_L V_{DD}^2 = \alpha_{01} \times 500 \times 10^6 \times (15 \times 10^{-15} \times 10^6) \times 2.5^2 = 46.9\,\text{W}$, for $\alpha_{01} = 1.0$
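The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch of the formula P = α01 · f · C_L · V_DD²; the function and parameter names are illustrative choices.

```python
def dynamic_power(alpha01, f_clk, c_per_gate, n_gates, vdd):
    """Dynamic power in watts: alpha01 * f * (total load capacitance) * VDD^2."""
    total_capacitance = c_per_gate * n_gates
    return alpha01 * f_clk * total_capacitance * vdd ** 2

if __name__ == "__main__":
    # Values from the example: 500 MHz, 15 fF/gate, 10^6 gates, 2.5 V.
    for alpha01 in (1.0, 0.5, 0.1):
        p = dynamic_power(alpha01, 500e6, 15e-15, 1e6, 2.5)
        print(f"alpha01 = {alpha01}: {p:.2f} W")
```

With α01 = 1.0 every gate output makes a 0→1 transition in every cycle, which is an upper bound; realistic activity factors are much lower, and the power scales down linearly with them.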
Signal Activity, α
(figure: waveforms over clock period T = 1/f. The clock has α01 = 1.0; a signal that changes once per cycle has α01 = 0.5; combinational signals shown for comparison)
Reducing Dynamic Power
Dynamic power reduction is
 – Quadratic with reduction of supply voltage
 – Linear with reduction of capacitance
μm CMOS Inverter, V_DD = 2.5V
(figure: two panels, V_out (V) versus V_in (V) and gain versus V_in (V))
μm CMOS Inverter, V_DD < 2.5V
(figure: two panels of V_out (V) versus V_in (V) at reduced supply voltages; the point where gain = -1 is marked)
Lower Bound on V_DD
For proper operation of a gate, the magnitude of the maximum gain (at V_in = V_DD/2) should be greater than 1.
$\text{Gain}_{max} = -(1/n)\left[\exp(V_{DD}/2\Phi_T) - 1\right] = -1$
n = 1.5
$\Phi_T = kT/q$ = 26mV
V_DD = 48mV
V_DDmin > 2 to 4 times kT/q, or ~100mV at room temperature (27°C)
Ref.: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
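Solving the unity-gain condition for V_DD gives V_DD = 2·Φ_T·ln(1 + n). The short sketch below evaluates that closed form and the gain expression with the slide's values (n = 1.5, Φ_T = 26 mV); the function names are illustrative.

```python
import math

def max_gain(vdd, n=1.5, phi_t=0.026):
    """Maximum inverter gain at Vin = VDD/2: -(1/n) [exp(VDD / (2 phi_t)) - 1]."""
    return -(1.0 / n) * (math.exp(vdd / (2.0 * phi_t)) - 1.0)

def vdd_for_unity_gain(n=1.5, phi_t=0.026):
    """Supply voltage at which the maximum gain equals -1."""
    return 2.0 * phi_t * math.log(1.0 + n)

if __name__ == "__main__":
    v = vdd_for_unity_gain()
    print(f"VDD at |gain| = 1: {v * 1e3:.1f} mV, gain there: {max_gain(v):.3f}")
```

The result is about 48 mV, a little under two thermal voltages, which is why the practical lower bound is quoted as 2 to 4 times kT/q.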
Impact of V_DD on Performance
$\text{Inverter delay} = K\,\frac{C_L V_{DD}}{(V_{DD} - V_t)^{\alpha}}$
(figure: power (log scale) and delay (ns) versus V_DD, with ticks at 0.6V, 1.8V, and 3.0V and the point V_DD = V_t marked)
Optimum Power × Delay
$\text{Power} \times \text{Delay},\ PD = \text{constant} \times \frac{V_{DD}^3}{(V_{DD} - V_t)^{\alpha}}$
For minimum power-delay product, $d(PD)/dV_{DD} = 0$, which gives
$V_{DD} = \frac{3V_t}{3 - \alpha}$
For long channel devices, α = 2, V_DD = 3V_t
For very short channel devices, α = 1, V_DD = 1.5V_t
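The closed-form optimum can be checked against a direct numerical minimization of V_DD³/(V_DD - V_t)^α. The sketch below uses a plain grid search; the threshold voltage of 0.5 V in the demo and the search range are arbitrary assumptions chosen only for illustration.

```python
def power_delay(vdd, vt, alpha):
    """Power-delay product up to a constant factor: VDD^3 / (VDD - Vt)^alpha."""
    return vdd ** 3 / (vdd - vt) ** alpha

def optimum_vdd_closed_form(vt, alpha):
    """VDD = 3 Vt / (3 - alpha), from d(PD)/dVDD = 0."""
    return 3.0 * vt / (3.0 - alpha)

def optimum_vdd_numeric(vt, alpha, points=100_000):
    """Grid search for the minimizing VDD between just above Vt and 10 Vt."""
    lo, hi = 1.001 * vt, 10.0 * vt
    best_v, best_pd = lo, power_delay(lo, vt, alpha)
    for k in range(1, points + 1):
        v = lo + (hi - lo) * k / points
        pd = power_delay(v, vt, alpha)
        if pd < best_pd:
            best_v, best_pd = v, pd
    return best_v

if __name__ == "__main__":
    for alpha in (2.0, 1.0):
        print(alpha, optimum_vdd_closed_form(0.5, alpha), optimum_vdd_numeric(0.5, alpha))
```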
Transistor Sizing for Performance
Problem: If we increase W/L to make the charging or discharging of the load capacitance faster, then the increased W increases the load for the driving gate.
(figure: gate with input capacitance C_in driving load capacitance C_L)
Fixed-Taper Buffer
(figure: chain of n inverters from V_in to V_out with relative sizes 1, α, α², ..., α^(i-1), ..., α^(n-1); input capacitance C_in, load C_L)
$C_i = \alpha^{i-1} C_{in}$
$C_L = \alpha^{n} C_{in}$
Delay = $t_0$
Ref.: J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, Piscataway, New Jersey: IEEE Press, 2004.
Buffer (Cont.)
$\alpha^n = C_L/C_{in}$
$n = \frac{\ln(C_L/C_{in})}{\ln \alpha}$
ith stage delay: $t_i = \alpha t_0$, $i = 1, \ldots, n$, because each stage drives a stage α times bigger than itself.
$\text{Total delay} = \sum_{i=1}^{n} t_i = n\alpha t_0 = \ln(C_L/C_{in})\,\alpha t_0/\ln(\alpha)$
Differentiating total delay with respect to α and equating to 0, we get $\alpha_{opt} = e \approx 2.7$
The optimum number of stages is $n_{opt} = \ln(C_L/C_{in})$
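A small numerical sketch of the tapered-buffer result: it evaluates the total delay ln(C_L/C_in)·α·t_0/ln(α) for several taper factors and shows the minimum near α = e. The capacitance ratio of 1000 and the normalization t_0 = 1 are assumptions for illustration.

```python
import math

def buffer_delay(alpha, c_ratio, t0=1.0):
    """Total delay of a fixed-taper buffer: ln(CL/Cin) * alpha * t0 / ln(alpha)."""
    return math.log(c_ratio) * alpha * t0 / math.log(alpha)

def stages(alpha, c_ratio):
    """Number of stages n = ln(CL/Cin) / ln(alpha); not rounded to an integer."""
    return math.log(c_ratio) / math.log(alpha)

if __name__ == "__main__":
    c_ratio = 1000.0  # assumed CL/Cin
    for alpha in (2.0, math.e, 4.0, 8.0):
        print(f"alpha = {alpha:.3f}: n = {stages(alpha, c_ratio):.2f}, "
              f"delay = {buffer_delay(alpha, c_ratio):.2f} t0")
```

In a real design n must be an integer, so α is adjusted slightly away from e; the delay curve is flat near the optimum, so the penalty is small.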
Further Reading
B. S. Cherkauer and E. G. Friedman, “A Unified Design Methodology for CMOS Tapered Buffers,” IEEE Trans. VLSI Systems, vol. 3, no. 1, March 1995.
Logic Activity and Glitches
(figure: example circuit with gate delays d = 2 and d = 1, showing a glitch at a gate output)
Glitch Power Reduction
Design a digital circuit for minimum transient energy consumption by eliminating hazards.
Theorem 1
For correct operation with minimum energy consumption, a Boolean gate must produce no more than one event per transition.
Output logic state changes: one transition is necessary.
Output logic state unchanged: no transition is necessary.
Inertial Delay of a Gate (Inverter)
(figure: V_in and V_out waveforms versus time, showing the high-to-low delay d_HL and the low-to-high delay d_LH)
$d = \frac{d_{HL} + d_{LH}}{2}$
Theorem 2
Given that events occur at the input of a gate with inertial delay d at times $t_1 \le \ldots \le t_n$, the number of events at the gate output cannot exceed
$\min\left(n,\ 1 + \left\lfloor \frac{t_n - t_1}{d} \right\rfloor\right)$
(figure: input event times t_1, t_2, t_3, ..., t_n on a time axis, spanning the interval t_n - t_1)
Minimum Transient Design
Minimum transient energy condition for a Boolean gate:
$|t_i - t_j| < d$
where $t_i$ and $t_j$ are arrival times of input events and d is the inertial delay of the gate.
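The bound of Theorem 2 and the minimum transient energy condition are easy to evaluate for a given set of input arrival times. The sketch below assumes the bound has the form min(n, 1 + ⌊(t_n - t_1)/d⌋); the function names and sample arrival times are illustrative.

```python
import math

def max_output_events(arrival_times, d):
    """Upper bound on output events for a gate with inertial delay d (Theorem 2)."""
    n = len(arrival_times)
    if n == 0:
        return 0
    span = max(arrival_times) - min(arrival_times)
    return min(n, 1 + math.floor(span / d))

def meets_min_transient_condition(arrival_times, d):
    """True if every pair of input events satisfies |t_i - t_j| < d."""
    return max(arrival_times) - min(arrival_times) < d

if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]  # assumed input event times
    for d in (0.5, 2.0, 4.0):
        print(d, max_output_events(times, d), meets_min_transient_condition(times, d))
```

When the condition holds, the floor term is zero and the bound collapses to a single output event, which is exactly the requirement of Theorem 1.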
Balanced Delay Method
All input events arrive simultaneously
Overall circuit delay not increased
Delay buffers may have to be inserted
Hazard Filter Method
Gate delay is made greater than maximum input path delay difference
No delay buffers needed (least transient energy)
Overall circuit delay may increase
Glitch-Free Design by Linear Programming
Variables: gate and buffer delays
Objective: minimize number of buffers
Subject to: overall circuit delay
Subject to: minimum transient condition for multi-input gate
Variables for Full-Adder
Gate delay variables d 4... d 12
Buffer delay variables d d 29
Delay variables are located at the checkpoints of the circuit.
(figure: full-adder circuit annotated with delay variables)
Objective Function
Ideal: minimize the number of non-zero delay buffers
Actual: minimize sum of buffer delays
Specify Critical Path Delay
Sum of delays on critical path ≤ maxdel
(figure: original design)
Multi-Input Gate Condition
(figure: gate with inertial delay d and input path delays d1 and d2)
$|d_1 - d_2| \le d \ \equiv\ d_1 - d_2 \le d \ \text{and}\ d_2 - d_1 \le d$
Results: 1-Bit Adder
R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press, 1993.
AMPL Solution: maxdel =
AMPL Solution: maxdel =
AMPL Solution: maxdel ≥
Removing a Limitation
Constraints are written by path enumeration.
Since the number of paths in a circuit can be exponential in circuit size, the formulation is infeasible for large circuits.
Example: c880 has 6.96M constraints.
Solution: A linear complexity method. See,
 – T. Raja, Master’s Thesis, Rutgers University.
 – T. Raja, V. D. Agrawal and M. L. Bushnell, “Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program,” Proc. 16th International Conf. VLSI Design, 2003.
Comparison of Constraints
(figure: number of constraints versus number of gates in circuit)
Benchmark Circuits
(table: circuits C432, C880, C6288, and c7552; columns are Maxdel. (gates), No. of Buffers, and Normalized Power (Average, Peak); the numeric entries are not available)
c7552: 3,500-gate CMOS Circuit
(figure: instantaneous energy in Joules versus clock cycles)
References
R. Fourer, D. M. Gay and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming, South San Francisco: The Scientific Press.
M. Berkelaar and E. Jacobs, “Using Gate Sizing to Reduce Glitch Power,” Proc. ProRISC Workshop, Mierlo, The Netherlands, Nov. 1996.
V. D. Agrawal, “Low Power Design by Hazard Filtering,” Proc. 10th Int’l Conf. VLSI Design, Jan. 1997.
V. D. Agrawal, M. L. Bushnell, G. Parthasarathy and R. Ramadoss, “Digital Circuit Design for Minimum Transient Energy and Linear Programming Method,” Proc. 12th Int’l Conf. VLSI Design, Jan. 1999.
M. Hsiao, E. M. Rudnick and J. H. Patel, “Effects of Delay Model in Peak Power Estimation of VLSI Circuits,” Proc. ICCAD, Nov. 1997.
T. Raja, A Reduced Constraint Set Linear Program for Low Power Design of Digital Circuits, Master’s Thesis, Rutgers Univ., New Jersey.
T. Raja, V. D. Agrawal and M. L. Bushnell, “Transistor Sizing of Logic Gates to Maximize Input Delay Variability,” J. of Low Power Electronics (JOLPE), vol. 2, 2006.
Static (Leakage) Power
Dynamic
 – Signal transitions: logic activity, glitches
 – Short-circuit
Static
 – Leakage
Leakage Power
(figure: cross-section of an nMOS transistor with n+ regions, connected between V_DD through R and Ground, showing leakage currents I_G, I_D, I_sub, I_PT, and I_GIDL)
Leakage Current Components
Subthreshold conduction, I_sub
Reverse bias pn junction conduction, I_D
Gate induced drain leakage, I_GIDL, due to tunneling at the gate-drain overlap
Drain source punchthrough, I_PT, due to short channel and high drain-source voltage
Gate tunneling, I_G, through thin oxide
Subthreshold Current
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH})/nV_t\}$
$\mu_0$: carrier surface mobility
$C_{ox}$: gate oxide capacitance per unit area
L: channel length
W: gate width
$V_t = kT/q$: thermal voltage
n: a technology parameter
I_DS for Short Channel Device
$I_{sub} = \mu_0 C_{ox} (W/L)\, V_t^2 \exp\{(V_{GS} - V_{TH} + \eta V_{DS})/nV_t\}$
$V_{DS}$: drain to source voltage
$\eta$: a proportionality factor
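Increased Subthreshold Leakage
(figure: log I_sub versus gate voltage for the original device with threshold V_TH and the scaled device with lower threshold V_TH’; the scaled device has higher leakage at zero gate voltage; current level I_c marked)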
Reducing Leakage Power
Leakage power as a fraction of the total power increases as clock frequency drops. Turning the supply off in unused parts can save power.
For a single gate, leakage is a small fraction of the total power; it can be significant for very large circuits.
Scaling down feature sizes requires lowering the threshold voltage, which increases leakage power; leakage roughly doubles with each technology shrink.
Multiple-threshold devices are used to reduce leakage power.
Problem Statement
Problem: To design a CMOS circuit,
 – using dual-threshold devices to globally minimize subthreshold leakage
 – using delay elements to eliminate all glitches
 – maintaining specified performance
 – allowing performance-power tradeoff
Reference: Y. Lu and V. D. Agrawal, “Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing,” Proc. PATMOS, 2005.
MILP: Mixed Integer Linear Program
$\text{Minimize}\ \left\{ \sum_{\text{all gates } i} \left[ X_i I_{Li} + (1 - X_i) I_{Hi} \right] + \sum_{\text{all gates } i} \sum_{j} \Delta d_{ij} \right\}$
where
$X_i = 1$: gate i has low V_th; its leakage is $I_{Li}$ (the larger of the two values)
$X_i = 0$: gate i has high V_th; its leakage is $I_{Hi}$ (the smaller of the two values)
$\Delta d_{ij}$: delay inserted between gates i and j for glitch suppression
$X_i \in \{0, 1\}$ is an integer variable; $\Delta d_{ij}$ is a real variable
$I_{Li}$ and $I_{Hi}$ are constants for gate i obtained by SPICE simulation
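To make the leakage term of the objective concrete, here is a deliberately simplified toy, not the MILP of the cited paper: it considers a single chain of gates, ignores the glitch-suppression delays Δd_ij, and replaces the solver with exhaustive search over the X_i. The per-gate leakage and delay numbers are invented for illustration.

```python
from itertools import product

def best_vth_assignment(gates, t_max):
    """Exhaustively choose X_i (1 = low Vth, 0 = high Vth) for a chain of gates.

    Each gate is a dict with leakage i_low/i_high and delay d_low/d_high for the
    low-Vth and high-Vth versions. Returns (total_leakage, assignment) for the
    lowest-leakage assignment whose chain delay does not exceed t_max, or None.
    """
    best = None
    for x in product((0, 1), repeat=len(gates)):
        delay = sum(g["d_low"] if xi else g["d_high"] for g, xi in zip(gates, x))
        if delay > t_max:
            continue  # violates the circuit delay constraint
        leak = sum(g["i_low"] if xi else g["i_high"] for g, xi in zip(gates, x))
        if best is None or leak < best[0]:
            best = (leak, x)
    return best

if __name__ == "__main__":
    # Hypothetical gate data: low Vth is faster but leaks ten times more.
    gate = {"i_low": 10.0, "i_high": 1.0, "d_low": 1.0, "d_high": 1.5}
    chain = [dict(gate) for _ in range(3)]
    for t_max in (3.0, 3.5, 4.5):
        print(t_max, best_vth_assignment(chain, t_max))
```

Relaxing t_max lets more gates move to high V_th, which is the same performance-power tradeoff the full formulation exposes. Exhaustive search is exponential in the number of gates and only serves to show the structure of the objective.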
MILP - Constraints
Circuit delay constraint for each PO i: T_max can be the delay of the critical path or the clock period specified by the circuit designer.
Glitch suppression constraints for each gate i: (1), (2), (3)
Constraints (1), (2) and (3) make sure that $T_i - t_i < d_i$ for each gate, so glitches are eliminated.
$T_i$ is the latest signal arrival time at the output of gate i.
$t_i$ is the earliest signal arrival time at the output of gate i.
Power-Delay Tradeoff Example
14-Gate Full Adder (Unoptimized, T_max = T_c)
(figure: full adder with inputs A, B, C and outputs S, C0; all gates are low V_th; critical path highlighted)
I_leak = 161 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, T_max = T_c)
(figure: full adder with inputs A, B, C and outputs S, C0; legend: low V_th gates, high V_th gates, delay buffers (high V_th), critical path)
I_leak = 73 pA
Power-Delay Tradeoff Example
14-Gate Full Adder (Optimized, T_max = 1.25T_c)
(figure: full adder with inputs A, B, C and outputs S, C0; legend: low V_th gates, high V_th gates, delay buffers (high V_th), critical path)
I_leak = 16 pA
Leakage Reduction and Performance (27℃, 70nm)
Table columns: Circuit; # gates; Critical path delay T_c (ns); Unoptimized I_leak (μA); for T_max = T_c: Optimized I_leak (μA), Leakage reduction (%), Sun OS 5.7 CPU secs.; for T_max = 1.25T_c: Optimized I_leak (μA), Leakage reduction (%), Sun OS 5.7 CPU secs.
(table: ten benchmark circuit rows; the numeric entries are not available except for the trailing value of each row: 0.3, 1.8, 0.3, 2.1, 1.3, 0.16, 0.74, 0.71, 7.48, 0.58)
Leakage, Dynamic and Total Power (90℃, 70nm)
Table columns: Circuit; # Gates; Leakage power: P_leak1 (uW), P_leak2 (uW), Leakage reduction (%); Dynamic power: P_dyn1 (uW), P_dyn2 (uW), Dynamic reduction (%); Total power: P_total1 (uW), P_total2 (uW), Total reduction (%)
(table: ten benchmark circuit rows; the numeric entries are not available)
1: unoptimized circuits; 2: optimized circuits.
Low-Power System Design
State encoding
 – Bus encoding
 – Finite state machine
Clock gating
 – Flip-flop
 – Shift register
Microprocessors
 – Single processor
 – Multi-core processor
Bus Encoding
Example: Four bit bus
0000→1110 has three transitions.
If bits of the second pattern are inverted, then 0000→0001 will have only one transition.
Bit-inversion encoding for N-bit bus:
(figure: number of bit transitions after inversion encoding versus number of bit transitions, with axis ticks at 0, N/2, and N)
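A minimal sketch of bus-invert encoding and decoding as described above: if more than N/2 bus lines would toggle, the inverted word is sent and a polarity bit is set. The starting bus state of all zeros and the function names are assumptions; transitions on the polarity line itself are not counted here.

```python
def bus_invert_encode(words, n_bits):
    """Return a list of (sent_word, polarity_bit) pairs for an n_bits-wide bus."""
    mask = (1 << n_bits) - 1
    previous = 0  # assumed initial bus state: all lines at 0
    encoded = []
    for word in words:
        toggles = bin((previous ^ word) & mask).count("1")
        if toggles > n_bits / 2:
            sent, polarity = (~word) & mask, 1  # send inverted data
        else:
            sent, polarity = word & mask, 0     # send data unchanged
        encoded.append((sent, polarity))
        previous = sent
    return encoded

def bus_invert_decode(encoded, n_bits):
    """Recover the original words from (sent_word, polarity_bit) pairs."""
    mask = (1 << n_bits) - 1
    return [((~sent) & mask) if polarity else sent for sent, polarity in encoded]

if __name__ == "__main__":
    data = [0b1110, 0b0000, 0b1111, 0b0101]
    enc = bus_invert_encode(data, 4)
    print([(format(s, "04b"), p) for s, p in enc])
    print(bus_invert_decode(enc, 4) == data)
```

For the slide's example, the word 1110 following 0000 is sent as 0001 with the polarity bit set, so only one data line toggles instead of three.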
Bus-Inversion Encoding Logic
(figure: sent data passes through polarity decision logic to the bus register; the bus carries the data and a polarity bit; the receiver recovers the received data)
M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, March 1995.
FSM State Encoding
(figure: two state transition graphs with different state encodings; transition probabilities are based on PI statistics)
Expected number of state-bit transitions:
First encoding: 2( ) + 1( ) = 1.6
Second encoding: 1( ) + 2(0.1) = 1.0
State encoding can be selected using a power-based cost function.
FSM: Clock-Gating
Moore machine: Outputs depend only on the state variables.
 – If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.
(figure: STG fragment with states Si, Sj, Sk and edges labeled Xi/Zk, Xj/Zk, Xk/Zk; Sk has a self-loop on Xk)
Clock can be stopped when the (Xk, Sk) combination occurs.
Clock-Gating in Moore FSM
(figure: combinational logic with primary inputs PI and outputs PO, state flip-flops, and clock activation logic with a latch gating the clock CK)
L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
Clock-Gating in Low-Power Flip-Flop
(figure: D flip-flop with input D, output Q, and clock CK gated by comparing D and Q)
C. Piguet, “Circuit and Logic Level Design,” in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
Reduced-Power Shift Register
(figure: input D feeds two rows of D flip-flops clocked at CK(f/2); a multiplexer combines the rows into the output)
Flip-flops are operated at full voltage and half the clock frequency.
Power Reduction in Processors
Just about everything is used.
Hardware methods:
 – Voltage reduction for dynamic power
 – Dual-threshold devices for leakage reduction
 – Clock gating, frequency reduction
 – Sleep mode
Architecture:
 – Instruction set
 – Hardware organization
Software methods
SIA Roadmap for Processors (1999)
Table rows: Year; Feature size (nm); Logic transistors/cm²; Clock (GHz); Chip size (mm²); Power supply (V); High-perf. power (W)
Logic transistors/cm²: 6.2M, 18M, 39M, 84M, 180M, 390M (the entries of the other rows are not available)
Power Reduction Example
Alpha 21064: 3.45V, power dissipation = 26W
Reduce voltage to 1.5V, power (5.3x) = 4.9W
Eliminate FP, power (3x) = 1.6W
Scale 0.75→0.35μ, power (2x) = 0.8W
Reduce clock load, power (1.3x) = 0.6W
Reduce frequency 200→160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, Nov.
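The chain of reductions in this example is plain division, and the first factor follows from the quadratic dependence of dynamic power on supply voltage: (3.45/1.5)² ≈ 5.3. The sketch below replays the steps; the step labels are shortened from the slide and the function name is illustrative.

```python
def apply_reductions(initial_power, steps):
    """Divide the power by each reduction factor in turn, recording every stage."""
    history = [("start", initial_power)]
    power = initial_power
    for label, factor in steps:
        power /= factor
        history.append((label, power))
    return history

if __name__ == "__main__":
    steps = [
        ("voltage 3.45V -> 1.5V", (3.45 / 1.5) ** 2),  # about 5.3x
        ("eliminate FP", 3.0),
        ("scale 0.75 -> 0.35 micron", 2.0),
        ("reduce clock load", 1.3),
        ("frequency 200 -> 160 MHz", 200.0 / 160.0),   # 1.25x
    ]
    for label, watts in apply_reductions(26.0, steps):
        print(f"{label:28s} {watts:5.2f} W")
```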
Low-Power Datapath Architecture
Lower supply voltage
 – This slows down circuit speed
 – Use parallel computing to gain the speed back
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.
A Reference Datapath
(figure: input register, combinational logic, output register, clocked by CK)
Supply voltage = V_ref
Total capacitance switched per cycle = C_ref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
(figure: input at rate f feeds N registered copies of the combinational logic, Copy 1 to Copy N, each clocked at f/N; an N to 1 multiplexer selects the output; a multiphase clock generator provides the clocks and mux control from CK)
A copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism
Control Signals, N = 4
(figure: clock CK and the four derived clock phases, Phase 1 to Phase 4)
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N(C_{inreg} + C_{comb})V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg})V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref}(N - 1)V_N^2 f$
$P_N = [1 + \delta(N - 1)]\,C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)]\,\frac{V_N^2}{V_{ref}^2}$
Voltage vs. Speed
$\text{Delay of a gate},\ T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k(W/L)(V_{ref} - V_t)^2}$
where
I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
V_t is threshold voltage
(figure: normalized gate delay T versus supply voltage for 1.2μ CMOS; N = 1 at V_ref = 5V, N = 2 at V_2 = 2.9V, N = 3 at V_3; V_t marked on the voltage axis)
Voltage reduction slows down as we get closer to V_t.
Increasing Multiprocessing
(figure: power ratio P_N/P_1 versus N for 1.2μ CMOS, V_ref = 5V, with curves for V_t = 0V (extreme case), V_t = 0.4V, and V_t = 0.8V)
Extreme Cases: V_t = 0
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta(N - 1)]\,\frac{1}{N^2} \rightarrow 1/N$
For negligible overhead, δ→0:
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For V_t > 0, power reduction is less and there will be an optimum value of N.
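The effect of a nonzero threshold voltage can be explored numerically. The sketch below assumes the gate delay model T ∝ V/(V - V_t)², finds the supply V_N at which each of N parallel copies is N times slower than the reference, and then evaluates P_N/P_1 = [1 + δ(N - 1)](V_N/V_ref)². The overhead δ = 0.1 in the demo is an assumed value, and the function names are illustrative.

```python
def gate_delay(v, vt):
    """Gate delay up to a constant factor: V / (V - Vt)^2."""
    return v / (v - vt) ** 2

def scaled_supply(n, v_ref, vt, iterations=200):
    """Bisection for V_N such that gate_delay(V_N) = n * gate_delay(v_ref)."""
    target = n * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-9, v_ref  # delay falls monotonically as V rises above Vt
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid  # still too slow: raise the voltage
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_ratio(n, v_ref, vt, delta):
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    v_n = scaled_supply(n, v_ref, vt)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2

if __name__ == "__main__":
    for vt in (0.0, 0.4, 0.8):
        ratios = [round(power_ratio(n, 5.0, vt, 0.1), 3) for n in range(1, 9)]
        print(f"Vt = {vt} V:", ratios)
```

With V_t = 0 the supply scales as V_ref/N, matching the extreme case above. With V_t > 0 the voltage cannot drop as far, so the overhead term eventually dominates and the ratio reaches a minimum at some finite N.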
Example: Multiplier Core
Specification:
 – 200MHz clock
 – 15W at 5V
 – Low voltage operation, V_DD ≥ 1.5 volts
$\text{Relative clock rate} = \frac{(V_{DD} - 0.5)^2}{20.25}$
Problem: Integrate the multiplier core on a SOC
Power budget for multiplier ~ 5W
A Multicore Design
(figure: input at 200MHz feeds registered Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5, each clocked at 40MHz; a 5 to 1 mux selects the output; a multiphase clock generator provides the clocks and mux control from the 200MHz CK)
Core clock frequency = 200/N; N should divide 200.
How Many Cores?
For N cores: clock frequency = 200/N MHz
Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts, obtained by setting the relative clock rate $(V_{DDN} - 0.5)^2/20.25$ equal to 1/N
Assuming 10% overhead per core,
$\text{Power dissipation} = 15\,[1 + 0.1(N - 1)]\left(\frac{V_{DDN}}{5}\right)^2$ watts
Design Tradeoffs
(table: columns are Number of cores N, Clock (MHz), Core supply V_DDN (Volts), and Total Power (Watts); the numeric entries are not available)
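The tradeoff between core count, clock, supply voltage, and power can be tabulated from the multiplier-core formulas. The sketch below assumes the supply follows from the relative clock rate (V_DD - 0.5)²/20.25 = 1/N, that is V_DDN = 0.5 + (20.25/N)^(1/2), with 10% overhead per added core. The values it prints are computed from those formulas, not copied from the original table.

```python
def core_clock_mhz(n):
    """Per-core clock for n cores sharing a 200 MHz input stream."""
    return 200.0 / n

def core_supply(n):
    """Supply voltage giving a relative clock rate of 1/n."""
    return 0.5 + (20.25 / n) ** 0.5

def total_power(n, overhead=0.1):
    """15 W reference at 5 V, scaled by overhead and by (VDDN / 5)^2."""
    return 15.0 * (1.0 + overhead * (n - 1)) * (core_supply(n) / 5.0) ** 2

if __name__ == "__main__":
    print("N  clock(MHz)  VDDN(V)  power(W)")
    for n in (1, 2, 4, 5, 8, 10, 20):  # divisors of 200 with VDDN >= 1.5 V
        print(f"{n:2d}  {core_clock_mhz(n):9.1f}  {core_supply(n):7.2f}  {total_power(n):8.2f}")
```

Under these assumptions five cores come out slightly above the 5 W budget and eight cores fall below it. The limit V_DD ≥ 1.5 V caps N at 20, since (1.5 - 0.5)² = 1 requires N ≤ 20.25.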
Pipeline Architecture
(figure: a single-stage design with input register, processor, and output at clock f, compared with a two-stage pipeline in which each half of the processor is followed by a register)
Single stage: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\,CV^2f$
Approximate Trend
Comparison of an n-parallel processor with an n-stage pipeline processor:
Capacitance: nC (parallel), C (pipeline)
Voltage: V/n (parallel), V/n (pipeline)
Frequency: f/n (parallel), f (pipeline)
Power: $CV^2f/n^2$ (parallel), $CV^2f/n^2$ (pipeline)
Chip area: n times (parallel), 10-20% increase (pipeline)
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
Multicore Processors
(figure: performance of multicore versus single core processors, based on SPECint2000 and SPECfp2000 benchmarks. Source: Computer, May 2005, p. 12)
Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, January 2006.
Cell - Cell Broadband Engine Architecture
(photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006)
Nine-processor chip: 192 Gflops
Cell’s Nine-Processor Chip
(figure: chip photograph, © IEEE Spectrum, January 2006)
Eight identical processors
f = 5.6GHz (max)
44.8 Gflops
Books on Low-Power Design (1)
L. Benini and G. De Micheli, Dynamic Power Management Design Techniques and CAD Tools, Boston: Springer.
T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Boston: Springer.
A. Chandrakasan and R. Brodersen, Low-Power Digital CMOS Design, Boston: Springer.
A. Chandrakasan and R. Brodersen, Low-Power CMOS Design, New York: IEEE Press.
J.-M. Chang and M. Pedram, Power Optimization and Synthesis at Behavioral and System Levels using Formal Methods, Boston: Springer.
M. S. Elrabaa, I. S. Abu-Khater and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques, Boston: Springer.
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers.
S. Iman and M. Pedram, Logic Synthesis for Low Power VLSI Designs, Boston: Springer.
J. B. Kuo and J.-H. Lou, Low-Voltage CMOS VLSI Circuits, New York: Wiley-Interscience.
J. Monteiro and S. Devadas, Computer-Aided Design Techniques for Low Power Sequential Logic Circuits, Boston: Springer.
S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies, Boston: Springer.
W. Nebel and J. Mermet, Low Power Design in Deep Submicron Electronics, Boston: Springer, 1997.
Books on Low-Power Design (2)
N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits, Boston: Springer.
V. G. Oklobdzija, V. M. Stojanovic, D. M. Markovic and N. Nedovic, Digital System Clocking: High Performance and Low-Power Aspects, Wiley-IEEE.
M. Pedram and J. M. Rabaey, Power Aware Design Methodologies, Boston: Springer.
C. Piguet, Low-Power Electronics Design, Boca Raton, Florida: CRC Press.
J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Boston: Springer.
S. Roundy, P. K. Wright and J. M. Rabaey, Energy Scavenging for Wireless Sensor Networks, Boston: Springer.
K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design, New York: Wiley-Interscience.
E. Sánchez-Sinencio and A. G. Andreou, Low-Voltage/Low-Power Integrated Circuits and Systems – Low-Voltage Mixed-Signal Circuits, New York: IEEE Press.
W. A. Serdijn, Low-Voltage Low-Power Analog Integrated Circuits, Boston: Springer.
S. Sheng and R. W. Brodersen, Low-Power Wireless Communications: A Wideband CDMA System Design, Boston: Springer.
G. Verghese and J. M. Rabaey, Low-Energy FPGAs, Boston: Springer.
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer.
K.-S. Yeo and K. Roy, Low-Voltage Low-Power Subsystems, McGraw Hill, 2004.
Other Books Useful in Low-Power Design
A. Chandrakasan, W. J. Bowhill and F. Fox, Design of High-Performance Microprocessor Circuits, New York: IEEE Press.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Reading, Massachusetts: Addison-Wesley.
S. M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits, New York: McGraw-Hill.
E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization, Springer.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Second Edition, Upper Saddle River, New Jersey: Prentice-Hall.
J. Segura and C. F. Hawkins, CMOS Electronics, How It Works, How It Fails, New York: IEEE Press, 2004.