11/01/05ELEC / Lecture 171 ELEC / (Fall 2005) Special Topics in Electrical Engineering Low-Power Design of Electronic Circuits Low-Power Logic Design and Parallelism Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University
11/01/05ELEC / Lecture 172 State Encoding Two-bit binary counter: State sequence, 00→01→10→11→00 Six bit transitions in four clock cycles 6/4 = 1.5 transitions per clock Two-bit Gray-code counter State sequence, 00→01→11→10→00 Four bit transitions in four clock cycles 4/4 = 1.0 transition per clock Gray-code counter is more power efficient. G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.
11/01/05ELEC / Lecture 173 Three-Bit Counters BinaryGray-code StateNo. of togglesStateNo. of toggles
11/01/05ELEC / Lecture 174 N-Bit Counter: Toggles in Counting Cycle Binary counter: T(binary) = 2(2 N – 1) Gray-code counter: T(gray) = 2 N T(gray)/T(binary) = 2 N-1 /(2 N – 1) → 0.5 BitsT(binary)T(gray)T(gray)/T(binary) ∞
11/01/05ELEC / Lecture 175 Bus Encoding Example: Four bit bus 0000→1110 has three transitions. If bits of second pattern are inverted, then 0000→0001 will have only one transition. Bit-inversion encoding for N-bit bus: Number of bit transitions 0 N/2N N N/2 0 Number of bit transitions after inversion encoding
11/01/05ELEC / Lecture 176 Bus-Inversion Encoding Logic Polarity decision logic Sent data Received data Bus register Polarity bit M. Stan and W. Burleson, “Bus-Invert Coding for Low Power I/O,” IEEE Trans. VLSI Systems, vol. 3, no. 1, pp , March 1995.
11/01/05ELEC / Lecture 177 FSM State Encoding Expected number of state-bit transitions: 2( ) + 1( ) = 1.61( ) + 2(0.1) = 1.0 Transition probability based on PI statistics State encoding can be selected using a power-based cost function.
11/01/05ELEC / Lecture 178 FSM: Clock-Gating Moore machine: Outputs depend only on the state variables. –If a state has a self-loop in the state transition graph (STG), then clock can be stopped whenever a self-loop is to be executed. Sj Si Sk Xi/Zk Xk/Zk Xj/Zk Clock can be stopped when (Xk, Sk) combination occurs.
11/01/05ELEC / Lecture 179 Clock-Gating in Moore FSM Combinational logic Latch Clock activation logic Flip-flops PI CK PO L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.
11/01/05ELEC / Lecture 1710 Clock-Gating in Low-Power Flip-Flop D Q D CK
11/01/05ELEC / Lecture 1711 Low-Power Datapath Architecture Lower supply voltage –This slows down circuit speed –Use parallel computing to gain the speed back Works well when threshold voltage is also lowered. About 60% reduction in power obtainable. Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.
11/01/05ELEC / Lecture 1712 A Reference Datapath Combinational logic Output Input Register CK Supply voltage= V ref Total capacitance switched per cycle= C ref Clock frequency= f Power consumption:P ref = C ref V ref 2 f C ref
11/01/05ELEC / Lecture 1713 A Parallel Architecture Comb. Logic Copy 1 Comb. Logic Copy 2 Comb. Logic Copy N Register N to 1 multiplexer Multiphase Clock gen. and mux control Input Output CK f f/N A copy processes every Nth input, operates at reduced voltage Supply voltage: V N ≤ V 1 = V ref N = Deg. of parallelism
11/01/05ELEC / Lecture 1714 Control Signals, N = 4 CK Phase 1 Phase 2 Phase 3 Phase 4
11/01/05ELEC / Lecture 1715 Power P N =P proc + P overhead P proc =N(C inreg +C comb )V N 2 f/N + C outreg V N 2 f =(C inreg +C comb +C outreg )V N 2 f =C ref V N 2 f P overhead =C overhead V N 2 f≈ δC ref (N – 1)V N 2 f P N = [1 + δ(N – 1)]C ref V N 2 f P N V N 2 ──= [1 + δ(N – 1)] ─── P 1 V ref 2
11/01/05ELEC / Lecture 1716 Voltage vs. Speed C L V ref C L V ref Delay of a gate, T ≈ ──── = ────────── Ik(W/L)(V ref – V t ) 2 whereI is saturation current k is a technology parameter W/L is width to length ratio of transistor V t is threshold voltage Supply voltage Normalized gate delay, T VtVt V ref =5VV 2 =2.9V N=1 N=2 V3V3 N=3 1.2μ CMOS Voltage reduction slows down as we get closer to V t
11/01/05ELEC / Lecture 1717 Increasing Multiprocessing P N /P V t =0V (extreme case) V t =0.4V V t =0.8V N 1.2μ CMOS, V ref = 5V
11/01/05ELEC / Lecture 1718 Extreme Case: V t = 0 Delay, T α 1/ V ref For N processing elements, delay = NT → V N = V ref /N P N 1 ──=[1+ δ (N – 1)] ──→1/N P 1 N 2 For negligible overhead, δ→0 P N 1 ──≈── P 1 N 2 For V t > 0, power reduction is less and there will be an optimum value of N.
11/01/05ELEC / Lecture 1719 Reduced-Power Shift Register D Q D CK(f/2) multiplexer Output Flip-flops are operated at full voltage and half the clock frequency.
11/01/05ELEC / Lecture 1720 Power Consumption of Shift Reg. P = C’V DD 2 f/n Degree of parallelism, n Normalized power Deg. Of parallelism Freq (MHz) Power (μW) bit shift register, 2μ CMOS C. Piguet, “Circuit and Logic Level Design,” pages in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Boston: Kluwer Academic Publishers, 1997.
11/01/05ELEC / Lecture 1721 Multicore Processors D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp , May A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 5, no. 7, pp , July 2005; this special issue contains three more articles on multicore processors.
11/01/05ELEC / Lecture 1722 Multicore Processors Performance based on SPECint2000 and SPECfp2000 benchmarks Multicore Single core Computer, May 2005, p. 12