Copyright Agrawal, 2007. ELEC6270 Spring 15, Lecture 9
ELEC 5270/6270 Spring 2015
Low-Power Design of Electronic Circuits
Memory and Multicore Design
Vishwani D. Agrawal, James J. Danaher Professor
Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL
Memory Architecture
[Figure: N words (Word 0 to Word N-1) of M bits each, built from storage cells, with select signals S0 to SN-1 and an M-bit input-output port. A decoder with K address lines A0 to AK-1 generates the select signals.]
$K = \log_2 N$, $N = 2^K$
Memory Organization
[Figure: array of storage cells with word lines and bit lines, holding $N = 2^K$ M-bit words. The row decoder takes the (K - L)-bit row address $A_L, A_{L+1}, \ldots, A_{K-1}$ and selects one of $2^{K-L}$ rows of $M \cdot 2^L$ bits. The column decoder takes the L-bit column address $A_0, \ldots, A_{L-1}$. Sense amplifiers/drivers connect to the M-bit input-output.]
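The row/column split described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `split_address` and `array_shape` are hypothetical, and the sketch assumes the slide's labeling, with the low L address bits going to the column decoder and the high K - L bits to the row decoder.

```python
def split_address(addr: int, k: int, l: int) -> tuple[int, int]:
    """Split a K-bit word address into (row, column) fields.

    Assumes A0..A(L-1) are the column address bits and
    A(L)..A(K-1) are the row address bits, as labeled on the slide.
    """
    if not 0 <= l <= k:
        raise ValueError("need 0 <= L <= K")
    if not 0 <= addr < (1 << k):
        raise ValueError("address does not fit in K bits")
    column = addr & ((1 << l) - 1)  # low L bits: one of 2**L words in a row
    row = addr >> l                 # high K-L bits: one of 2**(K-L) rows
    return row, column


def array_shape(k: int, l: int, m: int) -> tuple[int, int]:
    """Rows and bit columns of the cell array: 2**(K-L) by M * 2**L."""
    return 1 << (k - l), m * (1 << l)
```

For example, `split_address(0b101101, 6, 2)` separates the row field `0b1011` from the column field `0b01`, and `array_shape(10, 3, 8)` describes an array whose rows times columns equals the total of $2^{10} \times 8$ bits.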
An SRAM Cell
[Figure: SRAM cell schematic; labels: bit, VDD, WL, BL.]
Read Operation
[Figure: SRAM cell; labels: bit, VDD, WL, BL.]
1. Precharge bit lines to VDD
2. WL = Logic 1
3. Sense amplifier converts BL swing to logic level
Precharge Circuit
[Figure: SRAM cell with precharge circuit; labels: bit, BL, WL, VDD, precharge signal PC, equalization device, differential sense amplifier.]
Reading 1 from Cell
[Timing diagram: precharge, WL, BL and sense amplifier output versus time; annotation: pulsed to save bit line charge.]
Write Operation, 1 → 0
[Figure: SRAM cell; labels: bit, VDD, WL, BL.]
1. Set BL = 0 and the complementary bit line $\overline{BL}$ = 1
2. WL = 1
Cell Array Power Management
- Smaller transistors
- Low supply voltage
- Lower voltage swing (0.1V – 0.3V for SRAM)
- Sense amplifier restores the full voltage swing for outside use.
- Power-down and sleep modes
Sense Amplifier
[Figure: sense amplifier schematic; labels: bit lines, VDD, SE or CLK, full voltage swing output.]
Sense amplifier enable (SE or CLK): low when bit lines are precharged and equalized.
Sense Amplifier: Precharge
[Figure: bit lines at 1, SE = 0; transistor states annotated ON/OFF.]
Sense Amplifier: Reading 0
[Figure: bit = 1 – ∆, complementary bit line = 1, SE = 1; node values 1 and 0; transistor states annotated ON/OFF.]
Sense Amplifier: Reading 1
[Figure: bit = 1, complementary bit line = 1 – ∆, SE = 1; node values 0 and 1; transistor states annotated ON/OFF.]
Block-Oriented Architecture
- A single cell array may contain 64 Kbits to 256 Kbits.
- Larger arrays become slow and consume more power.
- Larger memories are block oriented.
Hierarchical Organization
[Figure: Block 0, Block 1, ..., Block P-1 on a global data bus with global amplifier/driver and I/O; control circuitry and block selector; inputs: row address, column address, block address.]
Power Saving
- Block-oriented memory
  - Lengths of local word and bit lines are kept small.
  - Block address is used to activate the addressed block.
- Unaddressed blocks are put in power-saving mode:
  - Sense amplifier and row/column decoders are disabled.
  - Cell array is put in power-saving mode.
Static Power
[Plot: leakage current (amperes, axis from 100n to 1.3μ) versus supply voltage for an 8-kbit SRAM in 0.18μ CMOS and 0.13μ CMOS; annotation: 7x increase.]
Power Saving Modes
- Power-down: Disconnect supply. Data is not retained and must be refreshed before use. Example: caches.
- Increasing thresholds by body biasing: Negative bias on nonactive cells reduces leakage.
- Sleep mode:
  - Insert resistance in leakage path; retain data.
  - Lower supply voltage.
Adding Resistance in Leakage Path
[Figure: SRAM cells between internal rails VDD.int and VSS.int, connected to VDD and GND through low-threshold transistors controlled by the sleep signal.]
Lowering Supply Voltage
[Figure: SRAM cells between GND and a supply switched between VDD and VDDL by the sleep signal.]
- VDDL ≥ 100mV for 0.13μ CMOS
- Sleep = 1: data retention mode
Parallelization of Memories
[Figure: Mem 1 holds instr. A, instr. C, instr. E, ...; Mem 2 holds instr. B, instr. D, instr. F, ...; each memory runs at f/2 and a MUX (select 0/1) combines their outputs.]
Power $= C' \cdot (f/2) \cdot V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.
References
- K. Itoh, VLSI Memory Chip Design, Springer-Verlag.
- J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003, Chapter 12.
- S.-M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits Analysis and Design, New York: McGraw-Hill, 1996, Chapter 10.
Low-Power Datapath Architecture
- Lower supply voltage
  - This slows down circuit speed
  - Use parallel computing to gain the speed back
- Works well when threshold voltage is also lowered.
- About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.
A Reference Datapath
[Figure: input, register clocked by CK, combinational logic, output; capacitance $C_{ref}$.]
- Supply voltage $= V_{ref}$
- Total capacitance switched per cycle $= C_{ref}$
- Clock frequency $= f$
- Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N) with registers, an N to 1 multiplexer, and a multiphase clock generator with mux control; input and output clocked by CK at f, copies at f/N.]
- Each copy processes every Nth input and operates at reduced voltage.
- Supply voltage: $V_N \le V_1 = V_{ref}$
- N = degree of parallelism
Level Converter: L to H
[Figure: level converter schematic; labels: Vin_L, Vout_H, VDDH, VDDL; annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Addison-Wesley, 2005.
Level Converter: Input 0
[Figure: Vin_L = 0; internal nodes at 1L, 0 and VDDH; transistors annotated short/open; labels: Vout_H, VDDH, VDDL.]
Level Converter: Input 1
[Figure: Vin_L = 1L; internal nodes at 0, VDDH and 0; transistors annotated short/open; labels: Vout_H, VDDH, VDDL.]
DVF4: Dual $V_{TH}$ Feedback Type 4-Transistor Level Converter
[Figure: four-transistor level converter; labels: Vin_L, Vout_H, VDDH, VDDL, high-$V_{TH}$ transistor.]
References for DVF4
- K. N. Jayaraman, “DVF4: A Dual Vth Feedback Based 4-Transistor Level Converter,” Master’s thesis, Auburn University, Dec.
- K. N. Jayaraman and V. D. Agrawal, “A Four-Transistor Level Converter for Dual-Voltage Low-Power Design,” J. Low Power Electronics, vol. 10, no. 4, pp. 617–628, Dec.
Level Converter: H to L
[Figure: level converter schematic; labels: Vin_H, Vout_L, VDDL; annotation: transistors with thicker oxide and longer channels.]
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Addison-Wesley, 2005.
Control Signals, N = 4
[Timing diagram: CK and the four derived clocks Phase 1, Phase 2, Phase 3, Phase 4.]
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta\, C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)]\, C_{ref} V_N^2 f$
$\dfrac{P_N}{P_1} = [1 + \delta (N - 1)]\, \dfrac{V_N^2}{V_{ref}^2}$
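As a quick numerical check of the ratio $P_N/P_1$, the expression can be evaluated directly. The sketch below is illustrative only; the function name `power_ratio` is hypothetical, and the overhead fraction δ used in the example call is an assumed value, not one given on this slide.

```python
def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta * (N - 1)] * (V_N / V_ref) ** 2.

    n      degree of parallelism N
    v_n    reduced supply voltage V_N of each copy
    v_ref  reference supply voltage (N = 1)
    delta  overhead capacitance per added copy, as a fraction of C_ref
    """
    if n < 1:
        raise ValueError("N must be at least 1")
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


# Example with an assumed 10% overhead per added copy (delta = 0.1):
ratio = power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.1)
```

With N = 1 and $V_N = V_{ref}$ the ratio is 1 by construction, which is a convenient sanity check on any variant of this formula.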
Alpha-Power Law Model
Variation of delay with supply voltage:
delay $\propto \dfrac{V_{DD}}{(V_{DD} - V_{TH})^{\alpha}}$
- $V_{TH}$ = threshold voltage
- α = 1 for short-channel devices, ≈ 2 for long-channel devices
References:
- T. Sakurai and A. R. Newton, “Delay analysis of series-connected MOSFET circuits,” IEEE Journal of Solid-State Circuits, Vol. 26, pp. 122–131, Feb.
- T. Sakurai and A. R. Newton, “A simple MOSFET model for circuit analysis,” IEEE Transaction on Electron Devices, Vol. 38, No. 4, pp. 887–894, Apr.
- T. Sakurai, “High-speed circuit design with scaled-down MOSFETs and low supply voltage (invited),” Proc. IEEE ISCAS, pp. 1487–1490, Chicago, May.
- T. Sakurai, “Alpha-Power Law MOS Model,” IEEE Solid-State Circuits Society Newsletter, Vol. 9, No. 4, pp. 4–5, Oct.
Voltage vs. Speed
Circuit delay, $T \approx \dfrac{k\, V_{ref}}{(V_{ref} - V_{TH})^2}$, where k is a technology constant and $V_{TH}$ is the threshold voltage.
[Plot: normalized gate delay T versus supply voltage for 1.2μ CMOS, with $V_{TH}$ marked on the voltage axis; marked points: $V_{ref}$ = 5V (N = 1), $V_2$ = 2.9V (N = 2), $V_3$ (N = 3).]
Voltage reduction slows down as we get closer to $V_{TH}$.
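The supply voltage $V_N$ at which one copy is allowed to be N times slower can be found numerically from the delay model. The sketch below uses bisection and is only an illustration: the names `gate_delay` and `scaled_supply` are hypothetical, the constant k is dropped because it cancels, and the threshold voltage is left as a parameter since the plot does not state the value it was drawn for.

```python
def gate_delay(v: float, v_th: float, alpha: float = 2.0) -> float:
    """Delay up to the constant k: V / (V - V_TH) ** alpha."""
    return v / (v - v_th) ** alpha


def scaled_supply(n: int, v_ref: float, v_th: float, alpha: float = 2.0) -> float:
    """Supply V_N at which the delay is N times the delay at v_ref.

    Assumes the delay falls strictly as V rises (true for alpha > 1,
    or alpha = 1 with v_th > 0), so bisection on (v_th, v_ref] applies.
    """
    if n < 1 or not 0.0 <= v_th < v_ref:
        raise ValueError("need N >= 1 and 0 <= V_TH < V_ref")
    target = n * gate_delay(v_ref, v_th, alpha)
    lo, hi = v_th, v_ref  # delay grows without bound as V approaches V_TH
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_th, alpha) > target:
            lo = mid  # too slow at this voltage: move up
        else:
            hi = mid  # fast enough: try a lower voltage
    return hi
```

For $V_{TH} = 0$ and α = 2 the delay is proportional to 1/V, so `scaled_supply(2, 5.0, 0.0)` converges to $V_{ref}/2$. For $V_{TH} > 0$ the returned voltages shrink more and more slowly as N grows, which is the behavior the slide's closing remark describes.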
Increasing Multiprocessing
[Plot: normalized power $P_N/P_1$ versus degree of parallelism N for $V_{TH}$ = 0V (extreme case), $V_{TH}$ = 0.4V and $V_{TH}$ = 0.8V; 1.2μ CMOS, $V_{ref}$ = 5V.]
Extreme Cases: $V_{TH} = 0$
Delay, $T \propto 1/V_{ref}$
For N processing elements, the allowed delay is NT, so $V_N = V_{ref}/N$ and
$\dfrac{P_N}{P_1} = [1 + \delta (N - 1)]\, \dfrac{1}{N^2}$, which falls off as $1/N$ (approaching $\delta/N$) for large N.
For negligible overhead, $\delta \to 0$:
$\dfrac{P_N}{P_1} \approx \dfrac{1}{N^2}$
For $V_{TH} > 0$, power reduction is less and there will be an optimum value of N.
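The existence of an optimum N for $V_{TH} > 0$ can be illustrated by sweeping N. The sketch below assumes the α = 2 delay model, for which $N\,V_{ref}/(V_{ref} - V_{TH})^2 = V_N/(V_N - V_{TH})^2$ is a quadratic in $V_N$ with a closed-form root. The function names are hypothetical, and δ = 0.1 in the usage lines is an assumed overhead value.

```python
import math


def supply_for_n(n: int, v_ref: float, v_th: float) -> float:
    """V_N for the alpha = 2 delay model (closed form)."""
    c = n * v_ref / (v_ref - v_th) ** 2
    # c * (V - V_TH)^2 = V  ->  c V^2 - (2 c V_TH + 1) V + c V_TH^2 = 0.
    # The roots multiply to V_TH^2, so the larger root is the one above V_TH.
    b = 2.0 * c * v_th + 1.0
    return (b + math.sqrt(4.0 * c * v_th + 1.0)) / (2.0 * c)


def power_ratio(n: int, v_ref: float, v_th: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    v_n = supply_for_n(n, v_ref, v_th)
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


def best_n(v_ref: float, v_th: float, delta: float, n_max: int) -> int:
    """Degree of parallelism in 1..n_max with the lowest power ratio."""
    return min(range(1, n_max + 1),
               key=lambda n: power_ratio(n, v_ref, v_th, delta))


# Usage with assumed delta = 0.1 and the slides' V_ref = 5 V:
n_opt_zero_vth = best_n(5.0, 0.0, 0.1, 32)  # keeps improving up to n_max
n_opt_high_vth = best_n(5.0, 0.8, 0.1, 32)  # interior optimum
```

With $V_{TH} = 0$ the ratio keeps falling as N grows, so the sweep returns its upper limit. With $V_{TH} = 0.8$V the supply cannot drop below the threshold while the overhead term keeps growing, so the minimum lies strictly inside the range.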
Example: Multiplier Core
- Specification:
  - 200MHz clock
  - 15W @ 5V
  - Low voltage operation, $V_{DD} \ge 1.5$ volts
  - For threshold voltage $V_t = 0.5$V, clock frequency $= \dfrac{(V_{DD} - 0.5)^2}{20.25\, V_{DD}}$ GHz, which gives 200MHz at $V_{DD} = 5$V
- Problem:
  - Integrate multiplier core on a SOC
  - Power budget for multiplier ~ 5W
A Multicore Design
[Figure: Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5 with registers, a 5 to 1 mux, and a multiphase clock generator with mux control; input and output clocked by CK at 200MHz, cores at 40MHz.]
Core clock frequency = 200/N MHz
How Many Cores?
For N cores:
- Clock frequency = 200/N MHz
- Supply voltage $V_{DDN}$ satisfies $(V_{DDN} - 0.5)^2 = 4.05\, V_{DDN}/N$, that is, $V_{DDN}^2 - (1 + 4.05/N)\, V_{DDN} + 0.25 = 0$
- Assuming 10% overhead per core, power dissipation $= 15\, [1 + 0.1 (N - 1)]\, \left( \dfrac{V_{DDN}}{5} \right)^2$ watts
Design Tradeoffs
[Table with columns: number of cores N; clock (MHz); core supply $V_{DDN}$ (volts); total power (watts).]
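The columns of this table can be tabulated from the multiplier-core specification and the "How Many Cores?" relations. The sketch below is a reconstruction under those stated relations, not the slide's own table; the function names are hypothetical, and the 1.5V floor is taken from the specification's low-voltage limit.

```python
import math

V_REF = 5.0        # volts, reference supply
P_REF = 15.0       # watts at V_REF and 200 MHz
F_REF_MHZ = 200.0  # reference clock
V_T = 0.5          # threshold voltage, volts
V_MIN = 1.5        # lowest allowed supply, volts
OVERHEAD = 0.10    # 10% overhead per added core


def core_supply(n: int) -> float:
    """Larger root of (V - V_T)^2 = k V / N, with k = (V_REF - V_T)^2 / V_REF."""
    k = (V_REF - V_T) ** 2 / V_REF          # 4.05 for the slide's numbers
    b = 2.0 * V_T + k / n                   # 1 + 4.05/N
    return (b + math.sqrt(b * b - 4.0 * V_T ** 2)) / 2.0


def total_power(n: int) -> float:
    """15 [1 + 0.1 (N - 1)] (V_DDN / 5)^2 watts."""
    return P_REF * (1.0 + OVERHEAD * (n - 1)) * (core_supply(n) / V_REF) ** 2


def tradeoff_rows(n_max: int) -> list[tuple[int, float, float, float]]:
    """(N, core clock in MHz, V_DDN in volts, total power in watts)."""
    rows = []
    for n in range(1, n_max + 1):
        v = core_supply(n)
        if v < V_MIN:  # below the low-voltage limit of the specification
            break
        rows.append((n, F_REF_MHZ / n, v, total_power(n)))
    return rows


for n, f_mhz, v, p in tradeoff_rows(10):
    print(f"N={n}  clock={f_mhz:.1f} MHz  VDDN={v:.2f} V  power={p:.2f} W")
```

The N = 1 row reproduces the 5V, 15W reference point, and the first row whose power falls at or under the 5W budget identifies the smallest number of cores that meets it under these assumptions.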
Power Reduction in Processors
- Just about everything is used.
- Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
- Architecture:
  - Instruction set
  - Hardware organization
- Software methods
Parallel Architecture
[Figure: a single processor clocked at f, and two processors each clocked at f/2 with input and output at f.]
- Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
- Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396\, CV^2 f$
Pipeline Architecture
[Figure: a single processor with register clocked at f, and a pipeline of two ½ Proc. stages separated by registers, also clocked at f.]
- Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2 f$
- Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432\, CV^2 f$
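The power figures quoted for the parallel and pipelined versions follow from scaling each factor of $CV^2 f$. A minimal check, using a hypothetical helper `relative_power` and the scale factors quoted on the two slides:

```python
def relative_power(cap_factor: float, volt_factor: float, freq_factor: float) -> float:
    """Power relative to the reference C V^2 f when C, V and f are scaled."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Two parallel processors: 2.2C, 0.6V, 0.5f
parallel = relative_power(2.2, 0.6, 0.5)

# Two-stage pipeline: 1.2C, 0.6V, f
pipeline = relative_power(1.2, 0.6, 1.0)

print(f"parallel: {parallel:.3f} CV^2f, pipeline: {pipeline:.3f} CV^2f")
```

Both options rely on the same voltage reduction; the parallel version pays for it in capacitance (area) and recovers through the halved clock, while the pipeline keeps the clock and adds only register capacitance.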
Approximate Trend
              n-parallel proc.    n-stage pipeline proc.
Capacitance   nC                  C
Voltage       V/n                 V/n
Frequency     f/n                 f
Power         CV^2 f/n^2          CV^2 f/n^2
Chip area     n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Springer, 1998.
Multicore Processors
[Plot: performance of multicore versus single core processors, based on SPECint2000 and SPECfp2000 benchmarks.]
Source: Computer, May 2005, p. 12
Multicore Processors
- D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, May 2005.
- A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, July 2005; this special issue contains three more articles on multicore processors.
- S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, January 2006.
Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops
Cell’s Nine-Processor Chip
[Chip photo. © IEEE Spectrum, January 2006]
Eight Identical Processors: f = 5.6GHz (max), 44.8 Gflops