1
Copyright Agrawal & Srivaths, 2007 Low-Power Design and Test, Lecture 6
Low-Power Design and Test: Memory and Multicore Design
Vishwani D. Agrawal, Auburn University, USA (vagrawal@eng.auburn.edu)
Srivaths Ravi, Texas Instruments India (Srivaths.ravi@ti.com)
Hyderabad, July 30-31, 2007
http://www.eng.auburn.edu/~vagrawal/hyd.html
2
Memory Architecture
[Figure: N words (Word 0 to Word N-1), each a row of M storage cells, sharing an M-bit input-output. Select lines $S_0$ to $S_{N-1}$ pick one word; in the decoded version, a decoder driven by k address lines $A_0$ to $A_{k-1}$ generates the select lines, with $k = \log_2 N$.]
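A minimal Python sketch of the decoder relation $k = \log_2 N$: a behavioral k-to-N decoder that raises exactly one select line. The function names `address_lines` and `decode` are hypothetical, chosen only for illustration.

```python
def address_lines(n_words: int) -> int:
    """Return k = log2(N), the number of address lines for an N-word memory."""
    if n_words <= 0 or n_words & (n_words - 1):
        raise ValueError("N must be a positive power of two")
    return n_words.bit_length() - 1


def decode(address: int, k: int) -> list[int]:
    """k-to-2^k decoder: drive exactly one select line S_i high (one-hot)."""
    n_words = 1 << k
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


print(address_lines(1024))   # a 1024-word memory needs 10 address lines
print(decode(2, 2))          # select lines S_0..S_3 for address 2
```

Without the decoder, the array would need N separate select inputs; with it, only k address pins are needed.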
3
Memory Organization
[Figure: a storage-cell array of $2^{L-K}$ rows by $M \cdot 2^K$ columns. A row decoder driven by address bits $A_K$ to $A_{L-1}$ activates one word line; a column decoder driven by $A_0$ to $A_{K-1}$ selects M of the bit lines, which connect through the sense amplifiers/drivers to the M-bit input-output.]
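A sketch of how an L-bit address could be split between the two decoders, assuming (as read from the figure) that the low bits $A_0 \ldots A_{K-1}$ go to the column decoder and the high bits $A_K \ldots A_{L-1}$ go to the row decoder. The helper names are hypothetical.

```python
def split_address(address: int, k: int, l: int) -> tuple[int, int]:
    """Split an L-bit address into (row, column).

    Assumed mapping: A_0..A_{K-1} drive the column decoder and
    A_K..A_{L-1} drive the row decoder.
    """
    if not 0 <= address < (1 << l):
        raise ValueError("address out of range")
    column = address & ((1 << k) - 1)   # picks one of the 2^K words sharing a row
    row = address >> k                  # picks one of the 2^(L-K) word lines
    return row, column


def array_shape(m: int, k: int, l: int) -> tuple[int, int]:
    """Return (rows, bit columns) = (2^(L-K), M * 2^K)."""
    return 1 << (l - k), m * (1 << k)


rows, cols = array_shape(m=8, k=2, l=6)
print(rows, cols, rows * cols)      # 16 rows x 32 columns = 512 cells = 64 words x 8 bits
print(split_address(45, k=2, l=6))  # row 11, column 1
```

Folding several words into one row keeps the array closer to square, which shortens both word lines and bit lines.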
4
An SRAM Cell
[Figure: SRAM storage cell powered from VDD, with word line WL and complementary bit lines BL and BL-bar.]
5
Read Operation
[Figure: SRAM cell with VDD, word line WL and bit lines BL and BL-bar.]
1. Precharge the bit lines BL and BL-bar to VDD.
2. Set WL = logic 1.
3. The sense amplifier converts the small bit-line swing into a full logic level.
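The three read steps can be mimicked with a toy behavioral model. The class name `SramBitPair`, the 1.2 V supply and the 0.2 V bit-line swing are illustrative assumptions, not values from the lecture; a real bit line discharges gradually rather than by a fixed step.

```python
from dataclasses import dataclass


@dataclass
class SramBitPair:
    """Toy model of one SRAM cell on a bit-line pair (illustrative values only)."""
    stored: int             # bit held in the cell: 1 or 0
    vdd: float = 1.2        # assumed supply voltage (V)
    swing: float = 0.2      # assumed bit-line swing developed before sensing (V)
    bl: float = 0.0
    bl_bar: float = 0.0

    def precharge(self) -> None:
        # Step 1: precharge both bit lines to VDD.
        self.bl = self.bl_bar = self.vdd

    def raise_word_line(self) -> None:
        # Step 2: WL = 1. The cell pulls down the bit line on its 0 side:
        # BL-bar when it stores 1, BL when it stores 0.
        if self.stored:
            self.bl_bar -= self.swing
        else:
            self.bl -= self.swing

    def sense(self) -> int:
        # Step 3: the sense amplifier resolves the small difference to a logic level.
        return 1 if self.bl > self.bl_bar else 0


def read(cell: SramBitPair) -> int:
    cell.precharge()
    cell.raise_word_line()
    return cell.sense()


print(read(SramBitPair(stored=1)), read(SramBitPair(stored=0)))
```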
6
Precharge Circuit
[Figure: SRAM cell on bit lines BL and BL-bar; precharge transistors controlled by PC connect both bit lines to VDD, and a differential sense amplifier is connected across the bit-line pair.]
7
Reading 1 from Cell
[Figure: timing of the precharge interval, WL, BL and the sense amplifier output, annotated "pulsed to save bit-line charge".]
8
Write Operation, bit = 1 → 0
[Figure: SRAM cell with VDD, word line WL and bit lines BL and BL-bar driven to 0 and 1.]
1. Set BL = 0, BL-bar = 1.
2. Set WL = 1.
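A logic-level sketch of the two write steps. The function `write` is hypothetical and assumes the bit-line drivers always overpower the cell once WL = 1, ignoring transistor sizing and timing.

```python
def write(stored: int, bl: int, bl_bar: int, wl: int) -> int:
    """Return the cell contents after one write attempt (toy logic-level model).

    Assumption: when WL = 1 the bit-line drivers are strong enough to overpower
    the cell, so the stored bit follows BL. With WL = 0 the cell is isolated.
    """
    if bl == bl_bar:
        raise ValueError("BL and BL-bar must be driven to complementary levels")
    return bl if wl else stored


cell = 1
cell = write(cell, bl=0, bl_bar=1, wl=0)   # step 1: drive BL = 0, BL-bar = 1 (WL still 0)
cell = write(cell, bl=0, bl_bar=1, wl=1)   # step 2: WL = 1, the cell flips to 0
print(cell)
```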
9
Cell Array Power Management
Smaller transistors.
Low supply voltage.
Lower bit-line voltage swing (0.1 V to 0.3 V for SRAM).
The sense amplifier restores the full voltage swing for outside use.
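Why the lower bit-line swing helps: recharging a bit line after a swing $\Delta V$ draws energy $C_{BL} V_{DD} \Delta V$ from the supply, versus $C_{BL} V_{DD}^2$ for a full swing. The sketch evaluates that ratio over the 0.1 V to 0.3 V range; the 200 fF bit-line capacitance and 1.2 V supply are assumed values, not from the lecture.

```python
def bitline_energy(c_bl: float, vdd: float, swing: float) -> float:
    """Supply energy to restore a bit line after a read.

    Precharge returns charge C_BL * swing to the line from the VDD supply,
    so E = C_BL * VDD * swing (a full swing gives C_BL * VDD^2).
    """
    return c_bl * vdd * swing


C_BL = 200e-15   # assumed bit-line capacitance: 200 fF
VDD = 1.2        # assumed supply voltage (V)
full = bitline_energy(C_BL, VDD, VDD)
for swing in (0.1, 0.2, 0.3):
    e = bitline_energy(C_BL, VDD, swing)
    print(f"{swing:.1f} V swing: {e * 1e15:.0f} fJ, {e / full:.0%} of full swing")
```

The saving is simply $\Delta V / V_{DD}$, which is why the sense amplifier is worth its area.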
10
Sense Amplifier
[Figure: sense amplifier across bit and bit-bar, powered from VDD and enabled by SE, producing a full-voltage-swing output.]
SE (sense amplifier enable) is low while the bit lines are precharged and equalized.
11
Block-Oriented Architecture
A single cell array may contain 64 Kbits to 256 Kbits. Larger arrays become slow and consume more power, so larger memories are organized into blocks.
12
Hierarchical Organization
[Figure: Block 0 to Block P-1 connected through a global data bus and a global amplifier/driver to the I/O; control circuitry with a block selector decodes the block address, while the row and column addresses go to the blocks.]
13
Power Saving
Block-oriented memory:
Local word lines and bit lines are kept short.
The block address activates only the addressed block.
Unaddressed blocks are put in power-saving mode: their sense amplifiers and row/column decoders are disabled.
Power is maintained to the cells for data retention.
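A sketch of block selection, assuming a hypothetical [block | row | column] address layout with the block address in the most significant bits. Only the addressed block gets its decoders and sense amplifiers enabled; the others stay powered for data retention.

```python
def split_block_address(address: int, row_bits: int, col_bits: int) -> tuple[int, int, int]:
    """Assumed address layout: [block | row | column], most significant first."""
    column = address & ((1 << col_bits) - 1)
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    block = address >> (row_bits + col_bits)
    return block, row, column


def block_enables(address: int, n_blocks: int, row_bits: int, col_bits: int) -> list[bool]:
    """Enable decoders and sense amplifiers only in the addressed block.

    Disabled blocks stay powered for data retention; only their peripheral
    circuits are switched off.
    """
    block, _, _ = split_block_address(address, row_bits, col_bits)
    if block >= n_blocks:
        raise ValueError("block address out of range")
    return [b == block for b in range(n_blocks)]


print(split_block_address(0b10_0011_01, row_bits=4, col_bits=2))   # (2, 3, 1)
print(block_enables(0b10_0011_01, n_blocks=4, row_bits=4, col_bits=2))
```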
14
Static Power
[Figure: leakage current (amperes, 100 nA to 1.3 μA) versus supply voltage (0.0 to 1.8 V) for an 8-kbit SRAM in 0.13μ and 0.18μ CMOS, annotated "7x increase".]
15
Adding Resistance in Leakage Path
[Figure: SRAM cell arrays powered from internal rails VDD.int and VSS.int, with a sleep-controlled low-threshold transistor inserted between the internal rails and the external VDD/GND supply.]
16
Lowering Supply Voltage
[Figure: SRAM cell arrays whose supply is switched by the sleep signal between VDD and a lower voltage VDDL; GND is shared.]
VDDL = 100 mV for 0.13μ CMOS.
Sleep = 1 selects data retention mode.
17
Parallelization of Memories
[Figure: instructions A, C, E, ... stored in Mem 1 and instructions B, D, F, ... stored in Mem 2, each memory clocked at f/2; a 2-to-1 multiplexer (inputs 0 and 1, switched at f/2) selects the output.]
Power = $C' \frac{f}{2} V_{DD}^2$
C. Piguet, "Circuit and Logic Level Design," pp. 124-125 in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.
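A sketch of the interleaved fetch shown in the figure: even-numbered instructions in Mem 1, odd-numbered in Mem 2, each bank read only every other cycle (f/2) while the multiplexer alternates. The function name is hypothetical, and the model ignores access latency.

```python
def interleaved_fetch(program: list[str]) -> list[str]:
    """Fetch a program stored in two interleaved banks.

    Mem 1 holds instructions 0, 2, 4, ... (A, C, E in the figure) and Mem 2
    holds 1, 3, 5, ... (B, D, F). Each bank is read on every other cycle, so it
    only needs to run at f/2; the 2-to-1 multiplexer alternates between them
    to deliver one instruction per cycle at f.
    """
    mem1, mem2 = program[0::2], program[1::2]
    fetched = []
    for cycle in range(len(program)):
        bank = mem1 if cycle % 2 == 0 else mem2   # mux select toggles each cycle
        fetched.append(bank[cycle // 2])          # each bank advances every 2 cycles
    return fetched


print(interleaved_fetch(["A", "B", "C", "D", "E", "F"]))
```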
18
References
K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003.
19
Low-Power Datapath Architecture
Lower the supply voltage.
This slows down the circuit.
Use parallel computing to regain the speed.
This works well when the threshold voltage is also lowered.
About 60% reduction in power is obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
20
A Reference Datapath
[Figure: input register, combinational logic and output register, clocked by CK; $C_{ref}$ is the total switched capacitance.]
Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$
21
A Parallel Architecture
[Figure: N copies of the combinational logic (Copy 1 to Copy N), each clocked at f/N from the input, feeding an N-to-1 multiplexer and an output register clocked at f (CK); a multiphase clock generator provides the mux control.]
Each copy processes every Nth input and operates at a reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$, where N is the degree of parallelism.
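A behavioral sketch of the round-robin scheme, in which copy j handles inputs j, j + N, j + 2N, and so on. `parallel_datapath` is a hypothetical name; timing and voltage are not modeled, only the ordering that the multiplexer must preserve.

```python
from collections.abc import Callable, Sequence


def parallel_datapath(inputs: Sequence[int], logic: Callable[[int], int], n: int) -> list[int]:
    """Behavioral model of N copies of the combinational logic.

    Copy j handles inputs j, j + N, j + 2N, ..., so each copy receives a new
    operand only every N cycles (rate f/N). The N-to-1 multiplexer then picks
    copy (i mod N) on cycle i, restoring one result per cycle at rate f.
    """
    results_per_copy: list[list[int]] = [[] for _ in range(n)]
    for i, x in enumerate(inputs):
        results_per_copy[i % n].append(logic(x))
    return [results_per_copy[i % n][i // n] for i in range(len(inputs))]


def square(x: int) -> int:
    return x * x


print(parallel_datapath(range(10), square, n=3))
```

The output sequence matches a single fast datapath; what changes is that each copy now has N clock periods to finish, which is the slack that permits a lower $V_N$.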
22
Level Converter: L to H
[Figure: level converter from a low-swing input Vin_L to a high-swing output Vout_H, using supplies VDDH and VDDL.]
Transistors with thicker oxide and longer channels.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.
23
Level Converter: H to L
[Figure: level converter from a high-swing input Vin_H to a low-swing output Vout_L, supplied by VDDL.]
Transistors with thicker oxide and longer channels.
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.
24
Control Signals, N = 4
[Figure: waveforms of CK and Phase 1 to Phase 4.]
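One way to generate the four phases, assuming each phase is high for one CK period out of every N and the phases are staggered so that exactly one is active per cycle. This is a hypothetical sketch; the waveforms in the figure may use a different duty cycle.

```python
def phase_enables(n: int, cycles: int) -> list[list[int]]:
    """Levels of Phase 1..Phase N on each CK cycle.

    Assumption: phase k is high for one CK period out of every N, staggered so
    that exactly one phase is active per cycle; each phase thus runs at f/N.
    """
    return [[1 if c % n == k else 0 for k in range(n)] for c in range(cycles)]


for c, levels in enumerate(phase_enables(4, 8)):
    print(f"CK cycle {c}: " + "  ".join(f"Phase {k + 1}={v}" for k, v in enumerate(levels)))
```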
25
Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f = C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
Here δ is the overhead capacitance added per extra copy, expressed as a fraction of $C_{ref}$.
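A numeric check of the derivation: the sketch computes $P_N$ from its parts and confirms that dividing by $P_1$ reproduces $[1 + \delta(N-1)] V_N^2 / V_{ref}^2$. The capacitance values and δ = 0.1 are arbitrary assumptions in arbitrary units.

```python
def p_proc(n, c_inreg, c_comb, c_outreg, v, f):
    """N copies of (input register + logic) at f/N plus one output register at f."""
    return n * (c_inreg + c_comb) * v**2 * (f / n) + c_outreg * v**2 * f


def p_total(n, c_inreg, c_comb, c_outreg, v, f, delta):
    """P_N = P_proc + P_overhead, with P_overhead ~ delta * C_ref * (N - 1) * V^2 * f."""
    c_ref = c_inreg + c_comb + c_outreg
    return p_proc(n, c_inreg, c_comb, c_outreg, v, f) + delta * c_ref * (n - 1) * v**2 * f


def power_ratio(n, v_n, v_ref, delta):
    """Closed form: P_N / P_1 = [1 + delta (N - 1)] V_N^2 / V_ref^2."""
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


caps = dict(c_inreg=1.0, c_comb=6.0, c_outreg=1.0)   # hypothetical capacitances
ratio = p_total(4, **caps, v=2.75, f=1.0, delta=0.1) / p_total(1, **caps, v=5.0, f=1.0, delta=0.1)
print(ratio, power_ratio(4, 2.75, 5.0, 0.1))
```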
26
Voltage vs. Speed
Delay of a gate: $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L)(V_{ref} - V_t)^2}$
where I is the saturation current, k is a technology parameter, W/L is the width-to-length ratio of the transistor, and $V_t$ is the threshold voltage.
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking $V_t$, $V_{ref}$ = 5 V (N = 1), $V_2$ = 2.9 V (N = 2) and $V_3$ (N = 3).]
The voltage reduction gained from each additional copy shrinks as the supply approaches $V_t$.
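Finding $V_N$ amounts to solving $T(V_N) = N \cdot T(V_{ref})$ with the delay model above. Since T decreases monotonically for $V > V_t$, bisection suffices. A sketch, with the constant factors ($C_L$, k, W/L) dropped because they cancel in the ratio; the threshold values are the $V_t$ values used in the following plot.

```python
def gate_delay(v: float, vt: float) -> float:
    """Normalized delay T ~ V / (V - Vt)^2 (C_L, k and W/L dropped)."""
    return v / (v - vt) ** 2


def scaled_voltage(n: int, v_ref: float, vt: float, tol: float = 1e-9) -> float:
    """Find V_N with gate_delay(V_N) = N * gate_delay(V_ref) by bisection.

    Delay decreases monotonically for V > Vt, so the root lies in (Vt, V_ref].
    """
    target = n * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-12, v_ref          # delay(lo) >= target >= delay(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid                    # still too slow: need a higher voltage
        else:
            hi = mid
    return 0.5 * (lo + hi)


for vt in (0.0, 0.4, 0.8):
    print(vt, [round(scaled_voltage(n, 5.0, vt), 2) for n in (1, 2, 3)])
```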
27
Increasing Multiprocessing
[Figure: $P_N/P_1$ (0.0 to 1.0) versus N (1 to 12) for $V_t$ = 0 V (extreme case), 0.4 V and 0.8 V; 1.2μ CMOS, $V_{ref}$ = 5 V.]
28
Extreme Cases: $V_t = 0$
Delay $T \propto 1/V_{ref}$.
For N processing elements, the allowed delay is NT, so $V_N = V_{ref}/N$ and
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls off as 1/N for large N.
For negligible overhead, δ → 0: $\frac{P_N}{P_1} \approx \frac{1}{N^2}$.
For $V_t > 0$, the power reduction is smaller, and there is an optimum value of N.
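Combining the delay model with the power ratio shows the optimum numerically. A sketch, assuming δ = 0.1 (the lecture does not fix δ here) and a search up to N = 32: with $V_t = 0$ and δ = 0 it reproduces $1/N^2$, while with $V_t = 0.8$ V the minimum of $P_N/P_1$ falls at an intermediate N. The function names are hypothetical.

```python
def gate_delay(v: float, vt: float) -> float:
    """Normalized delay T ~ V / (V - Vt)^2."""
    return v / (v - vt) ** 2


def scaled_voltage(n: int, v_ref: float, vt: float, tol: float = 1e-9) -> float:
    """Bisection for V_N such that gate_delay(V_N) = N * gate_delay(V_ref)."""
    target = n * gate_delay(v_ref, vt)
    lo, hi = vt + 1e-12, v_ref
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, vt) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_ratio(n: int, v_ref: float, vt: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    v_n = scaled_voltage(n, v_ref, vt)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


def best_n(v_ref: float, vt: float, delta: float, n_max: int = 32) -> int:
    """Degree of parallelism in 1..n_max that minimizes P_N / P_1."""
    return min(range(1, n_max + 1), key=lambda n: power_ratio(n, v_ref, vt, delta))


for vt in (0.0, 0.4, 0.8):
    n_opt = best_n(5.0, vt, delta=0.1)
    print(f"Vt = {vt} V: best N = {n_opt}, P_N/P_1 = {power_ratio(n_opt, 5.0, vt, 0.1):.3f}")
```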
29
Example: Multiplier Core
Specification: 200 MHz clock; 15 W dissipation at 5 V.
Low-voltage operation: $V_{DD} \ge 1.5$ V, with relative clock rate $= \frac{(V_{DD} - 0.5)^2}{20.25}$ (equal to 1 at 5 V, since $(5 - 0.5)^2 = 20.25$).
Problem: integrate the multiplier core on an SOC, where the power budget for the multiplier is about 5 W.
30
A Multicore Design
[Figure: multiplier cores 1 to 5, each clocked at 40 MHz by a multiphase clock generator, feeding a 5-to-1 multiplexer and an output register clocked at 200 MHz.]
Core clock frequency = 200/N MHz, so N should divide 200.
31
How Many Cores?
For N cores: clock frequency = 200/N MHz.
Supply voltage: $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts, so that the relative clock rate equals 1/N.
Assuming 10% overhead per core: power dissipation $= 15\,[1 + 0.1(N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts.
32
Design Tradeoffs
Number of cores N | Clock (MHz) | Core supply $V_{DDN}$ (V) | Total power (W)
1 | 200 | 5.00 | 15.0
2 | 100 | 3.68 | 8.94
4 | 50 | 2.75 | 5.90
5 | 40 | 2.51 | 5.29
8 | 25 | 2.10 | 4.50
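The table can be regenerated from the formulas of the previous slide. A sketch that also applies the $V_{DD} \ge 1.5$ V limit and the requirement that N divide 200, then picks the fewest cores meeting the 5 W budget. Small differences in the last digit (for example 2.09 V versus 2.10 V at N = 8) may come from rounding in the original.

```python
import math

F_MAX_MHZ, P_REF_W, V_REF, V_MIN = 200, 15.0, 5.0, 1.5
OVERHEAD = 0.1    # 10% overhead per additional core (the slide's assumption)
BUDGET_W = 5.0    # power budget for the multiplier


def core_voltage(n: int) -> float:
    # Relative clock rate (V - 0.5)^2 / 20.25 must equal 1/N.
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int) -> float:
    return P_REF_W * (1 + OVERHEAD * (n - 1)) * (core_voltage(n) / V_REF) ** 2


candidates = [n for n in range(1, F_MAX_MHZ + 1)
              if F_MAX_MHZ % n == 0 and core_voltage(n) >= V_MIN]
for n in candidates:
    print(f"N={n:3d}  clock={F_MAX_MHZ // n:3d} MHz  "
          f"VDDN={core_voltage(n):.2f} V  power={total_power(n):.2f} W")

fewest = min(n for n in candidates if total_power(n) <= BUDGET_W)
print("fewest cores within budget:", fewest)
```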
33
Power Reduction in Processors
Just about everything is used.
Hardware methods: voltage reduction for dynamic power; dual-threshold devices for leakage reduction; clock gating and frequency reduction; sleep mode.
Architecture: instruction set; hardware organization.
Software methods.
34
Parallel Architecture
[Figure: one processor clocked at f, versus two processors each clocked at f/2 sharing the same input and output.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2f$.
Two parallel processors: capacitance = 2.2C, voltage = 0.6V, frequency = 0.5f, power = $0.396\,CV^2f$.
35
Pipeline Architecture
[Figure: one processor clocked at f, versus the processor split into two halves separated by registers, still clocked at f.]
Single processor: capacitance = C, voltage = V, frequency = f, power = $CV^2f$.
Two-stage pipeline: capacitance = 1.2C, voltage = 0.6V, frequency = f, power = $0.432\,CV^2f$.
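The power figures of the parallel and pipelined examples follow from $P = CV^2f$ applied to the scaled capacitance, voltage and frequency. A short check; the `designs` dictionary is just an illustrative layout.

```python
def relative_power(cap: float, volt: float, freq: float) -> float:
    """Dynamic power relative to the reference: (C'/C) (V'/V)^2 (f'/f)."""
    return cap * volt ** 2 * freq


designs = {
    "reference": (1.0, 1.0, 1.0),
    "2-way parallel": (2.2, 0.6, 0.5),
    "2-stage pipeline": (1.2, 0.6, 1.0),
}
for name, (c, v, f) in designs.items():
    print(f"{name}: {relative_power(c, v, f):.3f} CV^2f")
```

The pipeline saves less power here only because its voltage is not lowered further; its capacitance overhead (1.2C) is far smaller than the parallel version's (2.2C).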
36
Approximate Trend
Quantity | n-parallel proc. | n-stage pipeline proc.
Capacitance | nC | C
Voltage | V/n | V/n
Frequency | f/n | f
Power | $CV^2f/n^2$ | $CV^2f/n^2$
Chip area | n times | 10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
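The power row of the trend table follows from $P = CV^2f$ with each column's scaled values. A short derivation, under the table's idealized assumption that the supply can be scaled all the way to V/n:

```latex
P_{\text{parallel}} = (nC)\left(\frac{V}{n}\right)^{2}\frac{f}{n} = \frac{CV^{2}f}{n^{2}},
\qquad
P_{\text{pipeline}} = C\left(\frac{V}{n}\right)^{2} f = \frac{CV^{2}f}{n^{2}}
```

For n = 2 this gives $0.25\,CV^2f$, below the $0.396\,CV^2f$ and $0.432\,CV^2f$ of the parallel and pipeline examples, which assume only a 0.6V supply and extra capacitance.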
37
Multicore Processors
[Figure: performance of multicore versus single-core processors from 2000 to 2008, based on SPECint2000 and SPECfp2000 benchmarks. Source: Computer, May 2005, p. 12.]
38
Multicore Processors
D. Geer, "Chip Makers Turn to Multicore Processors," Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, "Multiprocessor Systems-on-Chips," Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, "Winner Multimedia Monster – Cell's Nine Processors Make It a Supercomputer on a Chip," IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.
39
Cell: Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006.]
Nine-processor chip: 192 Gflops.
40
Cell's Nine-Processor Chip
[Figure © IEEE Spectrum, January 2006.]
Eight identical processors, f = 5.6 GHz (max), 44.8 Gflops.