Sp09 CMPEN 411 L23 S.1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 23: Memory Cell Designs SRAM, DRAM [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
S.2 Heads-up
- Talk by Kerry Bernstein (IBM): Thursday, 4 PM, IST 333
  - To prepare for his talk, go to the ANGEL system and find the file “New dimensions in performance” under “Interesting Reading Materials”
- To make up the last cancelled lecture:
  - Kerry Bernstein’s talk “Microarchitecture’s Race for Performance and Power” (PSU talk, 11/2004); slides and videos are online in the ANGEL system under “Interesting Reading Materials”
- DAC Young Student Scholarship
S.3 Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - ROM, caches (SRAMs), CAM, DRAMs, buffers
S.4 2D 4x4 SRAM Memory Bank
[Figure: 4x4 SRAM array organized as 2-bit words; address bits A0 to A2 feed the row decoder (word lines WL[0] to WL[3]) and the column decoder; bit line pairs BL/!BL with bit line precharge; sense amplifiers and write circuitry; clocking and control block with enable, read, and precharge signals]
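S.5 6-Transistor SRAM Storage Cell
[Figure: cross-coupled inverters (M1 to M4) hold Q and !Q; access transistors M5 and M6, gated by WL, connect !Q and Q to !BL and BL; shown storing Q = 1, !Q = 0, with the on and off devices marked]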
S.6 SRAM Cell Analysis (Read)
[Figure: read of a cell storing Q = 1, !Q = 0 (devices M1, M4, M5, M6 shown); WL = 1, both bit lines precharged (!BL = BL = 2.5 V), each loaded by C_bit]
Read disturb (read upset): the voltage rise on !Q must be limited to prevent read upsets, while still maintaining acceptable circuit speed and area.
- M1 must be stronger than M5 when storing a 1 (as shown)
- M3 must be stronger than M6 when storing a 0
S.7 Read Voltage Ratios
V_DD = 2.5 V, V_Tn = 0.4 V
[Equation and plot: voltage rise on !Q as a function of CR; plot annotation: 1.2]
where CR is the cell ratio, CR = (W1/L1)/(W5/L5).
Keep the cell size minimal while maintaining read stability:
- Make M1 minimum size and increase the L of M5 (to make it weaker); this increases the load on WL
- Make M5 minimum size and increase the W of M1 (to make it stronger)
Similar constraints apply to (W3/L3)/(W6/L6) when storing a 0.
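The read constraint can be checked numerically with a short Python sketch. It assumes that the access transistor M5 is velocity saturated while the pull-down M1 is in the linear region, equates the two currents, and solves for the voltage rise dV on !Q. V_DSATn = 0.63 V is an assumed value that does not appear on the slide, and body effect is ignored. The criterion used is dV < V_Tn, so that the rise on !Q cannot turn on M3.

```python
import math


def read_disturb_dv(cr, vdd=2.5, vtn=0.4, vdsat_n=0.63):
    """Voltage rise dV on the '0' node (!Q) while reading a cell that stores a 1.

    Model (assumed): access transistor M5 is velocity saturated,
        I5 = k5 * ((vdd - dV - vtn) * vdsat_n - vdsat_n**2 / 2),
    pull-down M1 is in the linear region,
        I1 = k1 * ((vdd - vtn) * dV - dV**2 / 2),
    and CR = k1 / k5. Setting I1 = I5 gives a quadratic in dV; the smaller
    root is returned. Body effect and channel-length modulation are ignored.
    """
    vgt = vdd - vtn
    b = cr * vgt + vdsat_n
    disc = vdsat_n ** 2 * (1 + cr) + (cr * vgt) ** 2
    return (b - math.sqrt(disc)) / cr


for cr in (0.6, 0.8, 1.0, 1.2, 1.5, 2.0):
    dv = read_disturb_dv(cr)
    print(f"CR = {cr:.1f}: dV = {dv:.3f} V, stays below V_Tn: {dv < 0.4}")
```

Under these assumptions dV falls below 0.4 V somewhere between CR = 1.0 and CR = 1.2, and keeps shrinking as M1 is made stronger.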
S.8 SRAM Cell Analysis (Write)
[Figure: writing a 0 into a cell storing Q = 1, !Q = 0 (devices M1, M4, M5, M6 shown); WL = 1, !BL = 2.5 V, BL = 0 V, each bit line loaded by C_bit]
The !Q side of the cell cannot be pulled high enough to ensure the write of a 0 (because M1 is on and sized to protect against read upset). So the new value has to be written through M6.
- M6 must be able to overpower M4 when the cell stores a 1 and a 0 is written
- M5 must be able to overpower M2 when the cell stores a 0 and a 1 is written
S.9 Write Voltage Ratios
V_DD = 2.5 V, |V_Tp| = 0.4 V, μp/μn = 0.5
[Equation and plot: voltage on Q during a write as a function of PR; plot annotation: 1.8]
where PR is the pull-up ratio, PR = (W4/L4)/(W6/L6).
Keep the cell size minimal while allowing writes:
- Make M4 and M6 minimum size
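A short Python sketch of the write constraint: it equates the linear-region current of M6 with the velocity-saturated current of M4 and solves for V_Q. It uses the slide's μp/μn = 0.5 and |V_Tp| = 0.4 V, V_Tn = 0.4 V from the read slide, an assumed V_DSATp = 1.0 V (magnitude), and V_Q < V_Tn as the criterion for a successful write. The resulting PR limit is very sensitive to these assumed device parameters, so the numbers are illustrative only and are not meant to reproduce the slide's plot.

```python
import math


def write_vq(pr, vdd=2.5, vtn=0.4, vtp_abs=0.4, mu_ratio=0.5, vdsat_p=1.0):
    """Voltage on Q while writing a 0 (BL = 0 V) into a cell that stores a 1.

    Model (assumed): access transistor M6 is in the linear region,
        I6 = k6 * ((vdd - vtn) * vq - vq**2 / 2),
    pull-up M4 is velocity saturated,
        I4 = k4 * ((vdd - vtp_abs) * vdsat_p - vdsat_p**2 / 2),
    and k4 / k6 = mu_ratio * PR. Returns None when no real solution exists,
    i.e. M6 cannot pull Q down against M4 at all.
    """
    a = vdd - vtn
    pull_up = 2 * mu_ratio * pr * ((vdd - vtp_abs) * vdsat_p - vdsat_p ** 2 / 2)
    disc = a * a - pull_up
    if disc < 0:
        return None
    return a - math.sqrt(disc)


for pr in (0.5, 0.9, 1.0, 1.8, 3.0):
    vq = write_vq(pr)
    shown = "no solution" if vq is None else f"{vq:.3f} V"
    ok = vq is not None and vq < 0.4   # write criterion: V_Q < V_Tn
    print(f"PR = {pr:.1f}: V_Q = {shown}, write succeeds: {ok}")
```

With these particular assumptions the write succeeds only for PR up to roughly 0.95, and above about 2.76 M6 cannot pull Q down against M4 at all; a stronger access device or weaker pull-up moves the limit up.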
S.10 Cell Sizing and Performance
Keeping cell size minimal is critical for large SRAMs.
- Minimum-sized pull-down FETs (M1 and M3)
  - Require longer-than-minimum channel length L in the pass transistors (M5 and M6) to ensure a proper CR
  - But up-sizing L of the pass transistors increases the capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle
- Minimum-width and minimum-length pass transistors
  - Boost the width of the pull-downs (M1 and M3)
  - Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but the cell size may be slightly larger
Performance is determined by the read operation.
- To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn't have to make a full swing)
S.11 6-T SRAM Layout
[Figure: 6-T cell layout showing V_DD, GND, Q, !Q, WL, BL, and transistors M1 to M6]
Simple and reliable, but big
- Signal routing and connections to two bit lines, a word line, and both supply rails
Area is dominated by the wiring and contacts.
Other alternatives to the 6-T cell include the resistive-load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process.
S.12 Multiple Read/Write Port Storage Cell
[Figure: 6-T cell (M1 to M6) on port 1 (WL1, BL1, !BL1) with a second pair of access transistors, M7 and M8, gated by WL2 and connected to BL2 and !BL2]
To avoid read upset, the widths of M1 and M3 have to be sized up by a factor equal to the number of simultaneously open read ports.
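The sizing rule can be written as a one-line helper. This is a hypothetical sketch: it assumes all transistors share the same channel length (so the cell ratio becomes a width ratio) and reuses 1.2 from the read-ratio slide as an example minimum CR.

```python
def pulldown_width(w_access, cr_min, open_read_ports):
    """Minimum width of pull-down M1/M3 in a multiport cell (hypothetical helper).

    Every simultaneously open read port adds one access transistor's worth
    of current into the '0' node, so the single-port width W = CR * W_access
    is multiplied by the number of open read ports. Equal channel lengths
    are assumed.
    """
    return cr_min * w_access * open_read_ports


for ports in (1, 2, 3):
    w = pulldown_width(1.0, 1.2, ports)
    print(f"{ports} open read port(s): W_pulldown = {w:.1f} x W_access")
```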
S.13 Resistance-load SRAM Cell
[Figure: cell with two load resistors R_L to V_DD, transistors M1 to M4, word line WL, storage nodes Q and !Q, and bit line BL]
S.14 Remove R
[Figure: the same cell with the load resistors removed, leaving M1 to M4, WL, and BL]
S.15 Remove R
[Figure: the cell reduced to M2, M3, and M4 with WL]
Further remove one transistor.
S.16 3-Transistor DRAM Cell
[Figure: 3-T cell (M1 to M3) with storage node X on capacitance C_S, write word line WWL with bit line BL1, and read word line RWL with bit line BL2; waveforms for a write and a read, with X reaching V_DD - V_T after a 1 is written and BL2 changing by ΔV during the read]
Write: C_S is charged (or discharged) by asserting WWL and BL1.
- The value stored at node X when writing a 1 is V_WWL - V_Tn
Read: C_S is sensed by asserting RWL and observing BL2.
- The read is non-destructive and inverting (ratioless)
S.17 3-Transistor DRAM Cell
[Figure: same 3-T cell schematic and write/read waveforms as S.16]
Refresh: read the stored data, put its inverse on BL1, and assert WWL (this must be done every 1 to 4 ms).
Note the V_T drop at X: how can it be fixed?
S.18 3-T DRAM Layout
[Figure: 3-T cell layout with BL1, BL2, GND, RWL, WWL, and transistors M1 to M3]
- Fewer contacts and wires
- Total cell area is smaller than the 1,092 λ² of the 6-T SRAM cell
- No special processing steps are needed (so it is compatible with a logic CMOS process)
- Bootstrapping (raising V_WWL to a value higher than V_DD) can be used to eliminate the threshold drop when storing a 1
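The threshold drop and the bootstrapping fix can be sketched as follows. This assumes an ideal NMOS write access transistor (gated by WWL) without body effect, and reuses V_DD = 2.5 V and V_Tn = 0.4 V from the SRAM slides, since the DRAM slides give no numbers; `stored_high_level` is a hypothetical helper.

```python
def stored_high_level(v_wwl, v_bl1=2.5, vtn=0.4):
    """Voltage on storage node X after writing a 1 through the write transistor.

    An NMOS pass transistor stops conducting once its source (node X) rises
    to V_WWL - V_Tn, and X can never rise above the bit line BL1 itself.
    Body effect is ignored.
    """
    return min(v_bl1, v_wwl - vtn)


VDD = 2.5
# Word line driven to V_DD: X stops one threshold below V_DD.
print("WWL at V_DD:      X =", round(stored_high_level(VDD), 3), "V")
# Word line bootstrapped by at least V_Tn above V_DD: X reaches the full BL1 level.
print("WWL bootstrapped: X =", round(stored_high_level(VDD + 0.4), 3), "V")
```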
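S.19 1-Transistor DRAM Cell
[Figure: 1-T cell (access transistor M1 gated by WL, storage capacitor C_S at node X, bit line BL with capacitance C_BL); waveforms: writing a 1 leaves X at V_DD - V_T; reading a 1 starts from BL = V_DD/2 before sensing]
Write: C_S is charged (or discharged) by asserting WL and BL.
Read: charge redistribution occurs between C_BL and C_S.
- The read is destructive, so the cell must be refreshed after a read
- The voltage swing is small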
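Charge conservation gives the bit line change after the access transistor turns on: ΔV = (V_X - V_PRE) · C_S / (C_S + C_BL), where V_X is the cell voltage and V_PRE = V_DD/2 is the bit line precharge level. The sketch below evaluates it for hypothetical values C_S = 30 fF and C_BL = 300 fF, storing a 1 at V_DD - V_Tn with V_DD = 2.5 V and V_Tn = 0.4 V taken as assumptions.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Bit line voltage change after charge sharing between C_S and C_BL.

    Total charge C_S*V_cell + C_BL*V_pre is conserved, so both capacitors
    settle at the same final voltage; the return value is final - V_pre.
    """
    return (v_cell - v_pre) * c_s / (c_s + c_bl)


VDD, VTN = 2.5, 0.4
C_S, C_BL = 30e-15, 300e-15   # hypothetical storage and bit line capacitances (F)
v_pre = VDD / 2
for label, v_cell in (("1", VDD - VTN), ("0", 0.0)):
    dv = bitline_swing(v_cell, v_pre, C_S, C_BL)
    print(f"read {label}: dV = {dv * 1e3:+.1f} mV")
```

With these numbers the bit line moves by only about +77 mV for a 1 and about -114 mV for a 0, which is why a sense amp is needed; the asymmetry comes from the threshold drop when writing the 1.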
S.20 Sense Amp Operation
[Plot: V_BL versus t, diverging from V_PRE toward V(1) or V(0); markers for "Word line activated" and "Sense amp activated"]
S.21 1-T DRAM Cell Observations
- The cell is single-ended (which complicates the design of the sense amp)
- The cell requires a sense amp for each bit line due to the charge-redistribution-based read
  - BLs are precharged to V_DD/2 (not V_DD as in the SRAM design)
  - All previous designs used SAs for speed, not functionality
- The cell read is destructive; a refresh must follow to restore the data
- The cell requires an extra capacitor (C_S) that must be explicitly included in the design
  - It may not be compatible with a logic CMOS process
- A threshold voltage is lost when writing a 1 (this can be circumvented by bootstrapping the word lines to a value higher than V_DD)
S.22 1-T DRAM (3-D Capacitor)
[Figure: 3-D storage capacitor structure (non-CMOS); source: IBM]
S.23 Peripheral Memory Circuitry
- Row and column decoders
- Read bit line precharge logic
- Sense amplifiers
- Timing and control
Speed, power consumption, area (pitch matching)
S.24 2D 4x4 __RAM Memory
[Figure: same 4x4 memory organization as the SRAM bank on S.4 (row decoder, WL[0] to WL[3], column decoder, BL/!BL pairs, precharge, sense amplifiers, write circuitry, clocking and control), with the memory type left blank]
S.25 2D 4x4 ___RAM Memory
[Figure: 4x4 memory organized as 2-bit words, with address bits A0 to A2, a row decoder driving WL[0] to WL[3], a column decoder, a single bit line per column (BL0 to BL3), bit line precharge, sense amplifiers, write circuitry, and a clocking, control, and refresh block with enable, read, and precharge signals]
S.26 Row Decoders
A collection of 2^M complex logic gates organized in a regular, dense fashion.
(N)AND decoder for 8 address bits:
WL(0) = !A7 & !A6 & !A5 & !A4 & !A3 & !A2 & !A1 & !A0
…
WL(255) = A7 & A6 & A5 & A4 & A3 & A2 & A1 & A0
NOR decoder for 8 address bits:
WL(0) = !(A7 | A6 | A5 | A4 | A3 | A2 | A1 | A0)
…
WL(255) = !(!A7 | !A6 | !A5 | !A4 | !A3 | !A2 | !A1 | !A0)
Goals: pitch matched, fast, low power
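Both decoder forms must select exactly one word line per address. The sketch below (hypothetical helper names `wl_and` and `wl_nor`) builds each WL(row) from the AND and NOR expressions above and checks exhaustively for M = 8 that both forms are one-hot and agree.

```python
M = 8  # number of address bits, as on the slide


def wl_and(addr, row):
    """AND form: WL(row) = AND over i of (A_i if bit i of row is 1 else !A_i)."""
    return all(((addr >> i) & 1) == ((row >> i) & 1) for i in range(M))


def wl_nor(addr, row):
    """NOR form: WL(row) = NOR over i of (A_i if bit i of row is 0 else !A_i)."""
    literals = (((addr >> i) & 1) ^ ((row >> i) & 1) for i in range(M))
    return not any(literals)


for addr in range(2 ** M):
    and_rows = [row for row in range(2 ** M) if wl_and(addr, row)]
    nor_rows = [row for row in range(2 ** M) if wl_nor(addr, row)]
    assert and_rows == nor_rows == [addr], addr
print(f"AND and NOR forms are one-hot and agree for all {2 ** M} addresses")
```

In hardware, the NOR form maps to a wide NOR whose inputs are the address bits or their complements, while the AND form is usually built from NAND gates followed by inverters.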
S.27 Dynamic Decoders
[Figure: 2-input dynamic NOR decoder and 2-input dynamic NAND decoder built from A0, !A0, A1, !A1, with precharge devices to V_DD and word line outputs (e.g., WL3)]
Which one is faster? Smaller? Lower power?
S.28 Pass Transistor Based Column Decoder
[Figure: a 2-input NOR decoder driven by A1 and A0 produces select signals S0 to S3; each select gates pass transistors connecting BL0 to BL3 to data_out and !BL0 to !BL3 to !data_out]
Read: connect the BLs to the sense amps (SAs).
Write: drive one of the BLs low to write a 0 into the cell.
- Fast, since there is only one transistor in the signal path. However, the transistor count is large: (K+1)2^K + 2 × 2^K
- For K = 2: 3 × 2^2 (decoder) + 2 × 2^2 (PTs) = 12 + 8 = 20
S.29 Tree Based Column Decoder
[Figure: binary tree of pass transistors controlled by A0/!A0 and A1/!A1, connecting BL0 to BL3 to data_out and !BL0 to !BL3 to !data_out]
Number of transistors = 2 × 2 × (2^K - 1); for K = 2: 2 × 2 × (2^2 - 1) = 4 × 3 = 12
- Delay increases quadratically with the number of sections (K), so this is prohibitive for large decoders
  - It can be fixed with buffers, progressive sizing, or a combination of the tree and pass-transistor approaches
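The transistor counts from the two column-decoder slides can be compared directly in Python, together with a rough delay model for the tree. The delay model is an assumption: K identical series pass-transistor sections, each with resistance R and node capacitance C, evaluated with the Elmore delay R·C·K(K+1)/2, which shows the quadratic growth in K.

```python
def pass_transistor_decoder_count(k):
    """Slide formula (K+1) * 2^K + 2 * 2^K: the NOR decoder term plus one pass
    transistor on BL and one on !BL for each of the 2^K columns."""
    return (k + 1) * 2 ** k + 2 * 2 ** k


def tree_decoder_count(k):
    """Slide formula 2 * 2 * (2^K - 1): a binary tree of 2 * (2^K - 1) pass
    transistors, duplicated for BL and !BL."""
    return 2 * 2 * (2 ** k - 1)


def tree_elmore_delay(k, r=1.0, c=1.0):
    """Elmore delay of K identical series sections (uniform R and C assumed):
    sum over i = 1..K of i * R * C = R * C * K * (K + 1) / 2."""
    return r * c * k * (k + 1) / 2


print("K  pass-transistor  tree  tree delay (RC units)")
for k in range(1, 7):
    print(k, pass_transistor_decoder_count(k), tree_decoder_count(k), tree_elmore_delay(k))
```

The tree saves transistors at every K, while its delay grows roughly as K², which is the trade-off the two slides describe.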
S.30 Bit Line Precharge Logic
[Figure: bit line pair BL/!BL with precharge devices gated by !PC and an equalization transistor between the lines]
Equalization transistor: speeds up equalization of the two bit lines by allowing the capacitance and pull-up device of the non-discharged bit line to assist in precharging the discharged line.
The first step of a read cycle is to precharge (PC) the bit lines to V_DD.
- Every differential signal in the memory must be equalized to the same voltage level before a read
Turn off PC and enable the WL.
- The grounded PMOS load limits the bit line swing (speeding up the next precharge cycle)
S.31 Sense Amplifiers
- Amplification: resolves data with small bit line swings (in some DRAMs, required for proper functionality)
- Delay reduction: compensates for the limited drive capability of the memory cell to accelerate the BL transition
  - t_p = (C · ΔV) / I_av: C is large and I_av is small, so make ΔV as small as possible
- Power reduction: eliminates a large part of the power dissipation due to charging and discharging the bit lines
- Signal restoration: for DRAMs, the bit lines must be driven full swing after sensing (read) to refresh the data
[Figure: SA with a small-swing input and a full-swing output]
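Plugging numbers into t_p = (C · ΔV) / I_av shows why the swing matters. C = 500 fF and I_av = 50 µA are hypothetical values, not taken from the slide.

```python
def bitline_delay(c_bl, swing, i_av):
    """t_p = C * dV / I_av: time for the cell current to move the bit line by dV."""
    return c_bl * swing / i_av


C_BL = 500e-15   # hypothetical bit line capacitance (F)
I_AV = 50e-6     # hypothetical average cell read current (A)
for swing in (2.5, 0.5, 0.1):
    t_ns = bitline_delay(C_BL, swing, I_AV) * 1e9
    print(f"dV = {swing:.1f} V -> t_p = {t_ns:.1f} ns")
```

A full 2.5 V swing takes 25 ns under these assumptions, while a 100 mV swing resolved by a sense amp takes 1 ns; the saving scales linearly with ΔV.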
S.32 Differential Sense Amplifier
[Figure: differential amplifier with transistors M1 to M5, bit line inputs, sense enable SE, output Out, and internal node y, powered from V_DD]
Directly applicable to SRAMs
S.33 Differential Sensing: SRAM
S.34 Reliability and Yield
Memories operate under low signal-to-noise conditions:
- Word line to bit line coupling can vary substantially over the memory array
  - Folded bit line architecture (routing BL and !BL next to each other ensures a closer match between parasitics and bit line capacitances)
- Interwire bit line to bit line coupling
  - Transposed (or twisted) bit line architecture (turns the noise into a common-mode signal for the SA)
- Leakage (in DRAMs) requires a refresh operation
Memories suffer from low yield due to high density and structural defects:
- Increase yield by using error detection and correction (e.g., parity bits) and redundancy
Memories are also susceptible to soft errors due to alpha particles and cosmic rays.
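A toy model of the transposed bit line idea; all names and numbers are hypothetical. The bit line pair is divided into equal segments, a switching aggressor line couples a fixed fraction of its swing onto whichever line of the pair it runs next to, and twisting swaps that neighbor in alternate segments. The SA only sees the difference BL - !BL.

```python
def differential_noise(twisted, segments=4, coupling=0.05, v_aggressor=2.5):
    """Differential noise (BL minus !BL) caused by one switching aggressor line.

    Toy model: in each of `segments` equal segments the aggressor runs next to
    exactly one line of the pair and couples coupling * v_aggressor / segments
    onto it. Without twisting BL is always the neighbor; with twisting the
    neighbor alternates between BL and !BL.
    """
    per_segment = coupling * v_aggressor / segments
    noise_bl = noise_blb = 0.0
    for s in range(segments):
        bl_is_neighbor = (s % 2 == 0) if twisted else True
        if bl_is_neighbor:
            noise_bl += per_segment
        else:
            noise_blb += per_segment
    return noise_bl - noise_blb


print("straight pair:", differential_noise(twisted=False), "V differential")
print("twisted pair: ", differential_noise(twisted=True), "V differential")
```

Without twisting, all the coupled noise lands on BL and appears as a differential error; with an even number of twisted segments it is split equally between BL and !BL and becomes common-mode, which a differential SA rejects.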
S.35 Redundancy in the Memory Structure
[Figure: memory array with row address and column address inputs, redundant rows, redundant columns, and a fuse bank]
S.36 Row Redundancy
[Figure: the functional address is compared (==?) with fused repair addresses; the compare result drives the redundant wordline and the enable of the normal wordline decoder, which drives the normal wordline]
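The row-redundancy logic reduces to a compare against the fused repair addresses. A minimal sketch, assuming one fused address per redundant row (with `None` marking an unused spare); `select_wordline` and the example fuse contents are hypothetical.

```python
def select_wordline(address, fused_repairs):
    """Model of row redundancy.

    The incoming row address is compared with every fused repair address.
    On a match, the matching redundant wordline fires and the normal
    wordline decoder stays disabled; otherwise the normal decoder is enabled
    and drives its own wordline.
    """
    for spare, repair_address in enumerate(fused_repairs):
        if repair_address is not None and repair_address == address:
            return ("redundant", spare)
    return ("normal", address)


# Hypothetical repair: row 5 was found defective at test time and fused to spare 0.
fuses = [5, None]
for row in (4, 5, 6):
    print(row, "->", select_wordline(row, fuses))
```

In silicon all the comparators work in parallel, so the compare adds one gate-level delay to the decode path rather than a loop.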
S.37 Column Redundancy
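S.38 Error-Correcting Codes
Example: Hamming codes
[Figure: Hamming code example; e.g., if B3 flips, the check bits evaluate to 3, the position of the flipped bit]
2^k ≥ m + k + 1, where m is the number of data bits and k is the number of check bits.
For 64 data bits, 7 check bits are needed.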
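A short Python sketch of the check-bit bound and of single-error location with a Hamming code. It assumes the classic layout, with bits numbered from position 1 and even-parity check bits at the power-of-two positions, so the syndrome (the XOR of the positions of all 1 bits) directly names the flipped position; whether the slide's B3 uses exactly this numbering is an assumption.

```python
def check_bits_needed(m):
    """Smallest k satisfying 2**k >= m + k + 1 for m data bits."""
    k = 0
    while 2 ** k < m + k + 1:
        k += 1
    return k


def hamming_encode(data_bits):
    """Return a codeword (list of 0/1, position 1 first) with even-parity
    check bits at positions 1, 2, 4, ... and data bits everywhere else."""
    k = check_bits_needed(len(data_bits))
    n = len(data_bits) + k
    code = [0] * (n + 1)              # index 0 unused so indices match positions
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):           # not a power of two: data position
            code[pos] = next(data)
    for i in range(k):
        p = 1 << i                    # check bit p covers every position with bit p set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    return code[1:]


def syndrome(codeword):
    """XOR of the positions holding a 1; 0 means no single-bit error."""
    s = 0
    for pos, bit in enumerate(codeword, start=1):
        if bit:
            s ^= pos
    return s


print(check_bits_needed(64))          # check bits for 64 data bits
word = hamming_encode([1, 0, 1, 1])
print(word, syndrome(word))
word[3 - 1] ^= 1                      # flip B3 (position 3)
print(word, syndrome(word))
```

This prints 7, then the clean codeword [0, 1, 1, 0, 0, 1, 1] with syndrome 0, then the corrupted word with syndrome 3, the position of the flipped bit.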
S.39 Performance and area overhead for ECC
S.40 Redundancy and Error Correction
S.41 Soft Errors
Nonrecurrent and nonpermanent errors from
- alpha particles (from the packaging materials)
- neutrons from cosmic rays
As feature size decreases, the charge stored at each node decreases (due to lower node capacitance and lower V_DD), and thus Q_critical (the charge necessary to cause a bit flip) decreases, leading to an increase in the soft error rate (SER).
[Figure: soft error rate trend; source: Semico Research Corp.]
[Table: MTBF (hours) of ground-based, civilian avionics, and military avionics systems at 0.13 µm and 0.09 µm; source: Actel]
S.42 Scary Fact
Avionics systems in civilian aviation: an altitude of 30,000 feet and a route crossing the North Pole both increase the neutron flux.
If an avionics board uses four 1M 130 nm SRAM-based FPGAs, it would be subject to about 0.074 upsets per day, i.e., 324 hours between upsets, or about 3 million FITs.
Assume one such system on board each commercial aircraft, 4,000 civilian flights per day, and a 3-hour average flight time: nearly 37 aircraft per day will experience a neutron-induced SRAM-based FPGA configuration failure during their flight.
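The arithmetic behind these numbers, as a Python sketch. It uses only the slide's figures (324 hours between upsets per board, 4,000 flights per day, 3 hours per flight) and the standard definition of a FIT as one failure per 10^9 hours of operation; treating upsets as a Poisson process is an added assumption.

```python
import math

HOURS_BETWEEN_UPSETS = 324   # one avionics board with four FPGAs (from the slide)
FLIGHTS_PER_DAY = 4_000
AVG_FLIGHT_HOURS = 3

upsets_per_day = 24 / HOURS_BETWEEN_UPSETS
fits = 1e9 / HOURS_BETWEEN_UPSETS            # 1 FIT = 1 failure per 10^9 hours

# Expected number of upsets across all flights in one day.
expected_upsets = FLIGHTS_PER_DAY * AVG_FLIGHT_HOURS / HOURS_BETWEEN_UPSETS

# Poisson model (assumed): probability that a single flight sees at least one upset.
p_flight = 1 - math.exp(-AVG_FLIGHT_HOURS / HOURS_BETWEEN_UPSETS)
aircraft_hit = FLIGHTS_PER_DAY * p_flight

print(f"{upsets_per_day:.4f} upsets/day per board, {fits / 1e6:.2f} million FITs")
print(f"{expected_upsets:.1f} expected upsets/day, {aircraft_hit:.1f} aircraft affected/day")
```

Both estimates land at about 37 aircraft per day; the Poisson version is slightly lower because a flight with two upsets counts only once.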
S.43 Modeling of a particle strike
S.44 A SPICE Simulation for SRAM
[Figure: simulated particle strike on an SRAM cell (WL = 0, BL, !BL); the storage nodes flip 0->1 and 1->0]
S.45 On-chip Memory: ITRS roadmap
S.46 State of the Art
S.47 State of the Art