Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs

Similar presentations


Presentation on theme: "CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs"— Presentation transcript:

1 CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs
Mary Jane Irwin ( ) [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © J. Rabaey, A. Chandrakasan, B. Nikolic]

2 Review: Random Access Read Write Memories
SRAM – Static Random Access Memory data is stored as long as supply is applied large cells (6 fets/cell) – so fewer bits/chip fast – so used where speed is important (e.g., caches) differential outputs (output BL and !BL) use sense amps for performance compatible with CMOS technology DRAM - Dynamic Random Access Memory periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage small cells (1 to 3 fets/cell) – so more bits/chip slower – so used for main memories single ended output (output BL only) need sense amps for correct operation not typically compatible with CMOS technology

3 Review: 2D 4x4 SRAM Memory Bank
read precharge bit line precharge enable WL[0] !BL BL A1 WL[1] Row Decoder A2 WL[2] WL[3] 2 bit words To decrease the bit line delay for reads – use low swing bit lines, i.e., bit lines don’t swing rail-to-rail, but precharge to Vdd (as 1) and discharge to Vdd – 10%Vdd (as 0). (So for 2.5V Vdd, 0 is 2.25V.) Requires sense amplifiers to restore to full swing. Write circuitry – receives full swing from sense amps – or writes full swing to the bit lines A0 Column Decoder clocking and control sense amplifiers BLi BLi+1 write circuitry

4 6-Transistor SRAM Storage Cell
!BL BL WL M1 M2 M3 M4 M5 M6 Q !Q off on 1 off on For lecture Note that it is identical to the register cell from static sequential circuit - cross-coupled inverters. The major job of the pullups is to replenish loss due to leakage Sizing of the transistors is critical Does it consume standby power (yes – but of two forms. see if the students can identify each). Cell leakage and bit line leakage. Talk about the crow-bar effect of the cross-coupled inverters

5 SRAM Cell Analysis (Read)
WL=1 M4 M6 !Q=0 M5 Q=1 M1 Cbit Cbit !BL=2.5V  0 BL=2.5V Read-disturb (read-upset): must limit the voltage rise on !Q to prevent read-upsets from occurring while simultaneously maintaining acceptable circuit speed and area M1 must be stronger than M5 when storing a 1 (as shown) M3 must be stronger than M6 when storing a 0 First precharge both bit lines – BL and !BL – to 1 - Note that bit line capacitance values for Cbit can be in the pF range for large memories (bit line capacitance is from wiring and from diffusion caps of M5’s of all the cells connected to the bit line and the load presented by the sense amp) Then discharge !BL through M5 and M1 (since the cell is holding 1) – key is that you must ensure that the !Q point doesn’t rise too high before Cbit is discharged or the memory cell will change state- read-disturb. BL being 1 will help to keep the cell from toggling. Or can precharge the bit lines to Vdd/2 so !Q could never reach the switching point. This has performance benefits as well.

6 where CR is the Cell Ratio = (W1/L1)/(W5/L5)
Read Voltage Ratios V!Q = [VDSATn + CR(VDD – VTn) - (VDSATn2(1+CR) + CR2(VDD – VTn)2)]/CR where CR is the Cell Ratio = (W1/L1)/(W5/L5)  1.2 VDD = 2.5V VTn = 0.4V Keep cell size minimal while maintaining read stability Make M1 minimum size and increase the L of M5 (to make it weaker) increases load on WL Make M5 minimum size and increase the W of M1 (to make it stronger) Similar constraints on (W3/L3)/(W6/L6) when storing a 0 To avoid read-disturb, the voltage on node !Q must remain below the trip point of the inverter pair for all process, noise, and operating conditions. CRs of no less than 1.2 (most microprocessor fabs use a minimum cell ratio of 1.25 to 2) Simulations should be done at high Vdd and low VTn (and considering process variations and misalignments)

7 SRAM Cell Analysis (Write)
WL=1 M4 M6 !Q=0 M5 Q=1 M1  0 Cbit Cbit !BL=2.5V BL=0V The !Q side of the cell cannot be pulled high enough to ensure writing of 0 (because M1 is on and sized to protect against read upset). So, the new value of the cell has to be written through M6. M6 must be able to overpower M4 when storing a 1 and writing a 0 M5 must be able to overpower M2 when storing a 0 and writing a 1 State shown is that before write takes effect (1 is stored, trying to write a 0) Note that the !Q side of the cell cannot be pulled high enough to ensure the writing of a 1 (because of the Q state holding M1 on for a good path to ground). So writing has to occur through the Q side of the RAM cell. In order to write the cell, the pass gate M6 must be more conductive than the M4 to allow node Q to be pulled to a value low enough for the inverter pair to begin amplifying the new data. The maximum ratio of the pullup size to that of the pass gate required to guarantee that the cell is writable – M6 in linear, M4 in saturation

8 Write Voltage Ratios Keep cell size minimal while allowing writes
VQ = (VDD - VTn) – ((VDD – VTn)2 – 2(p/n)(PR)((VDD – |VTp|)VDSATp – VDSATp2/2)) where PR is the Pull-up Ratio = (W4/L4)/(W6/L6)  1.8 VDD = 2.5V |VTp| = 0.4V p/n = 0.5 Keep cell size minimal while allowing writes Make M4 and M6 minimum size Be sure to consider worst case process corners (strong PMOS, weak NMOS, high VDD) In order to write the cell, node Q must be pulled to a value low enough to trip the inverter combination – so pulling Q below VTn (0.4V) is required. The lower the PR, the lower the value of VQ (has to be below 1.8). The limiting case for the write operation occurs at high Vdd when the pfet is strong (mup high, VTp low) and the nfet is weak (mun low, VTn high) Typically, the widths of the pullup devices are sized at or near process minimum. Longer than minimum channel lengths may also be employed to further reduce the pullup ratio. This is necessary since the read sizing of the SRAM cell dictates that the pass gate sizing should be minimized to prevent read disturbs.

9 Cell Sizing and Performance
Keeping cell size minimal is critical for large SRAMs Minimum sized pull down fets (M1 and M3) Requires longer than minimum channel length, L, pass transistors (M5 and M6) to ensure proper CR But up-sizing L of the pass transistors increases capacitive load on the word lines and limits the current discharged on the bit lines both of which can adversely affect the speed of the read cycle Minimum width and length pass transistors Boost the width of the pull downs (M1 and M3) Reduces the loading on the word lines and increases the storage capacitance in the cell – both are good! – but cell size may be slightly larger Performance is determined by the read operation To accelerate the read time, SRAMs use sense amplifiers (so that the bit line doesn’t have to make a full swing) Read requires (dis)charging of the large bit-line capacitance through the stack of two small transistors in the selected cell. Write is dominated by the propagation delay of the cross-coupled inverter pair. SPEED – tp = CLVswing/Iav Read – critical speed op since have large bit line capacitance to discharge through small transistors Write – speed determined by the propagation delay of the cross-coupled inverters tpLH > tpHL due to precharge

10 6-T SRAM Layout Simple and reliable, but big
VDD GND Q WL BL M1 M3 M4 M2 M5 M6 Simple and reliable, but big signal routing and connections to two bit lines, a word line, and both supply rails Area is dominated by the wiring and contacts (11.5 of them) Other alternatives to the 6-T cell include the resistive load 4-T cell and the TFT cell neither of which are available in a standard CMOS logic process Area is dominated by wiring and contacts – 11.5 contacts Other considerations – size of cell (1092 lambda**2 = 582 micron **2 by 0.7 micron design rules) and power – standby/cell of 10**-15A

11 Multiple Read/Write Port Storage Cell
WL2 BL2 !BL2 M7 M8 !BL1 BL1 WL1 M1 M2 M3 M4 M5 M6 Q !Q Allows multiple simultaneous reads of the same cell, so the cell design must be stable for a case of multiple reads. For the case of reads, with more than one pass gate open, the voltage rise in the cell will be larger and thus the size of the pulldown will have to be increased to maintain an acceptably low level to keep from incurring a read upset (by a factor equal to the number of simultaneous open read ports). To avoid read upset, the widths of M1 and M3 will have to be sized up by a factor equal to the number of simultaneously open read ports

12 clocking, control, and refresh
2D 4x4 DRAM Memory read precharge bit line precharge enable WL[0] BL A1 WL[1] Row Decoder A2 WL[2] WL[3] 2 bit words Note refresh logic Note a single bit line (no BLbar) – thus sense amps are required for operation (and are more difficult to design) Note that now sense amps are in front of the column decoder (because one is needed for every bit line, not just for the word being accessed) unlike in the SRAM case. sense amplifiers clocking, control, and refresh BL0 BL1 BL2 BL3 write circuitry A0 Column Decoder

13 3-Transistor DRAM Cell WWL WWL write RWL BL1 VDD M3 X X VDD-VT M1 M2 RWL read Cs BL2 VDD-VT V BL2 BL1 Core of first popular MOS memories (e.g., first 1Kbit memory from Intel). Cs is data storage (internal capacitance of wiring, M2 gate, and M1 diffusion capacitances) No constraints on device sizes (ratioless) Note threshold drop at point X which decreases the drive (gate voltage) of M2 and slows down read time – some designs “bootstrap” the word line voltage (raise the VWWL to a value higher than Vdd to get around threshold drop). Write – uses WWL and BL1. Data is retained as charge on CS once WWL is lowered. Read – uses RWL and BL2. Assume BL2 precharged to Vdd (or Vdd-Vt). If cell is holding 1, then BL2 goes low – so reads are inverting. Refresh – read stored data, put its inverse on BL1 and assert WWL (need to do this every 1 to 4 msec) Write: Cs is charged (or discharged) by asserting WWL and BL1 Value stored at node X when writing a 1 is VWWL - VTn Read: Cs is “sensed” by asserting RWL and observing BL2 Read is non-destructive and inverting

14 3-T DRAM Layout BL2 BL1 GND RWL WWL M3 M2 M1 Total cell area is 576 2 (compared to 1,092 2 for the 6-T SRAM cell) No special processing steps are needed (so compatible with logic CMOS process) Can use bootstrapping (raise VWWL to a value higher than VDD) to eliminate threshold drop when storing a “1” Note many fewer contacts (only 7) and wires than in 6-T SRAM cell 576 lambda**2 as compared to 1092 lambda**2 for SRAM cell

15 1-Transistor DRAM Cell write 1 read 1 WL WL X M1 X VDD-VT Cs CBL VDD BL VDD/2 sensing BL Most pervasive DRAM cell in commercial designs. Write – set BL and activate WL. Once again could bootstrap WL so that voltage drop at X doesn’t occur (to bring it up to Vdd) – common practice Read - BL precharged to Vpre – typically Vdd/2 – then assert WL and sense state of BL that takes effect due to charge sharing between CBL and Cs. Note that Read is destructive (steal charge from Cs) so must follow with a refresh cycle. Note that Cs << CBL (1 to 2 orders of magnitude) so read voltage swings are typically very small (around 250mV for 0.8 micron technology?). Charge transfer ratios are between 1% and 10% delta V = VBL – Vpre = (Vbit – Vpre) (Cs/(Cs + CBL)) REQUIRES a sense amp for each bit line for correct operation (wereas before was used to improve performance (via reduced bit line swings on reads)) Write: Cs is charged (or discharged) by asserting WL and BL Read: Charge redistribution occurs between CBL and Cs Read is destructive, so must refresh after read

16 1-T DRAM Cell Layout How to build the capacitor. Need to “build” and explicit capacitor that is both small and good – the key design challenge in 1T DRAM cells. Note that the technology is not compatible with CMOS logic technology.

17 1-T DRAM Cell Observations
Cell is single ended (complicates the design of the sense amp) Cell requires a sense amp for each bit line due to charge redistribution based read BL’s precharged to VDD/2 (not VDD as with SRAM design) all previous designs used SAs for speed, not functionality Cell read is destructive; refresh must follow to restore data Cell requires an extra capacitor (CS) that must be explicitly included in the design not compatible with logic CMOS process A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than VDD)

18 Content Addressable Memories (CAMs)
Memories addressed by their content. Typical applications include cache tags and translation lookaside buffers (TLBs) for virtual to physical address translation Address issued by CPU (page size = index bits + word select bits) Word in Line 2way Associative Cache Cache Index PA Tag VA Tag Tag Data Tag Data PA Tag Hit = = Most TLBs are small (<= 256 entries) and thus fully associative (CAM implementation) Hit Desired word

19 9-T CAM Cell Writes progress as in a standard SRAM cell
Compares the stored data (Q and !Q) to the bit line data Precharged match line ties to all cells in a row If Q and BL match, x is discharged through M2 and BL (data 0) or M3 and !BL (data 1) and thus M1 is off keeping the match line high Else if Q and BL don’t match, x is charged to VDD – VT and match line discharges thru M1 !BL BL WL Q !Q M3 M2 Precharge all match lines high (so Hit = 0) Full swing reads If the match data matches the data stored in the cell, it leaves the match line high so that Hit = 1; a mismatch pulls the match line low x match M1

20 4x4 CAM Design  1 WL[0] Hit 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 0 to WL[0]
 1 WL[0] Hit to WL[0] of data array WL[1] match[0] 1  0  1 WL[2] match[1] WL[3] match[2] match[3] Read/Write Circuitry Precharge all match lines high (so Hit = 0) and a mismatch pulls the match lines low. The pull-down transistors in the comparator are connected in a wired-OR fashion; even if only one of the bits in a given row mismatches, the match line is pulled low. CAMs (like this one) are not very power efficient! match/write data precharge/match  1

21 Next Lecture and Reminders
Peripheral memory circuits Reading assignment – Rabaey, et al, Reminders HW#5 (optional) due December 2nd Project final reports due December 4th Final grading negotiations/correction (except for the final exam) must be concluded by December 10th Final exam scheduled Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas


Download ppt "CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs"

Similar presentations


Ads by Google