CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs


CSE477 VLSI Digital Circuits Fall 2003 Lecture 24: Memory Cell Designs Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg477 [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Review: Random Access Read/Write Memories

SRAM (Static Random Access Memory):
- data is stored as long as the supply is applied
- large cells (6 FETs/cell), so fewer bits per chip
- fast, so used where speed is important (e.g., caches)
- differential outputs (BL and !BL); uses sense amps for performance
- compatible with standard CMOS logic technology

DRAM (Dynamic Random Access Memory):
- periodic refresh required (every 1 to 4 ms) to compensate for charge lost to leakage
- small cells (1 to 3 FETs/cell), so more bits per chip
- slower, so used for main memories
- single-ended output (BL only); needs sense amps for correct operation
- not typically compatible with standard CMOS logic technology

Review: 2D 4x4 SRAM Memory Bank

[Figure: 4x4 SRAM array with row decoder (A1, A2 driving WL[0] to WL[3]), column decoder (A0), bit-line precharge (precharge enable), sense amplifiers, write circuitry, and clocking and control; 2-bit words on BL/!BL pairs]

To decrease the bit-line delay on reads, use low-swing bit lines: the bit lines do not swing rail-to-rail, but precharge to VDD (as a 1) and discharge only to VDD minus 10% of VDD (as a 0). So for a 2.5 V VDD, a 0 is 2.25 V. This requires sense amplifiers to restore the signal to full swing. The write circuitry either receives full swing from the sense amps or writes full swing onto the bit lines.

6-Transistor SRAM Storage Cell

[Figure: 6-T cell: cross-coupled inverters (M1 to M4) storing Q and !Q, with access transistors M5 and M6 connecting to BL and !BL under control of WL]

Notes: the cell is identical to the register cell from static sequential circuits, i.e., cross-coupled inverters. The major job of the pull-ups is to replenish charge lost to leakage. Sizing of the transistors is critical. The cell does consume standby power, in two forms: cell leakage and bit-line leakage. There is also crowbar current through the cross-coupled inverters during switching.

SRAM Cell Analysis (Read)

[Figure: read of a cell storing 1: WL = 1, Q = 1, !Q = 0; both bit lines precharged to 2.5 V, with !BL discharging toward 0 through M5 and M1; bit-line capacitance Cbit on each side]

Read-disturb (read-upset): the voltage rise on !Q must be limited to prevent read upsets while simultaneously maintaining acceptable circuit speed and area:
- M1 must be stronger than M5 when storing a 1 (as shown)
- M3 must be stronger than M6 when storing a 0

First, precharge both bit lines (BL and !BL) to 1. Note that Cbit can be in the pF range for large memories; the bit-line capacitance comes from the wiring, the diffusion capacitance of the M5s of all cells connected to the bit line, and the load presented by the sense amp. Then !BL is discharged through M5 and M1 (since the cell holds a 1). The key is that node !Q must not rise too high before Cbit is discharged, or the cell will change state: a read disturb. BL being 1 helps keep the cell from toggling. Alternatively, the bit lines can be precharged to VDD/2 so that !Q can never reach the switching point; this has performance benefits as well.

Read Voltage Ratios

V!Q = [VDSATn + CR(VDD - VTn) - sqrt(VDSATn^2 (1 + CR) + CR^2 (VDD - VTn)^2)] / CR

where CR is the Cell Ratio = (W1/L1)/(W5/L5) >= 1.2, with VDD = 2.5 V and VTn = 0.4 V.

Keep cell size minimal while maintaining read stability:
- Make M1 minimum size and increase the L of M5 (to make it weaker): increases the load on WL
- Make M5 minimum size and increase the W of M1 (to make it stronger)
- Similar constraints apply to (W3/L3)/(W6/L6) when storing a 0

To avoid read disturb, the voltage on node !Q must remain below the trip point of the inverter pair for all process, noise, and operating conditions. This requires a CR of no less than 1.2 (most microprocessor fabs use a minimum cell ratio of 1.25 to 2). Simulations should be done at high VDD and low VTn, considering process variations and misalignments.
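As a sanity check on the expression above, a short script can sweep the cell ratio and evaluate the voltage rise on !Q. VDD and VTn are from the slide; VDSATn ~ 0.63 V is an assumed generic 0.25 um device value, not given here.

```python
from math import sqrt

# Voltage rise on the internal "0" node (!Q) during a read, as a function
# of the cell ratio CR = (W1/L1)/(W5/L5).
VDD, VTN = 2.5, 0.4      # from the slide
VDSATN = 0.63            # assumed NMOS saturation voltage

def read_disturb_voltage(cr):
    a = VDSATN + cr * (VDD - VTN)
    return (a - sqrt(VDSATN**2 * (1 + cr) + cr**2 * (VDD - VTN)**2)) / cr

for cr in (0.8, 1.0, 1.2, 1.5, 2.0):
    print(f"CR = {cr:.1f}: V!Q rises to {read_disturb_voltage(cr):.3f} V")
```

With these numbers, V!Q drops below VTn right around CR = 1.2, matching the minimum cell ratio quoted on the slide: a stronger pull-down (larger CR) keeps the disturbed node further below the inverter trip point.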

SRAM Cell Analysis (Write)

[Figure: write of a 0 into a cell storing 1: WL = 1, Q = 1 -> 0, !Q = 0, !BL = 2.5 V, BL = 0 V, with bit-line capacitance Cbit on each side]

The !Q side of the cell cannot be pulled high enough to force the write (because M1 is on and is sized to protect against read upset), so the new value has to be written through M6 on the Q side:
- M6 must be able to overpower M4 when storing a 1 and writing a 0
- M5 must be able to overpower M2 when storing a 0 and writing a 1

The state shown is before the write takes effect (a 1 is stored; we are trying to write a 0). To write the cell, the pass gate M6 must be more conductive than M4, so that node Q can be pulled to a value low enough for the inverter pair to begin amplifying the new data. The maximum ratio of the pull-up size to the pass-gate size that guarantees the cell is writable is computed with M6 in the linear region and M4 in saturation.

Write Voltage Ratios

VQ = (VDD - VTn) - sqrt((VDD - VTn)^2 - 2(mu_p/mu_n) PR ((VDD - |VTp|)VDSATp - VDSATp^2 / 2))

where PR is the Pull-up Ratio = (W4/L4)/(W6/L6) <= 1.8, with VDD = 2.5 V, |VTp| = 0.4 V, and mu_p/mu_n = 0.5.

Keep cell size minimal while allowing writes:
- Make M4 and M6 minimum size
- Be sure to consider the worst-case process corner (strong PMOS, weak NMOS, high VDD)

To write the cell, node Q must be pulled low enough to trip the inverter pair, so VQ must fall below VTn (0.4 V). The lower the PR, the lower the value of VQ; PR has to stay below about 1.8. The limiting case for the write occurs at high VDD when the PFET is strong (mu_p high, |VTp| low) and the NFET is weak (mu_n low, VTn high). Typically, the widths of the pull-up devices are sized at or near the process minimum; longer-than-minimum channel lengths may also be employed to further reduce the pull-up ratio. This is necessary because the read sizing of the SRAM cell dictates that the pass-gate sizing be minimized to prevent read disturbs.
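The write constraint can be checked numerically the same way. VDD, VTn, |VTp|, and mu_p/mu_n = 0.5 are from the slide; VDSATp is not given there, so 0.45 V is an assumed value chosen to reproduce the quoted PR <= 1.8 bound. This is a sketch, not the slide's exact device model.

```python
from math import sqrt

# Write margin: voltage to which node Q can be pulled down through pass
# gate M6 against pull-up M4, vs. the pull-up ratio PR = (W4/L4)/(W6/L6).
VDD, VTN, VTP, MU_RATIO = 2.5, 0.4, 0.4, 0.5   # from the slide
VDSATP = 0.45                                  # assumed PMOS VDSAT

def write_voltage(pr):
    drive = (VDD - VTP) * VDSATP - VDSATP**2 / 2
    return (VDD - VTN) - sqrt((VDD - VTN)**2 - 2 * MU_RATIO * pr * drive)

for pr in (1.0, 1.5, 1.8, 2.0):
    ok = "writable" if write_voltage(pr) < VTN else "write fails"
    print(f"PR = {pr:.1f}: VQ = {write_voltage(pr):.3f} V -> {ok}")
```

Under these assumptions VQ crosses the VTn = 0.4 V trip requirement just above PR = 1.8: a weaker pull-up (smaller PR) lets the pass gate pull Q lower, which is why pull-up widths are kept near process minimum.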

Cell Sizing and Performance

Keeping cell size minimal is critical for large SRAMs.

Minimum-sized pull-down FETs (M1 and M3):
- requires longer-than-minimum channel length L in the pass transistors (M5 and M6) to ensure a proper CR
- but up-sizing the L of the pass transistors increases the capacitive load on the word lines and limits the current discharged onto the bit lines, both of which can adversely affect the speed of the read cycle

Minimum width and length pass transistors:
- boost the width of the pull-downs (M1 and M3) instead
- this reduces the loading on the word lines and increases the storage capacitance in the cell (both good!), though the cell may be slightly larger

Performance is determined by the read operation: tp = CL * Vswing / Iav. A read requires (dis)charging the large bit-line capacitance through the stack of two small transistors in the selected cell, so to accelerate the read time SRAMs use sense amplifiers (the bit line does not have to make a full swing). The write speed is dominated by the propagation delay of the cross-coupled inverter pair. Note that tpLH > tpHL due to the precharge.
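The tp = CL * Vswing / Iav relation above makes the benefit of low-swing bit lines concrete. The capacitance and current values here are illustrative assumptions, not numbers from the slides.

```python
# Back-of-the-envelope read delay: tp = CL * Vswing / Iav.
CL = 1e-12      # 1 pF bit-line capacitance (assumed, "pF range" per slide)
IAV = 100e-6    # 100 uA average cell discharge current (assumed)

def read_delay(vswing):
    return CL * vswing / IAV

full = read_delay(2.5)     # full rail-to-rail swing
low = read_delay(0.25)     # 10% swing, restored by a sense amplifier
print(f"full swing: {full*1e9:.1f} ns, low swing: {low*1e9:.2f} ns")
```

Because delay is linear in the swing, sensing at 10% of VDD cuts the bit-line delay by 10x for the same average cell current, which is exactly why sense amps pay for themselves on reads.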

6-T SRAM Layout

[Figure: 6-T cell layout showing VDD, GND, Q, WL, BL and transistors M1 to M6]

Simple and reliable, but big: the cell requires signal routing and connections to two bit lines, a word line, and both supply rails, and the area is dominated by the wiring and contacts (11.5 of them per cell). The cell size is 1,092 lambda^2 (about 582 um^2 in 0.7 um design rules), with a standby current of about 10^-15 A per cell. Alternatives to the 6-T cell include the resistive-load 4-T cell and the TFT cell, neither of which is available in a standard CMOS logic process.

Multiple Read/Write Port Storage Cell

[Figure: dual-port cell: the 6-T core (M1 to M6 with WL1, BL1, !BL1) plus a second pair of access transistors M7 and M8 controlled by WL2 on BL2, !BL2]

This allows multiple simultaneous reads of the same cell, so the cell design must remain stable under multiple reads. With more than one pass gate open, the voltage rise in the cell is larger. To avoid read upset, the widths of the pull-downs M1 and M3 must therefore be sized up by a factor equal to the number of simultaneously open read ports.

2D 4x4 DRAM Memory

[Figure: 4x4 DRAM array with row decoder (A1, A2 driving WL[0] to WL[3]), single-ended bit lines BL0 to BL3 with precharge, sense amplifiers, column decoder (A0), write circuitry, and clocking, control, and refresh logic; 2-bit words]

Note the refresh logic, and that there is a single bit line per column (no !BL); thus sense amps are required for operation (and are more difficult to design). Note also that the sense amps now sit in front of the column decoder, unlike in the SRAM case, because one is needed for every bit line, not just for the word being accessed.

3-Transistor DRAM Cell

[Figure: 3-T cell: write port (WWL, BL1, M1) charging storage node X with capacitance Cs, storage device M2, and read port (RWL, BL2, M3); waveforms show node X reaching only VDD - VT on a write]

This was the core of the first popular MOS memories (e.g., the first 1 Kbit memory from Intel). Cs is the data storage node: the internal capacitance of the wiring, the M2 gate, and the M1 diffusion. There are no constraints on device sizes (the cell is ratioless).

Write: Cs is charged (or discharged) by asserting WWL and BL1; the data is retained as charge on Cs once WWL is lowered. The value stored at node X when writing a 1 is VWWL - VTn. This threshold drop at X decreases the gate drive of M2 and slows down the read, so some designs bootstrap the word line (raise VWWL to a value higher than VDD) to get around it.

Read: Cs is "sensed" by asserting RWL and observing BL2, which is precharged to VDD (or VDD - VT). If the cell holds a 1, BL2 goes low, so reads are inverting. The read is non-destructive.

Refresh: read the stored data, put its inverse on BL1, and assert WWL (needed every 1 to 4 ms).
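The threshold drop and the bootstrapping fix described above amount to a one-line calculation; a minimal sketch (ignoring the body effect, which would make the real drop slightly larger):

```python
# Stored level at node X in the 3-T cell: the access transistor M1 turns
# off once its VGS falls to VTn, so a plain WWL drive loses a threshold.
VDD, VTN = 2.5, 0.4

def stored_one(v_wwl):
    # X charges to WWL minus a threshold, but can never exceed BL1 = VDD
    return min(v_wwl - VTN, VDD)

print(f"plain WWL = VDD:          X = {stored_one(VDD):.2f} V")
print(f"bootstrapped WWL = VDD+VT: X = {stored_one(VDD + VTN):.2f} V")
```

Raising WWL by at least one VTn above VDD restores a full VDD at the storage node, which is why bootstrapped word lines are common practice in these designs.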

3-T DRAM Layout

[Figure: 3-T cell layout showing BL2, BL1, GND, RWL, WWL and M1 to M3]

The total cell area is 576 lambda^2 (compared to 1,092 lambda^2 for the 6-T SRAM cell), with many fewer contacts (only 7) and wires. No special processing steps are needed, so the cell is compatible with a logic CMOS process. Bootstrapping (raising VWWL to a value higher than VDD) can be used to eliminate the threshold drop when storing a 1.

1-Transistor DRAM Cell

[Figure: 1-T cell: access transistor M1 on WL connects storage capacitor Cs at node X to a bit line with capacitance CBL; the write-1 waveform shows X reaching only VDD - VT; reads sense BL precharged to VDD/2]

This is the most pervasive DRAM cell in commercial designs.

Write: Cs is charged (or discharged) by setting BL and asserting WL. Once again, the word line can be bootstrapped so that the threshold drop at X does not occur (bringing the stored 1 up to VDD); this is common practice.

Read: BL is precharged to Vpre (typically VDD/2), then WL is asserted and the resulting state of BL is sensed. The change comes from charge redistribution between CBL and Cs:

delta V = VBL - Vpre = (Vbit - Vpre) * Cs / (Cs + CBL)

The read is destructive (it steals charge from Cs), so it must be followed by a refresh cycle. Since Cs << CBL (by 1 to 2 orders of magnitude), the read voltage swings are very small (around 250 mV for a 0.8 um technology); charge-transfer ratios are between 1% and 10%. The cell therefore REQUIRES a sense amp on each bit line for correct operation, whereas in the earlier designs sense amps were used to improve performance (via reduced bit-line swings on reads).
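The charge-sharing expression above can be evaluated for the two stored values. The capacitance values are illustrative assumptions (Cs roughly an order of magnitude below CBL, per the slide), not numbers given here.

```python
# Charge-sharing read of the 1-T DRAM cell: the bit line, precharged to
# VDD/2, is shorted to the storage cap Cs when WL is asserted.
VDD, VTN = 2.5, 0.4
CS, CBL = 30e-15, 300e-15       # 30 fF cell, 300 fF bit line (assumed)
VPRE = VDD / 2

def bitline_swing(v_cell):
    # delta V = (Vbit - Vpre) * Cs / (Cs + CBL)
    return (v_cell - VPRE) * CS / (CS + CBL)

dv1 = bitline_swing(VDD - VTN)  # a stored "1" sits at VDD - VT (no bootstrap)
dv0 = bitline_swing(0.0)        # a stored "0"
print(f"read '1': {1e3*dv1:+.1f} mV, read '0': {1e3*dv0:+.1f} mV")
```

Both swings come out below about 120 mV in magnitude with opposite signs, which is why precharging to VDD/2 and using a sense amp per bit line is mandatory rather than an optimization.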

1-T DRAM Cell Layout

The key design challenge in 1-T DRAM cells is how to build the capacitor: an explicit capacitor must be constructed that is both small and good. Note that the technology is not compatible with CMOS logic technology.

1-T DRAM Cell Observations
- The cell is single-ended (which complicates the design of the sense amp)
- The cell requires a sense amp on each bit line due to the charge-redistribution-based read; bit lines are precharged to VDD/2 (not VDD as in the SRAM design). All the previous designs used sense amps for speed, not functionality.
- The cell read is destructive; a refresh must follow to restore the data
- The cell requires an extra capacitor (Cs) that must be explicitly included in the design, so it is not compatible with a logic CMOS process
- A threshold voltage is lost when writing a 1 (which can be circumvented by bootstrapping the word lines to a value higher than VDD)

Content Addressable Memories (CAMs)

Memories addressed by their content. Typical applications include cache tags and translation lookaside buffers (TLBs) for virtual-to-physical address translation.

[Figure: 2-way associative cache lookup: the address issued by the CPU (page size = index bits + word-select bits) supplies the cache index and word-in-line select; the VA tag is translated to a PA tag, which is compared against the tag of each way (Tag | Data) to produce Hit and the desired word]

Most TLBs are small (<= 256 entries) and thus fully associative (a CAM implementation).

9-T CAM Cell

[Figure: 9-T CAM cell: the 6-T SRAM core (Q, !Q, WL, BL, !BL) plus comparator transistors M2 and M3 driving internal node x, and match-line pull-down M1]

Writes proceed as in a standard SRAM cell. A compare checks the stored data (Q and !Q) against the bit-line data; the precharged match line ties to all cells in a row. If Q and BL match, x is discharged through M2 and BL (data 0) or through M3 and !BL (data 1), so M1 stays off and the match line remains high. If Q and BL do not match, x is charged to VDD - VT and the match line discharges through M1.

All match lines are precharged high (so Hit = 0), and reads are full swing. If the search data matches the data stored in the cell, the match line is left high so that Hit = 1; a mismatch pulls the match line low.

4x4 CAM Design

[Figure: 4x4 CAM array storing rows 1100, 1011, 1110, 0000: match/write data (shown as 1011) drive the bit lines under precharge/match control; rows WL[0] to WL[3] produce match[0] to match[3], which drive the word lines of a data array; read/write circuitry at the periphery]

All match lines are precharged high (so Hit = 0), and a mismatch pulls a match line low. The pull-down transistors in the comparators are connected in a wired-OR fashion: even if only one of the bits in a given row mismatches, that row's match line is pulled low. CAMs (like this one) are not very power efficient!
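The wired-OR match-line behavior can be sketched behaviorally; the stored patterns and search word below mirror the 4x4 example on the slide, and this is a functional model only (no power or timing).

```python
# Behavioral model of precharged match lines: any single bit mismatch in
# a row pulls that row's match line low (wired-OR pull-downs), so a row
# "hits" only if every stored bit equals the corresponding search bit.
def match_line(stored_row, search_word):
    return all(s == b for s, b in zip(stored_row, search_word))

def cam_search(array, search_word):
    # returns the indices of the rows whose match lines stay high
    return [i for i, row in enumerate(array) if match_line(row, search_word)]

cam = [[1, 1, 0, 0],
       [1, 0, 1, 1],
       [1, 1, 1, 0],
       [0, 0, 0, 0]]
print(cam_search(cam, [1, 0, 1, 1]))   # only row 1 stores 1011
```

Searching for 1011 leaves only match[1] high; any search word stored nowhere in the array leaves every match line low, so Hit = 0.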

Next Lecture and Reminders

Next lecture: peripheral memory circuits. Reading assignment: Rabaey et al., Sections 12.3-12.4.

Reminders:
- HW#5 (optional) due December 2nd
- Project final reports due December 4th
- Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
- Final exam scheduled Tuesday, December 16th, from 10:10 to noon in 118 and 113 Thomas