CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memory, ROM

Presentation on theme: "CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memory, ROM"— Presentation transcript:

1 CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memory, ROM
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © J. Rabaey, A. Chandrakasan, B. Nikolic]

2 Memory Definitions
Size – Kbytes, Mbytes, Gbytes, Tbytes
Speed:
Read Access – delay between the read request and the data being available
Write Access – delay between the write request and the data being written into the memory
(Read or Write) Cycle – minimum time required between successive reads or writes
[Timing diagram: a read cycle spans the read access until the data is valid; a write cycle spans the write setup and write access until the data is written.]

3 A Typical Memory Hierarchy
By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology, at close to the speed offered by the fastest technology.
[Hierarchy diagram: on-chip register file, instruction/data caches and TLBs next to the datapath and control; a second-level SRAM cache (or eDRAM); DRAM main memory; secondary memory (disk). Access time and size grow, and cost per byte falls, moving from the on-chip levels toward the disk.]
The design goal is to present the user with as much memory as is available in the cheapest technology (the disk). What makes this work is one of the most important principles in computer design – the principle of locality – which lets us provide an average access speed very close to that of the fastest technology.

4 More Memory Definitions
Function – nature of the storage mechanism: static vs. dynamic; volatile vs. nonvolatile (NV); read only (ROM)
Access pattern – random, serial, content addressable
Input-output architecture – number of data input and output ports (multiported memories)
Application – embedded, secondary, tertiary
Memory classification:
Read-Write Memories (RWM), random access – SRAM (cache, register file), DRAM (main memory)
Read-Write Memories (RWM), non-random access – CAM; FIFO, LIFO; shift register
Nonvolatile Read-Write Memories (NVRWM) – EPROM, EEPROM, FLASH
Read-Only Memories (ROM) – mask-programmed ROM, electrically-programmable PROM

5 Random Access Read Write Memories
SRAM – Static Random Access Memory
data is stored as long as supply is applied
large cells (6 fets/cell) – so fewer bits/chip
fast – so used where speed is important (e.g., caches)
differential outputs (output BL and !BL)
use sense amps for performance
compatible with CMOS technology
DRAM – Dynamic Random Access Memory
periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
small cells (1 to 3 fets/cell) – so more bits/chip
slower – so used for main memories
single ended output (output BL only)
need sense amps for correct operation
not typically compatible with CMOS technology

6 Evolution in DRAM Chip Capacity
[Chart: DRAM chip capacity has grown 4X every 3 years. Process generations (0.13 µm, 0.1 µm, 0.07 µm, ...) are marked, and capacities are compared to familiar quantities: a page, a book, an encyclopedia, 2 hrs of CD audio, 30 sec of HDTV, human DNA, and human memory.]

7 Memory Timing: Approaches
DRAM timing – multiplexed addressing (row and column addresses share the same pins)
SRAM timing – self-timed

8 6-transistor SRAM Storage Cell
[Schematic: cross-coupled inverters (M1–M4) storing Q and !Q, with access transistors M5 and M6 connecting the cell to the bit lines BL and !BL under control of the word line WL.]
Note that it is identical to the register cell from static sequential circuits – cross-coupled inverters
Consumes power only when switching – no standby power (other than leakage) is consumed
The major job of the pullups is to replenish loss due to leakage
Sizing of the transistors is critical
Will cover how the cell works in detail in the next lecture

9 1D Memory Architecture
[Diagram: N words of M bits each, one row of storage cells per word. On the left, N select lines S0 … SN-1 drive the words directly; on the right, a decoder with address inputs A0 … Ak-1 generates the N select lines. Only one select line is active at a time.]
E.g., N = 2^20 (1 Mword, roughly 10^6) means 1 million select signals.
Adding a decoder reduces the number of inputs from 1 million select lines to 20 address lines: N words require N select signals, but the decoder needs only K = log2 N inputs.
Note, we still have to generate the 1 million select lines with a very big decoder (see last lecture).
This scheme, while reducing the number of inputs, leads to very tall and narrow memories (and very slow ones, because of the very long bit lines). The address decoder is also very big (and slow); it is good to pitch-match the decoder and the memory core.
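A quick arithmetic check of the point above, as a minimal Python sketch (the 1 Mword figure is the slide's; the function name is just for illustration):

```python
import math

def address_lines(n_words: int) -> int:
    """Number of address inputs a decoder needs to select one of n_words."""
    return math.ceil(math.log2(n_words))

n_words = 2**20                                         # 1 Mword, as on the slide
print(n_words, "select lines without a decoder")        # 1,048,576
print(address_lines(n_words), "address lines with a decoder")  # 20
```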

10 2D Memory Architecture
[Diagram: a 2^(K-L) x (M·2^L) array of storage (RAM) cells. The upper address bits AL … AK-1 feed a row decoder that drives one of 2^(K-L) word lines (WL); the lower address bits A0 … AL-1 (least significant bits) feed a column decoder that selects the appropriate word from the memory row; sense amplifiers amplify the bit line (BL) swing; read/write circuits connect to the M-bit input/output.]
Put multiple words in one memory row – this splits the decoder into two decoders (row and column) and makes the memory core roughly square, reducing the length of the bit lines (but increasing the length of the word lines).
The lsb part of the address goes into the column decoder. E.g., 6 column bits assign 64 words to one row (with 32 bits per word that gives 2^11 bit line pairs), leaving 14 bits for the row decoder (giving 2^14 word lines), for a not-quite-square array.
The RAM cell needs to be as compact and fast as possible since it is replicated thousands of times in the core array. Often we are willing to trade off noise margins, logic swing, input-output isolation, fan-out, or speed for area.
To speed things up (and reduce power consumption), don't force the bit lines to swing rail-to-rail; sense amplifiers are then needed to restore the signal to full rail-to-rail amplitude.
This scheme is good only for up to 64 Kb to 256 Kb. For bigger memories it is too SLOW because the word and bit lines are too long.
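The bit-count bookkeeping in the example above can be written out explicitly. A minimal sketch (the 1 Mword x 32-bit organization and the 6-bit column address are the slide's numbers; the helper name is illustrative):

```python
def split_2d(total_addr_bits: int, col_bits: int, word_bits: int):
    """Return (word lines, bit-line pairs) for a 2D memory organization."""
    row_bits = total_addr_bits - col_bits
    word_lines = 2**row_bits                     # rows in the core
    bitline_pairs = (2**col_bits) * word_bits    # columns in the core
    return word_lines, bitline_pairs

# 1 Mword x 32 bits, 6 column-address bits (as on the slide)
rows, cols = split_2d(total_addr_bits=20, col_bits=6, word_bits=32)
print(rows, cols)   # 16384 word lines x 2048 bit-line pairs: not quite square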

11 3D (or Banked) Memory Architecture
[Diagram: the address is split into a block address, a row address, and a column address; each block is a 2D array with its own row decoder, column decoder, and sense amplifiers, sharing the M-bit input/output.]
Example: a 1 Mword memory with 32 bits/word – a 2-bit block address; a 6-bit column address giving 2^11 bit line pairs; and a 12-bit row address giving 2^12 word lines, for almost-square memory arrays.
Advantages:
1. Shorter word and bit lines, so faster access
2. The block address activates only 1 block, saving power
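For comparison with the 2D case, a small sketch of the banked split (the 2/6/12-bit partition is from the slide; names are illustrative):

```python
def split_banked(total_addr_bits: int, block_bits: int, col_bits: int, word_bits: int):
    """Return (blocks, word lines per block, bit-line pairs per block)."""
    row_bits = total_addr_bits - block_bits - col_bits
    return 2**block_bits, 2**row_bits, (2**col_bits) * word_bits

blocks, rows, cols = split_banked(20, block_bits=2, col_bits=6, word_bits=32)
print(blocks, rows, cols)  # 4 blocks, each 4096 word lines x 2048 bit-line pairs (almost square)
```

Only the addressed block is activated on an access, which is where the power saving comes from.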

12 2D 4x4 SRAM Memory Bank
[Diagram: a 4x4 array of SRAM cells storing 2-bit words, surrounded by a row decoder driving word lines WL[0]–WL[3], a column decoder (address bit A0), bit-line precharge circuits with a read precharge enable, sense amplifiers on the bit lines BLi and BLi+1, write circuitry, and clocking and control logic.]
To decrease the bit line delay for reads, use low-swing bit lines: the bit lines don't swing rail-to-rail, but are precharged to Vdd (as 1) and discharged to Vdd – 10%·Vdd (as 0). (So for a 2.5V Vdd, a 0 is 2.25V.) This requires sense amplifiers to restore the signal to full swing.
Write circuitry – receives full swing from the sense amps, or writes full swing onto the bit lines.

13 Quartering Gives Shorter WLs and BLs
[Diagram: the core is split into quadrants, with the row decoder in the middle (driven by AN-1 … Ai) and a column decoder, read precharge circuits, sense amps, and write circuitry for each half (driven by Ai-1 … A0), sharing the data lines.]
In reality, memory is designed in quadrants (hence the 4X growth every generation). This configuration halves the length of the word and bit lines (for increased speed).

14 Decreasing Word Line Delay
Drive the word line from both sides – a WL driver at each end of the polysilicon word line. Driving from both sides reduces the worst-case delay of the word line by about 4x (like buffer insertion to reduce RC delay).
Use a metal bypass – a metal word line runs in parallel with the polysilicon word line and straps it periodically.
Use silicides.
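A quick way to see the 4x figure: the worst-case delay of a distributed RC line grows roughly with the square of its length, and driving from both ends halves the length each driver sees. A minimal sketch with illustrative per-cell R and C values (not the slide's numbers):

```python
def elmore_delay(n_cells: int, r_cell: float, c_cell: float) -> float:
    """Elmore delay of an RC ladder of n_cells segments driven from one end."""
    # node k sees resistance k*r_cell back to the driver; sum over all node caps
    return sum(k * r_cell * c_cell for k in range(1, n_cells + 1))

N, R, C = 512, 100.0, 1e-15            # illustrative per-cell values
single_ended = elmore_delay(N, R, C)
# driving from both ends: the worst-case node sits at the middle, approximated
# here as a half-length line driven from one end
both_ends = elmore_delay(N // 2, R, C)
print(both_ends / single_ended)        # ~0.25, i.e. roughly a 4x reduction
```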

15 Read Only Memories (ROMs)
A memory that can only be read and never altered
Programs for fixed applications that, once developed and debugged, never need to be changed, only read
Fixing the contents at manufacturing time leads to small and fast implementations.
[Diagram: the two basic ROM cells – with the word line WL activated, one cell type produces BL = 1, the other BL = 0.]

16 MOS OR ROM Cell Array
[Schematic: a 4x4 MOS OR ROM array with bit lines BL(0)–BL(3), word lines WL(0)–WL(3), shared VDD lines, and predischarged bit lines; a cell transistor, when its word line turns it on, connects its bit line toward VDD.]
For class handout – skipped in class (would make a good basis for an exam question).

17 MOS OR ROM Cell Array
[Schematic: the same 4x4 OR ROM array, annotated with the stored data (1 0 0 1, 0 0 0 0); bit lines are predischarged to 0, and cells with a transistor pull their bit line up toward VDD (a 1, reduced by the threshold drop) when their word line is activated.]
Notice how the overhead of the supply lines is reduced by sharing them between neighboring cells. This requires mirroring the odd cells around the horizontal axis, an approach that is used extensively in memory cores of all styles.
What are the values of the data stored? Beware of the threshold drop.

18 Precharged MOS NOR ROM
[Schematic: a precharged MOS NOR ROM – precharge devices from VDD on bit lines BL(0)–BL(3); a row decoder (inputs A1, A2, enable) driving word lines WL(0)–WL(3); cell transistors to GND at programmed locations.]
For class handout.

19 Precharged MOS NOR ROM
[Schematic: the same precharged NOR ROM, annotated with a read – the bit lines precharge toward 1, the row decoder (A1, A2, enable) activates one word line, and the cells that have transistors pull their bit lines to 0 through GND.]
For lecture. There are lots of ways to build ROMs; this is just the structure of choice.
First precharge all bit lines to 1 (making sure all word lines are inactive (0) – can be done with a decoder that has an enable input), then activate the appropriate word line; the existing fets selectively discharge their bit lines low. So no fet means a 1 is stored, a fet means a 0 is stored.
Note that there is only one pull-down in series in the network, so it's fast!
What is stored in this ROM is the inverse of what is stored in the OR ROM.
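The read sequence above can be captured in a tiny behavioral model. This is an illustrative sketch, not circuit-accurate, and the 4x4 programming pattern is made up:

```python
# Behavioral model of a precharged NOR ROM read: a transistor at (row, col)
# discharges the precharged bit line, so "transistor present" stores a 0.
transistors = [
    [True,  False, False, True ],   # made-up programming of a 4x4 array
    [False, True,  False, False],
    [False, False, True,  False],
    [True,  True,  False, False],
]

def read_word(row: int) -> list[int]:
    bit_lines = [1] * len(transistors[0])             # precharge all bit lines to 1
    for col, has_fet in enumerate(transistors[row]):  # activate one word line
        if has_fet:
            bit_lines[col] = 0                        # fet pulls the bit line low
    return bit_lines

print(read_word(0))   # [0, 1, 1, 0]
```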

20 MOS NOR ROM Layout 1
Memory is programmed by adding transistors where needed (ACTIVE mask – early in the fab process).
Cell size of 9.5λ x 7λ.
[Layout: word lines WL(0)–WL(3) in polysilicon, bit lines in metal1 on top of diffusion, and a GND line; each cell has a diffusion run under the metal1 to GND.]

21 MOS NOR ROM Layout 2
Memory is programmed by adding contacts where needed (CONTACT mask – one of the last processing steps). All transistors are fabricated; the presence of a metal contact creates a 0-cell.
Cell size of 11λ x 7λ.
[Layout: word lines WL(0)–WL(3), GND runs, and bit-line contacts.]
Selective addition of metal to diffusion – add contacts to "program". Wafers can be prefabricated up to the CONTACT mask and stockpiled.
Note that ground is run in diffusion (not great) and is connected to a vertical diffusion run underneath the m1 strip, done for the sake of space. A metal bypass with regularly-spaced straps keeps the voltage drop over the ground wire within bounds.
A large part of the cell is devoted to the bit line contact and the ground connection. The pull-down fets are 4/2 transistors.

22 MOS NAND ROM
[Schematic: pull-up devices from VDD on bit lines BL[0]–BL[3]; each bit line is a series (NAND) chain of cell transistors gated by word lines WL[0]–WL[3].]
All word lines are high by default, with the exception of the selected row.

23 MOS NAND ROM Layout
Cell (8λ x 7λ)
Programming using the Metal-1 layer only
No contact to VDD or GND necessary
Drastically reduced cell size, at a loss in performance compared to the NOR ROM
[Layout legend: polysilicon, diffusion, metal1 on diffusion]

24 NAND ROM Layout – Programming using Implants Only
Cell (5λ x 6λ)
A threshold-lowering implant using n-type impurities turns the device into one that is always on – effectively a short.
[Layout legend: polysilicon, threshold-altering implant, metal1 on diffusion]

25 Transient Model for 512x512 NOR ROM
[Model: the word line WL is a distributed RC line in poly (per-cell r_word and c_word); the bit line BL in metal1 is a lumped capacitance C_bit, driven through the precharge device.]
Word line parasitics (distributed RC model): resistance/cell: … Ω; wire capacitance/cell: … fF; gate capacitance/cell: … fF.
Bit line parasitics (lumped C model): resistance/cell: … Ω (which is negligible); wire capacitance/cell: … fF; drain capacitance/cell: 0.8 fF.
Transient response – time from word line activation to the bit line traversing the voltage swing (typically 10% of Vdd). Most of the delay is attributable to interconnect parasitics. The WL is best modeled as a distributed RC line since it is implemented in poly with a relatively high sheet resistance (silicided poly would be advisable). The BL is implemented in metal1, so a capacitive model can be used and all capacitive loads can be lumped into a single element.
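The two models above can be turned into rough delay estimates. A minimal sketch: only the 0.8 fF drain capacitance comes from the slide; the other per-cell values and the pull-down resistance are assumptions for illustration.

```python
import math

# Word line: distributed RC line of 512 cells -> Elmore delay ~ r*c*N*(N+1)/2.
# Bit line: lumped capacitance of 512 cells discharged through one pull-down fet
# until it has dropped by the 10%-of-Vdd swing mentioned on the slide.
N = 512
r_word, c_word = 10.0, 1e-15   # assumed per-cell poly resistance / capacitance
c_bit_cell = 0.8e-15           # drain capacitance per cell (from the slide)
r_pulldown = 10e3              # assumed on-resistance of a single cell fet

t_wordline = r_word * c_word * N * (N + 1) / 2
t_bitline = r_pulldown * (c_bit_cell * N) * math.log(1 / 0.9)
print(f"word line ~{t_wordline*1e9:.2f} ns, bit line ~{t_bitline*1e9:.2f} ns")
```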

26 Transient Model for 512x512 MOS NAND ROM
[Model: the NAND ROM bit line is a series chain of cell transistors between the VDD pull-up and ground, with per-cell resistance r_bit and capacitance c_bit; the word line has per-cell r_word and c_word as in the NOR ROM.]
Word line parasitics – similar to the NOR ROM.
Bit line parasitics – the resistance of the cascaded transistors dominates; the capacitance includes drain/source and the complete gate capacitance. Resistance/cell: 8.7 kΩ (compared to the negligible bit-line resistance in the NOR ROM).
Speed: NOR: tLH = 1.87 ns; NAND: tLH = 1.2 µs.

27 Next Lecture and Reminders
SRAM, DRAM, and CAM cores
Reading assignment – Rabaey, et al.

