CSE477 VLSI Digital Circuits Fall 2003 Lecture 23: Semiconductor Memories


CSE477 VLSI Digital Circuits, Fall 2003
Lecture 23: Semiconductor Memories
Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Review: Basic Building Blocks
- Datapath
  - Execution units: adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
- Control
  - Finite state machines (PLA, ROM, random logic)
- Interconnect
  - Switches, arbiters, buses
- Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

Memory Definitions
- Size: Kbytes, Mbytes, Gbytes, Tbytes
- Speed
  - Read access: delay between a read request and the data becoming available
  - Write access: delay between a write request and the writing of the data into the memory
  - (Read or write) cycle: minimum time required between successive reads or writes
[Figure: timing diagrams. Read: read cycle and read access time, from Read to Data Valid. Write: write cycle, write setup, and write access time, from Write to Data Written.]

A Typical Memory Hierarchy
By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology.
[Figure: on-chip components (control; datapath with RegFile; ITLB and DTLB; instruction and data caches; eDRAM), followed by the second level cache (SRAM), main memory (DRAM), and secondary memory (disk)]
Speed (ns): .1’s, 1’s, 10’s, 100’s, 1,000’s
Size (bytes): 100’s, K’s, 10K’s, M’s, T’s
Cost: highest (on chip) to lowest (disk)
Notes: The design goal is to present the user with as much memory as is available in the cheapest technology (the disk). What makes this work is one of the most important principles in computer design: the principle of locality. By taking advantage of it, we can give the user an average access speed that is very close to the speed offered by the fastest technology.

More Memory Definitions
- Function: functionality, nature of the storage mechanism
  - static and dynamic; volatile and nonvolatile (NV); read only (ROM)
- Access pattern: random, serial, content addressable
- Input-output architecture: number of data input and output ports (multiported memories)
- Application: embedded, secondary, tertiary
Classification:
- Read Write Memories (RWM)
  - Random access: SRAM (cache, register file), DRAM (main memory)
  - Non-random access: CAM; FIFO, LIFO; shift register
- NVRWM: EPROM, EEPROM, FLASH
- ROM: mask-programmed ROM, electrically-programmed PROM

Random Access Read Write Memories (RWMs)
SRAM (Static Random Access Memory)
- data is stored as long as the supply is applied
- large cells (6 fets/cell), so fewer bits/chip
- fast, so used where speed is important (e.g., caches)
- differential outputs (BL and !BL)
- uses sense amps for performance
- compatible with CMOS technology
DRAM (Dynamic Random Access Memory)
- periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
- small cells (1 to 3 fets/cell), so more bits/chip
- slower, so used for main memories
- single-ended output (BL only)
- needs sense amps for correct operation
- not typically compatible with CMOS technology

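Evolution in DRAM Chip Capacity
4X growth every 3 years!
[Figure: DRAM chip capacity over time, with technology generations labeled 1.6-2.4 µm, 1.0-1.2 µm, 0.7-0.8 µm, 0.5-0.6 µm, 0.35-0.4 µm, 0.18-0.25 µm, 0.13 µm, 0.1 µm, and 0.07 µm. Capacity reference points marked on the plot include: page, book, encyclopedia, 2 hrs CD audio, 30 sec HDTV, human DNA, human memory.]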

6-transistor SRAM Storage Cell
[Figure: 6T cell with word line WL, bit lines BL and !BL, transistors M1 to M6, and storage nodes Q and !Q]
Notes:
- The cell is identical to the register cell from static sequential circuits: cross-coupled inverters.
- It consumes power only when switching; no standby power (other than leakage) is consumed.
- The major job of the pull-ups is to replenish charge lost to leakage.
- Sizing of the transistors is critical.
- How the cell works will be covered in detail in the next lecture.

1D Memory Architecture
[Figure, left: N words of M bits each (Word 0 to Word N-1), built from storage cells, with select signals S0 to SN-1 and a shared Input/Output. N words means N select signals.]
[Figure, right: the same array with a decoder driven by address lines A0 to AK-1 generating S0 to SN-1. The decoder reduces the number of inputs to K = log2 N.]
Notes: Only one select line is active at a time. For example, N = 2**20 (1 Mword, about 10**6) means about 1 million select signals. Adding a decoder reduces the number of inputs from 1 million to 20 (address lines). Note that we still have to generate 1 million select lines with a very big decoder (see last lecture). The scheme on the right, while reducing the number of inputs, leads to very tall and narrow memories (and very slow ones, because of the very long bit lines). The address decoder is also very big (and slow); it is good to try to pitch match the decoder and the memory core.
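The relationship K = log2 N can be checked with a short Python sketch. The function names `address_bits` and `decode` are illustrative and not from the lecture, and the decoder is modeled behaviorally as a one-hot list rather than as a gate-level circuit.

```python
def address_bits(n_words: int) -> int:
    """Number of address lines K needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float error.
    return (n_words - 1).bit_length()


def decode(address: int, k: int) -> list[int]:
    """Behavioral K-to-2**K decoder: exactly one select line is active."""
    if not 0 <= address < (1 << k):
        raise ValueError("address out of range for a k-bit decoder")
    return [1 if i == address else 0 for i in range(1 << k)]


if __name__ == "__main__":
    print(address_bits(2 ** 20))  # address lines for a 1 Mword memory
    print(decode(2, 2))           # select lines S0..S3 for address 2
```

The decoder still produces 2**K outputs, which is why the notes stress that it remains a large circuit even though the number of external inputs shrinks to K.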

2D Memory Architecture
[Figure: a core array of storage (RAM) cells with 2**(K-L) word lines (WL) and M x 2**L bit lines (BL). The row address (AL to AK-1) drives the row decoder. The column address (A0 to AL-1, the least significant bits) drives the column decoder, which selects the appropriate word from the memory row. Sense amplifiers amplify the bit line swing; read/write circuits connect to the Input/Output (M bits).]
Notes: Putting multiple words in one memory row splits the decoder into two decoders (row and column) and makes the memory core square, reducing the length of the bit lines (but increasing the length of the word lines). The lsb part of the address goes into the column decoder: e.g., 6 bits so that 64 words are assigned to one row (which, with 32 bits per word, gives 2**11 bit line pairs), leaving 14 bits for the row decoder (giving 2**14 word lines) for a not quite square array. The RAM cell needs to be as compact and fast as possible since it is replicated thousands of times in the core array. Designers are often willing to trade off noise margins, logic swing, input-output isolation, fan-out, or speed for area. To speed things up (and reduce power consumption), the bit lines are not forced to swing rail-to-rail, so sense amplifiers are needed to restore the signal to full rail-to-rail amplitude. This scheme is good only for memories up to 64 Kb to 256 Kb. For bigger memories it is too slow because the word and bit lines are too long.
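As a rough illustration of the row/column split, the sketch below assumes a 20-bit word address (6 column bits plus 14 row bits, as in the notes) and 32 bits per word. The function names and default parameters are hypothetical, chosen only to reproduce the numbers quoted on the slide.

```python
def split_address(addr: int, total_bits: int = 20, col_bits: int = 6) -> tuple[int, int]:
    """Split a word address into (row, column); the column uses the LSBs."""
    if not 0 <= addr < (1 << total_bits):
        raise ValueError("address out of range")
    row = addr >> col_bits                 # upper bits go to the row decoder
    col = addr & ((1 << col_bits) - 1)     # lower bits go to the column decoder
    return row, col


def array_shape(total_bits: int = 20, col_bits: int = 6, word_bits: int = 32) -> tuple[int, int]:
    """Return (number of word lines, number of bit line pairs) of the core."""
    word_lines = 1 << (total_bits - col_bits)
    bit_line_pairs = (1 << col_bits) * word_bits
    return word_lines, bit_line_pairs


if __name__ == "__main__":
    print(split_address((10 << 6) | 3))  # row 10, column 3
    print(array_shape())                 # 2**14 word lines, 2**11 bit line pairs
```

With these assumptions the array is 2**14 rows by 2**11 columns, which is the "not quite square" shape the notes describe.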

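3D (or Banked) Memory Architecture
[Figure: several memory blocks, each with its own row and column decoding, selected by a block address; the address (A0, A1, ...) is split into row address, column address, and block address; Input/Output (M bits)]
Example: a 1M word memory with 32 bits/word uses a 2-bit block address, a 6-bit column address (giving 2**11 bit line pairs), and a 12-bit row address (giving 2**12 word lines), for almost square memory arrays.
Advantages:
1. Shorter word and bit lines, so faster access
2. The block address activates only 1 block, saving power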
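The 1M word example can be sketched in Python as follows. The bit ordering is an assumption made for illustration (block address in the most significant bits, column address in the least significant bits); the slide does not state which address bits select the block. All names are hypothetical.

```python
BLOCK_BITS, ROW_BITS, COL_BITS, WORD_BITS = 2, 12, 6, 32


def split_banked_address(addr: int) -> tuple[int, int, int]:
    """Split a 20-bit word address into (block, row, column).

    Assumed layout (not given in the lecture): [block | row | column].
    """
    if not 0 <= addr < (1 << (BLOCK_BITS + ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    block = addr >> (COL_BITS + ROW_BITS)
    return block, row, col


def block_enables(block: int) -> list[int]:
    """One-hot block enables: only the addressed block is activated."""
    return [int(b == block) for b in range(1 << BLOCK_BITS)]


def block_geometry() -> tuple[int, int]:
    """Return (word lines, bit line pairs) inside one block."""
    return 1 << ROW_BITS, (1 << COL_BITS) * WORD_BITS


if __name__ == "__main__":
    print(split_banked_address((3 << 18) | (5 << 6) | 7))
    print(block_enables(2))
    print(block_geometry())
```

Each block is 2**12 word lines by 2**11 bit line pairs, so four blocks together hold 2**20 words of 32 bits while every individual word line and bit line stays short.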

2D 4x4 SRAM Memory Bank
[Figure: 4x4 SRAM bank with 2-bit words. A row decoder (address input A2 shown) drives word lines WL[0] to WL[3]; a column decoder (address input A0) selects among the bit lines BLi, BLi+1. Also shown: the read precharge circuit with the bit line precharge enable, the sense amplifiers, the write circuitry, and clocking and control.]
Notes: To decrease the bit line delay for reads, use low swing bit lines: the bit lines don’t swing rail-to-rail, but are precharged to Vdd (read as 1) and discharge to Vdd - 10% Vdd (read as 0). (So for a 2.5 V Vdd, a 0 is 2.25 V.) This requires sense amplifiers to restore the signal to full swing. The write circuitry either receives full swing from the sense amps or writes full swing to the bit lines.

Quartering Gives Shorter WLs and BLs
[Figure: memory split into quadrants around a central row decoder (AN-1 … Ai) and column decoders (Ai-1 … A0); each half has its own read precharge circuit, sense amps, and write circuitry; data]
Notes: In reality, memory is designed in quadrants (hence the 4x growth every generation). This configuration halves the length of the word and bit lines (for increased speed).

Decreasing Word Line Delay
- Drive the word line from both sides
  [Figure: polysilicon word line WL with a driver at each end, fed by a metal word line]
- Use a metal bypass
  [Figure: polysilicon word line WL strapped to a metal bypass]
- Use silicides
Notes: Driving from both sides reduces the worst-case delay of the word line by a factor of 4 (like buffer insertion to reduce RC delay).
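The factor of 4 follows from the quadratic length dependence of a distributed RC line. A minimal sketch, assuming the 0.38 r c M**2 delay expression and the per-cell word line parasitics quoted later in this lecture for the 512x512 NOR ROM, and assuming that a line driven from both ends behaves (by symmetry) like two lines of half the length:

```python
def distributed_rc_delay(r_per_cell: float, c_per_cell: float, n_cells: int) -> float:
    """Delay of a distributed RC line: 0.38 * r * c * M**2 (seconds)."""
    return 0.38 * r_per_cell * c_per_cell * n_cells ** 2


# Word line parasitics per cell from the NOR ROM transient model slide.
R_WORD = 17.5                      # ohms per cell
C_WORD = (0.049 + 0.75) * 1e-15    # farads per cell (wire + gate)
M = 512                            # cells along the word line

# Driven from one side: the far end is M cells away.
one_side = distributed_rc_delay(R_WORD, C_WORD, M)
# Driven from both sides: the worst-case point is the middle, M/2 cells away.
both_sides = distributed_rc_delay(R_WORD, C_WORD, M // 2)

if __name__ == "__main__":
    print(f"one side:   {one_side * 1e9:.2f} ns")
    print(f"both sides: {both_sides * 1e9:.2f} ns")
    print(f"ratio:      {one_side / both_sides:.1f}")
```

Halving the effective length quarters the delay, which is the same reasoning behind partitioning the array into smaller blocks.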

Decreasing Bit Line Delay (and Energy)
- Reduce the bit line voltage swing
  - need a sense amp for each column to sense/restore the signal
- Isolate the sense amps from the bit lines after sensing (to prevent the sense amps from changing the bit line voltage further): bit line isolation
- Isolate the memory cells from the bit lines after sensing (to prevent the memory cells from changing the bit line voltage further): pulsed word line
  - generation of the word line pulses is very critical
    - too short: sense amp operation may fail
    - too long: power efficiency is degraded (because the bit line swing depends on the duration of the word line pulse)
  - use a feedback signal from the bit lines

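Bit Line Isolation
[Figure: bit lines BL and !BL with a swing of ΔV = 0.1 Vdd, connected through isolation devices (isolate signal) to the read sense amplifier and the sense amplifier outputs]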

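Pulsed Word Line
[Figure: the word line WL from the row decoder drives the regular columns (BL, !BL) and a dummy column; the dummy bit line generates the Done signal that ends the Read pulse. Waveforms: Read, Done, WL, BL with ΔV = 0.1 Vdd, and Dummy BL with ΔV = Vdd.]
- Dummy column
  - cells tied to a fixed value (0)
  - 10% populated, so its capacitance is 10% of a regular column’s
- The dummy BL reaches full swing and triggers Done (1 → 0) when the regular BLs reach 10% swing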
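Why a 10% populated dummy column times a 10% swing can be seen from a simple constant-current model, sketched below. This model (ΔV = I t / C, with the same cell current on the dummy and regular bit lines) is a simplifying assumption, and the cell current value is hypothetical; the supply and bit line capacitance are borrowed from examples elsewhere in the lecture.

```python
def time_to_swing(delta_v: float, capacitance: float, cell_current: float) -> float:
    """Time for a constant cell current to move a bit line by delta_v."""
    return delta_v * capacitance / cell_current


def swing_after(t: float, capacitance: float, cell_current: float) -> float:
    """Bit line voltage change after time t under a constant cell current."""
    return cell_current * t / capacitance


VDD = 2.5                     # volts (supply used in the low-swing example)
C_REGULAR = 0.46e-12          # farads, regular bit line (NOR ROM example)
C_DUMMY = 0.1 * C_REGULAR     # dummy column is 10% populated
I_CELL = 100e-6               # amps, hypothetical cell discharge current

# The dummy bit line reaching full swing is what triggers Done.
t_done = time_to_swing(VDD, C_DUMMY, I_CELL)
# Swing on a regular bit line at that same moment.
regular_swing = swing_after(t_done, C_REGULAR, I_CELL)

if __name__ == "__main__":
    print(f"Done fires after {t_done * 1e9:.2f} ns")
    print(f"regular BL swing: {regular_swing:.3f} V ({regular_swing / VDD:.0%} of Vdd)")
```

In this model the regular swing at the Done instant is Vdd times C_DUMMY / C_REGULAR regardless of the cell current, which is why the scheme tracks process and temperature variations.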

Read Only Memories (ROMs)
- A memory that can only be read and never altered
  - Used for programs for fixed applications that, once developed and debugged, never need to be changed, only read
- Fixing the contents at manufacturing time leads to small and fast implementations
[Figure: ROM cell variants connecting a word line WL to a bit line BL, storing BL = 1 or BL = 0]

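MOS OR ROM Cell Array (class handout version)
[Figure: OR ROM array with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), shared VDD supply lines, and predischarge devices on the bit lines]
Notes: For the class handout; skip covering this one in class (would make a good basis for an exam question).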

MOS OR ROM Cell Array
[Figure: OR ROM array with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), shared VDD supply lines, and predischarge devices on the bit lines. Annotations for a read: predischarge 1 → 0, WL(1) driven to 1 with two transistors on, bit line values 0 0 0 0 and 1 0 0 1.]
Notes: Notice how the overhead of the supply lines is reduced by sharing them between neighboring cells. This requires mirroring the odd cells around the horizontal axis, an approach that is used extensively in memory cores of all styles. What are the values of the data stored? Beware of the threshold drop.

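Precharged MOS NOR ROM (class handout version)
[Figure: NOR ROM array with a row decoder (A1, A2, enable) driving word lines WL(0) to WL(3), bit lines BL(0) to BL(3), shared GND lines, and precharge devices to VDD]
Notes: For the class handout.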

Precharged MOS NOR ROM
[Figure: NOR ROM array with a row decoder (A1, A2, enable) driving word lines WL(0) to WL(3), bit lines BL(0) to BL(3), shared GND lines, and precharge devices to VDD. Annotations for a read: WL(1) driven to 1 with two transistors on, bit line values 1 1 1 1 and 0 1 1 0.]
Notes: For lecture. There are lots of ways to build ROMs; this is just the structure of choice. First precharge all bit lines to 1 (making sure all word lines are inactive (0), which can be done with an enabled decoder), then activate the appropriate word line; the fets that exist selectively discharge their bit lines low. So no fet means a 1 is stored, and a fet means a 0 is stored. Note that there is only one pull-down in series in the network, so it’s fast! What is stored in this ROM is the inverse of what is stored in the OR ROM.
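The precharge-then-discharge read sequence can be sketched behaviorally in Python. The transistor map below is hypothetical: only the row for WL(1) is chosen to match the 0 1 1 0 annotation in the figure, and the other rows are made up for illustration.

```python
# FETS[r][c] is True where a pull-down fet connects BL(c) to GND under WL(r).
# Hypothetical programming; row 1 is chosen to read back as 0 1 1 0.
FETS = [
    [False, True, False, False],   # WL(0)
    [True, False, False, True],    # WL(1)
    [False, False, True, False],   # WL(2)
    [True, True, True, True],      # WL(3)
]


def read_nor_rom(fets: list[list[bool]], row: int) -> list[int]:
    """Read one word from a precharged NOR ROM."""
    n_cols = len(fets[row])
    # Step 1: precharge every bit line to 1 while all word lines are low.
    bit_lines = [1] * n_cols
    # Step 2: raise the selected word line; each existing fet discharges
    # its bit line, so a fet stores a 0 and a missing fet stores a 1.
    for col, has_fet in enumerate(fets[row]):
        if has_fet:
            bit_lines[col] = 0
    return bit_lines


def read_or_rom(fets: list[list[bool]], row: int) -> list[int]:
    """Same transistor map read as an OR ROM: a fet pulls its bit line to 1."""
    return [1 if has_fet else 0 for has_fet in fets[row]]


if __name__ == "__main__":
    print(read_nor_rom(FETS, 1))
    print(read_or_rom(FETS, 1))
```

For the same transistor placement the two reads are bitwise complements, which is the sense in which the NOR ROM stores the inverse of the OR ROM.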

MOS NOR ROM Layout 1
- Memory is programmed by adding transistors where needed (ACTIVE mask, early in the fab process)
- Cell size of 9.5 λ x 7 λ
[Figure: layout with word lines WL(0) to WL(3), shared GND lines, and metal1 running on top of diffusion]

MOS NOR ROM Layout 2
- Memory is programmed by adding contacts where needed (CONTACT mask, one of the last processing steps)
  - the presence of a metal contact creates a 0-cell
- Cell size of 11 λ x 7 λ
[Figure: layout with word lines WL(0) to WL(3) and shared GND lines]
Notes: Selective addition of metal-to-diffusion contacts “programs” the memory. Wafers can be prefabricated up to the CONTACT mask and stockpiled. Note that ground is run in diffusion (not great) and is connected to a vertical diffusion run underneath the m1 strip; this is done for the sake of space. A metal bypass with regularly spaced straps keeps the voltage drop over the ground wire within bounds. A large part of the cell is devoted to the bit line contact and the ground connection. Pull-down fets are 4/2 transistors.

Transient Model for 512x512 NOR ROM
[Figure: model of one cell with the precharge device, the metal1 bit line BL with capacitance Cbit, and the poly word line WL with rword and cword per cell]
Word line parasitics (distributed RC model)
- Resistance/cell: 17.5 Ω
- Wire capacitance/cell: 0.049 fF
- Gate capacitance/cell: 0.75 fF
Bit line parasitics (lumped C model)
- Resistance/cell: 0.275 Ω (which is negligible)
- Wire capacitance/cell: 0.09 fF
- Drain capacitance/cell: 0.8 fF
Notes: The transient response is the time from word line activation to the bit line traversing its voltage swing (typically 10% of Vdd). Most of the delay is attributable to interconnect parasitics. The WL is best modeled as a distributed RC line since it is implemented in poly with a relatively high sheet resistance (silicided poly would be advisable). The BL is implemented in metal1, so a capacitive model can be used and all capacitive loads can be lumped into a single element.

Propagation Delay of 512x512 NOR ROM
Word line delay
- Delay of a distributed rc-line containing M cells:
  $t_{word} = 0.38\,(r_{word} \times c_{word})\,M^2 = 0.38 \times (17.5\ \Omega \times (0.049 + 0.75)\ \mathrm{fF}) \times 512^2 = 1.4$ nsec
Bit line delay
- Assuming a (0.5/0.25) pull-down and a (1.3125/0.25) pull-up with reduced swing bit lines (2.5 V to 1.5 V):
  $C_{bit} = 512 \times (0.8 + 0.09)\ \mathrm{fF} = 0.46$ pF
  $t_{HL} = 0.69\,(13\ \mathrm{k\Omega}/2 \parallel 31\ \mathrm{k\Omega}/5.25) \times 0.46\ \mathrm{pF} = 0.98$ nsec
  $t_{LH} = 0.69\,(31\ \mathrm{k\Omega}/5.25) \times 0.46\ \mathrm{pF} = 1.87$ nsec
Notes: The word line delay grows with M**2 (a quadratic term, since there are no buffers in the word line), so that tword = 1.4 nsec. Word line delay dominates due to the large resistance of the poly wire. Driving the line from both sides and using metal bypass lines (global word lines) would help speed it up. But the most effective approach is to carefully partition the memory into sub-blocks of adequate size that balance word and bit line delay. Partitioning also helps to reduce the energy consumption attributed to driving and switching the word lines. The bit line delay can be reduced as well by further reducing the voltage swing on the BL and letting the SA restore the output signal to full swing. Voltage swings of around 0.5 V are quite common.
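The arithmetic on this slide can be reproduced with a few lines of Python. The function names are illustrative; the parameter values are the ones quoted on the slide. The results agree with the slide to within the rounding of Cbit to 0.46 pF.

```python
def word_line_delay(r_cell: float = 17.5,
                    c_cell: float = (0.049 + 0.75) * 1e-15,
                    m: int = 512) -> float:
    """Distributed RC word line delay: 0.38 * r * c * M**2 (seconds)."""
    return 0.38 * r_cell * c_cell * m ** 2


def bit_line_capacitance(c_cell: float = (0.8 + 0.09) * 1e-15, n: int = 512) -> float:
    """Lumped bit line capacitance: drain plus wire capacitance of n cells."""
    return n * c_cell


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


R_PULL_DOWN = 13e3 / 2      # ohms, 13 kOhm scaled by W/L = 0.5/0.25 = 2
R_PULL_UP = 31e3 / 5.25     # ohms, 31 kOhm scaled by W/L = 1.3125/0.25 = 5.25

c_bit = bit_line_capacitance()
t_word = word_line_delay()
# High-to-low: the slide uses the pull-down in parallel with the pull-up.
t_hl = 0.69 * parallel(R_PULL_DOWN, R_PULL_UP) * c_bit
# Low-to-high: the pull-up alone charges the bit line.
t_lh = 0.69 * R_PULL_UP * c_bit

if __name__ == "__main__":
    for name, value in (("t_word", t_word), ("t_HL", t_hl), ("t_LH", t_lh)):
        print(f"{name} = {value * 1e9:.2f} ns")
    print(f"C_bit = {c_bit * 1e12:.2f} pF")
```

Changing `m` in `word_line_delay` shows how quickly the quadratic term grows, which is the motivation for the partitioning advice in the notes.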

Nonvolatile Read-Write Memories (NVRWM)
Table columns (as labeled on the slide): # Trans/Cell; Cell Area; Mechanism; Power Supply; Program/Erase Cycles; with sub-columns Erase, Write, Read
- MASK ROM: 1T; 0.35-5; VDD
- EPROM: 1; UV; Hot e; VPP; ~100
- EEPROM: 2T; 3-5; FN; VPP (int); 10**4 to 10**5
- FLASH: 1-2
Legend: UV = ultraviolet light exposure; FN = Fowler-Nordheim tunneling; Hot e = avalanche hot-electron injection; VDD = 3.3 or 5 V; VPP = 12 or 12.5 V

Next Lecture and Reminders
Next lecture
- SRAM, DRAM, and CAM cores
  - Reading assignment: Rabaey, et al., 12.2.3-12.2.4
Reminders
- HW#5 (optional) is due December 2nd
- Project final reports are due December 4th
- Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
- Final exam is scheduled for Tuesday, December 16th, from 10:10 to noon in 118 and 113 Thomas