CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memory, ROM


CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memory, ROM [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Memory Definitions
Size: Kbytes, Mbytes, Gbytes, Tbytes
Speed:
  Read access: delay between the read request and the data becoming available
  Write access: delay between the write request and the writing of the data into the memory
  (Read or write) cycle: minimum time required between successive reads or writes
[Timing diagram: a read cycle containing the read access time, ending at "data valid"; a write cycle containing the write setup and write access time, ending at "data written"]

A Typical Memory Hierarchy
By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology.
[Figure: on-chip components (control; datapath with RegFile; ITLB and DTLB; instruction cache and data cache); second level cache (SRAM); eDRAM; main memory (DRAM); secondary memory (disk)]
From the level closest to the datapath to the level farthest from it:
  Speed (ns): 0.1's, 1's, 10's, 100's, 1,000's
  Size (bytes): 100's, K's, 10K's, M's, T's
  Cost: highest to lowest
Notes: The design goal is to present the user with as much memory as is available in the cheapest technology (the disk). What makes this work is one of the most important principles in computer design: the principle of locality. By taking advantage of locality, we can give the user an average access speed that is very close to the speed offered by the fastest technology.
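The claim that locality brings the average access speed close to that of the fastest level can be illustrated with a two-level average access time calculation. This is only a sketch: the function name, the 1 ns and 100 ns latencies, and the 2% miss rate are hypothetical values chosen for illustration and do not come from the lecture.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average access time of a fast level backed by one slower level."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical numbers: 1 ns fast level, 100 ns slow level, 2% of accesses miss.
amat = average_access_time(hit_time_ns=1.0, miss_rate=0.02, miss_penalty_ns=100.0)
print(f"average access time = {amat:.1f} ns")
```

With these assumed numbers the average stays within a few nanoseconds of the fast level even though the slow level is 100 times slower, because locality keeps the miss rate small.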

More Memory Definitions
Function: functionality and nature of the storage mechanism: static and dynamic; volatile and nonvolatile (NV); read only (ROM)
Access pattern: random, serial, content addressable
Input-output architecture: number of data input and output ports (multiported memories)
Application: embedded, secondary, tertiary
Classification:
  Read Write Memories (RWM)
    Random access: SRAM (cache, register file), DRAM (main memory)
    Non-random access: FIFO, LIFO, shift register, CAM
  NVRWM: EPROM, EEPROM, FLASH
  ROM: mask-programmed ROM, electrically programmable PROM

Random Access Read Write Memories
SRAM: Static Random Access Memory
  data is stored as long as supply is applied
  large cells (6 fets/cell), so fewer bits/chip
  fast, so used where speed is important (e.g., caches)
  differential outputs (output BL and !BL)
  use sense amps for performance
  compatible with CMOS technology
DRAM: Dynamic Random Access Memory
  periodic refresh required (every 1 to 4 ms) to compensate for the charge loss caused by leakage
  small cells (1 to 3 fets/cell), so more bits/chip
  slower, so used for main memories
  single ended output (output BL only)
  need sense amps for correct operation
  not typically compatible with CMOS technology

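Evolution in DRAM Chip Capacity
4X growth every 3 years!
[Figure: DRAM chip capacity over time. Technology generations marked along the curve: 1.6-2.4 µm, 1.0-1.2 µm, 0.7-0.8 µm, 0.5-0.6 µm, 0.35-0.4 µm, 0.18-0.25 µm, 0.13 µm, 0.1 µm, 0.07 µm. Capacity reference points: page, book, encyclopedia, 2 hrs CD audio, 30 sec HDTV, human memory, human DNA.]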

Memory Timing: Approaches
DRAM timing: multiplexed addressing
SRAM timing: self-timed

6-transistor SRAM Storage Cell
[Figure: six-transistor cell with transistors M1 to M6, storage nodes Q and !Q, word line WL, and bit lines BL and !BL]
Notes: Note that it is identical to the register cell from static sequential circuits: cross-coupled inverters. It consumes power only when switching; no standby power (other than leakage) is consumed. The major job of the pull-ups is to replenish loss due to leakage. Sizing of the transistors is critical. How the cell works is covered in detail in the next lecture.

1D Memory Architecture
[Figure, left: N words of M bits each (Word 0 to Word N-1), each row of storage cells selected by its own select signal S0 to SN-1, with a shared input/output. Right: the same array with a decoder generating S0 to SN-1 from address bits A0 to AK-1.]
N words require N select signals. A decoder reduces the number of inputs to K = log2 N.
Notes: Only one select line is active at a time. For example, N = 2**20 (about 10**6, i.e., 1 Mword) means about 1 million select signals. Adding a decoder reduces the number of inputs from about 1 million to 20 (address lines). Note that the 1 million select lines still have to be generated, by a very large decoder (see last lecture). The scheme on the right, while reducing the number of inputs, leads to very tall and narrow memories (and very slow ones because of the very long bit lines). It also needs a very big (and slow) address decoder (it is good to try to pitch match the decoder and the memory core).
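The relation K = log2 N and the "only one select line active" rule can be sketched as follows. This is a behavioral illustration only, not a circuit description; the function names `address_bits` and `decode` are made up for this example.

```python
def address_bits(n_words):
    """Number of address lines K needed to select one of n_words words."""
    return (n_words - 1).bit_length()

def decode(address, n_words):
    """One-hot decoder: returns select signals S0..S(N-1), exactly one active."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]

k = address_bits(2**20)      # 1 Mword memory
selects = decode(5, 8)       # small 8-word example
print(k)
print(selects)
```

For the 1 Mword case this gives 20 address lines, while the decoder output is still one select signal per word.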

2D Memory Architecture
[Figure: array of storage (RAM) cells with 2^(K-L) word lines (WL) and M*2^L bit lines (BL). Row address bits A_L to A_(K-1) feed the row decoder. Column address bits A_0 to A_(L-1) (the least significant bits) feed the column decoder, which selects the appropriate word from the memory row. Sense amplifiers amplify the bit line swing; read/write circuits connect to the M-bit input/output.]
Notes: Putting multiple words in one memory row splits the decoder into two decoders (row and column) and makes the memory core square, reducing the length of the bit lines (but increasing the length of the word lines). The least significant part of the address goes into the column decoder. For example, with a 20-bit address, 6 column bits assign 64 words to one row (with 32 bits per word this gives 2**11 bit line pairs), leaving 14 bits for the row decoder (giving 2**14 word lines) for a not quite square array.
The RAM cell needs to be as compact and fast as possible since it is replicated thousands of times in the core array. Designers are often willing to trade off noise margins, logic swing, input-output isolation, fan-out, or speed for area.
To speed things up (and reduce power consumption), the bit lines are not forced to swing rail-to-rail, so sense amplifiers are needed to restore the signal to full rail-to-rail amplitude.
This scheme is good only for memories up to 64 Kb to 256 Kb. For bigger memories it is too slow because the word and bit lines are too long.
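The array dimensions quoted in the notes follow directly from the address split. The sketch below reproduces that arithmetic and shows the row/column split of an address, with the column address in the least significant bits as the slide states. The function names are hypothetical.

```python
def array_shape(total_address_bits, column_bits, bits_per_word):
    """Return (word_lines, bit_line_pairs) for a 2D memory array."""
    row_bits = total_address_bits - column_bits
    word_lines = 2 ** row_bits                          # one word line per row
    bit_line_pairs = bits_per_word * 2 ** column_bits   # M * 2^L columns
    return word_lines, bit_line_pairs

def split_address(address, column_bits):
    """Column address is the least significant bits; the rest selects the row."""
    column = address & ((1 << column_bits) - 1)
    row = address >> column_bits
    return row, column

rows, cols = array_shape(total_address_bits=20, column_bits=6, bits_per_word=32)
print(rows, cols)
print(split_address(643, 6))
```

For the 1 Mword, 32-bit example this yields 2**14 word lines and 2**11 bit line pairs, matching the numbers in the notes.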

3D (or Banked) Memory Architecture
[Figure: several memory blocks selected by a block address; address fields labeled row address, column address, and block address (address bits A1, A0 shown); input/output (M bits)]
Example: 1M word memory with 32 bits/word: 2-bit block address; 6-bit column address giving 2**11 bit line pairs; 12-bit row address giving 2**12 word lines, for almost square memory arrays.
Advantages:
  1. Shorter word and bit lines, so faster access
  2. The block address activates only 1 block, saving power
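The 2/12/6-bit split of the 20-bit address can be sketched as a field extraction. The field widths come from the slide; the field ordering used here (block address in the most significant bits, column address in the least significant bits) is an assumption for illustration, as are all names in the code.

```python
from typing import NamedTuple

class BankedAddress(NamedTuple):
    block: int
    row: int
    column: int

# Field widths from the slide: 2 + 12 + 6 = 20 address bits.
# Assumed ordering (not stated on the slide): block | row | column.
BLOCK_BITS, ROW_BITS, COLUMN_BITS = 2, 12, 6

def split_banked(address):
    """Split a 20-bit word address into block, row, and column fields."""
    column = address & ((1 << COLUMN_BITS) - 1)
    row = (address >> COLUMN_BITS) & ((1 << ROW_BITS) - 1)
    block = (address >> (COLUMN_BITS + ROW_BITS)) & ((1 << BLOCK_BITS) - 1)
    return BankedAddress(block, row, column)

def block_enables(address):
    """Only the addressed block is activated; the others stay idle to save power."""
    block = split_banked(address).block
    return [i == block for i in range(1 << BLOCK_BITS)]

addr = (3 << 18) | (5 << 6) | 7
print(split_banked(addr))
print(block_enables(addr))
```

Each block then contains 2**12 word lines and 32 * 2**6 = 2**11 bit line pairs, which is where the "almost square" arrays come from.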

2D 4x4 SRAM Memory Bank
[Figure: 4x4 array of 2-bit words with a row decoder driving WL[0] to WL[3], a column decoder, read precharge with bit line precharge enable, bit lines BLi and BLi+1, sense amplifiers, write circuitry, and clocking and control; address inputs A0 and A2 are labeled]
Notes: To decrease the bit line delay for reads, use low-swing bit lines: the bit lines do not swing rail-to-rail, but precharge to Vdd (as 1) and discharge to Vdd - 10% Vdd (as 0). (So for a 2.5 V Vdd, 0 is 2.25 V.) This requires sense amplifiers to restore the signal to full swing. The write circuitry either receives full swing from the sense amps or writes full swing to the bit lines.

Quartering Gives Shorter WLs and BLs
[Figure: array split into quadrants around a row decoder (address bits AN-1 ... Ai) and column decoders (address bits Ai-1 ... A0), with read precharge circuits, sense amps, write circuitry, and data lines]
Notes: In reality, memory is designed in quadrants (hence the 4x growth every generation). This configuration halves the length of the word and bit lines (for increased speed).

Decreasing Word Line Delay
Drive the word line from both sides. [Figure: polysilicon word line WL with a driver at each end, fed by a metal word line]
Use a metal bypass. [Figure: polysilicon word line WL with a parallel metal bypass]
Use silicides.
Notes: Driving from both sides reduces the worst-case delay of the word line by a factor of 4 (like buffer insertion to reduce RC delay).
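The factor of 4 follows from the quadratic dependence of distributed RC delay on line length: with a driver at each end, the worst-case point is the middle of the line, so each driver effectively sees half the length. The sketch below uses the common 0.38 * R * C approximation for the 50% delay of a distributed RC line, which is an assumption brought in here (the slide states only the factor of 4). The per-cell values are hypothetical.

```python
def distributed_rc_delay(r_per_cell, c_per_cell, n_cells):
    """Approximate 50% delay of a distributed RC line: 0.38 * R_total * C_total."""
    return 0.38 * (r_per_cell * n_cells) * (c_per_cell * n_cells)

# Hypothetical per-cell parasitics and row width.
r, c, n = 10.0, 1e-15, 512

one_side = distributed_rc_delay(r, c, n)          # driver at one end only
both_sides = distributed_rc_delay(r, c, n // 2)   # worst case is the midpoint
ratio = one_side / both_sides
print(f"delay ratio = {ratio:.2f}")
```

The ratio is independent of the assumed r and c values, since both cancel; only the halved length matters.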

Read Only Memories (ROMs)
A memory that can only be read and never altered.
Used for programs for fixed applications that, once developed and debugged, never need to be changed, only read.
Fixing the contents at manufacturing time leads to small and fast implementations.
[Figure: ROM cell variants connecting a word line (WL) and a bit line (BL), labeled BL = 1 and BL = 0]

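MOS OR ROM Cell Array
[Figure (class handout version): cell array with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), VDD supply lines, and predischarge devices; two devices marked "on"]
Notes: For class handout. Skip covering this one in class (would make a good basis for an exam question).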

MOS OR ROM Cell Array
[Figure (lecture version): the same cell array with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), VDD supply lines, and predischarge devices, annotated with signal values (1 0 0 1 and 0 0 0 0 appear among the labels), a line at VDD marked 1, two devices marked "on", and the predischarge signal marked 1 and 0]
Notes: Notice how the overhead of the supply lines is reduced by sharing them between neighboring cells. This requires mirroring the odd cells around the horizontal axis, an approach that is extensively used in memory cores of all styles. What are the values of the data stored? Beware of the threshold drop.

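Precharged MOS NOR ROM
[Figure (class handout version): precharge devices connected to VDD, a row decoder with enable input and address inputs A1 and A2 driving WL(0) to WL(3), GND lines between word line pairs, and bit lines BL(0) to BL(3)]
Notes: For class handout.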

Precharged MOS NOR ROM
[Figure (lecture version): precharge devices connected to VDD, a row decoder with enable input and address inputs A1 and A2 driving WL(0) to WL(3), GND lines between word line pairs, and bit lines BL(0) to BL(3); annotated with a selected word line at 1, two devices marked "on", and the values 1 1 1 1 and 0 1 1 0]
Notes: For lecture. There are lots of ways to build them; this is just the structure of choice. First precharge all bit lines to 1 (make sure all word lines are inactive (0), which can be done with an enabled decoder), then activate the appropriate word line; the existing fets selectively discharge the bit lines low. So no fet means a 1 is stored, and a fet means a 0 is stored. Note that there is only one pull-down in series in the network, so it's fast! What is stored in this ROM is the inverse of what is stored in the OR ROM.
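The precharge-then-evaluate read sequence described in the notes can be sketched behaviorally. The 4x4 programming pattern below is hypothetical and is not the pattern drawn on the slide; `read_nor_rom` is an illustrative name.

```python
def read_nor_rom(transistors, selected_row):
    """transistors[row][col] is True where a pull-down fet connects BL(col) to GND."""
    n_cols = len(transistors[0])
    bit_lines = [1] * n_cols                 # precharge phase: all bit lines to 1
    for col in range(n_cols):                # evaluate phase: one word line active
        if transistors[selected_row][col]:
            bit_lines[col] = 0               # the fet discharges its bit line
    return bit_lines

# Hypothetical programming pattern: True = fet present (stores 0).
T, F = True, False
rom = [
    [T, F, T, T],
    [F, T, T, F],
    [T, F, T, F],
    [F, F, F, F],
]
print(read_nor_rom(rom, 0))
print(read_nor_rom(rom, 3))
```

A row with no fets reads back as all 1s, and each fet on the selected row turns its bit into a 0, which is the "no fet means 1, fet means 0" rule from the notes.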

MOS NOR ROM Layout 1
Memory is programmed by adding transistors where needed (ACTIVE mask, early in the fab process).
Cell size of 9.5λ x 7λ.
[Figure: layout with word lines WL(0) to WL(3), GND lines, and metal1 on top of diffusion]
Notes: Each cell has a diffusion run under the metal1 to GND.

MOS NOR ROM Layout 2
Memory is programmed by adding contacts where needed (CONTACT mask, one of the last processing steps).
All transistors are fabricated; the presence of a metal contact creates a 0-cell.
Cell size of 11λ x 7λ.
[Figure: layout with word lines WL(0) to WL(3) and GND lines]
Notes: Selective addition of metal to diffusion: add contacts to "program". Wafers can be prefabricated up to the CONTACT mask and stockpiled. Note that ground is run in diffusion (not great) and is connected to a vertical diffusion run underneath the m1 strip. This is done for the sake of space. A metal bypass with regularly spaced straps keeps the voltage drop over the ground wire within bounds. A large part of the cell is devoted to the bit line contact and the ground connection. Pull-down fets are 4/2 transistors.

MOS NAND ROM
[Figure: pull-up devices connected to VDD on bit lines BL[0] to BL[3]; series transistors along each bit line gated by word lines WL[0] to WL[3]]
All word lines are high by default, with the exception of the selected row.
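The NAND ROM read can be sketched behaviorally under a simple assumed model: each bit line has a pull-up and a series chain of transistors to ground, the selected word line is driven low while all others stay high, and a position without a transistor is treated as a short. The programming pattern and the function name are hypothetical.

```python
def read_nand_rom(transistors, selected_row):
    """transistors[row][col] is True where a series fet is present on BL(col);
    False means that position is shorted (always conducting)."""
    n_rows = len(transistors)
    n_cols = len(transistors[0])
    # All word lines are high by default; only the selected row is driven low.
    word_lines = [0 if row == selected_row else 1 for row in range(n_rows)]
    outputs = []
    for col in range(n_cols):
        chain_conducts = all(
            (not transistors[row][col]) or word_lines[row] == 1
            for row in range(n_rows)
        )
        # A conducting chain pulls the bit line low; otherwise the pull-up wins.
        outputs.append(0 if chain_conducts else 1)
    return outputs

T, F = True, False
rom = [
    [T, F, T, T],
    [F, T, T, F],
    [T, F, T, F],
    [F, F, F, F],
]
print(read_nand_rom(rom, 0))
print(read_nand_rom(rom, 3))
```

Under this model a transistor on the selected row breaks the chain and reads as 1, while a shorted position reads as 0. Every read passes through the whole series chain, which is why the bit line resistance dominates the NAND ROM's delay.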

MOS NAND ROM Layout: Programming using the Metal-1 Layer Only
Cell size of 8λ x 7λ.
No contact to VDD or GND necessary; drastically reduced cell size.
Loss in performance compared to NOR ROM.
[Figure legend: polysilicon, diffusion, metal1 on diffusion]

NAND ROM Layout: Programming using Implants Only
Cell size of 5λ x 6λ.
A threshold-lowering implant using n-type impurities turns the device always on -> short…
[Figure legend: polysilicon, threshold-altering implant, metal1 on diffusion]
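The cell sizes quoted on the four layout slides can be compared directly. The sketch below only multiplies out the stated dimensions (in units of λ, so areas are in λ²); the dictionary keys are descriptive labels made up for this example.

```python
# Cell dimensions (width, height) in units of lambda, as quoted on the layout slides.
cell_sizes = {
    "NOR, ACTIVE mask": (9.5, 7),
    "NOR, CONTACT mask": (11, 7),
    "NAND, metal-1": (8, 7),
    "NAND, implant": (5, 6),
}

areas = {name: w * h for name, (w, h) in cell_sizes.items()}
smallest = min(areas.values())

for name, area in areas.items():
    print(f"{name}: {area} lambda^2 ({area / smallest:.2f}x the smallest cell)")
```

The implant-programmed NAND cell comes out at 30 λ², less than half the area of either NOR cell, which is the area side of the NAND versus NOR trade-off.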

Transient Model for 512x512 NOR ROM
[Figure: precharged bit line BL in metal1 with lumped capacitance Cbit; word line WL in poly modeled with distributed rword and cword]
Word line parasitics (distributed RC model):
  Resistance/cell: 17.5 Ω
  Wire capacitance/cell: 0.049 fF
  Gate capacitance/cell: 0.75 fF
Bit line parasitics (lumped C model):
  Resistance/cell: 0.275 Ω (which is negligible)
  Wire capacitance/cell: 0.09 fF
  Drain capacitance/cell: 0.8 fF
Notes: The transient response is the time from word line activation to the bit line traversing its voltage swing (typically 10% of Vdd). Most of the delay is attributable to interconnect parasitics. WL is best modeled as a distributed RC line since it is implemented in poly with a relatively high sheet resistance (silicided poly would be advisable). BL is implemented in metal1, so a capacitive model can be used and all capacitive loads can be lumped into a single element.
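The per-cell parasitics above can be turned into rough totals for a 512-cell line. This is a sketch under two assumptions that are not stated on the slide: the resistance values are in ohms, and the word line delay is estimated with the common 0.38 * R * C approximation for the 50% delay of a distributed RC line. The variable names are illustrative.

```python
N_CELLS = 512

# Word line (distributed RC model), per-cell values from the slide.
R_WORD = 17.5                      # ohms per cell (unit assumed)
C_WORD = 0.049e-15 + 0.75e-15      # wire + gate capacitance per cell, in farads

# Bit line (lumped C model), per-cell values from the slide.
C_BIT = 0.09e-15 + 0.8e-15         # wire + drain capacitance per cell, in farads

# Distributed RC estimate: 0.38 * R_total * C_total (assumed approximation).
word_line_delay = 0.38 * (R_WORD * N_CELLS) * (C_WORD * N_CELLS)
# Lumped model: all per-cell capacitances add up into a single element.
bit_line_cap = C_BIT * N_CELLS

print(f"word line delay ~ {word_line_delay * 1e9:.2f} ns")
print(f"bit line capacitance ~ {bit_line_cap * 1e15:.0f} fF")
```

Under these assumptions the word line delay is about 1.4 ns and the lumped bit line capacitance about 456 fF. The bit line delay itself also depends on the pull-down and precharge device strengths, which are not given on the slide.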

Transient Model for 512x512 MOS NAND ROM
[Figure: model for NAND ROM: bit line BL with pull-up to VDD, load capacitance CL, and distributed rbit and cbit; word line WL with distributed rword and cword]
Word line parasitics: similar to NOR ROM.
Bit line parasitics: the resistance of the cascaded transistors dominates; drain/source and complete gate capacitance.
  Resistance/cell: 8.7 kΩ (compared to 0.275 Ω in NOR)
Speed: NOR: TLH = 1.87 ns; NAND: TLH = 1.2 µs

Next Lecture and Reminders
Next lecture: SRAM, DRAM, and CAM cores
Reading assignment: Rabaey, et al., 12.1-12.2.4