DIICD Class 13 Memories.

Slides:



Advertisements
Similar presentations
COEN 180 SRAM. High-speed Low capacity Expensive Large chip area. Continuous power use to maintain storage Technology used for making MM caches.
Advertisements

Chapter 5 Internal Memory
Computer Organization and Architecture
Computer Organization and Architecture
Semiconductor Memory Design. Organization of Memory Systems Driven only from outside Data flow in and out A cell is accessed for reading by selecting.
Sistemi Elettronici Programmabili1 Progettazione di circuiti e sistemi VLSI Anno Accademico Lezione Memorie (vedi anche i file pcs1_memorie.pdf.
COEN 180 DRAM. Dynamic Random Access Memory Dynamic: Periodically refresh information in a bit cell. Else it is lost. Small footprint: transistor + capacitor.
Digital Integrated Circuits A Design Perspective
Elettronica T AA Digital Integrated Circuits © Prentice Hall 2003 SRAM & DRAM.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Memory See: P&H Appendix C.8, C.9.

11/29/2004EE 42 fall 2004 lecture 371 Lecture #37: Memory Last lecture: –Transmission line equations –Reflections and termination –High frequency measurements.
Digital Integrated Circuits© Prentice Hall 1995 Memory SEMICONDUCTOR MEMORIES.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 32: Array Subsystems (DRAM/ROM) Prof. Sherief Reda Division of Engineering,
Overview Memory definitions Random Access Memory (RAM)
Registers  Flip-flops are available in a variety of configurations. A simple one with two independent D flip-flops with clear and preset signals is illustrated.
Lecture 4: Computer Memory
Chapter 5 Internal Memory
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Hao Ji.
Semiconductor Memories Lecture 1: May 10, 2006 EE Summer Camp Abhinav Agarwal.
Lecture on Electronic Memories. What Is Electronic Memory? Electronic device that stores digital information Types –Volatile v. non-volatile –Static v.
Review: Basic Building Blocks  Datapath l Execution units -Adder, multiplier, divider, shifter, etc. l Register file and pipeline registers l Multiplexers,
Digital Integrated Circuits© Prentice Hall 1995 Memory SEMICONDUCTOR MEMORIES Adapted from Jan Rabaey's IC Design. Copyright 1996 UCB.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n Latches and flip-flops. n RAMs and ROMs.
Modern VLSI Design 4e: Chapter 6 Copyright  2008 Wayne Wolf Topics Memories: –ROM; –SRAM; –DRAM; –Flash. Image sensors. FPGAs. PLAs.
CMOS Layout poly diffusion side view top view metal cuts
Digital Design: Principles and Practices
Digital Logic Design Instructor: Kasım Sinan YILDIRIM
CSE477 L24 RAM Cores.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 24: RAM Cores Mary Jane Irwin ( )
SEQUENTIAL CIRCUITS Component Design and Use. Register with Parallel Load  Register: Group of Flip-Flops  Ex: D Flip-Flops  Holds a Word of Data 
CSE477 L23 Memories.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 23: Semiconductor Memories Mary Jane Irwin (
 Seattle Pacific University EE Logic System DesignMemory-1 Memories Memories store large amounts of digital data Each bit represented by a single.
Washington State University
Computer Memory Storage Decoding Addressing 1. Memories We've Seen SIMM = Single Inline Memory Module DIMM = Dual IMM SODIMM = Small Outline DIMM RAM.
CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 22: Memery, ROM
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
07/11/2005 Register File Design and Memory Design Presentation E CSE : Introduction to Computer Architecture Slides by Gojko Babić.
Computer Architecture Chapter (5): Internal Memory
Prof. Hsien-Hsin Sean Lee
Lecture 3. Lateches, Flip Flops, and Memory
Digital Logic Design Alex Bronstein Lecture 3: Memory and Buses.
Memories.
Chapter 5 Internal Memory
William Stallings Computer Organization and Architecture 7th Edition
Lecture 19: SRAM.
Random access memory Sequential circuits all depend upon the presence of memory. A flip-flop can store one bit of information. A register can store a single.
Internal Memory.
Morgan Kaufmann Publishers
Random access memory Sequential circuits all depend upon the presence of memory. A flip-flop can store one bit of information. A register can store a single.
MOS Memory and Storage Circuits
Memory Units Memories store data in units from one to eight bits. The most common unit is the byte, which by definition is 8 bits. Computer memories are.
William Stallings Computer Organization and Architecture 7th Edition
William Stallings Computer Organization and Architecture 8th Edition
Hakim Weatherspoon CS 3410 Computer Science Cornell University
EE345: Introduction to Microcontrollers Memory
William Stallings Computer Organization and Architecture 7th Edition
Digital Integrated Circuits A Design Perspective
William Stallings Computer Organization and Architecture 8th Edition
Design Technologies Custom Std Cell Performance Gate Array FPGA Cost.
BIC 10503: COMPUTER ARCHITECTURE
Memory.
Semiconductor Memories
Electronics for Physicists
Chapter 8 MOS Memory and Storage Circuits
Overview Last lecture Digital hardware systems Today
Random access memory Sequential circuits all depend upon the presence of memory. A flip-flop can store one bit of information. A register can store a single.
Chapter 4 Internal Memory
William Stallings Computer Organization and Architecture 8th Edition
Prof. Hakim Weatherspoon
Presentation transcript:

DIICD Class 13 Memories

Registers Used for storing data Structure Register files N-bit wide Parallel/serial read/write Clocked Static/dynamic implementation Register files Multiple read/write ports possible Example: 32-bit wide by 16-bit deep, dual-port parallel read, single port parallel write register file 32 bits . . . 16 words 32 [©Hauck] Spring 2006

Implementing Registers Using Logic Gates Flip-flops Simple SR latch: JK, D, T Clocked Master-slave (edge-triggered) S S R Q Q’ 1 1 Q Q’ 1 0 0 1 0 1 1 0 0 0 x x Q S Q R Q Q R Spring 2006

Implementing Registers in CMOS Direct gate implementation too costly A master-slave JK flip-flop uses 38 CMOS transistors Directly implement in transistors Example: clocked SR FF Q Q Q f S R Note: carefully size the S, R and f transistors so that we can write [Rab96] p.342 Spring 2006

Implementing Registers in CMOS (cont.) Another example: D latch (register) Uses transmission gate When “WR” asserted, “write” operation will take place Stack D latch structures to get n-bit register WR Q Q D WR WR WR Spring 2006

Memory (Array) Design Array of bits Area very important Memory takes considerable area in processor chips Compaction results in fewer memory chip modules, more on-chip cache Timing and power consumption of memory blocks have significant impact on the system Different types RAM (SRAM, DRAM, CAM) ROM (PROM, EEPROM, FLASH) Spring 2006

Memory Design (cont.) Static vs. dynamic RAM Static (SRAM) Dynamic needs refreshing Refreshing: read, then write back to restore charge Either periodically or after each read Static (SRAM) Data stored as long as supply voltage is applied Large (6 transistors/cell) Fast Dynamic (DRAM) Periodic refresh required Small (1-3 transistors/cell) Slower Special fabrication process [©Prentice Hall] Spring 2006

Memory Architecture: the Big Picture Address: which one of the M words to access Data: the N bits of the word are read/written Word 0 Word 1 Word M-2 Word M-1 . . . N bits S0 S1 S2 SM-2 SM-1 ... Decoder A0 A1 Ak-1 Address decoder S0 Word 0 S1 Word 1 Storage cells S2 ... . . . SM-2 Word M-2 SM-1 Word M-1 word select lines N bits Spring 2006

Memory Access Timing: the Big Picture Send address on the address lines, wait for the word line to become stable Read/write data on the data lines Read Cycle READ Write Cycle Read Access Read Access WRITE Write Access Data Valid Data Written DATA [©Prentice Hall] Spring 2006 EE 5324 - VLSI Design II - © Kia Bazargan

Memory Cell: Static RAM (8 transistors) 8-transistor cell Bit_i is the data bus Sj is the word line Sj biti Bit’ used to reduce delay Bus drivers Sense Amplifier (inverter with high gain) used for fast switching Make sure inverters in cell are weaker than the combination of “write buffer” and pass transistor Rd/WR Very big driver [©Hauck] Spring 2006

Memory Cell: Static RAM (6 transistors) 6-transistor cell Must adjust inverters for input coming through n-type pass gate Bus drivers Must adjust senseAmp for input coming through n-type pass gate Harder to drive 1 than 0 through write buffer (high resistance via n-transistor) One side is sending 0 anyway (bit or bit’)  written correctly biti biti (BL) Sj (WL) Rd/WR [©Hauck] Spring 2006

6-Transistor SRAM Cell: Layout WL is word line (select line Sj) BL is bit line (biti) VDD M2 M4 BL M5 M1 M3 M6 Q M4 Vdd WL M2 Q Q M1 M3 GND M5 M6 WL BL BL [Rab96] p.578 [©Prentice Hall] Spring 2006

6-Transistor Memory Array 8 words deep RAM, 2 bits wide words To write to word j: Set Sj=1, all other S lines to 0 Send data on the global bit0, bit0’, bit1, bit1’ To read word k: Set Sk=1, all other S lines to 0 Sense data on bit0 and bit1. bit0 bit0 bit1 bit1 S0 S1 S7 Rd/WR Rd/WR bit0 bit0 bit1 bit1 Spring 2006

Dynamic RAM 4-Transistor Cell Dynamic charge storage must be refreshed Dedicated busses for reading and writing data in data out WR Keeps its value for about 1ms Rd [©Hauck] Spring 2006

Dynamic RAM 3-Transistor Cell Precharge data_out’ to generate ‘1’ outputs precharge 3-transistor cell No p-type transistors yield a very compact layout for cell No Vdd connection Sense Amplifier must be able to quickly detect dropping voltage data in data out WR Rd [©Hauck] Spring 2006

Dynamic RAM 3-Transistor Cell: Layout Din Dout GND Dout Din Rd M3 WR M3 M2 M2 M1 WR M1 Rd [Rab96] p.586 [©Prentice Hall] Spring 2006

Dynamic RAM 1-Transistor Cell http://www.ece.umn.edu/users/kia/Courses/EE5324 Dynamic RAM 1-Transistor Cell 1-transistor cell Storage capacitor is source of cell transistor Special processing steps to make the storage capacitor large Charge sharing with bus capacitance (Ccell << Cbus) Extra demand on sense amplifier to detect small changes Destructive read (must write immediately) Precharge to middle voltage level Bi Si (WL) Storage capacitor Note that the storage capacitor shown is not a physical capacitor (we just use the capacitor notation to show that it keeps electric charges). The “storage capacitor” is implemented as a huge piece of metal (usually a hole in the substrate). There is no connection to the ground metal lines (again, the capacitor/ground is just a convention) Also, there is no physical capacitor connected to the bus. The fact that the bus is a long piece of wire makes it behave as a capacitor [©Hauck] Spring 2006 VLSI Design II – © Kia Bazargan

Dynamic RAM 1-Transistor Cell: Observations DRAM memory cell is single-ended Read operation is destructive Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design Polysilicon-diffusion plate capacitor Trench or stacked capacitor When writing a “1” into a DRAM cell, a threshold voltage is lost Set WL to a higher value than Vdd [©Prentice Hall] Spring 2006

Dynamic RAM 1-Transistor Cell: Layout Capacitor Metal word line M1 word SiO2 line poly poly Field Oxide n+ n+ Inversion layer Diffused induced by bit line plate bias Polysilicon plate Polysilicon gate (a) Cross-section (b) Layout Used Polysilicon-Diffusion Capacitance Expensive in Area [©Prentice Hall] Spring 2006

RAM Cells: Summary Static Dynamic Fastest (no refresh) Simple design Right solution for small memory arrays such as register files Dynamic Densest: 1T is best and is the way to go for large memory arrays Built-in circuitry to step through cells and refresh (can do more than one word at a time) Sense amplifier needed for fast read operation [©Hauck] Spring 2006

Multi-Port RAM Cells Idea: add more input and output transistors bus_B bus_A row-bus_A row-bus_B bus_B bus_A Idea: add more input and output transistors Can be applied to all variants Usually not done for 1T cells [©Hauck] Spring 2006

ROM Cell: MOS NAND Pullup devices BL0 BL1 BL2 BL3 WL0 WL1 WL2 WL3 All word lines high by default with exception of selected row [©Prentice Hall] Spring 2006

Memory Cell Array Interface: Example 1 S0 Memory parameters: 16-bit wide 1024-word deep Accessing word 9 Address = 00000010012 Word 0 S1 Word 1 S2 Word 2 1 … A0 . . . . A1 S9 A2 Word 9 Decoder A3 ... ... A9 S1022 Word 1022 S1023 Word 1023 16 bits SenseAmp / Drivers 16 bits Spring 2006

Memory Cell Array Layout S0 Memory performance (speed) Storage cell speed (read, write) Data bus capacitance Periphery: address decoders, sense amplifiers, buffers Memory area Cell array layout How to layout the cells array? Linear is bad: Long data busses  large capacity A lot of cells connected to data bus Decoder will have a lot of logic levels Word 0 Word M-2 Word M-1 . . . Word 1 Word 2 S1 S2 A0 A1 Decoder ... ... Ak-1 SM-2 SM-1 N bits SenseAmp / Drivers N bits Spring 2006

Memory Cell Array Layout (cont.) http://www.ece.umn.edu/users/kia/Courses/EE5324 Memory Cell Array Layout (cont.) Group the M words into M/L rows, each containing L words Benefits? S0..L-1 Word 0 Word 1 . . . Word L-1 SL..2L-1 address: L bits k-L bits Alog L Word L Word L+1 . . . Word 2L-1 S2L..3L-1 Word 2L Word 2L+1 . . . Word 3L-1 Alog L+1 . . . . . . . . . . . . Row Decoder ... Ak-1 SM-L..M-1 Word M-L . . . . . . Word M-1 Address decoding is faster Row (column) decoder has less inputs, hence simpler structure Row and column address decoding could be done in parallel Row and column address decoding could be done separately (so, if you need to access two words in a row, send the row address once only) Power consumption? Bit line capacitance? Length? N bits N bits . . . N bits SAmp/Drv SAmp/Drv . . . SAmp/Drv N bits N bits . . . N bits Column Decoder + MUX N bits A0 Alog L-1 . . . Spring 2006 VLSI Design II – © Kia Bazargan

Memory Cell Array Access Example word=16-bit wide(N), row=8 words(L), address=10 bits (k) Accessing word 9= 00000010012 L=8 words S0..7 S8..15 Word 0 Word 1 . . . Word 7 M/L = 1024/8= 128 rows A3 Word 8 Word 9 . . . Word 15 1 … S16..23 Word 16 Word 17 . . . Word 23 A4 Row Decoder . . . . . . . . . . . . ... A9 S1016-1023 Word 1016 . . . . . . Word 1023 16 bits 16 bits . . . 16 bits SAmp/Drv SAmp/Drv . . . SAmp/Drv 16 bits 16 bits . . . 16 bits A0 Column Decoder + MUX A1 A2 16 bits Spring 2006

Hierarchical Memory Structure Taking the idea one step further Shorter wires within each block Enable only one block addr decoder power savings Row Address Column Address Blk EN Block Address Blk EN Blk EN Blk EN Global Bus SAmp/ Drv Global drivers/ sense amplifiers [Rab96] p. 558 Spring 2006

Decreasing Word Line Delay Word line delay comes into play! We used to have long busses, made 2D array  shorter busses But, longer word lines! How to decrease the delay on the word lines? Break the word line by inserting buffers Place the decoder in the middle Polysilicon word line Polysilicon word line Metal word line Metal bypass (a) Drive the word line from both sides (b) Use metal bypass [©Prentice Hall] Spring 2006

Decreasing Word Line Delay (cont.) Place the decoder in the middle Add buffers to outputs of decoder d e c o d e r memory cell array memory cell array k Address lines [©Hauck] Spring 2006

Row Decoder Implementation Collection of 2k high fan-in (k inputs) logic gates Regular and dense structure N(AND) decoder NOR decoder WL0 = A0.A1.A2.A3.A4.A5.A6.A7.A8.A9 WL511 = A0.A1.A2.A3.A4.A5.A6.A7.A8.A9 WL0 = A0+A1+A2+A3+A4+A5+A6+A7+A8+A9 WL511 = A0+A1+A2+A3+A4+A5+A6+A7+A8+A9 [©Prentice Hall] Spring 2006

Row Decoder Implementation (cont.) Precharge devices GND GND WL3 WL3 WL2 WL2 WL1 WL1 WL0 WL0 f f Vdd A0 A0 A1 A1 A0 A0 A1 A1 Dynamic 2-to-4 NOR Decoder 2-to-4 MOS Dynamic NAND Decoder [©Prentice Hall] Spring 2006

Row Decoder Implementation (cont.) WL1 WL0 A0A1 A0A1 A0A1 A0A1 A2A3 A2A3 A2A3 A2A3 A0 A1 A1 A0 A2 A3 A3 A2 Splitting decoder into two or more logic layers produces a faster and cheaper implementation [©Prentice Hall] Spring 2006

Column Multiplexers: Tree-Based Decoder Route many inputs to a single output Inputs come from different words, same bit position Series transistors are slow On the critical path too Area? One-bit very small, but have to repeat the “decoding” for all bit positions. Bit position i of L word columns w0 w1 w2 w3 w4 w5 w6 w7 A0' A0 A1' A1 A2' A2 Bit position i [©Hauck] Spring 2006

Column Multiplexers: Faster Implementation Decode address into one-hot signals Each bit passes through single n-device or pass gate Column decoding done in parallel w/ row decoding Decoder A0 A1 . . . Alog L-1 bit 0 bit 1 [©Hauck] Spring 2006

Content Addressable Memory (CAM) Instead of address, provide data  find a match Applications: cache, physical particle collider Needs “Encoder”: Inverse function of decoder Take a one-hot collection of signals and encode them m bits e n c o d e r content addressable memory cell array 2n rows m n [©Hauck] Spring 2006

Content Addressable Memory Cell Read and write like normal 6T memory cell Match signal is precharged to 1, pulled to 0 if no match Send data on bit’ and data’ on bit for matching Match remains 1 iff all bits in word match row select match bit bit' [©Hauck] Spring 2006

Encoders e n c o d e r content addressable memory cell array row select match bit bit' Spring 2006

Non-Volatile Memory Cells Programmable after fabrication Keep their configuration even after the supply voltage is disconnected Basic idea: Use a floating strip of polysilicon between the substrate and the gate Put charges on the floating gate Increase threshold voltage  disable the device Different types based on the erasure method Spring 2006

Floating-Gate Transistor (FAMOS) D Source Drain t G ox t ox S + p + n n Substrate (a) Device cross-section (b) Schematic symbol [©Prentice Hall] Spring 2006

Characteristics of Some NVM Cells Spring 2006