Digital Integrated Circuits A Design Perspective

Presentation transcript:

Digital Integrated Circuits: A Design Perspective. Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolic. Semiconductor Memories. December 20, 2002

Chapter Overview: Memory Classification; Memory Architectures; The Memory Core; Periphery; Reliability; Case Studies

Semiconductor Memory Classification
Read-Write Memory, Random Access: SRAM, DRAM
Read-Write Memory, Non-Random Access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions. Read-access time runs from the read request to the time the data is available on the output. Write-access time runs from the write request to the final writing of the input data into the memory.

Memory Architecture: Decoders. [Figure: two arrays of N words by M bits of storage cells with M-bit input-output. Left: one select signal S0 ... SN-1 per word. Right: a decoder driven by address bits A0 ... AK-1, with K = log2 N.] Intuitive architecture for an N x M memory: too many select signals, since N words require N select signals. A decoder reduces the number of select signals to K = log2 N.
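The decoder idea above can be sketched in a few lines of Python. This is an illustrative model only, not a circuit description; the function names `address_bits` and `decode` are hypothetical and not part of the slides.

```python
def address_bits(n_words):
    """Number of address bits K = log2(N) needed to select one of N words."""
    if n_words < 1 or n_words & (n_words - 1):
        raise ValueError("n_words must be a power of two")
    return n_words.bit_length() - 1


def decode(address, n_words):
    """One-hot decode: select line S_i is 1 only for i == address."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


if __name__ == "__main__":
    # A 1024-word memory needs 10 address lines instead of 1024 select lines.
    print(address_bits(1024))
    print(decode(2, 4))
```

The point of the sketch is the pin and wire count: the K address bits travel to the decoder, and only inside the array are they expanded back to N select lines.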

Array-Structured Memory Architecture. Problem: ASPECT RATIO, or HEIGHT >> WIDTH. [Figure labels: sense amplifiers amplify the swing to rail-to-rail amplitude; the column decoder selects the appropriate word.]

Hierarchical Memory Architecture. Advantages: 1. Shorter wires within blocks. 2. Block address activates only 1 block => power savings.

Block Diagram of 4 Mbit SRAM [Hirose90]. [Figure: 32 blocks (Block 0 to Block 31), each a 128 K array, with global, subglobal and local row decoders, column decoder, sense amplifier and write driver, bit line load, transfer gate, predecoder and block selector, clock generator, CS/WE buffer, x1/x4 controller and I/O buffer. X: row address; Y: column address; Z: block address.]

Contents-Addressable Memory. [Figure: CAM array of 2^9 words x 64 bits with address decoder (9-bit address), priority encoder, I/O buffers (64-bit data), comparand, mask, control logic, commands and R/W.] The comparand holds the data pattern to be matched. Every row that satisfies the validity check (matches the mask bits) activates its validity bit. The priority encoder passes only the valid rows and, in case of multiple matches, selects the one with the highest address.
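A behavioural sketch of the search described above, in Python. It assumes that a mask bit of 1 means "compare this bit" and 0 means "ignore it"; the slide does not state the polarity, so that convention and the name `cam_search` are assumptions made for illustration.

```python
def cam_search(rows, comparand, mask):
    """Return (validity_bits, selected_address) for a masked CAM search.

    rows: stored words as integers.
    mask: bit set to 1 means that bit position is compared (assumed polarity).
    """
    # A row is valid when it equals the comparand on every unmasked bit.
    validity = [((word ^ comparand) & mask) == 0 for word in rows]

    # Priority encoder: on multiple matches pick the highest address.
    selected = None
    for address in range(len(validity) - 1, -1, -1):
        if validity[address]:
            selected = address
            break
    return validity, selected


if __name__ == "__main__":
    stored = [0b1010, 0b1110, 0b0010, 0b1011]
    # Bit 2 is ignored, so rows 0 and 1 both match; row 1 wins.
    print(cam_search(stored, comparand=0b1010, mask=0b1011))
```

In hardware all rows are compared in parallel; the loop here only models the result, not the timing.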

Memory Timing: Approaches. DRAM timing, multiplexed addressing: a smaller number of pins is needed since row and column addresses are sent sequentially; strobe signals RAS and CAS are used to read them. SRAM timing, self-timed (complete addressing): the complete address is presented at once, with sensing circuits to detect transitions on the address bus.

Read-Only Memory Cells. [Figure: Diode ROM, MOS ROM 1 and MOS ROM 2 cells, each drawn with WL, BL, VDD and GND for a stored 1 and a stored 0.] Where the bit line (BL) is resistively clamped to ground, its default value is 0. Diode disadvantage: no electrical isolation between bit and word lines. Where BL is resistively clamped to VDD, its default value is 1.

MOS OR ROM. [Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], VDD connections, and pull-down loads biased by Vbias.]

MOS NOR ROM. [Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], GND lines, and pull-up devices to VDD.]

MOS NOR ROM Layout: Programming using the Active Layer Only. Cell (9.5λ x 7λ). Diffusion is added to create transistors. [Layout legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion; GND lines; bit lines connected to VDD through pMOS.]

MOS NOR ROM Layout: Programming using the Contact Layer Only. Cell (11λ x 7λ). Contacts are added to create transistors. [Layout legend: Polysilicon, Metal1, Diffusion, Metal1 on Diffusion; GND lines; bit lines connected to VDD through pMOS.]

MOS NAND ROM. [Figure: series-connected array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], and pull-up devices to VDD.] All word lines are high by default, with the exception of the selected row.

MOS NAND ROM Layout: Programming using the Metal-1 Layer Only. Cell (8λ x 7λ). No horizontal contact to GND necessary; drastically reduced cell size. Loss in performance compared to NOR ROM. Add metal to eliminate a transistor (short circuit). [Layout legend: Polysilicon, Diffusion, Metal1 on Diffusion.]

NAND ROM Layout: Programming using Implants Only. Cell (5λ x 6λ). The threshold-altering implant switches the transistor on permanently, thus eliminating it from the ROM; this results in a very small layout area (2 times smaller than the NOR cell). [Layout legend: Polysilicon, Threshold-altering implant, Metal1 on Diffusion.]

Equivalent Transient Model for MOS NOR ROM. [Figure: model for NOR ROM with VDD, WL, BL, r_word, c_word and C_bit.] Word line parasitics: wire capacitance and gate capacitance; wire resistance (polysilicon). Bit line parasitics: resistance not dominant (metal); drain and gate-drain capacitance.

Equivalent Transient Model for MOS NAND ROM. [Figure: model for NAND ROM with VDD, WL, BL, r_word, c_word, r_bit, c_bit and C_L.] Word line parasitics: similar to NOR ROM. Bit line parasitics: resistance of cascaded transistors dominates; drain/source and complete gate capacitance.

Decreasing Word Line Delay
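The slide text for this topic did not survive extraction, so the following is only a sketch of why word-line length matters, not a reconstruction of the slide. It assumes an Elmore-delay model of a uniform polysilicon line split into equal RC segments and driven from one end; the resistance and capacitance values are hypothetical.

```python
def elmore_delay(r_total, c_total, segments=100):
    """Elmore delay at the far end of a uniform RC line driven from one end.

    The line is modelled as `segments` equal resistors in series, each
    followed by an equal capacitor to ground (a lumped approximation).
    """
    r_seg = r_total / segments
    c_seg = c_total / segments
    # Capacitor i sees the resistance of the i segments upstream of it.
    return sum((i * r_seg) * c_seg for i in range(1, segments + 1))


if __name__ == "__main__":
    full = elmore_delay(10e3, 1e-12)      # hypothetical 10 kOhm, 1 pF line
    half = elmore_delay(5e3, 0.5e-12)     # same wire, half the length
    print(full, half, half / full)
```

Because both resistance and capacitance scale with length, the delay scales with length squared: halving the driven length cuts the delay to a quarter. That is the usual motivation for shortening or partitioning polysilicon word lines.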

Precharged MOS NOR ROM. [Figure: 4 x 4 array with word lines WL[0] to WL[3], bit lines BL[0] to BL[3], GND lines, and precharge devices to VDD clocked by φpre.] The PMOS precharge device can be made as large as necessary, but the clock driver becomes harder to design.

Non-Volatile Memories: The Floating-gate Avalanche-injection MOS transistor (FAMOS). [Figure: device cross-section (source, drain, n+ regions, p substrate, oxide thickness tox) and schematic symbol.]

Floating-Gate Transistor Programming. [Figure: three bias conditions. Avalanche injection (voltage labels 20 V, 10 V, 5 V). Removing the programming voltage leaves charge trapped (0 V, -5 V). Programming results in a higher VT (5 V, -2.5 V).] Hot electrons go through the oxide and are trapped, reducing the floating-gate voltage. Typically the new threshold is around 7 V, so a 5 V supply is not sufficient to turn the transistor on, and the device is disabled.

A “Programmable-Threshold” Transistor. The shifted threshold effectively switches the transistor off permanently. The shift in the threshold depends on the charge injected onto the floating gate. Injected charges are well insulated by silicon dioxide and can stay there with power off for many years. The floating gate is used in almost all nonvolatile memories (EPROM, EEPROM, Flash).

Erasable-Programmable Read-Only Memory (EPROM). EPROM is erased by shining ultraviolet light through a transparent package window. UV makes the oxide slightly conductive by generating electron-hole pairs. The erasure process is slow (seconds to minutes). Programming takes 5-10 msec. Erasing can be repeated up to a thousand times. The threshold is difficult to control after many erasures, so special on-chip circuitry is required to control it. High current is required during programming.

Floating Gate Tunneling Oxide (FLOTOX) Transistor: EEPROM. [Figure: FLOTOX transistor cross-section (source, drain, n+ regions, p substrate, 20-30 nm gate oxide, 10 nm tunnel oxide) and Fowler-Nordheim tunneling I-V characteristic (I versus VGD, marked at -10 V and 10 V).] A smaller voltage is required to program, and programming is reversible by changing the sign of the voltage.

EEPROM Cell. [Figure: 2-transistor cell with BL, WL and VDD.] Absolute threshold control is hard. A programmed transistor might be in depletion mode and hard to turn off by the word-line signal, hence the 2-transistor cell. The design is larger than an EPROM device. Thin oxide is hard to make. Erase-program can be repeated 10^5 times.

Flash EEPROM. [Figure: cell cross-section with control gate, floating gate, thin tunneling oxide (~10 nm), n+ source, n+ drain and p- substrate; programming with 12 V on the gate, erasure with 12 V on the source.] Flash EPROM combines the density of EPROM with the versatility of EEPROM. Programming is performed by avalanche hot-electron injection (fast, 1-10 msec). Erasure of the complete chip is done using tunneling (careful control, >100 ms). No extra access transistor is needed.

Cross-sections of NVM cells. [Figure: Flash and EPROM cell cross-sections. Courtesy Intel.]

Basic Operations in a NOR Flash Memory: Erase. All transistors are erased. Read back and repeat if erase is still needed.
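The erase, read back, repeat sequence can be sketched as a simple loop. The model below is deliberately crude and entirely hypothetical: each erase pulse is assumed to lower every cell threshold by a fixed amount, and `target`, `pulse_shift` and `max_pulses` are illustrative parameters, not values from the slides.

```python
def erase_block(thresholds, target, pulse_shift, max_pulses=100):
    """Apply erase pulses to all cells until every threshold is <= target.

    Returns the final thresholds and the number of pulses applied.
    """
    cells = list(thresholds)
    pulses = 0
    # Read back: is any cell still above the erased level?
    while any(vt > target for vt in cells):
        if pulses == max_pulses:
            raise RuntimeError("block failed to erase")
        # One erase pulse acts on all transistors at once.
        cells = [vt - pulse_shift for vt in cells]
        pulses += 1
    return cells, pulses


if __name__ == "__main__":
    print(erase_block([3.0, 2.0, 1.0], target=1.0, pulse_shift=0.5))
```

Note that the slowest cell sets the pulse count, so cells that started lower end up well below the target. This is one reason precise threshold control is a concern in flash arrays.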

Basic Operations in a NOR Flash Memory: Write. Programming requires 12 V at the gate and 6 V at the drain, with 0 V at the source.

Basic Operations in a NOR Flash Memory: Read. In a read operation the programmed transistor stores a 1, as it is switched off permanently. NOR Flash memories have a fast random read time and slow erasure and programming times, and they need precise control of thresholds.

NAND Flash Memory. [Figure: unit cell with word line (poly), source line (diffusion layer) and select transistors. Courtesy Toshiba.] High-dielectric material (oxide-nitride-oxide) for large CFG. Large storage density, lower cost. Fast programming: 200-400 nsec. Fast serial access.

NAND Flash Memory. [Figure: word lines, select transistors, bit line contact, source line contact, active area and STI. Courtesy Toshiba.] All contacts between word lines are eliminated, resulting in a 40% smaller cell than the NOR structure. During erasure all transistors are put in depletion mode by setting 20 V at the source. During programming the selected word line is set to 20 V to store a “1”, increasing its threshold. During a read both select transistors are enabled and the read proceeds as in the NAND ROM.

New Nonvolatile Memories
FRAM (Ferroelectric RAM): uses programmable capacitors; dielectric crystals polarize under an electric field; very high density, many read/write cycles, low power.
MRAM (Magnetoresistive RAM): similar to magnetic core memories; uses spin electronics or tunneling magnetic resistance.

Characteristics of State-of-the-art NVM

Read-Write Memories (RAM)
STATIC (SRAM): data stored as long as supply is applied; large (6 transistors/cell); fast; differential.
DYNAMIC (DRAM): periodic refresh required; small (1-3 transistors/cell); slower; single-ended.

6-transistor CMOS SRAM Cell. [Figure: six transistors M1 to M6, storage nodes Q and Q_bar, word line WL, bit lines BL and BL_bar, supply VDD.]

CMOS SRAM Analysis (Read). [Figure: cell with Q = 1 and Q_bar = 0, transistors M1, M4, M5 and M6, word line WL, and bit lines BL and BL_bar with capacitance Cbit precharged to VDD.] Initially BL and BL_bar are precharged. Assume that the cell stores a 1. When WL goes high, BL_bar is discharged low. The voltage rise at Q_bar must be kept below the nMOS threshold (0.4 V) to avoid flipping the cell.

CMOS SRAM Analysis (Read). [Plot: voltage rise (V) versus cell ratio (CR).] Design for a cell ratio in the green zone.

CMOS SRAM Analysis (Write). [Figure: cell during a write, with labels BL = 1, Q, M4, M5, M6, WL and VDD.]

CMOS SRAM Analysis (Write) Must pull down cell voltage VQ below the nMOS threshold (0.4V)

6T-SRAM Layout. [Layout labels: VDD, GND, Q, WL, BL, M1 to M6.]

Resistance-load SRAM Cell. [Figure: cell with load resistors RL to VDD, transistors M1 to M4, storage node Q, WL and BL.] Static power dissipation: want RL large. For large resistor values, use undoped polysilicon with sheet resistance in the Tohm/sq range. An alternative solution uses a low-quality parasitic thin-film PMOS (TFT) with an OFF current of 10^-13 A. Bit lines are precharged to VDD to address the tp problem.

Static CAM Memory Cell. [Figure: array of CAM cells on Bit, Bit_bar and Word lines with a wired-NOR match line; cell transistors M1 to M9, stored data S and S_bar, internal node int.] Incoming data on Bit and Bit_bar are compared with the stored data S and S_bar. The match line is precharged high. If there is a match, the internal signal int is grounded and Match stays high; otherwise int is pulled high and Match goes low.

SRAM Characteristics

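3-Transistor DRAM Cell. [Figure: transistors M1 to M3, storage node X with capacitance CS, write word line WWL, read word line RWL, bit lines BL1 and BL2; waveform labels VDD, VT and ΔV.] No constraints on device ratios. Reads are non-destructive. Value stored at node X when writing a “1” = V_WWL - V_Tn.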

3T-DRAM Layout. [Layout labels: BL1, BL2, GND, RWL, WWL, M1 to M3.]

1-Transistor DRAM Cell. Write: the storage capacitance C_S is charged or discharged by asserting WL and BL. Read: the bit line is precharged, and charge redistribution takes place between the bit line and the storage capacitance: $\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$. The voltage swing is small, typically around 250 mV.
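A small numeric sketch of the charge-redistribution read, using the relation ΔV = (V_BIT - V_PRE) · C_S / (C_S + C_BL). The component values in the example are hypothetical and were chosen only so that the result lands on the 250 mV swing quoted above.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bit-line voltage change after charge sharing with the cell.

    v_bit: voltage stored on the cell capacitor before the read.
    v_pre: bit-line precharge voltage.
    c_s, c_bl: storage and bit-line capacitance (same units).
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


if __name__ == "__main__":
    # Hypothetical values: 2.5 V supply, bit line precharged to 1.25 V,
    # 50 fF storage capacitor, 200 fF bit line.
    print(bitline_swing(2.5, 1.25, 50e-15, 200e-15))   # reading a 1
    print(bitline_swing(0.0, 1.25, 50e-15, 200e-15))   # reading a 0
```

Since C_BL is several times larger than C_S, only a fraction of the stored voltage difference reaches the bit line, which is why a sense amplifier is needed on every bit line.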

DRAM Cell Observations 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out. DRAM memory cells are single ended in contrast to SRAM cells. The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation. Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design. When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD

Sense Amp Operation. [Plot: bit-line voltage V_BL versus t around V_PRE, with ΔV(1) and ΔV(0); markers: word line activated, sense amp activated.]

1-T DRAM Cell. [Figure: cross-section and layout showing metal word line, poly, SiO2, field oxide, n+ regions, inversion layer induced by plate bias, diffused bit line, polysilicon plate, polysilicon gate, M1 word line and capacitor.] Uses polysilicon-diffusion capacitance. Expensive in area.

Poly-diffusion capacitor 1T-DRAM

Advanced 1T DRAM Cells. [Figures: trench cell (cell plate Si, capacitor insulator, storage node poly, refilling poly, Si substrate, 2nd field oxide) and stacked-capacitor cell (word line, cell plate, capacitor dielectric layer, insulating layer, transfer gate, isolation, storage electrode).]

CAM in Cache Memory. [Figure: CAM array (tag) and SRAM array (data), each with address decoder, hit logic, sense amps / input drivers, address, R/W and Hit signals.] Cache memory is used to store frequently accessed data to lower the memory access time and power. In a cache, the CAM stores addresses and the SRAM stores data. Once the address of the requested data matches one in the CAM, the Hit signal goes high and the data is read from the SRAM; otherwise the slow external memory must be read.

Periphery: Decoders; Sense Amplifiers; Input/Output Buffers; Control / Timing Circuitry

Row Decoders. A collection of 2^M complex logic gates, organized in a regular and dense fashion. (N)AND decoder: NAND gates followed by inverters. NOR decoder.

Hierarchical Decoders. Multi-stage implementation improves performance. [Figure: NAND decoder using 2-input pre-decoders, with address inputs and word lines WL0, WL1, ...]
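A behavioural sketch of predecoding, assuming a 4-bit address split into two pairs, each handled by a 2-to-4 predecoder, with a 2-input gate per word line combining one line from each group. The 4-bit width and the function names are assumptions for illustration; the slide only names the technique.

```python
def predecode2(a_hi, a_lo):
    """2-to-4 one-hot predecoder for one pair of address bits."""
    index = (a_hi << 1) | a_lo
    return [int(i == index) for i in range(4)]


def decode4(address):
    """4-to-16 row decoder built from two predecoders and 2-input gates."""
    bits = [(address >> i) & 1 for i in range(4)]   # A0..A3
    low = predecode2(bits[1], bits[0])              # decodes A1 A0
    high = predecode2(bits[3], bits[2])             # decodes A3 A2
    # Each word line needs only a 2-input AND of one line from each group.
    return [high[w >> 2] & low[w & 3] for w in range(16)]


if __name__ == "__main__":
    print(decode4(6))
```

The final gates stay at two inputs regardless of how the predecoded groups were produced, which keeps the gates in the word-line pitch small and the predecoded lines shared by many rows.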

Dynamic Decoders. [Figure: 2-input NOR decoder and 2-input NAND decoder, each with precharge devices clocked by φ, address inputs A0, A0_bar, A1, A1_bar, word lines WL0 to WL3, VDD and GND.] The 2-input NOR decoder is identical to the NOR ROM. The 2-input NAND decoder is identical to the NAND ROM.

4-input pass-transistor based Column Decoder. [Figure: 2-input NOR decoder driving select signals S0 to S3 for pass transistors between bit lines BL0 to BL3 and data line D.] Advantages: speed (tpd does not add to the overall memory access time); only one extra transistor in the signal path. Disadvantage: large transistor count.

4-to-1 tree based Column Decoder. [Figure: bit lines BL0 to BL3 routed through a tree of pass transistors controlled by A0, A0_bar, A1 and A1_bar to data line D.] The number of devices is drastically reduced. Delay increases quadratically with the number of sections, which is prohibitive for large decoders. Solutions: buffers; progressive sizing; a combination of tree and pass-transistor approaches.

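Decoder for circular shift-register. [Figure: chain of register stages driving word lines WL0, WL1, WL2, ..., clocked by φ and φ_bar, with reset R and VDD.] In serial access memories the read or write address changes sequentially. Only one WLi is active (a pointer), and it shifts by one with clock φ. R is used for reset.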
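The pointer behaviour can be modelled as a one-hot list that rotates once per clock. This is a behavioural sketch only; the reset position (WL0) and the function name `clock_step` are assumptions.

```python
def clock_step(word_lines, reset=False):
    """Advance the one-hot word-line pointer by one position.

    With reset asserted the pointer is forced to WL0 (assumed reset state).
    Otherwise the active position rotates, wrapping from the last line to WL0.
    """
    if reset:
        return [1] + [0] * (len(word_lines) - 1)
    return word_lines[-1:] + word_lines[:-1]


if __name__ == "__main__":
    state = clock_step([0, 0, 0, 0], reset=True)
    for _ in range(5):
        print(state)
        state = clock_step(state)
```

No address decoding is needed at all in this scheme, which suits FIFO-style and other serial-access memories.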

Sense Amplifiers. $t_p = \frac{C \cdot \Delta V}{I_{av}}$, where C is large and I_av is small, so make ΔV as small as possible. Idea: use a sense amplifier (s.a.), which turns a small transition at its input into a full transition at its output.

Differential Sense Amplifier. [Figure: transistors M1 to M5, inputs bit and bit_bar, output Out, internal node y, enable SE, supply VDD.] Directly applicable to SRAMs. Gain A_sense = -gm (ro2 || ro4), where gm is the transconductance of the input transistors.

Differential Sensing: SRAM. [Figure: (a) SRAM sensing scheme with SRAM cell i on BL and BL_bar, word line WLi, precharge and equalization devices (PC, EQ), and a differential sense amp with inputs x and x_bar (inputs from memory), outputs y and y_bar, and enable SE; (b) two-stage differential amplifier.] Precharge the bit lines by pulling PC_bar low. Disable precharge to read. Set SE to activate the sense amplifier.

Latch-Based Sense Amplifier (DRAM). [Figure: cross-coupled latch between BL and BL_bar with EQ, SE, SE_bar and VDD.] Initialized at its meta-stable point with EQ. Once an adequate voltage gap is created, the sense amp is enabled with SE. Positive feedback quickly forces the output to a stable operating point.

Charge-Redistribution Amplifier. [Figure: concept schematic and transient response; labels Vref, VL, VS, Vin, M1 to M3, Csmall and Clarge.] VS is precharged to VDD and VL to Vref - Vth, so M1 is cut off. When M2 pulls down, M1 conducts and VS is quickly lowered to equalize with VL.

Charge-Redistribution Amplifier: EPROM. [Figure: EPROM array cell M1 on WL and BL, column decoder device M2 (WLC), cascode device M3 biased by Vcasc, load M4 enabled by SE, capacitances CBL, Ccol and Cout, output Out, supply VDD.]

Single-to-Differential Conversion. [Figure: cell on WL and BL feeding input x of a differential sense amplifier, with Vref on input x_bar.] How to make a good Vref for the differential amplifier? It must deliver an x_bar smaller than x if a “1” is stored and larger than x if a “0” is stored.

Open bitline architecture with dummy cells. [Figure: sense amplifier (SE, EQ, VDD) between bit-line halves BLL and BLR, with word lines L0, L1, ... and R0, R1, ... and dummy cells on L and L_bar.] The bit line is divided into left and right halves to reduce capacitance. Dummy cells are added for reference. When the EQ signal is raised, BLL and BLR are precharged to VDD/2, and L and L_bar are enabled, charging the dummy cells to VDD/2. When reading word Li, activate both Li and L. When reading word Ri, activate both Ri and L_bar.

DRAM Read Process with Dummy Cell. [Plots: bit-line voltages BL and BL_bar versus t (ns) when reading 0 and when reading 1; control signals EQ, WL and SE versus t (ns).]

Voltage Down Converter. Memories require different voltage levels: boosted word-line voltage VDD + Vtn; precharge voltage of half VDD; reduced internal VDD power supply; negative substrate bias voltage. [Figure: amplifier comparing VREF with VDL and driving PMOS Mdrive from VDD, with Vbias; equivalent model.] This circuit delivers the reference voltage to VDL. When VDL < VREF, the gate of the PMOS drive transistor is discharged, increasing VDL. When VDL > VREF, the gate of the PMOS drive transistor is charged, decreasing VDL.

Charge Pump. Transistors M1 and M2 are connected as diodes. The charge stored on the capacitor is Q = Cpump (VDD - VT). When B rises above Vload by more than a threshold, Vload increases. The maximum load voltage Vload rises to 2 (VDD - VT), which is higher than VDD.

DRAM Timing Total of 24 timing constraints must be observed

RDRAM Architecture. [Figure: memory arrays connected through mux/demux networks to the bus (data bus, clocks, row and column demux, packet decoder).] A very high speed packet transfer protocol is used to transfer large amounts of data. The narrow bus uses several clock cycles to transfer the data.

Address Transition Detection. SRAM memories are triggered by events detected by ATD circuits. [Figure: address inputs A0, A1, ..., AN-1, each with a DELAY td element, combined into the ATD signal.]

Reliability and Yield

Trends in DRAM Parameters. [Plot, from [Itoh01]: C_D (fF), C_S (fF), Q_S (fC), V_smax (mV) and V_DD (V) versus memory capacity (4K to 64G bits/chip), with $Q_S = C_S V_{DD} / 2$ and $V_{smax} = Q_S / (C_S + C_D)$.] Bit line capacitance (C_D), voltage swing (V_smax), storage charge (Q_S), storage capacitance (C_S) and power supply voltage (V_DD) are reduced in new technologies.

Open Bit-line Architecture: Cross Coupling. [Figure: sense amplifier with EQ between BL and BL_bar, word lines WL0, WL1, ..., dummy word lines WLD, coupling capacitances C_WBL between word lines and bit lines, and bit-line capacitances C_BL.] Selecting a word line creates charge redistribution. A dummy line is selected to compensate for the charge redistribution. If both sides of the memory array were symmetrical, the injected bit line noise would be compensated as a common-mode signal for the sense amplifiers.

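Folded-Bitline Architecture. [Figure: sense amplifier with EQ, BL and BL_bar running side by side, word lines WL0, WL1, ..., dummy word lines WLD, coupling capacitances C_WBL, nodes x and y.] Better matching of the BL and BL_bar capacitances, so the cross-coupling noise can be suppressed.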

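Transposed-Bitline Architecture. [Figure: (a) straightforward bit-line routing with coupling capacitance C_cross between neighboring bit lines and sense amplifiers SA; (b) transposed bit-line architecture.] The transposed architecture equalizes the cross-coupling noise in BL and BL_bar.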

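Sources of Power Dissipation in Memories. [Figure, from [Itoh00]: chip with an array of m x n cells (m selected cells drawing m·i_act, m(n - 1) non-selected cells drawing m(n - 1)·i_hld), row decoder and column decoder (n·C_DE·V_INT·f and m·C_DE·V_INT·f), and periphery (C_PT·V_INT·f and I_DCP), between V_DD and V_SS.] $P = V_{DD} I_{DD}$, with $I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$. Legend: i_act, active selector current; i_hld, data retention current; C_DE, decoder capacitance; C_PT, peripheral capacitance; I_DCP, static peripheral current; V_INT, internal supply voltage.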

Noise Sources in 1T DRAM. [Figure: cell with WL, BL, adjacent BL, substrate, storage capacitance CS, word-line to bit-line coupling C_WBL, cross coupling C_cross, electrode, leakage and α-particles.] Capacitance CS must be above 30 fF, otherwise a single α-particle can destroy its charge. Free neutrons from cosmic rays carry 10 times more charge than alpha particles. Memories are covered with polyimide to protect against alpha radiation. Error correction codes are used to correct most failures.

Alpha-particles. [Figure: α-particle track through a cell (WL, BL, VDD, SiO2, n+ region) generating electron-hole pairs.] Alpha particles have an energy of 8-9 MeV and penetrate silicon up to a depth of 10 µm. 1 particle generates ~1 million electron-hole pairs. This is comparable with the charge stored on a 50 fF capacitance at 3.5 V.
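The comparison in the last two sentences can be checked with a short calculation. The only value not taken from the slide is the elementary charge; the function names are illustrative.

```python
ELEMENTARY_CHARGE = 1.602e-19  # coulombs per electron


def generated_charge(pairs):
    """Charge carried by the electrons of `pairs` electron-hole pairs."""
    return pairs * ELEMENTARY_CHARGE


def stored_charge(capacitance, voltage):
    """Charge held on a capacitor, Q = C * V."""
    return capacitance * voltage


if __name__ == "__main__":
    q_alpha = generated_charge(1e6)          # about 160 fC
    q_cell = stored_charge(50e-15, 3.5)      # 175 fC
    print(q_alpha, q_cell, q_alpha / q_cell)
```

The generated charge is roughly 90% of the stored charge, so a single strike that is fully collected by the storage node could indeed upset the cell.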

Memory Yield. [Plot: yield curves at different stages of process maturity, from [Veendrick92].] The memory yield problem is fought using redundancy and error correction.

Redundancy. [Figure: memory array with row decoder (row address), column decoder (column address), redundant rows, redundant columns and a fuse bank.]

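Error-Correcting Codes. Example: Hamming codes. [The parity-check equations of the slide were lost in extraction.] E.g., with B3 wrong, the parity checks evaluate to 3, which points to the failing bit.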
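Since the slide's equations did not survive, here is a sketch of a standard Hamming (7,4) code that reproduces the "B3 wrong gives 3" behaviour. It assumes the conventional layout with parity bits at positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7; whether the slide used exactly this layout is an assumption.

```python
def encode(b3, b5, b6, b7):
    """Build a 7-bit codeword; list index i holds position i + 1."""
    p1 = b3 ^ b5 ^ b7        # covers positions 1, 3, 5, 7
    p2 = b3 ^ b6 ^ b7        # covers positions 2, 3, 6, 7
    p4 = b5 ^ b6 ^ b7        # covers positions 4, 5, 6, 7
    return [p1, p2, b3, p4, b5, b6, b7]


def syndrome(word):
    """Return 0 if all checks pass, else the position of a single bad bit."""
    def bit(position):
        return word[position - 1]
    s1 = bit(1) ^ bit(3) ^ bit(5) ^ bit(7)
    s2 = bit(2) ^ bit(3) ^ bit(6) ^ bit(7)
    s4 = bit(4) ^ bit(5) ^ bit(6) ^ bit(7)
    return s1 + 2 * s2 + 4 * s4


def correct(word):
    """Flip the bit named by the syndrome, if any."""
    fixed = list(word)
    position = syndrome(fixed)
    if position:
        fixed[position - 1] ^= 1
    return fixed


if __name__ == "__main__":
    good = encode(1, 0, 1, 1)
    bad = list(good)
    bad[2] ^= 1                      # corrupt B3
    print(syndrome(bad), correct(bad) == good)
```

This code corrects any single-bit error per word; detecting double errors would need an extra overall parity bit, which is not shown here.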

Redundancy and Error Correction

Data Retention in SRAM. SRAM leakage increases with technology scaling, yet for a 64-Gb memory the leakage current should be less than 3.5 aA at 25 C. Reduce leakage by turning off unused memory blocks (caches). Increase thresholds by using body biasing. Increase the resistance in the leakage path. Lower the supply voltage. Lower the junction temperature. Scale down the refresh period. [Plot (A): Ileakage (A), 100n to 1.30u, versus VDD (0 to 1.80 V) for 0.13 µm CMOS and 0.18 µm CMOS, annotated "Factor 7".]

Suppressing Leakage in SRAM. [Figures: inserting extra resistance (SRAM cells between internal rails VDD,int and VSS,int, connected to VDD and VSS through low-threshold sleep transistors); reducing the supply voltage (sleep signal switching VDD,int between VDD and VDDL).]

Data Retention in DRAM. [Plot, from [Itoh00]: estimated current distribution for DRAM generations, showing active current and data retention (standby) current.]

Case Studies: Programmable Logic Array; SRAM; Flash Memory

PLA versus ROM. Programmable Logic Array: a structured approach to random logic; a “two level logic implementation”, NOR-NOR (product of sums) or NAND-NAND (sum of products). IDENTICAL TO ROM! Main difference: a ROM is fully populated; a PLA has one element per minterm. Note: the importance of PLAs has drastically reduced because 1. they are slow, and 2. better software techniques (multi-level logic synthesis) exist for random logic design.

Programmable Logic Array: Pseudo-NMOS PLA. [Figure: AND-plane and OR-plane with inputs X0, X0_bar, X1, X1_bar, X2, X2_bar, outputs f0 and f1, VDD and GND lines.]

Dynamic PLA. [Figure: AND-plane and OR-plane with inputs X0, X0_bar, X1, X1_bar, X2, X2_bar, outputs f0 and f1, clocks φAND and φOR, VDD and GND.]

Clock Signal Generation for self-timed dynamic PLA. Self-timing is recommended in a PLA for maximum performance. A dummy AND row in the PLA estimates the maximum loading condition for the required precharge time. The worst-case discharge time is estimated in a similar way. [Figure: (a) clock signals φ, φAND and φOR with intervals tpre and teval; (b) timing generation circuitry with the dummy AND row.]

PLA Layout

4 Mbit SRAM: Hierarchical Word-line Architecture. To improve performance and reduce power consumption in large memories, the word line is hierarchically divided and sections are activated as needed.

Bit-line Circuitry. [Figure: bit-line load, block select, ATD, BEQ, local WL, memory cell, bit lines B/T and B/T_bar, column decode (CD) devices, I/O and I/O_bar lines, sense amplifier.]

Sense Amplifier (and Waveforms). [Figure: sense amplifier on the I/O and I/O_bar lines with block select (BS), SEQ, SA and SA_bar, DATA and Dei; waveforms of Address, ATD, BEQ, SEQ, SA/SA_bar, I/O lines, DATA and Data-cut between Vdd and GND.]

1 Gbit Flash Memory From [Nakamura02]

Writing Flash Memory. [Plots, from [Nakamura02]: evolution of thresholds and final distribution; number of cells (10^0 to 10^8) versus Vt of memory cells (0 V to 4 V); read level (4.5 V).] During erasure all bits are programmed to become depletion devices. It takes four cycles of write/erase to establish all device thresholds > 0.8 V.

125 mm2 1 Gbit NAND Flash Memory. [Die photo, from [Nakamura02]: 10.7 mm x 11.7 mm; 32 word lines x 1024 blocks; 16896 bit lines; 2 kB page buffer and cache; charge pump.]

125 mm2 1 Gbit NAND Flash Memory (from [Nakamura02])
Technology: 0.13 µm p-sub CMOS triple-well; 1 poly, 1 polycide, 1 W, 2 Al
Cell size: 0.077 µm2
Chip size: 125.2 mm2
Organization: 2112 x 8b x 64 page x 1k block
Power supply: 2.7 V-3.6 V
Cycle time: 50 ns
Read time: 25 µs
Program time: 200 µs / page
Erase time: 2 ms / block
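The organization figure can be cross-checked against the 16896 bit lines and the 1 Gbit capacity with a few multiplications. The split of each 2112-byte page into 2048 data bytes plus 64 spare bytes is an inference from the 2 kB page buffer mentioned for this chip, not something the table states.

```python
BYTES_PER_PAGE = 2112      # from the organization line
BITS_PER_BYTE = 8
PAGES_PER_BLOCK = 64
BLOCKS = 1024              # "1k block"
DATA_BYTES_PER_PAGE = 2048 # assumed: 2 kB of data, remainder is spare area

bit_lines = BYTES_PER_PAGE * BITS_PER_BYTE
total_bits = bit_lines * PAGES_PER_BLOCK * BLOCKS
data_bits = DATA_BYTES_PER_PAGE * BITS_PER_BYTE * PAGES_PER_BLOCK * BLOCKS
spare_bytes_per_page = BYTES_PER_PAGE - DATA_BYTES_PER_PAGE

if __name__ == "__main__":
    print(bit_lines)              # one bit line per bit of a page
    print(total_bits, data_bits)  # raw array size and user capacity
    print(spare_bytes_per_page)
```

Under that assumption the user-visible capacity is exactly 2^30 bits, and the raw array is about 3% larger because of the spare bytes in every page.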

Semiconductor Memory Trends (up to the 90’s) Memory Size as a function of time: x 4 every three years
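The "x 4 every three years" trend corresponds to a simple exponential. The sketch below only restates that rule numerically; the function name and the 1 Mbit starting point in the example are illustrative.

```python
def capacity_growth(years):
    """Growth factor after `years` if capacity quadruples every 3 years."""
    return 4 ** (years / 3)


if __name__ == "__main__":
    print(capacity_growth(1))            # growth per year
    print(capacity_growth(3))            # one generation
    # Hypothetical example: a 1 Mbit part scaled forward by 15 years.
    print(2 ** 20 * capacity_growth(15))
```

Quadrupling every three years is roughly 59% growth per year, or a factor of about 1000 every fifteen years.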

Semiconductor Memory Trends (updated) From [Itoh01]

Semiconductor Memory Trends. There was an apparent shift in the memory market due to the increased use of flash memory for video and other personalized storage devices other than personal computers. SOC technology implements systems that integrate memories and processors on a single die. The challenges of making even bigger and denser memories, combined with lower market incentives, are responsible for the slowdown in the growth of memory chip capacities.

Trends in Memory Cell Area From [Itoh01]

Semiconductor Memory Trends Technology feature size for different SRAM generations