DRAM Tutorial 18-447 Lecture Vivek Seshadri. Vivek Seshadri – Thesis Proposal DRAM Module and Chip 2.

Slides:



Advertisements
Similar presentations
Prith Banerjee ECE C03 Advanced Digital Design Spring 1998
Advertisements

5-1 Memory System. Logical Memory Map. Each location size is one byte (Byte Addressable) Logical Memory Map. Each location size is one byte (Byte Addressable)
COEN 180 DRAM. Dynamic Random Access Memory Dynamic: Periodically refresh information in a bit cell. Else it is lost. Small footprint: transistor + capacitor.
A Case for Subarray-Level Parallelism (SALP) in DRAM Yoongu Kim, Vivek Seshadri, Donghyuk Lee, Jamie Liu, Onur Mutlu.
Understanding a Problem in Multicore and How to Solve It
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Main Mem.. CSE 471 Autumn 011 Main Memory The last level in the cache – main memory hierarchy is the main memory made of DRAM chips DRAM parameters (memory.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Memory See: P&H Appendix C.8, C.9.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
1 Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections )
1 Lecture 15: DRAM Design Today: DRAM basics, DRAM innovations (Section 5.3)
1 Lecture 13: Cache Innovations Today: cache access basics and innovations, DRAM (Sections )
1 Lecture 11: Large Cache Design Topics: large cache basics and… An Adaptive, Non-Uniform Cache Structure for Wire-Dominated On-Chip Caches, Kim et al.,
1 Lecture 7: Caching in Row-Buffer of DRAM Adapted from “A Permutation-based Page Interleaving Scheme: To Reduce Row-buffer Conflicts and Exploit Data.
1 EE365 Read-only memories Static read/write memories Dynamic read/write memories.
Page Overlays An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management Session 2B – 10:45 AM Vivek Seshadri Gennady Pekhimenko, Olatunji.
1 Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture Donghyuk Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, Onur Mutlu.
The Dirty-Block Index Vivek Seshadri Abhishek Bhowmick ∙ Onur Mutlu Phillip B. Gibbons ∙ Michael A. Kozuch ∙ Todd C. Mowry.
1 Lecture 1: Introduction and Memory Systems CS 7810 Course organization:  5 lectures on memory systems  5 lectures on cache coherence and consistency.
Scalable Many-Core Memory Systems Lecture 2, Topic 1: DRAM Basics and DRAM Scaling Prof. Onur Mutlu HiPEAC.
A Case for Subarray-Level Parallelism (SALP) in DRAM Yoongu Kim, Vivek Seshadri, Donghyuk Lee, Jamie Liu, Onur Mutlu.
Memory Technology “Non-so-random” Access Technology:
Page Overlays An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management Vivek Seshadri Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu,
Charles Kime & Thomas Kaminski © 2004 Pearson Education, Inc. Terms of Use (Hyperlinks are active in View Show mode) Terms of Use ECE/CS 352: Digital Systems.
Lecture 13 Main Memory Computer Architecture COE 501.
CPEN Digital System Design
Digital Logic Design Instructor: Kasım Sinan YILDIRIM
Optimizing DRAM Timing for the Common-Case Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Adaptive-Latency.
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu.
RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization ProcessorMemoryChannel Limited bandwidthHigh energy Carnegie Mellon UniversityIntel.
+ CS 325: CS Hardware and Software Organization and Architecture Memory Organization.
I/O Computer Organization II 1 Interconnecting Components Need interconnections between – CPU, memory, I/O controllers Bus: shared communication channel.
1 Lecture 14: DRAM Main Memory Systems Today: cache/TLB wrap-up, DRAM basics (Section 2.3)
By Edward A. Lee, J.Reineke, I.Liu, H.D.Patel, S.Kim
Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative
Computer Architecture Lecture 24 Fasih ur Rehman.
CS/EE 5810 CS/EE 6810 F00: 1 Main Memory. CS/EE 5810 CS/EE 6810 F00: 2 Main Memory Bottom Rung of the Memory Hierarchy 3 important issues –capacity »BellÕs.
Gather-Scatter DRAM In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses Vivek Seshadri Thomas Mullins, Amirali Boroumand,
Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University.
COMP541 Memories II: DRAMs
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
Contemporary DRAM memories and optimization of their usage Nebojša Milenković and Vladimir Stanković, Faculty of Electronic Engineering, Niš.
1 Lecture: DRAM Main Memory Topics: DRAM intro and basics (Section 2.3)
ECE/CS 552: Cache Concepts © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim Smith.
“With 1 MB RAM, we had a memory capacity which will NEVER be fully utilized” - Bill Gates.
CS203 – Advanced Computer Architecture Main Memory Slides adapted from Onur Mutlu (CMU)
15-740/ Computer Architecture Lecture 25: Main Memory
Simple DRAM and Virtual Memory Abstractions for Highly Efficient Memory Systems Thesis Oral Committee: Todd Mowry (Co-chair) Onur Mutlu (Co-chair) Phillip.
CS161 – Design and Architecture of Computer Main Memory Slides adapted from Onur Mutlu (CMU)
CS 704 Advanced Computer Architecture
COMP541 Memories II: DRAMs
Reducing Memory Interference in Multicore Systems
Dynamic Random Access Memory (DRAM) Basic
Low-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM Kevin Chang Prashant Nair, Donghyuk Lee, Saugata Ghose, Moinuddin.
Ambit In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology Vivek Seshadri Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali.
Samira Khan University of Virginia Oct 9, 2017
Prof. Gennady Pekhimenko University of Toronto Fall 2017
The Main Memory system: DRAM organization
Gather-Scatter DRAM In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses Session C1, Tuesday 10:40 AM Vivek Seshadri.
Today, DRAM is just a storage device
Ambit In-memory Accelerator for Bulk Bitwise Operations
Lecture: DRAM Main Memory
Digital Logic & Design Dr. Waseem Ikram Lecture 40.
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri , Yoongu Kim, Hongyi.
15-740/ Computer Architecture Lecture 19: Main Memory
Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri , Yoongu Kim, Hongyi.
Prof. Onur Mutlu ETH Zürich Fall September 2018
Computer Architecture Lecture 30: In-memory Processing
Presentation transcript:

DRAM Tutorial Lecture Vivek Seshadri

Vivek Seshadri – Thesis Proposal DRAM Module and Chip 2

Vivek Seshadri – Thesis Proposal Goals Cost Latency Bandwidth Parallelism Power Energy 3

Vivek Seshadri – Thesis Proposal DRAM Chip 4 Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O

Vivek Seshadri – Thesis Proposal Sense Amplifier 5 enable top bottom Inverter

Vivek Seshadri – Thesis Proposal Sense Amplifier – Two Stable States V DD Logical “1” Logical “0”

Vivek Seshadri – Thesis Proposal Sense Amplifier Operation 7 0 VTVT VBVB V T > V B 1 0 V DD

Vivek Seshadri – Thesis Proposal DRAM Cell – Capacitor 8 Empty State Fully Charged State Logical “0”Logical “1” 1 2 Small – Cannot drive circuits Reading destroys the state

Vivek Seshadri – Thesis Proposal Capacitor to Sense Amplifier V DD 1 0

Vivek Seshadri – Thesis Proposal DRAM Cell Operation 10 ½V DD 01 0 V DD ½V DD +δ

Vivek Seshadri – Thesis Proposal DRAM Subarray – Building Block for DRAM Chip 11 Row Decoder Cell Array Array of Sense Amplifiers (Row Buffer) 8Kb

Vivek Seshadri – Thesis Proposal DRAM Bank 12 Row Decoder Array of Sense Amplifiers (8Kb) Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O (64b) Address Data

Vivek Seshadri – Thesis Proposal DRAM Chip 13 Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Row Decoder Array of Sense Amplifiers Cell Array Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Shared internal bus Memory channel - 8bits

Vivek Seshadri – Thesis Proposal DRAM Operation 14 Row Decoder Array of Sense Amplifiers Cell Array Bank I/O Data 1 2 ACTIVATE Row READ/WRITE Column 3 PRECHARGE Row Address Column Address

RowClone Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry Vivek Seshadri

Vivek Seshadri – Thesis Proposal Memory Channel – Bottleneck Core Cache MC Memory Channel Limited Bandwidth High Energy

Vivek Seshadri – Thesis Proposal Goal: Reduce Memory Bandwidth Demand Core Cache MC Memory Channel Reduce unnecessary data movement

Vivek Seshadri – Thesis Proposal Bulk Data Copy and Initialization Bulk Data Copy Bulk Data Initialization srcdst val

Vivek Seshadri – Thesis Proposal Bulk Data Copy and Initialization Bulk Data Copy Bulk Data Initialization srcdst val

Vivek Seshadri – Thesis Proposal Bulk Copy and Initialization – Applications Forking Zero initialization (e.g., security) VM Cloning Deduplication Checkpointing Page Migration Many more

Vivek Seshadri – Thesis Proposal Shortcomings of Existing Approach Core Cache MC Channel src dst High latency ( 1046 ns to copy 4 KB) Interference High Energy ( 3600 nJ to copy 4 KB)

Vivek Seshadri – Thesis Proposal Our Approach: In-DRAM Copy with Low Cost Core Cache MC Channel dst High latency Interference High Energy src X X X ?

RowClone: In-DRAM Copy 23

Vivek Seshadri – Thesis Proposal Two Key Observations 24 Row Decoder Any operation on one sense amplifier can be easily performed in bulk Many DRAM cells share the same sense amplifier 1 2

Vivek Seshadri – Thesis Proposal Bulk Copy in DRAM – RowClone 25 ½V DD 01 0 V DD ½V DD +δ Data gets copied

Vivek Seshadri – Thesis Proposal Fast Parallel Mode – Benefits 26 LatencyEnergy Bulk Data Copy (4KB across a module) 1046 ns to 90 ns 3600 nJ to 40 nJ No bandwidth consumption Very little changes to the DRAM chip 11X74X

Vivek Seshadri – Thesis Proposal Fast Parallel Mode – Constraints Location constraint – Source and destination in same subarray Size constraint – Entire row gets copied (no partial copy) Can still accelerate many existing primitives (copy-on-write, bulk zeroing) Alternate mechanism to copy data across banks (pipelined serial mode – lower benefits than Fast Parallel)

Vivek Seshadri – Thesis Proposal End-to-end System Design Software interface –memcpy and meminit instructions Managing cache coherence – Use existing DMA support! Maximizing use of Fast Parallel Mode – Smart OS page allocation 28

Vivek Seshadri – Thesis Proposal Applications Summary 29

Vivek Seshadri – Thesis Proposal Results Summary 30