©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012 ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM.

Slides:



Advertisements
Similar presentations
Outline Memory characteristics SRAM Content-addressable memory details DRAM © Derek Chiou & Mattan Erez 1.
Advertisements

COEN 180 SRAM. High-speed Low capacity Expensive Large chip area. Continuous power use to maintain storage Technology used for making MM caches.
Figure (a) 8 * 8 array (b) 16 * 8 array.
Chapter 5 Internal Memory
Computer Organization and Architecture
Prith Banerjee ECE C03 Advanced Digital Design Spring 1998
5-1 Memory System. Logical Memory Map. Each location size is one byte (Byte Addressable) Logical Memory Map. Each location size is one byte (Byte Addressable)
Anshul Kumar, CSE IITD CSL718 : Main Memory 6th Mar, 2006.
COEN 180 DRAM. Dynamic Random Access Memory Dynamic: Periodically refresh information in a bit cell. Else it is lost. Small footprint: transistor + capacitor.
These slides incorporate figures from Digital Design Principles and Practices, third edition, by John F. Wakerly, Copyright 2000, and are used by permission.
System Design Tricks for Low-Power Video Processing Jonah Probell, Director of Multimedia Solutions, ARC International.
©Wen-mei W. Hwu and David Kirk/NVIDIA, SSL(2014), ECE408/CS483/ECE498AL, University of Illinois, ECE408/CS483 Applied Parallel Programming Lecture.
EECC341 - Shaaban #1 Lec # 19 Winter Read Only Memory (ROM) –Structure of diode ROM –Types of ROMs. –ROM with 2-Dimensional Decoding. –Using.
Chapter 9 Memory Basics Henry Hexmoor1. 2 Memory Definitions  Memory ─ A collection of storage cells together with the necessary circuits to transfer.
10/11/2007EECS150 Fa07 - DRAM 1 EECS Components and Design Techniques for Digital Systems Lec 14 – Storage: DRAM, SDRAM David Culler Electrical Engineering.
1 The Basic Memory Element - The Flip-Flop Up until know we have looked upon memory elements as black boxes. The basic memory element is called the flip-flop.
DRAM. Any read or write cycle starts with the falling edge of the RAS signal. –As a result the address applied in the address lines will be latched.
1 Lecture 16B Memories. 2 Memories in General Computers have mostly RAM ROM (or equivalent) needed to boot ROM is in same class as Programmable Logic.
Memory Hierarchy.1 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1 Lecture 16B Memories. 2 Memories in General RAM - the predominant memory ROM (or equivalent) needed to boot ROM is in same class as Programmable Logic.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 18, 2002 Topic: Main Memory (DRAM) Organization – contd.
1 EE365 Read-only memories Static read/write memories Dynamic read/write memories.
Main Memory by J. Nelson Amaral.
8-5 DRAM ICs High storage capacity Low cost Dominate high-capacity memory application Need “refresh” (main difference between DRAM and SRAM) -- dynamic.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign Lecture 10: GPU as part of the PC Architecture.
Memory Technology “Non-so-random” Access Technology:
Charles Kime & Thomas Kaminski © 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Chapter 8 – Memory Basics Logic and Computer Design.
© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 11 Parallel.
CPE232 Memory Hierarchy1 CPE 232 Computer Organization Spring 2006 Memory Hierarchy Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin.
CSIE30300 Computer Architecture Unit 07: Main Memory Hsin-Chou Chi [Adapted from material by and
Survey of Existing Memory Devices Renee Gayle M. Chua.
CMPUT 429/CMPE Computer Systems and Architecture1 CMPUT429 - Winter 2002 Topic5: Memory Technology José Nelson Amaral.
© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 12 Parallel.
Charles Kime & Thomas Kaminski © 2004 Pearson Education, Inc. Terms of Use (Hyperlinks are active in View Show mode) Terms of Use ECE/CS 352: Digital Systems.
CPEN Digital System Design
Digital Logic Design Instructor: Kasım Sinan YILDIRIM
1 ECE408/CS483 Applied Parallel Programming Lecture 10: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al University.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.
1 ECE 8823A GPU Architectures Module 5: Execution and Resources - I.
University of Tehran 1 Interface Design DRAM Modules Omid Fatemi
The Memory Hierarchy Lecture # 30 15/05/2009Lecture 30_CA&O_Engr Umbreen Sabir.
Computer Architecture Lecture 24 Fasih ur Rehman.
Feb. 26, 2001Systems Architecture I1 Systems Architecture I (CS ) Lecture 12: State Elements, Registers, and Memory * Jeremy R. Johnson Mon. Feb.
COMP541 Memories II: DRAMs
©Wen-mei W. Hwu and David Kirk/NVIDIA, University of Illinois, CS/EE 217 GPU Architecture and Parallel Programming Lecture 6: DRAM Bandwidth.
D ESIGN AND I MPLEMENTAION OF A DDR SDRAM C ONTROLLER FOR S YSTEM ON C HIP Magnus Själander.
1 Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile.
1 Memory Hierarchy (I). 2 Outline Random-Access Memory (RAM) Nonvolatile Memory Disk Storage Suggested Reading: 6.1.
© Wen-mei Hwu and S. J. Patel, 2005 ECE 511, University of Illinois Lecture 5-6: Memory System.
© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 19: Atomic.
CS35101 Computer Architecture Spring 2006 Lecture 18: Memory Hierarchy Paul Durand ( ) [Adapted from M Irwin (
RAM RAM - random access memory RAM (pronounced ramm) random access memory, a type of computer memory that can be accessed randomly;
© David Kirk/NVIDIA and Wen-mei W
CS 704 Advanced Computer Architecture
COMP541 Memories II: DRAMs
7-5 DRAM ICs High storage capacity Low cost
ECE 4100/6100 Advanced Computer Architecture Lecture 11 DRAM
SRAM Memory External Interface
Dynamic Random Access Memory (DRAM) Basic
COMP541 Memories II: DRAMs
ECE408/CS483 Fall 2015 Applied Parallel Programming Lecture 7: DRAM Bandwidth ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University.
ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM Bandwidth
DRAM Bandwidth Slide credit: Slides adapted from
Digital Logic & Design Dr. Waseem Ikram Lecture 40.
15-740/ Computer Architecture Lecture 19: Main Memory
AKT211 – CAO 07 – Computer Memory
ECE 8823A GPU Architectures Module 5: Execution and Resources - I
DRAM Hwansoo Han.
Bob Reese Micro II ECE, MSU
Presentation transcript:

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM Bandwidth 1

Objective To understand DRAM bandwidth –Cause of the DRAM bandwidth problem –Programming techniques that address the problem: memory coalescing, corner turning, ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois,

Global Memory (DRAM) Bandwidth IdealReality ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois,

DRAM Bank Organization Each core array has about 1M bits Each bit is stored in a tiny capacitor, made of one transistor Memory Cell Core Array Row Decoder Sense Amps Column Latches Mux Row Addr Column Addr Off-chip Data Wide Narrow Pin Interface 4

A very small (8x2 bit) DRAM Bank ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, decode 011 Sense amps Mux 5

DRAM core arrays are slow. Reading from a cell in the core array is a very slow process –DDR: Core speed = ½ interface speed –DDR2/GDDR3: Core speed = ¼ interface speed –DDR3/GDDR4: Core speed = ⅛ interface speed –… likely to be worse in the future decode To sense amps A very small capacitance that stores a data bit About 1000 cells connected to each vertical line ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois,

DRAM Bursting. For DDR{2,3} SDRAM cores clocked at 1/N speed of the interface: –Load (N × interface width) of DRAM bits from the same row at once to an internal buffer, then transfer in N steps at interface speed –DDR2/GDDR3: buffer width = 4X interface width ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois,

DRAM Bursting ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, decode 010 Sense amps Mux 8

DRAM Bursting ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, decode 011 Sense amps and buffer Mux 9

DRAM Bursting for the 8x2 Bank ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, time Address bits to decoder Core Array access delay 2 bits to pin 2 bits to pin Non-burst timing Burst timing Modern DRAM systems are designed to be always accessed in burst mode. Burst bytes are transferred but discarded when accesses are not to sequential locations. 10

Multiple DRAM Banks ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, decode Sense amps Mux decode Sense amps Mux 0110 Bank 0 Bank 1 11

DRAM Bursting for the 8x2 Bank ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, time Address bits to decoder Core Array access delay 2 bits to pin 2 bits to pin Single-Bank burst timing, dead time on interface Multi-Bank burst timing, reduced dead time 12