An introduction to SDRAM and memory controllers 5kk73

Outline ► Part 1: DRAM and controller basics –DRAM architecture and operation –Timing constraints –DRAM controller ► Part 2: DRAMs in embedded systems –Challenges in sharing DRAMs –Real-time guarantees with DRAMs –Future DRAM architectures and controllers

Memory device ► “A device that preserves information for retrieval” - Web definition

Semiconductor memories ► “Semiconductor memory is an electronic data storage device, often used as computer memory, implemented on a semiconductor-based integrated circuit” - Wikipedia definition ► The main characteristics of semiconductor memory are low cost, high density (bits per chip), and ease of use

Semiconductor memory types ► RAM (Random Access Memory) –DRAM (Dynamic RAM) Synchronous DRAM (SDRAM) –SRAM (Static RAM) ► ROM (Read Only Memory) –Mask ROM, Programmable ROM (PROM), EPROM (Erasable PROM), UVEPROM (Ultra Violet EPROM) ► NVRAM (Non-Volatile RAM) or Flash memory

Memory hierarchy Registers → L1 Cache → L2 Cache → Off-chip memory → Secondary memory (hard disk). Access speed decreases, and size (capacity) increases, with distance from the processor.

Memory hierarchy

Module             Memory type   Access time     Capacity   Managed by
Registers          SRAM          1 cycle         ~500 B     Software/compiler
L1 Cache           SRAM          1-3 cycles      ~64 KB     Hardware
L2 Cache           SRAM          5-10 cycles     1-10 MB    Hardware
Off-chip memory    DRAM          ~100 cycles     ~10 GB     Software/OS
Secondary memory   Disk drive    ~1000 cycles    ~1 TB      Software/OS

Credits: J. Leverich, Stanford

SRAM vs DRAM ► SRAM (Static Random Access Memory): bit cell is a 6-transistor latch whose bitlines are driven by transistors - fast (~10x faster than DRAM), but large (~6-10x the cell area) ► DRAM (Dynamic Random Access Memory): bit cell is 1 transistor and 1 capacitor; a bit is stored as charge on the capacitor - the cell loses charge over time (read operations and circuit leakage), so it must be refreshed periodically - hence the name Dynamic RAM Credits: J. Leverich, Stanford

SRAM vs DRAM: Summary ► SRAM is preferable for register files and L1/L2 caches –Fast access –No refreshes –Simpler manufacturing (compatible with logic process) –Lower density (6 transistors per cell) –Higher cost ► DRAM is preferable for stand-alone memory chips –Much higher capacity –Higher density –Lower cost –DRAM is the main focus in this lecture! Credits: J. Leverich, Stanford

DRAM: Internal architecture ► Bit cells are arranged to form a memory array ► Multiple arrays are organized as different banks –Typical numbers of banks are 4, 8, and 16 ► Sense amplifiers raise the voltage level on the bitlines to read the data out [Figure: per-bank memory array with row decoder, column decoder, sense amplifiers (row buffer) and address register; MS address bits select the row, LS bits the column] Credits: J. Leverich, Stanford

DRAM: Read access sequence ► Decode row address & drive word-lines ► Selected bits drive bit-lines –Entire row read ► Amplify row data ► Decode column address & select subset of row ► Send to output ► Precharge bit-lines for next access Credits: J. Leverich, Stanford

DRAM: Memory access protocol ► To reduce pin count, row and column share the same address pins –RAS = Row Address Strobe –CAS = Column Address Strobe ► Data is accessed by issuing memory commands ► 5 basic commands –ACTIVATE –READ –WRITE –PRECHARGE –REFRESH [Figure: 2^n rows × 2^m columns array; n address bits select the row with RAS, m bits select the column with CAS] Credits: J. Leverich, Stanford
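
To make the command set concrete, the sketch below models a command as a type plus the address fields it carries on the shared pins. This is an illustrative C fragment, not a real controller interface; all names are hypothetical:

```c
#include <stdint.h>

/* The five basic commands named above (names hypothetical). */
typedef enum {
    CMD_ACTIVATE,  /* open a row in a bank: carries bank + row */
    CMD_READ,      /* read from the open row: bank + column    */
    CMD_WRITE,     /* write to the open row: bank + column     */
    CMD_PRECHARGE, /* close the open row in a bank             */
    CMD_REFRESH    /* refresh the array: no address needed     */
} dram_cmd_type;

typedef struct {
    dram_cmd_type type;
    uint8_t  bank;
    uint32_t row;    /* driven on the shared address pins with RAS */
    uint32_t column; /* driven on the same pins with CAS           */
} dram_cmd;
```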

DRAM: Basic operation A sequence of accesses to (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 10), (Row 1, Column 0) generates the commands: ACTIVATE Row 0; READ Column 0; READ Column 1 (row buffer HIT!); READ Column 10 (row buffer HIT!); PRECHARGE Row 0; ACTIVATE Row 1 (row buffer MISS!); READ Column 0 Credits: J. Leverich, Stanford

DRAM: Basic operation (Summary) ► Access to an “open row” –No need to issue ACTIVATE command –READ/WRITE will access row buffer ► Access to a “closed row” –If another row is already active, issue PRECHARGE first –Issue ACTIVATE to open a new row –READ/WRITE will access row buffer –Optional: PRECHARGE after READ/WRITEs finished If PRECHARGE issued → closed-page policy If not → open-page policy Credits: J. Leverich, Stanford
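
This open/closed-row decision is easy to express in code. Below is a minimal sketch for a read request, assuming hypothetical issue_* primitives that place commands on the bus; an open-page policy is shown, since the row is left open afterwards:

```c
#include <stdint.h>

#define NO_OPEN_ROW UINT32_MAX

/* Hypothetical primitives that put a command on the command bus. */
void issue_activate(uint32_t row);
void issue_read(uint32_t column);
void issue_precharge(void);

typedef struct { uint32_t open_row; } bank_state;

/* Generate the command sequence for a READ to (row, column) in one bank. */
void serve_read(bank_state *b, uint32_t row, uint32_t column) {
    if (b->open_row == row) {      /* row buffer hit: row already open */
        issue_read(column);
    } else {                       /* row buffer miss */
        if (b->open_row != NO_OPEN_ROW)
            issue_precharge();     /* close the currently open row */
        issue_activate(row);       /* open the requested row */
        issue_read(column);
        b->open_row = row;         /* open-page: leave the row open */
    }
}
```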

DRAM: Burst access ► Each READ/WRITE command can transfer multiple words (8 in DDR3) ► Note that two words are transferred per clock cycle, one on each clock edge –Double Data Rate (DDR) Credits: J. Leverich, Stanford

DRAM: Banks ► DRAM chips can consist of multiple banks –Address = (Bank x, Row y, Column z) ► Banks operate independently, but share command, address and data pins –Each bank can have a different row active –Can overlap ACTIVATE and PRECHARGE latencies! (i.e., READ to bank 0 while ACTIVATING bank 1) → Bank-level parallelism Credits: J. Leverich, Stanford

DRAM: Bank-level parallelism ► Enables DRAM accesses to different banks in parallel –Reduces memory access latency and improves efficiency! Credits: J. Leverich, Stanford

2Gb x8 DDR3 Chip [Micron] ► Observe the bank organization Credits: J. Leverich, Stanford

2Gb x8 DDR3 Chip [Micron] ► Observe the row width, the bi-directional bus, and the 64-to-8 data-path (an 8-word burst is fetched internally in parallel and serialized over the 8 data pins) Credits: J. Leverich, Stanford

DDR3 SDRAM: Current standard ► Introduced in 2007 ► SDRAM → Synchronous DRAM (clocked) –DDR = Double Data Rate Data transferred on both clock edges –400 MHz clock = 800 MT/s –x4, x8, x16 datapath widths –Minimum burst length of 8 –8 banks –1Gb, 2Gb, 4Gb capacity
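
These figures fix the peak bandwidth: clock rate × two transfers per cycle × the datapath width in bytes. A one-line helper illustrates the arithmetic, e.g. for a hypothetical x16 DDR3-800 part: 400 × 2 × 2 = 1600 MB/s:

```c
#include <stdint.h>

/* Peak bandwidth in MB/s = clock (MHz) x 2 transfers/cycle x bytes/transfer. */
uint32_t peak_bw_mbs(uint32_t clock_mhz, uint32_t width_bits) {
    return clock_mhz * 2 * (width_bits / 8);
}
```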

DRAM: Timing Constraints –tRCD = Row to Column command delay Time taken by the charge stored in the capacitor cells to reach the sense amps –tRAS = Time between ACTIVATE (RAS) and data restoration in the DRAM array (minimum time a row must be open) –tRP = Time to precharge the DRAM array ► The memory controller must respect the physical device characteristics! [Timing diagram: ACT, RD and PRE on the command bus, data words D1..Dn on the data bus; tRCD from ACT to RD, tRL from RD to data, tRAS from ACT to PRE, tRP after PRE]

DRAM: Timing Constraints ► There are a bunch of other timing constraints… –tCCD = Time between column commands –tWTR = Write-to-read delay (bus turnaround time) –tCAS = Time between column command and data out –tWR = Time from end of last write to PRECHARGE –tFAW = Four-ACTIVATE window (limits current surge) Maximum number of ACTIVATEs in this window is limited to four –tRC = tRAS + tRP = Row “cycle” time Minimum time between accesses to different rows ► Timing constraints make performance analysis and memory controller design difficult!
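
As a sketch of how a controller might track these, the constraints can be kept as a parameter struct (in clock cycles) with small helpers for derived values and checks. The actual values come from the device datasheet; only the parameters discussed above are shown:

```c
#include <stdint.h>

/* A subset of the timing parameters above, in clock cycles. */
typedef struct {
    uint32_t tRCD; /* ACTIVATE to READ/WRITE delay              */
    uint32_t tRAS; /* ACTIVATE to PRECHARGE (min time row open) */
    uint32_t tRP;  /* PRECHARGE to next ACTIVATE                */
    uint32_t tFAW; /* window limiting ACTIVATEs to four         */
} dram_timings;

/* tRC = tRAS + tRP: minimum time between ACTIVATEs to different
 * rows of the same bank. */
uint32_t row_cycle_time(const dram_timings *t) {
    return t->tRAS + t->tRP;
}

/* tFAW check: a 5th ACTIVATE is only allowed once the oldest of the
 * last four ACTIVATEs is at least tFAW cycles in the past.
 * last_acts[0..3] hold issue times, newest first. */
int activate_allowed(const uint32_t last_acts[4], uint32_t now,
                     const dram_timings *t) {
    return now - last_acts[3] >= t->tFAW;
}
```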

DRAM controller ► The request scheduler decides which memory request is selected next ► The memory map translates logical addresses → physical addresses Logical address = incoming address Physical address = (Bank, Row, Column) ► The command generator issues memory commands respecting the physical device characteristics [Block diagram: front-end (request scheduler) and back-end (memory map, command generator) between the clients and the DRAM]

Request scheduler ► Many algorithms exist to determine how to schedule memory requests –Prefer requests targeting open rows Increases the number of row buffer hits –Prefer read after read and write after write Minimizes bus turnaround –Always prefer reads, since reads are blocking and writes are often posted Reduces stall cycles of the processor
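
A minimal version of the first preference (favor open rows, fall back to oldest-first) can be sketched as below, in the spirit of first-ready FCFS scheduling; the queue layout and types are hypothetical:

```c
#include <stdint.h>

typedef struct {
    uint8_t  bank;
    uint32_t row, column;
    int      is_read;
} request;

/* Pick the oldest request that hits an open row; if none, the oldest
 * request overall. q[0] is the oldest; open_row[b] is bank b's open row. */
int pick_next(const request q[], int n, const uint32_t open_row[]) {
    for (int i = 0; i < n; i++)
        if (q[i].row == open_row[q[i].bank])
            return i;          /* row buffer hit: serve it first */
    return 0;                  /* no hit: plain FCFS */
}
```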

Memory map ► The memory map decodes a logical address to a physical address –Physical address is (bank, row, column) –Decoding is done by slicing the bits in the logical address ► Several memory mapping schemes exist –Continuous, bank-interleaved Example: logical address 0x10FF00 → physical address (bank 2, row 510, column 128)

Continuous memory map ► Maps sequential addresses to columns in a row ► Switches bank when all columns in the row have been visited ► Switches row when all banks have been visited

Bank-interleaved memory map ► Maps bursts to different banks in an interleaving fashion –The active row in a bank is not changed until all its columns have been visited

Memory map generalization ► The continuous and interleaved memory maps are just 2 possible memory mapping schemes –In the most general case, an arbitrary set of bits out of the logical address could be used for the row, column and bank address, respectively Example memory map (1 burst per bank, 2 banks interleaving, 8 words per burst), for a 16-bit, 128 MB DDR memory with 8 banks, 8K rows/bank, 1024 columns/row and 16 bits/column: Logical address (bit 26 … bit 0): RRR RRRR RRRR RRBB CCCC CCCB CCCW (R = row, B = bank, C = column, W = byte within the 16-bit word; the low B bit interleaves bursts between 2 banks, and the CCCW bits address within one burst) This can be done in different ways – the choice affects memory efficiency!
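
Decoding that example layout is a matter of masking and shifting. The sketch below follows one plausible reading of the bit pattern above (13 row bits at the top, the bank split as two high bits plus one interleave bit, 10 column bits); it is specific to this example, not a general memory map:

```c
#include <stdint.h>

typedef struct { uint32_t bank, row, column; } phys_addr;

/* Decode a 27-bit logical address laid out as
 * RRR RRRR RRRR RRBB CCCC CCCB CCCW (bit 26 ... bit 0). */
phys_addr decode(uint32_t logical) {
    phys_addr p;
    p.row    = (logical >> 14) & 0x1FFF;        /* bits 26..14: 13 R bits  */
    p.bank   = (((logical >> 12) & 0x3) << 1)   /* bits 13..12: BB         */
             | ((logical >> 4) & 0x1);          /* bit 4: interleave bit B */
    p.column = (((logical >> 5) & 0x7F) << 3)   /* bits 11..5: CCCC CCC    */
             | ((logical >> 1) & 0x7);          /* bits 3..1: CCC          */
    return p;                                   /* bit 0 (W) = byte offset */
}
```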

Command generator ► Serves the selected memory request ► Generates SDRAM commands without violating the timing constraints

Command generator ► Different page policies determine which command to schedule –Close-page policy: close rows as soon as possible to activate a new one faster, i.e., not wasting time to PRECHARGE the open row of the previous request –Open-page policy: keep rows open as long as possible to benefit from locality, i.e., assuming the next request will target the same open row

Open page or Close page? Consider again the sequence (Row 0, Column 0), (Row 0, Column 1), (Row 0, Column 10), (Row 1, Column 0): the first three reads hit the row buffer, while the access to Row 1 misses and pays for PRECHARGE + ACTIVATE Credits: J. Leverich, Stanford

A modern DRAM controller [Altera] Image: Altera

Conclusions (Part 1) ► SDRAM is used as off-chip high-volume storage –Cheaper, but slower than SRAM ► DRAM timing constraints make it hard to design the memory controller ► Selection of the memory map and command/request scheduling algorithms impacts memory access time and/or efficiency

Outline ► Part 1: DRAM and controller basics –DRAM architecture and operation –Timing constraints –DRAM controller ► Part 2: DRAMs in embedded systems –Challenges in sharing DRAMs –Real-time guarantees with DRAMs –Future DRAM architectures and controllers

Trends in embedded systems ► Embedded systems get increasingly complex –Increasingly complex applications (more functionality) –Growing number of applications integrated in a device –Requires increased system performance without increasing power ► The case of a generic car manufacturer –Typical number of ECUs in a car in 2000: about 20 –Number of ECUs in an Audi A8 Sedan: over 80

System-on-Chip (SoC) ► The resulting complex contemporary platforms are heterogeneous multi-processor systems –Resources in the system are shared to reduce cost

SoC: Video and audio processing system ► DRAM is typically used as shared main memory for cost reasons [Block diagram: Host CPU, Video Engine, Audio Processor, GPU, DMA Controller, Input processor and LCD Controller connected through an interconnect to a memory controller and DRAM] A. B. Soares et al., “Development of a SoC for Digital Television Set-Top Box: Architecture and System Integration Issues”, International Journal of Reconfigurable Computing, Volume 2013

Set-top box architecture [Philips]

DRAM controller architecture ► The arbiter grants memory access to one of the memory clients at a time –Examples: Round-Robin, Time Division Multiplexing (TDM) and priority-based arbiters [Block diagram: clients 1..n connected through a bus with an arbiter to the DRAM controller (memory map, command generator) and DRAM]

DRAM controller for real-time systems ► Clients in real-time systems have requirements on latency/bandwidth –A fixed set of memory access parameters, such as burst size, page-policy etc., in the back-end bounds the transaction execution time –Predictable arbiters, such as TDM (fixed time slots) and Round-Robin, bound the response time B. Akesson et al., “Predator: A Predictable SDRAM Memory Controller”, CODES+ISSS, 2007
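
To see why TDM bounds response time, note that a client's worst case is simply the distance to its next slot in a fixed table. A minimal sketch, with a hypothetical 4-slot table:

```c
#include <stdint.h>

#define SLOTS 4
/* Fixed slot table: which client owns each slot (contents hypothetical).
 * Client 0 gets half the slots, clients 1 and 2 a quarter each. */
static const int slot_table[SLOTS] = {0, 1, 0, 2};

/* Grant for the current cycle: the slot owner if it has a pending
 * request, otherwise the slot stays idle (-1). Because the table is
 * fixed, each client's worst-case waiting time is bounded by the
 * distance between its slots, independent of the other clients. */
int tdm_grant(uint32_t cycle, const int pending[]) {
    int client = slot_table[cycle % SLOTS];
    return pending[client] ? client : -1;
}
```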

DRAMs in the market

Family          Datapath width (bits)   Frequency range (MHz)
DDR             4, 8, 16                100-200
DDR2            4, 8, 16                200-400
DDR3            4, 8, 16                400-1066
DDR4            4, 8, 16                1066-1600
LPDDR           16 and 32               100-200
LPDDR2          16 and 32               200-533
LPDDR3          16 and 32               667-800
WIDE IO (SDR)   128 per channel         200-266

► Observe the increase in operating frequency with every generation

DRAMs: Bandwidth vs clock frequency ► WIDE IO gives much higher bandwidth at lower frequency –Low power consumption

Multi-channel DRAM: WIDE IO ► Bandwidth demands of future embedded systems > 10 GB/s –Memory power consumption scales up with memory operating frequency → “Go parallel” ► Multi-channel memories –Each channel is an independent memory module with a dedicated data and control path –WIDE IO DRAM (4 channels, each with a 128-bit IO interface)

Multi-channel DRAM controller ► The Atomizer chops the incoming requests into a number of service units ► The Channel Selector (CS) routes the service units to the different memory channels according to the configuration in the Sequence Generators [Block diagram: memory clients 1 and 2 each pass through an Atomizer and Sequence Generator to the Channel Selector, which feeds the per-channel arbiters and back-ends (DRAM controllers 1 and 2) of channels 1 and 2] M. D. Gomony et al., “Architecture and Optimal Configuration of a Real-Time Multi-Channel Memory Controller”, DATE, 2012
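
The atomization step itself is simple: a request of arbitrary size becomes a fixed number of equally sized service units, which is what makes per-unit routing and latency bounds possible. A tiny sketch; the 64-byte unit size is a made-up example:

```c
#include <stdint.h>

#define SERVICE_UNIT_BYTES 64  /* hypothetical service-unit size */

/* Number of service units a request of `size` bytes is chopped into,
 * rounding up so the tail of the request is still covered. */
uint32_t num_service_units(uint32_t size) {
    return (size + SERVICE_UNIT_BYTES - 1) / SERVICE_UNIT_BYTES;
}
```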

Multi-channel DRAM controller ► Multi-channel memories allow memory requests to be interleaved across multiple memory channels –Reduces access latency [Block diagram: the service units of a single request are spread by the Channel Selector over the arbiters and back-ends of both channels]

Wide IO memory controller [Cadence] Image: Cadence

Future DRAM: HMC ► Hybrid Memory Cube (HMC) –16 memory channels ► What does the memory controller for HMC look like? Image: Micron, HMC

Conclusions (Part 2) ► DRAMs are shared in multi-processor SoCs to reduce cost and to enable communication between the processing elements ► Sharing DRAMs between multiple memory clients can be done using different arbitration algorithms ► A predictable arbiter and back-end provide real-time guarantees on latency and bandwidth to real-time clients ► Multi-channel DRAMs allow a memory request to be interleaved across memory channels


References
► B. Jacob et al., Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, 2007
► B. Akesson et al., “Predator: A Predictable SDRAM Memory Controller”, CODES+ISSS, 2007
► M. D. Gomony et al., “Architecture and Optimal Configuration of a Real-Time Multi-Channel Memory Controller”, DATE, 2012