1 Lecture 3: Memory Energy and Buffers Topics: Refresh, floorplan, buffers (SMB, FB-DIMM, BOOM), memory blades, HMC.

Slides:



Advertisements
Similar presentations
DRAM background Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling, Garnesh, HPCA'07 CS 8501, Mario D. Marino, 02/08.
Advertisements

Outline Memory characteristics SRAM Content-addressable memory details DRAM © Derek Chiou & Mattan Erez 1.
Computer Organization and Architecture
A Case for Refresh Pausing in DRAM Memory Systems
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.
Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs Mrinmoy Ghosh Hsien-Hsin S. Lee School.
CS.305 Computer Architecture Memory: Structures Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made.
1 Lecture: Memory, Coherence Protocols Topics: wrap-up of memory systems, intro to multi-thread programming models.
1 Lecture 13: DRAM Innovations Today: energy efficiency, row buffer management, scheduling.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
1 Lecture 14: Cache Innovations and DRAM Today: cache access basics and innovations, DRAM (Sections )
DRAM. Any read or write cycle starts with the falling edge of the RAS signal. –As a result the address applied in the address lines will be latched.
1 Lecture 15: DRAM Design Today: DRAM basics, DRAM innovations (Section 5.3)
Memory Hierarchy.1 Review: Major Components of a Computer Processor Control Datapath Memory Devices Input Output.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Nov. 13, 2002 Topic: Main Memory (DRAM) Organization.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 18, 2002 Topic: Main Memory (DRAM) Organization – contd.
1 Lecture 1: Introduction and Memory Systems CS 7810 Course organization:  5 lectures on memory systems  5 lectures on cache coherence and consistency.
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
Dezső Sima Spring 2008 (Ver. 1.0)  Sima Dezső, 2008 FB-DIMM technology.
©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM.
1 Lecture 4: Memory: HMC, Scheduling Topics: BOOM, memory blades, HMC, scheduling policies.
CPE232 Memory Hierarchy1 CPE 232 Computer Organization Spring 2006 Memory Hierarchy Dr. Gheith Abandah [Adapted from the slides of Professor Mary Irwin.
CSIE30300 Computer Architecture Unit 07: Main Memory Hsin-Chou Chi [Adapted from material by and
Survey of Existing Memory Devices Renee Gayle M. Chua.
1 Lecture: Virtual Memory, DRAM Main Memory Topics: virtual memory, TLB/cache access, DRAM intro (Sections 2.2)
Dong Hyuk Woo Nak Hee Seong Hsien-Hsin S. Lee
FAMU-FSU College of Engineering 1 Computer Architecture EEL 4713/5764, Fall 2006 Dr. Linda DeBrunner Module #17—Main Memory Concepts.
Buffer-On-Board Memory System 1 Name: Aurangozeb ISCA 2012.
CPEN Digital System Design
University of Tehran 1 Interface Design DRAM Modules Omid Fatemi
Yi-Lin, Tu 2013 IEE5011 –Fall 2013 Memory Systems Wide I/O High Bandwidth DRAM Yi-Lin, Tu Department of Electronics Engineering National Chiao Tung University.
1 Lecture 14: DRAM Main Memory Systems Today: cache/TLB wrap-up, DRAM basics (Section 2.3)
02/21/2003 CART 1 On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories Rajagopalan Desikan, Charles R. Lefurgy, Stephen.
Modern DRAM Memory Architectures Sam Miller Tam Chantem Jon Lucas CprE 585 Fall 2003.
Computer Architecture Lecture 24 Fasih ur Rehman.
1 Lecture 2: Memory Energy Topics: energy breakdowns, handling overfetch, LPDRAM, row buffer management, channel energy, refresh energy.
COMP541 Memories II: DRAMs
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
1 Lecture 5: Refresh, Chipkill Topics: refresh basics and innovations, error correction.
1 Lecture 3: Memory Buffers and Scheduling Topics: buffers (FB-DIMM, RDIMM, LRDIMM, BoB, BOOM), memory blades, scheduling policies.
1 Lecture: Memory Technology Innovations Topics: state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile cells, photonics Multiprocessor.
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
1 Lecture: Memory Technology Innovations Topics: memory schedulers, refresh, state-of-the-art and upcoming changes: buffer chips, 3D stacking, non-volatile.
1 Lecture: DRAM Main Memory Topics: DRAM intro and basics (Section 2.3)
CS35101 Computer Architecture Spring 2006 Lecture 18: Memory Hierarchy Paul Durand ( ) [Adapted from M Irwin (
Computer Architecture Chapter (5): Internal Memory
15-740/ Computer Architecture Lecture 25: Main Memory
1 Lecture 4: Memory Scheduling, Refresh Topics: scheduling policies, refresh basics.
1 Lecture 16: Main Memory Innovations Today: DRAM basics, innovations, trends HW5 due on Thursday; simulations can take a few hours Midterm: 32 scores.
1 Lecture: Memory Basics and Innovations Topics: memory organization basics, schedulers, refresh,
COMP541 Memories II: DRAMs
ESE532: System-on-a-Chip Architecture
Reducing Memory Interference in Multicore Systems
COMP541 Memories II: DRAMs
RAM Chapter 5.
Samira Khan University of Virginia Oct 9, 2017
Lecture 15: DRAM Main Memory Systems
Lecture: Memory, Multiprocessors
Lecture: DRAM Main Memory
Lecture: DRAM Main Memory
Lecture: DRAM Main Memory
Lecture: Memory Technology Innovations
Lecture 6: Reliability, PCM
Lecture 24: Memory, VM, Multiproc
Lecture 22: Cache Hierarchies, Memory
Lecture 17: Multi-threaded Applications
15-740/ Computer Architecture Lecture 19: Main Memory
DRAM Hwansoo Han.
CS 295: Modern Systems Storage Technologies Introduction
Presentation transcript:

1 Lecture 3: Memory Energy and Buffers Topics: Refresh, floorplan, buffers (SMB, FB-DIMM, BOOM), memory blades, HMC

2 Refresh Every DRAM cell must be refreshed within a 64 ms window A row read/write automatically refreshes the row Every refresh command performs refresh on a number of rows, the memory system is unavailable during that time A refresh command is issued by the memory controller once every 7.8us on average

3 RAIDR Liu et al., ISCA 2012 Process variation impacts the leakage rate for each cell Groups of rows are classified into bins based on leakage rate Each bin has its own refresh rate (multiples of 64ms) that are tracked with Bloom filters Prior work:  Smart Refresh: skip refresh for recently read rows  Flikker: non-critical data is placed in rows that are refreshed less frequently

4 DRAM Chip Floorplan From: Vogelsang, MICRO 2010

5 Modern Memory System PROC.. 4 DDR3 channels 64-bit data channels 800 MHz channels 1-2 DIMMs/channel 1-4 ranks/channel

6 Cutting-Edge Systems PROC SMB.. The link into the processor is narrow and high frequency The Scalable Memory Buffer chip is a “router” that connects to multiple DDR3 channels (wide and slow) Boosts processor pin bandwidth and memory capacity More expensive, high power

7 Buffer-on-Board Examples From: Cooper-Balis et al., ISCA 2012

8 FB-DIMM 100 W for a fully-populated FB-DIMM channel under high load. Processor FB-DIMM AMB DRAM : … … FB-DIMM AMB DRAM : … … FB-DIMM AMB DRAM : … … … FB-DIMM architecture; up to 8 FB-DIMMs can be daisy-chained through AMBs. / 10 bit 14 bit /

9 LR-DIMM Figure from: Yoon et al., ISCA 2012

10 BOOM Yoon et al., ISCA 2012 Using LPDDR chips at low frequency Wide internal datapath Longer burst length on DBUS No cache line buffering Higher activate power Same bandwidth, but lower power and higher latency

11 BOOM with Sub-Ranking

12 Disaggregated Memory Lim et al., ISCA 2009 To support high capacity memory, build a separate memory blade that is shared by all compute blades in a rack For example, if average utilization is 2 GB, each compute blade is only provisioned with 2 GB of memory; but the compute blade can also access (say) 2 TB of data in the memory blade The hierarchy is exclusive and data is managed at page granularity Remote memory access is via PCIe (120 ns latency and 1 GB/s in each direction)

13 Micron HMC 3D-stacked device with memory+logic High capacity, low power, high bandwidth Can move functionalities to the memory package Figure from: T. Pawlowski, HotChips 2011

14 HMC Details 32 banks per die x 8 dies = 256 banks per package 2 banks x 8 dies form 1 vertical slice (shared data bus) High internal data bandwidth (TSVs)  entire cache line from a single array (2 banks) that is 256 bytes wide Future generations: eight links that can connect to the processor or other HMCs – each link (40 GBps) has 16 up and 16 down lanes (each lane has 2 differential wires) 1866 TSVs at 60 um pitch and 2 Gb/s (50 nm 1Gb DRAMs) 3.7 pJ/bit for DRAM layers and 6.78 pJ/bit for logic layer (existing DDR3 modules are 65 pJ/bit) Figure from: T. Pawlowski, HotChips 2011

15 Title Bullet