Download presentation
Presentation is loading. Please wait.
Published byLiliane Desmarais Modified over 5 years ago
1
18-447 Computer Architecture Lecture 30: In-memory Processing
Vivek Seshadri Carnegie Mellon University Spring 2015, 4/13/2015
2
Goals for This Lecture Understand DRAM technology
How it is built? How it operates? What are the trade-offs? Can we use DRAM for more than just storage? In-DRAM copying In-DRAM bitwise operations
3
DRAM Module and Chip
4
Goals of DRAM Design Cost Latency Bandwidth Parallelism Power Energy
Reliability
5
DRAM Chip Bank
6
DRAM Cell – Capacitor Empty State Fully Charged State Logical “0”
1 Small – Cannot drive circuits 2 Reading destroys the state
7
Sense Amplifier top enable Inverter bottom
8
Sense Amplifier – Two Stable States
VDD en en VDD Logical “1” Logical “0”
9
Sense Amplifier Operation
VT VDD VT > VB en dis VB
10
Capacitor to Sense Amplifier
? en VDD en VDD
11
DRAM Cell Operation Cell regains charge Cell loses charge ½VDD ½VDD+δ
dis en ½VDD
12
Amortizing Cost – DRAM Tile
Row Driver
13
DRAM Subarray Row Decoder Row Driver Tile Tile Tile
14
DRAM Subarray Tile Tile Tile Tile Tile Tile Tile Row Decoder
Row Driver Tile Tile Tile Tile Tile Tile Tile
15
Array of Sense Amplifiers (8Kb) Array of Sense Amplifiers
DRAM Bank Row Decoder Cell Array Array of Sense Amplifiers (8Kb) Cell Array Address Row Decoder Cell Array Array of Sense Amplifiers Cell Array Bank I/O (64b) Data Address
16
Array of Sense Amplifiers (8Kb) Array of Sense Amplifiers
DRAM Chip Shared internal bus Row Decoder Array of Sense Amplifiers (8Kb) Cell Array Array of Sense Amplifiers Bank I/O (64b) Memory channel - 8bits
17
Array of Sense Amplifiers
DRAM Operation Row Decoder 1 ACTIVATE Row Row Address 2 READ/WRITE Column Row Decoder Cell Array Array of Sense Amplifiers Cell Array 3 PRECHARGE Bank I/O Data Column Address
18
Goals for This Lecture Understand DRAM technology
How it is built? How it operates? What are the trade-offs? Can we use DRAM for more than just storage? In-DRAM copying In-DRAM bitwise operations
19
Trade-offs in DRAM Design
Cost Latency Bandwidth Parallelism Power Energy Reliability Rows/Subarray Data width, Chips/DIMM Banks/Chip
20
Goals for This Lecture Understand DRAM technology
How it is built? How it operates? What are the trade-offs? Can we use DRAM for more than just storage? In-DRAM copying In-DRAM bitwise operations
21
Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization
RowClone Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization Vivek Seshadri Put down “Vivek” Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry
22
Memory Channel – Bottleneck
Limited Bandwidth Memory Core Cache Channel MC Core High Energy
23
Goal: Reduce Memory Bandwidth Demand
Core Cache Channel MC Core Reduce unnecessary data movement
24
Bulk Data Copy and Initialization
src dst Bulk Data Copy Bulk Data Initialization dst val
25
Bulk Copy and Initialization – Applications
Zero initialization (e.g., security) Forking Checkpointing VM Cloning Deduplication Page Migration Many more
26
Shortcomings of Existing Approach
High Energy (3600nJ to copy 4KB) Core Cache dst Channel MC src Core 1 micro second even with pipelining… High latency (1046ns to copy 4KB) Interference
27
Our Approach: In-DRAM Copy with Low Cost
X High Energy Core Cache dst Channel MC ? src Core X High latency X Interference
28
RowClone: In-DRAM Copy
29
Bulk Copy in DRAM – RowClone
½VDD +δ ½VDD VDD Data gets copied 1 ½VDD
30
Fast Parallel Mode – Benefits
Bulk Data Copy (4KB across a module) 11X 74X Latency Energy 1046ns to 90ns 3600nJ to 40nJ No bandwidth consumption Very little changes to the DRAM chip
31
Fast Parallel Mode – Constraints
Location constraint Source and destination in same subarray Size constraint Entire row gets copied (no partial copy) Can still accelerate many existing primitives (copy-on-write, bulk zeroing) 1 Alternate mechanism to copy data across banks (pipelined serial mode – lower benefits than Fast Parallel) 2
32
End-to-end System Design
Software interface memcpy and meminit instructions Managing cache coherence Use existing DMA support! Maximizing use of Fast Parallel Mode Smart OS page allocation
33
Results Summary
34
Goals for This Lecture Understand DRAM technology
How it is built? How it operates? What are the trade-offs? Can we use DRAM for more than just storage? In-DRAM copying In-DRAM bitwise operations
35
Triple Row Activation C(A + B) + ~C(AB) Final State AB + BC + AC ½VDD
dis en ½VDD
36
Throughput Results L1 L2 L3 Memory
37
Goals for This Lecture Understand DRAM technology
How it is built? How it operates? What are the trade-offs? Can we use DRAM for more than just storage? In-DRAM copying In-DRAM bitwise operations
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.