1
CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas
2
Main Memory Matters
Software: in-memory DBs, key-value stores, graph algorithms, deep learning
Architecture: commodity CPUs, accelerators; shift in bottlenecks
Example innovations: NDP; DDR to GDDR5 gives 3x TOPS in the TPU
Technology: DDR4, HMC, HBM, NVM
The innovation hub is moving to memory
3
Two Silos
CACTI 7 can be used out of the box when defining memory parameters for traditional memory systems
CACTI 7 primitives can be leveraged to model and evaluate new memory architectures
4
Talk Outline
CACTI for main memory
  Inputs/outputs
  The nuts and bolts
    Modeling I/O power
    Design space exploration
Case studies: two novel architectures
  Cascaded Channels
  Narrow Channels
5
CACTI for Memory
Inputs and outputs:
  Capacity
  Number of channels, ECC vs. non-ECC
  DRAM type: DDR3, DDR4
  Access pattern: bandwidth, row buffer hits, read/write ratio
  Cost table, bandwidth table, power parameters
  Exhaustive search
  Channel configurations
  Energy per access
6
DIMM Cost
Cost factors: technology, capacity, support for ECC, max bandwidth, vendor
Aggregated costs from online sources
Cost is volatile and should be updated periodically
Cost and capacity relationship is not linear
Cost in dollars:
               4GB    8GB   16GB   32GB   64GB
DDR3 UDIMM      40     76      -      -      -
DDR3 RDIMM      42     64    122    304      -
DDR3 LRDIMM      -      -    211    287   1079
DDR4 UDIMM      26     46      -      -      -
DDR4 RDIMM      33     60    126    310      -
DDR4 LRDIMM      -      -    279    331   1474
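The non-linear relationship between cost and capacity can be checked with a little arithmetic on the prices from this slide. The sketch below is illustrative only: the assignment of each price to a capacity column is an assumption made while reading the flattened table, and the names `DIMM_COST` and `cost_per_gb` are hypothetical, not part of CACTI 7.

```python
# DIMM prices in dollars, copied from the cost slide.
# ASSUMPTION: each row's prices fill adjacent capacity columns
# (UDIMM: 4-8 GB, RDIMM: 4-32 GB, LRDIMM: 16-64 GB).
DIMM_COST = {
    ("DDR3", "UDIMM"): {4: 40, 8: 76},
    ("DDR3", "RDIMM"): {4: 42, 8: 64, 16: 122, 32: 304},
    ("DDR3", "LRDIMM"): {16: 211, 32: 287, 64: 1079},
    ("DDR4", "UDIMM"): {4: 26, 8: 46},
    ("DDR4", "RDIMM"): {4: 33, 8: 60, 16: 126, 32: 310},
    ("DDR4", "LRDIMM"): {16: 279, 32: 331, 64: 1474},
}


def cost_per_gb(prices):
    """Return {capacity_gb: dollars per GB} for one DIMM family."""
    return {cap: price / cap for cap, price in prices.items()}


if __name__ == "__main__":
    for (generation, kind), prices in DIMM_COST.items():
        per_gb = cost_per_gb(prices)
        cells = ", ".join(
            f"{cap}GB: ${value:.2f}/GB" for cap, value in sorted(per_gb.items())
        )
        print(f"{generation} {kind}: {cells}")
```

Under this reading, a DDR3 RDIMM costs $10.50/GB at 4GB, $8.00/GB at 8GB, about $7.63/GB at 16GB, and $9.50/GB at 32GB, so the price per gigabyte first falls and then rises with capacity.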
7
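Bandwidth
Bandwidth depends on load, voltage, and DIMM type
Table (cell layout not recoverable): frequency in MHz at 1DPC for DDR3 and DDR4 DIMM types UDIMM-DR, RDIMM-DR, RDIMM-QR, and LRDIMM-QR, with a 1.2V entry; listed values include 533, 667, 800, 933, and 1066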
8
Power Modeling
Extending CACTI-I/O
DDR4 and SerDes support added
SerDes parameters from literature for different lengths/speeds
For parallel buses, support for more accurate termination power with HSPICE simulations
Different termination models for each bus type:
  Different frequency, DIMMs per channel
  On-DIMM and on-board
  Different range (short or long)
9
Interconnect Model API
10
Power Analysis (DDR3)
11
Power Analysis (DDR4)
12
Cost and Bandwidth Analysis
Highest possible BW for the demanded capacity
Lowest possible cost for the demanded capacity
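The two objectives on this slide (lowest cost and highest bandwidth for a demanded capacity) can be illustrated with a small exhaustive search. This is a minimal sketch, not CACTI 7's implementation: the DDR3 RDIMM prices come from the cost slide (with capacity columns assumed), while the frequency for each DIMMs-per-channel count in `FREQ_MHZ_BY_DPC` is a hypothetical placeholder, and the function names are invented for illustration.

```python
from itertools import product

# DDR3 RDIMM prices in dollars per DIMM, from the cost slide
# (capacity column assignment is an assumption).
RDIMM_COST_USD = {4: 42, 8: 64, 16: 122, 32: 304}

# HYPOTHETICAL bus clock (MHz) for each DIMMs-per-channel (DPC) count.
# These are placeholders that only encode "higher DPC, lower frequency".
FREQ_MHZ_BY_DPC = {1: 800, 2: 667, 3: 533}


def enumerate_configs(max_channels):
    """List every (channels, DPC, DIMM size) combination with its metrics."""
    configs = []
    for channels, dpc, dimm_gb in product(
        range(1, max_channels + 1), FREQ_MHZ_BY_DPC, RDIMM_COST_USD
    ):
        dimms = channels * dpc
        configs.append({
            "channels": channels,
            "dpc": dpc,
            "dimm_gb": dimm_gb,
            "capacity_gb": dimms * dimm_gb,
            "cost_usd": dimms * RDIMM_COST_USD[dimm_gb],
            # Peak GB/s: two transfers per clock, 8 bytes per transfer.
            "bw_gbs": channels * FREQ_MHZ_BY_DPC[dpc] * 2 * 8 / 1000,
        })
    return configs


def search(capacity_gb, max_channels=2):
    """Return (cheapest, fastest) configs that meet the demanded capacity."""
    feasible = [
        c for c in enumerate_configs(max_channels)
        if c["capacity_gb"] >= capacity_gb
    ]
    if not feasible:
        return None, None
    cheapest = min(feasible, key=lambda c: (c["cost_usd"], -c["bw_gbs"]))
    fastest = min(feasible, key=lambda c: (-c["bw_gbs"], c["cost_usd"]))
    return cheapest, fastest


if __name__ == "__main__":
    cheapest, fastest = search(64)
    print("cheapest:", cheapest)
    print("fastest: ", fastest)
```

With these placeholder frequencies and at most two channels, a 64 GB demand yields two different answers: the cheapest option packs two 16GB DIMMs on each channel, while the fastest uses one 32GB DIMM per channel at a higher price.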
13
Two Case Studies
Key observations:
  High DPC leads to lower BW
  More channels give higher BW and lower cost
New Idea I: Cascaded Segments
  Each segment has few DIMMs, so BW is higher
New Idea II: Narrow Channels
  Partition the channel into many parallel channels
  Fewer DIMMs per data wire and new ECC give higher BW
  Lower power on DIMM
14
Cascaded Channels
Relay on Board (RoB) chip between the CPU and the DIMMs
Same DPC, higher BW: 533 MHz without the RoB, 667 MHz on each segment with the RoB
Same BW, lower cost: diagram labels 64 GB and 32 GB, with 667 MHz on each segment
One memory cycle increase in latency
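The bandwidth benefit of running a segment at 667 MHz instead of 533 MHz can be worked out from the standard DDR relationship between clock and peak bandwidth. The sketch below assumes the MHz figures are bus clock rates and that each channel has a 64-bit data bus; the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_clock_mhz, data_width_bits=64):
    """Peak DDR channel bandwidth in GB/s.

    ASSUMPTIONS: bus_clock_mhz is the bus clock (not the transfer rate),
    and the data bus is 64 bits wide (ECC wires excluded).
    """
    transfers_per_second = bus_clock_mhz * 1e6 * 2  # DDR: 2 transfers per clock
    bytes_per_transfer = data_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    baseline = peak_bandwidth_gbs(533)
    cascaded = peak_bandwidth_gbs(667)
    gain = cascaded / baseline - 1
    print(f"533 MHz: {baseline:.3f} GB/s")
    print(f"667 MHz: {cascaded:.3f} GB/s")
    print(f"gain:    {gain:.1%}")
```

Under these assumptions the peak bandwidth rises from about 8.5 GB/s to about 10.7 GB/s per channel, a gain of roughly 25%.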
15
Hybrid Memory
NVM is slow
Software optimized to access DRAM more
Unbalanced channel load: one channel DRAM, one channel NVM
Balanced channel load: frontend DRAM, backend NVM
16
Narrow Channels
Command/Address bus is shared between channels
Higher bandwidth but higher latency
Lower frequency/power for DRAM chips
ECC on DIMM and CRC for link to reduce bw
17
Methodology
Trace-based simulation
  Memory-intensive benchmarks (NPB and SPEC2006)
  Trace generated by Simics
  Trace fed to USIMM
8-core at 3.2 GHz
L1D = 32KB, L1I = 32KB, L2 = 8MB
Power: CACTI 7
18
Cascaded Channels
DDR3: 25% higher BW
DDR4: 13% higher BW
22% higher IPC
19
Cascaded Latency
20
Cascaded Power: DRAM Cartridge
Diagram labels: 533 MHz at 70% utilization; 667 MHz at 70% utilization; 667 MHz at 35% utilization
            DIMM     BoB     I/O      Total    Power/BW
Baseline    23.2W    5.5W    9.4W     38.1W    7.9 nJ/B
Cascaded    22.6W    6.4W    12.2W    41.2W    6.7 nJ/B
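The power figures on this slide can be cross-checked with simple arithmetic. The sketch below sums the component powers and compares the two rows; it assumes the Power/BW column is total power divided by delivered bandwidth, which the slide does not state explicitly, and the dictionary keys and function names are hypothetical.

```python
# Values copied from the slide's table.
ROWS = {
    "Baseline": {"dimm_w": 23.2, "bob_w": 5.5, "io_w": 9.4, "nj_per_byte": 7.9},
    "Cascaded": {"dimm_w": 22.6, "bob_w": 6.4, "io_w": 12.2, "nj_per_byte": 6.7},
}


def total_power_w(row):
    """Sum of DIMM, BoB, and I/O power in watts."""
    return row["dimm_w"] + row["bob_w"] + row["io_w"]


def implied_bandwidth_ratio(new, old):
    """Bandwidth ratio implied by total power and energy per byte.

    ASSUMPTION: energy per byte = total power / delivered bandwidth,
    so bandwidth is proportional to power / energy per byte.
    """
    new_bw = total_power_w(new) / new["nj_per_byte"]
    old_bw = total_power_w(old) / old["nj_per_byte"]
    return new_bw / old_bw


if __name__ == "__main__":
    base, casc = ROWS["Baseline"], ROWS["Cascaded"]
    for name, row in ROWS.items():
        print(f"{name}: total {total_power_w(row):.1f} W")
    power_change = total_power_w(casc) / total_power_w(base) - 1
    energy_change = casc["nj_per_byte"] / base["nj_per_byte"] - 1
    print(f"total power change:     {power_change:+.1%}")
    print(f"energy per byte change: {energy_change:+.1%}")
    print(f"implied bandwidth ratio: {implied_bandwidth_ratio(casc, base):.3f}")
```

The component powers add up to the Total column in both rows. Total power rises by about 8% while energy per byte falls by about 15%, which under the stated assumption corresponds to a bandwidth ratio of roughly 1.27.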
21
Cascaded Cost
22
Cascaded Hybrid Percentage of Load on DRAM
23
Narrow Channel: Performance
Performance improvement:
  2-channel-x36: 18%
  3-channel-x24: 17%
24
Narrow Channel: Power
23% overall memory power reduction
25
Conclusion
CACTI 7: models off-chip memories and I/O
  Detailed I/O power model
  Design space exploration
  Analyzes trade-offs: capacity, power, bandwidth, and cost
Two novel architectures
  Cascaded channels
  Narrow channels