Addressing Service Interruptions in Memory with Thread-to-Rank Assignment. Manjunath Shevgoor, Rajeev Balasubramonian (University of Utah); Niladrish Chatterjee (NVIDIA); Jung-Sik Kim (Samsung Electronics). ISPASS 2016, 4/18/2016.
DRAM Refresh: Quick Recap. A DRAM cell leaks through its access transistor; leakage increases with temperature; every DRAM cell must be refreshed every 64 ms; 1/8K of the DRAM rank is refreshed every 7.8 µs. [Figure: DRAM cell with bit line and word line, leak path marked; the cell leaks more with temperature.]
Refresh Timing Parameters: tREFI is 7.8 µs or 3.9 µs; tRFC is 640 ns (32 Gb). [Timeline: a tRFC window recurs every tREFI; labels tRefresh and tRecovery.]
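As a rough check of these figures, the sketch below derives the refresh interval from the 64 ms retention window and 8K refresh commands given in the recap, and estimates the fraction of time a rank is unavailable while refreshing. The function names are illustrative, and treating the 3.9 µs case as a halved (32 ms) retention window is an assumption rather than something the slides state.

```python
# Refresh timing arithmetic using the figures quoted on the slides.
RETENTION_MS = 64.0        # every cell must be refreshed within 64 ms
REFRESH_COMMANDS = 8192    # 8K refresh commands per retention window
T_RFC_NS = 640.0           # tRFC quoted for a 32 Gb chip


def refresh_interval_us(retention_ms: float, commands: int) -> float:
    """tREFI: spacing between successive refresh commands, in microseconds."""
    return retention_ms * 1000.0 / commands


def refresh_busy_fraction(t_rfc_ns: float, t_refi_us: float) -> float:
    """Fraction of time a rank is busy refreshing (tRFC / tREFI)."""
    return t_rfc_ns / (t_refi_us * 1000.0)


# Normal case: 64 ms window.
t_refi_normal = refresh_interval_us(RETENTION_MS, REFRESH_COMMANDS)
# Assumed high-temperature case: retention window halved to 32 ms.
t_refi_hot = refresh_interval_us(RETENTION_MS / 2, REFRESH_COMMANDS)

busy_normal = refresh_busy_fraction(T_RFC_NS, t_refi_normal)
busy_hot = refresh_busy_fraction(T_RFC_NS, t_refi_hot)

if __name__ == "__main__":
    print(f"tREFI = {t_refi_normal:.4f} us, rank busy {busy_normal:.1%}")
    print(f"tREFI = {t_refi_hot:.4f} us, rank busy {busy_hot:.1%}")
```

Under these numbers a rank is unavailable for roughly 8% of the time at the 7.8 µs interval and roughly 16% at the 3.9 µs interval.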
tRFC Projections
Refresh Power in DRAM. Command and current (mA): Act 67; Read 125; Write (value missing); Refresh 245. Refresh determines memory peak power.
Stagger refresh to reduce peak power. [Diagram: 8-core CMP with two memory controllers (MC), Channel 1 and Channel 2, Ranks 1 to 4.]
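The trade-off behind staggering can be sketched as follows. This is a simplified model, not the authors' simulator: it assumes four ranks, the 640 ns tRFC and 7.8 µs tREFI quoted earlier, evenly spaced refresh offsets within one tREFI period, and it counts concurrent refreshes as a proxy for peak power. All function names are hypothetical.

```python
T_REFI_NS = 7812.5   # 64 ms / 8192, from the slide figures
T_RFC_NS = 640.0     # tRFC quoted for a 32 Gb chip
NUM_RANKS = 4


def refresh_starts(num_ranks, t_refi_ns, staggered):
    """Start time of each rank's refresh within one tREFI period."""
    step = t_refi_ns / num_ranks if staggered else 0.0
    return [rank * step for rank in range(num_ranks)]


def max_concurrent_refreshes(starts, t_rfc_ns):
    """Largest number of ranks refreshing at the same instant."""
    events = []
    for start in starts:
        events.append((start, 1))               # refresh begins
        events.append((start + t_rfc_ns, -1))   # refresh ends
    events.sort()  # at equal times, ends (-1) sort before starts (+1)
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def some_rank_busy_fraction(starts, t_rfc_ns, t_refi_ns):
    """Fraction of the period during which at least one rank is refreshing."""
    intervals = sorted((s, s + t_rfc_ns) for s in starts)
    covered = 0.0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:                 # gap: close the current run
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                               # overlap: extend the run
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start
    return covered / t_refi_ns


simultaneous = refresh_starts(NUM_RANKS, T_REFI_NS, staggered=False)
staggered = refresh_starts(NUM_RANKS, T_REFI_NS, staggered=True)

if __name__ == "__main__":
    for name, starts in (("simultaneous", simultaneous), ("staggered", staggered)):
        peak = max_concurrent_refreshes(starts, T_RFC_NS)
        busy = some_rank_busy_fraction(starts, T_RFC_NS, T_REFI_NS)
        print(f"{name}: peak concurrent refreshes = {peak}, some rank busy {busy:.1%}")
```

In this model staggering cuts the number of ranks refreshing at once from four to one, but some rank is unavailable about four times as often, which is the window in which threads with data spread across ranks can stall.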
Effect of Staggered Refresh
Talk Outline: DRAM refresh background; goal: the low peak power of staggered refresh with the performance of simultaneous refresh; analyzing stalls from refresh; solution: thread-to-rank assignment; results.
Each Staggered Refresh Stalls Many Cores. Thread/Rank table: (T1, R2), (T2, R3), (T1, R1), (T2, R2), (T2, R1), (T3, R1). Rank 1 refreshing => 3 threads stalled; Rank 3 refreshing => 3 threads stalled. [Diagram: 8-core CMP, two MCs (Channel 1, Channel 2), Ranks 1 to 4, with stalled threads marked (labels T1, T2, T3, T7, T8).]
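The stall counting on this slide can be illustrated with a small sketch. It reads the slide's thread/rank pairs as a queue of pending requests, and assumes a thread stalls as soon as any of its pending requests targets the refreshing rank; both are interpretations made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Thread/rank pairs as listed on the slide, read here as pending requests.
PENDING = [
    ("T1", "R2"), ("T2", "R3"), ("T1", "R1"),
    ("T2", "R2"), ("T2", "R1"), ("T3", "R1"),
]


def stalled_threads(pending, refreshing_rank):
    """Threads with at least one pending request to the refreshing rank."""
    return {thread for thread, rank in pending if rank == refreshing_rank}


def ranks_touched(pending):
    """For each thread, the set of ranks its pending requests are spread over."""
    touched = defaultdict(set)
    for thread, rank in pending:
        touched[thread].add(rank)
    return dict(touched)


if __name__ == "__main__":
    print("Rank 1 refreshing stalls:", sorted(stalled_threads(PENDING, "R1")))
    for thread, ranks in sorted(ranks_touched(PENDING).items()):
        print(f"{thread} can be stalled by refreshes to {sorted(ranks)}")
```

With this queue, a refresh to Rank 1 stalls T1, T2, and T3, matching the slide's count of three, and T2 is exposed to refreshes in three different ranks because its requests are spread across them.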
Limit the Spread: Address Mapping
% Refreshes Affecting a Thread. [Chart annotation: Highest Performance Loss.]
37% Increase in Execution Time. [Chart annotation: Highest Performance Loss.]
Rank Assigned Page Mapping. Strict mapping of threads to ranks (e.g., used for cache partitioning by Lin et al., HPCA 2008). [Diagram: Threads 1 to 8 on an 8-core CMP, two MCs (Channel 1, Channel 2), Ranks 1 to 4.]
Limit the Spread: Page Mapping. Relaxed mapping of threads to ranks. [Diagram: Threads 1 to 8 on an 8-core CMP, two MCs (Channel 1, Channel 2), Ranks 1 to 4.]
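The difference between strict and relaxed thread-to-rank page mapping might look like the following sketch. It is not the authors' implementation: the class name, the modulo rule for choosing a thread's home rank, and the spill policy (pick the rank with the most free frames) are all assumptions made for illustration.

```python
class RankAwareAllocator:
    """Toy page-frame allocator that prefers one home rank per thread."""

    def __init__(self, num_ranks, frames_per_rank):
        # Free frame count per rank; a real OS would track actual frames.
        self.free = [frames_per_rank] * num_ranks

    def home_rank(self, thread_id):
        # Hypothetical assignment rule: spread threads evenly over ranks.
        return thread_id % len(self.free)

    def allocate(self, thread_id, strict=False):
        """Return the rank that receives the new page, or None on failure."""
        home = self.home_rank(thread_id)
        if self.free[home] > 0:
            self.free[home] -= 1
            return home
        if strict:
            # Strict mapping: never place a thread's page outside its rank.
            return None
        # Relaxed (best-effort) mapping: spill to the emptiest other rank.
        spill = max(range(len(self.free)), key=lambda rank: self.free[rank])
        if self.free[spill] == 0:
            return None
        self.free[spill] -= 1
        return spill


if __name__ == "__main__":
    alloc = RankAwareAllocator(num_ranks=4, frames_per_rank=2)
    print([alloc.allocate(0) for _ in range(2)])   # both land in the home rank
    print(alloc.allocate(0, strict=True))          # home rank full: strict fails
    print(alloc.allocate(0, strict=False))         # relaxed spills elsewhere
```

The relaxed variant keeps most of a thread's pages in one rank while still letting a large footprint use free capacity elsewhere, which is the behaviour the "best-effort page mapping" wording in the conclusions suggests.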
Modified Clock Algorithm. Baseline: one hand sweeps a single list of all pages (P) in memory. Modified: pages are kept in per-rank lists (ranks 1 to 4), each with its own hand.
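A minimal sketch of a clock (second-chance) replacement policy with one hand per rank is given below. The slide only indicates per-rank page lists with separate hands; the class names, the reference-bit handling, and the placement interface are assumptions for illustration, not the authors' kernel changes.

```python
class RankClock:
    """Clock (second-chance) replacement over the frames of a single rank."""

    def __init__(self, num_frames):
        self.pages = [None] * num_frames        # page held by each frame
        self.referenced = [False] * num_frames  # per-frame reference bit
        self.hand = 0

    def touch(self, frame):
        """Record an access so the frame gets a second chance."""
        self.referenced[frame] = True

    def pick_victim(self):
        """Advance the hand until an empty or unreferenced frame is found."""
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[frame] is None or not self.referenced[frame]:
                return frame
            self.referenced[frame] = False      # clear bit, move on


class ModifiedClock:
    """One independent clock hand per rank instead of one global hand."""

    def __init__(self, num_ranks, frames_per_rank):
        self.ranks = [RankClock(frames_per_rank) for _ in range(num_ranks)]

    def place(self, page, rank):
        """Put `page` into `rank`, evicting a victim from that rank only."""
        clock = self.ranks[rank]
        frame = clock.pick_victim()
        evicted = clock.pages[frame]
        clock.pages[frame] = page
        clock.referenced[frame] = True
        return frame, evicted


if __name__ == "__main__":
    mc = ModifiedClock(num_ranks=4, frames_per_rank=2)
    print(mc.place("a", rank=1))
    print(mc.place("b", rank=1))
    print(mc.place("c", rank=1))   # rank 1 is full, so a rank-1 page is evicted
```

Because each rank has its own hand, making room for a thread's page in its assigned rank never evicts pages from other ranks.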
Methodology: Simics + USIMM. Processor: 8 RISC cores, UltraSPARC III ISA; 3.2 GHz, 4-wide OoO, 64-entry RoB; 32 KB I&D L1 caches, 4 cycles; 4/8 MB shared L2 cache, 10 cycles. DRAM Specifications: 2 channels, 2 ranks per channel, 16 banks per rank; 800 MHz DDR4 DRAM. Workloads: SPEC 2006, NPB, CloudSuite, and PARSEC.
Thread-to-rank Assignment: 18% better than Staggered Refresh.
Relaxing Rank Assignment
Comparisons to Prior Work
Conclusions: exposes an important artifact in memory stalls; service interruptions require a re-evaluation of data placement; RA (rank assignment) is a simple solution for an emerging problem; RA can also be leveraged to reduce the impact of NVM write drain; RA is a software solution that only requires best-effort page mapping; outperforms hardware-only schemes.
Thank You