Published by Lydia Clerk. Modified over 9 years ago.
1
EnVM: Virtual Memory Design for New Memory Architectures
Pooja ROY, Manmohan MANOHARAN, Weng Fai WONG
National University of Singapore
ESWEEK (CASES), October 2014
2
New Memory Architectures
NVMs (STT-RAM, MRAM, etc.)
–Energy efficient
–Higher density
–High write latency (3x slower than reads)
–Low write endurance
Solution: hybrid memories
[Figure: NVM combined with SRAM/DRAM]
International Conference on Compilers, Architectures and Synthesis of Embedded Systems
3
Hybrid Caches
SRAM + STT-RAM hybrid design
Data allocation
–Reducing writes to the NVM partition
–Redirecting write-intensive data to the SRAM partition
Performance impact
–Data movement between partitions is expensive
Energy impact
–A high number of writes to NVM might offset the energy savings
4
Motivation
Different solutions (previous works) for each level of memory
–Not co-operative. Conflicting.
–Not holistic for the hybrid memory hierarchy
5
Motivation
♯ Stack data layout for hybrid L1 cache (Li et al., ISLPED’12)
♯ Reuse distance based data allocation for hybrid L2 cache (Chen et al., LCTES’12)
[Figure: sequences "d c b a" and "a d b c" over x1 x2 x3 x4; labels: write reuse sequence, read reuse sequence, write intensive, read intensive, stack]
6
Motivation
Different solutions (previous works) for each level of memory
–Not co-operative. Conflicting.
–Not holistic for the hybrid memory hierarchy
Hardware solutions: heavy modifications and energy overheads
Software solutions: partial support or profile-based techniques
7
Our Approach - EnVM
Uses virtual memory to support the whole hybrid memory hierarchy
Handles static and dynamic data, no profiling required
Utilizes existing hardware
Advocates a migration-less cache design
8
EnVM
9
Static Analysis
[Figure: example with basic blocks b1, b2, b3, b4 containing the statements a = p, b = q, a = a-5, b = b*2, c = p+q, d = p-q. Annotations are (variable, read count, write count) tuples: a (0,1), p (1,0), b (0,1), q (1,0), a (1,2), b (1,2), p (2,0), q (2,0), c (0,1), d (0,1), p (3,0), q (3,0)]
Abstract interpretation based dataflow analysis
Heuristic estimate of memory access intensity
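The (variable, read count, write count) tuples on this slide can be reproduced with a much simpler stand-in than the abstract interpretation the talk describes. The sketch below is a flow-insensitive count over the six example statements using Python's own `ast` module; the function name `count_accesses` is hypothetical, and the code ignores control flow between b1 to b4 entirely.

```python
import ast
from collections import defaultdict


def count_accesses(source):
    """Return {variable: (read_count, write_count)} for a snippet of code.

    Flow-insensitive: every statement is counted once, regardless of
    which basic block it belongs to. This is an illustration only,
    not the dataflow analysis used by EnVM.
    """
    counts = defaultdict(lambda: [0, 0])
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):      # variable is read
                counts[node.id][0] += 1
            elif isinstance(node.ctx, ast.Store):   # variable is written
                counts[node.id][1] += 1
    return {name: tuple(c) for name, c in counts.items()}


# The six statements shown on the slide.
program = """
a = p
b = q
a = a - 5
b = b * 2
c = p + q
d = p - q
"""
access_counts = count_accesses(program)
```

For this example the totals agree with the final tuples on the slide, for instance a (1,2) and p (3,0). A real analysis would also have to weight loops and branches, which this count does not attempt.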
10
Static Analysis
[Figure: variables grouped as write intensive (c (0,1), d (0,1), b (1,2), a (1,2)) and read intensive (q (3,0), p (3,0))]
Clustering based on an unsupervised machine learning algorithm
Classification into 4 classes and then into 2 partitions
Read-intensive data allocated to the STT-RAM partition
Write-intensive data allocated to the SRAM partition
Classes
–Low Read – Low Write
–Low Read – High Write
–High Read – Low Write
–High Read – High Write
Partitions: STT-RAM, SRAM
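The slide does not name the clustering algorithm, so the following is only a sketch under the assumption that something like k-means would do. It skips the intermediate four classes and splits the example variables directly into two partitions with a tiny 2-means loop; `two_means_partition` and its seeding rule are hypothetical choices made for illustration.

```python
def two_means_partition(counts, iterations=10):
    """Split variables into an STT-RAM (read-heavy) and an SRAM (write-heavy) group.

    counts maps variable -> (reads, writes). Hypothetical helper:
    a plain 2-means stand-in, not the classifier used by EnVM.
    """
    names = list(counts)
    # Deterministic seeds: the most read-dominated and the most
    # write-dominated points in the input.
    read_c = max(counts.values(), key=lambda rw: rw[0] - rw[1])
    write_c = max(counts.values(), key=lambda rw: rw[1] - rw[0])

    def dist2(point, centroid):
        return (point[0] - centroid[0]) ** 2 + (point[1] - centroid[1]) ** 2

    def mean(points, fallback):
        if not points:
            return fallback
        total = len(points)
        return (sum(p[0] for p in points) / total,
                sum(p[1] for p in points) / total)

    assignment = {}
    for _ in range(iterations):
        # Ties go to SRAM so that doubtful data never adds NVM writes.
        new_assignment = {
            name: "STT-RAM"
            if dist2(counts[name], read_c) < dist2(counts[name], write_c)
            else "SRAM"
            for name in names
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        read_c = mean([counts[v] for v in names if assignment[v] == "STT-RAM"], read_c)
        write_c = mean([counts[v] for v in names if assignment[v] == "SRAM"], write_c)
    return assignment


# (reads, writes) per variable, as listed on the slide.
slide_counts = {
    "c": (0, 1), "d": (0, 1), "b": (1, 2),
    "a": (1, 2), "q": (3, 0), "p": (3, 0),
}
partitions = two_means_partition(slide_counts)
```

On the slide's data this places p and q in STT-RAM and a, b, c, d in SRAM, matching the read-intensive and write-intensive groups shown.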
11
Memory Access Types
Variables show high read OR write affinity
12
Dynamic Memory
Hard to analyze
Exposed to programmer
Dynamic memory library support
–Enable dual heap structure
Two distinct system calls (r_malloc, w_malloc)
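To make the dual heap idea concrete, here is a toy model of two heaps living in disjoint address ranges, with one allocation call per heap. Only the names `r_malloc` and `w_malloc` come from the slide; the class `DualHeap`, the base addresses, the heap size, the 8-byte alignment, and the reading of r_malloc as "read-intensive heap" and w_malloc as "write-intensive heap" are all assumptions.

```python
class DualHeap:
    """Toy bump allocator with separate read-intensive and write-intensive heaps."""

    def __init__(self, read_base=0x1000_0000, write_base=0x2000_0000, size=1 << 20):
        # Base addresses and heap size are illustrative, not EnVM's layout.
        self._next = {"read": read_base, "write": write_base}
        self._limit = {"read": read_base + size, "write": write_base + size}

    def _alloc(self, heap, nbytes, align=8):
        if nbytes <= 0:
            raise ValueError("allocation size must be positive")
        # Round the next free address up to the alignment boundary.
        start = (self._next[heap] + align - 1) // align * align
        if start + nbytes > self._limit[heap]:
            raise MemoryError(f"{heap} heap exhausted")
        self._next[heap] = start + nbytes
        return start

    def r_malloc(self, nbytes):
        """Allocate from the heap assumed to hold read-intensive data."""
        return self._alloc("read", nbytes)

    def w_malloc(self, nbytes):
        """Allocate from the heap assumed to hold write-intensive data."""
        return self._alloc("write", nbytes)


heap = DualHeap()
lookup_table = heap.r_malloc(100)   # mostly read after initialization
scratch = heap.w_malloc(100)        # updated frequently
second_table = heap.r_malloc(4)
```

Because the two heaps never share an address range, the placement decision made by the programmer at allocation time can be recovered later from the address alone.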
13
EnVM Layout
[Figure: existing virtual memory layout compared with the proposed virtual memory layout]
x86 segment registers do boundary checking
Minimum modification to fit other architectures
Data from each segment is allocated to either STT-RAM or SRAM
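The idea that each segment maps to exactly one partition can be sketched as a range lookup. Everything concrete below is hypothetical: the segment names, their boundaries, and the function `partition_for` are made up for illustration, and in the proposed design the bounds check is done by x86 segment registers rather than by software.

```python
# Hypothetical segment table: (name, start, end_exclusive, partition).
SEGMENTS = [
    ("read_data",  0x0040_0000, 0x0080_0000, "STT-RAM"),
    ("write_data", 0x0080_0000, 0x00A0_0000, "SRAM"),
    ("read_heap",  0x1000_0000, 0x1010_0000, "STT-RAM"),
    ("write_heap", 0x2000_0000, 0x2010_0000, "SRAM"),
]


def partition_for(address):
    """Return (segment name, partition) for an address, or raise if unmapped."""
    for name, start, end, partition in SEGMENTS:
        if start <= address < end:   # software stand-in for the bounds check
            return name, partition
    raise LookupError(f"address {address:#x} is outside every segment")
```

With a table like this, `partition_for(0x2000_0000)` resolves to the write heap and therefore to SRAM, with no per-variable bookkeeping at run time.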
14
Evaluation
Comparison
–Hardware-only method (HW) on hybrid L1
–Software method based on stack layout (SW1) on hybrid L1
–Software method based on reuse distance (SW2) on hybrid L2
–Our method on hybrid L1 (EnVM)
Simulator: MARSSx86 cycle-accurate simulator
Processor: unicore, 3 GHz, commit width 4
Memory, hybrid L1 design
–L1 I-Cache (SRAM): 64K, 64B line, 3 cycles
–L1 D-Cache (Hybrid): SRAM: 4K, 4-way, 3 cycles; STT-RAM: 64K, 4-way, read 3 cycles, write 10 cycles
–L2 (SRAM): 2M, 8-way, 15 cycles, 64B lines
Memory, hybrid L2 design
–L1 I-Cache (SRAM): 64K, 8-way, 3 cycles, 64B line
–L1 D-Cache (SRAM): 32K, 8-way, 3 cycles, 64B line
–L2 (Hybrid): SRAM: 1M, 4-way, 3 cycles; STT-RAM: 2M, 8-way, read 11 cycles, write 30 cycles
15
Write Reduction
Normalized to HW
Reduces writes by 47.6% compared with HW and 15% compared with SW1
16
Energy Savings
Normalized to pure SRAM configuration
Maximum energy reduction of 50%, for 458.sjeng
Reduces energy by 21% compared with HW and 6% compared with SW1
17
Performance Impact
Normalized to pure SRAM configuration
Comparable IPC
Write latency is offset by bigger cache capacities
18
Summary
Holistic management of process memory to aid the hybrid memory hierarchy
Reduces writes by 47.6% compared with HW and 15% compared with SW1
Reduces energy by 21% compared with HW and 6% compared with SW1
Minimum hardware modification
No profiling of applications
No migration of data
Improvements
–Dynamic memory management
19
Thank You