1
Page-based Commands for DRAM Systems
Aamer Jaleel, Brinda Ganesh, Lei Zong
2
Outline
–Memory System Overview
–Related work
–Experiment setup
–Page level access measurements
–Solution
–Expected Speedup
3
Processor-Memory Gap
–µProc: 60% / year. Doubles every 1.5 years
–DRAM: 9% / year. Doubles every 10 years
–Processor-Memory Performance Gap: Grows 50% / year
http://www.e-insite.net/ednmag
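The growth figures on this slide can be checked with compound-growth arithmetic. The sketch below is only an illustration; the function names are made up for this note. Under plain compounding, 60% per year doubles in roughly 1.5 years, 9% per year doubles in roughly 8 years (the slide's 10 years is a rounder figure), and the ratio between the two grows by about 47% per year, which is close to the quoted 50%.

```python
import math

def doubling_time(annual_growth):
    """Years needed to double at a compound annual growth rate (0.60 means 60%)."""
    return math.log(2) / math.log(1 + annual_growth)

def gap_growth(cpu_growth, dram_growth):
    """Annual growth of the processor/DRAM performance ratio."""
    return (1 + cpu_growth) / (1 + dram_growth) - 1

cpu_doubling = doubling_time(0.60)   # processor: 60% per year
dram_doubling = doubling_time(0.09)  # DRAM: 9% per year
gap = gap_growth(0.60, 0.09)         # how fast the gap widens

print(f"Processor doubles in {cpu_doubling:.2f} years")
print(f"DRAM doubles in {dram_doubling:.2f} years")
print(f"Gap grows {gap * 100:.1f}% per year")
```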
4
Memory Access Time
[Diagram labels: CPU; Core; L1; L2; MC; DRAM]
Access Time (cycles)
–L1: 3
–L2: 8
–DRAM: 181
Data for 1.8 GHz Opteron (www.aceshardware.com/)
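To relate these cycle counts to wall-clock time, the following sketch converts them to nanoseconds at the quoted 1.8 GHz clock. It reads the slide's figures as L1 = 3, L2 = 8 and DRAM = 181 cycles; the helper name is hypothetical. On that reading, a DRAM access costs roughly 100 ns.

```python
CLOCK_GHZ = 1.8  # Opteron clock quoted on the slide

# Access times in CPU cycles, as read from the slide.
ACCESS_CYCLES = {"L1": 3, "L2": 8, "DRAM": 181}

def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (1 GHz is one cycle per ns)."""
    return cycles / clock_ghz

latency_ns = {level: cycles_to_ns(c, CLOCK_GHZ) for level, c in ACCESS_CYCLES.items()}

for level, ns in latency_ns.items():
    print(f"{level}: {ns:.1f} ns")
```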
5
Large Size Memory Accesses
Applications
–Initialization
–Data Movement
–Stream operations
Operating System
–Task Creation
–System Calls
–Page Allocation, Management
Functions that would use them
–Memset, Clear User
–Memcpy, Copy from User, Copy To User
6
Experiment Setup
Workstation based
–2.4 GHz P4 (Wonko)
–750 MHz PIII (Majikthise)
–900 MHz PIII (Jaleel)
Bochs x86 emulator
Operating System
–Linux Kernel v2.4.19
Applications
–SPEC2000 Integer benchmarks using glibc-2.2.5
7
Memset: Count
8
Memset: Access Size
9
Memset: % Overhead
10
Memcpy: Count
11
Memcpy: Access Size
12
Memcpy: % Overhead
13
OS: Memset / Clear User
–Real-Time Plot
–Behavior over Time
–Frequency of operation
–Access Size
–Operation Duration
–Averages
14
OS: Memcpy / Copy User
–Real-Time Plot
–Behavior over Time
–Frequency of operation
–Access Size
–Operation Duration
–Averages
15
Page based Commands
Set Page
–A ← constant
Copy Page
–A ← B
Page level Arithmetic operations
–A ← B + C
–A ← B - C
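The semantics of these commands can be modelled in software. The sketch below is a toy simulation, not the proposed hardware: the class and method names are hypothetical, and treating page arithmetic as byte-wise with wraparound is an assumption, since the slides do not state an element width.

```python
PAGE_SIZE = 4096  # 4 kB page, as in the slides

class PagedMemory:
    """Toy byte-addressable memory exposing hypothetical page-level commands."""

    def __init__(self, num_pages):
        self.data = bytearray(num_pages * PAGE_SIZE)

    def _page(self, addr):
        # Page commands operate on whole, aligned pages inside the memory.
        if addr % PAGE_SIZE or not 0 <= addr <= len(self.data) - PAGE_SIZE:
            raise ValueError("address must be a valid page-aligned address")
        return slice(addr, addr + PAGE_SIZE)

    def set_page(self, a, value):
        """Set Page: A <- constant."""
        self.data[self._page(a)] = bytes([value]) * PAGE_SIZE

    def copy_page(self, a, b):
        """Copy Page: A <- B."""
        self.data[self._page(a)] = self.data[self._page(b)]

    def add_page(self, a, b, c):
        """A <- B + C, byte-wise with wraparound (assumed element width)."""
        pb, pc = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x + y) & 0xFF for x, y in zip(pb, pc))

    def sub_page(self, a, b, c):
        """A <- B - C, byte-wise with wraparound (assumed element width)."""
        pb, pc = self.data[self._page(b)], self.data[self._page(c)]
        self.data[self._page(a)] = bytes((x - y) & 0xFF for x, y in zip(pb, pc))

mem = PagedMemory(4)
mem.set_page(0x0000, 7)
mem.set_page(0x1000, 5)
mem.add_page(0x2000, 0x0000, 0x1000)  # every byte becomes 7 + 5
mem.sub_page(0x3000, 0x1000, 0x0000)  # every byte becomes (5 - 7) mod 256
mem.copy_page(0x0000, 0x2000)
```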
16
Page based Commands
[Diagram labels: 4 kB; DRAM; SETPAGE ZERO, 0x04000]
17
Page based Commands
[Diagram labels: 4 kB; DRAM; SETPAGE ZERO, 0x04000; Cache; 128 bytes]
18
Page based Commands Issue
[Diagram labels: 4 kB; DRAM; SETPAGE ZERO, 0x04000; Cache; 128 bytes]
How do we ensure Memory and Cache Consistency?
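One way to picture the consistency problem: if the 128 bytes on the slide is the cache line size (an assumption), a 4 kB page covers 32 lines, and any of them still sitting in the cache would be stale after SETPAGE runs in DRAM. The sketch below invalidates those lines. This is only one possible policy, chosen for illustration; the slides do not say which mechanism the authors use, and all names here are hypothetical.

```python
PAGE_SIZE = 4096   # 4 kB page
LINE_SIZE = 128    # assumed cache line size
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE

def lines_in_page(page_addr):
    """Start addresses of every cache line that falls inside the page."""
    return range(page_addr, page_addr + PAGE_SIZE, LINE_SIZE)

def set_page_with_invalidate(memory, cache, page_addr, value):
    """Apply SETPAGE in memory, then drop cached lines of that page.

    cache is a dict mapping line address -> cached bytes.
    Returns the number of lines that had to be invalidated.
    """
    memory[page_addr:page_addr + PAGE_SIZE] = bytes([value]) * PAGE_SIZE
    dropped = 0
    for line_addr in lines_in_page(page_addr):
        if cache.pop(line_addr, None) is not None:
            dropped += 1
    return dropped

memory = bytearray(b"\xff" * (8 * PAGE_SIZE))
cache = {
    0x04000: bytes(memory[0x04000:0x04000 + LINE_SIZE]),  # inside the target page
    0x04080: bytes(memory[0x04080:0x04080 + LINE_SIZE]),  # inside the target page
    0x05000: bytes(memory[0x05000:0x05000 + LINE_SIZE]),  # belongs to the next page
}
dropped = set_page_with_invalidate(memory, cache, 0x04000, 0)
```

Because the whole page is overwritten, this sketch discards dirty lines instead of writing them back; a real design would have to decide that explicitly.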
19
How much data is actually in the cache?
Function: % Hit Rate (Boot + Halt), % Hit Rate (SPEC workload)
–Memset: 7.23%, 0.23%
–Memcpy (Source): 7.88%, 10.53%
–Memcpy (Destination): < 0.01%
20
Page based Commands
[Diagram labels: 4 kB; DRAM; SETPAGE ZERO, 0x04000]
21
Page based Commands Issue
[Diagram labels: SETPAGE ZERO, 0x04000; 4 kB; DRAM; 4 kB]
DRAM level Page Fragmentation
22
Page based Commands Issue
[Diagram labels: SETPAGE ZERO, 0x04000; 4 kB; DRAM; 4 kB]
DRAM level Page Fragmentation
Maximum number of rows a page can occupy is 2
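The bound of two rows follows from simple address arithmetic if addresses map linearly onto DRAM rows and a row is at least as large as a page. Both conditions are assumptions made for this sketch, and the 8 kB row size is a hypothetical value, not taken from the slides.

```python
PAGE_SIZE = 4096  # 4 kB page
ROW_SIZE = 8192   # hypothetical DRAM row size, assumed >= PAGE_SIZE

def rows_spanned(start_addr, length, row_size):
    """Number of DRAM rows touched by a contiguous span, assuming linear mapping."""
    first_row = start_addr // row_size
    last_row = (start_addr + length - 1) // row_size
    return last_row - first_row + 1

aligned = rows_spanned(0, PAGE_SIZE, ROW_SIZE)        # page fits in one row
straddling = rows_spanned(6144, PAGE_SIZE, ROW_SIZE)  # page crosses a row boundary

# Try many start offsets: a span no longer than a row never touches more than two rows.
worst = max(rows_spanned(start, PAGE_SIZE, ROW_SIZE)
            for start in range(0, 4 * ROW_SIZE, 64))
```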
23
Solution
–Hardware at Cache Level
–Ability to map software pages to hardware pages
24
Expected Speedup I
Current Implementation
Memset(Address, Length, SetValue)
    EndAddr ← Address + Length
    While (Address < EndAddr)
        Mem[Address] ← SetValue
        Address ← Address + 1
Proposed Implementation
    While (Length >= PageSize)
        SetPage(SetValue, Address)
        Length ← Length - PageSize
        Address ← Address + PageSize
    Call Memset(Address, Length, SetValue)
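The two implementations can be compared with a small runnable model. This is a sketch under stated assumptions: memory is a Python bytearray, set_page stands in for the hypothetical SETPAGE command, and the start address is page-aligned. After each page command the address advances by PageSize, which is what the loop needs in order to cover consecutive pages; the leftover tail falls back to the byte-wise routine.

```python
PAGE_SIZE = 4096  # 4 kB page

def memset_bytewise(mem, address, length, value):
    """Current approach: one store per byte. Returns the number of stores."""
    end = address + length
    stores = 0
    while address < end:
        mem[address] = value
        address += 1
        stores += 1
    return stores

def set_page(mem, value, address):
    """Stand-in for the hypothetical SETPAGE command (whole page at once)."""
    mem[address:address + PAGE_SIZE] = bytes([value]) * PAGE_SIZE

def memset_paged(mem, address, length, value):
    """Proposed approach: page commands for whole pages, byte stores for the tail.

    Assumes address is page-aligned. Returns (page_commands, byte_stores).
    """
    page_commands = 0
    while length >= PAGE_SIZE:
        set_page(mem, value, address)
        length -= PAGE_SIZE
        address += PAGE_SIZE
        page_commands += 1
    byte_stores = memset_bytewise(mem, address, length, value)
    return page_commands, byte_stores

mem = bytearray(4 * PAGE_SIZE)
total = 2 * PAGE_SIZE + 100
result = memset_paged(mem, 0, total, 0xAB)  # two page commands plus a 100-byte tail

reference = bytearray(4 * PAGE_SIZE)
reference_stores = memset_bytewise(reference, 0, total, 0xAB)
```

In this model the byte-wise version issues one store per byte of the region, while the paged version issues two page commands and only 100 byte stores for the same result.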
25
Expected Speedup II
Current Memset Time for a page: 4 µs
Expected Memset Time for a page
= # Rows in a page * Time to read a Row + Cache Coherence Logic + Misc
= 2 * 100 ns + X
= 200 ns + X
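The estimate can be turned into a small calculation. The slides leave X (cache coherence logic plus miscellaneous overhead) unspecified, so the values tried below are arbitrary illustrations, and the current per-page time is taken as 4 µs, which is an assumption about the slide's unit. Function names are hypothetical.

```python
def expected_page_time_ns(rows_per_page=2, row_time_ns=100.0, overhead_ns=0.0):
    """Expected time = rows in a page * time to read a row + overhead X."""
    return rows_per_page * row_time_ns + overhead_ns

def speedup(current_ns, expected_ns):
    """Ratio of current memset time to expected page-command time."""
    return current_ns / expected_ns

CURRENT_NS = 4000.0  # assumed: current memset time for one page is 4 microseconds

# X is unknown; these overheads are illustrative guesses only.
for x_ns in (0.0, 200.0, 800.0):
    t = expected_page_time_ns(overhead_ns=x_ns)
    print(f"X = {x_ns:5.0f} ns -> {t:6.0f} ns per page, speedup {speedup(CURRENT_NS, t):.1f}x")
```

Under these assumptions the upper bound, with X = 0, is a 20x speedup per page, and the benefit shrinks as the overhead X grows.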
26
Related Work
IRAM: On-chip DRAM
–Advantage: bigger storage, eliminates much of the off-chip memory access, energy efficient
–Disadvantage: not much performance increase, doesn't work with conventional microprocessors
Active Pages: bring computation to DRAM
–Break the memory into fixed-size pages and add reconfigurable logic to DRAM
Heap paper shows some memory accesses that can be eliminated entirely
27
Conclusion
Page-based commands are necessary.