1
Memory Arithmetic Unit Interface (MAUI)
Jason M. Meier, Justin S. Teller, Tom J. Keeley
2
Memory Controller: Current Paradigm
[Diagram: timeline with rows CPU, MEMORY CTRL, and MEMORY showing Task 1 and Task 2, ending with "Done: Task 1"; block diagram of CPU and DRAM system.]
3
Active Pages Implementation
Used configurable DRAM (RADRAM)
Reconfigurable logic implements various memory functions
An "Active Page" consists of a page of data and a set of associated functions
Works on individual DRAM chips
Processor-centric and memory-centric partitioning
* Active Pages: Oskin, Chong, Sherwood, ISCA ’98
4
MAUI Implementation
[Diagram: timeline with rows CPU, MEMORY CTRL/MAUI, and MEMORY showing Task 1 and Task 2, ending with "Done: Task 1"; block diagram of CPU, MAUI memory controller (containing the MAU), and DRAM system.]
5
MAUI Instruction Set: LOAD REG
1) CPU sends an MAU_LOAD register command to the MC (along with the register number and the address to read) across the front-side bus.
2) MC interprets the command and places a Read command in the transaction queue.
3) DRAM performs the read.
4) Result is stored in the appropriate register in the MAUI register file.
Mnemonic: MAUI_LD,offset( )
[Diagram: CPU, MAUI memory controller (with MAU), and DRAM system, with steps 1 to 4 marked on a timeline of CPU, MC/MAUI, and DRAM activity (R = read).]
6
MAUI Instruction Set II: LOADI REG
1) CPU sends an MAU_LOADI register command to the MC (along with the register number and the integer to save) across the front-side bus.
2) MC interprets the command and places the integer in the appropriate register in the MAUI register file.
Mnemonic: MAUI_LDI,
[Diagram: CPU, MAUI memory controller (with MAU), and DRAM system, with steps 1 and 2 marked on a timeline of CPU, MC/MAUI, and DRAM activity.]
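The two load instructions above can be sketched as a small software model. This is only an illustration under stated assumptions: the class name `MauiModel`, the register count of 8, and the use of a Python list as DRAM are all hypothetical and do not come from the slides.

```python
from collections import deque


class MauiModel:
    """Toy model of the MAUI register file and transaction queue.

    Assumptions (not from the slides): 8 registers, DRAM modeled as a
    Python list indexed by address, one transaction serviced at a time.
    """

    def __init__(self, dram, num_regs=8):
        self.dram = dram
        self.regs = [0] * num_regs
        self.queue = deque()

    def load(self, reg, address):
        # Step 2: the MC places a Read command in the transaction queue.
        self.queue.append(("R", address))
        # Step 3: DRAM performs the read.
        _, addr = self.queue.popleft()
        value = self.dram[addr]
        # Step 4: the result is stored in the MAUI register file.
        self.regs[reg] = value

    def loadi(self, reg, value):
        # LOADI never touches DRAM: the integer goes straight to the register.
        self.regs[reg] = value
```

In this sketch `load` goes through the transaction queue while `loadi` bypasses it, which mirrors the four-step versus two-step sequences on the slides.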
7
MAUI Instruction Set III: MAU_ADD
1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.
2) CPU sends an MAUI_ADD command to the MC (along with the register numbers) across the front-side bus.
3) MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.
Mnemonic: MAUI_ADD,,,
[Diagram: CPU, MAUI memory controller (with MAU), and DRAM system, with steps 1 to 4 marked on a timeline of CPU, MC/MAUI, and DRAM activity (R = read, W = write; DRAM sequence R R W).]
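Steps 3 and 4 above amount to a loop that issues two reads and one write per element. The following is a minimal sketch of that loop, not the authors' implementation: the function name `maui_add`, its argument order, and the list-based DRAM are assumptions, and the cache invalidation and write-back of step 1 are not modeled.

```python
def maui_add(dram, dst, src1, src2, length):
    """Element-wise add of two DRAM arrays into a destination array.

    Returns the sequence of DRAM transactions as (kind, address) tuples,
    where kind is "R" for a read and "W" for a write. The order
    R, R, W per element follows the diagram on the slide.
    """
    trace = []
    for i in range(length):
        trace.append(("R", src1 + i))   # read first operand
        a = dram[src1 + i]
        trace.append(("R", src2 + i))   # read second operand
        b = dram[src2 + i]
        trace.append(("W", dst + i))    # write the sum back to DRAM
        dram[dst + i] = a + b
    return trace
```

Called with source arrays at addresses 0 and 5, a destination at 10, and a length of 2, the sketch produces two R, R, W groups, one per element.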
8
Issues: Read & Write Locks
9
Issues: Address Mapping
Memory that is contiguous in virtual space may not be contiguous in physical space.
MAUI assumes consecutive addressing (size register).
MAUI operations that cross page boundaries must be split into separate operations for each page.
The programmer will not know the mapping scheme.
Result: All MAUI operations will need to be privileged instructions, accessed by programs through a system call.
[Diagram: TLB mapping Virtual Space to Physical Space.]
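The page-boundary split described above can be sketched as follows. This is an illustration only: the 4096-byte page size, the function names, and the dictionary used as a page table are assumptions, since the slides do not specify a page size or mapping format.

```python
PAGE_SIZE = 4096  # assumed page size; the slides do not state one


def split_by_page(vaddr, length, page_size=PAGE_SIZE):
    """Split a virtual range into (start, length) chunks that never
    cross a page boundary."""
    chunks = []
    while length > 0:
        room = page_size - (vaddr % page_size)  # bytes left in this page
        n = min(room, length)
        chunks.append((vaddr, n))
        vaddr += n
        length -= n
    return chunks


def translate_chunks(chunks, page_table, page_size=PAGE_SIZE):
    """Map each chunk's virtual start address to a physical address.

    page_table is a hypothetical dict from virtual page number to
    physical frame number.
    """
    return [
        (page_table[v // page_size] * page_size + v % page_size, n)
        for v, n in chunks
    ]
```

Each resulting chunk is physically contiguous, so it could be issued as one MAUI operation under the consecutive-addressing assumption. A real system call would also need to align the splits across the source and destination arrays, which this sketch does not attempt.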
10
Issues: Compiler Issues
The compiler will be responsible for deciding when MAUI instructions should be used. This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn’t implemented in the MAUI.
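One way such a decision could look is sketched below. The slides name the three criteria but not how they are weighed, so everything concrete here is an assumption: the function name, the rule that the array must exceed the cache size, and the `MAUI_OPS` set (only an add is described in the slides).

```python
MAUI_OPS = frozenset({"add"})  # assumed: only MAUI_ADD is described


def should_use_maui(op, array_bytes, cache_bytes, likely_in_cache,
                    next_op=None):
    """Hypothetical compiler heuristic for offloading an array operation.

    op              -- the array operation being compiled
    array_bytes     -- size of each array involved
    cache_bytes     -- capacity of the CPU cache
    likely_in_cache -- compiler's guess that the data is already cached
    next_op         -- the next operation expected to consume the result
    """
    if op not in MAUI_OPS:
        return False  # the MAUI cannot execute this operation at all
    if likely_in_cache:
        return False  # the CPU can work on cached data directly
    if next_op is not None and next_op not in MAUI_OPS:
        return False  # the result would have to return to the CPU anyway
    # Assumed threshold: offload only arrays too large to fit in the cache.
    return array_bytes > cache_bytes
```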
11
Issues: Task Interrupts
[Diagram: timeline with rows CPU, MEMORY CTRL/MAUI, and MEMORY showing Task 1 and Task 2; block diagram of CPU, MAUI memory controller (containing the MAU), and DRAM system.]
12
Example: maui_add I
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: maui_ld r1, 0]
13
Example: maui_add II
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: maui_ld r2, 5]
14
Example: maui_add III
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: maui_ld r3, 10]
15
Example: maui_add IV
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: maui_ld r4, 2]
16
Example: maui_add V
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: maui_add r3, r1, r2; R, 0; R, 5]
17
Example: maui_add VI
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: Read 10; D1[0]; maui_add r3, r1, r2*]
18
Example: maui_add VII
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: D2[0]; Read 10; maui_add r3, r1, r2*]
19
Example: maui_add VIII
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: R, 1; R, 6; W,10, D1[0]+D2[0]; Read 10; maui_add r3, r1, r2*]
20
Example: maui_add IX
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: Write 6, D; D1[1]; maui_add r3, r1, r2*]
21
Example: maui_add X
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: D2[1]; Write 6, D; maui_add r3, r1, r2*]
22
Example: maui_add XI
[Diagram: BIU, Memory Controller, Transaction Queue, Memory. Shown: Next Instruction; W,10, D1[1]+D2[1]]
23
Advantages & Disadvantages
Advantages:
Better performance for DRAM-latency-bound computations
Lower latency to DRAM compared to the CPU
Reduced traffic on the front-side bus
Concurrent execution
Disadvantages:
MAUI operates at a lower clock frequency
Increased compiler complexity
Increased fabrication costs (more logic = more $$)
Recently used data may not be cached
24
Alternative Implementation
MAUI occupies its own read & write bus
Good: Eliminates contention with the CPU for DRAM system resources.
Good: Creates circular data flow, resulting in increased performance.
Bad: Needs a specialized triple-ported DRAM system, leading to increased production costs.
[Diagram: CPU, memory controller, MAUI (containing the MAU), and DRAM system, with a separate MAUI read & write bus.]
25
Test Setup
Simulated on SimpleScalar version 4.0.
One set of test benches ran dual-array operations on both the MAUI and the CPU with four different array sizes.
This trial was repeated for both shared and independent memory access buses.
Found up to a 43% speedup.
26
Results
[Chart: Total CPU Cycles]
27
Future Enhancements I
Multi-tasking
Larger register file
More MAUs for parallelism
Small cache
[Diagram: timeline with rows CPU, MEMORY CTRL/MAUI, and MEMORY showing Task 1, Task 2, and Task 3; block diagram of MAUI memory controller with multiple MAUs and DRAM system.]
28
Future Enhancements II
Better pipelining
Larger register file to hold intermediate results
[Diagram: MAU_ADD timeline of CPU, MC/MAUI, and DRAM activity with DRAM sequence RRWRRRRRRWW; block diagram of MAUI memory controller (containing the MAU) and DRAM system.]