Memory Arithmetic Unit Interface Jason M. Meier Justin S. Teller Tom J. Keeley.

Current Paradigm
[Figure: block diagram of the CPU, Memory Controller, and DRAM System, with a timeline of Task 1 and Task 2 across the CPU, MEMORY CTRL, and MEMORY rows.]

Active Pages Implementation
Used configurable DRAM (RADRAM).
Reconfigurable logic implements various memory functions.
An “Active Page” consists of a page of data and a set of associated functions.
Works on individual DRAM chips.
Processor-centric and memory-centric partitioning.
* Active Pages - Oskin, Chong, Sherwood – ISCA ‘98

MAUI Implementation
[Figure: block diagram of the CPU, Memory Controller with MAUI/MAU, and DRAM System, with a timeline of Task 1 and Task 2 across the CPU, MEMORY CTRL/MAUI, and MEMORY rows.]

MAUI Instruction Set: LOAD
MAUI_LD,offset( )
1) CPU sends an MAU_LOAD register command to the MC (along with the reg # and address to read) across the front-side bus.
2) MC interprets the command and places a Read command in the transaction queue.
3) DRAM performs the read.
4) Result is stored in the appropriate register in the MAUI register file.
[Figure: timeline with CPU, MC/MAUI, and DRAM rows showing one DRAM read (R); block diagram of the DRAM System, Memory Controller, and MAUI/MAU.]

MAUI Instruction Set II: LOADI
MAUI_LDI,
1) CPU sends an MAU_LOADI register command to the MC (along with the reg # and integer to save) across the front-side bus.
2) MC interprets the command and places the integer in the appropriate register in the MAUI register file.
[Figure: timeline with CPU, MC/MAUI, and DRAM rows; block diagram of the DRAM System, Memory Controller, and MAUI/MAU.]
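The LOAD and LOADI steps above can be sketched as a toy Python model. This is only an illustration under stated assumptions: the class name `MauiRegisterFile`, the register count, and the list standing in for DRAM are hypothetical and do not come from the slides.

```python
class MauiRegisterFile:
    """Toy model of the MAUI register file inside the memory controller (MC)."""

    def __init__(self, dram, num_regs=8):
        self.dram = dram                # plain list standing in for the DRAM system
        self.regs = [0] * num_regs      # register count is an assumption
        self.transaction_queue = []     # pending DRAM commands

    def load(self, reg, address):
        # LOAD steps 1-2: the MC receives the command and queues a Read.
        self.transaction_queue.append(("R", address))
        # LOAD step 3: DRAM performs the read.
        _, addr = self.transaction_queue.pop(0)
        value = self.dram[addr]
        # LOAD step 4: the result lands in the MAUI register file.
        self.regs[reg] = value

    def load_immediate(self, reg, integer):
        # LOADI: the integer goes straight into the register; DRAM is not touched.
        self.regs[reg] = integer


dram = [7, 8, 9]
rf = MauiRegisterFile(dram)
rf.load(1, 2)              # register 1 receives the value stored at address 2
rf.load_immediate(2, 42)   # register 2 receives the constant 42
```

The main difference the sketch makes visible is that LOAD costs a DRAM transaction while LOADI only crosses the front-side bus.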

MAUI Instruction Set III: MAU_ADD
MAUI_ADD,,,
1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.
2) CPU sends an MAUI_ADD command to the MC (along with the reg #s) across the front-side bus.
3) MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.
[Figure: timeline with CPU, MC/MAUI, and DRAM rows showing the DRAM command pattern R, R, W; block diagram of the CPU, DRAM System, Memory Controller, and MAUI/MAU.]
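The read-read-write loop in steps 3 and 4 can be sketched in Python as follows. This is a simplified model, not the authors' implementation: the function name `maui_add`, the flat list used as DRAM, and the assumption that each array advances by one address per element are all illustrative. The cache invalidation and write-back of step 1 are not modeled.

```python
from collections import deque


def maui_add(dram, dst, src1, src2, length):
    """Compute dram[dst+i] = dram[src1+i] + dram[src2+i]; return the DRAM command log."""
    queue = deque([("R", src1), ("R", src2)])   # the first two Reads
    log = []
    for i in range(length):
        # Service the two queued Reads and collect the operands.
        operands = []
        for _ in range(2):
            cmd = queue.popleft()
            log.append(cmd)
            operands.append(dram[cmd[1]])
        # The MAU adds the operands; the MC queues the Write and the next two Reads.
        queue.append(("W", dst + i, operands[0] + operands[1]))
        if i + 1 < length:
            queue.append(("R", src1 + i + 1))
            queue.append(("R", src2 + i + 1))
        # Service the Write.
        op, addr, value = queue.popleft()
        log.append((op, addr))
        dram[addr] = value
    return log


# Two source arrays at addresses 0 and 5, destination at 10, length 2.
dram = [1, 2, 0, 0, 0, 10, 20, 0, 0, 0, 0, 0]
log = maui_add(dram, dst=10, src1=0, src2=5, length=2)
```

With these inputs the log follows the R, R, W pattern once per element, and none of the operand data has to cross the front-side bus.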

Issues: Read & Write Locks

Issues: Address Mapping
[Figure: TLB mapping Virtual Space to Physical Space.]
Memory that is contiguous in virtual space may not be contiguous in physical space.
MAUI assumes consecutive addressing (size register).
MAUI operations that cross page boundaries must be split into separate operations for each page.
The programmer will not know the mapping scheme.
Result: all MAUI operations will need to be privileged instructions, accessed by programs through a system call.
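A minimal Python sketch of the per-page splitting described above, as it might be done inside the system call. The 4096-byte page size, the function names `split_by_page` and `to_physical`, and the dictionary used as a page table are assumptions for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split_by_page(start, length, page_size=PAGE_SIZE):
    """Split [start, start + length) into (address, length) chunks that never cross a page boundary."""
    chunks = []
    addr = start
    remaining = length
    while remaining > 0:
        room = page_size - (addr % page_size)   # bytes left in the current page
        n = min(room, remaining)
        chunks.append((addr, n))
        addr += n
        remaining -= n
    return chunks


def to_physical(chunks, page_table, page_size=PAGE_SIZE):
    """Translate each chunk's virtual start address using a {virtual page: physical frame} map."""
    return [(page_table[v // page_size] * page_size + v % page_size, n)
            for v, n in chunks]


# A 200-byte operation starting 96 bytes before a page boundary becomes two operations.
virtual_chunks = split_by_page(4000, 200)
physical_chunks = to_physical(virtual_chunks, {0: 7, 1: 3})
```

Each resulting chunk is contiguous in physical memory, so the MAUI's consecutive-addressing assumption holds within it even when the two pages map to unrelated frames.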

Issues: Compiler Issues
The compiler will be responsible for deciding when MAUI instructions should be used. This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn’t implemented in the MAUI.
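One way such a compiler decision could look is sketched below in Python. The slides name the three inputs but give no thresholds, so the function name `should_use_maui` and the rule comparing the array size against the cache size are hypothetical.

```python
def should_use_maui(array_bytes, cache_bytes, likely_cached, used_by_unsupported_op):
    """Return True when an array operation looks like a good candidate for the MAUI."""
    if likely_cached:
        # Data already in the cache is cheaper to process on the CPU.
        return False
    if used_by_unsupported_op:
        # The CPU would have to pull the data in anyway for the unsupported instruction.
        return False
    # Assumed rule: offload only arrays too large to fit in the cache.
    return array_bytes > cache_bytes


big_cold_array = should_use_maui(8 * 1024 * 1024, 512 * 1024, False, False)
small_hot_array = should_use_maui(4 * 1024, 512 * 1024, True, False)
```

A real compiler would need profile data or static analysis to estimate the two boolean inputs; here they are simply passed in.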

Issues: Task Interrupts
[Figure: block diagram of the CPU, Memory Controller with MAUI/MAU, and DRAM System, with a timeline of Task 1 and Task 2 across the CPU, MEMORY CTRL/MAUI, and MEMORY rows.]

Example: maui_add I
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_ld r1, 0]

Example: maui_add II
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_ld r2, 5]

Example: maui_add III
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_ld r3, 10]

Example: maui_add IV
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_ld r4, 2]

Example: maui_add V
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2; R, 0; R, 5]

Example: maui_add VI
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2*; Read 10; D1[0]]

Example: maui_add VII
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2*; Read 10; D2[0]]

Example: maui_add VIII
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2*; Read 10; R, 1; R, 6; W,10, D1[0]+D2[0]]

Example: maui_add IX
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2*; Write 6, D; D1[1]]

Example: maui_add X
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: maui_add r3, r1, r2*; Write 6, D; D2[1]]

Example: maui_add XI
[Figure: Memory Controller (BIU, Transaction Queue) and Memory. Shown: Next Instruction; W,10, D1[1]+D2[1]]

Advantages & Disadvantages
Advantages:
Better performance for DRAM-latency-bound computations
Lower latency to DRAM compared to the CPU
Reduced traffic on the front-side bus
Concurrent execution
Disadvantages:
MAUI operates at a lower clock frequency
Increased compiler complexity
Increased fabrication costs (More Logic = More $$)
Recently used data may not be cached

Alternative Implementation
MAUI occupies its own read & write bus.
Good: eliminates contention with the CPU for DRAM system resources.
Good: creates a circular data flow, resulting in increased performance.
Bad: needs a specialized triple-ported DRAM system, leading to increased production costs.
[Figure: block diagram of the CPU, Memory Controller, MAUI/MAU, and DRAM System, with a separate MAUI Read & Write Bus.]

Test Setup
Simulated on SimpleScalar version 4.0.
One set of test benches with dual-array operations, run on both the MAUI and the CPU with four different array sizes.
This trial was repeated for both shared and independent memory access buses.
Found up to a 43% speedup!

Results
[Figure: Total CPU Cycles.]

Future Enhancements I
Multi-tasking
Larger register file
More MAUs for parallelism
Small cache
[Figure: block diagram of the DRAM System and Memory Controller with MAUI and multiple MAUs; timeline of Task 1, Task 2, and Task 3 across the CPU, MEMORY CTRL/MAUI, and MEMORY rows.]

Future Enhancements II
Better pipelining
Larger register file to hold intermediate results
[Figure: MAU_ADD timeline with CPU, MC/MAUI, and DRAM rows showing the DRAM command sequence RRWRRRRRRWW; block diagram of the DRAM System, Memory Controller, and MAUI/MAU.]