By Edward A. Lee, Jan Reineke, Isaac Liu, Hiren D. Patel, and Sungjun Kim


PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation. By Edward A. Lee, Jan Reineke, Isaac Liu, Hiren D. Patel, and Sungjun Kim. Presented by: Michael Guo

Outline: DRAM background information; design requirements; DRAM controller architecture (frontend, backend); design evaluation and experiments; further improvements.

DRAM Memory Architecture

DRAM Memory Architecture. DRAM cell: consists of a capacitor and a transistor; the charge on the capacitor determines the value of the bit, and it leaks away over time, so cells must be refreshed. DRAM array: access proceeds in two phases. Phase 1: a row access moves one row of the array into the row buffer. Phase 2: a column access reads or writes data within the buffered row.
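The row/column decomposition above can be sketched by showing how a flat address splits into DRAM coordinates. This is an illustrative sketch, not the paper's controller; the field widths (10 column bits, 3 bank bits, 13 row bits, 8-byte bursts) are assumptions for a typical module.

```python
def decode_address(addr: int) -> dict:
    """Split a byte address into DRAM coordinates (assumed field widths)."""
    col = (addr >> 3) & 0x3FF      # 10 column bits; low 3 bits select the byte in a burst
    bank = (addr >> 13) & 0x7      # 3 bank bits
    row = (addr >> 16) & 0x1FFF    # 13 row bits
    return {"row": row, "bank": bank, "col": col}
```

Two accesses that decode to the same row and bank hit the open row buffer and skip the row-access phase, which is why address-to-bank mapping matters for latency.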

DRAM Memory Architecture. DRAM device and DRAM module: the banks of a device share I/O gating and control logic; the devices of a module share the data, address, and command buses.

Challenges for embedded hard real-time systems. The platform should be timing-predictable, so that verification of timing constraints is possible; with DRAM, the cells have to be refreshed periodically, which disturbs predictability. Different functions integrated on the platform should be temporally isolated; with DRAM, the latency of one task depends heavily on the previous accesses to the memory.

PTARM PRET architecture. The controller is implemented on the PTARM PRET architecture proposed in [4]. Figure 4: integration of the PTARM core with DMA units, the PRET memory controller, and a dual-ranked DIMM.

PTARM PRET architecture. Thread-interleaved implementation of the ARM instruction set: four pipeline stages and four hardware threads. Each thread has access to a shared instruction scratchpad and data scratchpad. Threads can interact with DRAM directly or transfer data to and from the scratchpads. Summary: there are no dependencies between pipeline stages, and the execution time of each thread is independent of the others.

DRAM Module. General structure: the whole module is treated as one shared resource. Proposed structure: the module is partitioned into four independent resources (private groups of banks) that can be accessed without interfering with one another.

DRAM Controller Backend: issues the commands that drive the DRAM module.

Access Scheme
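The access scheme behind the four private resources can be sketched as a fixed periodic schedule: the backend cycles through the resources in fixed slots, so a request to one resource can never delay a request to another. The slot assignment and FIFO queueing below are assumptions for illustration, not the paper's exact command timing.

```python
RESOURCES = 4  # assumed: one private resource per hardware thread

def schedule(requests: dict, slots: int):
    """requests: resource id -> FIFO list of pending request ids.
    Returns (slot, resource, request) tuples in issue order."""
    issued = []
    for slot in range(slots):
        res = slot % RESOURCES          # fixed slot-to-resource mapping
        if requests.get(res):           # a resource only uses its own slot
            issued.append((slot, res, requests[res].pop(0)))
    return issued
```

Because each resource only ever issues in its own slots, the worst-case wait for a request is bounded by the schedule period, independent of what the other threads do.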

DRAM Controller Frontend. Connects to the processor. Purpose: route requests to the right request buffer in the backend and insert a sufficient number of refresh commands.

Scheduling of Refreshes. The DRAM has a 64 ms timing constraint within which all cells must be refreshed to retain their values. Option 1: issue the hardware refresh command, which refreshes several rows of a device at once. Issue: increased refresh latency. Option 2: individual row accesses, each refreshing one row. Advantage: a single row access takes less time than executing a hardware refresh command. Disadvantage: more row accesses have to be performed, causing a small loss of bandwidth.
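The refresh budget implied by the 64 ms window is simple arithmetic. The row count below (8192 rows per bank) is an assumption typical of the DDR2 devices of that era, not a figure from the slide.

```python
REFRESH_WINDOW_MS = 64.0   # from the slide: all cells refreshed within 64 ms
ROWS_PER_BANK = 8192       # assumed device geometry (typical DDR2)

# Average interval between per-row refreshes to meet the window.
per_row_refresh_us = REFRESH_WINDOW_MS * 1000.0 / ROWS_PER_BANK
# ~7.8 us: one single-row refresh must be inserted roughly this often,
# which is the bandwidth cost the slide trades for shorter refresh latency.
```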

Store Buffer. The pipeline does not have to wait until a store has been performed in memory, which hides the store latency from the pipeline. A store takes a single thread cycle if it is not preceded by another outstanding store. A bigger store buffer could hide the latencies of successive stores, at the expense of increased complexity in timing analysis.

Latency of a Load Instruction. Backend latency (BEL): the time a command spends in the command buffer before the RAS command is sent out. DRAM read latency (DRL): after the RAS is sent to the DRAM module, the time until the requested data is on the data bus; constant at 10 + BL/2 cycles (to isolate the load from other effects). Thread alignment latency (TAL): the time the data is buffered in the memory controller, since the requesting thread can only receive the data in its write-back stage. Read latency: RL = BEL + DRL + TAL.
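The latency decomposition above can be written directly as code. The default burst length of 8 is an assumption for illustration; BEL and TAL vary per request, while DRL is the constant term the slide gives.

```python
def read_latency(bel: int, tal: int, burst_length: int = 8) -> int:
    """RL = BEL + DRL + TAL, with DRL fixed at 10 + BL/2 cycles (from the slide)."""
    drl = 10 + burst_length // 2   # constant DRAM read latency
    return bel + drl + tal
```

With BL = 8 the constant DRL term is 14 cycles, so only BEL and TAL need to be bounded to bound the worst-case read latency.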

Latency of a Load Instruction. By delaying refreshes until after a load has been performed, refreshes have no impact on the load latency.

Comparison with Predator. Predator treats the DRAM as one resource, uses the standard refresh mechanism, and issues pre-computed sequences of SDRAM commands.

Latency of DMA Transfer (Small size)

Latency of DMA Transfer (Large size)

Experimental Evaluation. A PTARM simulator interfaces with both controllers to run synthetic benchmarks and compare memory access latencies. The experiments show the effects of interference on memory access latency when a combination of two programs runs on the other hardware threads.

The results demonstrate the temporal isolation achieved by the PRET DRAM controller.

Full load

Conclusion and Limitations. The controller reduces the worst-case latency at a slight loss of bandwidth. The proposed solution has the same number of hardware threads as pipeline stages; if there are many threads, there have to be many memory partitions. The bandwidth consumption is not thoroughly analyzed.

Integration with other multiprocessors. Challenge: most multi-core processors use DRAM to share data, while local scratchpads are private. Approach: treat the four separate resources as one; an access to the single combined resource results in four smaller accesses, one to each resource.
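The proposed "four resources as one" approach can be sketched as splitting each access into equal chunks, one per resource. The even four-way split is an assumption for illustration; the slide does not specify how the data is striped.

```python
def split_access(addr: int, size: int, resources: int = 4):
    """Split one shared-resource access into one smaller access per
    private resource (assumed even striping). Returns
    (resource, address, size) tuples; size must divide evenly."""
    chunk = size // resources
    return [(r, addr + r * chunk, chunk) for r in range(resources)]
```

Since the four sub-accesses go to independent resources, they can proceed in consecutive schedule slots, so the combined access keeps a bounded latency at the cost of occupying all four partitions.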

Questions?