Presentation transcript:

Slide 1: On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories
Rajagopalan Desikan, Charles R. Lefurgy, Stephen W. Keckler, and Doug Burger
Computer Architecture and Technology Lab, University of Texas at Austin
02/21/2003

Slide 2: Motivation
Latency to off-chip memory is now hundreds of processor cycles
Off-chip memory bandwidth is becoming a performance-limiting factor
MRAM is an emerging memory technology with the potential to provide low latency and high bandwidth
Goal of our work: determine whether the performance advantage of MRAM in high-performance computing is worth further investment and research

Slide 3: Outline
MRAM Memory Description
MRAM Memory Hierarchy
Results
Conclusions

Slide 4: MRAM Cell
Magnetoresistive random access memory (MRAM) uses a magnetic tunnel junction (MTJ) to store information
An MRAM cell is composed of a diode and an MTJ stack
The MTJ stack consists of two ferromagnetic layers separated by a thin dielectric barrier
The polarization of one layer is fixed; the other layer is used for information storage
[Figure: MRAM cell cross-section, showing the word line, bit line, diode, and MTJ stack (Pt, Co/Fe, Ni/Fe, Al2O3, Co/Fe, Ni/Fe, Mn/Fe, Pt, W layers) with the read/write current path]

Slide 5: MRAM Bank Design
MRAM cells are located at the intersection of each word line and bit line
Read: current sources are connected to the bit lines and the selected word line is pulled low
Write: the polarity of the current in the bit lines determines the value stored
MRAM banks are accessed through vias
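As a mental model of slides 4-5, the following Python sketch treats a bank as a grid of MTJ resistances: a write sets the free layer parallel or antiparallel to the fixed layer, and a read thresholds the sensed resistance. The resistance values, threshold, and class interface are illustrative assumptions, not device parameters from the talk.

```python
# Minimal behavioral sketch (not the authors' model) of an MRAM bank.
# Illustrative resistance values; real MTJ parameters differ.
R_PARALLEL = 10_000      # ohms, low-resistance (parallel) state -> logical 0
R_ANTIPARALLEL = 14_000  # ohms, high-resistance (antiparallel) state -> logical 1
R_THRESHOLD = 12_000     # sense-amp decision point between the two states

class MRAMBank:
    def __init__(self, rows, cols):
        # One MTJ resistance per word-line/bit-line intersection.
        self.cells = [[R_PARALLEL] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        # Write: the polarity of the bit-line current sets the free
        # layer parallel (0) or antiparallel (1) to the fixed layer.
        self.cells[row][col] = R_ANTIPARALLEL if bit else R_PARALLEL

    def read(self, row, col):
        # Read: drive the bit line, pull the selected word line low,
        # and compare the sensed resistance against the threshold.
        return 1 if self.cells[row][col] > R_THRESHOLD else 0

bank = MRAMBank(rows=4, cols=8)
bank.write(2, 5, 1)
assert bank.read(2, 5) == 1 and bank.read(0, 0) == 0
```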

Slide 6: MRAM Bank Modeling
Modified CACTI-3.0 to develop an area and timing tool for modeling MRAM banks
Each independently accessible bank is composed of sub-banks
Important features modeled:
– Active area consumed
– Delay due to vertical wires
– MRAM capacity for a given die size and cell size (estimated as sketched below)
– Support for multiple layers, with sharing
Parameters taken from the SIA 2001 roadmap at 90 nm technology
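To make the capacity estimate concrete, here is a back-of-the-envelope sketch of the kind of calculation such a tool performs; the cell area, array efficiency, and layer count below are illustrative assumptions, not the calibrated values from the modified CACTI-3.0 model.

```python
# Rough MRAM capacity estimate for a die, under assumed parameters.

def mram_capacity_bits(die_area_mm2, cell_area_um2, layers,
                       array_efficiency=0.6):
    """Bits that fit on a die, given cell size and stacked MRAM layers.

    array_efficiency discounts area lost to decoders, sense amps,
    and the vias connecting stacked MRAM layers to the logic below.
    """
    usable_um2 = die_area_mm2 * 1e6 * array_efficiency
    bits_per_layer = usable_um2 / cell_area_um2
    return int(bits_per_layer * layers)

# E.g. a 400 mm^2 die, a 0.2 um^2 cell, four stacked layers:
bits = mram_capacity_bits(die_area_mm2=400, cell_area_um2=0.2, layers=4)
print(f"~{bits / 8 / 2**30:.2f} GiB")  # ~0.56 GiB under these assumptions
```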

Slide 7: Chip-Level Architecture

Slide 8: MRAM Design Issues
Number of banks
– More banks: lower latency and higher concurrency, but higher network traversal time and higher miss rates
Cache line size
– Larger lines: more spatial locality, but higher latency
Page placement policy (sketched below)
– Random
– Round-robin
– Least loaded
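For illustration, a minimal sketch of the two simple placement policies follows; the bank-selection interface is an assumption. The least-loaded policy additionally needs the per-bank cost estimate given on slide 10.

```python
# Sketch of the two simple page placement policies (assumed interface).
import random

class RandomPlacement:
    def __init__(self, num_banks):
        self.num_banks = num_banks

    def place(self, page):
        # Assign each new page to a uniformly random MRAM bank.
        return random.randrange(self.num_banks)

class RoundRobinPlacement:
    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.next_bank = 0

    def place(self, page):
        # Cycle through the banks so pages spread evenly.
        bank = self.next_bank
        self.next_bank = (self.next_bank + 1) % self.num_banks
        return bank
```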

Slide 9: Methodology
Simulated processor
– Alpha pipeline modified for 8-wide issue
– 3.8 GHz (10 FO4 inverters per stage)
Base SDRAM system
– Distributed L2 cache
Base MRAM system
– Distributed MRAM banks and a reduced-capacity distributed L2 cache
Benchmarks
– Memory-intensive SPEC CPU2000, scientific, and speech workloads

Slide 10: Page Placement Policy
[Figure: IPC for 100 banks under the different page placement policies]
The least-loaded policy places a page in the bank with the lowest estimated access cost:

Cost(least loaded) = (L2 hit rate × L2 hit latency) + (L2 miss rate × MRAM bank latency) + current network latency to the bank
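A direct transcription of that cost into Python might look like the sketch below; the per-bank hit-rate and latency inputs are assumed to be tracked elsewhere, and the tuple-based interface is for illustration only.

```python
# Least-loaded placement: pick the bank minimizing the slide's cost.

def least_loaded_cost(l2_hit_rate, l2_hit_latency,
                      mram_bank_latency, network_latency):
    # Cost = (L2 hit rate * L2 hit latency)
    #      + (L2 miss rate * MRAM bank latency)
    #      + current network latency to the bank
    l2_miss_rate = 1.0 - l2_hit_rate
    return (l2_hit_rate * l2_hit_latency
            + l2_miss_rate * mram_bank_latency
            + network_latency)

def place_least_loaded(banks):
    # banks: list of (l2_hit_rate, l2_hit_latency, bank_latency, net_latency)
    costs = [least_loaded_cost(*b) for b in banks]
    return min(range(len(banks)), key=costs.__getitem__)
```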

Slide 11: MRAM Sensitivity
[Figure: IPC sensitivity to MRAM latency; baseline SDRAM latency is 30 ns]

Slide 12: Conclusions
Developed an architectural model for exploiting MRAM, an emerging memory technology
Analyzed the contribution to performance of the different components of our MRAM system
The MRAM system performs 15% better than a conventional SDRAM system