RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization ProcessorMemoryChannel Limited bandwidthHigh energy Carnegie Mellon UniversityIntel.

Slides:



Advertisements
Similar presentations
Ahmad Lashgar, Amirali Baniasadi, Ahmad Khonsari ECE, University of Tehran, ECE, University of Victoria.
Advertisements

4/17/20151 Improving Memory Bank-Level Parallelism in the Presence of Prefetching Chang Joo Lee Veynu Narasiman Onur Mutlu* Yale N. Patt Electrical and.
Evaluating an Adaptive Framework For Energy Management in Processor- In-Memory Chips Michael Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas.
Understanding a Problem in Multicore and How to Solve It
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory HanBin Yoon*, Naveen Muralimanohar ǂ, Justin Meza*, Onur Mutlu*,
STT-RAM as a sub for SRAM and DRAM
Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories HanBin Yoon, Justin Meza, Naveen Muralimanohar*, Onur Mutlu,
Energy-efficient Cluster Computing with FAWN: Workloads and Implications Vijay Vasudevan, David Andersen, Michael Kaminsky*, Lawrence Tan, Jason Franklin,
The energy limited connectionless networking architecture.
Multilevel Memory Caches Prof. Sirer CS 316 Cornell University.
Page-based Commands for DRAM Systems Aamer Jaleel Brinda Ganesh Lei Zong.
For Fast Block Copy in Dram 1 Fast Block Copy in DRAM  Motivation Exploit the wide bandwidth within DRAM chip  Idea Use DRAM refresh period to.
The Dirty-Block Index Vivek Seshadri Abhishek Bhowmick ∙ Onur Mutlu Phillip B. Gibbons ∙ Michael A. Kozuch ∙ Todd C. Mowry.
Scalable Many-Core Memory Systems Lecture 2, Topic 1: DRAM Basics and DRAM Scaling Prof. Onur Mutlu HiPEAC.
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
Green Jobs. A “green job”  Has a direct, essential impact on a product, service or process that results in an environmental benefit.
Dong Hyuk Woo Nak Hee Seong Hsien-Hsin S. Lee
Embedded System Lab. 김해천 Linearly Compressed Pages: A Low- Complexity, Low-Latency Main Memory Compression Framework Gennady Pekhimenko†
Embedded System Lab. 최 길 모최 길 모 Kilmo Choi Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines.
Multilevel Memory Caches Prof. Sirer CS 316 Cornell University.
Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu.
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu.
Инвестиционный паспорт Муниципального образования «Целинский район»
(x – 8) (x + 8) = 0 x – 8 = 0 x + 8 = x = 8 x = (x + 5) (x + 2) = 0 x + 5 = 0 x + 2 = x = - 5 x = - 2.
Computer Architecture: Emerging Memory Technologies (Part I)
Sponsored by the U.S. Department of Defense © 2008 by Carnegie Mellon University page 1 Pittsburgh, PA The Implications of a Single Mobile Computing.
Alice An introduction to programming. History  First created as a project by a student group  Carnegie Mellon University Pittsburgh  Free to anyone.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
Gather-Scatter DRAM In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses Vivek Seshadri Thomas Mullins, Amirali Boroumand,
Carnegie Mellon Introduction to Computer Systems / Spring 2009 March 23, 2009 Virtual Memory.
Computer Architecture: Main Memory (Part II) Prof. Onur Mutlu Carnegie Mellon University.
pictures_slideshow/article.htm.
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
Boris Grot, Joel Hestness, Stephen W. Keckler 1 The University of Texas at Austin 1 NVIDIA Research Onur Mutlu Carnegie Mellon University.
DRAM Tutorial Lecture Vivek Seshadri. Vivek Seshadri – Thesis Proposal DRAM Module and Chip 2.
照片档案整理 一、照片档案的含义 二、照片档案的归档范围 三、 卷内照片的分类、组卷、排序与编号 四、填写照片档案说明 五、照片档案编目及封面、备考填写 六、数码照片整理方法 七、照片档案的保管与保护.
공무원연금관리공단 광주지부 공무원대부등 공적연금 연계제도 공무원연금관리공단 광주지부. 공적연금 연계제도 국민연금과 직역연금 ( 공무원 / 사학 / 군인 / 별정우체국 ) 간의 연계가 이루어지지 않고 있 어 공적연금의 사각지대가 발생해 노후생활안정 달성 미흡 연계제도 시행전.
Carnegie Mellon 1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Cache Memories CENG331 - Computer Organization Instructors:
Жюль Верн ( ). Я мальчиком мечтал, читая Жюля Верна, Что тени вымысла плоть обретут для нас; Что поплывет судно громадней «Грейт Истерна»; Что.
מאת: יעקב דדוש. פיסול –בין יחיד לרבים יחידה 1 לתלמיד המתבונן לפניך שתי יצירות פיסוליות. התבונן וכתוב (בשקופית הבאה) מהם ההבדלים בין הפסלים המוצגים לפניך?
Simple DRAM and Virtual Memory Abstractions for Highly Efficient Memory Systems Thesis Oral Committee: Todd Mowry (Co-chair) Onur Mutlu (Co-chair) Phillip.
Genome Read In-Memory (GRIM) Filter: Fast Location Filtering in DNA Read Mapping using Emerging Memory Technologies Jeremie Kim 1, Damla Senol 1, Hongyi.
18-742: Research in Parallel Computer Architecture Intro and Logistics Prof. Onur Mutlu Carnegie Mellon University Fall 2014 August 26, 2014.
Computer System Structures Storage
Memory Scaling: A Systems Architecture Perspective
OCR GCSE Computer Science Teaching and Learning Resources
Cache Memories CSE 238/2038/2138: Systems Programming
18-740: Computer Architecture Recitation 3:
Low-Cost Inter-Linked Subarrays (LISA) Enabling Fast Inter-Subarray Data Movement in DRAM Kevin Chang Prashant Nair, Donghyuk Lee, Saugata Ghose, Moinuddin.
Lecture 23: Cache, Memory, Security
Ambit In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology Vivek Seshadri Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali.
Computer Architecture 2
Today, DRAM is just a storage device
Ambit In-memory Accelerator for Bulk Bitwise Operations
Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance
מערכות הפעלה מרצה אורח: סיון טולדו סמסטר א' תשע"ב
Session 1A at am MEMCON Detecting and Mitigating
MEMCON: Detecting and Mitigating Data-Dependent Failures by Exploiting Current Memory Content Samira Khan, Chris Wilkerson, Zhe Wang, Alaa Alameldeen,
PEO Star with hydrophobic core
Accelerating Approximate Pattern Matching with Processing-In-Memory (PIM) and Single-Instruction Multiple-Data (SIMD) Programming Damla Senol Cali1, Zülal.
Energy-Efficient Storage Systems
Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 9/30/2011

Lecture 22: Cache Hierarchies, Memory
Mapping the FFT Algorithm to the IBM Cell Processor
RAIDR: Retention-Aware Intelligent DRAM Refresh
Prof. Onur Mutlu ETH Zürich Fall September 2018
Computer Architecture Lecture 30: In-memory Processing
Topic/Title – Introduction Slide
Presentation transcript:

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization ProcessorMemoryChannel Limited bandwidthHigh energy Carnegie Mellon UniversityIntel Pittsburgh

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization MemoryChannel Limited bandwidthHigh energy Bulk Data Copy Data Initialization Forking ZeroingCheckpointingVM Cloning Processor Carnegie Mellon UniversityIntel Pittsburgh

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization MemoryChannel Limited bandwidthHigh energy Unnecessary Data Movement Bulk Data Copy Data Initialization Forking ZeroingCheckpointingVM Cloning Processor Carnegie Mellon UniversityIntel Pittsburgh

RowClone: In-DRAM Bulk Copy & Initialization Source Row Row Buffer Destination Row

Source Row Row Buffer Destination Row 1 Copy from source row to row buffer RowClone: In-DRAM Bulk Copy & Initialization

Source Row Row Buffer Destination Row 1 2 Copy from source row to row buffer Copy from row buffer to destination row RowClone: In-DRAM Bulk Copy & Initialization

Source Row Row Buffer LatencyEnergy 11x74x Destination Row 1 2 Copy from source row to row buffer Copy from row buffer to destination row Very few changes to DRAM (0.01% increase in die area) RowClone: In-DRAM Bulk Copy & Initialization

End-to-end system design to exploit DRAM substrate Several applications that benefit from RowClone

Current Systems RowClone 27% 17% Performance DRAM Energy Efficiency 8-Core System RowClone: In-DRAM Bulk Copy & Initialization End-to-end system design to exploit DRAM substrate Several applications that benefit from RowClone