Today, DRAM is just a storage device

Presentation transcript:

Today, DRAM is just a storage device

Today, DRAM is just a storage device.
RowClone: Bulk Data Copy and Initialization Using DRAM (MICRO 2013)
Gather-Scatter DRAM: Accelerating Strided Accesses Using DRAM (MICRO 2015)
Ambit: Accelerating Bulk Bitwise Operations Using DRAM (MICRO 2017)

Bulk bitwise operations appear in many workloads: database indices, search filters, database scans, cryptography, genome analysis, ...
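As an illustration of one such use case (this example is mine, not from the slides), a bitmap-index scan answers a multi-predicate query with bulk bitwise ANDs over per-predicate bitvectors; the predicate names and data below are hypothetical:

```python
def bitmap_and(bitmaps):
    """Bulk bitwise AND over equal-length bitvectors (lists of word-sized ints)."""
    result = bitmaps[0][:]
    for bm in bitmaps[1:]:
        result = [a & b for a, b in zip(result, bm)]
    return result

# One bit per table row; a single 8-bit word here for brevity.
age_over_30 = [0b10110100]  # rows where age > 30
country_us  = [0b11010110]  # rows where country == "US"

# Rows satisfying BOTH predicates = bitwise AND of the two bitvectors.
matches = bitmap_and([age_over_30, country_us])
print(bin(matches[0]))  # → '0b10010100'
```

The scan's work is dominated by exactly these wide AND operations, which is why their throughput matters.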

Throughput of bulk bitwise operations is limited by the available memory bandwidth of the channel between the processor and memory.

Ambit: perform bitwise operations completely inside DRAM chips.
Bitwise AND/OR: simultaneous activation of three rows.
Bitwise NOT: inverters already present in the sense amplifiers.
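The triple-row activation mechanism can be sketched as follows (a functional model of the bitline behavior, not DRAM code): activating three rows at once drives each bitline to the bitwise majority of the three cells, MAJ(A, B, C) = AB + BC + CA. Presetting the third (control) row to all 0s or all 1s then selects AND or OR:

```python
def triple_row_activate(a, b, c):
    """Bitwise majority of three equal-width integers: MAJ = AB + BC + CA."""
    return (a & b) | (b & c) | (c & a)

A, B = 0b1100, 0b1010

# Control row preset to all-0s: MAJ(A, B, 0) = A AND B
assert triple_row_activate(A, B, 0b0000) == (A & B)

# Control row preset to all-1s: MAJ(A, B, 1) = A OR B
assert triple_row_activate(A, B, 0b1111) == (A | B)

# NOT comes from the inverted output already present in the sense amplifier.
mask = 0b1111
assert (~A & mask) == 0b0011
```

With AND, OR, and NOT available, any bulk bitwise function can be composed.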

Ambit results:
1% area cost over existing DRAM chips
32X improvement in raw throughput
35X reduction in energy consumed
3X-7X performance improvement in real-world applications

Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology. Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry. Session 3A, Tuesday, 11 AM.