Scalable High Performance Main Memory System Using PCM Technology

Presentation transcript:

Scalable High Performance Main Memory System Using PCM Technology
Moinuddin K. Qureshi, Viji Srinivasan, and Jude Rivers
IBM T. J. Watson Research Center, Yorktown Heights, NY
International Symposium on Computer Architecture (ISCA-2009)

Main Memory Capacity Wall
More cores in the system → more concurrency → larger working sets. Demand for main memory capacity continues to increase, and DRAM-based main memory systems are hitting:
1. Cost wall: main memory is a major fraction of the cost of large servers
2. Scaling wall: scaling DRAM to smaller technology nodes is a challenge
3. Power wall (IBM p670 server; source: Lefurgy et al., IEEE Computer 2003):

Configuration            Processor    Memory
Small (4 proc, 16GB)     384 Watts    314 Watts
Large (16 proc, 128GB)   840 Watts    1223 Watts

Need a practical solution to increase main-memory capacity.

The Technology Hierarchy
More capacity comes from cheaper, denser, (slower) technology. A high-performance memory system spans L1 (SRAM), eDRAM, DRAM, PCM, Flash, and HDD, with typical access latencies ranging from roughly 2^1 to 2^23 processor cycles (@ 4 GHz). Phase Change Memory (PCM) is a promising candidate for large-capacity main memory.

Outline: Introduction, What is PCM?, Hybrid Memory System, Evaluation, Lifetime Analysis, Summary

What is Phase Change Memory?
Phase change material (chalcogenide glass) exists in two states:
1. Amorphous: high resistivity
2. Crystalline: low resistivity
The material can be switched between states reliably, quickly, and a large number of times. PCM stores data as resistance: low resistance (SET state) = 1; high resistance (RESET state) = 0.
[Cell diagram: memory element accessed through a bit line, word line, and access transistor.]

How does PCM work?
Switching is done by heating the cell with electrical pulses:
- SET: a sustained current heats the cell above Tcryst → crystalline, low resistance (10^3-10^4 Ω)
- RESET: a large current pulse heats the cell above Tmelt, then quenches it → amorphous, high resistance (10^6-10^7 Ω)
[Figure: cell temperature vs. time [ns] for the SET and RESET pulses, marking Tcryst and Tmelt; photo of the access device and memory element. Photo courtesy: Bipin Rajendran, IBM.]

Key Characteristics of PCM
+ Scales better than DRAM, with a small cell size: prototypes as small as 3nm x 20nm have been fabricated and tested [Raoux+ IBMJRD'08]
+ Can store multiple bits/cell → more density in the same area: prototypes with 2 bits/cell shown at ISSCC'08; >2 bits/cell expected soon
+ Non-volatile memory technology: data retention of 10 years, with power and system implications
Challenges:
- Higher latency than DRAM
- Limited endurance (~10 million writes per cell)
- Constrained write bandwidth, so it is better to write less often

Outline: Introduction, What is PCM?, Hybrid Memory System, Evaluation, Lifetime Analysis, Summary

Hybrid Memory System
[Diagram: Processor → DRAM buffer (page-sized lines, with a tag store T made of SRAM) → PCM main memory (with a write queue) → Flash or HDD.]
Hybrid memory system: DRAM acts as a cache to tolerate PCM read/write latency and write bandwidth; PCM serves as main memory to provide large capacity at good cost/power.
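A minimal sketch of the read path this organization implies, assuming a simple page-granularity buffer; the class name, data structures, and install policy here are illustrative, not the paper's implementation:

```python
# Illustrative sketch: an SRAM tag store tracks which PCM pages currently
# reside in the page-granularity DRAM buffer. Structures are placeholders.

PAGE_SIZE = 4096  # bytes; the DRAM buffer caches whole pages

class HybridMemory:
    def __init__(self, pcm_pages):
        self.dram = {}          # page number -> cached page (tags in SRAM)
        self.pcm = pcm_pages    # page number -> page data (large, slow)

    def read(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if page in self.dram:                 # tag-store hit: fast DRAM access
            return self.dram[page][offset]
        data = self.pcm[page]                 # miss: slow PCM read
        self.dram[page] = bytearray(data)     # install the page in the buffer
        return data[offset]

mem = HybridMemory({0: bytes(PAGE_SIZE)})
print(mem.read(10))   # first read misses to PCM; a second read would hit DRAM
```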

Lazy Write Architecture
[Diagram: Processor → DRAM buffer → write queue (WRQ) → PCM; pages arrive from Flash/Disk.]
Problem: the baseline doubles PCM writes for pages dirtied after install (one write on install from disk, another on writeback).
Solution: install pages fetched from Flash/disk only into the DRAM buffer; PCM is written once, through the WRQ, when the page is evicted.
Example (DAXPY kernel: Y[i] = Y[i] + X[i]):
- Baseline: 2 PCM writes for Y[i]'s page and 1 for X[i]'s page
- Lazy write: 1 PCM write each for Y[i]'s and X[i]'s pages
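A hedged sketch of lazy-write install under the assumptions above (page-granularity DRAM buffer, oldest-first eviction into a write queue); all names are hypothetical:

```python
from collections import OrderedDict

DRAM_PAGES = 4          # tiny buffer so the example is easy to trace
dram = OrderedDict()    # page number -> page data (insertion order ~ age)
wrq = []                # write queue of (page, data) destined for PCM
pcm_writes = 0

def install_page(page, data_from_disk):
    """Lazy write: a page from Flash/HDD goes only into DRAM; the single
    PCM write happens later, when the page is evicted through the WRQ."""
    global pcm_writes
    if len(dram) >= DRAM_PAGES:
        victim, vdata = dram.popitem(last=False)   # evict the oldest page
        wrq.append((victim, vdata))                # one PCM write per residency
        pcm_writes += 1
    dram[page] = data_from_disk
    # The baseline would also write PCM here, so a page dirtied while in
    # DRAM would cost two PCM writes; lazy write costs exactly one.
```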

Line Level Write Back
Problem: not all lines in a dirty page are dirty.
Solution: keep dirty bits per line in the DRAM buffer and write back only the dirty lines from DRAM to PCM (a sketch follows below).
[Chart: number of writes to each line (millions) vs. line ID (0-15) for db1, db2, and the average.]
Problem: even with LLWB, the lines within dirty pages are not written uniformly.
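A minimal sketch of the line-level write-back mechanism just described, with per-line dirty bits kept alongside each buffered page (names illustrative):

```python
LINES_PER_PAGE = 16   # per-line dirty bits, as in the charts above

class BufferedPage:
    def __init__(self, lines):
        self.lines = list(lines)
        self.dirty = [False] * LINES_PER_PAGE   # one dirty bit per line

    def write_line(self, i, value):
        self.lines[i] = value
        self.dirty[i] = True                    # mark only the touched line

def write_back(page, pcm_page):
    """LLWB: copy only the dirty lines to PCM instead of the whole page."""
    for i, is_dirty in enumerate(page.dirty):
        if is_dirty:
            pcm_page[i] = page.lines[i]
            page.dirty[i] = False
```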

Fine Grained Wear Leveling
Solution: Fine Grained Wear Leveling (FGWL):
- When a page is allocated, its lines are rotated by a random shift value
- The rotate value remains constant while the page remains in memory
- On replacement of a page, a new random shift value is assigned to the new page
- Over time, the write traffic per line becomes uniform
[Chart: number of writes to each line (millions) vs. line ID (0-15) for db1 and db2 with FGWL; the per-line averages are now flat.]
FGWL makes writes across lines in a dirty page uniform.
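A minimal sketch of the FGWL line remapping, assuming 16 lines per page as in the charts; the helper names are illustrative:

```python
import random

LINES_PER_PAGE = 16

def new_rotation():
    """Drawn once when a PCM page frame is (re)allocated, fixed thereafter."""
    return random.randrange(LINES_PER_PAGE)

def physical_line(logical_line, rotation):
    """Rotate line placement so a hot logical line (e.g. line 0) lands on a
    different physical line each time the frame is reallocated."""
    return (logical_line + rotation) % LINES_PER_PAGE
```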

Outline: Introduction, What is PCM?, Hybrid Memory System, Evaluation, Lifetime Analysis, Summary

Evaluation Framework
Trace-driven simulator: 16-core system (simple cores); 8GB DRAM main memory at 320 cycles; HDD (2 ms) with a Flash cache (32 µs) at a 99% Flash hit rate.
Workloads: database workloads and data-parallel kernels:
1. Database workloads: db1 and db2
2. Unix utilities: qsort and binary search
3. Data mining: K-means and Gauss-Seidel
4. Streaming: DAXPY and vector dot product
Assumption: PCM is 4X denser and 4X slower than DRAM → 32GB at a 1280-cycle read latency.

Reduction in Page Faults
[Chart: reduction in page faults per workload. Annotations: "Benefit from capacity", "Need >16GB", "Streaming".]

Impact on Execution Time
PCM with a DRAM buffer performs similarly to a DRAM system of equal capacity.

Impact of PCM Latency
[Chart comparing four configurations: DRAM-8GB, PCM-32GB, Hybrid (1+32)GB, DRAM-32GB.]
The hybrid memory system is relatively insensitive to PCM latency.

Power Evaluations
Significant power and energy savings with the PCM-based hybrid memory system.

Outline: Introduction, What is PCM?, Hybrid Memory System, Evaluation, Lifetime Analysis, Summary

Impact of Write Endurance
Let B = bytes/cycle written to PCM, S = PCM capacity in bytes, Wmax = maximum writes per PCM cell, F = system frequency (4 GHz), and Y = lifetime in years. Assuming uniform writes to PCM:

Endurance (in cycles) = (S/B) · Wmax

There are about 2^25 seconds in a year, so the number of cycles in Y years = Y · F · 2^25, giving:

Y = (S/B) · Wmax / (F · 2^25)

For a 4 GHz system with 32GB of PCM written at 1 byte per cycle, this reduces to Y = Wmax / 4 million. If Wmax = 10 million, PCM will last 2.5 years.
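The slide's arithmetic, spelled out as a small calculation (a year is approximated as 2^25 seconds, as above):

```python
def pcm_lifetime_years(capacity_bytes, bytes_per_cycle, wmax, freq_hz):
    """Lifetime under the slide's model: perfectly uniform writes,
    with a year approximated as 2**25 seconds."""
    endurance_cycles = (capacity_bytes / bytes_per_cycle) * wmax
    return endurance_cycles / (freq_hz * 2**25)

# 32GB PCM, 1 byte/cycle, Wmax = 10 million, 4 GHz -> about 2.5 years
print(pcm_lifetime_years(32 * 2**30, 1.0, 10e6, 4e9))   # 2.56
```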

Lifetime Results
The table shows the average bytes per cycle written to PCM and the average PCM lifetime, assuming Wmax = 10 million:

Configuration                Avg. Bytes/Cycle   Avg. Lifetime
1GB DRAM + 32GB PCM          0.807              3.0 yrs
+ Lazy Write                 0.725              3.4 yrs
+ Line Level Write Back      0.316              7.6 yrs
+ Bypass Streaming Apps      0.247              9.7 yrs

The proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years.

Outline: Introduction, What is PCM?, Hybrid Memory System, Evaluation, Lifetime Analysis, Summary

Summary
- Need more main memory capacity: DRAM is hitting power, cost, and scaling walls
- PCM is an emerging technology: 4X denser than DRAM, but with slower access time and limited write endurance
- We propose a hybrid memory system (DRAM + PCM) that provides significant power and performance benefits
- The proposed write-filtering techniques reduce writes by about 3X and increase PCM lifetime from 3 to 9.7 years
- Not touched in this talk but important: exploiting non-volatile memories for system enhancement, and related OS issues

Thanks!