
Memory Scaling is Dead, Long Live Memory Scaling ("Le Memoire Scaling est mort, vive le Memoire Scaling!")
Moinuddin K. Qureshi, ECE, Georgia Tech
At Yale's "Mid Career" Celebration, University of Texas at Austin, Sept 19, 2014

The Gap in Memory Hierarchy
[Figure: memory hierarchy levels (L1 SRAM, EDRAM, DRAM, Flash, HDD, ?????) plotted against typical access latency in processor cycles (@ 4 GHz), on a log scale from roughly 2^1 to 2^23]
Misses in main memory (page faults) degrade performance severely
The main memory system must scale to maintain performance growth

The Memory Capacity Gap
Trends: core count doubling every 2 years; DRAM DIMM capacity doubling every 3 years [Lim+ ISCA'09]
Memory capacity per core is expected to drop by 30% every two years

Challenges for DRAM: The Scaling Wall
DRAM does not scale well to small feature sizes (sub-1x nm)
Increasing error rates can render DRAM scaling infeasible

Two Roads Diverged …
Facing DRAM's challenges, there are two approaches:
- Architectural support for DRAM scaling and for reducing refresh overheads
- Finding an alternative technology that avoids the problems of DRAM
It is important to investigate both approaches

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Reasons for DRAM Faults
Unreliability of ultra-thin dielectric material
In addition, DRAM cell failures also arise from:
- Permanently leaky cells
- Mechanically unstable cells
- Broken links in the DRAM array
[Figure: DRAM cell array showing a permanently leaky cell (capacitor charge leaks), a mechanically unstable cell (capacitor tilting towards ground), and broken links]
Permanent faults in future DRAMs are expected to be much higher

Row and Column Sparing
DRAM chips (organized into rows and columns) have spare rows and columns
Laser fuses enable the spares to replace faulty rows/columns
An entire row/column must be sacrificed for a few faulty cells
[Figure: DRAM chip before sparing (faults, spare rows, spare columns) and after sparing (replaced rows/columns, deactivated rows and columns)]
Row and column sparing incurs large area overheads

Commodity ECC-DIMM
Commodity ECC DIMM with SECDED at 8 bytes (72,64)
Mainly used for soft-error protection
For hard errors, there is a high chance of two errors in the same word (birthday paradox)
For an 8GB DIMM (about 1 billion words), the expected number of errors until the first double-error word is 1.25*sqrt(N), roughly 40K errors, i.e. about 0.5 ppm
SECDED is not enough at high error rates (and what about soft errors?)
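A back-of-the-envelope check of the birthday-paradox figure above, assuming the DIMM is treated as roughly a billion independent 8-byte words with uniformly random fault locations (a simplifying assumption, not a statement about the real fault distribution); it lands in the neighborhood of the 40K / 0.5 ppm numbers on the slide:

```python
import math

# Birthday-paradox estimate: with N equally likely buckets (words), the expected
# number of uniformly random single-bit faults before two land in the same word
# is approximately 1.25 * sqrt(N).
N_WORDS = 8 * 2**30 // 8            # 8GB DIMM viewed as 8-byte words (~1.07 billion)
faults_until_double = 1.25 * math.sqrt(N_WORDS)

DATA_BITS = 8 * 2**30 * 8           # total data bits in the DIMM
print(f"expected faults until first double-error word ~ {faults_until_double:,.0f}")
print(f"corresponding bit-error rate ~ {faults_until_double / DATA_BITS * 1e6:.1f} ppm")
```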

Dissecting Fault Probabilities
At a bit error rate of 10^-4 (100 ppm) for an 8GB DIMM (1 billion words):

Faulty bits per word (8B) | Probability | Num words in 8GB
0                         | 99.3%       | 0.99 Billion
1                         | 0.7%        | 7.7 Million
2                         | 26 x 10^-6  | 28 K
3                         | 62 x 10^-9  | 67
4                         | 10^-10      | 0.1

Most faulty words have a 1-bit error; the skew in fault probability can be leveraged for low-cost resilience
Tolerate high error rates with a commodity ECC DIMM while retaining soft-error resilience
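The table follows from a simple binomial model. Below is a minimal sketch that reproduces the skew, assuming independent bit faults at a raw bit-error rate of 10^-4 and 72-bit (72,64) ECC words; the 72-bit word size is an assumption made here to match the numbers, not something stated on the slide:

```python
from math import comb

BER = 1e-4            # assumed raw bit-error rate (100 ppm)
BITS_PER_WORD = 72    # assumption: (72,64) SECDED word, 64 data + 8 check bits
N_WORDS = 2**30       # roughly one billion 8-byte words in an 8GB DIMM

for k in range(5):
    # Binomial probability that exactly k of the word's bits are faulty
    p = comb(BITS_PER_WORD, k) * BER**k * (1 - BER)**(BITS_PER_WORD - k)
    print(f"{k}-bit-faulty words: probability {p:.3g}, expected count {p * N_WORDS:,.1f}")
```

Running this gives roughly the 99.3% / 0.7% / 26x10^-6 split in the table, with only a handful of 3-bit-faulty words and a fraction of a 4-bit-faulty word expected.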

ArchShield: Overview
Inspired by Solid State Drives (SSDs), which tolerate high bit-error rates
Expose faulty-cell information to the architecture layer via runtime testing
[Figure: main memory under ArchShield holds a Fault Map (cached on chip) and a Replication Area]
Most words will be error-free; 1-bit errors are handled with SECDED; multi-bit errors are handled with replication
ArchShield stores the error-mitigation information in memory itself

ArchShield: Yield-Aware Design
When the DIMM is configured, runtime testing is performed. Each 8B word gets classified into one of three types:
- No Error: replication not needed; SECDED can correct a soft error
- 1-bit Error: SECDED can correct the hard error; replication is needed to cover a soft error
- Multi-bit Error: the word gets decommissioned; only the replica is used
(The classification of faulty words can be stored on the hard drive for future use)
Tolerates a 100 ppm fault rate with 1% slowdown and 4% capacity loss
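A hypothetical sketch of the per-word classification that the runtime testing step produces; the identifiers, data structures, and toy test results are illustrative only, not taken from the ArchShield paper:

```python
# Illustrative sketch of ArchShield-style word classification; names are
# hypothetical, not from the paper.
NO_ERROR, ONE_BIT, MULTI_BIT = "no-error", "1-bit", "multi-bit"

def classify_word(faulty_bits):
    """Classify an 8B word by the number of hard-faulty bits found by testing."""
    if faulty_bits == 0:
        return NO_ERROR     # SECDED stays free to correct a soft error
    if faulty_bits == 1:
        return ONE_BIT      # SECDED corrects the hard error; replicate for soft-error cover
    return MULTI_BIT        # word is decommissioned; only the replica is used

# Toy test results: word address -> number of faulty bits found by runtime testing
test_results = {0x0001000: 1, 0x0020400: 3, 0x0800000: 0, 0x0A00000: 1}

fault_map = {addr: classify_word(n) for addr, n in test_results.items()}
replicated = [addr for addr, kind in fault_map.items() if kind != NO_ERROR]

print(fault_map)      # classification stored in the (cached) fault map
print(replicated)     # words that need an entry in the replication area
```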

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Emerging Technology to Aid Scaling
Phase Change Memory (PCM): scalable to sub-10nm
Resistive memory: high resistance (0), low resistance (1)
Advantages: scalable, has MLC capability, non-volatile (no leakage)
PCM is attractive for designing scalable memory systems. But …

Challenges for PCM
Key problems:
- Higher read latency (compared to DRAM)
- Limited write endurance (~10-100 million writes per cell)
- Writes are much slower and power hungry
Replacing DRAM with PCM outright causes high read latency, high power, and high energy consumption
How do we design a scalable PCM-based memory without these disadvantages?

Hybrid Memory: Best of DRAM and PCM
[Figure: processor backed by a DRAM buffer (with tag store) in front of PCM main memory and its write queue, with Flash or HDD below]
Hybrid memory system:
- DRAM as a cache, to tolerate PCM read/write latency and write bandwidth
- PCM as main memory, to provide large capacity at good cost/power
- Write-filtering techniques to reduce wasteful writes to PCM
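A simplified model of the DRAM-buffer organization, assuming one particular write-filtering policy (writes are absorbed in the DRAM buffer and reach PCM only when a dirty line is evicted); the class and policy details here are illustrative, not the exact design from the papers:

```python
# Toy model of a DRAM buffer in front of PCM. The write-filtering choice here
# (PCM is written only on dirty evictions) is one illustrative option.
class HybridMemory:
    def __init__(self, dram_lines):
        self.dram_lines = dram_lines   # capacity of the DRAM buffer, in lines
        self.dram = {}                 # address -> (data, dirty): tag store + data
        self.pcm = {}                  # large, slow-to-write backing store
        self.pcm_writes = 0            # wasteful writes we are trying to minimize

    def read(self, addr):
        if addr in self.dram:          # DRAM hit hides PCM read latency
            return self.dram[addr][0]
        data = self.pcm.get(addr, 0)   # DRAM miss: fill from PCM
        self._install(addr, data, dirty=False)
        return data

    def write(self, addr, data):
        self._install(addr, data, dirty=True)   # writes are absorbed by the DRAM buffer

    def _install(self, addr, data, dirty):
        if addr in self.dram:
            dirty = dirty or self.dram[addr][1]
        elif len(self.dram) >= self.dram_lines:
            # Evict the oldest line (FIFO, for simplicity); only dirty data reaches PCM
            victim, (vdata, vdirty) = next(iter(self.dram.items()))
            del self.dram[victim]
            if vdirty:
                self.pcm[victim] = vdata
                self.pcm_writes += 1
        self.dram[addr] = (data, dirty)

mem = HybridMemory(dram_lines=2)
mem.write(0xA, 1); mem.write(0xA, 2); mem.write(0xB, 3)  # repeated writes coalesce in DRAM
mem.read(0xC)                                            # forces an eviction of 0xA
print(mem.pcm_writes)                                    # 1: only the dirty eviction hit PCM
```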

Latency, Energy, Power: Lowered
Hybrid memory provides performance similar to iso-capacity DRAM
It also avoids the energy/power overheads of frequent writes

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Workload Adaptive Systems
Different policies work well for different workloads: no single replacement policy works well for all workloads, and the same holds for the prefetch algorithm, the memory scheduling algorithm, the coherence algorithm, or any other policy (write allocate vs. no allocate?)
Unfortunately, systems are designed to cater to the average case (a policy that works well enough for all workloads)
Ideally, each workload would run with the policy that works best for it

Adaptive Tuning via Runtime Testing (Set Dueling: using a single counter)
Say we want to select between two policies, P0 and P1. Divide the cache sets into three groups:
- Dedicated P0 sets
- Dedicated P1 sets
- Follower sets (use the winner of P0 vs. P1)
An n-bit saturating counter tracks misses: misses to P0 sets increment the counter; misses to P1 sets decrement it. The counter decides the policy for the follower sets: MSB = 0, use P0; MSB = 1, use P1.
Monitor, then choose, then apply (see the sketch below)
Adaptive tuning allows dynamic policy selection at low cost
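A minimal sketch of set dueling with a single saturating counter; the counter width and the way sets are assigned to the dedicated groups are illustrative choices, not the exact constituency from the papers:

```python
# Set dueling with one n-bit saturating counter (the policy-selection counter).
N_BITS = 10
PSEL_MAX = (1 << N_BITS) - 1
psel = PSEL_MAX // 2                      # start at the midpoint

def set_group(set_index, num_sets=1024, dedicated_per_policy=32):
    """Assign each cache set to dedicated-P0, dedicated-P1, or follower (illustrative mapping)."""
    stride = num_sets // dedicated_per_policy
    if set_index % stride == 0:
        return "P0"
    if set_index % stride == 1:
        return "P1"
    return "follower"

def on_miss(set_index):
    """Misses to dedicated P0 sets increment the counter; misses to P1 sets decrement it."""
    global psel
    group = set_group(set_index)
    if group == "P0":
        psel = min(psel + 1, PSEL_MAX)
    elif group == "P1":
        psel = max(psel - 1, 0)

def follower_policy():
    """Follower sets use P0 when the counter's MSB is 0, and P1 when it is 1."""
    return "P1" if psel >= (PSEL_MAX + 1) // 2 else "P0"
```

The hardware cost is essentially just the counter plus a fixed set-to-group mapping, which is why the selection comes at low cost.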

Outline
Introduction
ArchShield: Yield-Aware Design (architectural support for DRAM)
Hybrid Memory: reduce Latency, Energy, Power
Adaptive Tuning of Systems to Workloads
Summary

Challenges for Computer Architects
End of: technology scaling, frequency scaling, Moore's Law, ????
How do we address these challenges? The solution for all computer architecture problems is:
- Yield awareness
- Hybrid memory: latency, energy, power reduction for PCM
- Workload adaptive systems: low-cost "Adaptivity Through Testing"
Happy 75th Yale!