Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures. June 14th, 2014. Prashant J. Nair (Georgia Tech), David A. Roberts (AMD Research).

Presentation transcript:

Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures. June 14th, 2014. Prashant J. Nair (Georgia Tech), David A. Roberts (AMD Research), Moinuddin K. Qureshi (Georgia Tech).

Introduction

Current memory systems are inefficient in energy and bandwidth, and demand for an efficient DRAM memory system keeps growing. 3D DRAM stacks dies for efficiency and bandwidth.
– The bright side: performance and power.
– The dark side: newer failure modes (e.g., faulty TSVs) threaten reliability.

Takeaway: protect against the newer failure modes to derive the benefits of stacking.

(Figure: performance and power of stacked DRAM when ignoring reliability, when providing reliability, and in the ideal case of providing reliability at no cost.)

Large Faults Are Common

Small- and large-granularity DRAM faults occur with roughly equal likelihood:
– Bit faults
– Word faults
– Column faults
– Row faults
– Bank faults
– Multi-bank faults (due to TSV faults)

3D stacking therefore needs to tolerate large faults efficiently.

(Figure: a single DRAM die, top view, showing banks and TSVs; a stacked memory built from DRAM dies plus an ECC die.)

Data Reliability Incurs Costs

Ensuring reliability today means striping data to implement ChipKill/SECDED:
– For instance, a 64B cache line can be striped across 8 banks (8B of data per bank).
– One additional bank (possibly in another DRAM die) holds the ECC.

Cost: every access activates 8 banks, incurring 8X bank activation power and tying up 8X the DRAM bank-level parallelism.

We need fault tolerance without such impractical overheads.

(Figure: a single DRAM die, top view, showing banks and TSVs.)
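As an illustration (not from the slides), the striping scheme above can be sketched in a few lines. This is a minimal model: 8 data banks, 8 bytes per bank, and simple XOR parity standing in for the ECC bank; the function names are invented for this example.

```python
from functools import reduce

BANKS = 8   # data banks the cache line is striped across
CHUNK = 8   # bytes stored per bank for a 64-byte line

def stripe(line):
    """Split a 64B line into BANKS data chunks plus one XOR-parity chunk."""
    assert len(line) == BANKS * CHUNK
    chunks = [line[i * CHUNK:(i + 1) * CHUNK] for i in range(BANKS)]
    # The ECC bank holds the byte-wise XOR of the 8 data chunks.
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
    return chunks, parity

def rebuild(chunks, parity, lost_bank):
    """Recover the chunk of one failed bank from the survivors plus parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost_bank]
    survivors.append(parity)
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))

line = bytes(range(64))
chunks, parity = stripe(line)
assert rebuild(chunks, parity, lost_bank=3) == chunks[3]
```

The cost noted on the slide is visible here: reading or writing one line touches all 8 data banks (plus the parity bank), which is exactly the activation-power and parallelism overhead Citadel sets out to avoid.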

Citadel: Gist of the Schemes

– Swap bad TSVs with good ones (TSV-SWAP)*
– Parity-based ECC within the stack (three-dimensional parity)*
– Bimodal sparing of faulty regions (dual-granularity sparing)*

* Resilience studied with FAULTSIM, using projected data from field studies.

The three schemes work in conjunction to enable high-performance, low-power, and robust stacked memory.

(Figure: DRAM dies plus an ECC die, annotated with TSV-SWAP, three-dimensional parity, and dual-granularity sparing.)

Outline

– Introduction
– Scheme 1: TSV-SWAP
– Scheme 2: Three-Dimensional Parity (3DP)
– Scheme 3: Dynamic Dual-Granularity Sparing (DDS)
– Citadel = Schemes [1+2+3]
– Summary

TSV-SWAP

– A faulty address TSV can make 50% of the memory unavailable; a faulty data TSV makes a few bit-lines unavailable.
– TSV-SWAP swaps faulty TSVs with pre-decided standby TSVs (drawn from the data TSVs).
– The data carried by the standby TSVs is replicated in the ECC space.

TSV-SWAP provides almost ideal TSV fault tolerance.

(Figure: a DRAM bank with row and column decoders, showing a faulty address TSV swapped with a standby data TSV; resilience results at a TSV FIT rate of 1430, comparing no TSV fault, no TSV-SWAP with data striped across banks, and TSV-SWAP.)
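A minimal sketch of the TSV-SWAP idea, under illustrative assumptions: a remap table routes signals away from TSVs found faulty onto the pre-decided standby TSVs. The TSV count, the number of standby TSVs, and the function name are invented for this example, and the replication of the standby TSVs' data in ECC space is not modeled.

```python
NUM_TSVS = 16
STANDBY = [14, 15]  # pre-decided standby (data) TSVs

def build_remap(faulty):
    """Map every logical TSV to a working physical TSV."""
    remap = {t: t for t in range(NUM_TSVS)}
    spares = [s for s in STANDBY if s not in faulty]
    for t in sorted(faulty):
        if t in STANDBY:
            continue              # a broken standby TSV is simply not used
        if not spares:
            raise RuntimeError("more faulty TSVs than standby TSVs")
        remap[t] = spares.pop(0)  # swap the faulty TSV for a standby one
    return remap

remap = build_remap(faulty={3})
assert remap[3] == 14             # signal rerouted onto a standby TSV
assert remap[4] == 4              # healthy TSVs are untouched
```

The key point from the slide survives in the sketch: because the standby TSVs are ordinary data TSVs whose contents are replicated in ECC space, a swap costs no dedicated spare wiring.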

Outline

– Introduction
– Scheme 1: TSV-SWAP
– Scheme 2: Three-Dimensional Parity (3DP)
– Scheme 3: Dynamic Dual-Granularity Sparing (DDS)
– Citadel = Schemes [1+2+3]
– Summary

Three-Dimensional Parity (3DP)

Detect errors using CRC32, and correct using parity maintained across three dimensions:
– Dimension 1: a parity bank for the stack
– Dimension 2: a parity row per die
– Dimension 3: a parity row across dies, per bank

Dimension-1 parity is demand-cached in the LLC for performance. The three dimensions help in handling multiple faults: 3DP is 7X stronger than ChipKill (baseline: striping across channels).

(Figure: DRAM dies plus an ECC die, showing the parity bank and the per-die and cross-die parity rows.)
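To make the detect-then-correct flow concrete, here is a toy sketch of one of the three dimensions (a parity row per die), with `zlib.crc32` standing in for the hardware CRC32. The stack dimensions are illustrative, rows are modeled as single integers, and the other two parity dimensions would be built analogously.

```python
import zlib
from functools import reduce

DIES, BANKS, ROWS = 4, 4, 4
# Toy stack: one integer word per "row", with a recognizable value.
stack = [[[d * 100 + b * 10 + r for r in range(ROWS)]
          for b in range(BANKS)] for d in range(DIES)]

def xor_all(vals):
    return reduce(lambda a, b: a ^ b, vals, 0)

def crc(word):
    return zlib.crc32(word.to_bytes(4, "little"))

# Per-row CRC32 detects *which* row is bad; parity then corrects it.
crcs = [[[crc(stack[d][b][r]) for r in range(ROWS)]
         for b in range(BANKS)] for d in range(DIES)]

# Dimension 2: one parity row per die (XOR across that die's banks).
dim2 = [[xor_all(stack[d][b][r] for b in range(BANKS)) for r in range(ROWS)]
        for d in range(DIES)]

stack[1][2][3] ^= 0xDEAD                     # inject a fault
assert crc(stack[1][2][3]) != crcs[1][2][3]  # CRC32 flags the bad row
others = xor_all(stack[1][b][3] for b in range(BANKS) if b != 2)
stack[1][2][3] = dim2[1][3] ^ others         # correct via dimension-2 parity
assert crc(stack[1][2][3]) == crcs[1][2][3]
```

The value of having three dimensions is that when a fault is too large for one dimension (e.g., a whole bank, which wipes out every term of a per-die parity row), another dimension still has enough intact terms to reconstruct the lost data.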

Outline

– Introduction
– Scheme 1: TSV-SWAP
– Scheme 2: Three-Dimensional Parity (3DP)
– Scheme 3: Dynamic Dual-Granularity Sparing (DDS)
– Citadel = Schemes [1+2+3]
– Summary

Dynamic Dual-Granularity Sparing (DDS)

Based on their likelihood, faults fall into two granularities, so sparing is bimodal:
– Small faults (bit, word, row): use a single spare row.
– Large faults (column, bank): use an entire spare bank.

Dual-grain (row or bank) sparing uses the spare area efficiently.

(Figure: a faulty die with its banks and spare banks, and the ECC die holding CRC32 and the data of the standby TSVs; a bit fault and a word fault are each mapped to a spare row, while a bank fault is mapped to a spare bank.)
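The sparing decision above can be sketched as a small policy. This is a hypothetical model, not the paper's implementation: the fault-kind names, spare counts, and the fallback from rows to banks when spare rows run out are all assumptions made for illustration.

```python
SMALL = {"bit", "word", "row"}    # small-granularity faults
LARGE = {"column", "bank"}        # large-granularity faults

class SpareStore:
    def __init__(self, spare_rows, spare_banks):
        self.spare_rows = spare_rows
        self.spare_banks = spare_banks

    def remap(self, fault_kind):
        """Pick the sparing granularity that wastes the least spare area."""
        if fault_kind in SMALL and self.spare_rows:
            self.spare_rows -= 1
            return "row"          # small fault: one spare row suffices
        if self.spare_banks:      # large fault, or spare rows exhausted
            self.spare_banks -= 1
            return "bank"
        raise RuntimeError("out of spare capacity")

store = SpareStore(spare_rows=4, spare_banks=2)
assert store.remap("bit") == "row"    # small fault -> cheap row spare
assert store.remap("bank") == "bank"  # large fault -> whole spare bank
```

The efficiency argument is visible in the numbers: spending a whole spare bank on a single-bit fault would exhaust the (few) spare banks almost immediately, while spare rows are plentiful and cheap.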

Outline

– Introduction
– Scheme 1: TSV-SWAP
– Scheme 2: Three-Dimensional Parity (3DP)
– Scheme 3: Dynamic Dual-Granularity Sparing (DDS)
– Citadel = Schemes [1+2+3]
– Summary

Citadel: Results

Configuration:
– 8-core CMP with a shared 8MB LLC
– HBM-like memory: two 8GB stacks, DDR
– 8 data dies and 1 ECC die
– 8 banks/channel, 8 channels/stack
– Baseline: no striping; both systems employ TSV-SWAP

Citadel is 700X stronger than ChipKill: it provides 700X more resilience while consuming only 4% additional power and 1% additional execution time.

Outline

– Introduction
– Scheme 1: TSV-SWAP
– Scheme 2: Three-Dimensional Parity (3DP)
– Scheme 3: Dynamic Dual-Granularity Sparing (DDS)
– Citadel = Schemes [1+2+3]
– Summary

Summary

3D stacking can enable efficient DRAM; however, reliability concerns overshadow the benefits of stacking. Citadel enables robust and efficient stacked DRAM by:
– using TSV-SWAP to dynamically swap out faulty TSVs for good ones,
– handling multiple faults with 3DP, and
– isolating faults with DDS.

Citadel lets designers deliver all the benefits of stacking with orders-of-magnitude higher resilience.