Use ECP, not ECC, for hard failures in resistive memories


Use ECP, not ECC, for hard failures in resistive memories
S. Schechter, G. H. Loh, K. Strauss, and D. Burger
Presented at ISCA 2010

ECP: Error-Correcting Pointers
With equivalent space overhead, ECP provides a longer lifetime
… for byte-addressable NVM (such as PCM)
… where all bit errors are wear-induced hard errors
… assuming hard errors are detected immediately after a write
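As a rough illustration of the ECP idea, replacement cells plus pointers to failed bit positions, here is a minimal decode sketch (all names are ours, not the paper's hardware description; a later entry overrides an earlier one that points at the same cell, which is how ECP survives a failed replacement cell):

```python
# Hypothetical sketch of ECP decode for a 512-bit row.
ROW_BITS = 512
PTR_BITS = 9  # log2(512): enough to address any bit in the row

def ecp_decode(row_bits, entries):
    """Apply ECP correction entries to a raw row read from the array.

    row_bits : list of 0/1 values as read (may contain stuck-at values)
    entries  : list of (active, pointer, replacement_bit) tuples, in
               allocation order; later entries override earlier ones.
    """
    corrected = list(row_bits)
    for active, ptr, repl in entries:
        if active:
            corrected[ptr] = repl  # substitute the replacement cell's value
    return corrected
```

For example, a cell stuck at 0 in position 5 that should hold 1 is fixed by one active entry `(1, 5, 1)`; if that replacement cell later fails, a second entry pointing at position 5 takes precedence.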

ECP scheme (Fig. 1)

Evaluation setup
Schemes compared
- SEC: widely used ECC in DRAM
- Pairing: pairs partially failed pages
- Wilkerson: similar to ECP
- Perfect code: theoretical limit for ECC
Experiment environment
- System description
  - Used a simple in-house simulator with inter-row wear-leveling
  - Memory is organized as 512-bit rows
  - Measured the number of surviving pages in the system against the number of writes
  - Pages failing over time exacerbate wear on the surviving pages
- Cell lifetime modeling
  - A random lifetime is assigned to each cell from a normal distribution (mean: 10^8, variance: 0.25)
- Synthetic workload pattern
  - The actual write region is narrower than the 512-bit row
  - Each bit within the write region has a 50% chance of changing
  - For ECC-based protection, any bit change in the data gives each ECC parity bit a 50% chance of changing
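The cell-lifetime model above can be sketched as a small Monte Carlo simulation (a rough sketch only: we assume the slide's mean of 10^8 writes, treat 0.25 as a relative standard deviation, and all helper names are ours, not the paper's simulator):

```python
import random

# Illustrative sketch of the lifetime model: each cell fails permanently
# after a normally distributed number of writes.
MEAN_LIFETIME = 1e8
REL_STDDEV = 0.25   # slide's 0.25, treated here as std dev relative to mean
ROW_BITS = 512

def cell_lifetimes(n_cells, rng):
    # Draw one lifetime (in writes) per cell; clamp at 1 write minimum.
    return [max(1, rng.gauss(MEAN_LIFETIME, REL_STDDEV * MEAN_LIFETIME))
            for _ in range(n_cells)]

def writes_until_row_death(lifetimes, correctable):
    # A row survives until more cells have failed than the scheme can
    # correct, i.e. it dies at the (correctable + 1)-th cell failure.
    return sorted(lifetimes)[correctable]

rng = random.Random(42)
life = cell_lifetimes(ROW_BITS, rng)
# With ECP6 (six correction entries), the row dies at the 7th cell failure.
print(writes_until_row_death(life, 6))
```

Repeating this per row, with failed pages redirecting their writes onto the survivors, reproduces the "failures exacerbate wear" effect the slide describes.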

Evaluation results (Fig. 3 and 4)

Other issues addressed in this work
- Intra-row wear-leveling
- Comparison to optimal ECP
- Layering ECP
- Optimizing memory for correction
Discussion
- Hardware implementation
- Orthogonality and composability
- Transient errors

Our takeaways (1)
The problems and solutions for hard errors in this work are not directly applicable to our domain because…
- No in-place update is allowed
- ECC does not exacerbate wear
- Bit randomization is used as intra-row wear-leveling
- Transient errors are still a big issue in flash
- We cannot determine the exact location of a stuck-at-fault cell
We do not have an adequate solution for dealing with hard errors at the moment
- Program/erase reports "fail" whenever a stuck-at fault is detected
- … and then the FTL maps out the entire block
- ECC remains inefficient in dealing with hard errors

Our takeaways (2)
What is the space efficiency of data protection in our domain?
For 24-bit protection of 8192-bit data:
- ECC (our current BCH engine): 336 bits of parity (42B)
- ECP: 337 bits of parity (43B)
- Perfect coding (theoretical bound): 234 bits of parity (30B)

Our takeaways (3)
- Soft errors vs. hard errors
- Effective capacity vs. over-provisioning
- Wear-leveling in the presence of replaceable components

Wilkerson’s scheme (Fig. 2)

Intra-row wear-leveling (Fig. 5)

Intra-row wear-leveling (Fig. 6)

Optimal ECP (Table 3)

Layered ECP (Fig. 7)

Layered ECP (Table 4)

Hardware implementation (Fig. 9)