Due to the economic downturn, Microsoft Research has eliminated all funding for title slides. We sincerely apologize for any impact these austerity measures may have on this presentation.

The End of Charge-Based Memory
[Figure: charge-based DRAM cell]

Use ECP, not ECC, for Hard Failures in Resistive Memories
Stuart Schechter, Gabriel Loh, Karin Strauss, Doug Burger, and introducing (in the ape suit)
[Figure: resistive memory cell, with metal contacts to the bit line and to the sensor line]

Phase-Change Memory (PCM)
[Figure, built up across several slides: a resistive memory cell between a metal contact to the bit line and a metal contact to the sensor line; the PCM cell adds a heating element and a chalcogenide (phase-change material)]

Phase-Change Memory vs. DRAM
- Nonvolatile
- Scalability not impacted by capacitance limits
- Cells may be written individually
- Slower, with more energy-intensive writes: 2x slower for reads, 10x slower for writes

Hard Failures in Resistive Memories
- The heating element loses resistivity, or expansion/contraction causes detachment
- Mean expected lifetime: 10^8 writes (but varies)
Speech bubble: “After 10^8 writes together, we’re not connecting like we used to. I just don’t feel the heat anymore.”

Phase-Change Memory vs. DRAM
- DRAM cells encounter soft (transient) errors
  - May occur between a write and a future read
- PCM cells encounter hard (permanent) failures
  - Occur at write time (detectable by a verifying read)
  - Increase in frequency over the product lifetime
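To make write-time detection concrete, here is a minimal Python sketch. It assumes a hypothetical row model (`SimulatedRow`) in which each failed cell is stuck at a fixed value; a write followed by a verifying read flags every cell whose stored value differs from the intended one.

```python
class SimulatedRow:
    """Hypothetical row of PCM cells; some cells are stuck at a fixed value."""

    def __init__(self, width, stuck=None):
        self.cells = [0] * width
        self.stuck = dict(stuck or {})  # cell index -> value the cell is stuck at

    def write(self, bits):
        for i, bit in enumerate(bits):
            # A stuck cell ignores the write and keeps its stuck value.
            self.cells[i] = self.stuck.get(i, bit)

    def read(self):
        return list(self.cells)


def write_and_verify(row, bits):
    """Write `bits`, read them back, and return the indices that did not take."""
    row.write(bits)
    readback = row.read()
    return [i for i, (want, got) in enumerate(zip(bits, readback)) if want != got]


row = SimulatedRow(8, stuck={2: 0, 5: 1})
print(write_and_verify(row, [1, 1, 1, 1, 0, 0, 0, 0]))  # prints [2, 5]
```

A stuck cell only shows up when a write asks for the opposite of its stuck value: writing all zeros to this row would flag only cell 5.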

Living with Hard Failures in Memory Cells
Assume we’re already doing these things:
1. Accept failure of some fraction of pages
   - Map failed pages out of logical memory
2. Wear-level data across pages/blocks, and within blocks
   - Shift/rotate data randomly (at random intervals/locations)
3. Differential writes
   - Write only cells whose values change
4. Correct errors when possible
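Item 3 (differential writes) fits in a few lines of Python. The function name `differential_write` is hypothetical; the point is that only cells whose values change are programmed, so only those cells accrue wear.

```python
def differential_write(stored, new):
    """Update `stored` in place to match `new`, programming only changed cells.

    Returns the indices of the cells actually written, i.e. the cells that wear.
    """
    if len(stored) != len(new):
        raise ValueError("stored and new must have the same width")
    written = []
    for i, (old_bit, new_bit) in enumerate(zip(stored, new)):
        if old_bit != new_bit:
            stored[i] = new_bit
            written.append(i)
    return written


cells = [0, 1, 1, 0]
print(differential_write(cells, [0, 1, 0, 1]), cells)  # prints [2, 3] [0, 1, 0, 1]
```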

Error Correction Schemes
Schemes compared: No correction, Pairing 8, SEC 64, ECP 6, Wilkerson 4, Perfect_Code 9

Error Correction Schemes: No correction
A page must be retired when the first cell within the page fails.

Error Correction Schemes: SEC 64
[Figure: 8 chips x 8 bits/chip = 64 data bits, protected by SEC/SECDED check bits]
SEC/SECDED uses 7/8 check bits per 64 data bits: 10.9%/12.5% overhead. We use this 12.5% overhead limit for all schemes in our study.
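The 7/8-bit figures follow from the Hamming condition for single-error correction: r check bits suffice for k data bits when 2^r >= k + r + 1, and SECDED adds one overall parity bit. A small Python sketch (the helper name is ours) reproduces the overhead numbers for 64-bit words:

```python
def sec_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (Hamming SEC condition)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


data = 64
sec = sec_check_bits(data)  # single-error correction
secded = sec + 1            # plus one overall parity bit for double-error detection
print(sec, secded, f"{sec / data:.1%}", f"{secded / data:.1%}")  # prints 7 8 10.9% 12.5%
```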

Error Correction Schemes: SEC 64
A page must be retired when a block within the page suffers a second error.

Error Correction Schemes: Error Correcting Pointers (ECP)
[Figure: 8 chips x 8 bits/chip = 64 data bits, plus a correction chip]

Error Correction Schemes: ECP 1
[Figure: a row of data cells with one correction entry. The entry holds a correction pointer (e.g., 0110) that identifies a failed data cell and a replacement cell (R) that holds that cell’s bit; the row also has a “Full?” bit.]

[Figure: the same row extended with several correction entries, each holding a pointer and a replacement cell (R), plus the “Full?” bit]
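Assume each correction entry holds a pointer just wide enough to name any of the d data cells (ceil(log2 d) bits) plus one replacement cell, and that each row has a single “Full?” bit. Then n entries cost n(ceil(log2 d) + 1) + 1 bits. This Python sketch (hypothetical helper names) checks which ECP size fits the 12.5% budget for a 512-bit row:

```python
def ecp_overhead_bits(data_bits, entries):
    """Bits for `entries` ECP correction entries plus one Full? bit (assumed layout).

    Each entry: a pointer wide enough to address any data cell, plus one
    replacement cell.
    """
    pointer_bits = (data_bits - 1).bit_length()  # ceil(log2(data_bits)) for data_bits >= 2
    return entries * (pointer_bits + 1) + 1


ROW_BITS = 512
BUDGET = ROW_BITS // 8  # 12.5% of 512 data bits = 64 bits

for n in (5, 6, 7):
    bits = ecp_overhead_bits(ROW_BITS, n)
    print(f"ECP {n}: {bits} bits ({bits / ROW_BITS:.1%}), fits budget: {bits <= BUDGET}")

largest = max(n for n in range(ROW_BITS) if ecp_overhead_bits(ROW_BITS, n) <= BUDGET)
print("largest ECP within 12.5%:", largest)  # prints 6
```

Under these assumptions, six entries use 61 bits (11.9%) and a seventh would exceed the 64-bit budget, which is consistent with ECP 6 in the comparison.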

Error Correction Schemes: ECP 6
A page must be retired when a row within the page suffers more errors than it has correction entries.*

*What if a correction entry fails? There’s a precedence rule for that.

[Figure: precedence among correction entries, showing data cells, correction entries, replacement cells (R), and the “Full?” bit]

Error Correction Schemes: ECP 6
A page must be retired when a row within the page suffers more errors than it has correction entries.*
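The Python sketch below models one row under ECP with a hypothetical `EcpRow` class. Simplifying assumptions: faults are stuck-at faults in data cells or replacement cells, pointer cells never fail, and the precedence rule is taken to be “a later entry overrides an earlier entry pointing to the same cell”. That rule lets a failed replacement cell be corrected by allocating another entry.

```python
class EcpRow:
    """Hypothetical model of one row protected by `n` ECP correction entries.

    Sketch assumptions: faults are stuck-at faults in data cells or in
    replacement cells; pointer cells never fail; entries are permanent once
    allocated.
    """

    def __init__(self, width, n, stuck_data=None, stuck_repl=None):
        self.n = n
        self.data = [0] * width
        self.stuck_data = dict(stuck_data or {})  # data index -> stuck value
        self.stuck_repl = dict(stuck_repl or {})  # entry index -> stuck value
        self.entries = []                         # [pointer, replacement bit], oldest first

    def _set_repl(self, k, bit):
        # A stuck replacement cell keeps its stuck value.
        self.entries[k][1] = self.stuck_repl.get(k, bit)

    def read(self):
        bits = list(self.data)
        # Precedence rule (as assumed here): apply entries oldest first, so a
        # later entry for the same cell overrides an earlier one.
        for pointer, repl in self.entries:
            bits[pointer] = repl
        return bits

    def write(self, bits):
        """Write `bits`; return False if the row runs out of entries (retire the page)."""
        for i, bit in enumerate(bits):
            self.data[i] = self.stuck_data.get(i, bit)
        for k, (pointer, _) in enumerate(self.entries):
            self._set_repl(k, bits[pointer])
        while True:
            # Verifying read: allocate a new entry for the first cell that reads wrong.
            wrong = [i for i, (want, got) in enumerate(zip(bits, self.read())) if want != got]
            if not wrong:
                return True
            if len(self.entries) == self.n:
                return False
            self.entries.append([wrong[0], 0])
            self._set_repl(len(self.entries) - 1, bits[wrong[0]])


row = EcpRow(8, n=2, stuck_data={3: 0, 5: 1}, stuck_repl={0: 0})
print(row.write([1] * 8), row.entries)  # prints True [[3, 0], [3, 1]]
print(row.write([0] * 8))               # prints False: cell 5 needs a third entry
```

Here the first entry’s replacement cell is itself stuck, so a second entry for the same data cell takes precedence. When another cell fails and no entries remain, the write reports that the page must be retired.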

Error Correction Schemes: Wilkerson 4
Wilkerson, Gao, Alameldeen, Chishti, Khellah, & Lu, ISCA 2008: designed for fixing errors induced by running SRAM caches at low voltages.

[Figure: data cells and correction entries (pointers such as 0110, replacement cells R), with SEC check bits (0001)]

Error Correction Schemes: Wilkerson 4
A page must be retired when a row within the page suffers more errors than it has correction entries.*

Error Correction Schemes: Pairing 8
Ipek, Condit, Nightingale, Burger, & Moscibroda, ASPLOS 2009
[Figure: 8 chips x 8 bits/chip = 64 data bits, plus 8 parity bits (12.5% overhead)]

[Figure: a page paired with another page, byte by byte]

Error Correction Schemes: Pairing 8
A page must be retired when 160 errors occur within the page (an average of 5 errors per 1024 bits).
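A Python sketch of the pairing idea follows. It assumes each page’s faults are already known as a set of faulty byte offsets; how they are detected, e.g. via the parity bits, is not modeled. Two pages are treated as compatible when no byte offset is faulty in both, so every byte has at least one good copy. The helper names are hypothetical. The first constant also checks the retirement threshold: 5 errors per 1024 bits over a 32 Kbit page gives 160 errors.

```python
PAGE_BITS = 4 * 1024 * 8              # 4 KB page = 32 Kbit
RETIRE_AFTER = 5 * PAGE_BITS // 1024  # 5 errors per 1024 bits -> 160 errors per page


def compatible(faulty_a, faulty_b):
    """Two pages can be paired if no byte offset is faulty in both of them."""
    return faulty_a.isdisjoint(faulty_b)


def read_paired_byte(page_a, page_b, faulty_a, offset):
    """Read one byte of a paired page, falling back to page B where A's byte is faulty."""
    return page_b[offset] if offset in faulty_a else page_a[offset]


print(RETIRE_AFTER)                                 # prints 160
print(compatible({1, 7}, {2, 9}))                   # prints True
print(compatible({1, 7}, {7}))                      # prints False
print(read_paired_byte([10, 11], [20, 21], {0}, 0))  # prints 20
```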

Error Correction Schemes: Perfect_Code 9
[Figure: 8 chips x 8 bits/chip, contrasted with SEC/SECDED over 64 bits; 8 accesses x 64 bits = 512-bit block; 8 transfers x 8 bits = 64 bits]
Multiple error correction:
- 576 bits of storage
- 512 data bits
- At most 9 corrections possible (Hamming bound)
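The Hamming (sphere-packing) bound says a code with n-bit codewords and k data bits can correct at most t errors, where sum over i = 0..t of C(n, i) <= 2^(n - k). A short Python check with n = 576 and k = 512 confirms the limit of 9 (the function name is ours):

```python
from math import comb


def hamming_bound_max_t(n, k):
    """Largest t with sum(C(n, i) for i in 0..t) <= 2**(n - k) (sphere-packing bound)."""
    budget = 2 ** (n - k)
    total, t = 0, -1
    while t < n and total + comb(n, t + 1) <= budget:
        t += 1
        total += comb(n, t)
    return t


print(hamming_bound_max_t(576, 512))  # prints 9
print(hamming_bound_max_t(7, 4))      # prints 1
```

The (7, 4) case is the classic Hamming code, which meets the bound with equality (a perfect code).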

Error Correction Schemes: Perfect_Code 9
A page must be retired when a 576-bit block (holding 512 data bits) suffers its 10th error.

Experimental Parameters
- 4 KB (32 Kbit) page size
- 64-byte (512-bit) row size
- 1 rank
- 8 chips per rank
- x8 (8 bit lines per chip)
- 10^8 mean writes until memory cell failure
- 0.25 coefficient of variation for cell lifetimes
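These parameters are enough for a rough Monte Carlo sketch of row lifetime. The following assumptions are ours, not the study’s: cell lifetimes are normally distributed (clipped at 1 write), every cell of the row wears at the same rate, and the correction cells themselves never wear out. A row that tolerates k failed cells then dies at its (k+1)-th cell failure.

```python
import random
from statistics import mean

MEAN_WRITES = 1e8  # mean writes until a cell fails
COV = 0.25         # coefficient of variation of cell lifetimes
ROW_BITS = 512


def sample_cell_lifetimes(n, rng):
    """Draw n cell lifetimes; the normal distribution (clipped at 1) is our assumption."""
    sd = COV * MEAN_WRITES
    return [max(1.0, rng.gauss(MEAN_WRITES, sd)) for _ in range(n)]


def row_lifetime(n_cells, tolerated, rng):
    """Writes until the row has more failed cells than it can tolerate.

    A row that tolerates `tolerated` failures dies at its (tolerated + 1)-th
    cell failure, which is index `tolerated` of the sorted lifetimes.
    """
    return sorted(sample_cell_lifetimes(n_cells, rng))[tolerated]


rng = random.Random(0)
for label, tolerated in [("no correction", 0), ("6 failures tolerated", 6)]:
    avg = mean(row_lifetime(ROW_BITS, tolerated, rng) for _ in range(200))
    print(f"{label}: {avg / MEAN_WRITES:.2f} x mean cell lifetime")
```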

Results
[Figures: results for No correction, Pairing 8, SEC 64, ECP 6, Wilkerson 4, and Perfect_Code 9, with coefficient of variation = 0.25. One build is annotated for Pairing: half capacity lost on first error (when pages are paired).]

Results: Write Modification Width
[Figure: 512-bit block (16 x 32-bit words); the entire 512-bit region is modified, with each bit flipping with p = 0.5]

Error Correction Schemes: A Review of Hamming Codes
Flipping just one data bit changes, on average, half of the Hamming code bits.
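This claim can be checked with a small Python sketch of a Hamming SEC code over 64 data bits. The code uses 7 check bits, with data bits placed at the non-power-of-two positions 3 to 71. Because the code is linear, flipping one data bit flips exactly the check bits selected by the 1s in that bit’s position. That averages 205/64 ≈ 3.2 of the 7 check bits, a little under half. The function names are ours.

```python
def hamming_check_bits(data, n_check=7):
    """Check bits of a Hamming SEC code with data at non-power-of-two positions.

    Check bit j is the parity (XOR) of every data bit whose 1-based codeword
    position has bit j set.
    """
    positions = [p for p in range(1, 2 ** n_check) if p & (p - 1)][: len(data)]
    check = [0] * n_check
    for bit, pos in zip(data, positions):
        if bit:
            for j in range(n_check):
                if (pos >> j) & 1:
                    check[j] ^= 1
    return check


def average_check_bits_changed(k=64):
    """Average number of check bits that change when one data bit flips.

    Starting from all zeros loses nothing: the code is linear, so the change
    caused by a flip does not depend on the other data bits.
    """
    base = [0] * k
    base_check = hamming_check_bits(base)
    total = 0
    for i in range(k):
        flipped = list(base)
        flipped[i] ^= 1
        total += sum(a != b for a, b in zip(base_check, hamming_check_bits(flipped)))
    return total / k


avg = average_check_bits_changed()
print(f"{avg:.2f} of 7 check bits ({avg / 7:.0%})")  # prints 3.20 of 7 check bits (46%)
```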

Results: Write Modification Width
[Figure: 512-bit block (16 x 32-bit words) with 256-bit and 128-bit modification regions]

Write Modification Width of 512 Bits, 256 Bits, and 128 Bits
[Figures: one results panel per width; coefficient of variation = 0.25; schemes: No correction, Pairing 8, SEC 64, ECP 6, Wilkerson 4, Perfect_Code 9]

Fair Treatment of Perfect Codes
Perfect codes would always win if you leveled the wear of the correction bits and data bits.
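A toy Python simulation shows why leveling matters for codes whose check bits change on almost every write. Our assumptions: each write flips each check bit, and each bit in the modified data region, with p = 0.5. Only flipped cells wear (differential writes). “Internal wear leveling” is modeled as storing the 576-bit codeword at a fresh random rotation on every write, ignoring the extra writes that moving the data would cost. The function name is hypothetical.

```python
import random

CODEWORD_BITS = 576  # 512 data bits + 64 check bits
DATA_BITS = 512


def max_cell_wear(writes, modified_width, rotate, seed=0):
    """Largest per-cell wear after `writes` writes to one 576-cell codeword.

    Toy model (our assumption): each write flips each check bit and each of the
    first `modified_width` data bits with p = 0.5; only flipped cells wear. With
    `rotate`, the codeword is stored at a fresh random rotation on every write
    (the cost of moving the data is ignored).
    """
    rng = random.Random(seed)
    wear = [0] * CODEWORD_BITS
    active = list(range(modified_width)) + list(range(DATA_BITS, CODEWORD_BITS))
    for _ in range(writes):
        offset = rng.randrange(CODEWORD_BITS) if rotate else 0
        for logical in active:
            if rng.random() < 0.5:
                wear[(logical + offset) % CODEWORD_BITS] += 1
    return max(wear)


for rotate in (False, True):
    print(f"rotate={rotate}: max cell wear {max_cell_wear(2000, 128, rotate)}")
```

Without rotation, the check-bit cells (and the modified data cells) absorb roughly half of all writes. With rotation, the same total wear is spread over all 576 cells.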

Internal Wear Leveling: Write Modification Widths of 128, 256, and 512 Bits
[Figures: one results panel per width, with internal wear leveling; coefficient of variation = 0.25; schemes: No correction, Pairing 8, SEC 64, ECP 6, Wilkerson 4, Perfect_Code 9]

Conclusion
- Correction pointers >> Pairing
- Correction pointers >> SEC over small blocks
  - Precedence rules >> SEC over correction entries
- Correction pointers >= MEC (multiple error correction) over large blocks
  - Lower computational cost
- For memory with both hard and transient errors:
  - Use ECP below, on chip (for hard errors)
  - Use SEC above, off chip (for soft errors)

Backup Slide for Responding to Questions
“You didn’t expect we’d believe this… did you?”

“I’m sorry dear, but if that’s the best talk we’ll be capable of even after millions of years of evolution, I think it’s best we not reproduce.”