Due to the economic downturn, Microsoft Research has eliminated all funding for title slides. We sincerely apologize for any impact these austerity measures may have on this presentation.


1 Due to the economic downturn, Microsoft Research has eliminated all funding for title slides. We sincerely apologize for any impact these austerity measures may have on this presentation.

2 The End of Charge-Based Memory. Figure: charge-based DRAM cell.

3 Use ECP, not ECC, for Hard Failures in Resistive Memories. Stuart Schechter, Gabriel Loh, Karin Strauss, Doug Burger, and introducing (in the ape suit). Figure: resistive memory cell with metal contacts to the bit line and to the sensor line.

4 Figure: resistive memory cell with metal contacts to the bit line and to the sensor line.

5 Phase-Change Memory (PCM). Figure: phase-change memory cell with metal contacts to the bit line and to the sensor line.

6 Phase-Change Memory (PCM). Figure: phase-change memory cell: metal (to bit line), heating element, metal (to sensor line).

7 Phase-Change Memory (PCM). Figure: phase-change memory cell: metal (to bit line), heating element, chalcogenide (phase-change material), metal (to sensor line).

8 Phase-Change Memory (PCM). Figure: phase-change memory cell: metal (to bit line), heating element, chalcogenide (phase-change material), metal (to sensor line).

9 Phase-Change Memory vs. DRAM
   - Nonvolatile
   - Scalability not impacted by capacitance limits
   - Cells may be written individually
   - Slower, with more energy-intensive writes: 2x slower for reads, 10x slower for writes

10 Hard Failures in Resistive Memories
   - The heating element loses resistivity, or
   - Expansion/contraction causes detachment
   - Mean expected lifetime: 10^8 writes (but varies)
   Caption: “After 10^8 writes together, we’re not connecting like we used to. I just don’t feel the heat anymore.”

11 Phase-Change Memory vs. DRAM
   DRAM cells encounter soft (transient) errors
   - May occur between write and future read
   PCM cells encounter hard (permanent) failures
   - Occur at write time (detectable by verifying read)
   - Increase in frequency over product lifetime

12 Living with Hard Failures in Memory Cells
   Assume we’re already doing these things:
   1. Accept failure of some fraction of pages
      - Map failed pages out of logical memory
   2. Wear-level data pages/blocks, and within blocks
      - Shift/rotate data randomly (intervals/locations)
   3. Differential writes
      - Write only cells with values that change
   4. Correct errors when possible
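
The differential-write idea in item 3 can be illustrated with a minimal Python sketch. It assumes a block is held as an integer bit vector; the function name `differential_write` and the 8-bit example values are hypothetical and do not come from the talk.

```python
def differential_write(old, new, width=512):
    """Return (mask, count): which cells differ and so must be written."""
    mask = (old ^ new) & ((1 << width) - 1)  # 1 where the stored bit changes
    return mask, bin(mask).count("1")


# Hypothetical 8-bit example: only the two cells that change are written.
old_block = 0b10110010
new_block = 0b10010110
mask, cells_written = differential_write(old_block, new_block, width=8)
print(f"{mask:08b}", cells_written)  # 00100100 2
```

Cells whose value does not change are never touched, so they accumulate no wear from this write.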

13 Error Correction Schemes. Schemes compared: No correction, Pairing 8, SEC 64, ECP 6, Wilkerson 4, Perfect_Code 9.

14 Error Correction Schemes. A page must be retired when the first cell within the page fails.

15 Error Correction Schemes: SEC/SECDED. 8 chips, 8 bits/chip. Each 64 data bits carry 7/8 check bits, for 10.9%/12.5% overhead. We use this 12.5% overhead limit for all schemes in our study.
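
The 7-bit and 8-bit figures follow from the standard Hamming condition 2^r >= m + r + 1 for m data bits and r check bits, with SECDED adding one overall parity bit. A small sketch (the function name is hypothetical) reproduces the overhead numbers on this slide:

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


data_bits = 64
sec = sec_check_bits(data_bits)  # Hamming SEC check bits
secded = sec + 1                 # one extra overall parity bit for SECDED
print(sec, f"{100 * sec / data_bits:.1f}%")        # 7 10.9%
print(secded, f"{100 * secded / data_bits:.1f}%")  # 8 12.5%
```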

16 Error Correction Schemes: SEC/SECDED. 8 chips, 8 bits/chip. Each 64 data bits carry 7/8 check bits, for 10.9%/12.5% overhead. A page must be retired when a block within the page suffers a second error.

17 Error Correction Schemes: Error Correcting Pointers. 8 chips, 8 bits/chip, 64 bits, plus a correction chip.

18 Error Correction Schemes: Error Correcting Pointers

19 Error Correction Schemes: Error Correcting Pointers

20 Error Correction Schemes: ECP 1. Figure: a row of 512 data cells (numbered 511 down to 0) with one correction entry, made up of a 9-bit correction pointer (bits 8 down to 0) and a replacement cell (R), plus a “Full?” bit.

21 Error Correction Schemes. Figure: the row of data cells with six correction entries (numbered 5 down to 0) and a “Full?” bit.

22 Error Correction Schemes. Figure: the row of data cells with two correction entries in use, each holding a pointer and a replacement cell (R).

23 Error Correction Schemes. A page must be retired when a row within the page suffers more errors than it has correction entries.*

24 Error Correction Schemes. *What if a correction entry fails? There’s a precedence rule for that.

25 Error Correction Schemes. Figure: data cells and correction entries, with two correction entries in use, illustrating the precedence rule.

26 Error Correction Schemes. Figure: data cells and correction entries, with two correction entries in use, illustrating the precedence rule.

27 Error Correction Schemes. A page must be retired when a row within the page suffers more errors than it has correction entries.*
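
The correction-pointer mechanism on the preceding slides can be sketched roughly as follows. This is an illustrative model, not the authors’ implementation: the class `EcpRow`, the `stuck` fault map, and the write-then-verify loop are all assumptions. It models failed data cells only, and precedence appears solely as the rule that entries are applied in order, so a later entry overrides an earlier one. The overhead formula (one pointer plus one replacement cell per entry, plus a single “Full?” bit) is read off the figure on slide 20.

```python
class EcpRow:
    """One row of data cells protected by a fixed number of correction entries."""

    def __init__(self, data_bits=512, entries=6):
        self.data_bits = data_bits
        self.max_entries = entries
        self.cells = [0] * data_bits  # physical data cells
        self.stuck = {}               # assumed fault model: cell index -> stuck value
        self.entries = []             # active entries: [pointer, replacement bit]

    def overhead_bits(self):
        pointer_bits = (self.data_bits - 1).bit_length()  # 9 bits for 512 cells
        return self.max_entries * (pointer_bits + 1) + 1  # +1 for the "Full?" bit

    def read(self):
        bits = list(self.cells)
        for pointer, replacement in self.entries:  # later entries take precedence
            bits[pointer] = replacement
        return bits

    def write(self, bits):
        """Write a row; return False once it has more failures than entries."""
        for i, bit in enumerate(bits):
            self.cells[i] = self.stuck.get(i, bit)  # a failed cell keeps its stuck value
        for entry in self.entries:                  # refresh existing replacement cells
            entry[1] = bits[entry[0]]
        for i, (got, want) in enumerate(zip(self.read(), bits)):  # verifying read
            if got != want:
                if len(self.entries) == self.max_entries:
                    return False                    # row (and its page) must be retired
                self.entries.append([i, want])      # allocate a new correction entry
        return True


row = EcpRow()
print(row.overhead_bits(), f"{100 * row.overhead_bits() / row.data_bits:.1f}%")  # 61 11.9%
```

Under these assumptions, six entries over a 512-bit row cost 61 bits, which stays within the 12.5% overhead limit stated on slide 15.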

28 Error Correction Schemes. Wilkerson, Gao, Alameldeen, Chishti, Khellah, & Lu, ISCA 2008. For fixing errors induced by running SRAM caches at low voltages.

29 Error Correction Schemes. Figure: data cells, correction entries, and SEC bits.

30 Error Correction Schemes. A page must be retired when a row within the page suffers more errors than it has correction entries.*

31 Error Correction Schemes. Ipek, Condit, Nightingale, Burger, & Moscibroda, ASPLOS 2009. 8 chips, 8 bits/chip. 8 parity bits per 64 bits, for 12.5% overhead.

32 Error Correction Schemes. Figure: a page and its paired page, each drawn as bytes 0 to 7 with one parity bit per byte.
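
A rough sketch of the pairing idea follows. It assumes, as an interpretation of the figure rather than something stated on the slide, that the per-byte parity bit flags a faulty byte and that two pages can be paired when no byte position is faulty in both. All names and the example data are hypothetical.

```python
def compatible(faulty_a, faulty_b):
    """Two pages can be paired if no byte position is faulty in both."""
    return faulty_a.isdisjoint(faulty_b)


def read_byte(index, page_a, faulty_a, page_b):
    """Read from page A unless its copy of this byte is flagged as faulty."""
    return page_b[index] if index in faulty_a else page_a[index]


# Hypothetical 8-byte pages holding the same data; page A has two bad bytes.
data = bytes([0x04, 0x36, 0xC0, 0x04, 0x05, 0x04, 0xFF, 0x00])
page_a, page_b = bytearray(data), bytearray(data)
faulty_a, faulty_b = {1, 6}, {3}
page_a[1], page_a[6] = 0x00, 0x00  # simulate the corrupted bytes in page A

recovered = bytes(read_byte(i, page_a, faulty_a, page_b) for i in range(8))
print(compatible(faulty_a, faulty_b), recovered == data)  # True True
```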

33 Error Correction Schemes. A page must be retired when 160 errors occur within the page (an average of 5 errors per 1024 bits).

34 Error Correction Schemes: multiple error correction
   - 8 chips, 8 bits/chip, SEC/SECDED
   - 8 accesses x 64 bits = 512-bit block
   - 8 transfers x 8 bits = 64 bits
   - 576 bits of storage
   - 512 data bits
   - At most 9 corrections possible (Hamming bound)
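
The “at most 9 corrections” figure can be checked against the Hamming (sphere-packing) bound: a code with n total bits and k data bits can correct t errors only if the number of error patterns of weight at most t does not exceed 2^(n-k). A short sketch, with a hypothetical function name:

```python
from math import comb


def max_correctable(total_bits, data_bits):
    """Largest t with sum(C(n, i) for i in 0..t) <= 2**(n - k)."""
    check_bits = total_bits - data_bits
    t = 0
    # Keep raising t while patterns of up to t + 1 errors still fit the bound.
    while sum(comb(total_bits, i) for i in range(t + 2)) <= 2 ** check_bits:
        t += 1
    return t


print(max_correctable(576, 512))  # 9
print(max_correctable(72, 64))    # 1
```

The bound is only an upper limit on what any code could achieve with 64 check bits; the second call shows the same calculation giving a single correction for the 72-bit SECDED word.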

35 Error Correction Schemes. A page must be retired when a 576-bit block (holding 512 data bits) suffers its 10th error.

36 Experimental Parameters
   - 4 KByte (32 Kbit) page size
   - 64 Byte (512 bit) row size
   - 1 rank, 8 chips per rank, x8 bit lines per chip
   - 10^8 mean writes until memory cell failure
   - 0.25 coefficient of variance for cell lifetimes

37 Results (plot; coefficient of variance = 0.25)

38 Results (plot; coefficient of variance = 0.25)

39 Results (plot; coefficient of variance = 0.25). Half capacity lost on first error (when pages are paired).

40 Results (plot; coefficient of variance = 0.25)

41 Results (plot; coefficient of variance = 0.25)

42 Results (plot; coefficient of variance = 0.25)

43 Results (plot; coefficient of variance = 0.25)

44 Results: Write Modification Width. Figure: a 512-bit block (16 x 32-bit words). The entire 512-bit region is modified (each bit flips with p = 0.5).

45 Error Correction Schemes: A Review of Hamming Codes. Flipping just one data bit changes, on average, half of the Hamming code bits. Figure: bit positions 1 to 16.
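
The “half of the Hamming code bits” claim can be checked with a short sketch. It assumes the textbook Hamming layout, in which check bits sit at power-of-two positions and the check bit at position 2^k covers every position whose index has bit k set; the function names are hypothetical.

```python
def data_positions(data_bits):
    """Codeword positions (1-based) that hold data, skipping powers of two."""
    positions, p = [], 1
    while len(positions) < data_bits:
        if p & (p - 1):  # nonzero means p is not a power of two: a data position
            positions.append(p)
        p += 1
    return positions


def average_check_bits_flipped(data_bits):
    """Mean number of check bits that change when one data bit flips."""
    positions = data_positions(data_bits)
    # A data bit at position p is covered by one check bit per set bit of p.
    return sum(bin(p).count("1") for p in positions) / data_bits


avg = average_check_bits_flipped(64)
print(avg, round(avg / 7, 3))  # 3.203125 0.458
```

For 64 data bits and 7 check bits, a single data-bit flip changes a little under half of the check bits on average, so the check bits see far more writes than any individual data bit.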

46 Results: Write Modification Width. Figure: a 512-bit block (16 x 32-bit words) with a 256-bit modification and a 128-bit modification marked.

47 Write Modification Width of 512 Bits (plot; coefficient of variance = 0.25)

48 Write Modification Width of 256 Bits (plot; coefficient of variance = 0.25)

49 Write Modification Width of 128 Bits (plot; coefficient of variance = 0.25)

50 Fair Treatment of Perfect Codes. Perfect codes would always win if you leveled the wear of the correction bits and data bits.

51 Write Modification Width of 128 Bits, Internal Wear Leveling (plot; coefficient of variance = 0.25)

52 Write Modification Width of 256 Bits, Internal Wear Leveling (plot; coefficient of variance = 0.25)

53 Write Modification Width of 512 Bits, Internal Wear Leveling (plot; coefficient of variance = 0.25)

54 Conclusion
   - Correction pointers >> Pairing
   - Correction pointers >> SEC over small blocks
     - Precedence rules >> SEC over correction entries
   - Correction pointers >= MEC over large blocks
     - Lower computational cost
   - For memory with both hard and transient errors
     - Use ECP below, on chip (for hard errors)
     - Use SEC above, off chip (for soft errors)

55 Backup Slide for Responding to Questions. You didn’t expect we’d believe this… did you?

56 ? I’m sorry, dear, but if that’s the best talk we’ll be capable of even after millions of years of evolution, I think it’s best we not reproduce.

