Use ECP, not ECC, for hard failures in resistive memories


1 Use ECP, not ECC, for hard failures in resistive memories
S. Schechter, G. H. Loh, K. Strauss, and D. Burger
Presented at ISCA 2010

2 ECP: Error Correction Pointers
With equivalent space overhead, ECP provides a longer lifetime than ECC
… for byte-addressable NVM (such as PCM)
… where all bit errors are wear-induced hard errors
… assuming hard errors are detected immediately after a write

3 ECP scheme (Fig. 1)
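
As a rough illustration of the idea behind the scheme, the sketch below models one row in which each correction entry holds a pointer to a failed cell plus a replacement bit. The class name EcpRow, its fields, and the stuck-cell fault model are hypothetical and chosen only for this example; failures of the correction entries themselves are not modelled.

```python
class EcpRow:
    """Hypothetical model of one memory row protected by n correction entries."""

    def __init__(self, width, max_entries):
        self.cells = [0] * width        # physical data cells
        self.stuck = {}                 # fault model: position -> stuck value
        self.entries = []               # each entry: [pointer, replacement_bit]
        self.max_entries = max_entries

    def write(self, data):
        """Write data, verify it, and allocate entries for newly found faults.

        Returns False when the row has more faults than correction entries.
        """
        for i, bit in enumerate(data):
            # A stuck cell ignores the write and keeps its stuck value.
            self.cells[i] = self.stuck.get(i, bit)
        for entry in self.entries:
            # Replacement bits carry the data for already-known failed cells.
            entry[1] = data[entry[0]]
        covered = {pointer for pointer, _ in self.entries}
        for i, bit in enumerate(data):
            # Read-after-write verification detects a hard error immediately.
            if self.cells[i] != bit and i not in covered:
                if len(self.entries) == self.max_entries:
                    return False
                self.entries.append([i, bit])
                covered.add(i)
        return True

    def read(self):
        """Read the cells and patch in the replacement bit of every entry."""
        out = list(self.cells)
        for pointer, replacement in self.entries:
            out[pointer] = replacement
        return out
```

For example, a row with a cell stuck at 0 still reads back all ones after `write([1] * 8)`, at the cost of one correction entry.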

4 Evaluation setup
Schemes compared
  SEC: widely used ECC in DRAM
  Pairing: pairs partially failed pages
  Wilkerson: similar to ECP
  Perfect code: theoretical limit on ECC
Experiment environment
  System description
    Used a simple in-house simulator with inter-row wear-leveling
    Memory is organized as 512-bit rows
    Measured the number of surviving pages in the system against the number of writes
    Pages failing over time exacerbate wear on the surviving pages
  Cell lifetime modelling
    Each cell is assigned a random lifetime drawn from a normal distribution with mean 10^8 writes and variance 0.25
  Synthetic workload pattern
    The actual write region is narrower than the 512-bit row
    Each bit within the write region has a 50% chance of changing
    For ECC-based protection, any bit change in the data gives each ECC parity bit a 50% chance of changing
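
The cell lifetime model above can be sketched as a small Monte Carlo experiment. This is not the simulator used in the paper: the function names are hypothetical, the 0.25 figure is assumed here to be a spread relative to the mean (standard deviation = 0.25 × mean), only changed bits are assumed to wear a cell, and wear-leveling and failures of the correction cells are ignored.

```python
import random

ROW_BITS = 512          # row width from the slide
MEAN_LIFETIME = 1e8     # mean cell lifetime in writes, from the slide
RELATIVE_SPREAD = 0.25  # ASSUMPTION: treated as std dev / mean


def sample_row(rng):
    """Draw a lifetime for every cell in one row, sorted ascending."""
    return sorted(
        rng.gauss(MEAN_LIFETIME, RELATIVE_SPREAD * MEAN_LIFETIME)
        for _ in range(ROW_BITS)
    )


def row_lifetime(sorted_lifetimes, n_corrections):
    """Row writes survived when n_corrections failed cells can be replaced.

    The row dies at its (n_corrections + 1)-th cell failure. Each row write
    changes a given bit with 50% probability, so a cell is assumed to wear
    at half the row write rate.
    """
    first_uncorrectable = max(sorted_lifetimes[n_corrections], 0.0)
    return 2 * first_uncorrectable


if __name__ == "__main__":
    lifetimes = sample_row(random.Random(0))
    for n in (0, 1, 6):
        print(n, row_lifetime(lifetimes, n))
```

Under these simplifications, a row with more correction entries can never die earlier than the same row with fewer, because its lifetime is set by a later order statistic of the same sample.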

5 Evaluation results (Fig. 3 and 4)

6 Other issues addressed in this work
Intra-row wear-leveling
Comparison to optimal ECP
Layering ECP
Optimizing memory for correction
Discussion
  Hardware implementation
  Orthogonality and composability
  Transient errors

7 Our takeaways (1)
The problems and solutions associated with hard errors in this work are not directly applicable to our domain because…
  No in-place update is allowed
  ECC does not exacerbate wear
  Bit randomization is used as intra-row wear-leveling
  Transient errors are still a big issue in flash
  The exact location of a stuck-at fault cell cannot be determined
We do not have an adequate solution for dealing with hard errors at the moment
  Program/erase reports “fail” whenever a stuck-at fault is detected
  … and then the FTL maps out the entire block
  ECC remains inefficient in dealing with hard errors

8 Our takeaways (2)
What is the space efficiency of data protection in our domain?
For 24-bit protection of 8192-bit data:
  ECC (our current BCH engine): 336 bits of parity (42 B)
  ECP: 337 bits of parity (43 B)
  Perfect coding (theoretical bound): 234 bits of parity (30 B)
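
The three figures above can be reproduced with a short calculation. The sketch below assumes the usual accounting: an ECP entry costs one pointer plus one replacement bit, with one extra "full" bit per row; a binary BCH code correcting t errors needs at most m·t parity bits over GF(2^m); and the perfect-code figure is the Hamming (sphere-packing) bound. The function names are illustrative only.

```python
from math import ceil, comb


def ecp_overhead_bits(data_bits, n_entries):
    """n entries of (pointer + replacement bit), plus one 'full' bit."""
    pointer_bits = (data_bits - 1).bit_length()   # ceil(log2(data_bits))
    return n_entries * (pointer_bits + 1) + 1


def bch_parity_bits(data_bits, t):
    """Upper bound m*t on parity for a t-error-correcting binary BCH code."""
    m = 1
    # The (possibly shortened) codeword must fit in length 2^m - 1.
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m * t


def hamming_bound_bits(data_bits, t):
    """Smallest r with 2^r >= number of error patterns of weight <= t."""
    r = 0
    while (1 << r) < sum(comb(data_bits + r, i) for i in range(t + 1)):
        r += 1
    return r


if __name__ == "__main__":
    for name, bits in (
        ("BCH", bch_parity_bits(8192, 24)),
        ("ECP", ecp_overhead_bits(8192, 24)),
        ("Perfect code", hamming_bound_bits(8192, 24)),
    ):
        print(f"{name}: {bits} bits ({ceil(bits / 8)} B)")
```

With 8192 data bits and 24 correctable errors this gives 336, 337, and 234 bits, which round up to 42 B, 43 B, and 30 B.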

9 Our takeaways (3)
Soft error vs. hard error
Effective capacity vs. over-provisioning
Wear-leveling in the presence of replaceable components

10 Wilkerson’s scheme (Fig. 2)

11 Intra-row wear-leveling (Fig. 5)

12 Intra-row wear-leveling (Fig. 6)

13 Optimal ECP (Table 3)

14 Layered ECP (Fig. 7)

15 Layered ECP (Table 4)

16 Hardware implementation (Fig. 9)

