Presentation is loading. Please wait.

Presentation is loading. Please wait.

Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu

Similar presentations


Presentation on theme: "Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu"— Presentation transcript:

1 Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery
Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu Carnegie Mellon University, LSI Corporation 2015 IEEE 21st International Symposium on High Performance Computing Architecture (HPCA 2015) Hello everyone, my name is Nicolas Wicki and today I will present to you the paper I prepared: Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery written by these five people in collaboration with Carnegie Mellon University and LSI Corporation. Carnegie Mellon is a university located in Pittsburgh, Pennsylvania and LSI is a company designing semiconductors and software for data storage based in San Jose, California. It was published at the HPCA 2015.

2 Why flash memory? Nicolas Wicki
First of all, why would we even want to research flash memory?

3 Devices using Flash Memory
Smartphone USB flash stick Solid State Drive Computer Devices using Flash Memory Flash memory is widely deployed. For example in USB flash sticks, solid state drives, smartphones, computers, etc. Considering that almost everyone in this room is using multiple mentioned devices, one can say that the importance of flash memory is assured and it might pay off to look at it in deeper detail. For that, I will give you some theoretical background to understand the technicalities surrounding this paper.

4 NAND Flash Memory Flash Memory Controller Device
We erase in blocks consisting of multiple pages. We program and read in pages. Block Page 0 Page 255 When we use flash memory, we normally have some device for example a USB stick or an SSD. This device consists of raw flash memory chips and a flash controller. Erasing from NAND flash memory is done in blocks and programming or reading in pages. When we want to write an already written page, we copy all data from it to an erased page and append our data. After programming we erase the old page. All of these operations are executed and managed by the flash controller who will use error correction codes to correct wrong data. 4-16 KB

5 NAND Flash Memory Flash Memory Controller Device
We erase in blocks consisting of multiple pages. We program and read in pages. Block 1010 10 10 1111 When we use flash memory, we normally have some device for example a USB stick or an SSD. This device consists of raw flash memory chips and a flash controller. Erasing from NAND flash memory is done in blocks and programming or reading in pages. When we want to write an already written page, we copy all data from it to an erased page and append our data. After programming we erase the old page. All of these operations are executed and managed by the flash controller who will use error correction codes to correct wrong data.

6 NAND Flash Memory Flash Memory Controller Device
We erase in blocks consisting of multiple pages. We program and read in pages. Block 1010 10 1010 When we use flash memory, we normally have some device for example a USB stick or an SSD. This device consists of raw flash memory chips and a flash controller. Erasing from NAND flash memory is done in blocks and programming or reading in pages. When we want to write an already written page, we copy all data from it to an erased page and append our data. After programming we erase the old page. All of these operations are executed and managed by the flash controller who will use error correction codes to correct wrong data.

7 NAND Flash Memory Flash Memory Controller Device
We erase in blocks consisting of multiple pages. We program and read in pages. After programming we erase old copied pages. Flash controller manages operations. ECC controller corrects data. Block 1111 ECC 1010 When we use flash memory, we normally have some device for example a USB stick or an SSD. This device consists of raw flash memory chips and a flash controller. Erasing from NAND flash memory is done in blocks and programming or reading in pages. When we want to write an already written page, we copy all data from it to an erased page and append our data. After programming we erase the old page. All of these operations are executed and managed by the flash controller who will use error correction codes to correct wrong data.

8 Substrate Flash Memory Cells Source Drain +10 V Control Gate
A cell stores charge as electrons in the floating gate. We program cells by applying a high positive voltage to our control gate. The positive charge attracts electrons through the tunnel oxide from the substrate. Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

9 Substrate Erasing Cells Source Drain -20 V Control Gate Floating Gate
We erase cells by applying a high negative voltage. Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

10 Trap Assisted Tunnelling
Source Drain Control Gate Repeated program/erase cycles trap electrons in the tunnel oxide. An electric field is created by the trapped charge. Charge from the floating gate leaks to the substrate Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

11 Substrate Charge De-Trapping Source Drain Control Gate Floating Gate
Trapped charge leaves tunnel oxide. Increases floating gate charge. Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

12 Substrate Charge De-Trapping Source Drain Control Gate Floating Gate
Trapped charge leaves tunnel oxide. Increases floating gate charge. Or decreases if de-trapping towards the substrate. Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

13 Substrate Charge De-Trapping Source Drain Control Gate Floating Gate +
Trapped charge leaves tunnel oxide. Increases floating gate charge. Or decreases if de-trapping towards the substrate. Leaves a positive charge that attracts charge from the floating gate. Gate Oxide Floating Gate + + Tunnel Oxide + Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

14 Substrate Charge De-Trapping Source Drain Control Gate Floating Gate
Trapped charge leaves tunnel oxide. Increases floating gate charge. Or decreases if de-trapping towards the substrate. Leaves a positive charge that attracts charge from the floating gate. Gate Oxide Floating Gate Tunnel Oxide Substrate Here, we have a flash memory cell which stores charge in the floating. If we want to program our cell, we apply a high positive voltage to our control gate which will attract electrons from the substrate. These electrons pass through the tunnel oxide with help of a tunneling effect to the floating gate.

15 Retention Loss 1 Cell Cell 1 Use charge as indicator for bit values.
1 Read Reference Voltage Use charge as indicator for bit values. Assign 0 to a high charge and 1 to a low charge. Read reference voltage separates differently charge cells. Charge leaks over time caused by trap assisted tunnelling or charge de-trapping. Changed values introduce retention errors. Cell Cell 1 over Time multiple P/E cycles To store bits in cells, we use charge as an indicator. For that we assign 0 to a cell with high charge and 1 to a cell with low charge. The read read reference voltage separates differently charged cells into 0 and 1 values. Over time and multiple program/erase cycles, cells leak charge and value interpretations change resulting in retention loss.

16 Threshold Voltage Distribution
Probability Density Function 1 Threshold Voltage Vref 1 Threshold Voltage Distribution Now, we look at threshold voltage distributions. Here, we have the relative quantity of cells in relation to the corresponding threshold voltage level. If we sample the voltage levels of flash cells, we will get a distribution where most values are near the mean resembling a Gaussian distribution. We define a low voltage as a bit value of 1 and a high voltage as a bit value of 0. We have the read reference voltage V ref to separate the two values.

17 Threshold Voltage Distribution in Multi Level Cell Flash Memory
1 Probability Density Function Threshold Voltage Vref-1 Vref-2 Vref-3 11 10 00 01 Threshold Voltage Distribution in Multi Level Cell Flash Memory If we now consider multi level cells, we can store 2 bits per cell resulting in 4 distributions, one per bit combination. To distinctively read the voltage levels, we need now 3 read reference voltages V ref 1 to 3 indicating the border between each level.

18 Read-Retry No Yes Read page from flash memory
ECC Adjust read reference voltage No Data corrected? Since we might fail reading data with a fixed read reference voltage, modern NAND flash memory uses a technique called read retry. This technique works as follows: We read the page from memory with a fixed read reference voltage. Then, we pass it to our error correction code controller. We control if the data is correct, but if the error correction was not successful we increase or decrease our read reference voltage and retry the error correction. Otherwise, we can forward the data. Forward Data Yes

19 Executive Summary Problem Goal Method Result
Density of flash memory rises and diminishes lifetime. Correcting errors increases read latency. Goal Deepen understanding of voltage threshold distributions of flash memory. Improve both lifetime and system performance. Recover non-correctable data. Method Retention Optimized Reading Improved Read-Retry Retention Failure Recovery Result Lifetime improvement by 64%. Read latency reduction by 70.4%. Raw bit error rate drop by 50%. Now, I will present to you a short summary about this paper. The problem is that the density of flash memory continuously rises which leads to diminishing lifetime. They are correcting error-prone data which constitutes performance loss. The goal of this paper is to deepen their understanding of voltage threshold distributions of flash memory. They also want to improve lifetime and system performance, and be able to recover non-correctable data. For that, they introduce two mechanisms: Retention Optimized Reading and Retention Failure Recovery. And from that they get a lifetime improvement by 64%, a performance improvement by 70.4% and a raw bit error rate drop by 50%.

20 Problem Multi level cell Lifetime Read Latency
Higher error rate due to smaller threshold windows. Lifetime Retention errors: Limit the time flash memory can be read from. May lead to loosing data. Read Latency Introduce overhead by error correction codes. Increase number of read-retries. So, what are the problems this paper attempts to solve? Flash memory lifetime is limited through retention errors. Since we are correcting errors, we get an overhead by using error correction codes and read-retries. Therefore, the overall memory read latency is negatively affected.

21 Goal Building a strong understanding, characterization, and analysis of threshold voltage distribution over retention age. Introduce a dynamic technique improving lifetime and read latency. Devise a new mechanism to recover non-correctable data. What goal do we pursue with this paper? First, we would like to build a strong understanding, characterization, and analysis of how the threshold voltage distribution of flash memory distorts over retention age. We want to introduce a dynamic technique which improves lifetime and performance of flash memory. And we want to devise a new additional mechanism to recover data that can otherwise not be corrected by error correction code.

22 FPGA-Based Flash Memory Testing Platform
Different amounts of program/erase cycles for multiple groups of flash memory. Data of retention ages ranging from 0 to 40 days. All experiments were conducted under room temperature (20°C). In the paper, they used this FPGA-based flash memory testing platform for their experiments. They used different amounts of program/erase cycles for multiple groups of flash memory and recorded data for retention ages from 0 to 40 days. All experiments were conducted under room temperature at 20°C. Source: Y. Cai et al., "FPGA-Based Solid-State Drive Prototyping Platform", FCCM 2011

23 Retention Optimized Reading
Nicolas Wicki Retention Optimized Reading Now, I will present to you the first mechanism devised in this paper called Retention Optimized Reading.

24 Threshold Voltage Distribution over Time
1 Probability Density Function 3 read reference voltages 4 states Erased P1 P2 P3 11 10 00 01 Vref-1 Vref-2 Vref-3 Threshold Voltage Distribution over Time First, I start by introducing their approach. This is our graph from before. Now, we give each of these states a name. We will call them Erased, P1, P2, and P3. Source: Slides adapted from Data Retention in MLC NAND Flash Memory… Yixin Luo

25 Threshold Voltage Distribution over Time
1 Probability Density Function Erased state distribution can be neglected. P1 P2 P3 Vref-2 Vref-3 10 00 01 Threshold Voltage Threshold Voltage Distribution over Time For further simplification, we will ignore the erased state, because retention loss has close to no effect on the read reference voltage between states Erased and P1. Over time the distributions change which leads to widening and shifts to the left of the distributions. Source: Slides adapted from Data Retention in MLC NAND Flash Memory… Yixin Luo

26 Threshold Voltage Distribution over Time
1 Probability Density Function Distribution shifts cause raw bit errors. P1 P2 P3 Vref-2 Vref-3 10 00 01 Threshold Voltage Raw Bit Errors Threshold Voltage Distribution over Time Now, threshold voltages are wrongly read to be in a different state which leads to raw bit errors. It is important to note that the left shift of the P3 distribution is more pronounced than the left shift of the P2 distribution. Also, the P1 distribution almost stays the same. The shift amount is related to the amount of charge stored in the cells where more charge leads to stronger leakage. Source: Slides adapted from Data Retention in MLC NAND Flash Memory… Yixin Luo

27 Threshold Voltage Distribution over Time
1 Probability Density Function Optimized read reference voltages minimize raw bit errors. P1 P2 P3 OPT2 Vref-2 OPT3 Vref-3 10 00 01 Threshold Voltage Minimal raw bit error Threshold Voltage Distribution over Time To counteract the retention loss, we adjust our read reference voltages V ref 2 and 3. These newly adjusted optimal read reference voltages will be referred to as OPT 2 and 3. This introduces minimal raw bit errors. Source: Slides adapted from Data Retention in MLC NAND Flash Memory… Yixin Luo

28 Retention Optimized Reading
read last page; use Vdefault use Vdefault; Vdefault++ #errors > record ECC; if #errors < record ECC; if #errors < record new Vdefault = Vrecord record = voltage, errors record = voltage, errors Now, I will explain to you how retention optimized reading works. We use the default read reference voltage to read the last page of a block. We check with our error correction code the number of errors. If we had less errors than before, which is always the case the first time we read, we record the number of errors and the corresponding voltage. Then we decrease our voltage and repeat this process. If we encounter an error increase, we start from our default voltage again, and analogously to before iteratively INCREASE the voltage. If we again encounter an error increase, we stop and store the now optimized voltage as our default voltage. Vdefault-- Vdefault++

29 Decrease threshold voltage
Improved Read-Retry OPT is from last page of a block and cells have lowest retention age.  smallest charge leakage Read page with OPT ECC Decrease threshold voltage No Data corrected? You already know the classic approach by read-retry. Now, we have some subtle differences. We use an optimized read reference voltage. We still send our data to the error correction code and check if the data is corrected. If not, we can now simply decrease the voltage instead of doing decrease and increase. This is because, if you remember, we chose the last page of a block as our OPT which will have the highest charge since it was programmed last. Then, we repeat until we get readable data and forward the data. Forward Data Yes

30 Raw Bit Error Rate to Program/Erase Cycles
ECC threshold OPT = optimized read reference voltage Raw Bit Error Rate to Program/Erase Cycles Here, we have the raw bit error rate in comparison to the program/erase cycles. We read 7-day old data with 0 to 7 day optimized read reference voltages. In this graph, we observe the increase of raw bit errors over the number of program/erase cycles. The first window called Stage-0 includes program erase cycles from 0 to about This is the point from where on our error correction code cannot correct the 7-day old data read by the 0-day OPT. Note, that the 0-day OPT is the same as a fixed read reference voltage and that in this windown ECC can correct all raw bit errors such that no read-retry is needed. Then, we transition to Stage-1 reaching from to p/e cycles where our 7-day optimized read reference voltage reaches its limitations. Stage-2 contains raw bit error rates not correctable by ECC and therefore out of reach for us. Source: Y. Cai et al., “Data retention in MLC NAND flash memory:… “ in IEEE 21st Int. Symp. HPCA, 2015

31 Evaluation Stage-0 Stage-1
Both provide 64% lifetime increase over baseline. 2.4% latency reduction compared to baseline 70.4% latency reduction compared to naïve read-retry P/E cycles x1000 Percent Points Now I will show you the results of this paper. We compared 3 techniques to read data from flash memory, the fixed read reference voltage, naive read-retry, and retention optimized reading. It showed that naïve read-retry and read optimized reading result in a lifetime improvement of 64% each compared to the approach using a fixed read reference voltage. When we consider performance, we look at the situation of our approach in Stage-0 and Stage-1. In Stage-0, we achieved a performance improvement of only 2.4% compared to the fixed read reference voltage. Remember, in Stage-0 all errors are correctable by our error correction code and no read-retries needed. In Stage-1, however we achieve a performance improvement by 70.4% over naïve read-retry. Stage-0 Stage-1 Nicolas Wicki

32 Evaluation We have a storage overhead of 768 KB out of 512 GB.  % overhead Execution overhead depends on program/erase cycles, retention age and amount of data written. Retention Age P/E Cycles Latency 1 day 8000 3 seconds 7 days 15 seconds 30 days 23 seconds This leads to a storage overhead of 768 KB out of 512 GB. When we look at the performance overhead, we have to differentiate between the number of p/e cycles, the retention age and the amount of data written. Here, we consider 1, 7 and 30 day old data each with 8000 p/e cycles and assuming the whole SSD is written. This will lead to pre-optimize latencies of 3, 15 and 23 seconds respectively. Note that we run the algorithm only when starting up the flash memory and not for every read. Assuming flash capacity is full (512 GB).

33 Retention Failure Recovery
Let me introduce the second mechanism: Retention Failure Recovery

34 Fast and Slow Leaking Cells
Fast Leaking Cells Separate cells into fast and slow leaking cells. Over the same time t fast leaking cells leak more charge than slow leaking cells. Threshold separating cells is the average threshold voltage shift. Time t Cell Cell Average Threshold Voltage Shift Slow Leaking Cells To explain the mechanism, we first need to classify fast and slow leaking cells. We record the threshold voltage shifts of our cells over a time t. Fast leaking cells loose more charge over that time than slow leaking cells. Every cell with a higher charge shift than our average threshold voltage shift counts as a fast leaking cell and every cell below as a slow leaking cell. Time t Cell Cell

35 Retention Failure Recovery
OPT P2 P3 Failed Data Backup Data Find Risky Cells 4 Cells correctly in P3 1 Cells correctly in P2 Find Fast & Slow Leaking Cells 3 Slow leaking cells from P2 wrongly in P3 2 Fast leaking cells from P3 wrongly in P2 Flip type 2 and 3 cells Let me now present to you the mechanism Retention Failure Recovery. From the states Erased to P3, we only consider misread cells between states P2 and P3 since the other voltage shifts are unremarkable. We first determine all data that was not correctable by our error correction code. We store a backup copy of the failed data. Then, we find all risky cells. Risky cells lie around the optimized read reference voltage. There are cells correctly read in the state P2, fast leaking cells read to be in P2, but should actually be in P3, slow leaking cells read to be in P3, but actually of state P2, and cells correctly read to be in P3. Since we found the risky cells, we need to identify the fast and slow leaking cells. For that, as already explained before, we observe the behavior over time and are then able to identify fast and slow leaking cells. Now, we know now which cells are of type 2 or 3 and flip them since the probability is high that they were wrongly read.

36 Evaluation Raw bit error rate is expected to drop by 50%.
Retention Failure Recovery Raw bit error rate is expected to drop by 50%. Evaluation How does this mechanism perform? Let me first explain the evaluation graph. Here, we have the raw bit error rate, compared to the number of program/erase cycles. We expect retention failure recovery to reduce the raw bit error rate by 50%. Source: Y. Cai et al., “Data retention in MLC NAND flash memory:… “ in IEEE 21st Int. Symp. HPCA, 2015

37 Executive Summary Problem Goal Method Result
Density of flash memory rises and diminishes lifetime. Correcting errors increases read latency. Goal Deepen understanding of voltage threshold distributions of flash memory. Improve both lifetime and system performance. Recover non-correctable data. Method Retention Optimized Reading Improved Read-Retry Retention Failure Recovery Result Lifetime improvement by 64%. Read latency reduction by 70.4%. Raw bit error rate drop by 50%. Let us do a little recap of what they have worked on during this paper. The problem is that the density of flash memory continuously rises and that this diminishes flash memory lifetime. And that we need to correct error-prone data which constitutes performance loss. Our goal was to deepen our understanding of voltage threshold distributions of flash memory. We improved lifetime and system performance, and are now able to recover before non-correctable data. For that, we introduced two mechanisms: Retention Optimized Reading and Retention Failure Recovery. And from that we got a maintained lifetime improvement of 64%, but a performance improvement for older data of 70.4% and an expected raw bit error rate drop by 50%.

38 Strengths Retention optimized reading enhances memory lifetime under low overhead. Retention failure recovery decreases raw bit error rate. Mechanisms complement each other, but can be implemented individually. We may adjust ECC capabilities to increase power efficiency. Paper Presents a simple and intuitive algorithm. Conducts research with high potential impact. Now, we will analyze this paper for a bit. First, I will talk about the strengths of this paper. It presents a simple and intuitive algorithm. Retention optimized reading enhances memory lifetime under low overhead costs. The other mechanism, retention failure recovery, improves error correction code capabilities. We can use both mechanisms simultaneously complementing each other or use them individually. Since raw bit errors are being reduced we don’t need error correction code capabilities to be as extensive. And I think this paper conducts research in a field with high potential impact.

39 Weaknesses How does temperature affect threshold voltage shifts?
How many flash memory devices were used? How does retention failure recovery affect storage overhead? The paper is hard to understand in detail and covers a lot of topics. Why was retention optimized reading not compared to adaptive voltage threshold?1 The paper has many similarities with previously published papers. Figure explanations are quite sparsely provided. I also need to mention some weaknesses that occurred to me. We conducted our experiments under room temperature, but how do increased temperatures affect threshold voltage shifts? I found the paper hard to understand in detail and it covers a wide variety of topics. In the introduction of the paper, they mentioned other mechanisms trying to solve the same problems as retention optimized reading. Why were they not compared during the evaluation? Also, I had a hard time understanding the figures since some information was omitted. 1Papandreou et al., ”Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND…”, Proceedings of the 24th edition of the great lakes symposium on VLSI, 2014

40 Key Takeaways Retention errors limit flash memory lifetime.
Read-retry increases read latency. We gained a clear understanding of threshold voltage distributions. Retention optimized reading improves lifetime and read latency. Retention failure recovery reduces errors. Now, let us consider some key takeaways. We know that retention errors drastically limit flash memory lifetime. Read-retry increases read latency. We should now understand how threshold voltage distributions behave over time and program/erase cycles. Retention optimized reading does improve lifetime and reduces read latency. Retention Failure Recovery reduces the amount of errors.

41 Open Discussion In what order should we assign our 2 bit values to our 4 states? They are often assigned this way: Erased - 11, P1 - 10, P2 - 00, P Because if the threshold voltage were to shift to the left we only get one bit error. OPT 00 11 I am at the end of my presentation and want to encourage you to ask me any questions you might have. If no questions remain, I will ask you some questions. In what order should we assign our 2 bit values to our 4 states? Is there any benefit from different arrangements? They are assigned as shown. Why is that? This is because we want to avoid off by two bit errors. This will happen if we place for example the values 00 and 11 next to each other. If the threshold voltage distribution shifts to the left, we will get faulty data leading to two bit errors at once.

42 Open Discussion How should we assign our 2 bit values to pages? 01 00
10 11 Cell Row Index LSB of the 217 cells MSB of the 217 cells Page 0 Page 2 1 Page 1 Page 4 2 Page 3 Page 6 127 Page 253 Page 255 Another question is, how we should assign our 2 bit values to pages. Are there any ideas? In MLC flash memory architecture, we assign a page to the most significant bit of a cell and another page to the least significant bit. In this table you see how a whole block of 255 pages are assigned to the LSB and MSB in MLC flash memory. Part of this arrangement is caused by the intention to avoid cell-to-cell interference which occurs when we program a page. As long as only one bit is written, the error rate is similar to single level cell. Source: Table adapted from Wang, Wei, et al. "Reducing MLC flash memory retention errors through programming initial step only.”, MSST 31st Symposium on. IEEE, 2015

43 Open Discussion We have seen that reducing the number of read-retries has a great impact on read latency. Can you think of yet another method to reduce the number of read-retries? My idea would be to use binary search implemented into our current read-retry mechanism.

44 Decrease/increase threshold voltage
Improved Read-Retry Read page with OPT ECC Decrease/increase threshold voltage by half No Data corrected? Forward Data Yes

45 Additional Papers Bez et al., “Introduction to Flash Memory”, 2003
Cai et al., “Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime”, 2012 Cai et al., “Error Analysis and Retention-Aware Error Management For NAND Flash Memory”, 2013 Cai et al., “Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis, and Modeling”, 2013 Papandreou et al., “Using Adaptive Read Voltage Thresholds to Enhance the Reliability of MLC NAND Flash Memory Systems”, 2014 Aslam et al., “Read and Write Voltage Signal Optimization for Multi-Level-Cell (MLC) NAND Flash Memory”, 2016 Coutet et al., “Influence of temperature of storage, write and read operations on multiple level cells NAND flash memories”, 2018 If you are particularly interested in my topic I can recommend some further papers covering similar topics from before this paper was published, but also following research in this field. A lot of these were referenced in my paper, so reading them is not too much an effort if you already understand what I presented.

46 Big Thanks to Giray & Mohammed for their support.
No, really, thanks.

47 Backup Slides Why is it called flash? Fujio Masuoka’s colleague Shoji Ariizumi suggested the word "flash" because the erasure process reminded him of the flash of a camera.

48 Flash Correct-and-Refresh
Choose a block to be refreshed Yes Read page Read page with fixed read reference voltage. Error correction informs about range of actual voltage threshold. Identify cells in a wrong state. Identify right shift errors and left shift errors. Left shift errors are caused by retention loss. Right shift errors are cause by cell-to-cell interference when programming other cells. Error correction Page number++ No Cell threshold voltage comparison Re-program in place Yes Last page in block? Until now, we used flash correct and refresh to extend the lifetime of our flash memory. At first, we start by choosing a block we want to refresh. Then we read the first page from that block with our fixed read reference voltage. By using error correction codes, we identify cells with faulty charge. From that we can deduct the threshold voltage the faulty cells should have. We classify the errors as left shift errors caused by retention errors or as right shift errors caused by programming interference. If the amount of right shift errors is below a certain threshold, we re-program the cells in place. Otherwise we re-map that page to a new block. If this was the last page, we choose a new block, otherwise we increment the page number and move on. At the time it was first implemented, they recorded a lifetime improvement by a factor of 46. Question to my TA’s: How come we can here increase the threshold voltage of single cells but not generally erased cells? Is it because we do not know which ones are erased and which ones are written? Maybe this slide is not really needed? # Right shift errors < threshold Re-map to the new block No Source: Figure adapted from Y. Cai et al., “Flash Correct-and-Refresh:…“, 2012.

49 Fast and Slow Leaking Cells
To understand retention failure recovery, we need to know how to characterize slow and fast leaking cells. For that we record threshold voltage shifts of cells over 40 days. In this graph, we have the average threshold voltage shift in the y axis and the number of days in the x axis. We only look at the P2 and the P3 state since the P1 state changes unremarkably.


Download ppt "Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu"

Similar presentations


Ads by Google