Tri-Level-Cell Phase Change Memory (PCM): Toward an Efficient and Reliable Memory System Nak Hee Seong Sungkap Yeo Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30332 nakhee.seong@gmail.com {sungkap, leehs}@gatech.edu Presented By: Anand Dhole Shalini Satre
Contents: PCM; Background and Motivation; Tri-Level-Cell (3LC) PCM; 3LC PCM in Practice; Evaluation; Conclusion
Phase Change Memory (PCM): A promising alternative memory technology. Each cell has two states: crystalline (SET) and amorphous (RESET). Multi-level-cell (MLC) PCM adds intermediate states to store more data per cell.
Single Level Cell (SLC) [1]: SET is the crystalline, low-resistivity state; RESET is the amorphous, high-resistivity state.
SLC vs MLC: Two storage levels (0, 1): 2LC or SLC = one bit per cell. Four storage levels (00_2, 01_2, 10_2, 11_2): 4LC = two bits per cell.
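SLC PCM [Figure: programming current pulses (i versus t) for SET and RESET, and the distribution of the number of cells for the two states, marked at 10^3 and 10^6, a difference of 10^3.]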
MLC PCM [Figure: programming current pulses (i versus t) and the distribution of the number of cells over storage levels 0 to 3 between SET and RESET, with the axis marked at 1k and 1M.]
Error Model: The critical problems are resistance drift and soft errors. Resistance drift: the resistance of a PCM cell increases over time. Soft errors: these are not permanent failures, and solutions exist to resolve them. For soft errors caused by resistance drift, the error rate is proportional to the initial resistance value and is negligible in SLC PCM. In MLC PCM, resistance drift occurs at the intermediate levels. The iterative-writing mechanism used for MLC degrades write latency: for 4LC, writes are 4x~8x slower than for SLC [1].
Resistance Drift [1] [Figure: distribution of the number of cells over storage levels 0 to 3, from SET to RESET, shown at T = 1, 2, 4, and 8 with the programmed boundaries and decision boundaries marked. The T = 8 panel is labeled "Drift-induced Soft Errors".]
Drifted Resistance Power Law Equation
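The slide names the power-law drift model without reproducing the equation. The form commonly used in the PCM literature is sketched here; R_0 (the resistance at reference time t_0) and t_0 are assumed notation, while α is the drift exponent that appears in the configuration tables of this deck.

```latex
R(t) = R_0 \left(\frac{t}{t_0}\right)^{\alpha}
\quad\Longleftrightarrow\quad
\log_{10} R(t) = \log_{10} R_0 + \alpha \,\log_{10}\!\frac{t}{t_0}
```

In the logarithmic form, a level with a larger α moves upward faster, which is consistent with the higher-resistance levels being the more error-prone ones.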
Proposed Solution: A tri-level-cell (3LC) PCM whose soft error rate matches that of DRAM, with performance comparable to SLC PCM.
Background and Motivation: Flash memory compared with PCM: switching a memory element requires more voltage and time; it degrades more rapidly; it is more susceptible to radiation. PCM compared with NAND flash: better read/write latency; significantly lower read/write energy. PCM advantages: higher information density; cheaper in mass production.
Background and Motivation (cont.): MLC PCM has many intermediate states between SET and RESET; e.g., 8LC PCM stores three bits per cell. Its soft error rate (SER) is higher than that of DRAM, and the SER increases over time along with resistance. Error correction methods: a time-aware error correction scheme and a scrub mechanism.
Background and Motivation (cont.): Time-aware error correction scheme [3]: uses extra cells to store predefined reference resistance values. On a read, the reference values are used to compensate for the resistance drift in the corresponding cells. This reduces the SER from 10^-3 ~ 10^-1 to 10^-4 ~ 10^-2.
Background and Motivation (cont.): Scrub mechanism [2]: removes 99.6% of uncorrectable errors, but the memory controller spends more time scrubbing. DRAM-style self refresh [3]: cells that hold correct information also get refreshed, which leads to higher chip-level power, a shorter lifespan due to frequent writes, and slower responsiveness.
3LC PCM: Each cell has three storage levels. The most error-prone state of 4LC PCM, i.e. the third storage level, is removed. Drift is proportional to resistance. Removing that level eliminates the errors generated by the third storage level as well as most of the errors generated by the second storage level.
3LC PCM ≠ three bits per cell. Binary system: two storage levels (0, 1), 2LC or SLC = one bit per cell; four storage levels (00_2, 01_2, 10_2, 11_2), 4LC = two bits per cell. Ternary system: three storage levels (0_3, 1_3, 2_3), 3LC = ~1.5 bits per cell, not three bits per cell.
3LC PCM [Figure: 4-level cell PCM versus tri-level cell PCM. Removing the most error-prone (unreliable) state leaves levels L0, L1, and L2, each shown with its programming current pulse (i versus t).]
Bandwidth Expanded 3LC PCM: Relaxing the programming range; reducing programming latency; increasing write bandwidth. [Figure: number of cells over levels L0, L1, and L2 between SET and RESET, with programming current pulses (i versus t).]
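Configuration variables of 4LC PCM (log10 R with mean µR and deviation σR; drift exponent α with mean µα and deviation σα)
Storage level   Data   µR    σR    µα      σα
0               01     3.0   1/6   0.001   0.4 x µα
1               11     4.0   1/6   0.02    0.4 x µα
2               10     5.0   1/6   0.06    0.4 x µα
3               00     6.0   1/6   0.10    0.4 x µα

Configuration variables of 3LC PCM
Storage level   µR    σR    µα      σα
0               3.0   1/6   0.001   0.4 x µα
1               4.0   1/6   0.02    0.4 x µα
2               6.0   1/6   0.10    0.4 x µα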
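To make the configuration values concrete, here is a minimal Python sketch that applies the power-law drift model to the mean values only (µR and µα per level). It assumes a reference time t0 of 1 second and ignores the σR and σα spreads, so it shows where the center of each level moves rather than an error rate. The names `LEVELS_4LC`, `LEVELS_3LC`, `drifted_log10_r`, and `mean_drift` are illustrative and do not come from the source.

```python
import math

# level -> (mean log10 R, mean drift exponent), copied from the slide's tables.
LEVELS_4LC = {0: (3.0, 0.001), 1: (4.0, 0.02), 2: (5.0, 0.06), 3: (6.0, 0.10)}
LEVELS_3LC = {0: (3.0, 0.001), 1: (4.0, 0.02), 2: (6.0, 0.10)}


def drifted_log10_r(log10_r0, alpha, t, t0=1.0):
    """Return log10 R(t) under R(t) = R0 * (t / t0) ** alpha.

    t0 = 1 s is an assumption; the slides do not state the reference time.
    """
    if t <= 0 or t0 <= 0:
        raise ValueError("t and t0 must be positive")
    return log10_r0 + alpha * math.log10(t / t0)


def mean_drift(levels, t):
    """Map each storage level to the drifted mean of log10 R at time t."""
    return {lvl: drifted_log10_r(mu_r, mu_a, t) for lvl, (mu_r, mu_a) in levels.items()}


if __name__ == "__main__":
    # Elapsed times used in the evaluation slide: 2^15, 2^20, 2^25 seconds.
    for t in (2**15, 2**20, 2**25):
        print(f"t = {t} s")
        print("  4LC:", {k: round(v, 3) for k, v in mean_drift(LEVELS_4LC, t).items()})
        print("  3LC:", {k: round(v, 3) for k, v in mean_drift(LEVELS_3LC, t).items()})
```

Under these assumptions the 4LC level at log10 R = 5.0 moves by roughly 0.45 decades after 2^25 seconds, close to half of its one-decade spacing from the next level, while the 3LC configuration has no level there at all.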
Efficient Conversion Method [1]: In theory, 11 binary bits (2048 states) fit in 7 ternary cells (2187 states), for ~94% utilization. The proposed approach maps 3 binary bits (8 states) to 2 ternary cells (9 states), for ~89% utilization. Notation: <3,2> conversion.
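The utilization figures can be checked with a few lines of Python. The sketch also packs a 3-bit value into two ternary digits using plain positional base-3 encoding. That encoding is a stand-in chosen for illustration and is not claimed to be the mapping used in [1]. The function names are hypothetical.

```python
def utilization(bits, cells):
    """Fraction of the 3**cells ternary states used by 2**bits binary values."""
    if 2**bits > 3**cells:
        raise ValueError("not enough ternary cells for this many bits")
    return 2**bits / 3**cells


def bits_per_cell(bits, cells):
    """Information density of a <bits, cells> conversion."""
    return bits / cells


def pack_3_to_2(value):
    """Encode a 3-bit value (0..7) as two ternary digits (high, low).

    Plain base-3 encoding, assumed for illustration only.
    """
    if not 0 <= value <= 7:
        raise ValueError("value must fit in 3 bits")
    return divmod(value, 3)


def unpack_2_to_3(cells):
    """Decode two ternary digits back to a 3-bit value."""
    high, low = cells
    if high not in (0, 1, 2) or low not in (0, 1, 2):
        raise ValueError("ternary digits must be 0, 1, or 2")
    value = high * 3 + low
    if value > 7:
        raise ValueError("ternary state 22 is unused in a <3,2> conversion")
    return value


if __name__ == "__main__":
    print(f"<11,7>: {utilization(11, 7):.1%} utilization, {bits_per_cell(11, 7):.3f} bits per cell")
    print(f"<3,2>:  {utilization(3, 2):.1%} utilization, {bits_per_cell(3, 2):.3f} bits per cell")
    for v in range(8):
        print(format(v, "03b"), "->", pack_3_to_2(v))
```

The <3,2> conversion gives up a little utilization compared with <11,7>, but it needs only an 8-entry lookup per pair of cells.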
Number Mapping Method [Figure: mapping between 3-bit binary values and 2-cell ternary values. Binary: 000; 001, 010, 100; 011, 101, 110; 111. Ternary: 00; 01, 10; 02, 11, 20; 12, 21; 22.]
ECC for Tri-Level-Cell PCM: Legacy ECC for binary can be used: a simple (72, 64) Hamming code. The memory controller requires minimal change. [Figure: a single-bit error shown in the binary representation and in the ternary representation.]
Drift Induced Error Rate: Elapsed Time (s) / 3LC PCM / BE-3LC PCM / BE-3LC PCM + (72,64) ECC. 2^15 (9 hours): (too small). 2^20 (12 days): 3.60E-16%. 2^25 (1 year): 1.28E-10%, 2.66E-15%.
Information Density [Figure: bits per cell versus number of correctable bits, for a data block size of 256 bits.]
Conclusion [1]: Results over 4LC PCM: 10^5 times lower soft error rate; 36.4% performance improvement. Results over SLC PCM: 1.33x higher information density.
References
[1] N. H. Seong, S. Yeo, and H.-H. S. Lee, "Tri-Level-Cell Phase Change Memory: Toward an Efficient and Reliable Memory System," ISCA'13.
[2] M. Awasthi, M. Shevgoor, K. Sudan, B. Rajendran, R. Balasubramonian, and V. Srinivasan, "Efficient Scrub Mechanisms for Error-Prone Emerging Memories," in Proceedings of the International Symposium on High Performance Computer Architecture, 2012.
[3] W. Xu and T. Zhang, "A time-aware fault tolerance scheme to improve reliability of multilevel phase-change memory in the presence of significant resistance drift," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 8, pp. 1357–1367, 2011.
[4] T. Nirschl, J. Phipp, T. Happ, G. Burr, B. Rajendran, M. Lee, A. Schrott, M. Yang, M. Breitwisch, C. Chen et al., "Write strategies for 2 and 4-bit multi-level phase-change memory," in IEEE International Electron Devices Meeting (IEDM), 2007, pp. 461–464.
[5] N. Papandreou, H. Pozidis, T. Mittelholzer, G. Close, M. Breitwisch, C. Lam, and E. Eleftheriou, "Drift-tolerant multilevel phase-change memory," in 3rd IEEE International Memory Workshop (IMW), 2011, pp. 1–4.
[6] R. Hamming, "Error detecting and error correcting codes," Bell System Technical Journal, vol. 29, no. 2, pp. 147–160, 1950.