Embedded System Lab. Daeyeon Son Understanding the robustness of SSDs under power fault Mai Zheng*, Joseph Tucek**, Feng Qin*, Mark Lillibridge** The Ohio.

Presentation transcript:

Understanding the Robustness of SSDs under Power Fault
Mai Zheng*, Joseph Tucek**, Feng Qin*, Mark Lillibridge**
The Ohio State University*, HP Labs**
Presented by Daeyeon Son, Embedded System Lab.

Introduction
Power failures occur in many computing environments. A computer and its disks depend on electric power to operate. In an SSD, data is stored as charge (electrons) in the floating gate of each flash cell, which sets the cell's threshold voltage. If power is lost while data is being transferred to a flash cell, the reliability of that data is not guaranteed. This paper examines the state of data after power failure on a variety of commodity SSDs.

Power fault: About the power fault
- Power faults have many different causes.
- During a blackout, if a datacenter has no uninterruptible power supply (UPS), its computers shut down abruptly.
- Data on disk can be left in an unstable state or lost.
Ref. Southern California Edison

Power fault: Architecture of a solid-state drive
- A solid-state drive runs firmware that controls the device's operation.
- The SSD controller sits at the center of the drive and handles data transfer.
- The SSD controller stores host data in flash memory.
- The FTL (Flash Translation Layer) is implemented in the SSD firmware.
- User-level programs cannot observe the device-level translation layer.

Power fault: Steps of data transfer in SSDs
- A power fault can occur at any step of data processing inside an SSD.
- However, the exact point of failure cannot be determined without knowing the device's data-flow design. Only an abstract view of the data flow in the SSD is available.
[Figure: timeline showing map table updates in DRAM, map table sync from DRAM to NAND, the next sync, the power-loss point, a "roll back up to this point" marker, and an aggressive roll-back point.]
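The timeline can be illustrated with a toy model. The sketch below is not taken from any real firmware: the class `MappingTableSim` and its methods are hypothetical, and it assumes the simplest reading of the figure, namely that mapping updates made in DRAM after the last sync are lost on power loss, so the drive rolls back to the last synced table.

```python
class MappingTableSim:
    """Toy model (assumption): logical-to-physical map kept in DRAM, synced to NAND."""

    def __init__(self):
        self.dram_map = {}  # volatile working copy, lost on power fault
        self.nand_map = {}  # persistent copy as of the last sync

    def update(self, lbn, ppn):
        # A host write changes only the DRAM copy of the map.
        self.dram_map[lbn] = ppn

    def sync(self):
        # Periodic map table sync from DRAM to NAND.
        self.nand_map = dict(self.dram_map)

    def power_loss(self):
        # DRAM content vanishes; on restart the drive reloads the persisted map,
        # which is equivalent to rolling back to the last sync point.
        self.dram_map = dict(self.nand_map)


sim = MappingTableSim()
sim.update(1, 100)
sim.sync()
sim.update(1, 200)  # not yet synced
sim.update(2, 300)  # not yet synced
sim.power_loss()
print(sim.dram_map)
```

In this model, both updates issued after the sync disappear, even though the host may already have been told that the writes completed.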

Power fault: SSDs are not 'Evangelion', but they similarly have a battery (supercapacitor).
- When the computer's power supply is suddenly cut, the SSD uses this backup power to flush data from the cache to the flash cells.
- The emergency transfer window is very short and is used only to flush data and apply it to the FTL.
- If the data unfortunately still does not get written to the SSD, the system administrator will be crying at their desk.
SSDs can keep transferring data for a short time after a power fault. Evangelion can keep moving for 5 minutes after a power fault.

Symptom of power fault in SSDs
The authors hypothesize the types of errors that a power fault can cause in SSDs.
- If data in the device is not protected, the result is a synchronization problem.
1. Data is not written at all because of the power loss.
2. Metadata, such as the mapping table, is corrupted.
3. The charge in the floating gate does not reach its target value.
4. The write does not complete because the cell behaves like a sponge.
5. The power supply fails because of a hardware problem.
How can a power fault be observed in real time?

Symptom of power fault in SSDs: Detail Explanation (1)

Symptom of power fault in SSDs: Detail Explanation (2)

Symptom of power fault in SSDs: Unserializability
- Identifying the 'unserializability' type requires two or more records.
- An unserialized write violates the ordering constraint of synchronous write requests issued by one thread or across multiple threads.
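A minimal sketch of a single-thread check follows. It is an illustration, not the paper's checker: the function name `find_unserialized` and both input shapes are assumptions. It assumes writes are synchronous, so each write completes before the next is issued, and that `op_cnt` increases in issue order. A write is flagged when it is missing from the disk although a later write from the same worker survived.

```python
def find_unserialized(issued, on_disk):
    """Return op counts of writes lost although a later write survived.

    issued:  list of (op_cnt, lbn) in issue order for one worker.
    on_disk: dict lbn -> op_cnt of the valid record found there after
             recovery (missing key or None means no valid record).
    """
    # Writes whose record is actually present at its target block.
    survived = [op for op, lbn in issued if on_disk.get(lbn) == op]
    if not survived:
        return []
    newest = max(survived)

    # Expected owner of each block if all writes up to `newest` had landed.
    expected = {}
    for op, lbn in issued:
        if op <= newest:
            expected[lbn] = op

    return sorted(op for lbn, op in expected.items() if on_disk.get(lbn) != op)


issued = [(0, 10), (1, 11), (2, 12), (3, 10)]
on_disk = {10: 0, 11: None, 12: 2}
print(find_unserialized(issued, on_disk))  # write 1 is missing, yet write 2 survived
```

Note that write 3 is not flagged in this example: it was the last request issued, so losing it is consistent with the power being cut before it completed.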

Test Workload: Workload Format
- The workload header is 512 bytes long, and every record is laid out as a contiguous sequence of headers. (512 bytes is the programming unit in some SSDs.)
- Seed is the input to the random number generator that produces the raw block number.
- Raw block number is the original 64-bit random number.
- Block number is the LBN (Logical Block Number), which can be checked at user level. (Workload size = 512 bytes * N)
- Worker ID is the thread ID (tid) of the thread issuing the I/O when multiple threads are used.
- Operation count is the number of writes issued by that worker.

Test Workload: Workload Format
- Checksum is a CRC (Cyclic Redundancy Check) used to verify that the original data is intact.
- Timestamp records the time at which the data was written.
- Marker is a unique string used to locate the boundary of a header.
- However, the standard random number generation algorithm depends on time, and time is not stable.
- Instead, the workload uses a hash function in counter mode to generate a unique sequence, so the order of writes can be traced exactly. This is very important.
Seed: 3, 8, 2, 4, 2, 5, 7, ………, 9, 1
Worker ID: 0, 0, 1, 1, 1, 2, 2, ………, 0, 2
Op_cnt: 0, 1, 0, 1, 2, 0, 1, ………, 9, 7
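The record format can be sketched as follows. This is a hedged illustration only. The field order, field widths, the marker value `SSDTEST!`, the choice of SHA-256 as the hash and of CRC-32 as the checksum are all assumptions, not the layout used by the authors. The point it shows is that hashing (seed, worker ID, operation count) as a counter makes every record reproducible without relying on time.

```python
import hashlib
import struct
import zlib

RECORD_SIZE = 512            # one record per 512-byte unit
MARKER = b"SSDTEST!"         # hypothetical 8-byte boundary marker
# marker, seed, raw block, block, worker id, op count, timestamp
HEADER_FMT = "<8sQQQIIQ"


def raw_block_number(seed, worker_id, op_cnt):
    """Hash in counter mode: same inputs always give the same 64-bit number."""
    counter = struct.pack("<QII", seed, worker_id, op_cnt)
    return int.from_bytes(hashlib.sha256(counter).digest()[:8], "little")


def build_record(seed, worker_id, op_cnt, num_blocks, timestamp):
    raw = raw_block_number(seed, worker_id, op_cnt)
    block = raw % num_blocks  # logical block number visible at user level
    header = struct.pack(HEADER_FMT, MARKER, seed, raw, block,
                         worker_id, op_cnt, timestamp)
    body = header.ljust(RECORD_SIZE - 4, b"\0")
    # CRC-32 over everything except the trailing checksum field.
    return body + struct.pack("<I", zlib.crc32(body))


def verify_record(record):
    """Check size, checksum, marker, and that the raw number is reproducible."""
    if len(record) != RECORD_SIZE:
        return False
    body = record[:-4]
    (crc,) = struct.unpack("<I", record[-4:])
    if zlib.crc32(body) != crc:
        return False
    marker, seed, raw, _block, worker_id, op_cnt, _ts = struct.unpack_from(
        HEADER_FMT, body)
    return marker == MARKER and raw == raw_block_number(seed, worker_id, op_cnt)


record = build_record(seed=3, worker_id=0, op_cnt=0, num_blocks=1000, timestamp=0)
print(len(record), verify_record(record))
```

Because the sequence is a pure function of its inputs, a checker can regenerate what each worker should have written after a fault and compare it with what is on the device.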

Test Workload: Protecting data from compression by the SSD
- Some SSDs automatically compress data in an advanced flash translation layer.
- Compression can change the layout of the original workload records.
- As a workaround, this can be avoided by XORing the data with a random mask.
[Figure: original data vs. compressed data]
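The effect of the XOR mask can be demonstrated in a few lines. The sketch below is an assumption-laden illustration: the helper names `keystream` and `xor_mask` are hypothetical, SHA-256 stands in for whatever generator the authors used, and `zlib` stands in for the drive's unknown compressor.

```python
import hashlib
import zlib


def keystream(seed: bytes, length: int) -> bytes:
    """Deterministic pseudo-random bytes from a hash in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])


def xor_mask(data: bytes, seed: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(seed, len(data))))


plain = bytes(512)                       # highly compressible: all zeros
masked = xor_mask(plain, b"mask-seed")
print(len(zlib.compress(plain)), len(zlib.compress(masked)))
assert xor_mask(masked, b"mask-seed") == plain
```

The masked record looks random, so a compressor cannot shrink it, while the checker can still recover the original record by applying the same mask again.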

Test Workload: Workload Type
- Concurrent random writes: multi-threaded I/O; the starting address is random; the write length is random.
[Figure: memory map]

Test Workload: Workload Type
- Concurrent sequential writes: multi-threaded I/O; the starting address is random; the written range is contiguous.
[Figure: memory map]

Test Workload: Workload Type
- Single-threaded sequential writes: single-threaded I/O; the starting address is random; the written range is contiguous.
[Figure: memory map]
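The three workload types can be sketched as a request planner. Everything here is hypothetical: the function `plan_writes`, its parameters, and the kind names are invented for illustration, and "contiguous" is read as each worker writing back-to-back block ranges from a random start. The plan only lists requests per worker; in a real test each worker's requests would be issued by its own thread.

```python
import random


def plan_writes(kind, n_workers, ops_per_worker, num_blocks, length=8, rng_seed=0):
    """Return a list of (worker_id, start_block, n_blocks) write requests."""
    rng = random.Random(rng_seed)
    if kind == "single_sequential":
        n_workers = 1  # single-threaded variant ignores the worker count
    elif kind not in ("concurrent_random", "concurrent_sequential"):
        raise ValueError(f"unknown workload kind: {kind}")

    plan = []
    for worker in range(n_workers):
        if kind == "concurrent_random":
            # Random start and random length for every request.
            for _ in range(ops_per_worker):
                n = rng.randint(1, length)
                start = rng.randrange(num_blocks - n + 1)
                plan.append((worker, start, n))
        else:
            # Random start, then fixed-length requests laid back to back.
            span = ops_per_worker * length
            start = rng.randrange(num_blocks - span + 1)
            for i in range(ops_per_worker):
                plan.append((worker, start + i * length, length))
    return plan


for kind in ("concurrent_random", "concurrent_sequential", "single_sequential"):
    print(kind, plan_writes(kind, n_workers=2, ops_per_worker=3, num_blocks=1000)[:3])
```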

Test Environment: Building the test device
- An SSD normally receives power only from the computer's power supply.
- A SATA SSD, however, uses separate cables for power and for data transfer.
- A custom-made power cable is controlled by the scheduler in software, which cuts power to the SSD at irregular intervals.
- The power fault test is repeated more than a thousand times.
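The scheduling side of such a test can be sketched as a loop that restores power, waits a random time, and cuts power again. This is a hypothetical outline: `run_fault_cycles`, the `power_on` and `power_off` callbacks, and the delay bounds are assumptions, and the actual hardware interface is not described here. The callbacks stand in for whatever drives the real switch.

```python
import random
import time


def run_fault_cycles(n_cycles, power_on, power_off,
                     min_delay=1.0, max_delay=10.0,
                     rng_seed=None, sleep=time.sleep):
    """Cut power at irregular intervals; return the delay used in each cycle."""
    rng = random.Random(rng_seed)
    delays = []
    for _ in range(n_cycles):
        power_on()                               # restore power to the SSD
        delay = rng.uniform(min_delay, max_delay)
        sleep(delay)                             # workers write during this window
        power_off()                              # inject the power fault
        delays.append(delay)
    return delays


# Dry run with stand-in callbacks and no real waiting.
events = []
delays = run_fault_cycles(3,
                          power_on=lambda: events.append("on"),
                          power_off=lambda: events.append("off"),
                          rng_seed=1, sleep=lambda seconds: None)
print(events, [round(d, 2) for d in delays])
```

Passing `sleep` as a parameter keeps the loop testable without hardware or real delays.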

Test Environment: Power fault injection

Test Environment: Components of the framework
1. Scheduler: manages the testing framework.
2. Checker: verifies the result of the workloads applied to the SSDs.
3. Worker: injects the workload through multiple threads.
4. Switcher: controls the power cable of the SSDs.
I/O scheduler: noop (no operation)
Write options: 1. O_SYNC (for data synchronization) 2. O_DIRECT (to bypass the cache)
[Figure labels: Hardware Layer, Software Layer]
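The two write options can be illustrated with Python's `os` module on a Unix-like system. This is a sketch, not the framework's code: `open_for_test` and `write_block` are hypothetical helpers and the 4096-byte buffer size is an assumption. `O_DIRECT` is Linux-specific, requires aligned buffers, and is rejected by some file systems, so it is optional here. An anonymous `mmap` is used because it is page-aligned.

```python
import mmap
import os


def open_for_test(path, direct=False):
    """Open a file or block device for synchronous writes."""
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
    if direct:
        flags |= os.O_DIRECT  # Linux only; bypasses the page cache
    return os.open(path, flags, 0o600)


def write_block(fd, offset, payload, size=4096):
    """Write `payload`, zero-padded to `size` bytes, at `offset`."""
    buf = mmap.mmap(-1, size)            # anonymous mapping, page-aligned
    try:
        buf.write(payload.ljust(size, b"\0"))
        return os.pwrite(fd, buf, offset)
    finally:
        buf.close()
```

A worker would call `open_for_test("/path/to/target", direct=True)` once and then `write_block` for each record; the path is a placeholder. With `O_SYNC`, each write call returns only after the device has acknowledged the data, which is what makes a record missing after a fault count as a failure rather than an expected loss.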

Test Environment: Target Device
- Vendors generally publish specifications for their SSDs.
- The specifications include a lot of information about the hardware design.
- However, they do not describe the internal mapping structure, the FTL.
- The flash translation layer is a very important component of an SSD.
- The FTL maps logical page numbers to physical page numbers.
- Vendors do not release their firmware source, because it is closely tied to device performance.

Result: Summary of test result
- No 'flying writes' occurred on the SSDs, because the flash translation layer was very stable on all target devices.
- However, SSD#3 shut down in some power fault situations and suffered metadata corruption.
- SSD#1 had a critical problem: it became a dead device, and the data on the drive could not be recovered.

Result: Summary of test result
- Figure 7 shows why the workload header length was chosen.
- 'Shorn writes' are caused by the programming unit in some SSDs.
- The workload test must follow the programming-unit specification of the flash cells in each SSD.
- Figure 8 shows that whether an SSD fails under power fault is not related to its price or performance.

Result: Summary of test result
- Figure 9 shows very different behavior across devices with respect to file I/O synchronization.
- SSD#4 has a critical serialization error in its device firmware.
- Serialization errors stem from how the device controls data transfer.
However, many SSDs cannot recover data after a sudden power loss.

Conclusion
This paper proposes a methodology for automatically exposing bugs in block devices such as SSDs that are triggered by power faults. The block-level behavior of SSDs exposed in the experiments has important implications for the design of storage systems. SSD vendors need to improve recovery in power fault situations.
