Storage Systems Sudhanva Gurumurthi
Abstraction Split the design problem into several layers. At each intermediate layer: use what the lower layer provides to build something new, and hide the characteristics of the lower layer from the one above. (Layer stack: Functionality at the top, Physics at the bottom.)
Associating Meaning to Physical Phenomena Presence or absence of current defines operating states: No Current => “Off”; Current Flowing => “On”. Abstraction: Switch. We can implement this abstraction using a transistor.
Abstracting Electricity via Transistors A processor is: a collection of transistors connected by wires, where each transistor is in either the on or the off state.
Requirements for Storage Need a non-volatile medium for housing data: retain data when external power is removed, and keep the data intact for many years. Low cost ($/GB).
Data Representation for Storage Electrical Representation: No Current => “Off” Current Flowing => “On” This representation does not work for storage due to the non-volatility requirement
Magnetism! Use magnetic polarity to represent data. Ferromagnetism: materials whose magnetic dipole moments remain aligned, so they retain magnetization even after the external magnetic field is removed. (Compare ferroelectric materials: dielectrics that can be given a permanent electrical polarization that persists after the external electric field is removed.) Image Source: http://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/Bar_magnet.jpg/800px-Bar_magnet.jpg
Hysteresis Loop The two remanent (zero-field) magnetization states of the hysteresis loop represent Logic “1” and Logic “0”. Image Source: http://www.ndt-ed.org/EducationResources/CommunityCollege/MagParticle/Physics/HysteresisLoop.htm
Hard Disk Drive (HDD) Faraday’s Law Magnetic Induction
A Magnetic ‘Bit’ Logic “0”: a region of grains of uniform magnetic polarity. Logic “1”: a boundary between regions of opposite magnetization. Source: http://www.hitachigst.com/hdd/research/storage/pm/index.html
Abstracting the Magnetics Bits are grouped into 512-byte sectors To read data on a different track, we need to move the arm (seek) Source: http://www.victimoftechnology.com/hard/harddriveadventure.html
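Once bits are grouped into sectors on tracks, a sector can be named either geometrically (cylinder, head, sector) or by a flat Logical Block Address. A minimal sketch of the classic CHS-to-LBA translation follows; the geometry constants are illustrative assumptions, since real drives use zoned recording and expose only LBAs, remapping internally:

```python
# Sketch: classic CHS (cylinder, head, sector) <-> LBA translation.
# HEADS and SECTORS_PER_TRACK are assumed values for illustration only.

SECTOR_SIZE = 512            # bytes per sector, as on the slide
HEADS = 4                    # assumed number of recording surfaces
SECTORS_PER_TRACK = 63       # assumed sectors on every track

def chs_to_lba(cylinder, head, sector):
    """CHS sectors are 1-based; LBAs are 0-based."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba):
    cylinder, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector0 = divmod(rem, SECTORS_PER_TRACK)
    return cylinder, head, sector0 + 1
```

Note that consecutive LBAs fill a track before moving to the next head, which is why sequential access avoids seeks.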
Disk Seeks Arm Platter
Disk Seek “Flight Plan” Speedup: the arm accelerates. Coast: the arm moves at maximum velocity (long seeks only). Slowdown: the arm is brought to rest near the desired track. Settle: the head position is fine-tuned to access the desired location.
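The flight plan above corresponds to a trapezoidal velocity profile, which can be sketched as a simple seek-time model. The acceleration, maximum velocity, and settle-time values are illustrative assumptions, not parameters of any real drive:

```python
import math

# Sketch: seek time under a trapezoidal velocity profile
# (speedup, optional coast, slowdown, then settle).

def seek_time(distance, accel=1.0, v_max=2.0, settle=0.5):
    """Time to move the arm `distance` tracks, plus settle time."""
    if distance <= 0:
        return settle
    # Tracks covered during speedup plus slowdown: 2 * v_max^2 / (2a)
    d_ramp = v_max ** 2 / accel
    if distance >= d_ramp:
        # Long seek: reach v_max, coast, then decelerate
        return 2 * v_max / accel + (distance - d_ramp) / v_max + settle
    # Short seek: never reaches v_max (triangular profile, no coast phase)
    return 2 * math.sqrt(distance / accel) + settle
```

The square-root behavior for short seeks and the linear behavior for long seeks match the measured shape of real seek-time curves.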
Using the Disk Drive Hardware/Software Interface of the disk drive (Architecture) Instructions/Commands to read and write data from/to various sectors Memory and Registers to buffer the data between the electronics and the platters Other electronic components: Error Correcting Code Motor drivers
Memory Mapped I/O The disk can be read and written just like normal memory, through dedicated address ranges. Accesses to these device addresses may bypass address translation: only the OS accesses them, so protection need not be enforced, and there is no swapping/paging for these addresses.
Reading Data from Disk [Diagram: physical address map. Main Memory (0x00…0 to 0x0ff..f) sits on the Memory Bus; the disk’s RAM and Controller registers (0x100…0 to 0x1ff….f) sit on the I/O Bus.]
Reading a Sector from Disk Processing on the CPU: Store [Command_Reg], READ_COMMAND; Store [Track_Reg], Track #; Store [Sector_Reg], Sector # /* Device starts operation */ L: Load R, [Status_Reg]; cmp R, 0; jeq L /* spin until ready */ /* Data now available in disk RAM */ For i = 1 to sectorsize: Memtarget[i] = MemOnDisk[i]. Polling is pure CPU overhead! Instead, block/switch to another process and let an interrupt wake you up. But the byte-by-byte copy loop is still too much CPU overhead!
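The programmed-I/O sequence above can be sketched in Python against a simulated device. The register names mirror the slide; SimDisk is a stand-in object, not real hardware, and it completes commands instantly:

```python
# Sketch: programmed I/O (polling) read against a simulated disk.

class SimDisk:
    def __init__(self, sectors):
        self.sectors = sectors          # (track, sector) -> 512 bytes
        self.status = 0                 # Status_Reg: 0 = busy, 1 = ready
        self.buffer = b""               # RAM buffer on the disk electronics

    def read_command(self, track, sector):
        # A real drive would seek and wait for rotation; we finish instantly.
        self.buffer = self.sectors[(track, sector)]
        self.status = 1

def pio_read(disk, track, sector, sector_size=512):
    disk.read_command(track, sector)    # Stores to Command/Track/Sector regs
    while disk.status == 0:             # L: Load Status_Reg; spin until ready
        pass
    # Copy loop: the CPU itself moves every byte out of the disk's RAM
    target = bytearray(sector_size)
    for i in range(sector_size):
        target[i] = disk.buffer[i]
    return bytes(target)
```

Both the spin loop and the explicit copy loop occupy the CPU for the entire transfer, which is exactly the overhead the slide calls out.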
Direct-Memory Access (DMA) Store [Command_Reg], READ_COMMAND; Store [Track_Reg], Track #; Store [Sector_Reg], Sector #; Store [Memory_Address_Reg], Address /* Device starts operation */ P(disk_request); … /* Operation complete and data is now in the required memory locations */ ISR() { V(disk_request); } — the ISR is called when the DMA controller raises an interrupt after completion of the transfer. Assume that the DMA controller is integrated into the disk drive.
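The P/V handshake on the slide maps naturally onto a counting semaphore. In this sketch a thread plays the role of the DMA engine inside the drive and isr() models the interrupt service routine; only disk_request comes from the slide, and the other names are illustrative:

```python
import threading

# Sketch: DMA completion signaled via a semaphore (P = acquire, V = release).

disk_request = threading.Semaphore(0)

def isr():
    disk_request.release()              # V(disk_request)

def dma_transfer(data, target):
    target[:len(data)] = data           # device writes memory directly
    isr()                               # raise interrupt on completion

def read_sector_dma(data, memory):
    device = threading.Thread(target=dma_transfer, args=(data, memory))
    device.start()                      # /* Device starts operation */
    disk_request.acquire()              # P(disk_request): block, don't spin
    device.join()
    return bytes(memory)
```

Unlike the polling version, the CPU neither spins on a status register nor copies bytes; it simply blocks until the interrupt fires.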
Memory Technologies Volatile vs. Non-Volatile: Flash Memory is non-volatile.
Solid State Disk (SSD) Disks that use Flash Memory No moving parts Less power and heat Quiet operation Shock and vibration tolerant Drop-in replacement for disks Image Source: L. Waldock, “Intel X-25M Solid-State Drive”, Reghardware, September 2008.
The Architecture of an SSD Source: Agrawal et al., “Design Tradeoffs for SSD Performance”, USENIX 2008
Characteristics of Flash Memory Reads/Writes are done at the page granularity (page size: 2-4 kilobytes). Writes can be done only to pages in the erased state; in-place writes are very inefficient. Erases are done at a larger block granularity (block size: 32-128 pages). Time for Page-Read < Page-Program < Block-Erase. Limited endurance due to program and erase cycles. These issues are handled by the Flash Translation Layer (FTL) inside the SSD.
Logical Block Map Each write to a logical disk block (addressed by a Logical Block Address, LBA) goes to a different physical flash page. Need an LBA -> flash-page mapping table. The mapping table is stored in SSD DRAM and reconstructed when booting. The target flash page for an LBA write is chosen from an allocation pool of free blocks.
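The out-of-place write policy above can be sketched as a tiny FTL. The PAGES_PER_BLOCK value and the simple bump-pointer page allocation are illustrative assumptions, not the design of any particular SSD:

```python
# Sketch: an FTL-style logical block map. Every LBA write lands on a fresh
# flash page; the previous copy of that LBA becomes superseded (stale).

PAGES_PER_BLOCK = 4                             # assumed, for illustration

class FTL:
    def __init__(self, num_blocks):
        self.map = {}                           # LBA -> (block, page)
        self.free_blocks = list(range(num_blocks))  # allocation pool
        self.cur_block = self.free_blocks.pop(0)
        self.next_page = 0
        self.superseded = set()                 # stale (block, page) pairs

    def write(self, lba):
        if lba in self.map:                     # old copy is now stale
            self.superseded.add(self.map[lba])
        if self.next_page == PAGES_PER_BLOCK:   # current block is full
            self.cur_block = self.free_blocks.pop(0)
            self.next_page = 0
        self.map[lba] = (self.cur_block, self.next_page)
        self.next_page += 1
        return self.map[lba]
```

Rewriting the same LBA twice leaves one superseded page behind, which is precisely the garbage that cleaning must reclaim.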
Cleaning Writes leave behind blocks with stale copies of pages (superseded pages). Garbage collection erases these blocks and returns them to the allocation pool. Because the page size is smaller than the block size, the non-superseded (valid) pages in a block must be copied to another block before erasure.
Cleaning and Wear-Leveling Choose victim blocks with the highest number of superseded pages, to reduce the number of valid-page copies. SSD capacity is overprovisioned to mask garbage-collection overheads. Wear-Leveling: distribute the program/erase cycles evenly over all the blocks in the SSD.
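The greedy victim-selection rule can be sketched in a few lines. The tie-break toward fewer erase cycles is a crude wear-leveling heuristic added for illustration; it is an assumption, not any specific SSD's policy:

```python
# Sketch: greedy cleaning victim selection with a wear-aware tie-break.
# For each block we track how many of its pages are superseded and how
# many erase cycles it has already endured.

def pick_victim(blocks):
    """blocks: block_id -> {'superseded': pages, 'erases': cycles}.
    Most superseded pages wins (fewest valid pages to copy out);
    ties go to the block with the fewest erase cycles."""
    return max(blocks, key=lambda b: (blocks[b]['superseded'],
                                      -blocks[b]['erases']))
```

Picking the block with the most superseded pages minimizes the valid-page copy traffic per erase, while the tie-break nudges wear toward less-used blocks.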