Chapter 10: Mass-Storage Systems


1 Chapter 10: Mass-Storage Systems
Modified by Dr. Neerja Mhaskar for CS 3SH3

2 Storage Management
Main memory is too small to accommodate all data and programs.
Secondary storage provides additional space for programs and data to reside permanently.
The file system provides the mechanism to store and access data.
Main goals of an operating system’s I/O subsystem:
Provide the simplest interface possible to the rest of the system.
Optimize I/O for maximum concurrency and efficiency.

3 Storage devices
Main types of storage devices used:
Magnetic disks (hard disk drives, HDDs): provide the bulk of secondary storage on modern computers.
Solid-state disks (SSDs): nonvolatile memory used like a hard drive, primarily found in storage arrays (arrays of HDDs or SSDs) and in small, high-efficiency devices. SSDs can be more reliable than HDDs, since they have no moving parts, and are much faster, since there is no seek time or rotational latency. However, they are more expensive, have shorter life spans, and offer less capacity.
Magnetic tapes: mainly used for backup, for storage of infrequently used data, and as a transfer medium between systems. Tape is relatively permanent and holds large quantities of data, but access time is slow, and random access is roughly 1000 times slower than disk.

4 Magnetic disks
Disk platters look like CDs. Both sides are coated with magnetic material, and information is stored by recording it magnetically on the platters.
Disks are divided into concentric rings called tracks, and tracks are divided into sectors.
Data on a disk drive is read by a read-write head, typically one per surface, each mounted on a separate arm.
The read-write heads are attached to a common disk arm (arm assembly), which moves all heads as a unit.
The set of tracks at one arm position makes up a cylinder.
A particular physical block of data is specified by the cylinder, head, and sector numbers at which it is located.

5 Magnetic Disks Cont…
When the disk is in use, a drive motor spins it at high speed; common drives spin at 5,400, 7,200, 10,000, or 15,000 RPM.
Disk speed has two components:
Transfer rate: the rate at which data flows between the drive and the computer, typically several megabytes per second.
Positioning time (random-access time), which consists of two parts:
Seek time: the time for the disk arm to move the heads to the cylinder containing the desired sector.
Rotational latency: the additional time for the disk to rotate the desired sector under the head.
Positioning times are several milliseconds.

6 Magnetic Disks Cont…
A head crash results from the disk head making contact with the disk surface, which damages the disk.
A disk drive is attached to a computer by a set of wires called an I/O bus.
The host controller in the computer uses the bus to talk to the disk controller built into the drive or storage array and transfer data.

7 Disk Structure
Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
Logical blocks are usually 512 bytes, although low-level formatting can create logical blocks of other sizes.
Mapping of logical blocks: the one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially.
Sector 0 is the first sector of the first track on the outermost cylinder.
The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.
Modern HDDs have thousands of cylinders and hundreds of sectors per track.
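As a rough Python sketch of this mapping, assuming an idealized geometry with a fixed number of heads and sectors per track (real drives remap sectors and vary sectors per track by zone, so the function and its parameters are illustrative only):

    def lba_to_chs(lba, heads, sectors_per_track):
        # Fill one track, then the remaining tracks of the cylinder,
        # before moving inward to the next cylinder.
        cylinder, rest = divmod(lba, heads * sectors_per_track)
        head, sector = divmod(rest, sectors_per_track)
        return cylinder, head, sector

    print(lba_to_chs(1000, heads=16, sectors_per_track=63))  # (0, 15, 55)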

8 Disk Attachment
Disk drives can be attached either directly to a particular host (a local disk) or to a network.
Host-attached storage: local disks accessed through the host’s I/O ports.
Network-attached storage: connects storage devices to computers using a remote procedure call (RPC) interface, allowing several computers in a group to access the shared storage.

9 Disk Scheduling
To use disk drives efficiently, we want fast access time and large disk bandwidth.
Disk bandwidth = total number of bytes transferred / total time between the first request for service and the completion of the last transfer.
Managing the order in which disk I/O requests are serviced can improve both access time and disk bandwidth.
When I/O requests arrive, they are placed in a queue and serviced according to a disk-scheduling algorithm.

10 Disk Scheduling Cont…
Several algorithms exist to schedule the servicing of disk I/O requests. We study the following:
First Come First Served (FCFS)
Shortest Seek Time First (SSTF)
SCAN
C-SCAN
LOOK
C-LOOK
These scheduling algorithms are illustrated with the following request queue of cylinder numbers (disk cylinders 0-199), with the head initially at cylinder 53:
98, 183, 37, 122, 14, 124, 65, 67

11 FCFS
I/O requests are scheduled on a first-come, first-served basis.
Simplest and fair.
The illustration below shows a total head movement of 640 cylinders.
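A minimal Python sketch of the FCFS calculation, using the example queue from the previous slide (the function name is illustrative, not from the slides):

    def fcfs_head_movement(queue, head):
        # Service requests strictly in arrival order, summing seek distances.
        total = 0
        for cylinder in queue:
            total += abs(cylinder - head)
            head = cylinder
        return total

    print(fcfs_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 640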

12 SSTF
Shortest Seek Time First selects the request with the minimum seek time from the current head position; in other words, SSTF chooses the pending request closest to the current head position.
SSTF scheduling is a form of SJF scheduling and may cause starvation of some requests.
The illustration shows a total head movement of 236 cylinders.
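A sketch of SSTF in the same style, greedily picking the closest pending request (illustrative code, not from the slides):

    def sstf_head_movement(queue, head):
        # Repeatedly service the pending request nearest the head.
        pending, total = list(queue), 0
        while pending:
            nearest = min(pending, key=lambda c: abs(c - head))
            total += abs(nearest - head)
            head = nearest
            pending.remove(nearest)
        return total

    print(sstf_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 236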

13 SSTF Cont…

14 SCAN
The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it reaches the other end, where the head movement is reversed and servicing continues. The head continuously scans back and forth across the disk.
The scheduler needs to know the direction of head movement in addition to the head’s current position.
The SCAN algorithm is sometimes called the elevator algorithm.
The illustration shows a total head movement of 236 cylinders.
Note that if requests are uniformly distributed, then when the head reverses at one end, the heaviest density of pending requests is at the other end of the disk, and those requests have also waited the longest.
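For the example, the head at cylinder 53 first sweeps down to cylinder 0, then reverses and sweeps up to the highest request. A small sketch of that arithmetic (assuming, as in the illustration, that the head is initially moving toward cylinder 0 and that requests exist above the head):

    def scan_head_movement(queue, head):
        # Travel head -> 0, reverse, then travel 0 -> highest pending request.
        highest = max(c for c in queue if c > head)
        return head + highest

    print(scan_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 53 + 183 = 236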

15 SCAN

16 C-SCAN
The C-SCAN algorithm is a variant of SCAN designed to provide a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes.
When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
C-SCAN treats the cylinders as a circular list that wraps around from the last cylinder to the first.
Total number of cylinders?

17 C-SCAN Cont…

18 LOOK and C-LOOK
LOOK is a version of SCAN, and C-LOOK is a version of C-SCAN.
The arm goes only as far as the final request in each direction, then reverses direction immediately, without first traveling all the way to the end of the disk.
Total number of cylinders?
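A sketch of C-LOOK’s service order and total movement for the running example (illustrative; the return jump from the highest to the lowest pending request is counted as head movement here, which gives 322 cylinders for this queue):

    def c_look_order(queue, head):
        # Service upward to the last request, then jump to the lowest
        # pending request and continue upward.
        above = sorted(c for c in queue if c >= head)
        below = sorted(c for c in queue if c < head)
        return above + below

    order = c_look_order([98, 183, 37, 122, 14, 124, 65, 67], 53)
    print(order)  # [65, 67, 98, 122, 124, 183, 14, 37]
    print(sum(abs(b - a) for a, b in zip([53] + order, order)))  # 322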

19 C-LOOK Example

20 Selecting a Disk-Scheduling Algorithm
Choosing a disk-scheduling algorithm is tricky, as the choice depends on the factors listed below:
Load on the disk: if the load is heavy, use SCAN or C-SCAN.
The number and types of requests.
The file-allocation method.
Rotational latency, which can be nearly as large as seek time but is difficult for the OS to compute.
The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary.
SSTF is common; either SSTF or LOOK is a reasonable choice for the default algorithm.

21 Disk Management
The operating system is responsible for several other aspects of disk management apart from disk scheduling:
Disk formatting
Booting from disk
Bad-block recovery

22 Disk Formatting
To use a disk to hold files, the following steps are followed:
Low-level formatting (physical formatting): dividing the disk into sectors that the disk controller can read and write.
Each sector holds header information, data (usually 512 bytes, though the size can be selectable), and an error-correcting code (ECC).
The OS then records its own data structures on the disk. This is done in two steps:
Partition the disk into one or more groups of cylinders, each treated as a logical disk.
Logical formatting, or “creating a file system”: the operating system stores the initial file-system data structures onto the disk.
To increase efficiency, most file systems group blocks into clusters:
Disk I/O is done in blocks; file I/O is done in clusters.

23 Booting from Disk
The bootstrap program is the initial program to run on a system.
Storing the full bootstrap in ROM is inflexible, since ROM is read-only. Instead, a small bootstrap loader program is stored in ROM, and the full bootstrap program is stored in the boot blocks at a fixed location on disk.
A disk that has a boot partition is called a boot disk or system disk.

25 Bad-block recovery
The disk controller maintains a list of bad blocks.
During low-level formatting, some sectors are set aside (not visible to the OS) as spare sectors.
When a bad block is detected, its content is restored from backup onto one of the spare sectors, and any reference to the bad sector is forwarded to the spare sector. This is called sector sparing or forwarding.

26 Swap-Space Management
Swap space: virtual memory uses disk space as an extension of main memory. Swapping is less common now, due to increases in memory capacity.
Swap space can be carved out of the normal file system or, more commonly, placed in a separate (raw) disk partition with no file system.
On raw partitions, a swap-space storage manager is used to allocate and deallocate blocks from the raw partition. This manager uses algorithms optimized for speed, so internal fragmentation may increase (this is acceptable, as data in swap space is short-lived).
Some systems allow multiple swap spaces; Linux, for example, uses multiple swap areas.

27 RAID Structure
A variety of disk-organization techniques, collectively called RAID, are commonly used to address performance and reliability issues.
RAID: Redundant Array of Inexpensive (now Independent) Disks.
Provides reliability via redundancy.
Provides improved performance via parallelism.
Frequently combined with NVRAM to improve write performance.

28 RAID Structure – Reliability via redundancy
Mean time to failure (MTF): the average time before a disk fails.
The more disks a system has, the greater the likelihood that one of them will fail at any given time, so a large array has a shorter mean time to failure than a single disk.
RAID restores reliability through redundancy: if data is duplicated onto two or more disks, data is lost only if all copies are damaged simultaneously. With data duplicated onto a second disk, losing data requires the second disk to fail before the first disk is repaired.
Mean time to repair (MTR): the average time it takes to replace a failed disk and restore the data on it.
Data mirroring: duplicating identical data on multiple disks.

29 Mean time to data loss with Example
Mean time to data loss (MTDL) is based on MTF and MTR.
For a pair of mirrored disks (assuming independent failures), MTDL = MTF^2 / (2 * MTR).
Example:
MTF of a single disk = 100,000 hours
MTR = 10 hours
All disks are mirrored (each disk has a duplicate)
MTDL of the mirrored system = 100,000^2 / (2 * 10) = 500 * 10^6 hours ≈ 57,000 years!
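The same arithmetic as a quick check in Python (values taken from the example above):

    mtf = 100_000                 # mean time to failure of one disk, hours
    mtr = 10                      # mean time to repair, hours
    mtdl = mtf ** 2 / (2 * mtr)   # mirrored pair, independent failures
    print(mtdl)                   # 5e+08 hours, i.e., 500 * 10^6
    print(mtdl / (24 * 365))      # about 57,000 years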

30 RAID – Improved performance via Parallelism
Disk mirroring increases the number of reads that can be handled per unit time.
Striping data across the disks increases the transfer rate. Data striping is of two types:
Bit-level striping: splitting the bits of each byte across multiple disks.
Block-level striping: striping the blocks of a file across multiple disks. This is the most common form. With n disks, block i of a file goes to disk (i mod n) + 1 (see the sketch below).
Parallelism in a disk system, achieved through striping, has two main goals:
Increase the throughput of multiple small accesses via load balancing.
Reduce the response time of large accesses.
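A one-line sketch of the block-placement rule from the slide (disks numbered 1 through n, as in the formula):

    def disk_for_block(i, n):
        # Block i of a file goes to disk (i mod n) + 1.
        return (i % n) + 1

    print([disk_for_block(i, 5) for i in range(6)])  # [1, 2, 3, 4, 5, 1]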

31 RAID Levels
Mirroring provides high reliability, but it is expensive. Striping provides high data-transfer rates, but it does not improve reliability.
Numerous schemes provide redundancy at lower cost by combining disk striping with “parity” bits. These schemes, called RAID levels, each have different cost–performance trade-offs.
It is important to note that RAID protects against physical errors, but not against bugs or other software errors that write erroneous data.

32 RAID Levels 0, 1, 2
RAID 0: striping (at the level of blocks) only, with no redundancy.
RAID 1: mirroring only, no striping.
RAID 2 (a.k.a. memory-system-style error-correcting-code (ECC) organization): bit-level striping with error-correction bits stored on additional disks.
If one of the disks fails, the remaining bits of each byte and the associated error-correction bits can be read from the other disks and used to reconstruct the damaged data.
Typically, fewer disks are needed to store the error-correction bits than to store the data. The number of disks required is a function of the error-correcting algorithm and the means by which the particular bad bit(s) are identified.

33 Memory systems style ECC
Memory systems detect certain errors by using parity bits.
Each byte may have a parity bit associated with it that records whether the number of bits in the byte set to 1 is even (parity = 0) or odd (parity = 1).
If one of the bits in the byte is damaged (either a 1 becomes a 0, or a 0 becomes a 1), the parity of the byte changes and no longer matches the stored parity. Similarly, if the stored parity bit is damaged, it does not match the computed parity. Thus, all single-bit errors are detected by the memory system.
Error-correcting schemes store two or more extra bits and can reconstruct the data if a single bit is damaged.
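A minimal sketch of the even-parity rule described above (an illustrative helper, not part of the slides):

    def parity_bit(byte):
        # Even parity: 0 if the number of 1 bits is even, 1 if odd.
        return bin(byte).count("1") % 2

    print(parity_bit(0b10110010))  # four 1 bits -> parity 0
    print(parity_bit(0b10110011))  # five 1 bits -> parity 1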

34 RAID 3
RAID 3 (bit-interleaved parity organization) is similar to RAID 2, but unlike memory systems, disk controllers can detect exactly which sector failed on a read. Therefore, a single parity bit suffices for both error detection and correction, and a single parity disk is used alongside multiple data disks.
Advantages: the data-transfer rate for reading or writing a single block is N times as fast as with RAID level 1, since reads and writes of each byte are spread over multiple disks with N-way striping of data.
Disadvantages: RAID level 3 supports fewer I/Os per second, since every disk has to participate in every I/O request.
A performance problem with RAID 3 (in fact, with all parity-based RAID levels) is the overhead of computing and writing the parity.

35 RAID 4, 5, 6
RAID 4 (block-interleaved parity organization): uses block-level striping and, in addition, keeps a parity block on a separate disk for the corresponding blocks from the N other disks.
RAID 5 (block-interleaved distributed parity): similar to RAID 4, except that it spreads data and parity among all N + 1 disks, rather than storing data on N disks and parity on one disk. For each set of blocks, one of the disks stores the parity and the others store data; a parity block never stores parity for blocks on the same disk. RAID 5 is the most common parity RAID system.
RAID 6: similar to RAID level 5, but stores extra redundant information to guard against multiple disk failures. Instead of simple parity, error-correcting codes such as Reed–Solomon codes are used.
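For the parity-based levels, a lost block can be rebuilt by XORing the surviving blocks of the stripe (data plus parity). A small Python sketch, assuming a single-disk failure and equal-length byte blocks:

    from functools import reduce

    def reconstruct_block(surviving):
        # XOR all surviving blocks byte-by-byte; the result is the
        # missing block (parity is just the XOR of the data blocks).
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*surviving))

    d0, d1, d2 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"
    parity = reconstruct_block([d0, d1, d2])            # parity block
    assert reconstruct_block([d0, d2, parity]) == d1    # recover lost d1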

36 Read and Write - RAID 4, 5, 6
The data-transfer rate for each individual access is lower, since a block read accesses only one disk, but the other disks are free to process other requests.
The overall I/O rate is higher, since multiple read accesses can proceed in parallel.
Transfer rates for large reads and writes are high, since all the disks can be read or written (both data and parity) in parallel.
Small independent writes are time-consuming, since they cannot be performed in parallel: an operating-system write of data smaller than a block requires that the block be read, modified with the new data, and written back, and the parity block must be updated as well. This is known as the read-modify-write cycle (see the sketch below).
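A sketch of the parity update in the read-modify-write cycle: only the old data, the new data, and the old parity are needed, so just the target disk and the parity disk are touched (illustrative code, assuming equal-length byte blocks):

    def read_modify_write_parity(old_data, new_data, old_parity):
        # new parity = old parity XOR old data XOR new data, byte by byte
        return bytes(p ^ od ^ nd
                     for p, od, nd in zip(old_parity, old_data, new_data))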

37 RAID Levels (figure): C = copy of data, P = error-correcting bits

38 RAID (0 + 1) and (1 + 0)
These refer to combinations of RAID 0 and RAID 1: RAID 0 provides performance, while RAID 1 provides reliability.
In RAID 0 + 1, a set of disks is striped, and then the stripe is mirrored to another, equivalent stripe.
In RAID 1 + 0, disks are mirrored in pairs, and the resulting mirrored pairs are then striped.
Both provide better performance than RAID 5.

39 Stable-Storage Implementation
Stable storage means data is never lost (due to failure, etc.).
To implement stable storage:
Replicate information on more than one nonvolatile storage medium with independent failure modes.
Update information in a controlled manner to ensure that the stable data can be recovered after any failure during data transfer or recovery.

40 End of Chapter 10

