Magnetic Disks
- Have cylinders, sectors, platters, tracks, heads
- Virtual and real disk blocks (x cylinders, y heads, z sectors per track)
- Relatively slow: much slower than RAM, very mechanical
- High rate of failure
- Sizes have gone up A LOT!
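The (cylinders, heads, sectors per track) geometry can be turned into a linear block number. The sketch below uses the conventional mapping in which cylinders and heads count from 0 and sectors count from 1; the function names and the 16-head, 63-sector geometry in the usage note are illustrative assumptions, not taken from the slides.

```python
def chs_to_block(cylinder, head, sector, heads, sectors_per_track):
    """Map a (cylinder, head, sector) address to a linear block number.

    Assumes cylinders and heads are numbered from 0 and sectors from 1.
    """
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def block_to_chs(block, heads, sectors_per_track):
    """Inverse mapping: linear block number back to (cylinder, head, sector)."""
    cylinder, rest = divmod(block, heads * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1
```

With 16 heads and 63 sectors per track, the first sector of cylinder 1 is block 16 * 63 = 1008, and `block_to_chs(1008, 16, 63)` gives back `(1, 0, 1)`.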
RAID
- Redundant Array of Inexpensive (or Independent) Disks
- Improve performance by using multiple disks in parallel
- Have several disks seem like one
- Disks also have a tendency to fail
  - more disks, more failures likely
  - providing more reliability would be a good thing too
RAID
- Comes in several different levels (0-6) providing varying degrees of performance and/or redundancy at varying costs
- Striping (RAID-0): no redundancy
  - offers high data transfer and I/O throughput
  - suffers lower reliability and availability than a single disk (any one disk failing loses the whole array)
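A minimal sketch of block-level striping and of why RAID-0 is less reliable than one disk. The function names are hypothetical, the stripe unit is assumed to be one block, and the survival estimate assumes disks fail independently with the same probability.

```python
def stripe_location(logical_block, num_disks):
    """Return (disk index, block offset on that disk) under RAID-0 striping."""
    return logical_block % num_disks, logical_block // num_disks


def raid0_survival(single_disk_survival, num_disks):
    """Probability that every disk survives (assumes independent failures).

    RAID-0 has no redundancy, so the array survives only if all disks do.
    """
    return single_disk_survival ** num_disks
```

For example, logical block 5 on a four-disk stripe lands on disk 1 at offset 1. If each disk survives a year with probability 0.95, four striped disks all survive with probability 0.95 ** 4, which is lower than 0.95.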
RAID
- Mirroring (RAID-1): uses an equal amount of disk capacity to store the original and its mirror
  - all writes also go to the mirror
  - provides redundancy of data and offers protection against loss in the event of physical disk failure
  - reads can be done round-robin for better performance
  - can have multiple mirrors (n-way)
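The write-to-all, read-round-robin behaviour can be sketched with an in-memory toy. The `Mirror` class below is hypothetical and models each disk as a Python list; it is an illustration of the idea, not how any real RAID driver is written.

```python
class Mirror:
    """Toy n-way mirror: every write goes to all copies, reads rotate."""

    def __init__(self, num_copies, num_blocks):
        # Each copy is modelled as a list of blocks.
        self.copies = [[None] * num_blocks for _ in range(num_copies)]
        self.next_copy = 0

    def write(self, block, data):
        # All writes also go to the mirror(s).
        for copy in self.copies:
            copy[block] = data

    def read(self, block):
        # Round-robin: alternate which copy serves each read.
        index = self.next_copy
        self.next_copy = (self.next_copy + 1) % len(self.copies)
        return index, self.copies[index][block]
```

With two copies, successive reads of the same block are served by copy 0, then copy 1, then copy 0 again, so the read load is shared.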
RAID
- Can combine RAID-0 and RAID-1; commonly done
- RAID 0+1 (mirror of stripes) or RAID 10 (stripe of mirrors)
- Example: stripe six disks and have six more for a mirror
- What happens when a disk goes bad in a mirror and has to be replaced?
- How likely is data loss?
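One way to approach the "how likely is data loss?" question is to ask: once one disk has failed, what is the chance that a second failure destroys data? The sketch below assumes the second failure happens before the rebuild finishes and is equally likely to hit any remaining disk. It also assumes that in a mirror of stripes (0+1) a single failure takes the whole striped set offline. The function names are hypothetical.

```python
from fractions import Fraction


def second_failure_loss_raid01(disks_per_stripe):
    """Mirror of two striped sets (RAID 0+1).

    The first failure disables one whole striped set, so a second failure
    on any disk of the other set loses data.
    """
    remaining = 2 * disks_per_stripe - 1
    return Fraction(disks_per_stripe, remaining)


def second_failure_loss_raid10(num_pairs):
    """Stripe across mirrored pairs (RAID 10).

    Data is lost only if the second failure hits the partner of the
    disk that already failed.
    """
    remaining = 2 * num_pairs - 1
    return Fraction(1, remaining)
```

For the twelve-disk example, `second_failure_loss_raid01(6)` is 6/11 while `second_failure_loss_raid10(6)` is 1/11, which is one reason the stripe-of-mirrors arrangement is usually preferred.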
RAID
- RAID-2 uses bitwise striping across disks and uses additional disks to hold Hamming code check bits
  - can correct single-bit errors
  - can detect double-bit errors
- Used in the CM-2, not much else
RAID
- RAID-3 uses a parity disk to provide redundancy
  - stripes data across all but one disk in the array; uses the other disk to store parity info (XOR)
  - can recover from a single data disk failure
  - How is that possible? How to figure out the data stored on the failed disk?
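The recovery question has a short answer: because parity is the XOR of all data blocks, XORing the surviving data blocks with the parity block yields the missing one. A minimal sketch, with a hypothetical helper name and made-up two-byte blocks:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, one block each (illustrative values only).
data = [b"\x0f\xa0", b"\x33\x11", b"\xf0\x0c"]
parity = xor_blocks(data)

# Suppose disk 1 fails: rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals `data[1]`, since every surviving block cancels out of the XOR and only the lost block remains.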
RAID
- RAID-4 attempts to provide a higher rate of data transfer by spreading the I/O load as evenly as possible across all disks in the array
- Maps data and uses parity the same way as RAID-3: stripes the data across the data disks and XORs it to produce the contents of the parity disk
- The difference: RAID-3 accesses all the disks at one time; RAID-4 accesses each disk independently
RAID
- RAID-4 can execute multiple I/O requests simultaneously; RAID-3 can execute only one I/O request at a time
- RAID-4 performs reads much better than writes
- The parity disk can become a bottleneck for writes, as all writes update it
- Need to fix the parity disk bottleneck
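Why every write touches the parity disk can be seen in a small-write update: the new parity is the old parity XOR the old data XOR the new data. The sketch below models disks as lists of byte blocks; `small_write` and the list-based layout are assumptions made for illustration.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, block, new_data):
    """Update one data block and the matching parity block.

    Only the target data disk and the parity disk are read and written,
    but the parity disk is involved in every such write.
    """
    old_data = data_disks[disk][block]
    old_parity = parity_disk[block]
    data_disks[disk][block] = new_data
    parity_disk[block] = xor(xor(old_parity, old_data), new_data)
```

Two writes to different data disks can proceed independently on the data side, yet both must update the same parity disk, which is the bottleneck the slide refers to.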
RAID
- RAID-5 is similar to RAID-4 except the parity is spread throughout the disks
- This does away with the parity disk bottleneck
- Need n + 1 disks

  Disk 0   Disk 1   Disk 2
  D0       D1       P0
  D2       P1       D3
  P2       D4       D5
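One rotation rule that is consistent with the three-disk layout on this slide places the parity for stripe s on disk (n - 1 - s) mod n and fills the remaining disks with data blocks in order. This is a sketch of that rule only; real implementations offer several rotation schemes, and the function name is hypothetical.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe, labelling each disk's block as Dk or Ps."""
    rows = []
    next_data = 0
    for stripe in range(num_stripes):
        # Parity moves one disk to the left on each successive stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        rows.append(row)
    return rows
```

`raid5_layout(3, 3)` returns `[['D0', 'D1', 'P0'], ['D2', 'P1', 'D3'], ['P2', 'D4', 'D5']]`, so each disk holds exactly one parity block per three stripes and parity writes are spread evenly.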
RAID
- RAID-6 is similar to RAID-5 except two parity checks are done for more reliability
  - can withstand two disk failures without losing data
  - need n + 2 disks
- Software RAID and hardware RAID
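The "n + 1" and "n + 2" disk counts translate directly into usable capacity. The helper below is a hypothetical summary that covers only the striped and parity levels and assumes equal-sized disks.

```python
def usable_disks(level, total_disks):
    """Number of disks' worth of capacity available for data."""
    if level == 0:
        return total_disks          # striping only, no redundancy
    if level in (3, 4, 5):
        return total_disks - 1      # one disk's worth of parity
    if level == 6:
        return total_disks - 2      # two disks' worth of parity
    raise ValueError(f"level {level} is not covered by this sketch")
```

For example, six disks give five disks of data capacity under RAID-5 and four under RAID-6.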
Disk scheduling
- Disks are slow, with mechanical movements involved
- 3 factors contribute to access time
  - seek time: moving the arm to the right cylinder/track
  - rotational delay
  - data transfer time
- Seek time usually dominates
  - try to reduce average time
  - try to reduce the head movements
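The three factors add up to the access time for one request. The sketch below takes the average rotational delay to be half a revolution. The parameter names and the numbers in the usage note are illustrative assumptions, not measurements.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_kb):
    """Estimate the time to service one request, in milliseconds."""
    rotation_ms = 60_000 / rpm
    rotational_delay_ms = rotation_ms / 2  # on average, half a revolution
    transfer_ms = request_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotational_delay_ms + transfer_ms
```

With an assumed 9 ms seek, 7200 RPM, 100 MB/s transfer rate, and a 4 KB request, the estimate is about 13.2 ms, of which the transfer itself is only about 0.04 ms. The mechanical parts dominate.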
Disk scheduling
- Assume a queue of disk block requests on different cylinders
  - try to optimize seek time
  - minimize cylinders traversed
- First-Come First-Served (FCFS): serve requests in the order they come in
- Shortest Seek First (SSF): handle the closest request next
  - cylinders at the edges could suffer starvation
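FCFS and SSF can be compared by counting cylinders traversed for a fixed queue. This sketch treats the queue as static (no new arrivals while it is served); the function names, the starting cylinder, and the request list are made up for illustration.

```python
def total_movement(start, order):
    """Total number of cylinders the head crosses serving `order`."""
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


def fcfs(start, requests):
    """First-Come First-Served: serve in arrival order."""
    order = list(requests)
    return order, total_movement(start, order)


def ssf(start, requests):
    """Shortest Seek First: always serve the closest pending request."""
    pending, position, order = list(requests), start, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order, total_movement(start, order)
```

Starting at cylinder 53 with requests `[98, 183, 37, 122, 14, 124, 65, 67]`, FCFS traverses 640 cylinders while SSF traverses 236, serving them in the order 65, 67, 37, 14, 98, 122, 124, 183.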
Disk scheduling
- SCAN or elevator algorithm
  - also used in buildings with elevators
  - go in one direction and handle each request you come across
  - turn around and go in the other direction
- Alternatively, always go in one direction and go back to 0 after reaching the end
  - C-SCAN: Circular SCAN
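The service order produced by the two sweeps can be sketched as follows. The functions return only the order in which requests are served; they do not model the extra travel to the physical edge of the disk, and the names and sample queue are assumptions for illustration.

```python
def scan(start, requests, direction="up"):
    """Elevator order: sweep one way, then turn around and sweep back."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    if direction == "up":
        return upper + lower[::-1]
    return lower[::-1] + upper


def c_scan(start, requests):
    """Circular SCAN: sweep upward only, then restart from the low end."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    return upper + lower
```

From cylinder 53 with requests `[98, 183, 37, 122, 14, 124, 65, 67]`, SCAN going up serves 65, 67, 98, 122, 124, 183 and then 37, 14 on the way back, while C-SCAN serves the same upward run and then 14, 37 after wrapping around.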
I/O devices
- In Unix, look like files
  - can be read, written, etc. with system calls
  - /dev, /devices
  - block and character special files
  - major and minor device numbers
- Berkeley sockets for networking
  - TCP, UDP
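Because devices look like files, an ordinary `stat` call reveals whether a path is a block or character special file and what its major and minor numbers are. The sketch below uses Python's standard `os` and `stat` modules and assumes a Unix-like system; `describe_device` is a hypothetical helper name.

```python
import os
import stat


def describe_device(path):
    """Return (kind, major, minor) for a path such as /dev/null."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "character"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        kind = "not a device"
    # st_rdev encodes the device numbers for special files.
    return kind, os.major(info.st_rdev), os.minor(info.st_rdev)
```

On a typical Unix system, `describe_device("/dev/null")` reports a character device; the exact major and minor numbers depend on the operating system.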