IBM Systems Group © 2004 IBM Corporation Nick Jones What could happen to your data? What can you do about it?

The Plan
Introduction
Types of failure
The probability of failure
The true cost of failure
Addressing the problem
Putting it into perspective
Questions & summary

Types of failure
Consider a pessimist's view of a hard disk
Two ways in which a drive can fail
– It reports the failure
– It lies

The probability of failure
Mean Time Between Failures (MTBF): 1,200,000 hours
Drive failure doesn't sound like too big a problem…
…but then consider the number of drives
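
The arithmetic behind "consider the number of drives" can be sketched as follows. This is a rough illustration only: it assumes drives fail independently at a constant rate of 1/MTBF, and the fleet sizes are hypothetical values chosen for the example, not figures from the slides.

```python
MTBF_HOURS = 1_200_000      # MTBF figure quoted on the slide
HOURS_PER_YEAR = 24 * 365   # 8,760 hours


def expected_failures_per_year(drive_count: int, mtbf_hours: float = MTBF_HOURS) -> float:
    """Expected drive failures per year, assuming a constant failure rate of 1/MTBF."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


def days_between_failures(drive_count: int, mtbf_hours: float = MTBF_HOURS) -> float:
    """Average number of days between failures somewhere in the fleet."""
    return mtbf_hours / drive_count / 24


if __name__ == "__main__":
    # Hypothetical fleet sizes, for illustration only.
    for drives in (1, 100, 1_000, 10_000):
        print(
            f"{drives:>6} drives: "
            f"{expected_failures_per_year(drives):.2f} failures/year, "
            f"one every {days_between_failures(drives):.0f} days"
        )
```

Under these assumptions a single drive fails rarely, but a fleet of 1,000 drives would see about 7.3 failures per year, or roughly one every 50 days.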

The true cost of failure
"It will never happen to me"
Increased disk size means increased data loss
A few statistics

Addressing the problem
Make backups
Add extra information to the disk
Add extra disks

Addressing the problem: Extra information
Error Correcting Code (ECC) on the disk drive
[Diagram: data seen by the drive, made up of client data and drive ECC]

Addressing the problem: Extra information
Longitudinal Redundancy Check (LRC) in addition to ECC
[Diagram labels: block LRC, client data, drive ECC]
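
A longitudinal redundancy check can be sketched as a running XOR over a block. The slides do not specify the LRC width or layout, so the single-byte, byte-wise XOR used here is an assumption, and the function names `lrc`, `append_lrc` and `check_lrc` are hypothetical.

```python
from functools import reduce
from operator import xor


def lrc(block: bytes) -> int:
    """Byte-wise XOR of every byte in the block (0 for an empty block)."""
    return reduce(xor, block, 0)


def append_lrc(block: bytes) -> bytes:
    """Return the block with its one-byte LRC appended (assumed layout)."""
    return block + bytes([lrc(block)])


def check_lrc(stored: bytes) -> bool:
    """True if the trailing LRC byte matches the data that precedes it."""
    return len(stored) > 0 and lrc(stored[:-1]) == stored[-1]


if __name__ == "__main__":
    stored = append_lrc(b"client data")
    print(check_lrc(stored))                      # block read back intact

    # Flip one bit in the first byte to simulate silent corruption.
    corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]
    print(check_lrc(corrupted))                   # mismatch is detected
```

Because the check is computed over the client data rather than by the drive itself, a mismatch can flag a block that the drive returned without reporting an error. A simple XOR check like this detects any single-bit error, but not every multi-bit error.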

Addressing the problem: Extra disks
The idea was published in 1988
"A Case for Redundant Arrays of Inexpensive Disks" by Patterson, Gibson & Katz

RAID 0: Striping
Data striped across member disks
[Diagram: RAID array of four member disks]
Disk 1: A E I M
Disk 2: B F J N
Disk 3: C G K O
Disk 4: D H L P
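
The striping pattern in the RAID 0 diagram can be sketched as a simple mapping from a logical block number to a disk and a stripe. This assumes a stripe unit of one block and round-robin placement, which is what the diagram suggests; `locate_block` and `layout` are hypothetical names.

```python
import string


def locate_block(block_index: int, disk_count: int) -> tuple[int, int]:
    """Return (disk, stripe), both zero-based, for a logical block number."""
    return block_index % disk_count, block_index // disk_count


def layout(blocks: str, disk_count: int) -> list[str]:
    """Return the block labels held by each disk, in stripe order."""
    disks = [""] * disk_count
    for index, label in enumerate(blocks):
        disk, _stripe = locate_block(index, disk_count)
        disks[disk] += label
    return disks


if __name__ == "__main__":
    # Blocks A..P over four disks, as in the slide's diagram.
    for number, contents in enumerate(layout(string.ascii_uppercase[:16], 4), start=1):
        print(f"Disk {number}: {' '.join(contents)}")
```

Note that striping alone adds no redundancy: losing any one member disk loses part of every group of consecutive blocks.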

RAID 1: Mirroring
Data mirrored across member disks
[Diagram: RAID array of two member disks]
Disk 1: A B C D
Disk 2: A B C D

RAID 10: Striping & Mirroring
Data striped across mirrored pairs of disks
[Diagram: three mirrored pairs of disks]
Pair 1: A D G J | A D G J
Pair 2: B E H K | B E H K
Pair 3: C F I L | C F I L

XOR based parity
Bitwise operator
If the two inputs are the same, the output is 0
If the two inputs are different, the output is 1

Bit 1   Bit 2   XOR result
0       0       0
0       1       1
1       0       1
1       1       0

XOR based parity: An example
[Figure: rows of data bits and the parity row computed from them]
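
A small worked example of XOR parity, as a sketch. The three data bytes below are made-up values chosen for illustration, not the bits shown on the original slide, and `xor_parity` is a hypothetical helper name.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of a list of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple per byte position across all blocks.
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Illustrative data (assumed values).
data = [bytes([0b10110010]), bytes([0b01101100]), bytes([0b11100001])]
parity = xor_parity(data)

if __name__ == "__main__":
    for block in data:
        print(f"data   {block[0]:08b}")
    print(f"parity {parity[0]:08b}")   # prints: parity 00111111
```

Each parity bit is 1 when an odd number of the data bits in that column are 1. XOR-ing all the data blocks together with the parity therefore always gives zero, which is the property that makes a lost block recoverable.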

RAID 4: Parity
Data striped across disks, with one parity disk
[Diagram: RAID array of three data disks and one parity disk]
Disk 1: A D G J
Disk 2: B E H K
Disk 3: C F I L
Parity disk: P1 P2 P3 P4

Coping with failure
Error reading E
– Read D, F & P2
– XOR them to reconstruct E
– Write reconstructed E
[Diagram: the RAID 4 array, with block E unreadable]
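
The three steps for recovering block E can be sketched as below. The block contents are made-up four-byte values, and `xor_blocks` is a hypothetical helper; a real array works on whole sectors or stripe units.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Stripe 2 of the RAID 4 array: data blocks D, E, F and parity P2.
# The contents are assumed values for illustration.
D = bytes([0x11, 0x22, 0x33, 0x44])
E = bytes([0xDE, 0xAD, 0xBE, 0xEF])
F = bytes([0x0F, 0xF0, 0x55, 0xAA])
P2 = xor_blocks(D, E, F)          # parity written when the stripe was stored

# E is unreadable: read D, F and P2, then XOR them to reconstruct E.
reconstructed_E = xor_blocks(D, F, P2)

if __name__ == "__main__":
    print(reconstructed_E == E)   # the reconstructed block matches the original
```

This works because P2 = D xor E xor F, so XOR-ing P2 with D and F cancels both of them and leaves E. The reconstructed block can then be written back to the disk.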

Coping with failure
Drive loss
– Replace the drive
– Rebuild the data
– Redundancy restored
[Diagram: the RAID 4 array, with a failed drive]
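
Rebuilding a whole drive is the same XOR reconstruction repeated for every stripe. The sketch below models each drive as a list of blocks and uses one-byte stand-ins for the block contents; `xor_blocks` and `rebuild_drive` are hypothetical names, and a real rebuild runs in the controller while the array stays in service.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_drive(surviving_drives: list[list[bytes]]) -> list[bytes]:
    """Recompute every block of the lost drive from the surviving drives."""
    # zip(*surviving_drives) walks the array one stripe at a time.
    return [xor_blocks(*stripe) for stripe in zip(*surviving_drives)]


# RAID 4 layout from the slides, with one-byte stand-ins for block contents.
data_drives = [
    [b"A", b"D", b"G", b"J"],
    [b"B", b"E", b"H", b"K"],
    [b"C", b"F", b"I", b"L"],
]
parity_drive = [xor_blocks(*stripe) for stripe in zip(*data_drives)]
array = data_drives + [parity_drive]

if __name__ == "__main__":
    lost = 1  # pretend the second drive has failed and been replaced
    survivors = [drive for index, drive in enumerate(array) if index != lost]
    print(rebuild_drive(survivors) == array[lost])
```

The same function rebuilds the parity drive if that is the one lost, since parity is just the XOR of the data drives. Until the rebuild completes, a second drive failure in the array cannot be recovered from.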

RAID 5: Rotate parity
Data striped across disks, with parity rotating
[Diagram: RAID array of four disks]
Disk 1: P1 F H J
Disk 2: A P2 I K
Disk 3: B D P3 L
Disk 4: C E G P4
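
The rotation in the RAID 5 diagram can be sketched as a placement rule: the parity for each stripe moves one disk to the right, and the data blocks fill the remaining disks starting just after the parity and wrapping around. This rule is inferred from the slide's diagram; real implementations use a variety of rotation schemes, and `raid5_layout` is a hypothetical name.

```python
import string


def raid5_layout(stripe_count: int, disk_count: int) -> list[list[str]]:
    """Return rows[stripe][disk] of block labels (A, B, ... for data, Pn for parity)."""
    if stripe_count * (disk_count - 1) > len(string.ascii_uppercase):
        raise ValueError("this sketch only labels up to 26 data blocks")
    rows = []
    data_index = 0
    for stripe in range(stripe_count):
        parity_disk = stripe % disk_count          # parity moves one disk per stripe
        row = [""] * disk_count
        row[parity_disk] = f"P{stripe + 1}"
        # Data starts on the disk after the parity and wraps around.
        for offset in range(1, disk_count):
            disk = (parity_disk + offset) % disk_count
            row[disk] = string.ascii_uppercase[data_index]
            data_index += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = raid5_layout(stripe_count=4, disk_count=4)
    # Print one line per disk, matching the slide's per-disk view.
    for disk in range(4):
        print(f"Disk {disk + 1}:", " ".join(row[disk] for row in rows))
```

A commonly cited reason for rotating the parity is that every write must also update parity, so a single dedicated parity disk, as in RAID 4, takes part in every write, whereas rotation spreads that load across all members.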

RAID 6: More parity
Data striped across disks, with 2 rotating parities
[Diagram: RAID array of six disks, with parities P and Q for each stripe]
Disk 1: A P2 K M
Disk 2: B Q2 L N
Disk 3: C E P3 O
Disk 4: D F Q3 P
Disk 5: P1 G I P4
Disk 6: Q1 H J Q4

Putting it into perspective
Cannot survive on RAID alone
Avoid a single point of failure
– Fire, flood, power loss
Split your array across two sites
Human error
Backups still have a place

Summary
Want to avoid any single point of failure
Disk drives do fail
RAID protects against drive failure
Mirroring & parity
RAID isn't the ultimate solution