Lecture 4: A Case for RAID (Part 2) Prof. Shahram Ghandeharizadeh Computer Science Department University of Southern California

Smaller & Inexpensive Disks
25% annual reduction in size; 40% annual drop in price.
1 GB in 1980 (IBM): the size of a refrigerator, 550 pounds (250 Kg), $40,000.
1 GB in 2008 (IBM): 1 inch in height, weighing 1 ounce (16 grams), $125.

Inexpensive Disks
Less than 9 cents per gigabyte of storage.

Challenge: Managing Data is Expensive
The cost of managing data is $100K/TB/year:
High availability: down time is estimated at thousands of dollars per minute.
Data loss results in lost productivity: 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.
50% of companies that lose their data due to a disaster never re-open; 90% go out of business within 2 years!
The solution: RAID.

MTTF, MTBF, MTTR, AFR
MTBF: Mean Time Between Failures. Designed for repairable devices; the number of hours from when the system is started until its failure.
MTTF: Mean Time To Failure. Designed for non-repairable devices such as magnetic disk drives. Disks of 2008 are more than 40 times more reliable than disks of the 1980s.
MTTR: Mean Time To Repair. The number of hours required to replace a disk drive, AND reconstruct the data stored on the failed disk drive.
AFR: Annualized Failure Rate. Computed by assuming a case temperature (40 degrees centigrade), power-on hours per year (say 8,760, i.e., 24x7), and an average of 250 motor start/stop cycles per year.
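These metrics are related. As a sketch (assuming a constant failure rate, an assumption not stated on the slide), the annualized failure rate follows from the MTTF and the 8,760 power-on hours of a 24x7 year:

AFR = 1 - e^(-8760 / MTTF) ≈ 8760 / MTTF    (when MTTF >> 8,760 hours)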

Focus on MTTF & MTTR
MTTF: Mean Time To Failure. Designed for non-repairable devices such as magnetic disk drives. Disks of 2008 are more than 40 times more reliable than disks of the 1980s.
MTTR: Mean Time To Repair. The number of hours required to replace a disk drive, AND reconstruct the data stored on the failed disk drive.

Assumptions
The MTTF of a disk is independent of the other disks in a RAID. Assume:
The MTTF of a disk is once every 100 years, and
An array of 1000 such disks.
Then the mean time until some disk in the array fails is once every 37 days.
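The 37-day figure follows from dividing the per-disk MTTF by the number of disks, the standard approximation for independent disk lifetimes:

MTTF_array = MTTF_disk / N = 100 years / 1000 ≈ 36.5 days ≈ once every 37 days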

RAID
RAID organizes D data disks into nG groups, where each group consists of G data disks and C parity disks. Example: D = 8, G = 4, C = 1, nG = 8/4 = 2.
Parity Group 1: Disk 1, Disk 2, Disk 3, Disk 4, Parity 1.
Parity Group 2: Disk 5, Disk 6, Disk 7, Disk 8, Parity 2.

RAID With 1 Group
With G disks in a group and C check disks, a failure is encountered when:
A disk in the group fails, AND
A second disk fails before the failed disk of step 1 is repaired.
The MTTF of a group of disks with RAID is:

MTTF_group = MTTF_disk / (G + C) × 1 / (probability of another failure before the repair completes)

RAID With 1 Group (Cont…)
Probability of another failure before the repair completes:

Probability = MTTR / (MTTF_disk / (G + C - 1))

MTTR includes the time required to:
Replace the failed disk drive, AND
Reconstruct the content of the failed disk.
Performing step 2 in a lazy manner increases the duration of MTTR, and hence the probability of another failure.
What happens if we increase the number of data disks in a group?
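Substituting this probability into the expression from the previous slide yields the closed form of the Patterson, Gibson, and Katz paper:

MTTF_group = MTTF_disk^2 / ((G + C) × (G + C - 1) × MTTR)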

RAID with nG Groups
With nG groups, the Mean Time To Failure of the RAID is computed in a similar manner:

MTTF_RAID = MTTF_group / nG = MTTF_disk^2 / (nG × (G + C) × (G + C - 1) × MTTR)
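As a quick sanity check, here is a minimal Python sketch of these formulas; the disk MTTF, MTTR, and array shape below are illustrative assumptions, not values from the slides.

def mttf_raid(mttf_disk_hours, mttr_hours, num_groups, g, c):
    """Mean time to data loss of a RAID with num_groups parity groups,
    each holding g data disks and c check disks."""
    group_size = g + c
    # A group loses data when a second disk dies before the first is repaired.
    mttf_group = mttf_disk_hours ** 2 / (group_size * (group_size - 1) * mttr_hours)
    return mttf_group / num_groups

# Illustrative numbers: 100,000-hour disk MTTF, 24-hour MTTR,
# and 2 groups of G=4 data disks with C=1 parity disk each (D=8).
hours = mttf_raid(100_000, 24, num_groups=2, g=4, c=1)
print(f"RAID MTTF: {hours:,.0f} hours (~{hours / 8760:.0f} years)")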

Review RAID 1 and 3 were presented in the previous lecture. Here is a quick review.

RAID 1: Disk Mirroring
Contents of disks 1 and 2 are identical. Redundant paths keep data available in the presence of either a controller or a disk failure.
A write operation by a CPU is directed to both disks.
A read operation is directed to one of the disks; each disk might be reading different sectors simultaneously.
This is Tandem's architecture.
(Figure: CPU 1 connects through Controller 1 and Controller 2 to Disk 1 and Disk 2.)

RAID 3: Small Block Reads
Bit-interleaved.
Bad news: a small read of less than the group size requires reading the whole group. E.g., a read of one sector requires reading 4 sectors.
One parity group has a read rate identical to that of one disk.
(Figure: bits interleaved across Disk 1, Disk 2, Disk 3, Disk 4, and Parity.)

RAID 3: Small Block Reads
Given a large number of disks, say D = 12, enhance performance by constructing several parity groups, say 3.
With G = 4 disks per group and D = 8 data disks, the number of read requests supported by RAID 3, when compared with one disk, is the number of groups (2). The number of groups is D/G.
(Figure: Parity Group 1 is Disks 1-4 plus Parity 1; Parity Group 2 is Disks 5-8 plus Parity 2; and so on.)

Any Questions?

A Few Questions?
Assume one instance of the RAID-1 organization. What are the values for: D = 1, G = 1, C = 1, nG = 1.
Are the availability characteristics of the following Level 3 RAID better than those of RAID 1?
(Figure: one parity group consisting of Disk 1, Disk 2, Disk 3, Disk 4, and Parity 1.)

RAID 4
Enhances the performance of small reads, writes, and read-modify-writes. How? Interleave data across disks at the granularity of a transfer unit; the minimum size is a sector.
The parity block ECC 1 is an exclusive-or of the bits in blocks a, b, c, and d.
(Layout: Disk 1 holds Block a, Disk 2 Block b, Disk 3 Block c, Disk 4 Block d, and the Parity disk ECC 1.)
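A minimal sketch of computing such a parity block in Python, assuming blocks are equal-length byte strings (illustrative code, not from the slides):

def xor_parity(*blocks: bytes) -> bytes:
    """Compute the parity block as the bitwise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

# ECC 1 protects blocks a, b, c, and d.
a, b, c, d = b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"
ecc1 = xor_parity(a, b, c, d)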

RAID 4
A small read retrieves its block from one disk. Now, 4 requests referencing blocks on different data disks may proceed in parallel. When compared with 1 disk, the throughput of a D-disk system is D times higher.
(Layout as above: Blocks a-d on Disks 1-4, ECC 1 on the Parity disk.)

RAID 4: Failures (Cont…)
If Disk 2 fails, a small read for Block b retrieves blocks a, c, d, and ECC 1 from disks 1, 3, 4, and the Parity disk to compute the missing block. What is the throughput relative to one disk now?
Once Disk 2 is replaced with a new one, its content is constructed either eagerly or in a lazy manner. The system cannot be too lazy, because we want to minimize MTTR.
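A sketch of the reconstruction step, reusing the hypothetical xor_parity helper and blocks from the sketch above: XORing the surviving blocks with the parity recovers the missing block, since x xor x = 0.

# Recover Block b after Disk 2 fails: XOR the surviving data blocks
# (a, c, d) with the parity block ECC 1.
recovered_b = xor_parity(a, c, d, ecc1)
assert recovered_b == b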

RAID 4: Failures (Cont…)
If the Parity disk fails, reads of data blocks may proceed as in the normal mode of operation. Once the Parity disk is replaced, the content of the new Parity disk is constructed either eagerly or lazily.

RAID 4: Small Writes
The performance of small writes is improved. To write Block b:
Read the old Block b and the old parity block ECC 1,
Compute the new parity using the old Block b, the new Block b, and the old parity: new parity = (old block xor new block) xor old parity.
A write requires 4 accesses: 2 reads and 2 writes.
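A sketch of this small-write protocol, again reusing the hypothetical xor_parity helper; the four disk accesses are marked in the comments.

def small_write(old_block: bytes, new_block: bytes, old_parity: bytes) -> bytes:
    """New parity for a RAID 4/5 small write:
    new parity = (old block XOR new block) XOR old parity."""
    return xor_parity(old_block, new_block, old_parity)

new_b = b"\xaa\xbb"
# Access 1: read old Block b.   Access 2: read old ECC 1.
new_ecc1 = small_write(b, new_b, ecc1)  # computed in the controller
# Access 3: write new Block b.  Access 4: write new ECC 1.
assert new_ecc1 == xor_parity(a, new_b, c, d)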

RAID 4: Bottlenecks
For writes, the parity disk is a bottleneck. Two different writes, to Block b and Block g, must read ECC 1 and ECC 2 from the same Parity disk. A queue forms on the Parity disk.
The performance of small writes is the same as RAID 3: D/2G.
(Layout: Blocks a-d with ECC 1 in row 1; Blocks e-h with ECC 2 in row 2; both ECC blocks on the Parity disk.)
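Why D/2G: each small write performs one read and one write on its group's parity disk, so a parity disk sustains half the request rate of a single disk; with nG = D/G independent groups, the aggregate small-write throughput relative to one disk is:

nG × (1/2) = (D/G) × (1/2) = D/2G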

RAID 4: Summary

RAID 5: Resolve the Bottleneck
Distribute data and check blocks across all disks.

Disk 1   Disk 2   Disk 3   Disk 4   Disk 5
Block a  Block b  Block c  Block d  ECC 1
Block e  Block f  Block g  ECC 2    Block h
Block i  Block j  ECC 3    Block k  Block l
Block m  ECC 4    Block n  Block o  Block p
ECC 5    Block q  Block r  Block s  Block t

RAID 5: Resolve the Bottleneck
Writes of Blocks a and j may proceed in parallel now: the two data blocks and their parity blocks (ECC 1 and ECC 3) all reside on different disks (see the layout above).

RAID 5: Read Performance
The disks holding check blocks also service read requests. With D data disks broken into nG groups, the number of parity disks is nG×C, where nG = D/G. When compared with one disk, the throughput of the system is D + CD/G times higher.

RAID 5: Write Performance
For a write: read the referenced block and its parity block, compute the new parity block, and write the new data block and its parity block. Parity continues to be maintained on every write.
With D data disks broken into nG groups, the number of parity disks is nG×C, where nG = D/G. When compared with one disk, the throughput of the system is D/4 + (CD/G)/4 times higher.

RAID 5: R-M-W Performance
For an R-M-W, the read and write of the data block come for free: the referenced block is already retrieved. Only one extra disk I/O is needed to read the parity block; then compute the new parity block and write the new data block and its parity block.
With D data disks broken into nG groups, the number of parity disks is nG×C, where nG = D/G. When compared with one disk, the throughput of the system is D/2 + (CD/G)/2 times higher.
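A small sketch tabulating these throughput ratios for the deck's running example of D = 8, G = 4, C = 1 (illustrative code, not from the slides):

def raid5_throughput(d, g, c):
    """RAID 5 throughput relative to one disk, per the slide formulas."""
    parity_disks = (d // g) * c            # nG x C, with nG = D / G
    reads = d + parity_disks               # every disk services reads
    writes = (d + parity_disks) / 4        # each small write costs 4 I/Os
    rmw = (d + parity_disks) / 2           # R-M-W needs only 2 extra I/Os
    return reads, writes, rmw

reads, writes, rmw = raid5_throughput(d=8, g=4, c=1)
print(f"reads: {reads}x, small writes: {writes}x, R-M-W: {rmw}x")
# -> reads: 10x, small writes: 2.5x, R-M-W: 5.0x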

RAID 5: Summary

Significant improvement in the performance of small writes and R-M-W operations.

RAID Summary If your workload consists of small R-M-W operations, which RAID would you choose?