1 A Case for Redundant Arrays of Inexpensive Disks
Patterson, Gibson and Katz (seminal paper); Chen, Lee, Gibson, Katz, Patterson (survey). Circa late 80s.
CPU speeds are roughly doubling every year: MIPS = 2^(year - 1984) (Joy's Law).
There seems to be plenty of main memory available (multiple megabytes per machine).
Caches (built with SRAM technology) feed CPUs with instructions fast and provide a bridge between the top two layers of the memory hierarchy.
To achieve a balanced system, secondary storage systems have to match the above developments.
SLED (Single Large Expensive Disk) had shown only modest improvement…
–Seek times improved from 20 ms in 1980 to 10 ms in 1994
–Rotational speeds increased from 3600 RPM in 1980 to 7200 RPM in 1994

2 Amdahl's Law
S = effective speedup = 1 / [ (1 - f) + (f / k) ]
f = fraction of work done in faster mode
k = speedup while in faster mode
Example: suppose an application does I/O during 10% of its time (so f = 0.9 of the work is sped up).
When computers become 10X faster, Amdahl's Law says that the effective speedup is only about 5X!
When computers become 100X faster, Amdahl's Law says that the effective speedup will be ONLY about 10X!!!
I/O becomes the bottleneck unless it improves as well.
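As a quick check of these numbers, here is a minimal sketch (Python, not part of the original slides) that evaluates the formula for the 10% I/O example:

from __future__ import annotations

def amdahl_speedup(f: float, k: float) -> float:
    """Effective speedup when a fraction f of the work runs k times faster."""
    return 1.0 / ((1.0 - f) + f / k)

# The application spends 10% of its time in I/O, so 90% of the work is sped up.
f = 0.9
print(round(amdahl_speedup(f, 10), 2))   # ~5.26  -> "only about 5X"
print(round(amdahl_speedup(f, 100), 2))  # ~9.17  -> "only about 10X"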

3 Data Sheet
Comparison of data between two disk units of the time:
                    IBM 3380           Conner CP3100
Diameter            14''               3.5''
Capacity            7,500 Megabytes    100 Megabytes
Price               $135,000           ~$1,000
I/O rate            -                  20-30 I/Os/sec
Transfer rate       3 MB/sec           1 MB/sec
Volume              24 cubic feet      0.03 cubic feet
Key Observation (inexpensive vs. expensive disks): the number of I/Os per second differs only moderately.

4 Core of the Proposal
Build I/O systems as ARRAYS of inexpensive disks.
–Stripe data across multiple disks and access them in parallel to achieve both higher data transfer rates on large data accesses and…
–higher I/O rates on small data accesses.
Idea not entirely new…
–Prior very similar proposals [Kim 86; Livny et al. 87; Salem & Garcia-Molina 87]
75 inexpensive disks potentially have 12 times the I/O bandwidth of an IBM 3380, with lower power consumption and cost!

5 RAID = Reliability May Suffer?
MTTF: mean time to failure.
MTTF for a single disk unit is long:
–For the IBM 3380 it is estimated to be 30,000 hours (> 3 years)
–For the CP3100 it is around 30,000 hours as well.
For an array of 100 CP3100 disks:
MTTF_array = MTTF_for_single_disk / Number_of_disks_in_the_array
I.e., 30,000 / 100 = 300 hours, i.e., less than two weeks!!!
That means that we are going to have failures very frequently.
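As a sanity check on the arithmetic, a minimal sketch (Python, illustrative only; the 30,000-hour figure is the one quoted above):

mttf_single_disk_hours = 30_000   # quoted MTTF of one CP3100
disks_in_array = 100

# Assuming independent failures, the array loses data whenever any one disk fails.
mttf_array_hours = mttf_single_disk_hours / disks_in_array
print(mttf_array_hours)            # 300.0 hours
print(mttf_array_hours / 24)       # ~12.5 days between failures on average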

6 A Better Solution: The RAID Idea
Idea: make use of extra disks for reliability!
Core contributions of the paper (in comparison with prior work):
–Provides a full taxonomy (RAID levels)
–Qualitatively outlines the workloads that are “good” for each level
–RAID ideas are applicable to both hardware and software implementations

7 Basis for RAID Taxonomy
Two RAID aspects are taken into consideration:
–Data striping: leads to enhanced bandwidth.
–Data redundancy: leads to enhanced reliability.
–Redundancy can be mirroring, parity, or other encodings.
–It also raises throughput issues for accesses of different sizes, as we will see…

8 Data Striping
Data striping distributes data transparently over multiple disks to make them appear as a single fast, large disk. It allows multiple I/Os to happen in parallel.
Granularity of data interleaving:
Fine grained (bit interleaved)
–Relatively small units; every I/O request accesses all of the disks in the array.
–High transfer rates, but only one logical I/O request can be serviced at a time.
–All disks must waste time positioning for each request: bad!
Coarse grained (block interleaved)
–Interleave in relatively large units so that small I/O requests need only a small number of disks, while large requests can access all disks in the array.
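A minimal sketch of block-interleaved striping (Python; the disk count and striping unit are made-up example values, not from the slides):

NUM_DISKS = 4          # example array width
STRIPE_UNIT = 4096     # bytes per block (the "striping unit"), example value

def locate(logical_block: int):
    """Map a logical block number to (disk, block offset on that disk)."""
    disk = logical_block % NUM_DISKS        # round-robin across disks
    offset = logical_block // NUM_DISKS     # position within that disk
    return disk, offset

# A small request touches one disk; a large sequential request spreads over all disks.
print([locate(b) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]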

9 Data Redundancy
Method for computing redundant information:
–Parity (levels 3, 4, 5), Hamming (level 2), or Reed-Solomon (level 6) codes.
Method for distributing redundant information:
–Concentrate it on a small number of disks vs. distribute it uniformly across all disks.
–Uniform distribution avoids hot spots and other load-balancing issues.

10 RAID Level 0
Non-redundant RAID level.
Stripes data across the disks; no redundant information is stored.
Lowest cost; use for low-reliability applications.
Best “write” bandwidth performance:
–does not need to update redundant information.
Does not necessarily have the best read performance:
–Redundancy schemes that duplicate data, such as mirroring, may perform better on reads by selectively scheduling requests on the disk with the shortest expected seek and rotational delays.

11 Mirrored: RAID Level 1
Uses twice as many disks as a non-redundant disk array.
Stripes sectors of data.
Whenever data is written to a disk, it is written to a redundant (mirror) disk as well.
When data is retrieved, it is read from the disk with the shorter queueing, seek, and rotational delays; this can reduce those delays by roughly 45%.
Reliability: excellent.
Trades capacity for performance (expensive!).
Recovery? Simple (copy from the surviving mirror).
Appropriate applications: availability and transaction rate are more important than storage efficiency.
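A minimal sketch of the mirrored read/write policy (Python; the queue-length heuristic and the names used are illustrative, not from the paper):

def read_from_mirror(block: int, replicas):
    """Pick the replica expected to respond fastest, here the one with the
    shortest queue (a stand-in for queueing + seek + rotational delay)."""
    best = min(replicas, key=lambda disk: disk["queue_len"])
    return best["name"], block

def write_to_mirror(block: int, data: bytes, replicas):
    """Writes must go to every copy, which is why RAID 1 costs 2x the disks."""
    for disk in replicas:
        disk["blocks"][block] = data

replicas = [{"name": "disk0", "queue_len": 3, "blocks": {}},
            {"name": "disk1", "queue_len": 1, "blocks": {}}]
write_to_mirror(7, b"payload", replicas)
print(read_from_mirror(7, replicas))   # ('disk1', 7) -- the less-loaded copy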

12 RAID Levels 2 and 3
Parallel access:
–All member disks participate in every I/O request.
–Spindles are synchronized.
Data striping is used:
–Small strips (fine-grained interleaving).

13 RAID 2
Goal: improve the cost of recovery from failed components.
Striping is done at the bit/word level.
Uses memory-style Error Correcting Codes (ECC) [Hamming 50]:
–The error-correcting code is calculated across corresponding bits on each data disk; the bits of the code are stored in the corresponding bit positions on multiple parity disks.
–Hamming code: corrects single-bit errors and detects double-bit errors.
On a single read/write, all disks are accessed. For large writes, as good as level 1.
Appropriate for supercomputers; bad for small data transfers.
Still costly:
–# of redundant disks ~ log(total number of disks); more efficient as the number of data disks increases.
Recovery: if a disk fails, several of the parity components will have inconsistent values, and the failed component is the one held in common by every incorrect subset.
–Multiple redundant disks are needed to identify the failed disk; only one is needed for recovery.
Drives have to be rotationally synchronized in order to get the benefits!
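To make the "redundant disks ~ log(data disks)" point concrete, a small sketch (Python; illustrative only, using the standard Hamming-code condition 2^r >= d + r + 1):

def hamming_check_disks(data_disks: int) -> int:
    """Smallest number of check disks r such that a Hamming code can cover
    d data bits plus the r check bits: 2**r >= d + r + 1."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r

for d in (4, 10, 25, 100):
    print(d, "data disks ->", hamming_check_disks(d), "check disks")
# 4 -> 3, 10 -> 4, 25 -> 5, 100 -> 7  (relative overhead shrinks as the array grows)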

14 RAID Level 3
Simplified, cheaper version of RAID 2:
–Requires only a single redundant disk.
–Employs parallel access, with data distributed in small strips.
Most disk controllers can tell if a disk has failed, so a single parity disk is enough to reconstruct its contents.
Bit-interleaved parity: each read request reads from all data disks, and each write request writes to all data disks and the parity disk.
Cannot do much about “random” (silent) bit errors, but it can easily protect against a known bad drive (recovery is easy).
Very fine-grained interleaving.
RAID 2 and 3 are of practical value mostly for supercomputer applications – high bandwidth.
Simpler to implement than levels 4, 5, and 6 (although those complete the discussion in terms of RAID levels).
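A minimal sketch of parity protection and recovery with XOR (Python; the byte values and the choice of "failed" disk are made-up examples):

from functools import reduce

data_disks = [0b1011, 0b0110, 0b1100]           # example contents of 3 data disks
parity = reduce(lambda a, b: a ^ b, data_disks)  # parity disk = XOR of all data

# Suppose disk 1 fails; its contents are the XOR of everything that survived.
failed = 1
survivors = [d for i, d in enumerate(data_disks) if i != failed] + [parity]
recovered = reduce(lambda a, b: a ^ b, survivors)
print(bin(recovered), bin(data_disks[failed]))   # both 0b110 -- contents recovered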

15 RAID Level 4: Block-Interleaved Parity
Striping entity is a block:
–Data is interleaved across disks in blocks of arbitrary size rather than in bits.
–Size of blocks = the “striping unit.”
Read requests smaller than the striping unit access only a single data disk.
Block-level parity is used: write requests must update the requested data blocks and must compute and update the parity block.
–For large writes that touch blocks on all disks, calculate parity directly (see the sketch below).
–For small write requests that update only one data disk, parity is computed by noting the differences.
No synchronized drives are required to read a block.
If a disk crashes, recovery is straightforward.
Small write requests require 4 disk I/Os.
The parity disk gets most of the “traffic” (bottleneck).
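A minimal sketch of the two parity-update paths (Python; the block values are made-up):

from functools import reduce

def full_stripe_parity(blocks):
    """Large write: all data blocks are in hand, so compute parity directly."""
    return reduce(lambda a, b: a ^ b, blocks)

def small_write_parity(old_parity, old_block, new_block):
    """Small write: fold the change of one block into the existing parity,
    using new_parity = old_parity XOR old_block XOR new_block."""
    return old_parity ^ old_block ^ new_block

stripe = [0x1F, 0x22, 0x30, 0x44]
p = full_stripe_parity(stripe)

new_val = 0x55                            # overwrite block 2 only
p2 = small_write_parity(p, stripe[2], new_val)
stripe[2] = new_val
print(p2 == full_stripe_parity(stripe))   # True -- both paths agree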

16 RAID Level 5: Block-Interleaved Distributed Parity
Distributes the parity blocks across all disks, so no single disk becomes the bottleneck.
Best small-read, large-read, and large-write performance of any redundant disk array.
Still a high cost for small WRITES.
Reconstruction of a failed drive is not as easy as in the other levels.
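A minimal sketch of rotating the parity block across disks (Python; this left-rotating layout is one common choice, not necessarily the paper's exact placement):

NUM_DISKS = 5  # example array width, including the parity block of each stripe

def parity_disk(stripe: int) -> int:
    """Rotate parity placement so every disk holds a share of the parity."""
    return (NUM_DISKS - 1 - stripe) % NUM_DISKS

for stripe in range(6):
    print("stripe", stripe, "-> parity on disk", parity_disk(stripe))
# stripe 0 -> disk 4, stripe 1 -> disk 3, ..., stripe 5 -> disk 4 again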

17 RAID Level 6: P+Q Redundancy
Parity can correct any single self-identifying failure, but:
–As disk arrays get larger, multiple failures become possible, so stronger codes are needed.
–When a disk fails in a parity-protected disk array, recovering the contents of the failed disk requires successfully reading the contents of all non-failed disks; during such a large read, the probability of hitting a random bit error is high.
–Hence the need for stronger error-correcting codes.
P+Q redundancy: Reed-Solomon codes protect against 2 failures using 2 redundant disks.
Otherwise similar to RAID 5.
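A minimal sketch of computing the two redundant blocks (Python; this uses the common GF(2^8) formulation of P+Q, with the reducing polynomial 0x11D and generator 2 as illustrative choices, not necessarily the exact Reed-Solomon code the paper had in mind):

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return result

def p_and_q(data_bytes):
    """P is plain parity; Q weights the byte from disk i by g^i, so P and Q
    together allow reconstruction of any two lost data bytes."""
    p, q, g_i = 0, 0, 1
    for d in data_bytes:
        p ^= d
        q ^= gf_mul(g_i, d)
        g_i = gf_mul(g_i, 2)   # advance to the next power of the generator
    return p, q

print(p_and_q([0x12, 0x34, 0x56, 0x78]))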

18 Small Writes in RAID 5: Why They Are a Problem
A single logical WRITE is translated into five operations (four of them disk I/Os):
1. READ the old data block
2. READ the old parity block
3. Compute the new parity = old parity XOR old data XOR new data
4. WRITE the new data block
5. WRITE the new parity block

19 Solving the Small-Write Problem [Stodolsky, Gibson, 1993]
Parity Logging:
–Increases throughput for workloads emphasizing small, random write accesses in a redundant disk array by logging changes to parity in a segmented log, for efficient application later.
–Log segmentation allows log operations that are large enough to be efficient, yet small enough to allow in-memory application of a log segment.
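A conceptual sketch of the idea (Python; a toy in-memory model under my own assumptions, not the paper's design):

class ParityLoggingArray:
    """Toy model: instead of read-modify-writing the parity on every small
    write, append the parity delta (old XOR new) to a log and apply a whole
    log segment to the parity later in one batched pass."""

    def __init__(self, num_blocks: int):
        self.data = [0] * num_blocks
        self.parity = [0] * num_blocks   # one parity byte per data block, a
                                         # stand-in for one parity block per stripe
        self.log = []                    # pending (block, delta) records

    def small_write(self, block: int, new_value: int):
        delta = self.data[block] ^ new_value   # how the parity must change
        self.data[block] = new_value
        self.log.append((block, delta))        # cheap sequential append

    def apply_log_segment(self):
        for block, delta in self.log:          # batched parity update
            self.parity[block] ^= delta
        self.log.clear()

arr = ParityLoggingArray(8)
arr.small_write(3, 0xAB)
arr.small_write(3, 0xCD)
arr.apply_log_segment()
print(hex(arr.parity[3]))   # 0xcd: parity now reflects both updates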

20 Issues
What happens when new disks are added to the system?
–Have to change the layout.
–Have to rearrange data.
In addition, many spare disks may be wasted.
A solution to the above is HP AutoRAID [AutoRAID, SOSP 95]:
–Core idea: mirror the active (hot) data; use RAID 5 for the not-very-active (cold) data.
–Assumptions: only part of the data is “active” at any moment in time, and the working set changes slowly (so that migration is feasible).