Disk Failures Xiaqing He ID: 204 Dr. Lin.



Content. Focus: how to recover from disk crashes. The common term is RAID (redundant array of independent disks). Schemes covered: 1) mirroring (RAID level 1); 2) parity blocks (RAID level 4); 3) an improvement (RAID level 5); 4) coping with multiple disk crashes (RAID level 6).

1) Mirroring. The simplest scheme for recovering from disk crashes. Mirroring means keeping two or more copies of the data on different disks. Benefits: -- the data survives if one disk fails; -- reads can be divided among the disks, so several blocks can be accessed at once.

With mirroring, data can be lost only if the second (mirror) disk crashes while the first (data) disk is being replaced. Probability: suppose the mean time to failure of one disk is 10 years, so one of the two disks fails on average once every 5 years, and replacing a failed disk takes 3 hours = 1/2920 year. Then the probability that the mirror disk fails during the replacement is 1/10 * 1/2920 = 1/29,200, and the probability of data loss in a given year is 1/5 * 1/29,200 = 1/146,000.
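The arithmetic for the mirroring case can be checked with exact fractions. This is a minimal sketch; the variable names are illustrative, and a year is assumed to have 365 days.

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, 8760 hours

# One disk fails on average once every 10 years.
single_disk_failure_rate = Fraction(1, 10)
# With two disks, one of them fails on average once every 5 years.
pair_failure_rate = 2 * single_disk_failure_rate
# Replacing a failed disk takes 3 hours, expressed as a fraction of a year.
repair_window = Fraction(3, HOURS_PER_YEAR)

# Chance that the surviving disk also fails during the replacement.
p_mirror_fails_during_repair = single_disk_failure_rate * repair_window
# Chance of losing data in a given year.
p_data_loss_per_year = pair_failure_rate * p_mirror_fails_during_repair

print(repair_window)                 # 1/2920
print(p_mirror_fails_during_repair)  # 1/29200
print(p_data_loss_per_year)          # 1/146000
```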

2) Parity Blocks. Disadvantage of mirroring: it needs as many redundant disks as data disks. RAID level 4 uses only one redundant disk, no matter how many data disks there are. How does this one redundant disk work? -- It stores a modulo-2 sum: the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks. Example:

Example. Data disks: Disk 1: 11110000; Disk 2: 10101010; Disk 3: 00111000. Redundant disk: Disk 4: 01100010.
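The modulo-2 sum is a bitwise XOR, so the redundant block in this example can be reproduced in a few lines. This is a sketch only; `parity_block` is a hypothetical helper name, and blocks are represented as bit strings for readability.

```python
from functools import reduce

def parity_block(blocks):
    """Return the bitwise modulo-2 sum (XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(block, 2) for block in blocks))
    return format(total, f"0{width}b")

data_disks = ["11110000", "10101010", "00111000"]
redundant = parity_block(data_disks)
print(redundant)  # 01100010
```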

Cont. RAID 4. Reading: the same as reading a block from any disk. Writing: 1) change the block on the data disk; 2) change the corresponding block of the redundant disk. Why? -- The redundant disk must continue to hold the parity checks for the corresponding blocks of all the data disks.

Cont. RAID 4: writing. For a total of N data disks: 1) Naive way: read the corresponding blocks of the N-1 other data disks, compute their modulo-2 sum with the new block, and rewrite the block of the redundant disk with the result. 2) Better way: take the modulo-2 sum of the old and new versions of the data block that was rewritten, then flip the bits of the redundant block at the positions where that sum has 1's. Example:

Example. Data disks: Disk 1: 11110000; Disk 2: 10101010, rewritten as 01100110; Disk 3: 00111000 (unchanged). Modulo-2 sum of the old and new versions of Disk 2: 11001100. So we need to change positions 1, 2, 5, and 6 of the redundant disk. Redundant disk: Disk 4: 01100010 becomes 10101110.
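The "better way" of writing can be sketched as two XOR operations: one to find which bits changed, one to flip those bits in the redundant block. The helper name `xor_bits` is an assumption made for this illustration.

```python
def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), f"0{len(a)}b")

old_block = "10101010"   # Disk 2 before the write
new_block = "01100110"   # Disk 2 after the write
old_parity = "01100010"  # Disk 4 before the write

# Bits that differ between the old and new versions of the data block.
change_mask = xor_bits(old_block, new_block)
# Flip exactly those bits in the redundant block.
new_parity = xor_bits(old_parity, change_mask)
# Positions counted from 1, left to right, as on the slide.
positions = [i + 1 for i, bit in enumerate(change_mask) if bit == "1"]

print(change_mask)  # 11001100
print(positions)    # [1, 2, 5, 6]
print(new_parity)   # 10101110
```

Only the rewritten data block and the redundant block are touched, so the other data disks do not need to be read.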

Cont. RAID 4: failure recovery. Redundant disk crash: -- swap in a new disk and recompute its contents from all the data disks. Data disk crash: -- swap in a new disk; -- recompute its contents from the other disks, that is, the remaining data disks and the redundant disk. How to recompute? -- Take the modulo-2 sum of the corresponding bits of all the other disks.
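Because the same rule applies to every disk, recovery can be sketched with a single function. The block values are the ones from the parity example; `xor_blocks` and the choice of Disk 2 as the failed disk are assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bitwise modulo-2 sum of a list of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(block, 2) for block in blocks))
    return format(total, f"0{width}b")

# Disks 1-3 hold data, disk 4 is the redundant disk.
disks = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01100010"}

failed = 2  # hypothetical: disk 2 has crashed
survivors = [block for number, block in disks.items() if number != failed]
recovered = xor_blocks(survivors)

print(recovered)  # 10101010, the original contents of disk 2
```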

3) An Improvement: RAID 5. Why is an improvement needed? -- Shortcoming of RAID level 4: the redundant disk is a bottleneck, because every update of a data disk also requires reading and writing the redundant disk. Principle of RAID level 5: -- treat each disk as the redundant disk for some of the blocks. Why is this feasible? The rule for failure recovery is the same for the redundant disk and the data disks: take the modulo-2 sum of the corresponding bits of all the other disks. So there is no need to treat one disk as the redundant disk and the others as data disks.

Cont. RAID 5. How do we decide for which blocks a given disk acts as the redundant disk? -- If there are n+1 disks, numbered 0 to n, we treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1. Example:
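The numbering rule amounts to a remainder computation. The following sketch uses hypothetical function names and assumes 4 disks for the sample call.

```python
def redundant_disk(cylinder, num_disks):
    """Number of the disk that acts as the redundant disk for this cylinder."""
    return cylinder % num_disks

def is_redundant(disk, cylinder, num_disks):
    """True if `disk` holds the redundant blocks of `cylinder`."""
    return disk == redundant_disk(cylinder, num_disks)

num_disks = 4  # n + 1 disks with n = 3, numbered 0 to 3
assignment = [redundant_disk(c, num_disks) for c in range(1, 9)]
print(assignment)  # [1, 2, 3, 0, 1, 2, 3, 0]
```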

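Cont. RAID 5: example. n = 3, so there are 4 disks. The first disk, numbered 0, is redundant for cylinders 4, 8, 12, ...; the second disk, numbered 1, for cylinders 1, 5, 9, ...; the third disk, numbered 2, for cylinders 2, 6, 10, ...; and so on. Suppose all 4 disks are equally likely to be written. A given disk is involved in a write if it is the data disk being written (probability 1/4), or if another disk is written (probability 3/4) and the given disk is the redundant disk for that block (probability 1/3): 1/4 + 3/4 * 1/3 = 1/2. In general, with m disks: 1/m + (m-1)/m * 1/(m-1) = 2/m.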
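The write-probability formula can be evaluated exactly for any number of disks. This is a small sketch; `write_probability` is a hypothetical name, and m is the total number of disks.

```python
from fractions import Fraction

def write_probability(m):
    """Chance that a given disk is touched by a write in an m-disk RAID 5."""
    as_data_disk = Fraction(1, m)
    # Another disk is written, and this disk is redundant for that block.
    as_redundant_disk = Fraction(m - 1, m) * Fraction(1, m - 1)
    return as_data_disk + as_redundant_disk

print(write_probability(4))  # 1/2
print(write_probability(8))  # 1/4
```

For comparison, the single redundant disk of RAID 4 is touched by every write, which is the bottleneck RAID 5 removes.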

4) Coping with multiple disk crashes. RAID 6 can deal with any number of disk crashes, provided enough redundant disks are used. Focus: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7). How to set up the 3 x 7 matrix (one row per parity check, one column per disk)? 1) Every column is a different combination of three 0's and 1's, excluding the combination of all three 0's; 2) the column of each redundant disk has a single 1; 3) the column of each data disk has at least two 1's.
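The three rules determine the set of columns but not their order. The sketch below enumerates the seven nonzero columns and splits them by the number of 1's; the order in which columns are assigned to disks is an assumption made here, not something the slide specifies.

```python
from itertools import product

# All combinations of three bits except (0, 0, 0): seven distinct columns.
columns = [c for c in product((0, 1), repeat=3) if any(c)]

# Rule 3: data disks get the columns with at least two 1's.
data_columns = [c for c in columns if sum(c) >= 2]
# Rule 2: redundant disks get the columns with a single 1.
redundant_columns = [c for c in columns if sum(c) == 1]

# Assumed order: disks 1-4 are data disks, disks 5-7 are redundant disks.
all_columns = data_columns + redundant_columns
matrix = [list(row) for row in zip(*all_columns)]  # 3 rows, 7 columns

for row in matrix:
    print(row)
```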

Cont. Coping with multiple disk crashes. Reading: read from the data disks and ignore the redundant disks. Writing: change the data disk, then change the corresponding bits of all the redundant disks that have a 1 in a row where the written disk also has a 1.

Cont. Coping with multiple disk crashes. How can a system with 4 data disks and 3 redundant disks correct up to 2 disk crashes? Suppose disks a and b have failed: find some row r in which the columns for a and b differ, and suppose a has 0 and b has 1 in row r. Compute the correct b by taking the modulo-2 sum of the corresponding bits of all the disks other than b that have 1's in row r. After recovering b, compute the correct a using all the other disks, which are now available. Example
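The two-step recovery can be sketched as follows. The matrix is one that satisfies the three rules; its exact column order, the block values, and the function names are assumptions made for illustration. Blocks are stored as integers so that `^` performs the modulo-2 sum.

```python
# One 3 x 7 matrix meeting the rules (columns are disks 1-7); assumed here.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]

def sum_row(disks, row, target):
    """Modulo-2 sum of all disks except `target` that have a 1 in `row`."""
    value = 0
    for disk, bit in enumerate(row, start=1):
        if bit and disk != target:
            value ^= disks[disk]
    return value

def recover_two(disks, a, b):
    """Rebuild failed disks a and b from the five surviving disks."""
    disks = dict(disks)
    # Step 1: a row where the columns for a and b differ.
    row = next(r for r in MATRIX if r[a - 1] != r[b - 1])
    first, second = (a, b) if row[a - 1] == 1 else (b, a)
    # The disk with the 1 in that row is rebuilt without needing the other.
    disks[first] = sum_row(disks, row, first)
    # Step 2: any row with a 1 for the remaining disk now suffices.
    row = next(r for r in MATRIX if r[second - 1] == 1)
    disks[second] = sum_row(disks, row, second)
    return disks

# Hypothetical data blocks for disks 1-4.
full = {1: 0b11110000, 2: 0b10101010, 3: 0b00111000, 4: 0b01000001}
# Each redundant disk is the sum of the data disks sharing its row.
for number, row in zip((5, 6, 7), MATRIX):
    full[number] = sum_row(full, row, number)

# Simulate the loss of disks 2 and 5, then rebuild them.
damaged = {n: v for n, v in full.items() if n not in (2, 5)}
repaired = recover_two(damaged, 2, 5)
print(repaired == full)  # True
```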