DISK FAILURES PROF. T.Y.LIN CS-257 Presenter: Shailesh Benake(104)

Slides:



Advertisements
Similar presentations
IT 344: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Chia-Chi Teng CTB 265.
Advertisements

Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
CS 346 – April 4 Mass storage –Disk formatting –Managing swap space –RAID Commitment –Please finish chapter 12.
RAID (Redundant Arrays of Independent Disks). Disk organization technique that manages a large number of disks, providing a view of a single disk of High.
I/O Errors 1 Computer Organization II © McQuain RAID Redundant Array of Inexpensive (Independent) Disks – Use multiple smaller disks (c.f.
RAID: Redundant Array of Inexpensive Disks Supplemental Material not in book.
RAID Redundant Array of Independent Disks
Chapter 16: Recovery System
RAID Redundant Arrays of Inexpensive Disks –Using lots of disk drives improves: Performance Reliability –Alternative: Specialized, high-performance hardware.
Lecture 36: Chapter 6 Today’s topic –RAID 1. RAID Redundant Array of Inexpensive (Independent) Disks –Use multiple smaller disks (c.f. one large disk)
REDUNDANT ARRAY OF INEXPENSIVE DISCS RAID. What is RAID ? RAID is an acronym for Redundant Array of Independent Drives (or Disks), also known as Redundant.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
Performance/Reliability of Disk Systems So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
Computer ArchitectureFall 2007 © November 28, 2007 Karem A. Sakallah Lecture 24 Disk IO and RAID CS : Computer Architecture.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #6.
Other Disk Details. 2 Disk Formatting After manufacturing disk has no information –Is stack of platters coated with magnetizable metal oxide Before use,
Section Disk Failures Kevin Grant
Disk Failures Xiaqing He ID: 204 Dr. Lin. Content 1) RAID stands for: “redundancy array of independent disks” 2) Several schemes to recover from disk.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology April 1, 2004 MEDIA FAILURES Lecture based on [GUW, ]
Disks CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Data Representation Recovery from Disk Crashes – 13.4 Presented By: Deepti Bhardwaj Roll No. 223_103 SJSU ID:
RAID Systems CS Introduction to Operating Systems.
Servers Redundant Array of Inexpensive Disks (RAID) –A group of hard disks is called a disk array FIGURE Server with redundant NICs.
Storage System: RAID Questions answered in this lecture: What is RAID? How does one trade-off between: performance, capacity, and reliability? What is.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 6 – RAID ©Manuel Rodriguez.
1 Storage Refinement. Outline Disk failures To attack Intermittent failures To attack Media Decay and Write failure –Checksum To attack Disk crash –RAID.
Chapter 6 RAID. Chapter 6 — Storage and Other I/O Topics — 2 RAID Redundant Array of Inexpensive (Independent) Disks Use multiple smaller disks (c.f.
RAID Ref: Stallings. Introduction The rate in improvement in secondary storage performance has been considerably less than the rate for processors and.
Page 19/4/2015 CSE 30341: Operating Systems Principles Raid storage  Raid – 0: Striping  Good I/O performance if spread across disks (equivalent to n.
CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Storage Systems.
1 Recitation 8 Disk & File System. 2 Disk Scheduling Disks are at least four orders of magnitude slower than main memory –The performance of disk I/O.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
1 Failure Correction Techniques for Large Disk Array Garth A. Gibson, Lisa Hellerstein et al. University of California at Berkeley.
1 Chapter 7: Storage Systems Introduction Magnetic disks Buses RAID: Redundant Arrays of Inexpensive Disks.
Lecture 9 of Advanced Databases Storage and File Structure (Part II) Instructor: Mr.Ahmed Al Astal.
Redundant Array of Inexpensive Disks aka Redundant Array of Independent Disks (RAID) Modified from CCT slides.
Disk Structure Disk drives are addressed as large one- dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
Page 110/12/2015 CSE 30341: Operating Systems Principles Network-Attached Storage  Network-attached storage (NAS) is storage made available over a network.
CE Operating Systems Lecture 20 Disk I/O. Overview of lecture In this lecture we will look at: Disk Structure Disk Scheduling Disk Management Swap-Space.
"1"1 Introduction to Managing Data " Describe problems associated with managing large numbers of disks " List requirements for easily managing large amounts.
COSC 3213: Computer Networks I Instructor: Dr. Amir Asif Department of Computer Science York University Section M Topics: 1. Error Detection Techniques:
Chapter 12 – Mass Storage Structures (Pgs )
- Disk failure ways and their mitigation - Priya Gangaraju(Class Id-203)
The concept of RAID in Databases By Junaid Ali Siddiqui.
COSC 3330/6308 Solutions to the Third Problem Set Jehan-François Pâris November 2012.
CS399 New Beginnings Jonathan Walpole. Disk Technology & Secondary Storage Management.
Hands-On Microsoft Windows Server 2008 Chapter 7 Configuring and Managing Data Storage.
Part IV I/O System Chapter 12: Mass Storage Structure.
Disk Failures Skip. Index 13.4 Disk Failures Intermittent Failures Organizing Data by Cylinders Stable Storage Error- Handling.
Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
RAID TECHNOLOGY RASHMI ACHARYA CSE(A) RG NO
I/O Errors 1 Computer Organization II © McQuain RAID Redundant Array of Inexpensive (Independent) Disks – Use multiple smaller disks (c.f.
CS Introduction to Operating Systems
Disk Failures Xiaqing He ID: 204 Dr. Lin.
RAID Non-Redundant (RAID Level 0) has the lowest cost of any RAID
Vladimir Stojanovic & Nicholas Weaver
CS 554: Advanced Database System Notes 02: Hardware
RAID RAID Mukesh N Tekwani
ICOM 6005 – Database Management Systems Design
Overview Continuation from Monday (File system implementation)
RAID Redundant Array of Inexpensive (Independent) Disks
UNIT IV RAID.
CSE 451: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Ed Lazowska Allen Center 570.
RAID RAID Mukesh N Tekwani April 23, 2019
Disk Failures Disk failure ways and their mitigation
IT 344: Operating Systems Winter 2007 Module 18 Redundant Arrays of Inexpensive Disks (RAID) Chia-Chi Teng CTB
Disk Scheduling The operating system is responsible for using hardware efficiently — for the disk drives, this means having a fast access time and disk.
Presentation transcript:

DISK FAILURES PROF. T.Y.LIN CS-257 Presenter: Shailesh Benake(104)

Index Index 13.4 Disk Failures Intermittent Failures Organizing Data by Cylinders Stable Storage Error- Handling Capabilities of Stable Storage Recovery from Disk Crashes Mirroring as a Redundancy Technique Parity Blocks An Improving: RAID Coping With Multiple Disk Crashers

Intermittent Failures If we try to read the sector but the correct content of that sector is not delivered to the disk controller Check for the good or bad sector To check write is correct: Read is performed Good sector and bad sector is known by the read operation

Checksums Each sector has some additional bits, called the checksums Checksums are set depending on the values of the data bits stored in that sector Probability of reading bad sector is less if we use checksums For Odd parity: Odd number of 1s, add a parity bit 1 For Even parity: Even number of 1s, add a parity bit 0 So, number of 1s becomes always even

Intermittent Failure: Parity Check Media Decay And Write Failure: Stable Storage Disk Crash: RAID Example: 1. Sequence : > odd no of 1s parity bit: 1 -> Sequence : >even no of 1s parity bit: 0 ->

Stable Storage Correct Errors Sectors are paired and each pair is said to be X, having left and right copies as Xl and Xr respectively and check the parity bit of left and right by subsituting spare sector of Xl and Xr until the good value is returned

Error Handling Capabilities of Stable Storage Failures: If out of Xl and Xr, one fails, it can be read form other, but in case both fails X is not readable, and its probability is very small Write Failure: During power outage, 1. While writing Xl, the Xr, will remain good and X can be read from Xr 2. After writing Xl, we can read X from Xl, as Xr may or may not have the correct copy of X

Recovery from Disk Crashes: Ways to recover the data The most serious mode of failure for disks is head crash where data permanently destroyed. So to reduce the risk of data loss by disk crashes there are number of schemes which are know as RAID (Redundant Arrays of Independent Disks) schemes. Each of the schemes starts with one or more disks that hold the data and adding one or more disks that hold information that is completely determined by the contents of the data disks called Redundant Disk.

Mirroring as a Redundancy Technique Mirroring Scheme is referred as RAID level 1 protection against data loss scheme. In this scheme we mirror each disk. One of the disk is called as data disk and other redundant disk. In this case the only way data can be lost is if there is a second disk crash while the first crash is being repaired.

Parity Blocks RAID level 4 scheme uses only one redundant disk no matter how many data disks there are. In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks. It means, the jth bits of all the ith blocks of both data disks and redundant disks, must have an even number of 1s and redundant disk bit is used to make this condition true.

Parity Block – Reading disk Reading data disk is same as reading block from any disk. We could read block from each of the other disks and compute the block of the disk we want to read by taking the modulo-2 sum. disk 2: disk 3: disk 4: If we take the modulo-2 sum of the bits in each column, we get -disk 1:

Parity Block - Writing When we write a new block of a data disk, we need to change that block of the redundant disk as well. One approach to do this is to read all the disks and compute the module-2 sum and write to the redundant disk. But this approach requires n-1 reads of data, write a data block and write of redundant disk block. Total = n+1 disk I/Os Better approach will require only four disk I/Os 1. Read the old value of the data block being changed. 2. Read the corresponding block of the redundant disk. 3. Write the new data block. 4. Recalculate and write the block of the redundant disk.

Parity Blocks – Failure Recovery If any of the data disk crashes then we just have to compute the module-2 sum to recover the disk. Suppose that disk 2 fails. We need to re compute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like: disk 1: disk 2: ???????? disk 3: disk 4: If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is :

An Improvement: RAID 5 RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk. However we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.

Thank You