Eng. Mohammed Timraz Electronics & Communication Engineer University of Palestine Faculty of Engineering and Urban planning Software Engineering Department.

Presentation transcript:

Computer System Architecture ESGD2204, Saturday, 8th May 2010, Chapter 8, Lecture 15

Chapter 7 Storage Systems

Summary
Four components of disk access time:
– Seek Time: advertised to be 3 to 14 ms but lower in real systems
– Rotational Latency: 5.6 ms at 5400 RPM and 2.0 ms at RPM
– Transfer Time: 30 to 80 MB/s => just to ms / 512B-sector
– Controller Time: typically less than 0.2 ms
RAIDs can be used to improve availability
– RAID 1 and RAID 5 – widely used in servers; one estimate is that 80% of disks in servers are RAIDs
– RAID 0+1 (mirroring) – EMC, Tandem, IBM
– RAID 3 – Storage Concepts
– RAID 4 – Network Appliance
RAIDs have enough redundancy to allow continuous operation, but not hot swapping
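The four access-time components can be added up in a few lines. This is a minimal sketch, assuming that average rotational latency is half a revolution and that 1 MB means 10^6 bytes; the function names are illustrative and not from the slides. Under these assumptions 5400 RPM gives about 5.6 ms, and a 2.0 ms latency corresponds to 15,000 RPM.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(num_bytes: int, mb_per_s: float) -> float:
    """Time to transfer num_bytes at mb_per_s (assuming 1 MB = 10**6 bytes)."""
    return num_bytes / (mb_per_s * 1e6) * 1e3


def access_time_ms(seek_ms: float, rpm: float, num_bytes: int,
                   mb_per_s: float, controller_ms: float) -> float:
    """Sum of the four components: seek, rotation, transfer, controller."""
    return (seek_ms
            + rotational_latency_ms(rpm)
            + transfer_time_ms(num_bytes, mb_per_s)
            + controller_ms)


if __name__ == "__main__":
    print(round(rotational_latency_ms(5400), 1))    # about 5.6 ms
    print(rotational_latency_ms(15000))             # 2.0 ms
    print(access_time_ms(seek_ms=3, rpm=5400, num_bytes=512,
                         mb_per_s=30, controller_ms=0.2))
```

For a single 512-byte sector the transfer term is tiny compared with seek time and rotational latency, which is why those two dominate small random accesses.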

Example p = 5
Row-diagonal parity starts by recovering one of the 4 blocks on the failed disk using diagonal parity
– Since each diagonal misses one disk, and all diagonals miss a different disk, 2 diagonals are only missing 1 block
Once the data for those blocks is recovered, the standard RAID recovery scheme can be used to recover two more blocks in the standard RAID 4 stripes
The process continues until the two failed disks are restored
Disks: Data Disk 0, Data Disk 1, Data Disk 2, Data Disk 3, Row Parity, Diagonal Parity
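The recovery process for p = 5 can be sketched as a loop that alternates between diagonals and rows. This is an illustrative sketch, not a production implementation: blocks are modeled as small integers, the names encode, recover and fail_disks are hypothetical, and it only covers the case where both failed disks are among the four data disks and the row parity disk. It assumes that block (r, d) belongs to diagonal (r + d) mod p and that diagonal p-1 is not stored.

```python
from functools import reduce
from itertools import combinations
from operator import xor

P = 5          # prime parameter from the example
ROWS = P - 1   # rows per stripe; also the number of data disks


def encode(data):
    """data[r][d] is the block in row r on data disk d (ROWS x ROWS integers).

    Returns (grid, diag). grid adds the row parity block as column P-1.
    diag[k] is the XOR of all blocks on diagonal k; diagonal P-1 is not stored.
    """
    grid = [row[:] + [reduce(xor, row)] for row in data]
    diag = [0] * ROWS
    for r in range(ROWS):
        for d in range(P):
            k = (r + d) % P
            if k < ROWS:
                diag[k] ^= grid[r][d]
    return grid, diag


def fail_disks(grid, disks):
    """Return a copy of grid with every block on the given disks set to None."""
    return [[None if d in disks else v for d, v in enumerate(row)]
            for row in grid]


def recover(grid, diag):
    """Restore blocks marked None by alternating diagonal and row parity."""
    progress = True
    while progress:
        progress = False
        # Diagonal step: a diagonal missing exactly one block can be repaired.
        for k in range(ROWS):
            cells = [(r, (k - r) % P) for r in range(ROWS)]
            missing = [c for c in cells if grid[c[0]][c[1]] is None]
            if len(missing) == 1:
                value = diag[k]
                for r, d in cells:
                    if (r, d) != missing[0]:
                        value ^= grid[r][d]
                r0, d0 = missing[0]
                grid[r0][d0] = value
                progress = True
        # Row step: standard RAID 4 recovery for rows missing one block.
        for r in range(ROWS):
            missing = [d for d in range(P) if grid[r][d] is None]
            if len(missing) == 1:
                grid[r][missing[0]] = reduce(
                    xor, (grid[r][d] for d in range(P) if d != missing[0]))
                progress = True
    return grid


if __name__ == "__main__":
    data = [[r * 10 + d for d in range(ROWS)] for r in range(ROWS)]
    grid, diag = encode(data)
    restored = recover(fail_disks(grid, {0, 2}), diag)
    print(restored == grid)
```

Each pass repairs whichever diagonals and rows are down to a single missing block, which mirrors the alternation described on the slide.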

Berkeley History: RAID-I
RAID-I (1989)
– Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, inch SCSI disks and specialized disk striping software
Today RAID is a $24 billion industry; 80% of non-PC disks are sold in RAIDs

Summary: RAID Methods: Goal Was Performance. Highly Popular Since Reliable Storage
Disk Mirroring, Shadowing (RAID 1)
– Each disk is fully duplicated onto its "shadow"
– Logical write = two physical writes
– 100% capacity overhead
Parity Data Bandwidth Array (RAID 3)
– Parity computed horizontally
– Logically a single high data bandwidth disk
High I/O Rate Parity Array (RAID 5)
– Interleaved parity blocks
– Independent reads and writes
– Logical write = 2 reads + 2 writes
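The "2 reads + 2 writes" cost of a RAID 5 logical write follows from updating parity incrementally: read the old data block and the old parity block, then write the new data block and the new parity block. A minimal sketch, assuming XOR parity over equal-sized byte blocks; the function names are illustrative:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def small_write_parity(old_data: bytes, old_parity: bytes,
                       new_data: bytes) -> bytes:
    """New parity after replacing one data block in a stripe.

    Needs only the old data and old parity (2 reads); the caller then
    writes new_data and the returned parity (2 writes).
    """
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
    parity = xor_blocks(*stripe)
    new_block = b"\xff\x00"
    new_parity = small_write_parity(stripe[1], parity, new_block)
    stripe[1] = new_block
    # The incremental update matches recomputing parity over the full stripe.
    print(new_parity == xor_blocks(*stripe))
```

The other data blocks in the stripe are never touched, which is what lets RAID 5 serve independent small writes in parallel.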

Definitions
Examples of why precise definitions are so important for reliability
Is a programming mistake a fault, error, or failure?
– Are we talking about the time it was designed or the time the program is run?
– If the running program doesn’t exercise the mistake, is it still a fault/error/failure?
If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it does not change the value?
– Is it a fault/error/failure if the memory doesn’t access the changed bit?
– Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU?

IFIP Standard terminology
Computer system dependability: quality of delivered service such that reliance can be placed on service
Service is observed actual behavior as perceived by other system(s) interacting with this system’s users
Each module has ideal specified behavior, where service specification is agreed description of expected behavior
A system failure occurs when the actual behavior deviates from the specified behavior
The failure occurred because of an error, a defect in a module
The cause of an error is a fault
When a fault occurs it creates a latent error, which becomes effective when it is activated
When an error actually affects the delivered service, a failure occurs (time from error to failure is error latency)

Fault v. (Latent) Error v. Failure
An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error
If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it doesn’t change the value?
– Is it a fault/error/failure if the memory doesn’t access the changed bit?
– Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU?
An alpha particle hitting a DRAM can be a fault
If it changes the memory, it creates an error
The error remains latent until the affected memory word is read
If the affected word's error affects the delivered service, a failure occurs
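The alpha-particle questions can be answered mechanically once the chain fault, latent error, failure is written down. The sketch below is one possible reading of the terminology, not an official classification: the Outcome names are made up for illustration, and it assumes that an uncorrected erroneous word that is read always affects the delivered service.

```python
from enum import Enum


class Outcome(Enum):
    FAULT_ONLY = "fault occurred, no error created"
    LATENT_ERROR = "error exists but is still latent"
    MASKED_ERROR = "error activated, corrected before reaching the service"
    FAILURE = "error affected the delivered service"


def classify(value_changed: bool, word_read: bool,
             corrected: bool) -> Outcome:
    """Classify an alpha-particle hit on a DRAM cell.

    value_changed: the hit flipped the stored value (fault created an error)
    word_read:     the affected word was read (error activated)
    corrected:     error correction delivered the right value to the CPU
    """
    if not value_changed:
        return Outcome.FAULT_ONLY
    if not word_read:
        return Outcome.LATENT_ERROR
    if corrected:
        return Outcome.MASKED_ERROR
    return Outcome.FAILURE


if __name__ == "__main__":
    print(classify(value_changed=False, word_read=True, corrected=False))
    print(classify(value_changed=True, word_read=False, corrected=False))
    print(classify(value_changed=True, word_read=True, corrected=True))
    print(classify(value_changed=True, word_read=True, corrected=False))
```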

Fault Categories
1. Hardware faults: Devices that fail, such as an alpha particle hitting a memory cell
2. Design faults: Faults in software (usually) and hardware design (occasionally)
3. Operation faults: Mistakes by operations and maintenance personnel
4. Environmental faults: Fire, flood, earthquake, power failure, and sabotage
Also by duration:
1. Transient faults exist for a limited time and do not recur
2. Intermittent faults cause a system to oscillate between faulty and fault-free operation
3. Permanent faults do not correct themselves over time

Fault Tolerance vs Disaster Tolerance
Fault-Tolerance (or more properly, Error-Tolerance): mask local faults (prevent errors from becoming failures)
– RAID disks
– Uninterruptible Power Supplies
– Cluster Failover
Disaster Tolerance: masks site errors (prevent site errors from causing service failures)
– Protects against fire, flood, sabotage, ...
– Redundant system and service at remote site
– Use design diversity

Case Studies - Tandem Trends
Reported MTTF by Component
SOFTWARE Years
HARDWARE Years
MAINTENANCE Years
OPERATIONS Years
ENVIRONMENT Years
SYSTEM 8 20 21 Years
Problem: Systematic Under-reporting
Minor Problems. Major Problems

VAX crashes ‘85, ‘93 [Murp95]; extrap. to ‘01
Sys. Man.: N crashes/problem, SysAdmin action
– Actions: set params bad, bad config, bad app install
HW/OS 70% in ‘85 to 28% in ‘93. In ‘01, 10%?
Rule of Thumb: Maintenance 10X HW
– Over 5 year product life, ~95% of cost is maintenance
Is Maintenance the Key?

HW Failures in Real Systems: Tertiary Disks A cluster of 20 PCs in seven 7-foot high, 19-inch wide racks with GB, 7200 RPM, 3.5-inch IBM disks. The PCs are P6-200MHz with 96 MB of DRAM each. They run FreeBSD 3.0 and the hosts are connected via switched 100 Mbit/second Ethernet

How Realistic is "5 Nines"?
HP claims HP-9000 server HW and HP-UX OS can deliver % availability guarantee “in certain pre-defined, pre-tested customer environments”
– Application faults?
– Operator faults?
– Environmental faults?
Collocation sites (lots of computers in 1 building on Internet) have
– 1 network outage per year (~1 day)
– 1 power failure per year (~1 day)
Microsoft Network was recently unavailable for a day due to a problem in a Domain Name Server: if that is the only outage per year, availability is 99.7%, or 2 Nines
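The "nines" arithmetic is easy to check. A minimal sketch, assuming a 365-day year; the function names are illustrative. One day of downtime per year gives roughly 99.7% availability, while five nines leaves only a few minutes of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60


def availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year the service is up."""
    return 1 - downtime_hours_per_year / HOURS_PER_YEAR


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (e.g. 5 -> 99.999%)."""
    return 10 ** -nines * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"{availability(24) * 100:.1f}%")        # one day of outage per year
    print(f"{allowed_downtime_minutes(5):.2f} min")  # budget for five nines
    print(f"{allowed_downtime_minutes(2):.0f} min")  # budget for two nines
```

A single one-day network or power outage therefore exceeds a five-nines yearly budget several hundred times over.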