IT 251 Computer Organization and Architecture

IT 251 Computer Organization and Architecture I/O: Disks & RAIDs Chia-Chi Teng

Review: Major Components of a Computer
[Block diagram: Processor (Control, Datapath), Memory (Cache, Main Memory, Secondary Memory (Disk)), and Input/Output Devices]

Magnetic Disk
Purpose:
- Long-term, nonvolatile storage
- Lowest level in the memory hierarchy: slow, large, inexpensive
General structure:
- A rotating platter coated with a magnetic surface
- A moveable read/write head to access the information on the disk
Typical numbers (see the capacity sketch after this slide):
- 1 to 4 platters (1 or 2 surfaces each) per disk, 1" to 5.25" in diameter (3.5" dominated in 2004)
- Rotational speeds of 5,400 to 15,000 RPM
- 10,000 to 50,000 tracks per surface; a cylinder is all the tracks under the heads at a given point on all surfaces
- 100 to 500 sectors per track; a sector is the smallest unit that can be read or written (typically 512B)
The purpose of the magnetic disk is to provide long-term, nonvolatile storage. Disks are large in capacity and inexpensive, but slow, so they sit at the lowest level of the memory hierarchy. Each surface of a platter is divided into tracks, and each track is further divided into sectors; a sector is the smallest unit that can be read or written. Simple geometry suggests that the outer tracks, having more area, should hold more sectors. In traditional disk designs, however, all tracks have the same number of sectors: keeping the count uniform lets the disk controller hardware and software stay simple, since they do not have to know how many sectors each track holds. With more intelligent disk controller hardware and software, it has become increasingly common to record more sectors on the outer tracks with zone bit recording (ZBR), which increases drive capacity.
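A rough capacity sketch based on the "typical numbers" above; the specific values plugged in are illustrative, not the specification of any particular drive.

```python
def disk_capacity_gb(platters, surfaces_per_platter, tracks_per_surface,
                     sectors_per_track, bytes_per_sector=512):
    """Capacity = surfaces x tracks/surface x sectors/track x bytes/sector."""
    total_bytes = (platters * surfaces_per_platter * tracks_per_surface
                   * sectors_per_track * bytes_per_sector)
    return total_bytes / 1e9

# Example: 4 platters, 2 surfaces each, 50,000 tracks/surface, 500 sectors/track
print(f"{disk_capacity_gb(4, 2, 50_000, 500):.1f} GB")   # ~102.4 GB
```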

Magnetic Disk Characteristics
[Figure: platter, head, cylinder, track, sector, and the controller with its cache]
Disk read/write access time has four components:
- Seek time: position the head over the proper track (3 to 14 ms advertised average); due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number
- Rotational latency: wait for the desired sector to rotate under the head (half a revolution on average, i.e. 0.5 of 1/RPM converted to ms): 0.5/5400 RPM = 5.6 ms down to 0.5/15000 RPM = 2.0 ms
- Transfer time: transfer a block of bits (one or more sectors) under the head to the disk controller's cache (30 to 80 MB/s are typical disk transfer rates); the controller's "cache" takes advantage of spatial locality in disk accesses, and cache transfer rates are much faster (e.g., 320 MB/s)
- Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 0.2 ms)
To read or write the information in a sector, a movable arm containing a read/write head is positioned over each surface. To access data, the operating system must direct the disk through a three-stage process. First, the arm is positioned over the proper track; this is the seek operation, and the time to complete it is the seek time. Once the head has reached the correct track, we must wait for the desired sector to rotate under the read/write head; this is the rotational latency. Finally, once the desired sector is under the head, the data transfer can begin. The average seek time reported by the manufacturer is calculated as the sum of the times for all possible seeks divided by the number of possible seeks. This number is usually pessimistic: because of locality in disk references, the actual average seek time may be only 25% to 33% of the published figure. As far as rotational latency is concerned, most disks rotate at 5,400 to 15,000 RPM, and on average the desired information is half way around the disk. The transfer time is a function of transfer size, rotation speed, and recording density, and is much smaller than the seek time and rotational latency.

Typical Disk Access Time
The average time to read or write a 512B sector for a disk rotating at 10,000 RPM, with a 6 ms advertised average seek time, a 50 MB/sec transfer rate, and a 0.2 ms controller overhead (see the calculator sketch below):
Avg disk read/write = 6.0 ms + 0.5 × (60,000 ms/min / 10,000 RPM) + 0.5 KB / (50 KB/ms) + 0.2 ms = 6.0 + 3.0 + 0.01 + 0.2 = 9.21 ms
If the measured average seek time is 25% of the advertised average seek time, then
Avg disk read/write = 1.5 + 3.0 + 0.01 + 0.2 = 4.71 ms
Note that once the realistic (measured) seek time is used, the rotational latency is usually the largest component of the access time.
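A minimal calculator sketch of the access-time formula worked above; the function name and default arguments are mine, chosen to match the slide's example rather than any particular drive.

```python
def avg_access_time_ms(seek_ms, rpm, sector_kb=0.5,
                       transfer_mb_per_s=50, controller_ms=0.2):
    """Average time (ms) for one sector: seek + rotational latency + transfer + controller."""
    rotational_ms = 0.5 * 60_000 / rpm           # half a revolution; 60,000 ms per minute
    transfer_ms = sector_kb / transfer_mb_per_s  # 1 MB/s moves 1 KB per ms
    return seek_ms + rotational_ms + transfer_ms + controller_ms

print(avg_access_time_ms(6.0, 10_000))           # 9.21 ms with the advertised seek time
print(avg_access_time_ms(0.25 * 6.0, 10_000))    # 4.71 ms with the measured seek time
```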

Magnetic Disk Examples (www.seagate.com)
Characteristic              | Seagate ST37 | Seagate ST32 | Seagate ST94
Disk diameter (inches)      | 3.5          | 3.5          | 2.5
Capacity (GB)               | 73.4         | 200          | 40
# of surfaces (heads)       | 8            | 4            | 2
Rotation speed (RPM)        | 15,000       | 7,200        | 5,400
Transfer rate (MB/sec)      | 57-86        | 32-58        | 34
Minimum seek (ms)           | 0.2r-0.4w    | 1.0r-1.2w    | 1.5r-2.0w
Average seek (ms)           | 3.6r-3.9w    | 8.5r-9.5w    | 12r-14w
MTTF (hours @ 25°C)         | 1,200,000    | 600,000      | 330,000
Dimensions (inches)         | 1"x4"x5.8"   | -            | 0.4"x2.7"x3.9"
GB/cu.inch                  | 3            | 9            | 10
Power: op/idle/sb (watts)   | 20?/12/-     | 12/8/1       | 2.4/1/0.4
GB/watt                     | -            | 16           | 17
Weight (pounds)             | 1.9          | 1.4          | 0.2
The MTTF is the mean time to failure, which is a common measure of reliability. Disk lifetimes are much shorter if temperature and vibration are not controlled.

Disk Latency & Bandwidth Milestones (Patterson, CACM Vol 47, #10, 2004)
Characteristic       | CDC Wren | SG ST41 | SG ST15 | SG ST39 | SG ST37
Rotation speed (RPM) | 3600     | 5400    | 7200    | 10000   | 15000
Year                 | 1983     | 1990    | 1994    | 1998    | 2003
Capacity (Gbytes)    | 0.03     | 1.4     | 4.3     | 9.1     | 73.4
Diameter (inches)    | 5.25     | 5.25    | 3.5     | 3.0     | 2.5
Interface            | ST-412   | SCSI    | SCSI    | SCSI    | SCSI
Bandwidth (MB/s)     | 0.6      | 4       | 9       | 24      | 86
Latency (msec)       | 48.3     | 17.1    | 12.7    | 8.8     | 5.7
Disk latency is one average seek time plus the rotational latency. Disk bandwidth is the peak transfer rate of formatted data from the media (not from the cache).

Latency & Bandwidth Improvements
In the time it takes disk bandwidth to double, latency improves by a factor of only 1.2 to 1.4.

Aside: Media Bandwidth/Latency Demands
Bandwidth requirements (the arithmetic is spelled out in the sketch below):
- High quality video: digital data rate = (30 frames/s) × (640 × 480 pixels/frame) × (24-bit color/pixel) = 221 Mb/s (27.625 MB/s)
- High quality audio: digital data rate = (44,100 audio samples/s) × (16-bit audio samples) × (2 audio channels for stereo) = 1.4 Mb/s (0.175 MB/s)
- Compression reduces the bandwidth requirements considerably
Latency issues:
- How sensitive is your eye (ear) to variations in video (audio) rates?
- How can you ensure a constant rate of delivery?
- How important is synchronizing the audio and video streams? 15 to 20 ms early to 30 to 40 ms late is tolerable
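A small sketch of the uncompressed-stream arithmetic above; the parameter values are the slide's examples, and the helper name is my own.

```python
def stream_mbps(samples_per_s, bits_per_sample, channels=1):
    """Uncompressed data rate in megabits per second."""
    return samples_per_s * bits_per_sample * channels / 1e6

video = stream_mbps(30 * 640 * 480, 24)        # 30 frames/s, 640x480, 24-bit color
audio = stream_mbps(44_100, 16, channels=2)    # CD-quality stereo audio
print(f"video: {video:.0f} Mb/s ({video / 8:.2f} MB/s)")   # ~221 Mb/s
print(f"audio: {audio:.1f} Mb/s ({audio / 8:.3f} MB/s)")   # ~1.4 Mb/s
```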

Dependability, Reliability, Availability
- Reliability: measured by the mean time to failure (MTTF); service interruption is measured by the mean time to repair (MTTR)
- Availability: a measure of service accomplishment, Availability = MTTF / (MTTF + MTTR) (see the sketch below)
- To increase MTTF, either improve the quality of the components or design the system to continue operating in the presence of faulty components:
  - Fault avoidance: preventing fault occurrence by construction
  - Fault tolerance: using redundancy to correct or bypass faulty components (hardware)
- Fault detection versus fault correction
- Permanent faults versus transient faults
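A tiny sketch of the availability formula; the MTTF and MTTR figures below are made up for illustration, not taken from the slides.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is accomplishing its job: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# A disk with a 1,000,000-hour MTTF that takes 24 hours to replace and restore
print(f"{availability(1_000_000, 24):.6f}")   # ~0.999976
```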

RAIDs: Disk Arrays
Redundant Array of Inexpensive Disks
- Arrays of small and inexpensive disks increase potential throughput by having many disk drives: data is spread over multiple disks, so multiple accesses can be made to several disks at a time
- Reliability is lower than that of a single disk (see the MTTF sketch below)
- But availability can be improved by adding redundant disks (RAID): lost information can be reconstructed from redundant information
- MTTR: the mean time to repair is on the order of hours
- MTTF: the mean time to failure of disks is tens of years
The discussion of reliability and availability brings us to a new organization of disk storage in which arrays of small and inexpensive disks are used to increase the potential throughput. Data is spread over multiple disks so that multiple accesses can be made to several disks, either via interleaving or in parallel. While disk arrays improve throughput, latency is not necessarily improved. Also, with N disks in the array, its reliability is only 1/N the reliability of a single disk. But availability can be improved by adding redundant disks, so that lost information can be reconstructed from redundant information. Since the mean time to repair is measured in hours and the MTTF is measured in years, redundancy can make the availability of disk arrays much higher than that of a single disk.
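A rough sketch of the "1/N reliability" point above, assuming independent and identically distributed disk failures (a simplification, not a vendor reliability model).

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """Expected time until the first of n independent disks fails."""
    return disk_mttf_hours / n_disks

# 100 disks, each with a 1,000,000-hour MTTF (over 100 years per disk)
print(array_mttf_hours(1_000_000, 100))   # 10,000 hours, a bit over a year
```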

RAID: Level 0 (No Redundancy; Striping)
[Figure: blocks blk1-blk4 striped across four disks]
- Multiple smaller disks as opposed to one big disk
- Spreading the blocks over multiple disks (striping) means that multiple blocks can be accessed in parallel, increasing the performance: a 4-disk system gives four times the throughput of a 1-disk system (the block mapping is sketched below)
- Same cost as one big disk, assuming 4 small disks cost the same as one big disk
- No redundancy, so what if one disk fails? Failure of one or more disks is more likely as the number of disks in the system increases
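An illustrative sketch of RAID 0 striping at block granularity; the round-robin mapping is the usual textbook convention, though real controllers often stripe in larger chunks.

```python
def raid0_map(logical_block, n_disks):
    """Map a logical block number to (disk index, block index on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, logical blocks 0..7 land on disks 0,1,2,3,0,1,2,3
for blk in range(8):
    print(blk, raid0_map(blk, 4))
```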

RAID: Level 1 (Redundancy via Mirroring)
[Figure: blocks blk1.1-blk1.4 on four data disks, duplicated on four redundant (check) disks]
- Uses twice as many disks as RAID 0 (e.g., 8 smaller disks, with the second set of 4 duplicating the first set), so there are always two copies of the data
- # redundant disks = # of data disks, so twice the cost of one big disk
- Writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
- What if one disk fails? If a disk fails, the system just goes to the "mirror" for the data

RAID: Level 0+1 (Striping with Mirroring)
[Figure: blocks blk1-blk4 striped across four disks and mirrored to four redundant (check) disks]
- Combines the best of RAID 0 and RAID 1: data is striped across four disks and mirrored to four disks
- Four times the throughput (due to striping)
- # redundant disks = # of data disks, so twice the cost of one big disk
- Writes have to be made to both sets of disks, so writes would be only 1/2 the performance of RAID 0
- What if one disk fails? If a disk fails, the system just goes to the "mirror" for the data

RAID: Level 3 (Bit-Interleaved Parity)
[Figure: bits blk1,b0-blk1,b3 on four data disks plus an (odd) bit-parity disk; one data disk fails]
- The cost of higher availability is reduced to 1/N, where N is the number of disks in a protection group
- # redundant disks = 1 × # of protection groups
- Writes require writing the new data to the data disk as well as computing the parity, meaning the other disks have to be read so that the parity disk can be updated (see the parity sketch below)
- Can tolerate a limited disk failure, since the data can be reconstructed: reads require reading all the operational data disks as well as the parity disk to calculate the missing data that was stored on the failed disk
- The underlying assumption is that taking longer to recover from failure, but spending less on redundant storage, is a good tradeoff
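A hedged sketch of the parity idea behind RAID 3 (and RAID 4/5): parity is the XOR of the members of a protection group, so any single lost member can be rebuilt by XOR-ing the survivors. Byte strings stand in for the bit- or block-level data of a real array.

```python
from functools import reduce

def xor_chunks(chunks):
    """Element-wise XOR of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

data_disks = [b"\x11\x22", b"\x33\x44", b"\x55\x66", b"\x77\x88"]
parity = xor_chunks(data_disks)              # contents of the parity disk

# Disk 2 fails: rebuild it from the surviving data disks plus the parity disk
survivors = [data_disks[0], data_disks[1], data_disks[3], parity]
rebuilt = xor_chunks(survivors)
assert rebuilt == data_disks[2]
```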

RAID: Level 4 (Block-Interleaved Parity)
[Figure: blocks blk1-blk4 on four data disks plus a block-parity disk]
- The cost of higher availability is still only 1/N, but the parity is stored as blocks associated with sets of data blocks
- N times the throughput (striping)
- # redundant disks = 1 × # of protection groups
- Supports "small reads" and "small writes" (reads and writes that go to just one, or a few, data disks in a protection group): by watching which bits change when writing new information, only the corresponding bits on the parity disk need to be changed (the small-write parity update is sketched below)
- The parity disk must be updated on every write, so it is a bottleneck for back-to-back writes
- Can tolerate a limited disk failure, since the data can be reconstructed
Think of the parity information as the parity of the blk1 data (1 bit), the blk2 data (1 bit), and so on. One can then take the parity of these four bits (N in general) and store just 1 parity bit associated with all the blocks in the protection group.
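A minimal sketch of the small-write parity update described above: the new parity depends only on the old data, the new data, and the old parity, so the rest of the protection group need not be read. The helper and its toy one-byte blocks are illustrative.

```python
def small_write_parity(old_data, new_data, old_parity):
    """new_parity = old_parity XOR old_data XOR new_data, element-wise."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

blocks = [0x11, 0x22, 0x33, 0x44]                       # one protection group
parity = blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]  # its parity block

# Overwrite block 1 (0x22 -> 0xAA), updating parity with only two reads and two writes
new_parity = small_write_parity(bytes([blocks[1]]), bytes([0xAA]), bytes([parity]))
blocks[1] = 0xAA
assert new_parity[0] == blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]
```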

RAID: Level 5 (Distributed Block-Interleaved Parity)
[Figure: for each stripe, one of the disks is assigned as the block-parity disk]
- The cost of higher availability is still only 1/N, but the parity block can be located on any of the disks, so there is no single bottleneck for writes
- Still N times the throughput (striping)
- # redundant disks = 1 × # of protection groups
- Supports "small reads" and "small writes" (reads and writes that go to just one, or a few, data disks in a protection group)
- Allows multiple simultaneous writes as long as the accompanying parity blocks are not located on the same disk
- Can tolerate a limited disk failure, since the data can be reconstructed
Again, the assumption is that taking longer to recover from failure but spending less on redundant storage is a good tradeoff.

Distributing Parity Blocks
RAID 4                    RAID 5
1   2   3   4   P0        1   2   3   4   P0
5   6   7   8   P1        5   6   7   P1  8
9   10  11  12  P2        9   10  P2  11  12
13  14  15  16  P3        13  P3  14  15  16
By distributing parity blocks across all of the disks, some small writes can be performed in parallel. The key point is that in RAID 4 (on the left), small writes to blocks 6 and 9 have to take place serially, because both must update the single parity disk; in RAID 5 (on the right), they can happen in parallel, because the corresponding parity blocks P1 and P2 live on different disks. (A parity-placement sketch follows.)
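An illustrative sketch of one way to rotate the parity block so that it reproduces the staircase pattern in the RAID 5 layout above; real arrays use several placement variants, so treat the formula as an assumption rather than the definitive scheme.

```python
def parity_disk(stripe, n_disks):
    """0-based index of the disk holding the parity block for a given stripe."""
    return (n_disks - 1) - (stripe % n_disks)

# Five disks: parity sits on disk 4 for stripe 0, disk 3 for stripe 1, and so on,
# matching the P0..P3 positions in the RAID 5 layout shown above.
for stripe in range(4):
    print(stripe, parity_disk(stripe, 5))   # 4, 3, 2, 1
```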

Summary
Four components of disk access time:
- Seek time: advertised to be 3 to 14 ms, but lower in real systems
- Rotational latency: 5.6 ms at 5,400 RPM and 2.0 ms at 15,000 RPM
- Transfer time: 30 to 80 MB/s
- Controller time: typically less than 0.2 ms
RAIDs can be used to improve availability:
- RAID 0 and RAID 5: widely used in servers; one estimate is that 80% of disks in servers are RAIDs
- RAID 1 (mirroring): EMC, Tandem, IBM
- RAID 3: Storage Concepts (not used in practice)
- RAID 4: Network Appliance
RAIDs have enough redundancy to allow continuous operation, but not hot swapping.