CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 29 Database Systems II Secondary Storage

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 30 The Memory Hierarchy [Figure: the memory hierarchy — CPU, L1/L2 cache (256KB–4MB), virtual memory / main memory, file system, disk with disk cache (2–16MB), and tertiary storage (tape, network backup). Representative numbers from the figure: CPU to L1 cache 16 GB/s (~5 cycles initial latency, then “burst” mode); CPU to main memory 6,400–12,800 MB/s (DDR2, dual channel, 800 MHz; ~200 cycles latency); 3,200 MB/s; disk 300 MB/s (SATA-300); swapping, main-memory DBMSs.]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 31 The Memory Hierarchy Cache Data and instructions in cache when needed by CPU. On-board (L1) cache on same chip as CPU, L2 cache on separate chip. Capacity ~ 1MB, access time a few nanoseconds. Main memory All active programs and data need to be in main memory. Capacity ~ 1 GB, access time nanoseconds.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 32 The Memory Hierarchy Secondary storage Secondary storage is used for permanent storage of large amounts of data, typically a magnetic disk. Capacity up to 1 TB, access time ~ 10 milliseconds. Tertiary storage To store data collections that do not fit onto secondary storage, e.g. magnetic tapes or optical disks. Capacity ~ 1 PB, access time seconds / minutes.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 33 The Memory Hierarchy Trade-off The larger the capacity of a storage device, the slower the access (and vice versa). A volatile storage device forgets its contents when power is switched off; a non-volatile device remembers its contents. Secondary and tertiary storage are non-volatile, all others are volatile. A DBS needs non-volatile (secondary) storage devices to store data permanently.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 34 The Memory Hierarchy RAM (main memory) for subset of database used by current transactions. Disk to store current version of entire database (secondary storage). Tapes for archiving older versions of the database (tertiary storage).

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 35 The Memory Hierarchy Typically programs are executed in virtual memory of size equal to the address space of the processor. Virtual memory is managed by the operating system, which keeps the most relevant part in the main memory and the rest on disk. A DBS manages the data itself and does not rely on the virtual memory. However, main memory DBS do manage their data through virtual memory.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 36 Moore’s Law Gordon Moore observed in 1965 that the density of integrated circuits (i.e., the number of transistors per unit area) increases at an exponential rate, roughly doubling every 18 months. Parameters that follow Moore’s law: - number of instructions per second that can be executed for unit cost, - number of main memory bits that can be bought for unit cost, - number of bytes on a disk that can be bought for unit cost.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 37 Moore’s Law Number of transistors on an integrated circuit

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 38 Moore’s Law Disk capacity

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 39 Moore’s Law But some other important hardware parameters do not follow Moore’s law and grow much more slowly. These are, in particular, - the speed of main memory access, and - the speed of disk access. For example, disk latencies (seek times) have almost stagnated for the past 5 years. Thus, the relative cost of moving data from one level of the memory hierarchy to the next becomes progressively larger.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 40 Disks Secondary storage device of choice. Data is stored and retrieved in units called disk blocks or pages. Main advantage over tapes: random access vs. sequential access. Unlike RAM, time to retrieve a disk page varies depending upon location on disk. Therefore, relative placement of pages on disk has major impact on DBMS performance!

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 41 Disks Disk consists of two main, moving parts: disk assembly and head assembly. Disk assembly stores information, head assembly reads and writes information. [Figure: side view of a disk drive — platters, spindle, disk heads, arm assembly, arm movement, tracks]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 42 Disks The platters rotate around central spindle. Upper and lower platter surfaces are covered with magnetic material, which is used to store bits. The arm assembly is moved in or out to position a head on a desired track. All tracks under heads at the same time make a cylinder (imaginary!). Only one head reads/writes at any one time.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 43 Disks [Figure: top view of a platter surface — tracks, sectors, and gaps between sectors]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 44 Disks Block size is a multiple of sector size (which is fixed). Time to access (read/write) a disk block (disk latency) consists of three components: - seek time: moving the arm to position the disk head on the track, - rotational delay: waiting for the block to rotate under the head, and - transfer time: actually moving data to/from the disk surface. Seek time and rotational delay dominate.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 45 Disks [Figure: seek time as a function of the number of cylinders traveled (x-axis: 1 to N cylinders, y-axis: time)]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 46 Disks Average seek time S = (1 / (N(N-1))) · Σ_{i=1..N} Σ_{j=1..N, j≠i} SEEKTIME(i → j) Typical average seek time = 5 ms
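A minimal sketch of evaluating this average over all ordered cylinder pairs. The seek-time model below (a fixed startup cost plus a linear per-cylinder term) and its parameter values are assumptions for illustration, not a measured disk profile.

```python
# Toy model: SEEKTIME(i -> j) = startup cost + linear travel cost (ms).
def seektime(i, j, startup_ms=2.0, per_cylinder_ms=0.01):
    return startup_ms + per_cylinder_ms * abs(i - j)

def average_seek_time(n_cylinders):
    """S = (1/(N(N-1))) * sum over all ordered pairs i != j of SEEKTIME(i -> j)."""
    total = 0.0
    for i in range(1, n_cylinders + 1):
        for j in range(1, n_cylinders + 1):
            if i != j:
                total += seektime(i, j)
    return total / (n_cylinders * (n_cylinders - 1))

print(average_seek_time(1000))  # ~5.3 ms for this toy model, in line with the slide's typical value
```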

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 47 Disks Average rotational delay R = 1/2 revolution. Typical R = 5 ms. [Figure: platter showing the current head position (“head here”) and the desired block (“block I want”)]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 48 Disks Transfer time Typical transfer rate: 100 MB/sec Typical block size: 16 KB Transfer time = block size / transfer rate Typical transfer time = 0.16 ms
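A back-of-the-envelope check of these numbers in code, using the slide's typical values:

```python
# Transfer time = block size / transfer rate.
block_size = 16 * 1024          # 16 KB in bytes
transfer_rate = 100 * 10**6     # 100 MB/s in bytes per second
transfer_time_ms = block_size / transfer_rate * 1000
print(transfer_time_ms)         # ~0.16 ms
```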

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 49 Disks Typical average disk latency is 10 ms, maximum latency 20 ms. In 10 ms, a modern microprocessor can execute millions of instructions. Thus, the time for a block access by far dominates the time typically needed for processing the data in memory. The number of disk I/Os (block accesses) is a good approximation for the cost of a database operation.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 50 Accelerating Disk Access Organize data by cylinders to minimize the seek time and rotational delay. ‘Next’ block concept: - blocks on the same track, followed by - blocks on the same cylinder, followed by - blocks on an adjacent cylinder. Blocks in a file are placed sequentially on disk (by ‘next’). The disk latency per block can then approach the pure transfer time.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 51 Accelerating Disk Access Example Assuming 10 ms average seek time, no rotational delay, 40 MB/s transfer rate. Read a single 4 KB block: - Random I/O: ≈ 10 ms - Sequential I/O: ≈ 10 ms Read 4 MB in 4 KB blocks (amortized): - Random I/O: ≈ 10 s - Sequential I/O: ≈ 0.1 s ⇒ Speedup factor of 100
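A short sketch reproducing this arithmetic, with the parameters taken from the slide (10 ms average seek, no rotational delay, 40 MB/s transfer rate, 4 KB blocks):

```python
seek_ms = 10.0
transfer_ms = 4 / (40 * 1024) * 1000     # 4 KB at 40 MB/s ~ 0.1 ms
n_blocks = (4 * 1024) // 4               # 4 MB in 4 KB blocks = 1024 blocks

random_io = n_blocks * (seek_ms + transfer_ms)     # one seek per block
sequential_io = seek_ms + n_blocks * transfer_ms   # one seek, then contiguous reads

print(random_io / 1000)                  # ~10 s
print(sequential_io / 1000)              # ~0.1 s
print(random_io / sequential_io)         # speedup factor ~100
```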

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 52 Accelerating Disk Access Block size selection Bigger blocks ⇒ amortize the I/O cost over more data. Bigger blocks ⇒ read in more useless data and take longer to read. Good trade-off: block size from 4 KB to 16 KB. With decreasing memory costs, blocks are becoming bigger!

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 53 Accelerating Disk Access Using multiple disks Replace one disk (with one independent head) by many disks (with many independent heads). Striping a relation R: divide its blocks over n disks in a round robin fashion. Assuming that disk controller, bus and main memory can handle n times the transfer rate, striping a relation across n disks can lead to a speedup factor of up to n.
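A minimal sketch of round-robin striping: block k of relation R is placed on disk k mod n. The numbering of disks and blocks is an assumption for illustration.

```python
def stripe(num_blocks, n_disks):
    """Assign block numbers to disks in round-robin fashion; returns disk -> blocks."""
    layout = {d: [] for d in range(n_disks)}
    for block in range(num_blocks):
        layout[block % n_disks].append(block)
    return layout

print(stripe(10, 4))
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

With n disks reading or writing in parallel, consecutive blocks of R come from different disks, which is what allows the speedup of up to n mentioned above.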

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 54 Accelerating Disk Access Disk scheduling For I/O requests from different processes, let the disk controller choose the processing order. According to the elevator algorithm, the disk controller keeps sweeping from the innermost to the outermost cylinder, stopping at a cylinder for which there is an I/O request. Can reverse sweep direction as soon as there is no I/O request ahead in the current direction. Optimizes the throughput and average response time.
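A minimal sketch of the elevator policy described above, assuming each pending request is just a cylinder number: the head serves requests in cylinder order while sweeping in one direction and reverses when nothing lies ahead.

```python
def elevator_order(head, requests, direction=+1):
    """Return the order in which pending cylinder requests are served."""
    pending = sorted(requests)
    order = []
    while pending:
        ahead = [c for c in pending if (c - head) * direction >= 0]
        if not ahead:              # nothing ahead in this direction: reverse the sweep
            direction = -direction
            continue
        nxt = min(ahead, key=lambda c: abs(c - head))
        order.append(nxt)
        pending.remove(nxt)
        head = nxt
    return order

print(elevator_order(50, [10, 95, 60, 20, 85]))  # [60, 85, 95, 20, 10]
```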

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 55 Accelerating Disk Access Double buffering In some scenarios, we can predict the order in which blocks will be requested from disk by some process. Prefetching ( double buffering ) is the method of fetching the necessary blocks into the buffer in advance. Requires enough buffer space. Speedup factor up to n, where n is the number of blocks requested by a process.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 56 Accelerating Disk Access Single buffering (1) Read B1 → Buffer (2) Process data in Buffer (3) Read B2 → Buffer (4) Process data in Buffer... Execution time = n(P+R), where P = time to process one block, R = time to read in one block, n = # blocks read.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 57 Accelerating Disk Access Double buffering (1) Read B1,..., Bn → Buffer (2) Process B1 in Buffer (3) Process B2 in Buffer... Execution time = R + nP, as opposed to n(P+R). Remember that R >> P.
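Plugging the two cost formulas into code makes the gap concrete. The values of R, P and n below are assumptions for illustration only; the point is that with R >> P the double-buffered cost R + nP is far below n(P+R).

```python
R = 10.0   # time to read one block (ms), dominated by disk latency
P = 0.5    # time to process one block in memory (ms)
n = 100    # number of blocks

single = n * (P + R)   # read, process, read, process, ...
double = R + n * P     # prefetching overlaps reads with processing

print(single, double)  # 1050.0 ms vs. 60.0 ms
```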

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 58 Disk Failures In an intermittent failure, a read or write operation is unsuccessful, but succeeds with repeated tries. ⇒ parity checks to detect intermittent failures Media decay is a permanent corruption of one or more bits which makes the corresponding sector impossible to read / write. ⇒ stable storage to recover from media decay A disk crash makes the entire disk permanently unreadable. ⇒ RAID to recover from disk crashes

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 59 Disk Failures Checksums Add n parity bits for every m data bits. The number of 1’s among a collection of data bits and their parity bit is always even. The parity bit is the modulo-2 sum of its data bits. Example (m = 8, n = 1): Block A has an odd number of 1’s among its data bits, so its parity bit is 1; Block B has an even number of 1’s, so its parity bit is 0. If Block A is corrupted into a Block A’ whose data bits and parity bit together contain an odd number of 1’s ⇒ error detected.
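A small sketch of this single-parity-bit scheme (m = 8, n = 1). The bit patterns below are made up for illustration; they are not the ones from the original example.

```python
def parity_bit(data_bits):
    """Modulo-2 sum of the data bits: chosen so the total number of 1's is even."""
    return sum(data_bits) % 2

def check(data_bits, parity):
    """A block passes the check iff data bits plus parity bit contain an even number of 1's."""
    return (sum(data_bits) + parity) % 2 == 0

block_a = [0, 1, 1, 0, 1, 0, 0, 0]   # odd number of 1's -> parity bit 1
p = parity_bit(block_a)
print(p)                              # 1
print(check(block_a, p))              # True: consistent

corrupted = [0, 1, 1, 0, 1, 0, 0, 1]  # one bit flipped
print(check(corrupted, p))            # False: error detected
```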

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 60 Disk Failures Checksums But what if multiple bits are corrupted? E.g., if Block A is instead corrupted into a Block A’’ whose data bits and parity bit together contain an even number of 1’s ⇒ the error cannot be detected. The probability that a single parity bit cannot detect a corrupt block is ½. This assumes that disk failures involving an odd or an even number of bits are equally likely.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 61 Disk Failures Checksums More parity bits decrease the probability of an undetected failure. With n ≤ m independent parity bits, this probability is only 1/2^n. E.g., we can have eight parity bits, one for the first bit of every byte, the second one for the second bit of every byte, ... The chance of not detecting a disk failure is then only 1/256.
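A sketch of this interleaved scheme: eight parity bits, where the k-th parity bit covers the k-th bit of every byte in the block. The byte values are assumed for illustration.

```python
def interleaved_parity(block_bytes):
    """Return 8 parity bits; bit k is the mod-2 sum of bit k of every byte."""
    parity = [0] * 8
    for byte in block_bytes:
        for k in range(8):
            parity[k] ^= (byte >> k) & 1
    return parity

print(interleaved_parity([0b01101000, 0b11110000, 0b00000001]))
```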

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 62 Disk Failures Stable storage Sectors are paired, and information X is written on both sectors X_l and X_r. Assume that both copies are written with a sufficient number of parity bits so that bad sectors can be detected. If a sector is bad (according to its checksum), write to the alternative sector. Read by alternating between X_l and X_r until a good value is returned. The probability of X_l and X_r both failing is very low.
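A highly simplified sketch of this policy, assuming the two sectors are simulated with an in-memory dictionary and an MD5 checksum stands in for the parity bits; a real implementation would of course write to two physical sectors.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

store = {}  # sector name -> (data, checksum); stands in for the two disk sectors

def stable_write(x: bytes):
    """Write X to both sectors, each protected by its checksum."""
    store["X_l"] = (x, checksum(x))
    store["X_r"] = (x, checksum(x))

def stable_read() -> bytes:
    """Alternate between the two copies until one passes its checksum."""
    for name in ("X_l", "X_r"):
        data, cs = store[name]
        if checksum(data) == cs:
            return data
    raise IOError("both copies bad (very unlikely)")

stable_write(b"database page 42")
print(stable_read())
```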

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 63 Disk Failures Disk arrays So far, we cannot recover from disk crashes. To address this problem, use Redundant Arrays of Independent Disks (RAID): arrangements of several disks that give the abstraction of a single, large disk. Goals: increase reliability (and performance). Redundant information allows reconstruction of data if a disk fails. Data striping improves the disk performance.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 64 Disk Failures Failure Models for Disks What is the expected time until disk crash? We assume uniform distribution of failures over time. Mean time to failure : time period by which 50% of a population of disks have failed (crashed). Typical mean time to failure is 10 years. In this case, 5% of disks crash in the first year, 5% crash in the second year,..., 5% crash in the tenth year,..., 5% crash in the twentieth year.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 65 Disk Failures Failure Models for Disks Given the mean time to failure ( mtf ) in years, we can derive the probability p of a particular disk failing in a given year. p = 1 / (2 * mtf) Ex.: mtf = 10, p = 1/20 = 5% Mean time to data loss : time period by which 50% of a population of disks have had a crash that resulted in data loss. The mean time to disk failure is not necessarily the same as the mean time to data loss.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 66 Disk Failures Failure Models for Disks Failure rate: percentage of disks of a population that have failed up to a certain point in time. Survival rate: percentage of disks of a population that have not failed up to a certain point in time. While it simplifies the analysis, the assumption of a uniform distribution of failures is unrealistic. Disks tend to fail early (manufacturing defects that have not been detected) or late (wear-and-tear).

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 67 Disk Failures Failure Models for Disks [Figure: realistic survival rate as a function of time]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 68 Disk Failures Mirroring The data disk is copied onto a second disk, the mirror disk. When one of the disks crashes, we replace it with a new disk and copy the other disk to the new one. Data loss can only occur if the second disk crashes while the first one is being replaced. This probability is negligible. Mirroring is referred to as RAID level 1.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 69 Disk Failures Parity blocks Mirroring doubles the number of disks needed. The parity block approach needs only one redundant disk for n (arbitrarily many) data disks. On the redundant disk, the i-th block stores parity checks for the i-th blocks of all n data disks. The parity block approach is called RAID level 4. [Figure: n data disks A, B, C and one parity disk P]

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 70 Disk Failures Parity blocks Reading blocks is the same as without parity blocks. When writing a block on a data disk, we also need to update the corresponding block of the redundant disk. This can be done using four (three additional) disk I/Os: read the old value of the data disk block, read the corresponding block of the redundant (parity) disk, write the new data block, recompute and write the new redundant block.
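A sketch of this four-I/O update. One standard way to recompute the parity without reading the other data disks is new parity = old parity XOR old data XOR new data, applied bitwise; the list-based "disks" below are a simulation for illustration, not a real disk interface.

```python
def write_block(data_disk, parity_disk, i, new_block):
    old_block = data_disk[i]            # I/O 1: read old data block
    old_parity = parity_disk[i]         # I/O 2: read old parity block
    data_disk[i] = new_block            # I/O 3: write new data block
    parity_disk[i] = [p ^ o ^ n         # I/O 4: write recomputed parity block
                      for p, o, n in zip(old_parity, old_block, new_block)]

disk1 = [[1, 1, 1, 0]]
disk2 = [[1, 0, 1, 0]]
parity = [[0, 1, 0, 0]]                 # bitwise XOR of disk1 and disk2
write_block(disk1, parity, 0, [0, 0, 1, 1])
print(parity[0])                        # [1, 0, 0, 1] = XOR of the new disk1 block and disk2
```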

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 71 Disk Failures Parity blocks If one of the disks crashes, we bring in a new disk. The content of this disk can be computed, bit by bit, using the remaining n disks. There is no difference between data disks and the parity disk in this computation; it is based on the definition of parity, i.e., the total number of 1’s in each bit position is even.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 72 Disk Failures Example n = 3 data disks (Disks 1–3) and one parity disk (Disk 4). For block 1, each bit of Disk 4 is the mod-2 sum of the corresponding bits on Disks 1–3. ⇒ The sum over each bit position (column) across all four disks is always an even number of 1’s. ⇒ The mod-2 sum can recover any single missing row.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 73 Disk Failures Example Suppose Disk 2 fails, so its block 1 is unreadable. Taking the mod-2 sums, bit by bit, of block 1 over the surviving Disks 1, 3 and 4 (parity) recovers block 1 of the failed Disk 2.
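A recovery sketch for this example. The bit patterns are invented for illustration (the original slide's values are not reproduced here); the point is only that the missing block equals the bitwise XOR of the surviving blocks.

```python
def xor_blocks(*blocks):
    """Bitwise mod-2 sum (XOR) across blocks given as lists of bits."""
    return [sum(bits) % 2 for bits in zip(*blocks)]

disk1 = [1, 1, 1, 1, 0, 0, 0, 0]
disk2 = [1, 0, 1, 0, 1, 0, 1, 0]               # this disk will "fail"
disk3 = [0, 0, 1, 1, 1, 0, 0, 0]
disk4 = xor_blocks(disk1, disk2, disk3)        # parity disk

recovered = xor_blocks(disk1, disk3, disk4)    # mod-2 sum over the survivors
print(recovered == disk2)                      # True
```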

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 74 Disk Failures RAID level 5 In the RAID 4 scheme, the parity disk is the bottleneck: on average, it receives n times as many writes as each data disk. However, the failure recovery method does not distinguish the types of the n + 1 disks. RAID level 5 therefore does not use a fixed parity disk, but uses block i of disk j as the redundant block if i MOD (n+1) = j. [Figure: four disks with data blocks A, B, C, D and rotating parity blocks]
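A minimal sketch of this placement rule, assuming disks are numbered 0..n and stripes (the i-th blocks across all disks) are numbered from 0: parity for stripe i lives on disk i mod (n+1), so the parity load rotates over all disks instead of hitting one fixed disk.

```python
def parity_disk(stripe_i, n_data_disks):
    """Disk that holds the redundant (parity) block of stripe i under RAID 5."""
    return stripe_i % (n_data_disks + 1)

n = 3   # three data blocks plus one parity block per stripe, four disks in total
for i in range(8):
    print(f"stripe {i}: parity on disk {parity_disk(i, n)}")
# parity cycles over disks 0, 1, 2, 3, 0, 1, ...
```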