13.3 Accelerating Access to Secondary Storage
Section Overview
  13.3.1: The I/O Model of Computation
  13.3.2: Organizing Data by Cylinders
  13.3.3: Using Multiple Disks
  13.3.4: Mirroring Disks
  13.3.5: Disk Scheduling and the Elevator Algorithm
  13.3.6: Prefetching and Large-Scale Buffering
13.3 Introduction
Average block access time is about 10 ms.
The disk may be busy when a request arrives, so requests can queue up.
If requests arrive faster than the disk can serve them, the queue keeps growing and scheduling latency becomes unbounded.
Several strategies can increase disk throughput.
The “I/O model” is the appropriate model for estimating the speed of database operations.
13.3 Introduction (Contd.)
Actions that improve database access speed:
  Place blocks that are accessed together close to each other, within the same cylinder
  Increase the number of disks
  Mirror disks
  Use an improved disk-scheduling algorithm
  Use prefetching
13.3.1 The I/O Model of Computation
Consider a computer running a DBMS that:
  Is trying to serve a number of users
  Has 1 processor, 1 disk controller, and 1 disk
  Has users who each access different parts of the database
It can be assumed that:
  The time required for a disk access is much larger than the time required for a main-memory access
As a result:
  The number of block accesses is a good approximation of the time required by a database algorithm
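As a rough illustration of the I/O model, the sketch below estimates running time by counting block accesses only. The 10 ms figure is the average block access time quoted in the introduction; the function name and the 1,000-block relation are hypothetical and chosen only for this example.

```python
AVG_BLOCK_ACCESS_MS = 10.0  # average block access time quoted in the introduction

def estimated_time_ms(block_accesses, ms_per_access=AVG_BLOCK_ACCESS_MS):
    """I/O-model estimate: CPU and main-memory time are ignored entirely."""
    return block_accesses * ms_per_access

# Hypothetical example: an algorithm that reads a relation stored in 1,000 blocks.
scan_ms = estimated_time_ms(1000)
print(f"{scan_ms / 1000:.1f} s")
```

Under these assumptions, an algorithm that halves the number of block accesses is expected to run in about half the time, whatever it does in main memory.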
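13.3.2 Organizing Data by Cylinders
It is more efficient to store data that is likely to be accessed together on the same cylinder or on adjacent cylinders, because reading it then requires little or no head movement.
In a relational database, blocks of related data (for example, the blocks of one relation) should be stored on the same cylinder where possible.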
13.3.3 Using Multiple Disks
If the disk controller supports multiple disks and schedules them efficiently, using multiple disks can improve performance significantly.
When a relation is striped across multiple disks, its chunks can be retrieved in parallel, improving performance by up to a factor of n, where n is the number of disks the data is striped over.
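The striping idea can be sketched as follows. This is a minimal illustration, assuming round-robin placement of blocks and a fixed 10 ms per block access (the average quoted in the introduction); the function names are hypothetical, and real controllers rarely reach the ideal factor of n.

```python
import math

def disk_for_block(block_no, n_disks):
    """Round-robin striping (assumed layout): consecutive blocks go to consecutive disks."""
    return block_no % n_disks

def parallel_read_ms(n_blocks, n_disks, ms_per_access=10.0):
    """Best-case read time when all disks work in parallel.

    Each disk reads at most ceil(n_blocks / n_disks) blocks.
    """
    return math.ceil(n_blocks / n_disks) * ms_per_access

# Eight blocks striped over four disks.
layout = [disk_for_block(b, 4) for b in range(8)]
print(layout)
print(parallel_read_ms(1000, 1), parallel_read_ms(1000, 4))
```

With these assumptions, reading 1,000 blocks drops from 10,000 ms on one disk to 2,500 ms on four, which is the "up to a factor of n" speedup.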
13.3.4 Mirroring Disks
A drawback of striping data across multiple disks is that the chance of losing data to a disk failure increases: the more disks hold part of a relation, the more likely it is that one of them fails.
To mitigate this risk, some DBMSs use a disk mirroring configuration.
Disk mirroring makes each disk a copy of the others, so that if any one disk fails, the data is not lost.
Since all the data is in multiple places, n mirrored disks can serve n reads in parallel, and the read speedup can even exceed n, because each request can be sent to the disk whose head is closest to the requested block.
13.3.4 Mirroring Disks (Contd.)
Striping
  Advantages: Read/write speedup ~n; capacity increased by ~n
  Disadvantages: Higher risk of failure
Mirroring
  Advantages: Read speedup ~n; reduced failure risk; fast initial access
  Disadvantages: High cost per bit; slow writes compared to striping
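The "closest head" choice that mirroring makes possible can be sketched in a few lines. This is an illustration only: it assumes the controller knows the current cylinder of each mirror's head and treats cylinder distance as a stand-in for seek time; the function name and head positions are hypothetical.

```python
def pick_mirror(head_positions, target_cylinder):
    """Return the index of the mirror whose head is nearest the target cylinder."""
    return min(
        range(len(head_positions)),
        key=lambda i: abs(head_positions[i] - target_cylinder),
    )

# Hypothetical current head cylinders of three mirrored disks.
heads = [8000, 40000, 60000]
chosen = pick_mirror(heads, 56000)
print(chosen)
```

Here the read for cylinder 56000 goes to the third disk, whose head is only 4,000 cylinders away. A write gains nothing from this choice, since it has to be applied to every copy.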
13.3.5 Disk Scheduling
One way to improve disk throughput is to improve disk scheduling, ordering requests so that they can be served more efficiently.
The elevator algorithm is a simple yet effective disk-scheduling algorithm.
The algorithm makes the heads of a disk sweep back and forth across the cylinders, similar to how an elevator goes up and down.
As the heads sweep, the pending request closest to the head's current position in the direction of travel is processed next.
13.3.5 Disk Scheduling (Contd.)
When sweeping outward, the heads reverse direction only after the request for the largest pending cylinder has been processed.
When sweeping inward, the heads reverse direction only after the request for the smallest pending cylinder has been processed.
Example:
  Cylinder   Time Requested (ms)
  8000       0
  24000      0
  56000      0
  16000      10
  64000      20
  40000      30

  Cylinder   Time Completed (ms)
  8000       4.3
  24000      13.6
  56000      26.9
  64000      34.2
  40000      45.5
  16000      56.8
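The example can be replayed with a small simulation of the elevator algorithm. The slides do not state the timing model, so the following are assumptions chosen because they match the completion times shown: the head starts at cylinder 8000 sweeping toward larger cylinder numbers, the first three requests arrive at time 0, a seek costs 1 ms plus 1 ms per 4,000 cylinders travelled (0 if the head does not move), and rotational latency plus transfer adds 4.3 ms per access. All names are hypothetical.

```python
ROTATION_PLUS_TRANSFER_MS = 4.3  # assumed rotational latency + transfer time per access

def seek_time_ms(distance):
    """Assumed seek model: 1 ms startup + 1 ms per 4000 cylinders; 0 if no movement."""
    return 0.0 if distance == 0 else 1.0 + distance / 4000.0

def elevator(requests, start_cylinder=8000):
    """Simulate the elevator algorithm.

    requests: list of (cylinder, arrival_ms) pairs.
    Returns a list of (cylinder, completion_ms) in service order.
    """
    waiting = sorted(requests, key=lambda r: r[1])   # ordered by arrival time
    head, now, direction = start_cylinder, 0.0, +1   # +1 = toward larger cylinders
    done = []
    while waiting:
        pending = [r for r in waiting if r[1] <= now]
        if not pending:
            now = waiting[0][1]                      # idle until the next arrival
            continue
        # Requests at or beyond the head in the current direction of travel.
        ahead = [r for r in pending if (r[0] - head) * direction >= 0]
        if not ahead:
            direction = -direction                   # nothing left this way: reverse
            continue
        nxt = min(ahead, key=lambda r: abs(r[0] - head))
        now += seek_time_ms(abs(nxt[0] - head)) + ROTATION_PLUS_TRANSFER_MS
        head = nxt[0]
        waiting.remove(nxt)
        done.append((head, round(now, 1)))
    return done

REQUESTS = [(8000, 0), (24000, 0), (56000, 0), (16000, 10), (64000, 20), (40000, 30)]
for cylinder, finished in elevator(REQUESTS):
    print(cylinder, finished)
```

Note how the request for cylinder 16000, although it arrives at 10 ms, is served last: by then the head has already passed it and keeps sweeping toward 64000 before reversing.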
13.3.6 Prefetching and Large-Scale Buffering
In some cases we can anticipate which data will be needed next.
We can take advantage of this by prefetching that data from the disk into main-memory buffers before the DBMS requests it.
Since the data is then already in memory, the DBMS can use it without waiting for a disk access.
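A minimal sketch of sequential read-ahead is shown below. It assumes that access is sequential, so the blocks following the current one are the ones worth prefetching; `read_block` is a hypothetical stand-in for a real disk read, and the class name and `depth` parameter are likewise invented for illustration. For simplicity the prefetch runs synchronously here, whereas a real system would overlap it with processing.

```python
class PrefetchingReader:
    """Keeps the next `depth` blocks in a memory buffer (assumes sequential access)."""

    def __init__(self, read_block, depth=2):
        self.read_block = read_block  # hypothetical function: block number -> data
        self.depth = depth
        self.buffer = {}              # block number -> prefetched data
        self.hits = 0                 # requests served from the buffer
        self.misses = 0               # requests that had to wait for the disk

    def get(self, block_no):
        if block_no in self.buffer:
            self.hits += 1
        else:
            self.misses += 1
            self.buffer[block_no] = self.read_block(block_no)
        data = self.buffer.pop(block_no)
        # Read ahead: fetch the following blocks before they are requested.
        for nxt in range(block_no + 1, block_no + 1 + self.depth):
            if nxt not in self.buffer:
                self.buffer[nxt] = self.read_block(nxt)
        return data

reader = PrefetchingReader(lambda b: f"block-{b}")
blocks = [reader.get(b) for b in range(5)]
print(reader.hits, reader.misses)
```

In this sequential scan only the first request waits for the disk; the remaining four are served from the buffer.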
? Questions ?